[jira] [Commented] (SOLR-10255) Large psuedo-stored fields via BinaryDocValuesField

2024-07-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866845#comment-17866845
 ] 

ASF subversion and git services commented on SOLR-10255:


Commit 1a36df811feef0deb449c616d35150523d950403 in solr's branch 
refs/heads/branch_9x from Alexey Serba
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=1a36df811fe ]

SOLR-10255 Add support for docValues to solr.BinaryField (#2536)

Co-authored-by: Alexey Serba 
(cherry picked from commit 6ab6c4abf21dbd98a9b9c23a45af67bcffaea4d8)


> Large psuedo-stored fields via BinaryDocValuesField
> ---
>
> Key: SOLR-10255
> URL: https://issues.apache.org/jira/browse/SOLR-10255
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Labels: pull-request-available
> Attachments: SOLR-10255.patch, SOLR-10255.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> (sub-issue of SOLR-10117)  This is a proposal for a better way for Solr to 
> handle "large" text fields.  Large docs that are in Lucene StoredFields slow 
> requests that don't involve access to such fields.  This is fundamental to 
> the fact that StoredFields are row-stored.  Worse, the Solr documentCache 
> will wind up holding onto massive Strings.  While the latter could be tackled 
> on it's own somehow as it's the most serious issue, nevertheless it seems 
> wrong that such large fields are in row-stored storage to begin with.  After 
> all, relational DBs seemed to have figured this out and put CLOBs/BLOBs in a 
> separate place.  Here, we do similarly by using, Lucene 
> {{BinaryDocValuesField}}.  BDVF isn't well known in the DocValues family as 
> it's not for typical DocValues purposes like sorting/faceting etc.  The 
> default DocValuesFormat doesn't compress these but we could write one that 
> does.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-10255) Large psuedo-stored fields via BinaryDocValuesField

2024-07-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866842#comment-17866842
 ] 

ASF subversion and git services commented on SOLR-10255:


Commit 6ab6c4abf21dbd98a9b9c23a45af67bcffaea4d8 in solr's branch 
refs/heads/main from Alexey Serba
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=6ab6c4abf21 ]

SOLR-10255 Add support for docValues to solr.BinaryField (#2536)


Co-authored-by: Alexey Serba 

> Large psuedo-stored fields via BinaryDocValuesField
> ---
>
> Key: SOLR-10255
> URL: https://issues.apache.org/jira/browse/SOLR-10255
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Labels: pull-request-available
> Attachments: SOLR-10255.patch, SOLR-10255.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> (sub-issue of SOLR-10117)  This is a proposal for a better way for Solr to 
> handle "large" text fields.  Large docs that are in Lucene StoredFields slow 
> requests that don't involve access to such fields.  This is fundamental to 
> the fact that StoredFields are row-stored.  Worse, the Solr documentCache 
> will wind up holding onto massive Strings.  While the latter could be tackled 
> on it's own somehow as it's the most serious issue, nevertheless it seems 
> wrong that such large fields are in row-stored storage to begin with.  After 
> all, relational DBs seemed to have figured this out and put CLOBs/BLOBs in a 
> separate place.  Here, we do similarly by using, Lucene 
> {{BinaryDocValuesField}}.  BDVF isn't well known in the DocValues family as 
> it's not for typical DocValues purposes like sorting/faceting etc.  The 
> default DocValuesFormat doesn't compress these but we could write one that 
> does.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-10255) Large psuedo-stored fields via BinaryDocValuesField

2024-06-26 Thread Alexey Serba (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860074#comment-17860074
 ] 

Alexey Serba commented on SOLR-10255:
-

[~dsmiley], please review https://github.com/apache/solr/pull/2536 when you 
have a chance.

> Large psuedo-stored fields via BinaryDocValuesField
> ---
>
> Key: SOLR-10255
> URL: https://issues.apache.org/jira/browse/SOLR-10255
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: SOLR-10255.patch, SOLR-10255.patch
>
>
> (sub-issue of SOLR-10117)  This is a proposal for a better way for Solr to 
> handle "large" text fields.  Large docs that are in Lucene StoredFields slow 
> requests that don't involve access to such fields.  This is fundamental to 
> the fact that StoredFields are row-stored.  Worse, the Solr documentCache 
> will wind up holding onto massive Strings.  While the latter could be tackled 
> on it's own somehow as it's the most serious issue, nevertheless it seems 
> wrong that such large fields are in row-stored storage to begin with.  After 
> all, relational DBs seemed to have figured this out and put CLOBs/BLOBs in a 
> separate place.  Here, we do similarly by using, Lucene 
> {{BinaryDocValuesField}}.  BDVF isn't well known in the DocValues family as 
> it's not for typical DocValues purposes like sorting/faceting etc.  The 
> default DocValuesFormat doesn't compress these but we could write one that 
> does.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-10255) Large psuedo-stored fields via BinaryDocValuesField

2024-06-15 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855188#comment-17855188
 ] 

David Smiley commented on SOLR-10255:
-

+1 PR welcome

> Large psuedo-stored fields via BinaryDocValuesField
> ---
>
> Key: SOLR-10255
> URL: https://issues.apache.org/jira/browse/SOLR-10255
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: SOLR-10255.patch, SOLR-10255.patch
>
>
> (sub-issue of SOLR-10117)  This is a proposal for a better way for Solr to 
> handle "large" text fields.  Large docs that are in Lucene StoredFields slow 
> requests that don't involve access to such fields.  This is fundamental to 
> the fact that StoredFields are row-stored.  Worse, the Solr documentCache 
> will wind up holding onto massive Strings.  While the latter could be tackled 
> on it's own somehow as it's the most serious issue, nevertheless it seems 
> wrong that such large fields are in row-stored storage to begin with.  After 
> all, relational DBs seemed to have figured this out and put CLOBs/BLOBs in a 
> separate place.  Here, we do similarly by using, Lucene 
> {{BinaryDocValuesField}}.  BDVF isn't well known in the DocValues family as 
> it's not for typical DocValues purposes like sorting/faceting etc.  The 
> default DocValuesFormat doesn't compress these but we could write one that 
> does.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-10255) Large psuedo-stored fields via BinaryDocValuesField

2024-06-14 Thread Alexey Serba (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855155#comment-17855155
 ] 

Alexey Serba commented on SOLR-10255:
-

I'm wondering what do you think about simply adding support for

{noformat}
stored="false" docValues="true"
{noformat}

to solr.BinaryField field type?

Lucene {{BinaryDocValuesField}} is the only field type in Lucene that does not 
offer block level compression (see 
[LUCENE-9843|https://issues.apache.org/jira/browse/LUCENE-9843?focusedCommentId=17321328=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17321328]),
 so for many use cases where users prefer speed vs compression (see twitter and 
amazon use cases in 
[LUCENE-9378|https://issues.apache.org/jira/browse/LUCENE-9378]), this is the 
only option really.

I think this performance issue is important even with small to medium size 
fields, so we don't have to necessary solve the caching part in the same PR 
(i.e. support for docValues can be added to BinaryField first and an option for 
disabling caching can be added later/separately). 

> Large psuedo-stored fields via BinaryDocValuesField
> ---
>
> Key: SOLR-10255
> URL: https://issues.apache.org/jira/browse/SOLR-10255
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: SOLR-10255.patch, SOLR-10255.patch
>
>
> (sub-issue of SOLR-10117)  This is a proposal for a better way for Solr to 
> handle "large" text fields.  Large docs that are in Lucene StoredFields slow 
> requests that don't involve access to such fields.  This is fundamental to 
> the fact that StoredFields are row-stored.  Worse, the Solr documentCache 
> will wind up holding onto massive Strings.  While the latter could be tackled 
> on it's own somehow as it's the most serious issue, nevertheless it seems 
> wrong that such large fields are in row-stored storage to begin with.  After 
> all, relational DBs seemed to have figured this out and put CLOBs/BLOBs in a 
> separate place.  Here, we do similarly by using, Lucene 
> {{BinaryDocValuesField}}.  BDVF isn't well known in the DocValues family as 
> it's not for typical DocValues purposes like sorting/faceting etc.  The 
> default DocValuesFormat doesn't compress these but we could write one that 
> does.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org