[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-18 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170884#comment-16170884
 ] 

Cao Manh Dat commented on SOLR-8344:


bq. I think I see an existing bug (not introduced here) in the logic you moved 
– calcDocValueFieldsForReturn. If fl=foo*,dvField and dvField has 
useDocValuesAsStored=false then the code won't return dvField even though it's 
been explicitly mentioned. I haven't tried this; I'm just reading the code 
carefully.

[~dsmiley] This seems a bug to me, too. I will spin off this issue into another 
ticket.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Cao Manh Dat
> Fix For: master (8.0), 7.1
>
> Attachments: SOLR-8344.patch, SOLR-8344.patch, SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170876#comment-16170876
 ] 

ASF subversion and git services commented on SOLR-8344:
---

Commit d877ade945c5b143dcd063e2865ab65368710b92 in lucene-solr's branch 
refs/heads/branch_7x from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d877ade ]

SOLR-8344: Decide default when requested fields are both column and row stored.


> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Cao Manh Dat
> Attachments: SOLR-8344.patch, SOLR-8344.patch, SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170873#comment-16170873
 ] 

ASF subversion and git services commented on SOLR-8344:
---

Commit 40f78dd2740122e5fa18f2515effc9272fd2d902 in lucene-solr's branch 
refs/heads/master from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=40f78dd ]

SOLR-8344: Decide default when requested fields are both column and row stored.


> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Cao Manh Dat
> Attachments: SOLR-8344.patch, SOLR-8344.patch, SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-17 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169554#comment-16169554
 ] 

Cao Manh Dat commented on SOLR-8344:


Thanks for your review, that's great.

bq. minor: It's debatable to me that RetrieveFieldsOptimizer is its own class. 
It seems to me that a few static methods on DocsStreamer would be fine? 
(package access for testing). No big deal though.
I think the purpose of the class is clear to me and this make the code look 
better
bq. should we consider the presence of useDocValuesAsStored=false in one of the 
fields that are both stored and docValues as a signal that we should not do 
this optimization? I suppose so.
I maybe misunderstanding but {{useDocValuesAsStored=true}} only relate to 
{{fl=*}} case and in that case, {{storedFields==null}} which is already skipped 
in the patch.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch, SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-13 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165102#comment-16165102
 ] 

David Smiley commented on SOLR-8344:


* SolrDocumentFetcher: minor: {{allStoreds}} variables & getter should simply 
be {{allStored}} (no trailing 's' which reads weird)
* RetrieveFieldsOptimizer.storedFields add comment saying "null" means all 
available
* RetrieveFieldsOptimizer.calcStoredFieldsForReturn: in the hasPatternMatching 
condition check: eagerly initialize storedFields to a new HashSet instead of 
lazy. If it turns out that no stored fields match, we don't want to act like we 
want all stored fields, which is what I think will happen with your logic.  To 
clarify calcStoredFieldsForReturn further, I suggest making storedFields final, 
and return null early for wantsAllFields.  (IMO final fields can help clarify 
non-trivial logic)
* RetrieveFieldsOptimizer.optimize:  couldn't the loop here be replaced with a 
{{dvFields.addAll(storedFields); storedFields.clear();}} ?
* minor: It's debatable to me that RetrieveFieldsOptimizer is its own class.  
It seems to me that a few static methods on DocsStreamer would be fine?  
(package access for testing).  No big deal though.
* do we really need to make this setting toggle-able with a new parameter?  I 
suppose you left this in for performance testing but I think it can be removed 
now.
* should we consider the presence of useDocValuesAsStored=false in one of the 
fields that are both stored and docValues as a signal that we should not do 
this optimization?  I suppose so.

I think I see an existing bug (not introduced here) in the logic you moved -- 
calcDocValueFieldsForReturn.  If {{fl=foo*,dvField}} and dvField has 
useDocValuesAsStored=false then the code won't return dvField even though it's 
been explicitly mentioned.  I haven't tried this; I'm just reading the code 
carefully.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch, SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-11 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162448#comment-16162448
 ] 

Cao Manh Dat commented on SOLR-8344:


bq. IIRC, they are both stored and docValues=true currently, which was a 
conservative decision since we didn't know the impact of trying to use 
docValues only for stored fields. That extra space may not be worth it in the 
average case though.

Yeah, I think in case of single-value field, we should use docValues=true only.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158653#comment-16158653
 ] 

Uwe Schindler commented on SOLR-8344:
-

For the Id field we figured out that it breaks some stuff if stored is missing. 
E.g., More like this breaks. IMHO this is a bug, because MLT should use the doc 
cache / searcher to get the MLT values including id.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158607#comment-16158607
 ] 

Yonik Seeley commented on SOLR-8344:


A related thing worth thinking about: what should the default be for our 
docValue fields in our schema templates / examples?
IIRC, they are both stored and docValues=true currently, which was a 
conservative decision since we didn't know the impact of trying to use 
docValues only for stored fields.  That extra space may not be worth it in the 
average case though.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-08 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158593#comment-16158593
 ] 

David Smiley commented on SOLR-8344:


[~thetaphi] it isn't necessary to have a special condition for distributed 
search.  As long as the uniqueKey field has docValues, the algorithm I've been 
promoting here will choose it.  Can you see that from my proposed documentation 
comment above?  If not can you suggest improved wording?

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158575#comment-16158575
 ] 

Uwe Schindler commented on SOLR-8344:
-

I'd always use docValues for the ID field (if available) for the first pass of 
a cloud/distributed search. I had a customer with huge stored fields and the 
first pass of the distributed search got a major slowdown. Only workaround was 
to use single-pass, which had other problems (of course).

So I'd suggest to add the following special case: If first pass of distributed 
search AND id field has docvalues => also use docValues.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-08 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158560#comment-16158560
 ] 

David Smiley commented on SOLR-8344:


Please post a new patch so we can see before you commit it.  Furthermore, can 
you please include a simple summary of the approach in the javadoc of the 
optimizer something like (assuming this is accurate):  

bq. Sometimes we could fetch a field value from either the stored document or 
docValues.  Such fields have both and are single-valued.  If choosing docValues 
allows us to avoid accessing the stored document altogether for all fields to 
be returned then we do it, otherwise we prefer the stored value when we have a 
choice.

IMO An effective strategy and straight-forward to explain.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-07 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158102#comment-16158102
 ] 

Cao Manh Dat commented on SOLR-8344:


[~tomasflobbe] Yeah in my benchmark for Lucene, they will almost the same, so I 
would like to read from docValues, therefore docValues fields can be cached.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158098#comment-16158098
 ] 

Tomás Fernández Löbbe commented on SOLR-8344:
-

bq. Because we already have to pay for seek cost for reading field1 therefore 
reading other fields from stored will be faster than reading from DV
That would be my guess. But then, wouldn't we prefer to use stored fields for 
all fields (if possible) if at least one of the fields needs to come from 
stored fields? In the patch, it looks like, if rows >= 100, then we try to get 
everything from DVs when possible, right?


> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-07 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158097#comment-16158097
 ] 

Cao Manh Dat commented on SOLR-8344:


BTW I think that for this ticket, the current optimization is good enough ( we 
can more numRows check before doing commit ). And we can always change the 
optimize method latter.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-07 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158083#comment-16158083
 ] 

Cao Manh Dat commented on SOLR-8344:


Ah yeah, you're right.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-07 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158082#comment-16158082
 ] 

David Smiley commented on SOLR-8344:


I mean just call {{doc.remove(fieldName)}; no need to guard this with an "if".  
If it's not there, doc.remove is harmless.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-07 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158073#comment-16158073
 ] 

Cao Manh Dat commented on SOLR-8344:


bq. Which scenario is this in your benchmark above? Or some other benchmark you 
allude to?
Yeah, it belongs to another simple benchmark.
bq. I understand that but don't see what that has to do with the number of 
documents.
Hmm, I re-run the benchmark and you're right. They are almost the same.
bq. In the event every field in "fl" has docValues, stored, single valued: will 
your patch use docValues data always
Yeah, It always use docValues.
bq.If in some request we realize we can avoid accessing the stored document and 
instead do only with docValues, we should avoid polluting the document cache as 
well
With this patch, if we notice that we do not need stored field, so we will skip 
reading the document as well as caching. 



> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-07 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158059#comment-16158059
 ] 

David Smiley commented on SOLR-8344:


bq. No, It matter in case of "fl" contains a field which is stored only (let's 
call it field1) and "rows" is small. 

Which scenario is this in your benchmark above?  Or some other benchmark you 
allude to?

bq. Because we already have to pay for seek cost for reading field1 therefore 
reading other fields from stored will be faster than reading from DV

I understand that but don't see what that has to do with the number of 
documents.

In the event _every_ field in "fl" has docValues, stored, single valued: will 
your patch use docValues data always?  If not, why not?  Maybe I should study 
your patch further but I was a bit confused.

I would like to consider the relationship of this optimization on the document 
cache, if not in this patch then perhaps in a follow-up.  If in some request we 
realize we can avoid accessing the stored document and instead do only with 
docValues, we should avoid polluting the document cache as well, I think.  
Maybe we will  Consider the first phase of distributed search that only wants 
the uniqueKey field.  

In your patch, this: {{if (doc.containsKey(fieldName)) doc.remove(fieldName);}} 
can be simplified to remove the needless condition

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-07 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157864#comment-16157864
 ] 

Cao Manh Dat commented on SOLR-8344:


[~dsmiley] 
bq.I don't think the number of documents is that pertinent since both 
strategies are O(docs) multiplied by other factors. 
No, It matter in case of "fl" contains a field which is stored only (let's call 
it field1). Because we already have to pay for seek cost for reading field1 
therefore reading both field2 and field3 from stored will be faster than 
reading field2 and field3 from DV. I tested this result with Lucene (not Solr) 
so we don't have to worry about caching will affect the result here.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-07 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157156#comment-16157156
 ] 

Yonik Seeley commented on SOLR-8344:


bq. Why not have the strategy simply observe if it's possible at all to 
completely use docValues.
+1

bq. If the "fl" contains a wildcard, we may want to give up on using docValues 
Or resolve wildcards first.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-07 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157141#comment-16157141
 ] 

David Smiley commented on SOLR-8344:


I don't think the number of documents is that pertinent since both strategies 
are O(docs) multiplied by other factors.  Remember that testing this is also 
complicated by the existence of the Solr document cache.  Why not have the 
strategy simply observe if it's possible at all to completely use docValues.  
If it is, do it, if not, don't.  If the "fl" contains a wildcard, we may want 
to give up on using docValues altogether.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-07 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156597#comment-16156597
 ] 

Cao Manh Dat commented on SOLR-8344:


[~tomasflobbe] I changed start param. Therefore, in the first pass, every shard 
will send back 10010 ids back to the query master. Right now we will use stored 
field to retrieve doc ids, with this patch, we will use doc values field to 
retrieve doc ids ( which is significant faster ).
In case of less values to be retrieved ( case 2 ), the different is not much.

bq. What happens in your benchmark if at least one of the fields to return is a 
text field (or has docValues=false)? Does getting the stored content of one of 
the fields change your benchmark results?

Yeah, I'm about to testing that, we post result latter. But keep in mind that, 
in case of multi-value field, we can't use docValues because values are stored 
as sorted-set.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156497#comment-16156497
 ] 

Tomás Fernández Löbbe commented on SOLR-8344:
-

Interesting! Two questions:
bq. q=:=TITLE_str,id,REVISION_TEXT_str=1
Did you mean "rows=1" or are you actually changing start?
What happens in your benchmark if at least one of the fields to return is a 
text field (or has docValues=false)? Does getting the stored content of one of 
the fields change your benchmark results?

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8344.patch
>
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-09-01 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150447#comment-16150447
 ] 

Cao Manh Dat commented on SOLR-8344:


I think instead of find out the default solution. Why don't we do a quick 
estimation like this 
{code}
dvEstimate = estimateReadingTimeForDocValuesAsDefault(List fields, int 
numDocs);
storedEstimate = estimateReadingTimeForStoredAsDefault(List fields, int 
numDocs);
{code}
By comparing {{dvEstimate}} and {{storedEstimate}} we can choose to use dv or 
stored field here.


> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2017-06-05 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036947#comment-16036947
 ] 

David Smiley commented on SOLR-8344:


This issue (SOLR-8344) seems to be about a decision, not about actual code.  
Linking to SOLR-5478 which contains a patch that actually does this.  

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2016-07-19 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385318#comment-15385318
 ] 

David Smiley commented on SOLR-8344:


bq. Is it possible to preferentially return DV when doing the first pass of a 
distributed search?

Yes with customizations.  I have a patch in 
https://issues.apache.org/jira/browse/SOLR-5478 which is used by one of my 
clients, and I have another deployed implementation in which a custom request 
handler extending SearchHandler the id in {{fl}} to be {{field(id)}} in certain 
circumstances, and that didn't require patching Solr.  I'd love to see 
SOLR-5478 get finally put to bed.  Oh and I wrote tests.  Hmmm.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2016-07-19 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385236#comment-15385236
 ] 

Erick Erickson commented on SOLR-8344:
--

Is it possible to preferentially return DV when doing the first pass of a 
distributed search? IIRC, Yonik said at one point that we already get the sort 
values from the index (or perhaps we can pull them from DV fields). But I just 
verified that the logic in DocStreamer decompresses the doc pretty much no 
matter what. I've just verified this in 6.2. The test for whether all the 
fields are DV fields fails due to the presence of the "score" pseudo field.

I added a hack patch to 6810 to illustrate one path that sidesteps 
decompression. 

Anyway, I was wondering if we could somehow detect that this was the first pass 
and return the DV fields all the time, although I do wonder if some obscure 
case where the sorted order of the multiValued DV fields would mess things up.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2016-03-22 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206354#comment-15206354
 ] 

Yonik Seeley commented on SOLR-8344:


Related: in my testing, it seemed like retrieving docvalue fields that were 
unstored was much slower than the /export handler... and it wasn't the sorting 
part, this was sustained slowness, pointing to our retrieval code.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2016-03-21 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205815#comment-15205815
 ] 

Ishan Chattopadhyaya commented on SOLR-8344:


Erick, I'll take a fresh look at this post 6.0 release, and attempt an initial 
patch, unless someone else wants to take it up.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2016-03-21 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205703#comment-15205703
 ] 

Erick Erickson commented on SOLR-8344:
--

What is the status of this now that SOLR-8220 has been committed? Still under 
development?

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2015-12-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034083#comment-15034083
 ] 

Yonik Seeley commented on SOLR-8344:


Followup of 
https://issues.apache.org/jira/browse/SOLR-8220?focusedCommentId=15034013=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15034013

bq. Not quite sure what to do if the user defines both however, perhaps use the 
stored value?
That's what this issue is about.

bq. Why are we trying to anticipate "the right thing to do"?
They are both "right" (semantically equivalent), so we're trying to figure out 
the best performing option.  That will change based on the request, and always 
(statically) choosing one over the other is never going to be optimal.

Example: If a field is column-stored (docValued), one could chose to add it as 
stored so that it would be quicker to retrieve along with other non-docvalue 
fields (if you need to go to stored fields anyway, just get all the field 
values you can from there).  But when accessing that field alone, we'd still 
certainly want to use the column... it will be much faster.


> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2015-12-01 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034076#comment-15034076
 ] 

Erick Erickson commented on SOLR-8344:
--

Moving my comment from SOLR-8220 here, didn't see this ticket before pasting my 
comment there:

My initial take would be something like add a new possibility to stored, 
something like stored="true|false|docvalues".

No guessing involved then. stored="docvalues" would NOT put anything in the fdt 
file for this field. We should probably error out if stored="docvalues" and 
docValues="false".

If it was important to get 2, not 1. then there would need to be a 
stored=true variant of the field.

**Another take...
Should we consider disallowing setting both? I.e. stored="true" 
docValues="true" throws an error...

Now it's totally under the control of the user which path is chosen through the 
schema definition; we don't have to try to guess anything. No new syntax 
either. Maybe with a new "best practice" or something. There would be a 
learning curve for users around using only docValues=true for efficiency and 
not setting stored=true.

That said, I suspect that the break in back-compat is too expensive and I think 
stored=docvalues is better anyway


> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2015-12-01 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034126#comment-15034126
 ] 

Erick Erickson commented on SOLR-8344:
--

Yeah, removed that all from 8220, thought about it more and saw this ticket...

We do want to keep in mind though that from a _user_ perspective, inputting 
'2.178734872984723984729387" as a float and getting  back "2.17873477935791" 
are not equivalent. So whatever we do needs to take that into account. If we 
use the stored=docvalues approach, then they'd need a copyField with 
stored=true field if it was important to them.

The other thing I like about the stored=docvalues approach, existing 
schemas/applications continue to work exactly as they do now. Upgrading is not 
surprising. Users can selectively take advantage of this new capability.

I think it'd even work if a user changed an existing schema from stored=true to 
stored=docvalues and did not re-index. True, they'd be carrying around some 
useless data in the *.fdt files until it was merged away for updated docs, but 
as long as docValues=true was set in the original schema, it'd "do the expected 
thing".

I suppose the same functionality could come from a new parameter on a  
definition like fl_from_dv=true but aside from the awkward name, I think this 
is harder to wrap your head around.

If you were willing to pay the "wasted disk space" penalty though, a separate 
parameter on the  to control where the "stored" value came from would 
allow one to switch back and forth without re-indexing. But to be clear, I 
don't think this is a good idea; using the  stored=docvalues option forces a 
decision to be made up-front.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2015-12-01 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034206#comment-15034206
 ] 

Ishan Chattopadhyaya commented on SOLR-8344:


[~erickerickson] As of now, users got the stored value directly, but the dv 
value by using field(). If we load stored fields also from docValues, do you 
think adding a function stored(), which would actually go and retrieve the row 
stored value, solve this usecase?

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2015-12-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034256#comment-15034256
 ] 

Yonik Seeley commented on SOLR-8344:


bq. We do want to keep in mind though that from a user perspective, inputting 
'2.178734872984723984729387" as a float and getting back "2.17873477935791" 

Right.  Numeric values are already stored in binary form, so any truncation 
that may occur is already occurring (and should occur in an identical way for 
docvalues).

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2015-12-01 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034370#comment-15034370
 ] 

Erick Erickson commented on SOLR-8344:
--

Damn, ya' learn something new every day. 

It's still a problem with dates though. If I index something like: 
2015-11-11T12:12:12Z-3DAYS I get back 2015-11-08T12:12:12Z

I'm not sure it matters enough to worry about, but whatever we do should be 
intentional.

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2015-12-01 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034324#comment-15034324
 ] 

Erick Erickson commented on SOLR-8344:
--

[~ichattopadhyaya] I think it'd solve the use-case. and I'm +1 on adding 
stored().

I'm also concerned with the upgrade case though. If someone takes their 5x 
schema and app (and I'm assuming all this is 6.0 at the earliest) and does 
nothing to their schema or app, I'd like them to see no change. If we require 
them to use stored() to pull the stored value explicitly, the app needs to 
change or the users will be surprised.

Which is why I kinda like the stored=docvalues option. It would mean that they 
get the same behavior they see now if they do nothing, but the option of doing 
things the new way without re-indexing.

Hmmm, I like _also_ adding the stored() option. That would allow someone to 
take an existing schema/index/app, add stored=docvalues option (for fields that 
are _already_ DV fields) and _still_ get the raw stored value if they want.

Hm2. the more I think about it, the more I think that it's probably better 
to have something like a new option. Rather than stored=docvalues, something 
like defaultStoredReturn=docvalues|stored. That would allow an _existing_ index 
with a DV field to have the maximum flexibility. Plus it would allow a single 
field definition to serve all the use cases. My original idea of 
stored=docvalues wouldn't put anything in the fdt files when indexing, so a 
stored() operation wouldn't work on a new index.

Hm3. Maybe I'm worrying too much about flexibility. We already have enough 
trouble explaining the nuances of stored/indexed/docValues. Adding 
defaultStoredReturn to the mix makes it _really_ ugly. Just using 
stored=docvalues allows people to make a decision, and the back-compat story is 
simple. "To take advantage of these efficiencies, you may have to reindex if 
you care about the edge case of floats/doubles/ints/longs (and maybe dates)".

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2015-12-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034467#comment-15034467
 ] 

Yonik Seeley commented on SOLR-8344:


bq. I'm also concerned with the upgrade case though. If someone takes their 5x 
schema and app (and I'm assuming all this is 6.0 at the earliest) and does 
nothing to their schema or app, I'd like them to see no change.

The plan from SOLR-8220 was to use the schema version for back compat... 
meaning that if you have unstored docValues fields, they won't suddenly start 
appearing when you do fl=*

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8344) Decide default when requested fields are both column and row stored.

2015-12-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033823#comment-15033823
 ] 

Yonik Seeley commented on SOLR-8344:


Also highly related is what our schema defaults should be.

For example, the fields we add as docValue by default (say dynamic fields like 
*_i), should they also be row-stored (stored="true") or not?

> Decide default when requested fields are both column and row stored.
> 
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a 
> separate issue so that we can have a focused discussion on whether/how to do 
> this.
> If a given set of requested fields are all stored and have docValues (column 
> stored), we can retrieve the values from either place.  What should the 
> default be?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org