[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978882#comment-15978882 ]

Michael Braun commented on SOLR-10273:

Please ignore my comment - wrong ticket!

> Re-order largest field values last in Lucene Document
>
> Key: SOLR-10273
> URL: https://issues.apache.org/jira/browse/SOLR-10273
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: David Smiley
> Assignee: David Smiley
> Fix For: 6.5
>
> Attachments: SOLR_10273_DocumentBuilder_move_longest_to_last.patch
>
>
> (part of umbrella issue SOLR-10117)
> In Solr's {{DocumentBuilder}}, at the very end, we should move the field
> value(s) associated with the largest field (assuming "stored") to be last.
> Lucene's default stored value codec can avoid reading and decompressing the
> last field value when it's not requested. (As of LUCENE-6898).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
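The issue description above comes down to a small reordering step. Below is a minimal sketch of that idea, not the attached patch. `StoredValue` is a hypothetical stand-in for a stored Lucene field. Size is measured in Java chars (`String.length()`). The `minLength` cutoff plays the role of the minimum-size threshold discussed in this thread (4KB there).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a stored Lucene field: only a name and a string value.
final class StoredValue {
  final String name;
  final String value;

  StoredValue(String name, String value) {
    this.name = name;
    this.value = value;
  }

  @Override
  public String toString() {
    return name + "=" + value.length();
  }
}

final class MoveLargestLast {
  // Finds the single longest value. If it is at least minLength chars, every value of
  // that field is moved to the end. All other values keep their relative order.
  static List<StoredValue> reorder(List<StoredValue> fields, int minLength) {
    StoredValue longest = null;
    for (StoredValue f : fields) {
      if (longest == null || f.value.length() > longest.value.length()) {
        longest = f;
      }
    }
    List<StoredValue> head = new ArrayList<>();
    if (longest == null || longest.value.length() < minLength) {
      head.addAll(fields); // nothing big enough to bother with: keep the original order
      return head;
    }
    List<StoredValue> tail = new ArrayList<>();
    for (StoredValue f : fields) {
      // Values of a multi-valued largest field are moved together, keeping their order.
      (f.name.equals(longest.name) ? tail : head).add(f);
    }
    head.addAll(tail);
    return head;
  }

  public static void main(String[] args) {
    List<StoredValue> doc = new ArrayList<>();
    doc.add(new StoredValue("body", "x".repeat(5000)));
    doc.add(new StoredValue("id", "42"));
    doc.add(new StoredValue("title", "Hello"));
    System.out.println(reorder(doc, 4096)); // prints [id=2, title=5, body=5000]
  }
}
```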
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978856#comment-15978856 ]

Michael Braun commented on SOLR-10273:

Saw this was reopened; is this not fully implemented in the 6.5.1 RC?
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929302#comment-15929302 ]

ASF subversion and git services commented on SOLR-10273:

Commit 993003b33e33ba78c66ffda41acb12b8239c359a in lucene-solr's branch refs/heads/branch_6x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=993003b ]

SOLR-10273: DocumentBuilder move longest field to last position
(cherry picked from commit 8fbd9f1)
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929295#comment-15929295 ]

ASF subversion and git services commented on SOLR-10273:

Commit 8fbd9f1e403cc697f77d827cd1aa85876c8665ae in lucene-solr's branch refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8fbd9f1 ]

SOLR-10273: DocumentBuilder move longest field to last position
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928389#comment-15928389 ]

David Smiley commented on SOLR-10273:

Wonderful, then I'll commit this patch (with the 4k threshold) in a few hours or so.
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928250#comment-15928250 ]

Michael McCandless commented on SOLR-10273:

Hi [~dsmiley], yes, you are right: IndexWriter now tries very hard to use consistent field numbers using its global field number map. This isn't always possible, e.g. {{addIndexes(Directory[])}} can bring in inconsistent numbers, and so the matching logic in the bulk merging is still necessary, but I think it should be safe for you to re-order the stored fields here.
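Here is a rough illustration of why a writer-wide field number map makes per-document reordering harmless. If numbers are assigned by field name for the lifetime of the writer, moving a field to the end of one document does not change its number. `GlobalFieldNumbers` is a hypothetical, heavily simplified model, not Lucene's internal class. It ignores the `addIndexes(Directory[])` case mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of a writer-wide field-name -> number map.
// Numbers depend only on the field name, never on where a field sits in a document.
final class GlobalFieldNumbers {
  private final Map<String, Integer> numbers = new HashMap<>();

  // Returns the number already assigned to this name, or assigns the next free one.
  synchronized int numberFor(String fieldName) {
    Integer existing = numbers.get(fieldName);
    if (existing != null) {
      return existing;
    }
    int assigned = numbers.size();
    numbers.put(fieldName, assigned);
    return assigned;
  }

  public static void main(String[] args) {
    GlobalFieldNumbers global = new GlobalFieldNumbers();
    // Segment A: a document whose stored fields arrive as body, id.
    int bodyA = global.numberFor("body");
    int idA = global.numberFor("id");
    // Segment B: the large "body" field was moved last, so fields arrive as id, title, body.
    int idB = global.numberFor("id");
    int titleB = global.numberFor("title");
    int bodyB = global.numberFor("body");
    System.out.println(bodyA == bodyB && idA == idB); // prints true
    System.out.println(titleB); // prints 2
  }
}
```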
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928064#comment-15928064 ]

David Smiley commented on SOLR-10273:

I'm looking at IndexWriter.globalFieldNumberMap, which is initialized from the current segments. This is then referenced by DocumentsWriter for each DocumentsWriterPerThread. It seems this isn't so fragile after all, with respect to ordering?
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928044#comment-15928044 ]

David Smiley commented on SOLR-10273:

Thanks for the pointer, [~mikemccand]! Wow... it seems index/merge performance could vary quite a bit based on something that seems to me very fragile (not considering large/small fields here; just in general). Why doesn't Lucene sort the FieldInfos such that the field number ascends as the field name alphabetically ascends? I can file an issue if you think that would be a net benefit. Even with that, it's a shame a bulk merge can't happen if some fields simply aren't present in some segments yet are in others. Perhaps, again, Lucene could be improved to look across the segments and add to the segment being written all FieldInfo(s) that exist in other segments but not in the current one? Perhaps not if doing so would add more than X FieldInfos. I have not looked at this part of Lucene in depth. I suspect these ideas haven't been implemented yet because FieldInfo(s) must be generated before all of the ones that may need to exist are known.
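The following sketches the alphabetical-numbering idea from the comment above. It relies on the very assumption the comment names as the likely obstacle: that the full set of field names is known before numbers are assigned. `AlphabeticalFieldNumbers` is hypothetical and is not Lucene's actual numbering scheme.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: number the union of all segments' field names alphabetically,
// so every segment agrees on the numbers regardless of the order fields were seen in.
final class AlphabeticalFieldNumbers {
  static Map<String, Integer> number(Collection<? extends Collection<String>> segments) {
    TreeSet<String> union = new TreeSet<>(); // sorted, de-duplicated field names
    for (Collection<String> names : segments) {
      union.addAll(names);
    }
    Map<String, Integer> numbers = new TreeMap<>();
    int next = 0;
    for (String name : union) {
      numbers.put(name, next++);
    }
    return numbers;
  }

  public static void main(String[] args) {
    // Two segments that saw their fields in different orders.
    List<List<String>> segments = List.of(List.of("title", "id"), List.of("body", "id"));
    System.out.println(number(segments)); // prints {body=0, id=1, title=2}
  }
}
```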
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927720#comment-15927720 ] Michael McCandless commented on SOLR-10273: --- [~dsmiley] have a look @ {{oal.codecs.compressing.MatchingReaders}} ... that's where we compute whether the field numbers are congruent across segments so bulk merge can apply (or not). > Re-order largest field values last in Lucene Document > - > > Key: SOLR-10273 > URL: https://issues.apache.org/jira/browse/SOLR-10273 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: David Smiley >Assignee: David Smiley > Fix For: 6.5 > > Attachments: SOLR_10273_DocumentBuilder_move_longest_to_last.patch > > > (part of umbrella issue SOLR-10117) > In Solr's {{DocumentBuilder}}, at the very end, we should move the field > value(s) associated with the largest field (assuming "stored") to be last. > Lucene's default stored value codec can avoid reading and decompressing the > last field value when it's not requested. (As of LUCENE-6898). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927405#comment-15927405 ]

David Smiley commented on SOLR-10273:

[~mikemccand] might you know what Rob is referring to? I'd like to see where this happens in Lucene so I can learn more about it. I've been looking around a bit, e.g. in SegmentMerger. If there's really an issue here, I could modify the patch to ignore field sizes and instead look only for the fields declared as "large" (SOLR-10286).
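The alternative mentioned above, moving only fields declared as "large" (SOLR-10286), can be sketched as shown below. The decision depends on the schema rather than on value sizes in each document. As a result, a declared-large field always lands after every other field, whatever its size in a given document. `DeclaredLargeLast` and the `largeFields` set are hypothetical, and field names stand in for the stored values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: move values of schema-declared "large" fields to the end.
final class DeclaredLargeLast {
  static List<String> reorder(List<String> fieldNames, Set<String> largeFields) {
    List<String> head = new ArrayList<>();
    List<String> tail = new ArrayList<>();
    for (String name : fieldNames) {
      (largeFields.contains(name) ? tail : head).add(name);
    }
    // Sort the tail by name so several large fields always end up in the same order.
    tail.sort(Comparator.naturalOrder());
    head.addAll(tail);
    return head;
  }

  public static void main(String[] args) {
    Set<String> large = Set.of("body");
    System.out.println(reorder(List.of("body", "id", "title"), large)); // prints [id, title, body]
    System.out.println(reorder(List.of("id", "body", "title"), large)); // prints [id, title, body]
  }
}
```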
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925532#comment-15925532 ]

David Smiley commented on SOLR-10273:

Thanks for alerting me to this, Rob! Is there a size threshold at which you think it's not a de-optimization, perhaps the 16KB mark? I suppose your point is consistency... so if we _always_ move the values for certain fields last, then there's no problem?

bq. Also bulk merging relies upon field number consistency across segments

Can you point me to a line of code in CompressingStoredFieldsWriter that is pertinent? I don't see it.
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925346#comment-15925346 ]

Robert Muir commented on SOLR-10273:

This is a big deoptimization for the common case... Lucene must preserve field order, and you lose compression if it's inconsistent across docs. Also bulk merging relies upon field number consistency across segments. IW tries to keep them aligned, but this patch is intentionally being adversarial... to benefit what is a rare and esoteric case. This kind of thing should only be enabled by an option.
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924296#comment-15924296 ]

David Smiley commented on SOLR-10273:

bq. Is there a way to check this while building the Document from the SolrInputDocument instead (may be cheaper?)

I briefly contemplated having DocumentBuilder internally collect the Lucene IndexableField instances into some other internal Doc-like inner class that could maintain the largest value as it goes. But that seems over-engineered, and the post-process scanning code later seems pretty quick to me.

bq. For multi-valued fields, perhaps we should be using the sum of the multiple fields? As a generalization we could also consider sorting by size, not just picking out the largest single field.

In the entire Lucene+Solr codebase, the only place where StoredFieldVisitor.Status.STOP is actually used is the Unified/Postings highlighters, and only when one field is being highlighted. So if there were an overall large document (>16KB), and we didn't move the 2nd largest value to the end, and you wanted to highlight on this 2nd largest value alone, and there were some additional sizable fields in between this 2nd largest value and the last one, then yes, we'd be doing more work. I don't think it's worth bothering with right now?

BTW, when I commit this patch, I'll change the min size threshold to 4KB.
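To see why `StoredFieldVisitor.Status.STOP` only pays off for values placed after the requested field, here is a toy model of a single-field visitor (like the highlighter case described above) walking stored values in order. The `Status` enum mirrors Lucene's three constants (YES, NO, STOP), but everything else is hypothetical. The byte count is only a rough proxy for decoding work; the real codec's costs differ.

```java
import java.util.List;

// Hypothetical model of visiting stored values in order with early termination.
final class StopEarlyDemo {
  enum Status { YES, NO, STOP }

  static final class Stored {
    final String name;
    final int length; // encoded size in bytes (made-up numbers in the example)

    Stored(String name, int length) {
      this.name = name;
      this.length = length;
    }
  }

  // A single-field visitor: NO until the wanted field, YES for it, then STOP.
  // Assumes the wanted field is single-valued. Returns bytes passed over before stopping.
  static int bytesTouched(List<Stored> doc, String wanted) {
    int touched = 0;
    boolean found = false;
    for (Stored s : doc) {
      Status status = found ? Status.STOP : (s.name.equals(wanted) ? Status.YES : Status.NO);
      if (status == Status.STOP) {
        break; // nothing after this point is read
      }
      touched += s.length;
      if (status == Status.YES) {
        found = true;
      }
    }
    return touched;
  }

  public static void main(String[] args) {
    List<Stored> largeLast = List.of(new Stored("id", 10), new Stored("title", 50), new Stored("body", 20000));
    List<Stored> largeFirst = List.of(new Stored("body", 20000), new Stored("id", 10), new Stored("title", 50));
    System.out.println(bytesTouched(largeLast, "title"));  // prints 60
    System.out.println(bytesTouched(largeFirst, "title")); // prints 20060
  }
}
```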
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923517#comment-15923517 ]

Yonik Seeley commented on SOLR-10273:

Is there a way to check this while building the Document from the SolrInputDocument instead (may be cheaper?) For multi-valued fields, perhaps we should be using the sum of the multiple fields? As a generalization we could also consider sorting by size, not just picking out the largest single field.
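The two suggestions above could be combined roughly as follows: sum the sizes of a multi-valued field, then sort fields by size instead of only picking the largest. The sketch is hypothetical and works on simple name/value pairs rather than Lucene fields. Values of one field stay contiguous, and the field with the most stored data ends up last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: order fields smallest-first by the summed length of their values.
final class SortFieldsBySize {
  static final class Stored {
    final String name;
    final String value;

    Stored(String name, String value) {
      this.name = name;
      this.value = value;
    }
  }

  static List<Stored> sortBySummedSize(List<Stored> doc) {
    // Group values by field name, remembering the order in which names first appear.
    Map<String, List<Stored>> groups = new LinkedHashMap<>();
    for (Stored s : doc) {
      groups.computeIfAbsent(s.name, k -> new ArrayList<>()).add(s);
    }
    List<List<Stored>> ordered = new ArrayList<>(groups.values());
    // List.sort is stable, so fields with equal totals keep their first-seen order.
    ordered.sort(Comparator.comparingInt(SortFieldsBySize::totalLength));
    List<Stored> out = new ArrayList<>();
    for (List<Stored> group : ordered) {
      out.addAll(group);
    }
    return out;
  }

  static int totalLength(List<Stored> group) {
    int sum = 0;
    for (Stored s : group) {
      sum += s.value.length();
    }
    return sum;
  }

  public static void main(String[] args) {
    List<Stored> doc = List.of(
        new Stored("tag", "aaaa"), new Stored("body", "xxxxxxxxxx"),
        new Stored("tag", "bbbb"), new Stored("id", "7"), new Stored("tag", "cccc"));
    List<String> names = new ArrayList<>();
    for (Stored s : sortBySummedSize(doc)) {
      names.add(s.name);
    }
    // "body" has the longest single value (10), but "tag" has the largest total (12).
    System.out.println(names); // prints [id, body, tag, tag, tag]
  }
}
```

Note that this yields a different field order from one document to the next whenever sizes vary. That is the kind of inconsistency Robert Muir objects to in his comment on this issue.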
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923512#comment-15923512 ]

David Smiley commented on SOLR-10273:

True; it's debatable... I nearly added a comment about being inclined to raise this min length to something higher, so I'm glad you brought it up. That Lucene-side value might change in the future or depend on a user-chosen codec; we needn't track it exactly. Also, just because the longest field is 1024 doesn't mean the document overall is "small", because theoretically there could be a ton of stored values instead of one particularly large one. Perhaps change to a 4KB default? Shrug.
[jira] [Commented] (SOLR-10273) Re-order largest field values last in Lucene Document
[ https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923508#comment-15923508 ]

Michael Braun commented on SOLR-10273:

In LUCENE-6898, a comment says it has no impact if the last stored value is under 16KB; should the default be higher than 1024?