[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow
[ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499152#comment-16499152 ]

David Smiley commented on SOLR-12366:

{quote}Nice catch, this stuff has been broken forever!{quote}
Thanks; and a lot of credit goes to [~millerjeff0] for the profiling that revealed the slow-down.

{quote}MultiFields has slow methods as well{quote}
Sure. I meant it's more explicit as to what you actually need: do you need an entire LeafReader, or just a MultiTerms perhaps, or Multi-something else? Plus, SlowAtomicReader was kicked out of Lucene, so if there's an easy alternative that solves the task at hand, as was the case in some places in this patch, then let's just use that.

{quote}A variable name change for "SolrIndexSearcher.leafReader"{quote}
+1 yeah, like slowLeafReader

> Avoid SlowAtomicReader.getLiveDocs -- it's slow
> ---
>
> Key: SOLR-12366
> URL: https://issues.apache.org/jira/browse/SOLR-12366
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: search
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Major
> Fix For: 7.4
>
> Attachments: SOLR-12366.patch, SOLR-12366.patch, SOLR-12366.patch, SOLR-12366.patch
>
>
> SlowAtomicReader is of course slow, and its getLiveDocs (based on MultiBits)
> is slow as it uses a binary search for each lookup. There are various places
> in Solr that use SolrIndexSearcher.getSlowAtomicReader and then get the
> liveDocs. Most of these places ought to work with SolrIndexSearcher's
> getLiveDocs method.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
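To make the cost described in the issue concrete, here is a minimal sketch in plain Java, not Lucene or Solr code. It assumes a hypothetical index of three segments whose starting offsets are held in `starts`, and models the two approaches: a composite view that binary-searches the segment offsets on every lookup (roughly what the description attributes to MultiBits), and one index-wide bitset built once. The class and method names (`LiveDocsSketch`, `isLiveComposite`, `flatten`) are invented for illustration.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical model, not Lucene code: two ways to answer "is global doc N live?". */
public class LiveDocsSketch {

    /** Composite view: every lookup first binary-searches the segment start offsets. */
    public static boolean isLiveComposite(int[] starts, BitSet[] perSegment, int globalDoc) {
        int idx = Arrays.binarySearch(starts, globalDoc);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the owning segment is insertionPoint - 1.
        int seg = idx >= 0 ? idx : -idx - 2;
        return perSegment[seg].get(globalDoc - starts[seg]);
    }

    /** Flat view: build one index-wide bitset once; each later lookup is a single bit test. */
    public static BitSet flatten(int[] starts, BitSet[] perSegment) {
        BitSet all = new BitSet();
        for (int seg = 0; seg < perSegment.length; seg++) {
            BitSet live = perSegment[seg];
            for (int d = live.nextSetBit(0); d >= 0; d = live.nextSetBit(d + 1)) {
                all.set(starts[seg] + d); // shift the segment-local id to a global id
            }
        }
        return all;
    }

    public static void main(String[] args) {
        int[] starts = {0, 3, 5}; // assumed starting global id of each segment
        BitSet[] segs = {new BitSet(), new BitSet(), new BitSet()};
        segs[0].set(0, 3); // global docs 0..2 live
        segs[1].set(0);    // global doc 3 live, global doc 4 deleted
        segs[2].set(0, 2); // global docs 5..6 live
        BitSet flat = flatten(starts, segs);
        for (int doc = 0; doc < 7; doc++) {
            System.out.println(doc + " composite=" + isLiveComposite(starts, segs, doc)
                    + " flat=" + flat.get(doc));
        }
    }
}
```

Both views give the same answers; the difference is only that the composite view pays for a search on each call, which adds up when a caller tests many documents in a loop.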
[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow
[ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499147#comment-16499147 ]

Yonik Seeley commented on SOLR-12366:

Nice catch, this stuff has been broken forever! Looking back, I think not enough was exposed to be able to work per-segment, so Lucene's MultiReader.isDeleted(int doc) did a binary search each time. Once we gained the ability to operate per-segment, some code wasn't converted.

{quote}IMO some callers of SolrIndexSearcher.getSlowAtomicReader should change to use MultiFields to avoid the temptation to have a LeafReader that has many slow methods.{quote}
MultiFields has slow methods as well, and if you look at the histories, many places used MultiFields.getDeletedDocs even before (and were replaced with the equivalent?). For example, commit 6ffc159b40 changed getFirstMatch to use MultiFields.getDeletedDocs (which may not have been a bug since it probably was equivalent at the time?).

Anyway, I think perhaps we should throw an exception for any place in SlowCompositeReaderWrapper that exposes code that does a binary search. We don't need a full Reader implementation here I think.

A variable name change for "SolrIndexSearcher.leafReader" would really be welcome too... it's a bad name. We've been bitten by the naming before as well: SOLR-9592
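The per-segment style of working mentioned above can be sketched as follows. This is a plain-Java model under stated assumptions, not Lucene code: each segment carries its own live-docs bitset, document ids are segment-local, and a `null` bitset stands for "this segment has no deletions" (the convention `LeafReader.getLiveDocs` follows in Lucene). The names `PerSegmentSketch` and `countLive` are hypothetical.

```java
import java.util.BitSet;

/** Hypothetical model, not Lucene code: consult each segment's own live docs, no global lookup. */
public class PerSegmentSketch {

    /**
     * Counts the matches that are still live, visiting one segment at a time.
     * matchesBySegment[seg] holds segment-local doc ids; liveBySegment[seg] may be null.
     */
    public static int countLive(BitSet[] liveBySegment, int[][] matchesBySegment) {
        int count = 0;
        for (int seg = 0; seg < liveBySegment.length; seg++) {
            BitSet live = liveBySegment[seg]; // null models "no deletions in this segment"
            for (int localDoc : matchesBySegment[seg]) {
                if (live == null || live.get(localDoc)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet second = new BitSet();
        second.set(0); // in the second segment only local doc 0 is live
        BitSet[] live = {null, second};
        int[][] matches = {{0, 1, 2}, {0, 1}};
        System.out.println("live matches: " + countLive(live, matches));
    }
}
```

Because the loop already knows which segment it is in, no search over segment boundaries is needed, and the null check avoids the kind of fallback bug noted elsewhere in this thread.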
[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow
[ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496675#comment-16496675 ]

ASF subversion and git services commented on SOLR-12366:

Commit d65f40f3852be74bf0fc5c17d8252c669ea325d8 in lucene-solr's branch refs/heads/branch_7x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d65f40f ]

* SOLR-12366: A slow "live docs" implementation was being used instead of a bitset. Affects classic faceting enum method, JSON Facets enum method, UnInvertedField faceting, GraphTermsQParser, JoinQParser. Renamed SolrIndexSearcher.getLiveDocs to getLiveDocSet.

(cherry picked from commit 1e63b32)
[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow
[ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496671#comment-16496671 ]

ASF subversion and git services commented on SOLR-12366:

Commit ce8735556d994f365e9c95c61243c352a7d50e99 in lucene-solr's branch refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ce87355 ]

* SOLR-12366: A slow "live docs" implementation was being used instead of a bitset. Affects classic faceting enum method, JSON Facets enum method, UnInvertedField faceting, GraphTermsQParser, JoinQParser. Renamed SolrIndexSearcher.getLiveDocs to getLiveDocSet.
[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow
[ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496663#comment-16496663 ]

ASF subversion and git services commented on SOLR-12366:

Commit 1e63b32731bedf108aaeeb5d0a04d671f5663102 in lucene-solr's branch refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1e63b32 ]

* SOLR-12366: A slow "live docs" implementation was being used instead of a bitset. Affects classic faceting enum method, JSON Facets enum method, UnInvertedField faceting, GraphTermsQParser, JoinQParser. Renamed SolrIndexSearcher.getLiveDocs to getLiveDocSet.
[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow
[ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486633#comment-16486633 ]

Lucene/Solr QA commented on SOLR-12366:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}163m 20s{color} | {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}171m 27s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | solr.cloud.autoscaling.sim.TestGenericDistributedQueue |
| | solr.security.BasicAuthIntegrationTest |
| | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest |
| | solr.cloud.api.collections.TestCollectionsAPIViaSolrCloudCluster |
| | solr.cloud.autoscaling.sim.TestLargeCluster |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-12366 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924480/SOLR-12366.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene1-us-west 3.13.0-88-generic #135-Ubuntu SMP Wed Jun 8 21:10:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / af59c46 |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on April 8 2014 |
| Default Java | 1.8.0_172 |
| unit | https://builds.apache.org/job/PreCommit-SOLR-Build/101/artifact/out/patch-unit-solr_core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/101/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/101/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow
[ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483501#comment-16483501 ]

David Smiley commented on SOLR-12366:

Updated the patch:
* replaced the implementation of SolrIndexSearcher.getFirstMatch to be in terms of lookupId -- less to maintain and one fewer reference to the SlowCompositeReader (field "filterReader"). Probably slightly faster.
* simplified getLiveDocsBits further
* renamed getLiveDocs to getLiveDocSet (thus changed a bunch of other files) but kept the original and marked it deprecated, to be removed in 8.0
[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow
[ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480048#comment-16480048 ]

David Smiley commented on SOLR-12366:

When I tested locally, I got no reproducing test failures.
[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow
[ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478710#comment-16478710 ]

Lucene/Solr QA commented on SOLR-12366:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 1s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 47s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 4m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 4m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 4m 1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}115m 56s{color} | {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m 15s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | solr.cloud.autoscaling.IndexSizeTriggerTest |
| | solr.cloud.LeaderElectionContextKeyTest |
| | solr.cloud.autoscaling.sim.TestLargeCluster |
| | solr.cloud.autoscaling.SearchRateTriggerTest |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-12366 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923789/SOLR-12366.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 0c36289 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 8 2015 |
| Default Java | 1.8.0_172 |
| unit | https://builds.apache.org/job/PreCommit-SOLR-Build/95/artifact/out/patch-unit-solr_core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/95/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/95/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow
[ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478170#comment-16478170 ]

David Smiley commented on SOLR-12366:

* adds new {{SolrIndexSearcher.getLiveDocsBits()}} method that works like {{LeafReader.getLiveDocs}} does. I don't actually like the name of this method; IMO it ought to be simply {{getLiveDocs}} but that conflicts with an existing one that I think ought to be named something like {{getLiveDocSet}}. Since these are internal methods I think we should just rename it, but I'm okay with renaming in master.
* affects SimpleFacets.getFacetTermEnumCounts (classic faceting), FacetFieldProcessorByEnumTermsStream (JSON facets), UnInvertedField, GraphTermsQParser, JoinQParser, SolrIndexSearcher.getFirstMatch
* In GraphTermsQParser I further noticed the non-SolrIndexSearcher fallback logic was broken as it didn't check for a null liveDocs. Will we ever even get to this code? Anyway, I decided to replace these many lines with something simpler.

IMO some callers of {{SolrIndexSearcher.getSlowAtomicReader}} should change to use {{MultiFields}} to avoid the temptation to have a LeafReader that has many slow methods. I made this change in SimpleFacets.getFacetTermEnumCounts. This could be a follow-up issue.

IMO {{SolrIndexSearcher.getFirstMatch}} should be removed in favor of {{lookupId}} so there's less code to maintain. Admittedly the latter is more verbose, but we could add a utility method for callers who don't care about the segment ordinal and only want the global ID.

[~ysee...@gmail.com] could you please review? This touches stuff you have been involved with.
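The utility method floated above, for callers who only want the global ID, might look roughly like the sketch below. It is plain Java rather than Solr code and rests on assumptions: that the lookup result is a `long` with the segment ordinal in the high 32 bits and the segment-local doc id in the low 32 bits, that -1 means "not found", and that each segment's starting global id is available in an array. The names `GlobalIdSketch`, `pack`, and `toGlobalId` are hypothetical.

```java
/** Hypothetical helper, not Solr code: turn a packed (segment ordinal, local doc id) pair into a global id. */
public class GlobalIdSketch {

    /** Packs a segment ordinal (high 32 bits) and a segment-local doc id (low 32 bits) into one long. */
    public static long pack(int segOrd, int segDoc) {
        return ((long) segOrd << 32) | (segDoc & 0xFFFFFFFFL);
    }

    /** Returns the global doc id for a packed pair, or -1 when the pair is the "not found" marker. */
    public static int toGlobalId(long pair, int[] docBases) {
        if (pair == -1L) {
            return -1; // assumed "not found" convention
        }
        int segOrd = (int) (pair >> 32); // high 32 bits
        int segDoc = (int) pair;         // low 32 bits
        return docBases[segOrd] + segDoc;
    }

    public static void main(String[] args) {
        int[] docBases = {0, 3, 5}; // assumed starting global id of each segment
        System.out.println(toGlobalId(pack(1, 1), docBases));
        System.out.println(toGlobalId(-1L, docBases));
    }
}
```

Callers that do care about the segment would keep using the packed pair directly; the helper only serves the case where the segment ordinal is thrown away.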