[jira] [Commented] (LUCENE-6233) CheckIndex is dog slow when checking term vectors
[ https://issues.apache.org/jira/browse/LUCENE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315133#comment-14315133 ]

ASF subversion and git services commented on LUCENE-6233:
---------------------------------------------------------

Commit 1658832 from [~mikemccand] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1658832 ]

LUCENE-6233: speed up CheckIndex when the index has term vectors

> CheckIndex is dog slow when checking term vectors
> -------------------------------------------------
>
>                 Key: LUCENE-6233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6233
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: Trunk, 5.1
>
>         Attachments: LUCENE-6223.patch, LUCENE-6233.patch, LUCENE-6233.patch
>
>
> I'm working on a test that creates a biggish index and I noticed the
> CheckIndex takes a surprisingly long time to check term vectors.
> I profiled it and uncovered that we spend a lot of time (not sure this
> explains all of it) in Terms.getMin/getMax. Since
> CompressingTermVectorsReader doesn't impl these methods efficiently (which is
> fine), we fallback to super's impl, which does a digit-by-digit binary search
> using seekCeil.
> But for TVs this sometimes results in a linear scan.
> I think CheckIndex should not check Terms.getMin/Max for TVs?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6233) CheckIndex is dog slow when checking term vectors
[ https://issues.apache.org/jira/browse/LUCENE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315126#comment-14315126 ]

ASF subversion and git services commented on LUCENE-6233:
---------------------------------------------------------

Commit 1658831 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1658831 ]

LUCENE-6233: speed up CheckIndex when the index has term vectors
[jira] [Commented] (LUCENE-6233) CheckIndex is dog slow when checking term vectors
[ https://issues.apache.org/jira/browse/LUCENE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315022#comment-14315022 ]

Michael McCandless commented on LUCENE-6233:
--------------------------------------------

OK I noticed one case where live docs didn't confess how long it took :)

I'll fix that and commit.
[jira] [Commented] (LUCENE-6233) CheckIndex is dog slow when checking term vectors
[ https://issues.apache.org/jira/browse/LUCENE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314413#comment-14314413 ]

Robert Muir commented on LUCENE-6233:
-------------------------------------

{quote}
I think CheckIndex should not check Terms.getMin/Max for TVs?
{quote}

+1
[jira] [Commented] (LUCENE-6233) CheckIndex is dog slow when checking term vectors
[ https://issues.apache.org/jira/browse/LUCENE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314391#comment-14314391 ]

Michael McCandless commented on LUCENE-6233:
--------------------------------------------

This was introduced with LUCENE-5610.

I'll fix the nightly Lucene benchmark to plot CheckIndex time ... we could have spotted this performance regression.
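The cost described in the issue (a digit-by-digit binary search driven by seekCeil, where each seek may itself be a linear scan) can be illustrated with a small standalone sketch. This is not Lucene code: the class GetMaxSketch, its seekCeil and getMax helpers, and the termsVisited counter are hypothetical stand-ins, terms are assumed to be sorted ASCII strings, and the linear-scan seek is only an assumption modelling the worst case mentioned for term vectors.

```java
import java.util.List;

final class GetMaxSketch {
    /** Number of terms the simulated seek has stepped over so far. */
    static long termsVisited = 0;

    /**
     * Hypothetical stand-in for a seekCeil that cannot jump: it scans from the
     * first term and returns the first term >= target, or null at the end.
     */
    static String seekCeil(List<String> sortedTerms, String target) {
        for (String term : sortedTerms) {
            termsVisited++;
            if (term.compareTo(target) >= 0) {
                return term;
            }
        }
        return null;
    }

    /**
     * Finds the largest term by binary-searching one character ("digit") at a
     * time, using only seekCeil. Assumes ASCII terms (characters below 128).
     */
    static String getMax(List<String> sortedTerms) {
        if (sortedTerms.isEmpty()) {
            return null;
        }
        StringBuilder scratch = new StringBuilder();
        scratch.append((char) 0);
        while (true) {
            int low = 0;
            int high = 128;
            // Binary search the current digit for the highest value before the end.
            while (low != high) {
                int mid = (low + high) >>> 1;
                scratch.setCharAt(scratch.length() - 1, (char) mid);
                if (seekCeil(sortedTerms, scratch.toString()) == null) {
                    // Probe was too high: no term at or after it.
                    if (mid == 0) {
                        scratch.setLength(scratch.length() - 1);
                        return scratch.toString();
                    }
                    high = mid;
                } else {
                    // Probe was too low or exact: some term is still at or after it.
                    if (low == mid) {
                        break;
                    }
                    low = mid;
                }
            }
            // Move on to the next digit.
            scratch.append((char) 0);
        }
    }

    public static void main(String[] args) {
        List<String> terms = List.of("apple", "banana", "cherry");
        System.out.println("max = " + getMax(terms));
        System.out.println("terms visited = " + termsVisited);
    }
}
```

In this sketch every probe rescans the term list, so the work grows with both the length of the largest term and the number of terms, whereas simply iterating to the last term would visit each term once. That is the kind of overhead that skipping the getMin/getMax check for term vectors would avoid.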