[ https://issues.apache.org/jira/browse/LUCENE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195139#comment-15195139 ]
ASF subversion and git services commented on LUCENE-7101:
---------------------------------------------------------

Commit 6168fe1afb40286a7515b5909c3eb41db1ab6d00 in lucene-solr's branch refs/heads/branch_6_0 from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6168fe1 ]

LUCENE-7101: OfflineSorter had O(N^2) merge cost, and used too many temporary file descriptors, for large sorts

Conflicts:
	lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java
	lucene/core/src/test/org/apache/lucene/util/bkd/Test2BBKDPoints.java


> OfflineSorter's merging is O(N^2) cost for large sorts
> ------------------------------------------------------
>
>                 Key: LUCENE-7101
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7101
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Blocker
>             Fix For: master, 6.0
>
>         Attachments: LUCENE-7101.patch
>
>
> Our {{OfflineSorter}} acts just like Lucene, writing small initial
> segments of sorted values (from what it was able to sort at once in
> heap), periodically merging them when there are too many, and doing a
> {{forceMerge(1)}} in the end.
> But the merge logic is too simplistic today, resulting in O(N^2)
> cost. Smallish sorts usually won't hit it, because the default 128
> merge factor is so high, but e.g. the new 2B points tests do hit the
> N^2 behavior. I suspect the high merge factor hurts performance (OS
> struggles to use what free RAM it has to read-ahead on 128 files), and
> also risks file descriptor exhaustion.
> I think we should implement a simple log merge policy for it, and drop
> its default merge factor to 10.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
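
For readers of the archive, the cost difference described in the issue can be illustrated with a small simulation. This is a sketch only, not the code from the commit. It assumes the "simplistic" strategy merges every existing run, including the large previously merged one, each time the merge factor is reached. It also assumes a log policy that merges only the newest `mergeFactor` runs once they are all the same size. The class `MergeCostSketch` and its methods are hypothetical names. Cost is counted in units written, where one unit is one initial sorted run, and the final `forceMerge(1)` is modeled as a single pass over whatever runs remain.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cost model for the merge strategies discussed above; not Lucene code. */
public class MergeCostSketch {

    /** Assumed simplistic strategy: once mergeFactor runs exist, merge ALL of them into one. */
    static long naiveCost(int runs, int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        List<Long> sizes = new ArrayList<>();
        long written = 0;
        for (int i = 0; i < runs; i++) {
            sizes.add(1L); // a freshly sorted in-heap run
            if (sizes.size() == mergeFactor) {
                // The big merged run is rewritten on every merge: this is the N^2 term.
                written += mergeTail(sizes, sizes.size());
            }
        }
        return written + finalMerge(sizes);
    }

    /** Assumed log policy: merge the newest mergeFactor runs only when they have equal size. */
    static long logCost(int runs, int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        List<Long> sizes = new ArrayList<>();
        long written = 0;
        for (int i = 0; i < runs; i++) {
            sizes.add(1L);
            // Sizes stay non-increasing, so the tail is uniform iff its two ends match.
            while (sizes.size() >= mergeFactor
                    && sizes.get(sizes.size() - 1).equals(sizes.get(sizes.size() - mergeFactor))) {
                written += mergeTail(sizes, mergeFactor);
            }
        }
        return written + finalMerge(sizes);
    }

    /** Replaces the last {@code count} runs with one merged run; returns the units written. */
    private static long mergeTail(List<Long> sizes, int count) {
        List<Long> tail = sizes.subList(sizes.size() - count, sizes.size());
        long total = 0;
        for (long s : tail) total += s;
        tail.clear(); // removes the merged runs from the backing list
        sizes.add(total);
        return total;
    }

    /** Models the closing forceMerge(1) as one pass over the remaining runs. */
    private static long finalMerge(List<Long> sizes) {
        return sizes.size() > 1 ? mergeTail(sizes, sizes.size()) : 0;
    }

    public static void main(String[] args) {
        int runs = 1000;
        for (int mergeFactor : new int[] {10, 128}) {
            System.out.println("mergeFactor=" + mergeFactor
                    + " naive=" + naiveCost(runs, mergeFactor)
                    + " log=" + logCost(runs, mergeFactor));
        }
    }
}
```

Under these assumptions the simplistic strategy's write cost grows roughly as runs^2 / mergeFactor, while the log policy grows as runs * log(runs) with the logarithm taken base mergeFactor. That is consistent with the observation that a merge factor of 128 hides the problem for small sorts.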