[ https://issues.apache.org/jira/browse/LUCENE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-7101:
---------------------------------------
    Attachment: LUCENE-7101.patch

Patch, adding a trivial "log merge policy" to {{OfflineSorter}}, and changing its default merge factor from 128 to 10.

I also fixed the 2B points tests to not "cheat" by giving BKD more heap than the defaults, and improved {{BKDWriter}}'s temp file naming.

I'm running {{Test2BBKDPoints.test2D}} now ...

> OfflineSorter's merging is O(N^2) cost for large sorts
> ------------------------------------------------------
>
>                 Key: LUCENE-7101
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7101
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Blocker
>             Fix For: master, 6.0
>
>         Attachments: LUCENE-7101.patch
>
>
> Our {{OfflineSorter}} acts just like Lucene, writing small initial
> segments of sorted values (from what it was able to sort at once in
> heap), periodically merging them when there are too many, and doing a
> {{forceMerge(1)}} in the end.
> But the merge logic is too simplistic today, resulting in O(N^2)
> cost. Smallish sorts usually won't hit it, because the default 128
> merge factor is so high, but e.g. the new 2B points tests do hit the
> N^2 behavior. I suspect the high merge factor hurts performance (OS
> struggles to use what free RAM it has to read-ahead on 128 files), and
> also risks file descriptor exhaustion.
> I think we should implement a simple log merge policy for it, and drop
> its default merge factor to 10.
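
To make the O(N^2) versus log-merge difference concrete, here is a rough cost model in Java. It is only a sketch, not the code in LUCENE-7101.patch and not the real {{OfflineSorter}}. The class and method names ({{MergeCostSketch}}, {{naiveCost}}, {{logCost}}) are made up for illustration, every initial partition is assumed to have size 1, and "cost" simply counts units written by merges. The "naive" policy is one assumed form of simplistic merging: whenever merge-factor files accumulate, merge all of them, including the large file produced by the previous merge.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cost model for offline-sort merging; not Lucene code. */
final class MergeCostSketch {

    /** Assumed naive policy: once mergeFactor files exist, merge ALL of them into one. */
    public static long naiveCost(int partitions, int mergeFactor) {
        List<Long> files = new ArrayList<>();
        long written = 0;
        for (int i = 0; i < partitions; i++) {
            files.add(1L); // one unit-sized sorted partition flushed from heap
            if (files.size() == mergeFactor) {
                long merged = sum(files); // rewrites the big file every time
                written += merged;
                files.clear();
                files.add(merged);
            }
        }
        if (files.size() > 1) {
            written += sum(files); // final merge down to a single file
        }
        return written;
    }

    /** Log policy: only merge mergeFactor files that sit on the same level. */
    public static long logCost(int partitions, int mergeFactor) {
        List<List<Long>> levels = new ArrayList<>(); // levels.get(k) = files on level k
        long written = 0;
        for (int i = 0; i < partitions; i++) {
            long size = 1;
            int level = 0;
            while (true) {
                if (levels.size() == level) {
                    levels.add(new ArrayList<>());
                }
                List<Long> bucket = levels.get(level);
                bucket.add(size);
                if (bucket.size() < mergeFactor) {
                    break; // level not full yet, nothing to merge
                }
                size = sum(bucket); // merge the full level into one file
                written += size;
                bucket.clear();
                level++; // the merged file moves up one level
            }
        }
        // Simplification: model the final merge as one pass over all leftovers.
        long leftoverCount = 0;
        long leftoverSize = 0;
        for (List<Long> bucket : levels) {
            leftoverCount += bucket.size();
            leftoverSize += sum(bucket);
        }
        if (leftoverCount > 1) {
            written += leftoverSize;
        }
        return written;
    }

    private static long sum(List<Long> sizes) {
        long total = 0;
        for (long s : sizes) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("naive: " + naiveCost(1000, 10));
        System.out.println("log:   " + logCost(1000, 10));
    }
}
```

With 1000 unit-sized partitions and a merge factor of 10, this model counts 56055 units written for the naive policy versus 3000 for the log policy. Under the naive policy each unit is rewritten roughly once per later merge, while under the log policy it is rewritten about once per level.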