[ https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741811#comment-16741811 ]
ASF subversion and git services commented on LUCENE-8623:
---------------------------------------------------------

Commit 74ee4ddf4eb7b6c7f60c3e1fb73da0427c0085ac in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74ee4dd ]

LUCENE-8623: Decrease I/O pressure when merging high dimensional points

> Decrease I/O pressure when merging high dimensional points
> ----------------------------------------------------------
>
>                 Key: LUCENE-8623
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8623
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Major
>         Attachments: Geo3D.png, LUCENE-8623.patch, LUCENE-8623.patch,
> LUCENE-8623.patch, LUCENE-8623.patch, LatLonPoint.png, LatLonShape.png
>
>
> Related to LUCENE-8619: after indexing 60 million shapes (~1.65 billion
> triangles) using {{LatLonShape}}, the index directory grew to 265 GB while
> segments were being merged. After the merges finished, the index size was
> 57 GB.
> As an example, imagine we are merging several segments into a new segment of
> size 10 GB (4 dimensions). The BKD tree merging logic will create the
> following files:
> 1) Level 0: 4 copies of the data, each one sorted by one dimension : 40 GB
> 2) Level 1: 6 copies of half of the data, left and right : 30 GB
> 3) Level 2: 6 copies of one quarter of the data, left and right : 15 GB
> 4) Level 3: 6 more copies halving the previous level, left and right : 7.5 GB
> 5) Level 4: 6 more copies halving the previous level, left and right : 3.75 GB
>
> and so on... So it requires around 100 GB to merge that segment.
> This issue proposes delaying the creation of sorted copies until they are
> needed. This reduces the total size required to half of what is needed now.
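
The level-by-level figures in the description can be reproduced with a few lines of arithmetic. The sketch below is only an illustration of that accounting, not Lucene code: the function name merge_disk_usage_gb and its parameters are hypothetical, and the value of 6 copies per level is taken directly from the issue text rather than derived from the BKD implementation.

```python
def merge_disk_usage_gb(segment_gb, num_dims, copies_per_level, levels):
    """Per-level temporary disk usage in GB, following the issue's accounting.

    Hypothetical helper for illustration only; it does not model Lucene's
    actual BKD writer, just the numbers quoted in the issue description.
    """
    # Level 0: one full copy of the data per dimension, each sorted by that dimension.
    usage = [float(segment_gb * num_dims)]
    # Level k >= 1: copies_per_level copies, each holding 1 / 2**k of the data.
    for level in range(1, levels + 1):
        usage.append(copies_per_level * segment_gb / 2 ** level)
    return usage


# The example from the issue: a 10 GB segment with 4 dimensions, 6 copies per level.
per_level = merge_disk_usage_gb(10.0, 4, 6, 4)
total_first_levels = sum(per_level)

# Continuing the halving for many more levels approaches the "around 100 GB" estimate.
total_deep = sum(merge_disk_usage_gb(10.0, 4, 6, 40))

print(per_level)           # [40.0, 30.0, 15.0, 7.5, 3.75]
print(total_first_levels)  # 96.25
print(round(total_deep, 3))
```

The levels after level 0 form a geometric series (30 + 15 + 7.5 + ...) that converges to 60 GB, which together with the 40 GB of level 0 gives the roughly 100 GB quoted above.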