[ https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741812#comment-16741812 ]

ASF subversion and git services commented on LUCENE-8623:
---------------------------------------------------------

Commit 35955b3891ed6621d5faa1c2c20ce0a333bc7b83 in lucene-solr's branch 
refs/heads/branch_7x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=35955b3 ]

LUCENE-8623: Decrease I/O pressure when merging high dimensional points


> Decrease I/O pressure when merging high dimensional points
> ----------------------------------------------------------
>
>                 Key: LUCENE-8623
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8623
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Major
>         Attachments: Geo3D.png, LUCENE-8623.patch, LUCENE-8623.patch, 
> LUCENE-8623.patch, LUCENE-8623.patch, LatLonPoint.png, LatLonShape.png
>
>
> Related to LUCENE-8619: after indexing 60 million shapes (~1.65 billion 
> triangles) using {{LatLonShape}}, the index directory grew to 265 GB while 
> segments were being merged. Once the merges had finished, the index size 
> was 57 GB.
> As an example, imagine we are merging several segments into a new segment 
> of size 10 GB (4 dimensions). The BKD tree merging logic will create the 
> following files:
> 1) Level 0: 4 copies of the data, each one sorted by one dimension : 40 GB
> 2) Level 1: 6 copies of half of the data, left and right : 30 GB
> 3) Level 2: 6 copies of one quarter of the data, left and right : 15 GB
> 4) Level 3: 6 more copies halving the previous level, left and right : 7.5 GB
> 5) Level 4: 6 more copies halving the previous level, left and right : 3.75 GB
>  
> and so on... So merging that segment requires around 100 GB. 
> This issue proposes delaying the creation of sorted copies until they are 
> needed. That reduces the total space required to half of what is needed 
> now. 
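
The "around 100 GB" figure in the description can be checked with a few lines of arithmetic. The sketch below is only an illustration of that sum, not Lucene code: the function name `per_level_gb` and its parameters are hypothetical, and the copy counts (4 at level 0, 6 at every deeper level) are taken directly from the example in the description rather than derived from the BKD implementation. It does not model the proposed change.

```python
def per_level_gb(segment_gb, level0_copies, deeper_copies, deeper_levels):
    """Temporary space per level, following the example in the description.

    Level 0 holds `level0_copies` full copies of the segment. Each deeper
    level holds `deeper_copies` copies of a slice that halves every level.
    All names and parameters here are illustrative assumptions.
    """
    sizes = [level0_copies * segment_gb]
    fraction = 0.5
    for _ in range(deeper_levels):
        sizes.append(deeper_copies * segment_gb * fraction)
        fraction /= 2
    return sizes


# The five levels listed in the description: 10 GB segment, 4 dimensions.
sizes = per_level_gb(10.0, 4, 6, 4)
print(sizes)       # [40.0, 30.0, 15.0, 7.5, 3.75]
print(sum(sizes))  # 96.25

# "and so on...": with many more levels the total approaches 40 + 60 = 100 GB.
print(sum(per_level_gb(10.0, 4, 6, 40)))
```

The deeper levels form a geometric series (30 + 15 + 7.5 + ...) that converges to 60 GB, which together with the 40 GB at level 0 gives the quoted total of roughly 100 GB.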

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
