[ 
https://issues.apache.org/jira/browse/LUCENE-10375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480770#comment-17480770
 ] 

ASF subversion and git services commented on LUCENE-10375:
----------------------------------------------------------

Commit 7ece8145bcafc60513e196ce10b0007ac05a83c8 in lucene's branch 
refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7ece814 ]

LUCENE-10375: Write vectors to file in flush (#617)

In a previous commit, we updated HNSW merge to first write the combined segment
vectors to a file, then use that file to build the graph. This commit applies
the same strategy to flush, which lets us use the same logic for flush and
merge.

> Speed up HNSW merge by writing combined vector data
> ---------------------------------------------------
>
>                 Key: LUCENE-10375
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10375
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Julie Tibshirani
>            Priority: Major
>          Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> When merging segments together, the HNSW writer creates a VectorValues 
> instance that gives a merged view of all the segments' VectorValues. This 
> merged instance is used when constructing the new HNSW graph. Graph building 
> needs random access, and the merged VectorValues support this by mapping from 
> merged ordinals -> segments and segment ordinals.
> This mapping seems to add overhead. The nightly indexing benchmarks sometimes 
> show substantial time in Arrays.binarySearch (used to map an ordinal to a 
> segment): 
> https://blunders.io/jfr-demo/indexing-1kb-vectors-2022.01.09.18.03.19/top_down_cpu_samples.
> Instead of using a merged VectorValues to create the graph, maybe we could 
> first write all the segment vectors to a file, and use that file to build the 
> graph.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to