[jira] [Commented] (LUCENE-10391) Reuse data structures across HnswGraph invocations

Julie Tibshirani (Jira) Thu, 24 Feb 2022 10:57:07 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497648#comment-17497648
 ]


Julie Tibshirani commented on LUCENE-10391:
-------------------------------------------

Now that the benchmarks are running again, we can see an improvement in 
indexing throughput. It might be a combined effect of this change and 
LUCENE-10408.

!Screen Shot 2022-02-24 at 10.18.42 AM.png|width=444,height=277!

In the profiles, we are still seeing some NeighborQueue allocations. These are 
likely from the results queue, which is still not shared. It is not 
straightforward to share it, though, since its size changes across the graph 
levels (it's sometimes 1, sometimes topK). I'm inclined to close this out for 
now without making more changes. Let me know what you think.
{code:java}
PERCENT       HEAP SAMPLES  STACK
26.77%        145900M       org.apache.lucene.util.fst.BytesStore#writeByte()
                              at org.apache.lucene.util.fst.FST#<init>()
8.22%         44814M        org.apache.lucene.util.LongHeap#<init>()
                              at org.apache.lucene.util.hnsw.NeighborQueue#<init>()
{code}
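For illustration only, here is a minimal sketch of how a results queue might be reused even though its bound differs per level. It is not Lucene code: the class name ResettableTopScores and its methods are hypothetical, it stores bare float scores rather than the encoded score/node pairs a real neighbor queue would hold, and it assumes that a higher score is better. The idea is to allocate the backing array once and let a reset(newLimit) call change the logical bound (1 on the upper levels, topK on the bottom level), growing the array only when the requested bound exceeds what is already allocated.

```java
/** Hypothetical bounded min-heap whose logical capacity can be changed on reset. */
final class ResettableTopScores {
  private float[] heap; // 1-based binary min-heap; heap[1] is the worst retained score
  private int size;
  private int limit;

  ResettableTopScores(int initialLimit) {
    heap = new float[Math.max(initialLimit, 1) + 1];
    reset(initialLimit);
  }

  /** Empties the queue and sets a new bound, reallocating only if the bound grew. */
  void reset(int newLimit) {
    if (newLimit < 1) {
      throw new IllegalArgumentException("limit must be >= 1");
    }
    if (newLimit + 1 > heap.length) {
      heap = new float[newLimit + 1]; // old contents are discarded anyway
    }
    size = 0;
    limit = newLimit;
  }

  /** Keeps the `limit` highest scores; returns true if the score was retained. */
  boolean insertWithOverflow(float score) {
    if (size < limit) {
      heap[++size] = score;
      siftUp(size);
      return true;
    }
    if (score <= heap[1]) {
      return false; // not better than the current worst entry
    }
    heap[1] = score; // replace the worst entry
    siftDown(1);
    return true;
  }

  /** Lowest retained score; only meaningful when size() > 0. */
  float worst() {
    return heap[1];
  }

  int size() {
    return size;
  }

  private void siftUp(int i) {
    float v = heap[i];
    while (i > 1 && v < heap[i >>> 1]) {
      heap[i] = heap[i >>> 1];
      i >>>= 1;
    }
    heap[i] = v;
  }

  private void siftDown(int i) {
    float v = heap[i];
    while (true) {
      int child = i << 1;
      if (child > size) break;
      if (child < size && heap[child + 1] < heap[child]) child++;
      if (heap[child] >= v) break;
      heap[i] = heap[child];
      i = child;
    }
    heap[i] = v;
  }
}
```

With something along these lines, a single instance could be created once per graph builder and reset before each level's search, at the cost of keeping a topK-sized array alive even while the bound is 1. Whether that trade-off is worth it for the remaining 8% of allocations is a separate question.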
 

> Reuse data structures across HnswGraph invocations
> --------------------------------------------------
>
>                 Key: LUCENE-10391
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10391
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Assignee: Julie Tibshirani
>            Priority: Minor
>         Attachments: Screen Shot 2022-02-24 at 10.18.42 AM.png
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Creating HNSW graphs involves many repeated calls to HnswGraph#search. 
> Profiles from nightly benchmarks suggest that allocating data structures 
> incurs both lots of heap allocations 
> (http://people.apache.org/~mikemccand/lucenebench/2022.01.23.18.03.17.html#profiler_1kb_indexing_vectors_4_heap)
> and CPU usage 
> (http://people.apache.org/~mikemccand/lucenebench/2022.01.23.18.03.17.html#profiler_1kb_indexing_vectors_4_cpu).
> It looks like reusing data structures across invocations would be 
> low-hanging fruit that could help save significant CPU?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

