[ 
https://issues.apache.org/jira/browse/LUCENE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407363#comment-17407363
 ] 

Michael McCandless commented on LUCENE-9969:
--------------------------------------------

Imagine we had a {{NUMERIC}} doc values field, holding the parent ordinal of 
each ordinal in the taxonomy index.  I think we can easily create that while 
indexing, since we already ensure a parent is assigned an ordinal before its 
children.

Then, at search time, instead of using the big, dense, eagerly allocated 
{{int[] parents}} array, we could pull a {{NumericDocValues}} iterator, sort 
the ordinals we had just counted (the bitset idea from LUCENE-10080 might help 
with that?), and make a single forward pass through the DV iterator to look up 
the parent of each counted ordinal, and so know how to collate the ordinals 
into each dimension?
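
A minimal sketch of that search-time pass, in plain Java with no Lucene 
dependency. {{ForwardOnlyValues}}, {{fromArray}} and {{groupByParent}} are 
hypothetical names invented for illustration; the interface only mimics the 
{{advanceExact}} / {{longValue}} shape of a forward-only doc values iterator, 
and the toy taxonomy in {{main}} is made up. It assumes the counted ordinals 
are distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ParentLookupSketch {

  /** Hypothetical stand-in for a forward-only doc values iterator (not the Lucene API). */
  interface ForwardOnlyValues {
    /** Moves to target, which must be greater than every earlier target; true if a value exists. */
    boolean advanceExact(int target);

    /** Value at the current position: here, the parent ordinal. */
    long longValue();
  }

  /** Toy in-memory implementation that rejects any backwards access. */
  static ForwardOnlyValues fromArray(int[] parents) {
    return new ForwardOnlyValues() {
      private int current = -1;

      @Override
      public boolean advanceExact(int target) {
        if (target <= current) {
          throw new IllegalArgumentException("forward-only: " + target + " <= " + current);
        }
        current = target;
        return target < parents.length;
      }

      @Override
      public long longValue() {
        return parents[current];
      }
    };
  }

  /** Sorts the counted ordinals, then groups them by parent in one forward pass. */
  static Map<Integer, List<Integer>> groupByParent(int[] countedOrds, ForwardOnlyValues parentValues) {
    int[] sorted = countedOrds.clone();
    Arrays.sort(sorted); // the added O(N log N) step; assumes distinct ordinals
    Map<Integer, List<Integer>> byParent = new TreeMap<>();
    for (int ord : sorted) {
      if (parentValues.advanceExact(ord)) {
        int parent = (int) parentValues.longValue();
        byParent.computeIfAbsent(parent, k -> new ArrayList<>()).add(ord);
      }
    }
    return byParent;
  }

  public static void main(String[] args) {
    // Toy taxonomy: ordinal 0 is the root, 1 and 2 are dimensions,
    // 3 and 4 are children of 1, 5 and 6 are children of 2.
    int[] parents = {-1, 0, 0, 1, 1, 2, 2};
    // Ordinals that happened to be counted for this query, in arbitrary order.
    int[] counted = {6, 3, 5};
    System.out.println(groupByParent(counted, fromArray(parents)));
  }
}
```

Only the counted ordinals are visited here, so memory scales with the number 
of counted ordinals rather than with the size of the taxonomy. That is the 
point of dropping the dense array.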

Except for the added sort (O(N log N) worst case), performance should be good: 
doc values are already designed for this forward-only iteration.  And then we 
wouldn't need {{int[] parents}} for "normal" non-hierarchical facet counting.  
For truly hierarchical facet counting I'm not sure what to do yet :)

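> DirectoryTaxonomyReader.taxoArray uses a large amount of memory, causing the system to crash with OOM
> -----------------------------------------------------------------------------------------------------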
>
>                 Key: LUCENE-9969
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9969
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 6.6.2
>            Reporter: FengFeng Cheng
>            Priority: Trivial
>         Attachments: image-2021-05-24-13-43-43-289.png
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
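> First, the data volume is very large. The JVM memory is 90G, but TaxonomyIndexArrays takes up almost half of it.
> !image-2021-05-24-13-43-43-289.png!
> Is there a better way to use TaxonomyReader, or some other optimization?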



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

