[ 
https://issues.apache.org/jira/browse/LUCENE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350589#comment-17350589
 ] 

Gautam Worah commented on LUCENE-9969:
--------------------------------------

If I understand things correctly from Google Translate, you are trying to find 
out why the {{TaxonomyIndexArrays}} (the parent, child and sibling arrays) 
together take up about 60% of your JVM heap.

I suspect that the cardinality of your data is very high, i.e., the number of 
unique taxonomy ordinals is very high. That makes the parent, children and 
sibling arrays massive, and they in turn occupy a large part of your heap.
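
To make the relationship between ordinal count and heap usage concrete, here is 
a rough back-of-the-envelope sketch. It assumes the three arrays are plain 
{{int[]}} arrays with one slot per ordinal and ignores array headers and any 
extra copies held during a reader refresh. The class name, method name and the 
ordinal count are hypothetical and only for illustration.

```java
// Rough estimate only. Assumption: parents, children and siblings are three
// int[] arrays, each with one 4-byte slot per taxonomy ordinal.
final class TaxoArrayEstimate {

    // Number of arrays assumed to be kept per taxonomy reader.
    private static final int ARRAY_COUNT = 3;

    /** Approximate bytes used by the ordinal arrays, ignoring object headers. */
    static long estimateBytes(long numOrdinals) {
        return (long) ARRAY_COUNT * Integer.BYTES * numOrdinals;
    }

    public static void main(String[] args) {
        long ordinals = 500_000_000L; // hypothetical ordinal count
        double gib = estimateBytes(ordinals) / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("%d ordinals -> about %.2f GiB%n", ordinals, gib);
    }
}
```

Under these assumptions the cost is about 12 bytes per ordinal, so the array 
size grows linearly with the number of unique labels.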

One recommendation I have is to try bucketing values to reduce this 
cardinality. Does your application need counts for all the individual values, 
or can it work with buckets or "coarser" values? This might significantly 
reduce the number of labels in your index and consequently the number of 
ordinals.

You can even try other techniques for finding well-tuned buckets (not just 
uniform ones). Anything that reduces cardinality will reap huge benefits in 
terms of ordinal array size.
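
As an illustration of the bucketing idea, here is a minimal sketch that maps 
raw numeric values to a handful of non-uniform range labels before they would 
be indexed as facet labels. The bucket boundaries, the class name and the 
assumption that values are non-negative numbers are all hypothetical; your 
data may need a different scheme.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: replace high-cardinality raw values with coarse range labels.
final class ValueBucketing {

    // Hypothetical, non-uniform bucket upper bounds (exclusive), ascending.
    private static final long[] UPPER_BOUNDS = {10, 50, 100, 500, 1000};

    /** Returns a range label such as "10-50"; assumes value >= 0. */
    static String bucketLabel(long value) {
        long lower = 0;
        for (long upper : UPPER_BOUNDS) {
            if (value < upper) {
                return lower + "-" + upper;
            }
            lower = upper;
        }
        return lower + "+"; // everything at or above the last bound
    }

    public static void main(String[] args) {
        Set<String> rawLabels = new HashSet<>();
        Set<String> bucketedLabels = new HashSet<>();
        for (long v = 0; v < 100_000; v++) {
            rawLabels.add(Long.toString(v)); // one label per distinct value
            bucketedLabels.add(bucketLabel(v)); // one label per bucket
        }
        System.out.println("distinct raw labels:      " + rawLabels.size());
        System.out.println("distinct bucketed labels: " + bucketedLabels.size());
    }
}
```

The bucket label, rather than the raw value, would then be what you index as 
the facet label, so the taxonomy only needs one ordinal per bucket.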

Otherwise, maybe you could try upgrading Lucene to its latest release (8.8)? 
Newer versions of Lucene taxonomy faceting have 
[switched|https://github.com/apache/lucene-solr/pull/1733] to 
{{BinaryDocValues}} instead of {{StoredFields}} and generally have a lot of 
useful improvements that help with performance.

It might help to provide more information about how your application uses 
faceting and the type of data it indexes.

> DirectoryTaxonomyReader.taxoArray uses a large amount of memory, causing the system to crash with OOM
> ------------------------------------------------
>
>                 Key: LUCENE-9969
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9969
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 6.6.2
>            Reporter: FengFeng Cheng
>            Priority: Trivial
>         Attachments: image-2021-05-24-13-43-43-289.png
>
>
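> First of all, the data volume is very large. The JVM memory is 90G, but TaxonomyIndexArrays takes up almost half of it.
> !image-2021-05-24-13-43-43-289.png!
> Is there a better way to use TaxonomyReader, or are there other optimizations?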



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
