[ https://issues.apache.org/jira/browse/LUCENE-6842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963420#comment-14963420 ]

Jack Krupansky commented on LUCENE-6842:
----------------------------------------

Generally, Lucene has few hard limits; the usual guidance is that you will 
ultimately be limited by available system resources such as RAM and CPU. There 
may not be a hard limit on the number of fields, but that doesn't mean you can 
safely assume that a large number of fields will always work within a limited 
amount of RAM and CPU. Exactly how much RAM and CPU you need depends on your 
specific application, and that is something you will have to test for yourself - 
known as a proof of concept.
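
As a rough illustration of what such a proof of concept might look like, here 
is a sketch against the 4.x API - the document count, field count, field names, 
and index path are all placeholders to tune for your own workload:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class FieldPoc {
    public static void main(String[] args) throws Exception {
        final int numDocs = 10000;   // placeholder: match your corpus size
        final int numFields = 1000;  // placeholder: match your schema
        IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46));
        // Placeholder path: any scratch directory will do.
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/tmp/fieldpoc")), cfg);
        for (int d = 0; d < numDocs; d++) {
            Document doc = new Document();
            for (int f = 0; f < numFields; f++) {
                doc.add(new StringField("field_" + f, "value", Field.Store.NO));
            }
            writer.addDocument(doc);
            if (d % 1000 == 0) {
                // Crude heap snapshot; a profiler gives a more reliable picture.
                Runtime rt = Runtime.getRuntime();
                System.out.println(d + " docs indexed, heap used: "
                        + (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024)
                        + " MB");
            }
        }
        writer.close();
    }
}

Run it with the same heap settings as your production JVM and watch where the 
numbers climb as you vary the document and field counts.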

Generally, people run into resource problems because of the number of documents 
rather than the number of fields per document. You haven't detailed how many 
documents you are indexing or how many of these fields are actually present in 
an average document. Who knows - maybe the number of fields is not the problem 
per se and the number of documents is the cause of the resource issue, or it 
may be a combination of the two.

That said, I will defer to the more senior Lucene committers here, but 
personally I would suggest that "hundreds" or "low thousands" is a more 
practical recommended upper limit for the total number of fields in a Lucene 
index. Generally, "dozens" or at most "low hundreds" would be the most 
recommended and safest assumption. Sure, maybe 10,000 fields might actually 
work, but then the number of documents, the volume of operations, and query 
complexity will also come into play.
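
As a quick sanity check, you could also count how many distinct fields your 
index actually contains. A minimal sketch against the 4.x API (the index path 
is a placeholder):

import java.io.File;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.FieldInfos;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.store.FSDirectory;

public class FieldCount {
    public static void main(String[] args) throws Exception {
        // Placeholder path: point this at your real index directory.
        IndexReader reader = DirectoryReader.open(
                FSDirectory.open(new File("/path/to/index")));
        // Merged view of the per-segment field infos.
        FieldInfos infos = MultiFields.getMergedFieldInfos(reader);
        System.out.println("distinct fields: " + infos.size());
        reader.close();
    }
}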

All of that said, I'm sure we are all intently curious why exactly you feel 
that you need so many fields.

> No way to limit the fields cached in memory, leading to OOM when there are 
> thousands of fields
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-6842
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6842
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.6.1
>         Environment: Linux, openjdk 1.6.x
>            Reporter: Bala Kolla
>         Attachments: HistogramOfHeapUsage.png
>
>
> I am opening this defect to get some guidance on how to handle a case of a 
> server running out of memory; it seems to be related to how we index. We 
> want to know if there is any way to reduce the impact on memory usage before 
> we look into reducing the number of fields. Basically, we have many 
> thousands of fields being indexed, and this causes a large amount of memory 
> to be used (25GB), eventually leading the application to hang and forcing us 
> to restart every few minutes.


