[ 
https://issues.apache.org/jira/browse/LUCENE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092802#comment-14092802
 ] 

Michael McCandless commented on LUCENE-5875:
--------------------------------------------

Hmm, we could simply decrease the PagedGrowableWriter page size from 1<<30 
(1 B values in each packed array) to 1<<27 (128 M values)?  Asking for a single 
contiguous packed array with 1 B values and a highish bits-per-value (bpv) can 
easily take a lot of RAM (8 GB in the worst case, at 64 bpv).
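
To make the arithmetic concrete, here is a rough back-of-envelope sketch. The 
class and method names are made up for illustration and are not Lucene APIs; 
it only multiplies values per page by bits per value and ignores any header or 
padding overhead a real packed array may add.

```java
// Hypothetical helper (not part of Lucene): estimates the size of one
// contiguous page holding (1 << pageBits) values at bitsPerValue bits each.
final class PackedPageRam {

    /** Approximate bytes for one page, rounded up to a whole byte. */
    static long bytesPerPage(int pageBits, int bitsPerValue) {
        long values = 1L << pageBits;       // e.g. 1<<30 = ~1 B values
        long bits = values * bitsPerValue;  // total payload bits
        return (bits + 7) / 8;
    }

    public static void main(String[] args) {
        // Compare the current page size (1<<30) with the proposed one (1<<27)
        // at the worst-case 64 bits per value.
        for (int pageBits : new int[] {30, 27}) {
            long bytes = bytesPerPage(pageBits, 64);
            System.out.println("pageBits=" + pageBits + ": "
                + (bytes >> 20) + " MiB per page at 64 bpv");
        }
    }
}
```

At 64 bpv a 1<<30 page needs one 8 GiB allocation, while a 1<<27 page needs 
1 GiB, which is much easier for the JVM to satisfy in a single contiguous 
array.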

One thing to try is enabling doShareSuffix, but setting 
doShareNonSingletonNodes to false; this should greatly reduce the RAM 
required, while making the resulting FST a bit larger than minimal.  If that's 
still too much RAM, try decreasing shareMaxTailLength from Integer.MAX_VALUE to 
smallish numbers, e.g. maybe 10 or 5 or 4 or so.  As that number gets smaller, 
the RAM required to build will decrease, and the FST will grow in size.
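
The order of things to try can be written down as a small sketch. The class 
below is a hypothetical holder for the three settings named above, not the 
Lucene Builder itself; how the values are passed to the real Builder depends 
on the Lucene version, so treat this only as a checklist in code form.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value holder (not a Lucene class) for the suffix-sharing knobs.
final class SuffixSharingPlan {
    final boolean doShareSuffix;
    final boolean doShareNonSingletonNodes;
    final int shareMaxTailLength;

    SuffixSharingPlan(boolean doShareSuffix,
                      boolean doShareNonSingletonNodes,
                      int shareMaxTailLength) {
        this.doShareSuffix = doShareSuffix;
        this.doShareNonSingletonNodes = doShareNonSingletonNodes;
        this.shareMaxTailLength = shareMaxTailLength;
    }

    /** Settings to try in order, from most sharing to least RAM. */
    static List<SuffixSharingPlan> candidates() {
        List<SuffixSharingPlan> plans = new ArrayList<>();
        // Step 1: share suffixes, but skip non-singleton nodes.
        plans.add(new SuffixSharingPlan(true, false, Integer.MAX_VALUE));
        // Step 2: if RAM is still too high, shorten the shared tail.
        for (int tail : new int[] {10, 5, 4}) {
            plans.add(new SuffixSharingPlan(true, false, tail));
        }
        return plans;
    }
}
```

Each later entry should need less RAM to build and produce a somewhat larger 
FST than the one before it.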

On packing, it looks like the FST code cannot handle > 2.1 B nodes when packing 
is enabled, but this looks like something we could fix (it was just skipped 
when we did LUCENE-3298).  However, you should have hit IllegalStateException, 
not NegativeArraySizeException.  Oh, actually, I suspect this was due to 
LUCENE-5844, which will be fixed in 4.10, at which point you really should hit 
IllegalStateException.  The thing is, even if we fix packing to allow > 2.1 B 
nodes, packing is additionally RAM intensive (i.e., adds to the RAM required 
for normal FST building) ... and I'm not sure how much shrinkage packing 
actually buys these days (we've made some improvements to the unpacked format).
Do you have any numbers from your large FSTs?
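
For reference, the difference between the two exceptions can be shown with a 
small standalone sketch. It does not use any Lucene code; the class and method 
names are invented, and it only assumes that the node count is tracked in an 
int that is later used as an array size.

```java
// Hypothetical demo (not Lucene code): an unchecked int node counter versus
// one that fails fast once the int range is exhausted.
final class NodeCountOverflow {

    /** Unchecked increment: silently wraps past Integer.MAX_VALUE. */
    static int uncheckedNext(int count) {
        return count + 1; // MAX_VALUE + 1 wraps to 0x80000000 (negative)
    }

    /** Checked increment: reports the limit with a clear exception. */
    static int checkedNext(int count) {
        if (count == Integer.MAX_VALUE) {
            throw new IllegalStateException(
                "cannot handle more than " + Integer.MAX_VALUE + " nodes");
        }
        return count + 1;
    }

    /** Tries to size an array from the count and reports what happened. */
    static String allocate(int size) {
        try {
            long[] counts = new long[size];
            return "ok:" + counts.length;
        } catch (NegativeArraySizeException e) {
            // This is what a wrapped (negative) count produces.
            return "NegativeArraySizeException";
        }
    }
}
```

With the unchecked counter, the wrapped value only fails later, at allocation 
time, with NegativeArraySizeException; with the check in place the caller sees 
IllegalStateException at the point where the limit is reached.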

> Default page/block sizes in the FST package can cause OOMs
> ----------------------------------------------------------
>
>                 Key: LUCENE-5875
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5875
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 4.9
>            Reporter: Christian Ziech
>            Priority: Minor
>
> We are building some fairly big FSTs (the biggest one having about 500M terms 
> with an average of 20 characters per term) and that works very well so far.
> The problem is just that we can use neither the "doShareSuffix" nor the 
> "doPackFST" option from the builder since both would cause us to get 
> exceptions. One being an OOM and the other an IllegalArgumentException for a 
> negative array size in ArrayUtil.
> The thing here is that we in theory still have far more than enough memory 
> available but it seems that Java for some reason cannot allocate byte or long 
> arrays of the size the NodeHash needs (maybe fragmentation?).
> Reducing the constant in the NodeHash from 1<<30 to e.g. 1<<27 seems to 
> mostly fix the issue. Could e.g. the Builder pass through its bytesPageBits 
> to the NodeHash or could we get a custom parameter for that?
> The other problem we ran into was a NegativeArraySizeException when we tried 
> to pack the FST. It seems that we overflowed to 0x80000000. Unfortunately I 
> accidentally overwrote that exception but I remember it was triggered by the 
> GrowableWriter for the inCounts in line 728 of the FST. If it helps I can try 
> to reproduce it.



