I am building an FST.  Here is an excerpt from my code:
    // Build the FST from the workingSet. The FST Builder requires inputs to be
    // added in sorted order, so sort the keys first.
    Builder<IntsRef> builder = new Builder<IntsRef>(FST.INPUT_TYPE.BYTE4, outputs);
    IntsRef[] sortedKeys = workingSet.keySet().toArray(new IntsRef[workingSet.size()]);
    Arrays.sort(sortedKeys);

    int maxPhraseLen = 0;
    int maxDocsLen = 0;
    for (IntsRef termIdsPhrase : sortedKeys) {
      IntsRef solrIds = workingSet.remove(termIdsPhrase); // remove to save memory
      assert termIdsPhrase.length > 0 && solrIds.length > 0;
      builder.add(termIdsPhrase, solrIds);
    }

    return builder.finish();
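The loop above relies on the FST Builder receiving inputs in sorted order, which is why the keys are sorted before any add() call and each entry is removed from the map as it is consumed. Below is a minimal, stdlib-only sketch of that sort-then-drain pattern. It does not use Lucene: it assumes List<Integer> keys in place of IntsRef, a comparator that is assumed (not verified here) to mirror IntsRef's ordering, and a hypothetical Sink interface standing in for Builder.add().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortedDrainSketch {
    // Lexicographic order on int sequences: compare element by element as
    // signed ints; if one sequence is a prefix of the other, the shorter one
    // sorts first. Assumed here to match IntsRef's ordering.
    static final Comparator<List<Integer>> LEXICOGRAPHIC = new Comparator<List<Integer>>() {
        @Override
        public int compare(List<Integer> a, List<Integer> b) {
            int n = Math.min(a.size(), b.size());
            for (int i = 0; i < n; i++) {
                int cmp = Integer.compare(a.get(i), b.get(i));
                if (cmp != 0) {
                    return cmp;
                }
            }
            return Integer.compare(a.size(), b.size());
        }
    };

    // Hypothetical stand-in for Builder.add(input, output).
    interface Sink {
        void add(List<Integer> key, int[] value);
    }

    // Sorts a snapshot of the keys, then removes each entry from the map as it
    // is handed to the sink, so the map stops referencing an entry once it
    // has been consumed.
    static void drainSorted(Map<List<Integer>, int[]> workingSet, Sink sink) {
        List<List<Integer>> keys = new ArrayList<>(workingSet.keySet());
        Collections.sort(keys, LEXICOGRAPHIC);
        for (List<Integer> key : keys) {
            int[] value = workingSet.remove(key);
            sink.add(key, value);
        }
    }

    public static void main(String[] args) {
        Map<List<Integer>, int[]> workingSet = new HashMap<>();
        workingSet.put(Arrays.asList(2, 1), new int[] {10});
        workingSet.put(Arrays.asList(1, 5, 3), new int[] {20, 21});
        workingSet.put(Arrays.asList(1, 5), new int[] {30});

        // Record the order in which keys reach the sink.
        final List<List<Integer>> order = new ArrayList<>();
        drainSorted(workingSet, new Sink() {
            @Override
            public void add(List<Integer> key, int[] value) {
                order.add(key);
            }
        });
        System.out.println(order);            // [[1, 5], [1, 5, 3], [2, 1]]
        System.out.println(workingSet.size()); // 0
    }
}
```

The anonymous classes keep the sketch compatible with JDK 7, which is what the post says is in use; on newer JDKs a lambda would do the same job.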

For what it's worth, the input side is at most 7 integers long, and the output
side is typically the same length, but a small number of outputs are as long as
48K integers.  There are 10M entries.

After many calls to builder.add(), with assertions enabled, I eventually get
this exception:

Exception in thread "main" java.lang.AssertionError: size must be positive (got -262796219): likely integer overflow?
        at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:336)
        at org.apache.lucene.util.fst.FST.addNode(FST.java:672)
        at org.apache.lucene.util.fst.NodeHash.add(NodeHash.java:122)
        at org.apache.lucene.util.fst.Builder.compileNode(Builder.java:195)
        at org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:287)
        at org.apache.lucene.util.fst.Builder.add(Builder.java:392)
        at org.mitre.opensextant.solr.TaggerFstCorpus.buildPhrases(TaggerFstCorpus.java:176)
        at org.mitre.opensextant.solr.TaggerFstCorpus.doBuild(TaggerFstCorpus.java:61)
        at org.mitre.opensextant.solr.BuildCorpusExperiment.main(BuildCorpusExperiment.java:31)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
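The assertion message itself suggests integer overflow. As a hedged illustration (this is not Lucene's code, and requiredSizeInt, requiredSizeLong and withinIntIndexRange are hypothetical helpers), the sketch below shows how a buffer-size computation done in 32-bit int arithmetic silently wraps to a negative number once it passes Integer.MAX_VALUE, which is the kind of value a check like the one in ArrayUtil.grow would reject. It also reflects the fact that a Java array is indexed by int, so one byte[] cannot exceed that bound no matter how large the heap is.

```java
public class IntSizeOverflowSketch {
    // Hypothetical helper: the capacity an int-indexed byte buffer would need,
    // computed in 32-bit int arithmetic. Java int addition wraps silently.
    static int requiredSizeInt(int currentLength, int bytesToAdd) {
        return currentLength + bytesToAdd;
    }

    // Same computation widened to 64-bit long, so the true value survives.
    static long requiredSizeLong(int currentLength, int bytesToAdd) {
        return (long) currentLength + bytesToAdd;
    }

    // Arrays are int-indexed, so Integer.MAX_VALUE is an upper bound on the
    // length of a single byte[] (real JVMs may cap it slightly lower).
    static boolean withinIntIndexRange(long size) {
        return size >= 0 && size <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int current = Integer.MAX_VALUE - 100; // buffer already just under 2 GiB
        int toAdd = 1000;                      // hypothetical bytes for the next write

        int wrapped = requiredSizeInt(current, toAdd);
        long exact = requiredSizeLong(current, toAdd);

        System.out.println("int arithmetic:  " + wrapped); // -2147482749
        System.out.println("long arithmetic: " + exact);   // 2147484547
        System.out.println("fits in one array? " + withinIntIndexRange(exact)); // false
    }
}
```

Widening to long only makes the overflow visible; it does not let a single array grow past the int limit, so a structure that needs more than about 2 GiB has to be split across several arrays.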


This is on Lucene 4.0-ALPHA using JDK 7.  I'm using a 6GB heap; my attempts to
use less resulted in out-of-memory errors.  What FST size limitation am I
bumping up against?

~ David
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
