[ https://issues.apache.org/jira/browse/LUCENE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964504#comment-13964504 ]
Michael McCandless commented on LUCENE-5584:
--------------------------------------------

Karl, do you have a test case showing the non-short-lived garbage? As far as I can tell, all usage in Lucene very quickly drops references to all the intermediate outputs, and then saves/returns only the final "result" from an FST traversal.

Or maybe you meant that all the short-lived garbage was slowing down your JVM?

In either case, net/net, I agree it would be nice not to create so much garbage.

> Allow FST read method to also recycle the output value when traversing FST
> --------------------------------------------------------------------------
>
> Key: LUCENE-5584
> URL: https://issues.apache.org/jira/browse/LUCENE-5584
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/FSTs
> Affects Versions: 4.7.1
> Reporter: Christian Ziech
>
> The FST class heavily reuses Arc instances when traversing the FST. The
> output of an Arc, however, is not reused. This matters especially when
> traversing large portions of an FST with ByteSequenceOutputs or
> CharSequenceOutputs: those classes create a new byte[] or char[] for every
> node read (which has an output).
> In our use case we intersect a Lucene Automaton with an FST<BytesRef>, much
> like it is done in
> org.apache.lucene.search.suggest.analyzing.FSTUtil.intersectPrefixPaths(), and
> since the Automaton and the FST are both rather large, tens or even hundreds
> of thousands of temporary byte array objects are created.
>
> One possible solution to the problem would be to change the
> org.apache.lucene.util.fst.Outputs class to have two additional methods (if
> you don't want to change the existing methods for compatibility):
> {code}
> /** Decode an output value previously written with {@link
>  * #write(Object, DataOutput)}, reusing the object passed in if possible. */
> public abstract T read(DataInput in, T reuse) throws IOException;
>
> /** Decode an output value previously written with {@link
>  * #writeFinalOutput(Object, DataOutput)}. By default this
>  * just calls {@link #read(DataInput, Object)}, which tries to reuse the
>  * object passed in if possible. */
> public T readFinalOutput(DataInput in, T reuse) throws IOException {
>   return read(in, reuse);
> }
> {code}
> The new methods could then be used in FST.readNextRealArc(), passing in
> the output of the reused Arc. For most output types, the new read method
> could simply invoke the original read(in) method.
> If you decide to make that change, I'd be happy to supply a patch
> and/or tests for the feature.
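To make the proposed reuse contract concrete, here is a minimal, self-contained sketch. It does not use Lucene: `SimpleOutputs`, `ByteSeq`, `ByteSeqOutputs` and `OutputReuseDemo` are hypothetical names, `java.io.DataInput` stands in for Lucene's own `DataInput`, and the length-prefixed encoding is an assumption for illustration, not the FST's on-disk format. It only shows the idea of a `read(in, reuse)` overload that recycles the caller's buffer, and counts how many `byte[]` buffers each style allocates.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for BytesRef: a reusable byte[] buffer plus the valid length.
final class ByteSeq {
  byte[] bytes = new byte[0];
  int length;
}

// Hypothetical, simplified stand-in for an Outputs<T> class; NOT Lucene's real API.
abstract class SimpleOutputs<T> {
  // Existing style: always returns a freshly allocated value.
  abstract T read(DataInput in) throws IOException;

  // Proposed style: may recycle 'reuse'; by default it just falls back to read(in).
  T read(DataInput in, T reuse) throws IOException {
    return read(in);
  }
}

// Assumed encoding for this sketch: a 4-byte length followed by the raw bytes.
final class ByteSeqOutputs extends SimpleOutputs<ByteSeq> {
  @Override
  ByteSeq read(DataInput in) throws IOException {
    return read(in, null); // no reuse: a new holder and a new buffer every call
  }

  @Override
  ByteSeq read(DataInput in, ByteSeq reuse) throws IOException {
    int len = in.readInt();
    ByteSeq out = (reuse != null) ? reuse : new ByteSeq();
    if (out.bytes.length < len) {
      out.bytes = new byte[len]; // grow only when the existing buffer is too small
    }
    in.readFully(out.bytes, 0, len);
    out.length = len;
    return out;
  }
}

class OutputReuseDemo {
  // Encodes each output as a length-prefixed byte sequence.
  static byte[] encode(byte[]... outputs) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bos);
      for (byte[] o : outputs) {
        out.writeInt(o.length);
        out.write(o);
      }
      out.flush();
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads n outputs and counts how many distinct byte[] buffers were used.
  static int countBufferAllocations(byte[] encoded, int n, boolean reuse) {
    DataInput in = new DataInputStream(new ByteArrayInputStream(encoded));
    ByteSeqOutputs outputs = new ByteSeqOutputs();
    ByteSeq holder = null;
    byte[] previous = null;
    int allocations = 0;
    try {
      for (int i = 0; i < n; i++) {
        holder = reuse ? outputs.read(in, holder) : outputs.read(in);
        if (holder.bytes != previous) { // identity check: a different array was used
          allocations++;
          previous = holder.bytes;
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return allocations;
  }

  // Reads n outputs with reuse and copies out the last one; a caller that keeps a
  // reused value must copy it, because the next read overwrites the shared buffer.
  static byte[] lastOutput(byte[] encoded, int n) {
    DataInput in = new DataInputStream(new ByteArrayInputStream(encoded));
    ByteSeqOutputs outputs = new ByteSeqOutputs();
    ByteSeq holder = null;
    try {
      for (int i = 0; i < n; i++) {
        holder = outputs.read(in, holder);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Arrays.copyOf(holder.bytes, holder.length);
  }

  public static void main(String[] args) {
    byte[] enc = encode(new byte[] {1, 2, 3}, new byte[] {4, 5}, new byte[] {6, 7, 8}, new byte[] {9});
    System.out.println("with reuse: " + countBufferAllocations(enc, 4, true));     // with reuse: 1
    System.out.println("without reuse: " + countBufferAllocations(enc, 4, false)); // without reuse: 4
  }
}
```

The trade-off mirrors the comment above: reuse removes the per-read allocation, but any caller that wants to keep an intermediate output beyond the next read has to copy it, as `lastOutput` does.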