[ https://issues.apache.org/jira/browse/LUCENE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967796#comment-13967796 ]
Robert Muir commented on LUCENE-5584:
-------------------------------------

But this is the *right* thing to do. You can compress it however you want, you can move it to disk (since it's like "stored fields" for your top-N), you can do all kinds of things with it.

As for numeric outputs being a problem _at all_, I do not believe you. A benchmark is required.

> Allow FST read method to also recycle the output value when traversing FST
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-5584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5584
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 4.7.1
>            Reporter: Christian Ziech
>
> The FST class heavily reuses Arc instances when traversing the FST. The
> output of an Arc, however, is not reused. This can be especially important
> when traversing large portions of an FST and using the ByteSequenceOutputs
> and CharSequenceOutputs. Those classes create a new byte[] or char[] for
> every node read (which has an output).
>
> In our use case we intersect a Lucene Automaton with an FST<BytesRef>, much
> like it is done in
> org.apache.lucene.search.suggest.analyzing.FSTUtil.intersectPrefixPaths(),
> and since the Automaton and the FST are both rather large, tens or even
> hundreds of thousands of temporary byte array objects are created.
>
> One possible solution to the problem would be to change the
> org.apache.lucene.util.fst.Outputs class to have two additional methods (if
> you don't want to change the existing methods for compatibility):
> {code}
>   /** Decode an output value previously written with {@link
>    *  #write(Object, DataOutput)}, reusing the object passed in if possible. */
>   public abstract T read(DataInput in, T reuse) throws IOException;
>
>   /** Decode an output value previously written with {@link
>    *  #writeFinalOutput(Object, DataOutput)}. By default this
>    *  just calls {@link #read(DataInput, Object)}. This tries to reuse
>    *  the object passed in if possible. */
>   public T readFinalOutput(DataInput in, T reuse) throws IOException {
>     return read(in, reuse);
>   }
> {code}
> The new methods could then be used in the FST in the readNextRealArc() method,
> passing in the output of the reused Arc. For most inputs they could even just
> invoke the original read(in) method.
>
> If you should decide to make that change, I'd be happy to supply a patch
> and/or tests for the feature.
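
For readers who want to see the allocation pattern under discussion, here is a minimal, self-contained sketch of a read(in, reuse) style method. It is not Lucene code: the ReuseSketch class, its Bytes holder and the length-prefixed encoding are all made up for illustration, and java.io.DataInput stands in for Lucene's own DataInput. It only shows how a caller-supplied object can absorb repeated reads without a new array per value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustration only: none of these names exist in Lucene.
final class ReuseSketch {

    /** Hypothetical mutable holder: a backing array plus the number of valid bytes. */
    static final class Bytes {
        byte[] buf = new byte[0];
        int length;
    }

    /** Allocating variant: a fresh holder and array on every call. */
    static Bytes read(DataInput in) throws IOException {
        return read(in, null);
    }

    /** Reusing variant: fills {@code reuse} when given, growing its array only if too small. */
    static Bytes read(DataInput in, Bytes reuse) throws IOException {
        int len = in.readInt();
        Bytes out = (reuse != null) ? reuse : new Bytes();
        if (out.buf.length < len) {
            out.buf = new byte[len];
        }
        in.readFully(out.buf, 0, len);
        out.length = len;
        return out;
    }

    /** Assumed toy encoding: each value is a 4-byte length followed by its bytes. */
    static byte[] encode(byte[]... values) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        for (byte[] v : values) {
            dos.writeInt(v.length);
            dos.write(v);
        }
        dos.flush();
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(new byte[] {1, 2, 3}, new byte[] {4, 5}, new byte[] {6});
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        Bytes scratch = new Bytes();
        for (int i = 0; i < 3; i++) {
            Bytes value = read(in, scratch); // same holder on every iteration
            System.out.println(value.length + " bytes, reused=" + (value == scratch));
        }
    }
}
```

Note that a reused value is only valid until the next read, so a caller that wants to keep an output has to copy it. Whether the saved allocations matter in practice is exactly what the benchmark requested in the comment would have to show.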