bq. * Are we going to keep Token.java or not? (Current patch still has it deprecated). I need to know this as well - I have to make a new Token class for the Highlighter package if this one is deprecated. It would seem a convenience to keep it around.
On Tue, Jul 21, 2009 at 8:51 AM, Grant Ingersoll (JIRA) <j...@apache.org>wrote: > > [ > https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733604#action_12733604] > > Grant Ingersoll commented on LUCENE-1693: > ----------------------------------------- > > One of the things that works really well in Solr is that any time some > significant JIRA issue is undertaken, a Wiki page is also generated that > effectively documents the ideas in the patch, as well as how to use it and > thus results in the final page effectively becoming the documentation. I > know Mike and Uwe have done a ton of work on this, but would it be too much > trouble to ask for a Wiki page that describes the current state of the > patch? It is really hard to follow, in JIRA, all the different threads and > which ones are still valid and which are not. > > > AttributeSource/TokenStream API improvements > > -------------------------------------------- > > > > Key: LUCENE-1693 > > URL: https://issues.apache.org/jira/browse/LUCENE-1693 > > Project: Lucene - Java > > Issue Type: Improvement > > Components: Analysis > > Reporter: Michael Busch > > Assignee: Michael Busch > > Priority: Minor > > Fix For: 2.9 > > > > Attachments: LUCENE-1693.patch, LUCENE-1693.patch, > lucene-1693.patch, LUCENE-1693.patch, lucene-1693.patch, LUCENE-1693.patch, > LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, > LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, > LUCENE-1693.patch, LUCENE-1693.patch, lucene-1693.patch, PerfTest3.java, > TestAPIBackwardsCompatibility.java, TestCompatibility.java, > TestCompatibility.java, TestCompatibility.java, TestCompatibility.java > > > > > > This patch makes the following improvements to AttributeSource and > > TokenStream/Filter: > > - removes the set/getUseNewAPI() methods (including the standard > > ones). Instead by default incrementToken() throws a subclass of > > UnsupportedOperationException. The indexer tries to call > > incrementToken() initially once to see if the exception is thrown; > > if so, it falls back to the old API. > > - introduces interfaces for all Attributes. The corresponding > > implementations have the postfix 'Impl', e.g. TermAttribute and > > TermAttributeImpl. AttributeSource now has a factory for creating > > the Attribute instances; the default implementation looks for > > implementing classes with the postfix 'Impl'. Token now implements > > all 6 TokenAttribute interfaces. > > - new method added to AttributeSource: > > addAttributeImpl(AttributeImpl). Using reflection it walks up in the > > class hierarchy of the passed in object and finds all interfaces > > that the class or superclasses implement and that extend the > > Attribute interface. It then adds the interface->instance mappings > > to the attribute map for each of the found interfaces. > > - AttributeImpl now has a default implementation of toString that uses > > reflection to print out the values of the attributes in a default > > formatting. This makes it a bit easier to implement AttributeImpl, > > because toString() was declared abstract before. > > - Cloning is now done much more efficiently in > > captureState. The method figures out which unique AttributeImpl > > instances are contained as values in the attributes map, because > > those are the ones that need to be cloned. It creates a single > > linked list that supports deep cloning (in the inner class > > AttributeSource.State). AttributeSource keeps track of when this > > state changes, i.e. whenever new attributes are added to the > > AttributeSource. Only in that case will captureState recompute the > > state, otherwise it will simply clone the precomputed state and > > return the clone. restoreState(AttributeSource.State) walks the > > linked list and uses the copyTo() method of AttributeImpl to copy > > all values over into the attribute that the source stream > > (e.g. SinkTokenizer) uses. > > The cloning performance can be greatly improved if not multiple > > AttributeImpl instances are used in one TokenStream. A user can > > e.g. simply add a Token instance to the stream instead of the individual > > attributes. Or the user could implement a subclass of AttributeImpl that > > implements exactly the Attribute interfaces needed. I think this > > should be considered an expert API (addAttributeImpl), as this manual > > optimization is only needed if cloning performance is crucial. I ran > > some quick performance tests using Tee/Sink tokenizers (which do > > cloning) and the performance was roughly 20% faster with the new > > API. I'll run some more performance tests and post more numbers then. > > Note also that when we add serialization to the Attributes, e.g. for > > supporting storing serialized TokenStreams in the index, then the > > serialization should benefit even significantly more from the new API > > than cloning. > > Also, the TokenStream API does not change, except for the removal > > of the set/getUseNewAPI methods. So the patches in LUCENE-1460 > > should still work. > > All core tests pass, however, I need to update all the documentation > > and also add some unit tests for the new AttributeSource > > functionality. So this patch is not ready to commit yet, but I wanted > > to post it already for some feedback. > > -- > This message is automatically generated by JIRA. > - > You can reply to this email to add a comment to the issue online. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > -- -- - Mark http://www.lucidimagination.com