On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
I'm not sure why this (currently having to implement next() too) is
such an issue for you. You brought it up at the Lucene meetup too.
No user will ever have to implement both (the new API and the old)
in their streams/filters. The only reason why we did it this way is
to not sacrifice performance for existing streams/filters when
people switch to Lucene 2.9. I explained this point in the jira issue:
http://issues.apache.org/jira/browse/LUCENE-1422?focusedCommentId=12644881&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12644881
The only time when we'll ever have to implement both APIs is between
now and 2.9, only for new streams and filters that we add before 2.9
is released. I don't think it'd be reasonable to consider this
disadvantage as a show stopper.
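To make the interim situation concrete, here is a hedged sketch of what a filter written before 2.9 ships would look like. These are minimal stand-in types (Token, TermAttr, TrimFilter are all hypothetical), not Lucene's real TokenStream classes, but the shape is the same: both the old next()-style method and the new attribute-style incrementToken() have to be present, with one delegating to the other.

```java
import java.util.Iterator;

public class DualApiSketch {
    static final class Token { String term; }      // old-API carrier object
    static final class TermAttr { String term; }   // new-API shared, mutable attribute

    static class TrimFilter {
        private final Iterator<String> input;
        final TermAttr termAttr = new TermAttr();

        TrimFilter(Iterator<String> input) { this.input = input; }

        // New API: mutate the shared attribute, signal end-of-stream via boolean.
        boolean incrementToken() {
            if (!input.hasNext()) return false;
            termAttr.term = input.next().trim();
            return true;
        }

        // Old API, kept only for callers that haven't switched yet; it just
        // delegates to incrementToken() and copies the result into a Token.
        Token next() {
            if (!incrementToken()) return null;
            Token t = new Token();
            t.term = termAttr.term;
            return t;
        }
    }
}
```

The delegating next() above is exactly the kind of soon-to-be-dead code the thread is complaining about: it exists only until the old API can be removed.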
It's an issue because I don't like writing dead code, and who knows when
2.9 will actually be out.
I don't think it is a show stopper either.
On top of that, the whole point of customizing the chain is to use it
in search and, frankly speaking, somehow I think that part of the
patch was held back.
I'm not sure what you're implying. Could you elaborate?
Sorry, see my response to Michael M. on this. I didn't mean to imply
you were doing something malicious, just that it always felt half done
to me. Knowing you, you don't strike me as someone who does things
half way, so that's why I felt it was held back. But, as Michael M
reminded me, it is complex, so please accept my apologies.
The search side of the API is currently being developed in
Lucene-1458. 1458 will not make it into 2.9. Therefore I agree that
it is not very advantageous to switch to the new API right now for
Lucene users. On the other hand, I don't think it hurts either.
I am not sure I agree here. Forcing people to upgrade their analyzers
can be quite involved. Analyzers are one of the main areas where
people do custom work. Solr, for instance, has 11 custom TokenFilters
right now as well as custom Tokenizers, not to mention the ones used
during testing that aren't shipped. Upgrading these is a lot of
work. I know in previous jobs I also maintained a fair amount of
TokenStream-related code. This should not be underestimated.
Furthermore, as I said back in the initial discussion, Lucene's
Analyzer stuff is often used outside of Lucene.
In fact, I often think the Analysis piece should be a standalone jar
(not requiring core) and that core should have a dependency on it. In
other words, move o.a.l.analysis (and contrib/analysis) to a standalone
module that core depends on. This would make it easier for others to
consume the Analysis functionality.
I personally would vote for reverting until a complete patch that
addresses both sides of the problem is submitted and a better
solution to cloning is put forth.
If we revert now and put a new flexible API like this into 3.x,
which I think is necessary to utilize flexible indexing, then we'll
have to wait until 4.0 before we can remove the old API.
Disadvantages like the one you mentioned above will then probably
be present for much longer.
I mentioned in the following thread that I have started working on a
better way of cloning, which will actually be faster than the old
API. I'll try to get the code out asap.
http://markmail.org/message/q7pgh2qlm2w7cxfx
I'd be happy to discuss other API proposals that anybody brings up
here, that have the same advantages and are more intuitive. We could
also beef up the documentation and give a better example about how
to convert a stream/filter from the old to the new API; a
constructive suggestion that Uwe made at the ApacheCon.
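A conversion example of the sort suggested here might look like the following hedged sketch. The types (Token, TermAttr, the two stop filters) are minimal hypothetical stand-ins rather than Lucene's real TokenStream/TermAttribute classes, but they show the shape of the change: the old style returns a fresh Token per call (null at end of stream), while the new style mutates one shared attribute and returns a boolean.

```java
import java.util.Iterator;

public class ConversionSketch {
    // --- Old style: produce a Token object per call, null at end of stream.
    static final class Token {
        final String term;
        Token(String term) { this.term = term; }
    }

    static class OldStopFilter {
        private final Iterator<String> input;
        OldStopFilter(Iterator<String> input) { this.input = input; }
        Token next() {
            while (input.hasNext()) {
                String t = input.next();
                if (!t.equals("the")) return new Token(t);  // drop stop word
            }
            return null;
        }
    }

    // --- New style: one shared, mutable attribute; boolean end-of-stream.
    static final class TermAttr { String term; }

    static class NewStopFilter {
        private final Iterator<String> input;
        final TermAttr termAttr = new TermAttr();
        NewStopFilter(Iterator<String> input) { this.input = input; }
        boolean incrementToken() {
            while (input.hasNext()) {
                String t = input.next();
                if (!t.equals("the")) { termAttr.term = t; return true; }
            }
            return false;
        }
    }
}
```

Note that the filtering logic itself is untouched by the conversion; only the token-passing convention around it changes, which is why a worked example in the documentation could cover most real filters.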
My point here was, at the time, that if others wanted to revert, I
probably would vote for it. I'm not proposing we do it, as I think we
can make do with what we have. Given the discussion here, I would
probably change my mind and not support it now.
I think it would be useful to give people some help with upgrading.
Perhaps an abstract class that provides the "core" Token attributes
out of the box as a base class that they can then extend? That being
said, forcing people to upgrade could at least help them think about
the fact that they have no use for the Type attribute or the Offsets
attributes. And, testing the cloning stuff would help. I think the
current approach underestimates the number of people who need to
buffer tokens in memory before handing them out. Sure, it's not as
many as the main use case, but it's not zero either.
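The buffering case raised here is worth illustrating. Below is a hedged sketch (hypothetical stand-in types, not Lucene's real AttributeSource/captureState machinery) of why buffering interacts with cloning under an attribute-style API: because a single mutable attribute is shared, a filter that must consume its whole input before emitting anything has to snapshot the attribute state for every token it buffers.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BufferingFilterSketch {
    static final class TermAttr {
        String term;
        TermAttr copy() { TermAttr c = new TermAttr(); c.term = term; return c; }
    }

    // Emits the upstream tokens in reverse order, which forces full buffering.
    static class ReverseOrderFilter {
        private final Iterator<String> input;
        private List<TermAttr> buffered;   // captured attribute snapshots
        private int pos;
        final TermAttr termAttr = new TermAttr();

        ReverseOrderFilter(Iterator<String> input) { this.input = input; }

        boolean incrementToken() {
            if (buffered == null) {
                // Consume everything upstream first. A plain reference to the
                // shared attribute won't do: it is overwritten on every token,
                // so each state must be cloned -- the cost under discussion.
                buffered = new ArrayList<TermAttr>();
                while (input.hasNext()) {
                    termAttr.term = input.next();
                    buffered.add(termAttr.copy());
                }
                pos = buffered.size();
            }
            if (pos == 0) return false;
            termAttr.term = buffered.get(--pos).term;  // restore a snapshot
            return true;
        }
    }
}
```

Every buffered token pays one clone here, which is why the speed of the cloning mechanism matters to this class of filters and not just to exotic ones.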
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org