On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
I'm not sure why this (currently having to implement next() too) is such an issue for you. You brought it up at the Lucene meetup too. No user will ever have to implement both the new API and the old in their streams/filters. The only reason we did it this way was to avoid sacrificing performance for existing streams/filters when people switch to Lucene 2.9. I explained this point in the jira issue:

http://issues.apache.org/jira/browse/LUCENE-1422?focusedCommentId=12644881&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12644881

The only time we'll ever have to implement both APIs is between now and the 2.9 release, and only for new streams and filters that we add in that window. I don't think it'd be reasonable to consider this disadvantage a show stopper.

It's an issue because I don't like writing dead code, and who knows when 2.9 will actually be out.

I don't think it is a show stopper either.

On top of that, the whole point of customizing the chain is to use it in search, and, frankly speaking, I somehow think that part of the patch was held back.

I'm not sure what you're implying. Could you elaborate?

Sorry, see my response to Michael M. on this. I didn't mean to imply you were doing something malicious, just that it always felt half done to me. You don't strike me as someone who does things halfway, so that's why I felt it was held back. But, as Michael M. reminded me, it is complex, so please accept my apologies.


The search side of the API is currently being developed in LUCENE-1458, which will not make it into 2.9. I therefore agree that there is not much advantage for Lucene users in switching to the new API right now. On the other hand, I don't think it hurts either.

I am not sure I agree here. Forcing people to upgrade their analyzers can be quite involved. Analysis is one of the main areas where people do custom work. Solr, for instance, has 11 custom TokenFilters right now as well as custom Tokenizers, not to mention the ones used during testing that aren't shipped. Upgrading these is a lot of work. I know in previous jobs I also maintained a fair amount of TokenStream-related code. This should not be underestimated. Furthermore, as I said back in the initial discussion, Lucene's analysis code is often used outside of Lucene.

In fact, I often think the analysis piece should be a standalone jar (not requiring core) and that core should have a dependency on it. In other words, move o.a.l.analysis (and contrib/analysis) to a standalone module that core depends on. This would make it easier for others to consume the analysis functionality.


I personally would vote for reverting until a complete patch that addresses both sides of the problem is submitted and a better solution to cloning is put forth.

If we revert now and put a new flexible API like this into 3.x, which I think is necessary to utilize flexible indexing, then we'll have to wait until 4.0 before we can remove the old API. Disadvantages like the one you mentioned above will then probably be with us much longer.

I mentioned in the following thread that I have started working on a better way of cloning, which should actually be faster than the old API. I'll try to get the code out as soon as possible.
http://markmail.org/message/q7pgh2qlm2w7cxfx

I'd be happy to discuss other API proposals that anybody brings up here that have the same advantages and are more intuitive. We could also beef up the documentation and give a better example of how to convert a stream/filter from the old API to the new one, a constructive suggestion that Uwe made at ApacheCon.
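To make that concrete, here is roughly what such a conversion example could look like, using a trivial lowercasing filter (a sketch only; the filter name is made up, and the imports cover both versions). Old API first:

    import java.io.IOException;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;

    // Old-style filter: works on whole Tokens passed along the chain.
    public class LowerCaseishFilter extends TokenFilter {
      public LowerCaseishFilter(TokenStream input) {
        super(input);
      }

      public Token next(final Token reusableToken) throws IOException {
        Token token = input.next(reusableToken);
        if (token == null) return null;
        final char[] buffer = token.termBuffer();
        for (int i = 0; i < token.termLength(); i++) {
          buffer[i] = Character.toLowerCase(buffer[i]);
        }
        return token;
      }
    }

And the same filter against the new attribute-based API:

    // New-style filter: declares up front which attributes it cares
    // about (here just the term text) and modifies those in place.
    public class LowerCaseishFilter extends TokenFilter {
      private final TermAttribute termAtt;

      public LowerCaseishFilter(TokenStream input) {
        super(input);
        termAtt = (TermAttribute) addAttribute(TermAttribute.class);
      }

      public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) return false;
        final char[] buffer = termAtt.termBuffer();
        for (int i = 0; i < termAtt.termLength(); i++) {
          buffer[i] = Character.toLowerCase(buffer[i]);
        }
        return true;
      }
    }

The shape of the code barely changes; what changes is that the filter no longer touches Tokens directly. A new filter committed to trunk before 2.9 would carry both methods in one class, which is exactly the duplication being objected to above.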

My point here was, at the time, that if others wanted to revert, I probably would vote for it. I'm not proposing we do it, as I think we can make do with what we have. Given the discussion here, I would probably change my mind and not support it now.

I think it might be helpful to have some help for people upgrading. Perhaps an abstract base class that provides the "core" Token attributes out of the box, which they can then extend? That being said, forcing people to upgrade could at least make them think about whether they actually have any use for the Type attribute or the Offset attributes. And testing the cloning approach would help. I think the current approach underestimates the number of people who need to buffer tokens in memory before handing them out. Sure, it's not as many as the main use case, but it's not zero either.
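To illustrate the buffering case, here is a rough sketch of what such a filter could look like under the new API, assuming a state-snapshot facility along the lines of captureState()/restoreState() on AttributeSource; the filter name is made up, and the exact cloning mechanics are the part Michael says he is still working on:

    import java.io.IOException;
    import java.util.LinkedList;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.util.AttributeSource;

    // Buffers the entire input stream in memory, then replays it in
    // reverse order. Each buffered entry is a snapshot of all attribute
    // values rather than a cloned Token.
    public final class ReverseOrderFilter extends TokenFilter {
      private final LinkedList<AttributeSource.State> buffer =
          new LinkedList<AttributeSource.State>();
      private boolean consumed = false;

      public ReverseOrderFilter(TokenStream input) {
        super(input);
      }

      public boolean incrementToken() throws IOException {
        if (!consumed) {
          // drain the input once, snapshotting the attributes per token
          while (input.incrementToken()) {
            buffer.addFirst(captureState());
          }
          consumed = true;
        }
        if (buffer.isEmpty()) return false;
        // copy a buffered snapshot back into the shared attributes
        restoreState(buffer.removeFirst());
        return true;
      }

      public void reset() throws IOException {
        super.reset();
        buffer.clear();
        consumed = false;
      }
    }

How expensive those per-token snapshots turn out to be is exactly what needs testing; if capturing state costs about as much as cloning a Token did, the people who buffer in memory are no better off than before.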
