On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
I'm not sure why this (currently having to implement next() too) is
such an issue for you. You brought it up at the Lucene meetup too.
No user will ever have to implement both (the new API and the old)
in their streams/filters. The only reason why we did it this way is
to not sacrifice performance for existing streams/filters when
people switch to Lucene 2.9. I explained this point in the jira issue:
http://issues.apache.org/jira/browse/LUCENE-1422?focusedCommentId=12644881&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12644881
The only time when we'll ever have to implement both APIs is between
now and 2.9, only for new streams and filters that we add before 2.9
is released. I don't think it'd be reasonable to consider this
disadvantage as a show stopper.
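To make the interim situation concrete, here is a hedged sketch of what a filter written before 2.9 ships would look like. These are minimal stand-in types (Token, TermAttr, TrimFilter are all hypothetical), not Lucene's real TokenStream classes, but the shape is the same: both the old next()-style method and the new attribute-style incrementToken() have to be present, with one delegating to the other.

```java
import java.util.Iterator;

public class DualApiSketch {
    static final class Token { String term; }      // old-API carrier object
    static final class TermAttr { String term; }   // new-API shared, mutable attribute

    static class TrimFilter {
        private final Iterator<String> input;
        final TermAttr termAttr = new TermAttr();

        TrimFilter(Iterator<String> input) { this.input = input; }

        // New API: mutate the shared attribute, signal end-of-stream via boolean.
        boolean incrementToken() {
            if (!input.hasNext()) return false;
            termAttr.term = input.next().trim();
            return true;
        }

        // Old API, kept only for callers that haven't switched yet; it just
        // delegates to incrementToken() and copies the result into a Token.
        Token next() {
            if (!incrementToken()) return null;
            Token t = new Token();
            t.term = termAttr.term;
            return t;
        }
    }
}
```

The delegating next() above is exactly the kind of soon-to-be-dead code the thread is complaining about: it exists only until the old API can be removed.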
It's an issue because I don't like writing dead code, and who knows when
2.9 will actually be out.
I don't think it is a show stopper either.
On top of that, the whole point of customizing the chain is to use it
in search and, frankly speaking, somehow I think that part of the
patch was held back.
I'm not sure what you're implying. Could you elaborate?
Sorry, see my response to Michael M. on this. I didn't mean to imply
you were doing something malicious, just that it always felt half done
to me. Knowing you, you don't strike me as someone who does things
half way, so that's why I felt it was held back. But, as Michael M
reminded me, it is complex, so please accept my apologies.
The search side of the API is currently being developed in
Lucene-1458. 1458 will not make it into 2.9. Therefore I agree that
it is not very advantageous to switch to the new API right now for
Lucene users. On the other hand, I don't think it hurts either.
I am not sure I agree here. Forcing people to upgrade their analyzers
can be quite involved. Analyzers are one of the main areas where
people do custom work. Solr, for instance, has 11 custom TokenFilters
right now as well as custom Tokenizers, not to mention the ones used
during testing that aren't shipped. Upgrading these is a lot of
work. I know in previous jobs I also maintained a fair amount of
TokenStream-related code. This should not be underestimated.
Furthermore, as I said back in the initial discussion, Lucene's
Analyzer stuff is often used outside of Lucene.
In fact, I often think the Analysis piece should be a standalone jar
(not requiring core) and that core should have a dependency on it. In
other words, move o.a.l.analysis (and contrib/analysis) to a standalone
module that core depends on. This would make it easier for others to
consume the Analysis functionality.
I personally would vote for reverting until a complete patch that
addresses both sides of the problem is submitted and a better
solution to cloning is put forth.
If we revert now and put a new flexible API like this into 3.x,
which I think is necessary to utilize flexible indexing, then we'll
have to wait until 4.0 before we can remove the old API.
Disadvantages like the one you mentioned above will then probably
be present for much longer.
I mentioned in the following thread that I have started working on a
better way of cloning, which will actually be faster than the old
API. I'll try to get the code out asap.
http://markmail.org/message/q7pgh2qlm2w7cxfx
I'd be happy to discuss other API proposals that anybody brings up
here, that have the same advantages and are more intuitive. We could
also beef up the documentation and give a better example about how
to convert a stream/filter from the old to the new API; a
constructive suggestion that Uwe made at the ApacheCon.
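A conversion example of the sort suggested here might look like the following hedged sketch. The types (Token, TermAttr, the two stop filters) are minimal hypothetical stand-ins rather than Lucene's real TokenStream/TermAttribute classes, but they show the shape of the change: the old style returns a fresh Token per call (null at end of stream), while the new style mutates one shared attribute and returns a boolean.

```java
import java.util.Iterator;

public class ConversionSketch {
    // --- Old style: produce a Token object per call, null at end of stream.
    static final class Token {
        final String term;
        Token(String term) { this.term = term; }
    }

    static class OldStopFilter {
        private final Iterator<String> input;
        OldStopFilter(Iterator<String> input) { this.input = input; }
        Token next() {
            while (input.hasNext()) {
                String t = input.next();
                if (!t.equals("the")) return new Token(t);  // drop stop word
            }
            return null;
        }
    }

    // --- New style: one shared, mutable attribute; boolean end-of-stream.
    static final class TermAttr { String term; }

    static class NewStopFilter {
        private final Iterator<String> input;
        final TermAttr termAttr = new TermAttr();
        NewStopFilter(Iterator<String> input) { this.input = input; }
        boolean incrementToken() {
            while (input.hasNext()) {
                String t = input.next();
                if (!t.equals("the")) { termAttr.term = t; return true; }
            }
            return false;
        }
    }
}
```

Note that the filtering logic itself is untouched by the conversion; only the token-passing convention around it changes, which is why a worked example in the documentation could cover most real filters.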
My point here was, at the time, that if others wanted to revert, I
probably would vote for it. I'm not proposing we do it, as I think we
can make do with what we have. Given the discussion here, I would
probably change my mind and not support it now.
I think it would be useful to give people some help with upgrading.
Perhaps an abstract class that provides the "core" Token attributes
out of the box as a base class that they can then extend? That being
said, forcing people to upgrade could at least help them think about
the fact that they have no use for the Type attribute or the Offsets
attributes. And, testing the cloning stuff would help. I think the
current approach underestimates the number of people who need to
buffer tokens in memory before handing them out. Sure, it's not as
many as the main use case, but it's not zero either.
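The buffering case raised here is worth illustrating. Below is a hedged sketch (hypothetical stand-in types, not Lucene's real AttributeSource/captureState machinery) of why buffering interacts with cloning under an attribute-style API: because a single mutable attribute is shared, a filter that must consume its whole input before emitting anything has to snapshot the attribute state for every token it buffers.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BufferingFilterSketch {
    static final class TermAttr {
        String term;
        TermAttr copy() { TermAttr c = new TermAttr(); c.term = term; return c; }
    }

    // Emits the upstream tokens in reverse order, which forces full buffering.
    static class ReverseOrderFilter {
        private final Iterator<String> input;
        private List<TermAttr> buffered;   // captured attribute snapshots
        private int pos;
        final TermAttr termAttr = new TermAttr();

        ReverseOrderFilter(Iterator<String> input) { this.input = input; }

        boolean incrementToken() {
            if (buffered == null) {
                // Consume everything upstream first. A plain reference to the
                // shared attribute won't do: it is overwritten on every token,
                // so each state must be cloned -- the cost under discussion.
                buffered = new ArrayList<TermAttr>();
                while (input.hasNext()) {
                    termAttr.term = input.next();
                    buffered.add(termAttr.copy());
                }
                pos = buffered.size();
            }
            if (pos == 0) return false;
            termAttr.term = buffered.get(--pos).term;  // restore a snapshot
            return true;
        }
    }
}
```

Every buffered token pays one clone here, which is why the speed of the cloning mechanism matters to this class of filters and not just to exotic ones.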
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org