Grant Ingersoll wrote:
On Oct 19, 2008, at 12:56 AM, Mark Miller wrote:
Grant Ingersoll wrote:
Bear with me, b/c I'm not sure I'm following, but looking at
https://issues.apache.org/jira/browse/LUCENE-1422, I see at least 5
different implemented Attributes.
So, let's say I add a 5 more attributes and now have a total of 10
attributes. Are you saying that I then would have, potentially, 10
different variables that all point to the token as in the code
snippet above where the casting takes place? Or would I just create
a single "Super" attribute that folds in all of my new attributes,
plus any other existing ones? Or, maybe, what I would do is create
the 5 new attributes and then 1 new attribute that extends all 10,
thus allowing me to use them individually, but saving me from having
to do a whole ton of casting in my Consumer.
Potentially one consumer doing 10 things, but not likely right? I
mean, things will stay logical as they are now, and rather than a
super consumer doing everything, we will still have a chain of
consumers each doing its own piece. So more likely, maybe something
comes along every so often (another 5, over *much* time, say) and
each time we add a Consumer that uses one or two TokenStream types.
And then its just an implementation detail on whether you make a
composite TokenStream - if you have added 10 new attributes and see
it fit to make one consumer use them all, sure, make a composite,
super type, but in my mind, the way its done in the example code is
clearer/cleaner for a handful of TokenStream types. And even if you
do make the composite,super type, its likely to just be a sugar
wrapper anyway - the implementation for say, payload and positions,
should probably be maintained in their own classes anyway.
Well, there are 5 different attributes already, all of which are
commonly used. Seems weird to have to cast the same var 5 different
ways. Definitely agree that one would likely deal with this by
wrapping, but then you end up either needing to extend your wrapper or
add new wrappers...
Well yes, there are 5 attributes, but n neither of the core tokenstreams
and -filters that I changed in my patch did I have to use more than two
or three of those. Currently the only attributes that are really used
are PositionIncrementAttribute and PayloadAttribute. And the
OffsetAttribute when TermVectors are turned on.
Even in the indexing chain currently we don't have a single consumer
that needs all attributes. The FreqProxWriter needs positions and
payloads, the TermVectorsWriter needs positions and offsets.
Also, you don't have to cast the same variable multiple times. In the
current patch you would call e. g.
token.getAttribute(PayloadAttribute.class) and keep a reference to it in
the consumer or filter.
IMO even calling getAttribute() 5 times or so and storing the references
wouldn't be so bad. And if you really don't like it you could make a
wrapper as you said. You also mentioned the disadvantages of the
wrapper, e. g. that you would have to extend it to add new attributes.
But then, isn't that the same disadvantage the current Token API has?
You could even use the new API in exact the same way as the old one.
Just create a subclass of Token that has all members you need and don't
add any attributes.
So I think the new API adds more flexibility, and still offers to use it
in the same way as the old one. I however think the recommended best
practice should be to use the new attributes, for reusability of
consumers that only need certain attributes.
However, please let me know if you have any concrete recommendations
about changing the API in LUCENE-1422. There might be better ones than
the APIs I came up with. It would be good to attach some code & demo
usage, then it's much easier to talk about it and see the
(dis)advantages of the different approaches.
-Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]