[jira] Commented: (LUCENE-1333) Token implementation needs improvements

DM Smith (JIRA) Sun, 10 Aug 2008 16:18:17 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621319#action_12621319
 ]


DM Smith commented on LUCENE-1333:
----------------------------------

Back from a trip. Would have jumped in to help, but it looks like you've found 
and fixed a couple of my mistakes/oversights. Thanks.

Regarding the equals implementation. Perhaps it would also be good to finish 
the canonical implementation and provide hashcode? That would allow for storage 
in sets and maps. (Not terribly sure how that would be useful. But I can 
imagine sorting on startOffsets to organize the contents.)

Regarding the changes to the jj files. I also made the corresponding changes in 
their generate java files. While it would be good to fix the generation 
problem, you can compare the jj and java pairs to see that they match.

I agree that Token next(Token) should assume a non-null. Perhaps an assert in a 
producer is a good idea. I find it easier to debug a failed assert than a null 
pointer exception.

> Token implementation needs improvements
> ---------------------------------------
>
>                 Key: LUCENE-1333
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1333
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 2.3.1
>         Environment: All
>            Reporter: DM Smith
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-1333-analysis.patch, LUCENE-1333-analyzers.patch, 
> LUCENE-1333-core.patch, LUCENE-1333-highlighter.patch, 
> LUCENE-1333-instantiated.patch, LUCENE-1333-lucli.patch, 
> LUCENE-1333-memory.patch, LUCENE-1333-miscellaneous.patch, 
> LUCENE-1333-queries.patch, LUCENE-1333-snowball.patch, 
> LUCENE-1333-wikipedia.patch, LUCENE-1333-wordnet.patch, 
> LUCENE-1333-xml-query-parser.patch, LUCENE-1333.patch, LUCENE-1333.patch, 
> LUCENE-1333.patch, LUCENE-1333.patch, LUCENE-1333a.txt
>
>
> This was discussed in the thread (not sure which place is best to reference 
> so here are two):
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200805.mbox/[EMAIL 
> PROTECTED]
> or to see it all at once:
> http://www.gossamer-threads.com/lists/lucene/java-dev/62851
> Issues:
> 1. JavaDoc is insufficient, leading one to read the code to figure out how to 
> use the class.
> 2. Deprecations are incomplete. The constructors that take String as an 
> argument and the methods that take and/or return String should *all* be 
> deprecated.
> 3. The allocation policy is too aggressive. With large tokens the resulting 
> buffer can be over-allocated. A less aggressive algorithm would be better. In 
> the thread, the Python example is good as it is computationally simple.
> 4. The parts of the code that currently use Token's deprecated methods can be 
> upgraded now rather than waiting for 3.0. As it stands, filter chains that 
> alternate between char[] and String are sub-optimal. Currently, it is used in 
> core by Query classes. The rest are in contrib, mostly in analyzers.
> 5. Some internal optimizations can be done with regard to char[] allocation.
> 6. TokenStream has next() and next(Token), next() should be deprecated, so 
> that reuse is maximized and descendant classes should be rewritten to 
> over-ride next(Token)
> 7. Tokens are often stored as a String in a Term. It would be good to add 
> constructors that took a Token. This would simplify the use of the two 
> together.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

Reply via email to