[jira] Commented: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)

Michael McCandless (JIRA) Sun, 07 Mar 2010 17:37:50 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842516#action_12842516
 ]


Michael McCandless commented on LUCENE-2302:
--------------------------------------------

{quote}
bq. A CollationFilter will not be needed anymore after that change, as any 
Tokenizer-Chain that wants to use collation can simply supply a special 
AttributeFactory to the ctor, that creates a special TermAttributeImpl class 
with modified getBytesRef().

Mike M. noted on LUCENE-1435 that the way to do "internal-to-indexing" 
collation is to store the original string in the term dictionary, sorted via 
user-specifiable collation.
{quote}

This works in flex now -- every field can specify its own Comparator (via the 
Codec).

But collation during indexing also works... but'd work better w/ these changes 
(not have to use indexable binary strings).

> Replacement for TermAttribute+Impl with extended capabilities (byte[] 
> support, CharSequence, Appendable)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2302
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2302
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: Flex Branch
>            Reporter: Uwe Schindler
>             Fix For: Flex Branch
>
>
> For flexible indexing terms can be simple byte[] arrays, while the current 
> TermAttribute only supports char[]. This is fine for plain text, but e.g 
> NumericTokenStream should directly work on the byte[] array.
> Also TermAttribute lacks of some interfaces that would make it simplier for 
> users to work with them: Appendable and CharSequence
> I propose to create a new interface "CharTermAttribute" with a clean new API 
> that concentrates on CharSequence and Appendable.
> The implementation class will simply support the old and new interface 
> working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of 
> this. So if somebody adds a TermAttribute, he will get an implementation 
> class that can be also used as CharTermAttribute. As both attributes create 
> the same impl instance both calls to addAttribute are equal. So a TokenFilter 
> that adds CharTermAttribute to the source will work with the same instance as 
> the Tokenizer that requested the (deprecated) TermAttribute.
> To also support byte[] only terms like Collation or NumericField needs, a 
> separate getter-only interface will be added, that returns a reusable 
> BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will 
> also support this interface. For backwards compatibility with old 
> self-made-TermAttribute implementations, the indexer will check with 
> hasAttribute(), if the BytesRef getter interface is there and if not will 
> wrap a old-style TermAttribute (a deprecated wrapper class will be provided): 
> new BytesRefGetterAttributeWrapper(TermAttribute), that is used by the 
> indexer then.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Commented: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)

Reply via email to