[jira] Commented: (LUCENE-252) [PATCH] Problem with Sort logic on tokenized fields

Hoss Man (JIRA) Fri, 09 Mar 2007 08:57:30 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479648 ]


Hoss Man commented on LUCENE-252:
---------------------------------

I'm afraid I'm still not understanding the issue in nutch; it seems like the 
root of the problem is...

> ... We use the tokenized url field in the FieldCache ...

...if you know this field is tokenized, don't use it this way.  If you want to 
use it this way, index it a second time untokenized.
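
As a rough illustration of "index it a second time untokenized", here is a minimal sketch in plain Java rather than Lucene code. The class `SortFieldCopySketch`, the `Doc` type, the `urlSort` name and the regex "analyzer" are all hypothetical stand-ins, not anything nutch or Lucene defines. The idea it shows: keep the tokenized form for searching, keep a second single-term copy of the same value, and sort only on that copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortFieldCopySketch {

    /** A toy document: the same URL kept as search tokens and as one sort key. */
    public static final class Doc {
        public final List<String> urlTokens; // stands in for the tokenized field
        public final String urlSort;         // stands in for the untokenized copy (hypothetical name)

        public Doc(String url) {
            // Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
            this.urlTokens = Arrays.asList(url.toLowerCase().split("[^a-z0-9]+"));
            // The whole value is kept as a single term for sorting.
            this.urlSort = url;
        }
    }

    /** Sorts by the single-term copy, which has exactly one value per document. */
    public static List<Doc> sortByUrl(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing((Doc d) -> d.urlSort));
        return sorted;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("http://lucene.apache.org/nutch"),
            new Doc("http://incubator.apache.org/solr"));
        for (Doc d : sortByUrl(docs)) {
            System.out.println(d.urlSort + " tokens=" + d.urlTokens);
        }
    }
}
```

The cost is a second copy of the value per document; the benefit is that the sort key is unambiguous no matter what the analyzer does to the searchable copy.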

At a more practical level:

1) the change you propose to getStrings and getStringIndex is not practical 
because, as we've discussed before, a field being tokenized isn't a guarantee 
that FieldCache won't work.  isTokenized just indicates that an Analyzer was 
used; it doesn't indicate that any real tokenization took place.  The Analyzer 
might have just been used to lowercase the field value before indexing, or to 
strip off leading/trailing white space, and that doesn't mean the normal 
FieldCache can't be used for sorting.  The converse is also true: !isTokenized 
doesn't tell you that it's safe to build the FieldCache.  Even if no Analyzer 
is ever used, multiple Field values can be added for the same field, and that 
is the root cause of the problem: not tokenization, but multiple terms for a 
given field.

2) the desired behavior you are requesting in a StoredFieldCacheImpl could be 
done without making any changes whatsoever to FieldCacheImpl.  Since nutch 
knows exactly which fields it's indexing multiple tokens for, it can make the 
choice between using a StoredFieldCacheImpl or using a FieldCacheImpl 
(but as I've said, I really don't think that's the right solution).
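
To make the root cause from point 1 concrete, here is a minimal sketch in plain Java, not the actual FieldCacheImpl source. The class `SingleValueCacheSketch`, the methods `invert` and `fillCache`, and the toy inverted index are hypothetical. It assumes a cache that is filled by walking the field's terms in sorted order and recording each term against every document that contains it. Under that assumption a document with more than one term for the field keeps only the term visited last, whether or not an analyzer was involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SingleValueCacheSketch {

    /** Toy inverted index for one field: term -> ids of the documents containing it. */
    public static TreeMap<String, List<Integer>> invert(String[][] termsPerDoc) {
        TreeMap<String, List<Integer>> index = new TreeMap<>();
        for (int doc = 0; doc < termsPerDoc.length; doc++) {
            for (String term : termsPerDoc[doc]) {
                index.computeIfAbsent(term, t -> new ArrayList<>()).add(doc);
            }
        }
        return index;
    }

    /** Fills a one-value-per-document array by walking the terms in sorted order. */
    public static String[] fillCache(TreeMap<String, List<Integer>> index, int maxDoc) {
        String[] cache = new String[maxDoc];
        for (Map.Entry<String, List<Integer>> e : index.entrySet()) {
            for (int doc : e.getValue()) {
                // A later term silently overwrites an earlier one for the same document.
                cache[doc] = e.getKey();
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"apache lucene"},       // doc 0: analyzed (lowercased) but still a single term
            {"zebra", "aardvark"},   // doc 1: two values added to the same field, no analyzer
        };
        String[] cache = fillCache(invert(docs), docs.length);
        System.out.println(cache[0]); // apache lucene
        System.out.println(cache[1]); // zebra
    }
}
```

Doc 0 sorts correctly even though it went through an "analyzer", while doc 1 ends up sorted under "zebra" even though it also contains "aardvark" and was never tokenized. A check on isTokenized would get both cases wrong.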

> [PATCH] Problem with Sort logic on tokenized fields
> ---------------------------------------------------
>
>                 Key: LUCENE-252
>                 URL: https://issues.apache.org/jira/browse/LUCENE-252
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.4
>         Environment: Operating System: other
> Platform: All
>            Reporter: Aviran Mordo
>         Assigned To: Lucene Developers
>         Attachments: dif.txt, 
> FieldCacheImpl_Tokenized_fields_lucene_2.0.patch, 
> FieldCacheImpl_Tokenized_fields_lucene_2.0_v1.1.patch, 
> FieldCacheImpl_Tokenized_fields_lucene_2.2-dev.patch
>
>
> When you set a SortField to a Text field which gets tokenized,
> FieldCacheImpl uses the term to do the sort, but then sorting is off, 
> especially with more than one word in the field. I think it is much 
> more logical to sort by the field's string value if the sort field is 
> Tokenized and stored. This way you'll get the CORRECT sort order.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

