[ 
https://issues.apache.org/jira/browse/LUCENE-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479354
 ] 

Hoss Man commented on LUCENE-252:
---------------------------------

definitely in agreement with yonik here, erroring out if 
"docField.isTokenized()" would prevent some perfectly valid use cases ... my 
point was that the current test of "if (t >= mterms.length)" only triggers an 
error if there are more total terms in the field than there are documents in 
the index ... but there can be plenty of situations where a doc has more than 
one indexed term, but the total number of indexed terms is less than the number 
of documents; a better test would be to check and see if we have already 
recorded a term for this doc.

I have to say: I'm really not understanding how the current behavior is 
hindering nutch ... my understanding of the nutch model is that the set of 
fields is very well known -- why do you need to rely on FieldCache being smart 
enough to stop you from trying to sort on a tokenized field? (and what does 
that have to do with deleting duplicates?)

if nothing else: if nutch needs to prevent using FieldCache based sorting on 
tokenized fields, why can't the "if (docField.isTokenized())" logic be done 
outside of the FieldCacheImpl ... possibly as a way to decide if you want to 
use the basic sorting or use something like LUCENE-769?
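
as a rough illustration of keeping that decision on the caller's side, the sketch below picks a sort strategy before FieldCache is ever touched. Everything in it is hypothetical (the class, the enum, and the method are not Lucene or Nutch API); the boolean argument stands in for whatever "docField.isTokenized()" reports for the field in question.

```java
// Hypothetical caller-side helper; none of these names come from Lucene or Nutch.
final class SortStrategyChooser {

    // FIELD_CACHE: the basic FieldCache-based sorting.
    // ALTERNATIVE: some other approach, e.g. something like LUCENE-769.
    enum Strategy { FIELD_CACHE, ALTERNATIVE }

    // isTokenized stands in for the result of docField.isTokenized().
    static Strategy choose(boolean isTokenized) {
        return isTokenized ? Strategy.ALTERNATIVE : Strategy.FIELD_CACHE;
    }
}
```

a caller would run this check once per sort field and never hand a tokenized field to the FieldCache-based sort, so FieldCacheImpl itself stays free to serve the valid tokenized-field use cases.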


...perhaps this is something that should be discussed more on java-dev?



> [PATCH] Problem with Sort logic on tokenized fields
> ---------------------------------------------------
>
>                 Key: LUCENE-252
>                 URL: https://issues.apache.org/jira/browse/LUCENE-252
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.4
>         Environment: Operating System: other
> Platform: All
>            Reporter: Aviran Mordo
>         Assigned To: Lucene Developers
>         Attachments: dif.txt, 
> FieldCacheImpl_Tokenized_fields_lucene_2.0.patch, 
> FieldCacheImpl_Tokenized_fields_lucene_2.0_v1.1.patch, 
> FieldCacheImpl_Tokenized_fields_lucene_2.2-dev.patch
>
>
> When you set a SortField to a Text field which gets tokenized,
> FieldCacheImpl uses the term to do the sort, but then sorting is off, 
> especially with more than one word in the field. I think it is much 
> more logical to sort by the field's string value if the sort field is 
> Tokenized and stored. This way you'll get the CORRECT sort order.


