[ https://issues.apache.org/jira/browse/LUCENE-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479648 ]
Hoss Man commented on LUCENE-252:
---------------------------------

I'm afraid I'm still not understanding the issue in Nutch. It seems like the root of the problem is:

> ... We use the tokenized url field in the FieldCache ...

If you know this field is tokenized, don't use it this way. If you want to use it this way, index it a second time untokenized.

At a more practical level:

1) The change you propose to getStrings and getStringIndex is not practical because, as we've discussed before, a field being tokenized isn't a guarantee that FieldCache won't work. isTokenized just indicates that an Analyzer was used; it doesn't indicate that any real tokenization took place. The Analyzer might only have lowercased the field value before indexing, or stripped leading/trailing white space, and in that case the normal FieldCache can still be used for sorting. The converse is also true: !isTokenized doesn't tell you that it's safe to build the FieldCache. Even if no Analyzer is ever used, multiple Field values can be added for the same field, and that is the root cause of the problem: not tokenization, but multiple terms for a given field.

2) The desired behavior you are requesting in a StoredFieldCacheImpl could be done without making any changes whatsoever to FieldCacheImpl. Since Nutch knows exactly which fields it's indexing multiple tokens for, it can make the choice between using a StoredFieldCacheImpl or using a FieldCacheImpl.
(But as I've said, I really don't think that's the right solution.)

> [PATCH] Problem with Sort logic on tokenized fields
> ---------------------------------------------------
>
>                 Key: LUCENE-252
>                 URL: https://issues.apache.org/jira/browse/LUCENE-252
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>   Affects Versions: 1.4
>         Environment: Operating System: other
>                      Platform: All
>            Reporter: Aviran Mordo
>         Assigned To: Lucene Developers
>         Attachments: dif.txt,
>                      FieldCacheImpl_Tokenized_fields_lucene_2.0.patch,
>                      FieldCacheImpl_Tokenized_fields_lucene_2.0_v1.1.patch,
>                      FieldCacheImpl_Tokenized_fields_lucene_2.2-dev.patch
>
>
> When you set a SortField to a Text field which gets tokenized,
> FieldCacheImpl uses the term to do the sort, but then sorting is off,
> especially with more than one word in the field. I think it is much
> more logical to sort by the field's string value if the sort field is
> tokenized and stored. This way you'll get the CORRECT sort order.
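
The root cause named in point 1 of the comment (multiple terms for one field, rather than tokenization as such) can be illustrated without Lucene at all. The following is a minimal sketch, assuming a cache that walks the terms of a field in sorted order and keeps one slot per document. The class `FieldCacheSketch`, the method `buildCache`, and the `postings` map are hypothetical names made up for this illustration; none of this is Lucene code.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a term-walking field cache; not Lucene code.
class FieldCacheSketch {

    // postings maps each term of one field to the ids of the documents containing it.
    // The cache keeps a single slot per document, so when a document has several
    // terms, each later term (in sorted term order) overwrites the earlier one.
    static String[] buildCache(int maxDoc, TreeMap<String, int[]> postings) {
        String[] cache = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                cache[doc] = entry.getKey();
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        // doc 0 holds "apple zebra" (two terms), doc 1 holds "mango" (one term).
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {0});
        postings.put("mango", new int[] {1});
        postings.put("zebra", new int[] {0});

        String[] cache = buildCache(2, postings);
        // doc 0 ends up keyed by "zebra", so it sorts after doc 1 even though
        // its full value "apple zebra" would sort before "mango".
        System.out.println(Arrays.toString(cache)); // prints [zebra, mango]
    }
}
```

If the same value is indexed a second time as one untokenized term, as suggested in the comment, each document contributes exactly one term and its slot holds the full value, so the sort order follows the whole string.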