Sounds like a good alternative. But how do I perform the search on the tokenized filed and sort on the un_tokenized one?
Thanks, Andre On Tue, Aug 5, 2008 at 12:51 PM, <[EMAIL PROTECTED]> wrote: > This is what I did and it works fine. My untokenized fields where named: > "__AMSUNTOK__" + fieldName. > Where fieldName was the name of the tokenized field. > > > Bob Hastings > Ancept Inc. > > > > > Mark Miller <[EMAIL PROTECTED]> > 08/05/2008 02:38 PM > Please respond to > java-user@lucene.apache.org > > > To > java-user@lucene.apache.org > cc > > Subject > Re: Sorting > > > > > > > Hey Andre, > > The reason the javadoc says the field should not be tokenized stems from > the issue you point out. What you want to do is possible of course, but > making the Lucene code change would complicate a process that can be > quite memory and cpu intensive on large collections. Done right, it > might make a good patch though. > > A compromise that you can make outside of the Lucene code is to index a > separate field with the same contents but untokenized. Sorting on this > field instead, Lucene will treat "North Carolina" as one token and sort > as you'd expect. The downside to this approach is that you will have to > juggle the two fields in the future. > > - Mark > > Andre Rubin wrote: >> Hi there! >> >> I'm new to Lucene, so forgive any misconceptions on my part. >> >> I created an Index and now I want to search on it based on a field. >> The field is a String field and Field.Store.YES and >> Field.Index.TOKENIZED. No problems with the search. >> >> Now, I wanted to sort the results, and according to the Sort javadoc >> the field "should not be tokenized". But I decided to try it anyway, >> and it worked. However, the results showed that the tokens were >> sorted, not the full string in the field. >> >> Just to make myself more clear, here's an example. Let's say I have >> these strings indexed: >> >> "North Carolina" >> "British Columbia" >> "Canada" >> >> Now I search (with sort) for the token 'c*' >> >> The result I get is (sorted by the token found): >> >> 1) Canada >> 2) North Carolina >> 3) British Columbia >> >> The result I wanted was (sorted by the whole String)" >> >> 1) British Columbia >> 2) Canada >> 3) North Carolina >> >> Is there a way to do this? >> >> >> Another option would be to sort the index itself, since this field is >> the only field that we'd be searching on. But I'm just guessing here, >> cause I have no idea if this is possible at all! >> >> Thanks, >> >> >> Andre >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [EMAIL PROTECTED] >> For additional commands, e-mail: [EMAIL PROTECTED] >> >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]