Thanks a lot Erik for the great tip! I do need to display all the fields and allow the users to sort by each field as they wish. My index is currently about 200 mb.
Your suggestion about storing (but not index) the cased version, and indexing (but not store) the lower-case version is an excellent solution for me. Is it possible to do it in the same field or do I have to do it in 2 separate fields? If I do it in one field, what are the Lucene class/methods I need to overwrite? Thanks again for your help! Alex This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation. -----Original Message----- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Monday, January 14, 2008 11:24 AM To: java-user@lucene.apache.org Subject: Re: Lucene sorting case-sensitive by default? Several things: 1> do you need to display all the fields? Would just storing them lower-case work? The only time I've needed to store fields case- sensitive is when I'm showing them to the user. If the user is just searching on them, I can store them any way I want and she'll never know. 2> You might very well be surprised at how little extra it takes to index (but not store) the lower-case version. How big is your index anyway? And be warned that the size increase is not linear, so just comparing the index sizes for, say, 10 document is misleading. If your index is 10M, there's no reason at all not to store twice. If it's 10G........ 3> You could store (but not index) the cased version. You could index (but not store) the lower-case version. The total size of your index is (I believe) about the same as indexing AND storing the fields. That gives you a way to search caselessly and display case-sensitively. Best Erick On Jan 14, 2008 10:58 AM, Alex Wang <[EMAIL PROTECTED]> wrote: > Thanks everyone for your replies! Guess I did not fully understand the > meaning of "natural order" in the Lucene Java doc. > > To add another all-lower-case field for each sortable field in my index > is a little too much, since the app requires sorting on pretty much all > fields (over 100). > > Toke, you mentioned "Using a Collator works but does take a fair amount > of memory", can you please elaborate a little more on that. Thanks. > > Alex > > -----Original Message----- > From: Toke Eskildsen [mailto:[EMAIL PROTECTED] > Sent: Monday, January 14, 2008 3:13 AM > To: java-user@lucene.apache.org > Subject: Re: Lucene sorting case-sensitive by default? > > On Fri, 2008-01-11 at 11:40 -0500, Alex Wang wrote: > > Looks like Lucene is separating upper case and lower case while > sorting. > > As Tom points out, default sorting uses natural order. It's worth noting > that this implies that default sorting does not produce usable results > as soon as you use non-ASCII characters. Using a Collator works but does > take a fair amount of memory. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]