Hi Esra,

Going back to the original problem statement, I see something that looks illogical to me; please correct me if I'm wrong:
On Apr 30, 2008, at 3:21 AM, esra wrote:
> I am using Lucene's "IndexSearcher" to search the given XML by
> keyword, which contains Farsi information.
> While searching I use ranges like
>
> آ-ث | ج-خ | د-ژ | س-ظ | ع-ق | ک-ل | م-ی
>
> When I search for the "د-ژ" range the results are wrong; they
> are the results of the "س-ظ" range.
>
> For example, when I search for "د-ژ" one of the results is
> "ساب ووفر". This result is also shown in the "س-ظ" range's result
> list, which is the correct range.
>
> As IndexSearcher uses the "compareTo" method, and this method uses
> Unicode values for comparing, I found the Unicode values of the characters:
>
> د = U+62F
> ژ = U+698
> and the first letter of "ساب ووفر" is س = U+633

It appears to me that *both* the "د-ژ" range [U+062F - U+0698] and the "س-ظ" range [U+0633 - U+0638] contain the first letter of "ساب ووفر", which is "س" = U+0633.

You stated that U+0633 should be contained in the [U+0633 - U+0638] range, and I agree. But why do you think U+0633 should not be contained in the [U+062F - U+0698] range, given that U+062F < U+0633 < U+0698?

In other words, it looks to me like your problem is not a problem at all.

Steve
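A minimal Java sketch of the comparison described above, assuming the range test is a plain inclusive `String.compareTo` check. The class and method names `FarsiRangeCheck` and `inRange` are hypothetical, and this is not Lucene's own range code. It shows that "ساب ووفر", whose first letter is U+0633, falls inside both the U+062F to U+0698 range and the U+0633 to U+0638 range:

```java
public class FarsiRangeCheck {

    // Hypothetical helper: inclusive range test using String.compareTo,
    // which compares strings lexicographically by UTF-16 code unit value.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String dal = "\u062F";  // د
        String jeh = "\u0698";  // ژ
        String seen = "\u0633"; // س
        String zah = "\u0638";  // ظ
        // "ساب ووفر", written with escapes to avoid source-encoding issues
        String term = "\u0633\u0627\u0628 \u0648\u0648\u0641\u0631";

        // prints: first letter: U+0633
        System.out.printf("first letter: U+%04X%n", (int) term.charAt(0));
        // prints: in [U+062F, U+0698]: true
        System.out.println("in [U+062F, U+0698]: " + inRange(term, dal, jeh));
        // prints: in [U+0633, U+0638]: true
        System.out.println("in [U+0633, U+0638]: " + inRange(term, seen, zah));
    }
}
```

Because ژ (U+0698) sits in the extended part of the Arabic block rather than right after ز (U+0632), a code-point range ending at ژ also covers every letter from س through ظ.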