Hi Steven, sorry, I made a mistake. The Unicode code points are as follows:
> د = U+62F
> ژ = U+632
> and the first letter of "ساب ووفر" is س = U+633

You can also check them here: http://www.unics.uni-hannover.de/nhtcapri/persian-alphabet.html

Esra

Steven A Rowe wrote:
>
> Hi Esra,
>
> Going back to the original problem statement, I see something that looks
> illogical to me - please correct me if I'm wrong:
>
> On Apr 30, 2008, at 3:21 AM, esra wrote:
>> I am using Lucene's "IndexSearcher" to search the given XML, which
>> contains Farsi information, by keyword.
>> While searching I use ranges like
>>
>> آ-ث | ج-خ | د-ژ | س-ظ | ع-ق | ک-ل | م-ی
>>
>> When I search for the "د-ژ" range the results are wrong; they
>> are the results of the "س-ظ" range.
>>
>> For example, when I search for "د-ژ", one of the results is
>> "ساب ووفر". This result is also shown in the "س-ظ" range's result
>> list, which is the correct range.
>>
>> As IndexSearcher uses the "compareTo" method, and this method compares
>> Unicode values, I looked up the code points of the characters.
>>
>> د = U+62F
>> ژ = U+698
>> and the first letter of "ساب ووفر" is س = U+633
>
> It appears to me that *both* the "د-ژ" range [ U+062F - U+0698 ] and the
> "س-ظ" range [ U+0633 - U+0638 ] contain the first letter of "ساب ووفر",
> which is "س" = U+0633.
>
> You stated that U+0633 should be contained in the [ U+0633 - U+0638 ]
> range - I agree - but why do you think U+0633 should not be contained in
> the [ U+062F - U+0698 ] range?
>
> In other words, it looks to me like your problem is not a problem at all.
>
> Steve
>
>

--
View this message in context: http://www.nabble.com/lucene-farsi-problem-tp16977096p17019498.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
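
For reference, the range comparison discussed in this thread can be reproduced with a minimal Java sketch. It is only an illustration: the class FarsiRangeCheck and the helper inRange are hypothetical names, and the sketch models an inclusive range test with plain String.compareTo rather than calling Lucene's own range code. It assumes the code points listed in the Unicode Arabic block: U+0698 for ژ (ARABIC LETTER JEH) and U+0632 for ز (ARABIC LETTER ZAIN).

```java
// Hypothetical helper class for illustration; not part of Lucene.
class FarsiRangeCheck {

    // Inclusive range test using String.compareTo, which compares
    // UTF-16 code units rather than Persian alphabetical order.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // "ساب ووفر", whose first letter is س (U+0633)
        String term = "\u0633\u0627\u0628 \u0648\u0648\u0641\u0631";

        String dal  = "\u062F"; // د
        String zain = "\u0632"; // ز
        String jeh  = "\u0698"; // ژ
        String seen = "\u0633"; // س
        String zah  = "\u0638"; // ظ

        // Upper bound U+0698: U+0633 lies inside [U+062F, U+0698].
        System.out.println(inRange(term, dal, jeh));   // prints true

        // Upper bound U+0632: U+0633 lies outside [U+062F, U+0632].
        System.out.println(inRange(term, dal, zain));  // prints false

        // The range the term is expected to belong to.
        System.out.println(inRange(term, seen, zah));  // prints true
    }
}
```

With an upper bound of U+0698 the term matches both ranges, as Steve describes; with an upper bound of U+0632 it matches only the "س-ظ" range. The difference comes from ژ following ز in the Persian alphabet while its code point is far higher than that of س.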