Hi Steven,

I tried the class and it works fine with the locale parameter "ar".
Actually, we are using "fa" for Farsi and "ar" for Arabic. I have added a small check for the locale parameter in my code, and now I can see the correct results.

Thank you very much for your help.

Esra

Steven A Rowe wrote:
>
> Hi Esra,
>
> I have attached a patch to LUCENE-1279 containing a new class:
> CollatingRangeQuery.
>
> The patch also contains a test class: TestCollatingRangeQuery. One of the
> test methods checks for the Farsi range you were having trouble with.
>
> It should be mentioned that according to Collator.getAvailableLocales(),
> neither Java 1.4.2 nor Java 1.5.0 contains Farsi collation information.
> However, in the test class I use the Arabic Locale, which seems to
> properly collate the non-Arabic Farsi letter U+0698, and hopefully behaves
> well with other Farsi letters. If you find that this is not the case, you
> can look into writing collation rules using RuleBasedCollator - you should
> be able to directly specify the proper letter orderings for Farsi;
> CollatingRangeQuery would also have to supply a constructor that takes in
> a Collator instead of a Locale.
>
> Please give the class a try and post back about how it works.
>
> Thanks,
> Steve
>
> On 05/03/2008 at 8:33 AM, esra wrote:
>>
>> Hi Steven,
>>
>> thanks for your help....
>>
>> Esra
>>
>>
>> Steven A Rowe wrote:
>> >
>> > Hi Esra,
>> >
>> > I have created an issue for this - see
>> > <https://issues.apache.org/jira/browse/LUCENE-1279>.
>> >
>> > I'll try to take a crack at a patch this weekend.
>> >
>> > Steve
>> >
>> > On 05/02/2008 at 12:55 PM, esra wrote:
>> > >
>> > > Hi Steven,
>> > >
>> > > yes you are right, sorry i am a bit confused.
>> > >
>> > > i checked again and the correct one is "zhe"/U+698.
>> > >
>> > > It seems the word is in the range but my customer says it
>> > > shouldn't be.
>> > >
>> > > I think the problem occurs because "zhe" is a Persian letter
>> > > outside the Arabic alphabet.
>> > > In the Farsi alphabet this letter is not after the "س" letter, but its
>> > > Unicode value is bigger than that of the "س" letter, and the searcher
>> > > works with Unicode values.
>> > >
>> > > Esra
>> > >
>> > >
>> > > Steven A Rowe wrote:
>> > > >
>> > > > Hi Esra,
>> > > >
>> > > > You are *still* incorrectly referring to the glyph with three dots
>> > > > over it:
>> > > >
>> > > > On 05/02/2008 at 12:18 PM, esra wrote:
>> > > > > yes the correct one is "ژ"/"ze"/U+632.
>> > > >
>> > > > "ژ" is *not* "ze"/U+632 - it is "zhe"/U+698.
>> > > >
>> > > > Have you increased the font size? Can you see the difference between
>> > > > these two?:
>> > > >
>> > > > "ژ"/"zhe"/U+698
>> > > > "ز"/"ze"/U+632
>> > > >
>> > > > > my problem is when i do search for "د-ژ" range. The result is
>> > > > > "ساب ووفر" and this word's first letter is "س" and its unicode is
>> > > > > "U+633" and it is not in the [ U+062F - U+0632 ] range.
>> > > >
>> > > > Like I keep saying, in the above description, you're using the glyph
>> > > > "ژ"/"zhe"/U+698, while at the same time incorrectly referring
>> > > > to it as "ze"/U+632.
>> > > >
>> > > > I don't mean to continually bang on about this - if you're *sure*
>> > > > that when you search, you're using the character represented by the
>> > > > glyph with one dot (and not three), i.e. "ز"/"ze"/U+632, then the
>> > > > problem lies elsewhere.
>> > > >
>> > > > Steve
>> > > >
>> > > > On 05/02/2008 at 12:18 PM, esra wrote:
>> > > > > yes the correct one is "ژ"/"ze"/U+632.
>> > > > >
>> > > > > my problem is when i do search for "د-ژ" range. The result is
>> > > > > "ساب ووفر" and this word's first letter is "س" and its unicode is
>> > > > > "U+633" and it is not in the [ U+062F - U+0632 ] range.
>> > > > >
>> > > > > am i wrong?
>> > > > >
>> > > > > Esra
>> > > > >
>> > > > > Steven A Rowe wrote:
>> > > > > >
>> > > > > > Hi Esra,
>> > > > > >
>> > > > > > I still think you're wrong :).
>> > > > > >
>> > > > > > On 05/02/2008 at 9:31 AM, esra wrote:
>> > > > > > > > ژ = U+632
>> > > > > >
>> > > > > > According to the website you linked to, the above character, which
>> > > > > > has three dots over it, is named "zhe", and its Unicode code point is
>> > > > > > U+698. (I had to increase the font size to see the three dots.)
>> > > > > >
>> > > > > > I think you are confusing "ژ"/"zhe"/U+698 with the letter
>> > > > > > "ز"/"ze"/U+632, which has just one dot over it.
>> > > > > >
>> > > > > > Unless you were mistaken in all of your emails when you included the
>> > > > > > character "ژ"/"zhe" instead of "ز"/"ze", then what I said in my
>> > > > > > previous email still stands: there is no problem here.
>> > > > > >
>> > > > > > Steve
>> > > > > >
>> > > > > > On 05/02/2008 at 9:31 AM, esra wrote:
>> > > > > > >
>> > > > > > > Hi Steven,
>> > > > > > >
>> > > > > > > sorry i made a mistake. unicodes are like this:
>> > > > > > >
>> > > > > > > > د=U+62F
>> > > > > > > > ژ = U+632
>> > > > > > > > and the first letter of "ساب ووفر" is س = U+633
>> > > > > > >
>> > > > > > > you can also check them here
>> > > > > > > http://www.unics.uni-hannover.de/nhtcapri/persian-alphabet.html
>> > > > > > >
>> > > > > > > Esra
>> > > > > > >
>> > > > > > >
>> > > > > > > Steven A Rowe wrote:
>> > > > > > > >
>> > > > > > > > Hi Esra,
>> > > > > > > >
>> > > > > > > > Going back to the original problem statement, I see something that
>> > > > > > > > looks illogical to me - please correct me if I'm wrong:
>> > > > > > > >
>> > > > > > > > On Apr 30, 2008, at 3:21 AM, esra wrote:
>> > > > > > > > > i am using lucene's "IndexSearcher" to search the given xml by
>> > > > > > > > > keyword which contains farsi information.
>> > > > > > > > > while searching i use ranges like
>> > > > > > > > >
>> > > > > > > > > آ-ث | ج-خ | د-ژ | س-ظ | ع-ق | ک-ل | م-ی
>> > > > > > > > >
>> > > > > > > > > when i do search for "د-ژ" range the results are wrong,
>> > > > > > > > > they are the results of the "س-ظ" range.
>> > > > > > > > >
>> > > > > > > > > for example when i do search for "د-ژ" one of the results
>> > > > > > > > > is "ساب ووفر", this result is also shown on the "س-ظ" range's
>> > > > > > > > > result list, which is the correct range.
>> > > > > > > > >
>> > > > > > > > > As IndexSearcher uses the "compareTo" method and this method uses
>> > > > > > > > > unicodes for comparing, i found the unicodes of the characters.
>> > > > > > > > >
>> > > > > > > > > د=U+62F
>> > > > > > > > > ژ = U+698
>> > > > > > > > > and the first letter of "ساب ووفر" is س = U+633
>> > > > > > > >
>> > > > > > > > It appears to me that *both* the "د-ژ" range [ U+062F - U+0698 ] and
>> > > > > > > > the "س-ظ" range [ U+0633 - U+0638 ] contain the first letter of
>> > > > > > > > "ساب ووفر", which is "س" = U+0633.
>> > > > > > > >
>> > > > > > > > You stated that U+0633 should be contained in the [ U+0633 - U+0638 ]
>> > > > > > > > range - I agree - but why do you think U+0633 should not be
>> > > > > > > > contained in the [ U+062F - U+0698 ] range?
>> > > > > > > >
>> > > > > > > > In other words, it looks to me like your problem is not a problem at
>> > > > > > > > all.
>> > > > > > > >
>> > > > > > > > Steve
>> > > > > > >
>> > > > > > > --
>> > > > > > > View this message in context:
>> > > > > > > http://www.nabble.com/lucene-farsi-problem-tp16977096p17019498.html
>> > > > > > > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> > > > > > ---------------------------------------------------------------------
>> > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
>> > > > > > For additional commands, e-mail: [EMAIL PROTECTED]

--
View this message in context: http://www.nabble.com/lucene-farsi-problem-tp16977096p17080852.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
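
For readers of this thread, the difference between the two orderings discussed above can be sketched in Java. This is only an illustration under stated assumptions, not the code from the LUCENE-1279 patch: the class FarsiRangeSketch and its helper methods are hypothetical names, and the collation rules cover just the six letters د ذ ر ز ژ س rather than the full Farsi alphabet. It contrasts String.compareTo, which orders by UTF-16 code unit, with a RuleBasedCollator whose rules place U+0698 before U+0633. The result for the "ar" locale collator depends on the collation data shipped with the JDK, so nothing is claimed about it here.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene or of the LUCENE-1279 patch.
final class FarsiRangeSketch {
    // Letters from the thread, written as escapes so the file compiles under any source encoding.
    static final String DAL = "\u062F"; // dal, U+062F
    static final String ZHE = "\u0698"; // zhe, U+0698 (three dots)
    static final String SIN = "\u0633"; // sin, U+0633

    /** True if term lies in [lower, upper] under String.compareTo (UTF-16 code unit order). */
    static boolean inRangeByCodeUnit(String lower, String upper, String term) {
        return lower.compareTo(term) <= 0 && term.compareTo(upper) <= 0;
    }

    /** True if term lies in [lower, upper] under the given collator's ordering. */
    static boolean inRangeByCollator(Collator collator, String lower, String upper, String term) {
        return collator.compare(lower, term) <= 0 && collator.compare(term, upper) <= 0;
    }

    /**
     * Assumption for illustration: a tiny rule set ordering only
     * dal < dhal < re < ze < zhe < sin, i.e. zhe sorts before sin.
     */
    static Collator farsiFragmentCollator() {
        try {
            return new RuleBasedCollator(
                "< \u062F < \u0630 < \u0631 < \u0632 < \u0698 < \u0633");
        } catch (ParseException e) {
            throw new IllegalStateException("collation rules failed to parse", e);
        }
    }

    public static void main(String[] args) {
        // The term from the thread; its first letter is sin (U+0633).
        String term = "\u0633\u0627\u0628 \u0648\u0648\u0641\u0631";

        // U+0633 lies between U+062F and U+0698, so this prints true.
        System.out.println("code unit order: " + inRangeByCodeUnit(DAL, ZHE, term));

        // Under the rules above sin sorts after zhe, so this prints false.
        System.out.println("rule-based order: "
            + inRangeByCollator(farsiFragmentCollator(), DAL, ZHE, term));

        // Result depends on the JDK's Arabic collation data.
        Collator arabic = Collator.getInstance(Locale.forLanguageTag("ar"));
        System.out.println("\"ar\" locale order: " + inRangeByCollator(arabic, DAL, ZHE, term));
    }
}
```

A range check built on a Collator, as in inRangeByCollator, is the general idea behind passing a Locale (or, with an extra constructor, a Collator) to a collating range query. The rule-based variant is the fallback Steve mentions for cases where the "ar" locale does not order some Farsi letter as expected.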