On Sun, 2003-11-16 at 02:03, C Bobroff wrote:
> I believe in principle, the search engines are to consider Persian and
> Arabic Yeh to be the same.

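To make the idea concrete, here is a minimal Python sketch of the kind of folding a search tool could apply before matching. Everything named here (FOLD_TABLE, fold, contains, the "exact" flag) is made up for illustration and is not taken from any real search engine; only the code points themselves come from Unicode.

```python
# Hypothetical folding table: map the Arabic letters onto their Persian
# counterparts so that both spellings produce the same search key.
FOLD_TABLE = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def fold(text: str) -> str:
    """Return a search key with the Yeh/Kaf variants and letter case merged."""
    return text.translate(FOLD_TABLE).casefold()


def contains(haystack: str, needle: str, exact: bool = False) -> bool:
    """Substring search; exact=True (an assumed option) skips the folding."""
    if exact:
        return needle in haystack
    return fold(needle) in fold(haystack)
```

With this, contains(page, word) matches a word typed with either Yeh, and contains(page, word, exact=True) keeps the two letters distinct for anyone who wants that.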
They should consider it to be "weakly equivalent". That's the term. It
is the same treatment they give to, say, capital "A" and small "a", or
a-umlaut ("ä") and plain "a".

> Yet they do not. Why? Are the tables they are consulting faulty?
> Are they consulting the wrong tables? Does no one even know this is a
> problem? Who handles this so they can fix the problem?

Google does, for example. MSN also does, if they have a Persian search.
As for contacting them, feel free to do so. I did the same once, with
all the authority I could borrow from the Unicode Consortium, to no
avail (in other words, I got no reply and no action).

> And what if we WANT the search engine
> to distinguish between the Persian and Arabic?

You provide an "option" to the engine, telling it not to use its
equivalence tables. But can you do that with "A" and "a" in Google?

> something like the first line in the Divan of Hafez:
> alaa yaa ayyuhaa saaqi ader ka'san wa naawelhaa
> Is that lang="fa" or lang="ar"?

I agree that it's a hard question. It really depends on how you are
going to write the "saaqi" part. Since it's pronounced /i:/, it should
be written with a dotted Yeh in the Arabic language. If you're writing
it with a dotless Yeh, it is a Persian transliteration of Arabic text.
Now, how do you mark an English transliteration of Arabic text? With
"en" or "ar"? Of course you'll use "en". So in that case, you should
use "fa".

> I wonder why you say "#1740;" instead of "U+06CC"?? :) :)

He's a real person, not a computer programmer! Real people prefer
decimal to hexadecimal, I guess.

But AmirBehzad, it's an inconvenience to refer to Unicode characters by
their decimal code. If you want to use HTML escapes, please use the
"&#x6CC;" format instead of "&#1740;". Both are unreadable to a casual
reader, but the first is readable by a specialist without using a
scientific calculator.

> I am slowly starting to think your idea is indeed the solution to the Yeh
> and Kaf problem.
> I hope the more technically astute people will
> also wake up and give you some feedback.

The solution? The solution is of course fixing every piece of software
immediately. But I agree that AmirBehzad's is actually a nice idea:
detect what the browser supports properly (possibly using some
JavaScript, browser sniffing, and other tricks) and then serve the
browser what it can display.

It works fine for display purposes, but there are scenarios where it is
not sufficient. Let's say a user is using IE5 on Win98, and she has the
Persian Yeh bug. AmirBehzad's script serves her Arabic Yeh in medial and
initial forms. She sees everything fine. But then she wants to search
the (already-retrieved) web page using the "Find" menu in her browser
(which has not implemented any such Yeh equivalence). The result: she
can't find the Arabic Yeh (or the Persian one).

An alternative story: let's say the writer of the page wants to say:
"Don't use Arabic Yehs like 'ي', use Persian Yehs like 'ی'." You'll
agree that he will be scared when the software does weird things to his
text.

The best solution is updating the software and the fonts. And nagging
the developers of the software if that doesn't fix the problem. And
writing your own software if that didn't work either. Or learning to
write software if you don't know how. Or forgetting it all if it's not
worth the effort.

> (RoozBEH, are you almost done cleaning out your Inbox??)

I'm doing it now. Next shot in 90 days.

> Perhaps the script could also check if the win9x user has IE6 in
> addition and if so, let them see Persian.

I agree.

> I would like to request that you make a simple webpage and post it
> somewhere for newbies to copy and paste.
> It would be nice if you put a
> little alternating Persian and English content so people see how to switch
> between the two. An exterior .CSS file that is 100% compliant with
> directions for copying for one's own use would be so nice.
> For test purposes, the Persian content should include some tricky things
> like parentheses, diacritics (tashdid, sokun, zir, zabar, pish, etc),
> zero-width-joiner, zero-width-non-joiner, heh+hamza, and something
> requiring mouseovers (or some such feature requiring the browser to
> calculate where the word is on the page.) After making everything as
> standard and compliant as possible, also put in your script, and most
> important, directions for how to copy and explanation for why it is there,
> I think this would be the best.

Very good recommendations.

roozbeh

_______________________________________________
FarsiWeb mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/farsiweb