Re: IranSystem to Unicode (UTF-8) converter

2005-12-06 Thread Ehsan Akhgari



 

  Hello, I don't know whether you have this software or not;
  if you do, please send it to me.
I wrote a PHP script to do just that a couple of days ago at work.  It's
relatively simple, and uses Roozbeh Pournader's conversion table.  All you
have to do is read the input string byte by byte and output the
corresponding UTF-8 codes in reverse order.  The only gotcha I faced was
that Latin characters (or numbers) in the middle of the text must not be
reversed.  This is caused by the way IranSystem encodes strings: the text
is stored in visual (display) order.
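
As a rough illustration of the procedure described above (this is not the PHP script itself), here is a minimal Python sketch.  The three-entry TABLE and its byte values are placeholders invented for the example; a real converter would need the full IranSystem conversion table.  The function name is also just an assumption.

```python
import re

# Placeholder table for illustration only: these byte values are NOT the
# real IranSystem assignments.  Substitute the real conversion table.
TABLE = {
    0xA0: "\u0627",  # ARABIC LETTER ALEF
    0xA1: "\u0628",  # ARABIC LETTER BEH
    0xA2: "\u067E",  # ARABIC LETTER PEH
}


def iransystem_to_unicode(data: bytes, table=TABLE) -> str:
    """Convert visually ordered bytes to a logically ordered string."""
    # Step 1: map every byte through the table (unmapped bytes pass through).
    chars = [table.get(b, chr(b)) for b in data]
    # Step 2: reverse the whole string to get logical order.
    reversed_text = "".join(reversed(chars))
    # Step 3: runs of Latin letters/digits were already left-to-right, so
    # the global reversal flipped them; flip each run back.
    return re.sub(r"[A-Za-z0-9]+", lambda m: m.group(0)[::-1], reversed_text)


if __name__ == "__main__":
    sample = bytes([0xA2, 0xA1]) + b"abc12" + bytes([0xA0])
    # Write the result as UTF-8 bytes, as the converter would.
    print(iransystem_to_unicode(sample).encode("utf-8"))
```

Reversing everything and then restoring the Latin runs keeps the code short; an equivalent approach is to split the input into runs first and reverse only the non-Latin ones.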
 
Ehsan
___
PersianComputing mailing list
PersianComputing@lists.sharif.edu
http://lists.sharif.edu/mailman/listinfo/persiancomputing


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-28 Thread Ehsan Akhgari



 

  
  Dear Ehsan,
  You suggested a creative solution. Thank you.
  My application consists of a database and two user interfaces.
  The first UI is used for data entry, where I parse a given XML file,
  extract and "Romanize" its data (based on a "Persian-Roman Conversion
  Map"), and then insert it into the DB.  Luckily, PHP provides a very
  fast function for such conversions, named strtr().  Now I have a
  "Roman DB".
  The second UI is used for data retrieval (searching), where I
  "Romanize" the given search argument and look for it through the DB
  records.  The results are decoded and converted back to Persian before
  being sent to stdout.
I've actually implemented this approach in a 
project.  I have not yet published the code, but if you want, I can make it 
available under the GPL.
 
Ehsan


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-24 Thread Ehsan Akhgari



 

  One solution would be to augment a DB capability at the application
  level.  That is, instead of a search or select qualified by a SQL where
  clause, simply get everything (select *) and then let the application
  filter what you want.  Then, when your DB provides that operation by
  itself, simplify your application and delegate that to the DB (query engine).
  
Another solution is to make the db believe your text is English.  This
could be done by "romanizing" the text before inserting it into the db,
and converting it back to Unicode after reading it from the db and before
displaying it to the user.  To romanize, choose a Roman letter for each
Persian letter, then read the Persian characters one by one, look each one
up in a conversion table, and write the equivalent Roman character to the
output.  However, this has a downside: IIRC MySQL's full-text search is
case-insensitive, and if I'm right about that, you'd have to choose all
your Roman characters from one case (upper or lower).  In addition, the
data stored in the db might be difficult or impossible to use without such
a conversion.  It's up to you to judge the tradeoffs before deciding
whether to use this method.
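
To make the table lookup concrete, here is a small Python sketch of the romanize/convert-back round trip.  The four-letter mapping is an arbitrary assumption made up for the example (any one-to-one assignment would do), and it uses lowercase letters only, because of the case-insensitivity issue mentioned above.

```python
# Arbitrary example mapping (an assumption, not a standard romanization).
# Every Persian letter gets exactly one lowercase Roman letter.
TO_ROMAN = {
    "\u0627": "a",  # alef
    "\u0628": "b",  # beh
    "\u067E": "p",  # peh
    "\u062A": "t",  # teh
}

ROMANIZE = str.maketrans(TO_ROMAN)
DEROMANIZE = str.maketrans({roman: fa for fa, roman in TO_ROMAN.items()})


def romanize(text: str) -> str:
    """Replace each mapped Persian letter with its Roman stand-in."""
    return text.translate(ROMANIZE)


def deromanize(text: str) -> str:
    """Convert romanized text read from the db back to Persian."""
    return text.translate(DEROMANIZE)


if __name__ == "__main__":
    word = "\u0628\u0627\u0628\u0627"  # "baba"
    stored = romanize(word)            # what would be inserted into the db
    print(stored, deromanize(stored) == word)
```

Note that the round trip is only safe if the original data contains no real Latin text; otherwise the conversion back cannot tell stand-in letters from genuine ones.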
 
For some good romanizing scripts, check out http://home.byu.net/jmd56/download.html.
 
Ehsan


Re: problem in myql data display

2005-04-13 Thread Ehsan Akhgari
Sadeq Naqashzade wrote:
Salaam,
One of my friends has the same problem (but I don't).  I'm using mysqli
and he is using the mysql extension.  Try mysqli; it may help you.
- Sadeq

Thanks, but I wasn't the one who asked the question!  I'm CCing the OP 
as well as the list.

Ehsan


Re: problem in myql data display

2005-04-13 Thread Ehsan Akhgari
mzz wrote:
Hi everyone, I have a problem with a MySQL database.  When I review my
table, which contains Persian data, in phpMyAdmin, I can see and edit the
data correctly, but when I use my own script to query my tables using PHP,
it displays my table data as '?' (question marks).
I am using:
MySQL server 4.1;
PHP 4.xx and UTF-8 encoding in my pages.
OS: Win2000 Server.
Regards,
zarbizade.

Can you dump the table into a file from the PHP script and then make 
sure the data in the file is correct (and in UTF-8 encoding)?
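
One way to check such a dump is sketched below in Python.  The function name and the three result strings are made up for this example; the idea is only to tell apart "the bytes are not UTF-8 at all" from "the bytes are fine but the characters were already replaced by '?' before they reached the script."

```python
def diagnose_dump(data: bytes) -> str:
    """Classify the raw bytes of a dumped table (hypothetical helper)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # The script received some other encoding (or damaged bytes).
        return "not valid UTF-8"
    has_non_ascii = any(ord(ch) > 0x7F for ch in text)
    if "?" in text and not has_non_ascii:
        # Literal question marks and no Persian text: the characters were
        # lost somewhere between the server and the script.
        return "characters already lost"
    return "valid UTF-8"


if __name__ == "__main__":
    print(diagnose_dump("\u0633\u0644\u0627\u0645".encode("utf-8")))
    print(diagnose_dump(b"????"))
```

If the dump contains literal question marks, the problem is in the connection between PHP and MySQL rather than in the page encoding.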

Ehsan


Number display in Firefox

2005-03-27 Thread Ehsan Akhgari
Hi all,
I just found something cool in Firefox which I had not come across
before, and thought some of you guys might not know about it either.  As
far as I can tell this is related to Gecko, so it should affect all
Mozilla-based applications, though I have not tested it anywhere except
Firefox 1.0.

The default rendering behavior for numbers appearing inside Persian text 
in Mozilla is to show them as Latin digits (1 2 3 ...), though in IE it 
depends on the context (whether the direction of the containing text is 
rtl or ltr.)  To make Firefox respect the direction of the text in this 
regard, you can add the following line to your user.js file:

user_pref("bidi.numeral", 1);

which sets the number rendering mode to "context."  This enables ASCII
digits entered inside Persian text to be rendered as Persian numbers (۱
۲ ۳ ...).  Of course this does not affect the behavior of rendering
numbers explicitly entered using Unicode character codes.

FWIW,
Ehsan


RE: A new Persian Unicode keyboard

2005-02-10 Thread Ehsan Akhgari



 

  
  The problem, as some of you 
  might have guessed, is the direction switching. Given an application like MS 
  Word, my keyboard correctly sends the characters, and Word gives them the 
  right form. But sometimes some characters (mainly the “shared” chars), and 
  often the blinking caret appear on the wrong side of the line. 
  
   
  What can be done to make the 
  shared characters (Like “!”) to appear on the correct side? The caret problem 
  can be fixed with Word’s RTL command. But mixing English and Persian letters 
  in the same line often leads to unpredictable outcomes. 
The rule of thumb is: use RTL paragraphs when writing Persian text (which
might contain English text within it), and use LTR paragraphs when writing
English text (which might contain Persian text within it).

  
  Is there an 
  algorithm governing these situations that I can use to modify the output to 
  remedy this?  
There is an algorithm called the Unicode Bidirectional Algorithm, the
details of which are available on Unicode.org.  As you might have guessed,
Word doesn't provide a correct implementation of this algorithm (nor does
any other text editor that I know of to date).  There's a library being
developed called FriBidi, of which Behdad is the project maintainer, IIRC,
which might help you, though probably not with Word.  I guess Behdad would
be able to comment on this in more depth.
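
To see why "shared" characters such as "!" move around, it helps to look at the bidirectional class that Unicode assigns to each character.  The short Python sketch below only prints those classes using the standard unicodedata module; it does not implement the algorithm itself, and the helper name is an assumption.

```python
import unicodedata


def bidi_classes(text: str):
    """Return (character, bidi class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


if __name__ == "__main__":
    # Latin letter, Persian letter BEH, exclamation mark, digit, space.
    for ch, cls in bidi_classes("a\u0628!1 "):
        print(f"U+{ord(ch):04X} {cls}")
```

Letters have a strong direction (L for Latin, AL for Arabic-script letters), while "!" is a neutral (ON) whose placement is resolved from its neighbours and from the paragraph direction.  That is why the paragraph direction matters so much for mixed lines.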

-Ehsan Akhgari


www.farda-tech.com
List Owner: MSVC@BeginThread.com
[Email: [EMAIL PROTECTED]] [WWW: http://www.beginthread.com/Ehsan]


RE: Persian in Windows Applications

2005-02-02 Thread Ehsan Akhgari



 

  
  I'm going to program and develop a windows 
  application
  and I want to use Persian in user 
  interface.
  I'm using Windows XP and uni-code in programming 
  language.
  But is there any trick or rule to make 
  application working fine in
  older windows? (98, ME)
  Or just using uni-code makes anything 
  fine?
Win9x does not support Unicode internally.  M$ has developed the so-called
MSLU[1], which provides Unicode compatibility at the Windows API level for
Win9x.  I have used it, and it does work, but be warned that these OSes
still do *not* support Unicode natively: all MSLU can do is implement API
stubs for the Unicode versions of Win32 functions (such as CreateFileW),
which allows you to build your app in Unicode mode in Visual C++.
 
What I've ended up doing in the past is to build the whole UI in HTML and
embed an HTML rendering engine in my app.  I've used the WebBrowser control
(the same control used by IE).  This requires you to distribute a
customized[2] version of IE with "Arabic" support built in along with your
app, and to write some JavaScript code that lets the user type Persian in
your application even if they don't have a Persian keyboard installed (you
can find several JS scripts on the web to use as a starting point).  You
can also use Gecko, Mozilla's great HTML rendering engine.  If you decide
to use the WebBrowser control, check out
http://www.beginthread.com/Article/Ehsan/WebBrowser%20Goodies/
for some articles about customizations of the control that you may need in
your own applications.
 
All of this, 
of course, applies to Visual C++.  If you use some other programming tool, 
then you'll have to research on your own, though I think that few support 
MSLU.
 
[1] You can download it from http://www.microsoft.com/msdownload/platformsdk/sdkupdate/psdkredist.htm.
[2] You can deploy a customized IE install using the IE Administration Kit (IEAK).


-Ehsan Akhgari




RE: Frasi in MS Powerpoint

2005-02-01 Thread Ehsan Akhgari
> Hi
>
> I would like to write farsi in microsoft powerpoint for presentation
> purposes. Would it be possible at all? If yes, how this can be done?
> What alternatives are available.
>
> I appreciate your help.

It is possible.  You simply should switch to a Persian keyboard and type
your text.  I seem to remember that some versions of MS Powerpoint did not
support right-to-left text properly (I don't remember exactly what the
problem was).

A very good alternative to MS Powerpoint is OpenOffice.org
(www.openoffice.org) version 1.1.3.  I have used it to create Persian
presentations with no problems.


-
Ehsan Akhgari



RE: openoffice & zwnj

2005-01-04 Thread Ehsan Akhgari
> That's a famous bug that will happen in applications. KDE also had
> that bug for quite a time until Behdad fixed it. The bug is because
> the application or the rendering engine asks the font for a glyph for
> the character, where it shouldn't.
> The application or the rendering engine should not pass ZWNJ (and a
> few other "invisible" Unicode characters) down.

Great to know it's been fixed.  Do you know exactly which version of KDE
first included the fix?  I've noticed that this bug seriously affects the
usability of KDE for Persian computing.
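
For readers who haven't run into the bug described in the quoted message, here is a toy Python sketch of the fix as described there: characters such as ZWNJ are kept out of the list of characters for which a glyph is requested from the font.  The set of code points below is an illustrative, incomplete assumption, and the function is hypothetical; it does not reflect the actual KDE code.

```python
# Illustrative (incomplete) set of invisible formatting characters.
INVISIBLE = {
    "\u200C",  # ZERO WIDTH NON-JOINER
    "\u200D",  # ZERO WIDTH JOINER
    "\u200E",  # LEFT-TO-RIGHT MARK
    "\u200F",  # RIGHT-TO-LEFT MARK
}


def characters_needing_glyphs(text: str):
    """Return the characters a renderer should look up in the font."""
    # A real shaper still uses ZWNJ/ZWJ to decide letter joining before
    # dropping them; this sketch shows only the filtering step.
    return [ch for ch in text if ch not in INVISIBLE]


if __name__ == "__main__":
    word = "\u0645\u06CC\u200C\u062E\u0648\u0627\u0647\u0645"
    print(len(word), len(characters_needing_glyphs(word)))
```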

Thanks,
-
Ehsan Akhgari



RE: Parsnegar to Unicode conversion AND phonetic Farsi keyboardwithEnglish keyboard

2004-12-18 Thread Ehsan Akhgari
Mr. Khazaee misdirected the email to me personally.  I thought I'd send it
to the whole list.

> -Original Message-
> From: khazaee [mailto:[EMAIL PROTECTED]
> Sent: 2004/12/18 10:27 AM
> To: Ehsan Akhgari
> Subject: RE: Parsnegar to Unicode conversion AND phonetic
> Farsi keyboardwithEnglish keyboard
>
>
> You want to define a user-defined keyboard for linux
> operating system or not?
> for linux operating system you can refer to persian keyboard
> on farsilinux.org.
> you can change the position of persian letter in your keyboard easily.
> regards.
> -- Original Message ------
> From: "Ehsan Akhgari" <[EMAIL PROTECTED]>
> Date:  Fri, 17 Dec 2004 22:50:03 +0330
>
> >
> >
> >Also, I was wondering if anyone knows a way of defining a
> user-defined
> >keyboard to use with Farsi Unicode, similar to Parsnegar
> which allows
> >to define a phonetic Farsi keyboard with English keyboards, so that,
> >when typing in Microsoft word in Farsi, I could use key "J"
> for letter "jim", "A"
> >for letter "alef", etc.
> >
> >You need your custom keyboard layout.  M$ has a tool for that:
> >Microsoft Keyboard Layout Creator.  You can use it to create
> your fully
> >(well, nearly
> >fully) customized keyboard layout for Windows.
> >
> >-
> >Ehsan Akhgari
> >
> >www.farda-tech.com <http://www.farda-tech.com/> List Owner:
> ><mailto:[EMAIL PROTECTED]>
> >[EMAIL PROTECTED]
> >
> >[Email: [EMAIL PROTECTED]
> >[WWW: http://www.beginthread.com/Ehsan ]


RE: Parsnegar to Unicode conversion AND phonetic Farsi keyboard withEnglish keyboard

2004-12-17 Thread Ehsan Akhgari



 

  
  Also, I was wondering if anyone 
  knows a way of defining a user-defined keyboard to use with Farsi Unicode, 
  similar to Parsnegar which allows to define a phonetic Farsi keyboard with 
  English keyboards, so that, when typing in Microsoft word in Farsi, I could 
  use key “J” for letter “jim”, “A” for letter “alef”, etc. 
You need your custom keyboard layout.  M$ 
has a tool for that: Microsoft Keyboard Layout 
Creator.  You can use it to create your fully 
(well, nearly fully) customized keyboard layout for Windows.

-Ehsan Akhgari




RE: Miscellaneous web issues

2004-12-01 Thread Ehsan Akhgari
> Roozbeh, it is a long time and I don't remember your answer to this
> email. What happened to this new dll?

AFAIK, it still hasn't been put on SourceForge.  If you're interested, I can
mail it to you off-list.

-----
Ehsan Akhgari



RE: farsiweb.info

2004-11-01 Thread Ehsan Akhgari
> Humm, would you check http://farsitex.org/?  I think it worked in IE
> when I designed it.

Done.  It looks pretty good; only the non-link items in the left-hand menu
might not be very readable (or it might be my less-than-perfect eyesight).

-
Ehsan Akhgari


Light without eyes illuminates nothing.





RE: farsiweb.info

2004-11-01 Thread Ehsan Akhgari
> Ah, that's a good sign, that none of us at FarsiWeb uses IE anymore!
> BTW, IIRC, 8bit transparent PNG works in IE too.

I'm not sure.  What I can say for sure is the image won't render correctly
in IE.  Hmm, BTW, at a second look, IE fails to render the layout correctly
as well!  Of course that's not as bad as how the background image looks.

-
Ehsan Akhgari



RE: farsiweb.info

2004-10-31 Thread Ehsan Akhgari
> Hi friends,
>
> The FarsiWeb Project's website <http://farsiweb.info/> is now
> up-to-date with a new Wiki system.

Congrats on the new site!

I took a quick look, and I have a comment regarding the design.  It seems to
me that you're using a transparent PNG file as the background for the pages.
IE doesn't support this feature of PNG files correctly, so the pages render
half unreadable on IE.  I suggest changing this, and the easiest way would
be not to use a transparent PNG (no need for that, anyway - just let the
background be white.)  Fortunately real browsers (Firefox, and Mozilla) do
render it pretty fine!

Other than that, the layout seems very nice.  Thanks for your efforts.

-
Ehsan Akhgari



RE: Vi/Emacs editor with RTL support

2004-09-01 Thread Ehsan Akhgari
> Not anything really useful.  Vim has a rightleft mode (:set
> rightleft), which is useful for ONLY RIGHT-TO-LEFT text.
>
> Emacs, it's worse:  there's an emacs-unicode branch, an
> emacs-bidi branch, and the emacs-head branch.  They are
> trying to merge the three of them for a few years now!

Thanks for your reply, Behdad.

So, is there any editor you would recommend that has good support for
bidirectional (Persian and English) text, preferably with HTML support
(though an editor without HTML support would also be just fine)?  The latest
one I've been working with is Bluefish, but it has some minor problems, and
I'm looking to see if there's something better available.

TIA,
-
Ehsan Akhgari

Learn Linux in Persian: http://www.persian-linux.org/





Vi/Emacs editor with RTL support

2004-08-31 Thread Ehsan Akhgari
Hi all,

Sorry if this question is too basic.  Is anyone aware of a version of the vi
editor (preferably) or Emacs that supports right-to-left languages,
including Persian?  If they already support this, do I need to do anything
special to turn RTL support on in those applications?

Thanks in advance,
-
Ehsan Akhgari

Learn Linux in Persian: http://www.persian-linux.org/





RE: Linux teaching website

2004-08-07 Thread Ehsan Akhgari
> BTW Ehsan, I consider this off-topic. This is about Persian support in
> software and computers, software written to handle Persian text, etc.
> This is not a list to gather volunteers for a website that happens to
> be about an operating system and in Persian.
>
> Not that I'm not personally interested, but only that it is off-topic.

Oh, I'm sorry for posting off-topic to the list.  I'll try not to do so
again.  :-)

-
Ehsan Akhgari



RE: Persian translation of GNOME

2004-08-06 Thread Ehsan Akhgari
> > I've got to give them both a test, and if I don't like them, I'll
> > write my own tools.  :-)
>
> That's what is considered reinventing a wheel ;-).  You can just get
> on and improve gtranslator.

Sure, that's why I added the "if I don't like them" condition, which,
apparently, is not the case!

> I prefer you start right away too. ROOZBEH, hello, wake up...

:-)

> It's supposed to attract GNOME-lovers.  The problem is that I can't
> find any time to fire it up...  Perhaps after FarsiWeb set up its wiki
> system.

I've heard about FarsiWeb's wiki for quite a while.  What makes starting it
up so difficult?  Anything I can help with?

-
Ehsan Akhgari



RE: Persian translation of GNOME

2004-08-04 Thread Ehsan Akhgari
> There are a couple tools to help translation.  KBabel is the one from
> KDE project, and there's a gtranslator for more GNOMEi look.

I've got to give them both a test, and if I don't like them, I'll write my
own tools.  :-)

> I remember Roozbeh was preparing a guide for Persian GNOME
> translators.  There is also a list for that that Roozbeh will
> subscribe you eventually.  The translation process is definitely not
> as easy as it is for a left-to-right language.
>  Also we are a bit picky about words to use, want to conform to the
> Persian Academy translations and other sources...  But help is
> definitely welcome.

I suppose I'll receive a list of such words with the approved translations,
won't I?  I personally have a low opinion of most of the "translated"
words that the Persian Academy has assigned (I'll *never* call computers
"Raayaaneh"!), but some of them sound meaningful, and anyway I'm not here to
enforce my personal preferences, but to help!

> Roozbeh is a bit busier than before these days.  If you didn't gety
> ANY feedback on these, come to in September again and I will use my
> privileges :-).

Fine - although I'd prefer to start right away, since the occasions in which
I have spare time are pretty scarce, and I'd like to use them well.

> Since you are in Iran now, you may also want to join gnome-ir-list on
> http://lists.gnome.org/ and help starting GNOME enthusiasm in Iran;
> this great desktop has been left in cold there...

I did.  Hmm, the list doesn't seem to want to attract many people, does it?
I had to type the URL by hand, and if it were not because of my personal
experience with Mailman, I would have never found its subscription page!
Maybe you'd like to make the list more visible...

-
Ehsan Akhgari



Linux teaching website

2004-08-04 Thread Ehsan Akhgari
Hi all,

Is there any interest in a Persian website dedicated to teaching Linux from
the ground up?  I've spent some time looking for Linux teaching websites on
the net, and I've found a number of them.  Most of them contain only a
handful of Linux-related tips, and there are a few which actually attempt to
teach Linux, but they don't have a good teaching program for getting
beginners started -- all they provide is a guide to a certain application or
aspect of the system.  There are also several which are mostly dedicated to
Linux discussions/news, which don't fall into this category.

Now, what I have in mind is this.  As a Persian user, one needs a Persian
teaching resource which does not assume previous experience at all, and
starts teaching Linux from the ground up; in a way that they can follow from
Lesson 1 upward to start learning Linux.  And the whole teaching material
will be free, both as in freedom and as in free "maa-oshaeer".  :-)

Do you guys think this is a good idea?  Do you have any idea about things to
add, or exclude, maybe?

I also need help, if anyone is willing and able to give it.  I'm going to
write up "Linux from the command line" lessons myself, which will start from
the ls/cd commands and go up to more advanced command line tricks and shell
programming methods, and then I might consider writing about a graphical
desktop, an application (or an app suite), or a specific task (like
networking with Linux, for example).  I think it would be very nice if
several parallel topics could be started simultaneously, but I don't have
enough time for that myself, so I need help.  If anyone is able to write
about such a topic from the ground up and on a lesson-by-lesson basis, I'd be
grateful to have their help.  If anyone is able to write Linux tips &
tricks, that would be nice as well.  We can also open up forums if some of
you guys do the favor of answering questions there (since I won't have
enough time...)

In case anyone decides to join, I think I would use MovableType as the
publishing system, so it would be easy for anyone to get started writing
articles.

Ideas/questions/comments/suggestions?

Thanks in advance!
-
Ehsan Akhgari



Persian translation of GNOME

2004-08-04 Thread Ehsan Akhgari
I'd like to help in translating the GNOME 2.8 po files.  I noticed that
Roozbeh is the leader of the Persian translation team.  I'd like to know how
I can contribute.  Should I send patches to Roozbeh himself, or do something
else?  Also, are there any tools which can help in the translation (instead
of manually editing the po files)?

Thanks!

-----
Ehsan Akhgari



RE: Persian UTF-8 MySql collation

2004-07-05 Thread Ehsan Akhgari
> [Ehsan, you just replied to me.  Answering on list.]

My bad.  Sorry, I meant to reply to the list.

> Well, you may wish to read a couple documents.  Read Unicode Collation
> Algorithm for example.  Just read the intro or something like that.
> The point is that Persian Collation is only an small table feed to the
> Unicode Collation Algorithm.
> So yes, there is a free Persian collation implementation, Glibc +
> fa_IR locale.

Good point, thanks.  I'll investigate it.

> What you have seen is the binary encoded table.  The source is in the
> fa_IR locale source file.

Thanks, I'll try Googling for it.

> Guys, both of you, if you don't have Glib,

You mean glibc, right?

> and your system
> does not provide what you need, you:
>
> * Either forget about Persian Collation, or
> * Implement your own minimal collation, or

That's what I have in mind, currently.

> * Consider using something like Glibc or uClibc with Persian
>   locale as a library.  Not sure how uClibc deals with Persian
>   locale.
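
As an illustration of the "implement your own minimal collation" option listed above, here is a small Python sketch.  It ranks letters by their position in the Persian alphabet instead of by code point.  It is a simplification made up for this example, not the Unicode Collation Algorithm and not the fa_IR locale data: it ignores diacritics, treats ALEF WITH MADDA as a separate letter before ALEF, and sorts everything it doesn't know after the letters.

```python
# The 32 letters of the Persian alphabet in dictionary order,
# preceded by ALEF WITH MADDA (a simplification, see above).
PERSIAN_ORDER = (
    "\u0622\u0627\u0628\u067E\u062A\u062B\u062C\u0686\u062D\u062E\u062F"
    "\u0630\u0631\u0632\u0698\u0633\u0634\u0635\u0636\u0637\u0638\u0639"
    "\u063A\u0641\u0642\u06A9\u06AF\u0644\u0645\u0646\u0648\u0647\u06CC"
)
RANK = {ch: i for i, ch in enumerate(PERSIAN_ORDER)}
# Treat the Arabic forms of KAF and YEH like their Persian counterparts.
RANK["\u0643"] = RANK["\u06A9"]
RANK["\u064A"] = RANK["\u06CC"]
UNKNOWN_BASE = len(PERSIAN_ORDER)


def persian_key(word: str):
    """Sort key: alphabet position per letter, unknown characters last."""
    return [RANK.get(ch, UNKNOWN_BASE + ord(ch)) for ch in word]


if __name__ == "__main__":
    words = ["\u06CC\u06A9", "\u067E\u062F\u0631",
             "\u0641\u0631\u062F\u0627", "\u0628\u0627\u062F"]
    print(sorted(words, key=persian_key))
```

With plain code-point sorting, letters such as PEH, TCHEH, JEH and GAF end up after the Arabic block letters, which is the usual symptom of missing Persian collation.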

Thanks again,

-
Ehsan Akhgari



RE: Persian UTF-8 MySql collation

2004-07-05 Thread Ehsan Akhgari
> Right. I was thinking about adding UTF-8 Persian collation to MySql
> 4.1.x
> - our project will involve a fairly large amount of data, so we'd like
> to have the option of sorting at the DB level.

I've never tested MySQL 4.1.x.  Have you tried it?  How is the UTF-8
support?  Have you tried Persian collation in MySQL 4.1.x to see how much
better it is than in 4.0.x?

Unfortunately I won't be looking into 4.1.x at this time, since it's still
in Beta, and we don't use Beta products on our production servers, so doing
so would do no good to my project.

> ... which is why we're hoping to use MySql 4.1.x

I'd give it a try if I were in your shoes.

> Nope, no Persian collation file for MySql 4.1.x as far as I can see
> (which is where we came in!)

How does 4.1.x get Persian sorting?  Like 4.0.x?


-
Ehsan Akhgari



RE: Persian UTF-8 MySql collation

2004-07-05 Thread Ehsan Akhgari
> That might work for Ehsan, but it sadly wouldn't save much effort for
> us since PHP doesn't do Persian UTF-8 collation (that I've been able
> to get working anyway), or provide access to strxfrm()
>
> :-(
>
> - which is why MySql seemed the least bad option.

Hmmm, if you've compiled PHP with glibc, I suppose you could simply do the
following (code not tested):



And yes, PHP doesn't provide access to strxfrm, but I think it's trivial to
write a PHP extension which provides that function.
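
For comparison, the same idea (switch the collation locale, then sort with the C library's collation) can be sketched in Python.  This is not the PHP snippet referred to above; it is only an analogue, and it assumes that a locale named fa_IR.UTF-8 is installed on the system, which is not guaranteed.  The function name is made up, and like the snippet above, this has not been checked against a real fa_IR locale.

```python
import locale


def sort_with_locale(words, locale_name="fa_IR.UTF-8"):
    """Sort words using the C library's collation for locale_name.

    Returns None if the locale is not available on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares in locale order.
        return sorted(words, key=locale.strxfrm)
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    words = ["\u06CC\u06A9", "\u067E\u062F\u0631", "\u0628\u0627\u062F"]
    result = sort_with_locale(words)
    print(result if result is not None else "fa_IR.UTF-8 is not installed")
```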

-
Ehsan Akhgari



RE: Persian UTF-8 MySql collation

2004-07-03 Thread Ehsan Akhgari
> Ehsan - are you thinking about adding glibc collation to the
> strings/ctype-MYSET.c file? Or something more fundemental?

Well, to tell you the truth, I'm not really sure, since I've not checked the
MySQL source tree yet.  But yes, I'm going to see if glibc support can be
incorporated into MySQL's charset handling mechanism.

> I think you and the team I'm working with are trying to do
> the same thing - it would be great if we could work together
> and come up with a solution that anyone else can use too.

I looked around a bit, and it seems like MySQL 4.1.x will be supporting
UTF-8.  MySQL 4.0.x doesn't have that support (the version I'm using on the
production server is 4.0.18-standard).  Because of that, incorporating that
support into MySQL might require a lot more work than I currently imagine.
Unfortunately, in that case I'll have to leave MySQL as it is and sort the
data at the client side (less efficient, but requiring less development
time).  Since the application I'm working on doesn't store very big
chunks of data in the db, I may decide to sacrifice performance for
development time.

> What's involved in creating a collation file? These two pages:
> http://dev.mysql.com/doc/mysql/en/Adding_character_set.html
> http://dev.mysql.com/doc/mysql/en/Character_arrays.html
> http://dev.mysql.com/doc/mysql/en/String_collating.html
> seem to say that's it's not too difficult, if you know what
> you're doing?
> (Which I dont. I'm just a humble PHP programmer)

Well, that seems to be for single-byte code pages.  The Persian character
encoding used in glibc is UTF-8, and that will require patching the MySQL
source code.  And like I said, because of MySQL's lack of UTF-8 support, it
might require more work than I imagine.  I think I can handle it from a
technical point of view (I'm good at C/C++), but I'm quite pressed for free
time...

> ... it seems it would be great to create a mySql Persian
> collation file rather than changing the source, with all the
> problems that would lead to of having to re-patch the code
> everytime there's a new MySql release? Or is that inevitable?

Well, if we decide to change the MySQL source code, we can submit our
patches to the MySQL team, and hopefully they will incorporate them into
their new releases.  Of course, in that case we might have to look into
adding that support to MySQL 4.1.x as well (if it doesn't already have it).
So there's no need for re-patching.  There's just a need for time!  :-)

In case I decide not to spend the time in the development of Persian
collation support in MySQL, I'll be glad to help your team in case they need
technical programming help.  In that case, I'll let you know off-list
(remind me if you don't get any note from me within a week, please.)


-
Ehsan Akhgari



RE: Persian UTF-8 MySql collation

2004-07-03 Thread Ehsan Akhgari
> It's not easy to do what you are saying here, unless you
> make sure you ALWAYS run your mysql under the same (fa_IR)
> locale, and that the locale data does not change.  Any Glibc
> version >= 2.2 should be Ok.

I think I'll give it a try anyway; but I'm wondering how useful it is,
considering the fact that MySQL 4.1.x (currently Beta) will be UTF-8
enabled...


Anyway, thanks for your comments a lot.

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Persian UTF-8 MySql collation

2004-07-02 Thread Ehsan Akhgari
> For proper sorting using Glibc, it's not enough that the
> application use Glibc, but it should call the sorting
> function of Glibc too! (which apparently MySql does not).

Right.

I'd like to spend some time trying to patch MySQL sources to use glibc
collation functions before I give up and sort the data at the client side.
Would you mind letting me know which version of glibc I should be using?
Also, is there any resource/documentation/how-to available which can guide
me in this job?
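
For the client-side fallback, here is a minimal sketch in Python (not the
MySQL patch itself) that leans on the C library's collation through
locale.strxfrm.  It assumes a fa_IR.UTF-8 locale is installed on the machine;
sort_with_locale is a hypothetical helper name, and it returns None when the
locale is missing rather than guessing an order.

```python
import locale


def sort_with_locale(words, locale_name="fa_IR.UTF-8"):
    """Sort words with the C library's collation rules for locale_name.

    Returns None if the locale is not installed (an assumption to check
    on every machine that runs this).
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares in locale order
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

The result depends entirely on the locale data shipped with the C library, so
it is only as good as the fa_IR definition on that system.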

Thanks!

-----
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Persian UTF-8 MySql collation

2004-07-02 Thread Ehsan Akhgari
> You can do proper Persian sorting using either glibc
> (available in all GNU/Linux distributions), or ICU (available
> from http://oss.software.ibm.com/icu/).

I have tested both MySQL 4.0.15 on WinXP and the default MySQL which comes
with Fedora Core 1, and neither could handle Persian sorting correctly.
They both seemed to start sorting from letter "FEH" to "YEH" and then
picking up "CHEH", "ZHEH", "GEH" and "PEH", and then starting from "ALEF" to
"GHEIN".
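
To make the expected order concrete, here is a small Python sketch that sorts
by an explicit Persian alphabet table instead of by code point.  PEH, TCHEH,
JEH and GAF sit after YEH in Unicode, so a plain code point sort cannot get
them right.  The table and the persian_key helper are my own illustration, not
anything MySQL or glibc provides, and the table ignores hamza forms and
diacritics.

```python
# Persian alphabet in dictionary order, written as escapes to keep the
# code points unambiguous (PEH, TCHEH, JEH, KEHEH, GAF, FARSI YEH included).
PERSIAN_ALPHABET = (
    "\u0627\u0628\u067e\u062a\u062b\u062c\u0686\u062d"
    "\u062e\u062f\u0630\u0631\u0632\u0698\u0633\u0634"
    "\u0635\u0636\u0637\u0638\u0639\u063a\u0641\u0642"
    "\u06a9\u06af\u0644\u0645\u0646\u0648\u0647\u06cc"
)
RANK = {ch: i for i, ch in enumerate(PERSIAN_ALPHABET)}


def persian_key(word):
    # Letters outside the table sort after the alphabet, by code point.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word]


pedar = "\u067e\u062f\u0631"                # starts with PEH
baradar = "\u0628\u0631\u0627\u062f\u0631"  # starts with BEH
tehran = "\u062a\u0647\u0631\u0627\u0646"   # starts with TEH

by_code_point = sorted([pedar, baradar, tehran])
by_alphabet = sorted([pedar, baradar, tehran], key=persian_key)
```

Here by_code_point puts the PEH word last, while by_alphabet puts it between
the BEH and TEH words, which is the order a Persian reader expects.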

It's possible that the Windows version has not been compiled with glibc, but
the Linux version is most likely compiled with glibc, I think.

Do I need to compile MySQL manually?  If so, is any particular version of
glibc required, or do I need to specify any particular compilation options?

Thanks in advance,

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Persian-English Dictionary -- Was: Iranian Mac User group

2004-06-08 Thread Ehsan Akhgari
> > I volunteer to implement a web interface for the dictionary,
> Excellent!
> You'll have to make it so that whether the user types in bi[ZWNJ]kaar,
> bikaar, or bi kaar, the word will be found!

Yes, that's right.  This is relatively easy to implement.
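
One way this could be done, sketched in Python under the assumption that the
dictionary is keyed on a normalized form of each headword: strip ZWNJ and
whitespace from both the stored word and the query before comparing.
lookup_key is a hypothetical name.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def lookup_key(word):
    """Collapse ZWNJ and whitespace so spelling variants share one key."""
    return "".join(ch for ch in word if ch != ZWNJ and not ch.isspace())
```

With this, "bi" + ZWNJ + "kaar", "bikaar" and "bi kaar" all map to the same
key, so any of the three finds the entry.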

> >  but I think we'll need other
> > people's help as well, because I would guess the whole data
> would be *huge*.
> Will this require separate dedicated server(s)?
> (I'm thinking about Behdad and the Persian Digital Library here...)

Hmmm, not necessarily *dedicated*.  As long as there's enough web space for
some part of the data to reside on the server, and I have access to it to
install an application which processes the queries locally, it doesn't
really have to be dedicated, unless the server's already fully loaded by
other tasks.  I don't think we'll need dedicated servers for this job.  The
process of searching can be done fast enough.

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Miscellaneous web issues

2004-06-08 Thread Ehsan Akhgari
> I would appreciate if you send me the exact process you used and the
> DLL, so we can publish it on the FarsiWeb website on SourceForge.

OK.  I'll send the step-by-step process to the list, and will send you the
relevant files off-list, so that you can put them on SourceForge.

Here are the steps I took to accomplish the job:

1.  After installing the Microsoft Keyboard Layout Creator (MSKLC) tool, I
inspected its install directory, and found that it ships with a version of
the MS C/C++ compiler (cl.exe) in the directory: C:\Program
Files\Microsoft Keyboard Layout Creator\bin\i386.  This assured me that the
tool creates a C source file, and feeds that to the compiler to create the
layout DLL.

Now, I needed to know the location of the generated source file, and also
the command prompt parameters passed to the compiler.

2.  To get the command-line options passed to the compiler, I wrote a
simple application which appends its command-line arguments to a log.txt
file.  This application is called shim.cpp, and is shipped in the src
package inside the shim directory.  It can simply be compiled to shim.exe
using the command "cl shim.cpp".

3.  Now, I moved all of the .exe files out of the C:\Program Files\Microsoft
Keyboard Layout Creator\bin\i386 directory, and copied shim.exe there under
each of the moved files' names.  So, now I had a cl.exe, rc.exe, link.exe,
etc. in that directory which were all actually the shim.exe program.  This
enabled me to find out the command-line options passed to the compiler tools
by the MSKLC tool so that I could imitate them manually.

4.  I opened MSKLC, and selected File | Load Existing Keyboard menu item to
load the "Persian experimental standard" keyboard (version 1.0.3.13) that I
had already grabbed from sf.net repository.

5.  I selected the Project | Build DLL and Setup Package menu item to build
the DLL.  The tool invoked my shim tool instead of all of the compiler's
tools (see Step 3 above.)

6.  I created the directory C:\Program Files\Microsoft Keyboard Layout
Creator\hack, and created a build.bat file there, which would execute the
compiler's tools with the command prompts passed by MSKLC to it.

7.  I copied the keyboard layout source files generated by MSKLC from the
temporary directory to the hack directory as well.

8.  I edited Persian.c, to change the shift state code for the Space key
from ' ' to 0x200C.  The patched line is line 268 in the original file
copied from the temp directory.

9.  I edited Persian.rc to change the version number from 1.0.3.13 to
1.0.3.14 so that I could tell my modified Persian.dll version from the
original FarsiWeb one.

10.  I ran build.bat, and voila!  The Persian.dll version 1.0.3.14 got
built.  Then I just had to replace the version 1.0.3.13 DLL from the
original FarsiWeb package with it.  The installer didn't need any change.
Now, I just ran the installer to uninstall the old version and install the
new version, and I had my keyboard working with Shift+Space.
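
The logging idea behind the shim in Step 2 can be sketched as follows.  This
is a Python stand-in written for illustration, not the shim.cpp from the src
package; log_invocation and main are hypothetical names, and a Python script
would still need to be wrapped into an .exe before it could take the place of
cl.exe the way the real shim did.

```python
import sys


def log_invocation(argv, log_path="log.txt"):
    """Append one line holding the program name and its arguments."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(" ".join(argv) + "\n")


def main():
    # Stand-in for the real tool: record how we were called, do nothing else.
    log_invocation(sys.argv)
    return 0
```

Each fake tool then leaves one line per invocation in log.txt, which is
enough to rebuild the command lines by hand in a batch file.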

I'm sending to Roozbeh two files: Persian-src-1_0_3_14.zip which contains
the modified source files, and Persian-1_0_3_14.zip which contains the DLL
plus the installer, which I guess he'd make available through the
sourceforge.

I'm open for questions/comments.  Please don't hesitate if you have any.

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Misinformation!

2004-06-04 Thread Ehsan Akhgari
> There's a difference in the case of C++ standard and web
> standards:  Writing non-standard C++ code only produces compile-time
> problems, but if you happen to compile the code, it works correctly
> (or supposed to do so).

Well, that's not exactly so.  Some non-conformant behavior tends to generate
(maybe subtle) runtime behavior differences.  But I see what your point
is.

> But it's quite a different case in web.
> 30-40 percent is low enough to get ignored, counting that the other
> way you are sacrificing the other 60-70% for not being able to find
> the document by searching in Google.  And note that even with Win9x
> and a recent IE, and updated fonts, there's no problem.

I'd definitely do so if the Google search problem couldn't be solved.  But
I've been using a method I've mentioned in my other post to solve that
problem as well.  This was the best way of having the best of the two worlds
that I could think of, but I'm wide open for suggestions/improvements to
this idea.

> About using HTML entities, no matter what the encoding of the page is,
> HTML entities generate Unicode characters.

They do on most browsers, but browsers are not required to do so.  Consider
a browser which can't handle UTF-8 (well, or at all).

> It's quite common to see
> people exporting Persian documents in MS Word, and get an HTML page
> encoded in MS Arabic encoding, with Persian Yeh and Keh encoded in
> HTML entities.

Yes, and that will make their document even more difficult for search
engines to index.  And of course, I'd argue that CP1256/ISO-8859-6 is
not suitable for Persian documents, but that's another story perhaps.

> PS.  BTW, I just found that using Harakat (kasre, fathe, ...) also
> prevent a hit in Google search :(.  That's quite expected, but perhaps
> I should reconsider my habbit of putting those tiny marks everywhere.

That's another sad fact.  I really think that Google must seriously consider
handling such details in their indexing process.  That's also one
of the things that AriaSearch.com handles.

---

Hmmm, now that we're here, how about gathering some volunteers who can work
with Google to fix some of these problems?  In the past, I've contacted
Google on a number of occasions about small problems in their services, and
they seemed quite willing to fix them.  Hopefully we would have a more
Persian-friendly Google in the future this way.

If you feel that this is a good idea, I'd be pleased to take part in that
team.  Comments?

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Misinformation!

2004-06-04 Thread Ehsan Akhgari
> Here is a solution (in fact a hack) that if implemented correctly, can
> resolve some of the issues till people and Google start using correct
> software:
>
> With a little tweaking, the web servers can translate the correct
> Unicode to the incorrect unicode desired so much by the Win9X users.
> That is, the web severs looks at the browser request, and if it can
> detect Win9X, translates all U+06CC's in the document to U+064A (and
> all other required translations). The same technique could be used to
> fool google into generating correct search results. That, is the web
> server generates a Win9X friendly version of the document and appends
> it to the original document. You can also allocate tags that the user
> of the web server can disable or enable some of these features. This
> may even make one gain some advatnage over other web hosting
> companies.

That solves half of the problem.  On Win9x, the key d on the keyboard
inserts an Arabic YEH, and on Win2K+, it inserts FARSI YEH.  So, if you use
this method, when a user types a word containing yeh into Google's
search box on Win9x, they won't find your site.

The best hack (or solution, as one might call it) I've found for this is
feeding a version of the page to Google which contains both forms of words
(using YEH and FARSI YEH) so that the chances of Google finding your page
for a certain keyword are maximized.  Of course, certain measures must be
taken to prevent bad results; for example, the proximity of the words must
not be disturbed.  Nevertheless, this will cause other problems, such as
distorted keyword density, which cannot be solved reliably.  The problem
must really be fixed in the search engine code, and such hacks have their
own downsides.  The search engine project I've been working on handles this
(and the ARABIC KAF and KEHEH problem) among other problems for searching
in Persian text.
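
A rough sketch of the "both forms" idea in Python, to show what I mean rather
than what my sites actually run: the whole text is emitted once with FARSI YEH
and once with Arabic YEH.  with_both_yeh_forms is a hypothetical name, and how
the second copy is served to the crawler is left out on purpose.

```python
ARABIC_YEH = "\u064a"  # what the Win9x keyboard inserts
FARSI_YEH = "\u06cc"   # what the Win2K+ keyboard inserts


def with_both_yeh_forms(text):
    """Return text followed by a copy using the other YEH form.

    Each copy keeps its own word order, so word proximity inside a copy
    is unchanged; keyword density still doubles.
    """
    persian = text.replace(ARABIC_YEH, FARSI_YEH)
    arabic = text.replace(FARSI_YEH, ARABIC_YEH)
    if persian == arabic:  # no YEH at all, nothing to duplicate
        return text
    return persian + "\n" + arabic
```

Duplicating whole passages instead of interleaving variant words is a design
choice made here to leave word proximity alone; it does nothing for the
density problem.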

> Of course, the solution above is only a transient one, and it is up to
> people to upgrade their Win9X machines to something that is
> Unicode-compliant, also it is up to Google to program their systems
> such that it can understand that both U+06CC and U+064A are the same
> shape and hence should be regarded the same for searching unless user
> requests otherwise. This is the same as case-insensitive search that
> is usually implemented by mapping all upper and lower case characters
> -- in documents and queries alike -- to uppercase.

Yeah that's right.  Of course great attention must be paid so that it
doesn't break Arabic search results.
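
On the search engine side, the folding described in the quoted text could look
roughly like this in Python.  The mapping table is an assumption of mine: it
folds only YEH and KAF, and deliberately leaves ALEF MAKSURA (U+0649) alone,
because that is a distinct letter in Arabic and folding it is exactly the kind
of thing that could break Arabic results.

```python
# Map the Arabic code points onto the Persian ones before indexing/searching.
FOLD = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def search_key(text):
    """Fold YEH/KAF variants; apply to documents and queries alike."""
    return text.translate(FOLD)
```

As with case folding, this only works if the same function is applied to the
indexed text and to every query.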


-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]

He who sees the abyss, but with eagle's eyes - he who with eagle's talons
grasps the abyss: he has courage.
-Thus Spoke Zarathustra, F. W. Nietzsche





RE: Misinformation!

2004-06-04 Thread Ehsan Akhgari
> Unfortunately this kind of misinforming is quite popular in weblogs,
> where people only care about being visible to more people.

I confess that I'm one of those who use this technique on their web sites.
I don't believe it's correct, and I don't think of it even as a semi-elegant
solution.  It's a solution which just works on the largest number of
platforms.  By inspecting the web server logs, I notice that still an
average of 30-40 percent of the visitors are using Win9x.  Hopefully one can
start dropping support for Win9x users as their number is constantly
decreasing, but right now if I choose the standards compliant route of using
FARSI YEH everywhere, those Win9x-ers will not be able to browse my sites.

I have a high respect for, and a tendency towards, standards.  I'm mostly a
C++ programmer, and I'm one of those "preachers" of the C++ Standard.
However, today's C++ compilers are still not fully compliant with the C++
Standard, so whenever someone asks me for advice on how to accomplish a
certain task on a non-conformant compiler, I show them the non-standard way,
and also mention the standard way, so that they know what the *right* way
is, and also what the way to do their job right now is.  I see little
difference in the web standards land as well.

Of course this 'solution' (if it can be called so) poses other problems,
such as the inability of search engine spiders such as Google's to correctly
index such words with both forms of YEH, which must be addressed
separately.  Also, if you choose to use the FARSI YEH form everywhere, then
again such problems will occur (such as a Win9x-er being able neither to
see your pages correctly nor to find them in Google, if they query for a
word containing YEH.)

> They even go on and use HTML entities (like &#1626;) instead of UTF-8,
> just because if the user's browser is set to something other than auto
> and UTF-8, the page is still rendered correctly...

This one is silly, and I don't see how this can solve any problem.  The
browsers are required to be able to correctly resolve such numerical
entities only if the page's encoding is already UTF-8, and if it is so, why
not use UTF-8 encoded characters in the first place?  Also, some agents have
difficulties interpreting such numerical forms.  Furthermore, maintaining
them is impossible (not just hard), and they can't even be treated as text
by most software packages (for example, many programs can't search for
them.)  And last but not least, for a regular Persian document, they're
likely to more than double the document size.
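
The size claim is easy to check.  A small Python illustration, using a
four-letter word as sample data of my own choosing:

```python
text = "\u0633\u0644\u0627\u0645"  # a four-letter Persian word ("salaam")

as_utf8 = text.encode("utf-8")
# xmlcharrefreplace writes every non-ASCII character as &#NNNN;
as_entities = text.encode("ascii", "xmlcharrefreplace")

print(len(as_utf8), len(as_entities))  # prints "8 28"
```

That is 7 bytes per letter as a numeric entity against 2 bytes in UTF-8.
Spaces, markup and other ASCII characters stay the same size either way, so a
real page grows by less than that ratio, but still by a lot.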

They have their own usage, of course, but I don't see any sense in using
them instead of UTF-8 characters for regular web pages.

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Persian-English Dictionary -- Was: Iranian Mac User group

2004-06-04 Thread Ehsan Akhgari
[snip]
> I'm sure this dictionary must have been funded by the Iranian
> government and no profits expected. I'm shocked to see that less than
> a dozen US universities have purchased it. I should think the author
> and publisher would be very happy to see it put online and all the
> efforts go to some use.  Surely they will agree if their name is kept
> with the data!  As for the technical part, I no longer have any doubts
> as to the abilities of the members of this group, especially after
> hearing the keyboard hack job for the sake of the ZWNJ earlier today!

:-)

I did the keyboard job just because I thought it's a lot easier to use
Shift+Space instead of Shift+B, and also because I was in the process of
typing in a lot of Persian data.  It took only about half an hour (not the
time to download the MSKLC tool of course) and improved my typing speed
considerably.

About your proposal, I'm personally interested in doing the technical part
of the job.  I volunteer to implement a web interface for the dictionary,
and I can also provide the hosting for the web interface.  I can provide
some amount of web space for the data as well, but I think we'll need other
people's help as well, because I would guess the whole data would be *huge*.
If the data has to reside on multiple web servers, I can code some sort of
distributed query mechanism which fetches the definitions from remote web
servers and displays them to the end user transparently.


-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Miscellaneous web issues

2004-06-03 Thread Ehsan Akhgari
> There is no C/C++ source file. The source is a data file that MSKLC
> compiles into the DLL. If the data file contains ZWNJ on shift-space,
> it fails to compile. Microsoft developers confirmed that this is a
> bug.

Well, I did a little bit of investigation on this.

I downloaded the MSKLC (MS Keyboard Layout Creator) tool, and took a look at
it.  This tool generates a C source code from the data you feed to it, and
then compiles this C code in order to generate the keyboard layout DLL.  The
bug which expects Space to only insert a space character is at the MSKLC
level.  IOW, if the generated C source code is patched correctly, and then
compiled with the same compiler switches that the MSKLC tool passes to the
compiler, ZWNJ can be successfully assigned to Shift+Space combination.

I did this, and installed the new DLL on my system, and it works beautifully.
It's the same keyboard layout, only Shift+Space inserts a ZWNJ instead of a
space.  I thought I would submit it to sourceforge so that everyone can use
the new tool.  Roozbeh, let me know if it would be okay for me to send the
files to you to get them into the sourceforge, or if I should do something
else.


---------
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Miscellaneous web issues

2004-05-25 Thread Ehsan Akhgari
> > Thanks for the links.  Seems like a very handy keyboard.
> BTW, why the
> > Shift-Space combination does not work?
>
> Bug in Microsoft keyboard layout creation tool. Use "Shift-B"
> temporarily.

Thanks.

I've not done any work in this arena, so what I propose here might make no
sense.  Sorry if that's so.  But, the M$ page on the keyboard layout
creation tool says the tool "simplifies" the process of creating a keyboard
layout.  Would there be any way to assign ZWNJ to Shift+Space by coding the
keyboard layout tool manually?  If you can send me the C/C++ source file
off-list, I'll try to investigate it further.

If not, I guess Shift+B is not that bad as well.  The keyboard layout rocks,
even without having Shift+Space in place.  :-)

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Miscellaneous web issues

2004-05-25 Thread Ehsan Akhgari
> What is notepad? A text editor? Text editors should not insert a UTF-8
> BOM either. The problem is that Microsoft sometimes invents
> non-standard things and then pushes it so hard that Unicode adds it to
> parts of the standard (or an FAQ). "Microsoft conventions for .txt
> files" in the Unicode FAQ looks sarcastic to me.

Well, maybe you're right, but I don't see how a text editor is supposed to
know the encoding of a file without some kind of mark.  See, HTTP transfers
the character set using the Content-Type response header.  In HTML, it's
specified with a <meta> tag.  In XML, the default encoding is UTF-8, and if
a document is encoded in another encoding, it must be specified in the
<?xml ... ?> PI.  Plain text files have no means of identifying the
character encoding, so a single text file can be interpreted as UTF-7,
UTF-8, UTF-16, UTF-32, etc. if there's nothing to declare the exact
character encoding used.
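
To show what that "mark" buys a text editor, here is a short Python sketch of
BOM sniffing.  sniff_bom is a hypothetical helper; without a BOM it gives up
rather than guessing, which is the situation plain text files are otherwise
in.

```python
import codecs

# Longest signatures first: the UTF-32 LE BOM begins with the UTF-16 LE one.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

The order of the table matters: checking UTF-16 LE before UTF-32 LE would
misread every UTF-32 LE file.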

The point here is that the protocols which do not allow a BOM are those
which provide other means of specifying the character encoding.  A certain
byte stream can have multiple interpretations depending on what character
encoding you use to interpret it, and there must be some way to cut off this
confusion.

YMMV,
-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Miscellaneous web issues

2004-05-20 Thread Ehsan Akhgari
> You can re-live its creation here in the archives:
> http://lists.sharif.edu/pipermail/persiancomputing/2003-June/000538.html
[snip]

Thanks for the links.  Seems like a very handy keyboard.  BTW, why does the
Shift-Space combination not work?

> Done! Beautiful!
> I hope the Mozilla users appreciate all this trouble.
>
> Thanks again for all your help!

You're welcome! :-)

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Miscellaneous web issues

2004-05-19 Thread Ehsan Akhgari
> It appears taking a break is the best cure. Some progress:

Yes.  It certainly is.  Good to hear the problem's solved.

[snip]
> Find/Replace  [the invisible] ZWNJ in Notepad is no problem becuase I
> have the Persian Experimental Keyboard and ZWNJ is right on Shift-b.
> Although I can't actually SEE that I've typed ZWNJ in the Find box, it
> really is there. So now in my .js array, I have a few Persian words
> with \u200c right in the middle of the Persian script.

Interesting.  Sorry for my ignorance, but is that keyboard available
publicly?

> It doesn't seem like the browsers should be able to handle that but
> now I see it's not a problem.

Why not?  The \u syntax allows you to represent Unicode characters in
JavaScript.

> Only thing I have to
> remember is to re-open the Notepad file in a non-WYSIWYG editor and
> delete that BOM creature.
>
> Mozilla is now able to "find" my words containing ZWNJ which was the
> whole point of this exercise.
>
> One small problem still remains: in Mozilla, if you click on any Tajik
> word, it shows you the Persian counterpart in the popup.
> But Mozilla is not able to display the ZWNJ so that is ignored.
> I'm not sure what to do to solve this.

Well, on Mozilla 1.2.1, which I tested it on, if you replace ZWNJ in the
description of the Tajik array indices with &#8204; then it seems to work
happily.  Give it a try.

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Miscellaneous web issues

2004-05-18 Thread Ehsan Akhgari
> First of all, thank you very much for all the patient and lengthy
> explanations. Very nice of you to share so many tips!
> (Thanks to the others too who answered on and off list!)

Happy to help!

[snip]
> Now that 2 people have said to change ZWNJ to \u200c, I tried that but
> it didn't work. I don't think I have the right tool.
>
> I couldn't do it in Notepad because as I said, it's WYSIWYG in Persian
> script so if I do a global replacement and stick \u200c in the middle
> of Persian script, that's obviously not going to work (and I also
> tried it for good measure and it didn't work but there may be many
> reasons it didn't work out using Notepad.)

I don't know what you mean here.  Why doesn't it work in Notepad?  Note that
on Windows XP, you can't type ZWNJ inside the Find/Replace dialog box - you
need to copy/paste it from inside the Notepad text editor window.  Another
reason not to use Notepad.

> Then, since you recommended Frontpage, I tried that. Earlier, it had
> not even occurred to me to attempt to open a .js file in Frontpage
> (version 2000.) This time I fooled it by changing the extension from
> .js to .html and so was able to open it in html view where all the
> unicode was in numeric style. I changed all the &#8204; to \u200c but
> now I see that also has not worked.

Well, I don't know what the problem is here...

BTW, FrontPage 2003 can open the .js file (using File | Open, or drag and
drop) and render the UTF-8 characters without converting them to numeric
entities just fine.  Don't try putting them in an HTML file.  Don't know
about FrontPage 2000, though.

> I think I'm not going to use Notepad for making bidirectional arrays
> from now on! That is insane to go to such great lengths!

Yeah, it's definitely so.

> Not sure what you have in mind here, but at this point, I"ll be glad
> just to make it work with ZWNJ.

In the JS code, try to replace the trailing ZWNJ-raa and ZWNJ-o with nothing
using a regex.
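
The shape of such a regex, sketched in Python rather than in the page's own
JavaScript.  I'm assuming here that "raa" is REH + ALEF and "o" is WAW; adjust
the alternatives if your data spells the suffixes differently.
strip_trailing_suffix is a hypothetical name.

```python
import re

ZWNJ = "\u200c"
# Assumed suffixes: ZWNJ + REH ALEF ("raa") and ZWNJ + WAW ("o").
TRAILING_SUFFIX = re.compile(ZWNJ + r"(?:\u0631\u0627|\u0648)$")


def strip_trailing_suffix(word):
    """Drop a trailing ZWNJ-raa or ZWNJ-o; leave other words untouched."""
    return TRAILING_SUFFIX.sub("", word)
```

The same pattern carries over to a JavaScript regex literal, since the \u
escapes work there as well.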

HTH,
-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Miscellaneous web issues

2004-05-18 Thread Ehsan Akhgari
> An important note: what Notepad does here is only "acceptable". It's
> not even recommended. HTML 4 clearly doesn't allow a UTF-8 BOM appear
> before the HTML tag. Notepad is supposed to be a text editor. A text
> editor shouldn't insert markup by itself. BTW, ISIRI 6219 strongly
> discourages the use of a BOM in UTF-8 files.

The problem here is that web protocols (HTML for example) don't allow the
BOM, and Notepad is not an HTML editor, so there's nothing to prevent it
from adding the BOM.  Check out:

http://www.unicode.org/faq/utf_bom.html#28
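
For files that Notepad has already saved and that end up being served as HTML
or JavaScript, the BOM can be removed afterwards.  A minimal Python sketch
(strip_utf8_bom is a hypothetical helper name):

```python
import codecs


def strip_utf8_bom(data):
    """Return data without a leading UTF-8 BOM, if there was one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

When the goal is just to read the text, decoding with Python's "utf-8-sig"
codec has the same effect: it skips a leading BOM if one is present.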

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]

'I generally take life as it comes my way', said Death.





RE: Miscellaneous web issues

2004-05-17 Thread Ehsan Akhgari
e correspondence between
> languages for the purposes of this project. I was wishing I had
> Behdad's beloved U+202F, the Narrow No-Break Space for this operation!

You can leave them as they are, and handle them in the JavaScript code (trim
them off of the end of the Tajik words maybe.)

> 6. I embedded the fonts again.  Looks beautiful on WInXP/IE6 and
> limited others. I presume it looks terrible on the rest.
> Still thinking about what to do about that. Behnam, how's the Tajik
> looking on your Mac?

A big (IMO) problem with font embedding is that if users save the document
on their HD (using IE of course) then the fonts will be gone.  Not a
professional image, if you ask me.

That's why I try to stick with the std fonts, and use other formats when a
custom font is absolutely necessary (PDF being my favorite).  Not the best
of solutions, of course, but works for me.

Hope this helps,
-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: IranL10nInfo

2004-05-17 Thread Ehsan Akhgari
> Iranian guys, would you please do a short statistical survey?

I've never come across Amordad.  And I was born in (A)Mordad...

Ehsan





RE: Iranian Calendar

2004-05-15 Thread Ehsan Akhgari
> What we should look for, is clear and reasonable objection.
> There hasn't been any such objection for "Iranian calendar".

I think it's the most reasonable term when you look at it from a foreigner's
point of view.  They're not interested in what Jalali means, or the
astronomical details of the calendar.  I think "Iranian Calendar" best
identifies the subject as the calendar officially used in Iran, and that
would sound the most reasonable name it can get.  Of course, that's all my
opinion, hence my "personal preference"... :-)

> My rewording of the FarsiWeb opinion is that the 2820-year Birashk
> calendar is the best implementable arithmetic calendar. The law *is*
> different and the practice *may be* different, but this is the best we
> can find. The "showraa-ye aalie-e taghvim" (of the Islamic Republic of
> Iran) holds the authority on the Iranian calendar, and they don't even
> disclose the calendar of 1384 if you ask them to, let aside telling
> the algorithm they use to anybody (which includes other governmental
> bodies, like "saazmaan-e modiriat va barnaame-rizi-e keshvar" and
> "showraa-ye aali-e anformaatik").

Yeah, a lot of such supposed-to-be-known-to-all information is treated like
military secrets here, unfortunately.  Lovely.

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Persian PC-Kimmo 0.8 released

2004-05-13 Thread Ehsan Akhgari
Thanks for your reply, Jon.

> Thanks for asking.   All the words are in
> tab-separated text files, as in noun.lex, verb.lex,
> etc.   They get converted to a kimmo-usable file such
> as fa-noun.lex, fa-verb.lex, etc. using the db2lex perl scripts in the
> scripts directory.  The verb and adjective files use a specific script
> written for them; all others use the plain script.  Also see the
> orthography.txt file for the romanization scheme.  It also has some
> other goodies.
>
> I would love to add any additions you might make to the lexicon in the
> next release.

I suppose I can use roman2unicode to convert the roman encoding into
readable plain text (I'm not fast at reading the roman notation).  That way,
I can import the data into Excel, sort it alphabetically, and start adding
new stuff...
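
A rough Python sketch of that workflow, in case it is useful.  The mapping
table is only a partial guess inferred from two examples quoted elsewhere in
this thread (nmiAim and xuAhd); the real scheme is the one in orthography.txt.
The function names and the assumption that the romanized headword is the
first tab-separated field are hypothetical, and this is not how
roman2unicode itself works.

```python
import csv
import io

# Partial romanization table inferred from two examples in this thread
# (nmiAim and xuAhd). The authoritative scheme lives in orthography.txt.
ROMAN_TO_PERSIAN = {
    "n": "ن", "m": "م", "i": "ي", "A": "ا",
    "x": "خ", "u": "و", "h": "ه", "d": "د",
}


def romanized_to_persian(word: str) -> str:
    # Unknown letters pass through unchanged so gaps in the table stay visible.
    return "".join(ROMAN_TO_PERSIAN.get(ch, ch) for ch in word)


def sorted_lexicon(tsv_text: str) -> list[list[str]]:
    # Assumption: the romanized headword is the first tab-separated field.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [row for row in reader if row]
    for row in rows:
        # Prepend the Persian-script form as a new first column.
        row.insert(0, romanized_to_persian(row[0]))
    # Sorting by code point only approximates Persian alphabetical order.
    return sorted(rows, key=lambda row: row[0])
```

The sorted rows can be written back out with csv.writer and opened in a
spreadsheet.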

> As you can see, it needs a little more work on the morphophonemic
> rules, but it should work fine for stemming purposes.

Yes, it's pretty good at recognizing the stem of the word.
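
For stemming purposes, the stem can be pulled out of a recognizer result
line mechanically.  The sketch below assumes result lines of the form shown
elsewhere in this thread ("n+mi+]+im NEG+DUR+come.PRES+1P": lexical form,
whitespace, gloss) and assumes that affix tags are written in upper case
while the stem gloss contains lower-case letters.  Both assumptions, and the
function name, are illustrative only.

```python
def extract_stem(result_line: str) -> tuple[str, str]:
    """Return (stem morpheme, stem gloss) from a 'lexical-form gloss' line."""
    lexical, gloss = result_line.split()
    morphemes = lexical.split("+")
    glosses = gloss.split("+")
    if len(morphemes) != len(glosses):
        raise ValueError(f"morpheme/gloss mismatch: {result_line!r}")
    for morpheme, tag in zip(morphemes, glosses):
        # Assumption: affix tags are upper case or digits (NEG, DUR, 1P),
        # while the stem gloss contains lower-case letters (come.PRES).
        if any(ch.islower() for ch in tag):
            return morpheme, tag
    raise ValueError(f"no stem found in {result_line!r}")
```

Two inflected forms of the same verb then reduce to the same stem, which is
what a search engine needs.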

> Hans Nelson is the man to talk to.  He's working on a Kimmo output to
> XML program.  I don't know much about
> it, but here's his email:   [EMAIL PROTECTED]

Thanks for your hint.  I'll try to contact him.  In case you're interested,
I can send the final result of our discussion to you off-list.

-
Ehsan Akhgari



RE: Persian PC-Kimmo 0.8 released

2004-05-11 Thread Ehsan Akhgari
> For anyone who's interested, Persian PC-Kimmo version
> 0.8 has just been released.  It's available here:
>
> http://home.byu.net/jmd56/download/persian-pckimmo-0.8.tar.gz

Thanks, Jon, for releasing this version.  It looks a lot better than the
previous one!

> The biggest thing holding them back from being a 1.0 is a relatively
> small lexicon (~1350 words).  The morphology engine achieves about
> two-thirds recognition on a corpus of about 3.5 million words.
> And of course, it's GPL'ed.

Hmmm, do you have a list of the words in the current lexicon?  (I'm not
familiar with PC-KIMMO specific commands, so I can't parse them on my own.)
What can I do to help add more words?

> Any helpful feedback would be appreciated.

I find the new tree-style recognition output very helpful:

n+mi+]+im NEG+DUR+come.PRES+1P

1:
             Top
              |
             Verb
              |
  VNEGPREFIX     VNStem
      n+        __|___
     NEG+  VPREFIX   VStem
             mi+       |
            DUR+    V1Stem
                       |_
                    V2Stem  VPSUFFIX
                       |      +im
                    V3Stem    +1P
                       |
                       V
                       ]
                   come.PRES

Top:
[ cat:   Top ]

1 parse found

n+mi+]+m NEG+DUR+come.PRES+1S

1:
             Top
              |
             Verb
              |
  VNEGPREFIX     VNStem
      n+        __|___
     NEG+  VPREFIX   VStem
             mi+       |
            DUR+    V1Stem
                       |_
                    V2Stem  VPSUFFIX
                       |      +m
                    V3Stem    +1S
                       |
                       V
                       ]
                   come.PRES

Top:
[ cat:   Top ]

1 parse found

I was wondering if there's some way to retrieve the tree-structured data in
a format which is easy to parse (the ASCII style is too difficult for a
computer program to parse), something like an XML format maybe?
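
To make the idea concrete, here is a small Python sketch of what such an XML
form could look like.  The tree is transcribed by hand from the first parse
above; the element and attribute names ("node", "leaf", "cat", "form",
"gloss") are invented for illustration and are not an existing PC-Kimmo
output format.

```python
import xml.etree.ElementTree as ET

# Hand-transcribed from the first parse above: (category, children) for
# branches, (category, form, gloss) for leaves.
PARSE = ("Top", [("Verb", [
    ("VNEGPREFIX", "n+", "NEG+"),
    ("VNStem", [
        ("VPREFIX", "mi+", "DUR+"),
        ("VStem", [("V1Stem", [
            ("V2Stem", [("V3Stem", [("V", "]", "come.PRES")])]),
            ("VPSUFFIX", "+im", "+1P"),
        ])]),
    ]),
])])


def to_xml(node) -> ET.Element:
    if len(node) == 3:
        # Leaf: category, surface form, gloss.
        cat, form, gloss = node
        return ET.Element("leaf", cat=cat, form=form, gloss=gloss)
    # Branch: category plus a list of child nodes.
    cat, children = node
    element = ET.Element("node", cat=cat)
    element.extend([to_xml(child) for child in children])
    return element


xml_text = ET.tostring(to_xml(PARSE), encoding="unicode")
```

Reading the leaves back in document order gives the morphemes of the lexical
form, so nothing is lost compared to the ASCII tree.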

-
Ehsan Akhgari



RE: Farsi Stemming Algorithm

2004-04-30 Thread Ehsan Akhgari
> (I'll just reply to your other post here)  I guess I didn't know about
> a new pc-parse release.  Where did you get the newest source code?
> That's terrific news for me.

Well, the release I downloaded is approximately one year old, but here's the
URL I downloaded it from:

ftp://ftp.sil.org/software/unix/pc-parse-src-20030321.tgz

To build it, I just ran the usual "./configure; make; make install"; there
was nothing more to it.  Which compiler version did you use to compile
it?

Let me know if you still have compilation problems.  I might be able to help
if I can reproduce them here.

> I'm very interested in any work you'd work on, including a PHP
> extension.  Maybe SIL.org might be interested as well.

Actually, what I'm working on is an English/Persian search engine which can
be placed on any site with no need to download/install anything.  It's
nearly finished; I only have to translate the web UI into Persian and
implement stemming for Persian in the engine.  Originally I planned to
implement a stemming algorithm myself, but I figured that I can't be
considered an expert in Persian grammar/linguistics at all, so I prefer to
use already working solutions, and your work seems to be the *best* choice.
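
To show where the stemmer fits, here is a minimal Python sketch of an index
with a pluggable stemming function.  It is not the engine's actual code: all
names are hypothetical, and the stand-in stemmer is just a lookup table
seeded with the single analysis mentioned in this thread (xuAhd as xuAh+d).
A real version would call the morphology engine instead.

```python
from collections import defaultdict
from typing import Callable

Stemmer = Callable[[str], str]


def build_index(docs: dict[str, str], stem: Stemmer) -> dict[str, set[str]]:
    """Map each stemmed token to the ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str, stem: Stemmer) -> set[str]:
    """Return ids of documents that contain every stemmed query term."""
    terms = [stem(token) for token in query.split()]
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Stand-in stemmer: a lookup table with one entry taken from this thread.
KNOWN_STEMS = {"xuAhd": "xuAh"}


def toy_stem(token: str) -> str:
    return KNOWN_STEMS.get(token, token)
```

Because the same function is applied at indexing time and at query time, a
query for an inflected form also finds documents containing the bare stem,
and vice versa.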

The PHP extension would be quite a thin wrapper, but anyway I'll definitely
provide you with the source code when I'm finished.  You're also welcome
to a copy of the search engine's source code itself if you're interested.

> Give me a week and I'll email them to the email address in your
> signature, unless you tell me otherwise.

Thanks a lot!  I highly appreciate your great help.

-
Ehsan Akhgari



RE: Farsi Stemming Algorithm

2004-04-30 Thread Ehsan Akhgari
> One of the things that drives me nuts about the software is that it
> claims to run on Solaris/Sparc, Win/x86, MacOS, or BSD, but apparently
> no Linux (I have a Sparc box, so I'm lucky :-).  The source code is
> downloadable, but it currently doesn't seem to compile on Linux/x86.
> It does have a callable C interface, as documented in the kimmolib.txt
> in this file [2].  In fact, I'm working on an AI program that calls
> PC-Kimmo to do morphology.  Batch mode is used via the 'take' command,
> and using a .tak file.

Here's an update.  I tried to build the whole pc-parse package on Linux
(RedHat 9.0) using gcc 3.2.2, and it compiled without a single problem.  I
also tried running PC-Kimmo, and it was working smoothly.  I noticed that in
the README, they claim to have tested the build process on the following
platforms:

  1. Debian GNU/Linux 2.2 (kernel 2.2.17) / gcc 2.95.2, glibc 2.1.3-24
  2. Red Hat Linux 7.3 (kernel 2.4.18) / gcc 2.96, glibc 2.2.5-34
  3. Red Hat Linux 8.0 (kernel 2.4.18-14) / gcc 3.2-7, glibc 2.2.93-5
  4. OpenBSD 3.1 / gcc 2.95.3
  5. Mac OS X (10.2) / gcc 3.1
  6. cygwin 1.3.10-1 (Windows XP Pro) / gcc 2.95.3-5

Maybe you're trying an older version?

-
Ehsan Akhgari



RE: Farsi Stemming Algorithm

2004-04-30 Thread Ehsan Akhgari
> It's a two-level morphology engine, so basically it resolves a surface
> form to a lexical form, or lexical to surface form.
>  For example, if I give it a newspaper word like 'nmiAim'
> (نميايم -- I am not coming), it
> will resolve to 'n+mi+A+m', taking into account any morpheme boundary
> changes (like the yeh here).  More documentation is found here [1].

Thanks for the information.  I tried nmiAim, but unfortunately didn't get
any results.  However, I noticed it recognizes some words, like xuAhd
(khahad) as xuAh+d for example.  It seems like a perfect tool for my job.
Thanks for the nice job!

> One of the things that drives me nuts about the software is that it
> claims to run on Solaris/Sparc, Win/x86, MacOS, or BSD, but apparently
> no Linux (I have a Sparc box, so I'm lucky :-).  The source code is
> downloadable, but it currently doesn't seem to compile on Linux/x86.
> It does have a callable C interface, as documented in the kimmolib.txt
> in this file [2].  In fact, I'm working on an AI program that calls
> PC-Kimmo to do morphology.  Batch mode is used via the 'take' command,
> and using a .tak file.

I downloaded the source, and took a look into it, and found this file:
pc-parse-20030321/pckimmo/r.c, which seems to be exactly what I'm after - a
C interface for the recognition engine.  Too bad it doesn't compile on Linux
though, because I'm planning to use this in a PHP extension which must run
on both Linux/x86 and Win32.  However, if the source doesn't need a full
re-write, then I can fix it to compile on Linux as well (I'm a C/C++
programmer, more than anything!).  Are you interested in the fixed sources?
I could send them to you, or I can make it available online if there's
enough interest.  Also let me know if you'll be interested in the PHP
extension as well.

> Don't be too disappointed about version 0.5 of the Persian
> implementation -- it was released 2 years ago
> ;-)  I've reworked almost every aspect of it since then, so hopefully
> it will work better.
> Have fun.

Hmmm, would it be possible for me to have a copy of your latest work before
you publish it?  I'd be grateful if you could send it to me.

Thanks!
-
Ehsan Akhgari



RE: Farsi Stemming Algorithm

2004-04-29 Thread Ehsan Akhgari
Thanks a lot, Jon, for your reply.

> The only one that I'm aware of is found here [1].  But it
> seems hard to get any other information about this stemmer.

Yes, it definitely seems so.  The only Farsi stemmer I've been aware of
myself is http://www.isri.unlv.edu/publications/isripub/Taghva2003-02.pdf .
I had contacted Dr. Taghva some time ago about his stemmer, but didn't hear
back from him at all.

> While the aim is a little different from a stemmer, a Persian
> morphological engine is being developed.  The one available
> for download [2] is a couple versions behind current
> development, but it still yields decent results.  Version 0.5
> is public domain, and newer versions will be under the
> General Public License.  A new version will be released in a
> couple of months.

I downloaded this package and looked into it.  It seems to be useful for my
job.  However, this is the first time I've heard of PC-Kimmo, so I was kind
of lost when trying to figure out the whole thing.  I was wondering if you
could provide me with some additional info (or URLs; I didn't find any myself)
about this software, especially how it can be used on Linux in batch mode.
Does PC-Kimmo come with any callable C interface?

Thanks a lot!
-
Ehsan Akhgari



Farsi Stemming Algorithm

2004-04-28 Thread Ehsan Akhgari
Hi all,

Does anyone know of any free Farsi stemming algorithm, like the Porter
algorithm for English?

Thanks a lot!
-
Ehsan Akhgari
