Hi, I've been considering using the Arabeyes wordlist as the basis for an Estonian-Arabic wordlist. I currently maintain an online English-Estonian dictionary (http://www.tps.edu.ee/nastik/), and since I've started studying Arabic (I'm just a beginner) I've been looking for a free Arabic wordlist suitable for (semi-)automatic conversion. My goal is to build at least a rudimentary Arabic-Estonian dictionary. The Arabeyes wordlist seemed almost perfect at first, but I've stumbled upon a few issues that I'd like to ask for clarification about.
First, all English words have an initial capital letter. Why this decision? The problem is that one cannot distinguish between proper names and identically spelled regular words. For example, in English "Jenny" is a name while "jenny" is a female donkey. "John" is a name but "john" is a loo. I can see that in the Arabeyes wordlist both Jenny and John give just the Arabic transliteration of those names, but I wouldn't know this for other words. There are many other such words, for example Maxim: it could be the name, it could be an aphorism, or it could be a machine gun. That's the reason why dictionaries use lowercase for regular words and an initial capital letter for proper names. Is there a version of the Arabeyes wordlist available that follows this convention?

The second problem is that, as a learner, I find it confusing that most (all?) nouns are given with the definite article instead of just the base form: البحر (al bahr) instead of just بحر (bahr), البيت instead of just بيت (I hope I got the right characters here). This is confusing for a beginner and a foreigner. For a non-native speaker it is easier to look up the base word and add the article if needed than to look up the word with the article and then deduce the base form. I find my small Al-Mawrid dictionary (2004, 6th ed) easier to use than the wordlist. What was the reasoning for including the articles with the words in the Arabeyes wordlist?
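To illustrate the kind of (semi-)automatic conversion I have in mind, here is a minimal Python sketch that naively drops a leading ال from a headword. The function name `strip_article` and the sample words are my own and not part of the Arabeyes wordlist or its tools, and the rule is only an assumption: it would also mangle words in which ال belongs to the word itself (التهاب, "inflammation", for example), so the results would still need checking by hand.

```python
# Naive sketch: remove a leading definite article (alif + lam) from a headword.
# Assumption: every headword that starts with these two letters carries the
# article. That is NOT always true, so the output needs manual review.
AL = "\u0627\u0644"  # the two letters of the article: alif, lam


def strip_article(word: str) -> str:
    """Return the word without a leading article, if one seems to be present."""
    # Require at least two letters after the article so that very short
    # words are left untouched.
    if word.startswith(AL) and len(word) > len(AL) + 1:
        return word[len(AL):]
    return word


if __name__ == "__main__":
    for headword in ["البحر", "البيت", "بحر"]:
        print(headword, "->", strip_article(headword))
```

With a wordlist that already gave the base forms this step would not be needed at all, which is part of why I am asking.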

