Hello Kais,

This is a very good enhancement, with very good results. May I ask what the difference is between your parser and other statistical parsers like MADA? Also, is yours available for download, or do you plan to release it in the future?
I am working on MT and automatic Tashkeel for Arabic, and I am interested in using your parser in this research.

Best regards,
Waleed

On Sun, Sep 12, 2010 at 12:54 PM, Kais Dukes <[email protected]> wrote:
> Hello Eric,
>
> Some very exciting news … well, at least exciting to me :-) Please accept my apologies for not being very responsive on e-mail recently, but I had locked myself in my study most evenings after coming home from work to concentrate on something that I have found most interesting. For the past 12 months, development of the Quranic Arabic Dependency Treebank (http://corpus.quran.com/treebank.jsp) has been slow and has involved me going through the following steps repeatedly:
>
> 1. Use a hand-written rule-based parser to produce an initial draft syntactic analysis of a verse of the Quran, e.g. see: http://corpus.quran.com/treebank.jsp?chapter=67
>
> 2. Correct the output of the parser and add the resulting proofread verse to the treebank.
>
> 3. Potentially improve the parser’s accuracy by reviewing its rules against the new, larger set of data in the treebank.
>
> Improving the hand-written parser has been a costly exercise, involving the addition of new grammar rules and refining these many times over. However, the parser has performed well. Run against the current draft treebank covering approx. 20% of the Quran, the rule-based parser is 78.79% accurate in terms of its automatic grammatical analysis using traditional Arabic dependency grammar:
>
> *Rule-based parser ... F-measure 78.79%* (precision=90.13%, recall=69.99%)
>
> Over the last few weeks I have been looking into moving away from the rule-based parser and starting to use a probabilistic parser, trained statistically via machine learning. This new parser reads the existing treebank and "learns" how to perform syntactic analysis for the rest of the Quran automatically.
> Amazingly, I am very excited to announce that I have found a way to recast the problem of syntactic analysis in traditional Arabic grammar as a statistical classification problem (following a similar idea to Nivre’s dependency parsing algorithm). The results for the new parser using machine learning are:
>
> *Statistical parser ... F-measure 87.87%* (precision=90.02%, recall=85.82%)
>
> Not only is this a big jump in accuracy (from 79% to 88%), but the parser also takes only 15 seconds to train on the existing treebank, compared to many months of development time spent refining hand-crafted constraint dependency rules for the rule-based parser. I am very excited about this! Immediately, what comes to mind is:
>
> 1) We are now using a data-driven statistical parser using machine learning, with accuracy comparable to state-of-the-art statistical parsers for dependency grammar.
>
> 2) The improved accuracy of the new parser means that continuing to develop the syntactic treebank will be quicker, since the resulting output is now much more accurate; from reviewing the new syntactic analyses, they also appear to be more consistent.
>
> 3) Completion of the treebank should also now move faster, because I have to spend less effort on the time-consuming task of building a rule-based parser by hand, and I can spend more time on ensuring accuracy by proofreading the automatic syntactic analyses.
>
> 4) This should lead to a stronger journal paper submission on statistical dependency parsing of Quranic Arabic. In fact, I am so excited about this that I am keen to start working on this paper as soon as I have got the FAL submission out of the way.
>
> 5) I now intend to rework the PhD project plan to include this updated information.
>
> Looking forward to hearing from you!
> I hope it's okay that I have CC'd the comp-quran mailing list; I would be keen to hear from others who have an interest in, or experience with, statistical parsing. Any comments are most welcome.
>
> Kind Regards,
>
> -- Kais
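
As a quick sanity check on the figures quoted above: both F-measure values are consistent with the balanced F-measure (F1), i.e. the harmonic mean of precision and recall. The message does not say which F-measure variant was used, so F1 is an assumption here, and the function name `f_measure` is only illustrative. A minimal sketch:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the figures in the message are F1; the message itself
    does not name the variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as quoted in the message.
rule_based = f_measure(90.13, 69.99)
statistical = f_measure(90.02, 85.82)

print(f"Rule-based parser:  F-measure {rule_based:.2f}%")
print(f"Statistical parser: F-measure {statistical:.2f}%")
```

Under that assumption, the two precision values are nearly identical, so almost all of the gain from 78.79% to 87.87% comes from the higher recall of the statistical parser.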
