Hello Kais,

These are very good enhancements and results. May I know what the
difference is between your parser and other statistically based parsers such as
MADA?
Also, is yours available for download, or do you plan to make it available in the
future?

I am working on MT and automatic Tashkeel for Arabic, and I am interested in
using your parser in this research.

Best regards,
Waleed

On Sun, Sep 12, 2010 at 12:54 PM, Kais Dukes <[email protected]> wrote:

> Hello Eric,
>
>
> Some very exciting news … well, at least exciting to me :-) Please accept my
> apologies for not being very responsive on e-mail recently; I had locked
> myself in my study most evenings after coming home from work to concentrate
> on something that I have found most interesting. For the past 12 months,
> development of the Quranic Arabic Dependency Treebank
> (http://corpus.quran.com/treebank.jsp) has been slow and has involved me going
> through the following steps repeatedly:
>
>
> 1. Use a hand-written rule based parser to produce an initial draft
> syntactic analysis of a verse of the Quran, e.g. see:
> http://corpus.quran.com/treebank.jsp?chapter=67
>
>
> 2. Correct the output of the parser and add the resulting proofread verse
> to the treebank.
>
>
> 3. Potentially improve the parser’s accuracy by reviewing its rules against
> the new, larger set of data in the treebank. Improving the hand-written
> parser has been a costly exercise, involving the addition of new grammar
> rules and refining these many times over. However, the parser has performed
> well. Run against the current draft treebank, which covers approx. 20% of the
> Quran, the rule-based parser achieves an F-measure of 78.79% in its automatic
> grammatical analysis using traditional Arabic dependency grammar:
>
>
> *Rule-based parser ... F-measure 78.79%* (precision=90.13%, recall=69.99%)
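Assuming the reported F-measure is the balanced F1 score, i.e. the harmonic mean F = 2PR / (P + R) of precision P and recall R, the figures quoted in this message can be checked directly. The function name `f_measure` is only illustrative:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the e-mail, in percent.
rule_based = f_measure(90.13, 69.99)
statistical = f_measure(90.02, 85.82)

print(f"rule-based parser:  F = {rule_based:.2f}%")
print(f"statistical parser: F = {statistical:.2f}%")
```

Both reported F-measures (78.79% and 87.87%) agree with the quoted precision and recall to two decimal places. Because the harmonic mean is pulled towards the smaller value, the rule-based parser's low recall dominates its score even though its precision is about the same as the statistical parser's.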
>
>
> Over the last few weeks I have been looking into moving away from the
> rule-based parser and starting to use a probabilistic parser, trained
> statistically via machine learning. This new parser automatically reads the
> existing treebank and "learns" how to perform syntactic analysis for the
> rest of the Quran. I am very excited to announce that I have found a way to
> recast the problem of syntactic analysis in traditional Arabic grammar as a
> statistical classification problem (following a similar idea to Nivre’s
> dependency parsing algorithm). The results for the new parser using machine
> learning are:
>
>
> *Statistical parser ... F-measure 87.87%* (precision=90.02%,
> recall=85.82%)
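To make the idea of recasting parsing as classification concrete, here is a minimal sketch of the arc-standard variant of Nivre-style transition-based parsing. It assumes unlabeled, projective trees. Kais's actual transition system, dependency labels, features and classifier are not described in the e-mail, so the feature template `features`, the tag set and the toy sentence below are hypothetical. Each parser configuration becomes one classification instance whose "class" is the next transition (SHIFT, LEFT-ARC or RIGHT-ARC). A static oracle reads these instances off a gold tree for training, and at parse time a trained classifier chooses the transition instead.

```python
from typing import Callable, Dict, List, Optional, Tuple

SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"


class Config:
    """Parser state: a stack, a buffer of word indices, and the arcs built so far."""

    def __init__(self, n: int) -> None:
        self.stack: List[int] = [0]                   # index 0 is an artificial ROOT
        self.buffer: List[int] = list(range(1, n + 1))
        self.heads: Dict[int, int] = {}               # dependent -> head

    def is_terminal(self) -> bool:
        return not self.buffer and self.stack == [0]


def apply(c: Config, t: str) -> None:
    """Apply one arc-standard transition to the configuration in place."""
    if t == SHIFT and c.buffer:
        c.stack.append(c.buffer.pop(0))
    elif t == LEFT_ARC and len(c.stack) >= 3:         # ROOT may never be a dependent
        dep = c.stack.pop(-2)                         # second-from-top <- top
        c.heads[dep] = c.stack[-1]
    elif t == RIGHT_ARC and len(c.stack) >= 2:
        dep = c.stack.pop()                           # top <- second-from-top
        c.heads[dep] = c.stack[-1]
    else:
        raise ValueError(f"illegal transition {t!r} in this configuration")


def oracle(c: Config, gold: Dict[int, int]) -> str:
    """Static oracle: the correct class for a configuration, given a gold tree."""
    if len(c.stack) >= 2:
        s0, s1 = c.stack[-1], c.stack[-2]
        if s1 != 0 and gold[s1] == s0:
            return LEFT_ARC
        # Attach s0 to s1 only after s0 has collected all of its own dependents.
        done = all(d in c.heads for d, h in gold.items() if h == s0)
        if gold[s0] == s1 and done:
            return RIGHT_ARC
    return SHIFT


def features(c: Config, tags: List[str]) -> Tuple[Optional[str], ...]:
    """Hypothetical feature template: tags of the top two stack items and next word."""
    def tag(i: Optional[int]) -> Optional[str]:
        return None if i is None else ("ROOT" if i == 0 else tags[i - 1])
    s0 = c.stack[-1] if c.stack else None
    s1 = c.stack[-2] if len(c.stack) >= 2 else None
    b0 = c.buffer[0] if c.buffer else None
    return (tag(s0), tag(s1), tag(b0))


def training_instances(tags: List[str], gold: Dict[int, int]
                       ) -> List[Tuple[Tuple[Optional[str], ...], str]]:
    """Turn one gold tree into (features, transition) pairs for a classifier."""
    c, out = Config(len(tags)), []
    while not c.is_terminal():
        t = oracle(c, gold)
        out.append((features(c, tags), t))
        apply(c, t)
    return out


def parse(n: int, predict: Callable[[Config], str]) -> Dict[int, int]:
    """Greedy parsing: repeatedly ask the classifier for the next transition."""
    c = Config(n)
    while not c.is_terminal():
        apply(c, predict(c))
    return c.heads


# Toy example (not from the treebank): a verb heading two nouns.
tags = ["V", "N", "N"]
gold = {1: 0, 2: 1, 3: 1}
instances = training_instances(tags, gold)
heads = parse(len(tags), lambda c: oracle(c, gold))   # oracle stands in for a classifier
```

In a real system, `predict` would be a classifier trained on the pairs returned by `training_instances` over the whole treebank, and it would be restricted to transitions that are legal in the current configuration. Since the parser makes exactly 2n decisions for n words, parsing is linear-time, which is one reason this family of parsers trains and runs quickly.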
>
>
> Not only is this a big jump in accuracy (from 79% to 88%), but the parser
> also takes only 15 seconds to train on the existing treebank, compared to
> many months of development time spent refining the hand-crafted constraint
> dependency rules of the rule-based parser. I am very excited about this!
> What immediately comes to mind is:
>
>
> 1) We are now using a data-driven statistical parser based on machine
> learning, with accuracy comparable to state-of-the-art statistical
> parsers for dependency grammar.
>
>
> 2) The improved accuracy of the new parser means that continuing to develop
> the syntactic treebank will be quicker, since the resulting output is now
> much more accurate. From reviewing the new syntactic analyses, they also
> appear to be more consistent.
>
>
> 3) Completion of the treebank should also now move faster because I have to
> spend less effort on the time-consuming task of building a rule-based
> parser by hand, and I can spend more time on ensuring accuracy by
> proofreading the automatic syntactic analyses.
>
>
> 4) This should lead to a stronger journal paper submission on statistical
> dependency parsing of Quranic Arabic. In fact, I am so excited about this
> that I am keen to start working on this paper as soon as I have got the FAL
> submission out of the way.
>
>
> 5) I now intend to rework the PhD project plan to include this updated
> information.
>
>
> Looking forward to hearing from you! I hope it's okay that I have CC'd the
> comp-quran mailing list; I would be keen to hear from others who have an
> interest in, or experience with, statistical parsing. Any comments are most
> welcome.
>
>
> Kind Regards,
>
>
> -- Kais
>
>
