On Fri, Jul 24, 2009 at 7:02 PM, JAGANADH G<jagana...@gmail.com> wrote:
>
>
> On Fri, Jul 24, 2009 at 5:29 PM, Rajeev J Sebastian
> <rajeev.sebast...@gmail.com> wrote:
>>
>> On Fri, Jul 24, 2009 at 5:19 PM, Varewoolf<varewo...@gmail.com> wrote:
>> >
>> > I am very interested in making this happen... I have always been
>> > interested in linguistics...
>> > Can anybody tell me what things we need primarily?
>>
>> How about ...
>>
>> 1) 50+ years of research (actually, 2000 if you consider Panini)
>
> It is history ? If you can work hard you can reduce the zero from it.

Huh ?

>>
>> 2) Extremely large corpus ... if you want to make a practical system
>
> Only if you adopt a corpus-based model. That is not going to be practical
> right now in the case of English to Malayalam translation.

It is not practical to make *anything* without a corpus. Even if you
use a non-corpus based methodology to perform translation, you still
need a large corpus to *validate* that your method works for more than
toy examples. This is the biggest problem facing any NLP work for
Indic languages, and a corpus is something that some glorified
institutions in India neither build up nor share, most probably because
all their systems are capable of is translating toy examples.

>>
>> 3) Large and talented team good in computational linguistics
>
> Where is it? We can build this up.

Best of Luck.

>>
>> 4) a very practical theory that can model language effectively for
>> your purposes (seriously lacking for even small use cases in even
>> major languages)
>
> A perfect grammar for Malayalam is required, especially for syntax and
> morphology. Malayalam really lacks such studies.

I don't think any language has such an in-depth model that could be
used for generic MT. There are, of course, special-case models ...
which can be used for special cases.

>>
>> 5) since you want to do MT, you need one more theory to handle the
>> target language ... maybe even an IL model if you go that route
>> instead of direct translation.
>
> First of all we need a good English to Malayalam dictionary in electronic
> format, which gives the exact meaning, POS, etc. Not like one that just says
> Science - ശാസ്ത്രം, തര്‍ക്കശാസ്ത്രം.

A POS-tagged dataset is just one component of a complete corpus.

Regards
Rajeev J Sebastian

