On Thu, 20 Mar 2014 at 04:49 -0700, Rafi Kamal wrote:
> Hi
> 
> 
> I learned from Zaher and Ragib that the en-bn PoS tagger gives wrong
> output for some inputs. Some examples of wrong tagger output are
> here: 
> http://wiki.apertium.org/wiki/Bengali_and_English/Issues#Wrong_Tagger_Output.
> 
> 
> I think I should work on the PoS tagger first, because without fixing
> it, adding transfer rules or updating dictionaries won't help much.

Updating dictionaries will help, and transfer rules probably will too.

> I've talked to Unhammer on IRC about the tagger. He suggested that I
> train the tagger to improve its quality. I've read the wiki articles on
> tagger training and unsupervised tagger training. Now I have a few
> questions:
>      1. Where can I find the tag definition file? According to the
>         wiki, it should be in the language pair directory. But find .
>         -name *.tsx doesn't return any match.

It probably doesn't have one written yet. 
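
One thing worth ruling out first: the unquoted *.tsx in that find command
can be expanded by the shell before find ever sees it, so an empty result
is not conclusive on its own. A minimal sketch of the same recursive search
in Python, assuming it is run from the language pair directory (the function
name find_tsx_files is made up for illustration, not part of Apertium):

```python
from pathlib import Path


def find_tsx_files(root="."):
    """Return every *.tsx file under root, sorted by path."""
    # rglob matches the pattern itself, so no shell expansion is involved.
    return sorted(p for p in Path(root).rglob("*.tsx") if p.is_file())


if __name__ == "__main__":
    matches = find_tsx_files(".")
    if matches:
        for path in matches:
            print(path)
    else:
        print("no .tsx file found")
```

If this also finds nothing, the pair most likely has no tag definition file
yet.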

>      2. I've downloaded the bnwiki dump, unzipped it and run the
>         WikiExtractor.py script on it. But I think I'm not getting the
>         correct output. The script filters out all the body text from
>         the dump and preserves only some of the titles. Here are the
>         first 100 lines of the script output:
>         http://apertium.codepad.org/HGLeBM2K. I can write an extractor
>         for Bangla myself; I just need to be sure that I'm not doing
>         anything wrong.

The way the script functions is not intuitive. Look for a file called
"bn.crp.txt" or something like that. It will contain the body texts. I
think you might be able to specify an output file too.

>      3. The wiki page focuses on creating an entirely new .prob file.
>         As there is already a .prob file, is there any way I can
>         just update it by training? (I guess apertium-tagger-trainer
>         can do it, but it works only with Apertium 1).

This sounds quite time-consuming. I would work on more time-effective
improvements to start out with.

Fran


