Hello Gang,

I have copied the Apertium list in case someone wants to add something.


>     I am experimenting with the en-es language pair
>     (https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-en-es),
>     and the newest "lt-toolbox" and "apertium" in the svn trunk. After
>     compiling successfully, I went to the "apertium-en-es/es-tagger-data"
>     directory, copied "es-tagged.txt" to "es.crp.txt", and used the
>     latter file as the corpus for unsupervised training.
>
>     I then executed the command:
>     make -f es-en-unsupervised.make
>
>     and got the following log:
>
>     ================= log begin=========================
>     Generating es-tagger-data/es.dic
>     This may take some time. Please, take a cup of coffee and come back
>     later.
>     apertium-validate-dictionary apertium-en-es.es.dix
>     apertium-validate-tagger apertium-en-es.es.tsx
>     lt-expand apertium-en-es.es.dix | grep -v "__REGEXP__" | grep -v
>     ":<:" |\
>              awk 'BEGIN{FS=":>:|:"}{print $1 ".";}' | apertium-destxt
>      >es.dic.expanded
>     lt-proc -a es-en.automorf.bin <es.dic.expanded | \
>              apertium-filter-ambiguity apertium-en-es.es.tsx >
>     es-tagger-data/es.dic
>     rm es.dic.expanded;
>     apertium-destxt < es-tagger-data/es.crp.txt | lt-proc
>     es-en.automorf.bin > es-tagger-data/es.crp
>     apertium-validate-tagger apertium-en-es.es.tsx
>     apertium-tagger -t 8 \
>                                 es-tagger-data/es.dic \
>                                 es-tagger-data/es.crp \
>                                 apertium-en-es.es.tsx \
>                                 es-en.prob;
>     Calculating ambiguity classes...
>
>     106 states and 335 ambiguity classes
>     Kupiec's initialization of transition and emission probabilities...
>     Error: A new ambiguity class was found. I cannot continue.
>     Word 'Mar' not found in the dictionary.
>     New ambiguity class: {NOMMF,ANTROPONIM}
>     Take a look at the dictionary and at the training corpus. Then, retrain.
>     make: *** [es-en.prob] error 1
>     ================= log end=========================
>
>     I debugged the word "Mar" with lt-proc:
>     echo "Mar" | lt-proc es-en.automorf.bin
>
>     with the output:
>     ^Mar/Mar<n><mf><sg>/Mar<np><ant><f><sg>$


This normally happens when the dictionary file from which the ambiguity 
classes are obtained has not been regenerated.

Please check whether ^Mar/Mar<n><mf><sg>/Mar<np><ant><f><sg>$ appears in 
the output when you expand the dictionary:

$ lt-expand apertium-en-es.es.dix | grep -v "__REGEXP__" |\
   grep -v ":<:" | awk 'BEGIN{FS=":>:|:"}{print $1 ".";}' |\
   apertium-destxt | lt-proc -a es-en.automorf.bin > es.expand.dic

Then check whether it still appears after filtering the ambiguity classes:
$ apertium-filter-ambiguity apertium-en-es.es.tsx < es.expand.dic > es.dic
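
To compare the two files, something like the following might help. It is only a sketch: the helper name show_mar is made up for illustration, and it assumes both files keep the usual ^word/analysis$ stream format, so that the entry for "Mar" can be found by searching for the literal string ^Mar/.

```shell
# Hypothetical helper: print every line of a file that contains an
# analysis of the surface form "Mar", or a short note if there is none.
show_mar() {
    # -F treats '^Mar/' as a literal string, so '^' is not a line anchor.
    grep -F '^Mar/' "$1" || echo "no analysis of 'Mar' in $1"
}

# Intended use, once the two commands above have been run:
#   show_mar es.expand.dic
#   show_mar es.dic
```

If the analyses printed for the two files differ, or "Mar" is missing from one of them, that would point to where the {NOMMF,ANTROPONIM} ambiguity class gets lost.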

Cheers
--
Felipe
