On Thu, 6 Jun 2013 at 11:03 +0200, Felipe Sánchez Martínez wrote:
> Hello Gang,
> 
> I have put the Apertium list in copy, in case someone wants to add something.
> 
> 
> >     I am experimenting with the en-es language pair
> >     
> > (https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-en-es),
> >     and the newest "lt-toolbox" and "apertium" from the svn trunk. After
> >     compiling successfully, I went to the "apertium-en-es/es-tagger-data"
> >     directory, copied "es-tagged.txt" to "es.crp.txt", and used the
> >     latter file as the corpus for unsupervised training.
> >
> >     I then ran the command:
> >     make -f es-en-unsupervised.make
> >
> >     and got the following log:
> >
> >     ================= log begin=========================
> >     Generating es-tagger-data/es.dic
> >     This may take some time. Please, take a cup of coffee and come back
> >     later.
> >     apertium-validate-dictionary apertium-en-es.es.dix
> >     apertium-validate-tagger apertium-en-es.es.tsx
> >     lt-expand apertium-en-es.es.dix | grep -v "__REGEXP__" | grep -v
> >     ":<:" |\
> >              awk 'BEGIN{FS=":>:|:"}{print $1 ".";}' | apertium-destxt
> >      >es.dic.expanded
> >     lt-proc -a es-en.automorf.bin <es.dic.expanded | \
> >              apertium-filter-ambiguity apertium-en-es.es.tsx >
> >     es-tagger-data/es.dic
> >     rm es.dic.expanded;
> >     apertium-destxt < es-tagger-data/es.crp.txt | lt-proc
> >     es-en.automorf.bin > es-tagger-data/es.crp
> >     apertium-validate-tagger apertium-en-es.es.tsx
> >     apertium-tagger -t 8 \
> >                                 es-tagger-data/es.dic \
> >                                 es-tagger-data/es.crp \
> >                                 apertium-en-es.es.tsx \
> >                                 es-en.prob;
> >     Calculating ambiguity classes...
> >
> >     106 states and 335 ambiguity classes
> >     Kupiec's initialization of transition and emission probabilities...
> >     Error: A new ambiguity class was found. I cannot continue.
> >     Word 'Mar' not found in the dictionary.
> >     New ambiguity class: {NOMMF,ANTROPONIM}
> >     Take a look at the dictionary and at the training corpus. Then, retrain.
> >     make: *** [es-en.prob] error 1
> >     ================= log end=========================
> >
> >     I debugged the word "Mar" with lt-proc:
> >     echo "Mar" | lt-proc es-en.automorf.bin
> >
> >     with the output:
> >     ^Mar/Mar<n><mf><sg>/Mar<np><ant><f><sg>$
> 
> 
> This normally happens when the dictionary file from which the ambiguity 
> classes are obtained has not been regenerated.
> 
> Please check if ^Mar/Mar<n><mf><sg>/Mar<np><ant><f><sg>$ appears as a 
> result of the expansion of the dictionary:
> 
> $ lt-expand apertium-en-es.es.dix | grep -v "__REGEXP__" |\
>    grep -v ":<:" | awk 'BEGIN{FS=":>:|:"}{print $1 ".";}' |\
>    apertium-destxt | lt-proc -a es-en.automorf.bin > es.expand.dic
> 
> And check whether it still appears after filtering with apertium-filter-ambiguity:
> $ apertium-filter-ambiguity apertium-en-es.es.tsx < es.expand.dic > es.dic

I think the problem is that the extra analyses are added by regular
expressions which are not covered in the expansion.
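
A minimal sketch of why that would happen, using made-up lines in place of
real lt-expand output (the sample entries, and in particular the exact shape
of the __REGEXP__ line, are assumptions for illustration only). The filter and
the awk step are the ones from the log above; sample_expansion and
surface_forms are hypothetical helper names:

```shell
# Hypothetical sample standing in for `lt-expand apertium-en-es.es.dix`.
# The __REGEXP__ entry is invented; its real format may differ.
sample_expansion() {
  printf '%s\n' \
    'mar:mar<n><mf><sg>' \
    '__REGEXP__[0-9]+:__REGEXP__[0-9]+<num>' \
    'casa:casa<n><f><sg>'
}

# Same filtering and surface-form extraction as in the training Makefile:
# drop regex entries and :<: entries, keep the text before the first ':'.
surface_forms() {
  grep -v "__REGEXP__" | grep -v ":<:" | awk 'BEGIN{FS=":>:|:"}{print $1 ".";}'
}

sample_expansion | surface_forms
```

Anything that the analyser can only produce through a regex entry never
reaches es.dic this way, so an ambiguity class involving it can show up for
the first time in the corpus.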

Fran


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
