Hello Gang,

I have put the Apertium list in copy, in case someone wants to add something.

> I experiment with the en-es language pair
> (https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-en-es),
> and the newest "lt-toolbox" and "apertium" in the svn trunk. After
> compiling successfully, I go to the "apertium-en-es/es-tagger-data"
> directory, and copy the "es-tagged.txt" to "es.crp.txt", and use the
> latter file as the corpus for unsupervised training.
>
> executing the command:
> make -f es-en-unsupervised.make
>
> and got the following log:
>
> ================= log begin=========================
> Generating es-tagger-data/es.dic
> This may take some time. Please, take a cup of coffee and come back later.
> apertium-validate-dictionary apertium-en-es.es.dix
> apertium-validate-tagger apertium-en-es.es.tsx
> lt-expand apertium-en-es.es.dix | grep -v "__REGEXP__" | grep -v ":<:" |\
> awk 'BEGIN{FS=":>:|:"}{print $1 ".";}' | apertium-destxt >es.dic.expanded
> lt-proc -a es-en.automorf.bin <es.dic.expanded | \
> apertium-filter-ambiguity apertium-en-es.es.tsx > es-tagger-data/es.dic
> rm es.dic.expanded;
> apertium-destxt < es-tagger-data/es.crp.txt | lt-proc es-en.automorf.bin > es-tagger-data/es.crp
> apertium-validate-tagger apertium-en-es.es.tsx
> apertium-tagger -t 8 \
> es-tagger-data/es.dic \
> es-tagger-data/es.crp \
> apertium-en-es.es.tsx \
> es-en.prob;
> Calculating ambiguity classes...
>
> 106 states and 335 ambiguity classes
> Kupiec's initialization of transition and emission probabilities...
> Error: A new ambiguity class was found. I cannot continue.
> Word 'Mar' not found in the dictionary.
> New ambiguity class: {NOMMF,ANTROPONIM}
> Take a look at the dictionary and at the training corpus. Then, retrain.
> make: *** [es-en.prob] error 1
> ================= log end=========================
>
> I debugged the word "Mar" with lt-proc:
> echo "Mar" | lt-proc es-en.automorf.bin
>
> with the output:
> ^Mar/Mar<n><mf><sg>/Mar<np><ant><f><sg>$

This normally happens when the dictionary file from which the ambiguity
classes are obtained has not been regenerated.

Please check whether

^Mar/Mar<n><mf><sg>/Mar<np><ant><f><sg>$

appears as a result of the expansion of the dictionary:

$ lt-expand apertium-en-es.es.dix | grep -v "__REGEXP__" |\
  grep -v ":<:" | awk 'BEGIN{FS=":>:|:"}{print $1 ".";}' |\
  apertium-destxt | lt-proc -a es-en.automorf.bin > es.expand.dic

And whether it still appears after filtering with apertium-filter-ambiguity:

$ apertium-filter-ambiguity apertium-en-es.es.tsx < es.expand.dic > es.dic

Cheers
--
Felipe
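
P.S. The two checks above could be scripted roughly as follows. This is only a sketch: has_unit and report are made-up helper names, not Apertium tools, and it assumes that es.expand.dic and es.dic were produced in the current directory by the two commands above. grep -F is used so that the characters ^, $, < and > in the lexical unit are matched literally rather than as a regular expression.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Apertium): look for one analysed
# lexical unit in the files produced by the two commands above.

# has_unit UNIT FILE: succeeds if FILE contains UNIT as a literal string.
has_unit() {
    grep -F -q -e "$1" "$2"
}

# report UNIT FILE...: print one status line per file.
report() {
    unit=$1
    shift
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "$f: missing"
        elif has_unit "$unit" "$f"; then
            echo "$f: found"
        else
            echo "$f: not found"
        fi
    done
}

# File names are assumed to match the commands above.
report '^Mar/Mar<n><mf><sg>/Mar<np><ant><f><sg>$' es.expand.dic es.dic
```

A "missing" line means the corresponding command has not been run yet in that directory. A "not found" line only says that the exact lexical unit is absent from that file.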
