Hello:

The problem with "Mar" is not the dot, but the upper-case letter. One way to avoid it would be to expand the dictionary and duplicate those entries with an initial upper-case letter, so that you have both "Mar" and "mar". The drawback is that this will produce some unknown words, which you will have to rule out.

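The duplication idea could be sketched roughly as follows in Python. This is only an illustration under assumptions: the helper names (`with_capitalised_variants`, `has_unknown_word`, `drop_unknown`) are hypothetical and not part of Apertium, the input is taken to be a plain list of surface forms (as produced by the lt-expand/awk step), and unknown words are assumed to be marked by the analyser as `^form/*form$`.

```python
import re


def with_capitalised_variants(surface_forms):
    """Yield each surface form plus its initial-upper-case variant, without repeats."""
    seen = set()
    for form in surface_forms:
        # form[:1].upper() + form[1:] only touches the first character,
        # so "mar" -> "Mar" and an already capitalised "Mar" stays "Mar".
        for variant in (form, form[:1].upper() + form[1:]):
            if variant and variant not in seen:
                seen.add(variant)
                yield variant


# Assumption: an unknown lexical unit looks like ^Xyz/*Xyz$,
# i.e. its only "analysis" starts with an asterisk.
UNKNOWN_UNIT = re.compile(r"\^[^/$]*/\*[^$]*\$")


def has_unknown_word(analysed_line):
    """Return True if the analysed line contains at least one unknown lexical unit."""
    return UNKNOWN_UNIT.search(analysed_line) is not None


def drop_unknown(analysed_lines):
    """Keep only the analysed lines in which every lexical unit was recognised."""
    return [line for line in analysed_lines if not has_unknown_word(line)]


if __name__ == "__main__":
    for form in with_capitalised_variants(["mar", "abyecto"]):
        print(form + ".")
```

The first helper would sit between the awk step and apertium-destxt, and the filter would run on the lt-proc output before apertium-filter-ambiguity. Whether that is the right place in the Makefile is something to check against the actual build.
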
Cheers
--
Felipe

On 07/06/13 07:51, Gang Chen wrote:
> Hi, Felipe, Fran,
>
> I think there are two kinds of words that cause the "A new ambiguity
> class" error.
>
> (1) The first kind are words like "Mar", which make sense in both
> cases, whether a dot is appended or not.
>
> I ran the command:
> lt-expand apertium-en-es.es.dix | grep -v "__REGEXP__" | grep -v ":<:" |
> awk 'BEGIN{FS=":>:|:"}{print $1 ".";}' | apertium-destxt >es.dic.expanded
>
> In the file es.dic.expanded, the contents look like this:
> abyectas.[
> ]abyecta.[
> ]abyectos.[
> ]abyecto.[
> ...
>
> and there is a line:
> ]Mar.[
>
> Next, I ran the command:
> lt-proc -a es-en.automorf.bin <es.dic.expanded > es.dic.expanded.morph
> (before using "apertium-filter-ambiguity")
>
> In the file es.dic.expanded.morph, the contents look like this:
> ^abyectas/abyecto<adj><f><pl>$^./.<sent>$[
> ]^abyecta/abyecto<adj><f><sg>$^./.<sent>$[
> ]^abyectos/abyecto<adj><m><pl>$^./.<sent>$[
> ]^abyecto/abyecto<adj><m><sg>$^./.<sent>$[
> ... ...
>
> However, for the line "]Mar.[", the content is:
> ]^Mar./Mar.<n><m><sg>$[
>
> So it seems the morphological analyser has recognized "Mar." as a single
> word, instead of two words, "Mar" and ".". The morphological analysis of
> the word "Mar" is special (it is ambiguous):
> "^Mar/Mar<n><mf><sg>/Mar<np><ant><f><sg>$". No other word shares this
> ambiguity class, so "Mar" is the only entry that could have contributed
> this ambiguity class to the dictionary.
>
> I guess it may be because of the *dot* in the awk command "awk
> 'BEGIN{FS=":>:|:"}{print $1".";}'". Why do we need the dot here?
>
> (2) The second kind are words that may be recognized by the regular
> expressions, just as Fran says.
>
> As far as I have tried, these numerals cause the "A new ambiguity
> class" error:
> VI
> ID
> MI
> ... ...
>
> However, not all Roman numerals cause the error (e.g. II is OK).
> Roman numerals that also have other lexical forms do cause it, for
> example:
> ^VI/VI<num><mf><sg>/VER<vblex><ifi><p1><sg>$
> ^ID/ID<num><mf><sg>/IR<vblex><imp><p2><pl>$
> ^MI/MI<num><mf><sg>/MÍO<det><pos><mf><sg>$
>
> The ambiguity classes of these words did not appear in the dic file, so
> the error was raised.
>
> What can we do to avoid this?
>
> Many thanks :)
>
> Gang
>
> 2013/6/6 Francis Tyers <fty...@prompsit.com>
>
> > On Thu, 6 Jun 2013 at 11:03 +0200, Felipe Sánchez Martínez wrote:
> > > Hello Gang,
> > >
> > > I put the Apertium list in copy, in case someone wants to add
> > > something.
> > >
> > > > I experimented with the en-es language pair
> > > > (https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-en-es)
> > > > and the newest "lt-toolbox" and "apertium" in the svn trunk. After
> > > > compiling successfully, I went to the "apertium-en-es/es-tagger-data"
> > > > directory, copied "es-tagged.txt" to "es.crp.txt", and used the
> > > > latter file as the corpus for unsupervised training.
> > > >
> > > > Executing the command:
> > > > make -f es-en-unsupervised.make
> > > >
> > > > I got the following log:
> > > >
> > > > ================= log begin=========================
> > > > Generating es-tagger-data/es.dic
> > > > This may take some time. Please, take a cup of coffee and come back
> > > > later.
> > > > apertium-validate-dictionary apertium-en-es.es.dix
> > > > apertium-validate-tagger apertium-en-es.es.tsx
> > > > lt-expand apertium-en-es.es.dix | grep -v "__REGEXP__" | grep -v ":<:" |\
> > > > awk 'BEGIN{FS=":>:|:"}{print $1 ".";}' | apertium-destxt >es.dic.expanded
> > > > lt-proc -a es-en.automorf.bin <es.dic.expanded | \
> > > > apertium-filter-ambiguity apertium-en-es.es.tsx > es-tagger-data/es.dic
> > > > rm es.dic.expanded;
> > > > apertium-destxt < es-tagger-data/es.crp.txt | lt-proc es-en.automorf.bin > es-tagger-data/es.crp
> > > > apertium-validate-tagger apertium-en-es.es.tsx
> > > > apertium-tagger -t 8 \
> > > > es-tagger-data/es.dic \
> > > > es-tagger-data/es.crp \
> > > > apertium-en-es.es.tsx \
> > > > es-en.prob;
> > > > Calculating ambiguity classes...
> > > >
> > > > 106 states and 335 ambiguity classes
> > > > Kupiec's initialization of transition and emission probabilities...
> > > > Error: A new ambiguity class was found. I cannot continue.
> > > > Word 'Mar' not found in the dictionary.
> > > > New ambiguity class: {NOMMF,ANTROPONIM}
> > > > Take a look at the dictionary and at the training corpus. Then, retrain.
> > > > make: *** [es-en.prob] error 1
> > > > ================= log end=========================
> > > >
> > > > I debugged the word "Mar" with lt-proc:
> > > > echo "Mar" | lt-proc es-en.automorf.bin
> > > >
> > > > with the output:
> > > > ^Mar/Mar<n><mf><sg>/Mar<np><ant><f><sg>$
> > >
> > > Normally this happens when the dictionary file from which the
> > > ambiguity classes are obtained has not been regenerated.
> > >
> > > Please check whether ^Mar/Mar<n><mf><sg>/Mar<np><ant><f><sg>$ appears
> > > as a result of the expansion of the dictionary:
> > >
> > > $ lt-expand apertium-en-es.es.dix | grep -v "__REGEXP__" |\
> > > grep -v ":<:" | awk 'BEGIN{FS=":>:|:"}{print $1 ".";}' |\
> > > apertium-destxt | lt-proc -a es-en.automorf.bin > es.expand.dic
> > >
> > > And whether it still appears after filtering the ambiguity file:
> > > $ apertium-filter-ambiguity apertium-en-es.es.tsx < es.expand.dic > es.dic
> >
> > I think the problem is that the extra analyses are added by regular
> > expressions which are not covered in the expansion.
> >
> > Fran

--
Felipe Sánchez Martínez
Dep. de Llenguatges i Sistemes Informàtics
Universitat d'Alacant, E-03071 Alacant (Spain)
Tel.: +34 965 903 400, ext: 2966
Fax: +34 965 909 326
http://www.dlsi.ua.es/~fsanchez

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff