On Wed, 28/03/2012 at 12:43 +0200, Orosz György wrote:
> Thanks for clarifying things.
>
> > > It is clear. I am wondering about the supervised training: is it
> > > possible to train the tagger (in a supervised manner) without
> > > creating all the lexical resources used by the MT system? What is
> > > not obvious to me is why these parameters are needed:
> > > "apertium-tagger [-d] -s=n DIC CRP TSX TAGGER_DATA HTAG UNTAG"
> >
> > And FILES are:
> >   DIC: full expanded dictionary file
> >   CRP: training text corpus file
> >   TSX: tagger specification file, in XML format
> >   TAGGER_DATA: tagger data file, built in the training and used
> >     while tagging
> >   HTAG: hand-tagged text corpus
> >   UNTAG: untagged text corpus, morphological analysis of HTAG
> >     corpus to use both jointly with -s option
> >
> > For Hungarian, "DIC" is not going to be possible as it relies on
> > dictionary expansion,[1] but the rest is possible (you just need to
> > convert the resources you already have).
> >
> > Felipe: What is the dictionary expansion file used for when training
> > the tagger, and could it be approximated in some way?
> >
> > Fran
> >
> > 1. Well, you could just analyse the corpus with your morphological
> > analyser, then convert the set of analyses from the corpus to an
> > Apertium .dix file, and then expand it. This would be useless for
> > most purposes but would allow you to train the tagger.
>
> Can you please confirm whether this is the training process or not?
> For tagging we need an untagged corpus (UNTAG), a disambiguated one
> (HTAG), and one which has all the possible analyses for each word
> (CRP). We also need a dictionary with a huge number of word-form/analysis
> pairs (DIC). (Is it a simulated morphological analyzer?)
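Footnote 1 suggests converting the corpus analyses into an Apertium .dix file and expanding it. A minimal sketch of the conversion step follows, assuming the analyser output has already been reduced to (surface, lemma, [tags]) triples; `analyses_to_dix` is a hypothetical helper, not part of Apertium, and the Hungarian tags are illustrative. The written file would then be expanded with lttoolbox's `lt-expand`.

```python
import xml.etree.ElementTree as ET


def analyses_to_dix(analyses):
    """Hypothetical helper: build a minimal monolingual .dix tree from
    (surface, lemma, tags) triples, one <e> entry per distinct analysis."""
    root = ET.Element("dictionary")
    ET.SubElement(root, "alphabet")  # left empty; only needed for tokenisation
    sdefs = ET.SubElement(root, "sdefs")
    section = ET.SubElement(root, "section", id="main", type="standard")

    seen_tags = set()
    seen_entries = set()
    for surface, lemma, tags in analyses:
        key = (surface, lemma, tuple(tags))
        if key in seen_entries:
            continue  # a corpus repeats words; keep one entry per analysis
        seen_entries.add(key)
        e = ET.SubElement(section, "e")
        p = ET.SubElement(e, "p")
        ET.SubElement(p, "l").text = surface  # surface form (left side)
        r = ET.SubElement(p, "r")
        r.text = lemma  # lemma followed by its symbols (right side)
        for tag in tags:
            ET.SubElement(r, "s", n=tag)
            seen_tags.add(tag)

    # Every symbol used in an entry must be declared in <sdefs>.
    for tag in sorted(seen_tags):
        ET.SubElement(sdefs, "sdef", n=tag)
    return ET.ElementTree(root)


if __name__ == "__main__":
    sample = [
        ("házat", "ház", ["n", "sg", "acc"]),
        ("vár", "vár", ["n", "sg", "nom"]),
        ("vár", "vár", ["vblex", "pres", "p3", "sg"]),
        ("házat", "ház", ["n", "sg", "acc"]),  # repeated corpus token
    ]
    tree = analyses_to_dix(sample)
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

As the footnote says, a dictionary built this way only covers words seen in the corpus, so it is good for training the tagger and little else.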
It isn't used for morphological analysis; the morphological analyser is
used for that. I believe that the expansion is used, along with the TSX
file, to calculate the ambiguity classes. But someone else might know
better.

> TAGGER_DATA is created during the training, and the TSX contains the
> mapping between the tags of the MA and those of the tagger. (One more
> question: is it possible to use the identity relation as the mapping,
> since the tagset we use is the one that the MA generates?)

Yes, you can make a TSX file which just has the same coarse tags as
fine tags.

Fran

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
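Fran's guess is that the expanded dictionary, together with the TSX, is used to compute ambiguity classes, i.e. the set of tagger categories each word form can take. The sketch below only illustrates that idea and is not how apertium-tagger is implemented. It assumes lt-expand-style lines of the form `surface:lemma<tag1><tag2>` and uses an identity fine-to-coarse mapping, as in the TSX question. The function names are hypothetical.

```python
import re
from collections import defaultdict

TAG_RE = re.compile(r"<([^>]+)>")


def parse_expanded_line(line):
    """Split 'surface:lemma<t1><t2>' into (surface, lemma, [tags])."""
    # lt-expand marks one-directional entries with ':>:' or ':<:';
    # here they are treated like the plain ':' separator.
    for sep in (":>:", ":<:", ":"):
        if sep in line:
            surface, analysis = line.split(sep, 1)
            break
    else:
        raise ValueError(f"no surface/analysis separator in {line!r}")
    lemma = analysis.split("<", 1)[0]
    return surface, lemma, TAG_RE.findall(analysis)


def coarse_tag(tags):
    # Identity mapping (assumption): the full fine tag sequence is its own
    # coarse category, as if the TSX listed each analyser tag sequence.
    return ".".join(tags)


def ambiguity_classes(lines):
    """Map each surface form to the frozen set of coarse tags it can take."""
    by_surface = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        surface, _lemma, tags = parse_expanded_line(line)
        by_surface[surface].add(coarse_tag(tags))
    return {surface: frozenset(c) for surface, c in by_surface.items()}


if __name__ == "__main__":
    sample = [
        "ház:ház<n><sg><nom>",
        "házat:ház<n><sg><acc>",
        "vár:vár<n><sg><nom>",
        "vár:vár<vblex><pres><p3><sg>",
    ]
    classes = ambiguity_classes(sample)
    for surface, cls in sorted(classes.items()):
        print(surface, sorted(cls))
    print(len(set(classes.values())), "distinct ambiguity classes")
```

With a real TSX, several fine tag sequences would usually collapse into one coarse category, giving fewer and larger ambiguity classes than the identity mapping does.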