Hello all, I am conducting a series of experiments to build translation systems using Moses, in which the corpus for each experiment is a subset of the corpus used in the previous experiment. I have started with the Europarl corpus and am likely to repeat this process about 20 times. Unless I am mistaken, this is going to take me nearly a month, and I am looking for ways to speed up the whole process.
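For reference, the nested-subset setup described above could be built roughly as follows. This is a minimal Python sketch. It assumes a sentence-aligned parallel corpus held as two equal-length lists of lines, one sentence per line. The names `nested_subsets` and `select_pairs` are hypothetical. The sketch only prepares the corpora so that each subset contains every smaller one. It does not speed up Giza++ itself.

```python
import random
from typing import List, Sequence, Tuple


def nested_subsets(n_pairs: int, sizes: Sequence[int], seed: int = 0) -> List[List[int]]:
    """Return sentence-index lists, smallest first, where each list contains all smaller ones.

    Hypothetical helper: one fixed random permutation of the corpus is drawn,
    and subset k is the first sizes[k] indices of that permutation, so every
    subset is a prefix (and therefore a superset) of the previous one.
    """
    if any(s < 0 or s > n_pairs for s in sizes):
        raise ValueError("subset size must be between 0 and the corpus size")
    order = list(range(n_pairs))
    random.Random(seed).shuffle(order)  # fixed seed makes the subsets reproducible
    # Sort the sizes so experiments can run smallest-first; sort indices to keep corpus order.
    return [sorted(order[:s]) for s in sorted(sizes)]


def select_pairs(src: Sequence[str], tgt: Sequence[str],
                 indices: Sequence[int]) -> Tuple[List[str], List[str]]:
    """Pick the same sentence positions from the source and target sides (hypothetical helper)."""
    if len(src) != len(tgt):
        raise ValueError("source and target must have the same number of lines")
    return [src[i] for i in indices], [tgt[i] for i in indices]


if __name__ == "__main__":
    subsets = nested_subsets(10, [9, 3, 6])
    for smaller, larger in zip(subsets, subsets[1:]):
        assert set(smaller) <= set(larger)  # superset property holds
    print([len(s) for s in subsets])
```

Drawing one fixed permutation and taking prefixes guarantees the superset property regardless of the order in which the experiments are later run.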
Is there an efficient way to run Giza++ on these different subsets of data without having to rerun it from scratch each time? I do not want to reuse the alignments obtained from running Giza++ on the entire Europarl corpus for the other experiments (i.e., by selecting the alignment information from aligned.grow-diag-final-and for the sentences in each subset). The order of the experiments does not matter, so they can start with the smallest dataset and proceed through successively larger supersets. That works provided there is a way to update the translation probabilities from Giza++ using only the additional data, without degrading Giza++'s performance compared to running it on that corpus in stand-alone mode.

Kindly let me know if there is a way to do this that I am missing.

- regards,
Prasanth

--
"Theories have four stages of acceptance. i) this is worthless nonsense; ii) this is an interesting, but perverse, point of view, iii) this is true, but quite unimportant; iv) I always said so." --- J.B.S. Haldane
_______________________________________________ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support