[Moses-support] Running Giza++ on subsets of data

Prasanth K Wed, 15 Jun 2011 01:53:49 -0700

Hello All,

I am running a series of experiments to build translation systems with
Moses, in which the corpus for each experiment is a subset of the corpus
used in the previous one. I have started with the Europarl corpus and am
likely to repeat this process about 20 times.
Unless I am mistaken, this is going to take me nearly a month, and I am
looking for ways to speed up the whole process.


Is there an efficient way to run Giza++ on these different subsets of data
without having to rerun it from scratch each time?
Note that I do not want to reuse the alignments obtained from running
Giza++ on the entire Europarl corpus for the other experiments (that is, by
selecting from aligned.grow-diag-final-and the alignment lines for the
sentences in each subset).
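
To make concrete what I am ruling out, here is a minimal sketch of that
shortcut. It assumes the alignment file from the full run has one line per
sentence pair, in corpus order, and that each subset is described by
0-based sentence indices. The function name and the toy data are made up
for illustration only.

```python
def select_alignments(alignment_lines, subset_indices):
    """Return the alignment lines for the given 0-based sentence indices.

    Assumes alignment_lines[i] is the alignment of sentence pair i in the
    full corpus. The result keeps the original corpus order.
    """
    wanted = set(subset_indices)
    return [line for i, line in enumerate(alignment_lines) if i in wanted]


# Hypothetical toy data: four sentence pairs, alignments in "src-tgt" format.
full_alignments = ["0-0 1-1", "0-1 1-0", "0-0", "0-0 1-2 2-1"]

# Reuse the full-corpus alignments for a subset made of sentences 0 and 2.
subset_alignments = select_alignments(full_alignments, [0, 2])
print(subset_alignments)
```

The alignments picked this way were estimated on the full corpus, not on
the subset alone, which is exactly why I do not want to use them.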

The order of the experiments does not matter, so I could start with the
smallest dataset and then move on to successive supersets of it. This would
require a way to update the translation probabilities from Giza++ using
only the additional data, without degrading the result compared with
running Giza++ on each corpus in stand-alone mode.

Kindly let me know if there is some way to do this and I am missing it.

- regards,
Prasanth


-- 
"Theories have four stages of acceptance. i) this is worthless nonsense; ii)
this is an interesting, but perverse, point of view, iii) this is true, but
quite unimportant; iv) I always said so."

  --- J.B.S. Haldane

