Hi,

You could just run word alignment on the 50,000 new lines by
themselves, but you will get better results if you somehow leverage
the baseline parallel corpus for word alignment as well.

One way is incremental GIZA++; the other is to re-run word alignment on everything.
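
For the latter, here is a rough sketch of re-running just corpus
preparation and word alignment (steps 1-2) on the combined corpus.
Paths, language codes and corpus names are placeholders, and this
assumes the standard train-model.perl setup with mgiza:

  # concatenate baseline and new bitext (they must share the same
  # tokenization and casing)
  cat baseline.fr new.fr > combined.fr
  cat baseline.en new.en > combined.en

  # re-run only steps 1-2; pick up from step 3 yourself afterwards
  train-model.perl -root-dir work -corpus combined -f fr -e en \
    -first-step 1 -last-step 2 \
    -mgiza -mgiza-cpus 8 \
    -external-bin-dir /path/to/mgiza/bin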

You could also try a middle ground: include only a sample of the
baseline data when re-running word alignment on the new bitext.
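
As a sketch of the sampling (the 500000 is an arbitrary placeholder;
tune it to however much you can afford to realign):

  # draw a random parallel sample from the baseline bitext
  paste baseline.fr baseline.en | shuf -n 500000 > sample.tsv
  cut -f1 sample.tsv > sample.fr
  cut -f2 sample.tsv > sample.en

  # align the new data together with the sample
  cat sample.fr new.fr > mix.fr
  cat sample.en new.en > mix.en

Then run steps 1-2 on "mix" as above. Keep in mind that the lexical
tables you then build will only reflect the sampled subset plus the
new data.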

It is not clear how much quality you will lose by going down any of these routes...

-phi

On Fri, Jul 26, 2013 at 2:16 AM, Elliot K Meyerson
<ekmeyer...@wesleyan.edu> wrote:
> Hello,
>
> I have a large phrase-based translation system. Alignment was done with
> mgiza, and took a few weeks. I now have a small amount of extremely relevant
> new bitext (~50,000 lines) that I would like to use to augment the model,
> without having to retrain everything. The new data contains many important
> words that are not found anywhere else in the training data, so lexical
> tables (at least) would need to be updated along with adding in new
> alignments. I could run the rest of training (steps 3+) no problem, as long
> as the relevant files from steps 1 and 2 are updated in a reasonable way. Is
> there some way for me to do this? Or should I just cut my losses and retrain
> the entire thing?
>
> Thanks,
> Elliot
>
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
