actually there are two parts here: building large LMs and deploying
them. i currently have a summer MSc project looking at using Hadoop and
HBase to do this Google-style. this really does use a cluster of machines,
for both parts. either way, whether you build them on-disk with a single
machine or use a cluster of machines, the same challenges await.

and one lesson i learnt a few years back was that if you want to guarantee
an improved translation score, put serious effort into the LM.  that is what
it gets you.

Miles

2008/8/6 amittai axelrod <[EMAIL PROTECTED]>

> 2008/8/5 John D. Burger <[EMAIL PROTECTED]>:
> > I'm starting to think it's a lost cause to try to get one LM
> > implementation to act very much like the other.  Thanks for the
> > insights, though!
>
> I also spent some time unsuccessfully trying to exactly match the
> SRILM toolkit's output. Aside from the various default settings, there
> is some pruning going on when using kndiscount.
> It's fairly easy to produce an LM that's within a few digits of
> precision, but it's hard to replicate perfectly. Of course, those
> pesky few last digits change the LM scores very much. You could just
> re-tune, but that's non-deterministic so things are still not directly
> comparable; kind of annoying.
>
> There is also the larger question of "What does it get you?" (aside
> from curiosity)... At the time, we were interested in building
> monolithic SRI-style LMs on huge corpora. In the end, general interest
> seems to have moved towards distributed LMs, mooting the original
> exercise.
> Um... Good luck!
>
> ~amittai
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.
