Re: [Moses-support] Incremental training for SMT

2011-10-19 Thread Prasanth K
Hi all,

This is probably not the appropriate thread in which to raise this, but since
'suffix arrays' are only discussed in the documentation in the context of
incremental training, I'd like to think of them as relevant to the discussion
on incremental training.

1. I've trained a batch model using the Europarl corpus including all the
steps.

2. Now, I'd like to avoid loading the tables and instead use the
suffix arrays.
I've changed the ttable-file entry in the config file as
suggested in the documentation, but I'm wondering what needs to be done
about the distortion-file entry.
When left unchanged, it loads the reordering file (which I had "assumed" would
be computed on the fly, like the features in the translation table), and when
I comment out that entry I get an error because of the weights for the
d-parameter that were obtained from MERT.
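For concreteness, a sketch of the two config entries in question. The paths, the implementation code, and the reordering configuration string below are placeholders based on the moses.ini conventions of that era, not taken from this thread; the documentation gives the actual code to substitute for the suffix-array phrase table:

```ini
# [ttable-file] format: implementation src-factors tgt-factors num-features path
# implementation 0 = standard phrase table loaded at startup; per the docs,
# this code is what gets changed to select the suffix-array phrase table.
[ttable-file]
0 0 0 5 /path/to/model/phrase-table.gz

# [distortion-file] as produced by training; commenting it out breaks the
# config because MERT-tuned weights for it remain in [weight-d].
[distortion-file]
0-0 wbe-msd-bidirectional-fe-allff 6 /path/to/model/reordering-table.gz
```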

I was unable to find any documentation on the site about suffix arrays, so
I'd appreciate any help that you can give.


- Prasanth


On Thu, Oct 20, 2011 at 7:31 AM, Jehan Pages  wrote:

> Hi,
>
> 2011/10/6 Philipp Koehn :
> > Hi,
> >
> > for the Moses support on this, please take a look at:
> > http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc27
>
> I add my voice here as that's quite an interesting topic! Thanks for
> the reading! :-)
>
> On this page, I can read: "Note that at the moment the
> incremental phrase table code is not thread safe." Basically, all it
> implies for Moses users is: do not run two incremental trainings
> at a time. Right?
>
> Also, I haven't done any of this yet (probably much later), but I already
> have a few questions (I'll probably have more later):
> 1/ just vocabulary: when you say "truecase", you mean the lowercasing step,
> right?
>
> 2/ And when you say the MT engine is updated via XML-RPC, does that mean
> incremental training will only work in Moses server mode?
> Also, you don't give the XML-RPC request that must be made for
> this particular interaction.
>
> Thanks!
>
> Jehan
>
> P.S.: by the way, the search engine on the website does not seem to
> work (it always returns a blank page in my Firefox 7.0.1 on GNU/Linux); I
> have to use external search engines to search the website.
>
> > -phi
> >
> > 2011/10/6 Jesús González Rubio :
> >> 2011/10/6 HOANG Cong Duy Vu 
> >>>
> >>> Hi all,
> >>>
> >>> I am working on developing an SMT system that can
> >>> learn incrementally. The scenario is as follows:
> >>>
> >>> - A state-of-the-art SMT system translates a source-language
> >>> sentence from a user.
> >>> - The user identifies some translation errors in the translated
> >>> sentence and then provides a correction.
> >>> - The SMT system receives the correction and learns from it immediately.
> >>>
> >>> In other words, can the SMT system learn from user corrections
> >>> (without re-training), incrementally?
> >>>
> >>> Do you know any similar ideas or have any advice or suggestion?
> >>>
> >>> Thanks in advance!
> >>>
> >>> --
> >>> Cheers,
> >>> Vu
> >>>
> >>> ___
> >>> Moses-support mailing list
> >>> Moses-support@mit.edu
> >>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>>
> >>
> >> Hi Vu,
> >> You can try searching for "interactive machine translation"; for
> >> example, this paper covers the details of online retraining of an MT
> >> system:
> >> Online Learning for Interactive Statistical Machine Translation
> >> aclweb.org/anthology/N/N10/N10-1079.pdf
> >> Cheers
> >> --
> >> Jesús
> >>



-- 
"Theories have four stages of acceptance: i) this is worthless nonsense; ii)
this is an interesting, but perverse, point of view; iii) this is true, but
quite unimportant; iv) I always said so."

  --- J.B.S. Haldane


Re: [Moses-support] Compiling Moses with IRSTLM

2011-10-19 Thread Jehan Pages
Hi,

On Thu, Sep 29, 2011 at 10:49 PM, Marcello Federico  wrote:
> Hi folks,
>
> although "just slower", you might indeed still want to compile IRSTLM and
> SRILM, at least to estimate and build your LM files.
>
> We are currently working to get a new release of IRSTLM ready, which should
> be both faster and thread safe. It will also provide new features that will
> make it possible to manage different sorts of LMs which are currently not
> supported by the other LM libraries.

What are the differences between the various language models, exactly? I
see Moses currently supports 4 different ones (SRI, IRST, RandLM, KenLM).
Reading the language model page
(http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel),
my understanding is that SRI is not that good for big data, IRST is
better for big data (space and memory), RandLM is even better for
space and memory (so for very huge data) but much slower (4 times!)
at execution, and KenLM seems to beat them all, both for
memory and speed! Reading it like this, the description
clearly favours KenLM (which is also the default in Moses). Is that
accurate?

Are there other characteristics of the various LMs (especially the
open-source ones; I don't really care about SRI)?
Should I expect some of them to be more experimental/less reliable/less stable?
And what about the "quality" of translation? Is any comparison
possible on this point? (I know that obviously depends a lot on the data,
and that we are talking about Machine Translation, so "quality" is
not a word that applies all that well. But maybe some clear
differences have been observed between these LMs for the same input data?)
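As a concrete aside, the choice between these toolkits shows up in moses.ini as the first field of the [lmodel-file] entry. A sketch, where the type codes are my best recollection of that era's conventions and the path is a placeholder, not details confirmed in this thread:

```ini
# [lmodel-file] format: type factor order path
#   type 0 = SRILM, 1 = IRSTLM, 8 = KenLM (lazy-loaded); other codes
#   exist for other back-ends such as RandLM.
[lmodel-file]
8 0 5 /path/to/lm/europarl.binlm
```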

> Please note that we are NOT inviting you NOT to use others' software.
> We are in fact thankful to the other open source developers for providing
> good quality and useful software to the community.
>
> We also like to see fair comparisons among different implementations and
> believe that these can stimulate further technology improvements to the
> benefit of all.
>
> Finally, we like to think of our work as a friendly competition in which no
> one is trying to diminish the others' work.

Don't worry, I understand all this. :-) That's what is good about Free
Software after all!
It's perfectly normal to advertise your project (as long as, as you
say, it is done in a friendly way towards the competition).
Thanks for all the responses I've got here. This list is quite friendly indeed! ;-)

Jehan

> Greetings,
>
> Marcello Federico
>
> 
>
> On Sep 26, 2011, at 11:44 AM, Kenneth Heafield wrote:
>
>> Hi,
>>
>>    Since the sample language models are provided for you, it is no
>> longer necessary to compile SRILM or IRSTLM (though you can if you want
>> to use the specific features they provide; otherwise they're just
>> slower).  I've updated the getting started documentation.
>>
>> Kenneth
>>
>> On 09/26/11 09:32, Jehan Pages wrote:
>>> Hi,
>>>
>>> On Mon, Sep 26, 2011 at 3:48 PM, Nicola Bertoldi  wrote:
>>>> I am going to release (very soon) a new version of Moses including new LM
>>>> types.
>>>> Stay tuned on the IRSTLM webpage.
>>>>
>>>> If you need it immediately, get the code from the IRSTLM SF repository:
>>>> you can download revision 452, which properly interfaces with the latest
>>>> revision of Moses.
>>> Thanks for the answer. Since right now I am mainly testing this engine,
>>> the development version from the repo suits me fine. Anyway, Moses
>>> compiled fine using revision 452 of IRSTLM, so that's great. Thanks
>>> again!
>>>
>>> Also just to be sure, in the "getting started" page, the sample models
>>> which are linked are only for SRILM, right? Because I wanted to test
>>> as explained in the page, and I get:
>>>
>>> [...]
>>> Start loading LanguageModel lm/europarl.srilm.gz : [0.000] seconds
>>> ERROR:Language model type unknown. Probably not compiled into library
>>> Segmentation fault
>>>
>>>
>>> Seeing the srilm.gz extension, I guess that won't work with only
>>> IRSTLM compiled in. That information may be worth adding to
>>> the "Getting started" page. :-)
>>> I guess I'll have to test directly with more complete data.
>>>
>>> Jehan
>>>


Re: [Moses-support] Incremental training for SMT

2011-10-19 Thread Jehan Pages
Hi,

2011/10/6 Philipp Koehn :
> Hi,
>
> for the Moses support on this, please take a look at:
> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc27

I add my voice here as that's quite an interesting topic! Thanks for
the reading! :-)

On this page, I can read: "Note that at the moment the
incremental phrase table code is not thread safe." Basically, all it
implies for Moses users is: do not run two incremental trainings
at a time. Right?

Also, I haven't done any of this yet (probably much later), but I already
have a few questions (I'll probably have more later):
1/ just vocabulary: when you say "truecase", you mean the lowercasing step, right?

2/ And when you say the MT engine is updated via XML-RPC, does that mean
incremental training will only work in Moses server mode?
Also, you don't give the XML-RPC request that must be made for
this particular interaction.
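Since the question is what that XML-RPC request would look like, here is a minimal sketch in Python. The method name "updater" and the field names (source, target, alignment) are assumptions about the mosesserver interface, not details taken from this thread; the sketch only builds the request body, so no running server is needed:

```python
# Build the XML-RPC request body for a hypothetical incremental-update
# call. xmlrpc.client.dumps serializes a tuple of parameters into the
# <methodCall> XML that would be POSTed to the server.
import xmlrpc.client

def build_update_request(source, target, alignment):
    """Serialize one XML-RPC call carrying a corrected sentence pair."""
    params = ({"source": source, "target": target, "alignment": alignment},)
    return xmlrpc.client.dumps(params, methodname="updater")

payload = build_update_request("un exemple", "an example", "0-0 1-1")

# Against a live server one would send it with something like:
#   proxy = xmlrpc.client.ServerProxy("http://localhost:8080/RPC2")
#   proxy.updater({"source": ..., "target": ..., "alignment": ...})
```
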

Thanks!

Jehan

P.S.: by the way, the search engine on the website does not seem to
work (it always returns a blank page in my Firefox 7.0.1 on GNU/Linux); I
have to use external search engines to search the website.

> -phi
>
> 2011/10/6 Jesús González Rubio :
>> 2011/10/6 HOANG Cong Duy Vu 
>>>
>>> Hi all,
>>>
>>> I am working on developing an SMT system that can
>>> learn incrementally. The scenario is as follows:
>>>
>>> - A state-of-the-art SMT system translates a source-language
>>> sentence from a user.
>>> - The user identifies some translation errors in the translated sentence and
>>> then provides a correction.
>>> - The SMT system receives the correction and learns from it immediately.
>>>
>>> In other words, can the SMT system learn from user corrections (without
>>> re-training), incrementally?
>>>
>>> Do you know any similar ideas or have any advice or suggestion?
>>>
>>> Thanks in advance!
>>>
>>> --
>>> Cheers,
>>> Vu
>>>
>>
>> Hi Vu,
>> You can try searching for "interactive machine translation"; for example,
>> this paper covers the details of online retraining of an MT system:
>> Online Learning for Interactive Statistical Machine Translation
>> aclweb.org/anthology/N/N10/N10-1079.pdf
>> Cheers
>> --
>> Jesús
>>