[Moses-support] NULL token

2010-10-25 Thread Somayeh Bakhshaei
Hello,



In theory we learn that the NULL token is placed at the beginning of
each sentence, but in the output files of a real system such a token
does not seem to appear explicitly.

--

Best Regards,

S.Bakhshaei




Re: [Moses-support] NULL token

2010-10-25 Thread Philipp Koehn
Hi,

the NULL token is an implicit concept of word alignment
(and it is not placed at any specific position). You can see
it popping up in the lexical translation tables, but
otherwise it is invisible.
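
As an illustration, here is a minimal sketch (in Python, not the actual
GIZA++/Moses code) of how the NULL token enters IBM Model 1 style lexical
translation probabilities: it is simply prepended to every source sentence
during estimation, so entries such as t(e|NULL) appear in the lexical
translation table even though NULL never occurs in any input or output
sentence.

from collections import defaultdict

NULL = "NULL"  # the special empty source word (name chosen for illustration)

# toy parallel corpus: (source words, target words)
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
]

def em_iteration(corpus, t):
    """One EM iteration of IBM Model 1 with a NULL source token."""
    count = defaultdict(float)
    total = defaultdict(float)
    for src, tgt in corpus:
        src = [NULL] + src  # NULL competes as an alignment point for every target word
        for e in tgt:
            z = sum(t[(e, f)] for f in src)  # normalise over all source positions
            for f in src:
                c = t[(e, f)] / z
                count[(e, f)] += c
                total[f] += c
    return defaultdict(lambda: 1e-9,
                       {(e, f): count[(e, f)] / total[f] for (e, f) in count})

t = defaultdict(lambda: 1.0)  # uniform initialisation of t(e|f)
for _ in range(5):
    t = em_iteration(corpus, t)

# entries like t('the' | NULL) exist in the table, but NULL itself never
# shows up in the decoder input or output
print(sorted((e, round(p, 3)) for (e, f), p in t.items() if f == NULL))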

-phi

On Mon, Oct 25, 2010 at 11:45 AM, Somayeh Bakhshaei
wrote:

> Hello,
>
> In theory we learn that the NULL token is placed at the beginning of
> each sentence, but in the output files of a real system such a token does
> not seem to appear explicitly.
>
> --
> Best Regards,
> S.Bakhshaei
>


Re: [Moses-support] bag of words language model

2010-10-25 Thread Philipp Koehn
Hi,

I am not familiar with that, but somewhat related is
Arne Mauser's global lexical model, which also exists
as a secret feature in Moses (secret because no
efficient training procedure exists):

Citation:
A. Mauser, S. Hasan, and H. Ney. Extending Statistical Machine
Translation with Discriminative and Trigger-Based Lexicon Models. In
Conference on Empirical Methods in Natural Language Processing
(EMNLP), Singapore, August 2009.
http://www-i6.informatik.rwth-aachen.de/publications/download/628/MauserArneHasanSav%7Bs%7DaNeyHermann--ExtendingStatisticalMachineTranslationwithDiscriminativeTrigger-BasedLexiconModels--2009.pdf
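
For intuition only, here is a rough sketch of the trigger-based idea in that
paper (made-up weights and function names, not the Moses implementation):
every target word is predicted from the whole bag of source words, so the
model rewards or penalises a hypothesis for containing a word regardless of
where it ends up.

import math

# hypothetical trigger weights: how strongly source word f supports target
# word e; in the paper these come from discriminatively trained
# per-target-word classifiers
weights = {
    ("house", "haus"): 2.1,
    ("house", "das"): 0.2,
    ("book", "haus"): -1.5,
}
bias = {"house": -0.5, "book": -0.3}

def p_target_word(e, source_words):
    """P(target word e occurs | bag of source words), as a logistic model."""
    score = bias.get(e, 0.0) + sum(weights.get((e, f), 0.0) for f in source_words)
    return 1.0 / (1.0 + math.exp(-score))

def global_lexicon_score(target_words, source_words):
    """Log-score of a hypothesis: each target word is predicted from the
    whole source sentence, independent of position (hence 'global')."""
    return sum(math.log(p_target_word(e, source_words)) for e in target_words)

print(global_lexicon_score(["the", "house"], ["das", "haus"]))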

-phi


On Fri, Oct 22, 2010 at 7:02 PM, Francis Tyers  wrote:
> Hello all,
>
> I have a rather strange request. Does anyone know of any papers (or
> implementations) on bag-of-words language models? That is, a language
> model which does not take into account the order in which the words
> appear in an n-gram, so if you have the string 'police chief of' in your
> model, you will get a result for both 'chief of police' and 'police
> chief of'. I have thought of using IRSTLM or some generic model and
> scoring all the permutations, but wondered if there was a more efficient
> implementation already in existence. I have searched without much luck
> on Google, but perhaps I am searching with the wrong words.
>
> Best regards,
>
> Fran
>


Re: [Moses-support] bag of words language model

2010-10-25 Thread Miles Osborne
i implemented this years ago (the idea then was to see if, for
free-word-order languages, phrases could be generalised).  at the time
it didn't seem that there was a more efficient way to do it than just
generating permutations and scoring them.

and if you think about it, this is essentially the reordering problem.
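
A minimal sketch of that permutation approach (the lm_logprob function is a
placeholder for whatever n-gram LM you actually query, for example through
an IRSTLM or SRILM wrapper; the toy scores are invented):

from itertools import permutations

def lm_logprob(ngram):
    """Placeholder for a real n-gram LM query; returns a log probability."""
    toy = {("chief", "of", "police"): -1.2, ("police", "chief", "of"): -1.9}
    return toy.get(tuple(ngram), -6.0)

def bag_of_words_logprob(ngram, combine=max):
    """Score an n-gram irrespective of word order: score every permutation
    with the underlying LM and combine the results (max here; a log-sum-exp
    would give a sum-over-orders variant)."""
    return combine(lm_logprob(p) for p in permutations(ngram))

print(bag_of_words_logprob(["police", "chief", "of"]))  # same result as ...
print(bag_of_words_logprob(["chief", "of", "police"]))  # ... this one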

Miles

On 25 October 2010 12:59, Philipp Koehn  wrote:
> Hi,
>
> I am not familiar with that, but somewhat related is
> Arne Mauser's global lexical model, which also exists
> as a secret feature in Moses (secret because no
> efficient training procedure exists):
>
> Citation:
> A. Mauser, S. Hasan, and H. Ney. Extending Statistical Machine
> Translation with Discriminative and Trigger-Based Lexicon Models. In
> Conference on Empirical Methods in Natural Language Processing
> (EMNLP), Singapore, August 2009.
> http://www-i6.informatik.rwth-aachen.de/publications/download/628/MauserArneHasanSav%7Bs%7DaNeyHermann--ExtendingStatisticalMachineTranslationwithDiscriminativeTrigger-BasedLexiconModels--2009.pdf
>
> -phi
>
>
> On Fri, Oct 22, 2010 at 7:02 PM, Francis Tyers  wrote:
>> Hello all,
>>
>> I have a rather strange request. Does anyone know of any papers (or
>> implementations) on bag-of-words language models? That is, a language
>> model which does not take into account the order in which the words
>> appear in an n-gram, so if you have the string 'police chief of' in your
>> model, you will get a result for both 'chief of police' and 'police
>> chief of'. I have thought of using IRSTLM or some generic model and
>> scoring all the permutations, but wondered if there was a more efficient
>> implementation already in existence. I have searched without much luck
>> on Google, but perhaps I am searching with the wrong words.
>>
>> Best regards,
>>
>> Fran
>>





Re: [Moses-support] bag of words language model

2010-10-25 Thread Ondrej Bojar
Hi, Philipp,

I was wondering what that secret model was... Is there any brief
documentation of what the Moses code expects to load for this model?

The training of this discriminative word lexicon can be heavily
parallelized. Is there any such implementation available, even if it is
not particularly efficient?
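
(The reason it parallelises so well: the model is one independent classifier
per target word, so training is embarrassingly parallel over the target
vocabulary. A rough sketch only, with a placeholder training step and
invented names:)

from multiprocessing import Pool

def train_classifier_for_target_word(args):
    """Placeholder: fit one per-target-word classifier from (source bag, label)
    examples; in practice this would be a regularised log-linear model."""
    target_word, examples = args
    # ... fit weights for this word from its positive/negative examples ...
    return target_word, {"weights": {}, "bias": 0.0}

def train_global_lexicon(examples_by_target_word, workers=8):
    """Train all per-word classifiers in parallel; they are independent."""
    with Pool(workers) as pool:
        return dict(pool.map(train_classifier_for_target_word,
                             examples_by_target_word.items()))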

Cheers, O.

Philipp Koehn wrote:
> Hi,
> 
> I am not familiar with that, but somewhat related is
> Arne Mauser's global lexical model, which also exists
> as a secret feature in Moses (secret because no
> efficient training procedure exists):
> 
> Citation:
> A. Mauser, S. Hasan, and H. Ney. Extending Statistical Machine
> Translation with Discriminative and Trigger-Based Lexicon Models. In
> Conference on Empirical Methods in Natural Language Processing
> (EMNLP), Singapore, August 2009.
> http://www-i6.informatik.rwth-aachen.de/publications/download/628/MauserArneHasanSav%7Bs%7DaNeyHermann--ExtendingStatisticalMachineTranslationwithDiscriminativeTrigger-BasedLexiconModels--2009.pdf
> 
> -phi
> 
> 
> On Fri, Oct 22, 2010 at 7:02 PM, Francis Tyers  wrote:
>> Hello all,
>>
>> I have a rather strange request. Does anyone know of any papers (or
>> implementations) on bag-of-words language models? That is, a language
>> model which does not take into account the order in which the words
>> appear in an n-gram, so if you have the string 'police chief of' in your
>> model, you will get a result for both 'chief of police' and 'police
>> chief of'. I have thought of using IRSTLM or some generic model and
>> scoring all the permutations, but wondered if there was a more efficient
>> implementation already in existence. I have searched without much luck
>> on Google, but perhaps I am searching with the wrong words.
>>
>> Best regards,
>>
>> Fran
>>


[Moses-support] Moses use by translation industry

2010-10-25 Thread Philipp Koehn
Hi,

not a bug, but a feature:

TDA Members doing business with Moses

The translation industry is steadily appropriating the Moses
translation engine, an open source system available as a kit on the
web. At the TAUS User Conference 2010 in Portland (Oregon) TDA members
from major corporations and service vendors gave first-time accounts
to the community of their experience with this MT engine. Here are the
highlights.
http://www.tausdata.org/blog/2010/10/doing-business-with-moses-open-source-translation/

-phi


Re: [Moses-support] bag of words language model

2010-10-25 Thread Philipp Koehn
Hi,

I added the training script and some documentation:
http://www.statmt.org/mosesdev/?n=Moses.AdvancedFeatures#ntoc25

Let me know if this actually works.

-phi

On Mon, Oct 25, 2010 at 1:15 PM, Ondrej Bojar  wrote:
> Hi, Philipp,
>
> I was wondering what that secret model was... Is there any brief
> documentation of what the Moses code expects to load for this model?
>
> The training of this discriminative word lexicon can be heavily
> parallelized. Is there any such implementation available, even if it is
> not particularly efficient?
>
> Cheers, O.
>
> Philipp Koehn wrote:
>> Hi,
>>
>> I am not familiar with that, but somewhat related is
>> Arne Mauser's global lexical model, which also exists
>> as a secret feature in Moses (secret because no
>> efficient training procedure exists):
>>
>> Citation:
>> A. Mauser, S. Hasan, and H. Ney. Extending Statistical Machine
>> Translation with Discriminative and Trigger-Based Lexicon Models. In
>> Conference on Empirical Methods in Natural Language Processing
>> (EMNLP), Singapore, August 2009.
>> http://www-i6.informatik.rwth-aachen.de/publications/download/628/MauserArneHasanSav%7Bs%7DaNeyHermann--ExtendingStatisticalMachineTranslationwithDiscriminativeTrigger-BasedLexiconModels--2009.pdf
>>
>> -phi
>>
>>
>> On Fri, Oct 22, 2010 at 7:02 PM, Francis Tyers  wrote:
>>> Hello all,
>>>
>>> I have a rather strange request. Does anyone know of any papers (or
>>> implementations) on bag-of-words language models? That is, a language
>>> model which does not take into account the order in which the words
>>> appear in an n-gram, so if you have the string 'police chief of' in your
>>> model, you will get a result for both 'chief of police' and 'police
>>> chief of'. I have thought of using IRSTLM or some generic model and
>>> scoring all the permutations, but wondered if there was a more efficient
>>> implementation already in existence. I have searched without much luck
>>> on Google, but perhaps I am searching with the wrong words.
>>>
>>> Best regards,
>>>
>>> Fran
>>>


[Moses-support] train-truecaser.perl proposed tweak

2010-10-25 Thread Ben Gottesman
Hi,

Are truecase models still widely in use?

I have a proposal for a tweak to the train-truecaser.perl script.

Currently, we don't take the first token of a sentence as evidence for the
true casing of that type, on the basis that the first word of a sentence is
always capitalized.  The first token of a segment is always assumed to be
the first word of a sentence, and thus is never taken as casing evidence.

However, if a given segment is only one token long, then the segment is
probably not a sentence, and the token is quite possibly in its natural
case.  So my proposal is to take the sole token of one-token segments as
evidence for true casing.

I attach the code change.

Any objections?  If not, I'll check it in.

Ben


train-truecaser.perl
Description: Binary data
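
The attached Perl patch is not reproduced in the archive. Purely as an
illustration of the proposed counting logic (a Python sketch with invented
names, not the actual change to train-truecaser.perl):

from collections import defaultdict

# casing evidence: counts[lowercased type][observed surface form]
counts = defaultdict(lambda: defaultdict(int))

def add_casing_evidence(tokens, count_single_token_segments=True):
    """Collect truecasing evidence from one segment.

    The first token is normally skipped because sentence-initial words are
    capitalised regardless of their natural case.  The proposed tweak: if the
    segment is exactly one token long, it is probably not a sentence, so that
    lone token is counted after all.
    """
    if len(tokens) == 1 and count_single_token_segments:
        start = 0  # proposed behaviour: use the lone token as evidence
    else:
        start = 1  # current behaviour: skip the sentence-initial token
    for tok in tokens[start:]:
        counts[tok.lower()][tok] += 1

add_casing_evidence(["The", "house", "is", "red", "."])  # "The" is ignored
add_casing_evidence(["iPhone"])                          # counted under the tweak
print({k: dict(v) for k, v in counts.items()})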


Re: [Moses-support] train-truecaser.perl proposed tweak

2010-10-25 Thread Miles Osborne
this sounds risky to me.  it would be better to allow the user to
specify the behaviour; for your suggestion, you would add an extra
flag which would enable this.  the default would be for truecasing to
operate as it used to.
Miles

On 25 October 2010 17:37, Ben Gottesman  wrote:
> Hi,
>
> Are truecase models still widely in use?
>
> I have a proposal for a tweak to the train-truecaser.perl script.
>
> Currently, we don't take the first token of a sentence as evidence for the
> true casing of that type, on the basis that the first word of a sentence is
> always capitalized.  The first token of a segment is always assumed to be
> the first word of a sentence, and thus is never taken as casing evidence.
>
> However, if a given segment is only one token long, then the segment is
> probably not a sentence, and the token is quite possibly in its natural
> case.  So my proposal is to take the sole token of one-token segments as
> evidence for true casing.
>
> I attach the code change.
>
> Any objections?  If not, I'll check it in.
>
> Ben
>





Re: [Moses-support] train-truecaser.perl proposed tweak

2010-10-25 Thread Philipp Koehn
Hi,

Sounds reasonable to me, but it would be good to have this as an option, as
Miles suggested.

-phi

On 25 Oct 2010 17:40, "Ben Gottesman"  wrote:
> Hi,
>
> Are truecase models still widely in use?
>
> I have a proposal for a tweak to the train-truecaser.perl script.
>
> Currently, we don't take the first token of a sentence as evidence for the
> true casing of that type, on the basis that the first word of a sentence is
> always capitalized. The first token of a segment is always assumed to be
> the first word of a sentence, and thus is never taken as casing evidence.
>
> However, if a given segment is only one token long, then the segment is
> probably not a sentence, and the token is quite possibly in its natural
> case. So my proposal is to take the sole token of one-token segments as
> evidence for true casing.
>
> I attach the code change.
>
> Any objections? If not, I'll check it in.
>
> Ben