Re: [Moses-support] Adding Dictionary

2017-07-07 Thread Matthias Huck
Hi,

On Fri, 2017-07-07 at 17:47 +0530, Sanjanashree Palanivel wrote:
> Hi,
> 
> I have some doubts. While creating the phrase table, should I give all
> scores as 1? In the case of ambiguous words, how should I give the scores?

You can provide word alignments for the dictionary entries and then run
the dictionary through the Moses phrase extraction pipeline, just like
any other corpus. It should calculate all the scores for you.
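For what it's worth, the reason no hand-made scores are needed is that the phrase scorer derives them as relative frequencies over the extracted phrase pairs, so an ambiguous entry simply ends up with split probability mass. A toy illustration in plain Python (not the actual Moses scorer, which also computes inverse probabilities and lexical weights):

```python
from collections import Counter

# Toy extracted phrase pairs (source, target); an ambiguous source
# word simply occurs with several different targets.
pairs = [("bank", "bank"), ("bank", "riverbank"), ("bank", "bank"),
         ("house", "house")]

counts = Counter(pairs)
src_totals = Counter(src for src, _ in pairs)

# p(target | source) by relative frequency, as the phrase scorer would do.
p = {(s, t): c / src_totals[s] for (s, t), c in counts.items()}
print(p[("bank", "bank")])       # p(bank|bank) = 2/3
print(p[("bank", "riverbank")])  # p(riverbank|bank) = 1/3
```

So a dictionary entry that is unambiguous gets probability 1, and ambiguous entries split the mass automatically.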

> Can I train the glossary independently and add the path of its
> phrase table to moses.ini?

Basically, yes. You can add a second phrase table. Or use a script to
combine multiple phrase tables into one, such as: 
contrib/combine-ptables/combine-ptables.pl

> Should I train the glossary alongside the parallel data only?
> 
>    Please guide me as to which way is better.

You'll typically have to investigate on your own what's working best
for your use case.

Also, I'm sure that there would be a couple of other ways of harnessing
a dictionary in Moses.

Cheers,
Matthias



> > On Fri, Jul 7, 2017 at 4:26 PM, Matthias Huck  wrote:
> 
> > 
> > Hi,
> > 
> > A simple solution would be to just append your dictionary to the
> > parallel training data. Or create a second phrase table from the
> > dictionary and do phrase table fillup or something similar.
> > 
> > Cheers,
> > Matthias
> > 
> > 
> > On Fri, 2017-07-07 at 15:02 +0530, Sanjanashree Palanivel wrote:
> > > 
> > > HI all,
> > > 
> > > What are the possibilities for adding a dictionary to Moses? It would be
> > > great if I could get an earnest reply.
> > > 
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Adding Dictionary

2017-07-07 Thread Matthias Huck
Hi,

A simple solution would be to just append your dictionary to the
parallel training data. Or create a second phrase table from the
dictionary and do phrase table fillup or something similar.

Cheers,
Matthias


On Fri, 2017-07-07 at 15:02 +0530, Sanjanashree Palanivel wrote:
> HI all,
> 
> What are the possibilities for adding a dictionary to Moses? It would be great
> if I could get an earnest reply.
> 


Re: [Moses-support] Advanced Topics documentation

2017-07-06 Thread Matthias Huck
Hi,

Philipp Koehn's textbook is a nice introduction to SMT: 
http://www.cambridge.org/catalogue/catalogue.asp?isbn=0521874157
http://www.statmt.org/book/

For advanced topics, it's best to read the primary literature (i.e.,
research papers published in conference proceedings and scientific
journals).

Cheers,
Matthias


On Thu, 2017-07-06 at 02:59 +0530, Sasi Kiran Patha wrote:
> hi Team,
> 
> Can you please suggest any book on the market for understanding the
> concepts behind implementing advanced topics like incremental learning
> and the dictionary model?
> 
> Regards,
> Sasi Kiran P.
> 
> > On Sat, Jul 1, 2017 at 9:30 PM,  wrote:
> 
> > 
> > 
> > Today's Topics:
> > 
> >    1. Advanced Topics documentation (Sasi Kiran Patha)
> >    2. Working of moses2 (Pritesh Ranjan)
> > 
> > 
> > --
> > 
> > Message: 1
> > Date: Sat, 1 Jul 2017 12:57:15 +0530
> > > > From: Sasi Kiran Patha 
> > Subject: [Moses-support] Advanced Topics documentation
> > To: moses-support@mit.edu
> > Content-Type: text/plain; charset="utf-8"
> > 
> > Hi,
> > 
> > The advanced topics implementations look too concise to me in the
> > documentation on the website. I may not understand them until I go
> > through the code. Can you please say whether there is any book with more
> > documentation on topics like syntax models, incremental learning, and
> > the dictionary?
> > 
> > Regards,
> > Sasi Kiran P
> > 
> > --
> > 
> > Message: 2
> > Date: Sat, 1 Jul 2017 14:24:48 +0530
> > > > From: Pritesh Ranjan 
> > Subject: [Moses-support] Working of moses2
> > To: moses-support@mit.edu
> > Content-Type: text/plain; charset="utf-8"
> > 
> > Dear Sir,
> > 
> > I was decoding various test sets using both moses and moses2, but I found
> > some odd results. I got a better BLEU score when decoding a test set
> > without tuning the translation model, but a really poor BLEU score when
> > using moses.ini after tuning, which is the opposite of what should happen.
> > What is the reason behind this?
> > 
> > My other question is that I am trying to implement a different search and
> > pruning algorithm (like alpha-beta pruning; right now moses2 uses cube
> > pruning). I understand the workings of moses2, but I do not understand
> > how I should change the code in moses2 to implement this.
> > 
> > It would be very helpful if someone could throw light in this direction.
> > Thank you for your time.
> > 
> > Thanks and Regards
> > 
> > -Pritesh Ranjan
> > 
> > --
> > 
> > 
> > 
> > End of Moses-support Digest, Vol 129, Issue 1
> > *
> > 


Re: [Moses-support] Adding new aligned phrases to the existing phrase table

2017-04-11 Thread Matthias Huck
Hi,

It might be better to do phrase table fill-up.

You would add entries from a second phrase table ("background phrase
table") to your first phrase table ("foreground phrase table") only if
they're not present yet. You end up with a single table without
duplicates. Added background phrases can be distinguished from
foreground phrases via a binary feature.

There's a tool for this in Moses: 

contrib/combine-ptables/combine-ptables.pl 

(Use `--mode fillup` for what I've described above. The script provides
further functionality which you can also try if you want.)

Cf.

Arianna Bisazza, Nick Ruiz, and Marcello Federico. 2011. Fill-up versus
Interpolation Methods for Phrase-based SMT Adaptation. In Proc. of the 
Int. Workshop on Spoken Language Translation (IWSLT), pages 136–143,
San Francisco, CA, USA, December.

or

Jan Niehues and Alex Waibel. 2012. Detailed Analysis of Different
Strategies for Phrase Table Adaptation in SMT. In Proc. of the Conf. of
the Assoc. for Machine Translation in the Americas (AMTA), San Diego,
CA, USA, October/November.
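The fill-up logic itself is easy to sketch: background entries are added only when the source/target pair is absent from the foreground table, and each added entry carries a provenance flag. A toy Python sketch (not the actual combine-ptables.pl implementation; the real script's feature value convention for the log-linear model may differ):

```python
def fillup(foreground, background):
    # Phrase tables as dicts: (src, tgt) -> list of feature values.
    # Foreground entries always win; background entries are only added
    # when missing, and are marked by a binary provenance feature.
    merged = {k: feats + [0.0] for k, feats in foreground.items()}
    for k, feats in background.items():
        if k not in merged:            # foreground wins on duplicates
            merged[k] = feats + [1.0]  # flag: came from the background table
    return merged

fg = {("maison", "house"): [0.9]}
bg = {("maison", "home"): [0.5], ("maison", "house"): [0.2]}
merged = fillup(fg, bg)
print(len(merged))  # 2: the duplicate ("maison", "house") is not overwritten
```

The single merged table then behaves like the foreground table, with extra entries that the tuned weight of the provenance feature can promote or demote.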

Cheers,
Matthias


On Tue, 2017-04-11 at 09:48 +0100, Hieu Hoang wrote:
> from the webpage:
>   http://www.statmt.org/moses/?n=Advanced.Models#ntoc7
> Add 2 phrase-table to the [feature] section
> 
> 
> [feature]
>  PhraseDictionaryMemory path=/my-dir/table1 ...
>  PhraseDictionaryMemory path=/my-dir/table2 ...
> 
> Add an entry to the [mapping] section
>  [mapping]
>   0 T 0
>   1 T 1
> 
> Add weights to the [weight] section
> 
>  [weight]
>  PhraseDictionaryMemory0= 0 0 1 0
>  PhraseDictionaryMemory1= 0 0 1 0
> 
> You don't need to use PhraseDictionaryGroup
> 
> 
> 
> * Looking for MT/NLP opportunities *
> Hieu Hoang
> http://moses-smt.org/
> 
> 
> On 9 April 2017 at 05:21, sriram  wrote:
> 
> > Hi Hieu,
> > 
> > Thanks for the suggestion.
> > 
> > In regard to point 2, how can I use multiple phrase tables inside Moses?
> > 
> > 
> > Regards,
> > Sriram
> > 
> > On Fri, Apr 7, 2017 at 5:44 PM, Hieu Hoang  wrote:
> > 
> > > there are no tools to do this, but you can write one yourself. You need to
> > > make up some scores to give each phrase.
> > > 
> > > The other methods to use your phrases are:
> > >1. Add it to the training data and retrain your model.
> > >2. Create a 2nd phrase-table with just your phrases and get the
> > > decoder to use it, in addition to the existing phrase-table
> > > 
> > > * Looking for MT/NLP opportunities *
> > > Hieu Hoang
> > > http://moses-smt.org/
> > > 
> > > 
> > > On 6 April 2017 at 19:14, sriram  wrote:
> > > 
> > > > Hi,
> > > > 
> > > > I have a collection of well-aligned phrases and I want to add them to the
> > > > existing phrase table. Is there an existing tool in Moses to do
> > > > this?
> > > > 
> > > > Thanks,
> > > > Sriram
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
> > --
> > 
> > Open source English-Hindi MT system
> > http://anusaaraka.iiit.ac.in/
> > 
> > 


Re: [Moses-support] Select sentences that maximize BLEU from n-best list

2017-03-28 Thread Matthias Huck
Hi Marcin,

If a sentence-level BLEU does the job for you (rather than corpus
-level), then check out the `sentence-bleu-nbest` tool in Moses. This
tool worked for me a couple of months ago, and I hope that nobody broke
it in the meantime.

Once you have sentence-level BLEU scores for all the n-best list
entries, you can just sort according to those values. 

It may well be that somebody has a better n-best oracle BLEU tool lying
around, though.
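Once every n-best entry has a sentence-level BLEU score, oracle selection is just a per-source-sentence argmax. A sketch in plain Python, with entries reduced to (source_id, hypothesis, bleu) tuples (the actual Moses n-best format carries more fields):

```python
def oracle_per_sentence(nbest):
    # nbest: iterable of (source_id, hypothesis, sentence_bleu).
    # Keep the highest-scoring hypothesis per source sentence.
    best = {}
    for sid, hyp, bleu in nbest:
        if sid not in best or bleu > best[sid][1]:
            best[sid] = (hyp, bleu)
    return {sid: hyp for sid, (hyp, _) in best.items()}

nbest = [(0, "the house", 0.41), (0, "a house", 0.55),
         (1, "green witch", 0.70), (1, "green sorceress", 0.35)]
print(oracle_per_sentence(nbest))  # {0: 'a house', 1: 'green witch'}
```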

Cheers,
Matthias


On Tue, 2017-03-28 at 10:19 +0200, Marcin Junczys-Dowmunt wrote:
> Hi list,
> 
> does anyone have a tool that takes a moses-format n-best list and can 
> output the single best sentence per source sentence according to BLEU 
> and a given reference? Or anything that can be shoehorned into something 
> like that?
> 
> Thanks,
> 
> Marcin
> 


Re: [Moses-support] Output files of mgiza

2017-02-13 Thread Matthias Huck
Hi,

mgiza can be configured to write a Model 1 file to disk. 
Use the configuration option "model1dumpfrequency".

https://web.archive.org/web/20150919195919/http://www.kyloo.net/software/doku.php/mgiza:configure
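In the mgiza configuration file this is a single line; the value is the dump interval in iterations (shown here as every iteration; check the linked page for the exact semantics):

```
model1dumpfrequency 1
```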

Cheers,
Matthias


On Mon, 2017-02-13 at 16:50 +, Hieu Hoang wrote:
> the slide refers to giza++, not mgiza. I didn't know there was a substantial
> difference between them, but then I could be wrong.
> 
> code for giza++ is here if you want to try it
>https://github.com/moses-smt/giza-pp
> 
> Hieu Hoang
> http://moses-smt.org/
> 
> On 13 February 2017 at 16:46, Tom McCoy  wrote:
> 
> > Thanks for the quick reply! What I mostly want is the translation
> > probabilities that are learned for IBM Model 1, e.g. the probability that
> > *hello* will be aligned with *bonjour*, i.e. t(*hello*|*bonjour*). In Giza++, at
> > least, these probabilities are given in output files called *ti.final or
> > *ti.actual.final. These files are discussed on slide 30 of this presentation:
> > https://www.cse.iitb.ac.in/~anoopk/publications/presentations/moses_giza_intro.pdf
> > Is there any way to access these files in mgiza?
> > 
> > On Mon, Feb 13, 2017 at 6:41 AM, Hieu Hoang  wrote:
> > 
> > > I'm not sure if there's any such thing as 'final' or 'actual.final'. Your
> > > output seems to match the output from a typical run
> > > 
> > > http://www.statmt.org/moses/RELEASE-3.0/models/fr-en/training/giza.1/
> > > 
> > > The file 'A3.final' is used by the subsequent symmetrization step
> > > 
> > > On 11/02/2017 19:42, Tom McCoy wrote:
> > > 
> > > Hi,
> > > 
> > > I am using mgiza on Mac OSX. It runs without errors and gives the
> > > following output files:
> > > 
> > >- 117-02-11.134300.tommccoy.A3.final.part000
> > >- 117-02-11.134300.tommccoy.A3.final.part001
> > >- 117-02-11.134300.tommccoy.A3.final.part002
> > >- 117-02-11.134300.tommccoy.A3.final.part003
> > >- 117-02-11.134300.tommccoy.Decoder.config
> > >- 117-02-11.134300.tommccoy.a3.final
> > >- 117-02-11.134300.tommccoy.d3.final
> > >- 117-02-11.134300.tommccoy.d4.final
> > >- 117-02-11.134300.tommccoy.gizacfg
> > >- 117-02-11.134300.tommccoy.n3.final
> > >- 117-02-11.134300.tommccoy.p0_3.final
> > >- 117-02-11.134300.tommccoy.perpt3.final
> > > 
> > > However, this is missing the *.ti.final and *.actual.ti.final output
> > > files, both of which I need. Any thoughts on how to get these files to be
> > > outputted?
> > > 
> > > Thanks!
> > > Tom McCoy
> > > 
> > > 
> > > 
> > > 
> > > --
> > > Hieu Hoang
> > > http://moses-smt.org/
> > > 
> > > 
> > 


Re: [Moses-support] too few factors error in mert

2016-12-06 Thread Matthias Huck
Hi,

Maybe your moses.ini lets the decoder expect five input factors, whereas
there are only four present in the data?

I see this in your log file:

input-factors: 0 1 2 3 4
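If the data really carries four factors (numbered 0 to 3), the input factor specification in moses.ini should list exactly those, along these lines:

```ini
[input-factors]
0
1
2
3
```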

Cheers,
Matthias


On Tue, 2016-12-06 at 11:18 +0200, Hasan Sait ARSLAN wrote:
> Hi,
> 
> I have a factored dataset. It involves 4 factors,
> factor1|factor2|factor3|factor4. I have trained my model with such a
> dataset.
> 
> Now when I want to tune my model, I encounter with the following error:
> 
> 
> 
> 
> *Exception: moses/Word.cpp:159 in void
> Moses::Word::CreateFromString(Moses::FactorDirection, const
> std::vector&, const StringPiece&, bool, bool) threw
> util::Exception because `!isNonTerminal && i < factorOrder.size()'.Too few
> factors in string '-|-|Punc|Punc*
> The details of the error is in mert.txt file, which is attached to this
> e-mail.
> 
> Thanks,
> 
> Kind Regards,
> Hasan Sait Arslan


Re: [Moses-support] Placeholder settings for tune

2016-08-23 Thread Matthias Huck
Hi,

In the EMS configuration file, you can specify

decoder-settings = "..."

under both [TUNING] and [EVALUATION]. Maybe that's all you need?
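For example, to pass the placeholder flags from this thread to the decoder in both steps (a sketch; adjust the flags to your setup):

```ini
[TUNING]
decoder-settings = "-placeholder-factor 1 -xml-input exclusive"

[EVALUATION]
decoder-settings = "-placeholder-factor 1 -xml-input exclusive"
```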

Cheers,
Matthias


On Tue, 2016-08-23 at 00:40 +0100, Hieu Hoang wrote:
> not really sure what you mean. You shouldn't have to dig around in
> mert-moses.pl.
> 
> It's fairly straightforward, but I would say that placeholders are only
> used by some people. If you find a suspected bug, report it or try & fix
> it yourself.
> 
> 
> 
> Hieu Hoang
> http://www.hoang.co.uk/hieu
> 
> On 23 August 2016 at 00:22, Mike Ladwig  wrote:
> 
> > Hi Hieu!
> > 
> > Thanks for the reply.
> > 
> > On Mon, Aug 22, 2016 at 6:26 PM, Hieu Hoang 
> > wrote:
> > 
> > > hi mike
> > > 
> > > > 
> > > > 1. If I add "-placeholder-factor 1 -xml-input exclusive" to the
> > > > --decoder-flags parameter of mert-moses.pl, will that single
> > > > addition
> > > > extend into both places mentioned in the EMS section?
> > > > 
> > > No, if you want to use placeholders in tuning and during testing, you
> > > need to put them in both places. mert-moses.pl only does tuning.
> > > 
> > 
> > I think I am confused because there are two [EVALUATION] sections on the
> > page. The first is in section 3 (Tuning) and the second is in section 4
> > (Evaluation).
> > 
> > Is the section 3 Evaluation block an error, or do I need to go into
> > the
> > guts of mert-moses.pl to add flags to the decoder code invoked
> > inside
> > mert-moses.pl?
> > 
> > mike.
> > 


Re: [Moses-support] tuning not working properly in factored model

2016-04-28 Thread Matthias Huck
Hi Carlos,

Have you tried switching off MBR decoding during tuning? (Run Moses
without the -mbr parameter.) The exception it throws seems to suggest
that MBR doesn't work with more than a single output factor.

Cheers,
Matthias

On Thu, 2016-04-28 at 21:09 +0200, Carlos Escolano wrote:
> Hi,
> 
> Thank you for your answer
> 
> I've tried setting the output factors in the moses.ini before and
> mert-moses.pl throws the following error:
> 
> Loading table into memory...done.
> terminate called after throwing an instance of 'util::Exception'
>   what():  moses/mbr.cpp:112 in const Moses::TrellisPath doMBR(const
> Moses::TrellisPathList&, const Moses::AllOptions&) threw
> util::Exception
> because `oFactors.size() != 1'.
> Need exactly one output factor!
> 
> Using the moses.ini without tuning all factors are generated, It's
> during
> the tuning process that only the forms appear.
> 
> Best regards,
> 
> Carlos
> 
> 2016-04-28 20:14 GMT+02:00 Matthias Huck :
> 
> > Hi,
> > 
> > Moses can be configured to output the target-side factors of your
> > choice.
> > Add something like this to your moses.ini:
> > 
> > [output-factors]
> > 0
> > 1
> > 2
> > 
> > Cheers,
> > Matthias
> > 
> > 
> > On Thu, 2016-04-28 at 18:16 +0200, Carlos Escolano wrote:
> > > Hi,
> > > 
> > > Thank you for your answer.
> > > 
> > > You are right. While the phrase table has all three factors in
> > > the
> > > run.X.best.out only the form appears.
> > > 
> > > I'll check why this is happening.
> > > 
> > > Best Regards,
> > > 
> > > Carlos
> > > 
> > > 
> > > 
> > > 2016-04-28 8:46 GMT+02:00 Ondrej Bojar :
> > > 
> > > > Dear Carlos,
> > > > 
> > > > My frequent mistake in this respect is the match of factor
> > representation
> > > > in run.X.best.out and the reference sentences.
> > > > 
> > > > Technically, both is possible: evaluating only the first factor
> > > > (form)
> > or
> > > > all factors of each token. BLEU does not care. Mismatch will
> > > > cause
> > terribly
> > > > low scores.
> > > > 
> > > > O.
> > > > 
> > > > 
> > > > On April 27, 2016 9:48:50 PM CEST, Carlos Escolano <
> > carlos.e@gmail.com>
> > > > wrote:
> > > > > Hi,
> > > > > 
> > > > > I trained a Chinese-to-Spanish unfactored model and all worked
> > > > > perfectly.
> > > > > But when I try to train a factored model for the same task I
> > > > > have
> > some
> > > > > trouble while tuning. The factors I'm using are only words
> > > > > for
> > chinese
> > > > > and
> > > > > words, lemmas and POS tags for spanish.
> > > > > 
> > > > > Training seems to finish correctly and the phrase table shows all the
> > > > > factors, but when tuning it only does 2 runs and prints a message
> > > > > saying that the weights have not changed in the last run, leaving the
> > > > > original weights. Also, when translating, the BLEU obtained is worse
> > > > > than that obtained with the unfactored model.
> > > > > 
> > > > > 
> > > > > These are my calls for training and tuning the model:
> > > > > 
> > > > > $SCRIPTS_ROOTDIR/training/train-model.perl \
> > > > >-external-bin-dir $GIZA_DIR/mgiza-bin -mgiza \
> > > > >--corpus $WORKING_DIR/train/train \
> > > > >--alignment grow-diag-final-and \
> > > > >--score-options '--GoodTuring' \
> > > > >--root-dir $WORKING_DIR/baseline/ \
> > > > >--f zh --e es \
> > > > >--lm 0:5:$WORKING_DIR/baseline/lm/words.lm.es:0 \
> > > > >--translation-factors 0-0,1,2 \
> > > > >--reordering msd-bidirectional-fe \
> > > > >--reordering-factors 0-0 \
> > > > > 
> > > > > $MOSES_SCRIPTS/training/mert-moses.pl  \
> > > > >  $WORKING_DIR/dev/dev.zh \
> > > > > $WORKING_DIR/dev/dev.es \
> 

Re: [Moses-support] tuning not working properly in factored model

2016-04-28 Thread Matthias Huck
Hi,

Moses can be configured to output the target-side factors of your choice. 
Add something like this to your moses.ini:

[output-factors]
0
1
2

Cheers,
Matthias


On Thu, 2016-04-28 at 18:16 +0200, Carlos Escolano wrote:
> Hi,
> 
> Thank you for your answer.
> 
> You are right. While the phrase table has all three factors in the
> run.X.best.out only the form appears.
> 
> I'll check why this is happening.
> 
> Best Regards,
> 
> Carlos
> 
> 
> 
> 2016-04-28 8:46 GMT+02:00 Ondrej Bojar :
> 
> > Dear Carlos,
> > 
> > My frequent mistake in this respect is the match of factor representation
> > in run.X.best.out and the reference sentences.
> > 
> > Technically, both is possible: evaluating only the first factor (form) or
> > all factors of each token. BLEU does not care. Mismatch will cause terribly
> > low scores.
> > 
> > O.
> > 
> > 
> > On April 27, 2016 9:48:50 PM CEST, Carlos Escolano 
> > wrote:
> > > Hi,
> > > 
> > > I trained a Chinese-to-Spanish unfactored model and all worked
> > > perfectly.
> > > But when I try to train a factored model for the same task I have some
> > > trouble while tuning. The factors I'm using are only words for chinese
> > > and
> > > words, lemmas and POS tags for spanish.
> > > 
> > > Training seems to finish correctly and the phrase table shows all the
> > > factors, but when tuning it only does 2 runs and prints a message
> > > saying that the weights have not changed in the last run, leaving the
> > > original weights. Also, when translating, the BLEU obtained is worse
> > > than that obtained with the unfactored model.
> > > 
> > > 
> > > These are my calls for training and tuning the model:
> > > 
> > > $SCRIPTS_ROOTDIR/training/train-model.perl \
> > >-external-bin-dir $GIZA_DIR/mgiza-bin -mgiza \
> > >--corpus $WORKING_DIR/train/train \
> > >--alignment grow-diag-final-and \
> > >--score-options '--GoodTuring' \
> > >--root-dir $WORKING_DIR/baseline/ \
> > >--f zh --e es \
> > >--lm 0:5:$WORKING_DIR/baseline/lm/words.lm.es:0 \
> > >--translation-factors 0-0,1,2 \
> > >--reordering msd-bidirectional-fe \
> > >--reordering-factors 0-0 \
> > > 
> > > $MOSES_SCRIPTS/training/mert-moses.pl  \
> > >  $WORKING_DIR/dev/dev.zh \
> > > $WORKING_DIR/dev/dev.es \
> > 
> > > $MOSES_DIR/moses-cmd/bin/gcc-4.8.5/release/link-static/threading-multi/moses
> > > \
> > > $WORKING_DIR/baseline/model/moses.ini \
> > > --nbest 100 \
> > > --working-dir $WORKING_DIR/baseline/tuning/ \
> > > --decoder-flags "-drop-unknown -mbr -threads 24 -mp -v 0" \
> > > --rootdir $MOSES_SCRIPTS \
> > > --mertdir $MOSES_DIR/bin/ \
> > > -threads 24  \
> > > --filtercmd '/veu4/usuaris24/xtrans/mosesdecoder/scripts/training/
> > > filter-model-given-input.pl'
> > > 
> > 
> > > /veu4/usuaris24/smt/softlic/mosesdecoder/scripts//ems/support/reuse-weights.perl
> > > \
> > > $WORKING_DIR/baseline/tuning/moses.ini <
> > > $WORKING_DIR/baseline/model/moses.ini >
> > > $WORKING_DIR/baseline/tuning/moses.weight-reused.ini
> > > 
> > > 
> > > Best regards,
> > > 
> > > Carlos
> > > 
> > > 
> > > 
> > > 
> > 
> > --
> > Ondrej Bojar (mailto:o...@cuni.cz / bo...@ufal.mff.cuni.cz)
> > http://www.cuni.cz/~obo
> > 


Re: [Moses-support] Compiling with ./bjam problem

2016-03-11 Thread Matthias Huck
Hi Despina,

It seems to me that bjam doesn't use the boost build in your home
directory, but some other boost version installed on the system.

Maybe you should try

./bjam --with-boost=/home/despina/boost_1_55_0 -j4 -a

Cheers,
Matthias


On Fri, 2016-03-11 at 16:29 +, Hieu Hoang wrote:
> compiled ok for me:
> ./bjam --with-boost=/Users/hieu/workspace/boost/boost_1_60_0 -j4 -a
> Try git pull to get the latest code and add the -a to your bjam command 
> to forced recompilation of all files
> 
> On 11/03/2016 13:07, Despina Mouratidi wrote:
> > Hello again, I am trying to compile Moses through ./bjam as follows:
> >
> > ./bjam --with-boost=~/boost_1_55_0 -j7
> >
> > and it gives an error.
> >
> > Best, Despina
> >
> >



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [Moses-support] RNNLM Integration?

2016-03-08 Thread Matthias Huck
Hi,

We once empirically compared two different recombination schemes in a
hierarchical phrase-based system (without any kind of neural network
language model):

Recombination T.  The T recombination scheme recombines derivations that
produce identical translations. (I.e., hypotheses with the same
translation but different phrase segmentation are recombined.)

Recombination LM.  The LM recombination scheme recombines derivations
with identical language model context. (This is what we'd usually do.)


Cf. the following publication:

M. Huck, D. Vilar, M. Freitag, and H. Ney. A Performance Study of Cube
Pruning for Large-Scale Hierarchical Machine Translation. In Proceedings
of the NAACL 7th Workshop on Syntax, Semantics and Structure in
Statistical Translation (SSST-7), pages 29-38, Atlanta, Georgia, USA,
June 2013.
http://aclweb.org/anthology//W/W13/W13-0804.pdf


You'll find a couple of statistics and performance plots in the paper
for Chinese-to-English and Arabic-to-English translation tasks. This was
all done with the Jane SMT toolkit. 
http://www.hltpr.rwth-aachen.de/jane/
Note that the cube pruning k-best generation limit was applied after
recombination for the experiments in the paper.


I tend to think that you could do the "Recombination T" scheme if the
benefit of an RNNLM in terms of translation quality justifies it. There
might be pitfalls, e.g. regarding pruning settings and tuning on n-best
lists. Default Moses settings won't necessarily be a good choice.
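The difference between the two schemes comes down to the state key under which hypotheses are considered equivalent. A toy sketch (plain Python, not Jane or Moses code) for an n-gram language model of a given order:

```python
def key_T(hyp_words):
    # Recombination T: the full translation is the recombination state,
    # so only hypotheses with identical output strings are merged.
    return tuple(hyp_words)

def key_LM(hyp_words, order=3):
    # Recombination LM: only the last (order-1) words matter, because no
    # future n-gram can look further back than that into the hypothesis.
    return tuple(hyp_words[-(order - 1):])

a = "the green witch at home".split()
b = "green witch is at home".split()
print(key_T(a) == key_T(b))    # False: different translations
print(key_LM(a) == key_LM(b))  # True: same trigram context "at home"
```

With the LM key, the two hypotheses above would be recombined and only the higher-scoring one kept; with the T key, both survive, which costs search effort but preserves distinct translations in the n-best list.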

Cheers,
Matthias


On Mon, 2016-03-07 at 15:41 -0600, Lane Schwartz wrote:
> Philipp,
> 
> Are you aware of any published work examining the importance of hypothesis
> recombination in terms of time/space/quality tradeoffs?
> 
> Lane
> 
> 
> On Mon, Mar 7, 2016 at 3:19 PM, Philipp Koehn  wrote:
> 
> > Hi,
> >
> > integrating this into the decoder will break all hypothesis recombination,
> > so it may be better (and definitely easier) to use the RNNLM to rerank
> > n-best lists.
> >
> > -phi
> >
> > On Mon, Mar 7, 2016 at 3:46 PM, Jake Ballinger 
> > wrote:
> >
> >> Hello everyone,
> >>
> >> Has anyone used an RNNLM language model instead of one of the recommended
> >> language models? I was specifically looking at the RNNLM toolkit provided
> >> by Tomas Mikolov at http://rnnlm.org/.
> >>
> >> Thank you!
> >>
> >> --
> >> Jake Ballinger
> >> Major: Computer Science
> >> Minors: Chinese, French, Spanish, & Math
> >> 443-974-6184
> >> balling...@allegheny.edu
> >> Box 582
> >>
> >>
> >>
> >
> >
> >
> 
> 





Re: [Moses-support] Segmentation Fault

2016-02-22 Thread Matthias Huck
Hi,

Which object doesn't exist? 

You can just protect access to your cache container with mutexes. 
I believe the Model1Feature does something similar.
https://github.com/moses-smt/mosesdecoder/blob/master/moses/FF/Model1Feature.cpp

There might be more beautiful solutions, though. Independent
thread-specific caches would be useful.
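The pattern described here, one shared cache guarded by a mutex, looks like this in outline (a Python sketch of the idea only; the Moses feature function itself is C++ and would use boost or std mutexes as in Model1Feature.cpp):

```python
import threading

class SharedCache:
    # A shared map protected by a mutex so that concurrent decoder
    # threads can read and update it safely.
    def __init__(self):
        self._lock = threading.Lock()
        self._map = {}

    def get_or_compute(self, key, compute):
        # Acquire the lock for the whole lookup-or-insert, so two
        # threads never race to compute the same entry.
        with self._lock:
            if key not in self._map:
                self._map[key] = compute(key)
            return self._map[key]

cache = SharedCache()
print(cache.get_or_compute("ngram", len))  # computed once: 5
print(cache.get_or_compute("ngram", len))  # served from the cache: 5
```

The trade-off is lock contention: every decoder thread serializes on the one mutex, which is why independent thread-specific caches can be preferable.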

Cheers,
Matthias


On Sun, 2016-02-21 at 20:23 -0800, Jasneet Sabharwal wrote:
> Is it possible to cache some data when decoding a source sentence? I
> was trying to use boost's thread_specific_ptr to cache a map which I
> want to update in my evaluation function but when I try to access the
> map
> (https://github.com/KonceptGeek/mosesdecoder/blob/RELEASE-3.0-CombinedFeature-Caching/moses/FF/CoarseBiLM.cpp#L145-L154)
>  I get segmentation fault as the object doesn't exist. 
> 
> Is there any other way to do some caching?





Re: [Moses-support] Segmentation Fault

2016-02-20 Thread Matthias Huck
Hi Jasneet,

Why don't you use a proper profiling tool, e.g. the one in valgrind [1]?

If you visualize its output [2], you'll see quickly where the program
spends all the computing time.

Cheers,
Matthias


[1] http://valgrind.org/docs/manual/cl-manual.html
[2] https://github.com/jrfonseca/gprof2dot



On Sat, 2016-02-20 at 09:58 +, Hieu Hoang wrote:
> it's great that you've written a new feature function but you will
> have to debug it yourself. I suggest you put lots of debugging
> messages in your code to find out where the problem is.
> 
> Moses has the Timer class in /moses/Timer.h which you can use to help
> your debug your problem
> 
> Hieu Hoang
> http://www.hoang.co.uk/hieu
> 
> On 20 February 2016 at 04:20, Jasneet Sabharwal <
> jasneet.sabhar...@sfu.ca> wrote:
> > Hi Hieu,
> > 
> > Just to provide more info, I had compiled moses using the following
> > command: "./bjam -j8 -q --with-cmph=/cs/natlang-user/jasneet/softwares/cmph-2.0/
> > --with-boost=/cs/natlang-user/jasneet/softwares/boost/ --max-kenlm-order=8 -a
> > --with-mm --with-probing-pt".
> > 
> > Following are some more translation times from the logs using the
> > command: 
> > 
$ grep "Translation took" mert.log
> > 
> > Line 53: Translation took 9504.886 seconds total
> > Line 25: Translation took 16931.106 seconds total
> > Line 20: Translation took 17477.958 seconds total
> > Line 34: Translation took 18409.183 seconds total
> > Line 36: Translation took 20495.204 seconds total
> > Line 48: Translation took 16093.966 seconds total
> > Line 68: Translation took 4773.139 seconds total
> > Line 18: Translation took 22165.429 seconds total
> > Line 10: Translation took 23794.930 seconds total
> > Line 11: Translation took 26313.130 seconds total
> > Line 74: Translation took 6238.326 seconds total
> > Line 66: Translation took 14968.715 seconds total
> > Line 3: Translation took 28973.902 seconds total
> > Line 45: Translation took 27619.088 seconds total
> > Line 81: Translation took 4666.394 seconds total
> > Line 37: Translation took 36502.892 seconds total
> > Line 83: Translation took 3143.882 seconds total
> > Line 70: Translation took 20143.743 seconds total
> > Line 1: Translation took 38498.391 seconds total
> > Line 19: Translation took 39683.472 seconds total
> > Line 15: Translation took 39903.566 seconds total
> > Line 33: Translation took 40047.447 seconds total
> > 
> > The times are extremely high and I’m not really sure why it is
> > taking so much time.
> > 
> > Regards,
> > Jasneet
> > > On Feb 18, 2016, at 11:04 AM, Jasneet Sabharwal <
> > > jasneet.sabhar...@sfu.ca> wrote:
> > > 
> > > Hi,
> > > 
> > > I was able to solve the segmentation fault issue. It was because
> > > of OOVs. I’m currently trying to tune the parameters using mert,
> > > but it is running extremely slow. For example, from the logs: 
> > > 
> > > Translating: 美国 之 音 记者 伏 来 库 斯 从 布宜诺斯艾利斯 发 来 的 另 一 篇 报导 说 , 几 名
> > > 美国 国会 议员 星期二 把 这 一 争论 带 到 了 布宜诺斯艾利斯 的 会议 大厅 。
> > > Line 43: Initialize search took 0.007 seconds total
> > > Line 43: Collecting options took 0.191 seconds at
> > > moses/Manager.cpp:117
> > > Line 38: Search took 1092.075 seconds
> > > Line 38: Decision rule took 0.000 seconds total
> > > Line 38: Additional reporting took 0.041 seconds total
> > > Line 38: Translation took 1092.132 seconds total
> > > 
> > > I tried to time the functions in my feature function using
> > > clock_t but all of them show up as 0.000. I’m not sure why tuning
> > > is taking too much time. My moses.ini is attached in this email.
> > > 
> > > Any suggestions would be helpful.
> > > 
> > > Regards,
> > > Jasneet
> > > 
> > > 
> > > > On Feb 12, 2016, at 3:58 PM, Hieu Hoang 
> > > > wrote:
> > > > 
> > > > I think it's 
> > > >FeatureFunction::GetScoreProducerDescription()
> > > > 
> > > > On 12/02/16 23:56, Jasneet Sabharwal wrote:
> > > > >  Thanks, will give that a try. 
> > > > > 
> > > > > Also, is it possible to get the value of feature name inside
> > > > > the feature function. I’m specifically talking about “name”
> > > > > parameter in moses.ini. I’m running multiple copies of my
> > > > > feature function with different parameter as follows:
> > > > > CoarseBiLM name=CoarseBiLM tgtWordId...
> > > > > CoarseBiLM name=CoarseLM100 tgtWordId…
> > > > > CoarseBiLM name=CoarseLM1600 tgtWordId...
> > > > > CoarseBiLM name=CoarseBiLMWithoutClustering tgtWordId…
> > > > > 
> > > > > Thanks,
> > > > > Jasneet
> > > > > > On Feb 12, 2016, at 3:39 PM, Hieu Hoang <
> > > > > > hieuho...@gmail.com> wrote:
> > > > > > 
> > > > > > you can run the decoder
> > > > > >   ./moses -v 3
> > > > > > however, you should put debugging messages in your feature
> > > > > > functions to find out where the problem is. It looks like
> > > > > > its in the Load() method so add lots of debugging message
> > > > > > in there and all functions it calls
> > > > > > 
> > > > > > On 12/02/16 23:34, Jasneet Sabharwal wrote:
> > > > > > >  Thanks Hieu for your reply.

Re: [Moses-support] Segmentation Fault

2016-02-15 Thread Matthias Huck
Hi,

You can set a local verbosity level for your feature function, e.g.:

CoarseBiLM name=CoarseBiLM100 verbosity=<int-value>

If you use the macros FEATUREVERBOSE(level,str),
FEATUREVERBOSE2(level,str), or IFFEATUREVERBOSE(level) in your feature
function code, the verbose output will only be printed to cerr if
int-value >= level.

(Note that it only works once SetParameter() has been called. If you
need verbosity in the constructor, you need to resort to the normal
VERBOSE macro.)

Those macros are defined in moses/Util.h . They are fairly new. Not too
many feature functions make use of them yet. However, I'd highly
recommend them, since they allow you to suppress any global verbosity
and only print what you actually want to see.

FEATUREVERBOSE prints the name of the feature function in square
brackets with every verbose output, without you having to do that
yourself. FEATUREVERBOSE2 doesn't print the name of the feature
function. 
IFFEATUREVERBOSE starts a conditional block, which is helpful if the
verbosity code is a bit more complex.

Cheers,
Matthias


On Fri, 2016-02-12 at 23:39 +, Hieu Hoang wrote:
> you can run the decoder
>./moses -v 3
> however, you should put debugging messages in your feature functions to 
> find out where the problem is. It looks like its in the Load() method so 
> add lots of debugging message in there and all functions it calls
> 
> On 12/02/16 23:34, Jasneet Sabharwal wrote:
> > Thanks Hieu for your reply.
> >
> > Is it possible to do a verbose output of what’s happening, so that I 
> > can identify when it’s going out of memory? I’m only running it for 
> > 1928 sentences. I have almost 170gb of free memory and additional 
> > 400gb memory in buffer.
> >
> > Thanks,
> > Jasneet
> >
> >> On Feb 12, 2016, at 2:36 PM, Hieu Hoang wrote:
> >>
> >> looks like it's run out of memory.
> >>
> >> On 11/02/16 23:23, Jasneet Sabharwal wrote:
> >>> Hi,
> >>>
> >>> I was adding a new feature function in Moses 
> >>> (https://github.com/KonceptGeek/mosesdecoder/blob/master/moses/FF/CoarseBiLM.cpp).
> >>>  
> >>> It works fine when I test it for 1-2 sentences, but when I’m trying 
> >>> to tune my parameters, I’m getting segmentation faults or sometimes 
> >>> it is bad_alloc. Following was one of the commands that was executed 
> >>> during the tuning process which caused the Segmentation Fault or 
> >>> bad_alloc:
> >>>
> >>> moses -threads 40 -v 0 -config filtered/moses.ini -weight-overwrite 
> >>> 'CoarseLM100= 0.075758 LM0= 0.075758 CoarseBiLMNotClustered= 
> >>> 0.075758 WordPenalty0= -0.151515 PhrasePenalty0= 0.030303 
> >>> CoarseBiLMClustered= 0.075758 TranslationModel0= 0.030303 0.030303 
> >>> 0.030303 0.030303 Distortion0= 0.045455 CoarseLM1600= 0.075758 
> >>> LexicalReordering0= 0.045455 0.045455 0.045455 0.045455 0.045455 
> >>> 0.045455' -n-best-list run1.best100.out 100 distinct -input-file 
> >>> tune.word.lc.cn 
> >>>
> >>> The log is enclosed in this email.
> >>>
> >>> Any pointers would be very useful.
> >>>
> >>> Thanks,
> >>> Jasneet
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >> -- 
> >> Hieu Hoang
> >> http://www.hoang.co.uk/hieu
> >
> 



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [Moses-support] Segmentation fault on hierarchical model with moses in server mode

2016-01-29 Thread Matthias Huck

On Fri, 2016-01-29 at 21:26 +, Hieu Hoang wrote:
> The decoder should handle no translation without falling over. But
> yes, the model is too toy

Normally the decoder would always produce some translation. (The translation 
could be an empty sentence, of course.) If it's misconfigured, it should tell 
you about it. But maybe not with a segmentation fault. :-)


> On 29 Jan 2016 9:15 pm, "Matthias Huck"  wrote:
> > Hi,
> > 
> > It seems to me that this toy string-to-tree setup is either
> > outdated, or it always had issues. It should be replaced.
> > 
> > Under real-world conditions, the decoder should always be able to
> > produce some hypothesis. We would therefore usually extract a whole
> > set of glue rules. And we would typically also add an [unknown-lhs]
> > section to the moses.ini that would tell the decoder which left-hand
> > side non-terminal labels to use for out-of-vocabulary words. To my
> > knowledge, these two techniques are crucial for being able to parse
> > any input sentence provided to the chart decoder in syntax-based
> > translation.
> > 
> > So, in my opinion, the problem is most likely neither the server
> > implementation nor the syntax-based decoder, but a problematic setup.
> > I would consider it okay for the server to crash (or at least print a
> > warning) under such circumstances. You don't want it to silently not
> > translate complete sentences.
> > 
> > (I must admit that I didn't look into it in too much detail, but it
> > should be easy to confirm.)
> > 
> > Cheers,
> > Matthias
> > 
> > 
> > On Fri, 2016-01-29 at 20:28 +, Barry Haddow wrote:
> > > Hi All
> > >
> > > I think I see what happened now.
> > >
> > > When you give the input "dies ist ein haus" to the sample model,
> > > the "dies" is unknown, and there is no translation. The server did
> > > not check for this condition, and got a seg fault. I have added a
> > > check, so if you pull and try again it should not crash.
> > >
> > > In the log pasted by Martin, he passed "das ist ein haus" to
> > > command-line Moses, which works, and gives a translation.
> > >
> > > I think ideally the sample models should handle unknown words, and
> > > give a translation. Maybe adding a glue rule would be sufficient?
> > >
> > > cheers - Barry
> > >
> > > On 29/01/16 11:13, Barry Haddow wrote:
> > > > Hi
> > > >
> > > > When I run command-line Moses, I get the output below - i.e. no
> > > > best translation. The server crashes for me since it does not
> > > > check for the null pointer, but the command-line version does.
> > > >
> > > > I think there should be a translation for this example.
> > > >
> > > > cheers - Barry
> > > >
> > > > [gna]bhaddow: echo 'dies ist ein haus' | ~/moses.new/bin/moses -f
> > > > string-to-tree/moses.ini
> > > > Defined parameters (per moses.ini or switch):
> > > >   config: string-to-tree/moses.ini
> > > >   cube-pruning-pop-limit: 1000
> > > >   feature: KENLM name=LM factor=0 order=3 num-features=1
> > > > path=lm/europarl.srilm.gz WordPenalty UnknownWordPenalty
> > > > PhraseDictionaryMemory input-factor=0 output-factor=0
> > > > path=string-to-tree/rule-table num-features=1 table-limit=20
> > > >   input-factors: 0
> > > >   inputtype: 3
> > > >   mapping: 0 T 0
> > > >   max-chart-span: 20 1000
> > > >   non-terminals: X S
> > > >   search-algorithm: 3
> > > >   translation-details: translation-details.log
> > > >   weight: WordPenalty0= 0 LM= 0.5 PhraseDictionaryMemory0= 0.5
> > > > line=KENLM name=LM factor=0 order=3 num-features=1 path=lm/europarl.srilm.gz
> > > > Loading the LM will be faster if you build a binary file.
> > > > Reading lm/europarl.srilm.gz
> > > > 5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> > > > **The ARPA file is missing <unk>.  Substituting log10 probability -100.000.

Re: [Moses-support] Segmentation fault on hierarchical model with moses in server mode

2016-01-29 Thread Matthias Huck
Hi,

It seems to me that this toy string-to-tree setup is either outdated,
or it always had issues. It should be replaced. 

Under real-world conditions, the decoder should always be able to
produce some hypothesis. We would therefore usually extract a whole set
of glue rules. And we would typically also add an [unknown-lhs] section
to the moses.ini that would tell the decoder which left-hand side
non-terminal labels to use for out-of-vocabulary words. To my knowledge,
these two techniques are crucial for being able to parse any input
sentence provided to the chart decoder in syntax-based translation.

So, in my opinion, the problem is most likely neither the server
implementation nor the syntax-based decoder, but a problematic setup. 
I would consider it okay for the server to crash (or at least print a
warning) under such circumstances. You don't want it to silently not
translate complete sentences.

(I must admit that I didn't look into it in too much detail, but it
should be easy to confirm.)

Cheers,
Matthias


On Fri, 2016-01-29 at 20:28 +, Barry Haddow wrote:
> Hi All
> 
> I think I see what happened now.
> 
> When you give the input "dies ist ein haus" to the sample model, the 
> "dies" is unknown, and there is no translation. The server did not check 
> for this condition, and got a seg fault. I have added a check, so if you 
> pull and try again it should not crash.
> 
> In the log pasted by Martin, he passed "das ist ein haus" to 
> command-line Moses, which works, and gives a translation.
> 
> I think ideally the sample models should handle unknown words, and give 
> a translation. Maybe adding a glue rule would be sufficient?
> 
> cheers - Barry
> 
> On 29/01/16 11:13, Barry Haddow wrote:
> > Hi
> > 
> > When I run command-line Moses, I get the output below - i.e. no best
> > translation. The server crashes for me since it does not check for the
> > null pointer, but the command-line version does.
> > 
> > I think there should be a translation for this example.
> > 
> > cheers - Barry
> > 
> > [gna]bhaddow: echo 'dies ist ein haus' | ~/moses.new/bin/moses  -f
> > string-to-tree/moses.ini
> > Defined parameters (per moses.ini or switch):
> >   config: string-to-tree/moses.ini
> >   cube-pruning-pop-limit: 1000
> >   feature: KENLM name=LM factor=0 order=3 num-features=1
> > path=lm/europarl.srilm.gz WordPenalty UnknownWordPenalty
> > PhraseDictionaryMemory input-factor=0 output-factor=0
> > path=string-to-tree/rule-table num-features=1 table-limit=20
> >   input-factors: 0
> >   inputtype: 3
> >   mapping: 0 T 0
> >   max-chart-span: 20 1000
> >   non-terminals: X S
> >   search-algorithm: 3
> >   translation-details: translation-details.log
> >   weight: WordPenalty0= 0 LM= 0.5 PhraseDictionaryMemory0= 0.5
> > line=KENLM name=LM factor=0 order=3 num-features=1 path=lm/europarl.srilm.gz
> > Loading the LM will be faster if you build a binary file.
> > Reading lm/europarl.srilm.gz
> > 5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> > **The ARPA file is missing <unk>.  Substituting log10 probability -100.000.
> > **
> > FeatureFunction: LM start: 0 end: 0
> > line=WordPenalty
> > FeatureFunction: WordPenalty0 start: 1 end: 1
> > line=UnknownWordPenalty
> > FeatureFunction: UnknownWordPenalty0 start: 2 end: 2
> > line=PhraseDictionaryMemory input-factor=0 output-factor=0
> > path=string-to-tree/rule-table num-features=1 table-limit=20
> > FeatureFunction: PhraseDictionaryMemory0 start: 3 end: 3
> > Loading LM
> > Loading WordPenalty0
> > Loading UnknownWordPenalty0
> > Loading PhraseDictionaryMemory0
> > Start loading text phrase table. Moses format : [3.038] seconds
> > Reading string-to-tree/rule-table
> > 5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> > 
> > max-chart-span: 20
> > Created input-output object : [3.041] seconds
> > Line 0: Initialize search took 0.000 seconds total
> > Translating:  dies ist ein haus   ||| [0,0]=X (1) [0,1]=X (1)
> > [0,2]=X (1) [0,3]=X (1) [0,4]=X (1) [0,5]=X (1) [1,1]=X (1) [1,2]=X (1)
> > [1,3]=X (1) [1,4]=X (1) [1,5]=X (1) [2,2]=X (1) [2,3]=X (1) [2,4]=X (1)
> > [2,5]=X (1) [3,3]=X (1) [3,4]=X (1) [3,5]=X (1) [4,4]=X (1) [4,5]=X (1)
> > [5,5]=X (1)
> > 
> > 0   1   2   3   4   5
> > 0   1   2   2   1   0
> >   0   0   0   2   0
> > 0   0   4   0
> >   0   0   0
> > 0   0
> >   0
> > Line 0: Additional reporting took 0.000 seconds total
> > Line 0: Translation took 0.002 seconds total
> > Translation took 0.000 seconds
> > Name:moses  VmPeak:74024 kB VmRSS:11084 kB  RSSMax:36832 kB
> > user:2.972  sys:0.048  

Re: [Moses-support] IRSTLM

2016-01-19 Thread Matthias Huck
Hi,

I believe that the "~" might be the culprit. Try:

./bjam 
--with-irstlm=/home/mty2015/Public/MTEngine/Moseshome/mosesdecoder/irstlm

(If this is the correct absolute path to your IRSTLM installation.)

Cheers,
Matthias


On Wed, 2016-01-20 at 00:32 +, Hieu Hoang wrote:
> it's likely there was an error when you compiled irstlm as the irstlm
> library cannot be found.
> 
> can i ask - why do you need IRSTLM? for most cases, KenLM is faster.
> It's built into Moses so there's no external libraries you have to
> compile
> 
> On 20/01/16 00:27, Ouafa Benterki wrote:
> > Hi,
> > 
> > Please find enclosed attached the build log; here's the command I
> > run:
> > ./bjam --with-irstlm=~/Public/MTEngine/Moseshome/mosesdecoder/irstlm
> > 
> > best
> > 
> > Ouafa




Re: [Moses-support] IRSTLM installation

2016-01-18 Thread Matthias Huck
Hi,

Have you tried to use an absolute path?

Cheers,
Matthias


On Mon, 2016-01-18 at 02:52 +0100, Ouafa Benterki wrote:
> Hello,
> 
> I installed IRSTLM but when i used the command
> ./bjam --with-irstlm=/path to irstlm/ the installation failed
> can you advise
> 
> Best





Re: [Moses-support] BLEU score becomes different

2016-01-18 Thread Matthias Huck
Hi Liang,

mteval-v13a.pl does some internal tokenization and probably splits those
"word~~ID" tokens into "word ~ ~ ID". If this is happening,
it explains your difference in the calculated BLEU scores.

Cheers,
Matthias
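The effect is easy to reproduce outside mteval: its normalization inserts spaces around punctuation characters (which include `~`), so an ID-annotated token is broken into several tokens before n-gram matching, and hypothesis tokens no longer match reference tokens. A rough illustrative approximation — the regex below is a simplified stand-in for mteval-v13a.pl's exact rule set, not a faithful port:

```python
import re

def mteval_like_tokenize(text: str) -> str:
    # Simplified stand-in for mteval-v13a.pl's normalization: put spaces
    # around every non-alphanumeric, non-space character (this covers '~').
    text = re.sub(r"([^\w\s])", r" \1 ", text)
    return " ".join(text.split())

# An ID-annotated hypothesis token no longer matches the plain reference:
print(mteval_like_tokenize("this~~1 is~~2 sentence~~3"))
# -> this ~ ~ 1 is ~ ~ 2 sentence ~ ~ 3
print(mteval_like_tokenize("this is sentence"))
# -> this is sentence
```

This is why stripping the `~~ID` markers from the decoder output before scoring changes BLEU: with the markers in place, every annotated word contributes mismatching unigrams and bigrams.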


On Mon, 2016-01-18 at 17:01 +0800, 姚亮 wrote:
> Dear Moses Support Team,
>   
>I added a source context-dependent  translation feature in moses baseline 
> system.
>In order to avoid  modifying the source code, i append a unique identifier 
> to every word in the test/dev source file.
>for example, a source file with two lines like the following: 
>   this is sentence 1
>  .  sentence 2
> would become this~~1 is~~2 sentence~~3 1~~4, .~~5 sentence~~6 2~~7.
> Then, i generate my sentence-specific phrase tables for each sentence, use 
> the same IDs as the source file words in those phrase table entries. 
> I concatenate all the phrase tables together, then MERT and Decoder as usual. 
>  
> I do my experiments on Chinese2English translation tasks, and I found that in 
> the output file the oov words still have IDs .
> E.g. the translation of one NIST03 sentence is as follows:
>  published by the british science weekly , according to the study by the 14th 
> on chromosome sequencing of genes and gene segments 一千零五十~~97 .
> ~~97 is the ID of the word "一千零五十".
> I found that when I remove IDs in the output file, the BLEU scores are 
> significantly different. I have no idea what happens; could you give me 
> some advice?
> I use mteval-v13a.pl scripts to calculate BLEU scores in my experiment .
> 
> 
> 
> 
> Thanks,
> Liang
> 
> 
>
> 





Re: [Moses-support] Tuning with no language model

2016-01-13 Thread Matthias Huck
Hi,

If you don't need all score components of a phrase table, the easiest
way to get rid of them is to set the scaling factors for the undesired
phrase table feature function components to 0 before tuning, and ask
the optimizer to ignore them. The feature function configuration
parameter "tuneable-components" can be used to tune only selected
components of the feature function.
http://permalink.gmane.org/gmane.comp.nlp.moses.user/12464
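For concreteness, a hypothetical moses.ini excerpt along these lines. The comma-separated 0/1 mask for `tuneable-components` follows the post linked above; the feature name, path, number of scores, and weight values are all placeholders, not taken from a real setup:

```ini
# Hypothetical: a phrase table with 4 score components where only the
# 2nd and 4th are tuned; the 1st and 3rd stay fixed at weight 0.
[feature]
PhraseDictionaryMemory name=TranslationModel0 input-factor=0 output-factor=0 num-features=4 path=/path/to/phrase-table tuneable-components=0,1,0,1

[weight]
TranslationModel0= 0 0.2 0 0.2
```

As suggested below, inspect the weights the optimizer writes out afterwards: the masked components should come back unchanged, which confirms the mask was honored.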

I'm using this quite frequently with MIRA. It's conceivable that it
won't work with MERT, though, since maybe this combination was never
tested.

Alternatively, remove any undesired score columns from your phrase
table. The latter approach has advantages as well. E.g., that way it's
less likely to end up with unnoticed mistakes in the configuration
files.

As always, check the content of your log files and any intermediate
files to avoid unexpected behavior. If you decide to give "tuneable
-components" a try, it will be easy to figure out whether it does what
it's supposed to do by just inspecting the tuned weights.

If you end up using only a single feature function with one score
component (assuming that you get rid of UnknownWordPenalty in one way
or another), then you won't have to tune. There's nothing to
interpolate if there's only one feature. The only thing you'll have to
think about is whether to set its scaling factor to a negative or to a
positive value. Search in Moses does argmax, some other decoders may
implement argmin.

Cheers,
Matthias


On Wed, 2016-01-13 at 15:20 +, Read, James C wrote:
> Thanks,
> 
> and if I wanted Moses to use one feature of TM alone I'm guessing
> there's no way to do this other than zeroing out the undesired TM
> features?
> 
> 
> From: Hieu Hoang 
> Sent: Wednesday, January 13, 2016 3:05 PM
> To: Read, James C
> Cc: Moses Support
> Subject: Re: [Moses-support] Tuning with no language model
>  
> you can delete all feature functions except for the
> UnknownWordPenalty. In the current moses, that's hardcoded into
> decoding. So be my guest and delete away!
> 
> Hieu Hoang
> http://www.hoang.co.uk/hieu
> 
> On 13 January 2016 at 15:02, Read, James C 
> wrote:
> > OK, looks like it's running. Probably won't be able to see if it
> > generates useful weights until tomorrow. Thanks.
> > 
> > As a side line of thought. I was wondering how many lines of
> > configuration I could get away with deleting? How would Moses
> > behave if I were to delete the lexical reordering lines? The
> > distortion? The word penalty? The phrase penalty?
> > 
> > My goal is to get Moses to choose the best translation with
> > reference to the translation model only.
> > 
> > If I delete these other configuration lines will Moses use defaults
> > for these other options or completely disable their operation
> > leaving just the TM?
> > 
> > 
> > From: Hieu Hoang 
> > Sent: Wednesday, January 13, 2016 2:51 PM
> > 
> > To: Read, James C
> > Cc: Moses Support
> > Subject: Re: [Moses-support] Tuning with no language model
> >  
> > ok. The mert script create a temporary directory every time you run
> > it. By default it's named
> >mert-work
> > Since you ran mert with an incorrect moses.ini previously, it may
> > have polluted the temporary directory and cause problem now.
> > 
> > You should find this temporary directory and delete it before
> > running mert again.
> > 
> > ps ALL directories must be absolute, eg  ../tuning_data/true.bg
> > 
> > 
> > Hieu Hoang
> > http://www.hoang.co.uk/hieu
> > 
> > On 13 January 2016 at 14:45, Read, James C 
> > wrote:
> > > 
> > > /media/bigdata/jcread/llv/data/europarlv7/raw/aligned/bg
> > > -en/training_data$
> > > /media/bigdata/jcread/3rd_party_software/mosesdecoder/scripts/tra
> > > ining/mert-moses.pl -no-filter-phrase-table
> > > ../tuning_data/true.bg ../tuning_data/true.en
> > > /media/bigdata/jcread/3rd_party_software/mosesdecoder/bin/moses
> > > /media/bigdata/jcread/llv/data/europarlv7/raw/aligned/bg
> > > -en/training_data/binarised/moses-tm.ini --mertdir
> > > /media/bigdata/jcread/3rd_party_software/mosesdecoder/bin/
> > > 
> > > with absolute path to moses-tm.ini does not resolve the problem.
> > > Still get the same error.
> > > 
> > > From: Read, James C
> > > Sent: Wednesday, January 13, 2016 2:40 PM
> > > To: Hieu Hoang
> > > 
> > > Cc: Moses Support
> > > Subject: Re: [Moses-support] Tuning with no language model
> > >  
> > > This is what I get when I run the same command as you:
> > > 
> > > LexicalReordering0= 0.30 0.30 0.30 0.30 0.30
> > > 0.30
> > > Distortion0= 0.30
> > > UnknownWordPenalty0 UNTUNEABLE
> > > WordPenalty0= -1.00
> > > PhrasePenalty0= 0.20
> > > TranslationModel0= 0.20 0.20 0.20 0.20
> > > 
> > > Looks just like your output. However, this is what I get when I
> > > run the mert script with the following command:
> > > 
> > > /media/bigdata/jcread/llv/data/europarlv7/raw/aligned/bg
> > > -en/training_data$
> > > /media/bigdata/j

Re: [Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Matthias Huck
Hmm, maybe it can also cause trouble with the reuse of parts from
previous steps if the user doesn't proceed with care.

You could overwrite steps/1/config.1 on a call of experiment.perl
-config config.1 -continue 1 -exec .


On Fri, 2016-01-08 at 20:56 +0000, Matthias Huck wrote:
> Hi Philipp, 
> 
> Usually I just keep track of the EMS config files in my base directory
> and more or less ignore the steps/*/config.* copies. 
> 
> I never mix 
>   experiment.perl -config config.1 -continue 1
> and 
>   experiment.perl -continue 1 
> 
> so I won't run into trouble. 
> 
> This workflow was totally okay for me so far. I wouldn't be surprised if
> many other users do the same. 
> 
> I agree that it may "cause some unexpected behaviour down the road",
> e.g. if the web interface is used in a collaborative environment for
> keeping track of experiments. It's gonna display the config from the
> steps/ directory, even though that one might have been superseded when
> continuing the experiment.
> 
> Cheers,
> Matthias
> 
> 
> On Fri, 2016-01-08 at 15:17 -0500, Philipp Koehn wrote:
> > Hi,
> > 
> > 
> > I looked at the code, and I was not happy with what I found, although
> > I apparently wrote it.
> > 
> > 
> > What do you expect 
> >experiment.perl -config config.1 -continue 1
> > to do differently than
> >experiment.perl -continue 1
> > 
> > 
> > Currently it seems to run with the specified config file but not
> > overwrite steps/1/config.1 which may cause some unexpected behaviour
> > down the road...
> > 
> > 
> > -phi
> > 
> > On Fri, Jan 8, 2016 at 1:24 PM, Matthias Huck 
> > wrote:
> > Hi Philipp,
> > 
> > On Fri, 2016-01-08 at 13:17 -0500, Philipp Koehn wrote:
> > > the command
> > >   experiment.perl -config config.1 -continue 1
> > > actually is not defined.
> > 
> > Are you sure about this? I seem to be doing that kind of thing
> > all the
> > time and never had any issues with it, as far as I can tell.
> > It won't replace an older steps/1/config.1 , but I typically
> > don't mind.
> > 
> > Cheers,
> > Matthias
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> 
> 
> 





Re: [Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Matthias Huck
Hi Philipp, 

Usually I just keep track of the EMS config files in my base directory
and more or less ignore the steps/*/config.* copies. 

I never mix 
experiment.perl -config config.1 -continue 1
and 
experiment.perl -continue 1 

so I won't run into trouble. 

This workflow was totally okay for me so far. I wouldn't be surprised if
many other users do the same. 

I agree that it may "cause some unexpected behaviour down the road",
e.g. if the web interface is used in a collaborative environment for
keeping track of experiments. It's gonna display the config from the
steps/ directory, even though that one might have been superseded when
continuing the experiment.

Cheers,
Matthias


On Fri, 2016-01-08 at 15:17 -0500, Philipp Koehn wrote:
> Hi,
> 
> 
> I looked at the code, and I was not happy with what I found, although
> I apparently wrote it.
> 
> 
> What do you expect 
>experiment.perl -config config.1 -continue 1
> to do differently than
>experiment.perl -continue 1
> 
> 
> Currently it seems to run with the specified config file but not
> overwrite steps/1/config.1 which may cause some unexpected behaviour
> down the road...
> 
> 
> -phi
> 
> On Fri, Jan 8, 2016 at 1:24 PM, Matthias Huck 
> wrote:
> Hi Philipp,
> 
> On Fri, 2016-01-08 at 13:17 -0500, Philipp Koehn wrote:
> > the command
> >   experiment.perl -config config.1 -continue 1
> > actually is not defined.
> 
> Are you sure about this? I seem to be doing that kind of thing
> all the
> time and never had any issues with it, as far as I can tell.
> It won't replace an older steps/1/config.1 , but I typically
> don't mind.
> 
> Cheers,
> Matthias
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 





Re: [Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Matthias Huck
Hi Philipp,

On Fri, 2016-01-08 at 13:17 -0500, Philipp Koehn wrote:
> the command 
>   experiment.perl -config config.1 -continue 1
> actually is not defined.

Are you sure about this? I seem to be doing that kind of thing all the
time and never had any issues with it, as far as I can tell. 
It won't replace an older steps/1/config.1 , but I typically don't mind.

Cheers,
Matthias







Re: [Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Matthias Huck
So, what has been the proper solution?

On Fri, 2016-01-08 at 13:20 -0500, Nicholas Ruiz wrote:
> Thanks everyone, it's working now.
> 
> zınɹ ʞɔıu






Re: [Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Matthias Huck
Hi Nick,

What you're attempting to do should generally be no problem. There's
most likely some issue with your EMS configuration file. Doesn't it tell
you something like:

BUGGY CONFIG LINE (474): wrapping-frame  = $tokenized-input

I get this when I put two spaces between "wrapping-frame" and "=".

Also, is /path/to/test4.tok.$output-extension some pre-translated
hypothesis? If it's the reference, you might have to specify it as
"tokenized-reference" rather than as "tokenized-output".

Cheers,
Matthias


On Fri, 2016-01-08 at 12:16 -0500, Nicholas Ruiz wrote:
> Thanks, Tomasz. Unfortunately modifying the config file in the steps
> directory didn't work for me. My block looks something like this:
> 
> [EVALUATION:test4]
> 
> tokenized-input = /path/to/test4.tok.$input-extension
> tokenized-output = /path/to/test4.tok.$output-extension
> wrapping-frame  = $tokenized-input
> 
> zınɹ ʞɔıu
> 
> On Fri, Jan 8, 2016 at 12:11 PM, Tomasz Dwojak  wrote:
> 
> > Hi Nick,
> >
> > there is a way to do that.
> >
> > In your working directory is a "steps" directory, where the EMS writes
> > outputs, etc... There is also the copy of the config file (e.g.
> > steps/1/config.1). You have to edit this file and then run EMS once again.
> >
> > Shortly:
> > 1. Edit steps/1/config.1
> > 2. run experiment.perl -continue 1
> >
> > Best,
> > Tomasz
> >
> > On 08.01.2016 17:45, Nicholas Ruiz wrote:
> >
> > Hi all,
> >
> > I have a few different experiments that have finished training. Let's say
> > I have versions 0 1 2 3. I'd like to translate/evaluate an additional test
> > set. I added another [EVALUATION:...] block to specify the paths of the
> > eval data and now I'd like to run it with
> >
> > experiment.perl -config config.1 -continue 1
> >
> > However, the steps for evaluating the new test set aren't in the list of
> > steps. What's the best way to run this test without having to create
> > another experiment folder?
> >
> > Thanks,
> > Nick
> >
> > zınɹ ʞɔıu
> >
> >
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> >
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Chinese & Arabic Tokenizers

2015-12-18 Thread Matthias Huck
Hi Tom,

There used to be a freely available Chinese word segmenter provided by
the LDC as well. Unfortunately, things keep disappearing from the web.
https://web.archive.org/web/20130907032401/http://projects.ldc.upenn.edu/Chinese/LDC_ch.htm

For Arabic, I think that many academic research groups used to work with
MADA. But it seems like you'll need a special license for commercial
use.
http://www1.cs.columbia.edu/~rambow/software-downloads/MADA_Distribution.html
https://secure.nouvant.com/columbia/technology/cu14012/license/492

Or you try MorphTagger/Segmenter, a segmentation tool for Arabic SMT. 
http://www.hltpr.rwth-aachen.de/~mansour/MorphSegmenter/
It may not be maintained any more. You can contact Saab Mansour to ask
about it.

Saab has published a couple of papers about this, some of which report
comparisons of different Arabic segmentation strategies for SMT.
http://www.hltpr.rwth-aachen.de/publications/download/687/Mansour-IWSLT-2010.pdf
http://www.hltpr.rwth-aachen.de/publications/download/808/Mansour-LREC-2012.pdf
http://link.springer.com/article/10.1007%2Fs10590-011-9102-0

Cheers,
Matthias


On Sat, 2015-12-19 at 01:19 +0800, Dingyuan Wang wrote:
> Hi Tom,
> 
> As far as I know, the following are widely-used and open-source Chinese
> tokenizers:
> 
> * https://github.com/fxsjy/jieba
> * http://sourceforge.net/projects/zpar/
> * https://github.com/NLPchina/ansj_seg
> 
> And this proprietary one:
> 
> * http://ictclas.nlpir.org/
> 
> (Disclaimer: I am one of the developers of jieba, and I personally use
> this.)
> 
> --
> Dingyuan Wang
> On 2015-12-19 at 00:51, "Tom Hoar" wrote:
> 
> > I'm looking for Chinese and Arabic tokenizers. We've been using
> > Stanford's for a while but it has downfalls. The Chinese mode loads its
> > statistical models very slowly. The Arabic mode stems the resulting
> > tokens. The coup de grace is that their latest jar update (9 days ago)
> > was compiled to run only with Java 1.8.
> >
> > So, with the exception of Stanford, what choices are available for
> > Chinese and Arabic that you're finding worthwhile?
> >
> > Thanks!
> > Tom
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] Lexical reordering fails with zlib

2015-12-17 Thread Matthias Huck
Hi,

As an addendum:

You can try a manual workaround: run gunzip on extract.o.sorted.gz and
then run lexical-reordering-score on the resulting plain text file. 

It might be inconvenient but would hopefully solve the issue.
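As a concrete sketch of that workaround (the extract entry below is synthetic, and the commented lexical-reordering-score call is only illustrative of the usual training invocation):

```shell
# Decompress the extract file manually instead of letting the scorer
# read the .gz (a tiny one-line extract file is fabricated here just to
# show the mechanics):
work=$(mktemp -d)
printf 'der ||| the ||| 0-0 ||| mono\n' | gzip > "$work/extract.o.sorted.gz"

gunzip -c "$work/extract.o.sorted.gz" > "$work/extract.o.sorted"

# Then point the scorer at the plain-text file, e.g. (illustrative only):
#   bin/lexical-reordering-score "$work/extract.o.sorted" 0.5 \
#       model/reordering-table. --model "wbe msd wbe-msd-bidirectional-fe"
```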

Cheers,
Matthias


On Thu, 2015-12-17 at 17:44 +, Matthias Huck wrote:
> Hi,
> 
> It's a problem that apparently occurs very rarely, and as Guy mentioned,
> we were so far assuming that it's caused by a zlib bug. 
> 
> However, the zlib bug was (to my knowledge) fixed in zlib v1.2.8. 
> This seems to be the bug fix: 
> https://github.com/madler/zlib/commit/51370f365607fe14a6a7a1a27b3bd29d788f5e5b
> 
> I've only encountered the issue once (and I'm training systems
> frequently). When I came across it, I executed the same command with a
> Moses compile on a different machine that was running an older version
> of OpenSuse, rather than Ubuntu 12.04. The problem did not exist on the
> old system.
> 
> My guess is that it really is a zlib bug, but it would be worrying if
> switching to zlib v1.2.8 doesn't resolve it.
> 
> Cheers,
> Matthias
> 
> 
> On Thu, 2015-12-17 at 09:18 -0800, Varden Wang wrote:
> > I seem to be having a very very similar issue. I have the exact same
> > lib package as Guy (but I upgraded from lib 1.2.3.4). I'm using the
> > commit SHA e211d752f6bc680094520482f190d0f805405c6c of the
> > mosesdecoder. The funny thing is that I trained on the very same setup
> > on different data sets without encountering this problem.
> > 
> > My error:
> > 
> > Executing: 
> > /usr/local/google/home/varden/MOSES/mosesdecoder/scripts/../bin/lexical-reordering-score
> > /usr/local/google/home/varden/MOSES/FRENCH_moses_HELP_CONTENT_v1/train/model/extract.o.sorted.gz
> > 0.5 
> > /usr/local/google/home/varden/MOSES/FRENCH_moses_HELP_CONTENT_v1/train/model/reordering-table.
> > --model "wbe msd wbe-msd-bidirectional-fe"
> > 
> > Lexical Reordering Scorer
> > 
> > scores lexical reordering models of several types (hierarchical,
> > phrase-based and word-based-extraction
> > 
> > terminate called after throwing an instance of 'util::GZException'
> > 
> >   what():  zlib encountered invalid distances set code -3
> > 
> > ERROR: Execution of:
> > /usr/local/google/home/varden/MOSES/mosesdecoder/scripts/../bin/lexical-reordering-score
> > /usr/local/google/home/varden/MOSES/FRENCH_moses_HELP_CONTENT_v1/train/model/extract.o.sorted.gz
> > 0.5 
> > /usr/local/google/home/varden/MOSES/FRENCH_moses_HELP_CONTENT_v1/train/model/reordering-table.
> > --model "wbe msd wbe-msd-bidirectional-fe"
> > 
> >   died with signal 6, with coredump
> > 
> > Thanks,
> > 
> > Varden
> > 
> > On Mon, Dec 7, 2015 at 9:01 AM,   wrote:
> > > Send Moses-support mailing list submissions to
> > > moses-support@mit.edu
> > >
> > > To subscribe or unsubscribe via the World Wide Web, visit
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
> > > or, via email, send a message with subject or body 'help' to
> > > moses-support-requ...@mit.edu
> > >
> > > You can reach the person managing the list at
> > > moses-support-ow...@mit.edu
> > >
> > > When replying, please edit your Subject line so it is more specific
> > > than "Re: Contents of Moses-support digest..."
> > >
> > >
> > > Today's Topics:
> > >
> > >1. Lexical reordering fails with zlib (Guy)
> > >
> > >
> > > --
> > >
> > > Message: 1
> > > Date: Mon, 7 Dec 2015 06:14:47 + (UTC)
> > > From: Guy 
> > > Subject: [Moses-support] Lexical reordering fails with zlib
> > > To: moses-support@mit.edu
> > > Message-ID: 
> > > Content-Type: text/plain; charset=us-ascii
> > >
> > > Hello everyone,
> > >
> > > I've just recently started to work with Moses and managed to build a 
> > > couple
> > > of models without problems... until now.
> > >
> > > I was training a new system and I got this error when executing
> > > lexical-reordering-score:
> > >
> > > ../mosesdecoder/scripts/../bin/lexical-reordering-score
> > > /local/scratch/train/model/extract.o.sorted.gz 0.5
> > > /local/scratch/train/model/reordering-table. --model "wbe msd
> > > wbe-msd-bidirectional-fe"
> > > Lexical Reordering

Re: [Moses-support] Lexical reordering fails with zlib

2015-12-17 Thread Matthias Huck
Hi,

It's a problem that apparently occurs very rarely, and as Guy mentioned,
we were so far assuming that it's caused by a zlib bug. 

However, the zlib bug was (to my knowledge) fixed in zlib v1.2.8. 
This seems to be the bug fix: 
https://github.com/madler/zlib/commit/51370f365607fe14a6a7a1a27b3bd29d788f5e5b

I've only encountered the issue once (and I'm training systems
frequently). When I came across it, I executed the same command with a
Moses compile on a different machine that was running an older version
of OpenSuse, rather than Ubuntu 12.04. The problem did not exist on the
old system.

My guess is that it really is a zlib bug, but it would be worrying if
switching to zlib v1.2.8 doesn't resolve it.

Cheers,
Matthias


On Thu, 2015-12-17 at 09:18 -0800, Varden Wang wrote:
> I seem to be having a very very similar issue. I have the exact same
> lib package as Guy (but I upgraded from lib 1.2.3.4). I'm using the
> commit SHA e211d752f6bc680094520482f190d0f805405c6c of the
> mosesdecoder. The funny thing is that I trained on the very same setup
> on different data sets without encountering this problem.
> 
> My error:
> 
> Executing: 
> /usr/local/google/home/varden/MOSES/mosesdecoder/scripts/../bin/lexical-reordering-score
> /usr/local/google/home/varden/MOSES/FRENCH_moses_HELP_CONTENT_v1/train/model/extract.o.sorted.gz
> 0.5 
> /usr/local/google/home/varden/MOSES/FRENCH_moses_HELP_CONTENT_v1/train/model/reordering-table.
> --model "wbe msd wbe-msd-bidirectional-fe"
> 
> Lexical Reordering Scorer
> 
> scores lexical reordering models of several types (hierarchical,
> phrase-based and word-based-extraction
> 
> terminate called after throwing an instance of 'util::GZException'
> 
>   what():  zlib encountered invalid distances set code -3
> 
> ERROR: Execution of:
> /usr/local/google/home/varden/MOSES/mosesdecoder/scripts/../bin/lexical-reordering-score
> /usr/local/google/home/varden/MOSES/FRENCH_moses_HELP_CONTENT_v1/train/model/extract.o.sorted.gz
> 0.5 
> /usr/local/google/home/varden/MOSES/FRENCH_moses_HELP_CONTENT_v1/train/model/reordering-table.
> --model "wbe msd wbe-msd-bidirectional-fe"
> 
>   died with signal 6, with coredump
> 
> Thanks,
> 
> Varden
> 
> On Mon, Dec 7, 2015 at 9:01 AM,   wrote:
> > Send Moses-support mailing list submissions to
> > moses-support@mit.edu
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> > or, via email, send a message with subject or body 'help' to
> > moses-support-requ...@mit.edu
> >
> > You can reach the person managing the list at
> > moses-support-ow...@mit.edu
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of Moses-support digest..."
> >
> >
> > Today's Topics:
> >
> >1. Lexical reordering fails with zlib (Guy)
> >
> >
> > --
> >
> > Message: 1
> > Date: Mon, 7 Dec 2015 06:14:47 + (UTC)
> > From: Guy 
> > Subject: [Moses-support] Lexical reordering fails with zlib
> > To: moses-support@mit.edu
> > Message-ID: 
> > Content-Type: text/plain; charset=us-ascii
> >
> > Hello everyone,
> >
> > I've just recently started to work with Moses and managed to build a couple
> > of models without problems... until now.
> >
> > I was training a new system and I got this error when executing
> > lexical-reordering-score:
> >
> > ../mosesdecoder/scripts/../bin/lexical-reordering-score
> > /local/scratch/train/model/extract.o.sorted.gz 0.5
> > /local/scratch/train/model/reordering-table. --model "wbe msd
> > wbe-msd-bidirectional-fe"
> > Lexical Reordering Scorer
> > scores lexical reordering models of several types (hierarchical,
> > phrase-based and word-based-extraction
> > terminate called after throwing an instance of 'util::GZException'
> >   what():  util/read_compressed.cc:163 in virtual std::size_t
> > util::{anonymous}::GZip::Read(void*, std::size_t, util::ReadCompressed&)
> > threw GZException'.
> > zlib encountered invalid distances set code -3
> > Aborted
> >
> > I found an old post
> > (http://permalink.gmane.org/gmane.comp.nlp.moses.user/10151) saying this was
> > due to an apparent bug in zlib 1.2.3.4 on Ubuntu 12.04 and that upgrading to
> > zlib 1.2.8 solves the problem. However, I already have zlib 1.2.8 (but on
> > Ubuntu 14.04) and I still get this error. In case it helps, package name is
> > zlib1g:amd64 version 1:1.2.8.dfsg-1ubuntu).
> >
> > It's a bit strange that I didn't stumble upon this problem when I trained
> > previous systems.
> >
> > Any ideas on what to do?
> >
> > Thank you very much,
> > Guy
> >
> >
> >
> >
> > --
> >
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> > End of Moses-support Digest, Vol 110, Issue 16
> > 

Re: [Moses-support] Slides or paper walking through SearchNormal::ProcessOneHypothesis ?

2015-12-15 Thread Matthias Huck
Hi Lane,

Well, you can find excellent descriptions of phrase-based decoding
algorithms in the literature, though possibly not all details of this
specific implementation.

I like this description:

R. Zens, and H. Ney. Improvements in Dynamic Programming Beam Search for
Phrase-based Statistical Machine Translation. In International Workshop
on Spoken Language Translation (IWSLT), pages 195-205, Honolulu, HI,
USA, October 2008. 
http://www.hltpr.rwth-aachen.de/publications/download/618/Zens-IWSLT-2008.pdf

It's what's implemented in Jane, RWTH's open source statistical machine
translation toolkit. 

J. Wuebker, M. Huck, S. Peitz, M. Nuhn, M. Freitag, J. Peter, S.
Mansour, and H. Ney. Jane 2: Open Source Phrase-based and Hierarchical
Statistical Machine Translation. In International Conference on
Computational Linguistics (COLING), pages 483-491, Mumbai, India,
December 2012. 
http://www.hltpr.rwth-aachen.de/publications/download/830/Wuebker-COLING-2012.pdf

However, I believe that the distinction of coverage hypotheses and
lexical hypotheses is a unique property of the RWTH systems. 

The formalization in the Zens & Ney paper is very nicely done. With hard
distortion limits or coverage-based reordering constraints, you may need
a few more steps in the algorithm. E.g., if you have a hard distortion
limit, you will probably want to avoid leaving a gap and then extending
your sequence in a way that puts your current position further away from
the gap than your maximum jump width. Other people should know more
about how exactly Moses' phrase-based decoder is dealing with this.

I can recommend Richard Zens' PhD thesis as well.
http://www.hltpr.rwth-aachen.de/publications/download/562/Zens--2008.pdf

I also remember that the following publication from Microsoft Research
is pretty helpful:

Robert C. Moore and Chris Quirk, Faster Beam-Search Decoding for Phrasal
Statistical Machine Translation, in Proceedings of MT Summit XI,
European Association for Machine Translation, September 2007.
http://research.microsoft.com/pubs/68097/mtsummit2007_beamsearch.pdf

Cheers,
Matthias



On Tue, 2015-12-15 at 22:33 +, Hieu Hoang wrote:
> I've been looking at this and it is surprisingly complicated. I think
> the code is designed to predetermine if extending a hypothesis will
> lead it down a path that won't ever be completed.
> 
> 
> Don't know any slide that explains the reasoning, Philipp Koehn
> explained it to me once and it seems pretty reasonable.
> 
> 
> 
> I wouldn't mind seeing this code cleaned up a bit and abstracted and
> formalised. I've made a start with the cleanup in my new decoder
> 
> https://github.com/moses-smt/mosesdecoder/blob/perf_moses2/contrib/other-builds/moses2/Search/Search.cpp#L36
>Search::CanExtend()
> 
> 
> There was an Aachen paper from years ago comparing different
> distortion limit heuristics - can't remember the authors or title.
> Maybe someone know more
> 
> 
> 
> 
> 
> Hieu Hoang
> http://www.hoang.co.uk/hieu
> 
> 
> On 15 December 2015 at 20:59, Lane Schwartz 
> wrote:
> Hey all,
> 
> 
> So the SearchNormal::ProcessOneHypothesis() method in
> SearchNormal.cpp is responsible for taking an existing
> hypothesis, creating all legal new extension hypotheses, and
> adding those new hypotheses to the appropriate decoder
> stacks. 
> 
> 
> First off, the method is actually reasonably well commented,
> so kudos to whoever did that. :)
> 
> 
> That said, does anyone happen to have any slides that actually
> walk through this process, specifically slides that take into
> account the interaction with the distortion limit? That
> interaction is where most of the complexity of this method
> comes from. I don't know about others, but even having a
> pretty good notion of what's going on here, the discussion of
> "the closest thing to the left" is still a bit opaque.
> 
> 
> Anyway, if anyone knows of a good set of slides, or even a
> good description in a paper, of what's going on here, I'd
> appreciate any pointers.
> 
> 
> Thanks,
> Lane
> 
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] Do debugging in the decoder?

2015-10-05 Thread Matthias Huck
Hi Yuqi,

I don't know. But maybe something like running a profiler on a
small-scale setup and printing the call graph would be more convenient
anyway? If you don't just want to try and read the source code right
away. 

Maybe someone else has better suggestions.

Cheers,
Matthias


On Mon, 2015-10-05 at 09:59 +0200, Yuqi Zhang wrote:
> Thanks a lot,  Matthias and Hieu!
> 
> 
> I have the debug version in Eclipse already and can compile it
> without errors. 
> I could follow the debugging up to the decoder (translation) call:
> 
> 
>  pool.Submit(task); // in Exportinterface.cpp
> 
> I didn't find a way to see what happens in the 'translation' task, e.g.
> how a source segment looks up its translations in the phrase table (PT).
> Is there a way to see what happens inside the 'translation' task? 
> 
> Thanks!
> 
> Best regards,
> 
> Yuqi
> 
> 
> 
> 2015-10-05 1:07 GMT+02:00 Hieu Hoang :
> i think it might be
>./bjam  variant=debug
> not
>   ./bjam ... --variant=debug
> 
> Also, please git pull. There was a minor compile error when
> using this option, which has now been fixed
> 
> https://github.com/moses-smt/mosesdecoder/commit/72bef00781de9821f2cff227ca7417939041d4e1
> 
> 
> On 04/10/2015 23:25, Matthias Huck wrote:
> Hi Yuqi,
> 
> You can build a debug compile by calling bjam with:
> 
> --variant=debug
> 
> Cheers,
> Matthias
> 
> 
> On Sun, 2015-10-04 at 23:05 +0200, Yuqi Zhang wrote:
> Hello,
> 
> 
> How can I debug the decoder?
> 
> 
> Must I turn off the pre-compile signal
> "WITH_THREADS"?
> Can it be turned off? (Since I have a try, but
> some head files
> regarding threads are always included.)
> Or is there any other way to allow me to get
> into the decoder?
> 
> 
> Thanks a lot!
> 
> 
> Best regards,
> Yuqi
> 
> 
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 
> 
> 
> 
> 




Re: [Moses-support] Do debugging in the decoder?

2015-10-04 Thread Matthias Huck
Hi Yuqi,

You can build a debug compile by calling bjam with: 

--variant=debug

Cheers,
Matthias


On Sun, 2015-10-04 at 23:05 +0200, Yuqi Zhang wrote:
> Hello, 
> 
> 
> How can I debug the decoder? 
> 
> 
> Must I turn off the pre-compile flag "WITH_THREADS"? 
> Can it be turned off? (I gave it a try, but some header files
> regarding threads are always included.)
> Or is there any other way to step into the decoder? 
> 
> 
> Thanks a lot!
> 
> 
> Best regards,
> Yuqi 
> 
> 
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] Regarding Parallel Corpus Repository

2015-09-27 Thread Matthias Huck
Hi,

The Hindi-English language pair was part of the WMT shared translation
task in 2014. See the following website for download links of training
data and dev/test sets:
http://www.statmt.org/wmt14/translation-task.html

Cheers,
Matthias

On Sun, 2015-09-27 at 20:15 +0530, nakul sharma wrote:
> Dear All,
> 
> Is there any online repository of parallel corpora for Indian regional
> languages? Building one from scratch is a very tedious task and quite
> error-prone. I am looking for English to any North Indian language pair
> (Punjabi, Hindi, Urdu).
> 
> 





Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Matthias Huck
Hi Vincent,

On Thu, 2015-09-24 at 22:37 +0200, Vincent Nguyen wrote:
> Thanks Matthias for the detailed explanation.
> I think I have most of it in mind except not really understanding how 
> this one works :
> 
> "Difficult sentences generally have worse model score than easy ones but
> may still be useful for training."

Well, your data selection method may discard training instances that are
somehow hard to decode, e.g. because of complex sentence structure or
because of rare vocabulary. But that doesn't necessarily mean that it's
bad sentence pairs that you're removing. You should manually inspect
some samples if possible.

I didn't try, but I suspect that you'd get a higher decoder score on the
1-best decoder output of the first of the following two input sentences:

(1) " Merci ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! "
(2) " Je l' ai vécu moi-même en personne quand j' ai eu mon diplôme à Barnard 
College en 2002 . "

(Just as a simple made-up example.)

If we assume that you have a correct English target sentence for both of
those sentences in your training data, I wonder which of the two you
could learn more from?

If you're doing what I think, then you're also basically just assessing
whether the source side of the sentence pair is easy to translate. Does
this tell you anything about the target sentence? The target side might
be misaligned or in a different third language if your data is noisy.

Cheers,
Matthias





Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Matthias Huck
Hi Vincent,

This is a different topic, and I'm not completely clear about what
exactly you did here. Did you decode the source side of the parallel
training data, conduct sentence selection by applying a threshold on the
decoder score, and extract a new phrase table from the selected fraction
of the original parallel training data? If this is the case, I have some
comments:


- Be careful when you translate training data. The system knows these
sentences and does things like frequently applying long singleton
phrases that have been extracted from the very same sentence.
https://aclweb.org/anthology/P/P10/P10-1049.pdf

- Longer sentences may have worse model score than shorter sentences.
Consider normalizing by sentence length if you use model score for data
selection.
Difficult sentences generally have worse model score than easy ones but
may still be useful for training. You possibly keep the parts of the
data that are easy to translate or are highly redundant in the corpus.

- You probably see no out-of-vocabulary words (OOVs) when translating
training data, or very few of them (depending on word alignment, phrase
extraction method, and phrase table pruning), but be aware that if there
are OOVs, this may affect the model score a lot.

- Check to what extent the sentence selection reduces the vocabulary of
your system.
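The length-normalization point above can be sketched like this (the "score<TAB>sentence" layout of the dump file is an assumption about your own setup):

```shell
work=$(mktemp -d)
# toy decoder output: model score, then the sentence, tab-separated
printf -- '-2.0\tmerci !\n-6.0\tje l ai vecu moi-meme en personne\n' > "$work/scored.txt"

# divide each score by the sentence's token count so long sentences are
# not penalized merely for being long
awk -F'\t' '{n = split($2, w, " "); printf "%.2f\t%s\n", $1 / n, $2}' \
    "$work/scored.txt" > "$work/normalized.txt"
```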


Last but not least, two more general comments:

- You need dev and test sets that are similar to the type of real-world
documents that you're building your system for. Don't tune on Europarl
if you eventually want to translate pharmaceutical patents, for
instance. Try to collect in-domain training data as well.

- In case you have in-domain and out-of-domain training corpora, you can
try modified Moore-Lewis filtering for data selection. 
https://aclweb.org/anthology/D/D11/D11-1033.pdf
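A minimal sketch of Moore-Lewis-style selection, assuming you have already computed per-sentence cross-entropies under an in-domain and an out-of-domain language model (the file names and the one-value-per-line format are assumptions):

```shell
work=$(mktemp -d)
printf '1.0\n5.0\n2.0\n' > "$work/H_in"    # in-domain cross-entropies
printf '2.0\n1.0\n2.5\n' > "$work/H_out"   # out-of-domain cross-entropies
printf 'a\nb\nc\n' > "$work/corpus"        # one sentence per line

# rank sentences by H_in - H_out (lower = more in-domain-like) and keep
# the best two
paste "$work/H_in" "$work/H_out" "$work/corpus" \
    | awk -F'\t' '{printf "%f\t%s\n", $1 - $2, $3}' \
    | sort -n | head -n 2 | cut -f2- > "$work/selected"
```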


Cheers,
Matthias


On Thu, 2015-09-24 at 18:19 +0200, Vincent Nguyen wrote:
> This is an interesting subject ..
> 
> As a matter of fact I have done several tests.
> I came to this need after realizing that even though my results were 
> good in a "standard dev + test set" situation,
> I had some strange results with real-world documents.
> That's why I investigated.
> 
> But you are right removing some so-called bad entries could have 
> unexpected results.
> 
> For instance here is a test I did :
> 
> I trained a fr-en model on europarl v7 ( 2 millions sentences)
> I tuned with a subset of 3 K sentences.
> I ran a evaluation on the full 2 million lines.
> then I removed the 90 K sentences for which the score was less than 0.2
> retrained on 1917853 sentences.
> 
> In the end I got more sentences (in %) with a score above 0.2,
> but when analyzing at > 0.3 the results become similar, and at > 0.4 the
> initial corpus is better.
> 
> Just weird.





Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Matthias Huck
Hi,

If your analysis revealed that there's an issue with only a few specific
entries, then write regular expressions and grep them out. However, you
risk that those entries are a problem only on the devtest set you're
looking at, whereas on different input data other bad translation
options will pop up.

On Thu, 2015-09-24 at 16:08 +0200, Vincent Nguyen wrote:
> Matthias,
> 
> Pruning :
> I use the cube pop limit at 400 instead of default values (1000 or 5000)
> I use the MinScore 0.001

It seems to me that something like MinScore 2:0.001 should be effective
for most of the bad phrases you copied into your original mail as an
example.

> I tried sigtest filtering once, it never worked.

Why not?

> table-limit=20
> I have the feeling this is only for CreateOnDiskPt
> am I wrong ?
> does it work with ProcessPhrasetableMin ?

I think it works. The decoder does this, not the phrase table binarizer.
You could run a simple experiment in order to verify. Add
-feature-overwrite 'TranslationModel0 table-limit=20' (or equivalent) to
your decoder call.

Cheers,
Matthias


> Le 24/09/2015 15:21, Matthias Huck a écrit :
> > Hi Vincent,
> >
> > Pruning the phrase table will discard many bad entries.
> >
> > The decoder is typically configured to load no more than a maximum
> > number of translation options per distinct source side. Use
> > table-limit=20 as a parameter to your translation model feature to limit
> > the amount of candidates to the top 20.
> >
> > Alternatively you can pre-prune the phrase table. The following page
> > provides instructions:
> > http://www.statmt.org/moses/?n=Advanced.RuleTables
> >
> > In case you want to remove just a handful of individual entries, I
> > recommend grep -v on the Linux command line.
> >
> > Cheers,
> > Matthias
> >
> >
> > On Thu, 2015-09-24 at 11:05 +0100, Hieu Hoang wrote:
> >> i've just added a new feature function that allows you to give a list
> >> of rules that you don't want to be used:
> >>  " 1 ||| One Million Roofs
> >>
> >>  oui ||| no
> >>
> >> To use this list, add the following to your moses.ini file
> >>
> >> [feature]
> >> DeleteRules path=/path/to/list
> >>
> >> Not tested.
> >>
> >>
> >>
> >> Hieu Hoang
> >> http://www.hoang.co.uk/hieu
> >>
> >>
> >> On 24 September 2015 at 10:11, Vincent Nguyen  wrote:
> >>  
> >>  well at times it does, the sequence:
> >>  " 1 "
> >>  became
> >>  One Million Roofs
> >>  completely off 
> >>  
> >>  
> >>  " 1 " . ||| one . ||| 4.77044e-05 2.56689e-08
> >>  0.103519 0.0135382 ||| 1-0 3-1 ||| 2170 1 1 ||| |||
> >>  " 1 " une ||| " 1 " meaning ||| 0.0517593
> >>  0.00140486 0.103519 5.98457e-06 ||| 0-0 1-1 0-2 2-2 2-3 ||| 2
> >>  1 1 ||| |||
> >>  " 1 " ||| " 1 " meaning ||| 0.0517593
> >>  0.121628 0.0517593 5.98457e-06 ||| 0-0 1-1 0-2 2-2 2-3 ||| 2 2
> >>  1 ||| |||
> >>  " 1 " ||| one ||| 1.34779e-06 2.65512e-08 0.0517593
> >>  0.0141179 ||| 1-0 ||| 76806 2 1 ||| |||
> >>  " 1 + ||| ' one @-@ on ||| 0.0517593 8.76241e-09
> >>  0.0345062 2.43009e-07 ||| 0-0 2-0 1-1 ||| 2 3 1 ||| |||
> >>  " 1 + ||| ' one @-@ ||| 0.0129398 8.76241e-09
> >>  0.0345062 1.65217e-05 ||| 0-0 2-0 1-1 ||| 8 3 1 ||| |||
> >>  " 1 + ||| ' one ||| 0.000685554 8.76241e-09
> >>  0.0345062 0.00189493 ||| 0-0 2-0 1-1 ||| 151 3 1 ||| |||
> >>  " 1 . ||| '1 . ||| 0.103519 0.241693 0.0345062
> >>  5.37965e-05 ||| 0-0 1-0 2-1 ||| 1 3 1 ||| |||
> >>  " 1 . ||| " 1 . ||| 0.508332 0.34958 0.33
> >>  0.180103 ||| 0-0 1-1 2-2 ||| 2 3 2 ||| |||
> >>  " 1 billion de dollars ||| $ 1 trillion of ||| 0.0207037
> >>  2.46862e-05 0.103519 0.0679424 ||| 4-0 1-1 2-2 3-3 ||| 5 1 1
> >>  ||| |||
> >>  " 1 billion de ||| 1 trillion of ||| 0.0345062
> >>  5.93019e-05 0.103519 0.161697 ||| 1-0 2-1 3-2 ||| 3 1 1 |||
> >>  |||
> >>  " 1 billion ||| 1 trillion ||| 0.00108967 0.000131965
> >>  0.103519 0.536768 ||| 1-0 2-1 ||| 95 

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Matthias Huck
Hi Vincent,

Pruning the phrase table will discard many bad entries. 

The decoder is typically configured to load no more than a maximum
number of translation options per distinct source side. Use
table-limit=20 as a parameter to your translation model feature to limit
the amount of candidates to the top 20.
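For reference, a sketch of how that parameter might sit on the phrase-table feature line in moses.ini (the feature name, path, and other arguments below are placeholders for whatever your own configuration uses):

```ini
[feature]
PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/path/to/phrase-table.gz input-factor=0 output-factor=0 table-limit=20
```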

Alternatively you can pre-prune the phrase table. The following page
provides instructions:
http://www.statmt.org/moses/?n=Advanced.RuleTables

In case you want to remove just a handful of individual entries, I
recommend grep -v on the Linux command line.
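For instance (the phrase-table lines below are made up; anchoring the pattern on both source and target sides avoids deleting good entries that share the same source):

```shell
work=$(mktemp -d)
cat > "$work/phrase-table" <<'EOF'
" 1 ||| One Million Roofs ||| 0.05 0.01
oui ||| no ||| 0.001 0.002
oui ||| yes ||| 0.5 0.4
EOF

# remove only the bad "oui -> no" entry, keeping "oui -> yes"
grep -v '^oui ||| no |||' "$work/phrase-table" > "$work/phrase-table.filtered"
```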

Cheers,
Matthias


On Thu, 2015-09-24 at 11:05 +0100, Hieu Hoang wrote:
> i've just added a new feature function that allows you to give a list
> of rules that you don't want to be used:
> " 1 ||| One Million Roofs
> 
> oui ||| no
> 
> To use this list, add the following to your moses.ini file
> 
>[feature]
>DeleteRules path=/path/to/list
> 
> Not tested.
> 
> 
> 
> Hieu Hoang
> http://www.hoang.co.uk/hieu
> 
> 
> On 24 September 2015 at 10:11, Vincent Nguyen  wrote:
> 
> well at times it does, the sequence:
> " 1 "
> became 
> One Million Roofs
> completely off 
> 
> 
> " 1 " . ||| one . ||| 4.77044e-05 2.56689e-08
> 0.103519 0.0135382 ||| 1-0 3-1 ||| 2170 1 1 ||| |||
> " 1 " une ||| " 1 " meaning ||| 0.0517593
> 0.00140486 0.103519 5.98457e-06 ||| 0-0 1-1 0-2 2-2 2-3 ||| 2
> 1 1 ||| |||
> " 1 " ||| " 1 " meaning ||| 0.0517593
> 0.121628 0.0517593 5.98457e-06 ||| 0-0 1-1 0-2 2-2 2-3 ||| 2 2
> 1 ||| |||
> " 1 " ||| one ||| 1.34779e-06 2.65512e-08 0.0517593
> 0.0141179 ||| 1-0 ||| 76806 2 1 ||| |||
> " 1 + ||| ' one @-@ on ||| 0.0517593 8.76241e-09
> 0.0345062 2.43009e-07 ||| 0-0 2-0 1-1 ||| 2 3 1 ||| |||
> " 1 + ||| ' one @-@ ||| 0.0129398 8.76241e-09
> 0.0345062 1.65217e-05 ||| 0-0 2-0 1-1 ||| 8 3 1 ||| |||
> " 1 + ||| ' one ||| 0.000685554 8.76241e-09
> 0.0345062 0.00189493 ||| 0-0 2-0 1-1 ||| 151 3 1 ||| |||
> " 1 . ||| '1 . ||| 0.103519 0.241693 0.0345062
> 5.37965e-05 ||| 0-0 1-0 2-1 ||| 1 3 1 ||| |||
> " 1 . ||| " 1 . ||| 0.508332 0.34958 0.33
> 0.180103 ||| 0-0 1-1 2-2 ||| 2 3 2 ||| |||
> " 1 billion de dollars ||| $ 1 trillion of ||| 0.0207037
> 2.46862e-05 0.103519 0.0679424 ||| 4-0 1-1 2-2 3-3 ||| 5 1 1
> ||| |||
> " 1 billion de ||| 1 trillion of ||| 0.0345062
> 5.93019e-05 0.103519 0.161697 ||| 1-0 2-1 3-2 ||| 3 1 1 |||
> |||
> " 1 billion ||| 1 trillion ||| 0.00108967 0.000131965
> 0.103519 0.536768 ||| 1-0 2-1 ||| 95 1 1 ||| |||
> " 1 milliard $ , ||| $ 1 billion ||| 0.00199074
> 2.23776e-06 0.103519 0.420148 ||| 3-0 1-1 2-2 ||| 52 1 1 |||
> |||
> " 1 milliard $ ||| $ 1 billion ||| 0.00199074 3.32223e-05
> 0.103519 0.420148 ||| 3-0 1-1 2-2 ||| 52 1 1 ||| |||
> " 1 milliard d' euros ||| EUR 1 billion |||
> 0.00026749 3.23583e-05 0.103519 0.179568 ||| 4-0 1-1 2-2 3-2
> ||| 387 1 1 ||| |||
> " 1 milliard d' ||| 1 billion ||| 0.000137475
> 6.11551e-05 0.103519 0.25129 ||| 1-0 2-1 3-1 ||| 753 1 1 |||
> |||
> " 1 milliard de dollars ||| $ 1 billion ||| 0.0195512
> 2.47433e-05 0.508332 0.105231 ||| 0-0 4-0 1-1 2-2 ||| 52 2 2
> ||| |||
> " 1 milliard de personnes ||| one billion people |||
> 0.00252484 9.77577e-09 0.103519 0.00258395 ||| 2-0 1-1 2-1 4-2
> ||| 41 1 1 ||| |||
> " 1 milliard de ||| 1 billion of ||| 0.00941078
> 0.000159942 0.0517593 0.15086 ||| 1-0 2-1 3-2 ||| 11 2 1 |||
> |||
> " 1 milliard de ||| one billion ||| 0.000509944
> 4.32371e-08 0.0517593 0.00492989 ||| 2-0 1-1 2-1 ||| 203 2 1
> ||| |||
> " 1 milliard ||| 1 billion ||| 0.0026678 0.000355919
> 0.502213 0.500792 ||| 1-0 2-1 ||| 753 4 3 ||| |||
> " 1 milliard ||| one billion ||| 0.000509944 3.43309e-07
> 0.0258796 0.00492989 ||| 2-0 1-1 2-1 ||| 203 4 1 ||| |||
> " 1 million $ ||| $ 1 million ||| 0.0172531 1.31973e-05
> 0.103519 0.221619 ||| 0-0 3-0 1-1 2-2 ||| 6 1 1 ||| |||
> " 1 million de toits ||| one million solar roofs |||
> 0.0517593 5.86831e-10 0.103519 1.43348e-10 ||| 2-0 1-1 4-3 |||
> 2 1 1 ||| |||
> " 1 million de ||| one million solar ||| 0.0258796
> 9.85876e-10 0.0517593 3.44036e-10 ||| 2-0 1-1 ||| 4 2 1 |||
> |||
> " 1 million de ||| one million ||| 0.00021344 9.85876e-10
> 0.0517593 0.000202374 ||| 2-0 1-1 ||| 485 2 1 ||| |||
> " 1 million ||| one million solar ||| 0.0258796
> 7.82802e-09 0.0517593 3.44036e-10 ||| 2-0 1-1 ||| 4 2 1 |||
> |||
> " 1 million ||| one million ||| 0.00021344 7.82802e-09
> 0.0517593 0.000202374 

Re: [Moses-support] How to Develop Parallel Corpus

2015-09-06 Thread Matthias Huck
Hi Asad,

You can try Hunalign or the Microsoft Bilingual Sentence Aligner (if
it's for non-commercial purposes).

Cheers,
Matthias


On Sun, 2015-09-06 at 10:24 +, Asad A.Malik wrote:
> Hi All, 
> 
> 
> I am currently trying to develop a parallel corpus. I wanted to know
> whether there is any tool with which I can develop it, i.e. so that each
> sentence in the source language is on the same line as its translation.
> 
>  
> --
> 
> Kind Regards,
> 
> Mr. Asad Abdul Malik
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Generating Segment Level BLEU, NIST and METEOR scores

2015-09-02 Thread Matthias Huck
Hi Liling,

This tool calculates sentence-level BLEU scores (smoothed via
incrementing the n-gram counts by 1):

bin/sentence-bleu

Make sure that you provide the hypothesis and reference files in an
appropriately processed way. The tool doesn't apply any tokenization or
remove any markup internally.

You'll find the source code in

mert/sentence-bleu.cpp

if you need it.

There's also a special version for use with n-best lists:

bin/sentence-bleu-nbest
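The add-one smoothing mentioned above can be sketched in a few lines. This is an illustrative reimplementation, not the actual Moses code, so details (e.g. the exact brevity penalty handling) may differ:

```python
import math
from collections import Counter

def sentence_bleu(hyp, ref, max_n=4):
    # Add-one smoothed sentence-level BLEU: n-gram match and total
    # counts are each incremented by 1, so no precision is ever zero.
    hyp_toks, ref_toks = hyp.split(), ref.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp_toks[i:i + n]) for i in range(len(hyp_toks) - n + 1))
        r = Counter(tuple(ref_toks[i:i + n]) for i in range(len(ref_toks) - n + 1))
        matches = sum(min(c, r[g]) for g, c in h.items())
        total = sum(h.values())
        log_prec += math.log((matches + 1.0) / (total + 1.0))
    # brevity penalty for hypotheses shorter than the reference
    bp = min(1.0, math.exp(1.0 - len(ref_toks) / max(len(hyp_toks), 1)))
    return bp * math.exp(log_prec / max_n)
```

As with the Moses binary, both inputs are expected to be tokenized consistently beforehand.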

Cheers,
Matthias


On Wed, 2015-09-02 at 21:47 +0200, liling tan wrote:
> Dear Moses devs/users,
> 
> 
> Is there any script in moses or other MT libraries that can generate
> the segment level BLEU, NIST and METEOR scores for each sentence in
> the test set?
> 
> 
> Best Regards,
> Liling
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] Domain adaptation

2015-08-14 Thread Matthias Huck
Hi,

I found this older tutorial to be very useful as well:

"Practical Domain Adaptation" by Marcello Federico and Nicola Bertoldi
http://www.mt-archive.info/10/AMTA-2012-Bertoldi-ppt.pdf
(The document formatting is unfortunately slightly messed up.)

SMT research survey wiki:
http://www.statmt.org/survey/Topic/DomainAdaptation

Cheers,
Matthias


On Fri, 2015-08-14 at 20:37 +0100, Barry Haddow wrote:
> You could try this tutorial
> 
> http://www.statmt.org/mtma15/uploads/mtma15-domain-adaptation.pdf
> 
> On 14/08/15 20:20, Vincent Nguyen wrote:
> > I had read this section, which deals with translation model combination.
> > There is not much on the language model or tuning.
> >
> > For instance : if I want to make sure that a specific expression
> > "titres" is translated in "equities" from French to English.
> >
> > These 2 words have specifically to be in the Monolingual corpus of the
> > language model, or in the parallel corpus ?
> >
> > If two "parallel expressions" are in the tuning set but present neither
> > in the parallel corpora nor in the monolingual LM data, can that still
> > trigger a good translation?
> >
> > I am not sure I am being clear.
> >
> > thanks again for your help.
> >
> >
> > Le 14/08/2015 20:52, Rico Sennrich a écrit :
> >> Hi Vincent,
> >>
> >> this section describes some domain adaptation methods that are
> >> implemented in Moses: http://www.statmt.org/moses/?n=Advanced.Domain
> >>
> >> It is incomplete (focusing on parallel data and the translation model),
> >> and does not recommend best practices.
> >>
> >> In general, my recommendation is to use in-domain data whenever possible
> >> (for the language model, translation model, and held-out in-domain data
> >> for tuning/testing). Out-of-domain data can help, but also hurt your
> >> system: the effect depends on your domains and the amount of data you
> >> have for each. Data selection, instance weighting, model interpolation
> >> and domain features are different methods that give you the benefits of
> >> out-of-domain data, but reduce its harmful effects, and are often better
> >> than just concatenating all the data you have.
> >>
> >> best wishes,
> >> Rico
> >>
> >>
> >> On 14/08/15 16:22, Vincent Nguyen wrote:
> >>> Hi,
> >>>
> >>> I can't find a sort of "tutorial " on domain adaptation path to follow.
> >>> I read this in the doc :
> >>> The language model should be trained on a corpus that is suitable to the
> >>> domain. If the translation model is trained on a parallel corpus, then
> >>> the language model should be trained on the output side of that corpus,
> >>> although using additional training data is often beneficial.
> >>>
> >>> And in the training section of the EMS, there is a sub section with
> >>> domain-features=
> >>>
> >>> What is the best practice ?
> >>>
> >>> Let's say for instance that I would like to specialize my model in
> >>> finance translation, with specific corpus.
> >>>
> >>> Should I train the Language model with finance stuff ?
> >>> Should I include parallel corpus in the translation model training ?
> >>> Should I tune with financial data sets ?
> >>>
> >>> Please help me to understand.
> >>> Vincent
> >>>
> >>> ___
> >>> Moses-support mailing list
> >>> Moses-support@mit.edu
> >>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>>
> >> ___
> >> Moses-support mailing list
> >> Moses-support@mit.edu
> >> http://mailman.mit.edu/mailman/listinfo/moses-support
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> 
> 





Re: [Moses-support] Sparse phrase table, is still supported?

2015-07-17 Thread Matthias Huck
On Fri, 2015-07-17 at 09:08 +0400, Hieu Hoang wrote:
> the OnDisk pt can do everything - sparse features, properties, hiero
> models. it's just slow and big
> 
> 
> i think the old Binary pt did sparse features but not properties, the
> Compact pt does neither


Ah, I guess that explains why it didn't work for me. I used the compact
phrase table in that experiment.







Re: [Moses-support] Sparse phrase table, is still supported?

2015-07-16 Thread Matthias Huck
Hi,

You're right, I claimed in the previous mail that "in order to produce
sparse features, you need to write a feature function anyway" and this
is of course not true if you get the sparse phrase table features to
work.

When I tried those sparse domain indicators recently, they didn't work
out of the box, and I also don't know where to find the relevant code.
My guess is that this functionality was broken during the course of
Moses refactoring, but it may as well still be there and waiting to be
activated in the moses.ini. What I did was just switching to dense
domain indicators.

Maybe Hieu can help?

Cheers,
Matthias


On Thu, 2015-07-16 at 10:03 +0100, jian zhang wrote:
> Hi Matthias,
> 
> 
> Thanks for the information.
> 
> 
> I tested on Moses 3.0, and adding sparse features to the phrase table
> seems to be working.
> 
> 
> However, I did not add any flag into ini, like suggested "If a phrase
> table contains sparse features, then this needs to be flagged in the
> configuration file by adding the word sparse after the phrase table
> file name.". Did i miss anything?
> 
> 
> Regards,
> 
> 
> Jian
> 
> 
> 
> 
> 
> 
> 
> On Thu, Jul 16, 2015 at 3:23 AM, Matthias Huck 
> wrote:
> Hi Jian,
> 
> That depends on the nature of the features you're planning to
> implement.
> 
> In order to produce sparse features, you need to write a
> feature
> function anyway.
> 
> But if it's only a handful of scores and they can be
> calculated during
> extraction time, then go for dense features and add the scores
> directly
> to the phrase table.
> 
> If the scores cannot be precalculated, for instance because
> you need
> non-local information that is only available during decoding,
> then a
> feature function implementation becomes necessary.
> 
> When you write a feature function that calculates scores
> during decoding
> time, it can produce dense scores, sparse scores, or both
> types. That's
> up to you.
> 
> If it's plenty of scores which are fired rarely, then sparse
> is the
> right choice. And you certainly need a sparse feature function
> implementation in case you are not aware in advance of the
> overall
> amount of feature scores it can produce.
> 
> If you need information from phrase extraction in order to
> calculate
> scores during decoding time, then we have something denoted as
> "phrase
> properties". Phrase properties give you a means of storing
> arbitrary
> additional information in the phrase table. You have to extend
> the
> extraction pipeline to retrieve and store the phrase
> properties you
> require. The decoder can later read this information from the
> phrase
> table, and your feature function can utilize it in some way.
> 
> A large amount of sparse feature scores can somewhat slow down
> decoding
> and tuning. Also, you have to use MIRA or PRO for tuning, not
> MERT.
> 
> Cheers,
> Matthias
> 
> 
> On Thu, 2015-07-16 at 02:18 +0100, jian zhang wrote:
> > Hi Matthias,
> >
> >
> > Not for domain feature.
> >
> >
> > I want to implement some sparse features, so there are two
> options:
> > 1, add to phrase table, if it is supported
> > 2, implement sparse feature functions,
> >
> >
> > I'd like to know are there any difference between these two
> options,
> > for example, tuning, compute sentence translation scores ...
> >
> >
> > Regards,
> >
> >
> >
> > Jian
> >
> >
> >
> > On Thu, Jul 16, 2015 at 2:06 AM, Matthias Huck
> 
> > wrote:
> > Hi,
> >
> > Are you planning to use binary domain indicator
> features? I'm
> > not sure
> > whether a sparse feature function for this is
> currently
> > implemented. If
> > you're working with a small set of domains, you can
>

Re: [Moses-support] Sparse phrase table, is still supported?

2015-07-15 Thread Matthias Huck
Hi Jian,

That depends on the nature of the features you're planning to
implement. 

In order to produce sparse features, you need to write a feature
function anyway.

But if it's only a handful of scores and they can be calculated during
extraction time, then go for dense features and add the scores directly
to the phrase table.

If the scores cannot be precalculated, for instance because you need
non-local information that is only available during decoding, then a
feature function implementation becomes necessary.

When you write a feature function that calculates scores during decoding
time, it can produce dense scores, sparse scores, or both types. That's
up to you.

If it's plenty of scores which are fired rarely, then sparse is the
right choice. And you certainly need a sparse feature function
implementation in case you are not aware in advance of the overall
amount of feature scores it can produce.

If you need information from phrase extraction in order to calculate
scores during decoding time, then we have something denoted as "phrase
properties". Phrase properties give you a means of storing arbitrary
additional information in the phrase table. You have to extend the
extraction pipeline to retrieve and store the phrase properties you
require. The decoder can later read this information from the phrase
table, and your feature function can utilize it in some way.
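On the decoder side, reading such a property field back out might look like this sketch. The "{{Key value}}" layout is an assumption based on common Moses phrase-table conventions; check the format your Moses version actually emits:

```python
import re

def parse_phrase_properties(field):
    # Parse a phrase-table properties field of the assumed form
    # "{{Key1 value1}} {{Key2 value2}}" into a dict.
    return dict(re.findall(r"\{\{(\S+) ([^}]*)\}\}", field))
```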

A large amount of sparse feature scores can somewhat slow down decoding
and tuning. Also, you have to use MIRA or PRO for tuning, not MERT.

Cheers,
Matthias


On Thu, 2015-07-16 at 02:18 +0100, jian zhang wrote:
> Hi Matthias,
> 
> 
> Not for domain feature.
> 
> 
> I want to implement some sparse features, so there are two options:
> 1, add to phrase table, if it is supported
> 2, implement sparse feature functions,
> 
> 
> I'd like to know are there any difference between these two options,
> for example, tuning, compute sentence translation scores ...
> 
> 
> Regards,
> 
> 
> 
> Jian
> 
> 
> 
> On Thu, Jul 16, 2015 at 2:06 AM, Matthias Huck 
> wrote:
> Hi,
> 
> Are you planning to use binary domain indicator features? I'm
> not sure
> whether a sparse feature function for this is currently
> implemented. If
> you're working with a small set of domains, you can employ
> dense
> indicators instead (domain-features = "indicator" in EMS).
> You'll have
> to re-extract the phrase table, though. Or process it with a
> script to
> add dense indicator values to the scores field.
> 
> I believe that there might also be some bug in the extraction
> pipeline
> when both domain-features = "sparse indicator" and
> score-settings =
> "--GoodTuring" are active in EMS. At least it caused me
> trouble a couple
> of weeks ago. However, I must admit that I didn't investigate
> it further
> at that point.
> 
> Anyway, the bottom line is that I recommend re-extracting with
> dense
> indicators.
> 
> But let me know what you find regarding a sparse
> implementation.
> 
> Cheers,
> Matthias
> 
> 
> On Thu, 2015-07-16 at 00:48 +0100, jian zhang wrote:
> > Hi,
> >
> >
> > Is the sparse features at phrase table, like
> >
> >
> >
> > das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1
> ||| 5000
> > 5000 2500 ||| dom_europarl 1
> >
> >
> >
> > still supported? If yes, what should I set to the ini file
> based on
> > the example above?
> >
> >
> > Thank,
> >
> >
> > Jian
> >
> >
> > --
> > Jian Zhang
> > Centre for Next Generation Localisation (CNGL)
> > Dublin City University
> 
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 
> 
> --
> The University of Edinburgh is a charitable body, registered
> in
> Scotland, with registration number SC005336.
> 
> 
> 
> 
> 
> -- 
> Jian Zhang
> Centre for Next Generation Localisation (CNGL)
> Dublin City University





Re: [Moses-support] Sparse phrase table, is still supported?

2015-07-15 Thread Matthias Huck
Hi,

Are you planning to use binary domain indicator features? I'm not sure
whether a sparse feature function for this is currently implemented. If
you're working with a small set of domains, you can employ dense
indicators instead (domain-features = "indicator" in EMS). You'll have
to re-extract the phrase table, though. Or process it with a script to
add dense indicator values to the scores field.

I believe that there might also be some bug in the extraction pipeline
when both domain-features = "sparse indicator" and score-settings =
"--GoodTuring" are active in EMS. At least it caused me trouble a couple
of weeks ago. However, I must admit that I didn't investigate it further
at that point.

Anyway, the bottom line is that I recommend re-extracting with dense
indicators.
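A script along these lines could append the dense indicator value. The field layout and the 2.718/1 convention follow the phrase-table example quoted below in this thread (2.718 is roughly e, so it contributes 1 after the decoder takes logs, while 1 contributes 0); treat this as a sketch, not the canonical tool:

```python
def add_domain_indicator(line, in_domain):
    # Append a dense domain-indicator score to the scores field
    # (the third " ||| "-separated field) of a phrase table line.
    fields = line.rstrip("\n").split(" ||| ")
    fields[2] = fields[2].strip() + (" 2.718" if in_domain else " 1")
    return " ||| ".join(fields)
```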

But let me know what you find regarding a sparse implementation.

Cheers,
Matthias


On Thu, 2015-07-16 at 00:48 +0100, jian zhang wrote:
> Hi,
> 
> 
> Is the sparse features at phrase table, like
> 
> 
> 
> das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 ||| 5000
> 5000 2500 ||| dom_europarl 1
> 
> 
> 
> still supported? If yes, what should I set to the ini file based on
> the example above?
> 
> 
> Thank,
> 
> 
> Jian
> 
> 
> -- 
> Jian Zhang
> Centre for Next Generation Localisation (CNGL)
> Dublin City University
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] multiple interpolated LM

2015-06-28 Thread Matthias Huck
Hi Hieu,

That should be no problem. Pretty sure I did that a couple of times
already. No need to add another [INTERPOLATED-LM] section. Just try!

Cheers,
Matthias


On Sun, 2015-06-28 at 10:55 +0400, Hieu Hoang wrote:
> in the EMS, is it possible to create interpolated LM for different
> factors? the [INTERPOLATED-LM] section is marked as single, so I'm
> guessing not
> 
> 
> Hieu Hoang
> Researcher
> 
> New York University, Abu Dhabi
> 
> http://www.hoang.co.uk/hieu
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] Major bug found in Moses

2015-06-24 Thread Matthias Huck
Hi James,

Irrespective of the fact that you need to tune the weights of the
log-linear model: 

Let me provide more references in order to shed light on how well
established simple pruning techniques are in our field as well as in
related fields (namely, automatic speech recognition).

This list of references might not be what you are looking for, but maybe
other readers can benefit.


V. Steinbiss, B. Tran, H. Ney. Improvements in beam search. In Proc.
of the Int. Conf. on Spoken Language Processing (ICSLP’94), pages
2143-2146, Yokohama, Japan, Sept. 1994.
http://www.steinbiss.de/vst94d.pdf

R. Zens, F. J. Och, and H. Ney. Phrase-Based Statistical Machine
Translation. In German Conf. on Artificial Intelligence (KI), pages
18-32, Aachen, Germany, Sept. 2002.
https://www-i6.informatik.rwth-aachen.de/publications/download/434/Zens-KI-2002.pdf

Philipp Koehn. Pharaoh: a beam search decoder for phrase-based
statistical machine translation models. In Proc. of the AMTA, pages
115-124, Washington, DC, USA, Sept./Oct. 2004.
http://homepages.inf.ed.ac.uk/pkoehn/publications/pharaoh-amta2004.pdf

Robert C. Moore and Chris Quirk. Faster Beam-Search Decoding for Phrasal
Statistical Machine Translation. In Proc. of MT Summit XI, European
Association for Machine Translation, Sept. 2007.
http://research.microsoft.com/pubs/68097/mtsummit2007_beamsearch.pdf

Richard Zens and Hermann Ney. Improvements in Dynamic Programming Beam
Search for Phrase-based Statistical Machine Translation. In Proc. of the
International Workshop on Spoken Language Translation (IWSLT), Honolulu,
HI, USA, Oct. 2008.
http://www.mt-archive.info/05/IWSLT-2008-Zens.pdf
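As a concrete illustration of the simplest technique these papers discuss, histogram pruning keeps only the N highest-scoring hypotheses per decoder stack. A minimal sketch (the tuple representation of a hypothesis is hypothetical):

```python
import heapq

def histogram_prune(stack, beam_size):
    # Histogram pruning: keep only the beam_size best-scoring
    # hypotheses; each hypothesis is (model_score, payload).
    return heapq.nlargest(beam_size, stack, key=lambda hyp: hyp[0])
```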


Cheers,
Matthias



On Wed, 2015-06-24 at 13:11 +, Read, James C wrote:
> Thank you for reading very careful the draft paper I provided a link
> to and noticing that the Johnson paper is duly cited there. Given that
> you had already noticed this I shall not proceed to explain the
> blinding obvious differences between my very simple filter and their
> filter based on Fisher's exact test.
> 
> Other than that it seems painfully clear that the point I meant to
> make has not been understood entirely. If the default behaviour
> produces BLEU scores considerably lower than merely selecting the most
> likely translation of each phrase then evidently there is something
> very wrong with the default behaviour. If we cannot agree on something
> as obvious as that then I really can't see this discussion making any
> productive progress.
> 
> James
> 
> 
> From: moses-support-boun...@mit.edu  on behalf 
> of Rico Sennrich 
> Sent: Friday, June 19, 2015 8:25 PM
> To: moses-support@mit.edu
> Subject: Re: [Moses-support] Major bug found in Moses
> 
> [sorry for the garbled message before]
> 
> you are right. The idea is pretty obvious. It roughly corresponds to
> 'Histogram pruning' in this paper:
> 
> Zens, R., Stanton, D., Xu, P. (2012). A Systematic Comparison of Phrase
Table Pruning Techniques. In Proceedings of the 2012 Joint Conference on
> Empirical Methods in Natural Language Processing and Computational
> Natural Language Learning (EMNLP-CoNLL), pp. 972-983.
> 
> The idea has been described in the literature before that (for instance,
> Johnson et al. (2007) only use the top 30 phrase pairs per source
> phrase), and may have been used in practice for even longer. If you read
> the paper above, you will find that histogram pruning does not improve
> translation quality on a state-of-the-art SMT system, and performs
> poorly compared to more advanced pruning techniques.
> 
> On 19.06.2015 17:49, Read, James C. wrote:
> > So, all I did was filter out the less likely phrase pairs and the BLEU 
> > score shot up. Was that such a stroke of genius? Was that not blindingly 
> > obvious?
> >
> >
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 





Re: [Moses-support] please help me with the code - getting word index

2015-06-20 Thread Matthias Huck
Hi,

Yes, you need to calculate the absolute position by adding up the start
position of the current rule application, the relative index within the
rule, and the span width of any right-hand side non-terminals in the
current rule with a smaller source index.
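In other words, given the rule's start position, the terminal's relative index within the rule's source side, and the span widths of the preceding non-terminals, the computation reduces to (a sketch, not Moses code):

```python
def absolute_source_position(rule_start, rel_index, preceding_nt_spans):
    # Each non-terminal occupies one slot in the rule's source side but
    # covers a whole span in the sentence, so add (span - 1) per
    # non-terminal that precedes the terminal at rel_index.
    return rule_start + rel_index + sum(s - 1 for s in preceding_nt_spans)
```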

As Rico noted, you'll find some similar code examples in existing
feature functions.

Cheers,
Matthias


On Sat, 2015-06-20 at 18:05 +0430, amir haghighi wrote:
> Thanks Matthias
> ChartHypothesis::GetCurrSourceRange() gets the source span that all
> terminals and non terminals in the current hypothesis cover in the
> source sentence. I'd like to know which terminals (non terminals) are
> corresponded to which source word's index in the source. Could you
> guide me how to obtain that?
> 
> 
> Thanks again
> 
> 
> On Thu, Jun 18, 2015 at 9:48 PM, Matthias Huck 
> wrote:
> Hi,
> 
> You can calculate absolute positions in the source sentence
> based on the
> words range of the current hypothesis and those of the direct
> predecessors (in case of right-hand side non-terminals).
> 
> Take a look at these methods:
> 
> InputPath::GetWordsRange()
> ChartHypothesis::GetCurrSourceRange()
> ChartCellLabel::GetCoverage()
> 
> Cheers,
> Matthias
> 
> 
> On Thu, 2015-06-18 at 20:23 +0430, amir haghighi wrote:
> > Hi everybody
> >
> >
> > I wrote the following code to get an ordered list from the
> source words
> > inside a hypothesis. It gets the words in their translation
> order, but I
> > need not only the words' strings, but also the index of each
> word in  the
> > original sentence.
> >
> > could you please help me how to get the index of each word
> in srcPhrase, in
> > the sentence?
> >
> >
> > void Amir::GetSourcePhrase2(const ChartHypothesis &cur_hypo,
> >                             Phrase &srcPhrase) const
> > {
> >   AmirUtils utility;
> >   TargetPhrase targetPh = cur_hypo.GetCurrTargetPhrase();
> >   const Phrase *sourcePh = targetPh.GetRuleSource();
> >   int targetWordsNum = cur_hypo.GetCurrTargetPhrase().GetSize();
> >   std::vector<Word> source, orderedSource;
> >   std::vector<size_t> alignmentVector;
> >   std::vector<bool> isAligned;
> >
> >   std::vector<std::set<size_t> > sourcePosSets;
> >
> >   for (int targetP = 0; targetP < targetWordsNum; targetP++) {
> >     sourcePosSets.push_back(cur_hypo.GetCurrTargetPhrase().GetAlignTerm().GetAlignmentsForTarget(targetP));
> >   }
> >
> >   for (int ii = targetWordsNum - 1; ii >= 0; ii--) {
> >     std::set<size_t> cur_srcPosSet = sourcePosSets[ii];
> >     for (std::set<size_t>::const_iterator alignmet = cur_srcPosSet.begin();
> >          alignmet != cur_srcPosSet.end(); ++alignmet) {
> >       int alignmentElement = *alignmet;
> >       // keep the rightmost occurrence and remove the others from the list
> >       for (int index = 0; index < ii; index++) {
> >         if (sourcePosSets[index].size() > 0) {
> >           sourcePosSets[index].erase(alignmentElement);
> >         }
> >       }
> >     }
> >   }
> >
> >   for (size_t posT = 0; posT < cur_hypo.GetCurrTargetPhrase().GetSize(); ++posT) {
> >     const Word &word = cur_hypo.GetCurrTargetPhrase().GetWord(posT);
> >     if (word.IsNonTerminal()) {
> >       // non-terminal: fill out with the previous hypothesis
> >       size_t nonTermInd =
> >         cur_hypo.GetCurrTargetPhrase().GetAlignNonTerm().GetNonTermIndexMap()[posT];
> >       const ChartHypothesis *prevHypo = cur_hypo.GetPrevHypo(nonTermInd);

Re: [Moses-support] Major bug found in Moses

2015-06-19 Thread Matthias Huck
Hi James,

Well, it's pretty straightforward: The decoder's job is to find the
hypothesis with the maximum model score. That's why everybody builds
models which assign high model score to high-quality translations.
Unfortunately, you missed this last point in your own work.
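For readers following along, the log-linear model in question is simply a weighted sum of feature scores; tuning fits the weights so that high model score tracks high translation quality. A minimal sketch with hypothetical feature names:

```python
def model_score(feature_values, weights):
    # Log-linear model score: sum_i lambda_i * h_i(e, f), where the
    # h_i are feature scores (e.g. log TM and LM probabilities) and
    # the lambdas are set by tuning (MERT, MIRA, or PRO).
    return sum(weights[name] * h for name, h in feature_values.items())
```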

Cheers,
Matthias


On Fri, 2015-06-19 at 14:15 +, Read, James C wrote:
> I'm gonna try once more. This is what he said:
> 
> "the decoder's job is NOT to find the high quality translation"
> 
> The  next time I have a panel of potential investors in front of me
> I'm gonna pass that line by them and see how it goes down. I stress
> the words HIGH QUALITY TRANSLATION.
> 
> Please promise me that the next time you put in a bid for funding you
> will guarantee your prospective funders that under no circumstances
> will you attempt to design a system which searches for HIGH QUALITY
> TRANSLATION.
> 
> James
> 
> 
> From: Matthias Huck 
> Sent: Friday, June 19, 2015 5:08 PM
> To: Read, James C
> Cc: Hieu Hoang; moses-support@mit.edu; Arnold, Doug
> Subject: Re: [Moses-support] Major bug found in Moses
> 
> Hi James,
> 
> Yes, he just said that.
> 
> The decoder's job is to find the hypothesis with the maximum model
> score. That's one reason why your work is flawed. You did not care at
> all whether your model score correlates with BLEU or not.
> 
> Cheers,
> Matthias
> 
> 
> On Fri, 2015-06-19 at 13:24 +, Read, James C wrote:
> > I quote:
> >
> >
> > "the decoder's job is NOT to find the high quality translation"
> >
> >
> >
> > Did you REALLY just say that?
> >
> >
> > James
> >
> >
> >
> >
> > __
> > From: Hieu Hoang 
> > Sent: Wednesday, June 17, 2015 9:00 PM
> > To: Read, James C
> > Cc: Kenneth Heafield; moses-support@mit.edu; Arnold, Doug
> > Subject: Re: [Moses-support] Major bug found in Moses
> >
> > the decoder's job is NOT to find the high quality translation (as
> > measured by BLEU). Its job is to find translations with high model
> > score.
> >
> >
> > you need the tuning to make sure high quality translation correlates
> > with high model score. If you don't tune, it's pot luck what quality
> > you get.
> >
> >
> > You should tune with the features you use
> >
> >
> > Hieu Hoang
> > Researcher
> >
> > New York University, Abu Dhabi
> >
> > http://www.hoang.co.uk/hieu
> >
> >
> > On 17 June 2015 at 21:52, Read, James C  wrote:
> > The analogy doesn't seem to be helping me understand just how
> > exactly it is a desirable quality of a TM to
> >
> > a) completely break down if no LM is used (thank you for
> > showing that such is not always the case)
> > b) be dependent on a tuning step to help it find the higher
> > scoring translations
> >
> > What you seem to be essentially saying is that the TM cannot
> > find the higher scoring translations because I didn't pretune
> > the system to do so. And I am supposed to accept that such is
> > a desirable quality of a system whose very job is to find the
> > higher scoring translations.
> >
> > Further, I am still unclear which features you require a
> > system to be tuned on. At the very least it seems that I have
> > discovered the selection process that tuning seems to be
> > making up for in some unspecified and altogether opaque way.
> >
> > James
> >
> >
> > 
> > From: Hieu Hoang 
> > Sent: Wednesday, June 17, 2015 8:34 PM
> > To: Read, James C; Kenneth Heafield; moses-support@mit.edu
> > Cc: Arnold, Doug
> > Subject: Re: [Moses-support] Major bug found in Moses
> >
> > 4 BLEU is nothing to sniff at :) I was answering Ken's tangent
> > aspersion
> > that LM are needed for tuning.
> >
> > I have some sympathy for you. You're looking at ways to
> > improve
> > translation by reducing the search space. I've bashed my head
> > against
> > this wall for a while as well without much success.
> >
> > However, as everyone is telling you, you haven't understood
> >

Re: [Moses-support] Dependencies in EMS/Experiment.perl

2015-06-19 Thread Matthias Huck
Hi Evgeny,

If setting TRAINING:config won't help, then it might get a bit tricky.
Another thing you can try is setting filtered-config or filtered-dir in
the [TUNING] section.

The next workaround I can think of is pointing to existing files in all
the [CORPUS:*] sections by setting tokenized-stem, clean-stem,
truecased-stem ... 

Similarly in the [LM:*] sections with tokenized-corpus and
truecased-corpus etc., if defining lm and/or binlm doesn't make it skip
those steps.
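Pulled together, one possible shape of such a config is sketched below. All paths are hypothetical; the option names are the ones suggested above:

```ini
# Hypothetical EMS config excerpt: point the pipeline at existing
# artifacts so the corresponding steps are skipped on re-run.
[TRAINING]
config = /path/to/existing/moses.ini

[TUNING]
filtered-config = /path/to/existing/filtered.moses.ini

[CORPUS:europarl]
tokenized-stem = /path/to/existing/europarl.tok
truecased-stem = /path/to/existing/europarl.tc

[LM:europarl]
lm = /path/to/existing/europarl.lm
```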

Cheers,
Matthias


On Fri, 2015-06-19 at 16:41 +, Evgeny Matusov wrote:
> Hi,
> 
> 
> to those of you using Experiment.perl for experiments, maybe you can
> help me solve the following problem:
> 
> 
> I added a step to filter full segment overlap of evaluation and tuning
> data with the training data. This step removes all sentences from
> each CORPUS which are also found in EVALUATION and TUNING sentences.
> Thus, one of the CORPUS steps depend on EVALUATION and TUNING.
> 
> 
> Now, I want to exchange the tuning corpus I am using, picking another
> one which was already declared in the EVALUATION section. Thus, the
> filter against which the overlap is checked does not change, and hence
> the training data does not need to be filtered again, and therefore
> neither the alignment training nor LM training or anything else should
> be repeated, just the tuning step should re-start. However,
> Experiment.perl is not smart enough to realize this. I tried to add
> "pass-if" or "ignore-if" step on the filter-overlap step that I
> declared and set a variable to pass it, but this did not help - all
> steps after it are still executed. Setting TRAINING:config to a valid
> moses.ini file helps to prevent the alignment training from running,
> but not the LM training, nor (more importantly), the several
> cleaning/lowercasing steps that follow the overlap step for each
> training corpus.
> 
> 
> Is there an easy way to block everything below tuning from being
> repeated, even if the tuning data changes?
> 
> 
> Thanks,
> 
> Evgeny.
> 
> 
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Major bug found in Moses

2015-06-19 Thread Matthias Huck
Hi James,

Yes, he just said that.

The decoder's job is to find the hypothesis with the maximum model
score. That's one reason why your work is flawed. You did not care at
all whether your model score correlates with BLEU or not.

Cheers,
Matthias


On Fri, 2015-06-19 at 13:24 +, Read, James C wrote:
> I quote:
> 
> 
> "the decoder's job is NOT to find the high quality translation"
> 
> 
> 
> Did you REALLY just say that?
> 
> 
> James
> 
> 
> 
> 
> __
> From: Hieu Hoang 
> Sent: Wednesday, June 17, 2015 9:00 PM
> To: Read, James C
> Cc: Kenneth Heafield; moses-support@mit.edu; Arnold, Doug
> Subject: Re: [Moses-support] Major bug found in Moses 
>  
> the decoder's job is NOT to find the high quality translation (as
> measured by BLEU). Its job is to find translations with high model
> score.
> 
> 
> you need the tuning to make sure high quality translation correlates
> with high model score. If you don't tune, it's pot luck what quality
> you get.
> 
> 
> You should tune with the features you use
> 
> 
> Hieu Hoang
> Researcher
> 
> New York University, Abu Dhabi
> 
> http://www.hoang.co.uk/hieu
> 
> 
> On 17 June 2015 at 21:52, Read, James C  wrote:
> The analogy doesn't seem to be helping me understand just how
> exactly it is a desirable quality of a TM to
> 
> a) completely break down if no LM is used (thank you for
> showing that such is not always the case)
> b) be dependent on a tuning step to help it find the higher
> scoring translations
> 
> What you seem to be essentially saying is that the TM cannot
> find the higher scoring translations because I didn't pretune
> the system to do so. And I am supposed to accept that such is
> a desirable quality of a system whose very job is to find the
> higher scoring translations.
> 
> Further, I am still unclear which features you require a
> system to be tuned on. At the very least it seems that I have
> discovered the selection process that tuning seems to be
> making up for in some unspecified and altogether opaque way.
> 
> James
> 
> 
> 
> From: Hieu Hoang 
> Sent: Wednesday, June 17, 2015 8:34 PM
> To: Read, James C; Kenneth Heafield; moses-support@mit.edu
> Cc: Arnold, Doug
> Subject: Re: [Moses-support] Major bug found in Moses
> 
> 4 BLEU is nothing to sniff at :) I was answering Ken's tangent
> aspersion
> that LM are needed for tuning.
> 
> I have some sympathy for you. You're looking at ways to
> improve
> translation by reducing the search space. I've bashed my head
> against
> this wall for a while as well without much success.
> 
> However, as everyone is telling you, you haven't understood
> the role of
> tuning. Without tuning, you're pointing your lab rat to some
> random part
> of the search space, instead of away from the furry animal
> with whiskers
> and towards the yellow cheesy thing
> 
> On 17/06/2015 20:45, Read, James C wrote:
> > Doesn't look like the LM is contributing all that much then
> does it?
> >
> > James
> >
> > 
> > From: moses-support-boun...@mit.edu
>  on behalf of Hieu Hoang
> 
> > Sent: Wednesday, June 17, 2015 7:35 PM
> > To: Kenneth Heafield; moses-support@mit.edu
> > Subject: Re: [Moses-support] Major bug found in Moses
> >
> > On 17/06/2015 20:13, Kenneth Heafield wrote:
> >> I'll bite.
> >>
> >> The moses.ini files ship with bogus feature weights.  One
> is required to
> >> tune the system to discover good weights for their system.
> You did not
> >> tune.  The results of an untuned system are meaningless.
> >>
> >> So for example if the feature weights are all zeros, then
> the scores are
> >> all zero.  The system will arbitrarily pick some awful
> translation from
> >> a large space of translations.
> >>
> >> The filter looks at one feature p(target | source).  So now
> you've
> >> constrained the awful untuned model to a slightly better
> region of the
> >> search space.
> >>
> >> In other words, all you've done is a poor approximation to
> manually
> >> setting the weight to 1.0 on p(target | source) and the
> rest to 0.
> >>
> >> The problem isn't that you are running without a language
> model (though
> >> we generally do not care what happens

Re: [Moses-support] please help me with the code - getting word index

2015-06-18 Thread Matthias Huck
Hi,

You can calculate absolute positions in the source sentence based on the
words range of the current hypothesis and those of the direct
predecessors (in case of right-hand side non-terminals).

Take a look at these methods:

InputPath::GetWordsRange()
ChartHypothesis::GetCurrSourceRange()
ChartCellLabel::GetCoverage()

Cheers,
Matthias


On Thu, 2015-06-18 at 20:23 +0430, amir haghighi wrote:
> Hi everybody
> 
> 
> I wrote the following code to get an ordered list from the source words
> inside a hypothesis. It gets the words in their translation order, but I
> need not only the words' strings, but also the index of each word in the
> original sentence.
> 
> could you please tell me how to get the index of each word in srcPhrase within
> the sentence?
> 
> 
> void Amir::GetSourcePhrase2(const ChartHypothesis &cur_hypo,
> Phrase &srcPhrase) const
> {
>   AmirUtils utility;
>   TargetPhrase targetPh = cur_hypo.GetCurrTargetPhrase();
>   const Phrase *sourcePh = targetPh.GetRuleSource();
>   int targetWordsNum = cur_hypo.GetCurrTargetPhrase().GetSize();
>   // element types below were stripped by the list archive; plausible guesses
>   std::vector<std::string> source, orderedSource;
>   std::vector<int> alignmentVector;
>   std::vector<bool> isAligned;
>
>   std::vector<std::set<size_t> > sourcePosSets;
>
>   for (int targetP = 0; targetP < targetWordsNum; targetP++) {
>     //std::cerr << "setting alignments for targetword: " << targetP << std::endl;
>     sourcePosSets.push_back(
>       cur_hypo.GetCurrTargetPhrase().GetAlignTerm().GetAlignmentsForTarget(targetP));
>   }
>
>   for (int ii = targetWordsNum - 1; ii >= 0; ii--) {
>     std::set<size_t> cur_srcPosSet = sourcePosSets[ii];
>     for (std::set<size_t>::const_iterator alignmet = cur_srcPosSet.begin();
>          alignmet != cur_srcPosSet.end(); ++alignmet) {
>       int alignmentElement = *alignmet;
>       // remove this source position from the sets of earlier target words
>       // (the loop bound was garbled in the archive; 'index < ii' assumed)
>       for (int index = 0; index < ii; index++) {
>         if (sourcePosSets[index].size() > 0) {
>           //std::cerr << " removing " << *alignmet << " for set with size: "
>           //          << sourcePosSets[index].size() << std::endl;
>           sourcePosSets[index].erase(alignmentElement);
>         }
>       }
>     }
>   }
>
>   for (size_t posT = 0; posT < cur_hypo.GetCurrTargetPhrase().GetSize(); ++posT) {
>     const Word &word = cur_hypo.GetCurrTargetPhrase().GetWord(posT);
>     if (word.IsNonTerminal()) {
>       // non-term. fill out with prev hypo
>       size_t nonTermInd =
>         cur_hypo.GetCurrTargetPhrase().GetAlignNonTerm().GetNonTermIndexMap()[posT];
>       const ChartHypothesis *prevHypo = cur_hypo.GetPrevHypo(nonTermInd);
>       GetSourcePhrase2(*prevHypo, srcPhrase);
>     } else {
>       for (std::set<size_t>::const_iterator it = sourcePosSets[posT].begin();
>            it != sourcePosSets[posT].end(); it++) {
>         srcPhrase.AddWord(sourcePh->GetWord(*it));
>       }
>     }
>   }
> }
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] Major bug found in Moses

2015-06-18 Thread Matthias Huck
Hi,

Not sure whether this was mentioned in the vast number of replies:

I'd like to stress that simple histogram pruning of the phrase table is
implemented in Moses and every other SMT system I'm aware of. 
(We know better pruning techniques, though:
http://anthology.aclweb.org/D/D12/D12-1089.pdf )
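
For intuition, plain histogram pruning (keep only the N highest-scoring translation options per distinct source side) can be sketched in a few lines. This is a toy illustration with invented data, not the Moses implementation:

```python
from collections import defaultdict

def histogram_prune(phrase_table, n):
    """Keep the n highest-scoring translation options per distinct source side."""
    by_source = defaultdict(list)
    for source, target, score in phrase_table:
        by_source[source].append((score, target))
    pruned = []
    for source, options in by_source.items():
        for score, target in sorted(options, reverse=True)[:n]:
            pruned.append((source, target, score))
    return pruned

table = [("das haus", "the house", 0.7),
         ("das haus", "the home", 0.2),
         ("das haus", "house", 0.1),
         ("haus", "house", 0.9)]
print(histogram_prune(table, 1))
# -> [('das haus', 'the house', 0.7), ('haus', 'house', 0.9)]
```

With n=1 and only local features, pruning keeps exactly the best option per source side, which is why it does not change the Viterbi path in the setting described above.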

If you deactivate all non-local features (like the LM, the lexicalized
reordering model, the distance-based jump cost), run monotonic decoding,
and apply the features and scaling factors known to the decoder for
pruning as well, then it shouldn't matter how much you prune. If you
keep at least the best translation option per distinct source side, the
decoder should always output the very same Viterbi path.

A simple toy example should be sufficient to verify that the decoder
implements the argmax operation. 

We frequently run a couple of basic regression tests:
http://statmt.org/moses/cruise/ 
I'm pretty sure that we would have noticed quickly in case a major bug
was introduced just recently.

The decoder maximizes model score, not BLEU. Tuning is required to
achieve a correlation of model score with BLEU (or the quality metric of
your choice).
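
The argmax point can be made concrete with a toy example (invented hypotheses and feature values, not Moses code): the decoder returns the hypothesis with the best weighted model score, so which hypothesis wins depends entirely on the weights that tuning sets.

```python
# Two candidate translations with made-up feature values (log-probabilities).
hypotheses = {
    "the house is small": {"tm": -1.0, "lm": -2.0},
    "house small":        {"tm": -0.5, "lm": -6.0},
}

def viterbi(weights):
    """Return the hypothesis with the maximum model score (weighted feature sum)."""
    def model_score(feats):
        return sum(weights[name] * value for name, value in feats.items())
    return max(hypotheses, key=lambda h: model_score(hypotheses[h]))

print(viterbi({"tm": 1.0, "lm": 0.0}))  # TM only -> "house small"
print(viterbi({"tm": 1.0, "lm": 1.0}))  # TM + LM -> "the house is small"
```

The decoder is "correct" in both cases; only the second weight setting makes the model score agree with what a human would prefer, which is what tuning is for.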

Cheers,
Matthias




On Thu, 2015-06-18 at 07:50 +0700, Tom Hoar wrote:
> Amittai, I understand your point about sounding "almost belligerently 
> confrontational." I also admire James's passion and the Moses team's 
> patience to walk through his logic. As a non-scientific reader, this is 
> the most educational exchange I've seen on this list for years. I'm 
> learning a lot. Thank you everyone.
> 
> James, as a non-scientific reader, let me say that Hieu's head bashing 
> to solve the same puzzle shows you're in good company. Yet, the Moses 
> "system" is defined, designed and works with two functionally different 
> pieces, i.e. the front-end and back-end. The front-end creates a (an 
> often wild) array of candidate hypotheses -- by design. Why is this 
> piece designed this way? Because the system design includes a back-end 
> that selects a final choice from amongst the candidates. The two halves 
> share a symbiotic relationship. Together, the pieces form a system with 
> a balance that can only be achieved by working together. In this 
> context, this is not a "bug" (major or minor) and the "system" is not 
> broken.
> 
> I submit, as others have suggested, that you have conceived and are 
> working with a new and different "system" that consists of two different 
> halves. Your front-end reduces the table to a focused set. Your back-end 
> works much like today's translation table to select from the focused 
> set. Major advances sometimes come by challenging the status quo. We 
> have seen evidence here of both the challenge and the status quo.
> 
> So, although I can not "admit the system is broke," I encourage you to 
> advance your new system without trying to fix one that's not broken.
> 
> Tom
> 
> 
> > Date: Wed, 17 Jun 2015 15:48:14 +
> > From: "Read, James C"
> > Subject: Re: [Moses-support] Major bug found in Moses
> > To: Marcin Junczys-Dowmunt
> > Cc: "moses-support@mit.edu", "Arnold, Doug"
> >
> > 1) So if I've understood you correctly you are saying we have a system that 
> > is purposefully designed to perform poorly with a disabled LM and this is 
> > the proof that the LM is the most fundamental part. Any attempt to prove 
> > otherwise by, e.g. filtering the phrase table to help the dysfunctional 
> > search algorithm, does not constitute proof that the TM is the most 
> > fundamental component of the system and if designed correctly can perform 
> > just fine on its own but rather only evidence that the researcher is not 
> > using the system as intended (the intention being to break the TM to 
> > support the idea that the LM is the most fundamental part).
> >
> > 2) If you still feel that the LM is the most fundamental component I 
> > challenge you to disable the TM and perform LM only translations and see 
> > what kind of BLEU scores you get.
> >
> > In conclusion, I do hope that you don't feel that potential investors in MT 
> > systems lack the intelligence to see through these logical fallacies. Can 
> > we now just admit that the system is broke and get around to fixing it?
> >
> > James
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 





Re: [Moses-support] When to truecase

2015-05-22 Thread Matthias Huck
Hi,

If your system output is lowercase, you could try SRILM's `disambig`
tool for predicting the correct casing in a postprocessing step.

http://www.speech.sri.com/projects/srilm/manpages/disambig.1.html
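
The same postprocessing idea can be sketched without SRILM: learn the most frequent casing of each token from cased training text (skipping ambiguous sentence-initial positions) and reapply it to lowercase decoder output. A minimal sketch under those assumptions, not the disambig tool itself:

```python
from collections import Counter, defaultdict

def train_truecaser(cased_sentences):
    """Most-frequent casing per lowercased token, ignoring sentence-initial
    words (their capitalization is ambiguous)."""
    counts = defaultdict(Counter)
    for sent in cased_sentences:
        for tok in sent.split()[1:]:  # skip the sentence-initial token
            counts[tok.lower()][tok] += 1
    return {lc: c.most_common(1)[0][0] for lc, c in counts.items()}

def truecase(model, lowercased_sentence):
    toks = [model.get(t, t) for t in lowercased_sentence.split()]
    if toks:
        toks[0] = toks[0][0].upper() + toks[0][1:]  # uppercase sentence start
    return " ".join(toks)

model = train_truecaser(["We met John in Edinburgh .",
                         "Later John left Edinburgh ."])
print(truecase(model, "john left edinburgh ."))
# -> "John left Edinburgh ."
```

SRILM's disambig additionally disambiguates with an n-gram LM over the cased variants, rather than always taking the single most frequent casing.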

Cheers,
Matthias


On Fri, 2015-05-22 at 11:20 +0200, Ondrej Bojar wrote:
> Hi,
> 
> we also have an experiment on truecasing, see Table 1 in
> http://www.statmt.org/wmt13/pdf/WMT08.pdf
> 
> What works best for us is relying on the casing as guessed by the
> lemmatizer. (Our lemmatizer recognizes names as separate lemmas and
> keeps the lemma upcased; which we then cast to the token in the
> sentence.)
> 
> Moses recaser was the worst option, it was actually better to
> lowercase only the source side of the parallel data, i.e. have the
> main search also pick the casing.
> 
> Cheers, O.
> 
> - Original Message -
> > From: "Lane Schwartz" 
> > To: "Philipp Koehn" 
> > Cc: moses-support@mit.edu
> > Sent: Wednesday, 20 May, 2015 20:50:41
> > Subject: Re: [Moses-support] When to truecase
> 
> > Got it. So then, how was casing handled in the "mbr/mp" column? Was all of
> > the data lowercased, then models trained, then recasing applied after
> > decoding? Or something else?
> > 
> > On Wed, May 20, 2015 at 1:30 PM, Philipp Koehn  wrote:
> > 
> >> Hi,
> >>
> >> no, the changes are made incrementally.
> >>
> >> So the recesed "baseline" is the previous "mbr/mp" column.
> >>
> >> -phi
> >>
> >> On Wed, May 20, 2015 at 2:01 PM, Lane Schwartz  wrote:
> >>
> >>> Philipp,
> >>>
> >>> In Table 2 of the WMT 2009 paper, are the "baseline" and "truecased"
> >>> columns directly comparable? In other words, do the two columns indicate
> >>> identical conditions other than a single variable (how and/or when casing
> >>> was handled)?
> >>>
> >>> In the baseline condition, how and when was casing handled?
> >>>
> >>> Thanks,
> >>> Lane
> >>>
> >>>
> >>> On Wed, May 20, 2015 at 12:43 PM, Philipp Koehn  wrote:
> >>>
>  Hi,
> 
>  see Section 2.2 in our WMT 2009 submission:
>  http://www.statmt.org/wmt09/pdf/WMT-0929.pdf
> 
>  One practical reason to avoid recasing is the need
>  for a second large cased language model.
> 
>  But there is of course also the practical issue with
>  have a unique truecasing scheme for each data
>  condition, handling of headlines, all-caps emphasis,
>  etc.
> 
>  It would be worth to revisit this issue again under
>  different data conditions / language pairs. Both
>  options are readily available in EMS.
> 
>  Each of the two alternative methods could be
>  improved as well. See for instance:
>  http://www.aclweb.org/anthology/N06-1001
> 
>  -phi
> 
>  -phi
> 
> 
>  On Wed, May 20, 2015 at 12:31 PM, Lane Schwartz 
>  wrote:
> 
> > Philipp (and others),
> >
> > I'm wondering what people's experience is regarding when truecasing is
> > applied.
> >
> > One option is to truecase the training data, then train your TM and LM
> > using that truecased data. Another option would be to lowercase the 
> > data,
> > train TM and LM on the lowercased data, and then perform truecasing 
> > after
> > decoding.
> >
> > I assume that the former gives better results, but the latter approach
> > has an advantage in terms of extensibility (namely if you get more data 
> > and
> > update your truecase model, you don't have to re-train all of your TMs 
> > and
> > LMs).
> >
> > Does anyone have any insights they would care to share on this?
> >
> > Thanks,
> > Lane
> >
> >
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> 
> >>>
> >>>
> >>> --
> >>> When a place gets crowded enough to require ID's, social collapse is not
> >>> far away.  It is time to go elsewhere.  The best thing about space travel
> >>> is that it made it possible to go elsewhere.
> >>> -- R.A. Heinlein, "Time Enough For Love"
> >>>
> >>> ___
> >>> Moses-support mailing list
> >>> Moses-support@mit.edu
> >>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>>
> >>>
> >>
> > 
> > 
> > --
> > When a place gets crowded enough to require ID's, social collapse is not
> > far away.  It is time to go elsewhere.  The best thing about space travel
> > is that it made it possible to go elsewhere.
> >-- R.A. Heinlein, "Time Enough For Love"
> > 
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> 




Re: [Moses-support] How can I change LM binarization in EMS without re-tuning?

2015-05-20 Thread Matthias Huck
Oh, are there two ways of doing this? 
I use config-with-reused-weights rather than weight-config. 


On Wed, 2015-05-20 at 15:11 -0400, Philipp Koehn wrote:
> Hi,
> 
> you can point to the previous configuration file with the old weights:
> 
> [TUNING]
> 
> ### instead of tuning with this setting, old weights may be recycled
> # specify here an old configuration file with matching weights
> #
> weight-config = $toy-data/weight.ini
> 
> -phi
> 
> On Wed, May 20, 2015 at 3:01 PM, Lane Schwartz  wrote:
> 
> > I've got a system that I trained using EMS. I'd like to change the
> > binarization of my LM (for example, the original used KenLM probing, and
> > now I want KenLM trie with quantization).
> >
> > If I simply change the lm-binarizer line in my config, EMS assumes that it
> > should re-run tuning. Is there a way that I can force it to not re-tune in
> > this case?
> >
> > Thanks,
> > Lane
> >
> >
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] How can I change LM binarization in EMS without re-tuning?

2015-05-20 Thread Matthias Huck
Hi Lane,

Just do the LM binarization manually, edit the LM feature line in your
tuned moses.ini to point to your new binary LM, and tell the EMS where
to look for the tuned moses.ini:


[TUNING]
config-with-reused-weights = $working-dir/tuning/moses.tuned.ini.10


It won't run tuning if you set config-with-reused-weights. 

Cheers,
Matthias

On Wed, 2015-05-20 at 14:01 -0500, Lane Schwartz wrote:
> I've got a system that I trained using EMS. I'd like to change the
> binarization of my LM (for example, the original used KenLM probing, and
> now I want KenLM trie with quantization).
> 
> If I simply change the lm-binarizer line in my config, EMS assumes that it
> should re-run tuning. Is there a way that I can force it to not re-tune in
> this case?
> 
> Thanks,
> Lane
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] keep some features fixed when tuning

2015-05-20 Thread Matthias Huck
Hi Vito,

tuneable=false should work.

However, in case you use the EMS to run experiments, there's a pitfall:
If a filtered phrase table for tuning exists from a previous
experimental run, then the EMS will typically apply the filter and
replace the line for the phrase table feature function in your moses.ini
by the respective line from the filtered directory. In case the previous
run didn't have tuneable=false, this will be dropped. You can easily
check this and manually edit the filtered moses.ini in the tuning
directory. A filtered moses.ini can be specified in the EMS config:

[TUNING]
filtered-config = $working-dir/tuning/moses.filtered.ini.10

tuneable=false will keep all the score components of a feature function
at their initial values. If you need something similar for individual
components only, please have a look at this:
https://www.mail-archive.com/moses-support@mit.edu/msg11653.html
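
For illustration, tuneable=false goes on the feature function line in moses.ini; a hypothetical example (feature name, path, and score count invented):

```ini
[feature]
PhraseDictionaryMemory name=TranslationModel1 num-features=4 path=/path/to/second-phrase-table.gz input-factor=0 output-factor=0 tuneable=false
```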

Cheers,
Matthias


On Wed, 2015-05-20 at 10:12 +0200, Vito Mandorino wrote:
> Dear all,
> 
> is it possible when tuning to tell Moses to constrain the value of a subset
> of the features to some fixed, given-in-advance values ?
> I would like to do that because I'm dealing with a very small tuning set,
> and I think that reducing the number of tuneable features will prevent
> overfitting.
> I have tried two approaches so far but results were not as expected (or
> desired):
> - add  tuneable=false  to the concerned features in the moses.ini
> - add
>   --decoder-flags "-weight-overwrite 'LM0= 0.086 WordPenalty0= -0.021
> PhrasePenalty0= 0.022 Distortion0= 0.3 TranslationModel1= 0.04 -0.01 0.25
> 0.19'"
>  to the mert-moses.pl command.
> 
> 
> Best regards,
> 
> Vito Mandorino
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] processLexicalTableMin with more than 6 scores

2015-05-15 Thread Matthias Huck
Hi,

Hmm, I thought the default number of lexical reordering scores was 8? 
At least for hier-mslr-bidirectional-fe it's 8.

[feature]
LexicalReordering name=LexicalReordering0 num-features=8 
type=hier-mslr-bidirectional-fe-allff input-factor=0 output-factor=0 
path=/model-dir/reordering-table.3.hier-mslr-bidirectional-fe

Cheers,
Matthias


On Fri, 2015-05-15 at 20:46 +0200, Marcin Junczys-Dowmunt wrote:
> Hi,
> When I look at my code, I would say it's not hardcoded at all. It 
> inspects the first score set and uses that number later on. I guess you 
> would need to provide a reasonable interpretation for those additional 
> scores in the feature function itself. It probably gets the scores, but 
> does not use them. Retrieval should just work unless I am missing something.
> 
> W dniu 15.05.2015 o 20:35, Michael Denkowski pisze:
> > Hi all,
> >
> > Has anyone successfully used a compact reordering model with extra 
> > score components?  I added some features to a reordering table and ran 
> > processLexicalTableMin, which appeared to encode everything (at least 
> > by output file size inspection), but moses still seemed to think it 
> > had only 6 scores.  I didn't see a way to specify nscores like in 
> > processPhraseTableMin and a brief attempt to change the hard coded 
> > score numbers in LexicalReorderingTableCompact.cpp didn't work.  Has 
> > anyone else looked at this?
> >
> > Best,
> > Michael
> >
> >
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





[Moses-support] mert-moses.pl -continue

2015-04-27 Thread Matthias Huck
Hi,

Is there possibly a problem when continuing interrupted tuning runs with
sparse features? 

It seems to me that mert-moses.pl doesn't add the [weight-file] section
to the run*.moses.ini it creates right after resuming the tuning. That
would imply that no sparse weights are used in the next decoding
iteration.

Am I doing something wrong, or does anyone know a workaround? 

Cheers,
Matthias





Re: [Moses-support] Question About matrix.stamt.org WMT 2014 Test Set

2015-04-27 Thread Matthias Huck
Hi Graham,

Did you have a look at the tarballs that were distributed last year?
http://www.statmt.org/wmt14/translation-task.html

There are three different versions:

- Test sets (5.2 MB) These are the source sgm files with extra "filler"
sentences. They were the actual files released for the campaign. 
http://www.statmt.org/wmt14/test.tgz

- Filtered Test sets (3.2 MB) These are the source and reference sgm
files used to evaluate, i.e. the Test sets without the "filler"
sentences. If you want to reproduce results from the campaign, use
these.
http://www.statmt.org/wmt14/test-filtered.tgz

- Cleaned Test sets (3.2 MB) These include fixes to minor encoding
errors, and reinstate around 10% of the en-de data which was excluded
from the evaluation. For further research, use these.
http://www.statmt.org/wmt14/test-full.tgz

WMT has a Google Group:
https://groups.google.com/forum/#!forum/wmt-tasks

Cheers,
Matthias


On Mon, 2015-04-27 at 22:14 +0900, Graham Neubig wrote:
> Hi Moses List,
> 
> Sorry about this being a bit off topic, but I have a question about the
> files on matrix.statmt.org, and couldn't find any information about who to
> contact on the site and assumed that here would be the next-best place to
> ask.
> 
> Specifically, I'm looking for the SGM files for newstest2014 in the same
> order as the system outputs on matrix.statmt.org. On the "test sets" page,
> in the place where there should be a link to newstest2014, it seems like
> the link actually points to newstest2013:
> http://matrix.statmt.org/test_sets/list
> 
> And the ones downloadable from the WMT 2015 site seem to be in a different
> order, and it'd be a bit of a pain (although possible) to match the lines
> properly:
> http://www.statmt.org/wmt15/translation-task.html
> 
> If possible, could someone help out with this, or tell me who's in charge
> of the evaluation matrix so I can contact them directly?
> 
> Graham
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] [decoding-graph-backoff]

2015-04-19 Thread Matthias Huck
Okay fine. So the backoff apparently works even though the code seemed
to suggest something different to me. Got it wrong I guess.

Maybe there's some other problem in my setup, for instance with factored
generation steps.

Thanks anyway for looking into this.


On Sun, 2015-04-19 at 12:52 +0400, Hieu Hoang wrote:
> i used the following simple moses.ini file. the decoder seems to be
> doing what is expected. Models, input and & output attached.
> [input-factors]
> 0
> 
> [mapping]
> 0 T 0
> 1 T 1
> 
> 
> [feature]
> UnknownWordPenalty
> PhraseDictionaryCompact name=pt input-factor=0 output-factor=0
> path=pt.compact num-features=1
> PhraseDictionaryCompact name=pt2 input-factor=0 output-factor=0
> path=pt2.compact num-features=1
> 
> [weight]
> pt= 1
> pt2= 1
> 
> [decoding-graph-backoff]
>  0
>  1
> 
> 
> 
> 
> Hieu Hoang
> Researcher
> 
> New York University, Abu Dhabi
> 
> http://www.hoang.co.uk/hieu
> 
> 
> On 17 April 2015 at 01:26, Matthias Huck  wrote:
> I think your remark in the mail from January was correct, it
> has to be
> ePos-sPos+1 > backoff
> but currently still is
> ePos-sPos+1 <= backoff
> 
> Are you able to somehow test this?
> 
> 
> On Thu, 2015-04-16 at 23:57 +0400, Hieu Hoang wrote:
> > ah yes, I thought the backoff was doing the opposite to what
> it's
> > supposed to do so I changed the comparison around. I checked
> that it
> > backed off, but i didn't run it through tuning.
> >
> > it may still be wrong, or there may be strange interaction
> with the tuning.
> >
> >
> > On 16/04/2015 22:16, Matthias Huck wrote:
> > > Well, what's that business mentioned in your mail from
> January (quoted
> > > below), with the backoff being broken, then being broken
> more, then
> > > possibly been fixed - or not?
> > >
> > >
> 
> https://github.com/moses-smt/mosesdecoder/commit/44fec57c535db2df73ccbb1628d8143a9c728c19
> > >
> > >
> > > I set up a system that was supposed to do backoff with
> factored
> > > generation steps, more or less in the manner of what's
> described in this
> > > paper: Interpolated Backoff for Factored Translation
> Models, Philipp
> > > Koehn and Barry Haddow, AMTA 2012.
> > >
> > > MIRA tunes all the weights of the backoff models to 0.
> With exactly the
> > > same configuration, this did not happen last year
> (February 2014). Maybe
> > > the [decoding-graph-backoff] setting didn't have any
> effect prior to
> > > some of your code modifications, and the models were
> actually competing
>     > > in older setups? Or it's buggy now. I can't really tell.
> > >
> > > I can show you the two setups if you want.
> > >
> > >
> > >
> > > On Thu, 2015-04-16 at 21:34 +0400, Hieu Hoang wrote:
> > >> Didn't know it has changed. How should it behave and how
> does it
> > >> actually behave?
> > >>
> > >> On 16 Apr 2015 21:04, "Matthias Huck"
>  wrote:
> > >>  Hi Hieu,
> > >>
> > >>  It seems that [decoding-graph-backoff] doesn't
> quite behave
> > >>  like last
> > >>  year any more. Can you briefly explain how its
> behaviour has
> > >>  changed,
> > >>  i.e. what it did before and what it does now?
> Can you please
> > >>  also let me
> > >>  know whether there's a way to reproduce the old
> behaviour via
> > >>  configuration options?
> > >>
> > >>  Cheers,
> > >>  Matthias
> > >>
> > >>
> > >>
> > >>  On Fri, 2015-01-09 at 15:20 +, Hieu Hoang
> wrote:
> > >>  > >From the git history, I think it was slightly
>   

Re: [Moses-support] Segfaulting with WordTranslationFeature

2015-04-17 Thread Matthias Huck
Hi Lexi,

The feature most likely won't be particularly important.

But this might be a completely different issue than you think. You
should debug this. Can you print the phrase pair that is applied when
the error occurs?

I recently came across a segfault that seemed to be caused by the OSM
feature. Then it turned out that a collision in the compact phrase table
entailed out-of-bounds word alignments. The solution (recommended by
Marcin) was to switch to modified parameters for phrase table
binarization, thereby avoiding the collision. The modified binarization
parameters are now default:
https://github.com/moses-smt/mosesdecoder/commit/506427368fbb9b980784ed55a68777be43896e8a

Cheers,
Matthias


On Fri, 2015-04-17 at 22:31 +0100, Alexandra Birch wrote:
> Hi there,
> 
> 
> I have a seg fault with a normal master branch of Moses from 1 month
> ago, on a normal seeming test sentence. This was an en-cs system, and
> it translated the first 6000+ sentences fine. It also translates a
> short version of the sentence fine.
> 
> 
> So 
> "Daniel , the previous owner" 
> 
> 
> translates fine but:
> 
> 
> "Daniel , the previous owner , supported the author cinema on the
> complex premises after having himself financed its construction" 
> 
> 
> 
> segfaults! 
> 
> 
> If I remove the WordTranslation feature, so I delete:
> 
> 
> < [feature]
> < WordTranslationFeature name=WT input-factor=0 output-factor=0
> simple=1 source-context=0 target-context=0
> 
> 
> 
> from the moses.ini file, then it stops segfaulting. 
> 
> 
> Any had this happen to them? Does anyone know how much this feature
> helps?
> 
> 
> Lexi 
> 
> 
> 
> 
> 
> 
> -- 
> --
> School of Informatics
> University of Edinburgh
> Phone  +44 (0)131 650-8286
> 
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] [decoding-graph-backoff]

2015-04-16 Thread Matthias Huck
I think your remark in the mail from January was correct, it has to be
ePos-sPos+1 > backoff
but currently still is 
ePos-sPos+1 <= backoff

Are you able to somehow test this?
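To see concretely what the off-by-one changes, here is a toy illustration (plain Python, not Moses code) of which span lengths each variant of the contested condition selects. `span_len` stands for ePos-sPos+1, and `backoff` is the limit from [decoding-graph-backoff]; the three comparisons are the ones quoted in this thread.

```python
# Toy illustration of the contested span-length condition.
# Not Moses code; it only shows how the comparison variants differ.
backoff = 2
variants = {
    ">= backoff (original)": lambda n: n >= backoff,
    "<= backoff (May 2014)": lambda n: n <= backoff,
    "> backoff (proposed)":  lambda n: n > backoff,
}
# Collect the span lengths (1..5) that satisfy each variant.
coverage = {name: [n for n in range(1, 6) if cond(n)]
            for name, cond in variants.items()}
for name, spans in coverage.items():
    print(name, spans)
```

Note that `<= backoff` and `> backoff` select complementary sets of span lengths, so the May 2014 change flipped which spans are affected rather than merely fixing the original boundary.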


On Thu, 2015-04-16 at 23:57 +0400, Hieu Hoang wrote:
> ah yes, I thought the backoff was doing the opposite to what it's 
> supposed to do so I changed the comparison around. I checked that it 
> backed off, but i didn't run it through tuning.
> 
> it may still be wrong, or there may be strange interaction with the tuning.
> 
> 
> On 16/04/2015 22:16, Matthias Huck wrote:
> > Well, what's that business mentioned in your mail from January (quoted
> > below), with the backoff being broken, then being broken more, then
> > possibly been fixed - or not?
> >
> > https://github.com/moses-smt/mosesdecoder/commit/44fec57c535db2df73ccbb1628d8143a9c728c19
> >
> >
> > I set up a system that was supposed to do backoff with factored
> > generation steps, more or less in the manner of what's described in this
> > paper: Interpolated Backoff for Factored Translation Models, Philipp
> > Koehn and Barry Haddow, AMTA 2012.
> >
> > MIRA tunes all the weights of the backoff models to 0. With exactly the
> > same configuration, this did not happen last year (February 2014). Maybe
> > the [decoding-graph-backoff] setting didn't have any effect prior to
> > some of your code modifications, and the models were actually competing
> > in older setups? Or it's buggy now. I can't really tell.
> >
> > I can show you the two setups if you want.
> >
> >
> >
> > On Thu, 2015-04-16 at 21:34 +0400, Hieu Hoang wrote:
> >> Didn't know it has changed. How should it behave and how does it
> >> actually behave?
> >>
> >> On 16 Apr 2015 21:04, "Matthias Huck"  wrote:
> >>  Hi Hieu,
> >>  
> >>  It seems that [decoding-graph-backoff] doesn't quite behave
> >>  like last
> >>  year any more. Can you briefly explain how its behaviour has
> >>  changed,
> >>  i.e. what it did before and what it does now? Can you please
> >>  also let me
> >>  know whether there's a way to reproduce the old behaviour via
> >>  configuration options?
> >>  
> >>  Cheers,
> >>  Matthias
> >>  
> >>  
> >>  
> >>  On Fri, 2015-01-09 at 15:20 +, Hieu Hoang wrote:
> >>  > From the git history, I think it was slightly broken, then
> >>  I broke it even
> >>  > more in May 2014.
> >>  >
> >>  >
> >>  
> >> https://github.com/moses-smt/mosesdecoder/commit/44fec57c535db2df73ccbb1628d8143a9c728c19
> >>  >
> >>  > It was
> >>  >endPos-startPos+1 >= backoff
> >>  > then
> >>  >   endPos-startPos+1 <= backoff
> >>  > I think it should be
> >>  >   endPos-startPos+1 > backoff
> >>  >
> >>  > I'll change it if it's ok with everyone
> >>  >
> >>  >
> >>  > On 9 January 2015 at 15:11, Marcin Junczys-Dowmunt
> >>  
> >>  > wrote:
> >>  >
> >>  > >  Hm, we have been using it at WIPO, but I have to admit I
> >>  never checked
> >>  > > it _actually_ does anything useful. We sorta believe it
> >>  does.
> >>  > >
> >>  > > W dniu 09.01.2015 o 16:08, Hieu Hoang pisze:
> >>  > >
> >>  > >   Hi All
> >>  > >
> >>  > >  Does anyone use this functionality in Moses when you have
> >>  multiple
> >>  > > phrase-tables?
> >>  > >
> >>  > >  From the code, it doesn't look like it works as described
> >>  in
> >>  > >   http://www.statmt.org/moses/?n=Moses.AdvancedFeatures
> >>  > >
> >>  > >  Maybe I'm missing something
> >>  > >
> >>  > > --
> >>  > > Hieu Hoang
> >>  > > Research Associate
> >>  > > University 

Re: [Moses-support] [decoding-graph-backoff]

2015-04-16 Thread Matthias Huck
Well, what's that business mentioned in your mail from January (quoted
below), with the backoff being broken, then being broken more, then
possibly been fixed - or not?

https://github.com/moses-smt/mosesdecoder/commit/44fec57c535db2df73ccbb1628d8143a9c728c19


I set up a system that was supposed to do backoff with factored
generation steps, more or less in the manner of what's described in this
paper: Interpolated Backoff for Factored Translation Models, Philipp
Koehn and Barry Haddow, AMTA 2012.

MIRA tunes all the weights of the backoff models to 0. With exactly the
same configuration, this did not happen last year (February 2014). Maybe
the [decoding-graph-backoff] setting didn't have any effect prior to
some of your code modifications, and the models were actually competing
in older setups? Or it's buggy now. I can't really tell.

I can show you the two setups if you want.



On Thu, 2015-04-16 at 21:34 +0400, Hieu Hoang wrote:
> Didn't know it has changed. How should it behave and how does it
> actually behave?
> 
> On 16 Apr 2015 21:04, "Matthias Huck"  wrote:
> Hi Hieu,
> 
> It seems that [decoding-graph-backoff] doesn't quite behave
> like last
> year any more. Can you briefly explain how its behaviour has
> changed,
> i.e. what it did before and what it does now? Can you please
> also let me
> know whether there's a way to reproduce the old behaviour via
> configuration options?
> 
> Cheers,
> Matthias
> 
> 
> 
> On Fri, 2015-01-09 at 15:20 +, Hieu Hoang wrote:
> > From the git history, I think it was slightly broken, then
> I broke it even
> > more in May 2014.
> >
> >
> 
> https://github.com/moses-smt/mosesdecoder/commit/44fec57c535db2df73ccbb1628d8143a9c728c19
> >
> > It was
> >endPos-startPos+1 >= backoff
> > then
> >   endPos-startPos+1 <= backoff
> > I think it should be
> >   endPos-startPos+1 > backoff
> >
> > I'll change it if it's ok with everyone
> >
> >
> > On 9 January 2015 at 15:11, Marcin Junczys-Dowmunt
> 
> > wrote:
> >
> > >  Hm, we have been using it at WIPO, but I have to admit I
> never checked
> > > it _actually_ does anything useful. We sorta believe it
> does.
> > >
> > > W dniu 09.01.2015 o 16:08, Hieu Hoang pisze:
> > >
> > >   Hi All
> > >
> > >  Does anyone use this functionality in Moses when you have
> multiple
> > > phrase-tables?
> > >
> > >  From the code, it doesn't look like it works as described
> in
> > >   http://www.statmt.org/moses/?n=Moses.AdvancedFeatures
> > >
> > >  Maybe I'm missing something
> > >
> > > --
> > > Hieu Hoang
> > > Research Associate
> > > University of Edinburgh
> > > http://www.hoang.co.uk/hieu
> > >
> > >
> > >
> > > ___
> > > Moses-support mailing list
> > > Moses-support@mit.edu
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
> > >
> > >
> > >
> > > ___
> > > Moses-support mailing list
> > > Moses-support@mit.edu
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
> > >
> > >
> >
> >
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> > The University of Edinburgh is a charitable body, registered
> in
> > Scotland, with registration number SC005336.
> 
> 
> 
> --
> The University of Edinburgh is a charitable body, registered
> in
> Scotland, with registration number SC005336.
> 
> 



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] [decoding-graph-backoff]

2015-04-16 Thread Matthias Huck
Hi Hieu,

It seems that [decoding-graph-backoff] doesn't quite behave like last
year any more. Can you briefly explain how its behaviour has changed,
i.e. what it did before and what it does now? Can you please also let me
know whether there's a way to reproduce the old behaviour via
configuration options?

Cheers,
Matthias



On Fri, 2015-01-09 at 15:20 +, Hieu Hoang wrote:
> From the git history, I think it was slightly broken, then I broke it even
> more in May 2014.
> 
> https://github.com/moses-smt/mosesdecoder/commit/44fec57c535db2df73ccbb1628d8143a9c728c19
> 
> It was
>endPos-startPos+1 >= backoff
> then
>   endPos-startPos+1 <= backoff
> I think it should be
>   endPos-startPos+1 > backoff
> 
> I'll change it if it's ok with everyone
> 
> 
> On 9 January 2015 at 15:11, Marcin Junczys-Dowmunt 
> wrote:
> 
> >  Hm, we have been using it at WIPO, but I have to admit I never checked
> > it _actually_ does anything useful. We sorta believe it does.
> >
> > W dniu 09.01.2015 o 16:08, Hieu Hoang pisze:
> >
> >   Hi All
> >
> >  Does anyone use this functionality in Moses when you have multiple
> > phrase-tables?
> >
> >  From the code, it doesn't look like it works as described in
> >   http://www.statmt.org/moses/?n=Moses.AdvancedFeatures
> >
> >  Maybe I'm missing something
> >
> > --
> > Hieu Hoang
> > Research Associate
> > University of Edinburgh
> > http://www.hoang.co.uk/hieu
> >
> >
> >
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> >
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] swap glue rule

2015-04-14 Thread Matthias Huck
Hi,

On Fri, 2015-04-10 at 14:11 +, Rico Sennrich wrote:
> you'll probably also need to model the probability of swaps somehow. The
> simplest version is just a binary feature, but adding lexicalized models
> should help. The PhraseOrientationFeature in moses is an implementation of
> this paper:
> 
> Matthias Huck, Joern Wuebker, Felix Rietig, and Hermann Ney.
> A Phrase Orientation Model for Hierarchical Machine Translation.
> In ACL 2013 Eighth Workshop on Statistical Machine Translation (WMT 2013),
> pages 452-463, Sofia, Bulgaria, August 2013.
> 
> I don't know if the usage of the feature is documented anywhere.


In Moses, this feature can currently only be used for syntax-based
translation, not with hierarchical phrase tables. I did not implement
the necessary extensions in Moses' hierarchical phrase extractor, only
in the GHKM extractor.

Also, I made some implementation decisions that are slightly different
from our original work in the Jane toolkit (as described in our WMT 2013
paper).

Please note that the code for this feature has not been tested
extensively. I was mainly interested in whether the model is also
beneficial in syntax-based translation, and the results didn't look too
promising on the language pair I was working on.

I recommend that you contact me in case you want to make use of this (or
want to extend Moses' hierarchical extractor to support this).

Alternatively, you could use the Jane implementation:
http://www.hltpr.rwth-aachen.de/jane/

Cheers,
Matthias




-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] n-best list reranking

2015-03-27 Thread Matthias Huck
Hi,

Right, if the `nbest` tool from CSLM is supposed to work with sparse
features, then it needs to read the names.

An n-best list entry with sparse feature scores may look like this:

0 ||| Orlando Bloom und Miranda Kerr noch lieben  ||| LexicalReordering0= 
-2.29848 0 0 0 -1.93214 0 0 0 
LexicalReordering0_phr-src-last-c200-cluster_162-0= 1 
LexicalReordering0_phr-src-first-c200-cluster_41-0= 1 
LexicalReordering0_stk-tgt-last-c200-cluster_134-0= 1 
LexicalReordering0_phr-src-last-c200-cluster_189-0= 1 
LexicalReordering0_phr-tgt-first-c200-cluster_54-0= 3 
LexicalReordering0_phr-tgt-first-c200-cluster_34-0= 1 
LexicalReordering0_stk-src-first-c200-cluster_59-0= 3 
LexicalReordering0_phr-tgt-first-c200-cluster_134-0= 1 
LexicalReordering0_phr-tgt-last-c200-cluster_54-0= 3 
LexicalReordering0_phr-tgt-last-c200-cluster_34-0= 1 
LexicalReordering0_stk-src-last-c200-cluster_59-0= 3 
LexicalReordering0_phr-src-last-c200-cluster_126-0= 1 
LexicalReordering0_phr-tgt-first-c200-cluster_119-0= 1 
LexicalReordering0_phr-tgt-last-c200-cluster_134-0= 1 
LexicalReordering0_phr-src-first-c200-cluster_59-0= 3 
LexicalReordering0_phr-src-last-c200-cluster_59-0= 3 
LexicalReordering0_stk-src-first-c200-cluster_162-0= 1 
LexicalReordering0_stk-src-first-c200-cluster_189-0= 1 
LexicalReordering0_stk-src-last-c200-cluster_162-0= 1 
LexicalReordering0_stk-src-last-c200-cluster_189-0= 1 
LexicalReordering0_stk-tgt-first-c200-cluster_34-0= 1 
LexicalReordering0_stk-tgt-first-c200-cluster_54-0= 3 
LexicalReordering0_phr-tgt-last-c200-cluster_133-0= 1 
LexicalReordering0_phr-src-first-c200-cluster_162-0= 1 
LexicalReordering0_phr-src-first-c200-cluster_189-0= 1 
LexicalReordering0_stk-tgt-first-c200-cluster_134-0= 1 
LexicalReordering0_stk-tgt-last-c200-cluster_34-0= 1 
LexicalReordering0_stk-tgt-last-c200-cluster_54-0= 3 OpSequenceModel0= -31.707 
0 0 0 0 Distortion0= 0 LM0= -36.858 WordPenalty0= -7 PhrasePenalty0= 6 
TranslationModel0= -4.56369 -17.4541 -4.49325 -6.47188 0.999896 0 0 0 0 0 
4.99948 ||| -4.99724

There can be many thousands of different sparse features
"LexicalReordering0_*" which fire on one particular test set, in
hypotheses that make it into the 100-best list.

The amount of features in different n-best list entries can vary.

It seems to me that the `nbest` tool from CSLM v3 cannot deal with this.
I had a brief look at the code, and I ran: 

$ nbest -i in.100best -o out.100best

(Without specifying any new weights.)

It processes the list but outputs this:

0 |||  Orlando Bloom und Miranda Kerr noch lieben   ||| 0 -2.29848 0 0 0 
-1.93214 0 0 0 0 1 0 1 0 1 0 1 0 3 0 1 0 3 0 1 0 3 0 1 0 3 0 1 0 1 0 1 0 3 0 3 
0 1 0 1 0 1 0 1 0 1 0 3 0 1 0 1 0 1 0 1 0 1 0 3 0 -31.707 0 0 0 0 0 0 0 -36.858 
0 -7 0 6 0 -4.56369 -17.4541 -4.49325 -6.47188 0.999896 0 0 0 0 0 4.99948 ||| 
-4.99724

I think it just takes every token in the scores column and treats it as
a dense score (even including the feature names). Probably nobody
bothered to adapt it to the current format yet.

It would be a minor modification I suppose. The tool just needs to read
and store feature names. Weights would have to be stored by name as
well. They would have to be read from a sparse weights file:

...
LexicalReordering0_btn-src-first-c200-cluster_119-3 0.00840371
LexicalReordering0_btn-src-first-c200-cluster_12-2 0.000442284
LexicalReordering0_btn-src-first-c200-cluster_12-3 0.00182486
LexicalReordering0_btn-src-first-c200-cluster_120-2 5.34991e-06
LexicalReordering0_btn-src-first-c200-cluster_120-3 0.0143345
...

Is CSLM on GitHub? If you don't have a more recent version of the nbest
tool, and nobody else has anything equivalent, then I might take your
code base and just add the few bits that are missing in your tool. It
can be implemented quickly, I'm sure.

I don't want to add any new feature scores using the tool. I only want
to utilize it in order to calculate new overall scores given a weights
file with sparse features, and then to reorder the n-best list entries.
Not a big deal.

Basically, I would think that there should be some functioning tool
readily available for such a seemingly common task. But I'm not aware of
any. Maybe people code a new Perl script for this task on-demand each
time they need it? Or maybe some individual piece of code in the Moses
tuning pipeline does this, and only this?
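As a rough sketch of what such a tool has to do, the following Python fragment parses the n-best format shown above ("id ||| hypothesis ||| feats ||| total", with feature names followed by "="), computes a new overall score from a weights dictionary, and reorders the entries. It is illustrative only, not the CSLM `nbest` tool; the convention of keying multi-valued (dense) features as "Name:i" and single-valued sparse features by their bare name is an assumption for this sketch.

```python
def parse_features(field):
    """Return {feature_name: [values]} from the scores column.
    Tokens ending in '=' are feature names; the rest are values."""
    feats, name = {}, None
    for tok in field.split():
        if tok.endswith("="):
            name = tok[:-1]
            feats.setdefault(name, [])
        else:
            feats[name].append(float(tok))
    return feats

def rescore(feats, weights):
    """Dot product of feature values with per-name weights.
    Assumed convention: sparse single-valued features are keyed by
    bare name, dense vectors by 'Name:i' (one weight per component)."""
    total = 0.0
    for name, values in feats.items():
        if len(values) == 1 and name in weights:
            total += weights[name] * values[0]
        else:
            for i, v in enumerate(values):
                total += weights.get("%s:%d" % (name, i), 0.0) * v
    return total

def rerank(nbest_lines, weights):
    """Reorder n-best entries by new overall score, best first."""
    scored = []
    for line in nbest_lines:
        sid, hyp, field, _old = [f.strip() for f in line.split("|||")]
        scored.append((rescore(parse_features(field), weights), sid, hyp))
    scored.sort(key=lambda t: -t[0])
    return scored
```

Feature names absent from the weights dictionary simply contribute zero, which matches the usual treatment of untuned sparse features.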

Cheers,
Matthias




On Fri, 2015-03-27 at 23:48 +0100, Holger Schwenk wrote:
> Hello Matthias,
> 
> could you give us an idea what is missing in the CSLM reranker to make 
> it work for sparse features ?
> 
> Right now, we do not parse the names of the feature functions and store 
> the numerical values only.
> In principle, this could changed ...
> 
> Then it depends how you want to rescore the sparse features.
> The CSLM toolkit can rescore with a back-off LM and Moses on-disk 
> phrase tables (and obviously neural networks).
> 
> Why not add more functionality ..

[Moses-support] n-best list reranking

2015-03-27 Thread Matthias Huck
Hi,

I'm looking for a tool to rerank n-best lists in Moses' current format,
including sparse features. The CSLM toolkit has quite a nice re-ranker
implementation, but apparently it doesn't know sparse features yet.

If anyone already has an extended version of the existing re-ranker from
the CSLM toolkit, or alternatively any other code that does the same and
can also deal with sparse features, please let me know. I'd prefer to
not spend any time at all on implementing this myself, as I'll probably
need to run it only a few times for testing purposes.

Cheers,
Matthias


> On 29 Apr 20:46 2013, Holger Schwenk wrote:
> 
> Hello,
> 
> you can do n-best list rescoring with the nbest tool which is part of 
> the CSLM toolkit (http://www-lium.univ-lemans.fr/~cslm/)
> It is designed to rescore with back-off or continuous space LMs, but it 
> shouldn't be difficult to add your own feature functions.
> 
> Don't hesitate to contact me if you need help.
> 
> best,
> 
> Holger



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Moses segmentation fault under multi-thread

2015-03-11 Thread Matthias Huck
Hi,

I've recently been using these sparse feature functions without any
issues in multi-threaded chart-based decoding. There might be a problem
with thread safety, but I currently can't tell why you got the
segmentation fault. You should investigate this in more detail.

Cheers,
Matthias



On Wed, 2015-03-11 at 12:12 +, Liangyou Li wrote:
> When I run moses with sparse features in multi-threaded mode (parameter 
> -threads all), I got a segmentation fault.
> 
> 
> The three sparse features are:
> SourceWordDeletionFeature factor=0
> TargetWordInsertionFeature factor=0
> WordTranslationFeature input-factor=0 output-factor=0
> 
> 
> The exact command I used is:
> moses_chart -threads all -f moses.ini -i input -n-best-list nbest 100 
> distinct
> 
> 
> This error happens kind of randomly. But it only happens when sparse 
> features and multi-threads are used.
> 
> 
> I've tried several times to use gdb to trace the error. Fortunately, I just 
> get the back-trace info, as listed:
> 
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdae77700 (LWP 32390)]
> 0x7711f5e3 in std::basic_ostream >& 
> std::operator<< , std::allocator 
> >(std::basic_ostream >&, std::basic_string std::char_traits, std::allocator > const&) () from 
> /usr/lib64/libstdc++.so.6
> Missing separate debuginfos, use: zypper install 
> glibc-debuginfo-2.15-22.17.1.x86_64 
> libgcc47-debuginfo-4.7.1_20120723-1.1.1.x86_64 
> liblzma5-debuginfo-5.0.3-12.2.2.x86_64 
> libstdc++47-debuginfo-4.7.1_20120723-1.1.1.x86_64 
> zlib-debuginfo-1.2.7-2.1.2.x86_64
> (gdb) backtrace
> #0  0x7711f5e3 in std::basic_ostream >& 
> std::operator<< , std::allocator 
> >(std::basic_ostream >&, std::basic_string std::char_traits, std::allocator > const&) () from 
> /usr/lib64/libstdc++.so.6
> #1  0x00584711 in Moses::operator<< (out=..., name=...) at 
> moses/FeatureVector.cpp:141
> #2  0x00440232 in 
> Moses::ScoreComponentCollection::GetVectorForProducer (this=0x7fffad0e9380, 
> sp=0xe2aab0) at moses/ScoreComponentCollection.cpp:293
> #3  0x004407da in 
> Moses::ScoreComponentCollection::OutputFeatureScores (this=0x7fffad0e9380, 
> out=..., ff=0xe2aab0, lastName="LM0") at 
> moses/ScoreComponentCollection.cpp:351
> #4  0x0044055a in 
> Moses::ScoreComponentCollection::OutputAllFeatureScores (this=0x7fffad0e9380, 
> out=...) at moses/ScoreComponentCollection.cpp:319
> #5  0x00561ce4 in Moses::ChartManager::OutputNBestList 
> (this=0x7fffacbe0010, collector=0x9320d210, nBestList=std::vector of length 
> 100, capacity 128 = {...}, translationId=569)
> at moses/ChartManager.cpp:381
> #6  0x005619d6 in Moses::ChartManager::OutputNBest 
> (this=0x7fffacbe0010, collector=0x9320d210) at moses/ChartManager.cpp:335
> #7  0x00467cd6 in Moses::TranslationTask::Run (this=0x214d90b0) at 
> moses/TranslationTask.cpp:111
> #8  0x00530b69 in Moses::ThreadPool::Execute (this=0x7fffd710) at 
> moses/ThreadPool.cpp:61
> #9  0x00534ecd in boost::_mfi::mf0 Moses::ThreadPool>::operator() (this=0xf263088, p=0x7fffd710) at 
> /home/lly/plateform/boost_1_55_0/include/boost/bind/mem_fn_template.hpp:49
> #10 0x00534e30 in 
> boost::_bi::list1 
> >::operator(), boost::_bi::list0> 
> (this=0xf263098, f=..., a=...)
> at /home/lly/plateform/boost_1_55_0/include/boost/bind/bind.hpp:253
> #11 0x00534dd5 in boost::_bi::bind_t Moses::ThreadPool>, boost::_bi::list1 > 
> >::operator() (this=0xf263088)
> at 
> /home/lly/plateform/boost_1_55_0/include/boost/bind/bind_template.hpp:20
> #12 0x00534d9a in boost::detail::thread_data boost::_mfi::mf0, 
> boost::_bi::list1 > > >::run 
> (this=0xf262ed0)
> at 
> /home/lly/plateform/boost_1_55_0/include/boost/thread/detail/thread.hpp:117
> #13 0x008b4752 in thread_proxy ()
> #14 0x76965e0e in start_thread () from /lib64/libpthread.so.0
> #15 0x7669d2cd in clone () from /lib64/libc.so.6
> 
> 
> Has anyone had this problem before? Any ideas on solving this?
> Many thanks!
> 
> 
> 
> 
> PS: In my experiment, I found the function " void 
> CompletedRuleCollection::Add(const TargetPhraseCollection &tpc,  const 
> StackVec &stackVec,  const std::vector &stackScores,  const 
> ChartParserCallback &outColl) " does not consider " m_ruleLimit " . So after 
> adding parameter " -rule-limit 0 ", the decoder can only collect one 
> translation option.
> 
> 
> 
> 
> Cheers
> Liangyou
> 
> 
> 
> 
> 
> 
> -- 
> 
> 
> Liangyou Li
> CNGL
> School of Computing
> Dublin City University
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://m

Re: [Moses-support] Can I buy ready to use phrase-table

2015-03-03 Thread Matthias Huck
Hi,

Some pre-trained models for Moses Release 3.0 have been made publicly
available anyway:
http://www.statmt.org/moses/RELEASE-3.0/models/
http://www.statmt.org/moses/?n=moses.releases
http://www.statmt.org/mosescore/uploads/Internal/D1.4_Moses_v3_Release_Notes.pdf

I can't tell whether you're allowed to use them for commercial purposes,
though. Hieu or Philipp should be able to clarify that. 

Also, in order to improve translation quality on your task, maybe you'd
want your systems to be trained either on larger amounts of data, or
using your own (in-domain) training data, or both.

You can build your own custom systems if you're willing to spend time
and computational resources on it.

Cheers,
Matthias


On Tue, 2015-03-03 at 12:37 +0800, Александр Паньшин wrote:
> Maybe, somebody knows can I buy ready-to-use phrase-tables and moses
> configuration?
> 
> 
> Here in our project we need to translate text from many european
> languages to english.
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Documentation describing Moses n-best list extraction

2015-02-28 Thread Matthias Huck
On Sat, 2015-02-28 at 17:11 +, Matthias Huck wrote:
> On Sat, 2015-02-28 at 16:45 +, Hieu Hoang wrote:
> > i've never seen the phrase-based n-best extraction explicitly
> > described. There was a paper on directed graph enumeration (I forget
> > which) that was helpful to me when I was implementing it. 
> 
> 
> Maybe this?
> 
> Hart, P., Nilsson, N., and Raphael, B., "A Formal Basis for the
> Heuristic Determination of Minimum Cost Paths," IEEE Trans. Syst.
> Science and Cybernetics, SSC-4(2):100-107, 1968. 
> 
> http://ai.stanford.edu/~nilsson/OnlinePubs-Nils/PublishedPapers/astar.pdf


And here's a paper from 2002 that describes the application of A* for
n-best list extraction from word graphs in statistical machine
translation (Section 4.3 & Table 1):

N. Ueffing, F. J. Och, and H. Ney. Generation of Word Graphs in
Statistical Machine Translation. In Proc. of the Conference on Empirical
Methods for Natural Language Processing, pages 156-163, Philadelphia,
PA, USA, July 2002. 

https://www-i6.informatik.rwth-aachen.de/publications/download/443/UeffingN.OchF.J.NeyH.--GenerationofWordGraphsinStatisticalMachineTranslation--2002.ps


> 
> 
> 
> > However, it's a fairly simple dynamic programming algorithm
> > 
> > 
> > The scfg-based extraction is different. I think it's based on one of
> > Liang Huang's papers; however, Phil can tell you more. 
> > 
> > 
> > It was formerly based on the same algorithm as pb, but it was found
> > out to be incorrect and missing some paths
> > 
> > Hieu Hoang
> > Research Associate (until March 2015)
> > 
> > ** searching for interesting commercial MT position **
> > 
> > University of Edinburgh
> > http://www.hoang.co.uk/hieu
> > 
> > 
> > 
> > On 25 February 2015 at 13:08, Lane Schwartz 
> > wrote:
> > Is there a particular paper that describes the current
> > technique(s)
> > used for n-best list extraction within Moses?
> > 
> > Thanks,
> > Lane
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> > 
> > 
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
> 
> 
> 



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Documentation describing Moses n-best list extraction

2015-02-28 Thread Matthias Huck
On Sat, 2015-02-28 at 16:45 +, Hieu Hoang wrote:
> i've never seen the phrase-based n-best extraction explicitly
> described. There was a paper on directed graph enumeration (I forget
> which) that was helpful to me when I was implementing it. 


Maybe this?

Hart, P., Nilsson, N., and Raphael, B., "A Formal Basis for the
Heuristic Determination of Minimum Cost Paths," IEEE Trans. Syst.
Science and Cybernetics, SSC-4(2):100-107, 1968. 

http://ai.stanford.edu/~nilsson/OnlinePubs-Nils/PublishedPapers/astar.pdf



> However, it's a fairly simple dynamic programming algorithm
> 
> 
> The scfg-based extraction is different. I think it's based on one of
> Liang Huang's papers; however, Phil can tell you more. 
> 
> 
> It was formerly based on the same algorithm as pb, but it was found
> out to be incorrect and missing some paths
> 
> Hieu Hoang
> Research Associate (until March 2015)
> 
> ** searching for interesting commercial MT position **
> 
> University of Edinburgh
> http://www.hoang.co.uk/hieu
> 
> 
> 
> On 25 February 2015 at 13:08, Lane Schwartz 
> wrote:
> Is there a particular paper that describes the current
> technique(s)
> used for n-best list extraction within Moses?
> 
> Thanks,
> Lane
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Single score in phrase table

2015-02-24 Thread Matthias Huck
Set a higher weight for UnknownWordPenalty? Maybe the default is not
adequate if you do strange things like this.


On Tue, 2015-02-24 at 23:49 +0100, Marcin Junczys-Dowmunt wrote:
> Hi,
> I have a problem with a single score phrase table. All scores have been 
> combined into one score as a linear combination of scores and weights. 
> However, for both my compact phrase table and the in-memory phrase 
> table, all input tokens result in UNK. The phrases are 
> correctly found and returned by both phrase tables (including future 
> score calculation), so this happens somewhere later. Any ideas?
> 
> Best,
> Marcin
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Matthias Huck
Hi,

That's really not at all what is supposed to happen. You should get only
unique entries in the n-best list with the "distinct" parameter. (Maybe
less than 50 if n-best-factor is set to a low value, but there shouldn't
be any duplicates.)

I cannot find any reason why the "distinct" parameter wouldn't do what
it's supposed to do. But maybe I'm missing something. The relevant
method should be Manager::CalcNBest() (in moses/Manager.cpp). As far as
I can tell, there have been no recent modifications to it in Moses
master. 

Please try to investigate what's going on (if you have the time).

Also note that n-best-factor takes effect only if distinct is active.
There's no point in setting it if distinct is inactive or
malfunctioning. It would potentially help you to fill up your n-best
list if you got less than n (=50) entries with the distinct parameter.

Cheers,
Matthias


On Tue, 2015-02-24 at 21:08 +0200, Erinç Dikici wrote:
> (Apparently the Gmane web interface turned my reply into garbled text,
> sorry for the double posting)
> 
> Thanks again for your quick answers.
> 
> Yes, 32 and 2 are the counts after "sort | uniq | wc -l". The total
> number
> of hypotheses returned for both cases was 50.
> 
> I removed the "distinct"s from (my local copy of)
> scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved
> the
> problem! Now I can get 32 unique hypotheses with v3.0, too.
> 
> In fact, I am pretty sure I was able to get 50 unique hypotheses (out
> of a
> 50-best list) with the same configuration back in version 0.x. I hope
> the
> new -n-best-factor will do the trick.
> 
> Best,
> 
> ED
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Matthias Huck
Hi Erinç,

On Tue, 2015-02-24 at 16:24 +, Matthias Huck wrote:
> I'd assume that your 32 entries of the n-best list weren't actually
> unique, though, but a number of duplicates of the (two) very same
> outputs, as "distinct" should simply avoid duplicate entries.

Actually, could you please check for us whether I'm right with this
assumption? If I'm not, then some other modification since version 2.1
might affect your experiment. I hope that's not the case.

Run something like
cut -d'|' -f4 | sort | uniq | wc -l
on the n-best list with 32 entries. It should print 2.
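A worked illustration with a fabricated three-entry n-best list (two distinct translations), assuming the usual Moses n-best format `id ||| translation ||| feature scores ||| total score`:

```shell
# Splitting on a single '|' makes field 4 the translation column,
# because each "|||" separator produces two empty intermediate fields.
printf '%s\n' \
  '0 ||| the house is small ||| LM0= -12.3 ||| -4.1' \
  '0 ||| the house is small ||| LM0= -12.4 ||| -4.2' \
  '0 ||| the home is small ||| LM0= -13.0 ||| -4.5' \
  | cut -d'|' -f4 | sort | uniq | wc -l
# prints 2
```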

Or did you do this already? (You're mentioning "unique hypotheses" in
your mail.)

Cheers,
Matthias





Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Matthias Huck

> somewhere between 2.1 and 3.0, the keyword 'distinct' was 

Oops, that was me. And it wasn't intended. I'm using this for my own
setups and apparently copied it to master when I added some other stuff.
Hope I didn't mess up other people's experiments. It's been in master
since 7 August 2014 already and nobody noticed.

Sorry for that, you can remove it again if you want. 
Lines 1280 and 1282 of scripts/training/mert-moses.pl .

I'd assume that your 32 entries of the n-best list weren't actually
unique, though, but a number of duplicates of the (two) very same
outputs, as "distinct" should simply avoid duplicate entries.

Here's a link to a related previous discussion on this mailing list:
http://comments.gmane.org/gmane.comp.nlp.moses.user/11097
You can try the parameter "n-best-factor".





Re: [Moses-support] Moses with SRILM Compile Error

2015-02-23 Thread Matthias Huck
Hi,

Are you trying to build a 64-bit Moses, but linking with 32-bit SRILM
libraries?

Use

make MACHINE_TYPE=i686-m64 World

in order to build SRILM for the x86-64 architecture.

Cheers,
Matthias


On Mon, 2015-02-23 at 11:28 +0800, TinTin Kalaw wrote:
> Hello,
> 
> 
> Attached here is the build.log.gz after conducting the clean build.
> The command I used was ./bjam --with-boost=/usr/local --with-srilm=
> $SRILM.
> 
> 
> Thanks.
> 
> Regards,
> 
> Kristine Ma. Dominique F. Kalaw
> Contact No.: 0927.854.4201
> Email: tintin.ka...@gmail.com
> 
> 
> On Mon, Feb 23, 2015 at 5:32 AM, Barry Haddow
>  wrote:
> Hi TinTin
> 
> First, make sure you do a clean build, and that you're
> absolutely sure Moses is linking against SRILM V1.6. If the
> build still fails, then post your log to the list,
> 
> cheers - Barry
> 
> Quoting TinTin Kalaw  on Sun, 22 Feb
> 2015 22:48:03 +0800:
> 
> It is because this other tool that I need (a thesis
> project of an
> upperclassman) was made with SRILM. To run their
> project, SRILM must work.
> 
> I used an older version of SRILM (v1.6.0). The build
> still failed.
> 
> Thank you for your fast reply.
> 
> Regards,
> 
> 
> 
> *Kristine Ma. Dominique F. KalawContact No.:
> 0927.854.4201Email:
> tintin.ka...@gmail.com *
> 
> On Sun, Feb 22, 2015 at 5:30 PM, Hieu Hoang
>  wrote:
> 
>  use a older version of SRILM.
> 
> Can I ask why you use SRILM in more detail? Do
> you use it to create
> language models, or within the decoder to look
> up LM scores? In both cases,
> there are now better tools to use than SRILM
> 
> 
> On 22/02/15 07:22, TinTin Kalaw wrote:
> 
> Good day!
> 
>  Whenever I try to compile Moses with the
> *--with-srilm=/my/path/to/srilm*,
> I get a compilation error. If I compile it
> with the
> *--with-boost=/my/path/to/boost* or with just
> *./bjam*, it is a success.
> Unfortunately I cannot use an alternative to
> SRILM because this other tool
> that I am using makes use of SRILM and Moses.
> 
>  I have already successfully
> installed/compiled the other
> tools/packages/dependencies that Moses needs.
> My machine is running on a
> dual-boot OS of *Windows 8.1* and *Ubuntu
> 14.04 LTS 64-bit*. I am
> currently using Ubuntu. I used *Giza v1.0.7*,
> *SRILM v1.7.1*, *Boost
> 1_57_0*, and the version of *Moses* as of Feb
> 7 2015.
> 
>  Attached here is the *build.log.gz* of the
> command *./bjam
> --with-srilm=$SRILM*.
> 
>  I am hoping for your swift response regarding
> this issue. Thank you.
> 
>  --
>   Regards,
> 
> 
> 
> *Kristine Ma. Dominique F. Kalaw Contact No.:
> 0927.854.4201 Email:
> tintin.ka...@gmail.com
> *
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 
> --
> Hieu Hoang
> Research Associate (until March 2015)
> ** searching for interesting commercial MT
> position **
> University of Edinburgh
> http://www.hoang.co.uk/hieu
> 
> 
> 
>   

Re: [Moses-support] Tuning with mert-moses.perl error

2015-02-17 Thread Matthias Huck
Hi Hayo,

Can you please do two things:

1.) Send me the file filtered/moses.ini so that I can have a look at the
feature functions and scaling factors in there.

2.) Tell me the Git commit ID of the Moses version you're working with.
A bug was put into master with commit 70e8eb5. It's been fixed a couple
of days later (commit 0de206f). If you've checked out Moses from GitHub
with the bug, you need to update to the most recent code base and the
error most likely will be gone.

Cheers,
Matthias


On Tue, 2015-02-17 at 17:01 +0100, Hacksawhawk . wrote:
> Hi,
> 
> 
> While trying to tune the translation system I created, I ran into the
> following erorr:
> 
> The following weights have no feature function. Maybe incorrectly
> spelt weights: ,Exit code: 1
> The decoder died. CONFIG WAS -weight-overwrite 'PhrasePenalty0=
> 0.043478 WordPenalty0= -0.217391 TranslationModel0= 0.043478 0.043478
> 0.043478 0.043478 Distortion0= 0.065217 LM0= 0.108696
> LexicalReordering0= 0.065217 0.065217 0.065217 0.065217 0.065217
> 0.065217'
> 
> 
> It seems that mert-moses.pl is rearranging the weight features and
> then trying to overwrite the weight features in the moses.ini file but
> in the wrong order, is this the cause of the error? 
> 
> I have also attached the mert.out file, hopefully this will provide
> more information.
> 
> 
> thanks in advance,
> 
> Hayo
> 
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] Untuneable feature score components?

2015-02-16 Thread Matthias Huck
Hi,

I've added a feature function configuration parameter
"tuneable-components", as discussed in previous mails.

https://github.com/moses-smt/mosesdecoder/commit/6028c7cf9c256e5df80f71b52d912c47dab31abd

Let me know in case you notice any issues related to this.
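Judging from the parameter name and the boolean-vector proposal quoted below, usage in moses.ini presumably looks like this (the feature line, path, and weight values are illustrative assumptions):

```ini
[feature]
PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=phrase-table.gz tuneable-components=1,0,1,1

[weight]
# the 2nd component keeps its manually set scaling factor; the others get tuned
TranslationModel0= 0.2 1.0 0.2 0.2
```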

Cheers,
Matthias


On Fri, 2015-01-23 at 18:37 +, Matthias Huck wrote:
> On Fri, 2015-01-23 at 18:18 +, Hieu Hoang wrote:
> > True, but that complicates the framework, and doesn't deal with sparse
> > features. 
> 
> Why does it complicate the framework? Isn't the trick about "tuneable"
> mostly that you don't write those scores to the n-best list? 
> We can even keep a boolean "tuneable" parameter and have another
> parameter "tuneable-components" (boolean vector sized like the weights
> vector).
> > 
> 
> > By adding another ff which grabs scores from the pt, u can arbitrarily
> > transform the scores 
> 
> Yeah I know. I can get around these things by writing more feature
> functions. For removing scores from phrase tables, I can also just
> process the phrase table file with awk, delete the score columns I don't
> need and write it to another file. But something user-friendly would be
> more appealing. Setting up contrastive experiments could be done much
> more rapidly with what I'm asking for. And maybe somebody on the mailing
> list has implemented this and never put it into master?
> 
> I want it for MIRA, btw.
> 
> I think it should be added if it doesn't exist somewhere yet. Unless
> someone has strong objections.
> 
> 
> > 
> > On 23 January 2015 18:09:11 GMT+00:00, Matthias Huck
> >  wrote:
> > That's not flexible enough. There should be something like:
> > 
> > [feature]
> > MyFeature name=MyFeature0 tuneable=0,1,0
> > 
> > [weight]
> > MyFeature0= 0.0 0.1 1
> > 
> > 
> > MyFeature has 3 score components. I want to tune the second 
> > component,
> > deactivate the first component, and set the scaling factor of the 
> > third
> > component manually to 1.
> > 
> > Currently the "tuneable" parameter is boolean and allows me to 
> > manually
> > set scaling factors for either no score component or all of them. 
> > 
> > 
> > 
> > 
> > On Fri, 2015-01-23 at 17:38 +, Hieu Hoang wrote:
> > The whole feature becomes untuneable. 
> >  
> >  I suppose u can make the pt untuneable, the write another 
> > (tuneable)
> >  ff which grabs whatever scores it wants from the pt
> >  
> >  On 23 January 2015 16:55:56 GMT+00:00, Matthias Huck wrote:
> >  Hi,
> >  
> >  Is there any existing functionality to set only 
> > specific score
> >  components of a feature function as untuneable?
> >  
> >  Feature functions have a boolean "tuneable" 
> > parameter, but it affects
> >  all the scores produced by it. It doesn't help in 
> > case I want to switch
> >  off individual scores from a phrase table, for 
> > instance. Or manually
> >  assign large scaling factors to certain score 
> > components prior to
> >  tuning. As far as I know, right now I'm only able 
> > to do so if the score
> >  I'm interested in is the only score produced by a 
> > feature function.
> >  
> >  If anyone has already implemented something like 
> > that, please let me
> >  know.
> >  
> >  Cheers,
> >  Matthias
> >  
> >  
> >  
> >  
> >  --
> >  Sent while bumping into things 
> > 
> > 
> > 
> > --
> > Sent while bumping into things 
> 
> 
> 





Re: [Moses-support] I can't get any output from my syntactic baseline.

2015-01-26 Thread Matthias Huck
Hi,

Yes, we use berkeleyparsed2mosesxml.perl .

Typically these kinds of errors happen if the EMS was misconfigured. But
I don't even know whether you used the EMS at all. You should try to
find the command line for the execution of the phrase extraction binary
in your logs. Then have a look at the parallel corpus and word alignment
that were passed to it. If the data has annotation issues, then track
them back to previous steps. If the data looks okay, then maybe we have
a bug in the extractor.

This is SAMT, not GHKM, right? We mostly use GHKM syntax in Edinburgh at
the moment. However, Hieu should know more in case there has been a
recent modification to SAMT grammar extraction.

Cheers,
Matthias



On Mon, 2015-01-26 at 10:02 +0800, hxshi wrote:
>  
> Thank you for your responses!
> I found tokens in my model such as following:
>  
> [X][HEAD] 高度 [X][POS] 建
> 设 [X] ||| [X][HEAD] label="HEAD">  last [X][POS] [TH] ||| 
> 0.00525464 0.122788 0.0475 1 ||| 1-0 2-1 3-5 4-4 ||| 0.244314 0.027027 
> 0.027027 ||| ||| 
>  
> So, is that the reason? I used the script in moses
> (berkeleyparsed2mosesxml.perl ) to change my parsing result to moses
> format. Did you use that script?  Or which script do you used for
> format changing. 
>  
>  
> 
> __
> Shi Huaxing
> 
> 
> MI&T Lab
> School of Computer Science and Technology
> Harbin Institute of Technology
> 
> 
>  
> From: Matthias Huck
> Date: 2015-01-26 07:53
> To: hxshi
> CC: moses-support
> Subject: Re: Re: [Moses-support] I can't get any output from my
> syntactic baseline.
> Hi,
>  
> I'm not fully sure but it looks to me like something's wrong about your
> model. The tree annotation was probably flawed and you ended up aligning
> and extracting annotation as proper words. Have a look at the rules in
> your phrase table and the parallel corpora it was extracted from. Search
> for tokens in the phrase table that shouldn't be in there (such as stray tree-annotation markup like label="HEAD">).
>  
> An alternative explanation might be that the input that you're
> translating got messed up with flawed annotation and those tokens are
> passed through as unknowns. If you're doing string-to-tree, there's no
> need to parse the source side, though. You're setting inputtype=3 which
> doesn't seem to make much sense with regard to that fact (3 is tree
> input, 0 is text input).
>  
> Cheers,
> Matthias
>  
>  
> On Sun, 2015-01-25 at 10:07 +0800, hxshi wrote:
> > Thank you for your advice. 
> > Now it can translate something. 
> > but when I run it , the translations are as following:
> >  
> >  
> > Yili> activities
> > label="AG"> Urumqi  electricity -LRB-  > 利  丁刚  李秀
> > 芩 -RRB- label="TH"> > label="VACTN">
> > > propaganda activities 
> >   label="PROP"> label="MOD"> 
> > voice accorded a warm welcome
> > Located Xibei border  Yili
> > label="REL"> 
> >  label="HEAD">   > Yili 
> >  >  
> >  
> > they are not what I expected. What is the problem? How can I get the
> > output as a string? By the way, the output is not even a tree. 
> > 
> > __
> > Shi Huaxing
> > 
> > 
> > MI&T Lab
> > School of Computer Science and Technology
> > Harbin Institute of Technology
> > 
> > 
> >  
> > From: Matthias Huck
> > Date: 2015-01-25 04:04
> > To: hxshi
> > CC: moses-support
> > Subject: Re: [Moses-support] I can't get any output from my syntactic
> > baseline.
> > Hi,
> >  
> > As Rico pointed out before: the glue rules are missing. 
> >  
> > Cheers,
> > Matthias
> >  
> >  
> > On Sun, 2015-01-25 at 03:25 +0800, hxshi wrote:
> > > I can't get any output with my syntactic baseline. Does anybody know
> > > what may be wrong?
> > >  
> > > I trained a string2tree baseline. Got a rule-table such like this:
> > >  
> > > % [X][TH] 相当
> > > 于 [X][RA] [X] ||| [X][TH] is [X][RA] [PROP] ||| 1.67586e-05 4.51106e-08 
> > > 0.0475 0.177966 ||| 1-0 2-1 3-2 ||| 2083.76 0.735177 0.735177 ||| |||
> > > % [X][TH] 相当
> > > 于 [X][RA] 。 [X] ||| [X][TH] is [X][RA] [PROP] ||| 2.06243e-06 6.65709e-09 
> > > 0.0475 0.177966 ||| 1-0 2-1 3-2 ||| 2083.76 0.0904762 0.0904762 ||| |||
> > > % [X][TH] 相当
> > > 于 [X][RA] 于 [X] ||| [X][TH] is [X][RA] [PROP] ||| 1.

Re: [Moses-support] train-model.perl error

2015-01-26 Thread Matthias Huck

On Mon, 2015-01-26 at 10:02 +, mohamed hasanien wrote:
> When I run this script I get this error: 
> ERROR: use --corpus to specify corpus
> at /root/mosesdecoder/scripts/training/train-model.perl line 379.


Hi Mohamed,

Well, yes, that script requires that you specify a couple of command
line parameters. 

You're better off reading some of the descriptions on the Moses website
first [http://www.statmt.org/moses/], e.g. the "Getting Started"
section, the "Phrase-Based Tutorial" and "Experiment.Perl". Next you
should try running the pipeline on small toy data with experiments.perl
(aka the EMS). Once you've written a simple configuration file for it,
the EMS will assemble and run all the commands for you.
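For orientation, a minimal direct invocation in the spirit of the Moses baseline walkthrough might look as follows (all paths, language codes, and the language model file here are illustrative assumptions, not a definitive recipe):

```shell
~/mosesdecoder/scripts/training/train-model.perl \
    -root-dir train \
    -corpus corpus/train.clean -f fr -e en \
    -alignment grow-diag-final-and \
    -reordering msd-bidirectional-fe \
    -lm 0:3:$HOME/lm/train.blm.en:8 \
    -external-bin-dir ~/mosesdecoder/tools
```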

Cheers,
Matthias



> 
> 
> 
>  
> mohammed hassanien Mohammed 
> Egyption Programmers Vice-captain 
> 01000121556
> 
> Egyption Programmers Syndicate
> 
> 
> 
> 
> On Sunday, January 25, 2015 3:33 PM, Matthias Huck
>  wrote:
> 
> 
> 
> If you're using the EMS (experiment.perl), you should edit the EMS
> config to point to the right paths.
> 
> Set 
> 
> moses-src-dir = $HOME/mosesdecoder
> 
> to tell it where to find Moses.
> 
> Cf. http://www.statmt.org/moses/?n=FactoredTraining.EMS#ntoc3
> 
> 
> 
> On Sun, 2015-01-25 at 20:39 +, Matthias Huck wrote:
> > Your train-model.perl cannot be found in any of the directories in
> the
> > PATH environment variable.
> > 
> > Add the directory to the command: 
> > $HOME/mosesdecoder/scripts/training/train-model.perl
> > 
> > 
> > On Sun, 2015-01-25 at 20:29 +0200, mhmd hassnen wrote:
> > > 
> > > 
> > > 
> > > 
> > > > hi all, 
> > > > when i try to y\run the following commend
> > > > train-model.perl -external-bin-dir $HOME/mosesdecoder/tools
> > > > i get this error 
> > > >  -bash: train-model.perl: command not found
> > > > 
> > > >  
> > > > mohammed hassanien Mohammed 
> > > > Egyption Programmers Vice-captain 
> > > > 01000121556
> > > > 
> > > > Egyption Programmers Syndicate
> > > > 
> > > ___
> > > Moses-support mailing list
> > > Moses-support@mit.edu
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
> > 
> > 
> > 
> 
> 
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> 
> 
> 
> 





Re: [Moses-support] I can't get any output from my syntactic baseline.

2015-01-25 Thread Matthias Huck
Hi,

I'm not fully sure but it looks to me like something's wrong about your
model. The tree annotation was probably flawed and you ended up aligning
and extracting annotation as proper words. Have a look at the rules in
your phrase table and the parallel corpora it was extracted from. Search
for tokens in the phrase table that shouldn't be in there (such as stray tree-annotation markup like label="HEAD">).

An alternative explanation might be that the input that you're
translating got messed up with flawed annotation and those tokens are
passed through as unknowns. If you're doing string-to-tree, there's no
need to parse the source side, though. You're setting inputtype=3 which
doesn't seem to make much sense with regard to that fact (3 is tree
input, 0 is text input).
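In moses.ini terms, plain text input would be selected with:

```ini
[inputtype]
0
```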

Cheers,
Matthias


On Sun, 2015-01-25 at 10:07 +0800, hxshi wrote:
> Thank you for your advice. 
> Now it can translate something. 
> but when I run it , the translations are as following:
>  
>  
> Yililabel="AG"> Urumqi  electricity -LRB-  利  丁刚  李秀
> 芩 -RRB- label="TH"> label="VACTN">
> propaganda activities 
>   label="PROP"> label="MOD"> 
> voice accorded a warm welcome
> Located Xibei border  Yili
> label="REL"> 
>  label="HEAD">   Yili 
>   
>  
> they are not what I expected. What is the problem? How can I get the
> output as a string? By the way, the output is not even a tree. 
> 
> ______
> Shi Huaxing
> 
> 
> MI&T Lab
> School of Computer Science and Technology
> Harbin Institute of Technology
> 
> 
>  
> From: Matthias Huck
> Date: 2015-01-25 04:04
> To: hxshi
> CC: moses-support
> Subject: Re: [Moses-support] I can't get any output from my syntactic
> baseline.
> Hi,
>  
> As Rico pointed out before: the glue rules are missing. 
>  
> Cheers,
> Matthias
>  
>  
> On Sun, 2015-01-25 at 03:25 +0800, hxshi wrote:
> > I can't get any output with my syntactic baseline. Does anybody know
> > what may be wrong?
> >  
> > I trained a string2tree baseline. Got a rule-table such like this:
> >  
> > % [X][TH] 相当
> > 于 [X][RA] [X] ||| [X][TH] is [X][RA] [PROP] ||| 1.67586e-05 4.51106e-08 
> > 0.0475 0.177966 ||| 1-0 2-1 3-2 ||| 2083.76 0.735177 0.735177 ||| |||
> > % [X][TH] 相当
> > 于 [X][RA] 。 [X] ||| [X][TH] is [X][RA] [PROP] ||| 2.06243e-06 6.65709e-09 
> > 0.0475 0.177966 ||| 1-0 2-1 3-2 ||| 2083.76 0.0904762 0.0904762 ||| |||
> > % [X][TH] 相当
> > 于 [X][RA] 于 [X] ||| [X][TH] is [X][RA] [PROP] ||| 1.30259e-06 4.61662e-11 
> > 0.0475 0.177966 ||| 1-0 2-1 3-2 ||| 2083.76 0.0571429 0.0571429 ||| |||
> >  
> > And I always got no output when I using this baseline.
> > for example :
> >  
> > input  3 月
> >  
> > output on screen:
> > 3 月
> > Translating line 2  in thread id 47362102691584
> > Line 2: Initialize search took 0.000 seconds total
> > Translating:  3 月   ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [0,3]=X 
> > (1) [1,1]=X (1) [1,2]=X (1) [1,3]=X (1) [2,2]=X (1) [2,3]=X (1) [3,3]=X (1)
> >  
> >   0   1   2   3
> >   0   8   8   0
> > 0  29   0
> >   0   0
> > 0
> > Line 2: Additional reporting took 0.000 seconds total
> > Line 2: Translation took 0.003 seconds total
> > Translation took 0.000 seconds
> >  
> > Do you know what may be wrong with my baseline?
> >  
> > I run decoder with 
> > moses_chart -T -f moses.ini
> >  
> > training this baseline with:
> > train_model.pl
> > --glue-grammar  --target-syntax -max-phrase-length=999 
> > --extract-options="--NonTermConsecSource --MinHoleSource 1 --MaxSpan 999 
> > --MinWords 0 --MaxNonTerm 3" -lm 0:5:lmsri.en --corpus train_case --f zh --e 
> > en -root-dir train_dir -external-bin-dir bin -mgiza -mgiza-cpus 6   -cores 
> > 10 --alignment grow-diag-final-and -score-options ' --GoodTuring' 
> >  
> > the moses.ini as following:
> > #
> >  
> > # input factors
> > [input-factors]
> > 0
> >  
> > # mapping steps
> > [mapping]
> > 0 T 0
> >  
> > [cube-pruning-pop-limit]
> > 1000
> >  
> > [non-terminals]
> > X
> >  
> > [search-algorithm]
> > 3
> >  
> > [inputtype]
> > 3
> >  
> > [max-chart-span]
> > 20
> > 1000
> >  
> > # feature functions
> > [feature]
> > UnknownWordPenalty
> > WordPenalty
> > PhrasePenalty
> > PhraseD

Re: [Moses-support] train-model.perl error

2015-01-25 Thread Matthias Huck
If you're using the EMS (experiment.perl), you should edit the EMS
config to point to the right paths.

Set 

moses-src-dir = $HOME/mosesdecoder

to tell it where to find Moses.

Cf. http://www.statmt.org/moses/?n=FactoredTraining.EMS#ntoc3



On Sun, 2015-01-25 at 20:39 +0000, Matthias Huck wrote:
> Your train-model.perl cannot be found in any of the directories in the
> PATH environment variable.
> 
> Add the directory to the command: 
> $HOME/mosesdecoder/scripts/training/train-model.perl
> 
> 
> On Sun, 2015-01-25 at 20:29 +0200, mhmd hassnen wrote:
> > 
> > 
> > 
> > 
> > > hi all, 
> > > when i try to y\run the following commend
> > > train-model.perl -external-bin-dir $HOME/mosesdecoder/tools
> > > i get this error 
> > >  -bash: train-model.perl: command not found
> > > 
> > >  
> > > mohammed hassanien Mohammed 
> > > Egyption Programmers Vice-captain 
> > > 01000121556
> > > 
> > > Egyption Programmers Syndicate
> > > 
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 
> 





Re: [Moses-support] train-model.perl error

2015-01-25 Thread Matthias Huck
Your train-model.perl cannot be found in any of the directories in the
PATH environment variable.

Add the directory to the command: 
$HOME/mosesdecoder/scripts/training/train-model.perl
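Alternatively, add the scripts directory to the PATH environment variable. The mechanism can be demonstrated with a stand-in script (the script name and temp directory are illustrative):

```shell
# Create a dummy script in a temp dir, then prepend that dir to PATH --
# the same fix makes train-model.perl callable by its bare name.
dir=$(mktemp -d)
printf '#!/bin/sh\necho ok\n' > "$dir/train-model-demo.sh"
chmod +x "$dir/train-model-demo.sh"
export PATH="$dir:$PATH"
train-model-demo.sh
# prints: ok
```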


On Sun, 2015-01-25 at 20:29 +0200, mhmd hassnen wrote:
> 
> 
> 
> 
> > hi all, 
> > when i try to y\run the following commend
> > train-model.perl -external-bin-dir $HOME/mosesdecoder/tools
> > i get this error 
> >  -bash: train-model.perl: command not found
> > 
> >  
> > mohammed hassanien Mohammed 
> > Egyption Programmers Vice-captain 
> > 01000121556
> > 
> > Egyption Programmers Syndicate
> > 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] I can't get any output from my syntactic baseline.

2015-01-24 Thread Matthias Huck
Hi,

As Rico pointed out before: the glue rules are missing. 

Cheers,
Matthias


On Sun, 2015-01-25 at 03:25 +0800, hxshi wrote:
> I can't get any output with my syntactic baseline. Does anybody know
> what may be wrong?
>  
> I trained a string2tree baseline. Got a rule-table such like this:
>  
> % [X][TH] 相当
> 于 [X][RA] [X] ||| [X][TH] is [X][RA] [PROP] ||| 1.67586e-05 4.51106e-08 
> 0.0475 0.177966 ||| 1-0 2-1 3-2 ||| 2083.76 0.735177 0.735177 ||| |||
> % [X][TH] 相当
> 于 [X][RA] 。 [X] ||| [X][TH] is [X][RA] [PROP] ||| 2.06243e-06 6.65709e-09 
> 0.0475 0.177966 ||| 1-0 2-1 3-2 ||| 2083.76 0.0904762 0.0904762 ||| |||
> % [X][TH] 相当
> 于 [X][RA] 于 [X] ||| [X][TH] is [X][RA] [PROP] ||| 1.30259e-06 4.61662e-11 
> 0.0475 0.177966 ||| 1-0 2-1 3-2 ||| 2083.76 0.0571429 0.0571429 ||| |||
>  
> And I always got no output when I using this baseline.
> for example :
>  
> input  3 月
>  
> output on screen:
> 3 月
> Translating line 2  in thread id 47362102691584
> Line 2: Initialize search took 0.000 seconds total
> Translating:  3 月   ||| [0,0]=X (1) [0,1]=X (1) [0,2]=X (1) [0,3]=X 
> (1) [1,1]=X (1) [1,2]=X (1) [1,3]=X (1) [2,2]=X (1) [2,3]=X (1) [3,3]=X (1)
>  
>   0   1   2   3
>   0   8   8   0
> 0  29   0
>   0   0
> 0
> Line 2: Additional reporting took 0.000 seconds total
> Line 2: Translation took 0.003 seconds total
> Translation took 0.000 seconds
>  
> Do you know what may be wrong with my baseline?
>  
> I run decoder with 
> moses_chart -T -f moses.ini
>  
> training this baseline with:
> train_model.pl
> --glue-grammar  --target-syntax -max-phrase-length=999 
> --extract-options="--NonTermConsecSource --MinHoleSource 1 --MaxSpan 999 
> --MinWords 0 --MaxNonTerm 3" -lm 0:5:lmsri.en --corpus train_case --f zh --e 
> en -root-dir train_dir -external-bin-dir bin -mgiza -mgiza-cpus 6   -cores 10 
> --alignment grow-diag-final-and -score-options ' --GoodTuring' 
>  
> the moses.ini as following:
> #
>  
> # input factors
> [input-factors]
> 0
>  
> # mapping steps
> [mapping]
> 0 T 0
>  
> [cube-pruning-pop-limit]
> 1000
>  
> [non-terminals]
> X
>  
> [search-algorithm]
> 3
>  
> [inputtype]
> 3
>  
> [max-chart-span]
> 20
> 1000
>  
> # feature functions
> [feature]
> UnknownWordPenalty
> WordPenalty
> PhrasePenalty
> PhraseDictionaryMemory name=TranslationModel0 num-features=4 
> path=/home/workspace/moses-fbis-case-s2t-ch2en/training_dir/model/rule-table.gz
>  input-factor=0 output-factor=0
> KENLM name=LM0 factor=0 path=/home/workspace/data-lm/lmsri.en order=5
>  
> # dense weights for feature functions
> [weight]
> UnknownWordPenalty0= 1
> WordPenalty0= -1
> PhrasePenalty0= 0.2
> TranslationModel0= 0.2 0.2 0.2 0.2
> LM0= 0.5
> 
> __
> Shi Huaxing
> 
> 
> MI&T Lab
> School of Computer Science and Technology
> Harbin Institute of Technology
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] Untuneable feature score components?

2015-01-23 Thread Matthias Huck
On Fri, 2015-01-23 at 18:18 +, Hieu Hoang wrote:
> True, but that complicates the framework, and doesn't deal with sparse
> features. 

Why does it complicate the framework? Isn't the trick about "tuneable"
mostly that you don't write those scores to the n-best list? 
We can even keep a boolean "tuneable" parameter and have another
parameter "tuneable-components" (boolean vector sized like the weights
vector).
> 

> By adding another ff which grabs scores from the pt, u can arbitrarily
> transform the scores 

Yeah I know. I can get around these things by writing more feature
functions. For removing scores from phrase tables, I can also just
process the phrase table file with awk, delete the score columns I don't
need and write it to another file. But something user-friendly would be
more appealing. Setting up contrastive experiments could be done much
more rapidly with what I'm asking for. And maybe somebody on the mailing
list has implemented this and never put it into master?

I want it for MIRA, btw.

I think it should be added if it doesn't exist somewhere yet. Unless
someone has strong objections.
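For reference, the awk workaround mentioned above can be sketched like this, dropping (say) the second of four scores; the file names and column choice are illustrative:

```shell
# Split each phrase-table line on " ||| ", rebuild the scores field
# without its 2nd component, and write a new 3-score table.
awk -F' [|][|][|] ' 'BEGIN { OFS = " ||| " }
    { split($3, s, " "); $3 = s[1] " " s[3] " " s[4]; print }' \
  phrase-table > phrase-table.3scores
```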


> 
> On 23 January 2015 18:09:11 GMT+00:00, Matthias Huck
>  wrote:
> That's not flexible enough. There should be something like:
> 
> [feature]
> MyFeature name=MyFeature0 tuneable=0,1,0
> 
> [weight]
> MyFeature0= 0.0 0.1 1
> 
> 
> MyFeature has 3 score components. I want to tune the second component,
> deactivate the first component, and set the scaling factor of the 
> third
> component manually to 1.
> 
> Currently the "tuneable" parameter is boolean and allows me to 
> manually
> set scaling factors for either no score component or all of them. 
> 
> 
> 
> 
> On Fri, 2015-01-23 at 17:38 +, Hieu Hoang wrote:
> The whole feature becomes untuneable. 
>  
>  I suppose u can make the pt untuneable, the write another 
> (tuneable)
>  ff which grabs whatever scores it wants from the pt
>  
>  On 23 January 2015 16:55:56 GMT+00:00, Matthias Huck wrote:
>  Hi,
>  
>  Is there any existing functionality to set only 
> specific score
>  components of a feature function as untuneable?
>  
>  Feature functions have a boolean "tuneable" 
> parameter, but it affects
>  all the scores produced by it. It doesn't help in 
> case I want to switch
>  off individual scores from a phrase table, for 
> instance. Or manually
>  assign large scaling factors to certain score 
> components prior to
>  tuning. As far as I know, right now I'm only able to 
> do so if the score
>  I'm interested in is the only score produced by a 
> feature function.
>  
>  If anyone has already implemented something like 
> that, please let me
>  know.
>  
>  Cheers,
>  Matthias
>  
>  
>  
>  
>  --
>  Sent while bumping into things 
> 
> 
> 
> --
> Sent while bumping into things 





Re: [Moses-support] Untuneable feature score components?

2015-01-23 Thread Matthias Huck
That's not flexible enough. There should be something like:

[feature]
MyFeature name=MyFeature0 tuneable=0,1,0

[weight]
MyFeature0= 0.0 0.1 1


MyFeature has 3 score components. I want to tune the second component,
deactivate the first component, and set the scaling factor of the third
component manually to 1.

Currently the "tuneable" parameter is boolean and allows me to manually
set scaling factors for either no score component or all of them. 




On Fri, 2015-01-23 at 17:38 +, Hieu Hoang wrote:
> The whole feature becomes untuneable. 
> 
> I suppose u can make the pt untuneable, the write another (tuneable)
> ff which grabs whatever scores it wants from the pt
> 
> On 23 January 2015 16:55:56 GMT+00:00, Matthias Huck
>  wrote:
> Hi,
> 
> Is there any existing functionality to set only specific score
> components of a feature function as untuneable?
> 
> Feature functions have a boolean "tuneable" parameter, but it affects
> all the scores produced by it. It doesn't help in case I want to 
> switch
> off individual scores from a phrase table, for instance. Or manually
> assign large scaling factors to certain score components prior to
> tuning. As far as I know, right now I'm only able to do so if the 
> score
> I'm interested in is the only score produced by a feature function.
> 
> If anyone has already implemented something like that, please let me
> know.
> 
> Cheers,
> Matthias
> 
> 
> 
> 
> --
> Sent while bumping into things 





[Moses-support] Untuneable feature score components?

2015-01-23 Thread Matthias Huck
Hi,

Is there any existing functionality to set only specific score
components of a feature function as untuneable?

Feature functions have a boolean "tuneable" parameter, but it affects
all the scores produced by it. It doesn't help in case I want to switch
off individual scores from a phrase table, for instance. Or manually
assign large scaling factors to certain score components prior to
tuning. As far as I know, right now I'm only able to do so if the score
I'm interested in is the only score produced by a feature function.

If anyone has already implemented something like that, please let me
know.

Cheers,
Matthias






Re: [Moses-support] about the scores in run*.best100.out

2015-01-23 Thread Matthias Huck
Hi Arefeh,

On Fri, 2015-01-23 at 12:03 +, Arefeh Kazemi wrote:
> 
> >>Can you try to run the setup from the cluster on your local desktop
> system? With the same input, a Moses binary compiled from the same
> sources, and the same command to produce the n-best lists? Normally it
> should give you the same output.
> 
> 
> you mean I run the binary file which is produced on the cluster, on my
> local pc, without compiling moses again?


If you can use the binary from the cluster, yes, copy it to your desktop
system and run that.

Your previous mail seemed to indicate that you observe the issue on the
cluster only, not on your local machine. Make sure you did not use a
different Moses code base first. See if the output is the same on some
toy setup. Then try to reproduce the setup from the cluster on your
desktop (if the memory requirements allow for that).
> 
> 
> >>Why would the feature never produce an overall score larger than
> 60? 
> 
> 
> 
> my feature calculate multiply of some probabilities for dependency
> structures of a sentence. a sentence with limited max size has limited
> number of such dependencies, so I can estimate the maximum value of my
> feature.


Okay, that makes sense.
> 
> 
> 
> >>Also note that you should not use Assign() in your feature functions
> any
> more (since two weeks ago).
> 
> 
> I've already seen your post. I just use PlusEquals in my code, as I
> understand its not affected by your change, right?


No, if you used Assign() before, you've been setting the overall score
for the complete derivation so far. With PlusEquals() you're adding
deltas. They get accumulated. You need to rewrite your code to compute a
delta score just for the current hypothesis expansion. (Typical feature
functions compute that kind of score anyway, but maybe yours didn't.)

If you used PlusEquals() before already, then you don't have to modify
anything.


Cheers,
Matthias
> 
> 
> Regards
> 
>  
> On Friday, January 23, 2015 12:11 AM, Matthias Huck
>  wrote:
> 
> 
> 
> Hi Arefeh,
> 
> Can you try to run the setup from the cluster on your local desktop
> system? With the same input, a Moses binary compiled from the same
> sources, and the same command to produce the n-best lists? Normally it
> should give you the same output.
> 
> Why would the feature never produce an overall score larger than 60? 
> 
> Also note that you should not use Assign() in your feature functions
> any
> more (since two weeks ago).
> http://comments.gmane.org/gmane.comp.nlp.moses.user/12146
> If you use Assign rather than PlusEquals and you merged with the
> recent
> master from GitHub on the cluster but not on your desktop machine,
> then
> something like what you described can have happened.
> 
> Cheers,
> Matthias
> 
> 
> 
> On Thu, 2015-01-22 at 21:00 +, Arefeh Kazemi wrote:
> > Hi
> > 
> > 
> > I've implemented a feature with 4 scores in moses-chart. when I
> debug
> > the code on my system, every thing is OK and the scores are
> calculated
> > right, but when I run mert to tune the weights (on a cluster), I get
> > wrong scores for my feature in run*.best.100 files. for example my
> > feature could have the maximum value of 60 but for some instances I
> > get values more than 300. 
> > 
> > does anyone know why this happens?
> > 
> > 
> > Regards
> 
> > 
> > 
> > ___
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

> 





Re: [Moses-support] Configuration parameter documentaion

2015-01-22 Thread Matthias Huck
Hi Roee,

I would be very surprised if each and every Moses feature is described
somewhere. But Moses is generally very well documented, and you find all
the information you need for building a state-of-the-art baseline system
on the website [http://www.statmt.org/moses/] and in the manual
[http://www.statmt.org/moses/manual/manual.pdf]. 

And if you want to know about the details you can even read the source
code!

Maybe our recent Moses tutorial slides from ICON come close to what
you're looking for (in particular slides 58 ff.).
http://www.statmt.org/moses/icon.2014.pdf

Let us know in case you need help with anything specific.

Cheers,
Matthias


On Thu, 2015-01-22 at 06:23 -0800, Roee Aharoni wrote:
> Hi all,
> 
> Is there any documentation that explains the meaning of every
> parameter in moses.ini?
> 
> 
> Thanks in advance,
> Roee
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] about the scores in run*.best100.out

2015-01-22 Thread Matthias Huck
Hi Arefeh,

Can you try to run the setup from the cluster on your local desktop
system? With the same input, a Moses binary compiled from the same
sources, and the same command to produce the n-best lists? Normally it
should give you the same output.

Why would the feature never produce an overall score larger than 60? 

Also note that you should not use Assign() in your feature functions any
more (since two weeks ago).
http://comments.gmane.org/gmane.comp.nlp.moses.user/12146
If you use Assign rather than PlusEquals and you merged with the recent
master from GitHub on the cluster but not on your desktop machine, then
something like what you described can have happened.

Cheers,
Matthias



On Thu, 2015-01-22 at 21:00 +, Arefeh Kazemi wrote:
> Hi
> 
> 
> I've implemented a feature with 4 scores in moses-chart. when I debug
> the code on my system, every thing is OK and the scores are calculated
> right, but when I run mert to tune the weights (on a cluster), I get
> wrong scores for my feature in run*.best.100 files. for example my
> feature could have the maximum value of 60 but for some instances I
> get values more than 300. 
> 
> does anyone know why this happens?
> 
> 
> Regards
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] Sparse features and overfitting

2015-01-15 Thread Matthias Huck
On Thu, 2015-01-15 at 13:54 +0800, HOANG Cong Duy Vu wrote:


> - tune & test
> (based on source)
> size of overlap set = 624
> (based on target)
> size of overlap set = 386

> 
> (tune & test have high overlapping parts based on source sentences,
> but half of them have different target sentences)



Does this mean that there are hundreds of sentences in your original
tuning and test sets that are equal on the source side but have
different references? That sounds a bit odd. Maybe it indicates that
something about your data is generally problematic.
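A quick way to quantify this kind of overlap (a sketch, with tune and test represented as lists of (source, target) string pairs; names are illustrative):

```python
def source_overlap(tune, test):
    """Count test pairs whose source occurs in tune, and how many of those
    carry a target that never occurs with that source in tune."""
    tune_targets = {}
    for src, tgt in tune:
        tune_targets.setdefault(src, set()).add(tgt)
    same_source = [(s, t) for s, t in test if s in tune_targets]
    diverging = [(s, t) for s, t in same_source if t not in tune_targets[s]]
    return len(same_source), len(diverging)
```

A high second count, as reported above, means many shared source sentences come with different references in the two sets.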





Re: [Moses-support] Sparse features and overfitting

2015-01-15 Thread Matthias Huck
We typically try to increase the tuning set in order to obtain more
reliable sparse feature weights. But in your case it's rather the test
set that seems a bit small for trusting the BLEU scores. 

Do the sparse features give you any large improvement on the tuning set?



On Thu, 2015-01-15 at 13:54 +0800, HOANG Cong Duy Vu wrote:

> I used sparse features such as: TargetWordInsertionFeature,
> SourceWordDeletionFeature, WordTranslationFeature,
> PhraseLengthFeature.
> Sparse features are used only for top source and target words (100,
> 150, 200, 250, ).
> 
> 
> My parallel data include: train(201K); tune(6214); test(641).

> 
> Is there any way to prevent over-fitting when applying the sparse
> features? Or in this case, sparse features will not generalize well
> over "unseen" data?






Re: [Moses-support] phrase table

2015-01-15 Thread Matthias Huck
Hi,


The data is sentence-segmented.

Assume you train your model with a training corpus which contains a
single parallel sentence pair. Your training sentence has length L on
both source and target side, and it's aligned along the diagonal. 
If n > L, you cannot extract any phrase of length n from this training
corpus. If n <= L, you can extract L - n + 1 phrases of length n. 

Example: for L = 5 you can extract five phrases of length n = 1, four of
length n = 2, ... , one of length n = 5, and none of length n > 5.


Also, bilingual blocks are valid (=extractable) phrases only if they are
consistent wrt. the word alignment. Larger blocks are possibly more
frequently inconsistent.


Of course you should consider some more aspects, e.g.:

- training settings 
  (there won't be any 8-grams if you set the max. phrase length to 7; 
  long phrases will be affected more by a count cutoff because of sparsity)
- vocabulary sizes limit the amount of possible combinations
- n-gram entropy of the language 
  [http://languagelog.ldc.upenn.edu/myl/Shannon1950.pdf]


Analyzing such things in detail is surely a fun pastime. You can start
with vocabulary sizes, number of running words of your corpus,
histograms of source-side training sentence lengths, number of distinct
n-grams that appear in the source side of the corpus vs. number of
distinct n-grams that are source sides of valid phrases, number of
distinct n-grams that appear in the source side of the corpus if you
undo the sentence segmentation (replace all line breaks by spaces), etc.
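For instance, counting distinct source-side n-grams can be done with a minimal sketch like this (assuming whitespace-tokenized text):

```python
def distinct_ngrams(sentences, n):
    """Count distinct n-grams over a list of whitespace-tokenized sentences."""
    grams = set()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            grams.add(tuple(tokens[i:i + n]))
    return len(grams)

corpus = ["a b a", "b a"]
# distinct unigrams: {a, b}; distinct bigrams: {(a, b), (b, a)}
```

Comparing these counts against the n-gram counts from the phrase table shows how much the extraction constraints thin out the longer spans.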

Cheers,
Matthias



On Thu, 2015-01-15 at 16:39 +, Read, James C wrote:
> Hi,
> 
> 
> 
> I just ran a count of different sized n-grams in the source side of my
> phrase table and this is what I got.
> 
> 
> 
> unigrams 85,233
> 
> 
> bigrams   991,701
> 
> 
> trigrams   2,697,341
> 
> 
> 4-grams3,876,180
> 
> 
> 5-grams4,209,094
> 
> 
> 6-grams3,702,813
> 
> 
> 7-grams2,560,251
> 
> 
> 8-grams   0
> 
> 
> 
> So, up until the 5-grams the results are what I expected the number is
> increasing. But then it drops for the 6-grams and drops again for the
> 7-grams.
> 
> 
> 
> Does anybody know why?
> 
> 
> 
> James 
> 
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





[Moses-support] Feature score deltas in the chart decoder

2015-01-07 Thread Matthias Huck
Hi,

I've just pushed a commit to Moses that brings about a slight change
wrt. the way the chart decoder deals with feature scores. 

The chart decoder now stores deltas of individual feature scores instead
of constantly summing everything up. This behaviour is similar to what
we have been doing in the phrase-based decoder since a long time
already. The main purpose of this modification is to improve efficiency
with sparse features a bit.

https://github.com/moses-smt/mosesdecoder/commit/465b47566424efb707bdc063d0bff52b0650eb0a


The modification may however break existing feature function
implementations. 

As a rule of thumb, any feature function that calls 

ScoreComponentCollection::Assign()
in
EvaluateWhenApplied(const ChartHypothesis&, ...)

is affected and needs to be adapted to the new behaviour. 

Basically, the ScoreComponentCollection variable passed to
EvaluateWhenApplied() now accumulates the delta score of the current
rule application only, whereas it was previously accumulating the
overall score of the partial hypothesis. 
I.e., calling Assign() in EvaluateWhenApplied() now does not replace the
overall score any more, but has the same effect as calling PlusEquals().

If you are the author of a feature function that implements
EvaluateWhenApplied(const ChartHypothesis&, ...) and calls Assign()
within that method, or if you are using such a feature function in your
experiments, please update your implementation. The feature function
should call PlusEquals() instead and add a score delta.
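The accumulation behaviour can be illustrated with a toy model (this is not the Moses C++ API, just the PlusEquals() semantics): each rule application contributes a delta, and the overall derivation score is the running sum of those deltas.

```python
class ToyScoreCollection:
    """Toy stand-in for a score component collection (not the Moses API)."""

    def __init__(self):
        self.scores = {}

    def plus_equals(self, feature, delta):
        # accumulate a per-rule-application delta, as EvaluateWhenApplied()
        # is now expected to do
        self.scores[feature] = self.scores.get(feature, 0.0) + delta

collection = ToyScoreCollection()
for delta in [0.5, -1.2, 0.3]:  # one delta per rule application
    collection.plus_equals("MyFeature", delta)
# collection.scores["MyFeature"] now holds the sum of the deltas
```

Under the old behaviour, calling Assign() at the last step would have overwritten this running sum with the final value directly; that option no longer exists.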

I've already updated moses/LM/Ken.cpp and moses/LM/Implementation.cpp
and Rico has updated moses/LM/BilingualLM.cpp .
In the Moses master branch I found one other feature function that
requires modifications: 

moses/LM/DALMWrapper.cpp

This feature is currently not covered by a regression test, and I don't
have any setup with this feature myself. I would not be able to test any
modifications in that code and therefore would like to request that the
authors apply the necessary updates themselves. 

Please let me know in case you notice any issues or if you need any
further information or advice regarding this modification.

Cheers,
Matthias






Re: [Moses-support] Alignment symmetrization without giza2bal

2014-12-14 Thread Matthias Huck
Hi Marcin,

I don't quite understand why this is a problem. But if you're looking
for alternative implementations for word alignment symmetrization: 
The Jane toolkit includes a program called `mergeAlignment`. 
It should be able to read Moses format alignments.

http://www.hltpr.rwth-aachen.de/jane/
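For reference, the core of the grow-diag-final-and heuristic itself is small enough to sketch (a simplified version; the iteration order and unaligned-word bookkeeping in Moses's symal differ in detail):

```python
def grow_diag_final_and(src_to_tgt, tgt_to_src):
    """Symmetrize two directional alignments given as sets of (i, j) links."""
    links = set(src_to_tgt) & set(tgt_to_src)   # start from the intersection
    union = set(src_to_tgt) | set(tgt_to_src)
    neighbors = [(-1, 0), (0, -1), (1, 0), (0, 1),
                 (-1, -1), (-1, 1), (1, -1), (1, 1)]
    # grow-diag: repeatedly add union links adjacent to current links
    # if they attach a source or target word that is still unaligned
    changed = True
    while changed:
        changed = False
        for i, j in sorted(links):
            for di, dj in neighbors:
                cand = (i + di, j + dj)
                if cand in union and cand not in links:
                    src_free = all(i2 != cand[0] for i2, _ in links)
                    tgt_free = all(j2 != cand[1] for _, j2 in links)
                    if src_free or tgt_free:
                        links.add(cand)
                        changed = True
    # final-and: add remaining union links whose words are both unaligned
    for cand in sorted(union - links):
        if (all(i2 != cand[0] for i2, _ in links)
                and all(j2 != cand[1] for _, j2 in links)):
            links.add(cand)
    return links
```

Both input files in 0-0 1-1 2-2 format can be parsed into such link sets line by line, which avoids the detour through the GIZA and BAL formats.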

Cheers,
Matthias


On Sun, 2014-12-14 at 21:01 +0100, Marcin Junczys-Dowmunt wrote:
> Hi,
> does anybody have a tool that can symmetrize alignments (i.e. 
> grow-diag-final) directly from two files with the same format that symal 
> or fast-align (0-0 1-1 2-2 ...) produce? I am too lazy to write one 
> myself and find it wasteful to go from symal format to GIZA format to  
> BAL format to symal format.
> 
> Thanks,
> Marcin
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 





Re: [Moses-support] string of Words + states in feature functions

2014-12-10 Thread Matthias Huck
Hi Amir,

The input is passed to the feature functions via
InitializeForInput(InputType const& source). 
This method is called before search and collecting of translation
options (cf. moses/FF/FeatureFunction.h). You can set a member variable
to have access to the input in your scoring method.

Alternatively, if you implement EvaluateWithSourceContext(), the input
is passed directly to the method as a parameter (const InputType &input)
and you can use that.
Finally, there's another option in the EvaluateWhenApplied() methods.
You can get the input from the Hypothesis object:
const InputType& input = hypo.GetManager().GetSource();

The input is an InputType object. Moses knows different input types, see
InputTypeEnum in moses/TypeDef.h . So what you get might differ
depending on what was passed to the decoder. If you're happy with
implementing your feature for sentence input only, then you can cast the
input to a Sentence object. The Sentence object gives you convenient
access methods, in particular GetSize() and GetWord(size_t pos). You can
thus obtain the sequence of words in the input. "Words" can contain
several factors in Moses. The factor with index 0 is typically the
surface form. Access it using the [] operator.

I guess you will never really want to work directly with the string
representation of the factor, but at this point you would be able to get
it and for instance print it to your debug output.

Hope this was helpful as another answer to your first question.

Cheers,
Matthias


On Wed, 2014-12-10 at 11:41 +0330, amir haghighi wrote:
> Hi everyone 
> 
> 
> 
> I'm implementing a feature function in moses-chart. I need the source
> words string and also their indexes in the source sentence. I've
> written a function that gets the source words but I don't know how
> extract word string from a word.
> could anyone guide me how to do that? as I know, each word is
> implemented as an array of factors, which of them is its string?
> 
> 
> I have also some questions about the states in the stateful features, 
> what kind of variables should be stored in each state? only those ones
> that should be used in the compare function? or any variable from the
> previous hypothesis  that we use in our feature?
> 
> 
> Thanks in advance!
> 
> 
> Cheers
> 
> Amir
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] How to Run experiment.perl

2014-11-29 Thread Matthias Huck
$ /scripts/ems/experiment.perl -config config.toy -exec

On Sat, 2014-11-29 at 15:00 +, Asad A.Malik wrote:
> Hi All, 
> 
> 
> How can I Run experiment.perl -config config.toy -exec. 
> When I type following command:
> 
> 
> $ run experiment.perl -config config.toy -exec
> 
> 
> It says no command found.
> 
>  
> --
> 
> Kind Regards,
> 
> Mr. Asad Abdul Malik
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] sentence is always too short for cleaning

2014-11-28 Thread Matthias Huck
Hi,

If this happens in scripts/training/clean-corpus-n.perl then you should
check whether a parallel corpus with the same number of lines on source
and target side is passed to that script. Maybe there's an issue with
your training data or something went wrong in a previous step of the
preprocessing pipeline if the line numbers differ.
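The length filtering that clean-corpus-n.perl performs can be sketched roughly like this (a simplified model, not the actual Perl code; the real script also enforces a length ratio and writes the filtered files). Note that a trailing empty line has zero tokens and therefore falls below the minimum length:

```python
def clean_corpus(pairs, min_tokens=1, max_tokens=80):
    """Keep sentence pairs whose token counts fall within the limits.

    pairs: list of (source, target) line pairs from a sentence-aligned corpus.
    """
    kept = []
    for src, tgt in pairs:
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if min_tokens <= n_src <= max_tokens and min_tokens <= n_tgt <= max_tokens:
            kept.append((src, tgt))
    return kept

# a trailing empty line has 0 tokens on both sides and is dropped
cleaned = clean_corpus([("ein Satz", "a sentence"), ("", "")])
```

If the two sides have different line counts to begin with, no amount of deleting last lines will fix the problem: the pairing itself is already out of sync.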

Cheers,
Matthias


On Fri, 2014-11-28 at 21:51 +0100, emna hkiri wrote:
> Dear friends
> i need your help please
> i have a problem of the cleaning phase of the arabic text
> every time moses returns the message sentences number 1562783 is too
> short!!!
> (in fact it is the last sentence in the text) so i delete it and again
> and
> again he tell me that this new last sentence is too short 
> and i do delete the last sentences and i have always the same problem
> 
> Can someone please throw some light on this.
> 
> Thanks & Regards
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support





Re: [Moses-support] WG: Unknown single words that are part of phrases

2014-11-27 Thread Matthias Huck
Yes, that's right. That's a situation as illustrated in Fig. 1b of 
http://ufal.mff.cuni.cz/pbml/95/art-stein-vilar-ney-jane.pdf 
and a "single word heuristic" as proposed in that paper can be a remedy.


On Thu, 2014-11-27 at 16:51 +, Barry Haddow wrote:
> Hi Vera
> 
> I think the situation you describe could happen even without unaligned 
> words. Suppose that you have a 2 word sentence on each side, and the 
> alignment points are (0,0), (0,1) and (1,0) - I think this is possible 
> with the usual symmetrisation algorithm. Then you would extract the 
> phrase pair containing 2 2-word phrases, but no phrase pairs containing 
> 1-word phrases. (see below for an example)
> 
> You still get lexical weights for the translation of word-0 to word-0 
> though, since there is an alignment point there
> 
> cheers - Barry
> 
> 
> [hyperion]bhaddow: cat c.en
> a b
> [hyperion]bhaddow: cat c.fr
> A B
> [hyperion]bhaddow: cat c.align
> 0-0 1-0 0-1
> [hyperion]bhaddow: ~/moses.new/bin/extract c.en c.fr c.align e 5
> PhraseExtract v1.4, written by Philipp Koehn
> phrase extraction from an aligned parallel corpus
> [hyperion]bhaddow: cat e
> A B ||| a b ||| 0-0 1-0 0-1
> [hyperion]bhaddow: ~/moses.new/scripts/training/get-lexical.perl c.en 
> c.fr c.align c
> (c.en,c.fr,c)
> FILE: c.fr
> FILE: c.en
> FILE: c.align
> !
> Saved: c.f2e and c.e2f
> [hyperion]bhaddow: cat c.e2f
> a A 0.500
> a B 1.000
> b A 0.500
> [hyperion]bhaddow: cat c.f2e
> A a 0.500
> B a 0.500
> A b 1.000
> 
> 
> On 27/11/14 16:15, Matthias Huck wrote:
> > Hi Vera,
> >
> > It's odd that the lexical translation model contains such an entry if
> > the pair is always unaligned. Maybe you used a different word alignment
> > when you extracted the lexicon model?
> >
> > You should manually have a look at your word alignment in order to check
> > whether it has reasonable quality. There's a visualization tool called
> > "Picaro" in Moses:
> >
> > $ moses/contrib/picaro/picaro.py -a1 model/aligned.1.grow-diag-final-and -f 
> > model/aligned.1.0.de -e model/aligned.1.0.en
> >
> > In order to find out whether the symmetrization heuristic is an issue
> > for you, you can compare the standard and inverse GIZA alignments with
> > the symmetrized alignment.
> >
> > Ways to experiment with word alignment quality are for instance:
> >
> > - Choosing a different symmetrization heuristic
> > - Modifying the GIZA settings, e.g. by training with a different number
> > of EM iterations or a different sequence of IBM/HMM models
> > - Using some other method for training word alignments, e.g. a
> > discriminative word aligner
> >
> > Also, if the amount of parallel training data is small, you shouldn't be
> > surprised if you are not able to train reliable models.
> >
> > Cheers,
> > Matthias
> >
> >
> > On Thu, 2014-11-27 at 14:45 +0100, Vera Aleksic, Linguatec GmbH wrote:
> >> Hi,
> >>
> >> I have one more question:
> >> In the lex.e2f file there is a translation Gitarre->guitar:
> >>
> >>Gitarre guitar 0.400
> >>Gitarre using 0.284
> >>Gitarre ; 0.017
> >>
> >> Why has not it became part of the phrase table?
> >>
> >> Thanks again!
> >> Vera
> >>
> >> -Ursprüngliche Nachricht-
> >> Von: Vera Aleksic, Linguatec GmbH
> >> Gesendet: Donnerstag, 27. November 2014 09:42
> >> An: 'Matthias Huck'; Raj Dabre
> >> Betreff: AW: [Moses-support] Unknown single words that are part of phrases
> >>
> >> Hi,
> >> Thank you for your answers.
> >> @Raj, one-word-translations do not exist, I have searched for them. If the 
> >> grow-diag method probably causes such phenomena, are there any better 
> >> alternatives?
> >> @Matthias, you are right, the pair Gitarre-guitar is always unaligned, but 
> >> I do not really understand why. Why is "guitar" in the example below 
> >> aligned to "Musikinstrument Gittare", and not to "Gitarre" only? I assume, 
> >> decomposing "Musik + Instrument" would help? How else could I improve the 
> >> word alignment quality?
> >> Thanks!
> >> Best,
> >> Vera
> >>
> >> für ein Musikinstrument wie eine elektrische Gitarre , NULL ({ }) for ({ 1 
> >> }) a ({ 2 }) musical ({ }) instrument ({ }) , ({ }) such ({ }) as ({ 4 }) 
> >> an ({ 5 }) electric (

Re: [Moses-support] WG: Unknown single words that are part of phrases

2014-11-27 Thread Matthias Huck
Hi Vera,

It's odd that the lexical translation model contains such an entry if
the pair is always unaligned. Maybe you used a different word alignment
when you extracted the lexicon model?

You should manually have a look at your word alignment in order to check
whether it has reasonable quality. There's a visualization tool called
"Picaro" in Moses:

$ moses/contrib/picaro/picaro.py -a1 model/aligned.1.grow-diag-final-and -f 
model/aligned.1.0.de -e model/aligned.1.0.en

In order to find out whether the symmetrization heuristic is an issue
for you, you can compare the standard and inverse GIZA alignments with
the symmetrized alignment.

Ways to experiment with word alignment quality are for instance:

- Choosing a different symmetrization heuristic
- Modifying the GIZA settings, e.g. by training with a different number
of EM iterations or a different sequence of IBM/HMM models
- Using some other method for training word alignments, e.g. a
discriminative word aligner

Also, if the amount of parallel training data is small, you shouldn't be
surprised if you are not able to train reliable models.

Cheers,
Matthias


On Thu, 2014-11-27 at 14:45 +0100, Vera Aleksic, Linguatec GmbH wrote:
> Hi,
> 
> I have one more question:
> In the lex.e2f file there is a translation Gitarre->guitar:
> 
>   Gitarre guitar 0.400
>   Gitarre using 0.284
>   Gitarre ; 0.017
> 
> Why has not it became part of the phrase table?
> 
> Thanks again!
> Vera
> 
> -Ursprüngliche Nachricht-
> Von: Vera Aleksic, Linguatec GmbH 
> Gesendet: Donnerstag, 27. November 2014 09:42
> An: 'Matthias Huck'; Raj Dabre
> Betreff: AW: [Moses-support] Unknown single words that are part of phrases
> 
> Hi,
> Thank you for your answers.
> @Raj, one-word-translations do not exist, I have searched for them. If the 
> grow-diag method probably causes such phenomena, are there any better 
> alternatives?
> @Matthias, you are right, the pair Gitarre-guitar is always unaligned, but I 
> do not really understand why. Why is "guitar" in the example below aligned to 
> "Musikinstrument Gittare", and not to "Gitarre" only? I assume, decomposing 
> "Musik + Instrument" would help? How else could I improve the word alignment 
> quality?
> Thanks!
> Best,
> Vera
> 
> für ein Musikinstrument wie eine elektrische Gitarre , NULL ({ }) for ({ 1 }) 
> a ({ 2 }) musical ({ }) instrument ({ }) , ({ }) such ({ }) as ({ 4 }) an ({ 
> 5 }) electric ({ 6 }) guitar ({ 3 7 }) ; ({ 8 })
> 
> -Ursprüngliche Nachricht-
> Von: Matthias Huck [mailto:mh...@inf.ed.ac.uk]
> Gesendet: Mittwoch, 26. November 2014 17:54
> An: Raj Dabre
> Cc: Vera Aleksic, Linguatec GmbH; moses-support
> Betreff: Re: [Moses-support] Unknown single words that are part of phrases
> 
> Hi,
> 
> Supposedly your phrase table does not contain an entry "Gitarre ||| guitar" 
> because this word pair is always unaligned in your training data. You could 
> try to improve your word alignment quality.
> 
> Alternatively, you could implement a procedure in the manner of the "forced 
> single word heuristic" as described in: 
> D. Stein, D. Vilar, S. Peitz, M. Freitag, M. Huck, and H. Ney. A Guide to 
> Jane, an Open Source Hierarchical Translation Toolkit. The Prague Bulletin of 
> Mathematical Linguistics, number 95, pages 5-18, Prague, Czech Republic, 
> April 2011.
> http://ufal.mff.cuni.cz/pbml/95/art-stein-vilar-ney-jane.pdf
> (see Fig. 1c).
> 
> But the latter would rather be a workaround.
> 
> Cheers,
> Matthias
> 
> 
> On Thu, 2014-11-27 at 01:18 +0900, Raj Dabre wrote:
> > Hello,
> > 
> > 
> > If I am not wrong this is most likely due to the grow (-diag) method 
> > applied to the word aligned data (both directions) before phrase extraction.
> > 
> > Furthermore. one word translations should exist (but not always) 
> > search for them.
> > 
> > 
> > 
> > Regards.
> > 
> > 
> > On Thu, Nov 27, 2014 at 12:53 AM, Vera Aleksic, Linguatec GmbH 
> >  wrote:
> > Hi,
> > 
> > I have observed many times that some words do not exist as single 
> > word translations in the phrase table, although they exist in the training 
> > corpus and in multiword phrases.
> > An example:
> > German-English translation for "Gitarre" is unknown, i.e. there is 
> > no single word entry  for "Gitarre" in the phrase table, although some 
> > other phrases containing this word exist (see below).
> > How is it possible?
> > Thanks and best regards,
> >  
