Re: [Moses-support] CreateProbingPT2 exception

2017-05-02 Thread Nikolay Bogoychev
Hey Mike,

Is it possible for you to make the phrase table available
/home/mike/stelae-projects/de-en/phrasemodel/tmp.15419/pt.txt.gz
publicly so we can try to reproduce the problem?

Cheers,

Nick

On Tue, May 2, 2017 at 6:06 AM, Mike Ladwig  wrote:
> Got an exception creating a PT2 with yesterday's master:
>
> Binarize phrase and reordering model in probing table format:
> /home/mike/stelae5/mosesdecoder/scripts/generic/binarize4moses2.perl
> --phrase-table=/home/mike/stelae-projects/de-en/phrasemodel/model/phrase-table.gz
> --lex-ro=/home/mike/stelae-projects/de-en/phrasemodel/model/reordering-table.wbe-msd-bidirectional-fe.gz
> --output-dir=/home/mike/stelae-projects/de-en/phrasemodel/PT2
> --num-lex-scores=6
> Executing: gzip -dc
> /home/mike/stelae-projects/de-en/phrasemodel/model/phrase-table.gz |
> /home/mike/stelae5/mosesdecoder/scripts/generic/../../contrib/sigtest-filter/filter-pt
> -n 0 | gzip -c >
> /home/mike/stelae-projects/de-en/phrasemodel/tmp.15419/pt.gz
> sh:
> /home/mike/stelae5/mosesdecoder/scripts/generic/../../contrib/sigtest-filter/filter-pt:
> No such file or directory
> Executing:
> /home/mike/stelae5/mosesdecoder/scripts/generic/../../bin/processLexicalTableMin
> -in
> /home/mike/stelae-projects/de-en/phrasemodel/model/reordering-table.wbe-msd-bidirectional-fe.gz
> -out /home/mike/stelae-projects/de-en/phrasemodel/tmp.15419/lex-ro -T .
> -threads all
> Used options:
> Text reordering table will be read from:
> /home/mike/stelae-projects/de-en/phrasemodel/model/reordering-table.wbe-msd-bidirectional-fe.gz
> Output reordering table will be written to:
> /home/mike/stelae-projects/de-en/phrasemodel/tmp.15419/lex-ro.minlexr
> Step size for source landmark phrases: 2^10=1024
> Phrase fingerprint size: 16 bits / P(fp)=1.52588e-05
> Single Huffman code set for score components: no
> Using score quantization: no
> Running with 24 threads
>
> Pass 1/2: Creating phrase index + Counting scores
> ..[500]
> ..[1000]
> ..[1500]
> ..[2000]
> ..[2500]
> ..[3000]
> ..[3500]
> ..[4000]
> ..[4500]
> ..[5000]
> ..[5500]
> ..[6000]
> ..[6500]
> ..[7000]
> ..[7500]
> 
>
> Intermezzo: Calculating Huffman code sets
> Creating Huffman codes for 32003 scores
> Creating Huffman codes for 16732 scores
> Creating Huffman codes for 31335 scores
> Creating Huffman codes for 32076 scores
> Creating Huffman codes for 15096 scores
> Creating Huffman codes for 31659 scores
>
> Pass 2/2: Compressing scores
> ..[500]
> ..[1000]
> ..[1500]
> ..[2000]
> ..[2500]
> ..[3000]
> ..[3500]
> ..[4000]
> ..[4500]
> ..[5000]
> ..[5500]
> ..[6000]
> ..[6500]
> ..[7000]
> ..[7500]
> 
>
> Saving to
> /home/mike/stelae-projects/de-en/phrasemodel/tmp.15419/lex-ro.minlexr
> Done
> Executing:
> /home/mike/stelae5/mosesdecoder/scripts/generic/../../bin/addLexROtoPT
> /home/mike/stelae-projects/de-en/phrasemodel/tmp.15419/pt.gz
> /home/mike/stelae-projects/de-en/phrasemodel/tmp.15419/lex-ro.minlexr  |
> gzip -c >
> /home/mike/stelae-projects/de-en/phrasemodel/tmp.15419/pt.withLexRO.gz
> Executing: ln -s pt.withLexRO.gz
> /home/mike/stelae-projects/de-en/phrasemodel/tmp.15419/pt.txt.gz
> Executing:
> /home/mike/stelae5/mosesdecoder/scripts/generic/../../bin/CreateProbingPT2
> --num-scores 4 --log-prob --input-pt
> 

Re: [Moses-support] Rebuilding moses binary only

2017-03-30 Thread Nikolay Bogoychev
I've been asking this same question since late 2013...

On Thu, Mar 30, 2017 at 10:30 PM, Marcin Junczys-Dowmunt
 wrote:
> Hi list,
>
> is there a way to tell bjam to only rebuild the moses binary and not the
> 84 unrelated targets that just happen to be rebuilt out of solidarity?
>
> Thanks,
>
> Marcin
>


Re: [Moses-support] BilingualNPLM: A target phrase with no alignments detected!

2016-03-19 Thread Nikolay Bogoychev
Hey Jeremy,

The error you get comes from:

"A target phrase with no alignments detected! " << targetPhrase << " Check
if there is something wrong with your phrase table."

so the message should also include the target phrase in question. My guess
is that you are using PhraseDictionaryCompact as your phrase table, which
in some cases is known to produce target phrases without alignments. My
first suggestion would be to check your phrase table and see whether that
target phrase has alignments. If it does, the alignments were probably lost
during phrase table binarization. I would suggest using a different phrase
table, or setting EMS to use a single thread, which should help avoid the
problem.

Cheers,

Nick

On Wed, Mar 16, 2016 at 2:49 PM, Jeremy Gwinnup  wrote:

> Hi,
>
> I’m attempting to use a BilingualNPLM (trained per the recipe on the moses
> website) in decoding, and I get the ‘A target phrase with no alignments
> detected!’ error. All data used in training the model were products of a training run
> in EMS. I’m using the recommended NPLM settings with the exception of
> setting the input embedding to 750.
>
> Any ideas as to if I need to train differently?
>
> Thanks!
> -Jeremy


Re: [Moses-support] Bilingual neural lm, log-likelihood: -nan

2015-09-21 Thread Nikolay Bogoychev
Hey Jian,

I have encountered this problem with nplm myself and couldn't really find a
solution that works every time.

Basically what happens is that there is a token that occurs very frequently
in the same position, its weights become huge and eventually turn into
not-a-number, which then propagates to the rest of the data. This usually
happens with the beginning-of-sentence token, especially if your source and
target context sizes are big. One thing you could do is decrease the source
and target context size (doesn't always work). Another thing you could do
is lower the learning rate (always works, but you might need to set it
quite low, e.g. 0.25).
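
For example, a hedged sketch: this is just your own trainNeuralNetwork
invocation (quoted below) rerun with a lower learning rate; every other
option is unchanged.

  /home/user/tools/nplm-master-rsennrich/src/trainNeuralNetwork \
      --train_file work_dir/blm/train.numberized \
      --validation_file work_dir/blm/valid.numberized \
      --model_prefix work_dir/blm/train.10k.model.nplm \
      --learning_rate 0.25 \
      --num_epochs 30 --minibatch_size 1000 --num_noise_samples 100 \
      --num_hidden 2 --input_embedding_dimension 512 \
      --output_embedding_dimension 192 --num_threads 6 \
      --loss_function log --activation_function tanh \
      --validation_minibatch_size 10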

The proper solution, according to Ashish Vaswani, the creator of nplm, is
to use gradient clipping, which is commented out in his code. You should
contact him, because this is an nplm issue.

Cheers,

Nick

On Sat, Sep 19, 2015 at 8:58 PM, jian zhang  wrote:

> Hi all,
>
> I got
>
> Epoch 
> Current learning rate: 1
> Training minibatches: Validation log-likelihood: -nan
>perplexity: nan
>
> during bilingual neural lm training.
>
> I use command:
> /home/user/tools/nplm-master-rsennrich/src/trainNeuralNetwork --train_file
> work_dir/blm/train.numberized --num_epochs 30 --model_prefix
> work_dir/blm/train.10k.model.nplm --learning_rate 1 --minibatch_size 1000
> --num_noise_samples 100 --num_hidden 2 --input_embedding_dimension 512
> --output_embedding_dimension 192 --num_threads 6 --loss_function log
> --activation_function tanh --validation_file work_dir/blm/valid.numberized
> --validation_minibatch_size 10
>
> where the train.numberized and valid.numberized files are split from the
> file generated by the script
> ${moses}/scripts/training/bilingual-lm/extract_training.py.
>
> Training/Validation numbers are:
> Number of training instances: 4128195
> Number of validation instances: 217274
>
>
> Thanks,
>
> Jian
>
>
> Jian Zhang
> Centre for Next Generation Localisation (CNGL)
> 
> Dublin City University 
>


Re: [Moses-support] nplm ngram total order in ems

2015-08-01 Thread Nikolay Bogoychev
Hey John,

This is correct. So imagine the situation of order 5 and source window 4:

  s-4 s-3 s-2 s-1 s0 s+1 s+2 s+3 s+4   t-4 t-3 t-2 t-1 t0

t0 is aligned to s0, and a source window of 4 means 4 tokens before and
after s0, i.e. 2*4+1 = 9 source tokens; together with the 5 target tokens
t-4..t0 that gives a 14-gram in total.

Cheers,

Nick

On Sat, Aug 1, 2015 at 4:30 PM, John Joseph Morgan 
johnjosephmor...@gmail.com wrote:

 I’m trying to run the toy bilingualnplm example with ems.
 The ngram order gets computed in experiment.perl on line 1868.
 The formula is:
 $order + 2 * $source_window + 1
 If $order is 5 and $source_window is 4 this formula gives 14.
 Is this correct?
 It doesn't seem right.

 John


Re: [Moses-support] ProbingPT creation and factor support

2015-04-22 Thread Nikolay Bogoychev
Hey,

ProbingPT supports reading from gzipped files, but I don't think it
supports factors; I didn't code for factors specifically anyway. I am not
sure whether factor support has to be built into the phrase table or is
independent of it and part of moses.

Cheers,

Nick
On 22 Apr 2015 6:03 pm, Jeremy Gwinnup jer...@gwinnup.org wrote:

 Hi,

 I’ve got a 2-part question:

 Does ProbingPT work on gzip’d phrase tables, and if so, does it support
 phrase tables with multiple factors?

 Thanks!


Re: [Moses-support] bilingual LM (nan nan nan)

2015-04-21 Thread Nikolay Bogoychev
Hey Marwa,
We have been having this problem with NPLM and have found no real solution.
There have been a couple of threads about it on the mailing list so far.
Basically, the workaround we use is to lower the learning rate (from 1 to
0.5; if 0.5 doesn't work, to 0.25, and so on) and to increase the number of
generations you produce accordingly. Alternatively, you may try the
experimental gradient clipping code that Ashish implemented. Here's a quote
from his email:

 You should be able to download the version of nplm where the updates
 (gradient*learning_rate) are clipped between +5 and -5:
 http://www.isi.edu/~avaswani/nplm_clipped.tar.gz
 If you want to change the magnitude of the update, please change it inside

   struct Clipper {
     double operator() (double x) const {
       return std::min(5., std::max(x, -5.));
       //return(x);
     }
   };

 in neuralClasses.h.
 Right now, the clipping has been implemented only for standard SGD
 training, and not for adagrad or adadelta.


Cheers,

Nick

On Tue, Apr 21, 2015 at 6:17 AM, Marwa Refaie basmal...@hotmail.com wrote:

 Hi all

 When I train BilingualLM with a large corpus it gives 10 model.nplm files
 with small numbers, then a lot of lines of nan nan nan nan nan nan nan
 nan.
 It works perfectly with a smaller corpus. Any suggestions, please?



Re: [Moses-support] ProbingPT tests not building

2015-04-01 Thread Nikolay Bogoychev
Done.

Thanks for spotting it.

On Wed, Apr 1, 2015 at 4:49 PM, Jeroen Vermeulen 
j...@precisiontranslationtools.com wrote:

 On 01/04/15 22:29, Nikolay Bogoychev wrote:

  Those tests are indeed obsolete; I used them to test some behavior when
  I was building probingPT, but the function in question became part of the
  HuffmanDecoder class as getTargetWordFromID.
  You don't need to build or worry about those tests (there isn't a
  Jamfile in that directory). I still don't have __proper__ tests for
  probingPT.

 Thanks for the quick response!

 If they're not being built, maybe it's better to delete both the tests
 in ProbingPT/tests/ then, so that they don't give people a false sense of
 security?

 They'll still be in revision control if you want to refer to them for
 writing new tests.  Somebody else who wants to write tests won't know
 that, but in my experience, it's often better for people to start from
 scratch in that situation anyway.


 Jeroen



Re: [Moses-support] ProbingPT tests not building

2015-04-01 Thread Nikolay Bogoychev
Hey Jeroen,

Those tests are indeed obsolete; I used them to test some behavior when I
was building probingPT, but the function in question became part of the
HuffmanDecoder class as getTargetWordFromID.
You don't need to build or worry about those tests (there isn't a Jamfile
in that directory). I still don't have __proper__ tests for probingPT.

Cheers,

Nick

On Wed, Apr 1, 2015 at 2:39 PM, Jeroen Vermeulen 
j...@precisiontranslationtools.com wrote:

 Here's another case where I may be breaking things because I'm building
 manually, or something might be genuinely wrong.  For me,
 moses/TranslationModel/ProbingPT/tests/vocabid_test.cpp doesn't build.

 The problem is that main() calls getStringFromID(), which is not defined
 anywhere in the codebase that I can see.  Obsolete test?


 Jeroen


Re: [Moses-support] Unicode Issues when Using Compact Phrase Table, Binaries vs. Own Build

2015-03-30 Thread Nikolay Bogoychev
Hey Венци,

Did you by any chance binarize your phrase tables from raw text format
rather than from gzip (or any other supported compressed text format)? I
recently ran into similar issues with my phrase table (ProbingPT) when the
input phrase table had not been compressed during binary creation. I wasn't
able to trace the issue; I just make sure I gzip any phrase table before
binarizing.
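
For illustration only (the filename here is hypothetical, and gzip's -k
option needs GNU gzip 1.6 or newer):

  # compress the text phrase table before binarizing; -k keeps the original
  gzip -k phrase-table        # produces phrase-table.gz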

Cheers,

Nick

On Mon, Mar 30, 2015 at 10:11 AM, Marcin Junczys-Dowmunt junc...@amu.edu.pl
 wrote:

  Forgot to add that we use the compact phrase table and Moses on older
 and newer Ubuntu versions with Arabic, Chinese, Korean, Japanese and
 Russian in both directions, with no problems. Those puny German umlauts
 should not be a challenge. :)

 On 30.03.2015 at 11:08, Marcin Junczys-Dowmunt wrote:

 Hi,
 The phrase-table, and as far as I know Moses in general, is
 unicode-agnostic as long as you use UTF-8. Input is handled as raw byte
 sequences; most of the time there are only numeric identifiers.
 This sounds more like a couple of messed-up systems on your side,
 especially the part where self-compiled systems work or don't work. Cannot
 give you much more insight, unfortunately.
 Best,
 Marcin

 On 30.03.2015 at 10:53, Венцислав Жечев (Ventsislav Zhechev) wrote:

 Hi all,

  I’m having this really weird Unicode issue when using compact phrase
 tables that could be related to endianness somehow, but I’ve no idea how.
 I compiled the training tools from v3 on my Mac and built a few models
 using compact phrase (and reordering) tables and KenLM, including (for
 simplicity) a recasing model for DE (download it from
 https://autodesk.box.com/DE-Recaser). Things become strange when I try to
 use the models, though:
 1. All works fine when I use the decoder binary I compiled myself on the
 Mac (10.10.2, self-built Boost 1.57)
  2. Unicode input is not recognised when I use the binary from
 http://www.statmt.org/moses/RELEASE-3.0/binaries/macosx-yosemite/ i.e.
 words like ‘für’ or ‘ausführlich’ are marked as UNK.
 3. Unicode input is not recognised when I use a binary I compiled myself
 on Ubuntu 12.04.5 (self-built Boost 1.57)
 4. All  works fine when I use the binary from
 http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/

  I tested the above with the queryPhraseTableMin tool (rather than the
 decoder) and got the same results, which is what makes me think this could
 be somehow related to binary incompatibility with the way the phrase table
 is compacted. Haven’t investigated deeper than that, though.


  Any clues?
 One would say, just use the Linux binary then on Linux... However, I have
 a number of CentOS/RHEL 5 and 6 boxes, where the pre-compiled binary
 doesn’t work, as the system glibc is too old. So there I need to compile
 Moses myself, but then Unicode isn’t recognised...



  Cheers,

   Ventzi

  –––
 Dr. Ventsislav Zhechev
 Computational Linguist, Certified ScrumMaster®
 Platform Architecture and Technologies
 Localisation Services

 MAIN +41 32 723 91 22
 FAX +41 32 723 93 99

 http://VentsislavZhechev.eu

 Autodesk, Inc.
 Rue de Puits-Godet 6
 2000 Neuchâtel, Switzerland
 www.autodesk.com









Re: [Moses-support] Forbidden link to binaries

2015-03-19 Thread Nikolay Bogoychev
Hey Per,

The link seems to be outdated, as it points to RELEASE-1.0. You can find
the current ones here:
http://www.statmt.org/moses/RELEASE-3.0/binaries/

Cheers,

Nick

On Thu, Mar 19, 2015 at 2:22 PM, Per Tunedal per.tune...@operamail.com
wrote:

 Hi,
 I just read the page http://www.statmt.org/moses/?n=Moses.Releases and
 tried the link to the binaries:

 All the binary executables are made available for download for users who
 do not wish to compile their own version.

 Clicking on download gets me to the page
 http://www.statmt.org/moses/RELEASE-1.0/binaries/
 showing the message:

 Forbidden

 You don't have permission to access /moses/RELEASE-1.0/binaries/ on this
 server.

 Yours,
 Per Tunedal


Re: [Moses-support] where is premultiply member of class neuralLM ?

2015-02-08 Thread Nikolay Bogoychev
Hey,
I replied to your previous query but got a mail delivery failure
notification, so I am trying again:

In order to use NPLM with moses you should use this fork of NPLM:
https://github.com/rsennrich/nplm

Cheers,

Nick

On Sun, Feb 8, 2015 at 9:38 AM, Jianri Li skywal...@postech.ac.kr wrote:

  Hi, all

 I am resending the mail in case my description was not clear.

 When I compile MOSES with nplm, I get an error message like the following:

 moses/LM/NeuralLMWrapper.cpp:37:22: error: ‘class nplm::neuralLM’ has no
 member named ‘premultiply’

 then I looked up the file moses/LM/NeuralLMWrapper.cpp, found the code
 like this:
 ---
 #include "NeuralLMWrapper.h"
 #include "neuralLM.h"

 ... ...

   m_neuralLM_shared = new nplm::neuralLM();
   m_neuralLM_shared->read(m_filePath);
   m_neuralLM_shared->premultiply();

 ... ...
 --
 Obviously it calls the class member premultiply of class neuralLM,
 which is used for pre-computation when there is only one hidden layer.
 However, when I go back to the nplm folder, none of the following
 header files or cpp files contain any member named premultiply:

 neuralClasses.h,
 neuralLM.h,
 neuralNetwork.h

 Of course nplm and moses are both the latest version.
 I am now really confused about this.
 I know moses has supported nplm for several months already, but I
 cannot find any similar problem in the moses mailing list history or
 through Googling.
 Did I miss something, or should I write the premultiply member myself?
 I guess it is not a serious problem and I just didn't get it.
 Thank you for your attention.

 Helson



Re: [Moses-support] Error while compile with nplm

2015-02-07 Thread Nikolay Bogoychev
Hey,

In order to use NPLM with moses you should use this fork of NPLM:
https://github.com/rsennrich/nplm

Cheers,

Nick

On Sat, Feb 7, 2015 at 6:22 PM, Jianri Li skywal...@postech.ac.kr wrote:

  Hi, moses users
  I was trying to compile with nplm, but I got errors like this:
 --
 moses/LM/NeuralLMWrapper.cpp:37:22: error: ‘class nplm::neuralLM’ has no
 member named ‘premultiply’
 ---
  I checked the source code in nplm; actually there is no premultiply ...
  I downloaded the code from http://nlg.isi.edu/software/nplm/ which is
 listed on the moses homepage (NPLM).
  My compile options are: ./bjam --with-boost=/path-to/boost_1_55_0
 --with-cmph=/path-to/cmph-2.0 --with-nplm=/path-to/nplm -j24
  If you have any idea, please help me.
  Thank you.

 Helson




Re: [Moses-support] nplm Building LM

2015-01-12 Thread Nikolay Bogoychev
Hey,

Refer to the moses documentation on how to use an NPLM LM during decoding:
http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc31
In particular, you need to add this line:

NeuralLM factor=<factor> order=<order> path=<filename>

to your moses.ini, where <filename> is model.NUMBER.

The 10 files model.1, model.2, etc. are the neural network LM output after
each iteration/generation of training. So model.1 is the first generation
and model.10 is the 10th generation.
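
For example, to load the 10th-generation model (an illustrative line only:
the factor and order values here are assumptions, and the path is yours):

NeuralLM factor=0 order=5 path=/path/to/model.10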

Cheers,

Nick


On Mon, Jan 12, 2015 at 12:05 AM, Marwa Refaie basmal...@hotmail.com
wrote:





  Hi,

 Please, I need a step-by-step tutorial for nplm.
 I compiled the package with make and ran trainNeuralNetwork; I got
 validation.ngrams and train.ngrams, and then 10 files: model.1, model.2,
 ... model.10.

 I ran ./bjam --with-nplm ...

 Then what do I do next?

 Please, any help?

 Marwa N. Refaie




Re: [Moses-support] how to compile with nplm library

2014-12-29 Thread Nikolay Bogoychev
Hey,

First you need to check out and compile this fork of nplm:
https://github.com/rsennrich/nplm

Then you need to compile moses with the nplm switch:
./bjam --with-nplm=path/to/nplm

Then you can see how to use it here
http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc31
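
Putting the steps together, a minimal sketch (the make step is an
assumption about the fork's default build; check its README):

  git clone https://github.com/rsennrich/nplm
  cd nplm/src && make                # assumed build entry point
  cd /path/to/mosesdecoder
  ./bjam --with-nplm=/path/to/nplm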
On 30 Dec 2014 06:28, Xiaoqiang Feng feng.x.q.2...@gmail.com wrote:

 Hi,

 nplm is a toolkit for neural probabilistic language models. It can be used
 in Moses for the language model and for the bilingual LM (neural network
 joint model, ACL 2014). These two parts have been updated in the github
 mosesdecoder.

 If you want to use nplm in Moses, you have to compile Moses by linking
 libnplm.a (generated by nplm).
 Here is the problem: how do I compile Moses with libnplm.a? Do I need to
 modify the Jamroot file, and if so, how?

 Thanks,
 Xiaoqiang Feng



Re: [Moses-support] Devlin et al 2014

2014-11-26 Thread Nikolay Bogoychev
Hey,

BilingualLM is implemented and as of last week resides within moses master:
https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp

To compile it you need a neural network backend. Currently two are
supported: OxLM and NPLM. Adding a new backend is relatively easy; you need
to implement the interface as shown here:
https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h

To compile with the OxLM backend, build moses with the switch
--with-oxlm=/path/to/oxlm.
To compile with the NPLM backend, build moses with the switch
--with-nplm=/path/to/nplm (you need this fork of nplm:
https://github.com/rsennrich/nplm).

Unfortunately documentation is not yet available, so here's a short summary
of how to train a model and use it with the NPLM backend.
Use the extraction script to prepare the aligned bilingual corpus:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/extract_training.py

You need the following options:

  -e, --target-language     // mandatory; for example es
  -f, --source-language     // mandatory; for example en
  -c, --corpus              // path/to/corpus stem; the directory you
                            // specify should contain corpus.sourcelang
                            // and corpus.targetlang
  -t, --tagged-corpus       // optional; for backoff to POS tags
  -a, --align               // mandatory alignment file
  -w, --working-dir         // output directory of the model
  -n, --target-context      // target context size
  -m, --source-context      // the actual source context is 2*m + 1: m
                            // words on both left and right
  -s, --prune-source-vocab  // vocabulary cutoff threshold
  -p, --prune-target-vocab  // vocabulary cutoff threshold
Then, use the training script to train the model:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/train_nplm.py

An example execution is:

train_nplm.py -w de-en-500250source/ -r de-en150nopos-source750 -n 16 -d 0
--nplm-home=/home/abmayne/code/deepathon/nplm_one_layer/ -c corpus.1.word
-i 750 -o 750

where -i and -o are the input and output embedding sizes,
 -n is the total ngram size,
 -d is the number of hidden layers,
 -w and -c are the same as in the extract_training options, and
 -r is the output directory of the model.

Consult the python script for a more detailed description of the options.

After you have done that, the output directory should contain a trained
bilingual neural network language model.

To run it in moses as a feature function you need the following line:

BilingualNPLM
filepath=/mnt/gna0/nbogoych/new_nplm_german/de-en150nopos/train.10k.model.nplm.10
target_ngrams=4 source_ngrams=9
source_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.source
target_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.target

The source and target vocab files are located in the working directory used
to prepare the neural network language model.
target_ngrams doesn't include the predicted word (so target_ngrams=4 means
1 predicted word and 4 target context words).
The total order of the model is target_ngrams + source_ngrams + 1.

I will write proper documentation in the following weeks. If you have any
problems running it, please consult me.

Cheers,

Nick




On Wed, Nov 26, 2014 at 11:53 AM, Tom Hoar 
tah...@precisiontranslationtools.com wrote:

  Hieu,

 Sorry I missed you in Vancouver. I just reviewed your slide deck from the
 MosesCore TAUS Round Table in Vancouver
 (taus-moses-industry-roundtable-2014-changes-in-moses-hieu-hoang-university-of-edinburgh).

 In particular, I'm interested in the Bilingual Language Models that
 replicate Devlin et al., 2014. A search on statmt.org/moses doesn't show
 any hits searching for delvin. So: A) is the code finished? If so, B) are
 there any instructions on how to enable/use this feature? If not, C) what
 kind of help do you need to test the code for release?

 --

 Best regards,
 Tom Hoar
 Managing Director
 Precision Translation Tools Co., Ltd.
 Bangkok, Thailand
 Web: www.precisiontranslationtools.com
 Mobile: +66 87 345-1875
 Skype: tahoar



Re: [Moses-support] Devlin et al 2014

2014-11-26 Thread Nikolay Bogoychev
Fix formatting...

Hey,

BilingualLM is implemented and as of last week resides within moses master:
https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp

To compile it you need a neural network backend. Currently two are
supported: OxLM and NPLM. Adding a new backend is relatively easy; you need
to implement the interface as shown here:
https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h

To compile with the OxLM backend, build moses with the switch
--with-oxlm=/path/to/oxlm.
To compile with the NPLM backend, build moses with the switch
--with-nplm=/path/to/nplm (you need this fork of nplm:
https://github.com/rsennrich/nplm).

Unfortunately documentation is not yet available, so here's a short summary
of how to train a model and use it with the NPLM backend.
Use the extraction script to prepare the aligned bilingual corpus:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/extract_training.py

You need the following options:

  -e, --target-language     // mandatory; for example es
  -f, --source-language     // mandatory; for example en
  -c, --corpus              // path/to/corpus stem; the directory you
                            // specify should contain corpus.sourcelang
                            // and corpus.targetlang
  -t, --tagged-corpus       // optional; for backoff to POS tags
  -a, --align               // mandatory alignment file
  -w, --working-dir         // output directory of the model
  -n, --target-context      // target context size
  -m, --source-context      // the actual source context is 2*m + 1: m
                            // words on both left and right
  -s, --prune-source-vocab  // vocabulary cutoff threshold
  -p, --prune-target-vocab  // vocabulary cutoff threshold

Then, use the training script to train the model:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/train_nplm.py

An example execution is:

train_nplm.py -w de-en-500250source/ -r de-en150nopos-source750 -n 16 -d 0
--nplm-home=/home/abmayne/code/deepathon/nplm_one_layer/ -c corpus.1.word
-i 750 -o 750

where -i and -o are the input and output embedding sizes,
 -n is the total ngram size,
 -d is the number of hidden layers,
 -w and -c are the same as in the extract_training options, and
 -r is the output directory of the model.

Consult the python script for a more detailed description of the options.

After you have done that, the output directory should contain a trained
bilingual neural network language model.

To run it in moses as a feature function you need the following line:

BilingualNPLM
filepath=/mnt/gna0/nbogoych/new_nplm_german/de-en150nopos/train.10k.model.nplm.10
target_ngrams=4 source_ngrams=9
source_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.source
target_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.target

The source and target vocab files are located in the working directory used
to prepare the neural network language model.
target_ngrams doesn't include the predicted word (so target_ngrams=4 means
1 predicted word and 4 target context words).
The total order of the model is target_ngrams + source_ngrams + 1.

I will write proper documentation in the following weeks. If you have any
problems running it, please consult me.

Cheers,

Nick



Re: [Moses-support] Devlin et al 2014

2014-11-26 Thread Nikolay Bogoychev
Hey Tom,

1) It's independent. You just add --with-oxlm and --with-nplm to the stack.
2) Yes, they are both thread-safe; you can run the decoder with however
many threads you wish.
3) It doesn't create a separate binary. The compilation flag adds a new
feature inside moses called BilingualNPLM, which you add to your moses.ini
with a weight.
4) That depends on the vocabulary size used. With 16k source and 16k
target, about 100 megabytes. With 500k, about 1.5 gigabytes.

Beware that the memory requirements during decoding are much larger,
because of premultiplication. If you have memory issues, supply
premultiply=false on the BilingualNPLM line in moses.ini, but this is
likely to slow down decoding by a lot.
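
For instance, an illustrative feature line (same shape as in the summary
earlier in this thread, with premultiplication disabled; all paths are
placeholders):

BilingualNPLM filepath=/path/to/model.nplm target_ngrams=4 source_ngrams=9 source_vocab=/path/to/vocab.source target_vocab=/path/to/vocab.target premultiply=false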


Cheers,

Nick

On Wed, Nov 26, 2014 at 2:09 PM, Tom Hoar 
tah...@precisiontranslationtools.com wrote:

  Thanks Nikolay! This is a great start. I have a few clarification
 questions.

 1) Does this replace or run independently of traditional language models
 like KenLM? I.e. when compiling, we can use --with-kenlm, --with-irstlm,
 --with-randlm and --with-srilm together. Are --with-oxlm and --with-nplm
 added to the stack, or are they exclusive?

 2) It looks like your branch of nplm is thread-safe. Is oxlm also
 thread-safe?

 3) You say, "To run it in moses as a feature function..." Does that mean
 compiling with your above option(s) creates a new runtime binary,
 BilingualNPLM, that replaces the moses binary, much like moseschart and
 mosesserver? Or does BilingualNPLM run in a separate process that the
 moses binary accesses during runtime?

 4) How large do these LM files become? Are they comparable to traditional
 ARPA files, larger or smaller? Also, are they binarized with mmap reads or
 do they have to load into RAM?

 Thanks,
 Tom






Re: [Moses-support] Devlin et al 2014

2014-11-26 Thread Nikolay Bogoychev
Hey,

I can only answer 5-7.

5. The alignment file is the one that's usually called
aligned.1.grow-diag-final-and; it contains lines such as:

0-0 1-1 2-2 3-3
0-0 1-1 2-2 3-3 4-4

6. Yes. Basically, a prune vocab value of 16000 would take the 16000 most
common words in the corpus and discard the rest (replacing them with UNK).
7. Yes.

Cheers,

Nick

On Wed, Nov 26, 2014 at 3:44 PM, Tom Hoar 
tah...@precisiontranslationtools.com wrote:

  Thanks again, it's very useful feedback. We're now preparing to move from
 v1.0 to 3.x; we skipped Moses 2.x, so I'm not familiar with the new
 moses.ini syntax.

 Here are some more questions to help us get started playing with the
 extract_training.py options:

 1. I'm assuming corpus.e and corpus.f are the same prepared corpus files
    as used in train-model.perl?
 2. Is it possible for corpus.e and corpus.f to be different from the
    train-model.perl corpus, for example a smaller random sampling?
 3. The corpus files are tokenized, lower-cased and escaped the same.
 4. Do the corpus files also need to enforce the clean-corpus-n.perl max
    tokens (100) and ratio (9:1) for src & tgt? These address (M)GIZA++
    limits and might not apply to BilingualLM. However, are there
    advantages to using the limits or disadvantages to overriding them?
    I.e., can these corpus files include lines that would be filtered by
    clean-corpus-n.perl?
 5. What is the --align value? Is it the output of train-model.perl step 3,
    or a file with word alignments for each line of the corpus.e and
    corpus.f pair?
 6. Re --prune-source-vocab & --prune-target-vocab: do these thresholds set
    the size of the vocabulary you reference in #4 below (i.e. 16K, 500K,
    etc.)?
 7. Re --source-context & --target-context: are these the BilingualLM
    equivalents of a typical LM's order or ngrams for each?
 8. Re --tagged-corpus: is this for POS-factored corpora?

 Thanks.




Re: [Moses-support] Moses profiling

2014-09-15 Thread Nikolay Bogoychev
If you want to use google-perftools for profiling
(http://code.google.com/p/gperftools/wiki/GooglePerformanceTools),
compile moses with:

./bjam --full-tcmalloc link=shared
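
Once built, a gperftools CPU-profiling run looks roughly like this (a
hedged sketch: the libprofiler.so path, the input file and the pprof
binary name all vary by system):

  # collect a CPU profile while decoding (paths are illustrative)
  LD_PRELOAD=/usr/lib/libprofiler.so CPUPROFILE=/tmp/moses.prof \
      ./bin/moses -f moses.ini < input.txt > output.txt
  # show the hottest functions
  google-pprof --text ./bin/moses /tmp/moses.prof | head -20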

On Sat, Sep 13, 2014 at 9:54 PM, Arturo Argueta arturoargu...@gmail.com
wrote:

 Is there any way to enable profiling in moses? I've heard that a
 modification to one of the bjam files can enable profiling in moses.

 Thanks
