Re: [Moses-support] moses-chart binary missing?

2014-11-26 Thread Eric Baucom
that works, thanks!

On Wed, Nov 26, 2014 at 4:20 PM, Hieu Hoang  wrote:

> There should be a softlink which points to the file moses. moses and
> moses_chart are being merged. Everything in the tutorial should still work.
> On 26 Nov 2014 21:16, "Eric Baucom"  wrote:
>
>> I am interested in experimenting with tree-to-tree translations, so I
>> recently installed Moses according to the guidelines here:
>> http://www.statmt.org/moses/?n=Development.GetStarted .
>>
>> The installation completed successfully, and I am able to successfully
>> translate using the sample models as described in the same web page, with
>> the regular "moses" binary.  However, my installation is missing the
>> "moses-chart" binary, which I believe is necessary to do any tree-to-tree
>> translation.  Is this an additional step of the installation?  I didn't see
>> any options about it in the documentation.
>>
>> Thanks,
>> Eric Baucom
>>
>> ___
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Moses for Mere Mortals: Update

2014-11-26 Thread João Luis Rosas
Hi,

There was recently a message about problems compiling Moses for Mere Mortals
in Ubuntu 14.04
(http://thread.gmane.org/gmane.comp.nlp.moses.user/11390/focus=11406).

We have therefore decided to review the whole package, made some slight
changes to it (in the end, a good deal of them), made it a bit more robust,
and tested it on both Ubuntu 14.04 and Ubuntu 12.04. Thanks to Radian
Yazynin for having pointed out the problem to us. We will contact him
separately by e-mail.

You can find the result in https://github.com/jladcr/Moses-for-Mere-Mortals .
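For example, a plain checkout should be enough to get started:

  git clone https://github.com/jladcr/Moses-for-Mere-Mortals.git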

Greetings,

-- João Rosas

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Problem at a training stage - linguistic model not correctly trained

2014-11-26 Thread João Luis A. d. C. Rosas

Radian Yazynin  writes:

> 
> 
> Dear Support staff members,
> I will appreciate if you help me with one issue.
> These are my first steps...
> I am trying out a Moses-For-Mere-Mortals package (Moses + IRSTLM + RandLM
> + MGIZA) in Ubuntu 14.04.1 64bit.
> Compilation resulted in one error only: MGIZA didn't install correctly.
> After that I downloaded MGIZA as suggested at http://www.statmt.org/moses/
> and installed it separately into the folder where the erroneous MGIZA
> existed (in order to run the scripts correctly).
> The preparatory step with test files in PT and EN (as included) was OK.
> Now that I try to use the test corpora with ./train-1.11, the training
> procedure starts alright but after about 10 minutes terminates with the
> following:
> compile-lm: lmtable.cpp:237: int parseline(std::istream&, int,
> ngram&, float&, float&): Assertion `howmany == (Order + 1) ||
> howmany == (Order + 2)' failed.
> Aborted (core dumped)
> *** Writing training summary
> Linguistic model not correctly trained. Exiting...
> Thanks so much for your help! Kind regards, Radian Yazynin
> Tula, Russia
> 
> 

Hi Radian,

Thanks for having pointed this out. Here is what you were looking for :-) :
https://github.com/jladcr/Moses-for-Mere-Mortals .

This is at a new URL, supports Ubuntu 14.04, and has some improvements
(described in the documentation).

Greetings,

João



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] METEOR: difference between ranking task and other tasks

2014-11-26 Thread Marcin Junczys-Dowmunt
Thanks, that's a very useful answer. I figured something similar, but I
was curious why such huge differences between the methods are never
reported anywhere. Even in your paper they are only a few percent.


Also, could it be that the default METEOR setting is slightly
overfitting to the WMT ranking task? I have the impression that for
systems that have generally higher BLEU scores than WMT systems (beyond
45% BLEU) METEOR seems to flatten out, barely changing values, while
BLEU differences are 4-6% absolute. This does not happen for BLEU
values around 20-30%; METEOR scales nearly linearly in that range,
following BLEU scores quite closely.

Cheers,
Marcin

W dniu 26.11.2014 o 22:31, Michael Denkowski pisze:

Hi Marcin,

Meteor scores can vary widely across tasks due to the training data 
and goal.  The default ranking task tries to replicate WMT rankings, 
so the absolute scores are not as important as the relative scores 
between systems.  The adequacy task tries to fit Meteor scores to 
numeric adequacy judgements as linearly as possible.  If you're 
looking to evaluate a system in isolation to see if the translations 
are "good", you can simulate an adequacy scale with the "adq" task.  
If you're comparing multiple systems, you should get the most reliable 
ranking with the default "rank" task, but the absolute scores will be 
less meaningful.


Best,
Michael

On Wed, Nov 26, 2014 at 9:34 AM, Marcin Junczys-Dowmunt
<junc...@amu.edu.pl> wrote:


Hi,

A question concerning METEOR, maybe someone has some experience. I
am seeing huge differences between values for English with the
default task "ranking" and any other of the tasks (e.g. "adq"), up
to 30-40 points. Is this normal? In the literature I only ever see
marginal differences of maybe 1 or 2 per cent, but nothing like 35%
vs. 65%. For the language-independent setting I still get a score
of 55%.

See for instance:
http://www.cs.cmu.edu/~alavie/METEOR/pdf/meteor-wmt11.pdf for
the Urdu-English system, which shows much smaller differences between
"ranking" and "adq". I get the same discrepancies with
meteor-1.3.jar and meteor-1.5.jar.

Cheers,

Marcin


___
Moses-support mailing list
Moses-support@mit.edu 
http://mailman.mit.edu/mailman/listinfo/moses-support




___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] METEOR: difference between ranking task and other tasks

2014-11-26 Thread Michael Denkowski
Hi Marcin,

Meteor scores can vary widely across tasks due to the training data and
goal.  The default ranking task tries to replicate WMT rankings, so the
absolute scores are not as important as the relative scores between
systems.  The adequacy task tries to fit Meteor scores to numeric adequacy
judgements as linearly as possible.  If you're looking to evaluate a system
in isolation to see if the translations are "good", you can simulate an
adequacy scale with the "adq" task.  If you're comparing multiple systems,
you should get the most reliable ranking with the default "rank" task, but
the absolute scores will be less meaningful.
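
For concreteness, the task can be switched on the command line; a rough
sketch only (the file names are placeholders, and flags may vary slightly
between Meteor versions):

  # default WMT ranking task
  java -Xmx2G -jar meteor-1.5.jar system.output reference.en -l en -norm
  # simulate an adequacy scale instead
  java -Xmx2G -jar meteor-1.5.jar system.output reference.en -l en -norm -t adq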

Best,
Michael

On Wed, Nov 26, 2014 at 9:34 AM, Marcin Junczys-Dowmunt 
wrote:

>  Hi,
>
> A question concerning METEOR, maybe someone has some experience. I am
> seeing huge differences between values for English with the default task
> "ranking" and any other of the tasks (e.g. "adq"), up to 30-40 points. Is
> this normal? In the literature I only ever see marginal differences of
> maybe 1 or 2 per cent, but nothing like 35% vs. 65%. For the
> language-independent setting I still get a score of 55%.
>
> See for instance:
> http://www.cs.cmu.edu/~alavie/METEOR/pdf/meteor-wmt11.pdf for the
> Urdu-English system, which shows much smaller differences between "ranking"
> and "adq". I get the same discrepancies with meteor-1.3.jar and meteor-1.5.jar.
>
> Cheers,
>
> Marcin
>
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] moses-chart binary missing?

2014-11-26 Thread Hieu Hoang
There should be a softlink which points to the file moses. moses and
moses_chart are being merged. Everything in the tutorial should still work.
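A quick way to check is to look at the link in the build output; a sketch
only (the path depends on your checkout, and the chart binary may be spelled
moses_chart or moses-chart depending on version):

  ls -l ~/mosesdecoder/bin/moses_chart
  # expected: moses_chart -> moses
  # if the link is missing, creating it by hand should also work:
  ln -s moses ~/mosesdecoder/bin/moses_chart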
On 26 Nov 2014 21:16, "Eric Baucom"  wrote:

> I am interested in experimenting with tree-to-tree translations, so I
> recently installed Moses according to the guidelines here:
> http://www.statmt.org/moses/?n=Development.GetStarted .
>
> The installation completed successfully, and I am able to successfully
> translate using the sample models as described in the same web page, with
> the regular "moses" binary.  However, my installation is missing the
> "moses-chart" binary, which I believe is necessary to do any tree-to-tree
> translation.  Is this an additional step of the installation?  I didn't see
> any options about it in the documentation.
>
> Thanks,
> Eric Baucom
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] moses-chart binary missing?

2014-11-26 Thread Eric Baucom
I am interested in experimenting with tree-to-tree translations, so I
recently installed Moses according to the guidelines here:
http://www.statmt.org/moses/?n=Development.GetStarted .

The installation completed successfully, and I am able to successfully
translate using the sample models as described in the same web page, with
the regular "moses" binary.  However, my installation is missing the
"moses-chart" binary, which I believe is necessary to do any tree-to-tree
translation.  Is this an additional step of the installation?  I didn't see
any options about it in the documentation.

Thanks,
Eric Baucom
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Unknown single words that are part of phrases

2014-11-26 Thread Matthias Huck
Hi,

Presumably your phrase table does not contain an entry "Gitarre |||
guitar" because this word pair is always unaligned in your training
data. You could try to improve your word alignment quality.

Alternatively, you could implement a procedure in the manner of the
"forced single word heuristic" as described in: 
D. Stein, D. Vilar, S. Peitz, M. Freitag, M. Huck, and H. Ney. A Guide
to Jane, an Open Source Hierarchical Translation Toolkit. The Prague
Bulletin of Mathematical Linguistics, number 95, pages 5-18, Prague,
Czech Republic, April 2011.
http://ufal.mff.cuni.cz/pbml/95/art-stein-vilar-ney-jane.pdf 
(see Fig. 1c).

But the latter would be more of a workaround.
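
If you want to double-check that "Gitarre" really never receives an
alignment link, a rough sketch along these lines may help (the file names
are hypothetical and assume the usual Moses training layout):

  # print the German side next to its alignment points for lines
  # containing "Gitarre", then inspect the links by hand
  paste corpus/train.de model/aligned.grow-diag-final-and | grep -w Gitarre | head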

Cheers,
Matthias


On Thu, 2014-11-27 at 01:18 +0900, Raj Dabre wrote:
> Hello,
> 
> 
> If I am not wrong, this is most likely due to the grow(-diag) symmetrization
> method applied to the word-aligned data (both directions) before phrase
> extraction.
> 
> Furthermore, one-word translations should exist (though not always);
> search for them.
> 
> 
> 
> Regards.
> 
> 
> On Thu, Nov 27, 2014 at 12:53 AM, Vera Aleksic, Linguatec GmbH 
>  wrote:
> Hi,
> 
> I have observed many times that some words do not exist as single 
> word translations in the phrase table, although they exist in the training 
> corpus and in multiword phrases.
> An example:
> German-English translation for "Gitarre" is unknown, i.e. there is no 
> single word entry  for "Gitarre" in the phrase table, although some other 
> phrases containing this word exist (see below).
> How is it possible?
> Thanks and best regards,
> Vera
> 
> 
> Gitarre , ||| guitar ; ||| 1 0.0284465 1 0.0654272 2.718 ||| ||| 1 1
> Gitarre darstellt , unter Beanspruchung ||| guitar using ||| 0.25 
> 2.7351e-11 1 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt , unter ||| guitar using ||| 0.25 1.18917e-05 1 
> 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt , ||| guitar using ||| 0.25 0.00569228 1 0.0625119 
> 2.718 ||| ||| 4 1
> Gitarre darstellt ||| guitar using ||| 0.25 0.0400028 1 0.0625119 
> 2.718 ||| ||| 4 1
> Kopfplatte einer Gitarre darstellt , ||| head of a guitar using ||| 
> 0.5 4.23407e-08 1 0.00471281 2.718 ||| ||| 2 1
> Kopfplatte einer Gitarre darstellt ||| head of a guitar using ||| 0.5 
> 2.97552e-07 1 0.00471281 2.718 ||| ||| 2 1
> eine elektrische Gitarre , ||| an electric guitar ; ||| 1 0.00107982 
> 1 0.00163632 2.718 ||| ||| 1 1
> einer Gitarre darstellt , unter ||| of a guitar using ||| 0.33 
> 6.4754e-07 1 0.00471281 2.718 ||| ||| 3 1
> einer Gitarre darstellt , ||| of a guitar using ||| 0.33 
> 0.000309961 1 0.00471281 2.718 ||| ||| 3 1
> einer Gitarre darstellt ||| of a guitar using ||| 0.33 0.00217827 
> 1 0.00471281 2.718 ||| ||| 3 1
> elektrische Gitarre , ||| electric guitar ; ||| 1 0.005661 1 
> 0.0142097 2.718 ||| ||| 1 1
> wie eine elektrische Gitarre , ||| as an electric guitar ; ||| 1 
> 0.000177339 1 0.000809485 2.718 ||| ||| 1 1
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 
> 
> -- 
> Raj Dabre.
> Research Student, 
> 
> Graduate School of Informatics,
> Kyoto University.
> CSE MTech, IITB., 2011-2014
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] CFP: NAACL HLT 2015 final call for papers

2014-11-26 Thread Saif Mohammad
[Apologies for the cross postings.]

NAACL HLT 2015 FINAL CALL FOR PAPERS
===

### The 2015 Conference of the North American Chapter of the Association
for Computational Linguistics - Human Language Technologies (NAACL HLT
2015) ###

### May 31 to June 5, 2015. ###
### Denver, Colorado, United States ###

### http://naacl.org/2015 ###

The NAACL HLT 2015 conference covers a broad spectrum of disciplines
aimed at: building intelligent systems to interact with humans using
natural language; understanding computational and other linguistic
properties of languages; and enhancing human-human communication
through speech recognition, automatic translation, information
retrieval, text summarization, and information extraction.

NAACL HLT 2015 will feature long papers, short papers, demonstrations,
and a student research workshop, as well as associated tutorials
and workshops. In addition, some of the presentations at the
conference will be of papers accepted for the new Transactions of
the ACL journal (http://www.transacl.org/).

The conference invites the submission of long and short papers on
substantial, original, and unpublished research in all aspects of
automated language processing, including language resources. The
short paper format may also be appropriate for a small, focused
contribution, a work in progress, a negative result, an opinion
piece or an interesting application nugget.  Topics include, but
are not limited to, the following areas:

* Dialogue and Interactive Systems
  (including spoken and multimodal dialogue systems)
* Discourse and Pragmatics
* Generation and Summarization
* Information Extraction and Question Answering
* Information Retrieval
* Language Resources and Evaluation
* Language and Vision
* Linguistic and Psycholinguistic Aspects of CL
* Machine Learning for NLP
* Machine Translation
* NLP for Web, Social Media and Social Sciences
* NLP-enabled Technology
* Phonology, Morphology and Word Segmentation
* Semantics
* Sentiment Analysis and Opinion Mining
* Spoken Language Processing
* Tagging, Chunking, Syntax and Parsing
* Text Categorization and Topic Models

Important Dates
---

* Deadline for BOTH Long and Short paper submission: Dec 4, 2014
* Author response period: Jan 22-28, 2015
* Notification to authors: Feb 20, 2015
* Camera ready papers due: Mar 20, 2015

All deadlines are 11:59PM Pacific Time. Please DO NOT submit the
same paper in long and short paper form. (See Multiple Submission
policy section.)

Submissions
---

### Long Papers ###

NAACL HLT 2015 submissions must describe substantial, original,
completed and unpublished work. Wherever appropriate, concrete
evaluation and analysis should be included. The long paper deadline
is December 4, 2014 by 11:59PM Pacific Standard Time (GMT-8).
Submissions will be judged on appropriateness, originality/innovativeness,
soundness/correctness, impact, meaningful comparison, thoroughness,
replicability and clarity. Each submission will be reviewed by at
least three program committee members.

Long papers may consist of up to eight (8) pages of content, plus
2 pages for references.  Upon acceptance, final versions of long
papers will be given one additional page (up to 9 pages with 2 pages
for references) so that reviewers' comments can be taken into
account.

Papers will be presented orally or as a poster presentation as
determined by the program committee.  The decisions as to which
papers will be presented orally and which as poster presentations
will be based on the nature rather than on the quality of the work.
There will be no distinction in the proceedings between long papers
presented orally and those presented as poster presentations.

### Short Papers ###

NAACL HLT 2015 also solicits short papers. Short paper submissions
must describe original and unpublished work. The short paper deadline
this year is also December 4, 2014 by 11:59PM Pacific Standard Time
(GMT-8). Characteristics of past short papers include:

* A small, focused contribution
* Work in progress
* A negative result
* An opinion piece
* An interesting application nugget

Short papers may consist of up to four (4) pages of content, plus
2 pages for references.  Upon acceptance, short papers will be given
five (5) pages in the proceedings and 2 pages for references.
Authors are encouraged to use this additional page to address
reviewers comments in their final versions.

Short papers will be presented in one or more oral or poster sessions.
While short papers will be distinguished from long papers in the
proceedings, there will be no distinction in the proceedings between
short papers presented orally and those presented as poster
presentations. Each short paper submission will be reviewed by at
least two program committee members.

### Electronic Submission ###

Papers should be submitted electronically using the Softconf START
conference management system at the following URL:

https://www.softconf.com

Re: [Moses-support] Delvin et al 2014

2014-11-26 Thread Alexandra Birch
OK,

Here is 1-4:

1. You would normally train the bilingual LM on the same corpus as the SMT
model, but it is not required.
2. Yes, but there are also other ways to make training faster which you
might want to explore.
3. Yes, it is important that the bilingual LM corpus matches the format that
will be passed to it by the decoder at decoding time, or it will not work as
well.
4. Yes, it can include the sentences which were filtered by the training
scripts. You just need to have word alignments for them, and they do need
to be reasonably good translations of each other. So filter out the junk
(see the sketch below).
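
For the filtering, the standard corpus cleaning script can be reused; a
sketch with made-up file names:

  # keep pairs of 1-100 tokens per side; the script also drops pairs
  # with an extreme source/target length ratio
  ~/mosesdecoder/scripts/training/clean-corpus-n.perl corpus.tok de en corpus.clean 1 100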

Lexi

On Wed, Nov 26, 2014 at 3:44 PM, Tom Hoar <
tah...@precisiontranslationtools.com> wrote:

>  Thanks again. It's very useful feedback. We're now preparing to move from
> v1.0 to 3.x. We skipped Moses 2.x. So, I'm not familiar with the new
> moses.ini syntax.
>
> Here are some more questions to help us get started playing with the
> extract_training.py options:
>
>1. I'm assuming corpus.e and corpus.f are the same prepared corpus
>files as used in train-model.perl?
>2. Is it possible for corpus.e and corpus.f to be different from the
>train-model.perl corpus, for example a smaller random sampling?
> 3. The corpus files are tokenized and lower-cased and escaped the
>same.
>4. Do the corpus files also need to enforce clean-corpus-n.perl max
>tokens (100) and ratio (9:1) for src & tgt? These address (M)GIZA++ limits
>and might not apply to BilingualLM. However, are there advantages to using
>the limits or disadvantages to overriding them? I.e. can these corpus files
>include lines that are filtered with clean-corpus-n.perl?
> 5. What is the --align value? Is it the output of train-model.perl
>step 3 or a file with word alignments for each line of the corpus.e and
>corpus.f pair?
>6. Re --prune-source-vocab & --prune-target-vocab, do these thresholds
>set the size of the vocabulary you reference in #4 below (i.e. 16K, 500K,
>etc)?
>7. Re --source-context & --target-context, are these the BilingualLM
>equivalents to a typical LM's order or ngrams for each?
>8. Re --tagged-corpus, is this for POS factored corpora?
>
> Thanks.
>
>
>
> On 11/26/2014 09:27 PM, Nikolay Bogoychev wrote:
>
> Hey, Tom
>
>  1) It's independent. You just add -with-oxlm and -with-nplm to the stack
> 2) Yes, they are both thread safe, you can run the decoder with however
> many threads you wish.
> 3) It doesn't create a separate binary. The compilation flag adds a new
> feature inside moses that is called BilingualNPLM and you have to add it to
> your moses.ini with a weight.
> 4) That depends on the vocabulary size used. With 16k source 16k target
> about 100 megabytes. With 50 about 1.5 gigabytes.
>
>  Beware that the memory requirements during decoding are much larger,
> because of premultiplication. If you have memory issues supply
> "premultiply=false" to the BilingualNPLM line in moses.ini, but this is
> likely going to slow down decoding by a lot.
>
>
>  Cheers,
>
>  Nick
>
> On Wed, Nov 26, 2014 at 2:09 PM, Tom Hoar <
> tah...@precisiontranslationtools.com> wrote:
>
>>  Thanks Nikolay! This is a great start. I have a few clarification
>> questions.
>>
>> 1) does this replace or run independently of traditional language models
>> like KenLM? I.e. when compiling, we can use -with-kenlm, -with-irstlm,
>> -with-randlm and -with-srilm together. Are -with-oxlm and -with-nplm added
>> to the stack or are they exclusive?
>>
>> 2) It looks like your branch of nplm is thread-safe. Is oxlm also
>> thread-safe?
>>
>> 3) You say, "To run it in moses as a feature function..." Does that mean
>> compiling with your above option(s) creates a new runtime binary "
>> BilingualNPLM" that replaces the moses binary, much like moseschart and
>> mosesserver? Or, does BilingualNPLM run in a separate process that the
>> Moses binary accesses during runtime?
>>
>> 4) How large do these LM files become? Are they comparable to traditional
>> ARPA files, larger or smaller? Also, are they binarized with mmap reads or
>> do they have to load into RAM?
>>
>> Thanks,
>> Tom
>>
>>
>>
>>
>>
>> On 11/26/2014 08:04 PM, Nikolay Bogoychev wrote:
>>
>>  Fix formatting...
>>
>>  Hey,
>>
>>  BilingualLM is implemented and as of last week resides within moses
>> master:
>> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp
>>
>>  To compile it you need a NeuralNetwork backend for it. Currently there
>> are two supported: Oxlm and Nplm. Adding a new backend is relatively easy,
>> you need to implement the interface as shown here:
>>
>> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h
>>
>>  To compile with oxlm backend you need to compile moses with the switch
>> -with-oxlm=/path/to/oxlm
>> To compile with nplm backend you need to compile moses with the switch
>> -with-nplm=/path/to/nplm (You need this fork of nplm
>> https://github.com/rsennri

Re: [Moses-support] Unknown single words that are part of phrases

2014-11-26 Thread Raj Dabre
Hello,

If I am not wrong, this is most likely due to the grow(-diag) symmetrization
method applied to the word-aligned data (both directions) before phrase
extraction.
Furthermore, one-word translations should exist (though not always);
search for them.

Regards.

On Thu, Nov 27, 2014 at 12:53 AM, Vera Aleksic, Linguatec GmbH <
v.alek...@linguatec.de> wrote:

> Hi,
>
> I have observed many times that some words do not exist as single word
> translations in the phrase table, although they exist in the training
> corpus and in multiword phrases.
> An example:
> German-English translation for "Gitarre" is unknown, i.e. there is no
> single word entry  for "Gitarre" in the phrase table, although some other
> phrases containing this word exist (see below).
> How is it possible?
> Thanks and best regards,
> Vera
>
>
> Gitarre , ||| guitar ; ||| 1 0.0284465 1 0.0654272 2.718 ||| ||| 1 1
> Gitarre darstellt , unter Beanspruchung ||| guitar using ||| 0.25
> 2.7351e-11 1 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt , unter ||| guitar using ||| 0.25 1.18917e-05 1
> 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt , ||| guitar using ||| 0.25 0.00569228 1 0.0625119 2.718
> ||| ||| 4 1
> Gitarre darstellt ||| guitar using ||| 0.25 0.0400028 1 0.0625119 2.718
> ||| ||| 4 1
> Kopfplatte einer Gitarre darstellt , ||| head of a guitar using ||| 0.5
> 4.23407e-08 1 0.00471281 2.718 ||| ||| 2 1
> Kopfplatte einer Gitarre darstellt ||| head of a guitar using ||| 0.5
> 2.97552e-07 1 0.00471281 2.718 ||| ||| 2 1
> eine elektrische Gitarre , ||| an electric guitar ; ||| 1 0.00107982 1
> 0.00163632 2.718 ||| ||| 1 1
> einer Gitarre darstellt , unter ||| of a guitar using ||| 0.33
> 6.4754e-07 1 0.00471281 2.718 ||| ||| 3 1
> einer Gitarre darstellt , ||| of a guitar using ||| 0.33 0.000309961 1
> 0.00471281 2.718 ||| ||| 3 1
> einer Gitarre darstellt ||| of a guitar using ||| 0.33 0.00217827 1
> 0.00471281 2.718 ||| ||| 3 1
> elektrische Gitarre , ||| electric guitar ; ||| 1 0.005661 1 0.0142097
> 2.718 ||| ||| 1 1
> wie eine elektrische Gitarre , ||| as an electric guitar ; ||| 1
> 0.000177339 1 0.000809485 2.718 ||| ||| 1 1
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>



-- 
Raj Dabre.
Research Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Delvin et al 2014

2014-11-26 Thread Nikolay Bogoychev
Hey,

I can only answer 5-7
5. The alignment file is the one that's usually
called aligned.1.grow-diag-final-and and contains lines such as:

0-0 1-1 2-2 3-3
0-0 1-1 2-2 3-3 4-4

6. Yes. Basically prune vocab value of 16000 would take the 16000 most
common words in the corpus and discard the rest (replace them with UNK)
7. Yes

Cheers,

Nick

On Wed, Nov 26, 2014 at 3:44 PM, Tom Hoar <
tah...@precisiontranslationtools.com> wrote:

>  Thanks again. It's very useful feedback. We're now preparing to move from
> v1.0 to 3.x. We skipped Moses 2.x. So, I'm not familiar with the new
> moses.ini syntax.
>
> Here are some more questions to help us get started playing with the
> extract_training.py options:
>
>1. I'm assuming corpus.e and corpus.f are the same prepared corpus
>files as used in train-model.perl?
>2. Is it possible for corpus.e and corpus.f to be different from the
>train-model.perl corpus, for example a smaller random sampling?
> 3. The corpus files are tokenized and lower-cased and escaped the
>same.
>4. Do the corpus files also need to enforce clean-corpus-n.perl max
>tokens (100) and ratio (9:1) for src & tgt? These address (M)GIZA++ limits
>and might not apply to BilingualLM. However, are there advantages to using
>the limits or disadvantages to overriding them? I.e. can these corpus files
>include lines that are filtered with clean-corpus-n.perl?
> 5. What is the --align value? Is it the output of train-model.perl
>step 3 or a file with word alignments for each line of the corpus.e and
>corpus.f pair?
>6. Re --prune-source-vocab & --prune-target-vocab, do these thresholds
>set the size of the vocabulary you reference in #4 below (i.e. 16K, 500K,
>etc)?
>7. Re --source-context & --target-context, are these the BilingualLM
>equivalents to a typical LM's order or ngrams for each?
>8. Re --tagged-corpus, is this for POS factored corpora?
>
> Thanks.
>
>
>
> On 11/26/2014 09:27 PM, Nikolay Bogoychev wrote:
>
> Hey, Tom
>
>  1) It's independent. You just add -with-oxlm and -with-nplm to the stack
> 2) Yes, they are both thread safe, you can run the decoder with however
> many threads you wish.
> 3) It doesn't create a separate binary. The compilation flag adds a new
> feature inside moses that is called BilingualNPLM and you have to add it to
> your moses.ini with a weight.
> 4) That depends on the vocabulary size used. With 16k source 16k target
> about 100 megabytes. With 50 about 1.5 gigabytes.
>
>  Beware that the memory requirements during decoding are much larger,
> because of premultiplication. If you have memory issues supply
> "premultiply=false" to the BilingualNPLM line in moses.ini, but this is
> likely going to slow down decoding by a lot.
>
>
>  Cheers,
>
>  Nick
>
> On Wed, Nov 26, 2014 at 2:09 PM, Tom Hoar <
> tah...@precisiontranslationtools.com> wrote:
>
>>  Thanks Nikolay! This is a great start. I have a few clarification
>> questions.
>>
>> 1) does this replace or run independently of traditional language models
>> like KenLM? I.e. when compiling, we can use -with-kenlm, -with-irstlm,
>> -with-randlm and -with-srilm together. Are -with-oxlm and -with-nplm added
>> to the stack or are they exclusive?
>>
>> 2) It looks like your branch of nplm is thread-safe. Is oxlm also
>> thread-safe?
>>
>> 3) You say, "To run it in moses as a feature function..." Does that mean
>> compiling with your above option(s) creates a new runtime binary "
>> BilingualNPLM" that replaces the moses binary, much like moseschart and
>> mosesserver? Or, does BilingualNPLM run in a separate process that the
>> Moses binary accesses during runtime?
>>
>> 4) How large do these LM files become? Are they comparable to traditional
>> ARPA files, larger or smaller? Also, are they binarized with mmap reads or
>> do they have to load into RAM?
>>
>> Thanks,
>> Tom
>>
>>
>>
>>
>>
>> On 11/26/2014 08:04 PM, Nikolay Bogoychev wrote:
>>
>>  Fix formatting...
>>
>>  Hey,
>>
>>  BilingualLM is implemented and as of last week resides within moses
>> master:
>> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp
>>
>>  To compile it you need a NeuralNetwork backend for it. Currently there
>> are two supported: Oxlm and Nplm. Adding a new backend is relatively easy,
>> you need to implement the interface as shown here:
>>
>> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h
>>
>>  To compile with oxlm backend you need to compile moses with the switch
>> -with-oxlm=/path/to/oxlm
>> To compile with nplm backend you need to compile moses with the switch
>> -with-nplm=/path/to/nplm (You need this fork of nplm
>> https://github.com/rsennrich/nplm
>>
>>  Unfortunately documentation is not yet available, so here's a short
>> summary of how to train and use a model with the nplm backend:
>> Use the extract training script to prepare aligned bilingual corpus:
>> https://github.com/moses-s

[Moses-support] Unknown single words that are part of phrases

2014-11-26 Thread Vera Aleksic, Linguatec GmbH
Hi, 

I have observed many times that some words do not exist as single word 
translations in the phrase table, although they exist in the training corpus 
and in multiword phrases.
An example: 
German-English translation for "Gitarre" is unknown, i.e. there is no single 
word entry  for "Gitarre" in the phrase table, although some other phrases 
containing this word exist (see below).
How is it possible? 
Thanks and best regards,
Vera
 

Gitarre , ||| guitar ; ||| 1 0.0284465 1 0.0654272 2.718 ||| ||| 1 1
Gitarre darstellt , unter Beanspruchung ||| guitar using ||| 0.25 2.7351e-11 1 
0.0625119 2.718 ||| ||| 4 1
Gitarre darstellt , unter ||| guitar using ||| 0.25 1.18917e-05 1 0.0625119 
2.718 ||| ||| 4 1
Gitarre darstellt , ||| guitar using ||| 0.25 0.00569228 1 0.0625119 2.718 ||| 
||| 4 1
Gitarre darstellt ||| guitar using ||| 0.25 0.0400028 1 0.0625119 2.718 ||| ||| 
4 1
Kopfplatte einer Gitarre darstellt , ||| head of a guitar using ||| 0.5 
4.23407e-08 1 0.00471281 2.718 ||| ||| 2 1
Kopfplatte einer Gitarre darstellt ||| head of a guitar using ||| 0.5 
2.97552e-07 1 0.00471281 2.718 ||| ||| 2 1
eine elektrische Gitarre , ||| an electric guitar ; ||| 1 0.00107982 1 
0.00163632 2.718 ||| ||| 1 1
einer Gitarre darstellt , unter ||| of a guitar using ||| 0.33 6.4754e-07 1 
0.00471281 2.718 ||| ||| 3 1
einer Gitarre darstellt , ||| of a guitar using ||| 0.33 0.000309961 1 
0.00471281 2.718 ||| ||| 3 1
einer Gitarre darstellt ||| of a guitar using ||| 0.33 0.00217827 1 
0.00471281 2.718 ||| ||| 3 1
elektrische Gitarre , ||| electric guitar ; ||| 1 0.005661 1 0.0142097 2.718 
||| ||| 1 1
wie eine elektrische Gitarre , ||| as an electric guitar ; ||| 1 0.000177339 1 
0.000809485 2.718 ||| ||| 1 1

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Delvin et al 2014

2014-11-26 Thread Tom Hoar
Thanks again. It's very useful feedback. We're now preparing to move 
from v1.0 to 3.x. We skipped Moses 2.x. So, I'm not familiar with the 
new moses.ini syntax.


Here are some more questions to help us get started playing with the 
extract_training.py options:


1. I'm assuming corpus.e and corpus.f are the same prepared corpus
   files as used in train-model.perl?
2. Is it possible for corpus.e and corpus.f to be different from the
   train-model.perl corpus, for example a smaller random sampling?
3. The corpus files are tokenized and lower-cased and escaped the same.
4. Do the corpus files also need to enforce clean-corpus-n.perl max
   tokens (100) and ratio (9:1) for src & tgt? These address (M)GIZA++
   limits and might not apply to BilingualLM. However, are there
   advantages to using the limits or disadvantages to overriding them?
   I.e. can these corpus files include lines that are filtered with
   clean-corpus-n.perl?
5. What is the --align value? Is it the output of train-model.perl step
   3 or a file with word alignments for each line of the corpus.e and
   corpus.f pair?
6. Re --prune-source-vocab & --prune-target-vocab, do these thresholds
   set the size of the vocabulary you reference in #4 below (i.e. 16K,
   500K, etc)?
7. Re --source-context & --target-context, are these the BilingualLM
   equivalents to a typical LM's order or ngrams for each?
8. Re --tagged-corpus, is this for POS factored corpora?

Thanks.


On 11/26/2014 09:27 PM, Nikolay Bogoychev wrote:

Hey, Tom

1) It's independent. You just add -with-oxlm and -with-nplm to the stack
2) Yes, they are both thread safe, you can run the decoder with 
however many threads you wish.
3) It doesn't create a separate binary. The compilation flag adds a 
new feature inside moses that is called BilingualNPLM and you have to 
add it to your moses.ini with a weight.
4) That depends on the vocabulary size used. With 16k source 16k 
target about 100 megabytes. With 50 about 1.5 gigabytes.


Beware that the memory requirements during decoding are much larger, 
because of premultiplication. If you have memory issues supply 
"premultiply=false" to the BilingualNPLM line in moses.ini, but this 
is likely going to slow down decoding by a lot.



Cheers,

Nick

On Wed, Nov 26, 2014 at 2:09 PM, Tom Hoar 
> wrote:


Thanks Nikolay! This is a great start. I have a few clarification
questions.

1) does this replace or run independently of traditional language
models like KenLM? I.e. when compiling, we can use -with-kenlm,
-with-irstlm, -with-randlm and -with-srilm together. Are
-with-oxlm and -with-nplm added to the stack or are they exclusive?

2) It looks like your branch of nplm is thread-safe. Is oxlm also
thread-safe?

3) You say, "To run it in moses as a feature function..." Does
that mean compiling with your above option(s) creates a new
runtime binary "BilingualNPLM" that replaces the moses binary,
much like moseschart and mosesserver? Or, does BilingualNPLM run
in a separate process that the Moses binary accesses during runtime?

4) How large do these LM files become? Are they comparable to
traditional ARPA files, larger or smaller? Also, are they
binarized with mmap reads or do they have to load into RAM?

Thanks,
Tom





On 11/26/2014 08:04 PM, Nikolay Bogoychev wrote:

Fix formatting...

Hey,

BilingualLM is implemented and as of last week resides within
moses master:

https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp

To compile it you need a NeuralNetwork backend for it. Currently
there are two supported: Oxlm and Nplm. Adding a new backend is
relatively easy, you need to implement the interface as shown here:

https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h

To compile with oxlm backend you need to compile moses with the
switch -with-oxlm=/path/to/oxlm
To compile with nplm backend you need to compile moses with the
switch -with-nplm=/path/to/nplm (You need this fork of nplm
https://github.com/rsennrich/nplm

Unfortunately documentation is not yet available, so here's a
short summary of how to train and use a model with the nplm
backend:
Use the extract training script to prepare aligned bilingual
corpus:

https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/extract_training.py

You need the following options:

"-e", "--target-language", type="string", dest="target_language")
//Mandatory, for example es
"-f", "--source-language", type="string", dest="source_language")
//Mandatory, for example en
"-c", "--corpus", type="string", dest="corpus_stem") //
path/to/corpus In the directory you have specified there should
be files corpus.sourcelang and corpus.targetlang
"-t", "--tagged-corpus", type="strin

[Moses-support] METEOR: difference between ranking task and other tasks

2014-11-26 Thread Marcin Junczys-Dowmunt
 

Hi, 

A question concerning METEOR, maybe someone has some experience. I am
seeing huge differences between values for English with the default task
"ranking" and any other of the tasks (e.g. "adq"), up to 30-40 points.
Is this normal? In the literature I only ever see marginal differences
of maybe 1 or 2 per cent, but nothing like 35% vs. 65%. For the
language-independent setting I still get a score of 55%.

See for instance:
http://www.cs.cmu.edu/~alavie/METEOR/pdf/meteor-wmt11.pdf for the
Urdu-English system, which shows much smaller differences between "ranking"
and "adq". I get the same discrepancies with meteor-1.3.jar and
meteor-1.5.jar.

Cheers, 

Marcin 
 ___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Delvin et al 2014

2014-11-26 Thread Nikolay Bogoychev
Hey, Tom

1) It's independent. You just add -with-oxlm and -with-nplm to the stack
2) Yes, they are both thread safe, you can run the decoder with however
many threads you wish.
3) It doesn't create a separate binary. The compilation flag adds a new
feature inside moses that is called BilingualNPLM and you have to add it to
your moses.ini with a weight.
4) That depends on the vocabulary size used. With 16k source 16k target
about 100 megabytes. With 50 about 1.5 gigabytes.

Beware that the memory requirements during decoding are much larger,
because of premultiplication. If you have memory issues supply
"premultiply=false" to the BilingualNPLM line in moses.ini, but this is
likely going to slow down decoding by a lot.
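
For illustration only, a hypothetical moses.ini fragment along those lines
(every path, feature name and weight value below is made up):

  [feature]
  BilingualNPLM name=BLM0 filepath=/path/to/train.model.nplm target_ngrams=4 source_ngrams=9 source_vocab=/path/to/vocab.source target_vocab=/path/to/vocab.target premultiply=false

  [weight]
  BLM0= 0.1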


Cheers,

Nick

On Wed, Nov 26, 2014 at 2:09 PM, Tom Hoar <
tah...@precisiontranslationtools.com> wrote:

>  Thanks Nikolay! This is a great start. I have a few clarification
> questions.
>
> 1) does this replace or run independently of traditional language models
> like KenLM? I.e. when compiling, we can use -with-kenlm, -with-irstlm,
> -with-randlm and -with-srilm together. Are -with-oxlm and -with-nplm added
> to the stack or are they exclusive?
>
> 2) It looks like your branch of nplm is thread-safe. Is oxlm also
> thread-safe?
>
> 3) You say, "To run it in moses as a feature function..." Does that mean
> compiling with your above option(s) creates a new runtime binary "
> BilingualNPLM" that replaces the moses binary, much like moseschart and
> mosesserver? Or, does BilingualNPLM run in a separate process that the
> Moses binary accesses during runtime?
>
> 4) How large do these LM files become? Are they comparable to traditional
> ARPA files, larger or smaller? Also, are they binarized with mmap reads or
> do they have to load into RAM?
>
> Thanks,
> Tom
>
>
>
>
>
> On 11/26/2014 08:04 PM, Nikolay Bogoychev wrote:
>
>  Fix formatting...
>
>  Hey,
>
>  BilingualLM is implemented and as of last week resides within moses
> master:
> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp
>
>  To compile it you need a NeuralNetwork backend for it. Currently there
> are two supported: Oxlm and Nplm. Adding a new backend is relatively easy,
> you need to implement the interface as shown here:
>
> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h
>
>  To compile with oxlm backend you need to compile moses with the switch
> -with-oxlm=/path/to/oxlm
> To compile with nplm backend you need to compile moses with the switch
> -with-nplm=/path/to/nplm (You need this fork of nplm
> https://github.com/rsennrich/nplm
>
>  Unfortunately documentation is not yet available, so here's a short
> summary of how to train and use a model with the nplm backend:
> Use the extract training script to prepare aligned bilingual corpus:
> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/extract_training.py
>
>  You need the following options:
>
>  "-e", "--target-language", type="string", dest="target_language")
> //Mandatory, for example es "-f", "--source-language", type="string",
> dest="source_language") //Mandatory, for example en "-c", "--corpus",
> type="string", dest="corpus_stem") // path/to/corpus In the directory you
> have specified there should be files corpus.sourcelang and
> corpus.targetlang "-t", "--tagged-corpus", type="string",
> dest="tagged_stem") //Optional for backoff to pos tag "-a", "--align",
> type="string", dest="align_file") //Mandatory alignemtn file "-w",
> "--working-dir", type="string", dest="working_dir") //Output directory of
> the model "-n", "--target-context", type="int", dest="n") / "-m",
> "--source-context", type="int", dest="m") //The actual context size is 2*m
> + 1, this is the number of words on both left and right "-s",
> "--prune-source-vocab", type="int", dest="sprune") //cutoff vocabulary
> threshold "-p", "--prune-target-vocab", type="int", dest="tprune") //cutoff
> vocabulary threshold
>
>  Then, use the training script to train the model:
> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/train_nplm.py
>
> Example execution is:
>
>  train_nplm.py -w de-en-500250source/ -r de-en150nopos-source750 -n 16 -d
> 0 --nplm-home=/home/abmayne/code/deepathon/nplm_one_layer/ -c
> corpus.1.word -i 750 -o 750
>
> where -i and -o are input and output embeddings
>  -n is the total ngram size
>  -d is the number of hidden layers
>  -w and -c are the same as the extract_training options
>  -r is the output directory of the model
>
> Consult the python script for more detailed description of the options
>
> After you have done that in the output directory you should have a trained
> bilingual Neural Network language model
>
> To run it in moses as a feature function you need the following line:
>
> BilingualNPLM 
> filepath=/mnt/gna0/nbogoych/new_nplm_german/de-en150nopos/train.10k.model.nplm.10
> target_ngrams=4 source_ngrams=9 source_vocab=/mnt/g

Re: [Moses-support] Delvin et al 2014

2014-11-26 Thread Tom Hoar
Thanks Nikolay! This is a great start. I have a few clarification 
questions.


1) does this replace or run independently of traditional language models 
like KenLM? I.e. when compiling, we can use -with-kenlm, -with-irstlm, 
-with-randlm and -with-srilm together. Are -with-oxlm and -with-nplm 
added to the stack or are they exclusive?


2) It looks like your branch of nplm is thread-safe. Is oxlm also 
thread-safe?


3) You say, "To run it in moses as a feature function..." Does that mean 
compiling with your above option(s) creates a new runtime binary 
"BilingualNPLM" that replaces the moses binary, much like moseschart and 
mosesserver? Or, does BilingualNPLM run in a separate process that the 
Moses binary accesses during runtime?


4) How large do these LM files become? Are they comparable to 
traditional ARPA files, larger or smaller? Also, are they binarized with 
mmap reads or do they have to load into RAM?


Thanks,
Tom




On 11/26/2014 08:04 PM, Nikolay Bogoychev wrote:

Fix formatting...

Hey,

BilingualLM is implemented and as of last week resides within moses 
master: 
https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp


To compile it you need a NeuralNetwork backend for it. Currently there 
are two supported: Oxlm and Nplm. Adding a new backend is relatively 
easy, you need to implement the interface as shown here:

https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h

To compile with oxlm backend you need to compile moses with the switch 
-with-oxlm=/path/to/oxlm
To compile with nplm backend you need to compile moses with the switch 
-with-nplm=/path/to/nplm (You need this fork of nplm 
https://github.com/rsennrich/nplm


Unfortunately documentation is not yet available, so here's a short 
summary of how to train and use a model with the nplm backend:
Use the extract training script to prepare aligned bilingual corpus: 
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/extract_training.py


You need the following options:

"-e", "--target-language", type="string", dest="target_language") 
//Mandatory, for example es "-f", "--source-language", type="string", 
dest="source_language") //Mandatory, for example en "-c", "--corpus", 
type="string", dest="corpus_stem") // path/to/corpus In the directory 
you have specified there should be files corpus.sourcelang and 
corpus.targetlang "-t", "--tagged-corpus", type="string", 
dest="tagged_stem") //Optional for backoff to pos tag "-a", "--align", 
type="string", dest="align_file") //Mandatory alignemtn file "-w", 
"--working-dir", type="string", dest="working_dir") //Output directory 
of the model "-n", "--target-context", type="int", dest="n") / "-m", 
"--source-context", type="int", dest="m") //The actual context size is 
2*m + 1, this is the number of words on both left and right "-s", 
"--prune-source-vocab", type="int", dest="sprune") //cutoff vocabulary 
threshold "-p", "--prune-target-vocab", type="int", dest="tprune") 
//cutoff vocabulary threshold


Then, use the training script to train the model: 
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/train_nplm.py


Example execution is:

train_nplm.py -w de-en-500250source/ -r de-en150nopos-source750 -n 16 
-d 0 --nplm-home=/home/abmayne/code/deepathon/nplm_one_layer/ -c 
corpus.1.word -i 750 -o 750


where -i and -o are input and output embeddings
 -n is the total ngram size
 -d is the number of hidden layers
 -w and -c are the same as the extract_training options
 -r is the output directory of the model

Consult the python script for more detailed description of the options

After you have done that in the output directory you should have a 
trained bilingual Neural Network language model


To run it in moses as a feature function you need the following line:

BilingualNPLM 
filepath=/mnt/gna0/nbogoych/new_nplm_german/de-en150nopos/train.10k.model.nplm.10 
target_ngrams=4 source_ngrams=9 
source_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.source 
target_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.targe


The source and target vocab is located in the working directory used 
to prepare the neural network language model.
target_ngrams doesn't include the predicted word (so target_ngrams = 4
would mean 1 word predicted and 4 target context words).

The total order of the model would be target_ngrams + source_ngrams + 1.

I will write proper documentation in the following weeks. If you 
have any problems running it, please consult me.


Cheers,

Nick


On Wed, Nov 26, 2014 at 1:02 PM, Nikolay Bogoychev wrote:


Hey,

BilingualLM is implemented and as of last week resides within
moses master:

https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp

To compile it you need a NeuralNetwork backend for it. Currently
there are two supported: Oxlm and Nplm. Adding

Re: [Moses-support] Delvin et al 2014

2014-11-26 Thread Nikolay Bogoychev
Fix formatting...

Hey,

BilingualLM is implemented and as of last week resides within moses master:
https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp

To compile it you need a NeuralNetwork backend for it. Currently there are
two supported: Oxlm and Nplm. Adding a new backend is relatively easy, you
need to implement the interface as shown here:
https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h

To compile with oxlm backend you need to compile moses with the switch
-with-oxlm=/path/to/oxlm
To compile with nplm backend you need to compile moses with the switch
-with-nplm=/path/to/nplm (You need this fork of nplm
https://github.com/rsennrich/nplm
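
For example, a build invocation might look like this (a sketch only; the
path is a placeholder, and the switch is typically spelled with a leading
double dash when passed to bjam):

  cd mosesdecoder
  ./bjam --with-nplm=/path/to/nplm -j8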

Unfortunately documentation is not yet available, so here's a short summary
of how to train and use a model with the nplm backend:
Use the extract training script to prepare aligned bilingual corpus:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/extract_training.py

You need the following options:

"-e", "--target-language", type="string", dest="target_language")
//Mandatory, for example es "-f", "--source-language", type="string",
dest="source_language") //Mandatory, for example en "-c", "--corpus",
type="string", dest="corpus_stem") // path/to/corpus In the directory you
have specified there should be files corpus.sourcelang and
corpus.targetlang "-t", "--tagged-corpus", type="string",
dest="tagged_stem") //Optional for backoff to pos tag "-a", "--align",
type="string", dest="align_file") //Mandatory alignemtn file "-w",
"--working-dir", type="string", dest="working_dir") //Output directory of
the model "-n", "--target-context", type="int", dest="n") / "-m",
"--source-context", type="int", dest="m") //The actual context size is 2*m
+ 1, this is the number of words on both left and right "-s",
"--prune-source-vocab", type="int", dest="sprune") //cutoff vocabulary
threshold "-p", "--prune-target-vocab", type="int", dest="tprune") //cutoff
vocabulary threshold
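A hypothetical invocation, just to show how the options fit together (every
path and number below is made up):

  extract_training.py -e en -f de -c corpus/train \
      -a model/aligned.grow-diag-final-and -w working-dir \
      -n 5 -m 4 -s 16000 -p 16000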

Then, use the training script to train the model:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/train_nplm.py

Example execution is:

train_nplm.py -w de-en-500250source/ -r de-en150nopos-source750 -n 16 -d 0
--nplm-home=/home/abmayne/code/deepathon/nplm_one_layer/ -c corpus.1.word
-i 750 -o 750

where -i and -o are input and output embeddings
 -n is the total ngram size
 -d is the number of hidden layers
-w and -c are the same as the extract_training options
-r is the output directory of the model

Consult the python script for a more detailed description of the options.

After you have done that, you should have a trained bilingual neural network
language model in the output directory.

To run it in moses as a feature function you need the following line:

BilingualNPLM 
filepath=/mnt/gna0/nbogoych/new_nplm_german/de-en150nopos/train.10k.model.nplm.10
target_ngrams=4 source_ngrams=9 source_vocab=/mnt/gna0/
nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.source
target_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.targe

The source and target vocab is located in the working directory used to
prepare the neural network language model.
target_ngrams doesn't include the predicted word (so target_ngrams = 4
would mean 1 word predicted and 4 target context words).
The total order of the model would be target_ngrams + source_ngrams + 1.

I will write proper documentation in the following weeks. If you have
any problems running it, please consult me.

Cheers,

Nick


On Wed, Nov 26, 2014 at 1:02 PM, Nikolay Bogoychev  wrote:

> Hey,
>
> BilingualLM is implemented and as of last week resides within moses
> master:
> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp
>
> To compile it you need a NeuralNetwork backend for it. Currently there are
> two supported: Oxlm and Nplm. Adding a new backend is relatively easy, you
> need to implement the interface as shown here:
>
> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h
>
> To compile with oxlm backend you need to compile moses with the switch
> -with-oxlm=/path/to/oxlm
> To compile with nplm backend you need to compile moses with the switch
> -with-nplm=/path/to/nplm (You need this fork of nplm
> https://github.com/rsennrich/nplm
>
> Unfortunately documentation is not yet available, so here's a short summary
> of how to train and use a model with the nplm backend:
> Use the extract training script to prepare aligned bilingual corpus:
> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/extract_training.py
>
> You need the following options:
>
> "-e", "--target-language", type="string", dest="target_language")
> //Mandatory, for example es "-f", "--source-language", type="string",
> dest="source_language") //Mandatory, for example en "-c", "--corpus",
> type="string", dest="corpus_stem") // path/to/corpus In the director

Re: [Moses-support] Delvin et al 2014

2014-11-26 Thread Nikolay Bogoychev
Hey,

BilingualLM is implemented and as of last week resides within moses master:
https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp

To compile it you need a NeuralNetwork backend for it. Currently there are
two supported: Oxlm and Nplm. Adding a new backend is relatively easy, you
need to implement the interface as shown here:
https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h

To compile with oxlm backend you need to compile moses with the switch
-with-oxlm=/path/to/oxlm
To compile with nplm backend you need to compile moses with the switch
-with-nplm=/path/to/nplm (You need this fork of nplm
https://github.com/rsennrich/nplm

Unfortunately documentation is not yet available, so here's a short summary
of how to train and use a model with the nplm backend:
Use the extract training script to prepare aligned bilingual corpus:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/extract_training.py

You need the following options:

"-e", "--target-language", type="string", dest="target_language")
//Mandatory, for example es "-f", "--source-language", type="string",
dest="source_language") //Mandatory, for example en "-c", "--corpus",
type="string", dest="corpus_stem") // path/to/corpus In the directory you
have specified there should be files corpus.sourcelang and
corpus.targetlang "-t", "--tagged-corpus", type="string",
dest="tagged_stem") //Optional for backoff to pos tag "-a", "--align",
type="string", dest="align_file") //Mandatory alignemtn file "-w",
"--working-dir", type="string", dest="working_dir") //Output directory of
the model "-n", "--target-context", type="int", dest="n") / "-m",
"--source-context", type="int", dest="m") //The actual context size is 2*m
+ 1, this is the number of words on both left and right "-s",
"--prune-source-vocab", type="int", dest="sprune") //cutoff vocabulary
threshold "-p", "--prune-target-vocab", type="int", dest="tprune") //cutoff
vocabulary threshold
Then, use the training script to train the model:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/train_nplm.py

Example execution is:

train_nplm.py -w de-en-500250source/ -r de-en150nopos-source750 -n 16 -d 0
--nplm-home=/home/abmayne/code/deepathon/nplm_one_layer/ -c corpus.1.word
-i 750 -o 750

where -i and -o are input and output embeddings
 -n is the total ngram size
 -d is the number of hidden layers
-w and -c are the same as the extract_training options
-r is the output directory of the model

Consult the python script for a more detailed description of the options.

After you have done that, you should have a trained bilingual neural network
language model in the output directory.

To run it in moses as a feature function you need the following line:

BilingualNPLM
filepath=/mnt/gna0/nbogoych/new_nplm_german/de-en150nopos/train.10k.model.nplm.10
target_ngrams=4 source_ngrams=9
source_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.source
target_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.targe

The source and target vocab files are located in the working directory used to
prepare the neural network language model.
target_ngrams doesn't include the predicted word (so target_ngrams = 4
means 1 predicted word and 4 target context words).
The total ngram size of the model is therefore target_ngrams + source_ngrams + 1.
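
As a rough consistency check, the three steps have to agree on the context
sizes. The numbers below are only an illustration (they are not the ones from
the example above):

  # extraction: 4 target context words, 2*4 + 1 = 9 source context words
  extract_training.py ... -n 4 -m 4 ...
  # training: total ngram size 4 + 9 + 1 = 14
  train_nplm.py ... -n 14 ...
  # moses.ini feature line then uses: target_ngrams=4 source_ngrams=9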

I will write proper documentation in the following weeks. If you have
any problems running it, please contact me.

Cheers,

Nick




On Wed, Nov 26, 2014 at 11:53 AM, Tom Hoar <
tah...@precisiontranslationtools.com> wrote:

>  Hieu,
>
> Sorry I missed you in Vancouver. I just reviewed your slide deck from the
> MosesCore TAUS Round Table in Vancouver
> (taus-moses-industry-roundtable-2014-changes-in-moses-hieu-hoang-university-of-edinburgh).
>
>
> In particular, I'm interested in the "Bilingual Language Models" that
> "replicate Delvin et al, 2014". A search on statmt.org/moses doesn't show
> any hits searching for "delvin". So, A) is the code finished? If so B) are
> there any instructions how to enable/use this feature? If not, C) what kind
> of help do you need to test the code for release?
>
> --
>
> Best regards,
> Tom Hoar
> Managing Director
> *Precision Translation Tools Co., Ltd.*
> Bangkok, Thailand
> Web: www.precisiontranslationtools.com
> Mobile: +66 87 345-1875
> Skype: tahoar
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] how to test whether tcmalloc is used?

2014-11-26 Thread Barry Haddow
How about:

nm -C moses | grep tcmalloc

On 26/11/14 11:34, Hieu Hoang wrote:
> Best to do what Rico says, but
>
> If the tcmalloc library is dynamically linked to Moses, running ldd 
> will show it is linked into moses:
>   #ldd bin/moses
>   .
>   libtcmalloc_minimal.so.4 => /usr/lib/libtcmalloc_minimal.so.4 
> (0x7ff49f5a2000)
>   ...
> You can force it to statically link by deleting
>rm /usr/lib/libtcmalloc*.a
>
> On 26 November 2014 at 10:50, Rico Sennrich  > wrote:
>
> Li Xiang  writes:
>
> >
> > I compile Moses with tcmalloc. How can I test whether tcmalloc
> is used and
> evaluate the performance ?
> >
>
> there's probably many ways, but here's three:
>
> at compile time, you will see the following message if tcmalloc is
> not enabled:
>
> "Tip: install tcmalloc for faster threading.  See
> BUILD-INSTRUCTIONS.txt for
> more information."
>
> you can also use '--without-tcmalloc' to disable tcmalloc and
> compare speed
> to a binary that is compiled with tcmalloc.
>
> If you use profiling tools (such as 'perf'), you can see which
> malloc is
> being called. 'perf top' shows me this line, among others:
>
>   1.75%  moses_chart  moses[.]
> 
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
> unsigned long, int
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu 
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>
> -- 
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Delvin et al 2014

2014-11-26 Thread Tom Hoar

Hieu,

Sorry I missed you in Vancouver. I just reviewed your slide deck from 
the MosesCore TAUS Round Table in Vancouver 
(taus-moses-industry-roundtable-2014-changes-in-moses-hieu-hoang-university-of-edinburgh). 



In particular, I'm interested in the "Bilingual Language Models" that 
"replicate Delvin et al, 2014". A search on statmt.org/moses doesn't 
show any hits searching for "delvin". So, A) is the code finished? If so 
B) are there any instructions how to enable/use this feature? If not, C) 
what kind of help do you need to test the code for release?


--

Best regards,
Tom Hoar
Managing Director
*Precision Translation Tools Co., Ltd.*
Bangkok, Thailand
Web: www.precisiontranslationtools.com 


Mobile: +66 87 345-1875
Skype: tahoar
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] how to test whether tcmalloc is used?

2014-11-26 Thread Hieu Hoang
Best to do what Rico says, but

If the tcmalloc library is dynamically linked to Moses, running ldd will
show it is linked into moses:
  #ldd bin/moses
  .
  libtcmalloc_minimal.so.4 => /usr/lib/libtcmalloc_minimal.so.4
(0x7ff49f5a2000)
  ...
You can force it to statically link by deleting
   rm /usr/lib/libtcmalloc*.a

On 26 November 2014 at 10:50, Rico Sennrich  wrote:

> Li Xiang  writes:
>
> >
> > I compile Moses with tcmalloc. How can I test whether tcmalloc is used
> and
> evaluate the performance ?
> >
>
> there's probably many ways, but here's three:
>
> at compile time, you will see the following message if tcmalloc is not
> enabled:
>
> "Tip: install tcmalloc for faster threading.  See BUILD-INSTRUCTIONS.txt
> for
> more information."
>
> you can also use '--without-tcmalloc' to disable tcmalloc and compare speed
> to a binary that is compiled with tcmalloc.
>
> If you use profiling tools (such as 'perf'), you can see which malloc is
> being called. 'perf top' shows me this line, among others:
>
>   1.75%  moses_chart  moses[.]
>
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
> unsigned long, int
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>



-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Format of binarized phrase tables

2014-11-26 Thread Raj Dabre
Hello Marcin,

Yes please.
It would save me lots of time.
Thanks.

Regards.

On Wed, Nov 26, 2014 at 6:50 PM, Marcin Junczys-Dowmunt 
wrote:

>  Hi,
>
> I have a JNI interface to my compact phrase table somewhere, I guess I can
> put that in contrib within a day or two if there is interest.
>
> best,
>
> Marcin
>
> W dniu 2014-11-26 10:45, Barry Haddow napisał(a):
>
> Hi Raj
>
> The format of these tables is not described anywhere. You'd have to read
> the code in moses/TranslationModel/PhraseDictionaryTree.cpp, and then
> try to convert it to Java.
>
> A better plan would be to use JNI to call the C++ code -- a similar
> approach has been followed in the python interface in contrib/python.
> This would insulate you from the low-level details, and from changes in
> the format,
>
> cheers - Barry
>
> On 26/11/14 03:22, Raj Dabre wrote:
>
> Hello All, I know that Moses allows for binarization of a phrase table
> which can be read on demand at decoding time. We get 5 files named:
> phrase-table.binphr.* I want to write my own routine in Java to read phrase
> pairs from these on demand. Can anyone guide me ? PS: If an explanation of
> the same for binary reordering tables can be done then it would be great
> too. Thanks in advance. -- Raj Dabre. Research Student, Graduate School
> of Informatics, Kyoto University. CSE MTech, IITB., 2011-2014
> ___ Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>


-- 
Raj Dabre.
Research Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] how to test whether tcmalloc is used?

2014-11-26 Thread Rico Sennrich
Li Xiang  writes:

> 
> I compile Moses with tcmalloc. How can I test whether tcmalloc is used and
evaluate the performance ?
> 

there's probably many ways, but here's three:

at compile time, you will see the following message if tcmalloc is not enabled:

"Tip: install tcmalloc for faster threading.  See BUILD-INSTRUCTIONS.txt for
more information."

you can also use '--without-tcmalloc' to disable tcmalloc and compare speed
to a binary that is compiled with tcmalloc.
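
for example, a rough timing comparison could look like this (moses.ini and
test.input are placeholders for your own model configuration and test set):

  ./bjam --without-tcmalloc -j8
  time ./bin/moses -f moses.ini < test.input > /dev/null
  ./bjam -j8    # rebuild with tcmalloc enabled (the default when it is installed)
  time ./bin/moses -f moses.ini < test.input > /dev/null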

If you use profiling tools (such as 'perf'), you can see which malloc is
being called. 'perf top' shows me this line, among others:

  1.75%  moses_chart  moses[.]
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
unsigned long, int

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Format of binarized phrase tables

2014-11-26 Thread Marcin Junczys-Dowmunt
 

Hi, 

I have a JNI interface to my compact phrase table somewhere, I guess I
can put that in contrib within a day or two if there is interest. 

best, 

Marcin 

W dniu 2014-11-26 10:45, Barry Haddow napisał(a): 

> Hi Raj
> 
> The format of these tables is not described anywhere. You'd have to read 
> the code in moses/TranslationModel/PhraseDictionaryTree.cpp, and then 
> try to convert it to Java.
> 
> A better plan would be to use JNI to call the C++ code -- a similar 
> approach has been followed in the python interface in contrib/python. 
> This would insulate you from the low-level details, and from changes in 
> the format,
> 
> cheers - Barry
> 
> On 26/11/14 03:22, Raj Dabre wrote:
> 
>> Hello All, I know that Moses allows for binarization of a phrase table which 
>> can be read on demand at decoding time. We get 5 files named: 
>> phrase-table.binphr.* I want to write my own routine in Java to read phrase 
>> pairs from these on demand. Can anyone guide me ? PS: If an explanation of 
>> the same for binary reordering tables can be done then it would be great 
>> too. Thanks in advance. -- Raj Dabre. Research Student, Graduate School of 
>> Informatics, Kyoto University. CSE MTech, IITB., 2011-2014 
>> ___ Moses-support mailing list 
>> Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support 
>> [1]

 

Links:
--
[1] http://mailman.mit.edu/mailman/listinfo/moses-support
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Format of binarized phrase tables

2014-11-26 Thread Barry Haddow
Hi Raj

The format of these tables is not described anywhere. You'd have to read 
the code in moses/TranslationModel/PhraseDictionaryTree.cpp, and then 
try to convert it to Java.

A better plan would be to use JNI to call the C++ code -- a similar 
approach has been followed in the python interface in contrib/python. 
This would insulate you from the low-level details, and from changes in 
the format,

cheers - Barry

On 26/11/14 03:22, Raj Dabre wrote:
> Hello All,
>
> I know that Moses allows for binarization of a phrase table which can 
> be read on demand at decoding time.
> We get 5 files named: phrase-table.binphr.*
> I want to write my own routine in Java to read phrase pairs from these 
> on demand.
> Can anyone guide me ?
>
> PS: If an explanation of the same for binary reordering tables can be 
> done then it would be great too.
>
> Thanks in advance.
>
> -- 
> Raj Dabre.
> Research Student,
> Graduate School of Informatics,
> Kyoto University.
> CSE MTech, IITB., 2011-2014
>
>
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support