Re: [Moses-support] Question about discontinuous orientation types

2012-08-14 Thread Daniel Schaut
Hi,

 

I found the answer in Galley and Manning's paper. I had missed some
important parts of their paper which answer my questions. Sorry for
spamming.

 

Daniel

 

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Daniel Schaut
Sent: 03 August 2012 16:50
To: moses-support@mit.edu
Subject: [Moses-support] Question about discontinuous orientation types

 

Hi all,

What are the differences between discontinuous, discontinuous-right and
discontinuous-left orientation in lexicalized reordering models? I'm a bit
lost after hours of skimming through papers. Discontinuous orientation occurs
if neither an alignment point to the top left nor to the top right exists in
the alignment matrix - neither monotone nor swap. That's clear, but when are
discontinuous right and left detected?
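(For later readers: the convention used by Galley and Manning, and by Moses' mslr-style reordering models, can be sketched as below. The exact left/right naming is my assumption from the literature; check the paper or the Moses extractor code before relying on it.)

```python
def orientation(prev_src_span, cur_src_span):
    """Classify the orientation of the current phrase relative to the
    previously translated source span (mslr-style classes).

    Spans are (start, end) source-word indices, inclusive. This is a
    sketch of one common convention, not Moses' exact implementation.
    """
    p_start, p_end = prev_src_span
    c_start, c_end = cur_src_span
    if p_end + 1 == c_start:   # previous phrase ends immediately to our left
        return "monotone"
    if c_end + 1 == p_start:   # previous phrase starts immediately to our right
        return "swap"
    if p_end < c_start:        # gap to the left: we jumped forward (rightwards)
        return "discontinuous-right"
    return "discontinuous-left"  # otherwise we jumped backward (leftwards)
```

The two discontinuous classes thus split the "neither monotone nor swap" case by the direction of the jump.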

Thanks,

Daniel

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] How to use queryLexicalTable

2012-08-09 Thread Daniel Schaut
Ah, ok thanks. Confusing name though. What is queryLexicalTable used for
instead then?

 

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Hieu Hoang
Sent: 09 August 2012 13:01
To: moses-support@mit.edu
Subject: Re: [Moses-support] How to use queryLexicalTable

 

there's no tool to do this, as far as I know.

You can adapt the queryPhraseTable program to do it. The source code for the
binary lexical table was taken from the phrase table.

On 07/08/2012 12:05, Daniel Schaut wrote:

Hi all,

I'd like to look up some entries in my reordering models. Does anyone know
how to use queryLexicalTable for that? Calling

./queryLexicalTable -table ~/path/to/reordering model -f foreign phrase -e
English phrase -c context of phrase

does not work for me. Any ideas?
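(Editor's note: entries in a plain-text, non-binarized reordering table can also be looked up directly. A minimal sketch, assuming the usual `source ||| target ||| scores` text format; the binarized table cannot be read this way.)

```python
import gzip

def lookup_reordering(table_path, src, tgt):
    """Scan a *text* lexicalized reordering table (possibly gzipped) for a
    phrase pair and return its orientation scores as floats.

    Sketch only: assumes lines of the form `src ||| tgt ||| p1 p2 ...`;
    a binarized table needs the Moses tools instead.
    """
    opener = gzip.open if table_path.endswith(".gz") else open
    with opener(table_path, "rt", encoding="utf-8") as f:
        for line in f:
            fields = [x.strip() for x in line.split("|||")]
            if len(fields) >= 3 and fields[0] == src and fields[1] == tgt:
                return [float(s) for s in fields[2].split()]
    return None  # phrase pair not found
```

For large tables this linear scan is slow, but it is enough for spot-checking a few entries.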

Regards,

Daniel








Re: [Moses-support] How to use queryLexicalTable

2012-08-09 Thread Daniel Schaut
Don't bother. I've just seen that it was coded by Konrad Rawlik from IPAB.
I'm gonna send him a mail. Might be more convenient for you.

 

From: Hieu Hoang [mailto:fishandfrol...@gmail.com]
Sent: 09 August 2012 16:11
To: Daniel Schaut
Cc: moses-support@mit.edu
Subject: Re: AW: [Moses-support] How to use queryLexicalTable

 

oh sorry, forget what i said in the last email. I didn't know it was there.

I don't know how it works but just poke through the code.

On 09/08/2012 14:58, Daniel Schaut wrote:

Ah, ok thanks. Confusing name though. What is queryLexicalTable used for
instead then?

 

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Hieu Hoang
Sent: 09 August 2012 13:01
To: moses-support@mit.edu
Subject: Re: [Moses-support] How to use queryLexicalTable

 

there's no tool to do this, as far as I know.

You can adapt the queryPhraseTable program to do it. The source code for the
binary lexical table was taken from the phrase table.

On 07/08/2012 12:05, Daniel Schaut wrote:

Hi all,

I'd like to look up some entries in my reordering models. Does anyone know
how to use queryLexicalTable for that? Calling

./queryLexicalTable -table ~/path/to/reordering model -f foreign phrase -e
English phrase -c context of phrase

does not work for me. Any ideas?

Regards,

Daniel









[Moses-support] How to use queryLexicalTable

2012-08-07 Thread Daniel Schaut
Hi all,

I'd like to look up some entries in my reordering models. Does anyone know
how to use queryLexicalTable for that? Calling

./queryLexicalTable -table ~/path/to/reordering model -f foreign phrase -e
English phrase -c context of phrase

does not work for me. Any ideas?

Regards,
Daniel


[Moses-support] Question about discontinuous orientation types

2012-08-03 Thread Daniel Schaut
Hi all,

What are the differences between discontinuous, discontinuous-right and
discontinuous-left orientation in lexicalized reordering models? I'm a bit
lost after hours of skimming through papers. Discontinuous orientation occurs
if neither an alignment point to the top left nor to the top right exists in
the alignment matrix - neither monotone nor swap. That's clear, but when are
discontinuous right and left detected?

Thanks,
Daniel



Re: [Moses-support] Ems: interpolating LM using IrstLM

2012-08-02 Thread Daniel Schaut
Hi Mauro,

 

IRSTLM provides a special tool for that. Here you can find more information
about how to interpolate LMs using IRSTLM

http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=LM_interpolation
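(For intuition, the interpolation itself is just a weighted sum of the component models' probabilities; the IRSTLM tools estimate the weights and write out a merged LM. A toy sketch of the math only:)

```python
def interpolate(word_probs, weights):
    """Linearly interpolate one word's probability across several LMs:
    p(w) = sum_i lambda_i * p_i(w), with the lambdas summing to 1.

    Sketch of the math only, not of the IRSTLM implementation, which
    estimates the lambdas on held-out data via EM.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(lam * p for lam, p in zip(weights, word_probs))
```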

 

Daniel

 

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Philipp Koehn
Sent: 02 August 2012 00:35
To: Mauro Zanotti
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Ems: interpolating LM using IrstLM

 

Hi,

yes, the current implementation relies on SRILM.
But maybe someone from IRST can explain how
to interpolate their models.

-phi

On Wed, Aug 1, 2012 at 3:37 PM, Mauro Zanotti mau.zano...@gmail.com wrote:

Dear all,

 

I trained 2 LMs in the EMS module; how can I interpolate them using IRSTLM
instead of SRILM? Does interpolate-lm.perl work only with SRILM?

 

Thank you in advance

Mauro




Re: [Moses-support] No moses executable

2012-08-02 Thread Daniel Schaut
Hi Patrick,

I had the same problem about six months ago:
http://www.mail-archive.com/moses-support@mit.edu/msg05119.html

Unfortunately I can't remember how I fixed it, but the thread points to the
problem. Could you please provide your bjam command?

Daniel

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Patrick Bessler
Sent: 02 August 2012 18:30
To: moses-support@mit.edu
Subject: [Moses-support] No moses executable

Hi there,

I am working on an Ubuntu 12.04 32-bit machine. I have installed
GIZA++, IRSTLM and SRILM. Then I cloned the mosesdecoder from GitHub.
That went very well. I executed the bjam script and pointed it to GIZA, IRSTLM
and SRILM.
bjam finished, but I didn't get any dist or bin folder; there was also no
moses or moses_chart executable.

What additional information can I provide so that you might be able to help
me?

cheers,
Patrick


Re: [Moses-support] Placeholder drift

2012-07-31 Thread Daniel Schaut
Hi,

This placeholder salad may occur, albeit very rarely, if there are
placeholders in your training and tuning sets as well as in the language
model. Some time ago I experienced almost the same issue, though only with
the chart decoder. You can try playing around with the -dl option. Also, you
can try m4loc, as already suggested by Tomáš, if the data is in TMX or XLIFF
format. Then your test set may look like this:

{1}processor{2}

If there are no placeholders in your sets, unknown words may cause some
strange reordering, although they are copied verbatim (see
http://www.mail-archive.com/moses-support@mit.edu/msg02717.html).

What kind of reordering model are you using?

Daniel

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of John D Burger
Sent: 31 July 2012 16:09
To: Henry Hu
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Placeholder drift

Are there any such placeholders in your language modeling data and your
parallel training data?  If not, all the models are going to treat them as
unknown words.  In the case of the language model, it doesn't surprise me
too much that the placeholders all get pushed together, as that will produce
fewer discontiguous subsequences, which the language model will prefer.

- John Burger
  MITRE  

On Jul 31, 2012, at 03:05 , Henry Hu wrote:

 Hi,
 
 I use a model to translate English to French. First, I replaced HTML 
 tags such as a, b, with the placeholder {}, like this:
 
 {}Processor{}
 
 Then decoding. To my confusion, I got the result:
 
 {}{} processeur
 
 instead of {}processeur{}. Why did the placeholder move? How can I 
 make it fixed? Thanks for any suggestion.
 
 Henry


Re: [Moses-support] Placeholder drift

2012-07-31 Thread Daniel Schaut
Tom, that's a good point. Henry, you can also check your phrase table with
queryPhraseTable to track back the entry that may cause the issue.

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Tom Hoar
Sent: 31 July 2012 16:58
To: moses-support@mit.edu
Subject: Re: [Moses-support] Placeholder drift

John, this is true if there were three tokens, but {}Processor{} has no
spaces. Assuming that the target language should be {}processeur{} without
spaces in both the parallel and LM data, the tables and the language model
will treat it as one token and not break it up.

Henry, I suspect your corpus preparation inserts spaces to create
{} Processor {} (3 tokens). John's description is much more viable if this
is the case.

One oddity is the output {}{} token, because it's one token, not two.
Moses won't remove the space to splice the two. It would seem your target
data contains this as a token from somewhere in the tables or LM.

I suggest you double-check your tokenization and other preparation to
ensure source and target are still one token when you start training.
 Tom


 On Tue, 31 Jul 2012 10:08:43 -0400, John D Burger j...@mitre.org
 wrote:
 Are there any such placeholders in your language modeling data and 
 your parallel training data?  If not, all the models are going to 
 treat them as unknown words.  In the case of the language model, it 
 doesn't surprise me too much that the placeholders all get pushed 
 together, as that will produce fewer discontiguous subsequences, which 
 the language model will prefer.

 - John Burger
   MITRE

 On Jul 31, 2012, at 03:05 , Henry Hu wrote:

 Hi,

 I use a model to translate English to French. First, I replaced HTML 
 tags such as a, b, with the placeholder {}, like this:

 {}Processor{}

 Then decoding. To my confusion, I got the result:

 {}{} processeur

 instead of {}processeur{}. Why did the placeholder move? How can I 
 make it fixed? Thanks for any suggestion.

 Henry


Re: [Moses-support] Placeholder drift

2012-07-31 Thread Daniel Schaut
Well, Henry may clarify whether it is intended to be a single token or not.
But I agree that it wouldn't make much sense to translate a
placeholder-text-placeholder sequence represented as one single token (or at
least I can't imagine why), while for other sequences such as dates or
currencies it would make sense.

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of John D Burger
Sent: 31 July 2012 17:25
To: Moses-support
Subject: Re: [Moses-support] Placeholder drift

I'm a little confused.  If the intent is for the
placeholder-text-placeholder sequence to be interpreted as a single token,
why would it be translated at all?  Isn't it likely to be seen as an unknown
word, as Daniel suggests (unless of course that exact same sequence occurs
in both the parallel and language modeling data)?

Sorry if I'm coming in late, and everybody already understands this.

- John Burger
  MITRE

On Jul 31, 2012, at 11:20 , Daniel Schaut wrote:

 Hi,
 
 This placeholder salad may occur very rarely, if there are 
 placeholders in your training and tuning sets as well as in the 
 language model. Some time ago I experienced almost the same issue, 
 however occurring only with the chart decoder. You can try playing 
 around with the -dl option. Also, you can try m4loc as already 
 suggested by Tomáš if the data is in TMX or XLIFF format. Then your 
 test set may look like this
 
 {1}processor{2}
 
 If there are no placeholders in your sets unknown words may cause some 
 strange reordering, although they are copied verbatim (see 
 http://www.mail-archive.com/moses-support@mit.edu/msg02717.html).
 
 What kind of reordering model are you using?
 
 Daniel
 
 -----Original Message-----
 From: moses-support-boun...@mit.edu 
 [mailto:moses-support-boun...@mit.edu] On behalf of John D Burger
 Sent: 31 July 2012 16:09
 To: Henry Hu
 Cc: moses-support@mit.edu
 Subject: Re: [Moses-support] Placeholder drift
 
 Are there any such placeholders in your language modeling data and 
 your parallel training data?  If not, all the models are going to 
 treat them as unknown words.  In the case of the language model, it 
 doesn't surprise me too much that the placeholders all get pushed 
 together, as that will produce fewer discontiguous subsequences, which the
language model will prefer.
 
 - John Burger
  MITRE
 
 On Jul 31, 2012, at 03:05 , Henry Hu wrote:
 
 Hi,
 
 I use a model to translate English to French. First, I replaced HTML 
 tags such as a, b, with the placeholder {}, like this:
 
 {}Processor{}
 
 Then decoding. To my confusion, I got the result:
 
 {}{} processeur
 
 instead of {}processeur{}. Why did the placeholder move? How can I 
 make it fixed? Thanks for any suggestion.
 
 Henry


Re: [Moses-support] Placeholders missed

2012-07-02 Thread Daniel Schaut
Hi Henry,

You can also try to exclude the placeholders from the tokenization process,
so that your example would look like this:

buy {70} and enjoy unlimited Trainings sessions

This worked pretty well for me. It means, however, that you might need to
train new models.
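(A toy sketch of the idea, assuming placeholders of the form {N}. A real pipeline would rather use the Moses tokenizer's protected-patterns mechanism, if your version has it; the regexes here are illustrative only.)

```python
import re

PLACEHOLDER = re.compile(r"\{\d+\}")

def tokenize_protect(text):
    """Tokenize while keeping {N} placeholders as single, untouched tokens.

    Toy whitespace/punctuation tokenizer: text between placeholders is
    split into word and punctuation tokens, placeholders pass through
    intact so training and decoding always see them as one token.
    """
    tokens = []
    pos = 0
    for m in PLACEHOLDER.finditer(text):
        tokens.extend(re.findall(r"\w+|[^\w\s]", text[pos:m.start()]))
        tokens.append(m.group())  # the placeholder survives as-is
        pos = m.end()
    tokens.extend(re.findall(r"\w+|[^\w\s]", text[pos:]))
    return tokens
```

Running the same protection over training, tuning and test data keeps the placeholder vocabulary consistent across all models.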

Best,
Daniel

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Henry Hu
Sent: 02 July 2012 11:40
To: moses-support@mit.edu
Subject: [Moses-support] Placeholders missed

Hi guys,

I'm attempting to translate English to French. First I replaced some tags
with placeholders {70}. Next, decoding. Finally, restoring tags.
Most placeholders {70} remained the same during decoding, like
this:

English: buy { 70 } and enjoy unlimited Trainings sessions .
French:  acheter { 70 } et amusez-vous illimitée formations sessions .

However, some placeholders are incomplete, like this (missing {):

English: acheter { 70 } et amusez-vous illimitée formations sessions .
French:  illimitée des réunions , chaque avec jusqu' à 70 } les participants

I guess I should use other placeholders. But what placeholders can be
options? Thanks for any suggestion.

Best regards,
Henry



Re: [Moses-support] Problem occurred when build language model

2012-06-28 Thread Daniel Schaut
Hi,

 

Did you set the IRSTLM variable to /path/where/to/install and PATH to the
directory /path/where/to/install/bin?

Have you already tried using tlm for LM training and building instead?

 

Deleting the temp folder sometimes helps, too. You might also ask 
https://list.fbk.eu/sympa/info/user-irstlm for more help.

 

Regards,

Daniel

 

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Randil Pushpananda
Sent: 28 June 2012 18:57
To: moses-support@mit.edu
Subject: [Moses-support] Problem occurred when build language model

 

Hi,

When I try to build the language model, I get the following error. It says
permission denied. I tried to do the same as root; the result is the same.
Could you please tell me what the reason for this is?


/home/randil/smt/irstlm/bin/build-lm.sh -t /tmp -i 
work/lm/news-commentary.lowercased.en -o work/lm/news-commentary.en1.lm
Cleaning temporary directory /tmp

Extracting dictionary from training corpus
Splitting dictionary into 3 lists
Extracting n-gram statistics for each word list
Important: dictionary must be ordered according to order of appearance of words 
in data
used to generate n-gram blocks,  so that sub language model blocks results 
ordered too
dict.000
dict.001
dict.002
Estimating language models for each word list
dict.000
Collecting 1-gram counts
sh: /bin: Permission denied
dict.001
Collecting 1-gram counts
sh: /bin: Permission denied
dict.002
Collecting 1-gram counts
sh: /bin: Permission denied
Merging language models into work/lm/news-commentary.en1.lm
Cleaning temporary directory /tmp
Removing temporary directory /tmp

Thanks

Best Regards,
Randil



Re: [Moses-support] How to remove untranslated words

2012-05-23 Thread Daniel Schaut
Hi,

 

try running the decoder with the -du flag; the decoder will then drop unknown
words.

 

Daniel

 

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Abdollah Hakim
Sent: 23 May 2012 20:31
To: moses-support@mit.edu
Subject: [Moses-support] How to remove untranslated words

 

Hi all,

 

Sorry for my simple question. I built an Arabic-English system based on
Moses, and when trying to translate new sentences I see that Moses leaves
some words and phrases untranslated. I want it to remove untranslated words
from the output string. How can I tell Moses to remove such words and
phrases during decoding?



Re: [Moses-support] A simple question about the phrase Table

2012-05-18 Thread Daniel Schaut
Hi,

 

For instance, have a look at

http://au.answers.yahoo.com/question/index?qid=20090318042359AAeQNkm

 

This might answer your question.

 

Daniel

 

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Info Ic
Sent: 18 May 2012 16:23
To: Moses Support
Subject: [Moses-support] A simple question about the phrase Table

 

Hi everyone , 

 

some lines in my phrase table contain values like e-07 and e-05; what do
they mean?
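(For later readers: these are ordinary probabilities written in scientific notation, e.g. 2.8206e-05 means 2.8206 x 10^-5. A quick illustration:)

```python
# e-notation in a phrase table is just a compact float representation:
# 2.8206e-05 == 2.8206 * 10**-5 == 0.000028206
p = float("2.8206e-05")
assert p == 0.000028206
# such tiny values simply mean the phrase pair is very improbable
```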



Re: [Moses-support] tuning set

2012-05-14 Thread Daniel Schaut
Hi,

 

you might want to have a look at the glossary

http://www.statmt.org/moses/glossary/SMT_glossary.html#tuning%20process

or, for more detailed information,

http://www.statmt.org/moses/?n=FactoredTraining.Tuning

 

Best,

Daniel

 

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of tharaka weheragoda
Sent: 14 May 2012 20:07
To: moses-support@mit.edu
Subject: [Moses-support] tuning set

 

Hi,
i'm new to this field and i'm confused about the use of the tuning set.
What's actually the purpose of using a tuning set here?

Thanks in advance



Re: [Moses-support] A Question About Phrase Table Format

2012-05-14 Thread Daniel Schaut
Hi,

 

I'll try to answer some of your questions.

 

1.

Regarding scores, you might want to try
http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases which explains
how the scores are made up.

The alignment is explained here
http://www.statmt.org/moses/?n=FactoredTraining.AlignWords or see the
background section http://www.statmt.org/moses/?n=Moses.Background for more
information.

You can also try to search the user archives:

http://www.mail-archive.com/moses-support@mit.edu/info.html

 

3. That's ok. The alignment information is probably missing because you may
have forgotten to include it during training. You might want to train a
model that includes such information for better evaluation. A good starting
point on that can be found here:

http://www.mail-archive.com/moses-support@mit.edu/msg03656.html

 

Best,

Daniel

 

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Info Ic
Sent: 14 May 2012 14:21
To: Moses Support
Subject: [Moses-support] A Question About Phrase Table Format

 

Hello everyone , 
1- I would like to ask you about the phrase table and all these values. I
tried to google it and I found this:

phrase table line --- source ||| target ||| scores ||| alignment ||| counts

but I don't understand what scores, alignment and counts mean, and what
the difference between these values is.

2- If I want to know the probability assigned to a couple of words p(T|S),
should I look for it in the phrase table generated in the training phase or
in the one modified by MERT, i.e. the filtered one (since MERT is supposed
to adjust the scores)?

3- While reading my phrase table I noticed that the values of the |||
alignment ||| field are missing; is that OK?
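(For later readers, a minimal sketch of splitting such a line into its fields. The five-field layout assumed here is the common `source ||| target ||| scores ||| alignment ||| counts` format; the alignment field may be empty if it was not produced at training time.)

```python
def parse_phrase_table_line(line):
    """Split one phrase-table line into its |||-separated fields.

    Sketch of the common five-field layout; trailing fields are optional,
    so missing alignment/counts come back as empty strings.
    """
    fields = [f.strip() for f in line.split("|||")]
    entry = {
        "source": fields[0],
        "target": fields[1],
        "scores": [float(s) for s in fields[2].split()],  # translation probs etc.
    }
    entry["alignment"] = fields[3] if len(fields) > 3 else ""
    entry["counts"] = fields[4] if len(fields) > 4 else ""
    return entry
```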



Re: [Moses-support] How to query the rule table of a tree-based model

2012-05-09 Thread Daniel Schaut
Well, I wanted to use queryPhraseTable to look up entries in both tables
(rule and phrase table) for the evaluation of selected phrases.

 

Your best bet is to rewrite queryPhraseTable for the tree-based model. I
think it would be easy and I can help you.

We can try, but be warned: my programming skills equal 0.

 

Let me know.

Daniel

 

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Hieu Hoang
Sent: 08 May 2012 23:15
To: moses-support@mit.edu
Subject: Re: [Moses-support] How to query the rule table of a tree-based
model

 

queryPhraseTable won't work with the tree-based on-disk rule table.

the implementations are similar, but not the same. Since it's all about bits
& bytes on disk & memory, it's very difficult to make them compatible. Your
best bet is to rewrite queryPhraseTable for the tree-based model. I think it
would be easy and I can help you.

i'm also curious why there is a need for it. are you trying to reverse a
binary file?

On 08/05/2012 19:02, Daniel Schaut wrote: 

Hi all,

Quick question: How do you guys query a rule table of a tree-based model?
queryPhraseTable seems not to work on my side here.

Thanks and best,

Daniel








[Moses-support] How to query the rule table of a tree-based model

2012-05-08 Thread Daniel Schaut
Hi all,

Quick question: How do you guys query a rule table of a tree-based model?
queryPhraseTable doesn't seem to work for me here.

Thanks and best,
Daniel



Re: [Moses-support] NIST scoring tool

2012-04-29 Thread Daniel Schaut
Hi Yared,

 

- am I right to use the tokenized cased data?

Yes.

 

- and I can't get a NIST scoring tool. Is there a way to download
mteval-v11b.pl without the file transfer protocol? ftp:// is blocked in my
working area.

Have a look at the generic folder of the released scripts. There you’ll find
your answer.

 

Best,

Daniel

 

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
behalf of Yared Mekuria
Sent: 29 April 2012 16:22
To: moses-support
Subject: [Moses-support] NIST scoring tool

 

Hi Daniel,

thank you for your replay,

I made the cased data the tokenized English corpus file
news-commentary.tok.en, since the lowercased data was
news-commentary.lowercased.en, and it works as you say.

 

- am I right to use the tokenized cased data?

 

- and I can't get a NIST scoring tool. is there a way to download
mteval-v11b.pl with out file transfer protocol. ftp:\\ is blocked in my
working area?

 

any suggestion, pls help.

Thank you.

Yared.



Re: [Moses-support] To ask for steps for Evaluation of MT system when IRSTLM used.

2012-04-27 Thread Daniel Schaut
Hi Yared,

It seems that you used the -n-gram-count switch, which only works with SRI
LMs. Thanks to Jehan Pages, you can use -lm=IRSTLM and
-build-lm=/path/to/build-lm.sh to train a recasing model using IRSTLM. A
prerequisite for this is a proper-cased/mixed-cased IRST LM containing <s>
elements. The -corpus switch should point to your cased data. John Burger
gives a nice general overview of the recasing process:
http://www.mail-archive.com/moses-support@mit.edu/msg00696.html

Of course, you might want to evaluate only lowercased data - that's up to
your approach. In that case there is no need to train a recasing model.

Hope this helps.

Best,
Daniel

-----Original Message-----
From: Yared Mekuria [mailto:yared.m...@gmail.com]
Sent: 27 April 2012 07:52
To: danielsh...@hotmail.com
Subject: To ask for steps for Evaluation of MT system when IRSTLM used.

Hello Daniel,
I am on the evaluation part of the MT system, and I don't understand how
evaluation is performed when the IRSTLM language model is used.
I used the following command to train the recaser:

/home/admin1/mose/moses-scripts/scripts-20120409-0748/recaser/train-recaser.
perl
-train-script
/home/admin1/mose/moses-scripts/scripts-20120409-0748/training/train-model.p
erl
-ngram-count mose/bin/irstlm/bin/build-lm.sh -corpus
worked/corpus/news-commentary.tok.en -dir /home/admin1/worked/recaser
-scripts-root-dir
/home/admin1/mose/moses-scripts/scripts-20120409-0748

and then I got this error:

ERROR: Language model file not found or empty:
/home/admin1/worked/recaser/cased.irstlm.gz at
/home/admin1/mose/moses-scripts/scripts-20120409-0748/training/train-model.p
erl
line 324.

I don't have cased data; is it necessary to use cased data for evaluation
when IRSTLM is used?

Please suggest me on it.
Yared.
Best regards.




Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE

2012-04-27 Thread Daniel Schaut
Hi guys,

 

Thank you for your comprehensive comments.

 

The most likely thing is that you have some of your test set included in your 
training set,

 

Indeed, there are some similarities owing to the domain (instruction
manuals). Typically for all kinds of manuals you will find a high degree of
similarity, e.g. at sub-segment level. I extracted test set A and the tuning
sets from the whole corpus before training my engine to make sure that test
set A doesn't interfere with the training set. Hmmm… that's an epic fail
then… Test set B was provided at a much later stage, when the training
process was already done.

 

Did you try looking at the sentences ? -- 1,000 is few enough to eyeball them. 
Have you tried the same system with a different corpus ? (e.g.

EuroParl). Have you checked that your test set and your training set do not 
intersect ?

 

Apart from scoring, I checked almost every sentence in both test sets for my
thesis. The quality of the outputs is on a moderate level for sentences up to
50 words; everything beyond is of lesser quality. In particular, sentences up
to 20 words are on a good level.

I've just prepared a third and a fourth test set, from the OpenOffice corpus
files and from another bunch of in-domain files. For the OO files (2,000
sentences), BLEU is 0.0858 and METEOR is 0.3031. Kind of disappointing…
The fourth test set of 2,000 sentences reveals similar scores to the other
in-domain test sets.

Very short sentences will give you high scores. 

This might truly be another issue boosting the scores. On average, almost
half of the sentences in test sets A and B are quite short.

 

To conclude, could one say that I've created an engine suitable for a
specific domain, whose performance outside that domain is almost zero?
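(For anyone testing the test-set/training-set overlap hypothesis discussed in this thread: a minimal exact-match check. Near-duplicates, which are common in manuals, would additionally need fuzzy matching.)

```python
def overlap_report(train_sents, test_sents):
    """Count test sentences that also occur verbatim in the training data -
    a common cause of inflated BLEU. Exact string match only; this is a
    sketch, not a substitute for proper held-out data preparation.
    """
    train_set = set(s.strip() for s in train_sents)
    hits = sum(1 for s in test_sents if s.strip() in train_set)
    return hits, len(test_sents)
```

A high hit count would explain BLEU scores well above the usual range for the language pair.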

 

Best,

Daniel

 

From: miles...@gmail.com [mailto:miles...@gmail.com] On behalf of Miles Osborne
Sent: 26 April 2012 21:17
To: John D Burger
Cc: Daniel Schaut; moses-support@mit.edu
Subject: Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE

 

Very short sentences will give you high scores. 

Also multiple references will boost them

Miles

On Apr 26, 2012 8:13 PM, John D Burger j...@mitre.org wrote:

I =think= I recall that pairwise BLEU scores for human translators are usually 
around 0.50, so anything much better than that is indeed suspect.

- JB

On Apr 26, 2012, at 14:18 , Daniel Schaut wrote:

 Hi all,


 I’m running some experiments for my thesis and I’ve been told by a more 
 experienced user that the achieved scores for BLEU/METEOR of my MT engine 
 were too good to be true. Since this is the very first MT engine I’ve ever 
 made and I am not experienced with interpreting scores, I really don’t know 
 how to reflect them. The first test set achieves a BLEU score of 0.6508 
 (v13). METEOR’s final score is 0.7055 (v1.3, exact, stem, paraphrase). A 
 second test set indicated a slightly lower BLEU score of 0.6267 and a METEOR 
 score of 0.6748.


 Here are some basic facts about my system:

 Decoding direction: EN-DE

 Training corpus: 1.8 mil sentences

 Tuning runs: 5

 Test sets: a) 2,000 sentences, b) 1,000 sentences (both in-domain)

 LM type: trigram

 TM type: unfactored


 I’m now trying to figure out if these scores are realistic at all, as 
 different papers indicate by far lower BLEU scores, e.g. Koehn and Hoang 
 2011. Any comments regarding the mentioned decoding direction and related 
 scores will be much appreciated.


 Best,

 Daniel



[Moses-support] Higher BLEU/METEOR score than usual for EN-DE

2012-04-26 Thread Daniel Schaut
Hi all,

I'm running some experiments for my thesis and I've been told by a more
experienced user that the achieved scores for BLEU/METEOR of my MT engine
were too good to be true. Since this is the very first MT engine I've ever
made and I am not experienced with interpreting scores, I really don't know
how to assess them. The first test set achieves a BLEU score of 0.6508
(v13). METEOR's final score is 0.7055 (v1.3, exact, stem, paraphrase). A
second test set indicated a slightly lower BLEU score of 0.6267 and a METEOR
score of 0.6748.

Here are some basic facts about my system:
Decoding direction: EN-DE
Training corpus: 1.8 mil sentences
Tuning runs: 5
Test sets: a) 2,000 sentences, b) 1,000 sentences (both in-domain)
LM type: trigram
TM type: unfactored

I'm now trying to figure out if these scores are realistic at all, as
different papers indicate by far lower BLEU scores, e.g. Koehn and Hoang
2011. Any comments regarding the mentioned decoding direction and related
scores will be much appreciated.

Best,
Daniel
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] features in reordering model

2012-04-23 Thread Daniel Schaut
Hi Cyrine,

 

The answer to your question can be found here:
http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases

 

Best,

Daniel

 

Von: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] Im
Auftrag von Cyrine NASRI
Gesendet: Sonntag, 22. April 2012 22:59
An: moses-support@mit.edu
Betreff: [Moses-support] features in reordering model

 

Hello all, 
I have a question concerning the reordering model.
In the model I have this string:
@ ries bibliothèques ||| @ ries ||| 0.60 0.20 0.20 0.20
0.20 0.60 

Can you explain what these numbers refer to and how Moses calculates them?

Thank you 
Bests

-- 

Cyrine NASRI
Ph.D. Student in Computer Science

 

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] features in reordering model

2012-04-23 Thread Daniel Schaut
Hi Cyrine,

 

Sorry for providing the wrong link. If I’m correct, the 6 features of the
reordering model should be described here:

http://www.statmt.org/moses/?n=FactoredTraining.BuildReorderingModel
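For the example line above, here is a small sketch (my own illustration, not a Moses utility) of how the six scores are usually read, assuming the common msd-bidirectional-fe configuration: three orientation probabilities (monotone, swap, discontinuous) for the backward direction, followed by three for the forward direction:

```python
def parse_reordering_line(line):
    """Split one line of a Moses lexicalized reordering table.
    Assumes an msd-bidirectional-fe model, i.e. six scores:
    p(mono), p(swap), p(disc) for the backward direction, then forward."""
    src, tgt, scores = (field.strip() for field in line.split("|||"))
    m_b, s_b, d_b, m_f, s_f, d_f = map(float, scores.split())
    return {"src": src, "tgt": tgt,
            "backward": {"mono": m_b, "swap": s_b, "disc": d_b},
            "forward":  {"mono": m_f, "swap": s_f, "disc": d_f}}
```

Each triple is a smoothed relative frequency estimated from the word-aligned training data, so the three probabilities of one direction sum to 1.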

 

Best,

Daniel

 

Von: Cyrine NASRI [mailto:cyrine.na...@gmail.com] 
Gesendet: Montag, 23. April 2012 09:33
An: Daniel Schaut
Cc: moses-support@mit.edu
Betreff: Re: [Moses-support] features in reordering model

 

Hi Daniel, 
My question is about the reordering model, but the link you gave me is about
the phrase model.

thanks
Cyrine

Le 23 avril 2012 08:24, Daniel Schaut danielsh...@hotmail.com a écrit :

Hi Cyrine,

 

The answer to your question can be found here:
http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases

 

Best,

Daniel

 

Von: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] Im
Auftrag von Cyrine NASRI
Gesendet: Sonntag, 22. April 2012 22:59
An: moses-support@mit.edu
Betreff: [Moses-support] features in reordering model

 

Hello all, 
I have a question concerning the reordering model.
In the model I have this string:
@ ries bibliothèques ||| @ ries ||| 0.60 0.20 0.20 0.20
0.20 0.60 

Can you explain what these numbers refer to and how Moses calculates them?

Thank you 
Bests


 

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Today's Topics: The Parsing Algorithm of ParseCYKPlus and ParseScope3

2012-04-16 Thread Daniel Schaut
Hi,

 

Please read the following post
http://www.mail-archive.com/moses-support@mit.edu/msg03135.html about CYK+
parsing. Regarding your second question, please see
http://www.statmt.org/moses/?n=FactoredTraining.BuildReorderingModel for
more information on lexical reordering models.

 

Hope this answers your questions.

 

Best,

Daniel

 

Von: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] Im
Auftrag von kehai chen
Gesendet: Montag, 16. April 2012 07:34
An: moses-support@mit.edu
Betreff: [Moses-support] Today's Topics: The Parsing Algorithm of
ParseCYKPlus and ParseScope3

 

Hi:

   I loaded the newest source code of Moses from GitHub, then I discovered a new
enum named ParsingAlgorithm in the file TypeDef.h:

   enum ParsingAlgorithm {
     ParseCYKPlus = 0,
     ParseScope3 = 1
   };

 

 Could you tell me some information about this enum? What's more, I don't
understand the members Fe and F in another enum, LexReorderType, in the
file TypeDef.h:

 namespace LexReorderType
 {
   enum LexReorderType { // explain values
     Backward
     ,Forward
     ,Bidirectional
     ,Fe
     ,F
   };
 }

I look forward to your reply. Thanks.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Moses documentation

2012-04-10 Thread Daniel Schaut
If I might edge myself into this interesting conversation...

Sourceforge comes with an opt-in mediawiki app, e.g. Marcello and Nicola
make use of it for IRSTLM (which is nicely done btw)
http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=Main_Page
But since Moses moved to Git, this would be more confusing than an option.

I found a nice blog on the github site about git-backed wikis:
https://github.com/blog/699-making-github-more-open-git-backed-wikis
As far as I skimmed the text, each wiki is backed by a Git repository,
so you're able to push and pull it like anything else. Each wiki respects
the same permissions as the source repository. In other words: each page
is a file in a directory and each change is a commit. They
support eight formats with context sensitive help and a toolbar; reference
images are hosted inside the Git repository. Furthermore, you're able to see
diffs of changes for the wiki.

There's also a Ruby library for implementing such a wiki: Gollum provides a
Ruby API for accessing and modifying the content, and also includes a small
Sinatra web server.
https://github.com/github/gollum
A demo gollum wiki can be cloned here:
https://github.com/mojombo/gollum-demo

I hope this might help.

Best,
Daniel


-Ursprüngliche Nachricht-
Von: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] Im
Auftrag von Hieu Hoang
Gesendet: Dienstag, 10. April 2012 18:24
An: moses-support@mit.edu
Betreff: Re: [Moses-support] Moses documentation

i think it's only easy to do the easy things in the present wiki.

It's impossible to add a picture, or an equation, or to add a new section to
the sidebar, without ssh access to the edinburgh server. And err root
access...

and it's impossible to add user-based access or to be notified when the
wiki's being changed. This kinda means we can never let newer people edit
the wiki, which is a shame since the docs are mostly for them and they
should have the ability to edit it too. Ideally, i think it should be a
cross between a manual and a stackoverflow forum.

mediawiki might be another idea



On 10/04/2012 22:07, Barry Haddow wrote:
 Hi Folks

 Thanks for all your suggestions!

 I'm not convinced about putting the documentation into github. At the 
 moment the documentation is in a wiki, which is good because it's 
 really easy to edit, the results of an edit are immediate, and you end 
 up with a linked set of html documents. The main issue that I see is 
 that there is only one password, so there's no way for people to get 
 credit for their edits or create areas to upload their own stuff.

 If we move to github, with the primary documentation written in Latex, 
 then it seems to make it harder to contribute. Not everyone knows 
 Latex, it's harder to link across documents with Latex, and you have 
 to wait at least until you check it in before you see how it affects 
 the website. Wikis should make collaborative editing easier, in a way 
 that a document checked into source control doesn't.

 Also, if we go down the github/latex (or github/docbook or whatever) 
 route, then there's a bit of hacking to convert the existing 
 documentation to editable latex, and rig up commit hooks in github. (I 
 know we generate latex from the existing documentation, but the 
 generated latex is probably not suitable for human  editing). I 
 suppose if we think github/latex is a good route then these problems could
be overcome.

 Another option would be to switch to a different wiki option (e.g. 
 mediawiki) which allows user accounts and comments on pages. That 
 would mean that people could add their own pages, getting credit for 
 their edits. It also has pdf book export built-in. There would still be
the format conversion pain...

 cheers - Barry


 On Tuesday 10 April 2012 14:42:11 Hieu Hoang wrote:
 I think putting it as a special branch of github is a good idea.
Anything where other people can add their own stuff to the docs is cool.

 another thing we might want is to be able to let people comment on a 
 particular section. eg. suggested changes/queries. It might also move 
 some of the newbie questions away from the mailing list

 there's just the small matter of cutting and pasting everything from 
 the current docs...

 On 10/04/2012 20:01, Lane Schwartz wrote:
 Barry,

 What about making a special branch in the git repo for documentation?

 That way anyone with access to the git repo could easily add to the 
 documentation as needed.

 The nightly build could just check out that branch and compile it 
 from whatever format you want people to edit it in (presumably latex 
 or possibly docbook) into pdf (and possibly also html).

 Cheers,
 Lane


 On Tue, Apr 10, 2012 at 8:51 AM, Barry Haddow bhad...@inf.ed.ac.uk wrote:

  Hi Folks

  I'm going to be spending some time over the next couple of weeks
  improving the
  Moses documentation (http://www.statmt.org/moses/), with 

Re: [Moses-support] MT training on a laptop

2012-02-22 Thread Daniel Schaut
Hi Hieu,

My latest tests on a netbook (dual core 1.6 Ghz, 2GB Ram, 320 GB 5400 rpm):

- test sets had a size of 1000 to 2000 sentences and the complete parallel
corpus was around 1.8 mil words (~950 sentences each)
- I performed several training steps for chart and pb-decoding using an
unfactored model and a tree-based one
- phrase, rule and reordering tables were binarized and/or filtered
- training pipeline using the chart-decoder took up almost twice as much
time compared to the pb decoder ranging from 5 to 12hrs per run-through
- step 2 and 3 took up the most time (grow-diag-final-and)
- during the alignment GIZA stopped from time to time (probably due to the
IO waits you mentioned)
- moses git rev cb5213a, GIZA++, IRSTLM 5.70.04, Ubuntu 10.4

As Tom already mentioned, tests on desktops run much smoother.

Hope this might help, too.
-Daniel

-Ursprüngliche Nachricht-
Von: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] Im
Auftrag von Tom Hoar
Gesendet: Mittwoch, 22. Februar 2012 15:48
An: Hieu Hoang
Cc: moses-support
Betreff: Re: [Moses-support] MT training on a laptop

 I don't like to admit it, but I run some tests on a Samsung netbook with a
dual-core Atom 1.6 GHz, 2 GB RAM and 300 GB 5400 rpm hard disk. I typically
only use a small ~40K pair test corpus. I only use MGIZA++, and snt2cooc
slows to slower than a crawl on one core.

 We have several desktops we draft into action sometimes: 4 GB w/ 3 GHz
Pentium dual-cores. They run much smoother and faster than the 2 GB
netbook.

 We're running SVN rev 4153 from mid-August last year on Ubuntu 10.04. 
 Plan to update our binaries to the GITHUB version 2-3 months after the
Ubuntu 12.04 LTS is launched in the spring.

 Hope this helps.
 Tom


 On Wed, 22 Feb 2012 12:54:36 +, Hieu Hoang hieuho...@gmail.com
 wrote:
 hi all

 does anyone have experience running the training pipeline on a laptop?
 It seem very slow to me, especially some parts of the GIZA++ alignment 
 (and possibly later stages too). Seems to be crawling due to IO waits 
 on a GIZA process called snt2cooc.out. This doesn't happens when 
 running on larger servers. Has anyone else encounter this problem?

 I'm using a MacBook 2.4Ghz dual core, OSX 10.7.3, 240GB disk (5400 
 spin), 4GB ram.


 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Segmentation fault in tuning with chart decoder

2012-01-23 Thread Daniel Schaut
Hi Rasul,

I experienced exactly the same issue a few weeks ago: my tuning set
contained a mismatched number of lines, e.g. the target side included 2000
lines and the source side 2003 lines. Subsequently, I removed the extra lines
on the source side and the issue was gone. If I remember correctly, I also
filtered the tuning set using --filtercmd. Hope this helps!
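One way to catch this before launching tuning is a quick line-count check on the two sides of the dev set. A minimal sketch (the paths are placeholders):

```python
def check_parallel(src_path, tgt_path):
    """Raise if a parallel tuning set has mismatched line counts,
    which can make mert-moses.pl fail with a segmentation fault."""
    with open(src_path, encoding="utf-8") as f:
        n_src = sum(1 for _ in f)
    with open(tgt_path, encoding="utf-8") as f:
        n_tgt = sum(1 for _ in f)
    if n_src != n_tgt:
        raise ValueError(f"line-count mismatch: {n_src} vs {n_tgt}")
    return n_src
```

Running it on e.g. tuning.en / tuning.de before calling mert-moses.pl surfaces the mismatch immediately instead of deep inside the decoder.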

Best,
Daniel

-Ursprüngliche Nachricht-
Von: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] Im
Auftrag von ra...@rszk.net
Gesendet: Montag, 23. Januar 2012 00:57
An: moses-support@mit.edu
Betreff: [Moses-support] Segmentation fault in tuning with chart decoder

Hi all,

I have trained a hierarchical model and am trying to tune it using MERT.
I'm getting a segmentation fault error in the early stages.
Following are the log and the command I'm using. Your ideas are much
appreciated.

Best Wishes,
Rasul.

--- Log
Executing: /tools/moses/moses-chart-cmd/src/moses_chart -v 0 -config
filtered/moses.ini  -inputtype 0 -show-weights  ./features.list In
LanguageModelIRST::Load: nGramOrder = 5 Language Model Type of
/en-fr/lm/irstlmse.5grams.lm.fr is 1 \data\
loadtxt_ram()
1-grams: reading 176493 entries
done level1
2-grams: reading 1332577 entries
done level2
3-grams: reading 1402000 entries
done level3
4-grams: reading 1836276 entries
done level4
5-grams: reading 1830829 entries
done level5
done
OOV code is 176492
OOV code is 176492
sh: line 1:  7057 Segmentation fault   
/tools/moses/moses-chart-cmd/src/moses_chart -v 0 -config filtered/moses.ini
-inputtype 0 -show-weights  ./features.list Exit code: 139 Failed to run
moses with the config filtered/moses.ini at
/tools/moses/scripts/training/mert-moses.pl line 1072.

--- Command nohup nice
$SCRIPTS_ROOTDIR/training/mert-moses.pl
$EXP_ENFR/common/corpus/dev.tok.lower.en
$EXP_ENFR/corpus/dev.tok.lower.fr
$MOSES_ROOTDIR/moses-chart-cmd/src/moses_chart
$EXP_ENFR/model/moses.ini
--working-dir $EXP_ENFR/tuning/mert
--mertdir $MOSES_ROOTDIR/mert
--rootdir $SCRIPTS_ROOTDIR
--decoder-flags -v 0
 $EXP_ENFR/tuning/mert.log

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Language Support for recase.perl

2012-01-06 Thread Daniel Schaut
Hi,

Ahhh, ok. Now I see... I was a bit confused because I changed line 9 in
recase.perl from "en" to "de". Consequently, the script told me that there
are no rules for the language "de".

Thank you very much!
Daniel


-Ursprüngliche Nachricht-
Von: phko...@gmail.com [mailto:phko...@gmail.com] Im Auftrag von Philipp
Koehn
Gesendet: Freitag, 6. Januar 2012 05:58
An: Daniel Schaut
Cc: Moses-support@mit.edu
Betreff: Re: [Moses-support] Language Support for recase.perl

Hi,

the language-specific stuff in recase.perl is only for English headlines,
which have an odd capitalization style. This can be completely ignored for
other languages.

-phi

On Thu, Jan 5, 2012 at 9:12 AM, Daniel Schaut danielsh...@hotmail.com
wrote:
 Hi all,

 First, happy new year to all of you! :)

 Second, I've got a question regarding the languages supported by 
 recase.perl and regarding workarounds for my current problem.

 After three weeks of long-term tuning my EN-DE Moses system, I'd like 
 to recase my lowercased German output for evaluation purposes with
METEOR/TERp.
 Unfortunately, I've noticed today that recase.perl supports English
solely.
 So, how do I get the output recased? I don't what to start the whole 
 preparation process (corpus data preparations, LM and TM training, 
 tuning) from scratch using truecasing. Are there any workarounds?

 Or, if I installed SRILM and trained a truecasing model instead, would 
 truecase.perl be able to recase the lowercased German output 
 accordingly, although the output is lowercased and not truecased?

 Help is very much appreciated! :)

 Regards,
 Daniel


 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Language Support for recase.perl

2012-01-05 Thread Daniel Schaut
Hi all,

First, happy new year to all of you! :)

Second, I've got a question regarding the languages supported by recase.perl
and regarding workarounds for my current problem.

After three weeks of long-term tuning my EN-DE Moses system, I'd like to
recase my lowercased German output for evaluation purposes with METEOR/TERp.
Unfortunately, I've noticed today that recase.perl supports English solely.
So, how do I get the output recased? I don't want to start the whole
preparation process (corpus data preparations, LM and TM training, tuning)
from scratch using truecasing. Are there any workarounds?

Or, if I installed SRILM and trained a truecasing model instead, would
truecase.perl be able to recase the lowercased German output accordingly,
although the output is lowercased and not truecased?

Help is very much appreciated! :)

Regards,
Daniel


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] mert-moses-multi.pl: Failed to run mert at ./mert-moses-multi.pl line 1374.

2011-12-20 Thread Daniel Schaut
Hi Barry, hi Christophe,

Thanks for your answers. Please find attached mert.out and mert.log.

Why do you think it's running out of memory?
I assume my system ran out of memory, because when it failed to run mert,
memory usage was at 100% for quite a while. Don't know what happened
exactly. I'll try to perform some other runs and keep you updated.

Regards,
Daniel

-Ursprüngliche Nachricht-
Von: Christophe Servan [mailto:christophe.ser...@gmail.com] 
Gesendet: Montag, 19. Dezember 2011 20:33
An: moses-support@mit.edu; Daniel Schaut
Cc: Barry Haddow
Betreff: Re: [Moses-support] mert-moses-multi.pl: Failed to run mert at
./mert-moses-multi.pl line 1374.

Hi Daniel,
As Barry said, I made this variation of the mert-moses.pl in order to tune
with multiple metrics together.
The tuning is made with a linear weighting of metrics, for example:
(1xBLEU + 2xTER)/3
The setting is made with the switch --sc-config=BLEU:1,TER:2 (for my
previous example).
If you don't use this switch, you will tune only with BLEU (the default
metric for tuning).
As Barry proposed, would you like to post the mert.out and mert.log you
generated ?

Best regards,

Christophe


Le 19/12/2011 15:44, Barry Haddow a écrit :
 Hi Daniel

 Why do you think it's running out of memory? Could you post mert.out 
 and mert.log ?

 Christophe Servan  is the person who knows most about this script,

 cheers - Barry

 On Sunday 18 Dec 2011 18:49:46 Daniel Schaut wrote:
 mert.out 2  mert.log

 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support



run3.mert.out
Description: Binary data


run3.mert.log
Description: Binary data
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Tuning of hierarchical models: main::create_extractor_script() called too early

2011-12-20 Thread Daniel Schaut
Hi Patrick,

thanks for your help.

In the meantime, I found this debugging suggestion:
http://www.mail-archive.com/moses-support@mit.edu/msg05041.html

Might be worth implementing it. If it's just a warning and not affecting the
process itself, it's not much of an issue. ;)

Thanks again,
Daniel

-Ursprüngliche Nachricht-
Von: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] Im
Auftrag von Patrik Lambert
Gesendet: Dienstag, 20. Dezember 2011 16:25
An: moses-support@mit.edu
Betreff: Re: [Moses-support] Tuning of hierarchical models:
main::create_extractor_script() called too early

Hi Daniel,

sorry for the late answer, I actually found your post because I had the same
error:

main::create_extractor_script() called too early to check prototype at
./mert-moses.pl line 666

It is due to the fact that the function create_extractor_script is defined
with parentheses (thus it is called before it is defined, I guess). It
should be

sub create_extractor_script
{

instead of

sub create_extractor_script()
{

However, it is just a warning.

Patrik

 I've got a quick question regarding the tuning of hierarchical 
 phrase-based models. When calling mert, my terminal outputs an error I 
 can't find in the mailing lists:

 main::create_extractor_script() called too early to check prototype at 
 ./mert-moses.pl line 666

 The script didn't stop at that point and finished processing, but only 
 performed two mert runs. The created phrase-table in 
 my/path/to/tuning/mert/filtered resulted in a 0 Mbyte file. The moses 
 ini I passed to mert was configured with KenLM.

 That's my call:

 /mert-moses.pl
 /home/user/moses/chart/tuning/tuning.en
 /home/user/moses/chart/tuning/tuning.de
 /home/user/moses/mosesdecoder/moses-chart-cmd/src/moses_chart
 /home/user/moses/mosesdecoder/model/moses_chart.ini
 -working-dir /home/user/moses/chart/tuning/mert
 -mertdir /home/user/moses/mosesdecoder/mert
 -rootdir /home/user/moses/mosestools/scripts-20111024-1127

 Any guesses how to fix this? Help is very much appreciated.

 Thanks a lot,
 Daniel

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] mert-moses-multi.pl: Failed to run mert at ./mert-moses-multi.pl line 1374.

2011-12-20 Thread Daniel Schaut
.features.dat.BLEU --sctype BLEU
-r /home/dan/smt/phrase/tuning/m4loc/devset-1.tok.lw.de -n
run1.best100.out.gz  extract.out.BLEU 2 extract.err.BLEU
exec: /home/dan/smt/decoder/mert/extractor  --scconfig case:true --scfile
run1.scores.dat.TER --ffile run1.features.dat.TER --sctype TER -r
/home/dan/smt/phrase/tuning/m4loc/devset-1.tok.lw.de -n run1.best100.out.gz
Executing: /home/dan/smt/decoder/mert/extractor  --scconfig case:true
--scfile run1.scores.dat.TER --ffile run1.features.dat.TER --sctype TER -r
/home/dan/smt/phrase/tuning/m4loc/devset-1.tok.lw.de -n run1.best100.out.gz
 extract.out.TER 2 extract.err.TER
Exit code: 1
ERROR: Failed to run '/home/dan/smt/decoder/mert/extractor  --scconfig
case:true --scfile run1.scores.dat.TER --ffile run1.features.dat.TER
--sctype TER -r /home/dan/smt/phrase/tuning/m4loc/devset-1.tok.lw.de -n
run1.best100.out.gz'. at ./mert-moses-multi.pl line 1374.

Regards,
Daniel

-Ursprüngliche Nachricht-
Von: Daniel Schaut [mailto:danielsh...@hotmail.com] 
Gesendet: Dienstag, 20. Dezember 2011 15:05
An: 'Christophe Servan'; 'moses-support@mit.edu'
Cc: 'Barry Haddow'
Betreff: AW: [Moses-support] mert-moses-multi.pl: Failed to run mert at
./mert-moses-multi.pl line 1374.

Hi Barry, hi Christophe,

Thanks for your answers. Please find attached mert.out and mert.log.

Why do you think it's running out of memory?
I assume my system ran out of memory, because when it failed to run mert,
memory usage was at 100% for quite a while. Don't know what happened
exactly. I'll try to perform some other runs and keep you updated.

Regards,
Daniel

-Ursprüngliche Nachricht-
Von: Christophe Servan [mailto:christophe.ser...@gmail.com]
Gesendet: Montag, 19. Dezember 2011 20:33
An: moses-support@mit.edu; Daniel Schaut
Cc: Barry Haddow
Betreff: Re: [Moses-support] mert-moses-multi.pl: Failed to run mert at
./mert-moses-multi.pl line 1374.

Hi Daniel,
As Barry said, I made this variation of the mert-moses.pl in order to tune
with multiple metrics together.
The tuning is made with a linear weighting of metrics, for example:
(1xBLEU + 2xTER)/3
The setting is made with the switch --sc-config=BLEU:1,TER:2 (for my
previous example).
If you don't use this switch, you will tune only with BLEU (the default
metric for tuning).
As Barry proposed, would you like to post the mert.out and mert.log you
generated ?

Best regards,

Christophe


Le 19/12/2011 15:44, Barry Haddow a écrit :
 Hi Daniel

 Why do you think it's running out of memory? Could you post mert.out 
 and mert.log ?

 Christophe Servan  is the person who knows most about this script,

 cheers - Barry

 On Sunday 18 Dec 2011 18:49:46 Daniel Schaut wrote:
 mert.out 2  mert.log

 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support



extract.err.BLEU
Description: Binary data


extract.err.TER
Description: Binary data
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] mert-moses-multi.pl: Failed to run mert at ./mert-moses-multi.pl line 1374.

2011-12-18 Thread Daniel Schaut
Hi all,

I've got three quick questions regarding the behavior of
mert-moses-multi.pl, because my system runs out of memory after some
iterations. Both my phrase and reordering tables are binarized. This is my call:

./mert-moses-multi.pl
/home/user/smt/phrase/tuning/devset-1.tok.lw.en
/home/user/smt/phrase/tuning/devset-1.tok.lw.de
/home/user/smt/decoder/dist/cb5213a/bin/moses
/home/user/smt/phrase/model/moses.ini
--working-dir /home/user/smt/phrase/tuning/mert
--mertdir=/home/user/smt/decoder/mert
--rootdir /home/user/smt/scripts/cb5213a
--threads=2
--decoder-flags -v 0 -threads 2

And, that's the message mert-moses-multi.pl gives me when running out of
memory:

Executing: gzip -f run3.best100.out
Scoring the nbestlist.
exec: /home/user/smt/decoder/mert/extractor  --scconfig case:true --scfile
run3.scores.dat.BLEU --ffile run3.features.dat.BLEU --sctype BLEU -r
/home/user/smt/phrase/tuning/rainbow/devset-1.tok.lw.de -n
run3.best100.out.gz
Executing: /home/user/smt/decoder/mert/extractor  --scconfig case:true
--scfile run3.scores.dat.BLEU --ffile run3.features.dat.BLEU --sctype BLEU
-r /home/user/smt/phrase/tuning/rainbow/devset-1.tok.lw.de -n
run3.best100.out.gz  extract.out.BLEU 2 extract.err.BLEU
Executing: \cp -f init.opt run3.init.opt
exec: /home/user/smt/decoder/mert/mert -d 22   --scconfig case:true --sctype
MERGE  --sctype MERGE  --sctype MERGE  --ffile
run1.features.dat,run2.features.dat,run3.features.dat --scfile
run1.scores.dat,run2.scores.dat,run3.scores.dat --ifile run3.init.opt -n 20
Executing: /home/user/smt/decoder/mert/mert -d 22   --scconfig case:true
--sctype MERGE  --sctype MERGE  --sctype MERGE  --ffile
run1.features.dat,run2.features.dat,run3.features.dat --scfile
run1.scores.dat,run2.scores.dat,run3.scores.dat --ifile run3.init.opt -n 20
 mert.out 2 mert.log
Exit code: 134
ERROR: Failed to run '/home/user/smt/decoder/mert/mert -d 22   --scconfig
case:true --sctype MERGE  --sctype MERGE  --sctype MERGE  --ffile
run1.features.dat,run2.features.dat,run3.features.dat --scfile
run1.scores.dat,run2.scores.dat,run3.scores.dat --ifile run3.init.opt -n
20'. at ./mert-moses-multi.pl line 1374.

When I run a similar command using mert-moses.pl on the same devset,
mert-moses.pl is able to complete the tuning process. This is the command:

./mert-moses.pl
/home/user/smt/phrase/tuning/devset-1.tok.lw.en
/home/user/smt/phrase/tuning/devset-1.tok.lw.de
/home/user/smt/decoder/dist/cb5213a/bin/moses
/home/user/smt/phrase/model/moses.ini
--working-dir /home/user/smt/phrase/tuning/mert
--mertdir=/home/user/smt/decoder/mert
--rootdir /home/user/smt/scripts/cb5213a
--decoder-flags -v 0 -threads 2

So my questions are now:
What am I doing wrong?
Is this behavior of mert-moses-multi.pl normal?
What can I do to prevent mert-moses-multi.pl from stopping the tuning
process?

If needed, I can provide further information. Help is very much appreciated.

Regards,
Daniel

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] alignment point out of range

2011-12-15 Thread Daniel Schaut
Hi all,

By accident, I came across this issue yesterday. It's not about the corpus;
it seems to be a user error related to GIZA++. I wanted to train two
different models with different corpus files. By default, GIZA++ saves all
its files into the training folder of the released scripts when calling
train-model, right? If these folders already exist, GIZA++ skips
preparing the corpus, selecting factors and running mkcls. GIZA++ then tries
to learn the translation tables from the already existing files, even though
you indicated different corpus files. To conclude: before calling
train-model a second time with similar parameters, just move the
appropriate folders GIZA++ created during the first training process.
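Assuming that diagnosis, leftover working folders can be stashed aside before a second run. A defensive sketch; the folder names below are the usual train-model.perl defaults and may differ in your setup:

```python
import shutil
import time
from pathlib import Path

def stash_giza_dirs(working_dir):
    """Move leftover GIZA++ working folders aside so train-model.perl
    re-prepares the corpus instead of silently reusing stale files.
    Folder names are the common defaults; adjust for your language pair."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in ("corpus", "giza.en-de", "giza.de-en"):
        src = Path(working_dir) / name
        if src.exists():
            dst = src.with_name(f"{name}.bak-{stamp}")
            shutil.move(str(src), str(dst))
            moved.append(dst.name)
    return moved
```

Moving rather than deleting keeps the first model's alignments around in case you still need them.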

Regards,
Daniel

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Training of LM and TM containing placeholders

2011-12-15 Thread Daniel Schaut
Hi,

Thanks for the tip. I'll try that. 

Regards,
Daniel


-Ursprüngliche Nachricht-
Von: phko...@gmail.com [mailto:phko...@gmail.com] Im Auftrag von Philipp
Koehn
Gesendet: Montag, 12. Dezember 2011 23:11
An: Daniel Schaut
Cc: moses-support@mit.edu
Betreff: Re: [Moses-support] Training of LM and TM containing placeholders

Hi,

I would suggest to use XML markup to specify translations for the place
holders.

You can find some more information about this here:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc4

-phi

On Sun, Dec 11, 2011 at 6:46 AM, Daniel Schaut danielsh...@hotmail.com
wrote:
 Hi all,

 At the moment I’m experimenting with corpus files that contain
placeholders.
 Since I’m not a very experienced user, I’d like to ask for some 
 advice. Has anyone already experimented with that?

 At first sight, I was thinking of removing all instances of 
 placeholders, but they make up around 10 % of the corpus files. So I’d 
 like to keep them for training, as in a lot of cases they would represent
words, e.g.:

 Original text strings:

 See <ph x="1">{1}</ph> and <ph x="2">{2}</ph>.

 Removed markup:

 See {1} and {2}.

 If I removed the placeholders, the sentence structure would obviously be 
 broken. Broken sentences should be quite problematic, shouldn’t they?

 Other instances of placeholders appear to be meant as inline elements, e.g.

 Select an <ph x="1">{1}</ph>option<ph x="2">{2}</ph> from the context
menu.

 Select an {1}option{2} from the context menu.

 My strategy would be to add these placeholders to the list of 
 non-breaking prefixes in order to have them treated like words. Then 
 setting the right distortion value should do the trick, to keep them 
 in place. Is this a good idea?
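If the placeholders do get removed after all, the inline markup can be reduced to bare {n} tokens with a small regex. A sketch, assuming the tags are XLIFF-style inline elements of the form <ph x="1">{1}</ph>:

```python
import re

# Assumes XLIFF-style inline placeholder tags such as <ph x="1">{1}</ph>;
# only the {n} token is kept so the word-like slot stays in the sentence.
PH_TAG = re.compile(r'<ph\s+x="?\d+"?\s*>(\{\d+\})</ph>')

def strip_ph_markup(text):
    """Reduce inline <ph> markup to its bare {n} placeholder."""
    return PH_TAG.sub(r"\1", text)
```

Keeping the {n} token (rather than deleting the whole element) preserves the sentence structure for alignment and language-model training.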

 Best regards,

 Daniel


 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Scripts problem. Step missing?

2011-12-11 Thread Daniel Schaut
Hi Ana,

 

make sure to include --install-scripts when using bjam.

 

This might be helpful:

http://www.mail-archive.com/moses-support@mit.edu/msg04984.html

 

Best regards,

Daniel

 

Von: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] Im
Auftrag von Ana Sanz
Gesendet: Samstag, 10. Dezember 2011 14:18
An: moses-support@MIT.EDU
Betreff: [Moses-support] Scripts problem. Step missing?

 

Dear all,

I am trying to set up Moses with your new Step-by-step tutorial (some days
ago it was moved to Git :) )

There seems to be a step missing. When I was trying to execute 

tools/moses-scripts/scripts-MMDD-HHMM/training/clean-corpus-n.perl
work/corpus/news-commentary.tok es en work/corpus/news-commentary.clean 1 40


 

(Prepare Data - Filter out long sentences step), I realized that no
scripts-MMDD-HHMM folder is generated. 

With the Moses SVN version (Set script environment variables step), you could
create the moses-scripts folder, modify the Makefile in the scripts folder,
execute make release, and those scripts would be generated.

I could not find a way to get those scripts with the new version. Please
be so kind as to tell me what I should do.


 


Best regards, thank you in advance,

Ana 
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Training of LM and TM containing placeholders

2011-12-11 Thread Daniel Schaut
Hi all,

At the moment I'm experimenting with corpus files that contain placeholders.
Since I'm not a very experienced user, I'd like to ask for some advice. Has
anyone experimented with that already?
At first sight, I was thinking of removing all instances of placeholders,
but they make up around 10 % of the corpus files. So I'd like to keep them
for training, as in a lot of cases they would represent words, e.g.:

Original text strings:
See <ph x="1">{1}</ph> and <ph x="2">{2}</ph>.

Removed markup:
See {1} and {2}.

When I'd remove the placeholders, the sentence structure obviously gets
broken. Broken sentences should be quite problematic, shouldn't they?
Other instances of placeholders appear to be meant as inline elements, e.g.

Select an <ph x="1">{1}</ph>option<ph x="2">{2}</ph> from the context menu.

Select an {1}option{2} from the context menu.

My strategy would be to add these placeholders to the list of non-breaking
prefixes in order to have them treated like words. Then setting the right
distortion value should do the trick, to keep them in place. Is this a good
idea?
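
A minimal sketch of that idea, using sed and tr as a stand-in for the real tokenizer and lowercaser: mask each {N} placeholder as a single pseudo-token, run the case-destroying processing, then restore it.

```shell
# Mask {N} placeholders as atomic tokens, lowercase (standing in for the
# real tokenizer/lowercaser pipeline), then unmask them afterwards.
echo 'See {1} and {2}.' \
  | sed 's/{\([0-9][0-9]*\)}/placeholdertok\1/g' \
  | tr '[:upper:]' '[:lower:]' \
  | sed 's/placeholdertok\([0-9][0-9]*\)/{\1}/g'
# prints: see {1} and {2}.
```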

Best regards,
Daniel

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Tuning of hierarchical models: main::create_extractor_script() called too early

2011-11-24 Thread Daniel Schaut
Hi all,

I've got a quick question regarding the tuning of hierarchical phrase-based
models. When calling mert, my terminal outputs an error I can't find in the
mailing lists: 

main::create_extractor_script() called too early to check prototype at
./mert-moses.pl line 666

The script didn't stop at that point and finished processing, but only
performed two mert runs. The created phrase-table in
my/path/to/tuning/mert/filtered resulted in a 0 Mbyte file. The moses ini I
passed to mert was configured with KenLM.

That's my call:

/mert-moses.pl
/home/user/moses/chart/tuning/tuning.en
/home/user/moses/chart/tuning/tuning.de
/home/user/moses/mosesdecoder/moses-chart-cmd/src/moses_chart
/home/user/moses/mosesdecoder/model/moses_chart.ini
-working-dir /home/user/moses/chart/tuning/mert
-mertdir /home/user/moses/mosesdecoder/mert
-rootdir /home/user/moses/mosestools/scripts-20111024-1127

Any guesses how to fix this? Help is very much appreciated.

Thanks a lot,
Daniel

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Train recasing model using IRSTLM

2011-11-15 Thread Daniel Schaut
Hi Kenneth,

I ran iconv on my raw file and on the iARPA/ARPA files; encoding is ok, it
did not print any errors. build_binary neither echoed any errors.
But finally, I've found the issue causing the script to stop at line 95.

In addition to the suggested changes from
http://www.mail-archive.com/moses-support@mit.edu/msg01934.html,

one needs to change line 13 from
my $TRAIN_SCRIPT = "train-factored-phrase-model.perl";
to
my $TRAIN_SCRIPT = "/my/path/to/train-model.perl";

To conclude, using build_binary or build-lm.sh worked out fine.
However, if one would like to use compile-lm instead of build-lm, passing a
gzipped iARPA file, the train-recaser script still stops at line 64/70 due
to UTF-8 issues. I'll ask the IRSTLM guys.

Thanks for your help! :)
Daniel

-Ursprüngliche Nachricht-
Von: Kenneth Heafield [mailto:mo...@kheafield.com] 
Gesendet: Montag, 14. November 2011 16:05
An: Daniel Schaut
Betreff: Re: AW: [Moses-support] Train recasing model using IRSTLM

You can test if a file is UTF-8 using this command:

iconv -f utf8 -t utf8 file_name > /dev/null

Does this succeed on your corpus, namely the file you're passing with
--corpus? Or does it print an error?
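
As a quick sanity check of the command itself (a self-contained demo, not your actual corpus):

```shell
# iconv exits 0 for valid UTF-8 and non-zero when it hits a bad byte,
# such as the stray 0x8B (octal 213) from your error message.
printf 'valid line\n' > /tmp/utf8_ok.txt
printf 'bad byte: \213\n' > /tmp/utf8_bad.txt
iconv -f utf8 -t utf8 /tmp/utf8_ok.txt > /dev/null && echo 'ok'
iconv -f utf8 -t utf8 /tmp/utf8_bad.txt > /dev/null 2>&1 || echo 'invalid UTF-8'
```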

What's the error message that build_binary gives you? None of the error
messages you gave comes from build_binary.

On 11/14/11 14:40, Daniel Schaut wrote:
 Hi Kenneth,

 Thanks for your reply.

 I'm afraid I checked the iARPA file again, it's UTF8. Furthermore, I 
 deleted the first line of the file and tried it again, but without 
 success, same
 result:
 utf8 "\x8B" does not map to Unicode at ./train-recaser.perl line 64, 
 <CORPUS> line 1.
 Malformed UTF-8 character (fatal) at ./train-recaser.perl line 
 70, <CORPUS> line 1.

 Further, I tried to call build_binary with an ARPA file, but still I 
 get the same error as if I run build-lm.sh
 (4) Training recasing model @ Mon Nov 14 12:49:06 CET 2011
 Can't exec
 /home/user/mosestools/scripts-20111024-1127/training/train-model.perl:
 No such file or directory at ./train-recaser.perl line 95.

 Of course, I cleaned my files beforehand with clean-corpus-n and also 
 looked into train-recaser. Additionally, I changed the switch 
 $TRAIN_SCRIPT from train-factored-phrase-model.perl to
train-model.perl in line 13.
 Line 95 just echoes the error/command (print STDERR '$cmd';). In my 
 folder corpus, I've got files called cased, lowercased and an LM 
 called cased.ilm/arpa depending on the command I use. 
 Train-model.perl remains in /scripts-20111024-1127/training. Even if I 
 move train-model.perl into /scripts-20111024-1127/recaser, the error at line
 95 persists.

 What did I miss? Which line or switch do I have to change, too?

 Best,
 Daniel

 -Ursprüngliche Nachricht-
 Von: moses-support-boun...@mit.edu 
 [mailto:moses-support-boun...@mit.edu] Im Auftrag von Kenneth Heafield
 Gesendet: Samstag, 12. November 2011 18:31
 An: moses-support@mit.edu
 Betreff: Re: [Moses-support] Train recasing model using IRSTLM

 Hi,

   It looks like your training data isn't valid UTF-8. Either convert it 
 to UTF-8 with iconv or scrub the invalid data first.

 Kenneth

 On 11/12/11 15:58, Daniel Schaut wrote:
 Dear all,



 I’m having some difficulties training the recasing model with IRSTLM.
 I changed the train-recaser script according to

 http://www.mail-archive.com/moses-support@mit.edu/msg01934.html

 but this results in an error which I don’t know how to fix.



 Error log:

 -
 -
 -

 (4) Training recasing model @ Sat Nov 12 14:49:06 CET 2011

 /home/user/mosestools/scripts-20111024-1127/training/train-model.perl
 --root-dir /home/user/moses/work/recaser --model-dir 
 /home/user/moses/work/recaser --first-step 4 --alignment a --corpus 
 /home/user/moses/work/recaser/aligned --f lowercased --e cased 
 --max-phrase-length 1 --lm
 0:3:/home/user/moses/work/recaser/cased.irstlm.gz:1 -scripts-root-dir
 /home/user/moses/mosestools/scripts-20111024-1127

 Can't exec
 /home/user/mosestools/scripts-20111024-1127/training/train-model.perl:
 No such file or directory at ./train-recaser.perl line 95.



 (11) Cleaning up @ Sat Nov 12 14:49:06 CET 2011

 -
 -
 -



 Then instead of using build-lm.sh, I gave it another try calling 
 compile-lm directly:

 my $cmd = "/home/user/moses/mosestools/irstlm-5.60.03/bin/compile-lm
 $CORPUS /dev/stdout | gzip -c > $DIR/cased.irstlm.gz";

 where $CORPUS is a gzip iARPA file.



 Error log:

 -
 -
 -

 (3) Preparing data for training recasing model @ Sat Nov 12 15:11:26 
 CET
 2011

 /home/nexoc/moses/work/recaser/aligned.lowercased

 utf8 "\x8B" does not map to Unicode at ./train-recaser.perl line 64, 
 <CORPUS> line 1.

 Malformed UTF-8 character (fatal) at ./train-recaser.perl line 70, 
 <CORPUS> line 1

[Moses-support] User Manual: Error 404

2011-11-15 Thread Daniel Schaut
Hi Moses-Team,

The download of the user manual results in error 404:
Object not found! The requested URL was not found on this server. The
referring link seems to be wrong or outdated. Please contact.

Best,
Daniel
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Moses STM Support.

2011-11-15 Thread Daniel Schaut
Hi Hamza,

 

There’s a general SMT lecture available on the internet. It’s a two-part 
video lecture on phrase-based and factored SMT:

http://videolectures.net/aerfaiss08_koehn_pbfs/

 

A tutorial on how to install Moses using Win7 can be found here:

http://ssli.ee.washington.edu/people/amittai/Moses-on-Win7.pdf

 

For more information on Moses, please refer to the comprehensive Moses web site:

http://www.statmt.org/moses/

 

General publications to read can be found here:

http://www.statmt.org/moses/?n=Moses.Publications

or here:

http://www.statmt.org/

or here:

http://homepages.inf.ed.ac.uk/pkoehn/

 

Best,

Dan

 

Von: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] Im 
Auftrag von Hamza Acikgoz
Gesendet: Dienstag, 15. November 2011 16:57
An: moses-support@mit.edu
Betreff: [Moses-support] Moses STM Support.

 

Hello all,

 

I have never used a PC with Linux installed. I am using a Windows 7 machine and I 
have Cygwin on it. I really would like to get to know Moses SMT. I am 
intending to prepare an English/Kurdish/Turkish translator. I even searched on 
the net for Moses SMT courses, but wasn't able to 
find one. 

 

Please advise.

 

Thanking you.

 

Hamza Açıkgöz

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Train recasing model using IRSTLM

2011-11-12 Thread Daniel Schaut
Dear all,

 

I'm having some difficulties training the recasing model with IRSTLM. I
changed the train-recaser script according to

http://www.mail-archive.com/moses-support@mit.edu/msg01934.html

but this results in an error which I don't know how to fix.

 

Error log:

---

(4) Training recasing model @ Sat Nov 12 14:49:06 CET 2011

/home/user/mosestools/scripts-20111024-1127/training/train-model.perl
--root-dir /home/user/moses/work/recaser --model-dir
/home/user/moses/work/recaser --first-step 4 --alignment a --corpus
/home/user/moses/work/recaser/aligned --f lowercased --e cased
--max-phrase-length 1 --lm
0:3:/home/user/moses/work/recaser/cased.irstlm.gz:1 -scripts-root-dir
/home/user/moses/mosestools/scripts-20111024-1127

Can't exec
/home/user/mosestools/scripts-20111024-1127/training/train-model.perl: No
such file or directory at ./train-recaser.perl line 95.

 

(11) Cleaning up @ Sat Nov 12 14:49:06 CET 2011

---

 

Then instead of using build-lm.sh, I gave it another try calling compile-lm
directly:

my $cmd = "/home/user/moses/mosestools/irstlm-5.60.03/bin/compile-lm $CORPUS
/dev/stdout | gzip -c > $DIR/cased.irstlm.gz";

where $CORPUS is a gzip iARPA file.

 

Error log:

---

(3) Preparing data for training recasing model @ Sat Nov 12 15:11:26 CET
2011

/home/nexoc/moses/work/recaser/aligned.lowercased

utf8 "\x8B" does not map to Unicode at ./train-recaser.perl line 64,
<CORPUS> line 1.

Malformed UTF-8 character (fatal) at ./train-recaser.perl line 70, <CORPUS>
line 1.

---

 

Please see full error logs attached for more information.

 

Could anyone give me a hint on how to train a recasing model with either
build-lm.sh or compile-lm? Help is very much appreciated.

 

Thanks,

Daniel

 

./train-recaser-irstlm.perl 
-train-script 
/home/nexoc/mosestools/scripts-20111024-1127/training/train-model.perl 
-corpus /home/nexoc/moses/work/corpus/cased.ilm.gz 
-dir /home/nexoc/moses/work/recaser 
-scripts-root-dir /home/nexoc/moses/mosestools/scripts-20111024-1127
(2) Train language model on cased data @ Sat Nov 12 15:11:22 CET 2011
/home/nexoc/moses/mosestools/irstlm-5.60.03/bin/compile-lm 
/home/nexoc/moses/work/corpus/cased.ilm.gz /dev/stdout | gzip -c > 
/home/nexoc/moses/work/recaser/cased.irstlm.gz
inpfile: /home/nexoc/moses/work/corpus/cased.ilm.gz
dub: 1000
Language Model Type of /home/nexoc/moses/work/corpus/cased.ilm.gz is 1
Reading /home/nexoc/moses/work/corpus/cased.ilm.gz...
iARPA
loadtxt()
1-grams: reading 22785 entries
2-grams: reading 120301 entries
3-grams: reading 220243 entries
done
OOV code is 22784
OOV code is 22784
creating cache for storing prob, state and statesize of ngrams
Saving in bin format to /dev/stdout
savebin: /dev/stdout
saving 22785 1-grams
saving 120301 2-grams
saving 220243 3-grams
done
deleting cache for storing prob, state and statesize of ngrams

(3) Preparing data for training recasing model @ Sat Nov 12 15:11:26 CET 2011
/home/nexoc/moses/work/recaser/aligned.lowercased
utf8 "\x8B" does not map to Unicode at ./train-recaser-irstlm.perl line 64, 
<CORPUS> line 1.
Malformed UTF-8 character (fatal) at ./train-recaser-irstlm.perl line 70, 
<CORPUS> line 1.

This creates four broken files, aligned.a, aligned.lowercased, aligned.cased and 
aligned.irstlm.gz, in the directory /home/user/moses/work/recaser and a 
cased.ilm.lm file in the ROOT_SCRIPTS directory recaser.



./train-recaser-raw.perl 
-train-script 
/home/nexoc/mosestools/scripts-20111024-1127/training/train-model.perl 
-corpus /home/nexoc/moses/work/corpus/cased 
-dir /home/nexoc/moses/work/recaser 
-scripts-root-dir /home/nexoc/moses/mosestools/scripts-20111024-1127
(2) Train language model on cased data @ Sat Nov 12 14:46:36 CET 2011
/home/nexoc/moses/mosestools/irstlm-5.60.03/bin/build-lm.sh -t /tmp -i 
/home/nexoc/moses/work/corpus/cased -n 3 -o 
/home/nexoc/moses/work/recaser/cased.irstlm.gz
Collecting 1-gram counts
Computing n-gram probabilities:
Collecting 1-gram counts
Computing n-gram probabilities:
Collecting 1-gram counts
Computing n-gram probabilities:
Cleaning temporary directory /tmp
Extracting dictionary from training corpus
Splitting dictionary into 3 lists
Extracting n-gram statistics for each word list
Important: dictionary must be ordered according to order of appearance of words 
in data
used to generate n-gram blocks,  so that sub language model blocks results 
ordered too
dict.000
dict.001
dict.002
Estimating language models for each word list
dict.000
dict.001
dict.002
Merging language models into /home/nexoc/moses/work/recaser/cased.irstlm.gz
Cleaning temporary directory /tmp
Removing temporary directory /tmp

(3) Preparing data for training recasing model @ Sat Nov 12 14:49:05 CET 2011

[Moses-support] Pre- and post-processing of corpus files: Alignment

2011-10-26 Thread Daniel Schaut
Hi all,

I've got two quick questions regarding the data structure of a prepared
parallel corpus before and after an alignment process. I'm a bit confused
about the term alignment and how the data structure should be organized
accordingly to call train-model.perl. I'll give an example of my
pre-processed corpus (without markup, limited char count, sentence-split,
lowercased and tokenized) to illustrate my situation:

http://www.statmt.org/moses/?n=FactoredTraining.PrepareTraining reads
"Training data has to be provided sentence aligned (one sentence per line),
in two files, one for the foreign sentences, one for the English sentences."

followed by an example that looks like example A.

Example A: Data structure of a sentence-split corpus
File src                        File tgt
abc def ghi , jkl mno pqr .     abc def ghi , jkl mno pqr .
dfg fgd dfdf kuki i.            fgfdg fgfg zuz ycvb .
trtrt jjkhkj uzu dhfg jgjgfj .  Fbfgjgj gjhgjg jkhkh hkjl .
.                               .

That's perfectly clear, but as I continued reading, I stumbled over

http://www.statmt.org/moses/?n=FactoredTraining.PrepareData which reads
"The sentence-aligned corpus now looks like this:"

followed by an example that is similar to example B.

Example B: Data structure of a sentence-aligned file

Aligned file
SEN ID 1
23 343 4343 34343 3434 12   
656 65654 3243 565 12   
SEN ID 2
454 5656 89898 5454 12  
435325 5646 878 12  

Furthermore, section Sentence splitter of README downloaded from
www.statmt.org/europarl/v6/tools.tgz reads
"Uses punctuation and Capitalization clues to split paragraphs of 
sentences into files with one sentence per line. For example:

This is a paragraph. It contains several sentences. But why, you ask?

goes to:

This is a paragraph.
It contains several sentences.
But why, you ask?"

To conclude, "... sentence aligned (one sentence per line), in two files, ..."
refers to another concept, namely sentence splitting? So, when speaking
of aligning a corpus at sentence level in order to train a translation
model with train-model.perl, are you referring to sentence splitting (data
structure of example A) or actual alignment at sentence level (example B)?

Thanks a lot,

Daniel

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Support for new users: Software packages

2011-10-25 Thread Daniel Schaut
Hi all,

Since I'm quite a new user to Linux and to Moses, I needed some time to
gather dev tools and software packages to set up the decoder or external
tools.
These are the software packages I installed on my Ubuntu system so far. Some
are mentioned in the manual, others are not. Note that pre-installed
packages may vary from distribution to distribution.

CPP
GCC
G++
TCL
TLCX
TK
BSH
TCSH
CSH
GAWK
AUTOTOOLS (LIBTOOL, AUTOMAKE, AUTOCONF, GNULIB)
GIT
CPAN
PERL
CVS
XML-RPC
WGET
BOOST
OPEN/SUN JDK
LIBTOOLS
BISON
PYTHON
XETEX
GNUPLOT
GV
GHOSTSCRIPT

Please note that some of them are required while others are optional,
depending on the tools you use. This list isn't complete at all, though;
I'll update it from time to time as I progress further. Perhaps
I'll even categorize the packages according to their use at a later stage.
Corrections, amendments and additions are always very welcome.

I hope this list might be helpful for other beginners. :-)

Best,
Dan


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Translating sample model with KenLM: Terminate called after throwing an instance of 'util::ErrnoException'

2011-10-13 Thread Daniel Schaut
Hi Kenneth,

Thanks for your quick reply. I moved the files; Moses translated the sample
model accordingly.

Many thanks and best,
Dan

-Ursprüngliche Nachricht-
Von: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] Im
Auftrag von Kenneth Heafield
Gesendet: Mittwoch, 12. Oktober 2011 19:39
An: moses-support@mit.edu
Betreff: Re: [Moses-support] Translating sample model with KenLM: Terminate
called after throwing an instance of 'util::ErrnoException'

Hi,

Try running Moses from ~/moses/mosesdecoder/sample-models . 
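
The sample moses.ini references lm/europarl.srilm.gz by a relative path, so it is resolved against the directory you run the decoder from, not the directory containing moses.ini. For example (paths as in your transcript):

```shell
cd ~/moses/mosesdecoder/sample-models/phrase-model
~/moses/mosesdecoder/moses-cmd/src/moses -f moses.ini < in > out
```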

Kenneth

On 10/12/11 18:24, Daniel Schaut wrote:
 Hi all,

 I'm a new user to Moses and received the following error message while 
 trying to translate the sample model:

 user@user-desktop:~/moses/mosesdecoder/sample-models/phrase-model$
 /home/user/moses/mosesdecoder/moses-cmd/src/moses -f moses.ini < in > out
 Defined parameters (per moses.ini or switch):
   config: moses.ini 
   input-factors: 0 
   lmodel-file: 8 0 3 lm/europarl.srilm.gz 
   mapping: T 0 
   n-best-list: nbest.txt 100 
   ttable-file: 0 0 0 1 phrase-table 
   ttable-limit: 10 
   weight-d: 1 
   weight-l: 1 
   weight-t: 1 
   weight-w: 0
 Loading lexical distortion models...have 0 models Start loading 
 LanguageModel lm/europarl.srilm.gz : [0.000] seconds terminate called 
 after throwing an instance of 'util::ErrnoException'
   what():  util/file.cc:33 in int util::OpenReadOrThrow(const char*) 
 threw ErrnoException because `-1 == (ret = open(name, O_RDONLY))'.
 No such file or directory while opening lm/europarl.srilm.gz Aborted

 I followed the Step-by-Step Guide on the internet, checked out Moses 
 (Rev 4339), built it accordingly and configured it using KenLM. 
 Furthermore, I've already searched the Moses-Support Archives for this 
 type of error, but I couldn't find an answer to this problem.

 Could you please give a hint on how to solve this issue? If further 
 information about my system is needed, I'll be happy to provide it.

 Many thanks in advance and best,
 Dan


 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Translating sample model with KenLM: Terminate called after throwing an instance of 'util::ErrnoException'

2011-10-12 Thread Daniel Schaut
Hi all,

I'm a new user to Moses and received the following error message while
trying to translate the sample model:

user@user-desktop:~/moses/mosesdecoder/sample-models/phrase-model$
/home/user/moses/mosesdecoder/moses-cmd/src/moses -f moses.ini < in > out
Defined parameters (per moses.ini or switch):
config: moses.ini 
input-factors: 0 
lmodel-file: 8 0 3 lm/europarl.srilm.gz 
mapping: T 0 
n-best-list: nbest.txt 100 
ttable-file: 0 0 0 1 phrase-table 
ttable-limit: 10 
weight-d: 1 
weight-l: 1 
weight-t: 1 
weight-w: 0 
Loading lexical distortion models...have 0 models
Start loading LanguageModel lm/europarl.srilm.gz : [0.000] seconds
terminate called after throwing an instance of 'util::ErrnoException'
  what():  util/file.cc:33 in int util::OpenReadOrThrow(const char*) threw
ErrnoException because `-1 == (ret = open(name, O_RDONLY))'.
No such file or directory while opening lm/europarl.srilm.gz
Aborted

I followed the Step-by-Step Guide on the internet, checked out Moses (Rev
4339), built it accordingly and configured it using KenLM. Furthermore, I've
already searched the Moses-Support Archives for this type of error, but I
couldn't find an answer to this problem.

Could you please give a hint on how to solve this issue? If further
information about my system is needed, I'll be happy to provide it.

Many thanks in advance and best,
Dan


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support