[Moses-support] Segmentation Fault during decoding

2011-03-25 Thread Sudip Datta
Hi,

I am a noob at using Moses and have been trying to build a model and then
use the decoder to translate test sentences. I used the following command
for training:

train-model.perl --root-dir /cygdrive/d/moses/fi-en/fienModel/ --corpus /cygdrive/d/moses/fi-en/temp/clean --f fi --e en --lm 0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1

The process ended cleanly with the following moses.ini file:

# input factors
[input-factors]
0

# mapping steps
[mapping]
0 T 0

# translation tables: table type (hierarchical(0), textual (0), binary (1)), source-factors, target-factors, number of scores, file
# OLD FORMAT is still handled for back-compatibility
# OLD FORMAT translation tables: source-factors, target-factors, number of scores, file
# OLD FORMAT a binary table type (1) is assumed
[ttable-file]
0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz

# no generation models, no generation-file section

# language models: type(srilm/irstlm), factors, order, file
[lmodel-file]
1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz


# limit on how many phrase translations e for each phrase f are loaded
# 0 = all elements loaded
[ttable-limit]
20

# distortion (reordering) weight
[weight-d]
0.6

# language model weights
[weight-l]
0.5000


# translation model weights
[weight-t]
0.2
0.2
0.2
0.2
0.2

# no generation models, no weight-generation section

# word penalty
[weight-w]
-1

[distortion-limit]
6
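For readers new to the format, moses.ini is just bracketed section names followed by value lines, with '#' comment lines. A minimal illustrative parser (an assumed helper for this post, not Moses code) makes the layout explicit:

```python
def parse_moses_ini(text):
    """Collect a moses.ini-style config into {section: [value lines]}."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]          # new section header
            sections[current] = []
        elif current is not None:
            sections[current].append(line)  # value line for current section
    return sections

ini = """
# word penalty
[weight-w]
-1

[distortion-limit]
6
"""
config = parse_moses_ini(ini)
print(config["distortion-limit"])  # ['6']
```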

But the decoding step ends with a segfault, with the following output for -v 3:

Defined parameters (per moses.ini or switch):
config: /cygdrive/d/moses/fi-en/fienModel/model/moses.ini
distortion-limit: 6
input-factors: 0
lmodel-file: 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
mapping: 0 T 0
ttable-file: 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
ttable-limit: 20
verbose: 100
weight-d: 0.6
weight-l: 0.5000
weight-t: 0.2 0.2 0.2 0.2 0.2
weight-w: -1
input type is: text input
Loading lexical distortion models...have 0 models
Start loading LanguageModel /cygdrive/d/moses/fi-en/en.irstlm.gz : [0.000] seconds
In LanguageModelIRST::Load: nGramOrder = 2
Loading LM file (no MAP)
iARPA
loadtxt()
1-grams: reading 3195 entries
2-grams: reading 13313 entries
3-grams: reading 20399 entries
done
OOV code is 3194
OOV code is 3194
IRST: m_unknownId=3194
creating cache for storing prob, state and statesize of ngrams
Finished loading LanguageModels : [1.000] seconds
About to LoadPhraseTables
Start loading PhraseTable /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz : [1.000] seconds
filePath: /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
using standard phrase tables
PhraseDictionaryMemory: input=FactorMask<0>  output=FactorMask<0>
Finished loading phrase tables : [1.000] seconds
IO from STDOUT/STDIN
Created input-output object : [1.000] seconds
The score component vector looks like this:
Distortion
WordPenalty
!UnknownWordPenalty
LM_2gram
PhraseModel_1
PhraseModel_2
PhraseModel_3
PhraseModel_4
PhraseModel_5
Stateless: 1Stateful: 2
The global weight vector looks like this: 0.600 -1.000 1.000 0.500 0.200 0.200 0.200 0.200 0.200
Translating: istuntokauden uudelleenavaaminen

DecodeStep():
outputFactors=FactorMask<0>
conflictFactors=FactorMask<>
newOutputFactors=FactorMask<0>
Translation Option Collection

   Total translation options: 2
Total translation options pruned: 0
translation options spanning from  0 to 0 is 1
translation options spanning from  0 to 1 is 0
translation options spanning from  1 to 1 is 1
translation options generated in total: 2
future cost from 0 to 0 is -100.136
future cost from 0 to 1 is -200.271
future cost from 1 to 1 is -100.136
Collecting options took 0.000 seconds
added hyp to stack, best on stack, now size 1
processing hypothesis from next stack

creating hypothesis 1 from 0 ( ... )
base score 0.000
covering 0-0: istuntokauden
translated as: istuntokauden|UNK|UNK|UNK
score -100.136 + future cost -100.136 = -200.271
unweighted feature scores: <<0.000, -1.000, -100.000, -2.271, 0.000, 0.000, 0.000, 0.000, 0.000>>
added hyp to stack, best on stack, now size 1
Segmentation fault (core dumped)

The only suspicious thing I found in the above is the message 'creating
hypothesis 1 from 0', but I neither know whether it is the actual problem nor
why it is happening. I believe the problem is with the training step, since
the sample models that I downloaded from
http://www.statmt.org/moses/download/sample-models.tgz work fine.
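Incidentally, the per-hypothesis score in the trace can be sanity-checked by hand: it is the dot product of the unweighted feature scores with the global weight vector. A quick sketch using the numbers from the log (this only verifies the arithmetic, it does not diagnose the crash):

```python
# Feature order from the log: Distortion, WordPenalty, !UnknownWordPenalty,
# LM_2gram, PhraseModel_1..5, paired with the printed global weight vector.
weights = [0.600, -1.000, 1.000, 0.500, 0.200, 0.200, 0.200, 0.200, 0.200]
features = [0.000, -1.000, -100.000, -2.271, 0.000, 0.000, 0.000, 0.000, 0.000]

# Weighted hypothesis score = sum of weight * feature over all components.
score = sum(w * f for w, f in zip(weights, features))
print(score)  # approximately -100.1355, i.e. the "score -100.136" in the trace
```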

Prior to this, I constructed an IRST LM and used clean-corpus-n.perl to
clean the decoder input. Looking at the archives, the closest message I
could find was http://thread.gmane.org/gmane.comp.nlp.moses.user/1478, but I
don't think I'm making the same mistake as the author of that message.

I'd be delighted if anybody could provide insights into this problem, or let
me know if any further information is required from me.

[Moses-support] Converting binary LM to arpa

2011-03-25 Thread Mehmet Tatlıcıoğlu
Hi,
I am sorry in advance if this is not the right place to ask this question.
Is there a way to convert the LM in binary format compiled by IRSTLM to ARPA
or iARPA?
Greets.

--
Mehmet Tatlıcıoğlu
Cell Phone: +90 532 201 85 64
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Converting binary LM to arpa

2011-03-25 Thread Mauro Cettolo
By using "compile-lm" from the IRSTLM toolkit again:

compile-lm --text yes 
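For example, with placeholder file names (the model and output names below are assumptions, and IRSTLM's bin directory must be on your PATH):

```shell
# Convert a binary IRSTLM model back to a textual (i)ARPA file.
# "--text yes" switches compile-lm's output from binary to text.
compile-lm --text yes model.blm model.arpa

# Optionally compress it for use as a Moses [lmodel-file] entry.
gzip model.arpa
```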

Mauro



-- 
Mauro Cettolo
FBK - Ricerca Scientifica e Tecnologica
Via Sommarive 18
38123 Povo (Trento), Italy
Phone: (+39) 0461-314551
E-mail: cett...@fbk.eu
URL: http://hlt.fbk.eu/people/cettolo

E cuale esie la me Patrie? cent, centmil, nissune
parcè che par picjâ lis bandieris spes a si picjin i omis



Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists

2011-03-25 Thread Suzy Howlett
I've been thinking about the issue of nondeterminism and am somewhat 
concerned because typically MT results/papers give just a single 
performance figure for each system. As there is an element of 
nondeterministic behaviour, it would seem prudent to run several repeats 
of each system and give mean and standard deviation information instead. 
Of course, this has a practicality trade-off, so an investigation is 
warranted to determine the scale of the problem. Is anyone interested in 
collaborating on a paper or CL squib to address the issue, and bring it 
to the attention of the MT community (and CL community at large)?
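Reporting a mean and sample standard deviation over repeated runs is cheap to compute; a sketch with hypothetical BLEU scores (the numbers are made up for illustration):

```python
import statistics

# Hypothetical BLEU scores from five repeated tune+decode runs of one system.
bleu_runs = [31.2, 30.8, 31.5, 30.9, 31.1]

mean = statistics.mean(bleu_runs)
stdev = statistics.stdev(bleu_runs)  # sample standard deviation (n-1 denominator)
print(f"BLEU {mean:.2f} +/- {stdev:.2f} over {len(bleu_runs)} runs")
```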

Suzy

On 25/03/11 11:58 AM, Tom Hoar wrote:
> We pick the random set from across the entire collection of documents.
> The documents are retrieved as the file system orders them (not
> alphabetically sorted). Your comment, "picked in consecutive order" is
> interesting. I've often wondered if the order could affect a system's
> performance. It's easy enough for me to randomize both the collection
> line order and the test set line order.
>
> The large variance in BLEU would normally be alarming, but this is on a
> very small sample corpus of only 40,000 lines. We use the sample corpus
> to validate the system installs properly. We haven't seen such large
> variations in multi-million pair corpora, but they do range 2-4 BLEU
> points.
>
> Tom
>
>
> -----Original Message-----
> From: Hieu Hoang
> To: moses-support@mit.edu
> Subject: Re: [Moses-support] Nondeterminism during decoding: same
> config, different n-best lists
> Date: Thu, 24 Mar 2011 20:43:49 +
>
> There may be some systematic differences between the randomly chosen
> test sets, e.g. the sentences are from the same documents 'cos they were
> picked in consecutive order from a multi-doc corpus. Otherwise, I'd be
> worried about such a large BLEU variation.
>
>
>
> also, see here on the evils of MERT
> http://www.mail-archive.com/moses-support@mit.edu/msg00216.html
>
>
> On 24/03/2011 16:06, Tom Hoar wrote:
>> We often run multiple trainings on the exact same bitext corpus but
>> pull different random samples for each run. We've observed drastically
>> different BLEU scores between different runs with BLEUs ranging from
>> 30 to 45. This is from exactly the same training data except for the
>> randomly-pulled tuning and evaluation sets. We've assumed this
>> difference is due to both the random differences in the sets, floating
>> point variations between various machines and not using
>> --predictable-seeds.
>>
>> Tom
>>
>>
>>
>> -----Original Message-----
>> From: Hieu Hoang
>> Reply-to: h...@hoang.co.uk
>> To: John Burger
>> Cc: Moses-support
>> Subject: Re: [Moses-support] Nondeterminism during decoding: same
>> config, different n-best lists
>> Date: Thu, 24 Mar 2011 15:51:48 +
>>
>> there are small differences in floating point between OS and gcc
>> versions. One of the regression tests fails because of rounding errors,
>> depending on which machine you run it on. Other than truncating the
>> scores, there's not a lot we can do.
>>
>> The mert perl scripts also dabble in the scores, and that may be
>> another source of divergence
>>
>> On 24 March 2011 15:07, John Burger wrote:
>> Lane Schwartz wrote:
>>
>> > I've examined the n-best lists, and it seems there are at least a
>> > couple of interesting cases. In the simplest case, several
>> > translations of a given sentence produce the exact same score, and
>> > these tied translations appear in different order during different
>>
>> > runs. This is a bit odd, but [not] terribly worrisome. The stranger
>> > case is when there are two different decoding runs, and for a given
>> > sentence, there are translations that appear only in run A, and
>> > different translations that only appear in run B.
>>
>>
>> Both these cases are relevant to something we've occasionally seen,
>> which is non-determinism during =tuning=. This is not surprising
>> given the above, since tuning of course involves decoding. It's hard
>> to reproduce, but we have sometimes seen very different weights coming
>> out of MERT for the exact same system configurations. The problem
>> here is that even very small differences in tuning can result in
>> substantial differences in test results, because of how twitchy
>> BLEU is.
>>
>> Like many folks, we typically run MERT on a cluster. This brings up
>> another source of non-determinism we've theorized about. Some of our
>> clusters are heterogeneous, and we've wondered if there might be minor
>> differences in floating point behavior from machine to machine. 

Re: [Moses-support] Segmentation Fault during decoding

2011-03-25 Thread Barry Haddow
Hi Sudip

If you're using Windows, then you should use the internal LM. See here:
http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
afaik this is still the case.

Also, there are a couple of odd things in your setup. Firstly, you've built a
3-gram LM, but you're telling Moses that it's a 2-gram:
> [lmodel-file]
> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
This shouldn't matter, but just in case you're unaware.

Also, both the words in your input sentence are unknown. Did the phrase table
build OK? Maybe you could use zless or zcat to extract and post the first few
lines of it.

best regards - Barry

On Friday 25 March 2011 08:13, Sudip Datta wrote:

Re: [Moses-support] Segmentation Fault during decoding

2011-03-25 Thread Sudip Datta
Hi Barry,

Thanks a lot for the response.

On Fri, Mar 25, 2011 at 3:31 PM, Barry Haddow wrote:

> Hi Sudip
>
> If you're using windows, then you should use the internal LM. See here:
> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
> afaik this is still the case.
>

I am using Windows only because I've been forced to :(. Shouldn't using Moses
under Cygwin work the same way as on any Linux distro?


>
> Also, there are a couple of odd things in your setup. Firstly, you've built a
> 3-gram LM, but you're telling moses that it's 2-gram:
> > [lmodel-file]
> > 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> This shouldn't matter, but just in case you're unaware.
>
>
All the while I was confused about what 'order' represents in --lm
factor:order:filename:type. Thanks for pointing out that it is the
n-gram order.
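The field layout can be split mechanically; the helper below is purely illustrative (it is not part of the Moses scripts), using the --lm value from the training command above:

```python
def parse_lm_spec(spec):
    """Split a Moses --lm argument of the form factor:order:filename:type.

    Peel factor and order off the front and the type off the back, so a
    filename containing colons would still survive intact.
    """
    factor, order, rest = spec.split(":", 2)
    filename, lm_type = rest.rsplit(":", 1)
    return int(factor), int(order), filename, int(lm_type)

# The spec from the training command in this thread:
spec = "0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1"
factor, order, filename, lm_type = parse_lm_spec(spec)
print(order)  # 3 -- the n-gram order, which should match [lmodel-file]
```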

> Also, both the words in your input sentence are unknown. Did the phrase table
> build OK? Maybe you could use zless or zcat to extract and post the first few
> lines of it,
>
>
The phrase table looks ok. Here are the first few lines:

( CEN ) ei ole pystynyt ||| ( CEN ) has not been ||| 1 0.000542073 1 0.010962 2.718 ||| ||| 1 1
( CEN ) ei ole ||| ( CEN ) has not ||| 1 0.0374029 1 0.010962 2.718 ||| ||| 1 1
( CEN ) ja Yhdistyneiden Kansakuntien talouskomission ||| CEN and within the United Nations Economic ||| 1 0.00255803 1 0.0411325 2.718 ||| ||| 1 1
( CEN ) ja Yhdistyneiden Kansakuntien ||| CEN and within the United Nations ||| 1 0.00639507 1 0.0616988 2.718 ||| ||| 1 1

The missing terms could be because the training data is very small; I was
just trying to get started. I don't think missing terms should make the
process crash, since there can always be missing terms even with large
training data.
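For what it's worth, each phrase-table line splits on '|||' into source phrase, target phrase, and the five scores declared in [ttable-file]. A throwaway parser (an illustration, not Moses code) confirms the lines above are well formed:

```python
def parse_phrase_table_line(line):
    """Split one Moses phrase-table line into (source, target, scores)."""
    fields = [f.strip() for f in line.split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(s) for s in fields[2].split()]  # third field holds the scores
    return source, target, scores

# Second line of the phrase table posted above.
line = ("( CEN ) ei ole ||| ( CEN ) has not ||| "
        "1 0.0374029 1 0.010962 2.718 ||| ||| 1 1")
source, target, scores = parse_phrase_table_line(line)
print(len(scores))  # 5, matching "number of scores" in [ttable-file]
```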


> best regards - Barry
>

Thanks and regards,

--Sudip.


Re: [Moses-support] Segmentation Fault during decoding

2011-03-25 Thread Hieu Hoang
If you've compiled with gcc in Cygwin, you can use any LM. The
stipulation of using only the internal LM only applies if you use
Visual Studio.

However, I would personally use SRILM to start with, as I'm not sure if
the other LMs are fully tested on Cygwin.

Hieu
Sent from my flying horse

On 25 Mar 2011, at 10:06 AM, Barry Haddow wrote:

> Hi Sudip
>
> If you're using windows, then you should use the internal LM. See here:
> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
> afaik this is still the case.
>
> Also, there are a couple of odd things in your setup. Firstly, you've built a
> 3-gram LM, but you're telling moses that it's 2-gram:
>> [lmodel-file]
>> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> This shouldn't matter, but just in case you're unaware.
>
> Also, both the words in your input sentence are unknown. Did the phrase table
> build OK? Maybe you could use zless or zcat to extract and post the first few
> lines of it,
>
> best regards - Barry

Re: [Moses-support] Segmentation Fault during decoding

2011-03-25 Thread Barry Haddow
Hi Sudip

Your phrase table looks fine, assuming that all the 'CEN' tokens are
supposed to be there. Unknown words don't cause Moses to crash, but I
thought they might be symptomatic of some other problem with your phrase
table.

Unfortunately, I think you'll either have to switch to Linux (can you use a
live distro, if you can't install one?) or use the internal LM,

best regards - Barry

On Friday 25 March 2011 10:15, Sudip Datta wrote:

Re: [Moses-support] Segmentation Fault during decoding

2011-03-25 Thread Barry Haddow

> Unfortunately, I think you'll either have to switch to linux (can you use a
> live distro, if you can't install one?) or use the internal LM,

OK, so Hieu has corrected me here. I would still suggest trying with SRILM,
or KenLM if licensing restrictions prevent you from using SRILM. If it still
crashes, could you post a stack trace?

best regards - Barry

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists

2011-03-25 Thread Miles Osborne
This is something that I have been concerned about for a long time
now. And things are actually worse than this, since often only a
single language pair / test set / training set is used. Claims cannot
be made on the basis of such shaky evidence,

Miles

On 25 March 2011 09:42, Suzy Howlett wrote:
> I've been thinking about the issue of nondeterminism and am somewhat
> concerned because typically MT results/papers give just a single
> performance figure for each system. As there is an element of
> nondeterministic behaviour, it would seem prudent to run several repeats
> of each system and give mean and standard deviation information instead.
> Of course, this has a practicality trade-off, so an investigation is
> warranted to determine the scale of the problem. Is anyone interested in
> collaborating on a paper or CL squib to address the issue, and bring it
> to the attention of the MT community (and CL community at large)?
>
> Suzy
>
> On 25/03/11 11:58 AM, Tom Hoar wrote:
>> We pick the random set from across the entire collection of documents.
>> The documents are retrieved as the file system orders them (not
>> alphabetically sorted). Your comment, "picked in consecutive order" is
>> interesting. I've often wondered if the order could affect a system's
>> performance. It's easy enough for me to randomize both the collection
>> line order and the test set line order.
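Randomizing a parallel corpus while keeping the two sides aligned, as described above, can be sketched in a few lines (a toy example; the function name and fixed seed are illustrative, not from the thread):

```python
import random

def shuffle_parallel(src_lines, tgt_lines, seed=42):
    """Shuffle a sentence-aligned corpus, keeping source/target pairs together."""
    assert len(src_lines) == len(tgt_lines), "corpus sides must stay aligned"
    pairs = list(zip(src_lines, tgt_lines))
    random.Random(seed).shuffle(pairs)  # fixed seed -> reproducible order
    src_shuf, tgt_shuf = zip(*pairs)
    return list(src_shuf), list(tgt_shuf)

# toy demonstration: pairs keep matching index suffixes after shuffling
src, tgt = ["s1", "s2", "s3", "s4"], ["t1", "t2", "t3", "t4"]
s, t = shuffle_parallel(src, tgt)
print(all(a[1:] == b[1:] for a, b in zip(s, t)))  # -> True
```

Passing a fixed seed makes the shuffle reproducible, which matters if the split is ever to be rerun for comparison.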
>>
>> The large variance in BLEU would normally be alarming, but this is on a
>> very small sample corpus of only 40,000 lines. We use the sample corpus
>> to validate the system installs properly. We haven't seen such large
>> variations in multi-million pair corpora, but they do range 2-4 BLEU
>> points.
>>
>> Tom
>>
>>
>> -Original Message-
>> *From*: Hieu Hoang
>> *To*: moses-support@mit.edu 
>> *Subject*: Re: [Moses-support] Nondeterminism during decoding: same
>> config, different n-best lists
>> *Date*: Thu, 24 Mar 2011 20:43:49 +
>>
>> There may be some systematic differences between the randomly chosen
>> test sets, eg. the sentences are from the same documents 'cos they were
>> picked in consecutive order from a multi-doc corpus. Otherwise, I'll be
>> worried about such a large BLEU variation.
>>
>>
>>
>> also, see here on the evils of MERT
>> http://www.mail-archive.com/moses-support@mit.edu/msg00216.html
>>
>>
>> On 24/03/2011 16:06, Tom Hoar wrote:
>>> We often run multiple trainings on the exact same bitext corpus but
>>> pull different random samples for each run. We've observed drastically
>>> different BLEU scores between different runs with BLEUs ranging from
>>> 30 to 45. This is from exactly the same training data except for the
>>> randomly-pulled tuning and evaluation sets. We've assumed this
>>> difference is due to both the random differences in the sets, floating
>>> point variations between various machines and not using
>>> --predictable-seeds.
>>>
>>> Tom
>>>
>>>
>>>
>>> -Original Message-
>>> *From*: Hieu Hoang
>>> *Reply-to*: h...@hoang.co.uk 
>>> *To*: John Burger
>>> *Cc*: Moses-support
>>> *Subject*: Re: [Moses-support] Nondeterminism during decoding: same
>>> config, different n-best lists
>>> *Date*: Thu, 24 Mar 2011 15:51:48 +
>>>
>>> there are small differences in floating point between OS and gcc
>>> versions. One of the regression tests fails because of rounding errors,
>>> depending on which machine you run it on. Other than truncating the
>>> scores, there's not a lot we can do.
>>>
>>> The mert perl scripts also dabble in the scores, and that may be
>>> another source of divergence
>>>
>>> On 24 March 2011 15:07, John Burger >> > wrote:
>>>
>>>     Lane Schwartz wrote:
>>>
>>>     > I've examined the n-best lists, and it seems there are at least a
>>>     > couple of interesting cases. In the simplest case, several
>>>     > translations of a given sentence produce the exact same score, and
>>>     > these tied translations appear in different order during different
>>>
>>>     > runs. This is a bit odd, but [not] terribly worrisome. The stranger
>>>     > case is when there are two different decoding runs, and for a given
>>>     > sentence, there are translations that appear only in run A, and
>>>     > different translations that only appear in run B.
>>>
>>>
>>>     Both these cases are relevant to something we've occasionally seen,
>>>     which is non-determinism during =tuning=. This is not surprising
>>>     given the above, since tuning of course involves decoding. It's hard
>>>     to reproduce, but we have sometimes seen very different weights coming
>>>     out of MERT for the exact same system configurations. The problem
>>>     here is that even very small differences in t

Re: [Moses-support] Segmentation Fault during decoding

2011-03-25 Thread Sudip Datta
I've used gcc in cygwin to compile both Moses and IRSTLM. But as you and
Barry pointed out I'll try to use kenlm (can't use srilm due to licensing
restrictions) and if that doesn't work try srilm at my college.

I think the segfault occurs in Hypothesis.cpp at:

m_ffStates[i] = ffs[i]->Evaluate(
*this,
m_prevHypo ? m_prevHypo->m_ffStates[i] : NULL,
&m_scoreBreakdown);

Maybe it gives some clue in identifying the issue.

Thanks again  --Sudip.

On Fri, Mar 25, 2011 at 3:51 PM, Hieu Hoang  wrote:

> If you've compiled with gcc in cygwin, you can use any lm. The
> stipulation of using only the internal LM only applies if you use
> Visual Studio.
>
> However, I would personally use srilm to start with, as I'm not sure if
> the other LMs are fully tested on cygwin
>
> Hieu
> Sent from my flying horse
>
> On 25 Mar 2011, at 10:06 AM, Barry Haddow  wrote:
>
> > Hi Sudip
> >
> > If you're using windows, then you should use the internal LM. See here:
> > http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
> > afaik this is still the case.
> >
> > Also, there are a couple of odd things in your setup. Firstly, you've
> > built a 3-gram LM, but you're telling moses that it's 2-gram:
> >> [lmodel-file]
> >> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> > This shouldn't matter, but just in case you're unaware.
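For reference, the order field is the third number on the [lmodel-file] line, so a 3-gram model would be declared as follows (a sketch of the corrected stanza, same path as above):

```
# language models: type(srilm/irstlm), factors, order, file
[lmodel-file]
1 0 3 /cygdrive/d/moses/fi-en/en.irstlm.gz
```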
> >
> > Also, both the words in your input sentence are unknown. Did the phrase
> > table build OK? Maybe you could use zless or zcat to extract and post the
> > first few lines of it,
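Pulling the first few lines out of the gzipped phrase table, without decompressing it to disk, can also be done with a short Python sketch (the Moses path in the comment is the one from this thread and only exists on the original poster's machine):

```python
import gzip
from itertools import islice

def head_gz(path, n=5):
    """Return up to the first n lines of a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]

# e.g. head_gz("/cygdrive/d/moses/fi-en/fienModel/model/phrase-table.gz")
```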
> >
> > best regards - Barry
> >
> > On Friday 25 March 2011 08:13, Sudip Datta wrote:
> >> Hi,
> >>
> >> I am a noob at using Moses and have been trying to build a model and
> >> then use the decoder to translate test sentences. I used the following
> >> command for training:
> >>
> >> train-model.perl --root-dir /cygdrive/d/moses/fi-en/fienModel/ --corpus
> >> /cygdrive/d/moses/fi-en/temp/clean --f fi --e en --lm
> >> 0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1
> >>
> >> The process ended cleanly with the following moses.ini file:
> >>
> >> # input factors
> >> [input-factors]
> >> 0
> >>
> >> # mapping steps
> >> [mapping]
> >> 0 T 0
> >>
> >> # translation tables: table type (hierarchical(0), textual (0), binary
> >> (1)), source-factors, target-factors, number of scores, file
> >> # OLD FORMAT is still handled for back-compatibility
> >> # OLD FORMAT translation tables: source-factors, target-factors, number of
> >> scores, file
> >> # OLD FORMAT a binary table type (1) is assumed
> >> [ttable-file]
> >> 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> >>
> >> # no generation models, no generation-file section
> >>
> >> # language models: type(srilm/irstlm), factors, order, file
> >> [lmodel-file]
> >> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> >>
> >>
> >> # limit on how many phrase translations e for each phrase f are loaded
> >> # 0 = all elements loaded
> >> [ttable-limit]
> >> 20
> >>
> >> # distortion (reordering) weight
> >> [weight-d]
> >> 0.6
> >>
> >> # language model weights
> >> [weight-l]
> >> 0.5000
> >>
> >>
> >> # translation model weights
> >> [weight-t]
> >> 0.2
> >> 0.2
> >> 0.2
> >> 0.2
> >> 0.2
> >>
> >> # no generation models, no weight-generation section
> >>
> >> # word penalty
> >> [weight-w]
> >> -1
> >>
> >> [distortion-limit]
> >> 6
> >>
> >> But the decoding step ends with a segfault with the following output for -v 3:
> >>
> >> Defined parameters (per moses.ini or switch):
> >>config: /cygdrive/d/moses/fi-en/fienModel/model/moses.ini
> >>distortion-limit: 6
> >>input-factors: 0
> >>lmodel-file: 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> >>mapping: 0 T 0
> >>ttable-file: 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> >>ttable-limit: 20
> >>verbose: 100
> >>weight-d: 0.6
> >>weight-l: 0.5000
> >>weight-t: 0.2 0.2 0.2 0.2 0.2
> >>weight-w: -1
> >> input type is: text input
> >> Loading lexical distortion models...have 0 models
> >> Start loading LanguageModel /cygdrive/d/moses/fi-en/en.irstlm.gz : [0.000] seconds
> >> In LanguageModelIRST::Load: nGramOrder = 2
> >> Loading LM file (no MAP)
> >> iARPA
> >> loadtxt()
> >> 1-grams: reading 3195 entries
> >> 2-grams: reading 13313 entries
> >> 3-grams: reading 20399 entries
> >> done
> >> OOV code is 3194
> >> OOV code is 3194
> >> IRST: m_unknownId=3194
> >> creating cache for storing prob, state and statesize of ngrams
> >> Finished loading LanguageModels : [1.000] seconds
> >> About to LoadPhraseTables
> >> Start loading PhraseTable /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz : [1.000] seconds
> >> filePath: /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> >> using standard phrase tables
> >> PhraseDictionaryMemory: input=FactorMask<0>  output=FactorMask<0>
> >> Finished loading phrase tables : [1.000] se

Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists

2011-03-25 Thread Barry Haddow
Hi

This is an issue which is not just faced by SMT, but probably by all research 
fields. Evidence from one paper doesn't generally prove or disprove that a 
technique works, you need to consider lots of evidence, from different 
workers in different labs. 

As a young field, SMT has its own problems in building up good experimental 
practices, which are not helped by the tendency to over-sell in research 
papers, and ignore the non-determinism in many parts of the pipeline. 
Non-reproducibility is also a problem, as much of the code used in papers is 
not released, and the complete list of settings required to rerun an 
experiment are rarely given. These problems have been acknowledged, and 
initiatives proposed to address them, but they're far from solved,

best regards - Barry

On Friday 25 March 2011 10:44, Miles Osborne wrote:
> this is something that I have been concerned about for a long time
> now.  and things are actually worse than this, since often only a
> single language pair / test set / training set is used.  claims cannot
> be made on the basis of such shaky evidence,
>
> Miles
>
> On 25 March 2011 09:42, Suzy Howlett  wrote:
> > I've been thinking about the issue of nondeterminism and am somewhat
> > concerned because typically MT results/papers give just a single
> > performance figure for each system. As there is an element of
> > nondeterministic behaviour, it would seem prudent to run several repeats
> > of each system and give mean and standard deviation information instead.
> > Of course, this has a practicality trade-off, so an investigation is
> > warranted to determine the scale of the problem. Is anyone interested in
> > collaborating on a paper or CL squib to address the issue, and bring it
> > to the attention of the MT community (and CL community at large)?
> >
> > Suzy
> >
> > On 25/03/11 11:58 AM, Tom Hoar wrote:
> >> We pick the random set from across the entire collection of documents.
> >> The documents are retrieved as the file system orders them (not
> >> alphabetically sorted). Your comment, "picked in consecutive order" is
> >> interesting. I've often wondered if the order could affect a system's
> >> performance. It's easy enough for me to randomize both the collection
> >> line order and the test set line order.
> >>
> >> The large variance in BLEU would normally be alarming, but this is on a
> >> very small sample corpus of only 40,000 lines. We use the sample corpus
> >> to validate the system installs properly. We haven't seen such large
> >> variations in multi-million pair corpora, but they do range 2-4 BLEU
> >> points.
> >>
> >> Tom
> >>
> >>
> >> -Original Message-
> >> *From*: Hieu Hoang
> >> *To*: moses-support@mit.edu 
> >> *Subject*: Re: [Moses-support] Nondeterminism during decoding: same
> >> config, different n-best lists
> >> *Date*: Thu, 24 Mar 2011 20:43:49 +
> >>
> >> There may be some systematic differences between the randomly chosen
> >> test sets, eg. the sentences are from the same documents 'cos they were
> >> picked in consecutive order from a multi-doc corpus. Otherwise, I'll be
> >> worried about such a large BLEU variation.
> >>
> >>
> >>
> >> also, see here on the evils of MERT
> >> http://www.mail-archive.com/moses-support@mit.edu/msg00216.html
> >>
> >> On 24/03/2011 16:06, Tom Hoar wrote:
> >>> We often run multiple trainings on the exact same bitext corpus but
> >>> pull different random samples for each run. We've observed drastically
> >>> different BLEU scores between different runs with BLEUs ranging from
> >>> 30 to 45. This is from exactly the same training data except for the
> >>> randomly-pulled tuning and evaluation sets. We've assumed this
> >>> difference is due to both the random differences in the sets, floating
> >>> point variations between various machines and not using
> >>> --predictable-seeds.
> >>>
> >>> Tom
> >>>
> >>>
> >>>
> >>> -Original Message-
> >>> *From*: Hieu Hoang
> >>> *Reply-to*: h...@hoang.co.uk 
> >>> *To*: John Burger
> >>> *Cc*: Moses-support
> >>> *Subject*: Re: [Moses-support] Nondeterminism during decoding: same
> >>> config, different n-best lists
> >>> *Date*: Thu, 24 Mar 2011 15:51:48 +
> >>>
> >>> there are small differences in floating point between OS and gcc
> >>> versions. One of the regression tests fails because of rounding errors,
> >>> depending on which machine you run it on. Other than truncating the
> >>> scores, there's not a lot we can do.
> >>>
> >>> The mert perl scripts also dabble in the scores, and that may be
> >>> another source of divergence
> >>>
> >>> On 24 March 2011 15:07, John Burger  >>> > wrote:
> >>>
> >>>     Lane Schwartz wrote:
> >>>
> >>>

Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists

2011-03-25 Thread Tom Hoar
I think it helps to acknowledge these known issues periodically in this
forum. The Moses community is growing/shifting all the time. Newcomers
might have new ideas, and it reminds veterans to keep these issues on their horizon.

For our use, this issue is not a major problem. First, BLEU scores are
(hopefully) an interim step to better evaluation methods. Second, the
distance from existing output to 100% usable, unedited output is greater
than the distance between the nondeterministic variations. 

I think efforts would be better invested in developing better methods
for tuning and evaluation (like meteor?).

Tom


-Original Message-
From: Barry Haddow 
To: moses-support@mit.edu
Subject: Re: [Moses-support] Nondeterminism during decoding: same
config, different n-best lists
Date: Fri, 25 Mar 2011 11:03:06 +


Hi

This is an issue which is not just faced by SMT, but probably by all research 
fields. Evidence from one paper doesn't generally prove or disprove that a 
technique works, you need to consider lots of evidence, from different 
workers in different labs. 

As a young field, SMT has its own problems in building up good experimental 
practices, which are not helped by the tendency to over-sell in research 
papers, and ignore the non-determinism in many parts of the pipeline. 
Non-reproducibility is also a problem, as much of the code used in papers is 
not released, and the complete list of settings required to rerun an 
experiment are rarely given. These problems have been acknowledged, and 
initiatives proposed to address them, but they're far from solved,

best regards - Barry

On Friday 25 March 2011 10:44, Miles Osborne wrote:
> this is something that I have been concerned about for a long time
> now.  and things are actually worse than this, since often only a
> single language pair / test set / training set is used.  claims cannot
> be made on the basis of such shaky evidence,
>
> Miles
>
> On 25 March 2011 09:42, Suzy Howlett  wrote:
> > I've been thinking about the issue of nondeterminism and am somewhat
> > concerned because typically MT results/papers give just a single
> > performance figure for each system. As there is an element of
> > nondeterministic behaviour, it would seem prudent to run several repeats
> > of each system and give mean and standard deviation information instead.
> > Of course, this has a practicality trade-off, so an investigation is
> > warranted to determine the scale of the problem. Is anyone interested in
> > collaborating on a paper or CL squib to address the issue, and bring it
> > to the attention of the MT community (and CL community at large)?
> >
> > Suzy
> >
> > On 25/03/11 11:58 AM, Tom Hoar wrote:
> >> We pick the random set from across the entire collection of documents.
> >> The documents are retrieved as the file system orders them (not
> >> alphabetically sorted). Your comment, "picked in consecutive order" is
> >> interesting. I've often wondered if the order could affect a system's
> >> performance. It's easy enough for me to randomize both the collection
> >> line order and the test set line order.
> >>
> >> The large variance in BLEU would normally be alarming, but this is on a
> >> very small sample corpus of only 40,000 lines. We use the sample corpus
> >> to validate the system installs properly. We haven't seen such large
> >> variations in multi-million pair corpora, but they do range 2-4 BLEU
> >> points.
> >>
> >> Tom
> >>
> >>
> >> -Original Message-
> >> *From*: Hieu Hoang
> >> *To*: moses-support@mit.edu 
> >> *Subject*: Re: [Moses-support] Nondeterminism during decoding: same
> >> config, different n-best lists
> >> *Date*: Thu, 24 Mar 2011 20:43:49 +
> >>
> >> There may be some systematic differences between the randomly chosen
> >> test sets, eg. the sentences are from the same documents 'cos they were
> >> picked in consecutive order from a multi-doc corpus. Otherwise, I'll be
> >> worried about such a large BLEU variation.
> >>
> >>
> >>
> >> also, see here on the evils of MERT
> >> http://www.mail-archive.com/moses-support@mit.edu/msg00216.html
> >>
> >> On 24/03/2011 16:06, Tom Hoar wrote:
> >>> We often run multiple trainings on the exact same bitext corpus but
> >>> pull different random samples for each run. We've observed drastically
> >>> different BLEU scores between different runs with BLEUs ranging from
> >>> 30 to 45. This is from exactly the same training data except for the
> >>> randomly-pulled tuning and evaluation sets. We've assumed this
> >>> difference is due to both the random differences in the sets, floating
> >>> point variations between various machines and not using
> >>> --predictable-seeds.
> >>>
> >>> Tom
> >>>
> >>>
> >>>
> >>> -Original Message-
> >>> *From*: Hieu Hoang
> >>> *Reply-to*: h...@hoang.co.uk 
> >>> *To*: John Burger

Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists

2011-03-25 Thread Lane Schwartz
We know that there is nondeterminism during optimization, yet virtually all
papers report results based on a single MERT run. We know that results can
vary widely based on language pair and data sets, but a large majority of
papers report results on a single language pair, and often for a single data
set.

While these issues are widely known at the informal level, I think that
Suzy's point is well taken. I think there would be value in published
studies showing just how wide the gap due to nondeterminism can be expected
to be. It may be that such studies already exist, and I'm just not aware of
them. Does anyone know of any?

Cheers,
Lane
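Reporting a mean and standard deviation over repeated runs, as Suzy suggests above, is cheap to compute once the scores exist. A minimal sketch (the BLEU values are invented for illustration, not measurements from this thread):

```python
from statistics import mean, stdev

def summarize_runs(bleu_scores):
    """Mean and sample standard deviation of BLEU over repeated tuning runs."""
    if len(bleu_scores) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    return mean(bleu_scores), stdev(bleu_scores)

# invented scores from five hypothetical MERT runs of the same system
runs = [30.1, 31.4, 29.8, 30.9, 30.5]
m, s = summarize_runs(runs)
print(f"BLEU = {m:.2f} +/- {s:.2f}")  # -> BLEU = 30.54 +/- 0.63
```

The cost is entirely in rerunning tuning, not in the statistics; the sample standard deviation is the right choice when the runs are a sample of the optimizer's behaviour rather than the whole population.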

On Fri, Mar 25, 2011 at 7:03 AM, Barry Haddow  wrote:

> Hi
>
> This is an issue which is not just faced by SMT, but probably by all
> research
> fields. Evidence from one paper doesn't generally prove or disprove that a
> technique works, you need to consider lots of evidence, from different
> workers in different labs.
>
> As a young field, SMT has its own problems in building up good experimental
> practices, which are not helped by the tendency to over-sell in research
> papers, and ignore the non-determinism in many parts of the pipeline.
> Non-reproducibility is also a problem, as much of the code used in papers
> is
> not released, and the complete list of settings required to rerun an
> experiment are rarely given. These problems have been acknowledged, and
> initiatives proposed to address them, but they're far from solved,
>
> best regards - Barry
>
> On Friday 25 March 2011 10:44, Miles Osborne wrote:
> > this is something that I have been concerned about for a long time
> > now.  and things are actually worse than this, since often only a
> > single language pair / test set / training set is used.  claims cannot
> > be made on the basis of such shaky evidence,
> >
> > Miles
> >
> > On 25 March 2011 09:42, Suzy Howlett  wrote:
> > > I've been thinking about the issue of nondeterminism and am somewhat
> > > concerned because typically MT results/papers give just a single
> > > performance figure for each system. As there is an element of
> > > nondeterministic behaviour, it would seem prudent to run several
> repeats
> > > of each system and give mean and standard deviation information
> instead.
> > > Of course, this has a practicality trade-off, so an investigation is
> > > warranted to determine the scale of the problem. Is anyone interested
> in
> > > collaborating on a paper or CL squib to address the issue, and bring it
> > > to the attention of the MT community (and CL community at large)?
> > >
> > > Suzy
> > >
> > > On 25/03/11 11:58 AM, Tom Hoar wrote:
> > >> We pick the random set from across the entire collection of documents.
> > >> The documents are retrieved as the file system orders them (not
> > >> alphabetically sorted). Your comment, "picked in consecutive order" is
> > >> interesting. I've often wondered if the order could affect a system's
> > >> performance. It's easy enough for me to randomize both the collection
> > >> line order and the test set line order.
> > >>
> > >> The large variance in BLEU would normally be alarming, but this is on
> a
> > >> very small sample corpus of only 40,000 lines. We use the sample
> corpus
> > >> to validate the system installs properly. We haven't seen such large
> > >> variations in multi-million pair corpora, but they do range 2-4 BLEU
> > >> points.
> > >>
> > >> Tom
> > >>
> > >>
> > >> -Original Message-
> > >> *From*: Hieu Hoang
> > >> *To*: moses-support@mit.edu 
> > >> *Subject*: Re: [Moses-support] Nondeterminism during decoding: same
> > >> config, different n-best lists
> > >> *Date*: Thu, 24 Mar 2011 20:43:49 +
> > >>
> > >> There may be some systematic differences between the randomly chosen
> > >> test sets, eg. the sentences are from the same documents 'cos they
> were
> > >> picked in consecutive order from a multi-doc corpus. Otherwise, I'll
> be
> > >> worried about such a large BLEU variation.
> > >>
> > >>
> > >>
> > >> also, see here on the evils of MERT
> > >> http://www.mail-archive.com/moses-support@mit.edu/msg00216.html
> > >>
> > >> On 24/03/2011 16:06, Tom Hoar wrote:
> > >>> We often run multiple trainings on the exact same bitext corpus but
> > >>> pull different random samples for each run. We've observed
> drastically
> > >>> different BLEU scores between different runs with BLEUs ranging from
> > >>> 30 to 45. This is from exactly the same training data except for the
> > >>> randomly-pulled tuning and evaluation sets. We've assumed this
> > >>> difference is due to both the random differences in the sets,
> floating
> > >>> point variations between various machines and not using
> > >>> --predictable-seeds.
> > >>>
> > >>> Tom
> > >>>
> > >>>
> > >>>
> > >>> -Original Message-
> > >>> *From*: Hieu Hoang
> >

Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists

2011-03-25 Thread Miles Osborne
There is work published on making mert more stable (I'm on the train so I
can't easily dig it up)

Miles

sent using Android

On 25 Mar 2011 12:49, "Lane Schwartz"  wrote:

We know that there is nondeterminism during optimization, yet virtually all
papers report results based on a single MERT run. We know that results can
vary widely based on language pair and data sets, but a large majority of
papers report results on a single language pair, and often for a single data
set.

While these issues are widely known at the informal level, I think that
Suzy's point is well taken. I think there would be value in published
studies showing just how wide the gap due to nondeterminism can be expected
to be. It may be that such studies already exist, and I'm just not aware of
them. Does anyone know of any?

Cheers,
Lane

On Fri, Mar 25, 2011 at 7:03 AM, Barry Haddow  wrote:
>
> Hi
>
> This is an is...
-- 
When a place gets crowded enough to require ID's, social collapse is not
far away.  It is time to go elsewhere.  The best thing about space travel
is that it made it possible to go elsewhere.
-- R.A. Heinlein, "Time Enough For Love"


Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists

2011-03-25 Thread Barry Haddow
This might be what Miles is referring to:
http://www.statmt.org/wmt09/pdf/WMT-0939.pdf

There was some progress towards getting this into moses
http://lium3.univ-lemans.fr/mtmarathon2010/ProjectFinalPresentation/MERT/StabilizingMert.pdf

On Friday 25 March 2011 13:02, Miles Osborne wrote:
> There is work published on making mert more stable (I'm on the train so I
> can't easily dig it up)
>
> Miles
>
> sent using Android
>
> On 25 Mar 2011 12:49, "Lane Schwartz"  wrote:
>
> We know that there is nondeterminism during optimization, yet virtually all
> papers report results based on a single MERT run. We know that results can
> vary widely based on language pair and data sets, but a large majority of
> papers report results on a single language pair, and often for a single
> data set.
>
> While these issues are widely known at the informal level, I think that
> Suzy's point is well taken. I think there would be value in published
> studies showing just how wide the gap due to nondeterminism can be expected
> to be. It may be that such studies already exist, and I'm just not aware of
> them. Does anyone know of any?
>
> Cheers,
> Lane
>
> On Fri, Mar 25, 2011 at 7:03 AM, Barry Haddow  wrote:
> > Hi
> >
> > This is an is...



Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists

2011-03-25 Thread Miles Osborne
Yes, that is the one

Miles

sent using Android

On 25 Mar 2011 13:08, "Barry Haddow"  wrote:

This might be what Miles is referring to
http://www.statmt.org/wmt09/pdf/WMT-0939.pdf

There was some progress towards getting this into moses
http://lium3.univ-lemans.fr/mtmarathon2010/ProjectFinalPresentation/MERT/StabilizingMert.pdf


On Friday 25 March 2011 13:02, Miles Osborne wrote:
> There is work published on making mert more s...



[Moses-support] Nondeterminism during decoding: same config, different n-best lists

2011-03-25 Thread Венцислав Жечев (Ventsislav Zhechev)
Here is what I think about this whole issue.

Wouldn’t it be much more beneficial to the whole community to fix the basic 
beam search issue in Moses that seems to be the root of the problem, rather 
than write papers investigating it? As far as I understand, the issue most 
likely arises from the fact that the beam width limit might cut off some 
hypotheses from a group with equal probability, while leaving some for further 
processing. The fix would include checking the beam border and including all 
hypotheses with the least permissible probability, regardless of the actual 
beam width. This might require some redesign of the data structures that hold 
the hypotheses (I’m writing this without actually inspecting the source code), 
but this cannot be an untenable task. Let’s say this takes 8–16 man-hours to 
code and another 8–16 to test. Any volunteers in academia?
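The fix Ventzi sketches — never splitting a group of equal-score hypotheses at the beam boundary — can be illustrated in a few lines (a toy model of histogram pruning, not the actual Moses data structures):

```python
def prune_with_ties(hypotheses, beam_width):
    """Histogram pruning that never splits a group of equal-score hypotheses.

    Keeps the top beam_width hypotheses by score, plus every other hypothesis
    whose score ties the one at the beam boundary, so a tied group is kept or
    dropped as a whole and the result no longer depends on the arbitrary
    order in which ties were enumerated.
    """
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    if len(ranked) <= beam_width:
        return ranked
    boundary_score = ranked[beam_width - 1][1]
    return [h for h in ranked if h[1] >= boundary_score]

hyps = [("a", 0.9), ("b", 0.7), ("c", 0.7), ("d", 0.7), ("e", 0.4)]
print([h[0] for h in prune_with_ties(hyps, 2)])  # -> ['a', 'b', 'c', 'd']
```

The trade-off is that the effective beam can exceed the nominal width whenever ties straddle the boundary, so worst-case decoding cost is no longer strictly bounded by the beam setting.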


Cheers,

Ventzi

–––
Dr. Ventsislav Zhechev
Computational Linguist

CMS & Language Technologies
Localisation Services
Autodesk Development Sàrl
Neuchâtel, Switzerland

http://VentsislavZhechev.eu
tel: +41 32 723 9122
fax: +41 32 723 9399


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists

2011-03-25 Thread Barry Haddow
Hi Ventzi

The discussion has moved on from the 'beam search' issue, which is not 'the 
root of the problem'. It is (afaik) a fairly minor issue which is
highly unlikely to change anyone's bleu score, but can cause minor 
irritations when checking results. It's not that straightforward to fix as 
it's related to srilm, which we can't change, but using kenlm seems to help.

The wider issue - which stimulated the discussion about papers - is that 
results can be influenced by the use of stochastic algorithms like mert, as 
well as choice of data set, language pair, choice of metric etc, and many 
other arbitrary features of the experimental setup. Results can also be 
influenced by unexpected sources of non-determinism, such as the pointer 
issues above, and out-and-out bugs. These influences are often not 
acknowledged in MT papers.

best regards - Barry

On Friday 25 March 2011 13:18, Венцислав Жечев (Ventsislav Zhechev) wrote:
> Here is what I think about this whole issue.
>
> Wouldn’t it be much more beneficial to the whole community to fix the basic
> beam search issue in Moses that seems to be the root of the problem, rather
> than write papers investigating it? As far as I understand, the issue most
> likely arises from the fact that the beam width limit might cut off some
> hypotheses from a group with equal probability, while leaving some for
> further processing. The fix would include checking the beam border and
> including all hypotheses with the least permissible probability, regardless
> of the actual beam width. This might require some redesign of the data
> structures that hold the hypotheses (I’m writing this without actually
> inspecting the source code), but this cannot be an untenable task. Let’s
> say this takes 8–16 man-hours to code and another 8–16 to test. Any
> volunteers in academia?
>
>
> Cheers,
>
> Ventzi
>
> –––
> Dr. Ventsislav Zhechev
> Computational Linguist
>
> CMS & Language Technologies
> Localisation Services
> Autodesk Development Sàrl
> Neuchâtel, Switzerland
>
> http://VentsislavZhechev.eu
> tel: +41 32 723 9122
> fax: +41 32 723 9399
>
>



[Moses-support] mosesserver -report-all-factors

2011-03-25 Thread Carlos Henriquez
Hi all.

I've been using Moses to translate from English to Spanish with a postprocess 
that requires the target POS of every translated word. For the standalone 
version this was easily achieved with the -report-all-factors flag and parsing 
the output accordingly.

However, when we tried the same flag to launch mosesserver we realized it did 
not work, basically because it is not implemented. I've changed the source code 
of mosesserver.cpp to check for this flag and send the output to the client with 
all factors if they are needed. I'm sending it in the "text" key, as if it were 
a standard 1-best translation. Maybe it would be better to define a "factors" key 
or something similar to separate both results (like the "align" key), or, even 
better (I just came up with this option), it could be requested by the client 
call with the proper parameter, just like "align".

I do not know if this represents some added value to the mosesserver 
application for someone else, but if it does, I would love to commit it to the 
SVN repository.

Let me know.

Bye.


Re: [Moses-support] Segmentation Fault during decoding

2011-03-25 Thread Kenneth Heafield
I haven't tested kenlm on Cygwin, but it could work.  Can you run tests?

1) Install Boost.  Cygwin's package manager should provide it.

2) Run kenlm tests.

wget http://kheafield.com/code/kenlm.tar.gz
tar xzf kenlm.tar.gz
cd kenlm
./test.sh

On 03/25/11 06:44, Sudip Datta wrote:
> I've used gcc in cygwin to compile both Moses and IRSTLM. But as you and
> Barry pointed out, I'll try to use kenlm (I can't use srilm due to
> licensing restrictions) and, if that doesn't work, try srilm at my college.
> 
> I think the segfault occurs in Hypothesis.cpp at:
> 
> m_ffStates[i] = ffs[i]->Evaluate(
>     *this,
>     m_prevHypo ? m_prevHypo->m_ffStates[i] : NULL,
>     &m_scoreBreakdown);
> 
> Maybe it gives some clue in identifying the issue.
> 
> Thanks again  --Sudip.
> 
> On Fri, Mar 25, 2011 at 3:51 PM, Hieu Hoang wrote:
> 
> If you've compiled with gcc in cygwin, you can use any LM. The
> stipulation of using only the internal LM applies only if you use
> Visual Studio.
> 
> However, I would personally use srilm to start with, as I'm not sure if
> the other LMs are fully tested on cygwin.
> 
> Hieu
> Sent from my flying horse
> 
> On 25 Mar 2011, at 10:06 AM, Barry Haddow wrote:
> > Hi Sudip
> >
> > If you're using windows, then you should use the internal LM. See
> here:
> > http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
> > afaik this is still the case.
> >
> > Also, there are a couple of odd things in your setup. Firstly,
> you've built a
> > 3-gram LM, but you're telling moses that it's 2-gram:
> >> [lmodel-file]
> >> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> > This shouldn't matter, but just in case you're unaware.
> >
> > Also, both the words in your input sentence are unknown. Did the
> phrase table
> > build OK? Maybe you could use zless or zcat to extract and post
> the first few
> > lines of it,
> >
> > best regards - Barry
> >
> > On Friday 25 March 2011 08:13, Sudip Datta wrote:
> >> Hi,
> >>
> >> I am a noob at using Moses and have been trying to build a model
> and then
> >> use the decoder to translate test sentences. I used the following
> command
> >> for training:
> >>
> >> * train-model.perl --root-dir
> /cygdrive/d/moses/fi-en/**fienModel/ --corpus
> >> /cygdrive/d/moses/fi-en/temp/**clean --f fi --e en --lm
> >> 0:3:/cygdrive/d/moses/fi-en/**en.irstlm.gz:1*
> >>
> >> The process ended cleanly with the following moses.ini file:
> >>
> >> *# input factors
> >> [input-factors]
> >> 0
> >>
> >> # mapping steps
> >> [mapping]
> >> 0 T 0
> >>
> >> # translation tables: table type (hierarchical(0), textual (0),
> binary
> >> (1)), source-factors, target-factors, number of scores, file
> >> # OLD FORMAT is still handled for back-compatibility
> >> # OLD FORMAT translation tables: source-factors, target-factors,
> number of
> >> scores, file
> >> # OLD FORMAT a binary table type (1) is assumed
> >> [ttable-file]
> >> 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> >>
> >> # no generation models, no generation-file section
> >>
> >> # language models: type(srilm/irstlm), factors, order, file
> >> [lmodel-file]
> >> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> >>
> >>
> >> # limit on how many phrase translations e for each phrase f are
> loaded
> >> # 0 = all elements loaded
> >> [ttable-limit]
> >> 20
> >>
> >> # distortion (reordering) weight
> >> [weight-d]
> >> 0.6
> >>
> >> # language model weights
> >> [weight-l]
> >> 0.5000
> >>
> >>
> >> # translation model weights
> >> [weight-t]
> >> 0.2
> >> 0.2
> >> 0.2
> >> 0.2
> >> 0.2
> >>
> >> # no generation models, no weight-generation section
> >>
> >> # word penalty
> >> [weight-w]
> >> -1
> >>
> >> [distortion-limit]
> >> 6*
> >>
> >> But the decoding step ends with a segfault with following output
> for -v 3:
> >>
> >> *Defined parameters (per moses.ini or switch):
> >>config: /cygdrive/d/moses/fi-en/fienModel/model/moses.ini
> >>distortion-limit: 6
> >>input-factors: 0
> >>lmodel-file: 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> >>mapping: 0 T 0
> >>ttable-file: 0 0 0 5
> >> /cygdrive/d/moses/fi-en/fienModel//model/phrase-tab
> >> le.gz
> >>ttable-limit: 20
> >>verbose: 100
> >>weight-d: 0.6
> >>weight-l: 0.5000
> >>weight-t: 0.2 0.2 0.2 0.2 0.2
> >>weight-w: -1
> >> input type is: text input
> >> Loading lexical distortion models...have 0 models
>  

Re: [Moses-support] Suffix arrays in Moses

2011-03-25 Thread Lane Schwartz
On Sat, Nov 28, 2009 at 2:48 PM, Lane Schwartz  wrote:

> I know that during the 2nd MT Marathon in Wandlitz in May 2008, work was
> done on implementing a suffix array data structure so that Moses could
> extract phrase pairs directly from an aligned parallel corpus at runtime,
> without the necessity of first running an explicit off-line phrase table
> extraction process.
>
> I don't know what the end result of that work was, nor do I know if any
> followup work was performed.
>
> My question was in regard to those issues. Specifically, does anyone know
> where things ended with regard to the suffix array code in Moses at the end
> of the MT Marathon in Wandlitz? And is functionality currently present in
> Moses to allow an aligned parallel corpus backed by a suffix array to act in
> place of a pre-computed phrase table?


Hieu, I know that at the Prague MT Marathon you used the Joshua suffix array
implementation to extract hiero grammars for Moses. But does Moses (trunk or
any other branch) have any suffix array-based runtime phrase table?

 I never really got a satisfactory answer to this thread back in November.
Based on that, I'm assuming that the code that ccb, Juri, and others worked
on in Wandlitz never made it into Moses trunk.

Cheers,
Lane


[Moses-support] Implementation of Lattice MERT

2011-03-25 Thread Lane Schwartz
Does anyone know if there's an open implementation of Lattice MERT
(Macherey, et al, 2008)? Strangely, the authors seem to have forgotten to
include the URL to download their implementation. :)

Just to be clear, I don't mean running MERT where you give Moses a
source-language lattice to decode.

Cheers,
Lane


Re: [Moses-support] Suffix arrays in Moses

2011-03-25 Thread Chris Callison-Burch
> I never really got a satisfactory answer to this thread back in November. 
> Based on that, I'm assuming that the code that ccb, Juri, and others worked 
> on in Wandlitz never made it into Moses trunk.

Correct.


Re: [Moses-support] Implementation of Lattice MERT

2011-03-25 Thread Chris Dyer
cdec (https://github.com/redpony/cdec) includes an implementation,
called vest. But someone needs to write code that will cause Moses to
export its search lattices in the right format (which is a funny,
crappy JSON-based encoding).

On Fri, Mar 25, 2011 at 2:59 PM, Lane Schwartz  wrote:
> Does anyone know if there's an open implementation of Lattice MERT
> (Macherey, et al, 2008)? Strangely, the authors seem to have forgotten to
> include the URL to download their implementation. :)
>
> Just to be clear, I don't mean running MERT where you give Moses a
> source-language lattice to decode.
>
> Cheers,
> Lane
>


Re: [Moses-support] Implementation of Lattice MERT

2011-03-25 Thread Philipp Koehn
Hi,

Christian Buck was involved in implementing it, but it was never properly
integrated into Moses. Maybe he can shed some light on this; it
may actually already work.

-phi

On Fri, Mar 25, 2011 at 6:59 PM, Lane Schwartz  wrote:
> Does anyone know if there's an open implementation of Lattice MERT
> (Macherey, et al, 2008)? Strangely, the authors seem to have forgotten to
> include the URL to download their implementation. :)
>
> Just to be clear, I don't mean running MERT where you give Moses a
> source-language lattice to decode.
>
> Cheers,
> Lane
>


Re: [Moses-support] Suffix arrays in Moses

2011-03-25 Thread Philipp Koehn
Hi,

Abby implemented the suffix array code, and it is in Moses,
but not very well documented. It should be possible to use
both a standard (memory, on-disk) phrase table and the suffix
phrase table as back-off.

-phi

On Fri, Mar 25, 2011 at 7:02 PM, Chris Callison-Burch  wrote:
>> I never really got a satisfactory answer to this thread back in November. 
>> Based on that, I'm assuming that the code that ccb, Juri, and others worked 
>> on in Wandlitz never made it into Moses trunk.
>
> Correct.


Re: [Moses-support] Suffix arrays in Moses

2011-03-25 Thread Lane Schwartz
Thanks Philipp. :)

Abby, is the suffix array code just in a branch, or has it been moved into
trunk?

Cheers,
Lane



On Fri, Mar 25, 2011 at 3:22 PM, Philipp Koehn  wrote:

> Hi,
>
> Abby implemented the suffix array code, and it is in Moses,
> but not very well documented. It should be possible to use
> both a standard (memory, on-disk) phrase table and the suffix
> phrase table as back-off.
>
> -phi
>
> On Fri, Mar 25, 2011 at 7:02 PM, Chris Callison-Burch 
> wrote:
> >> I never really got a satisfactory answer to this thread back in
> November. Based on that, I'm assuming that the code that ccb, Juri, and
> others worked on in Wandlitz never made it into Moses trunk.
> >
> > Correct.



-- 
When a place gets crowded enough to require ID's, social collapse is not
far away.  It is time to go elsewhere.  The best thing about space travel
is that it made it possible to go elsewhere.
-- R.A. Heinlein, "Time Enough For Love"


Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists

2011-03-25 Thread Suzy Howlett
Well, it seems as if there is at least some interest in highlighting the 
issue. I suggest detailed discussion take place off the Moses support 
list, so if you are interested in discussing what steps we could take 
(running some experiments and writing a paper seems like the likely 
strategy, but what experiments, what venue, etc...), email me directly. 
(You don't have to commit to working on the paper to take part.)

Suzy

On 26/03/11 12:08 AM, Barry Haddow wrote:
> This might be what Miles is referring to
> http://www.statmt.org/wmt09/pdf/WMT-0939.pdf
>
> There was some progress towards getting this into moses
> http://lium3.univ-lemans.fr/mtmarathon2010/ProjectFinalPresentation/MERT/StabilizingMert.pdf
>
> On Friday 25 March 2011 13:02, Miles Osborne wrote:
>> There is work published on making mert more stable (on the train so can't
>> easily dig it up)
>>
>> Miles
>>
>> sent using Android
>>
>> On 25 Mar 2011 12:49, "Lane Schwartz"  wrote:
>>
>> We know that there is nondeterminism during optimization, yet virtually all
>> papers report results based on a single MERT run. We know that results can
>> vary widely based on language pair and data sets, but a large majority of
>> papers report results on a single language pair, and often for a single
>> data set.
>>
>> While these issues are widely known at the informal level, I think that
>> Suzy's point is well taken. I think there would be value in published
>> studies showing just how wide the gap due to nondeterminism can be expected
>> to be. It may be that such studies already exist, and I'm just not aware of
>> them. Does anyone know of any?
>>
>> Cheers,
>> Lane
>>
>> On Fri, Mar 25, 2011 at 7:03 AM, Barry Haddow  wrote:
>>> Hi
>>>
>>> This is an is...
>

-- 
Suzy Howlett
http://www.showlett.id.au/


Re: [Moses-support] Implementation of Lattice MERT

2011-03-25 Thread Christian Buck
Hi!

Philipp Koehn:
> Christian Buck was involved in implemented it, but it was never properly
> integrated into Moses. Maybe he can share some light on this, and it
> may actually already work.

Well, there is a repository here:

https://github.com/christianbuck/Moses-Lattice-MERT

which builds a working executable, and there are no bug reports. The code 
was written during the 4th MTM in Dublin, mostly by Karlis Goba and me. 
It is, however, rather untested and could probably use some optimization. 
I am interested in getting it to work, and after reading briefly over the 
code I am confident that this should not be too hard. And yes, it may 
already work.

Maybe someone could provide a reasonably sized test case with a run of 
the n-best MERT to compare to?

cheers,
buck