Re: [Moses-support] Repeating Non-terminals and Alignment

2017-03-29 Thread Jamie Macbeth
Hi, Hieu,

Yes, I think that's a good description of what I'm hoping for.  Thanks for
the pointers!  Will let you know how it goes.

Jamie

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Repeating Non-terminals and Alignment

2017-03-29 Thread Hieu Hoang
Interesting. So you're not after a general insertion or deletion grammar,
but a synchronous grammar with occasional replicating subphrases?
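
A minimal Python sketch of what a "replicating" rule application could look like, purely for illustration. It is not Moses or moses2 code; the function name, the 0-based indices and the data layout are all assumptions made for this example.

```python
def apply_rule(target_rhs, alignment, sub_translations):
    """Build the target string for one synchronous rule application.

    target_rhs: list of target-side tokens; non-terminal slots are written "[X][X]".
    alignment: (source_index, target_index) pairs linking non-terminals.
    sub_translations: {source_index: translation of that source sub-span}.
    All three structures are hypothetical, chosen only for this sketch.
    """
    # Map each target slot to the source non-terminal that fills it.
    # Several target slots may point at the same source index: that is
    # the "replicating subphrase" case.
    filler = {t: s for s, t in alignment}
    out = []
    for i, token in enumerate(target_rhs):
        if i in filler:
            out.append(sub_translations[filler[i]])
        else:
            out.append(token)
    return " ".join(out)


# One source non-terminal (index 0) copied into two target slots (0 and 1).
result = apply_rule(["[X][X]", "[X][X]"], [(0, 0), (0, 1)], {0: "bob"})
print(result)
```

The only difference from an ordinary one-to-one rule is that the slot-to-source map is many-to-one, so the same sub-translation is emitted more than once.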

I've forgotten how to do it in the main Moses, but I've just rewritten the
decoder 'moses2' that also supports SCFG. It may be better for you to
implement it in there because the codebase is smaller, newer and easier to
read.

I'm not sure exactly how you'll do it, but the starting point would be the
Decode() function in moses2/SCFG/Manager.cpp; drill down from there.


* Looking for MT/NLP opportunities *
Hieu Hoang
http://moses-smt.org/




[Moses-support] Repeating Non-terminals and Alignment

2017-03-29 Thread Jamie Macbeth
Hello,

I am planning to use Moses for a complex language decomposition task.  I would
like to be able to translate a sentence like "Bob bought a loaf of bread at
the store for $1," to one like "Bob gave $1 to the store, and the store
gave a loaf of bread to Bob" using a tree-based model.  This requires that
an NP like "Bob" or "the store" appears once in the source text, but
appears twice in the target text.

I tried doing this using a string-to-tree rule table where I have a single
non-terminal in the source aligned with two non-terminals in the target.
Here's a simple example where I would try to translate "bob" to "bob bob":

bob [X] ||| bob [X] ||| 1.0 ||| |||
<s> [X][X] </s> [X] ||| <s> [X][X] [X][X] </s> [TOP] ||| 1.0 ||| 1-1 1-2 |||

You can see that I tried aligning 1-1 and 1-2.  However, when I try to
translate "bob" to "bob bob" using this rule table I get a segfault.
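
As a rough aid for spotting such rules before decoding, here is a small Python sketch that reads a rule-table line and reports source non-terminals aligned to more than one target non-terminal. It assumes the field layout "source ||| target ||| scores ||| alignment ||| ...", 0-based alignment indices, and a trailing left-hand-side label on each side. The example glue rule with <s> and </s> markers is a hypothetical input, and the function names are made up for this sketch.

```python
import re
from collections import defaultdict

# Matches a non-terminal token such as [X] or [X][X] (assumed notation).
NONTERMINAL = re.compile(r"^(\[[^\[\]\s]+\])+$")


def parse_rule(line):
    """Split a rule-table line into source tokens, target tokens, alignment pairs."""
    fields = [f.strip() for f in line.split("|||")]
    source = fields[0].split()[:-1]  # drop the trailing LHS label, e.g. [X]
    target = fields[1].split()[:-1]  # drop the trailing LHS label, e.g. [TOP]
    align_field = fields[3] if len(fields) > 3 else ""
    pairs = [tuple(int(i) for i in p.split("-")) for p in align_field.split()]
    return source, target, pairs


def repeated_nonterminals(line):
    """Return {source_index: [target_indices]} for non-terminals aligned more than once."""
    source, target, pairs = parse_rule(line)
    links = defaultdict(list)
    for s, t in pairs:
        # Ignore out-of-range pairs and pairs that do not link two non-terminals.
        if s < len(source) and t < len(target):
            if NONTERMINAL.match(source[s]) and NONTERMINAL.match(target[t]):
                links[s].append(t)
    return {s: ts for s, ts in links.items() if len(ts) > 1}


# Hypothetical inputs; the sentence markers in the glue rule are an assumption.
glue = "<s> [X][X] </s> [X] ||| <s> [X][X] [X][X] </s> [TOP] ||| 1.0 ||| 1-1 1-2 |||"
lexical = "bob [X] ||| bob [X] ||| 1.0 ||| |||"
print(repeated_nonterminals(glue))
print(repeated_nonterminals(lexical))
```

A non-empty result marks a rule that links one source non-terminal to several target slots, which is the case that leads to the segfault described here.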

Would it be possible to support a more flexible non-terminal alignment like
this in Moses?  If I wanted to implement this, would it be extremely
difficult, and where would I start?

Sincerely,
Jamie Macbeth


Re: [Moses-support] lmplz crashed on joint_order

2017-03-29 Thread Kenneth Heafield
How embarrassing.  Can you try the head of github.com/kpu/kenlm?  If that
fails, I can take this off list.

Kenneth



[Moses-support] lmplz crashed on joint_order

2017-03-29 Thread Dingyuan Wang
Dear list,

lmplz crashed on my machine recently. The command and its output were:

lmplz -o 4 -S 70% --text zhc-simp.txt --arpa zhc.lm --prune 0 1 1 2

=== 1/5 Counting and sorting n-grams ===
Reading /home/gumble/docs/E/corpus/zhs/zhc-simp.txt
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
tcmalloc: large alloc 2340552704 bytes == 0x55e7ed4f4000 @
tcmalloc: large alloc 9362194432 bytes == 0x55e878d14000 @

Unigram tokens 886453003 types 66249
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:794988 2:1961835648 3:3678441728 4:5885507072
tcmalloc: large alloc 5885509632 bytes == 0x55e7ed4f4000 @
tcmalloc: large alloc 1961836544 bytes == 0x55e94c29c000 @
tcmalloc: large alloc 3678445568 bytes == 0x55e9c119 @
Statistics:
1 66249 D1=0.549028 D2=1.18255 D3+=0.99644
2 14266408/22790840 D1=0.615082 D2=1.06095 D3+=1.47555
3 87810872/205978808 D1=0.742285 D2=1.17282 D3+=1.49899
4 62909089/415283792 D1=0.698985 D2=1.20588 D3+=1.54463
Memory estimate for binary LM:
type       MB
probing  3417 assuming -p 1.5
probing  4002 assuming -r models -p 1.5
trie     1653 without quantization
trie      908 assuming -q 8 -b 8 quantization
trie     1418 assuming -a 22 array pointer compression
trie      674 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
tcmalloc: large alloc 4119576576 bytes == 0x55e94c1d8000 @
tcmalloc: large alloc 9966813184 bytes == 0x55eaaf63 @
Chain sizes: 1:794988 2:228262528 3:1756217440 4:1509818136
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
##**###-##**++#-###-##+###*###*#
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:794988 2:228262528 3:1756217440 4:1509818136
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
---terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  ./lm/common/joint_order.hh:61 in void lm::JointOrder(const util::stream::ChainPositions&, Callback&) [with Callback = lm::builder::{anonymous}::Callback; Compare = lm::SuffixOrder] threw FormatLoadException because `order != current + 1'.
Detected n-gram without matching suffix
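
For readers unfamiliar with the last line of the error, a toy Python sketch of the invariant it appears to describe. This is not KenLM's code. It assumes that the "suffix" of an n-gram is the same n-gram with its first word dropped, and that every stored n-gram is expected to have its suffix stored as well. The function name and the sample data are invented for the illustration.

```python
def missing_suffixes(ngrams):
    """Return the n-grams whose suffix (first word dropped) is not in the collection."""
    present = set(ngrams)
    return sorted(g for g in present if len(g) > 1 and g[1:] not in present)


# A consistent toy model: every suffix of every n-gram is present.
consistent = [("a",), ("b",), ("a", "b"), ("c", "a", "b"), ("c",)]

# An inconsistent one: ("x", "b", "c") is stored but ("b", "c") is not.
broken = [("b",), ("c",), ("x",), ("x", "b", "c")]

print(missing_suffixes(consistent))
print(missing_suffixes(broken))
```

Whether pruning, the corpus, or a bug produced such a gap in the run above cannot be told from the log alone.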


-- 
Dingyuan Wang


[Moses-support] lattice mbr output empty translation result

2017-03-29 Thread Angli Liu
Hi, I was using lattice MBR to decode the source sentences; the model was
tuned using MERT. Other decoding methods, such as maximum probability
decoding and consensus decoding, output results without a problem, but
lattice MBR decoding with the -lmbr flag makes the decoder write an empty
output file, whatever size, scale and pruning factor I set.

In its simplest form, the code that caused this problem is essentially
equivalent to the following:

moses \
-f moses.ini \
-output-unknowns file1 \
-n-best-list file2 50 \
-output-search-graph file3 \
-lmbr \
(-lmbr-p 0.8 -lmbr-r 0.8 -mbr-scale 5 -lmbr-pruning-factor 50) \
< in_file \
> out_file

1. Parameters in parentheses are optional; either way the decoder output
nothing.
2. The problem is essentially that out_file turned out to be empty.
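
To make the "optional parameters" notation unambiguous, here is a Python sketch that builds the same command with or without the parenthesised flags and then reports the decoder's exit status and the size of the output file. The flags and file names are the ones quoted above; the function names are hypothetical, and checking the exit status is only a suggested diagnostic, not a known cause of the empty output.

```python
import os
import subprocess


def build_moses_command(use_optional=False):
    """Return the argument list for the decoder call quoted above."""
    cmd = [
        "moses",
        "-f", "moses.ini",
        "-output-unknowns", "file1",
        "-n-best-list", "file2", "50",
        "-output-search-graph", "file3",
        "-lmbr",
    ]
    if use_optional:
        # The flags that were written in parentheses in the original command.
        cmd += ["-lmbr-p", "0.8", "-lmbr-r", "0.8",
                "-mbr-scale", "5", "-lmbr-pruning-factor", "50"]
    return cmd


def run_and_report(cmd, in_path="in_file", out_path="out_file"):
    """Run the command with redirected stdin/stdout; return (exit code, output size)."""
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        proc = subprocess.run(cmd, stdin=fin, stdout=fout, check=False)
    return proc.returncode, os.path.getsize(out_path)


print(" ".join(build_moses_command(use_optional=True)))
```

A non-zero exit code together with a zero-byte out_file would suggest the decoder stopped early rather than producing empty translations.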

What was the problem? Thanks for your input in advance!

-Angli