Re: [Moses-support] Compact phrase table on-disk format changed

2019-12-08 Thread Marcin Junczys-Dowmunt
Huh, might be worth telling them about. A patch version change from 2.0.0 to 2.0.2 should not break backwards-compatibility. From: Hieu Hoang Sent: Sunday, December 8, 2019 5:24 PM To: Dingyuan Wang Cc: moses-support Subject: Re: [Moses-support] Compact phrase table on-disk format changed

Re: [Moses-support] processPhraseTableMin Cannot encode numbers larger than 268435455

2019-05-09 Thread Marcin Junczys-Dowmunt
Hi, Yes, a smaller phrase table should help. I wrote the table, but that was in 2012 and I cannot really remember what goes on in there. I think making sure that you do not have too many target phrases per source phrase should help. From: He Shiming Sent: Thursday, May 9, 2019 8:49 PM To:

Re: [Moses-support] M2 Scorer in EMS for Grammatical Error Correction

2018-01-15 Thread Marcin Junczys-Dowmunt
Seems like all I need is there. Will take a look today and report back. From: Kelly Marchisio Sent: Monday, January 15, 2018 9:28 AM To: Marcin Junczys-Dowmunt Cc: moses-support Subject: Re: [Moses-support] M2 Scorer in EMS for Grammatical Error Correction Sure - it's on my computer locally

Re: [Moses-support] M2 Scorer in EMS for Grammatical Error Correction

2018-01-15 Thread Marcin Junczys-Dowmunt
Hm, not really. Any chance you give me access to the tuning folder? I could try to run the scorer manually and see if I can reproduce the error. This looks like some debugging is needed. From: Kelly Marchisio Sent: Monday, January 15, 2018 6:57 AM To: Marcin Junczys-Dowmunt Cc: moses-support

Re: [Moses-support] M2 Scorer in EMS for Grammatical Error Correction

2018-01-13 Thread Marcin Junczys-Dowmunt
weird stuff is going on in the data. From: Kelly Marchisio Sent: Saturday, January 13, 2018 8:28 PM To: Marcin Junczys-Dowmunt Cc: moses-support Subject: Re: [Moses-support] M2 Scorer in EMS for Grammatical Error Correction Ah, good to know that the scorer was called successfully and that I can

Re: [Moses-support] M2 Scorer in EMS for Grammatical Error Correction

2018-01-13 Thread Marcin Junczys-Dowmunt
to that? From: Kelly Marchisio Sent: Saturday, January 13, 2018 7:46 PM To: Marcin Junczys-Dowmunt; moses-support Subject: Re: [Moses-support] M2 Scorer in EMS for Grammatical Error Correction looping back in mailing-list and copying message :) Thanks so much for the response, Marcin! I did see your

Re: [Moses-support] M2 Scorer in EMS for Grammatical Error Correction

2018-01-12 Thread Marcin Junczys-Dowmunt
Hi, We never really used it with EMS, so I do not think anyone can help you here. Did you have a look at the original repo: https://github.com/grammatical/baselines-emnlp2016 ? Otherwise we can probably take this off-list and try to help you personally  From: Kelly Marchisio Sent: Friday,

Re: [Moses-support] Deploying large models

2017-12-12 Thread Marcin Junczys-Dowmunt
Hi, I think the important part is that Liling actually manages to translate several tens of thousands of sentences before that happens. A quick fix would be to break your corpus into pieces of 10K sentences each and loop over the files. I usually have bad experience with trying to translate
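The quick fix described above could be sketched roughly as follows. This is only an illustration, not part of Moses: the function name `split_corpus` and the chunk file naming are hypothetical, and only the chunk size of 10K sentences comes from the message.

```python
from pathlib import Path


def split_corpus(corpus_path, out_dir, chunk_size=10_000):
    """Split a one-sentence-per-line corpus into files of at most chunk_size lines."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    chunk = None
    with open(corpus_path, encoding="utf-8") as corpus:
        for line_no, line in enumerate(corpus):
            # Start a new chunk file every chunk_size lines.
            if line_no % chunk_size == 0:
                if chunk is not None:
                    chunk.close()
                path = out_dir / f"chunk.{len(chunk_paths):05d}.txt"  # hypothetical naming
                chunk = open(path, "w", encoding="utf-8")
                chunk_paths.append(path)
            chunk.write(line)
    if chunk is not None:
        chunk.close()
    return chunk_paths
```

Each returned path can then be handed to a separate decoder run in a loop, and the outputs concatenated in the same order.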

Re: [Moses-support] EMS for the neural age?

2017-11-27 Thread Marcin Junczys-Dowmunt
On 27.11.2017 at 11:19, Barry Haddow wrote: > > For the marian usage examples, I would go for the lowest common > denominator - shell scripts. Generally more readable than Makefiles. Good point. My needs for experimental set-ups do not necessarily overlap with readable tutorials.

Re: [Moses-support] EMS for the neural age?

2017-11-26 Thread Marcin Junczys-Dowmunt
This looks good. OK, I guess ducttape it is. On 26.11.2017 at 14:56, Matt Post wrote: > Shuoyang Ding put this together recently: > > https://github.com/shuoyangd/tape4nmt > > matt > > >> On Nov 26, 2017, at 2:31 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.

Re: [Moses-support] EMS for the neural age?

2017-11-26 Thread Marcin Junczys-Dowmunt
guess that commented oneliner snippets are the best thing you can do. > > Cheers, O. > > > On 26 November 2017 at 10:41:16 CET, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> > wrote: >> Hi list, >> >> I am preparing a couple of usage examples for my NMT

Re: [Moses-support] EMS for the neural age?

2017-11-26 Thread Marcin Junczys-Dowmunt
on-task.html > <http://www.statmt.org/wmt17/translation-task.html> > > > Regards, > Ergun > > > On Sun, Nov 26, 2017 at 12:41 PM, Marcin Junczys-Dowmunt > <junc...@amu.edu.pl <mailto:junc...@amu.edu.pl>> wrote: > > Hi list, > > I

[Moses-support] EMS for the neural age?

2017-11-26 Thread Marcin Junczys-Dowmunt
Hi list, I am preparing a couple of usage examples for my NMT toolkit and got hung up on all the preprocessing and other evil stuff. I am wondering, is there now anything decent around for doing preprocessing, running experiments and evaluation? Or is the best thing still GNU make (isn't that

Re: [Moses-support] Filtering?

2017-06-09 Thread Marcin Junczys-Dowmunt
Hi, I think an LC_ALL=C sort on the filtered phrase-table might help? On 09.06.2017 at 18:14, Mike Ladwig wrote: > Anyone working with the Johnson pruning scripts? > > I filtered my phrase table with: > > zcat > /home/mike/stelae-projects/fr-en/phrasemodel/model/phrase-table.gz | >
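For readers unsure what the suggested sort changes: `LC_ALL=C sort` compares lines byte by byte instead of using locale collation rules. A minimal sketch of the same ordering in Python, with a hypothetical helper name and made-up table lines:

```python
def c_locale_sort(lines):
    """Sort lines by their raw UTF-8 bytes, the order that `LC_ALL=C sort` produces."""
    return sorted(lines, key=lambda line: line.encode("utf-8"))


# Made-up entries: in byte order, uppercase ASCII sorts before lowercase,
# whereas many locales interleave the two.
table = ["a ||| x", "B ||| y", "A ||| z"]
print(c_locale_sort(table))
```

Which ordering a given downstream tool expects is an assumption here; the sketch only shows what byte-wise ordering means.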

Re: [Moses-support] wer, ter, per, cder

2017-04-11 Thread Marcin Junczys-Dowmunt
BTW, this tool reports 1-WER, etc. Probably so MERT can maximize this value. On 11.04.2017 at 09:07, Marcin Junczys-Dowmunt wrote: > Hi Joerg, > > Compute all scores separately: > > mosesdecoder/bin/evaluator --sctype WER --sctype TER --sctype PER > --sctype CDER -
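To make the "1-WER" remark concrete, here is a small sentence-level sketch. It is not the evaluator's implementation (that tool works at corpus level); the function names are hypothetical and WER is taken here as word-level edit distance divided by reference length.

```python
def word_error_rate(hypothesis, reference):
    """Word-level Levenshtein distance divided by the reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # prev[j] is the edit distance between the hypothesis prefix so far and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # extra hypothesis word
                            curr[j - 1] + 1,      # missing reference word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)


def one_minus_wer(hypothesis, reference):
    """Higher is better, so an optimizer can maximize it like BLEU."""
    return 1.0 - word_error_rate(hypothesis, reference)
```

With this convention a perfect hypothesis scores 1.0, and the value can go below 0 when the hypothesis is much longer than the reference.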

Re: [Moses-support] Support Moses and GPU on cloud

2017-04-04 Thread Marcin Junczys-Dowmunt
Why would you train your SMT model on a GPU instance? That's far too expensive. Train on a GPU-less instance, then when done attach to an instance that has a GPU. That's what I did. For deployment 15GB might be enough if you make your SMT models small enough. On 04.04.2017 at 10:25, liling

Re: [Moses-support] Support Moses and GPU on cloud

2017-04-04 Thread Marcin Junczys-Dowmunt
Hi Liling, I did both on AWS for my WMT2016 en-ru/ru-en systems. No problems with that. What problems did you run into? On 04.04.2017 at 09:30, liling tan wrote: > Dear Moses community, > > Amittai had written a nice package and setup guide for Moses on AWS. > But to do some NMT on

[Moses-support] Rebuilding moses binary only

2017-03-30 Thread Marcin Junczys-Dowmunt
Hi list, is there a way to tell bjam to only rebuild the moses binary and not the 84 unrelated targets that just happen to be rebuilt out of solidarity? Thanks, Marcin

Re: [Moses-support] Select sentences that maximize BLEU from n-best list

2017-03-28 Thread Marcin Junczys-Dowmunt
> > Cheers, > Matthias > > > On Tue, 2017-03-28 at 10:19 +0200, Marcin Junczys-Dowmunt wrote: >> Hi list, >> >> does anyone have a tool that takes a moses-format n-best list and can >> output the single best sentence per source sentence according to BLEU >&

[Moses-support] Select sentences that maximize BLEU from n-best list

2017-03-28 Thread Marcin Junczys-Dowmunt
Hi list, does anyone have a tool that takes a moses-format n-best list and can output the single best sentence per source sentence according to BLEU and a given reference? Or anything that can be shoehorned into something like that? Thanks, Marcin
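One way such a tool could look is sketched below. This is not an existing Moses script: it assumes the usual `id ||| hypothesis ||| features ||| score` n-best layout and one reference per sentence, and it uses an add-one smoothed sentence-level BLEU, which is not the same as maximizing corpus-level BLEU. All function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Add-one smoothed sentence-level BLEU against a single reference."""
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        matches = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = sum(hyp_ngrams.values())
        log_prec += math.log((matches + 1) / (total + 1))
    # Log of the brevity penalty: 0 if the hypothesis is at least as long as the reference.
    log_bp = min(0.0, 1.0 - len(ref) / len(hyp))
    return math.exp(log_prec / max_n + log_bp)


def oracle_best(nbest_lines, references):
    """Return, in sentence-id order, the hypothesis with the highest sentence BLEU."""
    best = {}
    for line in nbest_lines:
        fields = [f.strip() for f in line.split("|||")]
        sent_id, hyp = int(fields[0]), fields[1].split()
        score = sentence_bleu(hyp, references[sent_id].split())
        # Strict comparison: on ties the earlier (higher-ranked) entry wins.
        if sent_id not in best or score > best[sent_id][0]:
            best[sent_id] = (score, " ".join(hyp))
    return [best[i][1] for i in sorted(best)]
```

Here `references` is a list of reference sentences indexed by sentence id, and `nbest_lines` can be any iterable of n-best lines, for example an open file.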

Re: [Moses-support] Phrase-based NMT

2016-10-10 Thread Marcin Junczys-Dowmunt
That's what I meant when I said I do not like the evaluation part. Since their baseline NMT model has no mechanism to deal with unknown words, it is quite likely that the effect is mainly due to that (although I might be totally wrong on that). Add for instance subword units and the effect

Re: [Moses-support] Phrase-based NMT

2016-10-10 Thread Marcin Junczys-Dowmunt
Hi, There is this work: https://arxiv.org/abs/1606.01792 The model is interesting, but the evaluation part is a bit weak. For some reason this group of authors restricts their findings to Chinese only. There is also no other attempt to deal with unknown words, so the impact of the phrase

Re: [Moses-support] accessing the compact format of the phrase-table

2016-09-23 Thread Marcin Junczys-Dowmunt
feeds.php> [Email us] <mailto:i...@kantanmt.com> On 23 September 2016 at 10:19, Marcin Junczys-Dowmunt <junc...@amu.edu.pl <mailto:junc...@amu.edu.pl>> wrote: Hi, If you want a complete dump of the phrase table as text, this is not possible. The comp

Re: [Moses-support] accessing the compact format of the phrase-table

2016-09-23 Thread Marcin Junczys-Dowmunt
Hi, If you want a complete dump of the phrase table as text, this is not possible. The compact phrase table is not reversible. You can use queryPhraseTableMin to ask for the translations of specific phrases. Best, Marcin On 23/09/16 at 10:16, Dimitar Shterionov wrote: Dear all, I want to

Re: [Moses-support] Moses 3.0 cannot start with configuration file and models of moses 2.0

2016-07-06 Thread Marcin Junczys-Dowmunt
I don't think I have changed the compact format for ages, unless someone else did somehow without me noticing. On 06.07.2016 at 15:40, Hieu Hoang wrote: > try adding or taking out the suffix .binlexr in the filename > > occasionally, the file format in the binary tables change so you might > need

[Moses-support] OD: Differences in compact phrase tables

2016-06-29 Thread Marcin Junczys-Dowmunt
Hi Miriam, yes, it uses random hashes with different seeds, so the output will be different. -- Sent from my phone. Original message. Subject: [Moses-support] Differences in compact phrase tables From: Miriam Käshammer To: moses mailinglist Cc: Dear

Re: [Moses-support] UN V1.0 corpus / Europarl - first shot... EN=>FR

2016-05-30 Thread Marcin Junczys-Dowmunt
Hi, Considering the weirdness of the language of UN proceedings, these are actually encouraging results. What happens if you mix stuff? On 30.05.2016 at 14:03, Vincent Nguyen wrote: > First, many thanks for the huge work. open some new languages > possibilities not in the europarl. > > I just

[Moses-support] Official release of the United Nations Parallel Corpus v1.0

2016-05-25 Thread Marcin Junczys-Dowmunt
the corpus. In the near future we plan to set up a section with references to papers that describe research done with UN corpus. Feel free to share links and bibliography items with us (either with me or any of the authors of the above paper). Sorry for cross-posting, Marcin Junczys-Dowmunt

Re: [Moses-support] Random segfaults with alternative decoding paths

2016-04-14 Thread Marcin Junczys-Dowmunt
think you can access our machines anymore, sorry :( The account was already deleted. Best, Ales On Wed, Apr 13, 2016 at 10:10 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl <mailto:junc...@amu.edu.pl>> wrote: Urghs, not good. Can I somehow get access to

Re: [Moses-support] Random segfaults with alternative decoding paths

2016-04-13 Thread Marcin Junczys-Dowmunt
Urghs, not good. Can I somehow get access to that machine? Is it deterministic? On 13.04.2016 at 21:06, Barry Haddow wrote: Hi Ales Well, bitPos=18446744073708512633 looks bogus. Marcin? cheers - Barry On 13/04/16 17:23, Aleš Tamchyna wrote: Hi all, sorry for the delay. I'm attaching
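A quick arithmetic check shows why that bitPos value looks bogus: it sits just below 2^64, which is what a small negative number looks like when stored in an unsigned 64-bit integer. The snippet below only does that reinterpretation; reading it as a wrapped negative offset is an assumption and says nothing about the actual cause.

```python
bit_pos = 18446744073708512633  # value quoted from the backtrace

# Reinterpret the unsigned 64-bit value as a signed two's-complement integer.
as_signed = bit_pos - 2**64 if bit_pos >= 2**63 else bit_pos
print(as_signed)  # prints -1038983
```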

Re: [Moses-support] Random segfaults with alternative decoding paths

2016-04-13 Thread Marcin Junczys-Dowmunt
And what happens if you only use each one of the phrase-tables alone? On 13.04.2016 at 11:40, Hieu Hoang wrote: On 13/04/2016 14:25, Ales Tamchyna wrote: Hi, Let me add some more information to this: when running Moses in gdb, I get the following backtrace: #0 0x006e3ba4 in

Re: [Moses-support] Print list of translation options

2016-03-09 Thread Marcin Junczys-Dowmunt
/PhraseDictionaryCompact.cpp:148 #18 0x0040ef67 in main (argc=6, argv=0x7fffe298) at misc/queryPhraseTableMin.cpp:64 On Wed, Mar 9, 2016 at 1:02 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl <mailto:junc...@amu.edu.pl>> wrote: Your phrase table has 11 scores, you

Re: [Moses-support] Print list of translation options

2016-03-09 Thread Marcin Junczys-Dowmunt
y has 11 Aborted (core dumped) Am I calling this correctly? Something's not right. On Wed, Mar 9, 2016 at 12:56 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl <mailto:junc...@amu.edu.pl>> wrote: There is queryPhraseTableMin that reads phrases from stdin and returns all

Re: [Moses-support] Print list of translation options

2016-03-09 Thread Marcin Junczys-Dowmunt
sentence using a query program? Thanks, Lane On Wed, Mar 9, 2016 at 12:42 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl <mailto:junc...@amu.edu.pl>> wrote: With verbose 3 it's actually there, just before it starts outputting the search graph, take another look. There is a lis

Re: [Moses-support] Scripts for n-best-list rescoring

2016-03-08 Thread Marcin Junczys-Dowmunt
8, 2016 at 8:18 AM, Lane Schwartz <dowob...@gmail.com <mailto:dowob...@gmail.com>> wrote: I don't think there is. At my previous lab, I believe we had to build our own in-house script. It would be nice to have one in moses. On Sat, Oct 31, 2015

Re: [Moses-support] RNNLM Integration?

2016-03-07 Thread Marcin Junczys-Dowmunt
Something like this maybe? http://www.statmt.org/wmt15/pdf/WMT34.pdf On 07.03.2016 at 21:41, Lane Schwartz wrote: Philipp, Are you aware of any published work examining the importance of hypothesis recombination in terms of time/space/quality tradeoffs? Lane On Mon, Mar 7, 2016 at

Re: [Moses-support] Error in Running the Language Model Training of the KenLM

2016-03-07 Thread Marcin Junczys-Dowmunt
Try -S 50% or even smaller. But 4GB is awfully small. On 07.03.2016 at 21:54, BIRENDRA CHAUHAN SINGH wrote: on running this: mkdir ~/lm cd ~/lm ~/mosesdecoder/bin/lmplz -o 3 <~/corpus/news-commentary-v8.fr-en.true.en > news-commentary-v8.fr-en.arpa.en Error: bhupendra@berry:~/lm$

Re: [Moses-support] Is ProcessLexicalTableMin multi threads ?

2016-02-28 Thread Marcin Junczys-Dowmunt
. Saving to /netshr/working-fr-en/model/moses.bin.ini.7.tables/reordering-table.7.wbe-msd-bidirectional-fe.minlexr Done Executing: rm -f /netshr/working-fr-en/model/moses.bin.ini.7; ln -s /netshr/working-fr-en/model/moses.bin.ini.7.tables/moses.ini /netshr/working-fr-en/model/m

Re: [Moses-support] Seg Fault when Binarizing Phrase Tables

2016-02-23 Thread Marcin Junczys-Dowmunt
Well, it's an empty file :) On 23.02.2016 at 16:43, Jake Ballinger wrote: Sure thing---I've attached it. Best, Jake On Tue, Feb 23, 2016 at 10:06 AM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl <mailto:junc...@amu.edu.pl>> wrote: Hi, Can you send me the phra

Re: [Moses-support] Seg Fault when Binarizing Phrase Tables

2016-02-23 Thread Marcin Junczys-Dowmunt
Hi, Can you send me the phrase table you are binarizing? It seems to be small enough. Best, Marcin On 23.02.2016 at 02:11, Jake Ballinger wrote: Hello everyone, I'm trying to set up the baseline system, as mentioned here , When I try to

Re: [Moses-support] Is memory mapping lazy?

2016-02-19 Thread Marcin Junczys-Dowmunt
Hi Lane, For the compact phrase table and reordering table you can use --minphr-memory and --minlexr-memory respectively. That will disable memory mapping entirely and just read both into RAM. Best, Marcin On 20.02.2016 00:29, Lane Schwartz wrote: > Hey, > > This is mostly addressed to Kenneth,

Re: [Moses-support] Is ProcessLexicalTableMin multi threads ?

2016-02-18 Thread Marcin Junczys-Dowmunt
ads > and ProcessLexicalTableMin with 4 threads, difficult, right ? > > just letting you know, with 8 threads the processlexicaltablemin seems > to run with 1 thread only . > > > > On 17/02/2016 at 23:16, Marcin Junczys-Dowmunt wrote: >> I just checked, it's really weirdly slo

Re: [Moses-support] Is ProcessLexicalTableMin multi threads ?

2016-02-17 Thread Marcin Junczys-Dowmunt
I just checked, it's really weirdly slow now. Apparently using more than 4 threads is a bad idea. But 4 threads seems to be about 2 times faster than just one. I remember that used to work better. Maybe because I haven't linked tcmalloc? On 17.02.2016 23:07, Marcin Junczys-Dowmunt wrote

Re: [Moses-support] Is ProcessLexicalTableMin multi threads ?

2016-02-17 Thread Marcin Junczys-Dowmunt
It is, just not very well done. It generally does not make sense to have more than 8-10 threads. That should however be somewhat faster than only a single thread. On 17.02.2016 22:44, Vincent Nguyen wrote: > I have the feeling it's not. > ___ >

Re: [Moses-support] Access to set of possible target words from within feature function

2016-02-12 Thread Marcin Junczys-Dowmunt
onOptionCollection::EvaluateTranslationOptionListWithSourceContext > > Or you add yet another Evaluate() method that passes the feature > function the InputPathList, which really has every translation options > in 1 go > > Hieu Hoang > http://www.hoang.co.uk/hieu > >

Re: [Moses-support] Access to set of possible target words from within feature function

2016-02-12 Thread Marcin Junczys-Dowmunt
Indeed. Thank you very much, dear sir. On 12.02.2016 23:47, Hieu Hoang wrote: > i think this happens just before expansion, once all the translation > options have been collected > > On 12/02/16 22:36, Marcin Junczys-Dowmunt wrote: >> But this happens during hypothesis expansi

[Moses-support] Access to set of possible target words from within feature function

2016-02-12 Thread Marcin Junczys-Dowmunt
Hi everybody, Is there a nice way to access the complete set of translation options for a sentence from within a feature function? I just need the bag of possible words before any processing. Let's say from within "EmptyHypothesisState", there is InputType, but I cannot really find any hook to

Re: [Moses-support] Problem with processPhraseTableMin

2016-02-04 Thread Marcin Junczys-Dowmunt
> > - Uli > > On Wed, Feb 3, 2016 at 12:01 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> > wrote: > >> Weird. >> >> Jeremy, I binarized your phrase-table a couple of times with different >> commits (also the most recent one), and I cannot

Re: [Moses-support] Problem with processPhraseTableMin

2016-02-03 Thread Marcin Junczys-Dowmunt
Weird. Jeremy, I binarized your phrase-table a couple of times with different commits (also the most recent one), and I cannot reproduce the error. Try maybe -threads 10 or 12. I can make the binarized versions available for download. On 02.02.2016 at 18:21, Marcin Junczys-Dowmunt wrote

Re: [Moses-support] Problem with processPhraseTableMin

2016-02-02 Thread Marcin Junczys-Dowmunt
cause any problems. On 02.02.2016 16:58, Kenneth Heafield wrote: > That typically causes a bus error. Why is there an overly huge malloc? > > On 02/02/2016 03:53 PM, Marcin Junczys-Dowmunt wrote: >> I think it fills up your temporary folder, try "-T ." to specify the >

Re: [Moses-support] Problem with processPhraseTableMin

2016-02-02 Thread Marcin Junczys-Dowmunt
I think it fills up your temporary folder, try "-T ." to specify the local folder for temporary files. On 02.02.2016 16:21, Jeremy Gwinnup wrote: > Hi, > > I’m having a problem using processPhraseTableMin to compress a phrase table > with 7 scores - the program consistently coredumps at step 3

Re: [Moses-support] Polysynthetic languages?

2016-02-01 Thread Marcin Junczys-Dowmunt
ssor > > I don't know if there's any segmentation methods specific for > Cherokee. > > best wishes, > Rico > > > On 01.02.2016 13:31, Marcin Junczys-Dowmunt wrote: >> >> Hi Mike, >> >> Maybe take a look at Rico's tool for handling u

Re: [Moses-support] Polysynthetic languages?

2016-02-01 Thread Marcin Junczys-Dowmunt
Hi Mike, Maybe take a look at Rico's tool for handling unknown words in neural machine translation. I have been playing around with that for Russian-English and standard phrase-based SMT with some success. I am just not sure if your small corpora will be enough to learn useful segmentations

Re: [Moses-support] Polysynthetic languages?

2016-02-01 Thread Marcin Junczys-Dowmunt
ry. > > You could also try other unsupervised morpheme segmenters like morfessor: > https://github.com/aalto-speech/morfessor [3] > > I don't know if there's any segmentation methods specific for Cherokee. > > best wishes, > Rico > > On 01.02.2016 13:31, Marcin Junczy

Re: [Moses-support] error while creating compact reordering table

2016-01-28 Thread Marcin Junczys-Dowmunt
Hi, first check if it's actually there, for instance, post a directory listing of that path /root/working/binarised-model/ You can also try to add ".minlexr" for the reordering model and ".minphr" for the phrase table, maybe that helps. I am currently confused which version is the right way

Re: [Moses-support] Multilingually Sentence-Aligned Corpora

2016-01-22 Thread Marcin Junczys-Dowmunt
Hi Graham, At the UN we are now working to release an official version of our data. As a bonus to the pair-wise alignment, it will contain a 6-way fully aligned subcorpus for English, French, Spanish, Russian, Chinese, Arabic; about 13M segments per language. We are waiting for some LREC

Re: [Moses-support] MT Marathon 2010 page hacked.

2016-01-06 Thread Marcin Junczys-Dowmunt
> > As for baiting users, I’m not quite sure what you mean by that. The > link you provide is a genuine link to lecture material from the MT > Marathon. > Totally a bait. It's named "survey" and all that so-called MT stuff looks very fishy to me.

Re: [Moses-support] z-mert

2015-12-18 Thread Marcin Junczys-Dowmunt
Hi Sarah, try running the command with LC_ALL=C java -jar ... I think the problem is that Java assumes a German locale and expects floating point numbers with a comma and not a dot. I spent some time myself to figure that out while using ZMERT. Best, Marcin On 18.12.2015 11:09, Sarah Schulz

Re: [Moses-support] PhraseDictionaryCompact is not registered

2015-12-18 Thread Marcin Junczys-Dowmunt
Hi, I'd say you didn't install cmph or compile against it, look again at: http://www.statmt.org/moses/?n=Advanced.RuleTables#ntoc3 On 18.12.2015 15:15, Andrew wrote: > I'm following the baseline system page step-by-step as it says. > I've binarized the phrase table and reordering table using >

[Moses-support] OD: baseline-system has very low BLEU-Score

2015-11-18 Thread Marcin Junczys-Dowmunt
Try testset-small.de with your multibleu command instead of testset-small.en. Best, Marcin -- Sent from my phone. Original message. Subject: Re: [Moses-support] baseline-system has very low BLEU-Score From: Raphael Hoeps To: moses-support@mit.edu Cc:

[Moses-support] Scripts for n-best-list rescoring

2015-10-31 Thread Marcin Junczys-Dowmunt
Hi, does moses include scripts for n-best-list rescoring/resorting after a new feature has been added to the list? I guess, this can probably be achieved by running a single parameter tuning step on the extended n-best-list, but then I still need to fiddle around with calculating model scores

Re: [Moses-support] OD: lmplz error

2015-10-26 Thread Marcin Junczys-Dowmunt
Actually, is there any good reason to have this option not enabled by default? On 26.10.2015 at 08:52, Marcin Junczys-Dowmunt wrote: This particular error does not indicate singletons, there is a different error message for that. Sometimes discounts are just weird. Kneser-Ney smoothing

[Moses-support] OD: lmplz error

2015-10-23 Thread Marcin Junczys-Dowmunt
Hi, That's what --discount_fallback is for. Best, Marcin -- Sent from my phone. Original message. Subject: [Moses-support] lmplz error From: Hieu Hoang To: moses-support Cc: hi all does anyone know how to fix this error from lmplz: terminate

Re: [Moses-support] Compact lex reordering table on OSX/clang

2015-10-13 Thread Marcin Junczys-Dowmunt
Hi, yes, definitely wrong turn, all code should be in CompactPT. I am not sure this is actually a code bug, is it working with g++ on macOS? On 2015-10-13 at 12:50, Jeroen Vermeulen wrote: > On 10/13/2015 04:59 PM, Hieu Hoang wrote: > >> you're quite right, i've added a check >>

Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-11 Thread Marcin Junczys-Dowmunt
sys1m54.006s sys2m40.239s sys3m43.040s 3m59.816s On 08/10/2015 21:00, Marcin Junczys-Dowmunt wrote: I have a branch, "unblockpt", those locks are gone and caches are thread-local. Hieu claims there is still no speed-up. On 08.10.2015 at 21:56, Kenneth Heafield wrote: Good poi

Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Marcin Junczys-Dowmunt
ane Schwartz wrote: Thanks, Marcin. So when the various components of Moses pass words back and forth, what do they send each other? std::string? StringPiece? On Fri, Oct 9, 2015 at 4:28 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl <mailto:junc...@amu.edu.pl>> wrote: For instan

Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Marcin Junczys-Dowmunt
inefficient. I've found code in KenLM that maps from strings to integers, but not the other way around. Marcin, do you know, for example, where any Moses code is for doing the mapping for any data structure? On Fri, Oct 9, 2015 at 4:14 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl <mailt

Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Marcin Junczys-Dowmunt
Hi, This would only be a simple thing if there was a common framework for that, but there isn't. Each data structure implements its own vocabularies and look-up tables. There is no common set of integers. Best, Marcin On 09.10.2015 at 23:11, Lane Schwartz wrote: Hey, I know this should be

Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Marcin Junczys-Dowmunt
back and forth, what do they send each other? std::string? StringPiece? On Fri, Oct 9, 2015 at 4:28 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl <mailto:junc...@amu.edu.pl>> wrote: For instance in my phrase table that would be mosesdecoder/moses/TranslationMod

Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-09 Thread Marcin Junczys-Dowmunt
user6m0.768s user6m56.545s user8m21.316s user9m20.490s user10m22.638s 10m50.360s sys0m15.712s sys0m35.746s sys0m53.254s sys1m19.331s sys1m54.006s sys2m40.239s sys3m43.040s 3m59.816s On 08/10/2015 21:00, Marcin Junczys-Dowmunt wrote: I have a branch, "unblockpt", those lock

Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-08 Thread Marcin Junczys-Dowmunt
d(sourcePhrase); > if(it != m_phraseCache.end()) { >LastUsed = it->second; >lu.m_clock = clock(); > return std::make_pair(lu.m_tpv, lu.m_bitsLeft); > } else >return std::make_pair(TargetPhraseVectorPtr(), 0); >} > > > > On

Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-08 Thread Marcin Junczys-Dowmunt
We did quite a bit of experimenting with that, usually there is hardly any measurable quality loss until you get below 1000. Good enough for deployment systems. It seems however you can get up to a 0.4 BLEU increase when going really high (about 5000 and beyond) with larger distortion limits. But

Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-08 Thread Marcin Junczys-Dowmunt
es better than CompactPT, > that's the first thing I'd optimize. > > On 10/08/2015 08:30 PM, Marcin Junczys-Dowmunt wrote: >> We did quite a bit of experimenting with that, usually there is hardly >> any measurable quality loss until you get below 1000. Good enough for >>

Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-05 Thread Marcin Junczys-Dowmunt
Very bad unpruned and with multithreading! :) Is this with the nonblockpt branch? I am slowly running out of ideas what might be the cause of this. Frequent vector reallocation? On 05.10.2015 16:48, Hieu Hoang wrote: > what pt implementation did you use, and had it been pre-pruned so that >

Re: [Moses-support] prune phrase table

2015-10-02 Thread Marcin Junczys-Dowmunt
You can use filter-pt from contrib/sigtestfilter without the suffix arrays, needs SALM to compile though. When you only specify -n 100 it will prune according to p(t|s). On 2015-10-02 at 17:35, Hieu Hoang wrote: > I can't remember, but is there a script that prune the pt, keeping just
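The kind of pruning described above (keep the best n target phrases per source phrase, ranked by p(t|s)) can be sketched as follows. This is not filter-pt's implementation. It assumes a text phrase table with `|||`-separated fields and the common four-score layout in which the third score is p(t|s); both the function name and `score_index=2` are assumptions to check against your own table.

```python
from collections import defaultdict


def prune_top_n(phrase_table_lines, n=100, score_index=2):
    """Keep the n highest-scoring target phrases for each source phrase."""
    by_source = defaultdict(list)
    for line in phrase_table_lines:
        fields = [f.strip() for f in line.rstrip("\n").split("|||")]
        source, scores = fields[0], fields[2].split()
        by_source[source].append((float(scores[score_index]), line))
    pruned = []
    for source in by_source:  # dicts keep insertion order, so source order is preserved
        entries = sorted(by_source[source], key=lambda e: e[0], reverse=True)
        pruned.extend(line for _, line in entries[:n])
    return pruned
```

The sketch holds the whole table in memory, so it only suits small tables; a sorted table can instead be processed one source phrase at a time.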

[Moses-support] OD: Compact Phrase Table

2015-09-19 Thread Marcin Junczys-Dowmunt
Hi, are you sure that your output path exists? -- Sent from my phone. Original message. Subject: [Moses-support] Compact Phrase Table From: Sanjanashree Palanivel To: moses-support@mit.edu Cc: Dear all, I also tried to binarize the phrase table following this

[Moses-support] Oldest version of boost to work with --with-mm

2015-09-16 Thread Marcin Junczys-Dowmunt
Hi, what's the currently oldest version of boost for moses with the --with-mm option? It seems boost 1.54 is not supported any more although that is still standard for the current Ubuntu LTS? It works with a by-hand installation of 1.59, I haven't tried any in-betweeners. Best, Marcin

Re: [Moses-support] BLEU score

2015-09-07 Thread Marcin Junczys-Dowmunt
Hi Tomek, 4.5% definitely indicates that there was an error in your pipeline (or test data?). However, there are so many places where things could go wrong, that based on the little information you gave us I could not even start guessing. Check if your line numbers match, that you use tokenized

Re: [Moses-support] Translation Model binarizing step in EMS - multicore ?

2015-09-02 Thread Marcin Junczys-Dowmunt
-nscores means "number of scores". For multi-threading "-threads 4" can be used. Using more than 12 threads is not recommended. Best, Marcin On 2015-09-02 at 10:43, Vincent Nguyen wrote: > Hi, > > Unless I am mistaken, it seems that binarizing the TM step in EMS in not > multi

Re: [Moses-support] Memory efficient MT

2015-08-26 Thread Marcin Junczys-Dowmunt
I just realized that page is seriously understating the memory-saving effects of my phrase (reordering) table :) On 26.08.2015 18:01, Kenneth Heafield wrote: Hi, How much of http://www.statmt.org/moses/?n=Moses.Optimize have you used? Be sure to read the last line of the page too.

Re: [Moses-support] sigtest filtering reordering

2015-08-19 Thread Marcin Junczys-Dowmunt
Hi, I guess that was the operating system killing the process due to lack of memory. Do you have a filtered phrase-table already? If yes, you can just remove the spurious reordering entries with the script remove-orphaned-reordering-entries.perl (something like that, I am writing this from
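The idea of removing orphaned reordering entries can be illustrated with a short sketch. This is not the script mentioned above; it assumes both tables are text files whose first two `|||`-separated fields are the source and target phrase, and the function names are hypothetical.

```python
def phrase_pair(line):
    """Return the (source, target) key from a `|||`-separated table line."""
    fields = [f.strip() for f in line.split("|||")]
    return fields[0], fields[1]


def drop_orphaned_reordering(phrase_table_lines, reordering_lines):
    """Keep only reordering entries whose phrase pair survived phrase-table filtering."""
    kept_pairs = {phrase_pair(line) for line in phrase_table_lines}
    return [line for line in reordering_lines if phrase_pair(line) in kept_pairs]
```

Collecting all phrase pairs in a set costs memory proportional to the filtered phrase table; if both files are sorted the same way, a streaming merge would avoid that.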

Re: [Moses-support] Do I need to sort reordering model generated by EMS before binarizing with processLexicalTableMin?

2015-08-13 Thread Marcin Junczys-Dowmunt
Hi Jeremy, I believe reordering models come sorted out of the EMS process, so it should just work if nothing else has been done to the model. Otherwise the binarization tool will complain, so it will tell you to sort if it is necessary. Best, Marcin On 2015-08-13 at 17:31, Jeremy

[Moses-support] Fwd: Re: Is there multithread option for KenLM's build_binary?

2015-08-07 Thread Marcin Junczys-Dowmunt
For reference to the list. Original Message Subject: Re: [Moses-support] Is there multithread option for KenLM's build_binary? Date: Fri, 07 Aug 2015 22:02:36 +0200 From: Marcin Junczys-Dowmunt junc...@amu.edu.pl To: liling tan alvati...@gmail.com Hi Liling

Re: [Moses-support] Parallelizer multi core

2015-08-01 Thread Marcin Junczys-Dowmunt
Hi, I agree with Nick. I am using a 64-core machine. -threads all will grind to a standstill. I am however fine with a few more threads, say 16. Best, Marcin On 01.08.2015 00:35, Nikolay Bogoychev wrote: Hey, I have opposed this change in the past for two reasons: Using more than 4

Re: [Moses-support] Prepackaged MT research platform

2015-07-02 Thread Marcin Junczys-Dowmunt
Cool. What is word2vec used for in that context? Best, Marcin On 02.07.2015 at 19:29, amittai wrote: Hi -- There are several free pre-packaged Moses distributions, many of which are listed here: http://www.statmt.org/moses/?n=Moses.Packages However I don't know of any free hosting

Re: [Moses-support] Lattice input and source word range in EvaluateWhenApplied

2015-06-30 Thread Marcin Junczys-Dowmunt
June 2015 at 19:30, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote: Hi, is there a way to find the current source phrase (input path) for lattices in the following feature function hook? FFState* EvaluateWhenApplied( const Hypothesis& cur_hypo

Re: [Moses-support] Weird code in Hypothesis::RecombineCompare()

2015-06-25 Thread Marcin Junczys-Dowmunt
I'm pretty sure that's the major bug we missed until now ;) On 2015-06-25 10:09, Jeroen Vermeulen wrote: Looking at replacing WordsBitmap's implementation with std::vector<bool> (less code, less memory) I came across this function: « /** check, if two hypothesis can be

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-23 Thread Marcin Junczys-Dowmunt
versions. On 23.06.2015 08:36, Marcin Junczys-Dowmunt wrote: I checked for some of my experiments and I get nearly identical BLEU scores when using the standard weights; differences are in the second decimal place, if at all. These results now seem more likely, though there is still

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-23 Thread Marcin Junczys-Dowmunt
, hyp_len=3937, ref_len=3609) On 22 June 2015 at 17:53, Hokage Sama nvnc...@gmail.com wrote: Ok will do On 22 June 2015 at 17:47, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote: I don't think so. However, when you

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-23 Thread Marcin Junczys-Dowmunt
, hyp_len=9361, ref_len=9322) BLEU = 12.45, 49.7/17.5/7.8/3.6 (BP=1.000, ratio=1.005, hyp_len=9373, ref_len=9322) BLEU = 12.30, 49.6/17.6/7.5/3.5 (BP=1.000, ratio=1.007, hyp_len=9385, ref_len=9322) On 23.06.2015 09:11, Marcin Junczys-Dowmunt wrote: Now that I think of it, truecasing should

Re: [Moses-support] Major bug found in Moses

2015-06-22 Thread Marcin Junczys-Dowmunt
O. - Original Message - From: Marcin Junczys-Dowmunt junc...@amu.edu.pl To: moses-support@mit.edu Sent: Friday, 19 June, 2015 19:21:45 Subject: Re: [Moses-support] Major bug found in Moses On that interesting idea that moses should be naturally good at translating things, just

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Marcin Junczys-Dowmunt
Sama wrote: Ok thanks. Appreciate your help. On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote: Difficult to tell with that little data. Once you

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Marcin Junczys-Dowmunt
Hi, I think the average is OK, your variance is however quite high. Did you retrain the entire system or just optimize parameters a couple of times? Two useful papers on the topic: https://www.cs.cmu.edu/~jhclark/pubs/significance.pdf http://www.mt-archive.info/MTS-2011-Cettolo.pdf On
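As a rough illustration of how the average and spread over repeated runs could be checked, here is a minimal Python sketch. The function name summarize_bleu and the scores are hypothetical placeholders, not results from this thread.

```python
import statistics


def summarize_bleu(scores):
    """Return (mean, sample standard deviation) of BLEU scores from repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical scores from three runs (placeholders only, not real results).
runs = [12.0, 12.5, 13.0]
mean, spread = summarize_bleu(runs)
print(f"mean BLEU = {mean:.2f}, std dev = {spread:.2f}")
```

A large standard deviation relative to the differences between systems suggests that a single run's score should not be trusted on its own.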

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Marcin Junczys-Dowmunt
trained it with what I could collect so far (i.e. only 190,630 words of parallel data). I retrained the entire system each time without any tuning. On 22 June 2015 at 01:00, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote: Hi, I think the average is OK

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Marcin Junczys-Dowmunt
Difficult to tell with that little data. Once you get beyond 100,000 segments (or 50,000 at least) I would say 2000 each for the dev set (for tuning) and the test set, and the rest for training. With so few segments it's hard to give you any recommendations since it might just not give meaningful results. It's
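The split suggested above (2000 segments each for dev and test, the rest for training) could be sketched in Python roughly as follows. The helper split_corpus, the fixed seed, and the toy corpus are assumptions for illustration only and are not part of Moses.

```python
import random


def split_corpus(pairs, dev_size=2000, test_size=2000, seed=0):
    """Shuffle sentence pairs and split them into (train, dev, test)."""
    if len(pairs) <= dev_size + test_size:
        raise ValueError("corpus too small for the requested dev/test sizes")
    shuffled = list(pairs)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    dev = shuffled[:dev_size]
    test = shuffled[dev_size:dev_size + test_size]
    train = shuffled[dev_size + test_size:]
    return train, dev, test


# Toy stand-in for a parallel corpus of 50,000 segments.
corpus = [(f"src {i}", f"tgt {i}") for i in range(50000)]
train, dev, test = split_corpus(corpus)
print(len(train), len(dev), len(test))
```

Shuffling before splitting keeps the dev and test sets from being drawn from a single contiguous part of the corpus; the fixed seed makes the split repeatable.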

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Marcin Junczys-Dowmunt
You're welcome. Take another close look at those varying BLEU scores though. That would make me worry if it happened to me for the same data and the same weights. On 22.06.2015 10:31, Hokage Sama wrote: Ok thanks. Appreciate your help. On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt junc

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Marcin Junczys-Dowmunt
://test.true.sm ~/working/test.translated.en 2 ~/working/test.out ~/mosesdecoder/scripts/generic/multi-bleu.perl -lc ~/corpus/test.true.en ~/working/test.translated.en On 22 June 2015 at 01:20, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote: Hm. That's

Re: [Moses-support] Exception when exiting moses

2015-06-21 Thread Marcin Junczys-Dowmunt
be worth trying to replace kenlm's wrappers with that one to reduce maintenance burden. Jeroen On June 21, 2015 8:47:50 PM GMT+07:00, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote: Recompiling it just now with debug on. There is already a mistake in https://github.com/moses-smt

Re: [Moses-support] Exception when exiting moses

2015-06-21 Thread Marcin Junczys-Dowmunt
With the compact reordering table or anything else from Compact* ? On 21.06.2015 15:19, Hieu Hoang wrote: I'm using my latest master fork with no problems Hieu Hoang Researcher New York University, Abu Dhabi http://www.hoang.co.uk/hieu On 21 June 2015 at 17:02, Marcin Junczys-Dowmunt junc

Re: [Moses-support] Exception when exiting moses

2015-06-21 Thread Marcin Junczys-Dowmunt
wrote: What exception? I can haz stack trace? On 06/21/2015 09:02 AM, Marcin Junczys-Dowmunt wrote: Hi, is anyone else getting exceptions when moses exits with the latest master? It seems to be happening in my reordering table and breaks MERT. Wasn't me though
