Ah, good to know that the scorer was called successfully and that I can ignore the Levenshtein distance errors.
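If I understand the failsafe correctly, it amounts to something like the sketch below. This is only my reading of the log message, not the scorer's actual code: the function names `levenshtein` and `should_skip` are made up for illustration, and I'm assuming the distance is counted over whitespace-separated tokens.

```python
def levenshtein(source, target):
    """Token-level edit distance between two token lists (standard DP)."""
    prev = list(range(len(target) + 1))
    for i, s_tok in enumerate(source, start=1):
        curr = [i]
        for j, t_tok in enumerate(target, start=1):
            cost = 0 if s_tok == t_tok else 1
            curr.append(min(prev[j] + 1,          # delete a source token
                            curr[j - 1] + 1,      # insert a target token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def should_skip(source_sentence, hypothesis_sentence):
    """Assumed failsafe: skip the pair when the edit distance exceeds
    the number of source tokens (hypothetical, inferred from the log)."""
    src = source_sentence.split()
    hyp = hypothesis_sentence.split()
    return levenshtein(src, hyp) > len(src)
```

Under that assumption, a hypothesis that differs from its source in more tokens than the source contains would be skipped rather than scored, which would explain why the message is harmless on its own.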
As for allocating a huge piece of memory -- I realized that although my parallel corpus is aligned, I actually split the original corpus by *paragraph* instead of by sentence. They're mostly short paragraphs (each at most ~4 sentences, probably 80-100 tokens at most), but there are some outliers (the largest being ~250-300 tokens). Most paragraphs would only need a few edits, but the largest might need 10-15+. Could this be causing the problem?

On Sat, Jan 13, 2018 at 11:08 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:

> There seem to be multiple issues here.
>
> As I said, I have null experience with EMS, so maybe someone else can help
> with that.
>
> The message in extract.err seems to actually mean that you were
> successful in calling the M2 scorer in EMS, the only problem is it dies 😊
> The Levenshtein message is part of a failsafe that is meant to avoid
> exponentially long searches. It does not calculate the M2 metric for a
> sentence pair where there would be excessively many edits (these are
> usually wrong). These messages by themselves should not be a reason for
> worrying.
>
> The std::bad_alloc on the other hand is not good. It seems the scorer
> tries to allocate some huge piece of memory, probably some negative index
> somewhere, and then dies. I have not seen this before. Is it possible that
> your system is creating a lot of superfluous edits and the graph algorithm in
> M2 is going crazy due to that?
>
> *From: *Kelly Marchisio <kellymarchi...@gmail.com>
> *Sent: *Saturday, January 13, 2018 7:46 PM
> *To: *Marcin Junczys-Dowmunt <junc...@amu.edu.pl>; moses-support <moses-support@mit.edu>
> *Subject: *Re: [Moses-support] M2 Scorer in EMS for Grammatical Error Correction
>
> looping back in mailing-list and copying message :)
>
> Thanks so much for the response, Marcin!
>
> I did see your original repo, thanks for sending along.
> I'd love to get this going with EMS because it looks like I can just pass
> in the M2 scorer with:
>
> tuning-settings = "-mertdir $moses-bin-dir -mertargs='--sctype M2SCORER' -threads $cores"
>
> However it fails with:
>
> ERROR: Failed to run '/Users/kellymarchisio/L101Final/experiments/tuning/tmp.1/extractor.sh'. at /Users/kellymarchisio/L101Final/programs/mosesdecoder/scripts/training/mert-moses.pl line 1775.
> cp: /Users/kellymarchisio/L101Final/experiments/tuning/tmp.1/moses.ini: No such file or directory
>
> There may be an error with the mert-moses script itself used with M2,
> because moses.ini was never created within tmp.1
>
> Additionally, in extract.err, I see:
>
> Binary write mode is NOT selected
> Scorer type: M2SCORER
> name: case value: true
> Data::m_score_type M2Scorer
> Data::Scorer type from Scorer: M2Scorer
> loading nbest from run1.best100.out.gz
> Levenshtein distance is greater than source size.
> Levenshtein distance is greater than source size.
> extractor(67381,0x7fffde7dd3c0) malloc: *** mach_vm_map(size=3368542481395712) failed (error code=3)
> *** error: can't allocate region
> *** set a breakpoint in malloc_error_break to debug
> Exception: std::bad_alloc
>
> I'm curious if you've come across these issues (I'm interested why I'm
> seeing "Levenshtein distance is greater than source size.") and if you have
> any pointers for how I can get mert-moses.pl to work for me with M2Scorer.
>
> Best,
> Kelly
>
> On Fri, Jan 12, 2018 at 9:53 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:
>
> Hi,
>
> We never really used it with EMS, so I do not think anyone can help you
> here. Did you have a look at the original repo:
> https://github.com/grammatical/baselines-emnlp2016 ?
> Otherwise we can probably take this off-list and try to help you personally 😊
>
> *From: *Kelly Marchisio <kellymarchi...@gmail.com>
> *Sent: *Friday, January 12, 2018 6:20 PM
> *To: *moses-support <moses-support@mit.edu>
> *Subject: *[Moses-support] M2 Scorer in EMS for Grammatical Error Correction
>
> Does anyone have experience using the M2 scorer for grammatical error
> correction with EMS for tuning and evaluation? Junczys-Dowmunt &
> Grundkiewicz (2016) use M2
> (https://github.com/grammatical/baselines-emnlp2016/tree/c4fbcc09b45a46c7c46bdda2ba10484fa16e8f82),
> but I see no examples of using it with EMS.
>
> Does anyone have experience or advice on how I can use the M2 scorer for
> GEC in my project? I'm having trouble figuring out how to incorporate it
> without an example (for instance, how best to set up experiment.meta and
> the config file to incorporate it).
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support