Ah, good to know that the scorer was called successfully and that I can
ignore the Levenshtein distance errors.

As for allocating a huge piece of memory: I realized that, although my
parallel corpus is aligned, I actually split the original corpus by
*paragraph* instead of by sentence. The paragraphs are mostly short (at most
~4 sentences each, probably 80-100 tokens at most), but there are some
outliers (the largest being ~250-300 tokens). Most paragraphs would need only
a few edits, but the largest might need 10-15 or more. Could this be causing
the problem?
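To check whether the outliers are the culprit, one option is to list the tuning segments above a length threshold and rerun tuning without them (or re-split them into sentences). Below is a minimal Python sketch, assuming one whitespace-tokenized segment per line. The helper name `long_segments`, the script name and the 100-token cutoff are illustrative choices, not part of Moses or EMS.

```python
"""Hypothetical helper: report tuning segments longer than a token threshold."""
import sys
from typing import Iterable, List, Tuple


def long_segments(lines: Iterable[str], max_tokens: int = 100) -> List[Tuple[int, int]]:
    """Return (1-based line number, token count) for each line over max_tokens.

    Tokens are taken to be whitespace-separated, as in tokenized Moses input.
    """
    report = []
    for lineno, line in enumerate(lines, start=1):
        n_tokens = len(line.split())
        if n_tokens > max_tokens:
            report.append((lineno, n_tokens))
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python long_segments.py tune.src [max_tokens]
    path = sys.argv[1]
    limit = int(sys.argv[2]) if len(sys.argv) > 2 else 100
    with open(path, encoding="utf-8") as handle:
        for lineno, n_tokens in long_segments(handle, limit):
            print(f"{path}:{lineno}\t{n_tokens} tokens")
```

If tuning succeeds once the flagged segments are removed (keeping source and reference aligned), segment length is the likely trigger.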

On Sat, Jan 13, 2018 at 11:08 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:

> There seem to be multiple issues here.
>
> As I said, I have no experience with EMS, so maybe someone else can help
> with that.
>
> The message in extract.err actually seems to mean that you were
> successful in calling the M2 scorer in EMS; the only problem is that it dies 😊
> The Levenshtein message is part of a failsafe that is meant to avoid
> exponentially long searches. It does not calculate the M2 metric for a
> sentence pair where there would be excessively many edits (these are
> usually wrong). These messages by themselves should not be a reason to
> worry.
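The failsafe described above can be illustrated with a token-level Levenshtein distance that skips a pair when the distance exceeds the source length. This is a minimal Python sketch of the idea suggested by the message text, assuming the check compares the edit distance against the number of source tokens. It is not the scorer's actual code, and `should_score` is a hypothetical name.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,                     # delete tok_a
                cur[j - 1] + 1,                  # insert tok_b
                prev[j - 1] + (tok_a != tok_b),  # substitute, or match for free
            )
        prev = cur
    return prev[-1]


def should_score(source: str, hypothesis: str) -> bool:
    """Hypothetical failsafe: skip pairs needing more edits than there are source tokens."""
    src, hyp = source.split(), hypothesis.split()
    return levenshtein(src, hyp) <= len(src)


if __name__ == "__main__":
    print(should_score("He go to school .", "He goes to school ."))  # True: 1 edit
    print(should_score("Hi", "Hello there , how are you ?"))         # False: 6 edits > 1 token
```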
>
> The std::bad_alloc, on the other hand, is not good. It seems the scorer
> tries to allocate some huge piece of memory (probably because of a negative
> index somewhere) and then dies. I have not seen this before. Is it possible
> that your system is creating a lot of superfluous edits and the graph
> algorithm in M2 is going crazy due to that?
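For scale, the failed request in extract.err (`mach_vm_map(size=3368542481395712)`) is roughly 3 PiB. The sketch below does that arithmetic and shows, in general terms, how a negative value reinterpreted as an unsigned 64-bit size (C++ `size_t` on a 64-bit build) wraps to an enormous number. It only illustrates the mechanism suspected above; it does not claim that this exact size came from a wrapped index, and `as_uint64` is a hypothetical helper.

```python
PIB = 2 ** 50  # bytes in one pebibyte

# Size of the failed allocation reported by malloc in extract.err.
requested = 3_368_542_481_395_712


def as_uint64(value: int) -> int:
    """Reinterpret a Python int as an unsigned 64-bit integer (two's complement wraparound)."""
    return value % 2 ** 64


if __name__ == "__main__":
    print(f"requested: {requested / PIB:.2f} PiB")  # requested: 2.99 PiB
    # A negative count or index converted to size_t wraps around:
    print(as_uint64(-1))   # 18446744073709551615
    print(as_uint64(-42))  # 18446744073709551574
```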
>
> *From: *Kelly Marchisio <kellymarchi...@gmail.com>
> *Sent: *Saturday, January 13, 2018 7:46 PM
> *To: *Marcin Junczys-Dowmunt <junc...@amu.edu.pl>; moses-support
> <moses-support@mit.edu>
> *Subject: *Re: [Moses-support] M2 Scorer in EMS for Grammatical Error
> Correction
>
> Looping the mailing list back in and copying my message :)
>
> Thanks so much for the response, Marcin!
>
> I did see your original repo, thanks for sending it along. I'd love to get
> this going with EMS because it looks like I can just pass in the M2 scorer
> with:
>
> tuning-settings = "-mertdir $moses-bin-dir -mertargs='--sctype M2SCORER' -threads $cores"
>
> However, it fails with:
>
> ERROR: Failed to run '/Users/kellymarchisio/L101Final/experiments/tuning/tmp.1/extractor.sh'. at /Users/kellymarchisio/L101Final/programs/mosesdecoder/scripts/training/mert-moses.pl line 1775.
> cp: /Users/kellymarchisio/L101Final/experiments/tuning/tmp.1/moses.ini: No such file or directory
>
> There may be an error in the mert-moses script itself when used with M2,
> because moses.ini was never created within tmp.1.
>
> Additionally, in extract.err, I see:
>
> Binary write mode is NOT selected
> Scorer type: M2SCORER
> name: case value: true
> Data::m_score_type M2Scorer
> Data::Scorer type from Scorer: M2Scorer
> loading nbest from run1.best100.out.gz
> Levenshtein distance is greater than source size.
> Levenshtein distance is greater than source size.
> extractor(67381,0x7fffde7dd3c0) malloc: *** mach_vm_map(size=3368542481395712) failed (error code=3)
> *** error: can't allocate region
> *** set a breakpoint in malloc_error_break to debug
> Exception: std::bad_alloc
>
> I'm curious whether you've come across these issues (in particular, why I'm
> seeing "Levenshtein distance is greater than source size.") and whether you
> have any pointers for getting mert-moses.pl to work with M2Scorer.
>
> Best,
> Kelly
>
> On Sat, Jan 13, 2018 at 9:13 PM, Kelly Marchisio <kellymarchi...@gmail.com>
> wrote:
>
>
> On Fri, Jan 12, 2018 at 9:53 PM, Marcin Junczys-Dowmunt <
> junc...@amu.edu.pl> wrote:
>
> Hi,
>
> We never really used it with EMS, so I do not think anyone can help you
> here. Did you have a look at the original repo:
> https://github.com/grammatical/baselines-emnlp2016 ? Otherwise we can
> probably take this off-list and try to help you personally 😊
>
> *From: *Kelly Marchisio <kellymarchi...@gmail.com>
> *Sent: *Friday, January 12, 2018 6:20 PM
> *To: *moses-support <moses-support@mit.edu>
> *Subject: *[Moses-support] M2 Scorer in EMS for Grammatical Error
> Correction
>
> Does anyone have experience using the M2 scorer for grammatical error
> correction with EMS for tuning and evaluation? Junczys-Dowmunt &
> Grundkiewicz (2016) use M2
> (https://github.com/grammatical/baselines-emnlp2016/tree/c4fbcc09b45a46c7c46bdda2ba10484fa16e8f82),
> but I see no examples of using it with EMS.
>
> Does anyone have experience or advice on how I can use the M2 scorer for
> GEC in my project? I'm having trouble figuring out how to incorporate it
> without an example (for instance, how best to set up experiment.meta and the
> config file to incorporate it).
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support