Hi, the fix was checked in a few hours ago.
-phi

On Tue, Aug 5, 2014 at 2:35 PM, Judah Schvimer <judah.schvi...@mongodb.com> wrote:

> Hi,
>
> I've been playing around with this and I noticed that the protected flag
> only "protects" the first example of a regex in a line. Is there any way to
> fix this so that it protects every occurrence?
>
> Thanks,
> Judah
>
>
> On Thu, Jul 31, 2014 at 9:32 AM, Philipp Koehn <pko...@inf.ed.ac.uk>
> wrote:
>
>> Hi,
>>
>> -no-escape turns off this:
>>
>> if (!$NO_ESCAPING)
>> {
>>     $text =~ s/\&/\&amp;/g;   # escape escape
>>     $text =~ s/\|/\&#124;/g;  # factor separator
>>     $text =~ s/\</\&lt;/g;    # xml
>>     $text =~ s/\>/\&gt;/g;    # xml
>>     $text =~ s/\'/\&apos;/g;  # xml
>>     $text =~ s/\"/\&quot;/g;  # xml
>>     $text =~ s/\[/\&#91;/g;   # syntax non-terminal
>>     $text =~ s/\]/\&#93;/g;   # syntax non-terminal
>> }
>>
>> Especially not escaping the "|" will cause trouble.
>>
>> So, you should not turn this off -- it is completely reversible by the
>> detokenizer anyway.
>>
>> -phi
>>
>>
>> On Thu, Jul 31, 2014 at 9:09 AM, Judah Schvimer <
>> judah.schvi...@mongodb.com> wrote:
>>
>>> Thanks, that makes sense. One more question. If I use the -no-escape
>>> flag will that cause any problems to moses, or does that still escape the
>>> special characters that break moses?
>>>
>>> Judah
>>>
>>>
>>> On Thu, Jul 31, 2014 at 8:52 AM, Philipp Koehn <pko...@inf.ed.ac.uk>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> this is done deliberately:
>>>>
>>>> # turn `into '
>>>> $text =~ s/\`/\'/g;
>>>>
>>>> #turn '' into "
>>>> $text =~ s/\'\'/ \" /g;
>>>>
>>>> The motivation is to normalize corpora who used more ``creative'' ways
>>>> of quoting. You may want to remove these lines from the tokenizer or
>>>> create a switch for the script to optionally turn it off.
>>>>
>>>> -phi
>>>>
>>>>
>>>> On Wed, Jul 30, 2014 at 5:38 PM, Judah Schvimer <
>>>> judah.schvi...@mongodb.com> wrote:
>>>>
>>>>> It seems that back ticks(`) are being tokenized to apostrophes(') so
>>>>> when they get detokenized they show up as an apostrophe and not a
>>>>> backtick.
>>>>> Additionally, "-no-escape" seems to turn backticks into apostrophes as
>>>>> well. I think this is a bug in the tokenizer. Let me know if you think
>>>>> I'm
>>>>> doing something wrong.
>>>>>
>>>>> Thanks,
>>>>> Judah
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> Moses-support@mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
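For anyone reading this thread later: the point that the escaping is reversible can be illustrated with a small round trip. This is only a sketch, not the code from tokenizer.perl or detokenizer.perl. The sub names escape_special and unescape_special are made up for illustration, and it assumes the replacement strings are the usual XML-style entities (&amp;, &#124;, &lt;, &gt;, &apos;, &quot;, &#91;, &#93;).

```perl
use strict;
use warnings;

# Hypothetical helper (not the tokenizer's own sub): escape the characters
# that are special to Moses. "&" has to be handled first, otherwise the "&"
# inside the entities produced by the later lines would be escaped again.
sub escape_special {
    my ($text) = @_;
    $text =~ s/\&/\&amp;/g;   # escape escape
    $text =~ s/\|/\&#124;/g;  # factor separator
    $text =~ s/\</\&lt;/g;    # xml
    $text =~ s/\>/\&gt;/g;    # xml
    $text =~ s/\'/\&apos;/g;  # xml
    $text =~ s/\"/\&quot;/g;  # xml
    $text =~ s/\[/\&#91;/g;   # syntax non-terminal
    $text =~ s/\]/\&#93;/g;   # syntax non-terminal
    return $text;
}

# Hypothetical inverse: undo the substitutions above. "&amp;" is restored
# last so that a literal "&lt;" in the input is not turned into "<".
sub unescape_special {
    my ($text) = @_;
    $text =~ s/\&#124;/\|/g;
    $text =~ s/\&lt;/\</g;
    $text =~ s/\&gt;/\>/g;
    $text =~ s/\&apos;/\'/g;
    $text =~ s/\&quot;/\"/g;
    $text =~ s/\&#91;/\[/g;
    $text =~ s/\&#93;/\]/g;
    $text =~ s/\&amp;/\&/g;
    return $text;
}

my $line    = q{a|b & "c" <d> [e] it's};
my $escaped = escape_special($line);
print "$escaped\n";
print unescape_special($escaped), "\n";
```

The ordering is the only subtle part: "&" is escaped first and restored last, which is what keeps the mapping reversible even when the input already contains entity-like text.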
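On the question of protecting every occurrence of a pattern in a line: one way to get that behaviour is a global substitution that swaps each match for a numbered placeholder and restores it afterwards. The sketch below is not the fix that was checked in. The sub names protect_all and restore_all and the PROTECTED placeholder are hypothetical, and it ignores corner cases such as a later pattern matching inside an earlier placeholder.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every match of each pattern with a numbered
# placeholder and remember the original strings in order.
sub protect_all {
    my ($text, @patterns) = @_;
    my @saved;
    for my $pattern (@patterns) {
        # /g applies the substitution to every occurrence in the line,
        # /e evaluates the replacement as code so each match gets its own index.
        $text =~ s{($pattern)}{
            push @saved, $1;
            sprintf("PROTECTED%03d", $#saved);
        }ge;
    }
    return ($text, \@saved);
}

# Hypothetical inverse: put the saved strings back in place of the placeholders.
sub restore_all {
    my ($text, $saved) = @_;
    $text =~ s/PROTECTED(\d+)/$saved->[$1]/g;
    return $text;
}

my ($masked, $saved) = protect_all('see http://a.b and http://c.d now', 'http://\S+');
print "$masked\n";
print restore_all($masked, $saved), "\n";
```

Without the /g modifier the substitution would stop after the first match, which would give the behaviour described in the question.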