Hi,

after some more thinking about this, I'd relabel your proposal as a
regular bug report, asking for this particular minor fix:

      Whenever moses expects a single factor only (based on the
      configuration) in input/ttable/generation-table/..., no split
      should be done at all.

Here are the details in your three bullet style wording:

- default is non-factored input
   (or rather: if "input factors" is set to "0" only, the pipe has no
   special meaning)
   There is still an open issue with phrase/generation/reordering
   tables/suffix arrays/whatever. My suggestion is (without having looked
   at the code) that whenever the given table refers to a single
   factor only according to its moses.ini line, no split should be
   performed at all => no pipe could do any harm.

- definitely keep --factorDelimiter (but make it clear whether or not
   it also applies to the phrase, generation and reordering
   tables)

- keep the regular ASCII '|' as the default
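
To illustrate the rule from the first bullet, here is a minimal Python
sketch. It is not Moses code: the function name split_factors and its
parameters are made up for illustration, and I have not checked how the
decoder actually performs the split. The only point is that the
delimiter is never looked at when a single factor is configured.

```python
def split_factors(token, num_factors, delimiter="|"):
    """Split a token into factors only if more than one factor is configured.

    Hypothetical helper for illustration; not part of the Moses API.
    """
    if num_factors <= 1:
        # Single-factor setup: the delimiter has no special meaning,
        # so the token is returned untouched, pipes and all.
        return [token]

    factors = token.split(delimiter)
    if len(factors) != num_factors:
        # In a factored setup a wrong factor count is a real input error.
        raise ValueError(
            "expected %d factors in %r, found %d"
            % (num_factors, token, len(factors))
        )
    return factors
```

With num_factors=1 a token such as "a|b" stays a single word, whereas
with num_factors=2 the token "dog|NN" is split into "dog" and "NN". The
same check would apply per table, based on what its moses.ini line says.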

Cheers, O.


On 11/15/2010 10:51 PM, Lane Schwartz wrote:
> I agree. How's this proposal:
> * Default is non-factored input
> * When using factors, have the optional flag --factorDelimiter to allow
> user-specified character for factor delimiter (thanks, Chris :)
> * When using factors, use a default delimiter char of Unicode character
> 2759, MEDIUM VERTICAL BAR, if none is specified by the user flag
>
> On Mon, Nov 15, 2010 at 4:37 PM, Miles Osborne <mi...@inf.ed.ac.uk
> <mailto:mi...@inf.ed.ac.uk>> wrote:
>
>     i second this.
>
>     but can I make another suggestion.  make the default be *non* factored
>     input.  i reckon that most people using Moses don't actually use
>     factors (hands-up if you do).
>     this means, plain input, with absolutely no meta chars in them.
>
>     and if you are going to use meta-chars, why not just have a flag
>     such as:
>
>     --factorDelimiter=|
>
>     etc.
>
>     Miles
>
>     On 15 November 2010 21:30, Hieu Hoang <hieuho...@gmail.com
>     <mailto:hieuho...@gmail.com>> wrote:
>      > That's a good idea. In the decoder, there are 4 places that have
>      > to be changed because the delimiter is hardcoded:
>      >   ConfusionNet
>      >   GenerationDictionary
>      >   LanguageModelJoint
>      >   Word::createFromString
>      >
>      > However, the train-model.perl is more difficult to change
>      >
>      > Hieu
>      > Sent from my flying horse
>      >
>      > On 15 Nov 2010, at 09:00 PM, Lane Schwartz <dowob...@gmail.com
>     <mailto:dowob...@gmail.com>> wrote:
>      >
>      >> I'd like to propose changing the current factor delimiter to
>     something other than the single vertical bar |
>      >>
>      >> Looking through the mailing archives, it seems that the failure
>     to properly purge your corpus of vertical bars is a frequent source
>     of headaches for users. I know I've encountered this problem before,
>     but even knowing that I should do this, just today I had to track
>     down another vertical bar-related problem.
>      >>
>      >> I don't really care what the replacement character(s) ends up
>     being, just so that any corpus munging related to this delimiter
>     gets handled internally by moses rather than being the user's
>     responsibility.
>      >>
>      >> If moses could easily be modified to take a multi-character
>     delimiter, that would probably be best. My suggestion for a
>     single-character delimiter would be something with the following
>     characteristics:
>      >>
>      >> * Character should be printable (ie not a control character)
>      >> * Character should be one that's implemented in most commonly
>     used fonts
>      >> * Character should be highly obscure, and extremely unlikely to
>     appear in a corpus
>      >> * Character should not be confusable with any commonly used
>     character.
>      >>
>      >> Many characters in the Dingbats section of Unicode (block 2700)
>     would fit these desiderata.
>      >>
>      >> I suggest Unicode character 2759, MEDIUM VERTICAL BAR. This is a
>     highly obscure printable character that looks like a thick vertical
>     bar. It's obviously a vertical bar, but just as obviously not the
>     same thing as the regular vertical bar |.
>      >>
>      >> Cheers,
>      >> Lane
>      >> _______________________________________________
>      >> Moses-support mailing list
>      >> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
>      >> http://mailman.mit.edu/mailman/listinfo/moses-support
>      >
>      >
>
>
>
>     --
>     The University of Edinburgh is a charitable body, registered in
>     Scotland, with registration number SC005336.
>
>
>
>
> --
> When a place gets crowded enough to require ID's, social collapse is not
> far away.  It is time to go elsewhere.  The best thing about space travel
> is that it made it possible to go elsewhere.
>                  -- R.A. Heinlein, "Time Enough For Love"
>
>
>

-- 
Ondrej Bojar (mailto:o...@cuni.cz / bo...@ufal.mff.cuni.cz)
http://www.cuni.cz/~obo