Hi,

yes, this is what the RECASER section in EMS enables.
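
In case a concrete starting point helps, it looks roughly like this in the
config file (section and option names quoted from memory; please check the
example configs under scripts/ems/example/ rather than trusting me):

  [RECASING]
  # train the recaser on a tokenized, cased target-side corpus
  tokenized = [LM:europarl:tokenized-corpus]
  decoder = $moses-src-dir/bin/moses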

-phi

On Wed, May 20, 2015 at 2:50 PM, Lane Schwartz <dowob...@gmail.com> wrote:

>  Got it. So then, how was casing handled in the "mbr/mp" column? Was all
> of the data lowercased, then models trained, then recasing applied after
> decoding? Or something else?
>
> On Wed, May 20, 2015 at 1:30 PM, Philipp Koehn <p...@jhu.edu> wrote:
>
>> Hi,
>>
>>  no, the changes are made incrementally.
>>
>>  So the recased "baseline" is the previous "mbr/mp" column.
>>
>>  -phi
>>
>> On Wed, May 20, 2015 at 2:01 PM, Lane Schwartz <dowob...@gmail.com>
>> wrote:
>>
>>>  Philipp,
>>>
>>>  In Table 2 of the WMT 2009 paper, are the "baseline" and "truecased"
>>> columns directly comparable? In other words, do the two columns indicate
>>> identical conditions other than a single variable (how and/or when casing
>>> was handled)?
>>>
>>>  In the baseline condition, how and when was casing handled?
>>>
>>>  Thanks,
>>> Lane
>>>
>>>
>>> On Wed, May 20, 2015 at 12:43 PM, Philipp Koehn <p...@jhu.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>>  see Section 2.2 in our WMT 2009 submission:
>>>> http://www.statmt.org/wmt09/pdf/WMT-0929.pdf
>>>>
>>>>  One practical reason to avoid recasing is the need
>>>> for a second large cased language model.
>>>>
>>>>  But there is of course also the practical issue of
>>>> having a unique truecasing scheme for each data
>>>> condition, handling of headlines, all-caps emphasis,
>>>> etc.
>>>>
>>>>  It would be worth revisiting this issue under
>>>> different data conditions / language pairs. Both
>>>> options are readily available in EMS.
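>>>>
>>>>  For the truecasing route, the corresponding section in the example
>>>> configs looks something like this (again from memory, so verify
>>>> against scripts/ems/example/ in your checkout):
>>>>
>>>>   [TRUECASER]
>>>>   trainer = $moses-script-dir/recaser/train-truecaser.perl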
>>>>
>>>>  Each of the two alternative methods could be
>>>> improved as well. See for instance:
>>>> http://www.aclweb.org/anthology/N06-1001
>>>>
>>>>  -phi
>>>>
>>>>
>>>>  On Wed, May 20, 2015 at 12:31 PM, Lane Schwartz <dowob...@gmail.com>
>>>> wrote:
>>>>
>>>>>   Philipp (and others),
>>>>>
>>>>>  I'm wondering what people's experience is regarding when truecasing
>>>>> is applied.
>>>>>
>>>>>  One option is to truecase the training data, then train your TM and
>>>>> LM using that truecased data. Another option would be to lowercase the
>>>>> data, train TM and LM on the lowercased data, and then perform truecasing
>>>>> after decoding.
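>>>>>
>>>>>  Outside of EMS, I mean something like the following with the stock
>>>>> Moses scripts (script paths and flags from memory, so treat this as a
>>>>> sketch rather than a recipe):
>>>>>
>>>>>   # option 1: truecase the training data, decode, then detruecase
>>>>>   train-truecaser.perl --model truecase.model --corpus cased.train.en
>>>>>   truecase.perl --model truecase.model < cased.train.en > train.tc.en
>>>>>   # ... train TM/LM on the truecased data, tune, decode ...
>>>>>   detruecase.perl < output.tc.en > output.en
>>>>>
>>>>>   # option 2: lowercase everything, decode, then recase
>>>>>   lowercase.perl < cased.train.en > train.lc.en
>>>>>   # ... train TM/LM on the lowercased data, tune, decode ...
>>>>>   train-recaser.perl --dir recaser --corpus cased.train.en
>>>>>   recase.perl --in output.lc.en --model recaser/moses.ini --moses $MOSES/bin/moses > output.en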
>>>>>
>>>>>  I assume that the former gives better results, but the latter
>>>>> approach has an advantage in terms of extensibility (namely if you get
>>>>> more data and update your truecase model, you don't have to re-train
>>>>> all of your TMs and LMs).
>>>>>
>>>>>  Does anyone have any insights they would care to share on this?
>>>>>
>>>>>  Thanks,
>>>>> Lane
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>  --
>>> When a place gets crowded enough to require ID's, social collapse is not
>>> far away.  It is time to go elsewhere.  The best thing about space travel
>>> is that it made it possible to go elsewhere.
>>>                 -- R.A. Heinlein, "Time Enough For Love"
>>>
>>>
>>>
>>
>
>
>  --
> When a place gets crowded enough to require ID's, social collapse is not
> far away.  It is time to go elsewhere.  The best thing about space travel
> is that it made it possible to go elsewhere.
>                 -- R.A. Heinlein, "Time Enough For Love"
>
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
