Ok,

Thank you

Bests

Cyrine


2014-02-21 14:55 GMT+01:00 Thomas Meyer <ithurts...@gmail.com>:

> Hi,
>
> Ah, in that case it can actually cause problems: your training data should
> always be formatted in the same way as your dev/test data.
>
> Two possibilities:
>
> - re-tokenize the training data with the current tokenizer script to get
> the same mark-up (then retrain your system)
> - re-tokenize your dev/test data with the same (possibly older) tokenizer
> script as was used for your training data (then run tuning/decoding)
>
> HTH,
> Thomas
>
>
> On 21 February 2014 14:49, cyrine.na...@univ-lorraine.fr <
> cyrine.na...@gmail.com> wrote:
>
>> Thank you Thomas,
>>
>> So, if I keep the text with these special characters, it will not cause
>> problems? I ask because the training corpus does not contain these
>> characters; only the development and test corpora do.
>>
>> Thank you :)
>>
>> Bests
>>
>>
>> 2014-02-21 14:40 GMT+01:00 Thomas Meyer <ithurts...@gmail.com>:
>>
>>>
>>>
>>> Hi,
>>>
>>> That is not a 'problem' but XML entities
>>> <http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references>
>>> mark-up for special characters. You don't have to worry about this, as the
>>> tokenizer script does it for all characters in a consistent way.
>>>
>>> Best,
>>> Thomas
>>>
>>>
>>> On 21 February 2014 14:20, cyrine.na...@univ-lorraine.fr <
>>> cyrine.na...@gmail.com> wrote:
>>>
>>>>
>>>> Hello all,
>>>>
>>>> I have a problem with the tokenizer.pl script. As a result, I get
>>>> text with some special punctuation, like this for example:
>>>>
>>>> EU &apos;s Luxembourg-based statistical office reported
>>>>
>>>> The input file is a .txt file
>>>>
>>>> Is there any solution for this problem?
>>>>
>>>> Thank you in advance
>>>>
>>>>
>>>> Bests
>>>> --
>>>> *Cyrine*
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>>
>>>
>>
>>
>> --
>>
>> *Cyrine NASRI, Ph.D. Student in Computer Science*
>>
>
>
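For anyone hitting the same issue: the escaping Thomas describes, and the train vs. dev/test mismatch he warns about, can be sketched in Python as follows. This is a minimal sketch, not the tokenizer's own code. The character-to-entity table is an assumption based on what recent versions of the Moses tokenizer emit, so check it against your copy of the script. The helper names (escape_special, unescape_special, uses_escaping, same_markup) are hypothetical. The consistency check is only a heuristic: a training corpus that happens to contain no special characters at all will also look "unescaped".

```python
import re

# Assumed character -> entity table, mirroring what the Moses tokenizer
# appears to emit; verify against your own tokenizer version.
MOSES_ESCAPES = [
    ("&", "&amp;"),   # must come first so later entities are not double-escaped
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]

# Matches any of the entities in the table above.
ENTITY_RE = re.compile(r"&(?:amp|#124|lt|gt|apos|quot|#91|#93);")


def escape_special(text: str) -> str:
    """Replace special characters with XML entities, '&' first."""
    for char, entity in MOSES_ESCAPES:
        text = text.replace(char, entity)
    return text


def unescape_special(text: str) -> str:
    """Undo escape_special; '&amp;' is restored last so no new entities appear."""
    for char, entity in reversed(MOSES_ESCAPES):
        text = text.replace(entity, char)
    return text


def uses_escaping(lines) -> bool:
    """Heuristic: True if any line contains one of the entities above."""
    return any(ENTITY_RE.search(line) for line in lines)


def same_markup(train_lines, dev_lines) -> bool:
    """Rough check that training and dev/test data share the same mark-up."""
    return uses_escaping(train_lines) == uses_escaping(dev_lines)


if __name__ == "__main__":
    tokenized = "EU 's Luxembourg-based statistical office reported"
    print(escape_special(tokenized))
    print(same_markup(["EU 's office"], [escape_special(tokenized)]))
```

Following Thomas's advice, the real fix is still to run the same tokenizer over training, dev and test data; a check like same_markup only helps to spot the mismatch before retraining or tuning.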

