Re: [Moses-support] Build Moses for translating English to Chinese.

2010-02-13 Thread Achim Ruopp
You will need a Chinese word segmenter to prepare the data for
training/decoding. There are several available (listed in no particular
order):
http://code.google.com/p/zhseg/
http://nlp.stanford.edu/software/segmenter.shtml
http://projects.ldc.upenn.edu/Chinese/LDC_ch.htm#cseg
I haven't tried any of them and I believe most of them are for the
Simplified Chinese script.
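
For anyone wondering what a segmenter does to the data, here is a minimal Python sketch of greedy forward maximum matching. It is only an illustration of the idea, not how any of the tools listed above work internally. The lexicon, the `max_len` limit, and the function names are all made up for this example. The output is space-separated text, which is the form the training scripts expect.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest lexicon entry that matches;
    fall back to a single character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy lexicon, invented for this example only.
LEXICON = {"我们", "喜欢", "机器", "翻译", "机器翻译"}


def to_moses_line(text, lexicon=LEXICON):
    """Return the segmented sentence as one space-separated line."""
    return " ".join(segment(text, lexicon))


if __name__ == "__main__":
    print(to_moses_line("我们喜欢机器翻译"))  # 我们 喜欢 机器翻译
```

A real segmenter uses a much larger lexicon or a statistical model, so treat this only as a way to see what "segmented" input looks like.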

On Fri, Feb 12, 2010 at 11:10 PM, nati g  wrote:

> Hello,
>
> Has anyone tried setting up Moses for translating English --> Chinese? Please
> share any information or scripts that can be used, other than those provided
> in the step-by-step guide.
>
> Thanks in advance.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Build Moses for translating English to Chinese.

2010-02-12 Thread nati g
Hello,

Has anyone tried setting up Moses for translating English --> Chinese? Please
share any information or scripts that can be used, other than those provided
in the step-by-step guide.

Thanks in advance.

On Fri, Feb 12, 2010 at 7:15 PM, Christine de Bond  wrote:

> You might ask the people on the Moses list whether anyone has done
> English-Chinese translation / alignment and got reasonable output. They
> might give you some more hints!
>
> By the way, how big is your parallel corpus?
> Another idea might be to check whether factored translation models are of any
> help to you (I'm thinking of alignment and reordering factors here, but I'm
> not sure whether this is appropriate for Chinese...)
>
> nati g wrote:
>
>> Hi Christine,
>> Thank you very much for the information.
>> I had already tried skipping these steps, but the translation quality is
>> too poor.
>> Unlike European languages, double-byte languages like Chinese, Korean,
>> and Japanese have a different syntax. For example, the translation of an
>> English string of a few words may be a single character. I guess that
>> because of these kinds of syntactic dissimilarities we are not getting a
>> good translation model after training.
>> Thank you very much.
>>
>> On Thu, Feb 11, 2010 at 7:46 PM, Christine de Bond <deb...@gmx.net> wrote:
>>
>>> Hi,
>>> I don't know much about Chinese, but there is no lowercase in
>>> Chinese, right?
>>> You can skip the lowercasing step if there are no
>>> capital/lowercase letters in Chinese.
>>>
>>> As for tokenizing, it is best to have a look at the Perl script to
>>> see what it's doing. You should make sure that punctuation (if
>>> there is any in Chinese) is not concatenated with words ( word. ->
>>> word . ). I think the Moses tokenizer script should work well for
>>> your corpus, as long as there is no special issue with Chinese
>>> punctuation.
>>> (So far I've used it with Latin and Persian character sets.)
>>>
>>> It is best to try out the tokenizer.perl script on some test
>>> sentences to see what it does to your input.
>>>
>>> Christine
>>>
>>> nati g wrote:
>>>
>>>> Hi,
>>>> Thank you very much for the reply.
>>>> I have concerns about the tokenizer, lowercasing, and sort
>>>> scripts used while training the translation model from the corpus.
>>>> Won't these have an effect on the language we are going to use?


Re: [Moses-support] Build Moses for translating English to Chinese.

2010-02-11 Thread Miles Osborne
How words are tokenised / segmented etc is crucial when using "small"
amounts of data.  For the vast numbers of people using Moses (people
not training-up on millions of sentence pairs) this is the kind of
thing that needs to be done correctly.

It would be a service to extend the Moses tokeniser to deal with
languages other than just those ones you mentioned before.

Miles

On 11 February 2010 17:51, Christof Pintaske  wrote:
> Hi,
>
> you may want to have a closer look at tokenizer.perl which is used for
> word-breaking. It seems there is some special logic to handle English,
> French, and Italian but nothing much else.
>
> I'm not sure if you can or plan to reveal your findings here on the list
> but at any rate I'd be very interested to learn how Chinese worked for you.
>
> best regards
> Christof
>





Re: [Moses-support] Build Moses for translating English to Chinese.

2010-02-11 Thread Christof Pintaske
Hi,

you may want to have a closer look at tokenizer.perl which is used for 
word-breaking. It seems there is some special logic to handle English, 
French, and Italian but nothing much else.

I'm not sure if you can or plan to reveal your findings here on the list 
but at any rate I'd be very interested to learn how Chinese worked for you.

best regards
Christof
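
To make the word-breaking step concrete, here is a minimal Python sketch that separates punctuation from the neighbouring words ( word. -> word . ), including a few full-width marks used in Chinese text. This is not the logic of tokenizer.perl; the punctuation set and the function name are assumptions chosen for illustration, and the rule is deliberately naive (it would also split a decimal number such as 3.14).

```python
import re

# ASCII punctuation plus a few common full-width CJK marks.
# This is an illustrative subset for the sketch, not an exhaustive list.
PUNCT = r'[.,!?;:()"。，！？；：、（）「」《》]'


def split_punct(line):
    """Put spaces around each punctuation mark, then collapse whitespace."""
    spaced = re.sub(f"({PUNCT})", r" \1 ", line)
    return " ".join(spaced.split())


if __name__ == "__main__":
    print(split_punct("word."))          # word .
    print(split_punct("你好，世界。"))    # 你好 ， 世界 。
    # Lowercasing leaves Chinese characters unchanged.
    print("中文".lower() == "中文")       # True
```

Note that this only detaches punctuation. It does not split Chinese text into words, so a segmenter is still needed for that.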

nati g wrote:
> Hello,
> Do we need any special scripts to build Moses for translating English
> to Chinese?
>
> Thanks in advance.


Re: [Moses-support] Build Moses for translating English to Chinese.

2010-02-11 Thread Christine de Bond
Hi,
Moses is language-independent. No adaptation is needed.
It is best to follow the "Step-by-Step Guide" on the Moses website to get
started.

Regards,
Christine

nati g wrote:
> Hello,
> Do we need any special scripts to build Moses for translating English
> to Chinese?
>
> Thanks in advance.


[Moses-support] Build Moses for translating English to Chinese.

2010-02-10 Thread nati g
Hello,
Do we need any special scripts to build Moses for translating English to
Chinese?

thanks in advance.