Re: [Moses-support] europarl v9 - released or not? Can be used?

2020-03-30 Thread Matt Post
Incidentally, you can split the fields more simply using the “unpaste” command:

cat file_de-en.tsv | unpaste file.{de,en}

Unpaste is available here:

https://github.com/mjpost/bin/blob/master/unpaste

matt (from my phone)

> On 30 March 2020 at 21:01, Artem Shevchenko wrote:
> 
> 
> I found out how to split the fields of the tab-separated de-en sentences.
> In case someone needs it, use cut -f 1 and cut -f 2:
> cat file_de-en.tsv |  cut -f 1 > file.de
> cat file_de-en.tsv |  cut -f 2 > file.en
> 
> So the only remaining question is whether Europarl v9 is better than v8 or v7.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] europarl v9 - released or not? Can be used?

2020-03-30 Thread Philipp Koehn
Hi,

v9 is mainly for other languages - it is slightly bigger than earlier
versions for languages where multiple versions exist.

-phi

On Mon, Mar 30, 2020 at 9:01 PM Artem Shevchenko wrote:

> I found out how to split the fields of the tab-separated de-en sentences.
> In case someone needs it, use cut -f 1 and cut -f 2:
> cat file_de-en.tsv |  cut -f 1 > file.de
> cat file_de-en.tsv |  cut -f 2 > file.en
>
> So the only remaining question is whether Europarl v9 is better than v8 or v7.


Re: [Moses-support] europarl v9 - released or not? Can be used?

2020-03-30 Thread Artem Shevchenko
I found out how to split the fields of the tab-separated de-en sentences.
In case someone needs it, use cut -f 1 and cut -f 2:
cat file_de-en.tsv |  cut -f 1 > file.de
cat file_de-en.tsv |  cut -f 2 > file.en
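
For anyone who prefers a script, the same split can be sketched in Python. This is only a sketch under a few assumptions: the input is UTF-8, German is in the first column and English in the second, and the function name split_tsv is made up for illustration. Unlike the cut commands above, it skips lines that do not have exactly two non-empty fields, so that file.de and file.en stay line-aligned.

```python
def split_tsv(tsv_path, src_path, tgt_path):
    """Write column 1 of a two-column TSV to src_path and column 2 to tgt_path.

    Lines without exactly two non-empty tab-separated fields are skipped,
    which keeps the two output files line-aligned.
    Returns a (kept, skipped) pair of line counts.
    """
    kept = skipped = 0
    # newline="\n" makes "\n" the only line terminator on input, so a stray
    # "\r" inside a sentence does not start a new line.
    with open(tsv_path, encoding="utf-8", newline="\n") as tsv, \
         open(src_path, "w", encoding="utf-8", newline="\n") as src, \
         open(tgt_path, "w", encoding="utf-8", newline="\n") as tgt:
        for line in tsv:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) != 2 or not all(f.strip() for f in fields):
                skipped += 1
                continue
            src.write(fields[0] + "\n")
            tgt.write(fields[1] + "\n")
            kept += 1
    return kept, skipped


# Hypothetical usage, with the file names from the cut commands above:
# kept, skipped = split_tsv("file_de-en.tsv", "file.de", "file.en")
```

The returned counts make it easy to see how many lines were dropped. Whether dropping them is the right choice depends on what you do next; if you need every line kept, the cut commands are the simpler option.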

So the only remaining question is whether Europarl v9 is better than v8 or v7.

On Tue, 31 Mar 2020 at 02:21, Artem Shevchenko wrote:

> Hello,
>
> Thank you very much for your reply.
> My goal is to rebuild the translation memory for the de-en pair while keeping
> true case in the German phrase table.
> In the models released with 4.0 for de-en, everything is lowercased, which
> makes it impossible to distinguish between, e.g., a noun (das Wissen) and a
> verb (zu wissen), or between sie (she) and Sie (you).
> I notice that the file extension is .tsv, unlike v7: it is a
> tab-separated de-en text file.
> So I need to split it into two files.
> What would be the best way? Is there a Python script for it?
>
> Is v9 better than v8 and v7?
>
> Thanks!
> Artem Shevchenko


Re: [Moses-support] europarl v9 - released or not? Can be used?

2020-03-30 Thread Artem Shevchenko
Hello,

Thank you very much for your reply.
My goal is to rebuild the translation memory for the de-en pair while keeping
true case in the German phrase table.
In the models released with 4.0 for de-en, everything is lowercased, which
makes it impossible to distinguish between, e.g., a noun (das Wissen) and a
verb (zu wissen), or between sie (she) and Sie (you).
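
To make the casing problem concrete, here is a small Python sketch that lists the word forms which collapse into one entry once a text is lowercased. The function name case_collisions and the sample sentence are invented for illustration and are not taken from the released models.

```python
from collections import defaultdict


def case_collisions(tokens):
    """Group distinct surface forms that become identical once lowercased."""
    forms = defaultdict(set)
    for tok in tokens:
        forms[tok.lower()].add(tok)
    # Keep only the lowercased keys that hide more than one original form.
    return {low: sorted(variants) for low, variants in forms.items()
            if len(variants) > 1}


tokens = "das Wissen ist Macht , wir wissen es , Sie und sie".split()
print(case_collisions(tokens))
# prints {'wissen': ['Wissen', 'wissen'], 'sie': ['Sie', 'sie']}
```

Each key in the result is a lowercased entry that can no longer tell its original forms apart.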
I notice that the file extension is .tsv, unlike v7: it is a tab-separated
de-en text file.
So I need to split it into two files.
What would be the best way? Is there a Python script for it?

Is v9 better than v8 and v7?

Thanks!
Artem Shevchenko



On Mon, 30 Mar 2020 at 21:50, Philipp Koehn wrote:

> Hi,
>
> You are free to use this data - v9 has only been generated for some
> language pairs, since the amount of translations has not increased
> significantly for a few years now.
>
> -phi


Re: [Moses-support] europarl v9 - released or not? Can be used?

2020-03-30 Thread Philipp Koehn
Hi,

You are free to use this data - v9 has only been generated for some
language pairs, since the amount of translations has not increased
significantly for a few years now.

-phi

On Mon, Mar 30, 2020 at 6:50 AM Artem Shevchenko wrote:

> Hello,
>
> I have found this:
> http://www.statmt.org/europarl/v9/ dated 2019-02
> Does it contain the v9 parallel corpus?
>
> However, there is no mention of v9 elsewhere.
> Has it been released?
> Can it be used?
>
> Thank you!
> Artem Shevchenko
>


[Moses-support] europarl v9 - released or not? Can be used?

2020-03-30 Thread Artem Shevchenko
Hello,

I have found this:
http://www.statmt.org/europarl/v9/ dated 2019-02
Does it contain the v9 parallel corpus?

However, there is no mention of v9 elsewhere.
Has it been released?
Can it be used?

Thank you!
Artem Shevchenko