Re: [Moses-support] europarl v9 - released or not? Can be used?
Incidentally, you can split the fields more simply using the “unpaste” command:

    cat file_de-en.tsv | unpaste file.{de,en}

Unpaste is available here: https://github.com/mjpost/bin/blob/master/unpaste

matt (from my phone)

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] europarl v9 - released or not? Can be used?
Hi,

v9 is mainly for other languages - it is slightly bigger than earlier versions for languages where multiple versions exist.

-phi
Re: [Moses-support] europarl v9 - released or not? Can be used?
Found how to split the fields in the tab-separated de-en sentences. In case someone needs it, do it with cut -f 1 or 2:

    cat file_de-en.tsv | cut -f 1 > file.de
    cat file_de-en.tsv | cut -f 2 > file.en

So the only remaining question is whether europarl v9 is better than v8 or v7.
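For anyone who prefers the Python script asked about earlier in the thread, here is a minimal sketch of the same split. It assumes the file is UTF-8 with one tab-separated sentence pair per line; the function name split_tsv and its parameter names are made up for this illustration and are not part of Moses or the Europarl release. A line without a tab gets an empty second field, so both output files keep the same number of lines (cut would instead copy such a line into both outputs).

```python
def split_tsv(tsv_path, de_path, en_path):
    """Split a two-column TSV into two line-aligned files; return the line count.

    Hypothetical helper: assumes UTF-8 input, column 1 = German, column 2 = English.
    """
    count = 0
    # newline="\n" keeps Python from treating a stray "\r" inside a sentence
    # as a line break, which would silently misalign the corpus.
    with open(tsv_path, encoding="utf-8", newline="\n") as tsv, \
         open(de_path, "w", encoding="utf-8", newline="\n") as de, \
         open(en_path, "w", encoding="utf-8", newline="\n") as en:
        for line in tsv:
            fields = line.rstrip("\n").split("\t")
            de.write(fields[0] + "\n")
            # Write an empty line if the second column is missing,
            # so the two files stay parallel.
            en.write((fields[1] if len(fields) > 1 else "") + "\n")
            count += 1
    return count
```

Calling split_tsv("file_de-en.tsv", "file.de", "file.en") would then produce the same two files as the cut commands above for well-formed input.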
Re: [Moses-support] europarl v9 - released or not? Can be used?
Hello,

thank you very much for your reply. My goal is to rebuild the translation memory for the de-en pair while keeping truecase in the German phrase table. In the models released with 4.0 for de-en everything is lowercased, which makes it impossible to distinguish between e.g. a noun (das Wissen) and a verb (zu wissen), or between sie (she) and Sie (you).

I see the file extension is tsv, unlike v7: it is a tab-separated de-en text file, so I need to split it into two. What would be the best way? Is there a Python script for it?

Is v9 better than v8 and v7?

Thanks!
Artem Shevchenko
Re: [Moses-support] europarl v9 - released or not? Can be used?
Hi,

you are free to use this data - v9 has only been generated for some language pairs, since the amount of translated text has not increased significantly for a few years now.

-phi
[Moses-support] europarl v9 - released or not? Can be used?
Hello,

I have found this: http://www.statmt.org/europarl/v9/ dated 2019-02. Does it contain parallel corpus v9?

However, there is no mention of v9 elsewhere. Is it released? Can it be used?

Thank you!
Artem Shevchenko