Hello Sir, Great effort. Could you please let me know the major difference between comparable and parallel corpora?
On Thu, Nov 27, 2014 at 1:51 PM, Peter Kolb <pek...@gmail.com> wrote:

> Dear colleagues,
>
> we have released three types of corpora extracted from 23 language
> versions of Wikipedia:
>
> 1. Wikipedia Monolingual Corpora: more than 5 billion tokens of text in 23
> languages extracted from the Wikipedia. The corpora are annotated with
> article and paragraph boundaries, number of incoming links for each
> article, anchor texts used to refer to each article (textlinks) and their
> frequencies, crosslanguage links, categories and more (
> http://linguatools.org/tools/corpora/wikipedia-monolingual-corpora/).
> There is also a script that allows to extract domain-specific sub-corpora
> if you provide a list of desired categories.
>
> 2. Wikipedia Comparable Corpora: more than 41 million bilingually aligned
> Wikipedia articles for 253 language pairs (
> http://linguatools.org/tools/corpora/wikipedia-comparable-corpora/).
>
> 3. Wikipedia Parallel Titles Corpora: bilingual titles of Wikipedia
> articles, extended with redirects and textlinks. 487,406,497 unique
> parallel segments for 253 language pairs (
> http://linguatools.org/tools/corpora/wikipedia-parallel-titles-corpora/).
>
> Additionally, there is a tiny German-English parallel corpus containing
> 6,802 sentence pairs extracted from bilingual quotations in the German
> Wikipedia:
> http://linguatools.org/tools/corpora/wikipedia-parallel-quotations-corpora/
> .
>
> All corpora are released under a Creative Commons Attribution Share-alike
> license and are freely available at http://linguatools.org/tools/corpora/.
>
> Best regards,
> Peter Kolb
>
> --
>
> Peter Kolb & Procházková GbR
> Perleberger Str. 55
> D-10559 Berlin
>
> E-Mail: peter.k...@linguatools.org
> Internet: http://www.linguatools.org
>
>
> _______________________________________________
> Mt-list site list
> Mt-list@eamt.org
> http://lists.eamt.org/mailman/listinfo/mt-list
>

--
*Regards,*
Vishal Goyal, Ph.D., M.Tech., MCA, M.C.S.D.
Assistant Professor (Stage III),
Department of Computer Science,
Punjabi University Patiala-147002.

[*Online Hindi to Punjabi Machine Translation Tool* - http://h2p.learnpunjabi.org ]
[*Statistical Approach Based Hindi to Punjabi Machine Translation System* - http://statmt.org/~vishal/hp/index.cgi - http://tdil-dc.in/hi2pu/index.cgi ]
[*Research Cell: An International Journal of Engineering Sciences*, http://ijoes.vidyapublications.com ]