Hello Sir,
Great effort. Could you please explain the main difference between
comparable and parallel corpora?

On Thu, Nov 27, 2014 at 1:51 PM, Peter Kolb <pek...@gmail.com> wrote:

> Dear colleagues,
>
> we have released three types of corpora extracted from 23 language
> versions of Wikipedia:
>
> 1. Wikipedia Monolingual Corpora: more than 5 billion tokens of text in 23
> languages extracted from the Wikipedia. The corpora are annotated with
> article and paragraph boundaries, number of incoming links for each
> article, anchor texts used to refer to each article (textlinks) and their
> frequencies, crosslanguage links, categories and more (
> http://linguatools.org/tools/corpora/wikipedia-monolingual-corpora/).
> There is also a script that allows you to extract domain-specific
> sub-corpora if you provide a list of desired categories.
>
> 2. Wikipedia Comparable Corpora: more than 41 million bilingually aligned
> Wikipedia articles for 253 language pairs (
> http://linguatools.org/tools/corpora/wikipedia-comparable-corpora/).
>
> 3. Wikipedia Parallel Titles Corpora: bilingual titles of Wikipedia
> articles, extended with redirects and textlinks. 487,406,497 unique
> parallel segments for 253 language pairs (
> http://linguatools.org/tools/corpora/wikipedia-parallel-titles-corpora/).
>
> Additionally, there is a tiny German-English parallel corpus containing
> 6,802 sentence pairs extracted from bilingual quotations in the German
> Wikipedia:
> http://linguatools.org/tools/corpora/wikipedia-parallel-quotations-corpora/
> .
>
> All corpora are released under a Creative Commons Attribution Share-alike
> license and are freely available at http://linguatools.org/tools/corpora/.
>
> Best regards,
> Peter Kolb
>
> --
>
> Peter Kolb & Procházková GbR
> Perleberger Str. 55
> D-10559 Berlin
>
> E-Mail: peter.k...@linguatools.org
> Internet: http://www.linguatools.org
>
>
> _______________________________________________
> Mt-list site list
> Mt-list@eamt.org
> http://lists.eamt.org/mailman/listinfo/mt-list
>



-- 
*Regards,*
Vishal Goyal,
Ph.D., M.Tech., MCA, M.C.S.D.
Assistant Professor(Stage III),
Department of Computer Science,
Punjabi University Patiala-147002.

[*Online Hindi to Punjabi Machine Translation Tool -*
http://h2p.learnpunjabi.org ]
[*Statistical Approach Based Hindi to Punjabi Machine Translation System *
- http://statmt.org/~vishal/hp/index.cgi
- http://tdil-dc.in/hi2pu/index.cgi
]
*[Research Cell: An International Journal of Engineering Sciences,
http://ijoes.vidyapublications.com]*
