[Apologies for multiple postings]
We are happy to announce that three new written corpora are now
available in our catalogue.
*Danish Gigaword Corpus*
<http://catalog.elra.info/en-us/repository/browse/ELRA-W0318/>
*ISLRN: 024-504-318-388-3
<http://www.islrn.org/resources/024-504-318-388-3>*
This corpus consists of over a billion words of Danish collected from
various websites. Domains are distributed as follows: Legal (308.8
million words), Social Media (261.4 million words), Subtitles (130.1
million words), Debates (108.4 million words), Conversations (0.7
million words), Web (101.02 million words), Encyclopedia (55.6 million
words), Literature (31.3 million words), Manuals (2.6 million words),
Books (2.1 million words), Religion (0.6 million words), News (40
million words), Other (1.2 million words).
*English-Punjabi Code-Mixed Social Media Content*
<http://catalog.elra.info/en-us/repository/browse/ELRA-W0319/>
*ISLRN: 695-759-706-170-8
<http://www.islrn.org/resources/695-759-706-170-8>*
The English-Punjabi Code-Mixed Social Media Content corpus is composed
of 893,615 English-Punjabi parallel sentences in the following domains:
Agriculture, Culture, Entertainment, Health, Religion, Sports,
Technology, Tourism, and Education.
*Parallel Corpora for 6 Indian Languages*
<http://catalog.elra.info/en-us/repository/browse/ELRA-W0320/>
*ISLRN: 657-350-757-058-6
<http://www.islrn.org/resources/657-350-757-058-6>*
The Parallel Corpora for 6 Indian Languages contains data sets for
Bengali (540,000 words – 20,000 parallel sentences), Hindi (1,200,000
words – 37,000 parallel sentences), Malayalam (660,000 words – 29,000
parallel sentences), Tamil (747,000 words – 35,000 parallel sentences),
Telugu (951,000 words – 43,000 parallel sentences), and Urdu (1,200,000
words – 33,000 parallel sentences), translated into English. Each data
set was created by taking around 100 Indian-language Wikipedia pages and
obtaining four independent English translations of each sentence in
those documents from non-professional translators recruited through
crowdsourcing on Amazon Mechanical Turk.
For more information on the catalogue or if you would like to enquire
about having your resources distributed by ELRA, please *contact us
<mailto:cont...@elda.org>*.
_________________________________________
Visit the *ELRA Catalogue of Language Resources <http://catalog.elra.info>*
Visit the *Universal Catalogue <http://universal.elra.info>*
*Archives
<http://www.elra.info/en/catalogues/language-resources-announcements>* of
ELRA Language Resources Catalogue Updates
_______________________________________________
Mt-list site list
Mt-list@lists.eamt.org
https://lists.eamt.org/cgi-bin/mailman/listinfo/mt-list