[Apologies for multiple postings]

We are happy to announce that 3 new written corpora are now available in our catalogue.
*Danish Gigaword Corpus* <http://catalog.elra.info/en-us/repository/browse/ELRA-W0318/> *ISLRN: 024-504-318-388-3 <http://www.islrn.org/resources/024-504-318-388-3>* This corpus consists of over a billion words of Danish collected from various websites. Domains are distributed as follows: Legal (308.8 million words), Social Media (261.4 million words), Subtitles (130.1 million words), Debates (108.4 million words), Conversations (0.7 million words), Web (101.02 million words), Encyclopedia (55.6 million words), Literature (31.3 million words), Manuals (2.6 million words), Books (2.1 million words), Religion (0.6 million words), News (40 million words), Other (1.2 million words).

*English-Punjabi Code-Mixed Social Media Content* <http://catalog.elra.info/en-us/repository/browse/ELRA-W0319/> *ISLRN: 695-759-706-170-8 <http://www.islrn.org/resources/695-759-706-170-8>* The English-Punjabi Code-Mixed Social Media Content corpus is composed of 893,615 parallel English-Punjabi sentences in the following domains: Agriculture, Culture, Entertainment, Health, Religion, Sports, Technology, Tourism, and Education.

*Parallel Corpora for 6 Indian Languages* <http://catalog.elra.info/en-us/repository/browse/ELRA-W0320/> *ISLRN: 657-350-757-058-6 <http://www.islrn.org/resources/657-350-757-058-6>* The Parallel Corpora for 6 Indian Languages contains data sets for Bengali (540,000 words – 20,000 parallel sentences), Hindi (1,200,000 words – 37,000 parallel sentences), Malayalam (660,000 words – 29,000 parallel sentences), Tamil (747,000 words – 35,000 parallel sentences), Telugu (951,000 words – 43,000 parallel sentences), and Urdu (1,200,000 words – 33,000 parallel sentences), translated into English. Each data set was created by taking around 100 Indian-language Wikipedia pages and obtaining four independent English translations of each sentence in those documents from non-professional translators recruited through crowdsourcing on Amazon Mechanical Turk.

For more information on the catalogue or if you would like to enquire about having your resources distributed by ELRA, please *contact us <mailto:cont...@elda.org>*.
_________________________________________
Visit the *ELRA Catalogue of Language Resources <http://catalog.elra.info>*
Visit the *Universal Catalogue <http://universal.elra.info>*
*Archives <http://www.elra.info/en/catalogues/language-resources-announcements>* of ELRA Language Resources Catalogue Updates
_______________________________________________
Mt-list site list
Mt-list@lists.eamt.org
https://lists.eamt.org/cgi-bin/mailman/listinfo/mt-list
