[Corpora-List] ELRA Catalogue of Language Resources - Update

Hélène Mazo via Corpora Mon, 13 Nov 2023 08:47:10 -0800

[Apologies for multiple postings]

We are happy to announce that 1 new written corpus, 1 new monolingual lexicon and 2 new speech resources are now available in our catalogue.

Corpus for fine-grained analysis and automatic detection of irony on Twitter <https://catalogue.elra.info/en-us/repository/browse/ELRA-W0337/>

ISLRN: 478-366-550-085-8 <http://www.islrn.org/resources/478-366-550-085-8>

This corpus was annotated by trained annotators (Master's students in Linguistics) using a detailed annotation scheme for irony categorization, which defines four labels: 'ironic by means of a polarity contrast', 'situational irony', 'other verbal irony' and 'not ironic'. It consists of 4791 instances, each with an irony label and a tweet ID.

Bitext Synonym Data - General Language <https://catalogue.elra.info/en-us/repository/browse/ELRA-L0202/>

ISLRN: 470-885-612-363-1 <http://www.islrn.org/resources/470-885-612-363-1>

The Bitext Synonym Data - General Language includes 31,723 entries and more than 100,000 synonyms for the English language. This dataset is a set of synonyms developed to augment the English version of WordNet, a powerful open-source lexical database, released in 2005. All synonyms can be linked to Bitext Lexical Data - English (see ELRA-L0140) for lemmatization, POS and morphological information.

Corpus of Spontaneous Japanese (CSJ) <https://catalog.elra.info/en-us/repository/browse/ELRA-S0488/>

ISLRN: 280-594-494-328-0 <https://islrn.org/resources/280-594-494-328-0/>

The "Corpus of Spontaneous Japanese" (or CSJ) contains about 650 hours of spontaneous speech, corresponding to about 7 million words. All these speech materials were recorded using head-worn close-talking microphones and DAT, and down-sampled to 16 kHz with 16-bit accuracy. The speech material is transcribed at both the orthographic and phonetic levels. In addition, segment labels, intonation labels and other miscellaneous annotations are provided for a subset of the CSJ, called the Core, which contains about 500k words or 45 hours of speech.

EWA-DB – Early Warning of Alzheimer speech database <https://catalogue.elra.info/en-us/repository/browse/ELRA-S0489/>

ISLRN: 730-022-142-264-9 <http://www.islrn.org/resources/730-022-142-264-9>

EWA-DB is a speech database that contains data from 3 clinical groups (Alzheimer's disease, Parkinson's disease and mild cognitive impairment) and a control group of healthy subjects. Speech samples from each group were obtained using the EWA smartphone application, which contains 4 different language tasks: sustained vowel phonation, diadochokinesis, object and action naming (30 objects and 30 actions), and picture description (two single pictures and three complex pictures). The total number of speakers in the database is 1649: 87 people with Alzheimer's disease, 175 people with Parkinson's disease, 62 people with mild cognitive impairment, 2 people with a mixed diagnosis of Alzheimer's + Parkinson's disease, and 1323 healthy controls.

For more information on the catalogue, or if you would like to enquire about having your resources distributed by ELRA, please contact us <mailto:[email protected]>.

_________________________________________

Visit the ELRA Catalogue of Language Resources <http://catalog.elra.info>
Visit the Universal Catalogue <http://universal.elra.info>

Archives of ELRA Language Resources Catalogue Updates <http://www.elra.info/en/catalogues/language-resources-announcements>

--

_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]