ELRA Catalogue of Language Resources - Update

info--- via Mt-list Mon, 13 Feb 2023 08:08:28 -0800

[Apologies for multiple postings]

We are happy to announce that 1 new written corpus, 3 new monolingual lexica and 2 new bilingual lexica are now available in our catalogue.

Learner Corpus of Portuguese L2 – COPLE2 <http://catalog.elra.info/en-us/repository/browse/ELRA-W0331/>

ISLRN: 936-320-703-366-7 <http://www.islrn.org/resources/936-320-703-366-7>

The Learner Corpus of Portuguese as Second/Foreign Language (COPLE2) is a corpus of written and oral texts produced by students of Portuguese as Foreign/Second Language courses in the Instituto de Cultura e Língua Portuguesa (the Institute of Portuguese Language and Culture) (ICLP – FLUL) and by applicants for examinations in the Centro de Avaliação de Português Língua Estrangeira (Center for Evaluation of Portuguese as a Foreign Language) (CAPLE – FLUL). The corpus contains texts from learners with 15 different native languages (L1s) and proficiencies from A1 to C1, and covers different topics and types of tasks. It is encoded in TEI format through the TEITOK environment. The corpus includes at the moment a total of 182,474 tokens and 978 texts, classified according to the CEFR scales. The corpus contains annotations for part of speech, lemma and learner errors. All the information encoded is searchable through the CQP query language.

CALEM (Comprehensive Arabic LEMmas) <http://catalog.elra.info/en-us/repository/browse/ELRA-L0133/>

ISLRN: 462-532-124-988-8 <http://www.islrn.org/resources/462-532-124-988-8>

Comprehensive Arabic LEMmas is a lexicon covering a large list of Arabic inflected word forms (stems) and their corresponding lemmas. It is composed of 164,272 lemmas representing 7,151,106 stems, detailed as follows: 720 Arabic particles, 15,291 broken plurals, 2,464,239 verbs, 4,675,856 nouns. The lexicon is provided as plain text in UTF8 encoding and represents about 284 Mb of data.

MADED (Moroccan Arabic Dialect Electronic Dictionary) <http://catalog.elra.info/en-us/repository/browse/ELRA-L0134/>

ISLRN: 977-057-254-691-5 <http://www.islrn.org/resources/977-057-254-691-5>

Moroccan Arabic Dialect Electronic Dictionary (MADED) is an electronic lexicon containing almost 13,000 entries. They are written in Arabic script wherein each Modern Standard Arabic (MSA) lemma is provided with its corresponding Moroccan Arabic equivalent. In addition, MADED entries are annotated with useful metadata such as part-of-speech (POS), origin and root. MADED is designed for the practical use of the NLP community. This dictionary is provided as a csv file and represents about 2 Mb of data.

MORV (Moroccan Morphological vocabulary) <http://catalog.elra.info/en-us/repository/browse/ELRA-L0135/>

ISLRN: 064-194-729-767-0 <http://www.islrn.org/resources/064-194-729-767-0>

The Moroccan Morphological vocabulary is a lexicon containing more than 4.6 M entries describing a given Moroccan Arabic word with fourteen (14) morphological and semantic features: the word orthographic form, the segmentation (prefix and suffix), part-of-speech (POS), gender, number, tense and transitivity (for verbs), its origin, dialectal lemma, Arabic lemma, the root, voice, state, and affirmative/negative form. This vocabulary is provided as a csv file and represents about 350 Mb of data.



CroaTPAS <http://catalog.elra.info/en-us/repository/browse/ELRA-M0108/>
ISLRN: 649-554-159-147-9 <http://www.islrn.org/resources/649-554-159-147-9>

CroaTPAS is a bilingual lexicon in Croatian and English. It was created by manual annotation from the Croatian Web as Corpus and pattern creation using the Skema editor on the Sketch Engine platform. CroaTPAS is tailor-made to represent verb polysemy and currently contains a total of 683 patterns (belonging to 180 Croatian verbs) expressing different verb senses and 22,677 annotated corpus lines. Moreover, the resource includes 109 metonymic subpatterns linked to 1112 corpus lines featuring 62 different metonymic shifts.



T-PAS <http://catalog.elra.info/en-us/repository/browse/ELRA-M0109/>
ISLRN: 432-666-503-743-8 <http://www.islrn.org/resources/432-666-503-743-8>

T-PAS is a digital lexicographic resource consisting of a corpus-derived collection of Italian verb valency structures, whose argument slots have been manually annotated with a set of hierarchically organised semantic labels called Semantic Types. As of today, T-PAS contains a total of 1164 Italian verb entries containing 5529 patterns expressing different verb senses, and 252943 annotated corpus lines. Moreover, the resource includes 84 metonymic subpatterns linked to 1218 corpus lines featuring 37 different metonymic shifts.

For more information on the catalogue or if you would like to enquire about having your resources distributed by ELRA, please contact us <mailto:[email protected]>.

_________________________________________

Visit the ELRA Catalogue of Language Resources <http://catalog.elra.info>
Visit the Universal Catalogue <http://universal.elra.info>

Archives <http://www.elra.info/en/catalogues/language-resources-announcements> of ELRA Language Resources Catalogue Updates
