[Apologies for multiple postings]
We are happy to announce that 66 new monolingual lexicons and 1 speech
resource are now available in our catalogue. Moreover, 4 speech
resources are now available at reduced fees.
*1) New Language Resources:*
*Bitext Lexical Datasets*
<http://catalog.elra.info/en-us/repository/search/?q=Bitext+Lexical+Dataset>
The series of *Bitext Lexical Datasets* covers generic vocabulary and
includes lemmas, POS tags, frequency, named-entity and offensive-word
features. Depending on the dataset and language, other syntactic and
morphological features are also provided. Datasets are available for 15
languages. As a complement, 11 *Language Variants* datasets can also be
obtained; they are listed below alongside the corresponding language:
1. Arabic (MSA)
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0136/> dataset
and Arabic Language Variants
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0151/> dataset
consisting of Arabic Gulf, Arabic Najdi, Arabic Egypt and Arabic MSA
variants,
2. Chinese (Simplified)
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0137/> dataset,
Chinese (Traditional)
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0138/> dataset,
and Chinese Language Variants
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0152/> dataset
(Simplified + Traditional),
3. Dutch
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0139/> dataset
and Dutch Language Variants
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0153/> dataset
consisting of Netherlands and Belgium variants,
4. English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0140/> dataset
and English Language Variants
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0154/> dataset
consisting of United States, United Kingdom and India variants,
5. Finnish
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0141/> dataset
and Finnish Language Variants
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0155/> dataset
consisting of Standard and Colloquial Finnish variants,
6. French
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0142/> dataset
and French Language Variants
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0156/> dataset
consisting of France, Canada and Switzerland variants,
7. German
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0143/> dataset
and German Language Variants
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0157/> dataset
consisting of Germany and Switzerland variants,
8. Indonesian
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0144/> dataset,
9. Italian
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0145/> dataset
and Italian Language Variants
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0158/> dataset
consisting of Italy and Switzerland variants,
10. Malay
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0146/> dataset,
11. Norwegian (Bokmal)
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0147/> dataset
and Norwegian Language Variants
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0159/> dataset
consisting of Bokmal and Nynorsk variants,
12. Portuguese
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0148/> dataset
and Portuguese Language Variants
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0160/> dataset
consisting of Portugal and Brazil variants,
13. Spanish
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0149/> dataset
and Spanish Language Variants
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0161/> dataset
consisting of Spain, North America, Central America, Andes and
Southern Cone variants.
*Bitext Synthetic Data*
<http://catalog.elra.info/en-us/repository/search/?q=Bitext+Synthetic+Data>
The Bitext Synthetic Data consist of pre-built training data for intent
detection and are provided for 20 verticals in English and Spanish.
They cover the most common intents for each vertical and
include a large number of example utterances for each intent, with
optional entity/slot annotations for each utterance. The data are
distributed as models or open text files.
For each language, the following verticals are available:
1. Automotive: 52 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0162/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0182/>)
2. Retail banking: 26 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0163/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0183/>)
3. Education: 37 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0164/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0184/>)
4. Event and ticketing: 25 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0165/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0185/>)
5. Field Service: 27 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0166/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0186/>)
6. Healthcare: 40 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0167/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0187/>)
7. Hospitality: 24 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0168/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0188/>)
8. Insurance: 38 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0169/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0189/>)
9. Legal: 29 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0170/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0190/>)
10. Manufacturing: 34 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0171/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0191/>)
11. Media Streaming: 24 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0172/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0192/>)
12. Mortgage and loans: 39 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0173/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0193/>)
13. Moving and storage: 29 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0174/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0194/>)
14. Real estate and construction: 28 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0175/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0195/>)
15. Restaurant/bar chains: 30 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0176/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0196/>)
16. Retail Ecomm: 34 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0177/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0197/>)
17. Telecommunication: 26 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0178/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0198/>)
18. Travel: 33 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0179/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0199/>)
19. Utilities: 21 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0180/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0200/>)
20. Wealth management: 24 intents (English
<http://catalog.elra.info/en-us/repository/browse/ELRA-L0181/>,
Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0201/>)
*Persian Kids’ Speech Corpus*
<http://catalog.elra.info/en-us/repository/browse/ELRA-S0487/>
The Persian Kids’ Speech Corpus consists of speech recorded from
286 children (141 girls, 145 boys) aged 6 to 9, using an
Andreas Mic Anti-Noise microphone and a Premium Speechmike headphone.
The recordings were manually checked and labeled, yielding a corpus
of 162,395 samples with a total duration of 33 hours and 44 minutes.
The samples are distributed as follows:
1. 29,057 Words (478 minutes),
2. 17,429 SubWords (260 minutes),
3. 43,838 Syllables (485 minutes),
4. 70,078 Phonemes (765 minutes),
5. 1,993 Extra Vocabulary (36 minutes).
The corpus contains all 29 Persian phonemes, along with 118 syllables,
56 sub-words, and 711 words, and is particularly suited to speech
recognition and linguistic studies.
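The per-category figures listed above can be summed to check them against the stated totals. The following is only a minimal Python sketch of that arithmetic; the variable names are illustrative and are not part of the corpus or its documentation.

```python
# Per-category figures copied from the list above: (samples, minutes).
categories = {
    "Words": (29_057, 478),
    "SubWords": (17_429, 260),
    "Syllables": (43_838, 485),
    "Phonemes": (70_078, 765),
    "Extra Vocabulary": (1_993, 36),
}

# Sum the sample counts and the durations separately.
total_samples = sum(samples for samples, _ in categories.values())
total_minutes = sum(minutes for _, minutes in categories.values())

# Convert the total duration from minutes to hours and minutes.
hours, minutes = divmod(total_minutes, 60)

print(f"{total_samples:,} samples, {hours} h {minutes} min")
# prints "162,395 samples, 33 h 44 min"
```

The sums match the totals given in the description (162,395 samples, 33 hours and 44 minutes).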
*2) Reduced fees for the following speech resources:*
* *Chinese Mandarin (South) database*
<http://catalog.elra.info/en-us/repository/browse/ELRA-S0397/>
* *Chinese Mandarin (North) database*
<http://catalog.elra.info/en-us/repository/browse/ELRA-S0398/>
* *Japanese Kids Speech database (Lower Grade)*
<http://catalog.elra.info/en-us/repository/browse/ELRA-S0411/>
* *Japanese Kids Speech database (Upper Grade)*
<http://catalog.elra.info/en-us/repository/browse/ELRA-S0412/>
For more information on the catalogue or if you would like to enquire
about having your resources distributed by ELRA, please *contact us*
<mailto:cont...@elda.org>.
_________________________________________
Visit the *ELRA Catalogue of Language Resources* <http://catalog.elra.info>
Visit the *Universal Catalogue* <http://universal.elra.info>
*Archives*
<http://www.elra.info/en/catalogues/language-resources-announcements> of
ELRA Language Resources Catalogue Updates