[Apologies for multiple postings]

We are happy to announce that 66 new monolingual lexicons and 1 speech resource are now available in our catalogue. In addition, 4 speech resources are now available at reduced fees.

*1) New Language Resources:*

*Bitext Lexical Datasets* <http://catalog.elra.info/en-us/repository/search/?q=Bitext+Lexical+Dataset>

The series of *Bitext Lexical Datasets* covers general vocabulary and includes lemmas, POS tags, frequency, named-entity and offensiveness features. Depending on the dataset and language, further syntactic and morphological features are also provided. The datasets are available for 15 languages. As a complement, 11 *Language Variants* datasets can also be obtained. Both are listed below by language:


1. Arabic (MSA) dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0136/>
   and Arabic Language Variants dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0151/>
   consisting of Arabic Gulf, Arabic Najdi, Arabic Egypt and Arabic MSA
   variants,
2. Chinese (Simplified) dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0137/>,
   Chinese (Traditional) dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0138/>,
   and Chinese Language Variants dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0152/>
   (Simplified + Traditional),
3. Dutch dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0139/>
   and Dutch Language Variants dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0153/>
   consisting of Netherlands and Belgium variants,
4. English dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0140/>
   and English Language Variants dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0154/>
   consisting of United States, United Kingdom and India variants,
5. Finnish dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0141/>
   and Finnish Language Variants dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0155/>
   consisting of Standard and Colloquial Finnish variants,
6. French dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0142/>
   and French Language Variants dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0156/>
   consisting of France, Canada and Switzerland variants,
7. German dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0143/>
   and German Language Variants dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0157/>
   consisting of Germany and Switzerland variants,
8. Indonesian dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0144/>,
9. Italian dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0145/>
   and Italian Language Variants dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0158/>
   consisting of Italy and Switzerland variants,
10. Malay dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0146/>,
11. Norwegian (Bokmal) dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0147/>
   and Norwegian Language Variants dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0159/>
   consisting of Bokmal and Nynorsk variants,
12. Portuguese dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0148/>
   and Portuguese Language Variants dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0160/>
   consisting of Portugal and Brazil variants,
13. Spanish dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0149/>
   and Spanish Language Variants dataset
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0161/>
   consisting of Spain, North America, Central America, Andes and
   Southern Cone variants.

*Bitext Synthetic Data* <http://catalog.elra.info/en-us/repository/search/?q=Bitext+Synthetic+Data>

The Bitext Synthetic Data consist of pre-built training data for intent detection, provided for 20 verticals in English and Spanish. For each vertical, they cover the most common intents and include a large number of example utterances per intent, with optional entity/slot annotations for each utterance. The data are distributed as models or as open text files.

For each language, the following verticals are available:

1. Automotive: 52 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0162/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0182/>)
2. Retail banking: 26 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0163/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0183/>)
3. Education: 37 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0164/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0184/>)
4. Event and ticketing: 25 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0165/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0185/>)
5. Field Service: 27 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0166/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0186/>)
6. Healthcare: 40 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0167/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0187/>)
7. Hospitality: 24 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0168/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0188/>)
8. Insurance: 38 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0169/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0189/>)
9. Legal: 29 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0170/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0190/>)
10. Manufacturing: 34 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0171/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0191/>)
11. Media Streaming: 24 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0172/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0192/>)
12. Mortgage and loans: 39 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0173/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0193/>)
13. Moving and storage: 29 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0174/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0194/>)
14. Real estate and construction: 28 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0175/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0195/>)
15. Restaurant/bar chains: 30 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0176/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0196/>)
16. Retail Ecomm: 34 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0177/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0197/>)
17. Telecommunication: 26 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0178/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0198/>)
18. Travel: 33 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0179/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0199/>)
19. Utilities: 21 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0180/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0200/>)
20. Wealth management: 24 intents (English
   <http://catalog.elra.info/en-us/repository/browse/ELRA-L0181/>,
   Spanish <http://catalog.elra.info/en-us/repository/browse/ELRA-L0201/>)

*Persian Kids’ Speech Corpus* <http://catalog.elra.info/en-us/repository/browse/ELRA-S0487/>

The Persian Kids’ Speech Corpus consists of speech recorded from 286 children (141 girls, 145 boys) aged 6 to 9, using an Andreas Mic Anti-Noise microphone and a Premium Speechmike headphone. The recordings were manually checked and labeled, yielding a corpus of 162,395 samples with a total duration of 33 hours and 44 minutes. The samples are distributed as follows:

1. 29,057 Words (478 minutes),
2. 17,429 SubWords (260 minutes),
3. 43,838 Syllables (485 minutes),
4. 70,078 Phonemes (765 minutes),
5. 1,993 Extra Vocabulary (36 minutes).

The corpus covers all 29 Persian phonemes, as well as 118 syllables, 56 sub-words and 711 words, and is particularly suited to speech recognition and linguistic studies.

*2) Reduced fees for the following speech resources:*

 * *Chinese Mandarin (South) database*
   <http://catalog.elra.info/en-us/repository/browse/ELRA-S0397/>
 * *Chinese Mandarin (North) database*
   <http://catalog.elra.info/en-us/repository/browse/ELRA-S0398/>
 * *Japanese Kids Speech database (Lower Grade)*
   <http://catalog.elra.info/en-us/repository/browse/ELRA-S0411/>
 * *Japanese Kids Speech database (Upper Grade)*
   <http://catalog.elra.info/en-us/repository/browse/ELRA-S0412/>


For more information on the catalogue or if you would like to enquire about having your resources distributed by ELRA, please *contact us* <mailto:cont...@elda.org>.
_________________________________________

Visit the *ELRA Catalogue of Language Resources* <http://catalog.elra.info>
Visit the *Universal Catalogue* <http://universal.elra.info>
Visit the *Archives of ELRA Language Resources Catalogue Updates* <http://www.elra.info/en/catalogues/language-resources-announcements>

_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-le...@list.elra.info
