*Dear all,*

*

The CLASSLA Knowledge Centre for South Slavic languages (https://www.clarin.si/info/k-centre/) is delighted to announce the release of the pilot versions (v0.1) of the CLASSLA web corpora for Croatian (2.3 billion words), Serbian (2.4 billion words) and Slovenian (1.9 billion words). They are available for querying via the CLARIN.SI concordancers (https://www.clarin.si/ske/#open). The main features of the newly released corpora, aside from their large size and recency (crawled in 2022), are their automatic enrichment with genre information (https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-classifier) and their linguistic processing with the improved CLASSLA-Stanza annotation pipeline (https://pypi.org/project/classla/). The pilot versions are intended to gather user feedback, while the official release (v1.0) of these three corpora, along with web corpora for Bosnian, Montenegrin, Macedonian, and Bulgarian, is scheduled for later this year.


We warmly invite you to explore our corpora and to reach out to us at helpdesk.clas...@clarin.si with any ideas for improvements. You are also invited to read our blog post on using the CLASSLA web corpora via the open CLARIN.SI concordancers: https://www.clarin.si/info/k-centre/classla-web-bigger-and-better-web-corpora-for-croatian-serbian-and-slovenian-on-clarin-si-concordancers/


If you are interested in South Slavic resources and technologies, we also invite you to join the CLASSLA mailing list (https://mailman.ijs.si/mailman/listinfo/classla) and to follow the CLARIN.SI infrastructure on Twitter (https://twitter.com/ClarinSlovenia).*

Best regards,

Taja Kuzman, Nikola Ljubešić and many other CLASSLAers

_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-le...@list.elra.info