*Dear all,*
*
The CLASSLA Knowledge centre for South Slavic languages
(https://www.clarin.si/info/k-centre/) is delighted to announce the
release of the pilot versions (v0.1) of the CLASSLA web corpora for
Croatian (2.3 billion words), Serbian (2.4 billion words) and Slovenian
(1.9 billion words). They are available for querying via the CLARIN.SI
concordancers (https://www.clarin.si/ske/#open). The main features of the
newly released corpora, aside from their large size and recency (crawled
in 2022), are their automatic enrichment with genre information
(https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-classifier)
and their linguistic processing with the improved CLASSLA-Stanza
annotation pipeline (https://pypi.org/project/classla/). The pilot versions of these
corpora are intended to gather valuable user feedback, while the
official release (v1.0) of the three existing corpora, along with web
corpora for Bosnian, Montenegrin, Macedonian, and Bulgarian, is
scheduled for later this year.
We warmly invite you to explore our corpora and to reach out
to us at helpdesk.clas...@clarin.si with any ideas for improvements. You
are also invited to read our blog post on the use of CLASSLA web corpora
via the open CLARIN.SI concordancers:
https://www.clarin.si/info/k-centre/classla-web-bigger-and-better-web-corpora-for-croatian-serbian-and-slovenian-on-clarin-si-concordancers/
If you are interested in South Slavic resources and technologies, we
also invite you to join the CLASSLA mailing list
(https://mailman.ijs.si/mailman/listinfo/classla) and to follow the
CLARIN.SI infrastructure on Twitter (https://twitter.com/ClarinSlovenia).*
Best regards,
Taja Kuzman, Nikola Ljubešić and many other CLASSLAers
_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-le...@list.elra.info