[INDOLOGY] Sanskrit and Vedic Sanskrit language models available on Hugging Face

Sebastian Nehrdich via INDOLOGY Mon, 27 Feb 2023 07:02:16 -0800

Dear list members,

In the context of the BMBF project "ChronBMM - Dating text corpora
using Bayesian Mixture Models" (https://chronbmm.phil.hhu.de/) Oliver
Hellwig, Sven Sellmer and I have developed a number of contextual
language models based on the XLM-RoBERTa architecture that achieve
very strong performance on a host of downstream tasks.
We have made two of these models available through our organization
page on Hugging Face: https://huggingface.co/chronbmm


One is a multidomain general Sanskrit model trained on more than 2GB
of web-scraped Sanskrit material from various sources (manually typed
etexts as well as OCR input). The second is a model fine-tuned on a
corpus of Vedic Sanskrit that achieves state-of-the-art results on
part-of-speech tagging, dependency parsing, and related tasks. Both
models accept Devanagari as input. No further preprocessing such as
Sandhi splitting is necessary.
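
As an illustration only, here is a minimal sketch of how one of these
models might be loaded with the Hugging Face `transformers` library and
fed unsegmented Devanagari text. The announcement names only the
organization page, not individual model IDs, so `MODEL_ID` below is a
hypothetical placeholder. The helpers `is_devanagari` and `embed` are
likewise invented for this example and are not part of the released
models.

```python
import unicodedata

# Hypothetical placeholder: replace with an actual model ID listed at
# https://huggingface.co/chronbmm (the announcement does not name one).
MODEL_ID = "chronbmm/MODEL-NAME"

# Unicode blocks: Devanagari, Vedic Extensions, Devanagari Extended.
DEVANAGARI_RANGES = ((0x0900, 0x097F), (0x1CD0, 0x1CFF), (0xA8E0, 0xA8FF))


def is_devanagari(text: str) -> bool:
    """True if text contains letters/marks and all of them are Devanagari."""
    # Only letters (L*) and combining marks (M*) are checked; whitespace,
    # digits, and punctuation such as the danda are ignored.
    chars = [ch for ch in text if unicodedata.category(ch)[0] in "LM"]
    return bool(chars) and all(
        any(lo <= ord(ch) <= hi for lo, hi in DEVANAGARI_RANGES) for ch in chars
    )


def embed(text: str, model_id: str = MODEL_ID):
    """Return contextual token embeddings for unsegmented Devanagari text."""
    if not is_devanagari(text):
        raise ValueError("expected Devanagari input")

    # Imported lazily so the script check above works without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)

    # The text is passed as-is: no Sandhi splitting beforehand.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state  # shape: (1, num_tokens, hidden_size)
```

With a real model ID substituted, `embed("धर्मक्षेत्रे कुरुक्षेत्रे")`
would return one vector per subword token. The script check is a
convenience of this sketch: it assumes that romanized input should be
transliterated to Devanagari before use.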

With best regards,

Sebastian

_______________________________________________
INDOLOGY mailing list
[email protected]
https://list.indology.info/mailman/listinfo/indology