[Corpora-List] [LREC2026 Tutorial] Low-Resource, High-Impact: Building Corpora for Inclusive Language Technologies

Daryna Dementieva via Corpora Wed, 18 Feb 2026 01:10:50 -0800

Dear colleagues,

Do you care about improving language technologies beyond mainstream languages? 
Do you wonder how to collect data for low-resource languages? Or how to build 
the first translation system for such a language, and then adapt it efficiently 
to various downstream tasks?

We are pleased to announce the upcoming LREC2026 tutorial, 
“Low-Resource, High-Impact: Building Corpora for Inclusive Language 
Technologies.”

This tutorial is aimed at NLP practitioners, researchers, and developers 
working with multilingual and low-resource languages who are interested in 
building more equitable, inclusive, and socially impactful language 
technologies. 

**Tutorial overview**
The tutorial covers the full lifecycle of developing NLP technologies for a 
language, including:

* Data collection and corpus creation (e.g., web crawling and annotation)
* Parallel sentence mining and machine translation
* Downstream applications such as text classification and multimodal reasoning
* Strategies for addressing data scarcity, cultural variance, and 
reproducibility
* Fair and community-informed development practices

**Who should attend**

* Researchers and practitioners in NLP and multilingual technologies
* Corpus builders and linguists working on underrepresented languages
* Developers interested in low-resource or inclusive NLP
* Students and early-career researchers

**Scope and highlights**

* Case studies spanning 10+ languages from diverse language families and 
geopolitical contexts
* Coverage of both digitally resource-rich and severely underrepresented 
languages
* Emphasis on hands-on methods and applied modeling frameworks

**Save the date and place**:
Saturday, 16 May 2026, morning session, Room 6

More information:
https://tum-nlp.github.io/low-resource-tutorial/

Stay tuned to our website: we will fully open-source the tutorial materials!

Additionally, we would like to build an overview of the practices and 
challenges that researchers face when working with non-mainstream languages. If 
you are such a researcher, work on a particularly surprising language, or simply 
have experience to share on the topic, please fill in this form to take part 
in an interview: https://forms.gle/L81hpvZGfemyMjtX7 

**Organisers**:

Ekaterina (Katya) Artemova, Toloka.ai
Laurie Burchell, Common Crawl Foundation
Daryna Dementieva, Technical University of Munich
Shu Okabe, Technical University of Munich
Mariya Shmatova, Toloka.ai
Pedro Ortiz Suarez, Common Crawl Foundation

See you at LREC!

Best regards,
Daryna Dementieva
On behalf of Tutorial Organisers
_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]