Dear colleagues,

Do you care about improving language technologies beyond mainstream languages? Do you wonder how to collect data for low-resource languages? Or how to build the first translation system for such a language, and then adapt it efficiently to various downstream tasks?
We are pleased to announce an upcoming LREC 2026 tutorial, "Low-Resource, High-Impact: Building Corpora for Inclusive Language Technologies." This tutorial is aimed at NLP practitioners, researchers, and developers working with multilingual and low-resource languages who are interested in building more equitable, inclusive, and socially impactful language technologies.

**Tutorial overview**

The tutorial covers the full lifecycle of developing NLP technologies for a language, including:
* Data collection and corpus creation (e.g., web crawling and annotation)
* Parallel sentence mining and machine translation
* Downstream applications such as text classification and multimodal reasoning
* Strategies for addressing data scarcity, cultural variance, and reproducibility
* Fair and community-informed development practices

**Who should attend**
* Researchers and practitioners in NLP and multilingual technologies
* Corpus builders and linguists working on underrepresented languages
* Developers interested in low-resource or inclusive NLP
* Students and early-career researchers

**Scope and highlights**
* Case studies spanning 10+ languages from diverse language families and geopolitical contexts
* Coverage of both digitally resource-rich and severely underrepresented languages
* Emphasis on hands-on methods and applied modeling frameworks

**Save the date and place**: Saturday, 16 May 2026, morning session, Room 6

More information: https://tum-nlp.github.io/low-resource-tutorial/

Stay tuned via our website: we will fully open-source the tutorial materials!

Additionally, we would like to build an overview of the practices and challenges researchers face when working with non-mainstream languages.
If you are such a researcher, work on a particularly unusual language, or simply have experience to share on the topic, please fill in this form to take part in an interview: https://forms.gle/L81hpvZGfemyMjtX7

**Organisers**:
* Ekaterina (Katya) Artemova, Toloka.ai
* Laurie Burchell, Common Crawl Foundation
* Daryna Dementieva, Technical University of Munich
* Shu Okabe, Technical University of Munich
* Mariya Shmatova, Toloka.ai
* Pedro Ortiz Suarez, Common Crawl Foundation

See you at LREC!

Best regards,
Daryna Dementieva
On behalf of the Tutorial Organisers
