Extended Deadline - Workshop on Multilingual De-Identification of (Sensitive) Language Resources @ LREC 2022

ELRA ELDA Information Fri, 08 Apr 2022 04:58:44 -0700

[Apologies for multiple postings]

*EXTENDED DEADLINE*: FINAL CALL FOR PAPERS
Workshop on Multilingual De-Identification of (Sensitive) Language Resources

To be held in conjunction with the 13th International Language Resources and Evaluation Conference (LREC 2022)

20 June 2022, Le Palais du Pharo, Marseille, France

https://sites.google.com/vicomtech.org/multilingual-de-identification

EXTENDED Deadline for submission: 17 April 2022

*Description*

The General Data Protection Regulation (GDPR - Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016) ensures the protection of natural persons with regard to the processing of personal data and on the free movement of such data. The GDPR outlines a specific set of rules that protect citizens and user data and create transparency in information sharing. GDPR is the strictest data privacy regulation in the world, and considerable work is taking place to develop techniques and deploy systems that help comply with this regulation while rendering data accessible and, thus, usable for further processing.

Different techniques are studied to guarantee such compliance, implying different levels of sensitive content protection and with a short- or long-term guarantee depending on whether we may have access to additional related information. In this regard, we can read about work on anonymization, de-identification and pseudonymization. While anonymization implies a zero re-identification risk, which is extremely difficult to secure, de-identification and pseudonymization represent an attainable target under the GDPR, given that this regulation defines pseudonymization as “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”

Bearing this context in mind, multilingual approaches and kits for (sensitive) language resources de-identification may provide the means to share language data while also protecting private or sensitive data by spotting then deleting, obfuscating, pseudonymizing or encrypting person identifying information.

De-identification is typically performed for the purpose of protecting an individual’s private activities while maintaining the usefulness of the gathered data for research and development purposes. This workshop aims at discussing the various approaches to effective and reliable text de-identification, focusing on sensitive domains such as the medical and legal domains, among others.

Based on these premises, a consensus emerges that shows a clear situation and needs:

1. Tools for the multilingual de-identification of (sensitive) language resources are becoming essential to ensure that such resources can be shared.
2. De-identification is crucial to ensure that all legal & ethical considerations are taken into account during the production/repurposing phases but also that the quality/nature of the de-identified data sets remains appropriate to conduct research activities.
3. European Public Administrations need personal data processing tools to handle the extremely large amounts of data they manage.
4. Europe’s multilingual context will benefit from approaches and tools that can support the European Digital Market in their multilingual data exchanges.


*Workshop Objectives and Topics of Interest*

This workshop is organised by members of the MAPA project, funded by the EU Connecting Europe Facility (CEF) program (https://mapa-project.eu/). This project has developed a toolkit for the de-identification of texts in the medical and legal fields which addresses all EU official languages. It has followed a BERT-based Named Entity Recognition approach for personal information identification. A wide range of topics have been considered and are hot topics open for discussion to all participants of this workshop. Among them, we have the following:

1. Sensitive personal information, domains and services that require de-identification
2. Corpora annotation and/or creation
3. Annotation guidelines and platforms
4. De-identification tools, data and/or applications
5. De-identification and minority languages
6. Multi-domain and/or multilingual processing
7. NLP techniques and tools used for de-identification
8. Multimodal de-identification
9. Validation and benchmarking of de-identified resources
10. Evaluation of de-identification tools and applications
11. Evaluation protocols: how to evaluate, metrics, approaches, data, experiences
12. Best practices
13. Approaches, activities and systems addressing “anonymization” are also welcome to share their experience.
14. Any other topic related to de-identification

This workshop will also be a good forum to discuss the possibility of designing and initiating a new (annual) Challenge (evaluation campaign) on this important topic.

We invite submissions for full papers and system demonstrations that address these questions and other related issues relevant to the workshop.


*Workshop Programme and Audience Addressed*

This full-day workshop aims at bringing together technology-oriented working groups as well as institutions requiring de-identification support that can present their cases. Since de-identification is a multi-topic and multi-problem technique, the workshop aims to bring together researchers, developers and groups needing their services to discuss approaches, techniques, capabilities and potential collaborations.



*Organising Committee*
- Victoria Arranz (ELDA/ELRA, France)
- Montse Cuadros (Vicomtech, Spain)
- Aitor Garcia Pablos (Vicomtech, Spain)
- Cyril Grouin (Université Paris Saclay, CNRS, LISN, France)
- Manuel Herranz (Pangeanic, Spain)

*Programme Committee*
- Khalid Choukri (ELDA/ELRA, France)
- Hercules Dalianis (Stockholm University, Sweden)
- Amando Estela (Pangeanic, Spain)
- Thierry Etchegoyhen (Vicomtech, Spain)
- Albert Gatt (Malta University, Malta)
- Lucie Gianola (Université Paris Saclay, CNRS, LISN, France)
- Ona de Gibert (BSC, Spain)
- Marwa Hadj Salah (ELDA/ELRA, France)
- Udo Hahn (University of Jena, Germany)
- Thomas Kleinbauer (COMPRISE project)
- Maite Melero (BSC, Spain)
- Mickaël Rigault (ELDA/ELRA, France)
- Patrick Paroubek (Université Paris Saclay, CNRS, LISN, France)
- Naiara Perez (Vicomtech, Spain)
- Stelios Piperidis (Athena Research & Innovation Center, Greece)
- Prokopis Prokopidis (Athena Research & Innovation Center, Greece)
- Mike Rosner (Malta University, Malta)
- Roberts Rozis (TILDE, Latvia)
- Özlem Uzuner (George Mason University, USA)
- Emmanuel Vincent (Inria Nancy - Grand Est, France)
- Rinalds Vīksna (TILDE, Latvia)
- Pierre Zweigenbaum (Université Paris Saclay, CNRS, LISN, France)

*Important dates*
Submission of full papers: EXTENDED to Sunday 17 April 2022
Notification of acceptance of papers and demonstrations: Tuesday 10 May 2022
Submission of camera-ready version: 23 May 2022
Workshop: Monday 20 June 2022

*Submission*

Authors should use the START system (https://www.softconf.com/lrec2022/MDLR/) and follow the LREC author’s kit (https://lrec2022.lrec-conf.org/en/submission2022/authors-kit/) for submitting their papers (the templates are provided on this page).

Accepted papers will be published in the workshop proceedings along with the LREC main conference (https://lrec2022.lrec-conf.org/en/) Proceedings by ELRA.


For further queries, please contact Victoria Arranz at arr...@elda.org.
