[Corpora-List] Core metadata schema for interpreting corpora – inviting feedback

Nannan Liu via Corpora Mon, 10 Jun 2024 00:41:06 -0700

Dear colleagues,



Interpreting corpora are a type of language resource that interweaves 
multilingualism with multimodality, spoken with signed languages, and 
split-second processing with contextualised interactions. Compiling an 
interpreting corpus demands significant effort: annotating one hour of signed 
data can take 320 hours (Wehrmeyer 2019), and transcribing oral features often 
defies automatic recognition.



As a first step towards reusing such valuable datasets, we have created a core 
metadata schema for describing interpreting corpora consistently and 
informatively. The schema is based on a review of 114 corpora (see 
https://unic.dipintra.it/Metadata.aspx), FAIR principles (Wilkinson et al. 
2016), international standards (International Organization for Standardization 
2015, 2019), similar initiatives (e.g. Paquot et al. 2023), and ontologies of 
the interpreting community (e.g. Pöchhacker 2022). It is available at 
https://tinyurl.com/intpmetadata, and example implementations using four 
community, conference and sign language interpreting corpora can be found at 
https://tinyurl.com/intpmetadata-example.



We’d like to encourage more colleagues to provide feedback on the schema by the 
end of July. The response at the CIUTI conference two weeks ago was heartening, 
and we invite you to co-create a metadata standard that fits the past, current 
and future needs of the interpreting community.

Thank you for your cooperation.



With best wishes,
Nannan Liu and Mariachiara Russo



References
International Organization for Standardization (2015). ISO 24622-1:2015 
Language resource management – Component Metadata Infrastructure (CMDI) – 
Part 1: The Component Metadata Model. International Organization for 
Standardization.
International Organization for Standardization (2019). ISO 24622-2:2019 
Language resource management – Component Metadata Infrastructure (CMDI) – 
Part 2: Component metadata specification language. International 
Organization for Standardization.
Paquot, M., König, A., Stemle, E. & Frey, J.-C. (2023, January 27). Core 
metadata schema for learner corpora. Open Data @ UCLouvain, 
https://tinyurl.com/L2metadataV2.
Pöchhacker, F. (2022). Introducing interpreting studies (3rd ed.). London and 
New York: Routledge.
Wehrmeyer, E. (2019). A corpus for signed language interpreting research. 
Interpreting 21 (1), 62–90.
Wilkinson, M. D., Dumontier, M., Aalbersberg, IJ. J., Appleton, G., Axton, M., 
Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., 
and others (2016). The FAIR Guiding Principles for scientific data management 
and stewardship. Scientific Data 3 (1), 1–9.

Dr Nannan Liu
Marie Curie Fellow
Project FAITH <https://cordis.europa.eu/project/id/101108651>
Department of Interpreting and Translation
University of Bologna
