Dear colleagues,

Apologies for the omission in our previous announcement.

Here are the key links:

FR-MIGR-TWIT Corpus 2.0 (Zenodo): 
https://zenodo.org/records/17828433

OLiNDiNUM — Linguistic Observatory of Online/Digital Debate: 
https://olindinum.huma-num.fr/en/research/research-projects/

FR-MIGR-TWIT Corpus 1.0 (Zenodo): 
https://zenodo.org/records/17652658
____________________________________________________________________

We are pleased to announce the release of the FR-MIGR-TWIT Corpus 2.0:
https://zenodo.org/records/17828433

The MIGR-TWIT Corpus is a multilingual corpus of tweets developed within the 
framework of OLiNDiNUM, the Linguistic Observatory of Online Debate. Designed to 
document and analyze online public discourse on (im)migration in contemporary 
European politics, it makes it possible to observe diachronic change from 2011 
to 2022, as well as linguistic variation across the political spectrum and 
between two national contexts, France and the UK. 

The corpus was previously published in the following modules: 
•       Tweets of right and far-right politics in Europe (Battaglia, Blandino, 
Jeon & Pietrandrea, 2022)
https://zenodo.org/records/7347479
o       UK-R-MIGR-RA-TWIT-2012-2022
o       FR-R-MIGR-TWIT-2011-2022 
•       Tweets of French left-wing politics (Pietrandrea & Jeon, 2023)
https://zenodo.org/records/7871602
o       FR-L-MIGR-TWIT-2011-2022

Taken together, the published corpora have attracted interest from researchers 
in corpus linguistics, discourse studies, and communication and information 
sciences, among other fields, with more than 2,000 downloads on Zenodo as of 2025.

Compiled from the FR-R and FR-L modules, the FR-MIGR-TWIT Corpus comprises 
17,395 tweets posted between 2011 and 2022 by 39 French political figures and 
parties (16 right-wing and 23 left-wing). Tweets containing migr- derivatives 
were retrieved through the Academic Research track of the Twitter API v2, and 
truncated retweets (those exceeding 140 characters) were restored through 
targeted verification.
This second version adds multilayer linguistic annotations for all occurrences 
of forms derived from the Latin root migr-. 

The FR-MIGR-TWIT Corpus 2.0 offers:

•       multilayer linguistic annotations for each occurrence of a migr- 
derivative (MIGR-LEXICON), including, among others, semantic roles (ROLE_SEM), 
syntactic functions (FUNC_SYN), lemmatised forms (LEMMA), and features and 
collocates related to modification (MODIFICATION, LEMMA_MODIF_*, LEMMA_NOUN-1) 
and to list/parallelism constructions (LIST_PAR, LENGTH-1, 
#forme#_MIGR-LIST_PAR);

•       tweet URLs (tweet_url) and 44 types of data retrieved through the Full 
Archive Search endpoints of the Twitter API v2, including the tweet text, 
posting date, user ID, and the numbers of retweets, likes, replies, and quotes.

The corpus is available in CSV and TEI-XML formats, with TEI files providing 
the text layer and CSV files providing stand-off linguistic annotations.
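For readers who want to explore the stand-off annotations programmatically, 
here is a minimal Python sketch (not the query script distributed with the 
corpus) that tallies the values of one annotation column, such as LEMMA or 
ROLE_SEM. It assumes that each CSV file has a header row naming its columns and 
uses a comma delimiter; the function name is hypothetical, and the actual file 
names and delimiter in the Zenodo record may differ, so please check README.md.

```python
import csv
from collections import Counter
from typing import Iterable


def count_annotation_values(lines: Iterable[str], column: str, delimiter: str = ",") -> Counter:
    """Tally the non-empty values of one annotation column in a CSV file.

    Assumptions (not confirmed by the corpus documentation): the first row
    is a header, `column` is one of its names (e.g. "LEMMA", "ROLE_SEM"),
    and fields are separated by `delimiter`.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    # fieldnames is read from the first row; it is None for an empty input.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in the CSV header")
    counts: Counter = Counter()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:  # skip occurrences left unannotated in this column
            counts[value] += 1
    return counts


# Usage (hypothetical file name; substitute a CSV file from the Zenodo record):
# with open("annotations.csv", encoding="utf-8", newline="") as f:
#     print(count_annotation_values(f, "LEMMA").most_common(10))
```

The function accepts any iterable of text lines, so it works equally with an 
open file or an in-memory list; if the files turn out to use another separator, 
pass it through the delimiter parameter.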

Following FR-MIGR-TWIT Corpus 1.0 (Jeon, Battaglia & Pietrandrea), FR-MIGR-TWIT 
Corpus 2.0 (https://zenodo.org/records/17828433) was developed by Sangwan Jeon 
(Université de Lille) and Paola Pietrandrea (Université de Lille, IUF). 

Changelog
version 2.0 (© 2025 Jeon & Pietrandrea)
– Added multilayer linguistic annotations
– Corrected delimiter-related errors
– Added TEI-XML format
– Added a basic Python query script
– Added README.md

The creation of the corpus and the inter-annotator agreement process were 
funded by Université de Lille, Campus France, and the Institut Universitaire de 
France.

The corpus is accessible through Zenodo and is scheduled to be archived in 
Ortolang. 

Best regards,
Sangwan Jeon (University of Lille)
Paola Pietrandrea (University of Lille, Institut Universitaire de France)
_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]