IberLEF 2023 Task - PoliticEs: Political ideology detection in Spanish texts

Held as part of the evaluation forum IberLEF 2023 in the XXXIX edition of
the International Conference of the Spanish Society for Natural Language
Processing (SEPLN 2023)

September 26, 2023. Jaén, Andalusia, Spain

Dear All,

We invite researchers and students to participate in the shared task
PoliticEs 2023: Political ideology detection in Spanish texts, held as part
of IberLEF 2023, the shared evaluation campaign for Natural Language
Processing systems in Spanish and other Iberian languages, co-located with
SEPLN 2023.

The goal of this task is to extract political ideology information from
Spanish texts. To this end, we propose an automatic document classification
task on clusters of texts. It consists of extracting the self-assigned
gender and profession as demographic traits, and the political ideology as
a psychographic trait, from a set of texts written in Spanish by several
authors who share those traits. Political ideology is treated both as a
binary and as a multiclass problem. The PoliticES 2023 shared task builds
on a previous task, PoliticES 2022, presented at IberLEF 2022
(García-Díaz et al., 2022b), whose dataset was an extension of the
PoliCorpus 2020 dataset (García-Díaz et al., 2022a). The novelty this
year is that participants will work with clusters of texts written by
different users who share the same traits, instead of profiling individual
users, in order to prevent legal and ethical issues.

Participants will be provided with development, development_test, training
and test datasets in Spanish, drawn from an extension of PoliCorpus 2020
(García-Díaz et al., 2022a) and the corpus used for the PoliticES 2022
shared task (García-Díaz et al., 2022b). The dataset was collected between
2020 and 2022 from the Twitter accounts of politicians, political
journalists and celebrities in Spain using the UMUCorpusClassifier
(García-Díaz et al., 2020). We automatically created clusters of texts by
mixing some of these extracted tweets in order to prevent ethical and
privacy issues related to author profiling on Twitter. Each cluster is composed
of 80 tweets written by different users that share all the traits under
evaluation. We labeled each cluster with the self-assigned gender (male,
female), profession (politician, celebrity, journalist) and political
spectrum on two axes: binary (left, right) and multiclass (left,
moderate_left, moderate_right, right). Moreover, Twitter mentions of
politicians were anonymised by replacing them with the token @user, and
mentions of other Twitter accounts were also encoded as @user. Other
entities, such as political party references, were likewise replaced with
the @political_party token. Consequently, the traits of a text cannot be
guessed trivially by reading a user's name and searching for information
about them on the Internet. The dataset is composed of approximately 2800
different
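
The anonymisation and clustering steps described above can be pictured with
a minimal Python sketch. It is not the organizers' actual pipeline: the
mention pattern, the function names anonymise_mentions and build_clusters,
the (text, traits) input format and the shuffling strategy are all
assumptions made for illustration. Only the @user token and the cluster size
of 80 come from the task description.

```python
import random
import re
from collections import defaultdict

# Assumption: a mention is "@" followed by letters, digits or underscores and
# is not preceded by a word character (so e-mail addresses are left alone).
# This pattern would also match a literal @political_party token, so in this
# sketch it must run before any such token is introduced.
MENTION_RE = re.compile(r"(?<!\w)@[A-Za-z0-9_]+")


def anonymise_mentions(text):
    """Replace every Twitter mention in `text` with the token @user."""
    return MENTION_RE.sub("@user", text)


def build_clusters(tweets, cluster_size=80, seed=0):
    """Group tweets that share the same traits into fixed-size clusters.

    `tweets` is an iterable of (text, traits) pairs, where `traits` is a
    hashable tuple such as (gender, profession, ideology). Hypothetical
    format, chosen only for this sketch.
    """
    by_traits = defaultdict(list)
    for text, traits in tweets:
        by_traits[traits].append(anonymise_mentions(text))

    rng = random.Random(seed)
    clusters = []
    for traits, texts in by_traits.items():
        rng.shuffle(texts)  # mix tweets from different authors
        # Keep only full clusters; leftover tweets are dropped.
        for start in range(0, len(texts) - cluster_size + 1, cluster_size):
            clusters.append((traits, texts[start:start + cluster_size]))
    return clusters
```

Each returned cluster carries the trait tuple shared by all of its tweets,
which is the label a system is asked to predict.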

To facilitate participation in the competition, a notebook with two
baselines will be provided. The first one will be based on a bag-of-words
(BoW) model and the second one on Transformers. To download the data and
the notebook, and to participate, go to
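
For readers unfamiliar with bag-of-words classification at cluster level,
the following Python sketch may help. It is not the baseline from the
organizers' notebook: it assumes whitespace tokenisation, treats each
cluster as one document, and uses a nearest-centroid rule with cosine
similarity. The function names bow, cosine, fit_centroids and predict are
hypothetical.

```python
import math
from collections import Counter, defaultdict


def bow(cluster):
    """Bag-of-words vector of a cluster: token counts over all its tweets."""
    counts = Counter()
    for tweet in cluster:
        counts.update(tweet.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(value * b.get(token, 0) for token, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(clusters, labels):
    """Sum the BoW vectors of all training clusters that share a label."""
    centroids = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        centroids[label].update(bow(cluster))
    return dict(centroids)


def predict(centroids, cluster):
    """Return the label whose centroid is most similar to the cluster."""
    vector = bow(cluster)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))
```

Calling fit_centroids on the training clusters with one label set (for
example the binary ideology labels) and then predict on each unseen cluster
yields one label per cluster; the same functions can be reused for each
trait.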

Yesterday, we released the training dataset, which can be found in the
"Files" subsection of the "Participate" tab. Note that this dataset
includes all the instances that were released during the Practice stage,
so there is no need to combine both datasets.

Finally, remember that the CodaLab competition is open for submitting your
results on the development dataset provided. This dataset is available in
the same section as the training dataset.

Best regards,

The PoliticES 2023 organizing committee

References



   García-Díaz, J. A., Almela, Á., Alcaraz-Mármol, G., & Valencia-García,
   R. (2020). UMUCorpusClassifier: Compilation and evaluation of linguistic
   corpus for Natural Language Processing tasks. Procesamiento del Lenguaje
   Natural, 65, 139-142.

   García-Díaz, J. A., Colomo-Palacios, R., & Valencia-García, R. (2022a).
   Psychographic traits identification based on political ideology: An author
   analysis study on Spanish politicians’ tweets posted in 2020. Future
   Generation Computer Systems, 130(1), 59-74.

   García-Díaz, J. A., Jiménez Zafra, S. M., Martín Valdivia, M. T.,
   García-Sánchez, F., Ureña López, L. A., & Valencia García, R. (2022b).
   Overview of PoliticEs 2022: Spanish Author Profiling for Political
   Ideology. Procesamiento del Lenguaje Natural, 69, 265-272.

Important dates


   Release of development corpora: Feb 13, 2023

   Release of training corpora: Mar 13, 2023

   Release of test corpora and start of evaluation campaign: Apr 17, 2023

   End of evaluation campaign (deadline for runs submission): May 3, 2023

   Publication of official results: May 5, 2023

   Paper submission: May 29, 2023

   Review notification: Jun 17, 2023

   Camera ready submission: Jun 27, 2023

   IberLEF Workshop (SEPLN 2023): Sep 26, 2023 (Jaén, Andalusia, Spain)

   Publication of proceedings: Sep ??, 2023

Organizing committee


   José Antonio García-Díaz (UMUTeam, Universidad de Murcia)

   Salud María Jiménez-Zafra (SINAI, Universidad de Jaén)

   María-Teresa Martín Valdivia (SINAI, Universidad de Jaén)

   Francisco García-Sánchez (UMUTeam, Universidad de Murcia)

   L. Alfonso Ureña-López (SINAI, Universidad de Jaén)

   Rafael Valencia-García (UMUTeam, Universidad de Murcia)

