[Corpora-List] [PoliticES@IberLEF2023] IberLEF 2023 Task - PoliticEs: Political ideology detection in Spanish texts - Training set released!

Salud María Jiménez Zafra via Corpora Tue, 14 Mar 2023 05:04:25 -0700

Training set released!

SECOND CALL FOR PARTICIPATION


IberLEF 2023 Task - PoliticEs: Political ideology detection in Spanish texts

Held as part of the evaluation forum IberLEF 2023
<https://sites.google.com/view/iberlef-2023> in the XXXIX edition of the
International Conference of the Spanish Society for Natural Language
Processing (SEPLN 2023 <http://sepln2023.sepln.org/en/home/>)

September 26, 2023. Jaén, Andalusia, Spain

Codalab link: https://codalab.lisn.upsaclay.fr/competitions/10173

Dear All,

We invite researchers and students to participate in the shared task
PoliticEs 2023: Political ideology detection in Spanish texts, held as part
of IberLEF 2023, the shared evaluation campaign for Natural Language
Processing systems in Spanish and other Iberian languages, co-located with
the SEPLN 2023 conference.

The goal of this task is to extract political ideology information from
Spanish texts. For this, an automatic document classification task on
clusters of texts is proposed. It consists of extracting the self-assigned
gender and profession as demographic traits, and the political ideology as
a psychographic trait from a set of texts written in Spanish from several
authors that share those traits. Political ideology is considered as a
binary and as a multiclass problem. The PoliticES 2023 shared task builds
on the previous PoliticES 2022 task, presented at IberLEF 2022
(García-Díaz et al., 2022b), whose dataset was an extension of the
PoliCorpus 2020 dataset (García-Díaz et al., 2022a). The novelty this year
is that, to prevent legal and ethical issues, participants will work with
clusters of texts written by different users who share the same traits,
instead of profiling individual users.

Participants will be provided with development, development_test, training
and test datasets in Spanish, drawn from an extension of PoliCorpus 2020
(García-Díaz et al., 2022a) and the corpus used for the PoliticES 2022
shared task (García-Díaz et al., 2022b). The dataset was collected between
2020 and 2022 from the Twitter accounts of politicians, political
journalists and celebrities in Spain using the UMUCorpusClassifier
(García-Díaz et al., 2020). We automatically created clusters of texts
mixing some of these extracted tweets in order to prevent ethical and
privacy issues about author profiling in Twitter. Each cluster is composed
of 80 tweets written by different users that share all the traits under
evaluation. We labeled each cluster with the self-assigned gender (male,
female), profession (politician, celebrity, journalist) and political
spectrum on two axes: binary (left, right) and multiclass (left,
moderate_left, moderate_right, right). Moreover, mentions of politicians
and of any other Twitter accounts were anonymised by replacing them with
the token @user. Other entities, such as political party references, are
replaced with the @political_party token. Consequently, the traits cannot
be guessed trivially by reading a user's name and searching for information
about them on the Internet. The dataset is composed of approximately 2800
different clusters.
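
The anonymisation and clustering described above can be sketched roughly as
follows. This is only an illustration, not the organisers' pipeline: the
field names (text, gender, profession, ideology_binary,
ideology_multiclass), the helper names and the party-name list are
assumptions, and the sketch does not enforce that the tweets in a cluster
come from different users.

```python
import re
from collections import defaultdict

MENTION_RE = re.compile(r"@\w+")   # any Twitter account mention
CLUSTER_SIZE = 80                  # cluster size stated in the call


def anonymise(text, party_names=()):
    """Replace mentions with @user and listed party names with @political_party.

    `party_names` is a hypothetical list supplied by the caller; the call
    does not say how party references were detected.
    """
    # Mentions go first so the @political_party token is not rewritten to @user.
    text = MENTION_RE.sub("@user", text)
    for name in party_names:
        pattern = r"\b" + re.escape(name) + r"\b"
        text = re.sub(pattern, "@political_party", text, flags=re.IGNORECASE)
    return text


def build_clusters(tweets, size=CLUSTER_SIZE, party_names=()):
    """Group tweets that share all four traits and cut each group into
    clusters of exactly `size` tweets (leftover tweets are dropped)."""
    groups = defaultdict(list)
    for tweet in tweets:
        key = (
            tweet["gender"],
            tweet["profession"],
            tweet["ideology_binary"],
            tweet["ideology_multiclass"],
        )
        groups[key].append(anonymise(tweet["text"], party_names))

    clusters = []
    for key, texts in groups.items():
        for start in range(0, len(texts) - size + 1, size):
            clusters.append({"labels": key, "tweets": texts[start:start + size]})
    return clusters
```

Each resulting cluster carries one label tuple, so a classifier sees 80
anonymised tweets per instance rather than a single author's timeline.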

Finally, in order to facilitate participation in the competition, a
notebook with two baselines will be provided. The first is based on a
bag-of-words (BoW) model and the second on Transformers. To download the
data and the notebook, and to participate, go to
https://codalab.lisn.upsaclay.fr/competitions/10173.
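
For readers unfamiliar with bag-of-words classification, a minimal sketch
of the idea follows. It is not the organisers' baseline notebook: it
assumes each cluster is a list of tweet strings with one label, and it
swaps in a simple nearest-centroid rule with cosine similarity, using only
the Python standard library. All function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[@\w]+")  # keeps tokens such as @user intact


def bag_of_words(tweets):
    """Count lowercase tokens over all tweets of one cluster."""
    counts = Counter()
    for tweet in tweets:
        counts.update(TOKEN_RE.findall(tweet.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(value * b[token] for token, value in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(clusters, labels):
    """Sum the bag-of-words vectors of all training clusters per label."""
    centroids = defaultdict(Counter)
    for tweets, label in zip(clusters, labels):
        centroids[label].update(bag_of_words(tweets))
    return dict(centroids)


def predict(centroids, tweets):
    """Return the label whose centroid is most similar to the cluster."""
    bow = bag_of_words(tweets)
    return max(centroids, key=lambda label: cosine(bow, centroids[label]))
```

A toy usage, with invented texts and the binary ideology labels from the
task description:

```python
train = [["subir impuestos sanidad publica", "mas sanidad"],
         ["bajar impuestos empresas", "libertad empresas"]]
centroids = fit_centroids(train, ["left", "right"])
print(predict(centroids, ["sanidad publica para todos"]))
```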

Yesterday, we released the training dataset, which can be found in the
"Files" subsection of the "Participate" tab. Note that this dataset
includes all the instances released during the Practice stage, so there is
no need to combine both datasets.

Remember also that the CodaLab competition is open for submitting results
on the development dataset provided. This dataset is available in the same
section as the training dataset.

Best regards,

The PoliticES 2023 organizing committee


References

   - García-Díaz, J. A., Almela, Á., Alcaraz-Mármol, G., & Valencia-García,
     R. (2020). UMUCorpusClassifier: Compilation and evaluation of linguistic
     corpus for Natural Language Processing tasks. Procesamiento del Lenguaje
     Natural, 65, 139-142.

   - García-Díaz, J. A., Colomo-Palacios, R., & Valencia-García, R. (2022a).
     Psychographic traits identification based on political ideology: An
     author analysis study on Spanish politicians’ tweets posted in 2020.
     Future Generation Computer Systems, 130(1), 59-74.

   - García-Díaz, J. A., Jiménez Zafra, S. M., Martín Valdivia, M. T.,
     García-Sánchez, F., Ureña López, L. A., & Valencia García, R. (2022b).
     Overview of PoliticEs 2022: Spanish Author Profiling for Political
     Ideology. Procesamiento del Lenguaje Natural, 69, 265-272.


Important dates

   - Release of development corpora: Feb 13, 2023
   - Release of training corpora: Mar 13, 2023
   - Release of test corpora and start of evaluation campaign: Apr 17, 2023
   - End of evaluation campaign (deadline for runs submission): May 3, 2023
   - Publication of official results: May 5, 2023
   - Paper submission: May 29, 2023
   - Review notification: Jun 17, 2023
   - Camera ready submission: Jun 27, 2023
   - IberLEF Workshop (SEPLN 2023): Sep 26, 2023 (Jaén, Andalusia, Spain)
   - Publication of proceedings: Sep ??, 2023


Organizing committee

   - José Antonio García-Díaz (UMUTeam, Universidad de Murcia)
   - Salud María Jiménez-Zafra (SINAI, Universidad de Jaén)
   - María-Teresa Martín Valdivia (SINAI, Universidad de Jaén)
   - Francisco García-Sánchez (UMUTeam, Universidad de Murcia)
   - L. Alfonso Ureña-López (SINAI, Universidad de Jaén)
   - Rafael Valencia-García (UMUTeam, Universidad de Murcia)


Salud María Jiménez Zafra
sjza...@ujaen.es


Universidad de Jaén
Grupo de Investigación SINAI <http://sinai.ujaen.es/> | Departamento de
Informática
EPS Jaén, Edificio A3, Despacho 219
Campus Las Lagunillas s/n 23071 - Jaén | +34 953212992


