[Apologies for cross-postings]

The First Workshop on Multi-Language Processing in a Globalising World
(MLP2017)
September 4-5, Dublin City University, Ireland

Workshop Registration NOW Open!

Shared Tasks on Cross-lingual Word Segmentation and Morpheme Segmentation

Release of Test Set: July 24, 2017

We have released corpora in eight languages (Basque, Farsi, Filipino,
Finnish, Kazakh, Marathi, Uyghur, Vietnamese) and there are more to come.
Training and development sets are already available.  We will send you the
data upon receiving your registration email for the shared task.

---------- Forwarded message ----------
From: Qun Liu <liuqu...@gmail.com>
Date: Thu, Jun 1, 2017 at 10:50 PM
Subject: MLP2017: First Workshop on Multi-Language Processing in a
Globalising World

The First Workshop on Multi-Language Processing in a Globalising World
(MLP2017)

September 4-5, Dublin City University, Ireland


*MLP 2017 Call for Abstracts*

*MLP 2017 Call for Participation in Shared Tasks*


*MLP 2017 Call for Abstracts*

The First Workshop on Multi-Language Processing in a Globalising World
(MLP2017), organized by ADAPT Centre, Dublin City University (DCU),
Ireland, will be held at DCU on September 4–5, 2017.

Globalisation has, on the one hand, brought us significant growth in free
international trade and cross-cultural communication, as well as access to
newly-developed technology, media, education, healthcare, consumer goods,
etc. On the other hand, it may have negative impacts on local societies,
such as cultural homogenisation. To embrace cultural diversity and
multilingual phenomena, experts with research interests in different
languages are invited to participate in this workshop. This international
workshop will be organised as a forum and we invite natural language
processing researchers and linguists to come together to discuss the
current status and future directions of research in multilingualism and
minority languages in this globalising world.

The workshop aims to provide a research forum dedicated to state-of-the-art
methods and techniques for multi-language and cross-language processing and
to exploring the use of such technologies in specific tasks. The workshop
will solicit original and ongoing research contributions related to the
theme; topics include, but are not limited to:


   Theoretical and applied linguistic research for multilinguality and
   minority languages

   Text encoding theory and transcoding techniques

   Resource construction, such as multilingual corpora and corpora for
   minority languages

   Speech, lexical, syntactic, semantic analytics for multiple languages or
   minority languages

   Cross-language adaptation for natural language processing

   Multi-language, cross-language and minority language processing methods
   and applications in machine translation, speech recognition, information
   retrieval etc.

   Evaluation metrics for multi-language, cross-language and minority
   language processing

   Multi-language, cross-language and minority language processing for
   social media and user generated content

   Deep learning and expressions for multi-language processing

   Minority languages in emergency responses and security/disaster

   Multi-language or cross-language named entity recognition, entity
   relation extraction and event extraction

   Multi-language or cross-language linked data or knowledge graph

   Multi-language or cross-language anaphora resolution and discourse

   Multi-language or cross-language sentiment analysis

   Multi-language or cross-language text classification and generation

   Transliteration and machine translation

The language of the Workshop is English. Abstract submissions may include
research results as well as work in progress. Submissions must have a clear
focus on specific issues pertaining to multi-language and cross-language
processing, including minority language processing. Descriptions of
commercial systems are welcome, provided the authors are willing to discuss
the details of their work. Abstracts should be 1–2 pages long.

We suggest, but do not require, structuring your abstract under the
following headings:

   - Introduction
   - Existing Work
   - Methods Proposed
   - Results
   - Conclusion

Only the PDF version of the abstract is accepted. The submitted abstract
will be subject to a double-blind review, and must not contain authors'
names and affiliations.

Abstracts are submitted to https://easychair.org/conferences/?conf=mlp2017.

Important Dates:

   - Abstract Submission: June 30, 2017
   - Notification of Acceptance: July 20, 2017
   - Final Manuscript Submission: August 5, 2017
   - Registration: July 25, 2017
   - Conference: September 4-5, 2017


Conference Chair: Qun LIU (ADAPT Centre, Dublin City University, Ireland)
Program Chair: Mikel L. Forcada (Universitat d’Alacant, Spain)


*MLP 2017  Call for Participation in Shared Tasks on Cross-lingual Word
Segmentation and Morpheme Segmentation*

The analysis of word formation is among the most fundamental natural
language processing (NLP) technologies for extracting basic processing
units for further NLP tasks in many languages.  There are broadly two
groups of segmentation tasks related to word formation, i.e. morpheme
segmentation and word segmentation.  Morpheme segmentation is required in
languages such as Turkish, for example, where words are formed by stems,
root words, prefixes, and/or suffixes.  It is the foundation for further
morphological analysis tasks.  Word segmentation is necessary in languages
such as Mandarin Chinese, where there are no word boundaries in the writing
system.

Although languages clearly share similarities in either morpheme
segmentation or word segmentation, most existing segmentation tools are
designed specifically for one language.  In this shared task, we encourage
participants to submit the results of one system/method as applied to
multiple languages for one of the two segmentation tasks.  These systems
are expected to demonstrate cross-lingual processing ability on the
segmentation tasks, which would give our community insight into building
fundamental NLP tools for low-resource languages.

Popular languages such as Chinese and Japanese are also included in the
task for two reasons.  Firstly, although morpheme segmentation and word
segmentation tools for these languages have been developed for many years
and are often regarded as mature technologies, human creativity,
variability of textual genres and dialects as exhibited in language
evolution still make them challenging problems for these languages.
Secondly, we would like to encourage participants of this shared task to
develop systems/methods that can be used across different languages where
morpheme segmentation or word segmentation is required for natural language
processing.

A corpus of at least 2,000 sentences will be prepared as the training set
in each language for either morpheme segmentation or word segmentation.
Development and test sets will each include 1,000 sentences for system
development and evaluation purposes.  The whole corpus will comprise
multiple genres where plausible in both subtasks.  Recommendations of
additional language resources will also be listed/provided for some
languages by the organizers.  These resources might include, but will not
be limited to, dictionaries, articles, social media posts and bilingual
(aligned) texts for the target languages.

Each task will be organized into two subtasks, constrained and
semi-constrained, according to the annotated data that may be used in
system development.  In the constrained subtasks, participants will use
only the corpora provided by the shared task, which makes it easier to
compare different technologies and their pros and cons.  In the
semi-constrained subtasks, participants are encouraged to use additional
publicly available resources to further improve the performance of their
systems.  Note that only un-annotated (raw) external data can be used in
the semi-constrained subtasks; annotated data with word or morpheme
boundaries cannot.  The four subtasks are as follows; participants can take
part in any or all of them.



   Task: Word Segmentation (WS)

      Subtask: Word Segmentation - Constrained (WSC)

      Subtask: Word Segmentation - Semi-constrained (WSS)

   Task: Morpheme Segmentation (MS)

      Subtask: Morpheme Segmentation - Constrained (MSC)

      Subtask: Morpheme Segmentation - Semi-constrained (MSS)

Results of systems tuned only with the given development sets must be
submitted.  Participants may also submit additional results tuned with
different development sets, provided that they describe how these sets were
produced, e.g. as a subset derived manually from the original development
set or by some other method.  The organizers will provide results of
baseline systems for
constrained morpheme segmentation (MSC) and constrained word segmentation
(WSC) tasks.  The results of submitted systems will be evaluated against
the prepared test set for each language.  Precision, recall and F1 measure
will be used as metrics for the evaluation.
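
As a rough illustration of the metrics named above, the following Python
sketch scores one segmented sentence.  It assumes that a predicted segment
counts as correct only when its character span matches a gold segment
exactly; the call does not specify the matching scheme, and the function
names `spans` and `prf` are hypothetical, not part of any official scorer.

```python
def spans(segments):
    """Convert a list of segments into a set of (start, end) character spans."""
    result, start = set(), 0
    for segment in segments:
        end = start + len(segment)
        result.add((start, end))
        start = end
    return result


def prf(gold_segments, predicted_segments):
    """Return (precision, recall, F1) over exactly matching character spans.

    Assumption: span-level exact matching; the official scorer may differ.
    """
    gold, predicted = spans(gold_segments), spans(predicted_segments)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Gold segmentation taken from the Mandarin Chinese sample in this call;
# the predicted segmentation is an invented example with one boundary error.
gold = ["美國", "喬治亞", "州", "首府", "亞特蘭大"]
predicted = ["美國", "喬治", "亞州", "首府", "亞特蘭大"]
p, r, f = prf(gold, predicted)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Here three of the five predicted segments match gold spans, so precision,
recall and F1 are all 0.60.  The same functions apply to morpheme
segmentation if each word is scored as a list of morphemes.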

TARGET LANGUAGES (listed in alphabetical order)


   Word Segmentation: Mandarin Chinese, Thai, Vietnamese.

   Morpheme Segmentation: Basque, Farsi, Finnish, Japanese, Kazakh,
   Marathi, Uyghur.


The format of the data is shown below.


   Uyghur; morpheme segmentation

 ئسلاھ‪//‬ئات ئاچ‪//‬ئې‪//‬ۋەت‪//‬ىش‪//‬نى چوڭ‪//‬قۇر ئىلگىرى سۈر//دۇق


   Basque, i.e. Euskara; morpheme segmentation

      Paper\\a\\k mahai\\a\\ren gain\\ean daude


   Mandarin Chinese; word segmentation

      美國 喬治亞 州 首府 亞特蘭大
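
A minimal Python sketch of reading lines in the formats shown above.  It
assumes that words are separated by whitespace and that morphemes within a
word are separated by the delimiter seen in the sample (a double backslash
in the Basque line, a double slash in the Uyghur line); the released corpora
may differ, and `parse_line` is a hypothetical helper name.

```python
def parse_line(line, morpheme_delimiter=None):
    """Split a segmented line into words, and each word into morphemes.

    Assumptions: whitespace separates words; morpheme_delimiter (if given)
    separates morphemes inside a word, as in the samples in this call.
    """
    words = line.split()
    if morpheme_delimiter is None:
        # Word segmentation data: each word is a single unit.
        return [[word] for word in words]
    return [word.split(morpheme_delimiter) for word in words]


# Basque sample from this call; the raw string keeps both backslashes.
basque = parse_line(r"Paper\\a\\k mahai\\a\\ren gain\\ean daude",
                    morpheme_delimiter=r"\\")
print(basque)

# Mandarin Chinese sample from this call (word segmentation only).
chinese = parse_line("美國 喬治亞 州 首府 亞特蘭大")
print(chinese)
```

Passing the delimiter as a parameter keeps one reader usable for both
morpheme samples, which use different delimiters.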


IMPORTANT DATES

   May 20, 2017       Shared Task Website Ready
   May 20, 2017       First Call for Participants Ready
   May 20, 2017       Registration Begins
   June 20, 2017      Release of Training Set
   July 5, 2017       Dry run: Release of Development Set
   July 8, 2017       Dry run: Results Submission on Development Set
   July 10, 2017      Dry run: Release of Scores
   July 12, 2017      Release of Surprise Languages (Training and
                      Development Sets)
   July 20, 2017      Registration Ends
   July 24, 2017      Release of Test Set
   July 31, 2017      Submission of Systems
   August 4, 2017     System Results
   August 11, 2017    System Description Paper Due
   August 18, 2017    Notification of Acceptance
   August 25, 2017    Camera-Ready Deadline


Please send a registration email to mlp2017.sharedta...@gmail.com with the
following information:

   Contact person:

      Last Name

      First Name

      Email address

   Tasks and Subtasks to participate in.

The subject line of the registration email should be: *Registration*.

ORGANIZERS: [listed in alphabetical order]

Alberto Poncelas

ADAPT Centre, Dublin City University

Alex Huynh

University of Science, Vietnam National University Ho Chi Minh City

Chao-Hong Liu

ADAPT Centre, Dublin City University

Dinh Dien

University of Science, Vietnam National University Ho Chi Minh City

Francis Tyers

UiT Norgga árktalaš universitehta

Majid Latifi

Universitat Politècnica de Catalunya


Inner Mongolia University

Prachya Boonkwan

National Electronics and Computer Technology Center

Teresa Lynn

ADAPT Centre, Dublin City University

Thepchai Supnithi

National Electronics and Computer Technology Center

Tommi A Pirinen

Universität Hamburg

Qun Liu

ADAPT Centre, Dublin City University

Vinit Ravishankar

Maharashtra Institute of Technology

Yating Yang

University of Chinese Academy of Sciences