[Apologies for cross-postings]

==================================================================
The First Workshop on Multi-Language Processing in a Globalising World (MLP 2017)
September 4-5, Dublin City University, Ireland
http://mlp.computing.dcu.ie/

Workshop Registration NOW Open!
https://www.eventbrite.com/e/multi-language-processing-in-a-globalising-world-tickets-35962771618

Shared Tasks on Cross-lingual Word Segmentation and Morpheme Segmentation
http://mlp.computing.dcu.ie/mlp2017_Shared_Task.html

Release of Test Set: July 24, 2017

We have released corpora in eight languages (Basque, Farsi, Filipino, Finnish, Kazakh, Marathi, Uyghur, Vietnamese), with more to come. Training and development sets are already available. We will send you the data once we receive your registration email for the shared task.
==================================================================

---------- Forwarded message ----------
From: Qun Liu <liuqu...@gmail.com>
Date: Thu, Jun 1, 2017 at 10:50 PM
Subject: MLP2017: First Workshop on Multi-Language Processing in a Globalising World

The First Workshop on Multi-Language Processing in a Globalising World (MLP 2017)
September 4-5, Dublin City University, Ireland
http://mlp.computing.dcu.ie/

*MLP 2017 Call for Abstracts*
*MLP 2017 Call for Participation in Shared Tasks*

----------------------------------------------------------

*MLP 2017 Call for Abstracts*

The First Workshop on Multi-Language Processing in a Globalising World (MLP 2017), organized by the ADAPT Centre, Dublin City University (DCU), Ireland, will be held at DCU on September 4–5, 2017.

Globalisation has, on the one hand, brought us significant growth in free international trade and cross-cultural communication, as well as access to newly developed technology, media, education, healthcare, consumer goods, etc. On the other hand, it may have negative impacts on local societies, such as cultural homogenisation. To embrace cultural diversity and multilingual phenomena, experts with research interests in different languages are invited to participate in this workshop.

This international workshop will be organised as a forum, and we invite natural language processing researchers and linguists to come together to discuss the current status and future directions of research in multilingualism and minority languages in this globalising world. The workshop aims to provide a research forum dedicated to state-of-the-art methods and techniques for multi-language and cross-language processing, and to explore the use of such technologies in specific tasks.

The workshop solicits original and ongoing research contributions related to the theme. Topics include, but are not limited to:

- Theoretical and applied linguistic research for multilinguality and minority languages
- Text encoding theory and transcoding techniques
- Resource construction, such as multilingual corpora and corpora for minority languages
- Speech, lexical, syntactic, semantic analytics for multiple languages or minority languages
- Cross-language adaptation for natural language processing
- Multi-language, cross-language and minority language processing methods and applications in machine translation, speech recognition, information retrieval, etc.
- Evaluation metrics for multi-language, cross-language and minority language processing
- Multi-language, cross-language and minority language processing for social media and user generated content
- Deep learning and expressions for multi-language processing
- Minority languages in emergency responses and security/disaster management
- Multi-language or cross-language named entity recognition, entity relation extraction and event extraction
- Multi-language or cross-language linked data or knowledge graph
- Multi-language or cross-language anaphora resolution and discourse analysis
- Multi-language or cross-language sentiment analysis
- Multi-language or cross-language text classification and generation
- Transliteration and machine translation

The language of the Workshop is English.

Abstract submissions may include research results as well as work in progress. Submissions must have a clear focus on specific issues pertaining to multi-language and cross-language processing, including minority language processing. Descriptions of commercial systems are welcome, provided the authors are willing to discuss the details of their work.

Abstracts should be 1–2 pages long. We suggest, but do not require, that you structure your abstract using the following headings:

- Introduction
- Existing Work
- Methods Proposed
- Results
- Conclusion

Abstracts are accepted in PDF format only. Submitted abstracts will be subject to double-blind review and must not contain the authors' names or affiliations. Please submit abstracts at https://easychair.org/conferences/?conf=mlp2017.

Important Dates:

- Abstract Submission: June 30, 2017
- Notification of Acceptance: July 20, 2017
- Final Manuscript Submission: August 5, 2017
- Registration: July 25, 2017
- Conference: September 4-5, 2017

Contact:

Conference Chair: Qun LIU (ADAPT Centre, Dublin City University, Ireland)
Program Chair: Mikel L. Forcada (Universitat d’Alacant, Spain)
Email: mlp2017.organisat...@gmail.com

-----------------------------------------------------

*MLP 2017 Call for Participation in Shared Tasks on Cross-lingual Word Segmentation and Morpheme Segmentation*

The analysis of word formation is among the most fundamental natural language processing (NLP) technologies: in many languages it extracts the basic processing units for further NLP tasks. There are broadly two groups of segmentation tasks related to word formation, i.e. morpheme segmentation and word segmentation. Morpheme segmentation is required in languages such as Turkish, where words are formed from stems, root words, prefixes, and/or suffixes. It is the foundation for further morphological analysis tasks.
Word segmentation is necessary in languages such as Mandarin Chinese, where the writing system does not mark word boundaries.

Although different languages are clearly similar in terms of either morpheme segmentation or word segmentation, most existing segmentation tools are designed specifically for one language. In this shared task, we encourage participants to submit the results of one system/method as applied to multiple languages for one of the two segmentation tasks. These systems are expected to demonstrate cross-lingual processing ability on the segmentation tasks, which would give our community insights into building fundamental NLP tools for low-resource languages.

Popular languages such as Chinese and Japanese are also included in the task for two reasons. Firstly, although morpheme segmentation and word segmentation tools for these languages have been developed for many years and are often regarded as mature technologies, human creativity and the variability of textual genres and dialects exhibited in language evolution still make segmentation a challenging problem for these languages. Secondly, we would like to encourage participants of this shared task to develop systems/methods that can be used across different languages where morpheme segmentation or word segmentation is required for natural language processing.

A corpus of at least 2,000 sentences will be prepared as the training set in each language for either morpheme segmentation or word segmentation. Development and test sets will each include 1,000 sentences for system development and evaluation purposes. The whole corpus will comprise multiple genres where plausible in both subtasks.

For some languages, the organizers will also list or provide recommended additional language resources. These resources might include, but will not be limited to, dictionaries, articles, social media posts and bilingual (aligned) texts for the target languages.

Each task will be organized into two subtasks, constrained and semi-constrained, according to the annotated data that participants may use. In the constrained subtasks, participants will use only the corpora provided by the shared task to develop their systems, which makes it easier to compare different technologies and their pros and cons. In the semi-constrained subtasks, participants are encouraged to use additional publicly available resources to further improve the performance of their systems. Note that the external data used in the semi-constrained subtasks must be un-annotated (raw) data; data annotated with word or morpheme boundaries cannot be used.

The four subtasks are as follows; participants can take part in any (or all) of them.

- Task: Word Segmentation (WS)
  - Subtask: Word Segmentation - Constrained (WSC)
  - Subtask: Word Segmentation - Semi-constrained (WSS)
- Task: Morpheme Segmentation (MS)
  - Subtask: Morpheme Segmentation - Constrained (MSC)
  - Subtask: Morpheme Segmentation - Semi-constrained (MSS)

During development, results of systems tuned only with the given development sets must be submitted. Participants may also submit additional results tuned with different development sets, provided they describe how these sets were produced, e.g. a subset derived manually from the original given development set, or a set obtained by some other method.

The organizers will provide results of baseline systems for the constrained morpheme segmentation (MSC) and constrained word segmentation (WSC) subtasks. The results of submitted systems will be evaluated against the prepared test set for each language. Precision, recall and F1 measure will be used as the evaluation metrics.

TARGET LANGUAGES (listed in alphabetical order)

- Word Segmentation: Mandarin Chinese, Thai, Vietnamese.
- Morpheme Segmentation: Basque, Farsi, Finnish, Japanese, Kazakh, Marathi, Uyghur.

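For participants who want a quick local check of the evaluation metrics named above, here is a minimal sketch of precision, recall and F1 for segmentation. It assumes span-level matching: a predicted unit counts as correct only if both its start and end offsets agree with the gold segmentation. This call does not describe the official scorer, so that matching rule and the names `token_spans` and `precision_recall_f1` are illustrative assumptions only.

```python
from typing import List, Sequence, Set, Tuple


def token_spans(tokens: Sequence[str]) -> Set[Tuple[int, int]]:
    """Map a list of units to the set of (start, end) character offsets."""
    spans, pos = set(), 0
    for tok in tokens:
        spans.add((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def precision_recall_f1(
    gold_sents: List[Sequence[str]], pred_sents: List[Sequence[str]]
) -> Tuple[float, float, float]:
    """Corpus-level scores under the assumed span-matching rule."""
    if len(gold_sents) != len(pred_sents):
        raise ValueError("gold and predicted corpora differ in length")
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        # Both segmentations must cover the same underlying text.
        if "".join(gold) != "".join(pred):
            raise ValueError("segmentations do not cover the same text")
        g, p = token_spans(gold), token_spans(pred)
        correct += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Gold is the Mandarin Chinese sample from this call; the prediction is
# a made-up system output that wrongly merges two of the words.
gold = ["美國 喬治亞 州 首府 亞特蘭大".split()]
pred = ["美國 喬治亞州 首府 亞特蘭大".split()]
p, r, f = precision_recall_f1(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.750 R=0.600 F1=0.667
```

The same function can be applied to morpheme segmentation by passing lists of morphemes instead of words, again under the assumption that units are compared by their character offsets.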
DATA SAMPLE

The data format is shown below.

- Uyghur; morpheme segmentation
  ئسلاھ//ئات ئاچ//ئې//ۋەت//ىش//نى چوڭ//قۇر ئىلگىرى سۈر//دۇق
- Basque, i.e. Euskara; morpheme segmentation
  Paper\\a\\k mahai\\a\\ren gain\\ean daude
- Mandarin Chinese; word segmentation
  美國 喬治亞 州 首府 亞特蘭大

SCHEDULE

- May 20, 2017: Shared Task Website Ready
- May 20, 2017: First Call for Participants Ready
- May 20, 2017: Registration Begins
- June 20, 2017: Release of Training Set
- July 5, 2017: Dry run: Release of Development Set
- July 8, 2017: Dry run: Results Submission on Development Set
- July 10, 2017: Dry run: Release of Scores
- July 12, 2017: Release of Surprise Languages (Training and Development Sets)
- July 20, 2017: Registration Ends
- July 24, 2017: Release of Test Set
- July 31, 2017: Submission of Systems
- August 4, 2017: System Results
- August 11, 2017: System Description Paper Due
- August 18, 2017: Notification of Acceptance
- August 25, 2017: Camera-Ready Deadline

Registration:

Please send a registration email to mlp2017.sharedta...@gmail.com with the following information:

- Institution:
  - Name
  - Country
- Contact person:
  - Title
  - Last Name
  - First Name
  - Email address
- Tasks and Subtasks to participate in

The subject line of the registration email should be: *Registration*.

ORGANIZERS: [listed in alphabetical order]

- Alberto Poncelas (ADAPT Centre, Dublin City University)
- Alex Huynh (University of Science, Vietnam National University Ho Chi Minh City)
- Chao-Hong Liu (ADAPT Centre, Dublin City University)
- Dinh Dien (University of Science, Vietnam National University Ho Chi Minh City)
- Francis Tyers (UiT Norgga árktalaš universitehta)
- Majid Latifi (Universitat Politècnica de Catalunya)
- Nasun-Urt (Inner Mongolia University)
- Prachya Boonkwan (National Electronics and Computer Technology Center)
- Teresa Lynn (ADAPT Centre, Dublin City University)
- Thepchai Supnithi (National Electronics and Computer Technology Center)
- Tommi A Pirinen (Universität Hamburg)
- Qun Liu (ADAPT Centre, Dublin City University)
- Vinit Ravishankar (Maharashtra Institute of Technology)
- Yating Yang (University of Chinese Academy of Sciences)