Final Call For Papers

[apologies for cross-postings]

=====================================================

Corpora and Tools for Processing Corpora
http://propor2016.di.fc.ul.pt/?page_id=383



***NEW DATE*** July 13, 2016 — Tomar, Portugal

Workshop co-located with PROPOR 2016
http://propor2016.di.fc.ul.pt/



Motivation

A great deal of the popularity of statistical machine translation solutions
is due to the availability of software packages that make it increasingly
easy and fast to train a working machine translation system. For this
deployment to take place, these packages have been seen as requiring only
a sufficiently large volume of data, including some form of parallel
corpora of raw text.

While advances in ever more sophisticated aspects of language technology
have made this increasingly feasible, they have overshadowed the fact that
the data needed to feed these systems still require a considerable amount
of preparation. Given the volume of appropriate corpora needed, this
preparation is only practical if, on the one hand, suitable datasets are
available and, on the other hand, the preparation is supported by a number
of shallow processing tools, such as boilerplate removers, tokenisers,
orthographic normalisers, hyphenators, foreign word detectors,
inflectional analysers, etc.

While the construction of this type of tool is no longer a hot topic for
cutting-edge research in language technology, finding and using such tools
may turn out in many cases to be harder than finding and using the much
more sophisticated modules needed to deploy the machine translation
systems. The situation is especially acute for the vast majority of
languages, which are comparatively less resourced than English in terms of
language technology, and for tools that perform at the state-of-the-art
level and are, furthermore, openly available for reuse.

It goes without saying that these negative circumstances go hand in hand
with, and are aggravated by, the fact that suitable parallel texts are not
available or easy to obtain. Interestingly, in many cases such tools and
datasets do exist, and yet their development has never been documented in a
publication or their availability has never been publicised.



Aims

The present workshop seeks to help improve this state of affairs by mapping
both the available parallel datasets suitable for feeding statistical
machine translation systems and the available language processing tools
useful for their preparation.

While pursuing this goal, the workshop also seeks to exchange ideas and
disseminate best practices that help foster the ELRC and CEF.AT
(http://www.lr-coordination.eu) initiatives.



Call

We thus invite submissions reporting on language resources suitable to
support statistical machine translation from/into Portuguese and on
processing tools for their preparation. Presentations may take the form of
an oral presentation and/or a demonstration. While the workshop seeks to
attract and promote papers concerning language resources and tools not yet
documented in previous publications, for the sake of representative
coverage, updated papers on tools and resources that have already been
documented are also welcome.



Formats

Submissions should be in PDF format, should not exceed 8 pages, and should
use the article template that can be found here:
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0 (see the
sections under the header “CS Proceedings and Other Multiauthor Volumes”).
Papers should be submitted via the EasyChair online platform:
https://www.easychair.org/conferences/conference_change_yes.cgi?a=10930112.



Publication

Accepted papers will be published in a special issue of the journal of the
Portuguese Language Department of the Directorate-General of Translation of
the European Commission, freely available online.



Fees

Participation in the workshop is free of charge for authors and non-authors
of papers alike. The organization of the workshop is supported by the
Portuguese Language Department of the Directorate-General of Translation of
the European Commission.



Language

The workshop invites submissions on resources and tools for any language
that fit the stated aim of this workshop. English is the working language
for submissions and for the workshop.



Dates

February 25: First call for papers

March 24: Final call for papers

April 15: Deadline for submissions

May 16: Notification sent to authors

June 1: Camera-ready papers due

JULY 13, 2016: Workshop takes place



Organizing Committee

Hilário Leal Fontes, DGT — European Commission (chair)

Paulo Batista, DGT — European Commission

António Branco, University of Lisbon



Programme Committee

António Branco, University of Lisbon (co-chair)

Hilário Leal Fontes, European Commission (co-chair)

Alexandru Ceausu, AMPLEXOR Luxembourg

Aline Villavicencio, Universidade Federal do Rio Grande do Sul

Amália Mendes, Centro de Linguística da Universidade de Lisboa

Belinda Maia, Universidade do Porto

Francis Tyers, Universitetet i Tromsø

Gabriel Lopes, Faculdade de Ciências e Tecnologia, UNL

Gorka Labaka, University of the Basque Country

Jorge Baptista, CECL/U. Algarve and L2F-Spoken Language Lab/INESC ID Lisboa

José Ramom Pichel Campos, imaxin|software

Luís Trigo, LIAAD-INESC Porto L.A.

Luísa Coheur, IST/INESC-ID Lisboa

M.T. Carrasco Benitez, European Commission

Maria José Machado, European Commission

Michael Jellinghaus, European Commission

Mikel Forcada, DLSI — Universitat d’Alacant

Paulo Quaresma, Universidade de Évora

Paulo Correia, European Commission

Thiago Pardo, Universidade de São Paulo

Xavier Gómez Guinovart, Universidade de Vigo



Contacts:

Hilário Leal Fontes, hilario.fon...@ec.europa.eu