[apologies for cross-posting]






Workshop on Corpora and Tools for Processing Corpora

http://propor2016.di.fc.ul.pt/?page_id=383


July 12, 2016 — Tomar, Portugal




Co-located with PROPOR 2016

http://propor2016.di.fc.ul.pt/






=====================  Call For Papers   ======================



*Motivation*

Much of the popularity of statistical machine translation solutions is
due to the availability of software packages that make it increasingly
easy and fast to train a working machine translation system. For such
deployment, these packages have been seen as requiring only that they
be fed a sufficiently large volume of data, including some form of
parallel corpora of raw text.

While advances in ever more sophisticated aspects of language technology
have made this increasingly feasible, they have overshadowed the fact
that the data needed to feed these systems still requires considerable
preparation. Given the volume of appropriate corpora needed, this
preparation is practical only if, on the one hand, suitable datasets are
available and, on the other hand, it is supported by a number of shallow
processing tools, such as boilerplate removers, tokenisers, orthographic
normalisers, hyphenators, foreign word detectors, inflectional
analysers, etc.

While the construction of this type of tool is no longer a hot topic
for cutting-edge research in language technology, obtaining such tools
may in many cases turn out to be harder than finding and using the much
more sophisticated modules needed to deploy machine translation
systems. The situation is especially acute for the vast majority of
languages, which are comparatively less resourced than English in terms
of language technology, and for tools that both perform at the
state-of-the-art level and are openly available for reuse.

It goes without saying that these negative circumstances go hand in hand
with, and are aggravated by, the fact that suitable parallel texts are
not available or not easy to obtain. Interestingly, such tools and
datasets often exist, yet their development has never been documented in
a publication or their availability has never been publicised.



*Aims*

The present workshop seeks to help improve this state of affairs by
mapping both the available parallel datasets suitable for feeding
statistical machine translation systems and the available language
processing tools useful for their preparation.

In pursuing this goal, the workshop also seeks to exchange ideas and
disseminate best practices that help foster the ELRC and CEF.AT
(http://www.lr-coordination.eu) initiatives.



*Call*

We thus invite submissions reporting on language resources suitable for
supporting statistical machine translation from/into Portuguese and on
processing tools for their preparation. Presentations may take the form
of an oral presentation and/or a demonstration. While the workshop seeks
to attract and promote papers on language resources and tools not yet
documented in previous publications, for the sake of representativeness,
updated papers on other, previously documented tools and resources are
also welcome.



*Formats*

Submissions should be in PDF format, should not exceed 8 pages, and
should use the article template that can be found here:
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0
(see the sections under the header “CS Proceedings and Other
Multiauthor Volumes”).
Papers should be submitted via the EasyChair online platform:
https://www.easychair.org/conferences/conference_change_yes.cgi?a=10930112



*Publication*

Accepted papers will be published in a special issue of the journal of
the Portuguese Language Department of the Directorate-General of
Translation of the European Commission, freely available online.



*Fees*

Participation in the workshop is free of charge for authors and
non-authors of papers alike. The organization of the workshop is
supported by the Portuguese Language Department of the
Directorate-General of Translation of the European Commission.



*Language*

The workshop invites submissions on resources and tools for any language
that fit the stated aim of this workshop. English is the working
language for submissions and for the workshop.



*Dates*

February 25: First call for papers
March 21: Final call for papers
April 15: Deadline for submissions
May 16: Notification sent to authors
June 1: Camera-ready papers due
July 12, 2016: Workshop takes place



*Organization Committee*

Hilário Leal Fontes, DGT — European Commission (chair)
Paulo Batista, DGT — European Commission
António Branco, University of Lisbon



*Programme Committee*

Hilário Leal Fontes, European Commission (co-chair)
António Branco, University of Lisbon (co-chair)
Alexandru Ceausu, AMPLEXOR Luxembourg
Aline Villavicencio, Universidade Federal do Rio Grande do Sul
Amália Mendes, Centro de Linguística da Universidade de Lisboa
Belinda Maia, Universidade do Porto
Francis Tyers, Universitetet i Tromsø
Gabriel Lopes, Faculdade de Ciências e Tecnologia, UNL
Gorka Labaka, University of the Basque Country
Jorge Baptista, CECL/U. Algarve and L2F-Spoken Language Lab/INESC ID Lisboa
José Ramom Pichel Campos, imaxin|software
Luís Trigo, LIAAD-INESC Porto L.A.
Luísa Coheur, IST/INESC-ID Lisboa
M.T. Carrasco Benitez, European Commission
Maria José Machado, European Commission
Michael Jellinghaus, European Commission
Mikel Forcada, DLSI — Universitat d’Alacant
Paulo Quaresma, Universidade de Évora
Paulo Correia, European Commission
Thiago Pardo, Universidade de São Paulo
Xavier Gómez Guinovart, Universidade de Vigo



*Contact:*

Hilário Leal Fontes, hilario.fon...@ec.europa.eu







