Final Call For Papers [apologies for cross-postings]
=====================================================

Corpora and Tools for Processing Corpora
http://propor2016.di.fc.ul.pt/?page_id=383

***NEW DATE*** July 13, 2016 - Tomar, Portugal

Workshop co-located with PROPOR 2016
http://propor2016.di.fc.ul.pt/

Motivation

A great deal of the popularity of statistical machine translation solutions is due to the availability of software packages that make it increasingly easier and faster to train a working machine translation system. For this deployment to take place, these packages have been seen as merely needing to be fed with a sufficiently large volume of data, including some form of parallel corpora of raw text. While advances in ever more sophisticated aspects of language technology have made this increasingly feasible, the fact that the data needed to feed these systems still require a considerable deal of preparation has been left in the shadow.

Given the volume of appropriate corpora needed, this preparation can only be practical if, on the one hand, suitable datasets are available and, on the other hand, it is supported by a number of shallow processing tools, such as boilerplate removers, tokenisers, orthographic normalisers, hyphenators, foreign word detectors, inflectional analysers, etc. While the construction of this type of tool is no longer a hot topic for cutting-edge research in language technology, resorting to such tools may in many cases turn out to be less easy than finding and using the much more sophisticated modules needed to deploy machine translation systems. The situation is especially acute for the vast majority of languages, which are comparatively less resourced than English in terms of language technology, and for tools that perform at the state-of-the-art level and are furthermore openly available for reuse.
It goes without saying that these negative circumstances go hand in hand with, and are aggravated by, the fact that suitable parallel texts are not available or not easy to obtain. Interestingly, such tools and datasets often exist, and yet their development has never been documented in a publication or their availability has never been disseminated.

Aims

The present workshop seeks to help improve this state of affairs by mapping both the available parallel datasets suitable to feed statistical machine translation systems and the available language processing tools useful for their preparation. While pursuing this goal, the workshop also seeks to exchange ideas and disseminate best practices that help to foster the ELRC and CEF.AT (http://www.lr-coordination.eu) initiatives.

Call

We thus invite submissions reporting on language resources suitable to support statistical machine translation from/into Portuguese and on processing tools for their preparation. Different types of presentation are possible, in the form of an oral presentation and/or a demonstration. While the workshop seeks to attract and promote papers concerning language resources and tools not yet documented in previous publications, renewed papers on other tools and resources are also welcome, for the sake of comprehensive coverage.

Formats

Submissions should be in the .pdf file format, should not exceed 8 pages, and should use the article template that can be found here:
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0
(consider sections under header “CS Proceedings and Other Multiauthor Volumes”).

Papers shall be submitted via the EasyChair online platform:
https://www.easychair.org/conferences/conference_change_yes.cgi?a=10930112

Publication

Accepted papers will be published in a special issue of the journal of the Portuguese Language Department of the Directorate-General of Translation of the European Commission, freely available online.

Fees

Participation in the workshop is free of charge for authors and non-authors of papers alike. The organization of the workshop is supported by the Portuguese Language Department of the Directorate-General of Translation of the European Commission.

Language

The workshop invites submissions on resources and tools for any language that fit the stated aim of this workshop. English is the working language for submissions and for the workshop.

Dates

February 25: First call for papers
March 24: Final call for papers
April 15: Deadline for submissions
May 16: Notification sent to authors
June 1: Camera-ready papers due
JULY 13, 2016: Workshop takes place

Organization Committee

Hilário Leal Fontes, DGT - European Commission (chair)
Paulo Batista, DGT - European Commission
António Branco, University of Lisbon

Programme Committee

António Branco, University of Lisbon (co-chair)
Hilário Leal Fontes, European Commission (co-chair)
Alexandru Ceausu, AMPLEXOR Luxembourg
Aline Villavicencio, Universidade Federal do Rio Grande do Sul
Amália Mendes, Centro de Linguística da Universidade de Lisboa
Belinda Maia, Universidade do Porto
Francis Tyers, Universitetet i Tromsø
Gabriel Lopes, Faculdade de Ciências e Tecnologia, UNL
Gorka Labaka, University of the Basque Country
Jorge Baptista, CECL/U. Algarve and L2F-Spoken Language Lab/INESC ID Lisboa
José Ramom Pichel Campos, imaxin|software
Luís Trigo, LIAAD-INESC Porto L.A.
Luísa Coheur, IST/INESC-ID Lisboa
M.T. Carrasco Benitez, European Commission
Maria José Machado, European Commission
Michael Jellinghaus, European Commission
Mikel Forcada, DLSI - Universitat d’Alacant
Paulo Quaresma, Universidade de Évora
Paulo Correia, European Commission
Thiago Pardo, Universidade de São Paulo
Xavier Gómez Guinovart, Universidade de Vigo

Contacts: Hilário Leal Fontes, hilario.fon...@ec.europa.eu