Re: [Moses-support] Moses untranslated words
Did you tokenize the test set the same way as the training corpus? Sometimes words that are attached (concatenated) to punctuation such as " : ; , - _ . remain untranslated.

Marwa N Refaie

On 27 Feb 2016, at 19:20, Haithem Afli wrote:
> Hi Kamel,
>
> Could you provide some examples?
>
> -Haithem

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
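The tokenization mismatch described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: `simple_tokenize` and `unknown_tokens` are hypothetical helpers written for this example, the regular expression is a rough stand-in for a real tokenizer (Moses ships its own in scripts/tokenizer/tokenizer.perl), and the sentences are made up. The point is that a word glued to punctuation, such as "garden,", is a different token from "garden" and so looks unknown to a model trained on tokenized text.

```python
import re

# Hypothetical helper: put spaces around the punctuation marks listed above.
# A rough stand-in for a real tokenizer, not the Moses tokenizer itself.
_PUNCT = re.compile(r'([":;,\-_.])')


def simple_tokenize(line):
    """Split punctuation off words, then split on whitespace."""
    return _PUNCT.sub(r" \1 ", line).split()


def unknown_tokens(test_line, vocab, tokenize=str.split):
    """Return the tokens of test_line that do not occur in vocab."""
    return [tok for tok in tokenize(test_line) if tok not in vocab]


# Made-up training sentence, tokenized before the vocabulary is collected.
train_vocab = set(simple_tokenize("the house is small , the garden is big ."))
test_line = "the garden, the house."

# Test set split on whitespace only: "garden," and "house." are unknown.
print(unknown_tokens(test_line, train_vocab))
# Test set tokenized like the training data: nothing is unknown.
print(unknown_tokens(test_line, train_vocab, simple_tokenize))
```

If the first list is non-empty and the second is empty for your own data, the untranslated words are most likely a preprocessing mismatch rather than a problem with the models.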
Re: [Moses-support] Moses untranslated words
Hi Kamel,

Could you provide some examples?

-Haithem

On 26 February 2016 at 13:18, kamel Bouzidi wrote:
> Hi,
> I have a problem with the Moses output. I found some untranslated words
> in it and I don't understand why. These words are not OOV, because they
> exist in my language model.
> Can you help me?
> Thank you.

--
Haithem Afli | Postdoctoral Researcher
ADAPT Centre, School of Computing
Dublin City University, Dublin 9, Ireland
p: +353 (0) 1 700 6711
m: +353 (0) 89 984 6260
e: haithem.a...@adaptcentre.ie
www.adaptcentre.ie
[Moses-support] [CFP] Special Session on Language Technologies LT-2016 at ICIST-2016
Call For Papers

The Special Session on LANGUAGE TECHNOLOGIES will be held at the 22nd International Conference on Information and Software Technologies (ICIST), scheduled for 13-15 October 2016 in Druskininkai, Lithuania.

SCOPE

Since human language is the most natural way of communication, linguistically competent software would greatly facilitate humans' interaction with computers. The field of Language Technologies (LT) has gained a lot of interest and made enormous progress during the last decades. LT is an interdisciplinary field that deals with statistical or rule-based modeling and involves practitioners of artificial intelligence, computer science, engineering, information retrieval, linguistics, phonetics, and psychology. The special session on LT provides a focus for this work and encourages an interdisciplinary approach to speech and language research and technology, bringing together experts from both academia and industry. Submissions reporting original results and system development experience, as well as real-world applications, are welcome.

TOPICS

Authors are invited to submit full papers describing original research work associated with Language Technologies including, but not limited to:
- Natural Language Processing (tagging systems, stemming, parsing and syntactical analysis, corpus-based language engineering)
- Natural Language Understanding (text analysis, ontology, formal semantics)
- Language-based Knowledge Engineering (text and data mining, knowledge acquisition, knowledge representation and reasoning)
- Cognitive models and AI techniques (graph based models, semantic nets, neural networks, and cognitive maps)
- Language Generation (dialogue-based systems, creative and writing systems, language synthesis, translation)
- Multi-modalities Computational Linguistics (speech recognition, speech-text conversions, speech analysis and textual tagging)
- Applications and Systems (search and information retrieval, web applications, forensics, cognitive systems, question-answer systems, translation systems, documents classifiers)

SUBMISSION OF PAPERS

Authors are invited to submit their papers in English through the conference submission system. Submissions must be original and should not have been published previously. All papers will be peer-reviewed by the ICIST-2016 Program Committee and judged with respect to their quality, originality, and relevance. For further details, please consult the conference web pages.

PUBLICATION

All accepted papers will appear in the ICIST Conference Proceedings Volume (published by Springer as a part of the CCIS series) and will be submitted for indexing to Thomson Reuters ISI. Selected authors of the best papers will also be invited to submit extended versions of their papers to the Information Technology and Control journal (ISSN 1392-124X; Thomson Reuters ISI Impact Factor 0.623 (2014)).

IMPORTANT DATES

1 May 2016 – Submission of papers
5 June 2016 – Notification of paper acceptance
23 June 2016 – Final manuscripts due

SPECIAL SESSION CHAIR

Assoc. Prof. Jurgita Kapočiūtė-Dzikienė
Vytautas Magnus University, Lithuania
j.kapociute-dziki...@if.vdu.lt

CO-CHAIRS

Peter Dirix
University of Leuven, Belgium
peedi...@hotmail.com

SPONSORED AND SUPPORTED BY

Kaunas University of Technology (KTU)
Vytautas Magnus University (VMU)

CONFERENCE CONTACTS

E-mail: ic...@ktu.lt
Web site: http://icist.if.ktu.lt/

More information and registration at http://icist.if.ktu.lt/
[Moses-support] 2nd CFP - 3rd Workshop on Indian Language Data: Resources & Evaluation (WILDRE-3) under LREC 2016 in Portoroz, Slovenia
Apologies for cross-posting. You are requested to kindly circulate this for wider publicity.

*3rd Workshop on Indian Language Data: Resources and Evaluation (WILDRE-3)*

*Date:* Tuesday, 24th May 2016
*Venue:* Grand Hotel Bernardin Conference Center, Portorož, Slovenia (organized under LREC 2016, 23-28 May 2016)
*Website:*
· *Main website* - http://sanskrit.jnu.ac.in/conf/wildre3
· *Submit papers on* - http://www.softconf.com/lrec2016/WILDRE3/

WILDRE – the 3rd workshop on Indian Language Data: Resources and Evaluation is being organized in Portorož, Slovenia on 24th May, 2016 under the LREC platform. India has huge linguistic diversity and has seen concerted efforts from the Indian government and industry towards developing language resources. The European Language Resources Association (ELRA) and its associate organizations have been very active and successful in addressing the challenges and opportunities related to language resource creation and evaluation. It is therefore a great opportunity for resource creators of Indian languages to showcase their work on this platform and also to interact with and learn from those involved in similar initiatives all over the world.

The broader objectives of WILDRE will be:
· To map the status of Indian Language Resources
· To investigate challenges related to creating and sharing various levels of language resources
· To promote a dialogue between language resource developers and users
· To provide an opportunity for researchers from India to collaborate with researchers from other parts of the world

*IMPORTANT DATES*

March 01, 2016: Paper submissions due (Extended Deadline)
March 26, 2016: Paper notification of acceptance
April 6, 2016: Camera-ready papers due
May 24, 2016: Workshop

*SUBMISSIONS*

Papers must describe original, completed or in-progress, and unpublished work. Each submission will be reviewed by two program committee members.

Accepted papers will be given up to 10 pages (for full papers) or 5 pages (for short papers and posters) in the workshop proceedings, and will be presented as an oral presentation or a poster.

Papers should be formatted according to the style-sheet, which will be provided on the LREC 2016 website (lrec2016.lrec-conf.org/en/). Please submit papers in PDF/doc format to the LREC website.

We are seeking submissions under the following categories:
· Full papers (10 pages)
· Short papers (work in progress – 5 pages)
· Posters (innovative ideas/proposals, research proposals of students - 1 page)
· Demo (of working online/standalone systems - 1 page)

WILDRE-3 will have a special focus on demos of Indian Language Technology. In the past few years, as more resources have been developed and made available, there has been increased activity in developing usable technology with them. WILDRE-3 would like to encourage and widen the Demo track to allow the community to showcase their demos and have mutually beneficial interactions with each other as well as with resource developers.

WILDRE-3 will invite technical, policy and position paper submissions on the following topics related to Indian Language Resources:
· Corpora - text, speech, multimodal, methodologies, annotation and tools
· Lexicons and Machine-readable dictionaries
· Ontologies
· Grammars
· Language resources for basic NLP, IR and Speech Technology tasks, tools and Infrastructure for constructing and sharing language resources
· Standards or specifications for language resources applications
· Licensing and copyright issues

Both the submission and review processes are handled electronically. The review process will be blind. The workshop website will provide the submission guidelines and the link for the electronic submission.

*Special Note:* The review process will be completely anonymous. Therefore, those who have submitted their manuscript, or who are planning to submit one, are requested to submit an 'anonymous' manuscript, that is, one *without author name, affiliation, email etc*.

When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments, including evaluation ones. For further information on this initiative, please refer to http://lrec2016.lrec-conf.org/en/

*Conference Chairs*
· Girish Nath Jha, Jawaharlal Nehru University, India
· Kalika Bali, Microsoft Research India Lab, Bangalore, India
· Sobha L, AU-KBC, Anna University, Chennai, India

*Program Committee (to be
[Moses-support] Moses untranslated words
Hi,

I have a problem with the Moses output. I found some untranslated words in it and I don't understand why. These words are not OOV (out of vocabulary), because they exist in my language model.

Can you help me?

Thank you.
Re: [Moses-support] bleu-annotation / analysis.perl
OK, obviously this is a modified BLEU algorithm, similar to what sentence-bleu does. However, I believe this is still not right for unigram sentences.

From: "Vincent Nguyen"
Date: 26 Feb 2016 22:21:59
To: moses-support@mit.edu
Subject: Re: [Moses-support] bleu-annotation / analysis.perl

Am I correct in saying that when a sentence's length is less than or equal to 4 tokens, the BLEU score should be 1 for an exact match and 0 otherwise? (by the definition in http://www1.cs.columbia.edu/nlp/sgd/bleu.pdf)

On 26/02/2016 10:02, Vincent Nguyen wrote:
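The short-sentence question above can be checked with a minimal sketch of unsmoothed sentence-level BLEU, following the definition in the linked paper (geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty). This is not the code used by analysis.perl or sentence-bleu. The function names are hypothetical, and one convention is an assumption of this sketch: when the hypothesis has no n-grams of some order, or none of them match, the score is taken to be 0.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed BLEU for one hypothesis against one reference."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        # Assumption of this sketch: an undefined (0/0) or zero precision
        # makes the whole score 0, since no smoothing is applied.
        if total == 0 or matched == 0:
            return 0.0
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: 1 if the hypothesis is longer than the reference.
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


print(sentence_bleu("a b c d", "a b c d"))    # exact match, 4 tokens
print(sentence_bleu("a b c e", "a b c d"))    # one token differs
print(sentence_bleu("a b c", "a b c"))        # exact match, 3 tokens
print(sentence_bleu("a b c d", "a b c d e"))  # 4 tokens, longer reference
```

Under these assumptions, a 4-token hypothesis scores 1 when it matches a 4-token reference exactly and 0 when any token differs. Two cases do not fit the "1 or 0" rule, though: an exact match shorter than 4 tokens has no 4-grams at all and scores 0 here, and a 4-token hypothesis that is a contiguous part of a longer reference keeps full n-gram precision and is only reduced by the brevity penalty (to about 0.78 in the last call). Modified or smoothed variants handle the missing n-gram orders differently, which may be why the scores for very short sentences look unexpected.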