Re: [Moses-support] Moses untranslated words

2016-02-27 Thread Marwa N Refaie
Did you tokenize the test set the same way as the training corpus?
Sometimes words that are attached (concatenated) to punctuation such as
{" : ; , - _ .} remain untranslated.
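
One rough way to check for this kind of mismatch is sketched below. It is
only an illustration, not part of Moses: the function name, the toy
vocabulary, and the toy test sentence are all hypothetical. It flags test
tokens that are unknown as written but become known once punctuation is
stripped from their edges, which usually points to a tokenization
difference.

```python
import string

# Punctuation that often stays glued to a word when tokenization is skipped.
PUNCT = string.punctuation


def find_glued_tokens(test_lines, train_vocab):
    """Return tokens that are unknown as written but known once
    leading/trailing punctuation is stripped."""
    suspects = []
    for line in test_lines:
        for token in line.split():
            core = token.strip(PUNCT)
            if token not in train_vocab and core and core in train_vocab:
                suspects.append(token)
    return suspects


# Toy data (hypothetical): source-side vocabulary of a tokenized training corpus.
train_vocab = {"the", "house", "is", "small", ",", "."}
test_lines = ["the house, is small."]
print(find_glued_tokens(test_lines, train_vocab))
```

If this reports tokens for a real test set, running the same tokenizer on
the test set as was used for the training corpus is the first thing to try.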

Marwa N Refaie



On 27 Feb 2016, at 19:20, Haithem Afli wrote:
>Hi Kamel,
>
>Could you provide some examples?
>
>-Haithem
>
>On 26 February 2016 at 13:18, kamel Bouzidi 
>wrote:
>
>> Hi,
>> I have a problem with the Moses output. I found some untranslated words
>> in the output and I don't understand why. The word is not OOV, because
>> it exists in my language model.
>> Can you help me?
>> Thank you.
>>
>> ___
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
>
>--
>
>*Haithem Afli* | Postdoctoral Researcher
>ADAPT Centre
>School of Computing p: +353 (0) 1 700 6711
>Dublin City University m: +353 (0) 89 984 6260
>Dublin 9 e: haithem.a...@adaptcentre.ie
>Ireland www.adaptcentre.ie
>
>--
>
>
>*Email Disclaimer*
>"This e-mail and any files transmitted with it are
>confidential and are intended solely for use by the addressee. Any
>unauthorised dissemination, distribution or copying of this message and
>any attachments is strictly prohibited. If you have received this
>e-mail in error, please notify the sender and delete the message. Any
>views or opinions presented in this e-mail may solely be the views of
>the author and cannot be relied upon as being those of Dublin City
>University. E-mail communications such as this cannot be guaranteed to
>be virus-free, timely, secure or error-free and Dublin City University
>does not accept liability for any such matters or their consequences.
>Please consider the environment before printing this e-mail."


Re: [Moses-support] Moses untranslated words

2016-02-27 Thread Haithem Afli
Hi Kamel,

Could you provide some examples?

-Haithem

On 26 February 2016 at 13:18, kamel Bouzidi 
wrote:

> Hi,
> I have a problem with the Moses output. I found some untranslated words
> in the output and I don't understand why. The word is not OOV, because
> it exists in my language model.
> Can you help me?
> Thank you.


-- 

*Haithem Afli* | Postdoctoral Researcher
ADAPT Centre
School of Computing p: +353 (0) 1 700 6711
Dublin City University m: +353 (0) 89 984 6260
Dublin 9 e: haithem.a...@adaptcentre.ie
Ireland www.adaptcentre.ie



[Moses-support] [CPF] Special Session on Language Technologies LT-2016 at ICIST-2016

2016-02-27 Thread Robertas Damaševičius
Call For Papers

The Special Session on LANGUAGE TECHNOLOGIES will be held at the 22nd
International Conference on Information and Software Technologies (ICIST),
scheduled for 13-15 October 2016 in Druskininkai, Lithuania.

SCOPE
Since human language is the most natural way of communication,
linguistically competent software would greatly facilitate humans'
interaction with computers and help meet our needs. The field of Language
Technologies (LT) has gained a lot of interest and made enormous progress
over the last decades. LT is an interdisciplinary field dealing with
statistical or rule-based modeling and involving practitioners of
artificial intelligence, computer science, engineering, information
retrieval, linguistics, phonetics, and psychology. The special session on LT
provides a focus for this work and encourages an interdisciplinary approach
to speech and language research and technology, bringing together experts
from both academia and industry. Submissions reporting original
results and system development experience, as well as real-world
applications, are welcome in this session.

TOPICS
Authors are invited to submit full papers describing original research work
associated with Language Technologies including, but not limited to:

   - Natural Language Processing (tagging systems, stemming, parsing and
   syntactical analysis, corpus-based language engineering)
   - Natural Language Understanding (text analysis, ontology, formal
   semantics)
   - Language-based Knowledge Engineering (text and data mining, knowledge
   acquisition, knowledge representation and reasoning)
   - Cognitive models and AI techniques (graph based models, semantic nets,
   neural networks, and cognitive maps)
   - Language Generation (dialogue-based systems, creative and writing
   systems, language synthesis, translation)
   - Multi-modalities Computational Linguistics (speech recognition,
   speech-text conversions, speech analysis and textual tagging)
   - Applications and Systems (search and information retrieval, web
   applications, forensics, cognitive systems, question-answer systems,
   translation systems, documents classifiers)


SUBMISSION OF PAPERS
Authors are invited to submit their papers in English through the
conference submission system. Submissions must be original and must not
have been published previously. All papers will be peer-reviewed by the
ICIST-2016 Program Committee and judged with respect to their quality,
originality, and relevance. For further details, please consult the
conference web pages.

PUBLICATION
All accepted papers will appear in the ICIST Conference Proceedings Volume
(published by Springer as part of the CCIS series) and will be submitted for
indexing to Thomson Reuters ISI. Authors of selected best papers will also
be invited to submit extended versions of their papers to the Information
Technology and Control journal (ISSN 1392-124X; Thomson Reuters ISI Impact
Factor 0.623 (2014)).

IMPORTANT DATES
1 May 2016 – Submission of papers
5 June 2016 – Notification of paper acceptance
23 June 2016 – Final manuscripts due

SPECIAL SESSION CHAIR
Assoc. Prof. Jurgita Kapočiūtė-Dzikienė
Vytautas Magnus University, Lithuania
j.kapociute-dziki...@if.vdu.lt

CO-CHAIRS
Peter Dirix
University of Leuven, Belgium
peedi...@hotmail.com

SPONSORED AND SUPPORTED BY
Kaunas University of Technology (KTU)
Vytautas Magnus University (VMU)

CONFERENCE CONTACTS
E-mail: ic...@ktu.lt
Web site: http://icist.if.ktu.lt/

More information and registration on http://icist.if.ktu.lt/


[Moses-support] 2ndCFP - 3rd Workshop on Indian Language Data: Resources & Evaluation (WILDRE-3) under LREC 2016 in Portoroz, Slovenia

2016-02-27 Thread Atul Kr. Ojha
Apologies for cross-posting. You are requested to kindly circulate this for
wider publicity.


*3rd Workshop on Indian Language Data: Resources and Evaluation (WILDRE-3)*

*Date: Tuesday, 24th May 2016*

*Venue:* Grand Hotel Bernardin Conference Center, Portorož, Slovenia
(Organized under LREC 2016, 23-28 May 2016)

*Website:*

· *Main website* - http://sanskrit.jnu.ac.in/conf/wildre3

· *Submit papers on* - http://www.softconf.com/lrec2016/WILDRE3/

WILDRE – the 3rd Workshop on Indian Language Data: Resources and Evaluation
is being organized in Portorož, Slovenia on 24th May, 2016 under the LREC
platform. India has huge linguistic diversity and has seen concerted
efforts from the Indian government and industry towards developing language
resources. The European Language Resources Association (ELRA) and its
associate organizations have been very active and successful in addressing
the challenges and opportunities related to language resource creation and
evaluation. It is therefore a great opportunity for resource creators of
Indian languages to showcase their work on this platform and also to
interact with and learn from those involved in similar initiatives all over
the world. The broader objectives of WILDRE are:

· To map the status of Indian Language Resources

· To investigate challenges related to creating and sharing various
levels of language resources

· To promote a dialogue between language resource developers and
users

· To provide an opportunity for researchers from India to collaborate with
researchers from other parts of the world

*IMPORTANT DATES*

*March 01, 2016 Paper submissions due (Extended Deadline)*

*March 26, 2016 Paper notification of acceptance*

*April 6, 2016 Camera-ready papers due*

*May 24, 2016 Workshop*



*SUBMISSIONS*

Papers must describe original, completed or in-progress, and unpublished
work. Each submission will be reviewed by two program committee members.

Accepted papers will be given up to 10 pages (for full papers) or 5 pages (for
short papers and posters) in the workshop proceedings, and will be
presented as an oral presentation or a poster.

Papers should be formatted according to the style-sheet, which will be
provided on the LREC 2016 website (lrec2016.lrec-conf.org/en/). Please
submit papers in PDF/doc format to the LREC website.

We are seeking submissions in the following categories:

· Full papers (10 pages)

· Short papers (work in progress – 5 pages)

· Posters (innovative ideas/proposals, research proposal of
students - 1 page)

· Demo (of working online/standalone systems - 1 page)

WILDRE-3 will have a special focus on Demos of Indian Language Technology.
In the past few years, as more resources have been developed and made
available, there has been an increased activity in developing usable
technology using these. WILDRE-3 would like to encourage and widen the Demo
track to allow the community to showcase their demos and have mutually
beneficial interactions with each other as well as resource developers.

WILDRE-3 will invite technical, policy and position paper submissions on
the following topics related to Indian Language Resources:

· Corpora -  text, speech, multimodal, methodologies, annotation
and tools

· Lexicons and Machine-readable dictionaries

· Ontologies

· Grammars

· Language resources for basic NLP, IR and Speech Technology tasks,
tools and Infrastructure for constructing and sharing language resources

· Standards or specifications for language resources  applications

· Licensing and copyright issues

Both the submission and review processes are handled electronically. The
review process will be blind. The workshop website will provide the
submission guidelines and the link for electronic submission.

*Special Note:*
The review process will be completely anonymous. Therefore, authors who have
already submitted or are planning to submit a manuscript are requested to
submit an anonymous version, *that is, one without author names,
affiliations, email addresses, etc.*

When submitting a paper from the START page, authors will be asked to
provide essential information about resources (in a broad sense, i.e. also
technologies, standards, evaluation kits, etc.) that have been used for the
work described in the paper or are a new result of your research. Moreover,
ELRA encourages all LREC authors to share the described LRs (data, tools,
services, etc.), to enable their reuse, replicability of experiments,
including evaluation ones, etc.

For further information on this initiative, please refer to
http://lrec2016.lrec-conf.org/en/


*Conference Chairs*

· Girish Nath Jha, Jawaharlal Nehru University, India

· Kalika Bali, Microsoft Research India Lab, Bangalore, India

· Sobha L, AU-KBC, Anna University, Chennai, India



*Program Committee (to be 

[Moses-support] Moses untranslated words

2016-02-27 Thread kamel Bouzidi
Hi,

I have a problem with the Moses output. I found some untranslated words in
the output and I don't understand why. The word is not OOV, because it
exists in my language model.
Can you help me?

Thank you.


Re: [Moses-support] bleu-annotation / analysis.perl

2016-02-27 Thread vnguyen

OK, obviously this is a modified BLEU algorithm, similar to what
sentence-bleu does. However, I believe this is still not right for unigram
sentences.

From: "Vincent Nguyen"
Date: 26 Feb 2016 22:21:59
To: moses-support@mit.edu
Subject: Re: [Moses-support] bleu-annotation / analysis.perl

Am I correct in saying that when a sentence is 4 tokens long or shorter,
the BLEU score should be 1 for an exact match and 0 otherwise (by the
definition in http://www1.cs.columbia.edu/nlp/sgd/bleu.pdf)?

On 26/02/2016 10:02, Vincent Nguyen wrote:
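
For anyone who wants to experiment with the short-sentence question raised
in this thread, here is a minimal sketch of unsmoothed, single-reference
sentence-level BLEU with n-grams up to 4. It is not the code used by
analysis.perl or sentence-bleu. The function names are hypothetical, and
treating an undefined precision (no n-grams of that order in the
hypothesis) as a score of 0 is an assumption of this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Unsmoothed single-reference BLEU.

    Returns 0.0 when any n-gram precision is zero or undefined
    (the latter is an assumption of this sketch)."""
    if not hyp:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_sum += math.log(matched / total) / max_n
    # Brevity penalty: 1 if the hypothesis is longer than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_sum)


print(sentence_bleu("a b c d".split(), "a b c d".split()))
print(sentence_bleu("a b c".split(), "a b c".split()))
```

Under these assumptions, an exact match of exactly four tokens scores 1.0,
while an exact match of three tokens scores 0.0 because it contains no
4-gram. A tool that reports a non-zero score for such short sentences is
presumably applying some form of smoothing or a lower maximum n-gram order.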