[Moses-support] z-mert

2015-12-18 Thread Sarah Schulz
Hi,

I am using Z-MERT for the first time, since I had to implement my own
scoring metric for tuning. But when I try to run it, I get the following
error while it parses the param.txt file:

Exception in thread "main" java.util.InputMismatchException
at java.util.Scanner.throwFor(Scanner.java:864)
at java.util.Scanner.next(Scanner.java:1485)
at java.util.Scanner.nextDouble(Scanner.java:2413)
at MertCore.processParamFile(MertCore.java:1537)
at MertCore.initialize(MertCore.java:310)
at MertCore.<init>(MertCore.java:239)
at ZMERT.main(ZMERT.java:44)

My param.txt looks like this:

lm_0 ||| 1.0 Opt 0.5 1.5 0.5 1.5
d_0 ||| 1.0 Opt 0.5 1.5 0.5 1.5
tm_0 ||| 0.3 Opt 0.25 0.75 0.25 0.75
tm_1 ||| 0.2 Opt 0.25 0.75 0.25 0.75
tm_2 ||| 0.2 Opt 0.25 0.75 0.25 0.75
tm_3 ||| 0.3 Opt 0.25 0.75 0.25 0.75
w_0 ||| 0.0 Opt -0.5 0.5 -0.5 0.5
normalization = none

I was wondering whether a type cast to double is missing in the code,
but before changing the Z-MERT code I wanted to make sure I hadn't
gotten anything else wrong.

Does anybody have experience with that?

Cheers,

Sarah


Re: [Moses-support] z-mert

2015-12-18 Thread Marcin Junczys-Dowmunt
Hi Sarah,
try running the command with

LC_ALL=C java -jar ...

I think the problem is that Java assumes a German locale and expects 
floating point numbers with a comma instead of a dot. I spent some time 
figuring that out myself while using Z-MERT.
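
For the curious: Java's Scanner parses numbers according to the default
locale, so the same param.txt can parse fine or fail depending on the
environment. A minimal sketch of the behaviour (my own illustration, not
Z-MERT code; the class name is invented):

    import java.util.Locale;
    import java.util.Scanner;

    public class LocaleCheck {
        public static void main(String[] args) {
            // Under a German locale, Scanner expects "0,5", so the
            // dot-decimal values in param.txt trigger exactly the
            // InputMismatchException in the stack trace above.
            Scanner german = new Scanner("0.5");
            german.useLocale(Locale.GERMANY);
            System.out.println(german.hasNextDouble()); // false

            // The ROOT locale uses dot decimals -- effectively what
            // running the JVM under LC_ALL=C gives you.
            Scanner root = new Scanner("0.5");
            root.useLocale(Locale.ROOT);
            System.out.println(root.nextDouble()); // 0.5
        }
    }

So no change to the Z-MERT code should be needed; forcing the locale
from the shell is enough.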
Best,
Marcin




[Moses-support] 1st CfP: 2nd Workshop on Natural Language Processing for Translation Memories (NLP4TM 2016) at LREC 2016

2015-12-18 Thread Carla Parra
 

(apologies for cross-posting) 

2ND WORKSHOP ON NATURAL LANGUAGE PROCESSING FOR TRANSLATION MEMORIES
(NLP4TM 2016) 

http://rgcl.wlv.ac.uk/nlp4tm2016/ 

to be held at LREC 2016 (Portorož, Slovenia), May 28, 2016 

Submission deadline: February 10, 2016 

1. CALL FOR PAPERS 

Translation Memories (TM) are amongst the most used tools by
professional translators, if not the most used. The underlying idea of
TMs is that a translator should benefit as much as possible from
previous translations by being able to retrieve how a similar sentence
was translated before. Moreover, the usage of TMs aims at guaranteeing
that new translations follow the client's specified style and
terminology. Despite the fact that the core idea of these systems relies
on comparing segments (typically of sentence length) from the document
to be translated with segments from previous translations, most of the
existing TM systems hardly use any language processing for this. Instead
of addressing this issue, most of the work on translation memories
focused on improving the user experience by allowing processing of a
variety of document formats, intuitive user interfaces, etc. 

The term "second generation translation memories" has been around for
more than ten years, promising translation memory software that
integrates linguistic processing in order to improve the translation
process. This linguistic processing can involve tasks such as matching
of subsentential chunks, edit distance operations between syntactic
trees, and the incorporation of semantic and discourse information in
the matching process. This workshop invites papers presenting second
generation translation memories and related initiatives. 

Terminologies, glossaries and ontologies are also very useful for
translation memories, by facilitating the task of the translator and
ensuring a consistent translation. The field of Natural Language
Processing (NLP) has proposed numerous methods for terminology
extraction and ontology extraction. Researchers are encouraged to submit
papers to the workshop which show how these methods are being
successfully applied to Translation Memories. In addition, papers
discussing the integration of Machine Translation and Translation
Memories or studies about automatic building of translation memories
from corpora are also welcomed. 

2. TOPICS OF INTEREST 

This workshop invites original papers which show how language processing
can help translation memories. Topics of interest include but are not
limited to: 

* Improving matching and retrieval of segments by using morphological,
  syntactic, semantic and discourse information
* Automatic extraction of terminologies and ontologies for translation
  memories
* Integration of named entity recognition and terminologies in matching
  and retrieval
* Using natural language processing for automatic construction of
  translation memories
* Extracting and aligning TM segments from a parallel or comparable corpus
* Construction of translation memories using the Internet
* Corpus based studies about the usefulness of TM for specific domains
* Development of hybrid TM and MT translation systems
* Study of NLP techniques used by TM tools available in the market
* Automatic methods for TM cleaning and maintenance

3. SHARED TASK 

A shared task on cleaning translation memories will be organised. A
training set will be distributed to be used to develop and train the
participants' systems. The testing will be done on 500 segments
distributed during the testing phase.

* TASK: Automatically clean translation memories
* TRAINING SET: 1,500 TM segments annotated with information on whether
  they are a valid translation of each other
* TEST SET: 500 TM segments
* LANGUAGE PAIRS:
  * English-Italian
  * English-German
  * English-Spanish
* RELEASE OF THE TRAINING DATA: end of January 2016

Participants are encouraged to submit working notes of their systems to
be presented during the workshop. More details, including the shared
task schedule, will be announced soon in a dedicated Call for
Participation. 

4. SUBMISSION INFORMATION 

We invite contributions of either long papers (8 pages + 2 pages for
references) presenting unpublished original research, or short
papers/demos of systems (4 pages + 2 pages for references) presenting
work in progress or working systems. Submissions do not need to be
anonymised. 

All papers must be submitted in PDF format via the START system by
following this link: https://www.softconf.com/lrec2016/NLP4TM/ 

5. IDENTIFY, DESCRIBE AND SHARE YOUR LRs

As scientific work requires accurate citations of referenced work so as
to allow the community to understand the whole context and also
replicate the experiments conducted by other researchers, LREC 2016
endorses the need to uniquely Identify LRs 

[Moses-support] Special Issue of the Machine Translation journal: Natural Language Processing for Translation Memories

2015-12-18 Thread Rohit Gupta
Apologies for any duplicates.

--


Special Issue of the Machine Translation journal: Natural Language
Processing for Translation Memories



http://www.springer.com/computer/artificial/journal/10590



Guest editors:

Constantin Orasan (University of Wolverhampton, UK)

Marcello Federico (FBK, Italy)



Submission deadline: May 15, 2016



1. Call For Papers



Translation Memories (TM) are amongst the most widely used tools by
professional translators. The underlying idea of TMs is that a translator
should benefit as much as possible from previous translations by being able
to retrieve the way in which a similar sentence was translated before.
Moreover, the usage of TMs aims to guarantee that new translations follow
the client’s specified style and terminology. Despite the fact that the
core idea of these systems relies on comparing segments (typically of
sentence length) from the document to be translated with segments from
previous translations, most of the existing TM systems hardly use any
language processing for this. Instead of addressing this issue, most of the
work on translation memories focused on improving the user experience by
allowing processing of a variety of document formats, intuitive user
interfaces, etc.



The term "second generation translation memories" has been around for more
than ten years, promising translation memory software that integrates
linguistic processing in order to improve the translation process. This
linguistic processing can involve tasks such as the matching of
subsentential chunks, editing distance operations between syntactic trees,
and the incorporation of semantic and discourse information in the matching
process.



Terminologies, glossaries and ontologies are also very useful for
translation memories, by facilitating the task of the translator and
ensuring a consistent translation. The field of Natural Language Processing
(NLP) has proposed numerous methods for terminology extraction and ontology
extraction.



Other ways of enhancing Translation Memories with information from NLP
components are to integrate Machine Translation and Translation Memories,
and automatically build and clean translation memories from corpora and
from the web.



This special issue builds on the success of the NLP4TM workshop organised
in conjunction with RANLP 2015 and the forthcoming second edition of this
workshop at LREC 2016, which will include a shared task on the cleaning of
translation memories. Authors of papers accepted at these workshops are
encouraged to submit extended versions for the special issue. However,
having a paper accepted at the workshop does not constitute a precondition
for submitting a paper for the special issue.



2. Topics of interest



This special issue invites original papers which show how language
processing can help translation memories. Topics of interest include but
are not limited to:

- improving matching and retrieval of segments by using morphological,
syntactic, semantic and discourse information

- automatic extraction of terminologies and ontologies for translation
memories

- integration of named entity recognition and terminologies in matching and
retrieval

- using natural language processing for automatic construction of
translation memories

- extracting and aligning TM segments from a parallel or comparable corpus

- construction of translation memories using the Internet

- corpus based studies about the usefulness of TM for specific domains

- development of hybrid TM and MT translation systems

- study of NLP techniques used by TM tools available in the market

- automatic methods for TM cleaning and maintenance



Note: extended versions of papers previously published at conferences and
workshops are likely to be eligible. Please consult us if you have any
doubts.



3. Submission guidelines



Authors should follow the "Instructions for Authors" available on the MT
Journal website: http://www.springer.com/computer/artificial/journal/10590

Submissions must be limited to 15 pages (including references).



Papers should be submitted online directly on the MT journal's submission
website:

http://www.editorialmanager.com/coat/default.asp, indicating this special
issue in ‘article type’.



4. Important dates



Submission deadline: 15th May 2016

First round of reviews: 15th July 2016

Resubmission of improved versions: 22nd August 2016

Final decisions to authors: 19th September 2016

Camera-ready papers: 8th October 2016

Publication in Issue 3 of the Machine Translation journal 2016



--

Thanks & Regards,
Rohit Gupta

Marie Curie Early Stage Researcher, EXPERT Project
Research Group in Computational Linguistics
Research Institute of Information and Language Processing
University of Wolverhampton
http://pers-www.wlv.ac.uk/~in4089/


Re: [Moses-support] Slides or paper walking through SearchNormal::ProcessOneHypothesis ?

2015-12-18 Thread Lane Schwartz
Thanks, Wilker. That does look promising.

I love this little footnote from the paper: "We do not know if WLd is
documented anywhere, but from inspection it is used in Moses (Koehn et al.,
2007). This was confirmed by Philipp Koehn and Hieu Hoang (p.c.)."


Re: [Moses-support] Compiling Moses with -fPIC

2015-12-18 Thread Miriam Käshammer
FYI

 

I added the following to Jamroot and then called bjam with --with-pic:

    if [ option.get "with-pic" : : "yes" ] {
        requirements += -fPIC ;
    }

Hints to compile the dependencies with PIC as well:

    cmph:  ./configure --with-pic
    boost: ./b2 cxxflags=-fPIC

Sent: Tuesday, 15 December 2015 at 18:26
From: "Hieu Hoang" 
To: "Miriam Käshammer" , Moses-support@mit.edu
Subject: Re: [Moses-support] Compiling Moses with -fPIC


you may have to compile boost, cmph and irstlm with -fPIC too.

 
On 14/12/15 16:06, "Miriam Käshammer" wrote:



Dear Moses community,

 

My goal is to link Moses (the decoder) as a static library into some other shared library. As far as I understand the compiler/linker output of this other library, I need to compile the Moses library with parameter -fPIC (position independent code). Could you help me in achieving this?

 

I already tried to add "cxxflags=-fPIC" to the bjam command like this:

./bjam -j8 -d2 -a --with-boost="${PREFIX}" --with-xmlrpc-c="${PREFIX}" --with-cmph="${PREFIX}" --with-irstlm="${PREFIX}" --install-scripts="${PREFIX}"/scripts link=static cxxflags=-fPIC

However, the build process just seems to get stuck before it actually starts, see attached log.

 

Any help/comment is appreciated.

Thanks!

Miriam


 

-- 
Hieu Hoang
http://www.hoang.co.uk/hieu


Re: [Moses-support] Chinese & Arabic Tokenizers

2015-12-18 Thread Dingyuan Wang
Hi Tom,

As far as I know, the following are widely-used and open-source Chinese
tokenizers:

* https://github.com/fxsjy/jieba
* http://sourceforge.net/projects/zpar/
* https://github.com/NLPchina/ansj_seg

And this proprietary one:

* http://ictclas.nlpir.org/

(Disclaimer: I am one of the developers of jieba, and I personally use
this.)

--
Dingyuan Wang


Re: [Moses-support] Chinese & Arabic Tokenizers

2015-12-18 Thread Matthias Huck
Hi Tom,

There used to be a freely available Chinese word segmenter provided by
the LDC as well. Unfortunately, things keep disappearing from the web.
https://web.archive.org/web/20130907032401/http://projects.ldc.upenn.edu/Chinese/LDC_ch.htm

For Arabic, I think that many academic research groups used to work with
MADA. But it seems like you'll need a special license for commercial
use.
http://www1.cs.columbia.edu/~rambow/software-downloads/MADA_Distribution.html
https://secure.nouvant.com/columbia/technology/cu14012/license/492

Or you can try MorphTagger/Segmenter, a segmentation tool for Arabic SMT:
http://www.hltpr.rwth-aachen.de/~mansour/MorphSegmenter/
It may not be maintained any more; you can contact Saab Mansour to ask
about it.

Saab has published a couple of papers about this, some of which report
comparisons of different Arabic segmentation strategies for SMT.
http://www.hltpr.rwth-aachen.de/publications/download/687/Mansour-IWSLT-2010.pdf
http://www.hltpr.rwth-aachen.de/publications/download/808/Mansour-LREC-2012.pdf
http://link.springer.com/article/10.1007%2Fs10590-011-9102-0

Cheers,
Matthias





-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



[Moses-support] Chinese & Arabic Tokenizers

2015-12-18 Thread Tom Hoar
I'm looking for Chinese and Arabic tokenizers. We've been using 
Stanford's for a while, but it has downfalls. The Chinese mode loads its 
statistical models very slowly. The Arabic mode stems the resulting 
tokens. The coup de grâce is that their latest jar update (9 days ago) 
was compiled to run only with Java 1.8.

So, with the exception of Stanford, what choices are available for 
Chinese and Arabic that you're finding worthwhile?

Thanks!
Tom


Re: [Moses-support] Slides or paper walking through SearchNormal::ProcessOneHypothesis ?

2015-12-18 Thread Wilker Aziz
Hi,

I hope it is not too late to add to this discussion.

If you are comfortable with weighted deduction, Adam Lopez's 2009 EACL
paper is a very good reference for phrase-based reordering spaces. If I
remember correctly, the implementation in Moses does exactly what he
shows with the logic program WLd.

http://alopez.github.io/papers/eacl2009-lopez.pdf

Cheers,

Wilker

On 16 December 2015 at 00:56, Matthias Huck  wrote:

> Hi Lane,
>
> Well, you can find excellent descriptions of phrase-based decoding
> algorithms in the literature, though possibly not all details of this
> specific implementation.
>
> I like this description:
>
> R. Zens, and H. Ney. Improvements in Dynamic Programming Beam Search for
> Phrase-based Statistical Machine Translation. In International Workshop
> on Spoken Language Translation (IWSLT), pages 195-205, Honolulu, HI,
> USA, October 2008.
>
> http://www.hltpr.rwth-aachen.de/publications/download/618/Zens-IWSLT-2008.pdf
>
> It's what's implemented in Jane, RWTH's open source statistical machine
> translation toolkit.
>
> J. Wuebker, M. Huck, S. Peitz, M. Nuhn, M. Freitag, J. Peter, S.
> Mansour, and H. Ney. Jane 2: Open Source Phrase-based and Hierarchical
> Statistical Machine Translation. In International Conference on
> Computational Linguistics (COLING), pages 483-491, Mumbai, India,
> December 2012.
>
> http://www.hltpr.rwth-aachen.de/publications/download/830/Wuebker-COLING-2012.pdf
>
> However, I believe that the distinction of coverage hypotheses and
> lexical hypotheses is a unique property of the RWTH systems.
>
> The formalization in the Zens & Ney paper is very nicely done. With hard
> distortion limits or coverage-based reordering constraints, you may need
> a few more steps in the algorithm. E.g., if you have a hard distortion
> limit, you will probably want to avoid leaving a gap and then extending
> your sequence in a way that puts your current position further away from
> the gap than your maximum jump width. Other people should know more
> about how exactly Moses' phrase-based decoder is dealing with this.
>
> I can recommend Richard Zens' PhD thesis as well.
> http://www.hltpr.rwth-aachen.de/publications/download/562/Zens--2008.pdf
>
> I also remember that the following publication from Microsoft Research
> is pretty helpful:
>
> Robert C. Moore and Chris Quirk, Faster Beam-Search Decoding for Phrasal
> Statistical Machine Translation, in Proceedings of MT Summit XI,
> European Association for Machine Translation, September 2007.
> http://research.microsoft.com/pubs/68097/mtsummit2007_beamsearch.pdf
>
> Cheers,
> Matthias
>
>
>
> On Tue, 2015-12-15 at 22:33 +, Hieu Hoang wrote:
> > I've been looking at this and it is surprisingly complicated. I think
> > the code is designed to predetermine if extending a hypothesis will
> > lead it down a path that won't ever be completed.
> >
> >
> > Don't know any slide that explains the reasoning, Philipp Koehn
> > explained it to me once and it seems pretty reasonable.
> >
> >
> >
> > I wouldn't mind seeing this code cleaned up a bit and abstracted and
> > formalised. I've made a start with the cleanup in my new decoder
> >
> >
> https://github.com/moses-smt/mosesdecoder/blob/perf_moses2/contrib/other-builds/moses2/Search/Search.cpp#L36
> >Search::CanExtend()
> >
> >
> > There was an Aachen paper from years ago comparing different
> > distortion limit heuristics - can't remember the authors or title.
> > Maybe someone knows more.
> >
> >
> >
> >
> >
> > Hieu Hoang
> > http://www.hoang.co.uk/hieu
> >
> >
> > On 15 December 2015 at 20:59, Lane Schwartz 
> > wrote:
> > Hey all,
> >
> >
> > So the SearchNormal::ProcessOneHypothesis() method in
> > SearchNormal.cpp is responsible for taking an existing
> > hypothesis, creating all legal new extension hypotheses, and
> > adding those new hypotheses to the appropriate decoder
> > stacks.
> >
> >
> > First off, the method is actually reasonably well commented,
> > so kudos to whoever did that. :)
> >
> >
> > That said, does anyone happen to have any slides that actually
> > walk through this process, specifically slides that take into
> > account the interaction with the distortion limit? That
> > interaction is where most of the complexity of this method
> > comes from. I don't know about others, but even having a
> > pretty good notion of what's going on here, the discussion of
> > "the closest thing to the left" is still a bit opaque.
> >
> >
> > Anyway, if anyone knows of a good set of slides, or even a
> > good description in a paper, of what's going on here, I'd
> > appreciate any pointers.
> >
> >
> > Thanks,
> > Lane
> >
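
To make the distortion-limit feasibility check discussed above concrete
(Matthias's caveat about gaps, Hieu's Search::CanExtend()), here is a toy
sketch. It is deliberately simplified, with invented names, and is not
the actual Moses or Jane implementation: after laying down a phrase, the
decoder must still be able to jump back to the leftmost uncovered source
word without exceeding the hard distortion limit.

    public class DistortionCheck {
        // Would extending the hypothesis with the source span
        // [phraseStart, phraseEnd) leave the leftmost uncovered word
        // reachable under the distortion limit? Simplified to the
        // single jump back to the first gap.
        static boolean canStillReachFirstGap(boolean[] covered,
                int phraseStart, int phraseEnd, int distortionLimit) {
            int firstGap = -1;
            for (int i = 0; i < covered.length; i++) {
                boolean c = covered[i] || (i >= phraseStart && i < phraseEnd);
                if (!c) { firstGap = i; break; }
            }
            if (firstGap == -1) {
                return true; // fully covered, nothing left to reach
            }
            // Distortion of the jump from the end of this phrase back
            // to the gap; beyond the limit, the path can never finish.
            return phraseEnd - firstGap <= distortionLimit;
        }

        public static void main(String[] args) {
            boolean[] cov = new boolean[10]; // words 0-2 already covered
            cov[0] = cov[1] = cov[2] = true;
            System.out.println(canStillReachFirstGap(cov, 7, 10, 6)); // false
            System.out.println(canStillReachFirstGap(cov, 3, 6, 6));  // true
        }
    }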

Re: [Moses-support] Doubts on Multiple Decoding Paths

2015-12-18 Thread Philipp Koehn
Hi,

that sounds right.

The "union" option is fairly new, developed by Michael Denkowski.
I am not aware of any empirical study of the different methods,
so I'd be curious to see what you find.

-phi

On Fri, Dec 18, 2015 at 1:35 AM, Anoop (അനൂപ്)  wrote:

> Hi,
>
> I am trying to understand the multiple decoding paths feature in Moses.
>
> The documentation (http://www.statmt.org/moses/?n=Advanced.Models#ntoc7)
> describes 3 methods: both, either and union
>
> The following is my understanding of the options. Please let me know if it
> is correct:
>
>
>- With *both* option, the constituent phrases of the target hypothesis
>  come from both tables (since they are shared) and are scored with
>  both tables.
>- With *either* option, all the constituent phrases of a target
>  hypothesis come from a single table, but different hypotheses can use
>  different tables. Each hypothesis is scored using one table only. I
>  did not understand the "additional options are collected from the
>  other tables" bit in the documentation.
>- With *union* option, the constituent phrases of a target hypothesis
>  come from different tables and are scored using scores from all the
>  tables. A score of 0 is used if the option doesn't appear in some
>  table, unless the *default-average-others=true* option is used.
>
>
> Regards,
> Anoop.
>
> --
> I claim to be a simple individual liable to err like any other fellow
> mortal. I own, however, that I have humility enough to confess my errors
> and to retrace my steps.
>
> http://flightsofthought.blogspot.com


[Moses-support] 1st CFP: LREC 9th Workshop and Shared Task on Building and Using Comparable Corpora

2015-12-18 Thread Reinhard Rapp


Call for Papers

9th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA

Special Topic: Continuous Vector Space Models and Comparable Corpora

Shared Task: Identifying Parallel Sentences in Comparable Corpora

https://comparable.limsi.fr/bucc2016/

Monday, May 23, 2016

Co-located with LREC 2016, Portorož, Slovenia
  
DEADLINE FOR PAPERS: February 10, 2016



MOTIVATION

In the language engineering and the linguistics communities, research
on comparable corpora has been motivated by two main reasons. In
language engineering, on the one hand, it is chiefly motivated by the
need to use comparable corpora as training data for statistical
Natural Language Processing applications such as statistical machine
translation or cross-lingual retrieval. In linguistics, on the other
hand, comparable corpora are of interest in themselves by making
possible inter-linguistic discoveries and comparisons. It is generally
accepted in both communities that comparable corpora are documents in
one or several languages that are comparable in content and form in
various degrees and dimensions. We believe that the linguistic
definitions and observations related to comparable corpora can improve
methods to mine such corpora for applications of statistical NLP. As
such, it is of great interest to bring together builders and users of
such corpora.


SHARED TASK

There will be a shared task on "Identifying Parallel Sentences in
Comparable Corpora" whose details will be described on the
workshop website (URL see above).


TOPICS

Beyond this year's special topic "Continuous Vector Space Models and 
Comparable Corpora" and the shared task on "Identifying Parallel 
Sentences in Comparable Corpora", we solicit contributions including
but not limited to the following topics:

Building comparable corpora:

  * Human translations
  * Automatic and semi-automatic methods
  * Methods to mine parallel and non-parallel corpora from the Web
  * Tools and criteria to evaluate the comparability of corpora
  * Parallel vs non-parallel corpora, monolingual corpora
  * Rare and minority languages, across language families
  * Multi-media/multi-modal comparable corpora

Applications of comparable corpora:

  * Human translations
  * Language learning
  * Cross-language information retrieval & document categorization
  * Bilingual projections
  * Machine translation
  * Writing assistance

Mining from comparable corpora:

  * Cross-language distributional semantics
  * Extraction of parallel segments or paraphrases from comparable corpora
  * Extraction of translations of single words and multi-word expressions, 
proper names, named entities, etc.


IMPORTANT DATES

  February 10, 2016   Deadline for submission of full papers
  March 10, 2016      Notification of acceptance
  March 25, 2016      Camera-ready papers due
  May 23, 2016        Workshop date


SUBMISSION INFORMATION

Papers should follow the LREC main conference formatting details (to be
announced on the conference website http://lrec2016.lrec-conf.org/en/ )
and should be submitted as a PDF-file via the START workshop manager at

  https://www.softconf.com/lrec2016/BUCC2016/

Contributions can be short or long papers. Short paper submissions must
describe original and unpublished work without exceeding six (6) pages.
Characteristics of short papers include: a small, focused contribution;
work in progress; a negative result; an opinion piece; an interesting
application nugget. Long paper submissions must describe substantial,
original, completed and unpublished work without exceeding ten (10)
pages.

Reviewing will be double blind, so the papers should not reveal the
authors' identity. Accepted papers will be published in the workshop
proceedings.

Double submission policy: Parallel submission to other meetings or
publications is possible but must be immediately notified to the
workshop organizers.

Please also observe the following two paragraphs which are applicable
to all LREC workshops as well as to the main conference:

Describing your LRs in the LRE Map is now a normal practice in the 
submission procedure of LREC (introduced in 2010 and adopted by other 
conferences). To continue the efforts initiated at LREC 2014 about 
“Sharing LRs” (data, tools, web-services, etc.), authors will have 
the possibility,  when submitting a paper, to upload LRs in a special 
LREC repository.  This effort of sharing LRs, linked to the LRE Map 
for their description, may become a new “regular” feature for conferences 
in our field, thus contributing to creating a common repository where 
everyone can deposit and share data.

As scientific work requires accurate citations of referenced work so 
as to allow the community to understand the whole context and also 
replicate the experiments conducted by other researchers, LREC 2016 
endorses the need to uniquely Identify LRs through 

[Moses-support] PhraseDictionaryCompact is not registered

2015-12-18 Thread Andrew
I'm following the baseline system page step by step, as it says. I've
binarized the phrase table and reordering table using
processPhraseTableMin and processLexicalTableMin, and edited moses.ini
as written, but upon executing, it gives an exception with a
"PhraseDictionaryCompact is not registered" message.

I've done some googling and tried running processLexicalTable (without
"Min") to no avail, and also tried editing the entries to
PhraseDictionaryBinary and PhraseDictionaryOnDisk, which succeeded in
running, but the decoder aborts as soon as I enter an input sentence.

Is there any other workaround or fix for this?


Re: [Moses-support] PhraseDictionaryCompact is not registered

2015-12-18 Thread Marcin Junczys-Dowmunt
Hi,
I'd say you didn't install cmph or didn't compile against it. Look again at:

http://www.statmt.org/moses/?n=Advanced.RuleTables#ntoc3




Re: [Moses-support] PhraseDictionaryCompact is not registered

2015-12-18 Thread Ulrich Germann
make -f contrib/Makefiles/install-dependencies.gmake cmph
./bjam --with-cmph=$(pwd)/opt 





-- 
Ulrich Germann
Senior Researcher
School of Informatics
University of Edinburgh


Re: [Moses-support] Doubts on Multiple Decoding Paths

2015-12-18 Thread Michael Denkowski
Hi Anoop,

Confirming that your reading of "union" is in fact how it works.  If you
want each phrase to be scored by all tables without having to worry about
making sure every phrase is in every table, you can use
PhraseDictionaryGroup with default-average-others=true.  This multiplies
the size of the phrase feature set by the number of models, so I recommend
running mert-moses.pl with --batch-mira.
Best,
Michael

>