[Moses-support] project

2012-09-04 Thread devabathini praveena
Hi,
I am Praveena. I am using Moses to develop a machine translation system
for Indian languages. Will Moses work for Indian languages? I have
all the data required for building a system, but I don't know how to make
it work. Can you help me with this project?


Regards,
D.Praveena.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Does Moses support binarised translation table for factored model?

2012-09-04 Thread Tan, Jun
Hi Koehn,

So the factor separator must be "|"?
I tagged all the data with another tool, and its default separator is "_".
I also noticed that the separator of the target phrases in the phrase table is "|",
even though I changed the separator to "_" during training. I changed all
the separators in the phrase table from "|" to "_", and decoding then worked.
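
The swap itself can be done with something like this one-liner (a sketch only:
it assumes a plain-text phrase table and that "|" occurs only as the factor
separator outside the ||| field delimiters; the output file name is made up):

  # protect the ||| field delimiters, swap the factor separator, then restore
  zcat phrase-table.0-0,1.gz \
    | perl -pe 's/\|\|\|/\x00/g; s/\|/_/g; s/\x00/|||/g' \
    | gzip > phrase-table.underscore.0-0,1.gz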


-Original Message-
From: phko...@gmail.com [mailto:phko...@gmail.com] On Behalf Of Philipp Koehn
Sent: Wednesday, September 05, 2012 4:22 AM
To: Tan, Jun
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Does Moses support binarised translation table for 
factored model?

Hi,

this should be working.

What seems odd to me is that you are using "_" as the factor separator, while it is
standard to use "|". There is no option in processPhraseTable to change the
separator.
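
For reference, the binarisation step with the standard "|" separator looks
roughly like this (a sketch: paths are placeholders, and the "0 0,1" factor
arguments are assumed to mirror the ttable-file line in the moses.ini quoted
below):

  mkdir binarised-model
  ~/mosesdecoder/bin/processPhraseTable \
    -ttable 0 0,1 train/model/phrase-table.0-0,1.gz \
    -nscores 5 -out binarised-model/phrase-table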

-phi

On Tue, Sep 4, 2012 at 6:15 AM, Tan, Jun  wrote:
> Hi all,
>
>
>
> I built a factored model following the guideline on the Moses web page. In
> order to speed up decoding, I'm trying to use the binarised phrase
> table.
>
> The binarisation has finished, but when I try to decode with the
> binarised phrase table, the translation fails.  The input and
> output are the same.
>
> Does Moses support a binarised translation table for a factored model?
> Does anybody else have this issue?
>
> Below are the outputs of the decoding process:
>
>
>
> 1.decoding with binarised phrase-table:
>
> [root@Redhat-252 binarised-model]# echo 'the_DT' | 
> /data/moses/moses-smt-mosesdecoder/bin/moses  -f moses.ini
>
> Defined parameters (per moses.ini or switch):
>
> config: moses.ini
>
> distortion-limit: 6
>
> factor-delimiter: _
>
> input-factors: 0
>
> lmodel-file: 0 0 3
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn 0 1 3 
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn
>
> mapping: 0 T 0
>
> ttable-file: 1 0 0,1 5
> /data/english-chinese_POS_tag/binarised-model/phrase-table
>
> ttable-limit: 20
>
> weight-d: 0.6
>
> weight-l: 0.2500 0.2500
>
> weight-t: 0.20 0.20 0.20 0.20 0.20
>
> weight-w: -1
>
> /data/moses/moses-smt-mosesdecoder/bin
>
> Loading lexical distortion models...have 0 models
>
> Start loading LanguageModel
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn : 
> [0.001] seconds
>
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn: line 125679:
> warning: non-zero probability for <unk> in closed-vocabulary LM
>
> Start loading LanguageModel
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn : 
> [7.148] seconds
>
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn: line 46:
> warning: non-zero probability for <unk> in closed-vocabulary LM
>
> Finished loading LanguageModels : [7.214] seconds
>
> Start loading PhraseTable
> /data/english-chinese_POS_tag/binarised-model/phrase-table : [7.214] 
> seconds
>
> filePath: /data/english-chinese_POS_tag/binarised-model/phrase-table
>
> Finished loading phrase tables : [7.214] seconds
>
> IO from STDOUT/STDIN
>
> Created input-output object : [7.214] seconds
>
> Translating line 0  in thread id 140249033144064
>
> Translating: the
>
>
>
> reading bin ttable
>
> size of OFF_T 8
>
> binary phrasefile loaded, default OFF_T: -1
>
> Line 0: Collecting options took 0.000 seconds
>
> Line 0: Search took 0.000 seconds
>
> the
>
> BEST TRANSLATION: the_UNK_UNK_UNK [1]  [total=-111.439] <<0.000, 
> -1.000, -100.000, -23.206, -26.549, 0.000, 0.000, 0.000, 0.000, 
> 0.000>> 0-0
>
> Line 0: Translation took 0.894 seconds total
>
>
>
> 2.Normal decoding
>
>
>
> [root@Redhat-252 english-chinese_POS_tag]# echo 'the_DT' | 
> /data/moses/moses-smt-mosesdecoder/bin/moses -f train/model/moses.ini
>
> Defined parameters (per moses.ini or switch):
>
> config: train/model/moses.ini
>
> distortion-limit: 6
>
> factor-delimiter: _
>
> input-factors: 0
>
> lmodel-file: 0 0 3
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn 0 1 3 
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn
>
> mapping: 0 T 0
>
> ttable-file: 0 0 0,1 5
> /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
>
> ttable-limit: 20
>
> weight-d: 0.6
>
> weight-l: 0.2500 0.2500
>
> weight-t: 0.20 0.20 0.20 0.20 0.20
>
> weight-w: -1
>
> /data/moses/moses-smt-mosesdecoder/bin
>
> Loading lexical distortion models...have 0 models
>
> Start loading LanguageModel
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn : 
> [0.000] seconds
>
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn: line 125679:
> warning: non-zero probability for <unk> in closed-vocabulary LM
>
> Start loading LanguageModel
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn : 
> [4.239] seconds
>
> /data/english-chinese_POS_tag/chinese-pos-lm/engl

Re: [Moses-support] Does Moses support binarised translation table for factored model?

2012-09-04 Thread Philipp Koehn
Hi,

this should be working.

What seems odd to me is that you are using "_" as the factor separator, while it
is standard to use "|". There is no option in processPhraseTable to change
the separator.

-phi

On Tue, Sep 4, 2012 at 6:15 AM, Tan, Jun  wrote:
> Hi all,
>
>
>
> I built a factored model following the guideline on the Moses web page. In order
> to speed up decoding, I'm trying to use the binarised phrase table.
>
> The binarisation has finished, but when I try to decode with the binarised
> phrase table, the translation fails.  The input and output are the
> same.
>
> Does Moses support a binarised translation table for a factored model? Does
> anybody else have this issue?
>
> Below are the outputs of the decoding process:
>
>
>
> 1.decoding with binarised phrase-table:
>
> [root@Redhat-252 binarised-model]# echo 'the_DT' |
> /data/moses/moses-smt-mosesdecoder/bin/moses  -f moses.ini
>
> Defined parameters (per moses.ini or switch):
>
> config: moses.ini
>
> distortion-limit: 6
>
> factor-delimiter: _
>
> input-factors: 0
>
> lmodel-file: 0 0 3
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn 0 1 3
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn
>
> mapping: 0 T 0
>
> ttable-file: 1 0 0,1 5
> /data/english-chinese_POS_tag/binarised-model/phrase-table
>
> ttable-limit: 20
>
> weight-d: 0.6
>
> weight-l: 0.2500 0.2500
>
> weight-t: 0.20 0.20 0.20 0.20 0.20
>
> weight-w: -1
>
> /data/moses/moses-smt-mosesdecoder/bin
>
> Loading lexical distortion models...have 0 models
>
> Start loading LanguageModel
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn : [0.001]
> seconds
>
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn: line 125679:
> warning: non-zero probability for <unk> in closed-vocabulary LM
>
> Start loading LanguageModel
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn : [7.148]
> seconds
>
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn: line 46:
> warning: non-zero probability for <unk> in closed-vocabulary LM
>
> Finished loading LanguageModels : [7.214] seconds
>
> Start loading PhraseTable
> /data/english-chinese_POS_tag/binarised-model/phrase-table : [7.214] seconds
>
> filePath: /data/english-chinese_POS_tag/binarised-model/phrase-table
>
> Finished loading phrase tables : [7.214] seconds
>
> IO from STDOUT/STDIN
>
> Created input-output object : [7.214] seconds
>
> Translating line 0  in thread id 140249033144064
>
> Translating: the
>
>
>
> reading bin ttable
>
> size of OFF_T 8
>
> binary phrasefile loaded, default OFF_T: -1
>
> Line 0: Collecting options took 0.000 seconds
>
> Line 0: Search took 0.000 seconds
>
> the
>
> BEST TRANSLATION: the_UNK_UNK_UNK [1]  [total=-111.439] <<0.000, -1.000,
> -100.000, -23.206, -26.549, 0.000, 0.000, 0.000, 0.000, 0.000>> 0-0
>
> Line 0: Translation took 0.894 seconds total
>
>
>
> 2.Normal decoding
>
>
>
> [root@Redhat-252 english-chinese_POS_tag]# echo 'the_DT' |
> /data/moses/moses-smt-mosesdecoder/bin/moses -f train/model/moses.ini
>
> Defined parameters (per moses.ini or switch):
>
> config: train/model/moses.ini
>
> distortion-limit: 6
>
> factor-delimiter: _
>
> input-factors: 0
>
> lmodel-file: 0 0 3
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn 0 1 3
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn
>
> mapping: 0 T 0
>
> ttable-file: 0 0 0,1 5
> /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
>
> ttable-limit: 20
>
> weight-d: 0.6
>
> weight-l: 0.2500 0.2500
>
> weight-t: 0.20 0.20 0.20 0.20 0.20
>
> weight-w: -1
>
> /data/moses/moses-smt-mosesdecoder/bin
>
> Loading lexical distortion models...have 0 models
>
> Start loading LanguageModel
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn : [0.000]
> seconds
>
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn: line 125679:
> warning: non-zero probability for <unk> in closed-vocabulary LM
>
> Start loading LanguageModel
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn : [4.239]
> seconds
>
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn: line 46:
> warning: non-zero probability for <unk> in closed-vocabulary LM
>
> Finished loading LanguageModels : [4.254] seconds
>
> Start loading PhraseTable
> /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz : [4.254]
> seconds
>
> filePath: /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
>
> Finished loading phrase tables : [4.254] seconds
>
> Start loading phrase table from
> /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz : [4.254]
> seconds
>
> Reading /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
>
> 5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---

Re: [Moses-support] data training problem

2012-09-04 Thread Philipp Koehn
Hi,

the script train-truecaser.perl is used to train a truecasing model.
Its usage is:
  train-truecaser.perl -model MODEL_FILE -corpus CORPUS_FILE

So, if you have a text corpus, this trains a model. You would typically
take the parallel corpus that you use to train the translation model
as training corpus for training the truecaser.
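
For example (a sketch; the script paths are the usual locations in a Moses
checkout, and corpus names are placeholders), training and then applying the
truecaser looks like:

  # train the truecasing model on the tokenised training corpus
  ~/mosesdecoder/scripts/recaser/train-truecaser.perl \
    -model truecase-model.en -corpus corpus.tok.en
  # apply it to any tokenised text
  ~/mosesdecoder/scripts/recaser/truecase.perl \
    -model truecase-model.en < corpus.tok.en > corpus.true.en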

-phi

On Sun, Sep 2, 2012 at 10:28 PM, Arezki Sadoune  wrote:
> Hello !
> I am a student and a new Moses user. I am still data training in order to
> build my first baseline system. I have a problem with the truecasing script
> "train-truecaser.perl". The model file for training the truecaser is nowhere
> in the folder.
> The script needs à file 'truecase-model.en/fr/de" in addition to my data
> tokenised file. (manual p.73)
> Where can I find the truecase-models for English, French and German?
> Thanks a lot for your help
> Arezki
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] data training problem

2012-09-04 Thread Arezki Sadoune
Hello!
I am a student and a new Moses user. I am still preparing the training data in
order to build my first baseline system. I have a problem with the truecasing
script "train-truecaser.perl": the model file for training the truecaser is
nowhere in the folder.
The script needs a file "truecase-model.en/fr/de" in addition to my tokenised
data file (manual p. 73).
Where can I find the truecase models for English, French and German?
Thanks a lot for your help
Arezki
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Opening at Bing

2012-09-04 Thread Abhishek Arun
Bing is hiring a Program Manager for an SMT related project for our London 
office.

Please shoot me a note if you are interested.

Link: 
https://careers.microsoft.com/jobdetails.aspx?ss=&pg=0&so=&rw=1&jid=77984&jlang=EN

Thanks,
Abhishek
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] 3rd CALL FOR PAPERS: Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12 WS and Shared Task) at COLING 2012

2012-09-04 Thread Tsuyoshi Okita
-Apologies for duplicate/multiple postings-
***THIRD CALL FOR PAPERS***

Second Workshop on Applying Machine Learning Techniques to Optimise
the Division of Labour in Hybrid MT (ML4HMT-12 WS and Shared Task) at
COLING 2012

Mumbai (India), 9th December, 2012
URL: http://www.dfki.de/ml4hmt/

The workshop and associated shared task are an effort to trigger a
systematic investigation on improving state-of-the-art hybrid machine
translation, making use of advanced machine-learning (ML)
methodologies. It follows the ML4HMT-11 workshop which took place last
November in Barcelona. The first workshop also road-tested a shared
task (and associated data set) and laid the basis for a broader reach
in 2012.
Regular Papers ML4HMT-12

We are soliciting original papers on hybrid MT, including (but not
limited to):
* use of machine learning methods in hybrid MT;
* system combination: parallel in multi-engine MT (MEMT) or sequential
  in statistical post-editing (SPMT);
* combining phrases and translation units from different types of MT;
* syntactic pre-/re-ordering;
* using richer linguistic information in phrase-based or in hierarchical
  SMT;
* learning resources (e.g., transfer rules, transduction grammars) for
  probabilistic rule-based MT.

Full papers should be anonymous and follow the COLING full paper
format (http://www.coling2012-iitb.org/call_for_papers.php). To submit
contributions, please follow the instructions at the Workshop
management system submission website:
https://www.softconf.com/coling2012/ML4HMT12/. The contributions will
undergo a double-blind review by members of the programme committee.


Shared Task ML4HMT-12

The main focus of the Shared Task is to address the question:

-Can Hybrid MT and System Combination techniques benefit from extra
 information (linguistically motivated, decoding, runtime, confidence
 scores, or other meta-data) from the systems involved?

Participants are invited to build hybrid MT systems and/or system
combinations by using the output of several MT systems of different
types, as provided by the organisers.  While participants are
encouraged to use machine learning techniques to explore the
additional meta-data information sources, other general improvements
in hybrid and combination-based MT are also welcome in the
challenge.  For systems that exploit additional meta-data information,
the challenge is that the additional meta-data is highly heterogeneous and
(individual) system-specific.


Data: The ML4HMT-12 Shared Task involves (ES-EN) and (ZH-EN) data
sets, in each case translating into EN.


* (ES-EN): Participants are given a bilingual tuning set aligned
  at a sentence level. Each "bilingual sentence" contains: 1) the
  source sentence, 2) the target (reference) sentence and 3) the
  corresponding multiple output translations from four systems, based
  on different MT approaches (Apertium, Ramirez-Sanchez, 2006; Lucy,
  Alonso and Thurmair, 2003; Moses, Koehn et. al., 2007). The output
  has been annotated with system-internal meta-data information
  derived from the translation process of each of the systems.

* (ZH-EN) A corresponding data set for ZH-EN with output translations
  from three systems (Moses, Koehn et al., 2007; ICT_Chiero, Mi
  et al., 2009; and Huajian RBMT) will be provided. (Participants
  are required to fill out a shared task evaluation agreement form
  and obtain the ZH-EN data from LDC).

Participants are challenged to build an MT mechanism, where possible
making effective use of the system-specific MT meta-data output. They
can provide solutions based on open-source systems, or develop their
own mechanisms. The tuning set can be used for tuning the systems or
for training the systems. Final submissions have to include
translation output on a test set, which will be made available one
week after training data release. Data will be provided to build
language/reordering models, possibly re-using existing resources from
MT research.

Participants can also make use of additional tools (linguistic analysis,
confidence estimation, etc.), if their systems require them, but
they have to explicitly declare this upon submission, so that they are
judged as "unconstrained" systems. This will allow for a better
comparison between participating systems.

System output will be judged via peer-based human evaluation as well
as automatic evaluation. During the evaluation phase, participants
will be requested to rank system outputs of other participants through
a web-based interface (Appraise, Federmann 2010). Automatic metrics
include BLEU (Papineni et al., 2002), TER (Snover et al., 2006) and
METEOR (Lavie, 2005).

Shared task participants will be invited to submit system description
papers (7 pages, not blind, following the COLING format,
http://www.coling2012-iitb.org/call_for_papers.php).

For submissions, please follow the instructions at the Workshop
management system submission
website: https://www.softconf.com/coling2012/ML4HMT12/


Impor

Re: [Moses-support] Moses in Eclipse

2012-09-04 Thread Hieu Hoang
Hi Miriam,

it may be that the Eclipse parser doesn't know about C++ macros. It
thinks that macros are functions but can't resolve them.

If you manage to get the Eclipse project file to build, please commit
your changes to GitHub. It would be useful for me and other developers.
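
On the Boost part of the question: the Eclipse build command just wraps bjam,
so it should mirror whatever works on the command line, roughly (a sketch; the
Boost path is a placeholder and is only needed if Boost is not in a standard
location):

  cd ~/workspace/mosesdecoder
  ./bjam --with-boost=/usr/local/boost_1_49 -j4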

On 03/09/2012 17:48, Miriam Kaeshammer wrote:
> Hello,
>
> does any of you use Eclipse for coding on Moses? How do you build
> using Boost?
>
> As a start, I am using the project files provided in the git-repository
> (in mosesdecoder/contrib/other-builds). For each of the projects, in the
> project properties, I specified
>   bjam ${workspace_loc}/mosesdecoder
> as the Build command and switched off "Generate Makefiles
> automatically". Given the output on the Console tab, this seems to work.
> However, Eclipse itself does still complain about errors (reported in
> the Problems tab), such as unresolved inclusions and functions.
>
> Is there a different way to specify the bjam build process? Do you use
> the provided project files?
>
> Some more info about my system:
> Ubuntu 12.04, Eclipse Indigo 3.7.2 with CDT 8.0.2, recent Moses checkout
> (c639cdbb38c3140454be62f4d88843f0bfa05aa8)
>
> I'd be thankful for any hints/comments.
>
> Best,
> Miriam
>

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Questions in http://kheafield.com/code/kenlm/developers/

2012-09-04 Thread Kenneth Heafield
Hi,

KenLM requires that you have an ARPA file already.  You can get one
from SRILM or IRSTLM.  Please don't send this question to moses-support
a third time.
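
To get an ARPA file, something along these lines with SRILM works (a sketch;
corpus and file names are placeholders):

  # build a 3-gram ARPA language model from a tokenised corpus
  ngram-count -order 3 -text corpus.txt -lm file.arpa -kndiscount -interpolate
  # score sentences read from stdin
  ./query file.arpa < test.txt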

Kenneth

On 09/04/12 01:17, Fong Po Po wrote:
> Dear all:
> I have read
> http://kheafield.com/code/kenlm/developers/
> I see
> wget -O - http://kheafield.com/code/kenlm.tar.gz |tar xz
> cd kenlm
> ./compile.sh
> ./query file.arpa
> If file.arpa does not exist, we cannot run this command:
> ./query file.arpa
> How can we run this command if file.arpa does not exist?
> Thanks!
> Best Regards,
> Fong Pui Chi
>
>
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support