Re: [Moses-support] error in installing srilm

2013-12-08, rohit dholakia
Hi Arththika,

Have you looked at the FAQ?

http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html

I set up SRILM long ago, and we have Tcl support anyway. But if you
don't have Tcl in your path or on your system, you might want to look at
item A1, part (d), of the FAQ.
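A quick way to check the first condition is to look for a Tcl shell on your PATH. Below is a minimal Python sketch; the candidate executable names (`tclsh`, `tclsh8.6`, ...) are assumptions about common Tcl installs, and finding a shell does not by itself prove that the Tcl headers and library the SRILM build may need are present.

```python
import shutil


def find_tcl_shell(candidates=("tclsh", "tclsh8.6", "tclsh8.5", "tclsh8.4")):
    """Return the path of the first Tcl shell found on PATH, or None.

    The candidate names are assumptions about typical Tcl installs.
    """
    for name in candidates:
        path = shutil.which(name)  # None if the name is not on PATH
        if path:
            return path
    return None
```

For example, `print(find_tcl_shell() or "no Tcl shell on PATH")`. If nothing is found, the FAQ item above is the place to look.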

Hope that helps.

On Sun, Dec 8, 2013 at 8:58 PM, Arththika Paramanathan <
arthiparamanat...@gmail.com> wrote:

> Hi,
> I faced some problems installing SRILM. Can anyone help me? I have attached
> the error file.
> Thank you
>
> --
> regards,
> P.Arththika
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] error in installing srilm

2013-12-08, Arththika Paramanathan
Hi,
I faced some problems installing SRILM. Can anyone help me? I have attached
the error file.
Thank you

-- 
regards,
P.Arththika


srilm-error
Description: Binary data


Re: [Moses-support] WARNING: directory exists but does not match parameters:

2013-12-08, Philipp Koehn
Hi,

The filtering script attempts to reuse the model in the filtered directory.
However, that directory was created by an earlier, erroneous call of the
script, so you get the error message.
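The warning text suggests what is being compared: the config and input given on this call against those recorded for the existing directory, joined by Perl's string-inequality operator `ne`. Here is a hypothetical Python re-creation of that comparison, for illustration only. The real check lives in the Perl script and may differ, and the function and argument names are made up.

```python
def check_filtered_dir(stored_config, stored_input, config, input_file):
    """Return None if the existing filtered directory can be reused,
    otherwise a warning shaped like the one in the question.

    Hypothetical re-creation; the real test is in
    filter-model-given-input.pl and may differ in detail.
    """
    if stored_config == config and stored_input == input_file:
        return None  # same parameters: the filtered model is reused
    return (
        "WARNING: directory exists but does not match parameters:\n"
        f"  ({stored_config} ne {config} || {stored_input} ne {input_file})"
    )
```

With the values from Renu's warning, the two config paths are equal, but the recorded input (`.../filtered-newstest2011/input.8828`) differs from the new input file, so the directory is not reused.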

What should you do?
Delete the filtered directory, then run the filtering script again.
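The fix above, in script form. The shell equivalent of the first step is `rm -rf filtered-newstest2011`, run from `~/Desktop/working`. Below is a minimal Python sketch that removes the stale directory and rebuilds the command from the quoted question as an argument list. The helper names are hypothetical, and the paths are Renu's.

```python
import shutil
from pathlib import Path


def reset_filtered_dir(filtered_dir):
    """Delete a stale filtered-model directory; return True if one was removed."""
    path = Path(filtered_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


def filter_command(moses_root, filtered_dir, config, input_file):
    """Build the filter-model-given-input.pl call from the quoted question."""
    root = Path(moses_root)
    return [
        str(root / "scripts" / "training" / "filter-model-given-input.pl"),
        filtered_dir,
        config,
        input_file,
        "-Binarizer",
        str(root / "bin" / "processPhraseTable"),
    ]
```

For example, call `reset_filtered_dir("filtered-newstest2011")`, then `subprocess.run(filter_command("/home/renu/Desktop/mosesdecoder", "filtered-newstest2011", "mert-work/moses.ini", "/home/renu/Desktop/corpus/newstest2011.true.fr"), check=True)`. Passing a list rather than one string avoids shell quoting issues.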

-phi

On Sun, Dec 8, 2013 at 5:31 PM, renubalyan wrote:
> Hi,
>
> I am using the tutorial at http://www.statmt.org/moses/?n=Moses.Baseline to
> build the baseline system.
>
> I have trained and tuned the system, and tested it on a sentence successfully.
>
> However, I am stuck at the evaluation step (page 35), while filtering with
> the following command:
>
> renu@sandeep-RS:~/Desktop/working$ /home/renu/Desktop/mosesdecoder/scripts/training/filter-model-given-input.pl \
>     filtered-newstest2011 mert-work/moses.ini \
>     /home/renu/Desktop/corpus/newstest2011.true.fr \
>     -Binarizer /home/renu/Desktop/mosesdecoder/bin/processPhraseTable
>
> I get the following error:
>
> WARNING: directory exists but does not match parameters:
>   (mert-work/moses.ini ne mert-work/moses.ini ||
> /home/renu/Desktop/working/filtered-newstest2011/input.8828 ne
> /home/renu/Desktop/corpus/newstest2011.true.fr)
>
>
> I do not understand the reason for the above error.
>
> Kindly help.
>
> Thanks
> Renu
>
> ---
> This e-mail is for the sole use of the intended recipient(s) and may
> contain confidential and privileged information. If you are not the
> intended recipient, please contact the sender by reply e-mail and destroy
> all copies and the original message. Any unauthorized review, use,
> disclosure, dissemination, forwarding, printing or copying of this email
> is strictly prohibited and appropriate legal action will be taken.
> ---


[Moses-support] using Moses in Monolingual dialogue setting

2013-12-08, Andrew

Hi,

I'm using Moses in a monolingual dialogue setting, as in
http://aritter.github.io/mt_chat.pdf, where source and target are both in
English and the target is a response to the source. I'd like to propose a little
thought experiment in this setting and hear what you think would happen.

Suppose we have a conversation with six utterances, A1, B1, A2, B2, A3, B3, where
A and B indicate the speakers and the number indicates the n-th statement by that
speaker. They all belong to one conversation on a continuous topic.
Now suppose we train Moses on it in two different ways:

1) The source file contains A1, A2, A3 and the target file contains B1, B2, B3,
so that A1-B1 is a pair, and so on.
2) The source contains A1, B1, A2, B2, A3 and the target contains B1, A2, B2, A3,
B3, taking advantage of the fact that each response is also the stimulus for the
next response.
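The two setups differ only in how the conversation is cut into sentence pairs. Below is a minimal Python sketch of both pairings, using the placeholder labels A1 to B3 from this message. The function names are hypothetical, and the output is just the line-aligned source and target text that a Moses parallel corpus consists of.

```python
def speaker_pairs(utterances):
    """Setup 1: pair each A utterance with the B reply that follows it."""
    return list(zip(utterances[0::2], utterances[1::2]))


def consecutive_pairs(utterances):
    """Setup 2: every utterance is the source for the one after it."""
    return list(zip(utterances[:-1], utterances[1:]))


def to_parallel_text(pairs):
    """Return (source_text, target_text), one sentence per line."""
    source = "".join(src + "\n" for src, _ in pairs)
    target = "".join(tgt + "\n" for _, tgt in pairs)
    return source, target
```

With `["A1", "B1", "A2", "B2", "A3", "B3"]`, setup 1 yields three pairs and no utterance appears on both sides. Setup 2 yields five pairs, and B1, A2, B2 and A3 each appear once as a source and once as a target.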

How will the results differ, and why? Since GIZA++ computes alignments in both
directions, will 2) result in any of A1 to B3 being learned as a translation of
another?

This may be a strange question, but I would really like to get your insight.


[Moses-support] WARNING: directory exists but does not match parameters:

2013-12-08, renubalyan
Hi,

I am using the tutorial at http://www.statmt.org/moses/?n=Moses.Baseline to
build the baseline system.

I have trained and tuned the system, and tested it on a sentence successfully.

However, I am stuck at the evaluation step (page 35), while filtering with the
following command:

renu@sandeep-RS:~/Desktop/working$ /home/renu/Desktop/mosesdecoder/scripts/training/filter-model-given-input.pl \
    filtered-newstest2011 mert-work/moses.ini \
    /home/renu/Desktop/corpus/newstest2011.true.fr \
    -Binarizer /home/renu/Desktop/mosesdecoder/bin/processPhraseTable

I get the following error:

WARNING: directory exists but does not match parameters:
  (mert-work/moses.ini ne mert-work/moses.ini ||
/home/renu/Desktop/working/filtered-newstest2011/input.8828 ne
/home/renu/Desktop/corpus/newstest2011.true.fr)


I do not understand the reason for the above error.

Kindly help.

Thanks
Renu


Re: [Moses-support] error during testing

2013-12-08, amir haghighi
The file model/reordering-table.* is not empty, but the file
evaluation/*.filtered.*/reordering-table.1.* is!
My test set is not empty.

Thank you for your answers.


On Sun, Dec 8, 2013 at 3:29 PM, Hieu Hoang wrote:

> Everything looks OK; I'm not sure why it's segfaulting.
>
> Is the file
>   model/reordering-table.*
> empty? If it is, then you should look in the log file
>   steps/*/TRAINING_build-reordering.*.STDERR
>
> Or is
>   evaluation/*.filtered.*/reordering-table.1.*
> empty? Is your test set empty?
>
>
>
> On 8 December 2013 09:47, amir haghighi wrote:
>
>> Yes, the parallel data is UTF-8 (one side is UTF-8 and the other is ASCII).
>> All of the preprocessing steps were done with the Moses scripts.

Re: [Moses-support] error during testing

2013-12-08, Hieu Hoang
Everything looks OK; I'm not sure why it's segfaulting.

Is the file
  model/reordering-table.*
empty? If it is, then you should look in the log file
  steps/*/TRAINING_build-reordering.*.STDERR

Or is
  evaluation/*.filtered.*/reordering-table.1.*
empty? Is your test set empty?
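These checks can be scripted. Below is a minimal Python sketch, assuming each table is either plain text or gzip-compressed with a `.gz` suffix. A gzip file that holds no data is still not zero bytes long, so it is decompressed before testing. The helper names are hypothetical. Run the script from the EMS working directory with the glob patterns given above.

```python
import glob
import gzip
import os


def is_empty_table(path):
    """True if the file has no content; .gz files are checked after decompression."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return f.read(1) == b""


def find_empty_tables(patterns):
    """Return the regular files matching any glob pattern whose content is empty."""
    matches = sorted({p for pat in patterns for p in glob.glob(pat) if os.path.isfile(p)})
    return [p for p in matches if is_empty_table(p)]
```

For example, `find_empty_tables(["model/reordering-table.*", "evaluation/*.filtered.*/reordering-table.1.*"])`. A binarized table is only checked for having zero bytes of content.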



On 8 December 2013 09:47, amir haghighi wrote:

> Yes, the parallel data is UTF-8 (one side is UTF-8 and the other is ASCII).
> All of the preprocessing steps were done with the Moses scripts.

Re: [Moses-support] error during testing

2013-12-08, amir haghighi
Yes, the parallel data is UTF-8 (one side is UTF-8 and the other is ASCII).
All of the preprocessing steps were done with the Moses scripts.
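ASCII is a subset of UTF-8, so both sides of the corpus should pass a strict UTF-8 decode. Below is a minimal Python sketch of that check; the function name is hypothetical. It takes raw byte lines and reports the first line that does not decode.

```python
def first_invalid_utf8_line(lines):
    """Return the 1-based number of the first line that is not valid UTF-8, or None.

    `lines` is any iterable of bytes, e.g. a file opened in binary mode.
    """
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            return lineno
    return None
```

For example, `with open("/opt/tools/dataset/mizan/M_Tr.Fa", "rb") as f: print(first_invalid_utf8_line(f))`. That file name assumes the `raw-stem` and `output-extension` settings in the config below.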

Here is the content of the EMS config file:


### CONFIGURATION FILE FOR AN SMT EXPERIMENT ###


[GENERAL]

### directory in which experiment is run
#
working-dir = /opt/tools/workingEms

# specification of the language pair
input-extension = En
output-extension = Fa
pair-extension = En-Fa

### directories that contain tools and data
#
# moses
moses-src-dir = /opt/tools/mosesdecoder-RELEASE-1.0/mosesdecoder-RELEASE-1.0
#
# moses binaries
moses-bin-dir = $moses-src-dir/bin
#
# moses scripts
moses-script-dir = $moses-src-dir/scripts
#
# directory where GIZA++/MGIZA programs reside
external-bin-dir = $moses-src-dir/tools
#
# srilm
#srilm-dir = $moses-src-dir/srilm/bin/i686
#
# irstlm
irstlm-dir = /opt/tools/irstlm/bin
#
# randlm
#randlm-dir = $moses-src-dir/randlm/bin
#
# data
toy-data = /opt/tools/dataset/mizan

### basic tools
#
# moses decoder
decoder = $moses-bin-dir/moses

# conversion of phrase table into binary on-disk format
ttable-binarizer = $moses-bin-dir/processPhraseTable

# conversion of rule table into binary on-disk format
#ttable-binarizer = "$moses-bin-dir/CreateOnDiskPt 1 1 5 100 2"

# tokenizers - comment out if all your data is already tokenized
input-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -a -l $input-extension"
output-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -a -l $output-extension"

# truecasers - comment out if you do not use the truecaser
input-truecaser = $moses-script-dir/recaser/truecase.perl
output-truecaser = $moses-script-dir/recaser/truecase.perl
detruecaser = $moses-script-dir/recaser/detruecase.perl

### generic parallelizer for cluster and multi-core machines
# you may specify a script that allows the parallel execution of
# parallelizable steps (see meta file). you also need to specify
# the number of jobs (cluster) or cores (multicore)
#
#generic-parallelizer = $moses-script-dir/ems/support/generic-parallelizer.perl
#generic-parallelizer = $moses-script-dir/ems/support/generic-multicore-parallelizer.perl

### cluster settings (if run on a cluster machine)
# number of jobs to be submitted in parallel
#
#jobs = 10

# arguments to qsub when scheduling a job
#qsub-settings = ""

# project for privileges and usage accounting
#qsub-project = iccs_smt

# memory and time
#qsub-memory = 4
#qsub-hours = 48

### multi-core settings
# when the generic parallelizer is used, the number of cores
# is specified here
cores = 8

#
# PARALLEL CORPUS PREPARATION:
# create a tokenized, sentence-aligned corpus, ready for training

[CORPUS]

### long sentences are filtered out, since they slow down GIZA++
# and are a less reliable source of data. set here the maximum
# length of a sentence
#
max-sentence-length = 80

[CORPUS:toy]

### command to run to get raw corpus files
#
# get-corpus-script =

### raw corpus files (untokenized, but sentence aligned)
#
raw-stem = $toy-data/M_Tr

### tokenized corpus files (may contain long sentences)
#
#tokenized-stem =

### if sentence filtering should be skipped,
# point to the clean training data
#
#clean-stem =

### if corpus preparation should be skipped,
# point to the prepared training data
#
#lowercased-stem =

#
# LANGUAGE MODEL TRAINING

[LM]

### tool to be used for language model training
# srilm
#lm-training = $srilm-dir/ngram-count
#settings = "-interpolate -kndiscount -unk"

# irstlm training
# msb = modified kneser ney; p=0 no singleton pruning
#lm-training = "$moses-script-dir/generic/trainlm-irst2.perl -cores $cores -irst-dir $irstlm-dir -temp-dir $working-dir/tmp"
#settings = "-s msb -p 0"

# order of the language model
order = 5

### tool to be used for training randomized language model from scratch
# (more commonly, an SRILM model is trained)
#
#rlm-training = "$randlm-dir/buildlm -falsepos 8 -values 8"

### script to use for binary table format for irstlm or kenlm
# (default: no binarization)

# irstlm
#lm-binarizer = $irstlm-dir/compile-lm

# kenlm, also set type to 8
#lm-binarizer = $moses-bin-dir/build_binary
#type = 8

### script to create quantized language model format (irstlm)
# (default: no quantization)
#
#lm-quantizer = $irstlm-dir/quantize-lm

### script to use for converting into randomized table format
# (default: no randomization)
#
#lm-randomizer = "$randlm-dir/buildlm -falsepos 8 -values 8"

### each language model to be used has its own section here

[LM:toy]

### command to run to get raw corpus files
#
#get-corpus-script = ""

### raw corpus (untokenized)
#
raw-corpus = $toy-data/M_Tr.$output-extension

### tokenized corpus files (may contain long sentences)
#
#tokenized-corpus =

### if corpus preparation should be skipped,
# point to the prepared language mode
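The config relies on `$name` references, for example `moses-bin-dir = $moses-src-dir/bin`. To make the lookup concrete, here is a minimal Python sketch of how such references could be resolved. It is not EMS's actual parser. The comment handling, the quote stripping and the lookup order (current section, then its base section such as `LM` for `LM:toy`, then `GENERAL`) are simplifying assumptions, and the function names are hypothetical. With this sketch, a line broken by the mail client, such as a continuation of `#generic-parallelizer =` that lost its leading `#`, is reported as an error.

```python
import re

SECTION = re.compile(r"\[([^\]]+)\]")
VARIABLE = re.compile(r"\$([A-Za-z][\w-]*)")  # e.g. $moses-src-dir


def parse_ems_config(text):
    """Parse '[SECTION]' headers and 'key = value' lines; '#' starts a comment."""
    sections = {}
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # assumes no '#' inside values
        if not line:
            continue
        header = SECTION.fullmatch(line)
        if header:
            current = sections.setdefault(header.group(1), {})
            continue
        if current is None or "=" not in line:
            raise ValueError(f"line {lineno}: expected 'key = value': {raw!r}")
        key, value = line.split("=", 1)
        current[key.strip()] = value.strip().strip('"')
    return sections


def lookup(sections, section, key):
    """Find a key in the section, then its base section (LM for LM:toy), then GENERAL."""
    chain = [section]
    if ":" in section:
        chain.append(section.split(":", 1)[0])
    chain.append("GENERAL")
    for name in chain:
        if key in sections.get(name, {}):
            return sections[name][key]
    raise KeyError(f"{key!r} not defined for section {section!r}")


def resolve(sections, section, key, depth=0):
    """Expand $name references recursively; depth guards against cycles."""
    if depth > 50:
        raise ValueError(f"reference cycle while resolving {key!r}")
    value = lookup(sections, section, key)
    return VARIABLE.sub(lambda m: resolve(sections, section, m.group(1), depth + 1), value)
```

For example, with the `[GENERAL]` and `[LM:toy]` settings above, `resolve(cfg, "LM:toy", "raw-corpus")` would expand `$toy-data/M_Tr.$output-extension` to `/opt/tools/dataset/mizan/M_Tr.Fa`.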