from:"amir haghighi"

[Moses-support] Looking for Position

2021-02-01 Thread amir haghighi

Hi Everyone,

(I apologize if this message is off-topic for this list.)

I have 10 years of experience in Machine Translation and am currently
seeking a postdoc position in MT or NLP.
Could anyone please advise me on how to find a suitable position? Are there
any mailing lists besides the Corpora list that announce postdoc
positions? Is it appropriate to email professors directly and ask about
vacancies, or are all vacancies officially announced?

I appreciate any help you can provide.
Thanks
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

[Moses-support] Google SMT system

2020-08-25 Thread amir haghighi

Hi everyone

I was wondering whether the old (SMT-based) version of Google Translate can
still be accessed somewhere. I need the output of an SMT system for my research.
I found some APKs, but they can't be installed on newer phones.
I'd appreciate any help.


Regards

[Moses-support] Google SMT translator

2020-08-02 Thread amir haghighi

Hi everyone

I was wondering whether the old (SMT-based) version of Google Translate can
still be accessed somewhere.

Regards

Re: [Moses-support] Free cloud service to train NMT

2018-06-08 Thread amir haghighi

Thanks Hieu.

Is it possible to train the NMT system on the laptop? I mean is there any
laptop that I can buy for this purpose?

Thanks

On Fri, Jun 8, 2018 at 8:36 PM, Hieu Hoang  wrote:

> try this
>   https://developer.nvidia.com/academic_gpu_seeding
> or search the web
>
> Hieu Hoang
>
> On 8 June 2018 at 14:14, amir haghighi  wrote:
>
>> Hello
>>
>> I'm going to set up an NMT system using OpenNMT or Nematus, but I can't
>> run it on my laptop and I don't have access to any cluster.
>> I was wondering whether there is any free cloud computing service that can
>> be used to set up a full-size, state-of-the-art NMT system.
>>
>> Thanks
>>
>

[Moses-support] Free cloud service to train NMT

2018-06-08 Thread amir haghighi

Hello

I'm going to set up an NMT system using OpenNMT or Nematus, but I can't run
it on my laptop and I don't have access to any cluster.
I was wondering whether there is any free cloud computing service that can be
used to set up a full-size, state-of-the-art NMT system.

Thanks

Re: [Moses-support] SMT decoding complexity

2017-02-27 Thread amir haghighi

Thanks Philipp,

Yes, my formula has exponential terms, but since factorial grows faster
than exponential, the complexity of the decoding algorithm (without any
constraint on reordering and without pruning the search space) is O(n!),
right?

Thanks Barry, I'll read the paper.

On Mon, Feb 27, 2017 at 7:24 PM, Philipp Koehn <p...@jhu.edu> wrote:

> Hi,
>
> I am not sure I follow your question - in the formula you cite,
> there are exponential terms: 2^n and T^n.
>
> The Knight paper is worth trying to understand (it's on IBM Models,
> but applies similarly to phrase-based models).
>
> Also keep in mind that limited reordering windows and beam search
> make actual decoding algorithm implementations linear.
>
> -phi
>
> On Sun, Feb 26, 2017 at 1:16 PM, amir haghighi
> <amir.haghighi.tehr...@gmail.com> wrote:
> > Hi all,
> >
> > In the Moses manual and also in SMT textbooks it is mentioned that the
> > decoding complexity for PB-SMT is exponential in the source sentence
> > length.
> > If we have a source sentence of length n, in decoding by hypothesis
> > expansion we have 2^n states, each of which can be reordered in n!
> > orders, and each state can be translated in T^n ways, where T is the
> > number of translation options, right?
> > So the decoder complexity is 2^n * n! * T^n; why, then, is the
> > complexity said to be exponential?
> >
> > Could someone please explain to me how the decoder complexity is
> > calculated?
> > I've read the Knight (1999) paper, but I couldn't understand it. Could you
> > please recommend another reference?
> >
> > Thanks
> >
>
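Philipp's point about recombination can be made concrete with a toy model. The sketch below is a hypothetical simplification, not the Moses decoder: it ignores translation-option choice and the language model and only scores the order in which source words are translated, using an assumed cost table `cost[i][j]` (translate word j right after word i) and `start[j]` (translate word j first). `BruteForceBest` scores all n! orderings. `RecombinedBest` merges partial hypotheses that share the same coverage bit vector and the same last translated position, so there are at most n * 2^n states, each expanded in O(n) steps. That is O(n^2 * 2^n) work, exponential rather than factorial, and both functions return the same optimum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Toy reordering model (hypothetical): cost[i][j] = cost of translating
// source word j immediately after source word i; start[j] = cost of
// translating word j first. No LM, no choice of translation options.
using Matrix = std::vector<std::vector<double>>;

// Brute force: score every one of the n! orderings.
double BruteForceBest(const std::vector<double>& start, const Matrix& cost) {
  assert(!start.empty());
  const std::size_t n = start.size();
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., n-1
  double best = std::numeric_limits<double>::infinity();
  do {
    double c = start[order[0]];
    for (std::size_t k = 1; k < n; ++k) c += cost[order[k - 1]][order[k]];
    best = std::min(best, c);
  } while (std::next_permutation(order.begin(), order.end()));
  return best;
}

// Dynamic programming with recombination: a state is only
// (coverage bit vector, last translated word), so at most n * 2^n states.
double RecombinedBest(const std::vector<double>& start, const Matrix& cost) {
  assert(!start.empty());
  const std::size_t n = start.size();
  const std::size_t full = (std::size_t{1} << n) - 1;
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<std::vector<double>> best(full + 1, std::vector<double>(n, inf));
  for (std::size_t j = 0; j < n; ++j) best[std::size_t{1} << j][j] = start[j];
  // Adding a word only sets bits, so every predecessor state has a smaller
  // coverage value and is finished before it is expanded.
  for (std::size_t cov = 1; cov <= full; ++cov) {
    for (std::size_t last = 0; last < n; ++last) {
      if (best[cov][last] == inf) continue;  // unreachable state
      for (std::size_t next = 0; next < n; ++next) {
        if (cov & (std::size_t{1} << next)) continue;  // already translated
        const std::size_t nc = cov | (std::size_t{1} << next);
        best[nc][next] = std::min(best[nc][next], best[cov][last] + cost[last][next]);
      }
    }
  }
  return *std::min_element(best[full].begin(), best[full].end());
}
```

For very short sentences the DP table can be larger than the list of orderings, but the gap quickly reverses: 12! is about 4.8e8, whereas 12 * 2^12 = 49,152. With a language model the state must also record the last target words, which enlarges the state space without bringing back the n! term; a reordering limit and beam pruning then bound the work per source position, as Philipp notes.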

[Moses-support] SMT decoding complexity

2017-02-26 Thread amir haghighi

Hi all,

In the Moses manual and also in SMT textbooks it is mentioned that the
decoding complexity for PB-SMT is exponential in the source sentence
length. If we have a source sentence of length n, in decoding by
hypothesis expansion we have 2^n states, each of which can be
reordered in n! orders, and each state can be translated in T^n ways,
where T is the number of translation options, right?
So the decoder complexity is 2^n * n! * T^n; why, then, is the
complexity said to be exponential?

Could someone please explain to me how the decoder complexity is
calculated?
I've read the Knight (1999) paper, but I couldn't understand it. Could you
please recommend another reference?

Thanks

[Moses-support] Fwd: Significant Test

2017-01-31 Thread amir haghighi

Hi All

Could someone please explain what a "significance test" and a
"p-value" mean?
I've read both Koehn's and Clark's papers on significance testing, but I
still don't understand what a p-value such as 0.05 means. Does it mean
that if the difference between the scores of two systems is x, then with
95% probability we will get the same difference if we repeat the
experiments?
Does a significance test deal with the randomness of the tuning process,
or with the selection of the test set?

Any help would be appreciated
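As a rough guide: a p-value of 0.05 is not the probability of getting the same difference again. It says that if the two systems were really equally good, a difference at least as large as the observed one would show up on only about 5% of test sets drawn the same way. Koehn's paired bootstrap resampling approximates this by resampling the test set, so it addresses the choice of test sentences; Clark et al. additionally repeat the tuning runs to account for the randomness of the optimizer. The sketch below illustrates the bootstrap idea under a simplifying assumption: each system is represented by hypothetical per-sentence scores that are summed, whereas real BLEU is recomputed from the n-gram statistics of each resampled set. `PairedBootstrapPValue`, the default of 1000 samples and the fixed seed are illustrative choices, not a Moses tool.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Paired bootstrap resampling, simplified: each system is represented by
// hypothetical per-sentence scores that are summed over a resampled test set
// (real BLEU would be recomputed from the n-gram counts of each sample).
// Returns the fraction of resampled test sets on which system A does NOT
// outscore system B, read as an approximate p-value for "A is better than B".
double PairedBootstrapPValue(const std::vector<double>& scoresA,
                             const std::vector<double>& scoresB,
                             std::size_t samples = 1000, unsigned seed = 1) {
  assert(scoresA.size() == scoresB.size() && !scoresA.empty() && samples > 0);
  std::mt19937 rng(seed);  // fixed seed for reproducible results
  std::uniform_int_distribution<std::size_t> pick(0, scoresA.size() - 1);
  std::size_t notBetter = 0;
  for (std::size_t s = 0; s < samples; ++s) {
    double sumA = 0.0;
    double sumB = 0.0;
    // Draw a test set of the original size with replacement; both systems
    // are scored on the SAME drawn sentences, which makes the test paired.
    for (std::size_t k = 0; k < scoresA.size(); ++k) {
      const std::size_t i = pick(rng);
      sumA += scoresA[i];
      sumB += scoresB[i];
    }
    if (sumA <= sumB) ++notBetter;
  }
  return static_cast<double>(notBetter) / static_cast<double>(samples);
}
```

If system A scores higher than B on every sentence, the function returns 0; comparing a system with itself returns 1. A value below 0.05 is what is usually reported as significant at the 5% level.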

Re: [Moses-support] please help me with the code - getting word index

2015-06-20 Thread amir haghighi

Thanks Matthias.
ChartHypothesis::GetCurrSourceRange() gets the source span that all
terminals and non-terminals in the current hypothesis cover in the source
sentence. I'd like to know which source word index each terminal (and
non-terminal) corresponds to. Could you tell me how to obtain that?

Thanks again


On Thu, Jun 18, 2015 at 9:48 PM, Matthias Huck mh...@inf.ed.ac.uk wrote:

 Hi,

 You can calculate absolute positions in the source sentence based on the
 words range of the current hypothesis and those of the direct
 predecessors (in case of right-hand side non-terminals).

 Take a look at these methods:

 InputPath::GetWordsRange()
 ChartHypothesis::GetCurrSourceRange()
 ChartCellLabel::GetCoverage()

 Cheers,
 Matthias


 On Thu, 2015-06-18 at 20:23 +0430, amir haghighi wrote:
  Hi everybody
 
 
  I wrote the following code to get an ordered list of the source words
  inside a hypothesis. It gets the words in their translation order, but I
  need not only the words' strings but also the index of each word in the
  original sentence.
 
  Could you please tell me how to get the sentence index of each word in
  srcPhrase?
 
 
  void Amir::GetSourcePhrase2(const ChartHypothesis &cur_hypo, Phrase &srcPhrase) const
  {
    const TargetPhrase &targetPh = cur_hypo.GetCurrTargetPhrase();
    const Phrase *sourcePh = targetPh.GetRuleSource();
    const size_t targetWordsNum = targetPh.GetSize();

    // For each target position, the set of aligned source positions
    // (positions inside the rule's source side, not in the sentence).
    std::vector<std::set<size_t> > sourcePosSets;
    for (size_t targetP = 0; targetP < targetWordsNum; ++targetP) {
      //std::cerr << "setting alignments for targetword: " << targetP << std::endl;
      sourcePosSets.push_back(targetPh.GetAlignTerm().GetAlignmentsForTarget(targetP));
    }

    // If a source word is aligned to several target words, keep it only at
    // the rightmost target position and remove it from the others.
    for (int ii = static_cast<int>(targetWordsNum) - 1; ii >= 0; --ii) {
      const std::set<size_t> cur_srcPosSet = sourcePosSets[ii];  // copy: sets are modified below
      for (std::set<size_t>::const_iterator alignment = cur_srcPosSet.begin();
           alignment != cur_srcPosSet.end(); ++alignment) {
        const size_t alignmentElement = *alignment;
        for (int index = 0; index < ii; ++index) {
          sourcePosSets[index].erase(alignmentElement);  // no-op if absent
        }
      }
    }

    for (size_t posT = 0; posT < targetWordsNum; ++posT) {
      const Word &word = targetPh.GetWord(posT);
      if (word.IsNonTerminal()) {
        // non-terminal: fill in the words from the previous hypothesis
        const size_t nonTermInd = targetPh.GetAlignNonTerm().GetNonTermIndexMap()[posT];
        const ChartHypothesis *prevHypo = cur_hypo.GetPrevHypo(nonTermInd);
        GetSourcePhrase2(*prevHypo, srcPhrase);
      } else {
        for (std::set<size_t>::const_iterator it = sourcePosSets[posT].begin();
             it != sourcePosSets[posT].end(); ++it) {
          srcPhrase.AddWord(sourcePh->GetWord(*it));
        }
      }
    }
  }



 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.
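Matthias's suggestion (derive absolute positions from the span of the current hypothesis and the spans of its predecessors) can be sketched independently of Moses. The `Span` type and `AbsoluteSourcePositions` below are hypothetical; in Moses the inputs would come from calls such as ChartHypothesis::GetCurrSourceRange() on the hypothesis and on its predecessors, plus the rule's source side. The sketch assumes that the child spans are given in source order, one per non-terminal, and that each terminal covers exactly one source word.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical span type (not the Moses class): first and last covered
// source positions, inclusive, as in the trace notation [3..4].
struct Span {
  std::size_t start;
  std::size_t end;
};

// For each symbol on the rule's source side (terminal or non-terminal),
// returns the absolute source position at which it starts.
//   hypoSpan:   span of the current hypothesis
//   isNonTerm:  one flag per source-side symbol, left to right
//   childSpans: spans of the predecessor hypotheses, assumed in source order
std::vector<std::size_t> AbsoluteSourcePositions(
    const Span& hypoSpan, const std::vector<bool>& isNonTerm,
    const std::vector<Span>& childSpans) {
  std::vector<std::size_t> positions;
  std::size_t pos = hypoSpan.start;  // walk the span from left to right
  std::size_t child = 0;
  for (std::size_t k = 0; k < isNonTerm.size(); ++k) {
    positions.push_back(pos);
    if (isNonTerm[k]) {
      // A non-terminal covers exactly the span of the hypothesis filling it.
      assert(child < childSpans.size() && childSpans[child].start == pos);
      pos = childSpans[child].end + 1;
      ++child;
    } else {
      pos += 1;  // a terminal covers exactly one source word
    }
  }
  assert(pos == hypoSpan.end + 1);  // the symbols must tile the span exactly
  return positions;
}
```

For a rule whose source side is `the X house` applied over [3..6], with the non-terminal filled by a hypothesis covering [4..5], the result is {3, 4, 6}: "the" is word 3, the gap starts at word 4, and "house" is word 6.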



[Moses-support] compare hypothesis in moses chart

2014-12-17 Thread amir haghighi

Hi everyone

I need to implement the Compare function for the Hiero model. I was wondering
which hypotheses are compared with each other: those that cover the same
source span (for example, all hypotheses that cover [x y])? Or those that
cover source spans of the same length ([x x+d] spans)?

Cheers
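The following sketch illustrates hypothesis recombination under an assumption worth checking against the Moses chart decoder: only hypotheses in the same chart cell, i.e. covering the same source span [x y], are compared. `ToyState`, `ToyHypo` and `RecombineCell` are hypothetical, not Moses classes. `Compare` follows the usual three-way convention (negative, zero or positive), where zero means the two states are interchangeable for all future scoring; hypotheses whose states compare equal are merged and only the better-scoring one is kept.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical feature state (not a Moses class): it holds only what the
// feature needs to score later expansions, here the last target word.
struct ToyState {
  std::string lastWord;
  // Negative, zero or positive; zero means "equivalent for future scoring".
  int Compare(const ToyState& other) const {
    return lastWord.compare(other.lastWord);
  }
  bool operator<(const ToyState& other) const { return Compare(other) < 0; }
};

struct ToyHypo {
  std::string output;
  double score;  // higher is better
  ToyState state;
};

// Recombination within ONE chart cell (hypotheses covering the same source
// span): hypotheses whose states compare equal are merged, keeping the best.
std::vector<ToyHypo> RecombineCell(const std::vector<ToyHypo>& cell) {
  std::map<ToyState, ToyHypo> bestByState;
  for (const ToyHypo& h : cell) {
    auto it = bestByState.find(h.state);
    if (it == bestByState.end()) {
      bestByState.emplace(h.state, h);
    } else if (h.score > it->second.score) {
      it->second = h;
    }
  }
  std::vector<ToyHypo> survivors;
  for (const auto& entry : bestByState) survivors.push_back(entry.second);
  return survivors;
}
```

This also suggests what belongs in a state: only the information the feature needs to score later expansions. Storing anything else makes states that are really equivalent compare as different, which blocks recombination and enlarges the search without changing any score.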

Re: [Moses-support] string of Words + states in feature functions

2014-12-16 Thread amir haghighi

Hoang, Hieu and Matthias, thank you all so much; your explanations really
helped me.
Hieu, could you please add your explanation of the Compare function to the
adding-feature-function web page?

I'm confused about how the Compare function works. Which hypotheses are
compared with each other: those that cover the same source span (for
example, all hypotheses that cover [x y])? Or those that cover source spans
of the same length ([x x+d] spans)?

Cheers






On Wed, Dec 10, 2014 at 2:02 PM, Matthias Huck mh...@inf.ed.ac.uk wrote:

 Hi Amir,

 The input is passed to the feature functions via
 InitializeForInput(InputType const& source).
 This method is called before the search and the collection of translation
 options (cf. moses/FF/FeatureFunction.h). You can set a member variable
 to have access to the input in your scoring method.

 Alternatively, if you implement EvaluateWithSourceContext(), the input
 is passed directly to the method as a parameter (const InputType &input)
 and you can use that.
 Finally, there's another option in the EvaluateWhenApplied() methods.
 You can get the input from the Hypothesis object:
 const InputType &input = hypo.GetManager().GetSource();

 The input is an InputType object. Moses knows different input types, see
 InputTypeEnum in moses/TypeDef.h. So what you get might differ
 depending on what was passed to the decoder. If you're happy with
 implementing your feature for sentence input only, then you can cast the
 input to a Sentence object. The Sentence object gives you convenient
 access methods, in particular GetSize() and GetWord(size_t pos). You can
 thus obtain the sequence of words in the input. Words can contain
 several factors in Moses. The factor with index 0 is typically the
 surface form. Access it using the [] operator.

 I guess you will never really want to work directly with the string
 representation of the factor, but at this point you would be able to get
 it and for instance print it to your debug output.

 Hope this was helpful as another answer to your first question.

 Cheers,
 Matthias
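To illustrate the factor layout Matthias describes, here is a small standalone sketch. Moses factored input separates the factors of a word with `|` (e.g. `house|NN`), and factor 0 is usually the surface form. `FactoredWord` and `ParseFactoredSentence` are hypothetical stand-ins, not the Moses `Word` and `Sentence` classes; inside Moses you would call `GetWord(i)` on the Sentence and read factor 0 with `operator[]` as described above. The position of a word in the returned vector is its index in the source sentence.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a factored word (not the Moses Word class):
// factors[0] is typically the surface form; further factors may hold,
// e.g., a POS tag or a lemma.
struct FactoredWord {
  std::vector<std::string> factors;
  // Mirrors the idea of reading factor i with operator[].
  const std::string& operator[](std::size_t i) const { return factors.at(i); }
};

// Splits a line of factored input such as "this|DT is|VBZ" into words.
std::vector<FactoredWord> ParseFactoredSentence(const std::string& line) {
  std::vector<FactoredWord> words;
  std::istringstream tokens(line);
  std::string token;
  while (tokens >> token) {  // whitespace separates words
    FactoredWord word;
    std::size_t begin = 0;
    while (true) {  // '|' separates the factors of one word
      const std::size_t bar = token.find('|', begin);
      word.factors.push_back(token.substr(
          begin, bar == std::string::npos ? std::string::npos : bar - begin));
      if (bar == std::string::npos) break;
      begin = bar + 1;
    }
    words.push_back(word);
  }
  return words;
}
```

For each position i, `words[i][0]` is the surface string of source word i. As Matthias notes, a real feature function would normally keep working with the factor objects and convert to strings only for debugging output.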


 On Wed, 2014-12-10 at 11:41 +0330, amir haghighi wrote:
  Hi everyone
 
 
 
  I'm implementing a feature function in moses-chart. I need the strings
  of the source words and also their indexes in the source sentence. I've
  written a function that gets the source words, but I don't know how to
  extract the string from a word.
  Could anyone tell me how to do that? As far as I know, each word is
  implemented as an array of factors; which of them is its string?
 
 
  I also have some questions about the states in stateful features:
  what kind of variables should be stored in each state? Only those
  that are used in the Compare function, or any variable from the
  previous hypothesis that we use in our feature?
 
 
  Thanks in advance!
 
 
  Cheers
 
  Amir
 






[Moses-support] string of Words + states in feature functions

2014-12-10 Thread amir haghighi

Hi everyone

I'm implementing a feature function in moses-chart. I need the strings of
the source words and also their indexes in the source sentence. I've written a
function that gets the source words, but I don't know how to extract the
string from a word.
Could anyone tell me how to do that? As far as I know, each word is implemented
as an array of factors; which of them is its string?

I also have some questions about the states in stateful features:
what kind of variables should be stored in each state? Only those that
are used in the Compare function, or any variable from the previous
hypothesis that we use in our feature?

Thanks in advance!

Cheers
Amir

Re: [Moses-support] stateless or stateful?!

2014-10-14 Thread amir haghighi

Thanks so much for your reply, Hieu.

I mean: if I call GetOutputPhrase() for hypothesis number 11 in the
following example, what will it return? Will it return "a house", or should I
follow the previous hypotheses of 11 (in this case, 7 and 9) and call
GetOutputPhrase() on each of them to obtain the target phrase?
Or should I store something in the states? I don't understand what
kind of information I should store in the states.

 41 X TOP -> <s> S </s> (1,1) [0..5] -3.593 0.000, -2.606, -9.711, 2.526 20
 20 X S   -> NP V NP (0,0) (1,1) (2,2) [1..4] -1.988 0.000, -1.737, -6.501, 2.526 3 5 11
  3 X NP  -> this  [1..1]  0.486 0.000, -0.434, -1.330, 2.303
  5 X V   -> is    [2..2] -1.267 0.000, -0.434, -2.533, 0.000
 11 X NP  -> DT NN (0,0) (1,1) [3..4] -2.698 0.000, -0.869, -5.396, 0.000 7 9
  7 X DT  -> a     [3..3] -1.012 0.000, -0.434, -2.024, 0.000
  9 X NN  -> house [4..4] -2.887 0.000, -0.434, -5.774, 0.000

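Whatever a single rule lookup returns, the full target string of hypothesis 11 in the trace above has to be built by recursing into the hypotheses that fill its non-terminals. The sketch below mirrors that trace with a hypothetical `DerivationNode` type (not a Moses class) and assumes that each node lists its children in the order of the non-terminals on its target side. Whether a given Moses method, such as ChartHypothesis::GetOutputPhrase(), already performs this recursion for you should be checked in the Moses source.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical node of a chart derivation (not a Moses class): the target
// side of the rule, plus the ids of the hypotheses filling its non-terminals.
struct DerivationNode {
  std::vector<std::string> targetSymbols;  // e.g. {"DT", "NN"} or {"house"}
  std::vector<bool> isNonTerm;             // one flag per target symbol
  std::vector<int> children;               // assumed in target non-terminal order
};

// Appends the full target string of hypothesis `id` to `out`, recursively
// replacing each non-terminal with the output of the hypothesis that fills it.
void CollectOutput(const std::map<int, DerivationNode>& chart, int id,
                   std::vector<std::string>& out) {
  const DerivationNode& node = chart.at(id);
  std::size_t nextChild = 0;
  for (std::size_t k = 0; k < node.targetSymbols.size(); ++k) {
    if (node.isNonTerm[k]) {
      CollectOutput(chart, node.children.at(nextChild++), out);
    } else {
      out.push_back(node.targetSymbols[k]);
    }
  }
}
```

Expanding node 11 yields "a house" and expanding node 20 yields "this is a house"; the rule of node 11 on its own contributes only the non-terminals DT and NN.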

Another important question: how should I use my feature function when
running the chart decoder with EMS? My feature doesn't need any arguments.
I added the following lines to my config file; is this right?

   [tune]

  decoder-settings = -threads 8 -feature-add  -weight-add abc= 0.1

   [EVALUATION]
   decoder-settings = -search-algorithm 1 -cube-pruning-pop-limit
5000 -s 5000 -threads 8 -feature-add
-weight-add abc= 0.1 



On Mon, Oct 13, 2014 at 11:44 AM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:

 hi amir

 On 12 October 2014 21:36, amir haghighi amir.haghighi...@gmail.com
 wrote:

 Hi everyone

 I'm going to add a feature function to the Hiero model in Moses, and I have
 some questions about the Moses code. Any response would be much appreciated
 :)

 To implement my feature function, which assigns a score to each hypothesis,
 I need: the whole source sentence,
 the source words that have been translated so far, and
 the target words that have been produced so far.

 1. Should my feature function be stateless or stateful? Can I implement
 it as a stateless function and extract the information I need from this vector?
 std::vector<const ChartHypothesis*> m_prevHypos;

 If you want to access m_prevHypos, then it should be a stateful feature.
 The Moses feature function framework also allows stateless features to do
 this, but that isn't really correct; it risks producing random errors.


 2. Which of these evaluate functions should be implemented?
 EvaluateWhenApplied(const ChartHypothesis& /* cur_hypo */,
                     int /* featureID - used to index the state in the previous hypotheses */,
                     ScoreComponentCollection* accumulator) const
 or
 void SkeletonStatefulFF::EvaluateWithSourceContext(const InputType &input,
     const InputPath &inputPath, const TargetPhrase &targetPhrase,
     const StackVec *stackVec, ScoreComponentCollection &scoreBreakdown,
     ScoreComponentCollection *estimatedFutureScore) const

 You should use EvaluateWhenApplied(). The hypotheses, and the variable
 m_prevHypos, are available in this function.


 3. In hierarchical translation, are there any gaps between the produced
 target phrases? How can I get the produced target phrases in the code?

 I don't understand your question. Can you give an example?



  Thanks in advance
 Amir











 --
 Hieu Hoang
 Research Associate
 University of Edinburgh
 http://www.hoang.co.uk/hieu



[Moses-support] stateless or stateful?!

2014-10-12 Thread amir haghighi

Hi everyone

I'm going to add a feature function to the Hiero model in Moses, and I have some
questions about the Moses code. Any response would be much appreciated :)

To implement my feature function, which assigns a score to each hypothesis, I
need: the whole source sentence,
the source words that have been translated so far, and
the target words that have been produced so far.

1. Should my feature function be stateless or stateful? Can I implement it
as a stateless function and extract the information I need from this vector?
std::vector<const ChartHypothesis*> m_prevHypos;

2. Which of these evaluate functions should be implemented?
EvaluateWhenApplied(const ChartHypothesis& /* cur_hypo */,
                    int /* featureID - used to index the state in the previous hypotheses */,
                    ScoreComponentCollection* accumulator) const
or
void SkeletonStatefulFF::EvaluateWithSourceContext(const InputType &input,
    const InputPath &inputPath, const TargetPhrase &targetPhrase,
    const StackVec *stackVec, ScoreComponentCollection &scoreBreakdown,
    ScoreComponentCollection *estimatedFutureScore) const

3. In hierarchical translation, are there any gaps between the produced target
phrases? How can I get the produced target phrases in the code?


 Thanks in advance
Amir

Re: [Moses-support] Fwd: about the moses code

2014-07-14 Thread amir haghighi

Thank you very much, Hieu, for uploading this helpful video.
Unfortunately, some parts of it are corrupted (especially the first few
minutes), and what you are typing in the terminal cannot be seen.
It would be great if you could upload a better one.
Thank you again.

Amir


On Mon, Jul 14, 2014 at 2:46 AM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:

 I've just uploaded a YouTube video about this:
https://www.youtube.com/watch?v=P43h827uLac&feature=youtu.be
 Hope that's useful to you



 On 13 July 2014 22:53, amir haghighi amir.haghighi...@gmail.com wrote:


 Hello all

 For a week I have been trying to open the Moses code in the NetBeans or
 Eclipse IDE, and I can't.
 Given that the Moses code does not have any make or configure file, could
 you please tell me how I can open and run it in those IDEs?
 I need to add my feature function to the Moses code, but I can't even open
 the code in an IDE.

 I would be very grateful if you could help me.

 Thank you
 Amir









[Moses-support] Fwd: about the moses code

2014-07-13 Thread amir haghighi

Hello all

For a week I have been trying to open the Moses code in the NetBeans or
Eclipse IDE, and I can't.
Given that the Moses code does not have any make or configure file, could
you please tell me how I can open and run it in those IDEs?
I need to add my feature function to the Moses code, but I can't even open
the code in an IDE.

I would be very grateful if you could help me.

Thank you
Amir

[Moses-support] How to start coding in moses

2014-05-03 Thread amir haghighi

Hi
I want to implement a new reordering model (as a feature function) for
moses. I need the whole sentence which moses is translating now, and
the previous phrases which are translated. I am not familiar with
moses code and it seems very complicated to me. I had took a look at
code documentation but I don't know which classes should I study. I
would be very thankful if everyone could help me.

Re: [Moses-support] tuning takes tooooo long:(

2014-02-18 Thread amir haghighi

Thank you, Barry, for the explanation.
The problem was solved and I finally obtained the BLEU score.

Regards
Amir




On Mon, Feb 17, 2014 at 12:39 AM, Barry Haddow
<bhad...@staffmail.ed.ac.uk> wrote:

  Hi Amir

 You should be able to do the building of the LM with irstlm, and
 binarising and decoding with KenLM, all via EMS.

 Anyway, for the decoding, is Moses still running? Is it using CPU? It
 could be swapping. If it's not running, did it crash? You could try running
 again with a small stack size (the -s parameter in the decoder-settings)

 cheers - Barry


 On 15/02/14 09:51, amir haghighi wrote:

   Hello Barry

  Thank you for your help.

 I commented out KenLM in the configuration file and used build_binary
 to binarize the LM file (it doesn't create an ARPA file; should I create it
 manually?).
 It didn't report any error, but the testing process still takes too long
 to finish.
 I have 5000 sentences in my test set. I checked the test.output.1 file and
 noticed that it contains the translations of only 4690 sentences. It seems
 to get stuck in a loop while translating sentence 4691.
 I am afraid that I made a mistake in setting the parameters in my
 configuration file. I would be grateful if you could take a look at my file.

  Thank you again
  amir




 On Fri, Feb 14, 2014 at 5:28 AM, Barry Haddow
 <bhad...@staffmail.ed.ac.uk> wrote:

  Hi Amir

 Even if you use IRSTLM to build the language model, you can still use
 KenLM for decoding. Make sure you create an arpa file with IRSTLM, then use
 build_binary to binarise it so that it loads quickly with KenLM. Then you
 can use multi-threaded decoding,

 cheers - Barry


 On 14/02/14 13:01, amir haghighi wrote:

   Thank you Barry,

  I use IRSTLM to build the language model. Can I use multi-threading in
 the decoder-settings?
  I get an "IRST LM is not threadsafe" error.

  I want to use IRSTLM; is there any other way to speed up the tuning
 step?

  Regards


 On Fri, Feb 14, 2014 at 1:53 AM, Barry Haddow bhad...@staffmail.ed.ac.uk
  wrote:

  Hi Amir

 You can add

 decoder-settings = -threads 4

 to your TUNING stanza.

 Also try

 filter-settings = -MinScore 2:0.0001

 for more aggressive filtering.

 Running tuning on a laptop though is always going to be slow,

 cheers - Barry


 On 14/02/14 09:26, amir haghighi wrote:

  Thank you, Arezki and Yohit.

  I don't know how to change the multi-threading setting in the EMS config file.



 On Fri, Feb 14, 2014 at 12:36 AM, Arezki Sadoune arezkisado...@yahoo.fr
  wrote:

   Hello Amir
 I think your tuning process will go faster if you use multi-threaded
 MERT:
 /home/mert-moses.pl --threads 4
  Of course, you should use 8 instead of 4 if your laptop is
 equipped with eight cores.
  Best regards


   On Friday, 14 February 2014 8:27, amir haghighi
 <amir.haghighi...@gmail.com> wrote:
   Hello

  I have a corpus with 400,000 sentences for training, 1,000 sentences
 for tuning and 100,000 sentences for testing. I couldn't run EMS on my
 corpus with my old laptop, even after 3 days.
 I have bought a new laptop (Core i7, 2.40 GHz CPU, 8 GB RAM), but I
 still can't run EMS! It has been in the tuning step for 3 days and has
 not finished yet.
  Is it possible that it is stuck in an endless loop?
  How can I check its progress?

  regards
  Amir




















Re: [Moses-support] tuning takes tooooo long:(

2014-02-15 Thread amir haghighi

Hello Barry

Thank you for your help.

I commented out KenLM in the configuration file and used build_binary to
binarize the LM file (it doesn't create an ARPA file; should I create it
manually?).
It didn't report any error, but the testing process still takes too long to
finish.
I have 5000 sentences in my test set. I checked the test.output.1 file and
noticed that it contains the translations of only 4690 sentences. It seems
to get stuck in a loop while translating sentence 4691.
I am afraid that I made a mistake in setting the parameters in my
configuration file. I would be grateful if you could take a look at my file.

Thank you again
amir




On Fri, Feb 14, 2014 at 5:28 AM, Barry Haddow <bhad...@staffmail.ed.ac.uk> wrote:

  Hi Amir

 Even if you use IRSTLM to build the language model, you can still use
 KenLM for decoding. Make sure you create an arpa file with IRSTLM, then use
 build_binary to binarise it so that it loads quickly with KenLM. Then you
 can use multi-threaded decoding,

 cheers - Barry


 On 14/02/14 13:01, amir haghighi wrote:

   Thank you Barry,

  I use IRSTLM to build the language model. Can I use multi-threading in
 the decoder-settings?
  I get an "IRST LM is not threadsafe" error.

  I want to use IRSTLM; is there any other way to speed up the tuning step?

  Regards


 On Fri, Feb 14, 2014 at 1:53 AM, Barry Haddow
 <bhad...@staffmail.ed.ac.uk> wrote:

  Hi Amir

 You can add

 decoder-settings = -threads 4

 to your TUNING stanza.

 Also try

 filter-settings = -MinScore 2:0.0001

 for more aggressive filtering.

 Running tuning on a laptop though is always going to be slow,

 cheers - Barry


 On 14/02/14 09:26, amir haghighi wrote:

  Thank you, Arezki and Yohit.

  I don't know how to change the multi-threading setting in the EMS config file.



 On Fri, Feb 14, 2014 at 12:36 AM, Arezki Sadoune
 <arezkisado...@yahoo.fr> wrote:

   Hello Amir
 I think your tuning process will go faster if you use multi-threaded
 MERT:
 /home/mert-moses.pl --threads 4
  Of course, you should use 8 instead of 4 if your laptop is
 equipped with eight cores.
  Best regards


   On Friday, 14 February 2014 8:27, amir haghighi
 <amir.haghighi...@gmail.com> wrote:
   Hello

  I have a corpus with 400,000 sentences for training, 1,000 sentences for
 tuning and 100,000 sentences for testing. I couldn't run EMS on my corpus
 with my old laptop, even after 3 days.
 I have bought a new laptop (Core i7, 2.40 GHz CPU, 8 GB RAM), but I still
 can't run EMS! It has been in the tuning step for 3 days and has not
 finished yet.
  Is it possible that it is stuck in an endless loop?
  How can I check its progress?

  regards
  Amir















### CONFIGURATION FILE FOR AN SMT EXPERIMENT ###


[GENERAL]

### directory in which experiment is run
#
working-dir = /opt/working/ems3

# specification of the language pair
input-extension = En
output-extension = Fa
pair-extension = En-Fa

### directories that contain tools and data
# 
# moses
moses-src-dir = /opt/tools/mosesdecoder
#
# moses binaries
moses-bin-dir = $moses-src-dir/bin
#
# moses scripts
moses-script-dir = $moses-src-dir/scripts
#
# directory where GIZA++/MGIZA programs reside
external-bin-dir = $moses-src-dir/tools
#
# srilm
#srilm-dir = $moses-src-dir/srilm/bin/i686
#
# irstlm
irstlm-dir = /opt/tools/irstlm/bin
#
# randlm
#randlm-dir = $moses-src-dir/randlm/bin
#
# data
toy-data = /opt/dataset/mizan

### basic tools
#
# moses decoder
decoder = $moses-bin-dir/moses

# conversion of phrase table into binary on-disk format
ttable-binarizer = $moses-bin-dir/processPhraseTable

# conversion of rule table into binary on-disk format
#ttable-binarizer = $moses-bin-dir/CreateOnDiskPt 1 1 5 100 2

# tokenizers - comment out if all your data is already tokenized
input-tokenizer = $moses-script-dir/tokenizer/tokenizer.perl -a -l $input-extension
output-tokenizer = $moses-script-dir/tokenizer/tokenizer.perl -a -l $output-extension

# truecasers - comment out if you do not use the truecaser
input-truecaser = $moses-script-dir/recaser/truecase.perl
output-truecaser = $moses-script-dir/recaser/truecase.perl
detruecaser = $moses-script-dir/recaser/detruecase.perl

### generic parallelizer for cluster and multi-core machines
# you may specify a script that allows the parallel execution of
# parallelizable steps (see meta file). You also need to specify
# the number of jobs (cluster) or cores (multicore)
#
#generic-parallelizer = $moses-script-dir/ems

[Moses-support] getting zero bleu score on the test set

2014-02-15 Thread amir haghighi

Hello
My EMS system is trained on 400,000 sentences. The phrase table and
reordering model are filled correctly, but I get "test: 0.00 (1.100) BLEU"
when running EMS on my test set, which has 500 sentences.
I don't know why this happens.

Thank you for any help
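One property of BLEU that helps when debugging a 0.00 score: BLEU is a geometric mean of the 1- to 4-gram precisions, so it is exactly zero as soon as one of them is zero, for example when not a single 4-gram of the output matches the reference. Causes worth checking include a reference file that is misaligned with the output, or output and reference that differ in tokenization or casing. The sketch below is a plain corpus-level BLEU (single reference, clipped n-gram counts, brevity penalty, no smoothing); it is not the exact scorer that EMS runs, and `CorpusBleu` and `SplitWords` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Splits a sentence on whitespace (assumes the text is already tokenized).
std::vector<std::string> SplitWords(const std::string& sentence) {
  std::istringstream in(sentence);
  std::vector<std::string> words;
  std::string w;
  while (in >> w) words.push_back(w);
  return words;
}

// Corpus-level BLEU with one reference per sentence, n-grams up to 4,
// clipped counts, brevity penalty and no smoothing.
double CorpusBleu(const std::vector<std::string>& hyps,
                  const std::vector<std::string>& refs) {
  assert(hyps.size() == refs.size());
  double matches[4] = {0, 0, 0, 0};
  double totals[4] = {0, 0, 0, 0};
  double hypLen = 0, refLen = 0;
  for (std::size_t s = 0; s < hyps.size(); ++s) {
    const std::vector<std::string> h = SplitWords(hyps[s]);
    const std::vector<std::string> r = SplitWords(refs[s]);
    hypLen += h.size();
    refLen += r.size();
    for (std::size_t n = 1; n <= 4; ++n) {
      // Count reference n-grams, then consume them as hypothesis n-grams
      // match; this clips each n-gram to its count in the reference.
      std::map<std::vector<std::string>, int> refCounts;
      for (std::size_t i = 0; i + n <= r.size(); ++i)
        ++refCounts[std::vector<std::string>(r.begin() + i, r.begin() + i + n)];
      for (std::size_t i = 0; i + n <= h.size(); ++i) {
        auto it = refCounts.find(std::vector<std::string>(h.begin() + i, h.begin() + i + n));
        if (it != refCounts.end() && it->second > 0) {
          --it->second;
          matches[n - 1] += 1;
        }
        totals[n - 1] += 1;
      }
    }
  }
  double logPrecision = 0.0;
  for (int n = 0; n < 4; ++n) {
    if (matches[n] == 0) return 0.0;  // one zero precision makes BLEU zero
    logPrecision += std::log(matches[n] / totals[n]) / 4.0;
  }
  const double brevity = hypLen >= refLen ? 1.0 : std::exp(1.0 - refLen / hypLen);
  return brevity * std::exp(logPrecision);
}
```

For instance, if the two lines of a reference file are swapped relative to the output, the score drops to 0.00 even when every output sentence is a perfect translation of one of the references.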

Re: [Moses-support] tuning takes tooooo long:(

2014-02-14 Thread amir haghighi

Thank you, Arezki and Yohit.

I don't know how to change the multi-threading setting in the EMS config file.



On Fri, Feb 14, 2014 at 12:36 AM, Arezki Sadoune <arezkisado...@yahoo.fr> wrote:

 Hello Amir
 I think your tuning process will go faster if you use multi-threaded
 MERT:
 /home/mert-moses.pl --threads 4
 Of course, you should use 8 instead of 4 if your laptop is equipped
 with eight cores.
 Best regards


   On Friday, 14 February 2014 8:27, amir haghighi
 <amir.haghighi...@gmail.com> wrote:
  Hello

 I have a corpus with 400,000 sentences for training, 1,000 sentences for
 tuning and 100,000 sentences for testing. I couldn't run EMS on my corpus
 with my old laptop, even after 3 days.
 I have bought a new laptop (Core i7, 2.40 GHz CPU, 8 GB RAM), but I still
 can't run EMS! It has been in the tuning step for 3 days and has not
 finished yet.
 Is it possible that it is stuck in an endless loop?
 How can I check its progress?

 regards
 Amir

 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support




Re: [Moses-support] tuning takes tooooo long:(

2014-02-14 Thread amir haghighi

Thank you, Barry.

I use IRSTLM to build the language model. Can I use multi-threading in
decoder-settings?
I get an "IRST LM is not threadsafe" error.

I want to use IRSTLM; is there any other way to speed up the tuning step?

Regards


On Fri, Feb 14, 2014 at 1:53 AM, Barry Haddow bhad...@staffmail.ed.ac.uk wrote:

  Hi Amir

 You can add

 decoder-settings = -threads 4

 to your TUNING stanza.

 Also try

 filter-settings = -MinScore 2:0.0001

 for more aggressive filtering.

 Running tuning on a laptop though is always going to be slow,

 cheers - Barry






[Moses-support] tuning takes tooooo long:(

2014-02-13 Thread amir haghighi

Hello

I have a corpus with 400,000 sentences for training, 1,000 sentences for
tuning and 100,000 sentences for testing. On my old laptop, EMS could not
finish on this corpus even after 3 days.
I bought a new laptop (Core i7, 2.40 GHz CPU, 8 GB RAM), but I still can't
get EMS to finish! It has been in the tuning step for 3 days and has not
finished yet.
Is it possible that it is stuck in an endless loop?
How can I check its progress?

regards
Amir
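One way to answer "How can I check its progress?" is to watch the MERT working directory (the exception log elsewhere in this archive shows EMS using tuning/tmp.1). The sketch below assumes that mert-moses.pl writes a file named run<N>.moses.ini for each finished iteration; please check your own tuning/tmp.* directory before relying on that naming. It reports which iterations exist and how many seconds have passed since any file there changed. The function name mert_progress is hypothetical.

```python
import re
import time
from pathlib import Path

# Assumption: mert-moses.pl leaves one run<N>.moses.ini per finished iteration
# in its working directory (for EMS, something like tuning/tmp.1).
RUN_INI = re.compile(r"^run(\d+)\.moses\.ini$")


def mert_progress(workdir):
    """Return (sorted iteration numbers found, seconds since any file last changed)."""
    files = [p for p in Path(workdir).iterdir() if p.is_file()]
    iterations = sorted(
        int(m.group(1)) for m in (RUN_INI.match(p.name) for p in files) if m
    )
    newest = max((p.stat().st_mtime for p in files), default=None)
    idle_seconds = None if newest is None else time.time() - newest
    return iterations, idle_seconds
```

If the iteration list keeps growing, or the idle time stays small, tuning is still making progress rather than hanging.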

[Moses-support] exception during tuning step

2014-02-09 Thread amir haghighi

Hello all,

When I run Moses EMS, the tuning step throws this exception:

Exception: moses/FF/Factory.cpp:235 in void
Moses::FeatureRegistry::Construct(const string, const string) threw
UnknownFeatureException because `i == registry_.end()'.
Feature name IRSTLM is not registered.
Exit code: 1
Failed to run moses with the config
/opt/working/ems/tuning/moses.filtered.ini.1 at
/opt/tools/mosesdecoder/scripts/training/mert-moses.pl line 1271.
cp: cannot stat '/opt/working/ems/tuning/tmp.1/moses.ini': No such file or
directory


I would be thankful if you could help me solve this problem.

Regards

amir

Re: [Moses-support] word alignment-words' indexes and sentences' length

2014-01-24 Thread amir haghighi

I use the built-in tokenizer in Moses.
How can I change this tokenizer? Should I change the source code?

Regards
Amir

Re: [Moses-support] word alignment-words' indexes and sentences' length

2014-01-24 Thread amir haghighi

Thank you, Barry, for your help.

Hi Amin,
I can't see the link. Could you please attach it to your email?

Regards
Amir

Re: [Moses-support] word alignment-words' indexes and sentences' length

2014-01-23 Thread amir haghighi

I removed all of the double spaces from the corpus, but there are still some
double spaces in the tokenized file.
My source language is Persian and there are half-spaces in my corpus. I
noticed that after the tokenization step, these half-spaces are converted to
double spaces. This conversion disturbs the sentence length and the
alignment.
How can I prevent this conversion?

Thank you again
Amir
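Hieu's point that a double space can be counted as an empty "word" can be reproduced with Python's own string splitting. The sketch below (hypothetical helper names) contrasts splitting on a single space with collapsing all whitespace runs. Because the Persian half-space U+200C (zero-width non-joiner) is a format character rather than whitespace, the cleanup keeps any half-space that is still inside a word. Running such a cleanup on the tokenized files before word alignment keeps the token counts consistent; it does not change how tokenizer.perl itself treats half-spaces.

```python
ZWNJ = "\u200c"  # Persian half-space (zero-width non-joiner), a format character


def naive_length(line):
    """Token count when splitting on single spaces: a double space yields an empty token."""
    return len(line.split(" "))


def normalize_spaces(line):
    """Collapse each run of whitespace into one space and trim both ends.

    str.split() without arguments splits on Unicode whitespace only, and
    ZWNJ is not whitespace, so half-spaces inside words are kept.
    """
    return " ".join(line.split())


line = "mi" + ZWNJ + "ravam  be khane"         # contains a double space
print(naive_length(line))                      # 4: one of the "words" is empty
print(len(normalize_spaces(line).split(" ")))  # 3
```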


On Wed, Jan 22, 2014 at 2:10 PM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:

 Yes, remove the double spaces. Sometimes a double space is ignored;
 sometimes it's counted as a 'word' with no characters, depending on exactly
 how the program tokenizes the line.




 --
 Hieu Hoang
 Research Associate
 University of Edinburgh
 http://www.hoang.co.uk/hieu



Re: [Moses-support] word alignment-words' indexes and sentences' length

2014-01-22 Thread amir haghighi

Thank you, Hieu.

The corpus is UTF-8, but there is a double space in this line. Are double
spaces regarded as a word?
Should I remove double spaces from the lines manually to get the correct
sentence length?



On Tue, Jan 21, 2014 at 4:12 AM, Hieu Hoang hieuho...@gmail.com wrote:


 On 20/01/2014 13:45, amir haghighi wrote:

   Hello,

  I have some questions about the GIZA word alignment.

  1- Where is the final alignment file? Is it aligned.1.grow in the
 model folder?

 Yes.


  2- Do the word indexes of both target and source sentences start from
 0?

 Yes.


  3- How does GIZA calculate the length of a sentence?

 The number of words.

  I have a sentence with 11 tokens separated by spaces, but in
 the alignment file its length is 13.

 Strange. Are you sure your corpus file is encoded as UTF-8? Are there
 double spaces in the line?


  Regards
  Amir







[Moses-support] word alignment-words' indexes and sentences' length

2014-01-20 Thread amir haghighi

Hello,

I have some questions about the GIZA word alignment.

1- Where is the final alignment file? Is it aligned.1.grow in the
model folder?

2- Do the word indexes of both target and source sentences start from 0?

3- How does GIZA calculate the length of a sentence? I have a sentence with
11 tokens separated by spaces, but in the alignment file its
length is 13.

Regards
Amir
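Hieu's replies in this thread (indexes start at 0; the length is the number of words) can be turned into a quick sanity check. The sketch below assumes each line of the final alignment file holds space-separated source-target pairs such as `0-0 1-2`, and flags any point whose index falls outside the whitespace-token length of the corresponding sentence. The helper names are hypothetical.

```python
def parse_alignment(line):
    """Parse one alignment line ("i-j i-j ...") into (source, target) index pairs.

    Both indexes are 0-based, as confirmed in this thread.
    """
    pairs = []
    for item in line.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def check_alignment(src_line, tgt_line, align_line):
    """Return alignment points that fall outside the sentence lengths."""
    src_len = len(src_line.split())
    tgt_len = len(tgt_line.split())
    return [(i, j) for i, j in parse_alignment(align_line)
            if not (0 <= i < src_len and 0 <= j < tgt_len)]
```

A non-empty result for a sentence pair points to a mismatch between how the corpus file and the aligner split that line into tokens.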

[Moses-support] how long should EVALUATION:test:decode step last?

2014-01-11 Thread amir haghighi

Hi,
I am running EMS to get the BLEU score for 5 test sentences. It has been in the
EVALUATION:test:decode step for three days. Is this normal? How long should
this step last?

In this step, I also get the following message:
Use of uninitialized value $post_decoding_transliteration in string eq at
/opt/tools/mosesdecoder/scripts/ems/experiment.perl line 2655.
Is this an error?

Re: [Moses-support] error during testing

2013-12-19 Thread amir haghighi

Dear Mr Nadeem,

Thank you very, very much for your help.
During the previous month, I tried everything to solve this problem, but
nothing worked.
After your email, I remembered that I had upgraded my gcc, and that is probably
the main cause of the problem.

I will get the updated Moses code and hope the problem will be solved
(thanks to Dr Hieu).

Regards
Amir

On Wed, Dec 18, 2013 at 8:22 PM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:

Ah, that's good. However, if you get the latest Moses code by running
git pull
you shouldn't get the problem. It was fixed on 15 November.

https://github.com/moses-smt/mosesdecoder/commit/17887a27969e83f4100bd0f4af98986e33999fbe

On 18 December 2013 16:40, nadeem khan nad_sta...@yahoo.com wrote:

I tried everything. I was working on Ubuntu 13.10, which ships the
latest gcc, and I was getting the segmentation fault during testing. Then I
downgraded my gcc, g++, etc. and Boost to 1.48, and everything went just fine...

Regards
Nadeem

On Wednesday, December 18, 2013 9:33 PM, Hieu Hoang
hieu.ho...@ed.ac.uk wrote:
There shouldn't be a problem with the gcc version. There was a problem, but
it has been fixed:

http://article.gmane.org/gmane.comp.nlp.moses.user/9868/match=gcc+4.8.2
However, it doesn't create an empty data set.

Can you please tell me exactly which command produces the
segfault? And can you please make available for download (via Dropbox or
Google Drive) the files I need to reproduce the error?

On 18 December 2013 16:08, nadeem khan nad_sta...@yahoo.com wrote:

Hi Amir,

It is a problem with your gcc version. Try an older gcc; then it works fine.

On Wednesday, December 18, 2013 7:56 PM, amir haghighi
amir.haghighi...@gmail.com wrote:
Hello,

My problem is not solved yet :(.

I changed the test data several times, but every time I got the
segmentation fault error! The reordering table of the training data set
is not empty, but for all of the test data sets it is empty.
Could anybody help me?

Regards
Amir



--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu


Re: [Moses-support] error during compiling moses

2013-12-19 Thread amir haghighi

Dear Kenneth Heafield

Thank you for your help. I installed libbz2-dev and the problem was solved.

Re: [Moses-support] error during testing

2013-12-19 Thread amir haghighi

Dear Hieu,

I got the latest code, but the problem is not solved yet. It still gives the
segmentation fault error ~x(.
Is there any other way to solve this problem besides downgrading gcc?

Regards
Amir





On Wed, Dec 18, 2013 at 8:22 PM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:

 Ah, that's good. However, if you get the latest Moses code by running
 git pull
 you shouldn't get the problem. It was fixed on 15 November.

 https://github.com/moses-smt/mosesdecoder/commit/17887a27969e83f4100bd0f4af98986e33999fbe





Re: [Moses-support] error during testing

2013-12-18 Thread amir haghighi

Hello,

My problem is not solved yet :(.

I changed the test data several times, but every time I got the
segmentation fault error! The reordering table of the training data set
is not empty, but for all of the test data sets it is empty.
Could anybody help me?

Regards
Amir




Re: [Moses-support] error during testing

2013-12-08 Thread amir haghighi

-algorithm 1 -cube-pruning-pop-limit 5000 -s 5000 for cube
pruning
#
decoder-settings = -search-algorithm 1 -cube-pruning-pop-limit 5000 -s 5000

### specify size of n-best list, if produced
#
#nbest = 100

### multiple reference translations
#
#multiref = yes

### prepare system output for scoring
# this may include detokenization and wrapping output in sgm
# (needed for nist-bleu, ter, meteor)
#
detokenizer = $moses-script-dir/tokenizer/detokenizer.perl -l $output-extension
#recaser = $moses-script-dir/recaser/recase.perl
wrapping-script = $moses-script-dir/ems/support/wrap-xml.perl $output-extension
#output-sgm =

### BLEU
#
nist-bleu = $moses-script-dir/generic/mteval-v13a.pl
nist-bleu-c = $moses-script-dir/generic/mteval-v13a.pl -c
#multi-bleu = $moses-script-dir/generic/multi-bleu.perl
#ibm-bleu =

### TER: translation error rate (BBN metric) based on edit distance
# not yet integrated
#
# ter =

### METEOR: gives credit to stem / WordNet synonym matches
# not yet integrated
#
# meteor =

### Analysis: carry out various forms of analysis on the output
#
analysis = $moses-script-dir/ems/support/analysis.perl
#
# also report on input coverage
analyze-coverage = yes
#
# also report on phrase mappings used
report-segmentation = yes
#
# report precision of translations for each input word, broken down by
# count of input word in corpus and model
#report-precision-by-coverage = yes
#
# further precision breakdown by factor
#precision-by-coverage-factor = pos
#
# visualization of the search graph in tree-based models
#analyze-search-graph = yes

[EVALUATION:test]

### input data
#
input-sgm = $toy-data/M_Ts.$input-extension
# raw-input =
# tokenized-input =
# factorized-input =
# input =

### reference data
#
reference-sgm = $toy-data/M_Ts.$output-extension
# raw-reference =
# tokenized-reference =
# reference =

### analysis settings
# may contain any of the general evaluation analysis settings
# specific setting: base coverage statistics on earlier run
#
#precision-by-coverage-base = $working-dir/evaluation/test.analysis.5

### wrapping frame
# for nist-bleu and other scoring scripts, the output needs to be wrapped
# in sgm markup (typically like the input sgm)
#
wrapping-frame = $input-sgm

##
### REPORTING: summarize evaluation scores

[REPORTING]

### currently no parameters for reporting section




On Sat, Dec 7, 2013 at 7:21 PM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:

 Are you sure the parallel data is encoded in UTF-8? Was it tokenized,
 cleaned and escaped by the Moses scripts or by another external script?

 Can you please send me your EMS config file too?



Re: [Moses-support] error during testing

2013-12-08 Thread amir haghighi

The file model/reordering-table.* is not empty, but the file
evaluation/*.filtered.*/reordering-table.1.* is!
My test set is not empty.

Thank you for your answers.


On Sun, Dec 8, 2013 at 3:29 PM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:

 Everything looks OK; I'm not sure why it's segfaulting.

 Is the file
   model/reordering-table.*
 empty? If it is, then you should look in the log file
   steps/*/TRAINING_build-reordering.*.STDERR

 Or is
   evaluation/*.filtered.*/reordering-table.1.*
 empty? Is your test set empty?
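Hieu's checks can be scripted. The sketch below, with hypothetical function names, reports for each file matching a glob pattern whether it contains any non-blank line, and reads `.gz` files (such as the reordering tables in this thread) through gzip. The patterns in the usage line are taken from this thread and are assumed to be relative to the EMS working-dir.

```python
import glob
import gzip


def is_effectively_empty(path):
    """True if the file has no non-blank lines; .gz files are read decompressed."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        return not any(line.strip() for line in f)


def report_empty(patterns):
    """Map each file matching the glob patterns to whether it is empty."""
    return {path: is_effectively_empty(path)
            for pattern in patterns
            for path in sorted(glob.glob(pattern))}
```

Usage, from inside the working directory: `report_empty(["model/reordering-table.*", "evaluation/*.filtered.*/reordering-table.*"])`.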



 On 8 December 2013 09:47, amir haghighi amir.haghighi...@gmail.com wrote:

 Yes, the parallel data is UTF-8 (one side is UTF-8 and one is ASCII).
 All of the pre-processing steps are done with Moses scripts.

 Here is the EMS config file content:

 
 ### CONFIGURATION FILE FOR AN SMT EXPERIMENT ###
 

 [GENERAL]

 ### directory in which experiment is run
 #
 working-dir = /opt/tools/workingEms

 # specification of the language pair
 input-extension = En
 output-extension = Fa
 pair-extension = En-Fa

 ### directories that contain tools and data
 #
 # moses
 moses-src-dir = /opt/tools/mosesdecoder-RELEASE-1.0/mosesdecoder-RELEASE-1.0
 #
 # moses binaries
 moses-bin-dir = $moses-src-dir/bin
 #
 # moses scripts
 moses-script-dir = $moses-src-dir/scripts
 #
 # directory where GIZA++/MGIZA programs reside
 external-bin-dir = $moses-src-dir/tools
 #
 # srilm
 #srilm-dir = $moses-src-dir/srilm/bin/i686
 #
 # irstlm
 irstlm-dir = /opt/tools/irstlm/bin
 #
 # randlm
 #randlm-dir = $moses-src-dir/randlm/bin
 #
 # data
 toy-data = /opt/tools/dataset/mizan

 ### basic tools
 #
 # moses decoder
 decoder = $moses-bin-dir/moses

 # conversion of phrase table into binary on-disk format
 ttable-binarizer = $moses-bin-dir/processPhraseTable

 # conversion of rule table into binary on-disk format
 #ttable-binarizer = $moses-bin-dir/CreateOnDiskPt 1 1 5 100 2

 # tokenizers - comment out if all your data is already tokenized
 input-tokenizer = $moses-script-dir/tokenizer/tokenizer.perl -a -l $input-extension
 output-tokenizer = $moses-script-dir/tokenizer/tokenizer.perl -a -l $output-extension

 # truecasers - comment out if you do not use the truecaser
 input-truecaser = $moses-script-dir/recaser/truecase.perl
 output-truecaser = $moses-script-dir/recaser/truecase.perl
 detruecaser = $moses-script-dir/recaser/detruecase.perl

 ### generic parallelizer for cluster and multi-core machines
 # you may specify a script that allows the parallel execution of
 # parallelizable steps (see meta file). You also need to specify
 # the number of jobs (cluster) or cores (multicore)
 #
 #generic-parallelizer = $moses-script-dir/ems/support/generic-parallelizer.perl
 #generic-parallelizer = $moses-script-dir/ems/support/generic-multicore-parallelizer.perl

 ### cluster settings (if run on a cluster machine)
 # number of jobs to be submitted in parallel
 #
 #jobs = 10

 # arguments to qsub when scheduling a job
 #qsub-settings = 

 # project for privileges and usage accounting
 #qsub-project = iccs_smt

 # memory and time
 #qsub-memory = 4
 #qsub-hours = 48

 ### multi-core settings
 # when the generic parallelizer is used, the number of cores
 # specified here
 cores = 8

 #
 # PARALLEL CORPUS PREPARATION:
 # create a tokenized, sentence-aligned corpus, ready for training

 [CORPUS]

 ### long sentences are filtered out, since they slow down GIZA++
 # and are a less reliable source of data. set here the maximum
 # length of a sentence
 #
 max-sentence-length = 80

 [CORPUS:toy]

 ### command to run to get raw corpus files
 #
 # get-corpus-script =

 ### raw corpus files (untokenized, but sentence aligned)
 #
 raw-stem = $toy-data/M_Tr

 ### tokenized corpus files (may contain long sentences)
 #
 #tokenized-stem =

 ### if sentence filtering should be skipped,
 # point to the clean training data
 #
 #clean-stem =

 ### if corpus preparation should be skipped,
 # point to the prepared training data
 #
 #lowercased-stem =

 #
 # LANGUAGE MODEL TRAINING

 [LM]

 ### tool to be used for language model training
 # srilm
 #lm-training = $srilm-dir/ngram-count
 #settings = -interpolate -kndiscount -unk

 # irstlm training
 # msb = modified kneser ney; p=0 no singleton pruning
 #lm-training = $moses-script-dir/generic/trainlm-irst2.perl -cores $cores -irst-dir $irstlm-dir -temp-dir $working-dir/tmp
 #settings = -s msb -p 0

 # order of the language model
 order = 5

 ### tool to be used for training randomized language model from scratch
 # (more commonly, a SRILM is trained)
 #
 #rlm-training = $randlm-dir/buildlm -falsepos 8 -values 8

 ### script to use for binary table format for irstlm or kenlm
 # (default: no binarization)

 # irstlm
 #lm-binarizer = $irstlm-dir/compile-lm

 # kenlm, also set type to 8
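The [CORPUS] setting `max-sentence-length = 80` in this config decides which training pairs survive cleaning. A minimal sketch of the idea, assuming whitespace tokens and a hypothetical `clean_pairs` helper: a pair is kept only if both sides have between 1 and 80 tokens. The actual EMS cleaning step may apply further filters that are not reproduced here.

```python
def clean_pairs(src_lines, tgt_lines, max_len=80, min_len=1):
    """Keep sentence pairs whose token counts lie in [min_len, max_len] on both sides."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("corpus sides have different line counts")
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # drop empty lines and over-long sentences on either side
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            kept.append((src, tgt))
    return kept
```

Dropping a pair removes both sides together, which is what keeps the two corpus files sentence-aligned.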

Re: [Moses-support] error during testing

2013-12-07 Thread amir haghighi

Hi,

I also have the same problem in the evaluation step with EMS, and I would be
thankful if you could help me.
The lexical reordering file is empty, and the log of the output in
evaluation_test_filter.2.stderr is:

Using SCRIPTS_ROOTDIR:
/opt/tools/mosesdecoder-RELEASE-1.0/mosesdecoder-RELEASE-1.0/scripts
(9) create moses.ini @ Sat Dec  7 04:50:15 PST 2013
Executing: mkdir -p /opt/tools/workingEms/evaluation/test.filtered.2
Considering factor 0
Considering factor 0
filtering /opt/tools/workingEms/model/phrase-table.2 -
/opt/tools/workingEms/evaluation/test.filtered.2/phrase-table.0-0.1.1...
0 of 2197240 phrases pairs used (0.00%) - note: max length 10
binarizing...cat
/opt/tools/workingEms/evaluation/test.filtered.2/phrase-table.0-0.1.1 |
LC_ALL=C sort -T /opt/tools/workingEms/evaluation/test.filtered.2 |
/opt/tools/mosesdecoder-RELEASE-1.0/mosesdecoder-RELEASE-1.0/bin/processPhraseTable
-ttable 0 0 - -nscores 5 -out
/opt/tools/workingEms/evaluation/test.filtered.2/phrase-table.0-0.1.1
processing ptree for stdin
Segmentation fault (core dumped)
filtering
/opt/tools/workingEms/model/reordering-table.2.wbe-msd-bidirectional-fe.gz
-
/opt/tools/workingEms/evaluation/test.filtered.2/reordering-table.2.wbe-msd-bidirectional-fe...
0 of 2197240 phrases pairs used (0.00%) - note: max length 10
binarizing.../opt/tools/mosesdecoder-RELEASE-1.0/mosesdecoder-RELEASE-1.0/bin/processLexicalTable
-in
/opt/tools/workingEms/evaluation/test.filtered.2/reordering-table.2.wbe-msd-bidirectional-fe
-out
/opt/tools/workingEms/evaluation/test.filtered.2/reordering-table.2.wbe-msd-bidirectional-fe
processLexicalTable v0.1 by Konrad Rawlik
processing
/opt/tools/workingEms/evaluation/test.filtered.2/reordering-table.2.wbe-msd-bidirectional-fe
to
/opt/tools/workingEms/evaluation/test.filtered.2/reordering-table.2.wbe-msd-bidirectional-fe.*
ERROR: empty lexicalised reordering file
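The log line "0 of 2197240 phrases pairs used (0.00%) - note: max length 10" says that no phrase-table entry matched the test input, so the filtered tables were written empty, which is consistent with the "empty lexicalised reordering file" error that follows. As a simplified sketch (not the actual code of filter-model-given-input.pl), filtering keeps an entry only if its source phrase occurs as a word n-gram of at most 10 tokens in the input. If the test input is in the other language, or is tokenized or cased differently from the training data, nothing matches. The function names are hypothetical.

```python
def input_phrases(input_lines, max_len=10):
    """Collect every contiguous word n-gram (n <= max_len) of the test input."""
    phrases = set()
    for line in input_lines:
        words = line.split()
        for i in range(len(words)):
            for j in range(i + 1, min(i + max_len, len(words)) + 1):
                phrases.add(" ".join(words[i:j]))
    return phrases


def filter_table(table_lines, input_lines, max_len=10):
    """Keep "source ||| target ||| scores" entries whose source side occurs in the input."""
    wanted = input_phrases(input_lines, max_len)
    return [entry for entry in table_lines
            if entry.split(" ||| ", 1)[0].strip() in wanted]
```

With a table whose source side is English, filtering against an English test line keeps the matching entries, while filtering against the same sentence in the target language keeps nothing, which reproduces the 0.00% line in the log.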



Barry Haddow bhaddow@... writes:


 Hi Irene

   But the output is empty. And the errors are 1. segmentation fault
 2. error: empty lexicalized
   reordering file

 Is this lexicalised reordering file empty then?

 It would be helpful if you could post the full log of the output when
 you run the filter command.

 cheers - Barry

 On 26/10/12 17:59, Irene Huang wrote:
  Hi, I have trained and tuned the model, now I am using
 
   ~/mosesdecoder/scripts/training/filter-model-given-input.pl filtered-newstest2011 \
   mert-work/moses.ini ~/corpus/newstest2011.true.fr \
     -Binarizer ~/mosesdecoder/bin/processPhraseTable
 
  to filter the phrase table.
 
  But the output is empty. And the errors are 1. segmentation fault
  2. error: empty lexicalized reordering file
 
  So does this mean it's an out-of-memory error?
 
  Thanks
 
 


[Moses-support] (no subject)

2013-12-02 Thread amir haghighi

Hello all,

I want to run EMS on my Ubuntu system, but I got the following error:

/graph.0.png' @ error/convert.c/ConvertImageCommand/3127.
convert: no decode delegate for this image format
`/tmp/magick-22773fbzHNDTaMj071' @ error/constitute.c/ReadImage/552.
convert: Postscript delegate failed `steps/0/graph.0.ps': No such file or
directory @ error/ps.c/ReadPSImage/837.
convert: no images defined `steps/0/graph.0.png' @
error/convert.c/ConvertImageCommand/3127.

I have already installed ImageMagick on my system.



Regards
Amir
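The errors say that steps/0/graph.0.ps does not exist and that convert has no decoder for PostScript, so installing ImageMagick alone may not be enough. A plausible reading, which is an assumption not confirmed in the thread, is that the EMS step graph is first written as PostScript by Graphviz's `dot`, and that ImageMagick needs Ghostscript (`gs`) as its PostScript delegate. The sketch below only checks which of these programs are on PATH; the tool list reflects that assumption and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumption: the EMS graph step renders steps/<N>/graph.<N>.ps with Graphviz "dot"
# and converts it to PNG with ImageMagick "convert", which needs Ghostscript ("gs")
# to read PostScript.
REQUIRED_TOOLS = ("dot", "convert", "gs")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list means all three programs are installed; anything listed is a candidate for the missing delegate.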
