Hi Sylvain

I'm not familiar with this part of the code, but the strange score suggests
that some memory is being read before it is initialised. You could try running
your program under valgrind, which might give some clues.
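
As a rough illustration of the kind of problem valgrind tends to flag (for example with `valgrind --track-origins=yes ./your_program`, where `your_program` is a placeholder for your own binary), here is a minimal sketch of a result record whose fields always have a defined value. The names `TranslationResult` and `make_result` are hypothetical and only loosely modelled on the `pair` object in the quoted code; this is not taken from Moses itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical result record (not a Moses type). Default member initialisers
// give every field a defined value, even on code paths that never assign it.
struct TranslationResult {
    std::string translation;          // empty until a hypothesis is extracted
    float translation_score = 0.0f;   // defined value rather than leftover memory
    bool has_hypothesis = false;      // distinguishes "no result" from "score 0"
};

// Fill the record only when hypothesis text is available; otherwise the
// defaults above are returned unchanged.
TranslationResult make_result(const std::string* hyp_text, float score) {
    TranslationResult r;
    if (hyp_text != nullptr) {
        r.translation = *hyp_text;
        r.translation_score = score;
        r.has_hypothesis = true;
    }
    return r;
}
```

If the score in your own structure is only assigned on some paths, a pattern like this would at least turn a garbage value into an obvious zero while you track down the real cause.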

cheers - Barry

On Thursday 26 Apr 2012 12:24:11 Sylvain Raybaud wrote:
> Hi all
> 
>   I'm using the Moses API for decoding a confusion network. The CN is
> created from the output of an ASR engine and a confusion matrix. More
> precisely (even though it's probably irrelevant to my problem), the ASR
> engine provides a string of phonemes (1-best) and the confusion matrix
> provides alternatives for each phoneme (the idea was described in Jiang
> et al., _Phonetic representation based speech translation_, MT Summit
> XIII, 2011).
> 
> When the CN is dumped into a file and I use
> moses -f moses.phonemes.cn.ini < CN
> to decode it, everything is fine.
> 
> But when I use the Moses API (loading the same configuration file), I get
> incomplete translations, like:
> 
> ASR output (French): "nous font sont toujours chimistes plume
> rassembleront ch je trouve que le office de ce tout de suite"
> Phonetic representation: "n u f on s on t t u ge u r ch i m i s t z p l
> y m r a s an b l swa r on ch ge swa t r u v k swa l swa oh f i s swa d
> swa s swa t u d s h i t"
> Translation: "of"
> score: 903011968.000000
> 
> Note that the transcription is poor (I haven't really tuned the ASR
> engine), but still, the translation ought to be more than just "of".
> Sometimes it's several words, which I guess form a single phrase in the
> phrase table. The output generally seems to be the translation of one word
> in the source sentence.
> When I use moses on the command line to translate either the 1-best or
> the CN, I get a reasonable translation. When I use the API to translate
> the 1-best phonetic representation, I also get a reasonable translation.
> I think the CN object is created correctly because moses loads it and
> prints it prior to decoding (this is normal verbose behavior). I also
> tried to create a PCN object, and got exactly the same results. So I
> guess the problem is either how I tell moses to decode it or how I
> extract the result from the Hypothesis object. But I'm clueless about
> what the problem is here, since the code is working when I just
> translate a string. The translation score seems ridiculously high too.
> I'll give below the corresponding code.
> 
> Decoding and hypothesis extraction:
> ***********************************
> [...]
> Manager * manager = new Manager(*cn,staticData.GetSearchAlgorithm(),
> &system);
> manager->ProcessSentence();
> const Hypothesis* hypo = manager->GetBestHypothesis();
> string hyp = moses_get_hyp(hypo);
> [...]
> pair->translation_score = UntransformScore(hypo->GetScore());
> [...]
> 
> string moses_get_hyp(const Hypothesis* hypo) {
>     return hypo->GetTargetPhraseStringRep();
> }
> 
> 
> Creation of the CN:
> *******************
> 
> /** new class derived from ConfusionNet, with a new method for directly
> creating CN */
> class MyConfusionNet : public ConfusionNet {
>   public:
>     void addCol(Column);
> };
> 
> void MyConfusionNet::addCol(Column col) {
>     data.push_back(col);
> }
> 
> /** create a column of the CN */
> static MyConfusionNet::Column create_phoneme_col(confusion_matrix_t *
> cm, const char * ph, int width, double thresh, const vector<FactorType>
> &factor_order) {
> 
>     MyConfusionNet::Column col;
> 
>     phoneme_conf_t * ph_conf =
> (phoneme_conf_t*)g_hash_table_lookup(cm->matrix,ph);
>     if(ph_conf==NULL) {
>         return col;
>     }
> 
>     int i;
>     for(i = 0; i<cm->n_phonemes; i++) {
>         vector<float> scores;
>         float score = float(ph_conf[i].p);
>         if((width<=0 || i<width) && (thresh<=0 || score>=thresh)) {
>             string wd(cm->phonemes[ph_conf[i].phoneme]);
>             Word word;
>             word.CreateFromString(Input,factor_order,wd,false);
>             scores.push_back(score);
>             pair<Word,vector<float> > linkdata(word,scores);
>             col.push_back(linkdata);
>         }
>     }
> 
>     return col;
> }
> 
> /** Creates a confusion network from a NULL terminated phonemes list and
> a phonemes confusion matrix */
> static MyConfusionNet * phonemes_to_cn(confusion_matrix_t * cm,const
> char ** phonemes, int width, double thresh, const vector<FactorType>
> &factor_order) {
>     debug("start");
> 
>     MyConfusionNet * cn = new MyConfusionNet();
> 
>     int i = 0;
>     while(phonemes[i]!=NULL) {
>         debug("%s",phonemes[i]);
>         MyConfusionNet::Column col =
> create_phoneme_col(cm,phonemes[i],width,thresh,factor_order);
>         cn->addCol(col);
>         i += 1;
>     }
> 
>     return cn;
> }
> 
> So, if anyone has an idea about what's wrong here.... thanks!
> 
> cheers,
> 
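
One more thing that may be worth checking, given that the output looks like a single phrase: whether the string you extract covers the whole hypothesis chain or only the last step. I have not verified how `GetTargetPhraseStringRep()` behaves here, so treat the following purely as a sketch under the assumption that each hypothesis stores only the phrase it added plus a pointer to its predecessor. `Hyp`, `prev`, `phrase` and `full_translation` are hypothetical stand-ins, not Moses names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoder hypothesis: each node holds only the
// phrase added at that step and a pointer to the hypothesis it extends.
struct Hyp {
    const Hyp* prev;      // nullptr for the initial (empty) hypothesis
    std::string phrase;   // target phrase added by this step
};

// Rebuild the whole output by walking back to the initial hypothesis,
// then reversing so the phrases come out in left-to-right order.
std::string full_translation(const Hyp* best) {
    std::vector<std::string> parts;
    for (const Hyp* h = best; h != nullptr; h = h->prev) {
        if (!h->phrase.empty()) parts.push_back(h->phrase);
    }
    std::reverse(parts.begin(), parts.end());

    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) out += ' ';
        out += parts[i];
    }
    return out;
}
```

In this toy model, reading `phrase` from the best hypothesis alone would yield only the final phrase, which is the symptom described above; walking the chain yields the full sentence.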
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support