Hi all

  I'm using the Moses API to decode a confusion network (CN). The CN is
created from the output of an ASR engine and a confusion matrix. More
precisely (even though it's probably irrelevant to my problem), the ASR
engine provides a 1-best string of phonemes and the confusion matrix
provides alternatives for each phoneme (the idea was described in Jiang
et al., _Phonetic representation based speech translation_, MT Summit
XIII, 2011).

When the CN is dumped into a file and I use
moses -f moses.phonemes.cn.ini < CN
to decode it, everything is fine.
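
For reference, here is a minimal sketch of how such a dump could be written. It assumes the plain-text CN layout of one column per line, alternating word and probability, with an empty line closing the network. The names Arc, PlainColumn and dump_cn are hypothetical and not part of Moses.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not Moses types.
typedef std::pair<std::string, float> Arc;   // (phoneme, probability)
typedef std::vector<Arc> PlainColumn;        // alternatives at one position

// Writes one column per line as "word prob word prob ...",
// then an empty line to mark the end of the network (assumed layout).
void dump_cn(const std::vector<PlainColumn>& cn, std::ostream& out) {
    for (std::size_t c = 0; c < cn.size(); ++c) {
        for (std::size_t a = 0; a < cn[c].size(); ++a) {
            if (a > 0) out << ' ';
            out << cn[c][a].first << ' ' << cn[c][a].second;
        }
        out << '\n';
    }
    out << '\n';
}
```

Calling dump_cn(cn, file) on a std::ofstream would produce the file that is then piped to moses on the command line.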

But when I use the Moses API (loading the same configuration file), I get
incomplete translations, like:

ASR output (French): "nous font sont toujours chimistes plume
rassembleront ch je trouve que le office de ce tout de suite"
Phonetic representation: "n u f on s on t t u ge u r ch i m i s t z p l
y m r a s an b l swa r on ch ge swa t r u v k swa l swa oh f i s swa d
swa s swa t u d s h i t"
Translation: "of"
score: 903011968.000000

Note that the transcription is poor (I haven't really tuned the ASR
engine), but still, the translation ought to be more than just "of".
Sometimes the output is several words, which I guess form a single
phrase from the phrase table. The output generally seems to be the
translation of one word of the source sentence.
When I use moses on the command line to translate either the 1-best or
the CN, I get a reasonable translation. When I use the API to translate
the 1-best phonetic representation, I also get a reasonable translation.
I think the CN object is created correctly because moses loads it and
prints it prior to decoding (this is normal verbose behavior). I also
tried to create a PCN object, and got exactly the same results. So I
guess the problem is either in how I tell moses to decode it or in how I
extract the result from the Hypothesis object. But I'm clueless about
what the problem is here, since the code works when I just translate a
string. The translation score seems ridiculously high too. The
corresponding code is given below.

Decoding and hypothesis extraction:
***********************************
[...]
Manager * manager = new Manager(*cn, staticData.GetSearchAlgorithm(),
                                &system);
manager->ProcessSentence();
const Hypothesis* hypo = manager->GetBestHypothesis();
string hyp = moses_get_hyp(hypo);
[...]
pair->translation_score = UntransformScore(hypo->GetScore());
[...]

string moses_get_hyp(const Hypothesis* hypo) {
    return hypo->GetTargetPhraseStringRep();
}
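
One thing that may be worth checking in this extraction step: if GetTargetPhraseStringRep() returns only the phrase attached to the final hypothesis (an assumption, not verified against the Moses source), then the full output would have to be assembled by following the chain of previous hypotheses. A minimal self-contained sketch of that traversal follows. Hypo and full_translation are hypothetical stand-ins, not Moses classes or functions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoder hypothesis; not the Moses class.
struct Hypo {
    const Hypo* prev;     // hypothesis this one extends; NULL at the start
    std::string phrase;   // target phrase added by this step only
};

// Walks back from the final hypothesis, then joins the phrases
// in left-to-right order, separated by single spaces.
std::string full_translation(const Hypo* last) {
    std::vector<const Hypo*> chain;
    for (const Hypo* h = last; h != NULL; h = h->prev) {
        chain.push_back(h);
    }
    std::string out;
    for (std::size_t i = chain.size(); i-- > 0;) {
        if (chain[i]->phrase.empty()) continue;  // e.g. an empty initial hypothesis
        if (!out.empty()) out += ' ';
        out += chain[i]->phrase;
    }
    return out;
}
```

With a real decoder, the same loop would use whatever accessor exposes the previous hypothesis. The point is only that the last hypothesis alone may not carry the whole sentence.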


Creation of the CN:
*******************

/** new class derived from ConfusionNet, with a new method for directly
creating CN */
class MyConfusionNet : public ConfusionNet {
  public:
    void addCol(Column);
};

void MyConfusionNet::addCol(Column col) {
    data.push_back(col);
}

/** create a column of the CN */
static MyConfusionNet::Column create_phoneme_col(
        confusion_matrix_t * cm, const char * ph, int width, double thresh,
        const vector<FactorType> &factor_order) {

    MyConfusionNet::Column col;

    phoneme_conf_t * ph_conf =
        (phoneme_conf_t*)g_hash_table_lookup(cm->matrix,ph);
    if(ph_conf==NULL) {
        return col;
    }

    int i;
    for(i = 0; i<cm->n_phonemes; i++) {
        vector<float> scores;
        float score = float(ph_conf[i].p);
        if((width<=0 || i<width) && (thresh<=0 || score>=thresh)) {
            string wd(cm->phonemes[ph_conf[i].phoneme]);
            Word word;
            word.CreateFromString(Input,factor_order,wd,false);
            scores.push_back(score);
            pair<Word,vector<float> > linkdata(word,scores);
            col.push_back(linkdata);
        }
    }

    return col;
}
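
The column builder above stores ph_conf[i].p as the arc score unchanged. If the decoder expects log-probabilities on CN arcs (an assumption; what the file reader does to the numbers it parses has not been checked here), raw probabilities would behave differently from the file-based run and might be related to the very large score. A minimal sketch of such a conversion follows. prob_to_log_score and kLowestLogScore are hypothetical names, and the floor value is arbitrary.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical floor for illustration; the decoder's own constant may differ.
const float kLowestLogScore = -100.0f;

// Maps a probability to a natural-log score, clamping out-of-range
// inputs and flooring very small values.
float prob_to_log_score(double p) {
    if (p <= 0.0) return kLowestLogScore;  // log(0) is undefined, use the floor
    if (p > 1.0) p = 1.0;                  // clamp so the score never exceeds 0
    return std::max(static_cast<float>(std::log(p)), kLowestLogScore);
}
```

In the loop above this would amount to pushing prob_to_log_score(ph_conf[i].p) instead of the raw value, while still applying the threshold test to the raw probability.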

/** Creates a confusion network from a NULL terminated phonemes list and
a phonemes confusion matrix */
static MyConfusionNet * phonemes_to_cn(
        confusion_matrix_t * cm, const char ** phonemes, int width,
        double thresh, const vector<FactorType> &factor_order) {
    debug("start");

    MyConfusionNet * cn = new MyConfusionNet();

    int i = 0;
    while(phonemes[i]!=NULL) {
        debug("%s",phonemes[i]);
        MyConfusionNet::Column col =
            create_phoneme_col(cm,phonemes[i],width,thresh,factor_order);
        cn->addCol(col);
        i += 1;
    }

    return cn;
}

If anyone has an idea about what's wrong here, thanks in advance!

cheers,

-- 
Sylvain Raybaud
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
