Wild guess here: in TranslationTask::Run, I see there are many alternatives for processing the sentence, like doLatticeMBR etc., not just running Manager::ProcessSentence(). Maybe one of these alternatives must be run when processing confusion networks?
cheers
Sylvain

On 26/04/12 15:53, Sylvain Raybaud wrote:
> Hi Barry
>
> Thanks for the tip, that sounds likely indeed. I'll try it again, but
> last time I ran the software through valgrind, I got so many errors in
> external libs that I just gave up.
>
> In the meantime, here is the complete function that handles the
> decoding, in case someone sees something obviously wrong in here...
>
> static void moses_translate_phonemes(manager_data_t * pool,
>                                      translation_pair_t * pair) {
>   debug("starting");
>
>   /* there is only one translation system for now */
>   const TranslationSystem& system =
>     StaticData::Instance().GetTranslationSystem(TranslationSystem::DEFAULT);
>   const StaticData &staticData = StaticData::Instance();
>   const vector<FactorType> &inputFactorOrder =
>     staticData.GetInputFactorOrder();
>
>   MyConfusionNet * cn =
>     phonemes_to_cn(pool->mp_engine->phonemes_cm, pair->source->phonemes,
>                    pool->mp_config->cn_width, pool->mp_config->cn_thresh,
>                    inputFactorOrder);
>
>   Manager * manager = new Manager(*cn, staticData.GetSearchAlgorithm(),
>                                   &system);
>   manager->ProcessSentence();
>   const Hypothesis* hypo = manager->GetBestHypothesis();
>
>   string hyp = moses_get_hyp(hypo);
>   char * hyp_ret = (char*)malloc((strlen(hyp.c_str())+1)*sizeof(char));
>   strcpy(hyp_ret, hyp.c_str());
>
>   pair->translation_score = UntransformScore(hypo->GetScore());
>   translation_pair_set_target(pair, hyp_ret, NULL);
>
>   delete manager;
>   delete cn;
> }
>
> cheers,
>
> Sylvain
>
> On 26/04/12 13:49, Barry Haddow wrote:
>> Hi Sylvain
>>
>> I'm not familiar with this part of the code, but the strange score suggests
>> that there's some uninitialised memory. You could try running it through
>> valgrind; it might give some clues.
>>
>> cheers - Barry
>>
>> On Thursday 26 Apr 2012 12:24:11 Sylvain Raybaud wrote:
>>> Hi all
>>>
>>> I'm using the Moses API for decoding a confusion network. The CN is
>>> created from the output of an ASR engine and a confusion matrix.
>>> More precisely (even though it's probably irrelevant to my problem), the
>>> ASR engine provides a string of phonemes (1-best) and the confusion
>>> matrix provides alternatives for each phoneme (the idea was described in
>>> Jiang et al., _Phonetic representation based speech translation_, MT
>>> Summit XIII, 2011).
>>>
>>> When the CN is dumped into a file and I use
>>>   moses -f moses.phonemes.cn.ini < CN
>>> to decode it, everything is fine.
>>>
>>> But when I use the Moses API (loading the same configuration file), I get
>>> incomplete translations, like:
>>>
>>> ASR output (French): "nous font sont toujours chimistes plume
>>> rassembleront ch je trouve que le office de ce tout de suite"
>>> Phonetic representation: "n u f on s on t t u ge u r ch i m i s t z p l
>>> y m r a s an b l swa r on ch ge swa t r u v k swa l swa oh f i s swa d
>>> swa s swa t u d s h i t"
>>> Translation: "of"
>>> score: 903011968.000000
>>>
>>> Note that the transcription is poor (I haven't really tuned the ASR
>>> engine), but still, the translation ought to be more than just "of".
>>> Sometimes it's several words; I guess it's a phrase in the phrase table.
>>> The word generally seems to be the translation of a word in the source
>>> sentence.
>>> When I use moses on the command line to translate either the 1-best or
>>> the CN, I get a reasonable translation. When I use the API to translate
>>> the 1-best phonetic representation, I also get a reasonable translation.
>>> I think the CN object is created correctly because moses loads it and
>>> prints it prior to decoding (this is normal verbose behavior). I also
>>> tried to create a PCN object, and got exactly the same results. So I
>>> guess the problem is either how I tell moses to decode it or how I
>>> extract the result from the Hypothesis object. But I'm clueless about
>>> what the problem is here, since the code works when I just translate a
>>> string. The translation score seems ridiculously high too.
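A quick bit of arithmetic on that score, assuming Moses's UntransformScore() is simply exp() applied to the log-domain score (to the best of my knowledge that is how it is defined, but worth confirming in the version in use): ln(903011968) is about 20.62, so hypo->GetScore() returned roughly +20.6. Weighted log-probabilities are each at most 0, so a positive total of that size is hard to reach from them alone, which is at least consistent with Barry's uninitialised-memory suspicion. The helper name below is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper. Assumption: UntransformScore(x) == exp(x), so the
// natural log of the reported value recovers what hypo->GetScore() returned.
double recover_log_score(double untransformed) {
  assert(untransformed > 0.0);  // log() is only defined for positive values
  return std::log(untransformed);
}
```

For the reported value, recover_log_score(903011968.0) comes out at about 20.62.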
>>>
>>> I'll give the corresponding code below.
>>>
>>> Decoding and hypothesis extraction:
>>> ***********************************
>>> [...]
>>> Manager * manager = new Manager(*cn, staticData.GetSearchAlgorithm(),
>>>                                 &system);
>>> manager->ProcessSentence();
>>> const Hypothesis* hypo = manager->GetBestHypothesis();
>>> string hyp = moses_get_hyp(hypo);
>>> [...]
>>> pair->translation_score = UntransformScore(hypo->GetScore());
>>> [...]
>>>
>>> string moses_get_hyp(const Hypothesis* hypo) {
>>>   return hypo->GetTargetPhraseStringRep();
>>> }
>>>
>>>
>>> Creation of the CN:
>>> *******************
>>>
>>> /** new class derived from ConfusionNet, with a new method for directly
>>>     creating CN */
>>> class MyConfusionNet : public ConfusionNet {
>>> public:
>>>   void addCol(Column);
>>> };
>>>
>>> void MyConfusionNet::addCol(Column col) {
>>>   data.push_back(col);
>>> }
>>>
>>> /** create a column of the CN */
>>> static MyConfusionNet::Column create_phoneme_col(confusion_matrix_t * cm,
>>>     const char * ph, int width, double thresh,
>>>     const vector<FactorType> &factor_order) {
>>>
>>>   MyConfusionNet::Column col;
>>>
>>>   phoneme_conf_t * ph_conf =
>>>     (phoneme_conf_t*)g_hash_table_lookup(cm->matrix, ph);
>>>   if(ph_conf==NULL) {
>>>     return col;
>>>   }
>>>
>>>   int i;
>>>   for(i = 0; i<cm->n_phonemes; i++) {
>>>     vector<float> scores;
>>>     float score = float(ph_conf[i].p);
>>>     if((width<=0 || i<width) && (thresh<=0 || score>=thresh)) {
>>>       string wd(cm->phonemes[ph_conf[i].phoneme]);
>>>       Word word;
>>>       word.CreateFromString(Input, factor_order, wd, false);
>>>       scores.push_back(score);
>>>       pair<Word,vector<float> > linkdata(word, scores);
>>>       col.push_back(linkdata);
>>>     }
>>>   }
>>>
>>>   return col;
>>> }
>>>
>>> /** Creates a confusion network from a NULL terminated phonemes list and
>>>     a phonemes confusion matrix */
>>> static MyConfusionNet * phonemes_to_cn(confusion_matrix_t * cm,
>>>     const char ** phonemes, int width, double thresh,
>>>     const vector<FactorType> &factor_order) {
>>>   debug("start");
>>>
>>>   MyConfusionNet * cn = new MyConfusionNet();
>>>
>>>   int i = 0;
>>>   while(phonemes[i]!=NULL) {
>>>     debug("%s", phonemes[i]);
>>>     MyConfusionNet::Column col =
>>>       create_phoneme_col(cm, phonemes[i], width, thresh, factor_order);
>>>     cn->addCol(col);
>>>     i += 1;
>>>   }
>>>
>>>   return cn;
>>> }
>>>
>>> So, if anyone has an idea about what's wrong here... thanks!
>>>
>>> cheers,
>>>
>
>
-- 
Sylvain Raybaud
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
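One way to narrow down the API-versus-file discrepancy is to print the API-built CN in the same plain-text layout that `moses -f moses.phonemes.cn.ini < CN` reads (one line per column, each entry a token followed by its probability) and diff it against the dumped file. The sketch below is a Moses-free approximation: `PhonemeAlt`, `Column`, `make_column` and `dump_cn` are hypothetical names, and the filtering rule mirrors create_phoneme_col(). A further thing worth checking against the Moses source for the version in use: if I recall correctly, the text reader (ConfusionNet::ReadFormat0) takes the log of each link probability and sizes each score vector to the number of input-score weights, whereas create_phoneme_col() stores the raw probability in a one-element vector. If that recollection is right, the two paths would not hand the decoder the same scores.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one entry of a confusion-matrix row
// (the thread's phoneme_conf_t), assumed sorted by descending probability.
struct PhonemeAlt {
  std::string phoneme;
  float p;  // probability that the recognised phoneme is really `phoneme`
};

// Hypothetical column type: (token, link scores) pairs, shaped like
// MyConfusionNet::Column but using plain strings instead of Moses Words.
typedef std::vector<std::pair<std::string, std::vector<float> > > Column;

// Same filtering rule as create_phoneme_col(): keep at most `width`
// alternatives (width <= 0 means no limit) and drop those whose probability
// is below `thresh` (thresh <= 0 means no threshold).
Column make_column(const std::vector<PhonemeAlt>& row, int width,
                   double thresh) {
  Column col;
  for (std::size_t i = 0; i < row.size(); ++i) {
    float score = row[i].p;
    if ((width <= 0 || static_cast<int>(i) < width) &&
        (thresh <= 0 || score >= thresh)) {
      // One raw probability per link, as in the original code.
      col.push_back(std::make_pair(row[i].phoneme,
                                   std::vector<float>(1, score)));
    }
  }
  return col;
}

// Writes the columns in the plain-text CN layout: one line per column,
// each entry written as "token score [score ...]".
std::string dump_cn(const std::vector<Column>& cn) {
  std::ostringstream out;
  for (std::size_t c = 0; c < cn.size(); ++c) {
    for (std::size_t k = 0; k < cn[c].size(); ++k) {
      if (k > 0) out << ' ';
      out << cn[c][k].first;
      for (std::size_t s = 0; s < cn[c][k].second.size(); ++s)
        out << ' ' << cn[c][k].second[s];
    }
    out << '\n';
  }
  return out.str();
}
```

For example, a row {n 0.5, m 0.25, ng 0.125} with width 2 and no threshold dumps as the line `n 0.5 m 0.25`, which can be compared column by column with the file that decodes correctly.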