Re: [Moses-support] decoding a confusion network using Moses' API
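Note on the fix discussed in this thread: the working solution walks back through the chain of hypotheses and concatenates the target phrase of each one. Below is a minimal, self-contained sketch of that back-tracking idea. It does not use Moses itself: MockHypothesis, collect_translation and example_check are hypothetical names introduced only for illustration, and the loop is an iterative variant of the idea, not the code from the post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoder hypothesis (NOT the Moses class):
// each node holds only the words of its own target phrase and a link
// to the hypothesis it extends.
struct MockHypothesis {
    std::vector<std::string> words;  // words added by this hypothesis
    const MockHypothesis* prev;      // nullptr for the initial, empty hypothesis
};

// Collect the chain from the best hypothesis back to the start, then
// emit the words in left-to-right order, separated by single spaces.
std::string collect_translation(const MockHypothesis* hypo) {
    std::vector<const MockHypothesis*> chain;
    for (const MockHypothesis* h = hypo; h != nullptr; h = h->prev) {
        chain.push_back(h);
    }
    std::string out;
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
        for (const std::string& w : (*it)->words) {
            if (!out.empty()) out += ' ';
            out += w;
        }
    }
    return out;
}

// Self-check on a hand-built three-node chain.
void example_check() {
    MockHypothesis start{std::vector<std::string>{}, nullptr};
    MockHypothesis first{std::vector<std::string>{"we", "are"}, &start};
    MockHypothesis last{std::vector<std::string>{"chemists"}, &first};
    // Reading only the last node would give just "chemists".
    assert(collect_translation(&last) == "we are chemists");
}
```

Reading only the final node of such a chain yields a single phrase, which would be consistent with the one-word outputs reported in the original question; whether that is exactly what happens inside Moses for confusion networks is an assumption here, not something the thread confirms.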
Hi all

So, here is the answer. To extract the string from the Hypothesis object, I used to call just this method:

    Hypothesis::GetTargetPhraseStringRep();

For some reason, it seems to work when translating a string but not when translating a CN or a lattice. I now use the following function (inspired by what I found in mosesserver.cpp):

    string moses_get_hyp(const Hypothesis* hypo) {
        /* words of the phrase added by this hypothesis (factor 0 only) */
        string current("");
        Phrase p = hypo->GetCurrTargetPhrase();
        for (size_t pos = 0 ; pos < p.GetSize() ; pos++) {
            const Factor *factor = p.GetFactor(pos, 0);
            current += factor->GetString() + string(" ");
        }
        /* prepend everything produced by the earlier hypotheses */
        const Hypothesis * prev = hypo->GetPrevHypo();
        if (prev != NULL)
            return moses_get_hyp(prev) + string(" ") + current;
        return current;
    }

I must confess that I don't really understand what I'm doing :( I'm just copying code that works, and, well, that works.

cheers,
Sylvain

On 27/04/12 13:11, Sylvain Raybaud wrote:

Hi Barry

By adding

    cerr << "[S2TT] GOT TRANSLATION: " << *hypo << endl;

I was able to determine that the translations that are actually generated look reasonable. The problem therefore lies in how I extract them from the hypo object. I think I'll be able to find the problem. I'll let the list know.

thanks for the help!

cheers,
Sylvain

On 26/04/12 17:54, Barry Haddow wrote:

Hi Sylvain

I think ProcessSentence() is the right method to call. If you look at moses server then you'll see a less cluttered example of how to use the Moses api. It may be your moses_get_hyp() is not back-tracking through the hypothesis correctly.

Note that you are calling UntransformScore() which probably explains your odd translation score. It doesn't make much sense to do this, as you won't get a probability (it's not normalised). It is unusual though, that you appear to have a positive translation score (in log space).

If you increase the verbosity of moses (to 2 or 3) you'll get a better idea what it is doing, and you can see whether it really is producing "of" as the translation, and why.
cheers - Barry

On Thursday 26 April 2012 16:41:06 Sylvain Raybaud wrote:

wild guessing here: in TranslationTask::Run, I see there are many alternatives for processing the sentence, like doLatticeMBR etc, not just running Manager::ProcessSentence(). Maybe one of these alternatives must be run for processing confusion networks?

cheers
Sylvain

On 26/04/12 15:53, Sylvain Raybaud wrote:

Hi Barry

Thanks for the tip, that sounds likely indeed. I'll try it again but last time I ran the software through valgrind, I got so many errors in external libs that I just gave up.

In the meantime, here is the complete function that handles the decoding, in case someone sees something obviously wrong in here...

    static void moses_translate_phonemes(manager_data_t * pool, translation_pair_t * pair) {
        debug("starting");

        const TranslationSystem& system = StaticData::Instance().GetTranslationSystem(TranslationSystem::DEFAULT);
        /* there is only one translation system for now */
        const StaticData& staticData = StaticData::Instance();
        const vector<FactorType>& inputFactorOrder = staticData.GetInputFactorOrder();

        MyConfusionNet * cn = phonemes_to_cn(pool->mp_engine->phonemes_cm, pair->source->phonemes, pool->mp_config->cn_width, pool->mp_config->cn_thresh, inputFactorOrder);

        Manager * manager = new Manager(*cn, staticData.GetSearchAlgorithm(), &system);
        manager->ProcessSentence();
        const Hypothesis* hypo = manager->GetBestHypothesis();

        string hyp = moses_get_hyp(hypo);
        char * hyp_ret = (char*)malloc((strlen(hyp.c_str())+1)*sizeof(char));
        strcpy(hyp_ret, hyp.c_str());
        pair->translation_score = UntransformScore(hypo->GetScore());
        translation_pair_set_target(pair, hyp_ret, NULL);

        delete manager;
        delete cn;
    }

cheers,
Sylvain

On 26/04/12 13:49, Barry Haddow wrote:

Hi Sylvain

I'm not familiar with this part of the code, but the strange score suggests that there's some uninitialised memory.
You could try running through valgrind and it might give some clues,

cheers - Barry

On Thursday 26 Apr 2012 12:24:11 Sylvain Raybaud wrote:

Hi all

I'm using Moses API for decoding a confusion network. The CN is created from the output of an ASR engine and a confusion matrix. More precisely (even though it's probably irrelevant to my problem), the ASR engine provides a string of phonemes (1-best) and the confusion matrix provides alternatives for each phoneme (the idea was described in Jiang et al., _Phonetic representation based speech translation_, MT Summit XIII, 2011).

When the CN is dumped into a file and I use

    moses -f moses.phonemes.cn.ini < CN

to decode it, everything is fine. But when I use Moses API (loading the same configuration file), I get incomplete translations, like:

ASR output (French): nous font sont toujours chimistes plume rassembleront ch je trouve que le office de ce tout de suite
Phonetic representation: n u f on s on t t u ge u r ch i m i s t z p l y m r a s an b l swa r on ch ge swa t r u v k swa l swa oh f i s swa d swa s swa t u d s h i t
Translation: of
score: 903011968.00

Note that the transcription is poor (I haven't really tuned the ASR engine), but still, the translation ought to be more than just "of". Sometimes it's several words, I guess it's a phrase in the phrase table. The word generally seems to be the translation of a word in the source sentence.

When I use moses on command line to translate either the 1-best or the CN, I get a reasonable translation. When I use the API to translate the 1-best phonetic representation, I also get a reasonable translation.

I think the CN object is created correctly because moses loads it and prints it prior to decoding (this is normal verbose behavior). I also tried to create a PCN object, and got exactly the same results. So I guess the problem is either how I tell moses to decode it or how I extract the result from the Hypothesis object. But I'm clueless about what the problem is here, since the code is working when I just translate a string. The translation score seems ridiculously high too.

I'll give below the corresponding code.

Decoding and hypothesis extraction:
***

[...]
    Manager * manager = new Manager(*cn, staticData.GetSearchAlgorithm(), &system);
    manager->ProcessSentence();
    const Hypothesis* hypo = manager->GetBestHypothesis();

    string hyp = moses_get_hyp(hypo);
    [...]
    pair->translation_score = UntransformScore(hypo->GetScore());
    [...]

    string moses_get_hyp(const Hypothesis* hypo) {
        return hypo->GetTargetPhraseStringRep();
    }

Creation of the CN:
***

    /** new class derived from ConfusionNet, with a new method for directly creating CN */
    class MyConfusionNet : public ConfusionNet {
    public:
        void addCol(Column);
    };

    void MyConfusionNet::addCol(Column col) {
        data.push_back(col);
    }

    /** create a column of the CN */
    static MyConfusionNet::Column create_phoneme_col(confusion_matrix_t * cm, const char * ph, int