Hi all

So, here is the answer. To extract the string from the Hypothesis object, I
used to use just this method:

    Hypothesis::GetTargetPhraseStringRep();

For some reason, it seems to work when translating a string but not when
translating a CN or a lattice. I now use the following function (inspired by
what I found in mosesserver.cpp):

    string moses_get_hyp(const Hypothesis* hypo) {
        string current("");
        /* words of the phrase added by this hypothesis only */
        const Phrase &p = hypo->GetCurrTargetPhrase();
        for (size_t pos = 0; pos < p.GetSize(); pos++) {
            const Factor *factor = p.GetFactor(pos, 0);
            current += factor->GetString() + string(" ");
        }
        /* back-track: prepend the output of the previous hypotheses
           (current already ends with a space, so no extra separator) */
        const Hypothesis *prev = hypo->GetPrevHypo();
        if (prev != NULL)
            return moses_get_hyp(prev) + current;
        return current;
    }

I must confess that I don't really understand what I'm doing :( I'm just
copying code that works, and, well, that works.

cheers,

Sylvain

On 27/04/12 13:11, Sylvain Raybaud wrote:
> Hi Barry
>
> By adding
>     cerr << "[S2TT] GOT TRANSLATION: " << *hypo << endl;
>
> I was able to determine that the translations that are actually generated
> look reasonable. The problem therefore lies in how I extract them from the
> "hypo" object. I think I'll be able to find the problem. I'll let the
> list know.
>
> thanks for the help!
>
> cheers,
>
> Sylvain
>
> On 26/04/12 17:54, Barry Haddow wrote:
>> Hi Sylvain
>>
>> I think ProcessSentence() is the right method to call. If you look at
>> moses server then you'll see a less cluttered example of how to use the
>> Moses api. It may be that your moses_get_hyp() is not back-tracking
>> through the hypothesis correctly.
>>
>> Note that you are calling UntransformScore() which probably explains your
>> odd translation score. It doesn't make much sense to do this, as you won't
>> get a probability (it's not normalised). It is unusual though, that you
>> appear to have a positive translation score (in log space).
>>
>> If you increase the verbosity of moses (to 2 or 3) you'll get a better
>> idea what it is doing, and you can see whether it really is producing "of"
>> as the translation, and why.
>>
>> cheers - Barry
>>
>> On Thursday 26 April 2012 16:41:06 Sylvain Raybaud wrote:
>>> wild guessing here: in TranslationTask::Run, I see there are many
>>> alternatives for processing the sentence, like doLatticeMBR etc, not
>>> just running Manager::ProcessSentence()
>>> Maybe one of these alternatives must be run for processing confusion
>>> networks?
>>>
>>> cheers
>>>
>>> Sylvain
>>>
>>> On 26/04/12 15:53, Sylvain Raybaud wrote:
>>>> Hi Barry
>>>>
>>>> Thanks for the tip, that sounds likely indeed. I'll try it again but
>>>> last time I ran the software through valgrind, I got so many errors in
>>>> external libs that I just gave up.
>>>>
>>>> In the meantime, here is the complete function that handles the
>>>> decoding, in case someone sees something obviously wrong in here...
>>>>
>>>> static void moses_translate_phonemes(manager_data_t * pool,
>>>>                                      translation_pair_t * pair) {
>>>>     debug("starting");
>>>>
>>>>     const TranslationSystem& system =
>>>>         StaticData::Instance().GetTranslationSystem(TranslationSystem::DEFAULT);
>>>>     /* there is only one translation system for now */
>>>>     const StaticData &staticData = StaticData::Instance();
>>>>     const vector<FactorType> &inputFactorOrder =
>>>>         staticData.GetInputFactorOrder();
>>>>
>>>>     MyConfusionNet * cn =
>>>>         phonemes_to_cn(pool->mp_engine->phonemes_cm, pair->source->phonemes,
>>>>                        pool->mp_config->cn_width, pool->mp_config->cn_thresh,
>>>>                        inputFactorOrder);
>>>>
>>>>     Manager * manager = new Manager(*cn, staticData.GetSearchAlgorithm(),
>>>>                                     &system);
>>>>     manager->ProcessSentence();
>>>>     const Hypothesis* hypo = manager->GetBestHypothesis();
>>>>
>>>>     string hyp = moses_get_hyp(hypo);
>>>>     char * hyp_ret = (char*)malloc((strlen(hyp.c_str())+1)*sizeof(char));
>>>>     strcpy(hyp_ret, hyp.c_str());
>>>>
>>>>     pair->translation_score = UntransformScore(hypo->GetScore());
>>>>     translation_pair_set_target(pair, hyp_ret, NULL);
>>>>
>>>>     delete manager;
>>>>     delete cn;
>>>> }
>>>>
>>>> cheers,
>>>>
>>>> Sylvain
>>>>
>>>> On 26/04/12 13:49, Barry Haddow wrote:
>>>>> Hi Sylvain
>>>>>
>>>>> I'm not familiar with this part of the code, but the strange score
>>>>> suggests that there's some uninitialised memory. You could try running
>>>>> through valgrind and it might give some clues,
>>>>>
>>>>> cheers - Barry
>>>>>
>>>>> On Thursday 26 Apr 2012 12:24:11 Sylvain Raybaud wrote:
>>>>>> Hi all
>>>>>>
>>>>>> I'm using Moses API for decoding a confusion network. The CN is
>>>>>> created from the output of an ASR engine and a confusion matrix. More
>>>>>> precisely (even though it's probably irrelevant to my problem), the ASR
>>>>>> engine provides a string of phonemes (1-best) and the confusion matrix
>>>>>> provides alternatives for each phoneme (the idea was described in
>>>>>> Jiang et al., _Phonetic representation based speech translation_, MT
>>>>>> Summit XIII, 2011).
>>>>>>
>>>>>> When the CN is dumped into a file and I use
>>>>>>     moses -f moses.phonemes.cn.ini < CN
>>>>>> to decode it, everything is fine.
>>>>>>
>>>>>> But when I use Moses API (loading the same configuration file), I get
>>>>>> incomplete translations, like:
>>>>>>
>>>>>> ASR output (French): "nous font sont toujours chimistes plume
>>>>>> rassembleront ch je trouve que le office de ce tout de suite"
>>>>>> Phonetic representation: "n u f on s on t t u ge u r ch i m i s t z p l
>>>>>> y m r a s an b l swa r on ch ge swa t r u v k swa l swa oh f i s swa d
>>>>>> swa s swa t u d s h i t"
>>>>>> Translation: "of"
>>>>>> score: 903011968.000000
>>>>>>
>>>>>> Note that the transcription is poor (I haven't really tuned the ASR
>>>>>> engine), but still, the translation ought to be more than just "of".
>>>>>> Sometimes it's several words, I guess it's a phrase in the phrase
>>>>>> table. The word generally seems to be the translation of a word in the
>>>>>> source sentence.
>>>>>> When I use moses on the command line to translate either the 1-best or
>>>>>> the CN, I get a reasonable translation. When I use the API to translate
>>>>>> the 1-best phonetic representation, I also get a reasonable
>>>>>> translation. I think the CN object is created correctly because moses
>>>>>> loads it and prints it prior to decoding (this is normal verbose
>>>>>> behavior). I also tried to create a PCN object, and got exactly the
>>>>>> same results. So I guess the problem is either how I tell moses to
>>>>>> decode it or how I extract the result from the Hypothesis object. But
>>>>>> I'm clueless about what the problem is here, since the code is
>>>>>> working when I just translate a string. The translation score seems
>>>>>> ridiculously high too. I'll give below the corresponding code.
>>>>>>
>>>>>> Decoding and hypothesis extraction:
>>>>>> ***********************************
>>>>>> [...]
>>>>>> Manager * manager = new Manager(*cn, staticData.GetSearchAlgorithm(),
>>>>>>                                 &system);
>>>>>> manager->ProcessSentence();
>>>>>> const Hypothesis* hypo = manager->GetBestHypothesis();
>>>>>> string hyp = moses_get_hyp(hypo);
>>>>>> [...]
>>>>>> pair->translation_score = UntransformScore(hypo->GetScore());
>>>>>> [...]
>>>>>>
>>>>>> string moses_get_hyp(const Hypothesis* hypo) {
>>>>>>     return hypo->GetTargetPhraseStringRep();
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Creation of the CN:
>>>>>> *******************
>>>>>>
>>>>>> /** new class derived from ConfusionNet, with a new method for directly
>>>>>>     creating CN */
>>>>>> class MyConfusionNet : public ConfusionNet {
>>>>>> public:
>>>>>>     void addCol(Column);
>>>>>> };
>>>>>>
>>>>>> void MyConfusionNet::addCol(Column col) {
>>>>>>     data.push_back(col);
>>>>>> }
>>>>>>
>>>>>> /** create a column of the CN */
>>>>>> static MyConfusionNet::Column create_phoneme_col(confusion_matrix_t * cm,
>>>>>>         const char * ph, int width, double thresh,
>>>>>>         const vector<FactorType> &factor_order) {
>>>>>>
>>>>>>     MyConfusionNet::Column col;
>>>>>>
>>>>>>     phoneme_conf_t * ph_conf =
>>>>>>         (phoneme_conf_t*)g_hash_table_lookup(cm->matrix, ph);
>>>>>>     if(ph_conf==NULL) {
>>>>>>         return col;
>>>>>>     }
>>>>>>
>>>>>>     int i;
>>>>>>     for(i = 0; i<cm->n_phonemes; i++) {
>>>>>>         vector<float> scores;
>>>>>>         float score = float(ph_conf[i].p);
>>>>>>         if((width<=0 || i<width) && (thresh<=0 || score>=thresh)) {
>>>>>>             string wd(cm->phonemes[ph_conf[i].phoneme]);
>>>>>>             Word word;
>>>>>>             word.CreateFromString(Input, factor_order, wd, false);
>>>>>>             scores.push_back(score);
>>>>>>             pair<Word,vector<float> > linkdata(word, scores);
>>>>>>             col.push_back(linkdata);
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     return col;
>>>>>> }
>>>>>>
>>>>>> /** Creates a confusion network from a NULL terminated phonemes list
>>>>>>     and a phonemes confusion matrix */
>>>>>> static MyConfusionNet * phonemes_to_cn(confusion_matrix_t * cm,
>>>>>>         const char ** phonemes, int width, double thresh,
>>>>>>         const vector<FactorType> &factor_order) {
>>>>>>     debug("start");
>>>>>>
>>>>>>     MyConfusionNet * cn = new MyConfusionNet();
>>>>>>
>>>>>>     int i = 0;
>>>>>>     while(phonemes[i]!=NULL) {
>>>>>>         debug("%s", phonemes[i]);
>>>>>>         MyConfusionNet::Column col =
>>>>>>             create_phoneme_col(cm, phonemes[i], width, thresh, factor_order);
>>>>>>         cn->addCol(col);
>>>>>>         i += 1;
>>>>>>     }
>>>>>>
>>>>>>     return cn;
>>>>>> }
>>>>>>
>>>>>> So, if anyone has an idea about what's wrong here.... thanks!
>>>>>>
>>>>>> cheers,
>>>
>>
>> --
>> Barry Haddow
>> University of Edinburgh
>> +44 (0) 131 651 3173
>>
>
>
--
Sylvain Raybaud
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
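
A note on why the back-tracking in the answer above matters, as a minimal standalone sketch rather than actual Moses code. It assumes, as the thread suggests, that each hypothesis stores only the phrase it added plus a pointer to its predecessor. ToyHypothesis, toy_get_hyp and toy_last_phrase_only are hypothetical names that do not exist in Moses; they only mimic the roles played by GetCurrTargetPhrase() and GetPrevHypo(). The sketch walks the chain iteratively instead of recursively, joins words with single spaces, and needs C++11.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a decoder hypothesis (NOT the Moses class):
// each node holds only the words of the phrase it added, plus a link
// to the hypothesis it extends.
struct ToyHypothesis {
    std::vector<std::string> words;  // words of the current target phrase
    const ToyHypothesis* prev;       // nullptr for the initial hypothesis
};

// Append words to out, separated by single spaces.
static void append_words(std::string& out,
                         const std::vector<std::string>& words) {
    for (const std::string& w : words) {
        if (!out.empty()) out += ' ';
        out += w;
    }
}

// Back-track from the final hypothesis to the initial one, then emit
// the phrases in left-to-right order.
std::string toy_get_hyp(const ToyHypothesis* hypo) {
    std::vector<const ToyHypothesis*> chain;
    for (const ToyHypothesis* h = hypo; h != nullptr; h = h->prev) {
        chain.push_back(h);
    }
    std::string out;
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
        append_words(out, (*it)->words);
    }
    return out;
}

// Reads the last node only, without back-tracking. This yields a single
// phrase, which resembles the truncated output reported in the thread.
std::string toy_last_phrase_only(const ToyHypothesis* hypo) {
    std::string out;
    if (hypo != nullptr) append_words(out, hypo->words);
    return out;
}
```

For a chain whose phrases are "we are", "always" and "of", toy_get_hyp called on the last node returns the whole sentence, while toy_last_phrase_only returns only "of". Whether this is exactly what happened inside Moses for CN input is not established in the thread; the sketch only illustrates the difference between the two ways of reading a hypothesis chain.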