Hi all

So, here is the answer. To extract the string from the Hypothesis
object, I used to call just this method:

Hypothesis::GetTargetPhraseStringRep();

For some reason, it seems to work when translating a string but not when
translating a CN or a lattice. I now use the following function
(inspired by what I found in mosesserver.cpp):

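// Rebuilds the full target string by walking back through the chain of
// hypotheses, collecting the phrase attached to each one.
string moses_get_hyp(const Hypothesis* hypo) {

    if (hypo == NULL)
        return string("");

    // Words of the phrase attached to this hypothesis
    // (factor 0, normally the surface form).
    string current("");
    const Phrase& p = hypo->GetCurrTargetPhrase();
    for (size_t pos = 0 ; pos < p.GetSize() ; pos++) {
        const Factor *factor = p.GetFactor(pos, 0);
        if (!current.empty())
            current += string(" ");
        current += factor->GetString();
    }

    // Prepend whatever the earlier hypotheses produced, avoiding
    // doubled or trailing spaces.
    string previous = moses_get_hyp(hypo->GetPrevHypo());
    if (previous.empty())
        return current;
    if (current.empty())
        return previous;
    return previous + string(" ") + current;
}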

I must confess that I don't really understand what I'm doing :( I'm just
copying code that works, and, well, that works.
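
A plausible reading of why this works, judging only from the calls the
function makes: each hypothesis carries just the phrase added by its own
expansion, and GetPrevHypo() links back to the rest of the sentence. If
GetTargetPhraseStringRep() reports only that current phrase (an assumption,
not checked against the Moses source), that would explain one-phrase output
such as "of". The sketch below illustrates the idea with stand-in types;
MockHypothesis, phrase_to_string and full_translation are hypothetical names
and not part of the Moses API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Moses Hypothesis: it stores only the words
// added by its own expansion plus a link to the hypothesis it extends.
struct MockHypothesis {
    std::vector<std::string> phrase;   // words added at this step
    const MockHypothesis* prev;        // NULL for the initial hypothesis
};

// Joins the words of a single hypothesis with single spaces.
std::string phrase_to_string(const MockHypothesis& h) {
    std::string out;
    for (std::size_t i = 0; i < h.phrase.size(); ++i) {
        if (i > 0) out += " ";
        out += h.phrase[i];
    }
    return out;
}

// Walks the back-pointers from the best hypothesis to the initial one,
// then joins the collected phrases in left-to-right order.
std::string full_translation(const MockHypothesis* best) {
    assert(best != NULL);
    std::vector<std::string> parts;
    for (const MockHypothesis* h = best; h != NULL; h = h->prev) {
        std::string s = phrase_to_string(*h);
        if (!s.empty()) parts.push_back(s);   // skip empty initial phrase
    }
    std::string out;
    for (std::size_t i = parts.size(); i > 0; --i) {
        if (!out.empty()) out += " ";
        out += parts[i - 1];
    }
    return out;
}
```

In this sketch, calling phrase_to_string() on the last node alone yields
only the final phrase, whereas full_translation() returns the whole
sentence. The loop is an iterative equivalent of the recursion above.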

cheers,

Sylvain

On 27/04/12 13:11, Sylvain Raybaud wrote:
> Hi Barry
> 
> By adding
> cerr << "[S2TT] GOT TRANSLATION: " << *hypo << endl;
> 
> I was able to determine that the translations that are actually generated
> look reasonable. The problem therefore lies in how I extract them from the
> "hypo" object. I think I'll be able to find the problem. I'll let the
> list know.
> 
> thanks for the help!
> 
> cheers,
> 
> Sylvain
> 
> On 26/04/12 17:54, Barry Haddow wrote:
>> Hi Sylvain
>>
>> I think ProcessSentence() is the right method to call. If you look at the
>> moses server then you'll see a less cluttered example of how to use the
>> Moses API. It may be that your moses_get_hyp() is not back-tracking through
>> the hypothesis correctly.
>>
>> Note that you are calling UntransformScore(), which probably explains your
>> odd translation score. It doesn't make much sense to do this, as you won't
>> get a probability (it's not normalised). It is unusual, though, that you
>> appear to have a positive translation score (in log space).
>>
>> If you increase the verbosity of moses (to 2 or 3) you'll get a better idea 
>> what it is doing, and you can see whether it really is producing "of" as the 
>> translation, and why.
>>
>> cheers - Barry
>>
>> On Thursday 26 April 2012 16:41:06 Sylvain Raybaud wrote:
>>> wild guessing here: in TranslationTask::Run, I see there are many
>>> alternatives for processing the sentence, like doLatticeMBR etc, not
>>> just running Manager::ProcessSentence()
>>> Maybe one of these alternatives must be run for processing confusion
>>> networks?
>>>
>>> cheers
>>>
>>> Sylvain
>>>
>>> On 26/04/12 15:53, Sylvain Raybaud wrote:
>>>> Hi Barry
>>>>
>>>>   Thanks for the tip, that sounds likely indeed. I'll try it again but
>>>> last time I ran the software through valgrind, I got so many errors in
>>>> external libs that I just gave up.
>>>>
>>>> In the meantime, here is the complete function that handles the
>>>> decoding, in case someone sees something obviously wrong in here...
>>>>
>>>> static void moses_translate_phonemes(manager_data_t * pool,
>>>> translation_pair_t * pair) {
>>>>     debug("starting");
>>>>
>>>>     const TranslationSystem& system =
>>>> StaticData::Instance().GetTranslationSystem(TranslationSystem::DEFAULT);
>>>> /* there is only one translation system for now */
>>>>     const StaticData &staticData = StaticData::Instance();
>>>>     const vector<FactorType> &inputFactorOrder =
>>>> staticData.GetInputFactorOrder();
>>>>
>>>>     MyConfusionNet * cn =
>>>> phonemes_to_cn(pool->mp_engine->phonemes_cm,pair->source->phonemes,pool->
>>>> mp_config->cn_width,pool->mp_config->cn_thresh,inputFactorOrder);
>>>>
>>>>     Manager * manager = new Manager(*cn,staticData.GetSearchAlgorithm(),
>>>> &system);
>>>>     manager->ProcessSentence();
>>>>     const Hypothesis* hypo = manager->GetBestHypothesis();
>>>>
>>>>     string hyp = moses_get_hyp(hypo);
>>>>     char * hyp_ret = (char*)malloc((strlen(hyp.c_str())+1)*sizeof(char));
>>>>     strcpy(hyp_ret,hyp.c_str());
>>>>
>>>>     pair->translation_score = UntransformScore(hypo->GetScore());
>>>>     translation_pair_set_target(pair, hyp_ret,NULL);
>>>>
>>>>     delete manager;
>>>>     delete cn;
>>>>
>>>> }
>>>>
>>>> cheers,
>>>>
>>>> Sylvain
>>>>
>>>> On 26/04/12 13:49, Barry Haddow wrote:
>>>>> Hi Sylvain
>>>>>
>>>>> I'm not familiar with this part of the code, but the strange score
>>>>> suggests that there's some uninitialised memory. You could try running
>>>>> through valgrind and it might give some clues,
>>>>>
>>>>> cheers - Barry
>>>>>
>>>>> On Thursday 26 Apr 2012 12:24:11 Sylvain Raybaud wrote:
>>>>>> Hi all
>>>>>>
>>>>>>   I'm using the Moses API for decoding a confusion network. The CN is
>>>>>> created from the output of an ASR engine and a confusion matrix. More
>>>>>> precisely (even though it's probably irrelevant to my problem), the ASR
>>>>>> engine provides a string of phonemes (1-best) and the confusion matrix
>>>>>> provides alternatives for each phoneme (the idea was described in
>>>>>> Jiang et al., _Phonetic representation based speech translation_, MT
>>>>>> Summit XIII, 2011).
>>>>>>
>>>>>> When the CN is dumped into a file and I use
>>>>>> moses -f moses.phonemes.cn.ini < CN
>>>>>> to decode it, everything is fine.
>>>>>>
>>>>>> But when I use the Moses API (loading the same configuration file), I get
>>>>>> incomplete translations, like:
>>>>>>
>>>>>> ASR output (French): "nous font sont toujours chimistes plume
>>>>>> rassembleront ch je trouve que le office de ce tout de suite"
>>>>>> Phonetic representation: "n u f on s on t t u ge u r ch i m i s t z p l
>>>>>> y m r a s an b l swa r on ch ge swa t r u v k swa l swa oh f i s swa d
>>>>>> swa s swa t u d s h i t"
>>>>>> Translation: "of"
>>>>>> score: 903011968.000000
>>>>>>
>>>>>> Note that the transcription is poor (I haven't really tuned the ASR
>>>>>> engine), but still, the translation ought to be more than just "of".
>>>>>> Sometimes it's several words, I guess it's a phrase in the phrase
>>>>>> table. The word generally seems to be the translation of a word in the
>>>>>> source sentence.
>>>>>> When I use moses on the command line to translate either the 1-best or
>>>>>> the CN, I get a reasonable translation. When I use the API to translate
>>>>>> the 1-best phonetic representation, I also get a reasonable
>>>>>> translation. I think the CN object is created correctly because moses
>>>>>> loads it and prints it prior to decoding (this is normal verbose
>>>>>> behavior). I also tried to create a PCN object, and got exactly the
>>>>>> same results. So I guess the problem is either how I tell moses to
>>>>>> decode it or how I extract the result from the Hypothesis object. But
>>>>>> I'm clueless about what the problem is here, since the code is
>>>>>> working when I just translate a string. The translation score seems
>>>>>> ridiculously high too. I'll give below the corresponding code.
>>>>>>
>>>>>> Decoding and hypothesis extraction:
>>>>>> ***********************************
>>>>>> [...]
>>>>>> Manager * manager = new Manager(*cn,staticData.GetSearchAlgorithm(),
>>>>>> &system);
>>>>>> manager->ProcessSentence();
>>>>>> const Hypothesis* hypo = manager->GetBestHypothesis();
>>>>>> string hyp = moses_get_hyp(hypo);
>>>>>> [...]
>>>>>> pair->translation_score = UntransformScore(hypo->GetScore());
>>>>>> [...]
>>>>>>
>>>>>> string moses_get_hyp(const Hypothesis* hypo) {
>>>>>>     return hypo->GetTargetPhraseStringRep();
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Creation of the CN:
>>>>>> *******************
>>>>>>
>>>>>> /** new class derived from ConfusionNet, with a new method for directly
>>>>>> creating CN */
>>>>>> class MyConfusionNet : public ConfusionNet {
>>>>>>   public:
>>>>>>     void addCol(Column);
>>>>>> };
>>>>>>
>>>>>> void MyConfusionNet::addCol(Column col) {
>>>>>>     data.push_back(col);
>>>>>> }
>>>>>>
>>>>>> /** create a column of the CN */
>>>>>> static MyConfusionNet::Column create_phoneme_col(confusion_matrix_t *
>>>>>> cm, const char * ph, int width, double thresh, const vector<FactorType>
>>>>>> &factor_order) {
>>>>>>
>>>>>>     MyConfusionNet::Column col;
>>>>>>
>>>>>>     phoneme_conf_t * ph_conf =
>>>>>> (phoneme_conf_t*)g_hash_table_lookup(cm->matrix,ph);
>>>>>>     if(ph_conf==NULL) {
>>>>>>         return col;
>>>>>>     }
>>>>>>
>>>>>>     int i;
>>>>>>     for(i = 0; i<cm->n_phonemes; i++) {
>>>>>>         vector<float> scores;
>>>>>>         float score = float(ph_conf[i].p);
>>>>>>         if((width<=0 || i<width) && (thresh<=0 || score>=thresh)) {
>>>>>>             string wd(cm->phonemes[ph_conf[i].phoneme]);
>>>>>>             Word word;
>>>>>>             word.CreateFromString(Input,factor_order,wd,false);
>>>>>>             scores.push_back(score);
>>>>>>             pair<Word,vector<float> > linkdata(word,scores);
>>>>>>             col.push_back(linkdata);
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     return col;
>>>>>> }
>>>>>>
>>>>>> /** Creates a confusion network from a NULL terminated phonemes list
>>>>>> and a phonemes confusion matrix */
>>>>>> static MyConfusionNet * phonemes_to_cn(confusion_matrix_t * cm,const
>>>>>> char ** phonemes, int width, double thresh, const vector<FactorType>
>>>>>> &factor_order) {
>>>>>>     debug("start");
>>>>>>
>>>>>>     MyConfusionNet * cn = new MyConfusionNet();
>>>>>>
>>>>>>     int i = 0;
>>>>>>     while(phonemes[i]!=NULL) {
>>>>>>         debug("%s",phonemes[i]);
>>>>>>         MyConfusionNet::Column col =
>>>>>> create_phoneme_col(cm,phonemes[i],width,thresh,factor_order);
>>>>>>         cn->addCol(col);
>>>>>>         i += 1;
>>>>>>     }
>>>>>>
>>>>>>     return cn;
>>>>>> }
>>>>>>
>>>>>> So, if anyone has an idea about what's wrong here.... thanks!
>>>>>>
>>>>>> cheers,
>>>
>>  
>> --
>> Barry Haddow
>> University of Edinburgh
>> +44 (0) 131 651 3173
>>
> 
> 


-- 
Sylvain Raybaud
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
