Re: [Moses-support] decoding a confusion network using Moses' API

2012-05-03 Thread Sylvain Raybaud
Hi all

So, here is the answer. To extract the string from the Hypothesis
object, I used to use just this method:

Hypothesis::GetTargetPhraseStringRep();

For some reason, it seems to work when translating a string but not when
translating a CN or a lattice. I now use the following function
(inspired by what I found in mosesserver.cpp):

string moses_get_hyp(const Hypothesis* hypo) {

    string current;

    // Words of the phrase added by this hypothesis (factor 0).
    Phrase p = hypo->GetCurrTargetPhrase();
    for (size_t pos = 0 ; pos < p.GetSize() ; pos++) {
        const Factor *factor = p.GetFactor(pos, 0);
        current += factor->GetString() + string(" ");
    }

    // Walk back through the previous hypotheses to rebuild the sentence.
    const Hypothesis * prev = hypo->GetPrevHypo();
    if(prev != NULL)
        return moses_get_hyp(prev) + string(" ") + current;
    return current;
}

I must confess that I don't really understand what I'm doing :( I'm just
copying code that works, and, well, that works.
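
A possible explanation, sketched without Moses itself: if GetTargetPhraseStringRep() renders only the target phrase attached to the one hypothesis it is called on (as its name suggests), then calling it on the best final hypothesis yields just the last phrase, and the full sentence only appears once the chain of previous hypotheses is followed. The MockHypothesis type and both functions below are hypothetical stand-ins, not Moses classes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoder hypothesis: each node stores only the
// words of the phrase it added, plus a link to the hypothesis it extends.
struct MockHypothesis {
    std::vector<std::string> phrase;  // words added by this hypothesis
    const MockHypothesis* prev;       // nullptr for the initial, empty hypothesis
};

// Renders only the phrase of the given node (no back-tracking).
std::string last_phrase_only(const MockHypothesis* hypo) {
    std::string out;
    for (std::size_t i = 0; i < hypo->phrase.size(); ++i) {
        if (i > 0) out += " ";
        out += hypo->phrase[i];
    }
    return out;
}

// Follows the prev links back to the start, then emits words left to right.
std::string full_translation(const MockHypothesis* hypo) {
    std::vector<const MockHypothesis*> chain;
    for (const MockHypothesis* h = hypo; h != nullptr; h = h->prev)
        chain.push_back(h);

    std::string out;
    for (std::size_t k = chain.size(); k-- > 0; ) {
        const std::vector<std::string>& words = chain[k]->phrase;
        for (std::size_t i = 0; i < words.size(); ++i) {
            if (!out.empty()) out += " ";
            out += words[i];
        }
    }
    return out;
}
```

With a chain of an empty start node, then "we are", then "still chemists", last_phrase_only() on the final node gives only "still chemists", while full_translation() gives the whole sentence. The iterative walk does the same job as the recursion above but inserts exactly one space between words.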

cheers,

Sylvain

On 27/04/12 13:11, Sylvain Raybaud wrote:
 Hi Barry
 
 By adding
 cerr << "[S2TT] GOT TRANSLATION: " << *hypo << endl;
 
 I was able to determine that the translations that are actually generated
 look reasonable. The problem therefore lies in how I extract them from the
 hypo object. I think I'll be able to find the problem. I'll let the
 list know.
 
 thanks for the help!
 
 cheers,
 
 Sylvain
 
 On 26/04/12 17:54, Barry Haddow wrote:
 Hi Sylvain

 I think ProcessSentence() is the right method to call. If you look at moses
 server then you'll see a less cluttered example of how to use the Moses API.
 It may be that your moses_get_hyp() is not back-tracking through the
 hypothesis correctly.

 Note that you are calling UntransformScore(), which probably explains your
 odd translation score. It doesn't make much sense to do this, as you won't
 get a probability (it's not normalised). It is unusual, though, that you
 appear to have a positive translation score (in log space).
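
To make the remark about the score concrete, here is a small sketch. It assumes that UntransformScore() simply exponentiates a natural-log score; that is an assumption about its behaviour, not something checked against the Moses source here. The helper names untransform and transform are hypothetical.

```cpp
#include <cmath>

// Assumption: the decoder's model score is a weighted sum of natural-log
// feature scores, and "untransforming" it means taking exp().
inline double untransform(double log_score) { return std::exp(log_score); }

// Inverse mapping: recover the log-space score from an exponentiated value.
inline double transform(double linear_score) { return std::log(linear_score); }
```

Under that assumption, the score of 903011968.00 quoted further down the thread corresponds to a log-space score of about +20.6 (transform(903011968.0)). A typical negative model score such as -20 would instead untransform to roughly 2e-9, which is why a huge positive value stands out.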

 If you increase the verbosity of moses (to 2 or 3) you'll get a better idea
 of what it is doing, and you can see whether it really is producing "of" as
 the translation, and why.

 cheers - Barry

Re: [Moses-support] decoding a confusion network using Moses' API

2012-04-26 Thread Sylvain Raybaud
Wild guess here: in TranslationTask::Run, I see there are many
alternatives for processing the sentence, like doLatticeMBR etc., not
just running Manager::ProcessSentence().
Maybe one of these alternatives must be run for processing confusion
networks?

cheers

Sylvain


On 26/04/12 15:53, Sylvain Raybaud wrote:
 Hi Barry
 
   Thanks for the tip, that sounds likely indeed. I'll try it again, but
 last time I ran the software through valgrind, I got so many errors in
 external libs that I just gave up.
 
 In the meantime, here is the complete function that handles the
 decoding, in case someone sees something obviously wrong in here...
 
 static void moses_translate_phonemes(manager_data_t * pool,
                                      translation_pair_t * pair) {
     debug("starting");
 
     const TranslationSystem& system =
         StaticData::Instance().GetTranslationSystem(TranslationSystem::DEFAULT);
     /* there is only one translation system for now */
     const StaticData& staticData = StaticData::Instance();
     const vector<FactorType>& inputFactorOrder =
         staticData.GetInputFactorOrder();
 
     MyConfusionNet * cn =
         phonemes_to_cn(pool->mp_engine->phonemes_cm, pair->source->phonemes,
                        pool->mp_config->cn_width, pool->mp_config->cn_thresh,
                        inputFactorOrder);
 
     Manager * manager = new Manager(*cn, staticData.GetSearchAlgorithm(),
                                     &system);
     manager->ProcessSentence();
     const Hypothesis* hypo = manager->GetBestHypothesis();
 
     /* copy the translation into a malloc'ed C string for the caller */
     string hyp = moses_get_hyp(hypo);
     char * hyp_ret = (char*)malloc((strlen(hyp.c_str())+1)*sizeof(char));
     strcpy(hyp_ret, hyp.c_str());
 
     pair->translation_score = UntransformScore(hypo->GetScore());
     translation_pair_set_target(pair, hyp_ret, NULL);
 
     delete manager;
     delete cn;
 
 }
 
 cheers,
 
 Sylvain
 
 On 26/04/12 13:49, Barry Haddow wrote:
 Hi Sylvain

 I'm not familiar with this part of the code, but the strange score suggests
 that there's some uninitialised memory. You could try running through
 valgrind and it might give some clues.

 cheers - Barry

 On Thursday 26 Apr 2012 12:24:11 Sylvain Raybaud wrote:
 Hi all

   I'm using the Moses API for decoding a confusion network. The CN is
 created from the output of an ASR engine and a confusion matrix. More
 precisely (even though it's probably irrelevant to my problem), the ASR
 engine provides a string of phonemes (1-best) and the confusion matrix
 provides alternatives for each phoneme (the idea was described in Jiang
 et al., _Phonetic representation based speech translation_, MT Summit
 XIII, 2011).

 When the CN is dumped into a file and I use
 moses -f moses.phonemes.cn.ini < CN
 to decode it, everything is fine.
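
For readers who want to reproduce the file-based run, here is a rough sketch of dumping a confusion network to text. It assumes the plain-text layout in which each line is one column of alternating word and probability tokens and an empty line ends the network; that layout should be checked against the Moses documentation for the version in use. The Column typedef and dump_cn() are hypothetical and use plain strings rather than Moses Word objects.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical column type: a list of (word, probability) alternatives.
typedef std::vector<std::pair<std::string, float> > Column;

// Writes one column per line as "word prob word prob ...", then an empty
// line to mark the end of the network (assumed file layout, see above).
std::string dump_cn(const std::vector<Column>& cn) {
    std::ostringstream out;
    for (std::size_t i = 0; i < cn.size(); ++i) {
        for (std::size_t j = 0; j < cn[i].size(); ++j) {
            if (j > 0) out << ' ';
            out << cn[i][j].first << ' ' << cn[i][j].second;
        }
        out << '\n';
    }
    out << '\n';
    return out.str();
}
```

Comparing such a dump with what the API-built CN prints in verbose mode is one way to confirm that both runs really decode the same network.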

 But when I use the Moses API (loading the same configuration file), I get
 incomplete translations, like:

 ASR output (French): nous font sont toujours chimistes plume
 rassembleront ch je trouve que le office de ce tout de suite
 Phonetic representation: n u f on s on t t u ge u r ch i m i s t z p l
 y m r a s an b l swa r on ch ge swa t r u v k swa l swa oh f i s swa d
 swa s swa t u d s h i t
 Translation: of
 score: 903011968.00

 Note that the transcription is poor (I haven't really tuned the ASR
 engine), but still, the translation ought to be more than just "of".
 Sometimes it's several words; I guess it's a phrase in the phrase table.
 The word generally seems to be the translation of a word in the source
 sentence.
 When I use moses on the command line to translate either the 1-best or
 the CN, I get a reasonable translation. When I use the API to translate
 the 1-best phonetic representation, I also get a reasonable translation.
 I think the CN object is created correctly because moses loads it and
 prints it prior to decoding (this is normal verbose behavior). I also
 tried to create a PCN object, and got exactly the same results. So I
 guess the problem is either how I tell moses to decode it or how I
 extract the result from the Hypothesis object. But I'm clueless about
 what the problem is here, since the code is working when I just
 translate a string. The translation score seems ridiculously high too.
 I give the corresponding code below.

 Decoding and hypothesis extraction:
 ***
 [...]
 Manager * manager = new Manager(*cn, staticData.GetSearchAlgorithm(),
                                 &system);
 manager->ProcessSentence();
 const Hypothesis* hypo = manager->GetBestHypothesis();
 string hyp = moses_get_hyp(hypo);
 [...]
 pair->translation_score = UntransformScore(hypo->GetScore());
 [...]

 string moses_get_hyp(const Hypothesis* hypo) {
     return hypo->GetTargetPhraseStringRep();
 }


 Creation of the CN:
 ***

 /** new class derived from ConfusionNet, with a new method for directly
     creating a CN */
 class MyConfusionNet : public ConfusionNet {
 public:
     void addCol(Column);
 };

 void MyConfusionNet::addCol(Column col) {
     data.push_back(col);
 }
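
As an illustration of how one column might be filled before being passed to addCol(), here is a self-contained sketch. The meaning given to the width and threshold settings (keep at most `width` alternatives, each with probability at least `thresh`) is an assumption, and ConfusionMatrix, Alt and make_column() are hypothetical names using plain strings rather than the thread's confusion_matrix_t or Moses Word objects.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, float> Alt;               // (phoneme, probability)
typedef std::vector<Alt> Column;                         // one CN column
typedef std::map<std::string, Column> ConfusionMatrix;   // hypothetical layout

static bool by_prob_desc(const Alt& a, const Alt& b) { return a.second > b.second; }

// Builds the column for one recognised phoneme: the most probable
// alternatives first, cut off by count (width) and by probability (thresh).
// Probabilities are copied as-is; no renormalisation is attempted.
Column make_column(const ConfusionMatrix& cm, const std::string& ph,
                   std::size_t width, float thresh) {
    Column col;
    ConfusionMatrix::const_iterator it = cm.find(ph);
    if (it != cm.end()) {
        Column alts = it->second;
        std::stable_sort(alts.begin(), alts.end(), by_prob_desc);
        for (std::size_t i = 0; i < alts.size() && col.size() < width; ++i) {
            if (alts[i].second < thresh) break;  // sorted, so the rest are lower
            col.push_back(alts[i]);
        }
    }
    // Fall back to the recognised phoneme itself if nothing survived pruning.
    if (col.empty()) col.push_back(Alt(ph, 1.0f));
    return col;
}
```

Each resulting column would then be appended in order, one per phoneme of the 1-best string, so the network has exactly as many columns as the input has phonemes.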

 /** create a column of the CN */
 static MyConfusionNet::Column create_phoneme_col(confusion_matrix_t *
 cm, const char * ph, int