Thank you, Hieu.

The corpus is UTF-8, but there is a double space in this line. Are double spaces counted as a word? Should I remove double spaces from the lines manually to get the correct sentence length?
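A minimal sketch for checking whether double spaces could inflate the token count, and for collapsing them without editing lines by hand. It assumes one sentence per line. The function names are hypothetical, and this does not reproduce GIZA++'s own tokenization; it only shows how a split on single spaces differs from a split on whitespace runs.

```python
import re


def normalize_spaces(line: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", line).strip()


def count_tokens_single_space(line: str) -> int:
    """Count tokens by splitting on a single space.

    A double space yields an empty string between the two spaces,
    and that empty string is counted as a token here.
    """
    return len(line.rstrip("\n").split(" "))


def count_tokens_whitespace(line: str) -> int:
    """Count tokens by splitting on runs of whitespace (no empty tokens)."""
    return len(line.split())


if __name__ == "__main__":
    sample = "this line  has a double space"  # hypothetical example sentence
    print(count_tokens_single_space(sample))
    print(count_tokens_whitespace(sample))
    print(normalize_spaces(sample))
```

If the two counts differ for a line, that line contains repeated or leading/trailing spaces, and running `normalize_spaces` over both sides of the corpus before alignment would remove that source of mismatch.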
On Tue, Jan 21, 2014 at 4:12 AM, Hieu Hoang <hieuho...@gmail.com> wrote:

> On 20/01/2014 13:45, amir haghighi wrote:
>
> > Hello
> >
> > I've some questions about the giza word alignment.
> >
> > 1-where is the final alignment file? Is it the aligned.1.grow.... in the
> > model folder?
>
> yes.
>
> > 2-do indexes of the words of both target and source sentences start from
> > 0?
>
> yes
>
> > 3- how does giza calculate the length of a sentence?
>
> the number of words
>
> > I have a sentence with 11 tokens that are separated with space, but in
> > the alignment file it length is 13.
>
> strange. Are you sure your corpus file is encoded as UTF8? Are there
> double spaces in the line?
>
> > Regards
> > Amir
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support