Thank you, Hieu,

The corpus is UTF-8, but there is a double space in this line. Are double
spaces counted as a word?
Should I remove the double spaces from the lines manually to get the correct
sentence length?
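It is not confirmed in this thread whether GIZA++ itself counts the empty string between two spaces as a token. The sketch below only shows how a splitter that breaks on every single space character would over-count, and one way to collapse repeated whitespace before training. It assumes a UTF-8 plain-text corpus with one sentence per line. The function names are hypothetical, not part of GIZA++ or Moses.

```python
def count_tokens_naive(line: str) -> int:
    """Count tokens by splitting on each single space character.

    Consecutive spaces produce empty strings, so "a  b" yields 3 tokens.
    """
    return len(line.rstrip("\n").split(" "))


def count_tokens(line: str) -> int:
    """Count whitespace-separated tokens; runs of whitespace act as one separator."""
    return len(line.split())


def normalize_whitespace(line: str) -> str:
    """Collapse every run of whitespace to one space and trim both ends."""
    return " ".join(line.split())


def clean_corpus(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: write a copy of a UTF-8 corpus with normalized spacing.

    Keeps one output line per input line (empty lines stay empty), so a
    parallel source/target pair stays line-aligned.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(normalize_whitespace(line) + "\n")
```

For example, `count_tokens_naive("w1 w2  w3")` returns 4 while `count_tokens("w1 w2  w3")` returns 3. Running the same cleanup over both sides of the parallel corpus keeps the line counts identical.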



On Tue, Jan 21, 2014 at 4:12 AM, Hieu Hoang <hieuho...@gmail.com> wrote:

>
> On 20/01/2014 13:45, amir haghighi wrote:
>
>   Hello
>
>  I have some questions about GIZA++ word alignment.
>
>  1- Where is the final alignment file? Is it the aligned.1.grow.... in the
> model folder?
>
> Yes.
>
>
>  2- Do the word indexes in both the target and source sentences start from
> 0?
>
> Yes.
>
>
>  3- How does GIZA++ calculate the length of a sentence?
>
> The number of words.
>
>  I have a sentence with 11 tokens separated by spaces, but in
> the alignment file its length is 13.
>
> Strange. Are you sure your corpus file is encoded as UTF-8? Are there
> double spaces in the line?
>
>
>  Regards
>  Amir
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
