Hi,

On Mon, Sep 26, 2011 at 5:40 PM, marco turchi <marco.tur...@gmail.com> wrote:
> corpus-coverage-summary and ttable-coverage-summary:
> what does each column represent?
- n-gram order
- number of occurrences in corpus/t-table
- number of distinct phrases in the test set with this number of occurrences ("type")
- total number of phrases in the test set with this number of occurrences ("token")

For the low occurrence counts, this is reported at the top of the web page.

> ttable-coverage-by-phrase:
> I suppose that the second column is the number of source phrases in the tt
> table where that particular phrase appears, but what is it the third column?
> is the translation entropy?

Yes, translation entropy based on normalized forward phrase translation probability.

> input-annotation:
> which information is reported after each sentence?

For each span over the input sentence:
- span range
- count in corpus
- count in ttable (number of distinct translations)
- translation table entropy

This is the basis of the colorful visualization over the input sentence on the web page.

-phi

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
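P.S. In case it helps to see what "translation entropy based on normalized forward phrase translation probability" means in practice, here is a rough sketch. It is not the code of the analysis script: the function name translation_entropy and the input format (a plain list of forward probabilities p(e|f) for one source phrase) are made up for illustration, and the log base (2 here) is an assumption.

```python
import math


def translation_entropy(forward_probs):
    """Entropy of the translation distribution of one source phrase.

    forward_probs: forward translation probabilities p(e|f) of all
    translation options listed for the phrase (hypothetical input format).
    The values are renormalized to sum to 1 before computing the entropy.
    Log base 2 is an assumption, not taken from the Moses scripts.
    """
    total = sum(forward_probs)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for p in forward_probs:
        if p > 0:
            q = p / total  # normalized forward probability
            entropy -= q * math.log2(q)
    return entropy
```

With this definition, a phrase with a single translation option has entropy 0, and a phrase with two equally likely options has entropy 1 bit. Higher values mean the table is less certain about how to translate the phrase.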