Hi, Saurabh (srbhr) here. I have completed a coding challenge relevant to the Automatic Post-Editing task. In short, the challenge compares the accuracy of Apertium's translation against the post-edited version by visualizing the difference between the normalized word vectors of each word in a sentence. The same comparison can be used to find translation errors with a suitable similarity or distance measure such as cosine similarity or Word Mover's Distance; once those errors/mistranslations are found, the results can feed into building the relevant dictionaries (monolingual/bilingual) to improve the translation. (A similar comparison, Google Translate vs. human post-edited text, is also done, and the code is in the repository.)
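As a rough sketch of the comparison step described above (the embedding table, sentences, and helper names below are hypothetical stand-ins, not the repository's actual code; real GloVe vectors would be loaded from a pretrained file):

```python
import numpy as np

def sentence_vector(sentence, embeddings, dim=50):
    """Average the word vectors of a sentence; zeros for out-of-vocabulary words."""
    vecs = [embeddings.get(w, np.zeros(dim)) for w in sentence.lower().split()]
    return np.mean(vecs, axis=0)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Toy random embedding table standing in for pretrained GloVe vectors.
rng = np.random.default_rng(42)
vocab = ["the", "cat", "sat", "on", "mat", "a", "feline", "rested"]
emb = {w: rng.normal(size=50) for w in vocab}

mt = "the cat sat on the mat"        # e.g. machine-translated output
pe = "a feline rested on the mat"    # e.g. post-edited version
score = cosine_similarity(sentence_vector(mt, emb), sentence_vector(pe, emb))
print(f"cosine similarity: {score:.3f}")
```

A low score between a machine-translated sentence and its post-edited counterpart would flag it as a candidate mistranslation worth inspecting.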
This approach doesn't use the Levenshtein distance algorithm; instead it uses word embeddings (GloVe here, though BERT could be used as well) together with t-SNE (t-Distributed Stochastic Neighbor Embedding) and Linear Discriminant Analysis (LDA) to normalize the vectors and compute the distance. Please have a look at the code in the following GitHub repository; I'm open to feedback on it. https://github.com/srbhr/Test-Edits

-- Saurabh Rai
IRC nick: srbhr
New Delhi, India
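For the visualization side, a minimal t-SNE sketch with scikit-learn might look like the following (the random matrix stands in for real GloVe word vectors, which are typically 50-300 dimensional; word list and parameters are illustrative assumptions):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Toy stand-ins for pretrained GloVe word vectors (one 50-dim row per word).
words = ["cat", "dog", "car", "truck", "apple", "pear", "red", "blue"]
vecs = rng.normal(size=(len(words), 50))

# t-SNE projects the high-dimensional vectors down to 2-D for plotting.
# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=3, init="random", random_state=0)
coords = tsne.fit_transform(vecs)
print(coords.shape)  # one (x, y) point per word, ready to scatter-plot
```

Plotting the machine-translated and post-edited word vectors side by side in this 2-D space makes the divergent words visually apparent.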
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff