Hi, Saurabh (srbhr) here. I have completed a coding challenge relevant to the Automatic Post-Editing task. In short, the challenge compares the accuracy of Apertium's translation against the post-edited version by visualizing the difference between the normalized word vectors of each word in a sentence. The same comparison can be used to find translation errors with a suitable similarity or distance measure such as cosine similarity or Word Mover's Distance; once those errors/mistranslations are found, the results can feed into building the relevant dictionaries (monolingual/bilingual) to improve the translation. (A similar comparison, Google Translate vs. human post-edited text, is also done, and the code is in the repository.)
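As a rough sketch of the comparison step described above (the embedding table, sentences, and helper names below are hypothetical stand-ins, not the repository's actual code; real GloVe vectors would be loaded from a pretrained file):

```python
import numpy as np

def sentence_vector(sentence, embeddings, dim=50):
    """Average the word vectors of a sentence; zeros for out-of-vocabulary words."""
    vecs = [embeddings.get(w, np.zeros(dim)) for w in sentence.lower().split()]
    return np.mean(vecs, axis=0)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Toy random embedding table standing in for pretrained GloVe vectors.
rng = np.random.default_rng(42)
vocab = ["the", "cat", "sat", "on", "mat", "a", "feline", "rested"]
emb = {w: rng.normal(size=50) for w in vocab}

mt = "the cat sat on the mat"        # e.g. machine-translated output
pe = "a feline rested on the mat"    # e.g. post-edited version
score = cosine_similarity(sentence_vector(mt, emb), sentence_vector(pe, emb))
print(f"cosine similarity: {score:.3f}")
```

A low score between a machine-translated sentence and its post-edited counterpart would flag it as a candidate mistranslation worth inspecting.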
This approach doesn't use the Levenshtein distance algorithm; instead it uses word embeddings (GloVe here, though BERT could be used as well) together with t-SNE (t-Distributed Stochastic Neighbor Embedding) and Linear Discriminant Analysis (LDA) to normalize the vectors and compute the distance. Please have a look at the code in the following GitHub repository; I'm open to feedback on it. https://github.com/srbhr/Test-Edits

-- Saurabh Rai
IRC nick: srbhr
New Delhi, India
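For the visualization side, a minimal t-SNE sketch with scikit-learn might look like the following (the random matrix stands in for real GloVe word vectors, which are typically 50-300 dimensional; word list and parameters are illustrative assumptions):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Toy stand-ins for pretrained GloVe word vectors (one 50-dim row per word).
words = ["cat", "dog", "car", "truck", "apple", "pear", "red", "blue"]
vecs = rng.normal(size=(len(words), 50))

# t-SNE projects the high-dimensional vectors down to 2-D for plotting.
# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=3, init="random", random_state=0)
coords = tsne.fit_transform(vecs)
print(coords.shape)  # one (x, y) point per word, ready to scatter-plot
```

Plotting the machine-translated and post-edited word vectors side by side in this 2-D space makes the divergent words visually apparent.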
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff