Message from Tanmai Khanna <khanna.tan...@gmail.com> on Wed, 9 Sep 2020 at 11:34:
> Hey guys,
>
> I'm writing a system demonstration to be submitted at LowResMT 2020 about
> the recent project that was done as part of GSoC, titled "Translating the
> internet into low resource languages with Apertium" (Accepting snazzier
> title suggestions).
>
> As part of this demonstration, I want to show some real world examples of
> how the new system of markup handling will help the translation of webpages
> and formatted documents - odt, pptx, rtx, etc. To show this effectively, I
> need to choose 3-4 released language pairs that are sufficiently
> syntactically divergent that they show the effect of markup reordering in
> the translation output. As far as I know, spa-cat is one of our most mature
> pairs, however I'm not sure how syntactically divergent it is. If it is,
> then I'm happy to be corrected. If your language pair has had issues with
> webpage translation and those issues are now solved (ish), then some
> examples would be really helpful.
>

Spanish and Catalan are very similar in terms of syntax. We could definitely try to find examples where they diverge the most, but those examples would need to be completely synthetic.

The new markup handling does help in other areas, though: in formats where inline tags are common (like ODT), the previous formatter/deformatter split words wherever tags appeared, so translation of those documents has improved quite a lot.

--
< Xavi Ivars >
< http://xavi.ivars.me >
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff