Hello, I have a few ideas about the "Extracting knowledge from Apertium's
post-edition logs to improve translation" project.
- As my first idea for a possible solution, it would not be a problem to search
all the different logs containing a correctly corrected word, using the
File::Find module in Perl, which does a depth-first traversal of the directory
tree, but it won't really be an "on-the-fly" operation. (Yet it is helpful. It
could also be done with a breadth-first search implemented in a custom Perl
script.)
- To find all the matches for a corrected word, the original form of the word
would be used, and every instance of the corrected word would be recorded,
let's say in another file. (Whether this file is maintained on a daily basis
or limited to the most frequently corrected words is open for further
discussion.) This would be done with regular expressions, using the
properties of the word class.
- And last but not least, the word stored in the graphical translator would be
chosen from the most frequently corrected words.
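To make the three steps above more concrete, here is a rough sketch in Python
(os.walk plays the role that File::Find would play in Perl). It is only an
illustration under assumptions: the log format (one "original<TAB>corrected"
pair per line), the ".log" extension, and the function names are all
hypothetical, since the real post-edition log format is not fixed yet.

```python
import os
import re
from collections import Counter

# Hypothetical log format: one "original<TAB>corrected" pair per line.
PAIR_RE = re.compile(r"^(?P<orig>\S+)\t(?P<corr>\S+)$")


def collect_corrections(log_root, word):
    """Walk log_root recursively and count the corrections logged for `word`."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(log_root):
        for name in filenames:
            if not name.endswith(".log"):  # assumed extension
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    match = PAIR_RE.match(line.rstrip("\n"))
                    if match and match.group("orig") == word:
                        counts[match.group("corr")] += 1
    return counts


def best_correction(log_root, word):
    """Return the most frequently logged correction for `word`, or None."""
    counts = collect_corrections(log_root, word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Calling best_correction("logs", "hause") would scan every ".log" file under
"logs" and return the correction seen most often for "hause". Scanning all
files on each call is what makes this an offline step rather than an
on-the-fly one; the counts could instead be written to a separate file and
refreshed periodically.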
Should I include a specific example with logs I create myself? Please let me
know.
Thank you in advance.
Greetings,
Anastasija Efremovska