Re: [Apertium-stuff] Extracting knowledge from Apertium's post-edition logs

Luis Villarejo Wed, 04 Apr 2012 17:33:15 -0700

Hi Anastasija,

thank you for your interesting comments on the task.

On 3 April 2012 20:56, Anastasija Efremovska <[email protected]> wrote:

>
> Hello, I have a few ideas about the "Extracting knowledge from Apertium's
> post-edition logs to improve translation" project.
>
> - As my first idea for a possible solution, it would not be a problem to
> search all the different logs containing a correctly corrected word using
> the File::Find module in Perl, which does a depth-first traversal of the
> directory tree, but it won't really be an "on-the-fly" operation. (Yet it
> is helpful. It can also be done with a breadth-first search algorithm
> implemented in a custom-made Perl script.)
>

We can think of two scenarios: a live extraction or a deferred one. The
first would be performed on a single document, the log generated by the
translation in progress. The second would be performed on a compilation
of logs. This compilation could be built in many ways; one possibility
is to store all post-edition logs in a central repository that could
be mined. The first approach would require integration with the
AWI interface; the second probably wouldn't.
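
As a rough sketch of the deferred scenario: assuming the post-edition logs
are plain-text files with a `.log` extension collected under one directory
(both are assumptions, since the log format has not been fixed in this
thread), the repository could be scanned for a given word as below. The
helper name `find_logs_with_word` is hypothetical, and Python's `os.walk`
stands in here for the role File::Find would play in Perl.

```python
import os
import re


def find_logs_with_word(root, word, extension=".log"):
    """Return sorted paths of log files under `root` that contain `word`.

    Assumption: logs are UTF-8 text files whose names end in `extension`.
    """
    # Match the word as a whole token, ignoring case.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    matches = []
    # os.walk yields each directory before descending into its subdirectories.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extension):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                # Stop reading a file as soon as one line matches.
                if any(pattern.search(line) for line in handle):
                    matches.append(path)
    return sorted(matches)
```

A live extraction would instead run the same matching on the single log
produced by the current translation, so no directory walk would be needed.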

>
> - For finding all the matches with a corrected word, the original form of
> the word would be used and every instance of the corrected word would be
> noted, let's say in another file. (Whether this file would be maintained on
> a daily basis, or only for the most frequently corrected words, is open
> for further discussion.) This would be done with regular expressions, using
> the properties of the word class.
>

Here we could probably talk about incorporating a stemming process to
maximize findings in the log files.
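
To make the stemming idea concrete, here is a minimal sketch. It assumes a
hypothetical log format with one correction per line, written as
`original<TAB>corrected`; the real post-edition logs may look quite
different. `crude_stem` is only a stand-in that strips a few English
suffixes, not a real stemmer, and both function names are made up for
illustration.

```python
from collections import Counter

# Hypothetical suffix list; a proper stemmer for the language would replace it.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word):
    """Lowercase `word` and strip at most one suffix, keeping 3+ characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_corrections(lines):
    """Count (stemmed original, stemmed correction) pairs.

    Assumes each line is 'original<TAB>corrected'; other lines are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue
        original, corrected = (crude_stem(part.strip()) for part in parts)
        # Ignore empty fields and pairs whose stems are identical.
        if original and corrected and original != corrected:
            counts[(original, corrected)] += 1
    return counts
```

With stemming, a correction logged for "cars" and another for "car" are
counted as the same pair, and `count_corrections(lines).most_common(10)`
would list the ten most frequent corrections.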

>
> - And last but not least, the word stored in the graphical translator
> would be chosen from the most frequently corrected words.
>

Could you tell us more about this, please?

A general thing you should think about is how to make the output of our
tool most useful for Apertium. I mean, how do you imagine the final
output of the tool so that it can be really useful for the community?

> Should I put a specific example with created logs? Please let me know.
>

Sure, why not? It will help us discuss it further.

Best,
Luis

>
> Thank you in advance.
> Greetings,
>  Anastasija Efremovska
>
>
> ------------------------------------------------------------------------------
> Better than sec? Nothing is better than sec when it comes to
> monitoring Big Data applications. Try Boundary one-second
> resolution app monitoring today. Free.
> http://p.sf.net/sfu/Boundary-dev2dev
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
>
