I am working on texts extracted from DocBook (XML) files, where custom HTML
entities are often used in place of proper nouns, mainly for the names of
software applications.

I would like the en-gl pair to be able to handle them, in order to fix the
following translation error:
Source string: The &dolphin; Handbook
Expected translation: O manual de &dolphin;
Actual translation: O &dolphin; Manual

So I would like Apertium to interpret custom HTML entities (I can write a
regular expression to capture any HTML entity, and another one to tell
standard HTML entities apart from custom HTML entities) as proper nouns.
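
For illustration, here is a minimal Python sketch of the two checks I have in
mind. It is only an assumption about how the detection could look, not
something Apertium provides: the name pattern is a rough approximation of XML
entity names, "standard" is taken to mean the HTML 4 named entities listed in
Python's html.entities.name2codepoint, and custom_entities is a hypothetical
helper name.

```python
import re
from html.entities import name2codepoint

# Named entity reference such as &dolphin; or &amp;.
# Numeric references like &#160; are deliberately not matched.
ENTITY_RE = re.compile(r"&([A-Za-z_][A-Za-z0-9._-]*);")


def custom_entities(text):
    """Return the names of entities that are not standard HTML entities.

    Assumption: "standard" means the HTML 4 named entities known to
    html.entities.name2codepoint.
    """
    return [
        match.group(1)
        for match in ENTITY_RE.finditer(text)
        if match.group(1) not in name2codepoint
    ]


if __name__ == "__main__":
    print(custom_entities("The &dolphin; Handbook &amp; more"))
```

Only the names this returns would be candidates for treatment as proper nouns;
standard entities such as &amp; would be left to the existing format handling.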

But I do not know where to start. The following questions come to mind:
- Compartmentalization:
    - Should I create a separate mode for KDE documentation (e.g.
      en-gl-kdedoc) to implement this non-standard feature, since it is
      specific to this use case?
    - Should I apply the improvement to the mainstream en-gl mode, since
      the improvement can fix broken translations and it can only break
      translations that are already broken?
- Implementation:
    - Should I implement an alternative to html-noenv? (I do not think so,
      because the entity is not really part of the format; it is part of
      the sentence.)
    - Should I use apertium-pn-recogniser, possibly extending it first to
      support regular expressions if it does not support them? (It is
      described at http://goo.gl/ct01K1, but I am wary of doing so: it
      seems to have been dead since 2009.)
    - Should I do something else?

Any feedback is more than welcome.