Hello everyone,

The other day I had an idea that might help solve the problem with HTML translation caused by the reordering of words and phrases in transfer. I do not know much about the transfer engine's internals, but conceptually this might work:
A deformatter would separate HTML tags from text and add an origin descriptor pseudo-tag to the LEMMA of each word, indicating, for example, the parent HTML tag.

Now consider the following semantic changes:

1. Superblanks encode actual blanks only (whitespace, tab, newline, etc., and any combination thereof).
2. The lem element of a lexical unit is a pair consisting of the lemma string and an origin descriptor (e.g. a reference to the parent DOM element in an HTML document). The origin descriptor is invisible in transfer and therefore cannot be tampered with.
3. A variable is a pair consisting of a string and an origin descriptor.
4. Assignment operations (e.g. <let>) involving a lem transparently copy the origin descriptor, unless this behavior is altered by an optional argument along the lines of
   <let origin="copy">...</let>,
   <let origin="keep-old">...</let>,
   <let origin="reset">...</let>

The reformatter could then make sure that all words are within the right HTML tags and might even ensure validity using the format's DTD. This way, HTML translation could effectively be implemented without changing the source code of existing language pairs.

Let me know what you think.

Benedikt

_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
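
To make points 2 to 4 of the proposal more concrete, here is a minimal Python sketch of a string paired with an origin descriptor and of an assignment that carries the origin along. Everything in it is an assumption made for illustration: the names Tagged, let and reformat_words are hypothetical, the origin is simplified to a bare tag name, and the reading of the three origin modes ("copy" takes the source's origin, "keep-old" keeps the target's, "reset" clears it) is only one plausible interpretation. It is not how the transfer engine works today.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tagged:
    """A string paired with an origin descriptor (hypothetical representation)."""
    text: str
    origin: Optional[str] = None  # assumed here: the name of the parent HTML tag


def let(target: Tagged, source: Tagged, origin: str = "copy") -> Tagged:
    """Return the new value of `target` after assigning `source` to it.

    The three modes are an assumed reading of the proposed `origin` argument.
    """
    if origin == "copy":
        return Tagged(source.text, source.origin)
    if origin == "keep-old":
        return Tagged(source.text, target.origin)
    if origin == "reset":
        return Tagged(source.text, None)
    raise ValueError(f"unknown origin mode: {origin!r}")


def reformat_words(words: List[Tagged]) -> str:
    """Re-wrap each word in the tag named by its origin, if it has one."""
    parts = []
    for word in words:
        if word.origin:
            parts.append(f"<{word.origin}>{word.text}</{word.origin}>")
        else:
            parts.append(word.text)
    return " ".join(parts)


# "<b>red</b> house" with the adjective moved after the noun in transfer:
# the bold markup follows the translated adjective.
red = Tagged("red", "b")
house = Tagged("house")
roja = let(red, Tagged("roja", red.origin))   # default mode: copy
casa = let(house, Tagged("casa", house.origin))
print(reformat_words([casa, roja]))  # prints: casa <b>roja</b>
```

Because the origin travels with the value rather than with its position, reordering in transfer would not detach a word from its markup in this model. A real reformatter would presumably also merge adjacent words that share an origin into a single element.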
