Hi,

Translating the DOM is a bit funky. Generally, you can probably translate block elements one by one, but you need to preserve the inline elements inside them.

You should mark up the inline elements in the string that you send to the translation engine, so that the translation can change the order of the inline markup.

Something like

You would think the <a href="foo">funkyness</a> would <strong>rule</strong>.

could translate into

<strong>Ruling</strong> would be the <a href="foo">funkyness</a>, you would think.
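
One way to do the marking is with numbered placeholders. What follows is only a rough sketch in Python, not what any particular translation engine expects: the <gN> placeholder syntax and the names mark_inline and restore_inline are made up for illustration. It assumes well-nested tags, does not handle void elements such as <br>, and assumes the engine passes the placeholders through untouched.

```python
import re

# Matches simple opening or closing tags such as <a href="foo"> or </strong>.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^<>]*)>")
# Matches the hypothetical placeholders <g1>, </g1>, <g2>, ...
PLACEHOLDER_RE = re.compile(r"<(/?)g(\d+)>")


def mark_inline(html):
    """Replace inline tags with numbered placeholders.

    Returns the marked string and a table mapping each number to the
    element name and its original opening tag, so that attributes such
    as href survive the round trip.
    """
    table = {}
    stack = []
    counter = 0

    def repl(match):
        nonlocal counter
        closing, name = match.group(1), match.group(2).lower()
        if closing:
            if not stack or stack[-1][1] != name:
                raise ValueError("unbalanced closing tag: " + match.group(0))
            index, _ = stack.pop()
            return f"</g{index}>"
        counter += 1
        stack.append((counter, name))
        table[counter] = (name, match.group(0))
        return f"<g{counter}>"

    marked = TAG_RE.sub(repl, html)
    if stack:
        raise ValueError("unclosed tag: " + stack[-1][1])
    return marked, table


def restore_inline(translated, table):
    """Swap placeholders back to the original tags, in whatever order
    the translation put them."""

    def repl(match):
        name, open_tag = table[int(match.group(2))]
        return f"</{name}>" if match.group(1) else open_tag

    return PLACEHOLDER_RE.sub(repl, translated)
```

With the example above, mark_inline would send "You would think the <g1>funkyness</g1> would <g2>rule</g2>." to the engine, and restore_inline puts the real tags back even if <g2> comes back ahead of <g1>.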

Are you intending to also localize tooltips and the like?

Axel


On 3/3/14, 8:28 PM, Felipe G wrote:
Hi everyone, I'm working on a feature to offer webpage translation in
Firefox. Translation involves, quite unsurprisingly, a lot of DOM and
string manipulation. Since DOM access only happens on the main thread, it
raises the question of how to do it properly without causing jank.

This is the use case that I'm dealing with in bug 971043:

When the user decides to translate a webpage, we want to build a tree that
is a cleaned-up version of the page's DOM tree (to remove nodes that do not
contain any useful content for translation; more details in the bug for the
curious). To do this we must visit all elements and text nodes once and
decide which ones to keep and which ones to throw away.

One idea suggested is to perform the task in chunks to let the event loop
breathe in between. The problem is that the page can dynamically change and
then a tree representation of the page may no longer exist. A possible
solution to that is to only pause the page that is being translated (with,
say, EnterModalState) until we can finish working on it, while letting
other pages and the UI work normally. That sounds like a reasonable option
to me, but I'd like to hear opinions.
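
The chunking idea can be sketched roughly as follows. This is a Python illustration rather than Firefox code: Node is a stand-in for a DOM node, and walk, process_in_chunks and chunk_size are hypothetical names. It assumes the tree does not change between chunks, which is exactly what pausing the page would have to guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for a DOM node (hypothetical, for illustration only)."""
    name: str
    children: list = field(default_factory=list)


def walk(root):
    """Yield nodes in document order using an explicit stack."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(node.children))


def process_in_chunks(root, visit, chunk_size):
    """Visit up to chunk_size nodes, then yield so the caller can return
    to its event loop. Each yielded value is the running node count."""
    count = 0
    for node in walk(root):
        visit(node)
        count += 1
        if count % chunk_size == 0:
            yield count
    if count % chunk_size:
        yield count  # final, partial chunk
```

The caller would resume the generator once per turn of the event loop, so no single turn visits more than chunk_size nodes. The cost of visit for a single node is still unbounded.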

Another option exists if it's possible to make a fast copy of the whole
DOM, and then work on this snapshotted copy, which is not live. Better yet
if we can send this copy with a non-copy move to a Worker thread. But it
raises the question of whether the snapshotting itself would cause jank, and
whether the extra memory usage is worth the trade-off.

Even if we properly chunk the task, it is still bounded by the size of the
strings on the page. To decide if a text node should be kept or thrown away
we need to run a regexp on it, and there's no way to pause that midway
through. And after we have our tree representation, it must be serialized
and encodeURIComponent'ed to be sent to the translation service.
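
To make the last two steps concrete, here is a rough Python sketch. The regexp (keep a text node only if it contains at least one letter) and the JSON serialization are assumptions made for illustration; the actual rules are in the bug. encode_component approximates encodeURIComponent by leaving the same set of characters unescaped.

```python
import json
import re
from urllib.parse import quote

# Hypothetical filter: a text node is worth translating only if it
# contains at least one letter (any script).
HAS_LETTER = re.compile(r"[^\W\d_]")


def keep_text(text):
    """Return True if the text node should be kept for translation."""
    return HAS_LETTER.search(text) is not None


def encode_component(text):
    """Percent-encode text as UTF-8, leaving unescaped the characters
    that encodeURIComponent leaves unescaped."""
    return quote(text, safe="!*'()")


def build_payload(text_nodes):
    """Filter the text nodes, serialize them (JSON is an assumption
    here), and encode the result for use in a request."""
    kept = [text for text in text_nodes if keep_text(text)]
    return encode_component(json.dumps(kept, ensure_ascii=False))
```

Both the regexp search and the encoding run to completion on whatever string they are given, so their cost grows with the length of the text node and cannot be split up by chunking the tree walk.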


_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
