I just replied to this and cc'ed mozilla-editor, since it
concerns the editor code for html copy/paste.
Here's the original for those of you who didn't see it on the
Unix newsgroups.

        ...Akkana

----- Forwarded message from Toastie <[EMAIL PROTECTED]> -----

From: Toastie <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Date: Sun, 23 Sep 2001 01:48:35 +0300
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010920
Subject: HTML in Mozilla for X11 clipboard

Hi,

I'm using both Mozilla and Konqueror for my daily browsing on Linux, and
have recently begun looking into possible interoperability between Linux
desktop applications. As a part of my research, I began looking at the
information various applications export to the clipboard. Mozilla
typically offers HTML contents in the clipboard, which could be
successfully used in rich text editors (HTML or not) as well as in other
office applications (e.g. pasting HTML tables into a spreadsheet).

Currently, Mozilla exports the following "data formats" for a selection
from a web page (in UTF-16 encoding):
text/html - with the contents of the selection
text/_moz_htmlcontext - with the surroundings of the selection (the need
for it is unclear to me)
text/_moz_htmlinfo - with two integers which always seem to be 0,0
(I'd be glad to hear an explanation of those values)

Since then, I implemented exporting of the "text/html" format in KHTML
(Konqueror's HTML engine) and noticed that Mozilla's implementation does
something my simple HTML conversion of a DOM Range doesn't do:
a selection such as "<b>one [two] three</b>" (where "two" is selected)
in Mozilla results in the clipboard containing "<b>two</b>" instead
of just "two". That's a nice feature, especially for word processors
(e.g. Composer) where the person expects to copy the text along with the
formatting.

I assume Mozilla implements this by keeping a list of formatting HTML
tags (<b>, <i>, <font color=...> etc.) and traversing the HTML tree up
from the selection, collecting those tags on the way.
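
A minimal sketch of that guess, in Python rather than Mozilla's own code:
it walks from the element containing the selected text toward the root and
wraps the text in every formatting ancestor it meets. The Element class,
the FORMATTING_TAGS set and wrap_with_formatting() are hypothetical names
made up for illustration; nothing here is taken from Mozilla's source.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical list of "formatting" tags; the list Mozilla really uses
# (if it uses one at all) is an assumption.
FORMATTING_TAGS = {"b", "i", "u", "em", "strong", "font"}


@dataclass
class Element:
    """Tiny stand-in for a DOM element: a tag, raw attributes, a parent."""
    tag: str
    attrs: str = ""                      # e.g. 'color="red"'
    parent: Optional["Element"] = None


def wrap_with_formatting(selected_text: str, container: Optional[Element]) -> str:
    """Wrap selected_text in each formatting ancestor of the selection.

    The innermost ancestor is applied first, so the original nesting
    order of the tags is preserved in the result.
    """
    html = selected_text
    node = container
    while node is not None:
        if node.tag in FORMATTING_TAGS:
            open_tag = f"<{node.tag} {node.attrs}>" if node.attrs else f"<{node.tag}>"
            html = f"{open_tag}{html}</{node.tag}>"
        node = node.parent               # step toward the document root
    return html
```

For the "<b>one [two] three</b>" case, calling wrap_with_formatting("two",
b_element) with b_element being the <b> element gives "<b>two</b>", while
non-formatting ancestors such as <p> or <body> are skipped.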

I want to extend that feature. Instead of adding surrounding formatting
tags to the HTML put on the clipboard, I propose adding a clipboard
format called "text/css", which'll contain a getComputedStyle dump of
the first text node of the selection's DOM range. That way, we could pass
on the actual style of the text, however complicated it might be.
Since Composer uses HTML formatting tags in its HTML, we'll extend
pasting in Composer to convert the CSS to <B>, <I> etc. as much as
possible, and the rest to a <SPAN> tag. This way Composer will gain
the ability to maintain any given style of text from the source page.
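
The paste-side conversion could look roughly like the Python sketch below.
It assumes the "text/css" data has already been parsed into a dict of
property names to values; style_to_html() and the property handling are
hypothetical, and a real getComputedStyle dump would list every property,
so defaults would have to be filtered out first.

```python
from html import escape
from typing import Dict


def style_to_html(text: str, style: Dict[str, str]) -> str:
    """Turn a (pre-filtered) style dict into <B>/<I> tags plus a <SPAN>.

    Properties that map onto a plain formatting tag are consumed;
    whatever is left over ends up in a style attribute on a <SPAN>.
    """
    remaining = dict(style)              # do not modify the caller's dict
    html = text

    if remaining.get("font-weight") in ("bold", "700"):
        del remaining["font-weight"]
        html = f"<B>{html}</B>"

    if remaining.get("font-style") == "italic":
        del remaining["font-style"]
        html = f"<I>{html}</I>"

    if remaining:
        # Sorted only to make the output deterministic.
        css = "; ".join(f"{name}: {value}" for name, value in sorted(remaining.items()))
        html = f'<SPAN style="{escape(css, quote=True)}">{html}</SPAN>'

    return html
```

With {"font-weight": "bold", "letter-spacing": "2px"} the bold part becomes
a <B> tag and only letter-spacing is left for the <SPAN>.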

There are certain things I'm still wondering about though:

1. Changing the clipboard encoding. That is discussed in bug 44496.
Since declaring formats in the X clipboard doesn't imply actual transfer
(or even generation) of the data, we can easily offer multiple encodings
without a performance penalty. UTF-16 makes interoperability a bit
complicated, since it contains NULs and is not the usual encoding
you'd expect to find on a clipboard. I propose that the clipboard
contain:
a) a "text/html;charset=UTF-8" type in UTF-8 encoding.
b) a "text/html" type with all non-Latin1 values encoded as Unicode
entities (&#xNUMBER;). This won't be the preferred format (since it would
end up much larger than UTF-8), but would be the least ambiguous format.
If we still intend to keep UTF-16 on the clipboard, let's, in order to
disambiguate it, call it "text/html;charset=ISO-10646-UCS-2" or
"text/html;charset=UTF-16" and start it with the endianness bytes (FF FE).

2. Base URI for objects. For objects embedded on the page (<IMG>, 
<OBJECT>, <EMBED> etc.) with a relative URI, would we:
a) change their SRC to contain the absolute URI?
b) keep the HTML data in the clipboard as a complete HTML document with 
a DOCTYPE and a <BASE HREF="..."> tag?

3. An additional idea I had is a "text/html;version=3.0" type, which'll 
contain the closest possible approximation of the style by appending 
HTML formatting tags just like current Mozilla does. This format will be
mostly useful for things like GTKHTML, which have limited HTML
support, obviously without CSS.
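
To make the encodings from point 1 concrete, here is a small Python sketch
of the three byte representations. The function names and the TARGETS
table are made up for illustration; the table maps each target name to a
converter that only runs when that target is requested, mirroring the
point that declaring a format doesn't imply generating the data.

```python
import codecs


def as_utf8(html: str) -> bytes:
    """Data for "text/html;charset=UTF-8"."""
    return html.encode("utf-8")


def as_latin1_with_entities(html: str) -> bytes:
    """Data for plain "text/html": Latin-1 bytes, with every character
    outside Latin-1 written as a hexadecimal entity (&#xNUMBER;)."""
    parts = [ch if ord(ch) < 256 else f"&#x{ord(ch):X};" for ch in html]
    return "".join(parts).encode("latin-1")


def as_utf16_with_bom(html: str) -> bytes:
    """Data for "text/html;charset=UTF-16": little-endian UTF-16
    preceded by the FF FE byte order mark."""
    return codecs.BOM_UTF16_LE + html.encode("utf-16-le")


# Hypothetical target table: nothing is converted until a requestor
# actually asks for one of the targets.
TARGETS = {
    "text/html;charset=UTF-8": as_utf8,
    "text/html": as_latin1_with_entities,
    "text/html;charset=UTF-16": as_utf16_with_bom,
}


def render(target: str, html: str) -> bytes:
    """Produce the bytes for one requested target."""
    return TARGETS[target](html)
```

For a selection containing a euro sign, the UTF-8 form needs three bytes
for it, the entity form needs eight (&#x20AC;), and the UTF-16 form needs
two plus the two-byte mark at the start of the data.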

On my side, I'll try to implement this in Mozilla (unless any of you
more familiar with the beast steps forward) and I'll make sure KHTML
works similarly. People in the KOffice team are also interested in
this. I'll try to promote this for inclusion in GTKHTML (Ximian
Evolution's HTML viewer / editor) as well.

Looking forward to your comments and suggestions.


----- End forwarded message -----
