Again, I apologise if this has been asked on this list before.

When the text of a web page includes very long "words" (in my case these are often file pathnames), very awkward line breaks often result. They look even worse if the text is justified ("text-align: justify;" in the CSS).

It would be nice to allow such long "words" to be split at line breaks. Unicode provides a character for this: the zero-width space (U+200B, or as an HTML character reference: "&#8203;").

OK, so in my page I changed all "/" characters occurring in pathnames to "/&#8203;". This indeed greatly improves the appearance of the HTML page. Instead of

   This     is     a      very     long      path       name:
   /etc/this/is/one/very/long/path/name

I get

   This is a very long path name: /etc/this/is/one/very/long/
   path/name

The overall visual impression is now much better.

(This is only an illustration that tries to approximate the effect on justified text; it does not contain actual zero-width spaces. I hope it survives the e-mail transmission.)
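
For reference, the substitution described above could be sketched roughly as follows. This is only an illustration in Python, not how my page is actually generated; the function names add_break_opportunities and to_html are made up for this example.

```python
import html

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

def add_break_opportunities(path: str) -> str:
    """Return the path with a zero-width space after every "/".

    The result is plain text; the zero-width spaces are real characters
    in the string, even though they are invisible when displayed.
    """
    return path.replace("/", "/" + ZWSP)

def to_html(path: str) -> str:
    """Return an HTML fragment with a numeric reference after every "/".

    The path is escaped first, so "&", "<" and ">" in a pathname do not
    turn into markup. None of the escape sequences contain "/", so the
    replacement only touches slashes from the original path.
    """
    return html.escape(path).replace("/", "/&#8203;")

if __name__ == "__main__":
    print(to_html("/etc/this/is/one/very/long/path/name"))
```

Either form gives the browser a permitted break point after each slash; the second keeps the page source readable because the invisible character is written out as a reference.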

However, trouble occurs when users try to copy-and-paste (using the mouse) the pathnames into applications. The pathnames are now riddled with invisible zero-width space characters, and will not be accepted as valid pathnames by applications.
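
To make the failure concrete, here is a small Python sketch of what the receiving application sees. The helper name strip_invisible is hypothetical, and the set of characters it removes (zero-width space and soft hyphen) is just my assumption about which ones matter here. It also shows that the original path can be recovered, but only if the receiving side can be changed to clean up its input, which is usually not the case.

```python
# Characters to delete: ZERO WIDTH SPACE (U+200B) and SOFT HYPHEN (U+00AD).
_INVISIBLE = str.maketrans("", "", "\u200b\u00ad")

def strip_invisible(text: str) -> str:
    """Remove zero-width spaces and soft hyphens from pasted text."""
    return text.translate(_INVISIBLE)

real = "/etc/this/is/one/very/long/path/name"
# What ends up in the clipboard when the page has U+200B after each "/".
pasted = real.replace("/", "/\u200b")

print(pasted == real)                   # False: looks identical, is not
print(len(pasted) - len(real))          # 8: one extra character per "/"
print(strip_invisible(pasted) == real)  # True: cleaning restores the path
```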

So now it seems that I have a choice between two evils. Either ugly web pages, or mysteriously unusable copy-and-paste.

Has this problem been discussed before in UTF-8 forums? I have no idea how the copy-and-paste mechanism in Linux (or indeed anywhere else) works. Would it be possible to specify (in the Unicode specifications) that certain characters (like zero-width space, soft hyphen, maybe others) have to be "uncopyable" or "unpastable", and would it be possible to realise this technically? Or is this simply something that should be called a bug in browsers?

Regards, Jan


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
