Again, I apologise if this has been asked on this list before.
If I make a web page whose text includes very long "words" (in my
case these are often file pathnames), the result is often very
awkward line breaks. They look even worse if the text is
justified ("text-align: justify;" in the CSS).
It would be nice to allow such long "words" to be split at line
breaks. Unicode provides a character for this: the zero-width space
(U+200B, or as an HTML character reference: "&#8203;").
OK, so in my page I changed every "/" character occurring in a
pathname to "/&#8203;". This indeed greatly improves the
appearance of the HTML page. Instead of
This is a very long path name:
/etc/this/is/one/very/long/path/name
I get
This is a very long path name: /etc/this/is/one/very/long/
path/name
The overall visual impression is now much better.
(This is only an illustration, trying to approximate the effect on
justified text, not involving actual zero width spaces; I hope it
survives the e-mail transmission).
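A minimal sketch of that substitution in Python, for illustration
only; the function name add_break_opportunities and its as_entity
parameter are made up for this example:

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

def add_break_opportunities(path: str, as_entity: bool = True) -> str:
    """Insert a zero-width space after every "/" in a pathname.

    With as_entity=True the HTML character reference "&#8203;" is
    emitted (suitable for pasting into HTML source); otherwise the
    raw U+200B character is inserted.
    """
    marker = "&#8203;" if as_entity else ZWSP
    return path.replace("/", "/" + marker)

if __name__ == "__main__":
    # Every "/" (including the leading one) gains a break opportunity.
    print(add_break_opportunities("/etc/this/is/one/very/long/path/name"))
```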
However, trouble occurs when users try to copy-and-paste (using
the mouse) the pathnames into applications. The pathnames are now
riddled with invisible zero-width space characters, and will not
be accepted as valid pathnames by applications.
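To make the failure concrete, here is a small Python sketch of what
the receiving application sees. The helper strip_invisible is
hypothetical: it shows what an application (or the user) would have
to do to the pasted text, not something I know any application to
do.

```python
# Characters that are invisible on screen but survive copy-and-paste.
INVISIBLE = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u00ad",  # SOFT HYPHEN
}

def strip_invisible(text: str) -> str:
    """Return text with the invisible break characters removed."""
    return "".join(ch for ch in text if ch not in INVISIBLE)

if __name__ == "__main__":
    # What ends up in the clipboard after copying from the web page:
    pasted = "/\u200betc/\u200bpasswd"
    print(pasted == "/etc/passwd")                    # False
    print(len(pasted), len("/etc/passwd"))            # 13 11
    print(strip_invisible(pasted) == "/etc/passwd")   # True
```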
So now it seems that I have a choice between two evils. Either
ugly web pages, or mysteriously unusable copy-and-paste.
Has this problem been discussed before in UTF-8 forums? I have no
idea how the copy-and-paste mechanism in Linux (or indeed anywhere
else) works. Would it be possible to specify (in the Unicode
specifications) that certain characters (like zero-width space,
soft hyphen, maybe others) have to be "uncopyable" or
"unpastable", and would it be possible to realise this
technically? Or is this simply something that should be called a
bug in browsers?
Regards, Jan
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/