Re: [docbook-apps] Unicode characters in epub

2012-02-01 Thread Tony Graham
On Tue, January 31, 2012 10:39 pm, Boris Schäling wrote:
...
 1. My book is about C++. Unfortunately C++ is not a word - so e-readers
 seem to break C++ wherever they like. A line could end with C+ or C,
 and the plus sign(s) is on the next line. I turned C++ into
 C&#xfeff;+&#xfeff;+ (which is already crazy as I don't know how often I
 refer to C++ in my book). However this had some unfortunate side effects:
 If &#xfeff; is used in the book title or titles which appear in the table
 of contents, the Sony Reader displays rectangles (not in the body text
 though).

&#xFEFF; has a dual role as Zero Width No-Break Space and as the BOM.

Unicode 3.2 added &#x2060;, WORD JOINER, that is just a word joiner. [1]

The Unicode Standard says that you are supposed to use &#x2060; in new
text, and that applications are supposed to support word joining with
either &#x2060; or &#xFEFF;.

Maybe, just maybe, your EPUB readers will do better with &#x2060; than
they do with &#xFEFF;.
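
As a rough illustration of that swap (not part of the original message), the
following Python sketch rewrites a string so that every U+FEFF after the first
character becomes U+2060. The function name feff_to_word_joiner is made up for
this example, and treating a leading U+FEFF as a BOM that should be left alone
is an assumption.

```python
ZWNBSP_OR_BOM = "\ufeff"   # ZERO WIDTH NO-BREAK SPACE, also the byte order mark
WORD_JOINER = "\u2060"     # WORD JOINER, added in Unicode 3.2


def feff_to_word_joiner(text: str) -> str:
    """Replace U+FEFF used as a joiner with U+2060.

    Assumption: a U+FEFF at the very start of the text is a BOM and is kept.
    """
    if text.startswith(ZWNBSP_OR_BOM):
        return ZWNBSP_OR_BOM + text[1:].replace(ZWNBSP_OR_BOM, WORD_JOINER)
    return text.replace(ZWNBSP_OR_BOM, WORD_JOINER)


# "C++" written with U+FEFF between the characters, as in the question.
converted = feff_to_word_joiner("C\ufeff+\ufeff+ is not a word")
```

Whether a given reader renders U+2060 any better than U+FEFF still has to be
checked on the device itself.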

Regards,


Tony Graham   tgra...@mentea.net
Consultant http://www.mentea.net
Mentea   13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
 --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
XML, XSL-FO and XSLT consulting, training and programming

[1] Page 5 (or 524) of http://www.unicode.org/versions/Unicode6.0.0/ch16.pdf


-
To unsubscribe, e-mail: docbook-apps-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: docbook-apps-h...@lists.oasis-open.org



RE: [docbook-apps] Unicode characters in epub

2012-02-01 Thread Boris Schäling


 -----Original Message-----
 From: Tony Graham [mailto:tgra...@mentea.net]
 Sent: Wednesday, February 1, 2012 14:21
 To: docbook-apps@lists.oasis-open.org
 Subject: Re: [docbook-apps] Unicode characters in epub
 
 [...] 
 The Unicode Standard says that you are supposed to use &#x2060; in new
 text, and that applications are supposed to support word joining with
 either &#x2060; or &#xFEFF;.
 
 Maybe, just maybe, your EPUB readers will do better with &#x2060; than
 they do with &#xFEFF;.

Thanks, I just tried it: Adobe Digital Editions and the Sony Reader show a
rectangle with a 0 inside when &#x2060; is used in the book title or table
of contents. Kindle shows rectangles in the table of contents. I didn't see
any problems on the Kobo Touch. 

Anyway, I find this too risky and will probably not use any special
characters unless I know that without them some parts of the book become
entirely unreadable. 
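
Since the rectangles reported here for &#x2060; were in titles and the table
of contents, one possible middle ground (a sketch only, not something tested
in this thread) is to strip the invisible characters from title strings
before they are written out. The helper name plain_title is hypothetical, and
the set of characters to drop is an assumption based on the three mentioned
in this thread.

```python
# Code points from this thread that some readers drew visibly in titles
# or in the table of contents. Mapping to None makes str.translate drop them.
INVISIBLE_HINTS = {
    0xFEFF: None,  # ZERO WIDTH NO-BREAK SPACE / BOM
    0x2060: None,  # WORD JOINER
    0x00AD: None,  # SOFT HYPHEN
}


def plain_title(title: str) -> str:
    """Return the title with the invisible hint characters removed."""
    return title.translate(INVISIBLE_HINTS)


cleaned = plain_title("C\u2060+\u2060+ Libraries")
```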

Boris 

 [...] 






[docbook-apps] Unicode characters in epub

2012-01-31 Thread Boris Schäling
Hello, 

I successfully generated an epub file with the epub stylesheets and tested
it on various e-readers (or emulators of e-readers). I have some annoying
problems with Unicode characters though and wonder what others do or
recommend. 

1. My book is about C++. Unfortunately C++ is not a word - so e-readers
seem to break C++ wherever they like. A line could end with C+ or C,
and the plus sign(s) is on the next line. I turned C++ into
C&#xfeff;+&#xfeff;+ (which is already crazy as I don't know how often I
refer to C++ in my book). However this had some unfortunate side effects: If
&#xfeff; is used in the book title or titles which appear in the table of
contents, the Sony Reader displays rectangles (not in the body text though).
If I use &#xfeff; somewhere else like in --&#xfeff;option in the body text
(to keep a command line option from being broken after the double minus), the
Sony Reader displays something like --`option (and still breaks after the
double minus). I don't know whether this is only a problem with the Sony
Reader. But when in doubt, I would rather have line breaks than have some
readers show rectangles or other funny characters everywhere. 

2. Some e-readers like the Sony Reader and the Kobo Touch don't break long
words. If you have a book about C++, you can have very long paths to header
files or very long macros. The Kindle does the right thing and puts a line
break into a word that you couldn't read otherwise. I tried different
CSS properties like word-wrap and overflow-wrap but to no avail. Is there
any trick to make e-readers break words by all means if they are too long? 

3. I use a table with three columns in my book which is already difficult to
display on a narrow e-reader. If there are some long words, e-readers can
mess up completely (because of 2.). So I added &#xad; here and there to
insert soft hyphens. The Sony Reader, Kobo Touch and Adobe Digital Editions
do break the words now where I put &#xad; - but they don't display a hyphen!
Adobe Digital Editions does display a hyphen in the table of contents if I
add &#xad; to a chapter title - although the chapter title doesn't need to
be and isn't broken in the table of contents. Only the Kindle seems to do
the right thing. 
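
For points 2 and 3, placing &#xad; by hand could in principle be replaced by
a small preprocessing step. The Python sketch below adds a soft hyphen
(U+00AD) after each '/', '_' or '.' inside long tokens such as header paths
and macro names. The function name add_break_hints, the 20-character
threshold and the choice of separators are all assumptions made for this
example; it does nothing about how a particular reader renders the soft
hyphen.

```python
import re

SOFT_HYPHEN = "\u00ad"


def add_break_hints(text: str, min_len: int = 20) -> str:
    """Insert a soft hyphen after '/', '_' or '.' in tokens of min_len or more characters."""

    def hint(match: re.Match) -> str:
        token = match.group(0)
        if len(token) < min_len:
            return token  # short tokens are left untouched
        # Only add a hint when the separator is followed by another character,
        # so a trailing '.' or '/' does not get a dangling soft hyphen.
        return re.sub(r"([/_.])(?=.)", r"\1" + SOFT_HYPHEN, token)

    # Treat every run of non-whitespace characters as one token.
    return re.sub(r"\S+", hint, text)


hinted = add_break_hints("see boost/asio/ip/tcp_socket.hpp for details")
```

A reader that draws a hyphen at the break (as the Kindle does) would show
that hyphen inside a path, so this is only useful where a visible hyphen is
acceptable.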

My conclusion is that it's better not to try to beautify an epub with
Unicode characters? I think I'll use &#xad; where it's absolutely required
to break words (like in a table with three columns) because I know that
otherwise some parts of the text will not be displayed at all. Everywhere
else it's probably better to blame the e-reader? ;) 

Boris 


