Re: [tex4ht] unicode and lualatex

2011-07-23, CV Radhakrishnan

On 07/23/2011 10:17 AM, Johannes Wilm wrote:

> Hi,
>
> On the attached test file I tried to run
>
>     dvilualatex unicode.tex
>     dvilualatex unicode.tex
>     dvilualatex unicode.tex
>     tex4ht -f/unicode.tex -cunihtf -utf8
>
> I cannot figure out how the characters in the output are encoded,
> but it doesn't seem to be UTF-8. Output has been attached.


Does your example produce a valid dvi? In my tests it didn't, and TeX4ht
needs a valid dvi to generate html. The post-processor, the tex4ht binary,
extracts the text characters from the dvi by a clever substitution based
on the *.tfm of each font used and the matching *.htf (hypertext font)
file. It therefore needs a *.tfm, which unfortunately is not available
for Unicode fonts, and without one it falls back to cmr. The resulting
html file is then unusable because the Unicode characters appear as junk.


A patch to the tex4ht binary that lets it post-process the dvi without
the help of *.tfm files would be a great contribution. Patching at the
macro-package level is easier than patching the binary. Volunteers are
welcome.


--
Radhakrishnan

It's today! said Piglet.
My favorite day, said Pooh.



Re: [tex4ht] unicode and lualatex

2011-07-23, Ulrike Fischer
WARNING: This e-mail has been altered by the NFIT virus/spamfilter. Please see
below for a record of the changes made. In case of problems consider contacting
the sender or postmas...@nfit.au.dk

---Change report:

An attachment named xhluatex.bat was removed from this document as it
constituted a security hazard.  If you require this document, please contact
the sender and arrange an alternate means of receiving it.

On Fri, 22 Jul 2011 21:47:32 -0700, Johannes Wilm wrote:

> Hi,
>
> On the attached test file I tried to run
>
>     dvilualatex unicode.tex
>     dvilualatex unicode.tex
>     dvilualatex unicode.tex
>     tex4ht -f/unicode.tex -cunihtf -utf8
>
> I cannot figure out how the characters in the output are encoded, but it
> doesn't seem to be UTF-8. Output has been attached.

Your main problem has nothing to do with tex4ht. While luatex handles
UTF-8 *input* natively, it has trouble outputting non-ASCII characters
unless you use fontspec and Unicode fonts on the output side.

Your document uses OT1-encoded fonts (which have only 128 characters),
so your non-ASCII characters end up in nothingness. With
\usepackage[T1]{fontenc} the result will be better, but quite a lot of
characters will still be wrong (e.g. the German ß).
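
To try the T1 suggestion, only the fontenc line needs to change. The file
below is a minimal sketch, not tested here, with ß added to the test text
to show the problem. As far as I understand it, ß is the typical casualty
because luatex uses a character's Unicode code point directly as the slot
number in an 8-bit font: in T1 most accented Latin letters sit at the same
positions as in Latin-1 (á is U+00E1 and slot 0xE1), but slot 0xDF, where
Unicode has ß, holds the capital "SS" glyph, while ß itself is at 0xFF.

% T1 variant of the test file (sketch); run it with dvilualatex
\documentclass[11pt,a4paper]{book}
\usepackage[T1]{fontenc} % 256-slot T1 fonts instead of 128-slot OT1

\begin{document}
\chapter{UTF-8}

German: äöüÄÖÜß % ß is probably typeset as "SS" here
\end{document}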

In normal LaTeX the inputenc/fontenc combination manages the translation
from input to output, but you can't use inputenc with luatex.
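
For comparison, this is the usual combination for latex/pdflatex, i.e. the
standard setup rather than anything from the test file; with luatex the
inputenc line has to be left out, as said above.

\documentclass[11pt,a4paper]{book}
\usepackage[utf8]{inputenc} % input: UTF-8 bytes -> LaTeX commands like \ss
\usepackage[T1]{fontenc}    % output: those commands -> T1 font slots

\begin{document}
German: äöüÄÖÜß
\end{document}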

Your best bet is something like this:

% -*- mode: TeX -*- -*- coding: UTF-8 -*-
\documentclass[11pt,a4paper]{book}
\usepackage[utf8]{luainputenc}

\begin{document}
\chapter{UTF-8}

The following characters should be converted to Unicode:

Spanish: áéíóúÁÉÍÓÚ
German: äöüÄÖÜ
Danish: æøåÆØÅ

\end{document}

On MiKTeX this gave me a UTF-8 encoded html file with this command line:

xhluatex.bat unicode html,charset=utf-8 -cunihtf -utf8

and the attached xhluatex.bat (I hope .bat files come through). I had to
change both it and my tex4ht.env compared to the standard versions; see
the discussion some weeks ago in the mailing list archive.

I'm not yet quite sure about the correct html option, but it doesn't
matter for UTF-8 tests.


-- 
Ulrike Fischer