Re: [tex4ht] Automatic HTF file generation

2018-11-17 Michal Hoftich
> I will parse the .pfb files only when no .enc file is available. We will see
> how useful it is. A worse problem is the large number of non-standard glyph
> names used in TeX fonts. We need to map each glyph to Unicode, and it is
> not always easy to find the correct mapping.

It turned out quite well; it works even for complex math and symbol fonts.
The only issue is that these fonts often use custom glyphs for which I
cannot find Unicode equivalents. For example, it works for this old
issue, which prompted me to create Htfgen in the first place:

https://puszcza.gnu.org.ua/bugs/?236

The attached file contains the literate source for the fonts required by
the DVI file from Nasser's sample. Note that it contains a huge number
of "Missing glyph" messages. These glyphs won't work until
glyph-to-Unicode mappings are added to Htfgen.

Best regards,
Michal
%   tex {nameofthefile.tex}
%
% Copyright (C) 2018 TeX Users Group

\input tex4ht.sty
   \Preamble{xhtml,th4,sections+}
\EndPreamble
\input ProTex.sty  
%\AlProTex{c,<<<>>>,`,title,list,ClearCode,_^}
\AlProTex{c,<<<>>>,`,title,list,`,ClearCode,_^}


\def\HOME{./tex4ht.dir/}
\def\DTDS{./dtd.dir/}   
\def\SOURCE{./html.dir/}

\def\MYdir{\HOME texmf/tex4ht/ht-fonts}


\newwrite\dbcs 
\newwrite\unicode  

\def\AddFont{\futurelet\ext\AddFontA}
\def\AddFontA{%
   \if [\ext \def\ext[##1]{\def\ext{##1}\AddFontB}%
   \else \def\ext{\def\ext{htf}\AddFontB}\fi
   \ext}
\def\AddFontB#1#2{%
   \Comment{}{}\OutputCode[\ext]\<#1\>%
   \let\StartDir=\empty  \def\EndDir{#2}\MakeDir
   \ifx \WWWdir\Undef \else
  \Needs{"cp #1.\ext\space \WWWdir /#2.\ext"}%
  \Needs{"chmod 644 \WWWdir /#2.\ext"}%
   \fi
   \Needs{"mv #1.\ext\space \MYdir /#2.\ext"}%
   }
\def\MakeDir{\relax
   % Schedule "mkdir" once per directory prefix; \csname !<prefix>\endcsname
   % records the prefixes that were already handled.
   \expandafter \ifx  \csname !\StartDir\endcsname\relax
      \expandafter\let\csname !\StartDir\endcsname=\empty
      \Needs{"mkdir -p \MYdir/\StartDir"}%
      \ifx \WWWdir\Undef \else
         \Needs{"mkdir -p \WWWdir /\StartDir"}%
         \Needs{"chmod 711 \WWWdir /\StartDir"}%
      \fi
   \fi
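   % Move the next path component from \EndDir to \StartDir, then recurse;
   % the extra "//" supplies the "/*" delimiter that \AppendDir expects.
   \ifx \EndDir\empty \else
      \expandafter\AppendDir \EndDir//*%
      \expandafter\MakeDir
   \fi
}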
\def\AppendDir#1/#2/#3/*{%
   \def\temp{#2}\ifx \temp\empty  \let\EndDir=\empty 
   \else
  \edef\StartDir{\ifx \StartDir\empty\else \StartDir/\fi
 #1}\def\EndDir{#2/#3}%
   \fi
}

load font	txr
% writing ntxmia.htf hash: 45a649476729cb886bbd7762cd30e9e0
\<<<
ntxmia 0 255
'Γ' '' Gamma 0
'∆' '' Delta 1
'Θ' '' Theta 2
'Λ' '' Lambda 3
'Ξ' '' Xi 4
'Π' '' Pi 5
'Σ' '' Sigma 6
'Υ' '' Upsilon 7
'Φ' '' Phi 8
'Ψ' '' Psi 9
'Ω' '' Omega 10
'α' '' alpha 11
'β' '' beta 12
'γ' '' gamma 13
'δ' '' delta 14
'ϵ' '' epsilon1 15
'ζ' '' zeta 16
'η' '' eta 17
'θ' '' theta 18
'ι' '' iota 19
'κ' '' kappa 20
'λ' '' lambda 21
'µ' '' mu 22
'ν' '' nu 23
'ξ' '' xi 24
'π' '' pi 25
'ρ' '' rho 26
'σ' '' sigma 27
'τ' '' tau 28
'υ' '' upsilon 29
'ϕ' '' phi 30
'χ' '' chi 31
'ψ' '' psi 32
'ω' '' omega 33
'ε' '' epsilon 34
'ϑ' '' theta1 35
'ϖ' '' pi1 36
'ϱ' '' rho1 37
'ς' '' sigma1 38
'φ' '' phi1 39
'' ''  
'' '' kappa1 41
'' '' kappa1up 42
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' '' g1 49
'' '' y1 50
'' '' v1 51
'' '' w1 52
'y' '' y 53
'' '' npropersubset 54
'' '' npropersuperset 55
'' '' nelement 56
'' '' nowner 57
'≔' '' colonequal 58
'≕' '' equalcolon 59
'≠' '' nequal 60
'=' '' equal 61
'{' '' braceleft 62
'}' '' braceright 63
'∂' '' partialdiff 64
'A' '' A 65
'B' '' B 66
'C' '' C 67
'D' '' D 68
'E' '' E 69
'F' '' F 70
'G' '' G 71
'H' '' H 72
'I' '' I 73
'J' '' J 74
'K' '' K 75
'L' '' L 76
'M' '' M 77
'N' '' N 78
'O' '' O 79
'P' '' P 80
'Q' '' Q 81
'R' '' R 82
'S' '' S 83
'T' '' T 84
'U' '' U 85
'V' '' V 86
'W' '' W 87
'X' '' X 88
'Y' '' Y 89
'Z' '' Z 90
'∀' '' universal 91
'∃' '' existential 92
'∄' '' nexists 93
'∅' '' emptyset 94
'∅' '' emptyset 95
'|' '' bar 96
'a' '' a 97
'b' '' b 98
'c' '' c 99
'd' '' d 100
'e' '' e 101
'f' '' f 102
'g' '' g 103
'h' '' h 104
'i' '' i 105
'j' '' j 106
'k' '' k 107
'l' '' l 108
'm' '' m 109
'n' '' n 110
'o' '' o 111
'p' '' p 112
'q' '' q 113
'r' '' r 114
's' '' s 115
't' '' t 116
'u' '' u 117
'v' '' v 118
'w' '' w 119
'x' '' x 120
'y' '' y 121
'z' '' z 122
'(' '' parenleft 123
')' '' parenright 124
'(' '' parenleft 125
')' '' parenright 126
'⁀' '' tie 127
'∥' '' bardbl 128
'' '' bbA 129
'' '' bbB 130
'' '' bbC 131
'' '' bbD 132
'' '' bbE 133
'' '' bbF 134
'' '' bbG 135
'' '' bbH 136
'' '' bbI 137
'' '' bbJ 138
'' '' bbK 139
'' '' bbL 140
'' '' bbM 141
'' '' bbN 142
'' '' bbO 143
'' '' bbP 144
'' '' bbQ 145
'' '' bbR 146
'' '' bbS 147
'' '' bbT 148
'' '' bbU 149
'' '' bbV 150
'' '' bbW 151
'' '' bbX 152
'' '' bbY 153
'' '' bbZ 154
'(' '' parenleft 155
')' '' parenright 156
'[' '' bracketleft 157
']' '' bracketright 158
'⌊' '' floorleft 159
'⌋' '' floorright 160
'⌈' '' ceilingleft 161
'⌉' '' ceilingright 162
'{' '' braceleft 163
'}' '' braceright 164
'⟨' '' angbracketleft 165
'⟩' '' angbracketright 166
'↕' '' arrowbothv 167
'⇕' '' arrowdblbothv 168
'' '' 

Re: [tex4ht] Automatic HTF file generation

2018-11-15 Michal Hoftich
> > The encoding vector is always embedded in a Type 1 font.
>
> Usually the /Encoding vector in a given Type 1 is only a subset of the
> characters available in the font. The only way to know what is there is
> to parse the entire file. But there are lots of utilities in this area;
> I suspect something out there will dump a list of characters from a pfb. -k
>

I will parse the .pfb files only when no .enc file is available. We will see
how useful it is. A worse problem is the large number of non-standard glyph
names used in TeX fonts. We need to map each glyph to Unicode, and it is
not always easy to find the correct mapping.
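Parsing the whole .pfb means getting past the eexec encryption that protects the private part of a Type 1 font, where the /CharStrings dictionary lists every glyph the font really contains. The following Python sketch assumes a well-formed .pfb with binary (not hex) eexec segments; the function names are hypothetical and this is not Htfgen code. It decrypts with the standard Type 1 eexec parameters (key 55665, constants 52845 and 22719, four leading random bytes dropped) and then walks the `/name <length> RD <binary> ND` entries, skipping each binary charstring by its declared length.

```python
import re

EEXEC_KEY, C1, C2 = 55665, 52845, 22719


def pfb_segments(data: bytes):
    """Split a .pfb file into (kind, body) segments; kind 1 = ASCII, 2 = binary."""
    pos, segments = 0, []
    while pos < len(data):
        if data[pos] != 0x80:
            raise ValueError("bad PFB segment marker")
        kind = data[pos + 1]
        if kind == 3:  # end-of-file marker, carries no length field
            break
        length = int.from_bytes(data[pos + 2:pos + 6], "little")
        segments.append((kind, data[pos + 6:pos + 6 + length]))
        pos += 6 + length
    return segments


def eexec_decrypt(cipher: bytes) -> bytes:
    """Standard Type 1 eexec decryption; drops the four leading random bytes."""
    r, plain = EEXEC_KEY, bytearray()
    for c in cipher:
        plain.append(c ^ (r >> 8))
        r = ((c + r) * C1 + C2) & 0xFFFF
    return bytes(plain[4:])


# One CharStrings entry: an optional terminator of the previous entry, then
# "/name <length> RD " (or "-| ") followed by <length> bytes of binary data.
ENTRY = re.compile(rb"\s*(?:ND|\|-|noaccess def)?\s*/(\S+)\s+(\d+)\s+(?:RD|-\|) ")


def glyph_names_from_pfb(data: bytes):
    """Return the glyph names defined in the /CharStrings dictionary."""
    binary = b"".join(body for kind, body in pfb_segments(data) if kind == 2)
    private = eexec_decrypt(binary)
    start = private.find(b"/CharStrings")
    if start < 0:
        return []
    pos = private.index(b"begin", start) + len(b"begin")
    names = []
    while (m := ENTRY.match(private, pos)):
        names.append(m.group(1).decode("latin-1"))
        pos = m.end() + int(m.group(2))  # skip the encrypted charstring bytes
    return names
```

Unlike the /Encoding vector, this list includes unencoded glyphs, which is why it can serve as a fallback when no .enc file exists.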

Best,
Michal


Re: [tex4ht] Automatic HTF file generation

2018-11-15 Karl Berry
> The encoding vector is always embedded in a Type 1 font.

Usually the /Encoding vector in a given Type 1 is only a subset of the
characters available in the font. The only way to know what is there is
to parse the entire file. But there are lots of utilities in this area;
I suspect something out there will dump a list of characters from a pfb. -k



[tex4ht] Automatic HTF file generation

2018-11-12 Michal Hoftich
Hi all,

I've finally finished the Htfgen project [1]. Its objective is to
automate the creation of the HTF font mapping files. These files are
used by tex4ht to map character codes in DVI files to Unicode.
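To make the format concrete, here is a minimal Python sketch (the function `write_htf` is hypothetical, not part of Htfgen) that renders a table in the layout of the ntxmia table attached to the 2018-11-17 message: a header line with the font name and the first and last character codes, then one row per code with the Unicode text in the first quoted field. The glyph name and code after the two quoted fields mirror the annotations in that attachment, and the second quoted field is simply left empty as it is there. Closing the table by repeating the header line is, to my knowledge, how existing .htf files end; treat that as an assumption.

```python
def write_htf(font, first, last, chars):
    """Render an HTF-style table for character codes first..last.

    chars maps a code to (text, glyph_name); text may be "" when the glyph
    has no known Unicode mapping. Codes absent from chars get a bare
    "'' ''" row. Text containing an apostrophe is not handled here.
    """
    header = f"{font} {first} {last}"
    lines = [header]
    for code in range(first, last + 1):
        if code in chars:
            text, glyph = chars[code]
            lines.append(f"'{text}' '' {glyph} {code}")
        else:
            lines.append("'' ''")
    lines.append(header)  # assumed: the table is closed by repeating the header
    return "\n".join(lines) + "\n"
```

For example, `write_htf("demo", 0, 2, {0: ("Γ", "Gamma"), 2: ("", "kappa1")})` yields a five-line table whose middle row is `'' ''`.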

There are two new scripts: scanfdfile and dvitohtf. The first one
searches FD files for declared fonts; the other generates a literate
TeX file for HTF generation. Sample usage is as follows:
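Finding the fonts declared in FD files mostly comes down to reading `\DeclareFontShape` commands. The Python sketch below is not scanfdfile's implementation, and `fonts_in_fd` is a hypothetical name. It assumes the five arguments contain no nested braces (typical for generated FD files), treats what follows the optional size function and optional `[scale]` in each size-range entry as the font name, and skips `sub`/`ssub` entries because their arguments are NFSS family/series/shape names rather than font files. Size functions such as `gen`, which build the name from a prefix and the size, are not handled.

```python
import re

# \DeclareFontShape{encoding}{family}{series}{shape}{size spec}; assumes the
# five arguments contain no nested braces.
DECLARE = re.compile(r"\\DeclareFontShape\s*" + r"\{([^{}]*)\}\s*" * 5)
SUBSTITUTIONS = {"sub", "ssub"}  # arguments are NFSS names, not font files


def fonts_in_fd(text):
    """Return the font names declared in FD text, in order, without duplicates."""
    text = re.sub(r"(?<!\\)%.*", "", text)  # drop TeX comments
    fonts = []
    for match in DECLARE.finditer(text):
        # Each size range "<...>" is followed by "[function *] [scale] name".
        for entry in re.split(r"<[^>]*>", match.group(5)):
            entry = entry.strip()
            function = ""
            if "*" in entry:
                function, entry = (part.strip() for part in entry.split("*", 1))
            if function in SUBSTITUTIONS:
                continue
            entry = re.sub(r"^\[[^\]]*\]\s*", "", entry)  # optional [scale]
            if entry and entry not in fonts:
                fonts.append(entry)
    return fonts
```

Fed the concatenated FD files, as in the `cat ... | scanfdfile` pipe, it would return one name per distinct font file.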

   cat /usr/local/texlive/2018/texmf-dist/tex/latex/ebgaramond/*.fd |
     scanfdfile | dvitohtf > ebgaramond-htf.tex
   tex ebgaramond-htf.tex

This will create HTF files for all fonts detected in the EB Garamond
FD files.

dvitohtf can also generate the missing HTF files for fonts used in a
DVI file. So if tex4ht reports missing HTF files, dvitohtf can be run
directly on the DVI file:
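Working from a DVI file first requires the list of fonts it uses. One way to get it, sketched here in Python under the assumption of a complete, standard DVI file (this is not dvitohtf's code, and `dvi_fonts` is a hypothetical name), is to read the postamble: the file ends with post_post (opcode 249), a 4-byte pointer to the post command (248), the DVI id byte and padding bytes of value 223, and every font is defined again between post and post_post with a fnt_def command (243 to 246).

```python
import struct

NOP, FNT_DEF1, FNT_DEF4, POST, POST_POST, PADDING = 138, 243, 246, 248, 249, 223


def dvi_fonts(data: bytes) -> dict:
    """Return {font number: font name} from the postamble of a DVI file."""
    end = len(data) - 1
    while end >= 0 and data[end] == PADDING:
        end -= 1
    # data[end] is the DVI id byte, preceded by q[4] and the post_post opcode.
    if end < 5 or data[end - 5] != POST_POST:
        raise ValueError("post_post not found; not a complete DVI file?")
    (post,) = struct.unpack(">I", data[end - 4:end])
    if data[post] != POST:
        raise ValueError("postamble pointer does not point to a post command")
    pos = post + 1 + 28  # skip p, num, den, mag, l, u (4 bytes each), s, t (2 each)
    fonts = {}
    while data[pos] != POST_POST:
        op = data[pos]
        if op == NOP:
            pos += 1
            continue
        if not FNT_DEF1 <= op <= FNT_DEF4:
            raise ValueError(f"unexpected opcode {op} in postamble")
        size = op - FNT_DEF1 + 1  # fnt_def1..4 carry a 1..4 byte font number
        k = int.from_bytes(data[pos + 1:pos + 1 + size], "big", signed=(size == 4))
        pos += 1 + size + 12  # skip k, checksum c[4], scale s[4], design size d[4]
        a, l = data[pos], data[pos + 1]  # lengths of the area and the name
        fonts[k] = data[pos + 2 + a:pos + 2 + a + l].decode("ascii")
        pos += 2 + a + l
    return fonts
```

Reading the postamble avoids interpreting every page; the resulting names can then be checked against the available .htf files.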

   dvitohtf sample.dvi > missing.tex
   tex missing.tex

dvitohtf supports both virtual fonts and TFM fonts. It looks for a
virtual font first; the TFM file is used only when no VF file is found.
It collects all fonts referenced in the virtual font and tries to find
the corresponding .enc files via pdftex.map. The .enc files contain glyph
lists, which are then mapped to Unicode.

It also parses the .pfb file for the font family name and tries to
detect the style (italic, bold, small caps) from the full font name
stored in the .pfb file.
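A rough Python picture of that step, assuming the family and full names sit in the cleartext first segment of the .pfb as `/FamilyName (...)` and `/FullName (...)` entries: the keyword lists used to guess the style are hypothetical heuristics, not the rules Htfgen applies, and `family_and_style` is an illustrative name.

```python
import re


def pfb_ascii_header(data: bytes) -> str:
    """Return the first (cleartext) segment of a .pfb file."""
    if len(data) < 6 or data[0] != 0x80 or data[1] != 1:
        raise ValueError("not a PFB file: expected an ASCII segment first")
    length = int.from_bytes(data[2:6], "little")
    return data[6:6 + length].decode("latin-1")


def ps_string(header, key):
    """Value of a '/Key (string)' entry in the font dictionary, or None."""
    m = re.search(r"/" + key + r"\s*\(([^)]*)\)", header)
    return m.group(1) if m else None


def family_and_style(data: bytes):
    """Guess (family name, sorted style keywords) from a .pfb file."""
    header = pfb_ascii_header(data)
    family = ps_string(header, "FamilyName") or ""
    full = ps_string(header, "FullName") or ""
    words = re.split(r"[\s\-]+", full.lower())
    style = set()
    if any(w in ("italic", "oblique", "slanted") for w in words):
        style.add("italic")
    if any(w in ("bold", "semibold", "demibold", "extrabold") for w in words):
        style.add("bold")  # heuristic: every bold-like weight counts as bold
    if "smallcaps" in "".join(words) or "sc" in words:
        style.add("smallcaps")
    return family, sorted(style)
```

Only the cleartext segment is read, so this step needs no eexec decryption.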

It computes hashes of the font tables, so duplicate tables aren't
written; fonts with the same characters just link to the first font
that used the table.
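The deduplication idea in a few lines of Python: the 32-digit hash printed in the attachment looks like an MD5 digest, so the sketch assumes MD5, but what exactly Htfgen feeds into the hash is an assumption. The header line is left out because it contains the font name, which would otherwise make every table unique. `table_hash` and `deduplicate` are hypothetical names.

```python
import hashlib


def table_hash(rows):
    """MD5 hex digest of an HTF table body (the rows, without the header)."""
    return hashlib.md5("\n".join(rows).encode("utf-8")).hexdigest()


def deduplicate(tables):
    """Map every font to the first font (in input order) with the same table."""
    first_seen = {}
    links = {}
    for font, rows in tables.items():
        links[font] = first_seen.setdefault(table_hash(rows), font)
    return links
```

A font that maps to itself gets its own HTF file; any other font only needs a link to the font it maps to.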

If no .enc file is found, the font cannot be supported. There may also
be missing mappings between glyphs and Unicode; these are reported in
the TeX file. Htfgen contains large mapping files, but some fonts use
custom glyphs that don't have a Unicode equivalent, for example Q_u
ligatures. In that case the mapping must be added by hand to
glyphlists/glyphlist-fixes.txt.
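The glyph-to-Unicode step can be pictured as a lookup chain. This Python sketch assumes that both the large mapping files and glyphlists/glyphlist-fixes.txt use the Adobe Glyph List layout (`name;XXXX`, several code points separated by spaces); whether Htfgen's fix file really uses that layout is an assumption, and all function names are hypothetical. Fixes take precedence, suffixes such as `.sc` are stripped as a fallback, `uniXXXX` and `uXXXX[XX]` names are decoded as in the AGL specification, and whatever is left is collected as missing, much like the "Missing glyph" reports in the generated TeX file.

```python
import re


def load_glyphlist(text):
    """Parse AGL-style lines 'name;XXXX' or 'name;XXXX YYYY' into name -> text."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, codes = line.split(";", 1)
        table[name] = "".join(chr(int(c, 16)) for c in codes.split())
    return table


def _valid(cp):
    return cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF


def glyph_to_unicode(name, glyphlist, fixes):
    """Unicode text for a glyph name, or None if no mapping is known."""
    base = name.split(".", 1)[0]  # drop suffixes such as ".sc" or ".alt"
    for key in (name, base):
        for table in (fixes, glyphlist):  # hand-made fixes win
            if key in table:
                return table[key]
    m = re.fullmatch(r"uni((?:[0-9A-F]{4})+)", base)
    if m:
        hexes = m.group(1)
        cps = [int(hexes[i:i + 4], 16) for i in range(0, len(hexes), 4)]
        if all(_valid(cp) for cp in cps):
            return "".join(map(chr, cps))
    m = re.fullmatch(r"u([0-9A-F]{4,6})", base)
    if m and _valid(int(m.group(1), 16)):
        return chr(int(m.group(1), 16))
    return None


def map_encoding(names, glyphlist, fixes):
    """Return ({code: text}, [(code, name), ...] for glyphs left unmapped)."""
    mapping, missing = {}, []
    for code, name in enumerate(names):
        if name == ".notdef":
            continue
        text = glyph_to_unicode(name, glyphlist, fixes)
        if text is None:
            missing.append((code, name))
        else:
            mapping[code] = text
    return mapping, missing
```

The AGL specification also splits ligature names such as `f_f_i` at underscores; that rule is left out here, so a glyph like Q_u only gets a mapping through a fix entry.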

It works reasonably well for fonts generated by Fontinst, because they
usually use standard glyph names, come with .enc files, etc. For
complex virtual fonts, especially math fonts, it fails. HTF files for
such fonts still need to be created by hand.

What to do now? There are some wrong HTF files in the tex4ht sources;
for example, the Linux Libertine support is wrong for some ligatures. I
am sure there will be more examples, especially fonts with a large
number of ligatures. Their support was added a few years ago, but only
for the T1 font encoding. We should remove HTF generation for these
fonts from the huge literate font sources and create a smaller literate
TeX file for each of them. This should speed up the tex4ht build and
make the fonts easier to manage.

Volunteers are welcome.

Best regards,
Michal

[1] https://github.com/michal-h21/htfgen