[tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-26 Thread Karl Berry
Update of bug #241 (project tex4ht):

 Open/Closed:Open => Closed 

___

Follow-up Comment #1:

as discussed on the mailing list, i think there is nothing to change here. the
input needs to specify a grave if that's what the output should be ...


___

Reply to this item at:

  

___
  Message sent via/by Puszcza
  http://puszcza.gnu.org.ua/



Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-19 Thread CV Radhakrishnan


On 20/01/15 4:53 am, Karl Berry wrote:

 if my system will work, we may process all fonts in texmf tree. but
 manual testing of the results will be needed I am afraid, and that
 would be really huge task.

Testing every glyph in every font is not required to have something
useful.  If your method creates a decent first attempt for them, we can
post those files, people can use them and report bugs.  Occasional, or
even not-so-occasional, mistakes are inevitable and expected.  As you
noted, even in Eitan's files, after all.

I will also join you to generate/test htf files.

--
Radhakrishnan





Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-19 Thread Karl Berry
if my system will work, we may process all fonts in texmf tree. but
manual testing of the results will be needed I am afraid, and that
would be really huge task.

Testing every glyph in every font is not required to have something
useful.  If your method creates a decent first attempt for them, we can
post those files, people can use them and report bugs.  Occasional, or
even not-so-occasional, mistakes are inevitable and expected.  As you
noted, even in Eitan's files, after all.

Thanks Michal.

K





Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-19 Thread Michal Hoftich
>
> I don't doubt it.  No .htf has been created (in the distribution anyway)
> since Eitan died.  It would be great to cover some of the new fonts.

yeah, many fonts are missing, Linux Libertine for example. if my
system works, we may be able to process all fonts in the texmf tree.
but I am afraid manual testing of the results will be needed, and that
would be a really huge task.

>
> my idea is following: we can take property list of a tfm file
>
> I doubt the encoding info in the TFM file is especially reliable even in
> the few cases where it's present.  (Ditto afm2pl.)
>

it seems the best approach might be to use known encodings when present
and to parse afm files in the other cases.


> and find postscript name of the character in corresponding .enc
> file. we can get unicode code point for postscript name from
> glyphlist.txt and texglyphlist.txt files included in TeX
> distribution.
>
> Wow, quite a project.

I've already found fonts which use non-standard glyph names (txsyc,
for example), so sometimes a manual lookup for each character seems
necessary :(

>
> for these FONTSPECIFIC I have to use
> google to find out actually used encoding
>
> For fonts created through the otftotfm process, i.e., nearly everything
> that Michael Sharpe and Bob Tennent have done, who have contributed many
> of the new fonts (Sharpe did newtx), there should be an opaquely-named
> (a bunch of hex chars) .enc file in the font package corresponding to
> every tfm.  As I understand it.
>

thanks, I will look at this.

> Anyway, in general, I expect that talking to the package developer or
> looking at the sources would be more fruitful than random web searches.
> (Not to say it'll be easy, no matter what.)

or looking up each character manually, as Eitan did. But mistakes are
a danger in such cases: I've found German ß coded as beta in one htf
file.


>
> but sometimes two or more glyphs are used to create character
> (mainly accents), so we can't get post script name of such character
> even if we knew encoding of referenced glyphs
>
> All I can think of is to have heuristics or a table saying that a
> composition of character X + character Y in font F means Unicode point
> U.  Since it's generally about accents, the combinations should be
> finite, and repeated through many different fonts.
>

I hope so
> Thanks,
> K

regards,
Michal



Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-17 Thread Karl Berry
btw, I think Nasser had found many errors in .htf files in last two
weeks and also for many fonts, .htf files are missing.

I don't doubt it.  No .htf has been created (in the distribution anyway)
since Eitan died.  It would be great to cover some of the new fonts.

my idea is following: we can take property list of a tfm file

I doubt the encoding info in the TFM file is especially reliable even in
the few cases where it's present.  (Ditto afm2pl.)

and find postscript name of the character in corresponding .enc
file. we can get unicode code point for postscript name from
glyphlist.txt and texglyphlist.txt files included in TeX
distribution.

Wow, quite a project.

for these FONTSPECIFIC I have to use
google to find out actually used encoding 

For fonts created through the otftotfm process, i.e., nearly everything
that Michael Sharpe and Bob Tennent have done, who have contributed many
of the new fonts (Sharpe did newtx), there should be an opaquely-named
(a bunch of hex chars) .enc file in the font package corresponding to
every tfm.  As I understand it.

Anyway, in general, I expect that talking to the package developer or
looking at the sources would be more fruitful than random web searches.
(Not to say it'll be easy, no matter what.)

but sometimes two or more glyphs are used to create character
(mainly accents), so we can't get post script name of such character
even if we knew encoding of referenced glyphs

All I can think of is to have heuristics or a table saying that a
composition of character X + character Y in font F means Unicode point
U.  Since it's generally about accents, the combinations should be
finite, and repeated through many different fonts.
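
A minimal sketch of such a table-driven heuristic, in Python rather than anything from tex4ht or htfgen. The COMBINING table and the compose() helper are hypothetical names made up for illustration; the only real machinery used is Unicode NFC normalization from the standard library.

```python
import unicodedata

# Hypothetical table: accent glyph name -> Unicode combining mark.
# Illustrative only; a real table would need many more entries.
COMBINING = {
    "grave": "\u0300",
    "acute": "\u0301",
    "circumflex": "\u0302",
    "tilde": "\u0303",
    "dieresis": "\u0308",
    "caron": "\u030C",
}

def compose(base_char, accent_name):
    """Return the precomposed character for base + accent, or None."""
    mark = COMBINING.get(accent_name)
    if mark is None:
        return None  # accent not in the table: needs manual handling
    composed = unicodedata.normalize("NFC", base_char + mark)
    # NFC keeps two code points when no precomposed form exists.
    return composed if len(composed) == 1 else None

print(compose("e", "acute"))   # é
print(compose("q", "acute"))   # None
```

Because the same accent names recur across fonts, one table of this kind could in principle be shared by every font; the cases that return None are the ones that would still need a per-font entry.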

Thanks,
K


Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-17 Thread Michal Hoftich
Hi Karl,

> Meanwhile, aren't there options at the tex4ht level to decide whether to
> generate "unicode" (e.g., the unicode directed left quote) or not?
> I confess I have never had a good grasp on, or seen a comprehensible
> description of, all the multifarious options that Eitan created.  Aside
> from what you have written on your blog, and I fear I haven't even
> internalized those.

my understanding of the process is that for each tfm or vf file, the
tex4ht post-processor searches for the corresponding .htf file. the
.htf file provides an ascii code or hexadecimal unicode code point for
each character in the font. these codes are then translated, using a
.4hf file, to the characters written to the output file.

structure of .htf files is described here:
http://www.tug.org/applications/tex4ht/mn-htf.html#index23-63001

example line:

’ˆ’’’2
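
To make the lookup concrete, here is a rough Python sketch of reading such entries. It assumes the layout visible in the cmtt.htf attachment elsewhere in this thread: a header line "name first last", then one entry per character, where the first character of the entry is the delimiter around a string field and a class field. parse_htf() and SAMPLE are hypothetical; SAMPLE is a shortened stand-in (positions 94 to 96 only), not the real file.

```python
def parse_htf(text):
    """Parse htf-style text into (font_name, {position: (string, css_class)})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    name, first, last = lines[0].split()[:3]
    first, last = int(first), int(last)
    table = {}
    for offset, line in enumerate(lines[1:]):
        position = first + offset      # entries are assumed to be in order
        if position > last:
            break
        delim = line[0]                # first character of the entry is its delimiter
        parts = line.split(delim)
        if len(parts) < 5:
            continue                   # not of the form 'string' 'class' comment
        table[position] = (parts[1], parts[3])
    return name, table

# Shortened, hypothetical sample using three entries from the attached file.
SAMPLE = """cmtt 94 96
'ˆ' ''  ring 94
'_' ''  underscore 95
'`' ''  left singlequote   96
"""

font, table = parse_htf(SAMPLE)
print(font, table[96])
```

The sketch does not handle a string field that itself contains the delimiter, and it ignores the trailing comment text.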

Which .4hf file will be used in the translation process is directed by
`-c` command line option for tex4ht, this option selects section in
the .env file, so when we use `-cunihtf` for unicode output, this
section is selected:


<unihtf>
i~/tex4ht.dir/texmf/tex4ht/ht-fonts/unicode/!
i~/tex4ht.dir/texmf/tex4ht/ht-fonts/ascii/!
i~/tex4ht.dir/texmf/tex4ht/ht-fonts/alias/!
</unihtf>


so the .4hf files in these directories are used (they seem to be always
named unicode.4hf and saved in the charset subdir). because the .4hf
referenced in the `unihtf` section doesn't contain many characters, the
majority of accents are output as html entities, as they were
provided in the .htf files.

when we add `-utf8` option for tex4ht, I think tex4ht translates
unicode entities to unicode characters directly.
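
As an illustration of what "entity to character" means here (this is Python's standard html module, not tex4ht code), a numeric entity for the left single quotation mark decodes to the single character U+2018:

```python
import html

entity = "&#x2018;"            # numeric character reference, as an .htf file might give it
char = html.unescape(entity)   # the character itself

print(hex(ord(char)))          # 0x2018
print(len(char))               # 1
```

Without such a step the output file keeps the eight-character entity text and leaves the decoding to the browser.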

btw, I think Nasser has found many errors in .htf files in the last two
weeks, and for many fonts .htf files are missing altogether. so I
started investigating whether it is possible to get unicode code
points for characters in fonts.

my idea is the following: we can take the property list of a tfm file and
find the postscript name of each character in the corresponding .enc file.
we can then get the unicode code point for that postscript name from the
glyphlist.txt and texglyphlist.txt files included in the TeX distribution.
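
A small Python sketch of the name-to-code-point step, assuming the plain "name;HEX [HEX ...]" line format with "#" comments used by glyphlist.txt. The sample text is inlined so the block runs on its own; load_glyphlist() and encoding_to_unicode() are hypothetical helper names, not part of htfgen.

```python
GLYPHLIST_SAMPLE = """\
# name;code point(s) in hex
quoteleft;2018
quoteright;2019
grave;0060
germandbls;00DF
beta;03B2
"""

def load_glyphlist(text):
    """Build a dict mapping PostScript glyph names to Unicode strings."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, codes = line.partition(";")
        mapping[name] = "".join(chr(int(c, 16)) for c in codes.split())
    return mapping

def encoding_to_unicode(glyph_names, glyphlist):
    """Map each slot of an encoding vector to a character, None if unknown."""
    return [glyphlist.get(name) for name in glyph_names]

glyphlist = load_glyphlist(GLYPHLIST_SAMPLE)
print(encoding_to_unicode(["quoteleft", "germandbls", "nonstandardname"], glyphlist))
# ['‘', 'ß', None]
```

Slots that come back as None correspond to the non-standard glyph names mentioned in this thread and would still need a manual lookup.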

I have found two obstacles:

1. virtual fonts, which reference many other fonts, including other
virtual fonts. this in itself is not a problem, since we can load all
the needed files. but sometimes two or more glyphs are combined to create
a character (mainly accents), so we can't get the postscript name of
such a character even if we know the encoding of the referenced glyphs.

2. I have found many tfm files which declare a custom encoding, but I
can't find .enc files for such encodings.

For example, when I list fonts used in `ntxmia` virtual font:

ntxmia=FONTSPECIFIC
txmia=FONTSPECIFIC
txsyc=FONTSPECIFIC
txr=TEX TEXT
ntxexb=UNSPECIFIED
rtxmio=FONTSPECIFIC
ntxsyralt=NTXMIAALTENCODING
txsyb=FONTSPECIFIC + MSBMENCODING
ptmr8r=TEXBASE1ENCODING
zxxrl7z=ADOBESTANDARDENCODING

I can find TEXBASE1ENCODING, but for these FONTSPECIFIC ones I have to
use google to find out the encoding actually used, and I don't always
find anything. with afm2pl -V I can get a property list with glyph names
in comments, but these glyph names are not always useful.
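
The triage described above can be sketched in Python as follows. The listing is the one from this message; KNOWN_ENCODINGS contains only the encoding the message says could be located, and split_by_encoding() is a hypothetical helper.

```python
LISTING = """\
ntxmia=FONTSPECIFIC
txmia=FONTSPECIFIC
txsyc=FONTSPECIFIC
txr=TEX TEXT
ntxexb=UNSPECIFIED
rtxmio=FONTSPECIFIC
ntxsyralt=NTXMIAALTENCODING
txsyb=FONTSPECIFIC + MSBMENCODING
ptmr8r=TEXBASE1ENCODING
zxxrl7z=ADOBESTANDARDENCODING
"""

# Assumption: only encodings with a locatable .enc file count as known.
KNOWN_ENCODINGS = {"TEXBASE1ENCODING"}

def split_by_encoding(listing, known):
    """Split 'font=ENCODING' lines into (resolved, unresolved) lists."""
    resolved, unresolved = [], []
    for line in listing.splitlines():
        font, sep, enc = line.partition("=")
        if not sep:
            continue  # skip lines that are not font=ENCODING pairs
        entry = (font.strip(), enc.strip())
        (resolved if entry[1] in known else unresolved).append(entry)
    return resolved, unresolved

resolved, unresolved = split_by_encoding(LISTING, KNOWN_ENCODINGS)
print(resolved)          # [('ptmr8r', 'TEXBASE1ENCODING')]
print(len(unresolved))   # 9
```

Fonts in the unresolved list are the ones that would fall back to afm parsing or manual inspection.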

to sum it up, I am trying to make lua scripts that can generate .htf
file for each virtual or normal font, but I am not sure if it's even
possible :)

https://github.com/michal-h21/htfgen

best regards,

Michal



Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-16 Thread Karl Berry
Hi CVR!

A lasting solution would be to modify position 96 of cmtt.htf to 
`

Meanwhile, aren't there options at the tex4ht level to decide whether to
generate "unicode" (e.g., the unicode directed left quote) or not?
I confess I have never had a good grasp on, or seen a comprehensible
description of, all the multifarious options that Eitan created.  Aside
from what you have written on your blog, and I fear I haven't even
internalized those.

K


Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-16 Thread Karl Berry
Hi Nasser and all,

I don't think it would be right to change cmtt10.htf.  What is at
position 0x60 of cmtt10 is, in fact, a directed left quote, not a grave
accent.  The fact that the standards committees screwed over all us
helpless users by making that plain ASCII character into a useless
standalone accent does not change Knuth's fonts.

What's in the PDF file has to correspond to the fonts used by the
document.  Now, what gets copied/pasted from a PDF is another matter
entirely.  Different viewers do different things there.

I realize full well that when you insert an ASCII 0x60, what you
presumably see on your screen is a grave accent (I don't, but that's
another story).  I realize full well that that is what ASCII defined at
that position.  But that is not what TeX (or, more precisely, the cm
fonts) does (do), by default, and therefore tex4ht follows suit.  That
seems undoubtedly the correct behavior to me.

So, if you want to change it, you should change it at the TeX level, and
then tex4ht should do what you want.  Michal explained how to do that
for LaTeX.  (Aside: In Texinfo, I created all kinds of stupid options so
people could get the stupid grave accent in their output, etc.  As I
expect you're aware, there is a similar issue with 0x27 being a directed
right quote in CM and a useless straight quote in the standards.)

Best,
Karl


Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-16 Thread Michal Hoftich
>
> Thanks Michal. But there is still an issue. This is what I tried:
>
> Using \usepackage{upquote}, does indeed correct the problem for tex4ht,
> but _only_ for the verbatim text in the above example, not for
> the normal text.
>
> Yes, the normal text, appears the same in the pdf as it is on the
> web page, but the encoding can't be the same. I found this, when I
> copied the normal text out from pdf to text file and looked at
> the hex encoding using
>
>> xxd -p foo.txt
>

This may be caused by the PDF viewer you use; I don't get graves using
Acrobat Reader or pdftotext.

The grave character is used to input quotes, i.e. ``hello'' will print
correct English quotes, so it would be an error to get anything else in
the text. You can use the \`{} command to get a grave, or better

   \newcommand\textgrave{\`{}}

and then use \textgrave in the document. this works in pdflatex as
well as in tex4ht.

>
> It was the hex60, which is what I wanted, same as the input.
>
> But when I copied the normal text from the web page, and looked at
> its hex encoding, it was the left single quotation mark, which
> causes a problem.
>
> So, the encoding inside pdf can't be the same as the HTML generated for
> the normal text. Even though they do appear to be the same (left single
> quotation) when looking at them on the screen.
>
> For pdf, I did not even need the \usepackage{upquote}, and was able
> to copy both the normal and the verbatim text, and they both came out
> as grave accent.
> But for htlatex, it did fix the verbatim part. Not the normal
> text part. This was the same result as when using the patched
> cmtt.htf I was testing with.
>

> So, there is still a problem, with normal text. For now, I will use
> verbatim with \usepackage{upquote} to avoid this problem. But for
> normal text, I think there is still a problem, since it does not
> work like with pdflatex or lualatex.
>
> Thanks for your help.
>

you're welcome :)

Michal
> --Nasser
>


Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-16 Thread Nasser M. Abbasi

On 1/16/2015 6:17 AM, Michal Hoftich wrote:

I will repost my answer on TeX.sx. This is not a bug, but default
LaTeX behaviour:



This is the default behaviour, you will get the same result even with
`pdflatex`. You can use `upquote` package to redefine grave and
upright-quote to produce correct glyphs:

 \documentclass[12pt]{article}
 \usepackage{upquote}
 \begin{document}
 `123`

 \verb|`123`|
 \end{document}

the result:

 ‘123‘
`123` 

-

Best regards,
Michal


Thanks Michal. But there is still an issue. This is what I tried:

Using \usepackage{upquote}, does indeed correct the problem for tex4ht,
but _only_ for the verbatim text in the above example, not for
the normal text.

Yes, the normal text, appears the same in the pdf as it is on the
web page, but the encoding can't be the same. I found this, when I
copied the normal text out from pdf to text file and looked at
the hex encoding using


xxd -p foo.txt


It was the hex60, which is what I wanted, same as the input.

But when I copied the normal text from the web page, and looked at
its hex encoding, it was the left single quotation mark, which
causes a problem.

So, the encoding inside pdf can't be the same as the HTML generated for
the normal text. Even though they do appear to be the same (left single
quotation) when looking at them on the screen.
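
The same comparison can be reproduced without a PDF or browser. This Python sketch prints the UTF-8 bytes of both strings in plain hex, in the style of `xxd -p` but without the trailing newline a text file would add; hex_plain() is a hypothetical helper name.

```python
def hex_plain(text):
    """Return the UTF-8 bytes of text as one plain hex string."""
    return text.encode("utf-8").hex()

ascii_grave = "`123`"               # what was typed in the .tex source
curly_quote = "\u2018123\u2018"     # what ended up in the HTML

print(hex_plain(ascii_grave))   # 6031323360
print(hex_plain(curly_quote))   # e28098313233e28098
```

The two strings can look alike on screen, yet the first uses the single byte 60 and the second the three-byte sequence e2 80 98 named in the bug title.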

For pdf, I did not even need the \usepackage{upquote}, and was able
to copy both the normal and the verbatim text, and they both came out
as grave accent.  


But for htlatex, it did fix the verbatim part. Not the normal
text part. This was the same result as when using the patched
cmtt.htf I was testing with.

So, there is still a problem, with normal text. For now, I will use
verbatim with \usepackage{upquote} to avoid this problem. But for
normal text, I think there is still a problem, since it does not
work like with pdflatex or lualatex.

Thanks for your help.

--Nasser



Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-16 Thread Michal Hoftich
I will repost my answer on TeX.sx. This is not a bug, but default
LaTeX behaviour:



This is the default behaviour, you will get the same result even with
`pdflatex`. You can use `upquote` package to redefine grave and
upright-quote to produce correct glyphs:

\documentclass[12pt]{article}
\usepackage{upquote}
\begin{document}
`123`

\verb|`123`|
\end{document}

the result:

‘123‘
   `123` 

-

Best regards,
Michal

2015-01-16 2:52 GMT+01:00 Nasser M. Abbasi :
> URL:
>   
>
>  Summary: grave accent letter ` (hex 60) changes to left
> single quotation mark (hex 0xE2 0x80 0x98)
>  Project: tex4ht
> Submitted by: nma123
> Submitted on: Fri 16 Jan 2015 03:52:11 AM EET
> Category: None
> Priority: 5 - Normal
> Severity: 5 - Normal
>   Status: None
>  Privacy: Public
>  Assigned to: None
> Originator Email:
>  Open/Closed: Open
>  Discussion Lock: Any
>
> ___
>
> Details:
>
> please see
> http://tex.stackexchange.com/questions/223362/tex4ht-changes-a-grave-accent-letter-hex-60-to-left-single-quotation-mark-h
> for more information.
>
> summary:
>
> when running htlatex on this file
>
> \documentclass[12pt]{article}
> \begin{document}
> `123`
>
> \verb|`123`|
>
> \end{document}
>
> the  grave accent character ` is changed to left-single-quotation-mark in
> HTML. I need it to remain  a Hex60 character as it is in the input.
>
> texlive 2014



Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-15 Thread Radhakrishnan CV
On Fri, Jan 16, 2015 at 10:00 AM, Nasser M. Abbasi  wrote:

> Thank you CVR for the file.
>

Welcome.

> Can I use this now to try it? i.e. If I replace my copy of
> /usr/local/texlive/2014/texmf-dist/tex4ht/ht-fonts/alias/cm/cmtt.htf
>

You can experiment with it by keeping a copy in your working directory.
When you are sure that all your files run without any problems, then you
might update TeX4ht system.

> with the one you attached, (do it as root), will this be enough?
>

That is enough.

> Or do I need to do something more to activate this change? I'd like
> to try this fix in my end to see if it works for all cases.
>

Nothing.

Best

-- 
Radhakrishnan
River Valley



Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-15 Thread Nasser M. Abbasi

Thank you CVR for the file.

Can I use this now to try it? i.e. If I replace my copy of

/usr/local/texlive/2014/texmf-dist/tex4ht/ht-fonts/alias/cm/cmtt.htf

with the one you attached, (do it as root), will this be enough?

Or do I need to do something more to activate this change? I'd like
to try this fix in my end to see if it works for all cases.

--Nasser

On 1/15/2015 9:45 PM, CV Radhakrishnan wrote:

On 16/01/15 7:22 am, Nasser M. Abbasi wrote:

when running htlatex on this file

\documentclass[12pt]{article}
\begin{document}
`123`

\verb|`123`|

\end{document}

the  grave accent character ` is changed to left-single-quotation-mark in
HTML. I need it to remain  a Hex60 character as it is in the input.


A lasting solution would be to modify position 96 of cmtt.htf to
`. Shall we do that? Attached is my version of cmtt.htf. If that
works fine for all, we shall do it and update tex4ht.





Re: [tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-15 Thread CV Radhakrishnan

On 16/01/15 7:22 am, Nasser M. Abbasi wrote:

when running htlatex on this file

\documentclass[12pt]{article}
\begin{document}
`123`

\verb|`123`|

\end{document}

the  grave accent character ` is changed to left-single-quotation-mark in
HTML. I need it to remain  a Hex60 character as it is in the input.


A lasting solution would be to modify position 96 of cmtt.htf to 
`. Shall we do that? Attached is my version of cmtt.htf. If that 
works fine for all, we shall do it and update tex4ht.


--
Radhakrishnan
cmtt 0 127
'Γ' '' Gamma  0 
'Δ' '' Delta  1 % cmtt.htf (unicode)2003-03-27 %
'Θ' '' Theta  2 % Copyright (C) 2000--2003 Michel Goossens %
'Λ' '' Lambda 3 %  Eitan M. Gurari %
'Ξ' '' Xi 4 %  % 
'Π' '' Pi 5 % This file can redistributed and/or   % 
'Σ' '' Sigma  6 % modified under the terms of the LaTeX% 
'Υ' '' Upsilon 7 % Project Public License Distributed from  % 
'Φ' '' Phi 8 % CTAN archives in directory   % 
'Ψ' '' Psi 9 % macros/latex/base/lppl.txt; either   % 
'Ω' '' Omega 10 % version 1 of the License, or (at your% 
'↑' '' uparrow   11 % option) any later version.   % 
'↓' '' downarrow 12 %However, you are allowed to modify% 
='=== quote 13 % this file without changing its name, if  % 
'¡' '' inverted  14 % you add a note of your own after this% 
'¿' '' inverted  15 % copyright note.  % 
'ı' '' dotless i 16 %  % 
'j''' wrong  \j 17 %gur...@cis.ohio-state.edu % 
'ˋ' '' grave 18 %http://www.cis.ohio-state.edu/~gurari % 
'ˊ' '' acute 19  
'ˇ' ''  caron  20
'˘' ''  breve  21
'ˉ' ''  macron 22
'˚' ''  ring baove 23
'¸' ''  cedilla24
'ß' ''  sharp \ss  25
'æ' ''  aelig  26
'œ' ''  oelig  27
'ø' ''  o with stroke  28
'Æ' ''  AElig  29
'Œ' ''  OElig  30
'Ø' ''  O with stroke  31
'␣' ''  visible space  32
'!'''  exclamation mark   33
'"' ''  right(?) doublequote  34
'#''' 35
'$''' 36
'%''' 37
'&' ''  ampersand  38
'’' ''  right singlequote  39
'(''' 40
')''' 41
'*''' 42
'+''' 43
',''' 44
'-''' 45
'.''' 46
'/''' 47
'0''' 48
'1''' 49
'2''' 50
'3''' 51
'4''' 52
'5''' 53
'6''' 54
'7''' 55
'8''' 56
'9''' 57
':''' 58
';''' 59
'<' ''  less than  60
'=''' 61
'>' ''  greater than   62
'?''' 63
'@''' 64
'A''' 65
'B''' 66
'C''' 67
'D''' 68
'E''' 69
'F''' 70
'G''' 71
'H''' 72
'I''' 73
'J''' 74
'K''' 75
'L''' 76
'M''' 77
'N''' 78
'O''' 79
'P''' 80
'Q''' 81
'R''' 82
'S''' 83
'T''' 84
'U''' 85
'V''' 86
'W''' 87
'X''' 88
'Y''' 89
'Z''' 90
'[''' 91
'\' ''  backslash  92
']''' 93
'ˆ' ''  ring 94
'_' ''  underscore 95
'`' ''  left singlequote   96
'a''' 97
'b''' 98
'c''' 99
'd'''100
'e'''101
'f'''102
'g'''103
'h'''104
'i'''105
'j'''106
'k'''107
'l'''108
'm'

[tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

2015-01-15 Thread Nasser M. Abbasi
URL:
  

 Summary: grave accent letter ` (hex 60) changes to left
single quotation mark (hex 0xE2 0x80 0x98)
 Project: tex4ht
Submitted by: nma123
Submitted on: Fri 16 Jan 2015 03:52:11 AM EET
Category: None
Priority: 5 - Normal
Severity: 5 - Normal
  Status: None
 Privacy: Public
 Assigned to: None
Originator Email: 
 Open/Closed: Open
 Discussion Lock: Any

___

Details:

please see
http://tex.stackexchange.com/questions/223362/tex4ht-changes-a-grave-accent-letter-hex-60-to-left-single-quotation-mark-h
for more information.

summary:

when running htlatex on this file

\documentclass[12pt]{article}
\begin{document}
`123`

\verb|`123`|

\end{document}

the  grave accent character ` is changed to left-single-quotation-mark in
HTML. I need it to remain  a Hex60 character as it is in the input.

texlive 2014





