Re: [tex4ht] [bug #226] Spurious elements in mathml output of \mathit, \mathrm, \mathbf etc

2019-11-25 Thread Michal Hoftich
Hi Bill,

>> The original issue of spurious  elements for longer texts can be fixed
>> using post-processing - make4ht provides common_domfilters extension, which
>> does  exactly this.
>>
>
> Can such post-processing be reliably robust?  (I doubt it.)
>

We have a DOM processing library in make4ht, which is quite powerful.
It can join several adjacent elements with the same name and the same
attributes into one. We already use this code to join numbers in MathML,
which tex4ht by default produces with each digit in its own element.
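For readers who have not used the make4ht DOM filters (which are written in Lua), here is a minimal Python sketch of the joining idea, using only the standard library's `xml.etree.ElementTree`. It is not make4ht's code. The function `merge_adjacent` and its `mergeable` parameter are hypothetical. The sketch joins runs of adjacent leaf siblings that have the same element name and identical attributes, but only for element names the caller allows, because merging adjacent `mi` elements, for example, does not always preserve meaning.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix, if present."""
    return tag.rsplit("}", 1)[-1]


def merge_adjacent(parent: ET.Element, mergeable: set[str]) -> None:
    """Recursively join runs of adjacent sibling elements that share the same
    name (one of `mergeable`) and identical attributes into one element.

    Only leaf elements (text content, no children) are joined, and only when
    nothing but whitespace separates them.
    """
    kept: list[ET.Element] = []
    for child in list(parent):  # iterate over a copy: we remove while looping
        merge_adjacent(child, mergeable)
        prev = kept[-1] if kept else None
        if (
            prev is not None
            and local_name(child.tag) in mergeable
            and child.tag == prev.tag
            and child.attrib == prev.attrib
            and len(prev) == 0
            and len(child) == 0
            and not (prev.tail or "").strip()
        ):
            prev.text = (prev.text or "") + (child.text or "")
            prev.tail = child.tail  # drop the whitespace between the two
            parent.remove(child)
        else:
            kept.append(child)


if __name__ == "__main__":
    root = ET.fromstring(
        "<math><mrow><mn>1</mn><mn>2</mn><mo>+</mo><mi>x</mi></mrow></math>"
    )
    merge_adjacent(root, {"mn"})
    print(ET.tostring(root, encoding="unicode"))
    # <math><mrow><mn>12</mn><mo>+</mo><mi>x</mi></mrow></math>
```

Leaving the choice of `mergeable` to the caller is deliberate: joining digits is safe, while joining identifiers is exactly where the robustness question in this thread arises.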

> I don't think that I really have anything new to say.  It's only that I think
> this type of issue makes the case for profiled source documents.  I wrote a
> more detailed comment with illustrations in LaTeX, ran it through tex4ht
> linked to MathJax, and posted it here:
>
>   https://www.albany.edu/dept/math-stat/hammond/demos/mathitOverline.html
>
> where one sees MathJax cough on the *first* MathML error, which is having the
> 'a' loose in an mrow.
>

Yes, issues like this are often caused by incorrect user input, but users
usually don't want to change their habits, especially if it would mean
correcting hundreds of already written formulas. They also usually don't
agree that their way is wrong. I think we need to accept that, even if it
means much more work and headaches.

Best,
Michal


Re: [tex4ht] [bug #226] Spurious elements in mathml output of \mathit, \mathrm, \mathbf etc

2019-11-22 Thread William F Hammond
Greetings to all:

(This thread goes back to 2014, but Michal posted today.)

On Fri, Nov 22, 2019 at 5:30 AM Michal Hoftich wrote:

> Follow-up Comment #5, bug #226 (project tex4ht):
>
> I've found an issue with this approach - the \PauseMathClass will prevent
> correct tagging of nested structures. For example:
>
> \mathit{\overline{a}+\overline{b}}
>
> This will result in
>
> b accent='true'>¯
>
> The b character should be placed in  element. So I think we should revert
> back to the use of .
>
> The original issue of spurious  elements for longer texts can be fixed
> using post-processing - make4ht provides common_domfilters extension, which
> does  exactly this.
>
>
Can such post-processing be reliably robust?  (I doubt it.)

I don't think that I really have anything new to say.  It's only that I
think this type of issue makes the case for profiled source documents.  I
wrote a more detailed comment with illustrations in LaTeX, ran it through
tex4ht linked to MathJax, and posted it here:


https://www.albany.edu/dept/math-stat/hammond/demos/mathitOverline.html

where one sees MathJax cough on the *first* MathML error, which is having
the 'a' loose in an mrow.
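Because MathJax stops at the first structural error, a cheap check on tex4ht output before it reaches the browser can find every such spot at once. The following is a minimal Python sketch. It assumes the presentation MathML rule that character data belongs only inside token elements. The element list is simplified, and `loose_text` is a hypothetical helper, not part of tex4ht, make4ht, or MathJax. The sample input is a made-up reduction of the "'a' loose in an mrow" case, not the markup from the linked page.

```python
import xml.etree.ElementTree as ET

# Elements that may directly contain character data in presentation MathML
# (a simplified list; everything else is treated as a layout element).
TEXT_ALLOWED = {"mi", "mn", "mo", "mtext", "ms", "annotation"}


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix, if present."""
    return tag.rsplit("}", 1)[-1]


def loose_text(root: ET.Element) -> list[tuple[str, str]]:
    """Return (element name, text) for each non-whitespace run of character
    data that sits directly inside an element not allowed to contain it."""
    problems = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in TEXT_ALLOWED:
            continue  # character data is legitimate inside token elements
        # Text before the first child and after each child belongs to elem.
        for run in [elem.text] + [child.tail for child in elem]:
            if run and run.strip():
                problems.append((name, run.strip()))
    return problems


if __name__ == "__main__":
    bad = ET.fromstring("<math><mrow>a<mo>+</mo><mi>b</mi></mrow></math>")
    good = ET.fromstring("<math><mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow></math>")
    print(loose_text(bad))   # [('mrow', 'a')]
    print(loose_text(good))  # []
```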

It may make this a bit clearer if I show you how *overline* and *mathbf*
might be handled with CSS:

overline {
  padding-top: 0.1ex;
  border-top: 0.2ex solid;
}

mathbf {
  font-weight: bold;
}

That this is not vapor may be seen here:


https://www.albany.edu/dept/math-stat/hammond/demos/mathbfOverline-lm.xml

which is the XML shadow, styled solely with CSS (no MathJax), of the
LaTeX-like source:


https://www.albany.edu/dept/math-stat/hammond/demos/mathbfOverline-glm.txt


-- Bill


Re: [tex4ht] [bug #226] Spurious elements in mathml output of \mathit, \mathrm, \mathbf etc

2014-08-02 Thread William F Hammond
Michal Hoftich  writes:

> . . .
> Details:
>
> As was pointed out by David Carlisle, there are spurious `` elements in
> the output of $\mathit{hello }\mathbf{world}$ when converted to mathml:
>   xmlns="http://www.w3.org/1998/Math/MathML";  
> display="inline" >>h>e>l>l>o...
>
> it should be
>
> hello

No.  Loose character data is not allowed in .

The LaTeX markup $\mathit{hello}$ is insufficient for
knowing whether or not "hello" is intended to
be the name of a mathematical symbol.

That is, assuming amsmath, I would like to see something
in the LaTeX source like \mathit{\text{hello}} or
\mathit{\operatorname{hello}}.

If "hello" is intended to be a symbol name,
then hello, but if it's
not a symbol name, then one probably should use
hello for (minimal)
commenting inside math.

There are various inconsistencies afloat.

For example, with the LaTeX markup
$\mathbf{\operatorname{Hom}(X,Y)}$, should "Hom", which
should be upright, be bold or not?  There is a division on
this between tex4ht and LaTeXML, and *also* a division
between pdflatex and xelatex (with fontspec and
unicode-math).

\operatorname is a command taking symbol names, while
\mathbf is a command taking expressions.  My thought is that
"Hom" should be bold, i.e., I would vote for tex4ht and
xelatex+unicode-math.  That said, just as I am not fond of
seeing  in MathML, I would prefer to see the use
of things like \mathrm, \mathbf, and \mathit exercised
symbol by symbol.
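Here is a minimal Python sketch of what "symbol by symbol" could look like on the MathML side. It assumes the font change is carried by the standard `mathvariant` attribute on each identifier or number token rather than by a wrapping element. The helper `apply_variant`, its `override` flag, and the input markup for "Hom" (an `mi` with `mathvariant="normal"`) are all hypothetical illustrations, not tex4ht's actual output.

```python
import xml.etree.ElementTree as ET

# Tokens that \mathbf, \mathit, \mathrm etc. mainly affect in LaTeX: letters
# and digits. Operators such as "+" or "(" keep their usual font.
VARIANT_TARGETS = {"mi", "mn"}


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix, if present."""
    return tag.rsplit("}", 1)[-1]


def apply_variant(subtree: ET.Element, variant: str, override: bool = False) -> None:
    """Set mathvariant on each identifier/number token in `subtree`, i.e.
    apply a font change symbol by symbol instead of via a wrapper element.

    If override is False, a token that already has a mathvariant (for
    example an operator name set upright) keeps it; if True, it is replaced.
    """
    for node in subtree.iter():
        if local_name(node.tag) not in VARIANT_TARGETS:
            continue
        if override or "mathvariant" not in node.attrib:
            node.set("mathvariant", variant)


if __name__ == "__main__":
    row = ET.fromstring(
        '<mrow><mi mathvariant="normal">Hom</mi><mo>(</mo><mi>X</mi>'
        "<mo>,</mo><mi>Y</mi><mo>)</mo></mrow>"
    )
    apply_variant(row, "bold")
    print(ET.tostring(row, encoding="unicode"))
```

With `override=False`, "Hom" stays upright and not bold. With `override=True`, it becomes bold, which is the outcome voted for here. The flag simply makes that disputed choice explicit.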

-- Bill