Re: [tex4ht] [bug #226] Spurious elements in mathml output of \mathit, \mathrm, \mathbf etc
Hi Bill,

>> The original issue of spurious elements for longer texts can be fixed
>> using post-processing - make4ht provides common_domfilters extension, which
>> does exactly this.
>
> Can such post-processing be reliably robust? (I doubt it.)

We have a DOM processing library in make4ht, which is quite powerful. It can join several elements with the same name and the same attributes into one. We already use this code to join numbers in MathML, which are produced as 12 by default. I

> I don't think that I really have anything new to say. It's only that I think
> this type of issue makes the case for profiled source documents. I wrote a
> more detailed comment with illustrations in LaTeX, ran it through tex4ht
> linked to MathJax, and posted it here:
>
> https://www.albany.edu/dept/math-stat/hammond/demos/mathitOverline.html
>
> where one sees MathJax cough on the *first* MathML error, which is having the
> 'a' loose in an mrow.

Yes, we can say that issues like this are often caused by wrong user input, but users usually don't want to change their ways, especially if it would mean correcting hundreds of already written formulas. They also usually don't agree that their way is wrong. I think we need to accept it, even if it means much more work and headaches.

Best,
Michal
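To make the "join several elements with the same name and same attributes" idea concrete, here is a minimal sketch in Python. It is not make4ht's code (make4ht's DOM filters are written in Lua), and the names `MERGEABLE`, `join_adjacent` and `join_string` are invented for this illustration. It assumes un-namespaced MathML and merges only childless sibling elements with nothing but whitespace between them.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: only these element names are candidates for merging.
MERGEABLE = {"mn", "mi"}


def join_adjacent(parent):
    """Merge runs of adjacent siblings that share tag and attributes (recursive)."""
    merged = []
    for child in list(parent):
        join_adjacent(child)
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and child.tag in MERGEABLE
            and prev.tag == child.tag
            and prev.attrib == child.attrib
            and len(prev) == 0
            and len(child) == 0
            and not (prev.tail or "").strip()
        ):
            # Same name, same attributes, nothing in between: concatenate the text.
            prev.text = (prev.text or "") + (child.text or "")
            prev.tail = child.tail
        else:
            merged.append(child)
    parent[:] = merged


def join_string(xml):
    """Parse an XML string, join adjacent elements, and serialize it again."""
    root = ET.fromstring(xml)
    join_adjacent(root)
    return ET.tostring(root, encoding="unicode")
```

Note that the sketch joins any two adjacent `mi` elements with identical attributes, including two separate one-letter variables. Whether that is safe is exactly the robustness question raised in the quoted text, so a real filter would need a stricter rule than this one.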
Re: [tex4ht] [bug #226] Spurious elements in mathml output of \mathit, \mathrm, \mathbf etc
Greetings to all:

(This thread goes back to 2014, but Michal posted today.)

On Fri, Nov 22, 2019 at 5:30 AM Michal Hoftich wrote:

> Follow-up Comment #5, bug #226 (project tex4ht):
>
> I've found an issue with this approach - the \PauseMathClass will prevent
> correct tagging of nested structures. For example:
>
> \mathit{\overline{a}+\overline{b}}
>
> This will result in
>
> b accent='true'>¯
>
> The b character should be placed in element. So I think we should
> revert back to the use of .
>
> The original issue of spurious elements for longer texts can be fixed
> using post-processing - make4ht provides common_domfilters extension, which
> does exactly this.

Can such post-processing be reliably robust? (I doubt it.)

I don't think that I really have anything new to say. It's only that I think this type of issue makes the case for profiled source documents. I wrote a more detailed comment with illustrations in LaTeX, ran it through tex4ht linked to MathJax, and posted it here:

https://www.albany.edu/dept/math-stat/hammond/demos/mathitOverline.html

where one sees MathJax cough on the *first* MathML error, which is having the 'a' loose in an mrow.

It may make this a bit clearer if I show you how *overline* and *mathbf* might be handled with CSS:

    overline { padding-top: 0.1ex; border-top: 0.2ex solid; }
    mathbf { font-weight: bold; }

That this is not vapor may be seen here:

https://www.albany.edu/dept/math-stat/hammond/demos/mathbfOverline-lm.xml

which is the XML shadow, styled solely with CSS (no MathJax), of the LaTeX-like source:

https://www.albany.edu/dept/math-stat/hammond/demos/mathbfOverline-glm.txt

-- Bill
Re: [tex4ht] [bug #226] Spurious elements in mathml output of \mathit, \mathrm, \mathbf etc
Michal Hoftich writes:

> . . .
> Details:
>
> As was pointed out by David Carlisle, there are spurious `` elements in
> the output of $\mathit{hello }\mathbf{world}$ when converted to mathml:
> xmlns="http://www.w3.org/1998/Math/MathML";
> display="inline" >>h>e>l>l>o...
>
> it should be
>
> hello

No. Loose character data is not allowed in .

The LaTeX markup $\mathit{hello}$ is insufficient for knowing whether or not "hello" is intended to be the name of a mathematical symbol. That is, assuming amsmath, I would like to see something in the LaTeX source like \mathit{\text{hello}} or \mathit{\operatorname{hello}}.

If "hello" is intended to be a symbol name, then hello, but if it's not a symbol name, then one probably should use hello for (minimal) commenting inside math.

There are various inconsistencies afloat. For example, with the LaTeX markup $\mathbf{\operatorname{Hom}(X,Y)}$ should "Hom", which should be upright, be bold or not? There is a division on this between tex4ht and latexml and *also* a division between pdflatex and xelatex (with fontspec and unicode-math). \operatorname is a command taking symbol names, while \mathbf is a command taking expressions. My thought is that it should be bold, i.e., I would vote for tex4ht and xelatex+unicode-math.

That said, just as I am not fond of seeing in MathML, I would prefer to see the use of things like \mathrm, \mathbf, and \mathit exercised symbol by symbol.

-- Bill
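The objection to loose character data can be checked mechanically. The sketch below is an illustration only, not part of tex4ht, make4ht or MathJax. It assumes presentation MathML, in which character data belongs inside token elements such as `mi`, `mn`, `mo`, `mtext` and `ms`, and it also lets `annotation` carry text. The names `TEXT_OK`, `local_name` and `loose_text`, and the sample markup in the usage lines, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: elements that may legitimately hold character data.
TEXT_OK = {"mi", "mn", "mo", "mtext", "ms", "annotation"}


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def loose_text(root):
    """List (container, text) pairs for character data outside token elements."""
    found = []
    for el in root.iter():
        name = local_name(el.tag)
        if name in TEXT_OK:
            continue
        # Text directly after the container's start tag.
        if el.text and el.text.strip():
            found.append((name, el.text.strip()))
        # Text between or after the container's child elements.
        for child in el:
            if child.tail and child.tail.strip():
                found.append((name, child.tail.strip()))
    return found


# Hypothetical input: an 'a' sitting loose in an mrow.
bad = ET.fromstring(
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mrow>a<mo>+</mo><mi>b</mi></mrow></math>"
)
print(loose_text(bad))
```

A check of this kind only reports where character data sits outside a token element. It cannot decide whether "hello" was meant as a symbol name or as text, which is the information the LaTeX source would have to supply.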