Hi Max,
On Thu, 2025-07-10 at 10:54 +0200, Hans Hagen via ntg-context wrote:
On 7/10/2025 9:08 AM, Max Chernoff via ntg-context wrote:
sorry ... long answer ... probably much already said once
- I had to manually tag every paragraph with
\startparagraph/\stopparagraph, which was annoying.
Well, that also depends on the concept of a paragraph, I guess .. one can't
make structure from non-structure.
Right, but any text outside a tagged structure automatically fails
validation, so incorrectly tagged paragraphs are better than nothing. And
"indentnext=auto" usually does a decent job visually, so you could
probably use the same heuristics for tagging.
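For reference, this is the kind of explicit markup being discussed: a minimal sketch using the commands mentioned in this thread (\setupbackend, \setuptagging, \startparagraph), with behaviour depending on your ConTeXt version:

```tex
% Minimal sketch of explicit paragraph tagging as discussed above.
\setupbackend[format=pdf/ua-2]  % choose the backend format first
\setuptagging[state=start]      % then enable tagging

\starttext

\startparagraph
    The first paragraph, explicitly tagged.
\stopparagraph

\startparagraph
    The second paragraph, also tagged by hand.
\stopparagraph

\stoptext
```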
The problem is that when we implement(ed) things like that, we have to
make assumptions. I'm not saying that there are no alternatives (they could be
optional), but what belongs to a 'paragraph' is a bit arbitrary; even in
your example you have two paragraphs and a formula acting as one
paragraph. Heuristics can always fail. And keep in mind that in tex
anything can end up anywhere: no endpoints are defined unless one
starts blocking nested tagging!
I probably need to go over the now rather ancient code; I might see
some ways to improve things (also performance-wise) by using some more
luametatex features.
Also, we never had any real usage and examples. I'm not going to tag and
check as a hobby. The basics were implemented long ago (see the tugboat
article) and the logic is not complex, mostly annoying at the pdf level.
I even dare to say it was kind of trivial because we already had
structure to hook into (call it luck). Adding a few things here and
there is not complex either, just boring.
Furthermore:
- only acrobat showed something, but the last version I had (and paid
for) was Pro 8, which I lost at some point after updating a laptop; the
fact that tagging was never supported well elsewhere tells you something
- the standard is a mess and the tagging is pretty bad imo, maybe derived
from how this (non-structure-related) tagging was present / needed in
adobe applications (valid in itself, as that's where pdf came from) that
used it for storing / editing purposes (there's more to tell about that,
but it's ancient history by now)
- things and interpretations and standards still change .. so much for a
standard ... so, if at some point we validate and at another we don't (or
refuse to use tricks that suit specific programs) .. well, if someone
barks that context does not do it right, they are rewriting history and
as a side effect removing themselves from my 'maybe worth listening to'
list (which is actually also true when I read ridiculous comments and
assumptions wrt context in more public places)
[btw, I'm aware that you know more about context inner workings and
intentions than the average tex user, so thanks for occasionally
clarifying that on e.g. SE]
- so when we have time and motivation we pick up on it and adapt ... just
that, adapt ... but within reason (so I adapted the nested mcid handling
and made all NonStruct into Span or Div .. we'll see); it will take
years before that dust settles. And no, I'm unlikely to add all kinds of
pseudo-html directives, css or other crap to the file ... then one
should just make html instead.
- "\setupbackend[format=pdf/ua-2]" needs to come before
"\setuptagging[state=start]", otherwise lots of stuff will silently
break.
Indeed, the format influences some later settings (arbitrary-order
initializations would complicate the code with no real gain).
Right, this is more of a note in case anyone else ever runs into this
issue. But adding a warning might be a good idea (although maybe not
worth the effort).
I think it's mentioned somewhere, but I can check. Maybe issue a warning
when the order is flipped. It has to do with enabling / disabling
features, and doing that in an arbitrary order is kind of messy. (I bet
you know why by looking at the code.)
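Concretely, the ordering requirement above can be sketched like this (a minimal illustration; the exact failure mode depends on your ConTeXt version):

```tex
% correct: pick the backend format before enabling tagging
\setupbackend[format=pdf/ua-2]
\setuptagging[state=start]

% problematic: flipping the two lines makes tagging start before the
% ua-2 specific features are initialized, which can silently break
% things later on
```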
and it comes for free (so
it's not driven by some paid project that can set priorities)
There is the TUG development and accessibility funds
https://tug.org/tc/devfund/grants.html
https://tug.org/twg/accessibility/
but the grants tend to be quite small, especially considering how
complicated/annoying the accessibility work is.
We (i.e. I) never work with grants from user groups, partly because I'm
involved in some and I want to avoid conflicts of interest. We did
occasionally try to get funding for other tex projects (like the font
projects, mplib by taco, swiglib by luigi, some work by idris).
It is kind of interesting that the large-scale users are totally absent:
large publishers (I think they lost interest long ago; there used to be
tex people there; I can't access their content anyway, so why should I
care), their providers (I know only a small dutch one that I have
contact with and that is involved in some font stuff - tricky scripts - in
context; the rest are just consumers that make money from tex and expect
it to be around and developed; they normally sit on their technology
anyway), and service providers (where we were told corporate / investor
policy drives the decisions and processes; again, just assuming tex
keeps developing).
It says something that in the decades we have developed luatex, none of those
critically tex-dependent entities ever contacted the developers (just to
be sure of continuity, to know those involved, as they have quite a
dependency) .. well, they don't care, so they can have it .. for me it's
only users that matter, and they normally don't have the funds, so
basically we pay for it all out of our own pocket. Get me right, I'm talking
substantial amounts here, not some user group membership or a few K. Companies
that pay 5-7 digit salaries for developers don't really care about the tex
part that much apart from using it. It's the small-scale long-term
context users who keep it afloat, by being enthusiastic, challenging
and friends. That's what we keep doing it for, not that the large-scale
side realizes that dependency.
So, unless I run into specific large-scale professional context users,
publisher / corporate usage of tex is non-existent for me (and a waste
of time; the point of no return has passed). And I don't expect user groups to
fund anything; they should just focus on staying around to provide the basic
support, archiving, distributions, care for domestic language and script
demands, and maybe journals. Small tech, so to say.
(Years ago we sometimes had bits and pieces of context dev being paid
work as it was needed: specific kinds of tables. And these are not
IT-conforming projects anyway, money-wise. I have only met very few
professional large-scale users (with a different mindset wrt software dev
too), but they knew their way around and were not your average publishing
kind of people. They also put their jobs at risk by going for tex solutions.
But long-term continuity is bad, as organizations move and merge. We
noticed the same with some educational publishers: merge, ditch, profit-driven
(instead of a full-range content-covering approach). Very few
exceptions, alas.)
FWIW: quite often in our projects tex was a last resort ... all else had
failed ... big money spent ... so then they were willing to accept a
cheap tex solution, even if past experiences were bad (some knew tex
from their education and somehow never saw it as a valid solution), but
again we had to develop the technology beforehand. We only say we can do
something if we know we can (which is why we remained small). Kind of
rewarding too: implementing solutions with mavericks, often interesting
people. But it's always upfront free development then applied to hourly
work (I bet others in the same tex business can confirm that more hours
go in than get paid for, as there are always tricky detailed demands
involved; one seldom gets the simple many-pages stupid rendering: that
goes to those in hiding).
So, funding ... as I mentioned, tagging itself is kind of trivial, but
adapting and making decisions and testing takes time, and for that we
need projects in order to prioritize it, unless it's a fun project (like
Mikeal S's - especially the upcoming - lecture notes, which are also artistic
and educational masterpieces, so worth spending time on). Users probably
don't realize how much is done just because I interact with users that
have challenges or think in a way that fits into our way of thinking.
Like: we're currently doing some mp-related coding and it will never pay
back, but it's a nice challenge, with nice discussions, and it can be
artistic fun too. Of course it can drive high-performance workflows, but
tex and friends never fit into the solution space there.
One can argue the same for many mechanisms: improved math (only of
interest to context users who notice the difference; publishers etc.
don't care .. what has been good-enough hackery for decades is good enough
forever; for journals it gets fixed in post-production anyway, so money
can be made), better par building (idem, probably only some context
users appreciate that), whatever we improve in the engine (also
programming capabilities so that source code looks better) .. a lot is
about 'feel good' (okay for me).
- I've heard that it's actually usually better to put the TeX source in
the Alt text for math instead of the current generated prose, because
most people reading math are familiar with TeX anyway.
Well, I never meet people who are familiar with latex input ... or who
expect that from us ... do you expect me to generate latex math from the
less verbose context math? And what about all kinds of
(educational) stuff inside there? We try to accommodate what users expect
and challenge us with, because that's the world we deal with. LaTeX is
just a different world (to us); little or no overlap.
LaTeX and ConTeXt inline math (_not_ display math) syntaxes are
essentially identical, since they're both essentially "Plain TeX with
\frac instead of \over", so I don't think that many LaTeX users would
struggle with ConTeXt's syntax.
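To make the claim concrete, here is a hedged sketch of the same inline formula in both systems (ConTeXt MkIV/LMTX provides \frac, so the inline source can be byte-identical; display math differs more):

```tex
% The same inline math source works in both systems:
%
%   LaTeX:   The mean is $\frac{a+b}{2}$, as expected.
%   ConTeXt: The mean is $\frac{a+b}{2}$, as expected.
%
% Display math is where the syntaxes diverge (LaTeX equation
% environments vs ConTeXt's \startformula ... \stopformula).
```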
Well, inline math should be trivial, as we have unicode math symbols. It's
unfortunate that we cannot tag math sequences (as with bidi) and that
some alphabets have gaps. I think the tex community failed big here and
still does. But I'm not in any loop .. remember: context is not supposed
to do / be used for math; that's the persistent narrative.
Also, I don't think the majority of our users care about latex or
whatever. They just ran into context and either didn't like it and left, or
saw the use and fun of it and stayed. No one is forced to use tex (or any
syntax). For me, latex is to context what msword is to latex: often sort
of an annoyance. Different worlds and mindsets too.
(And we also have to be immune to some bashing: hard to install,
always evolving, slow, weird version numbers, needs scripts (why lua?),
not that many users compared to latex, no math (needed because no
articles), no manuals, huge showstopping bugs (in the engine) while users
happily use it anyway, assumptions about how we think and why we decide as
we do, etc. The usual social media crap. But still: lua(meta)tex comes
from the context end, right? Users can use and run it, indeed? We write
about it and make all (!) of it public, yes? No one is forced to use it, so?)
Concerning your comment: latex users (I suppose that we are talking
about those who publish articles) never run into context documents, and
if they need a more readable version, context can generate
one; or, when they lack eyesight, sources can be provided, or we can
discuss how to accommodate that ... the stupid tagging in pdf is pretty
suboptimal (and likely also commercially and politically driven). We're
not in that world; it's bad for one's health and mindset. Also, it's kind
of weird to impose something (this EU law thing) that is not stable and
will in the end lead to a lot of invalid (intermediate) stuff (okay, money
to be made by fixing it). Fighting windmills.
And the embedded xml
blob is probably more reliable than any context -> latex math
conversion. When it comes to math, I think most context users are in
education, so that's what we focus on.
Yes, I also agree that focusing on MathML is probably the best way
forwards.
Although even there it's a mess: it came to, went from, and came back into
browsers (mathjax filled that gap well, I think; asciimath is a bit of a
pain). Some things were dropped from mathml because they were hard to
implement (which tells more about the programs, I guess). One can wonder why
that took so long anyway. I saw some drafts that puzzle me. So far we could
always adapt, but it doesn't get prettier over time. I like content mathml:
predictable. It looked like openmath would follow up, but that was a
failure; presentation mathml was always kind of dumb and stays that way,
although they smuggle in some things from content. But we'll adapt and have
to accept the 'older docs and versions are not doing ok' rants.
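For readers unfamiliar with the distinction drawn above, here is a hand-written sketch (not ConTeXt output) of the same expression, a/b, in the two MathML flavours:

```xml
<!-- content MathML: encodes the meaning (a division), so it is
     predictable and easy to interpret programmatically -->
<apply><divide/><ci>a</ci><ci>b</ci></apply>

<!-- presentation MathML: encodes only the layout (a fraction bar),
     leaving the semantics to the reader -->
<mfrac><mi>a</mi><mi>b</mi></mfrac>
```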
If 30 years of history is representative, imagine the next 10 years. Ever
been to a typesetting museum that spans the last 50 years? The best one
can do is just keep producing nice-looking documents and hope for the
best. So rendering is what we focus on; that is the (small but
appreciative) audience we have.
Just watch the youtube videos about the Voyager probes and how they fix things
at a distance, or the Apollo lander computers .. much of what we brag about
today was (conceptually) invented then. That's the time I picked up
computing. And tex is that old, and still kicking, so let's keep it alive.
You need to keep in mind that when we started with all this there were
no programs that did anything useful with tagging,
Viewer support is still very weak---MathML only works with Foxit and a
version of NVDA released less than a month ago.
I never used those viewers and settled for sumatra on windows, and
okular on windows and linux. All platforms? I suppose there is the usual
interaction between the standard (adapt it), pdf-processing programs (can or
can't or don't want to do it) and pdf generation (no comments there).
that the spec was
(and is) not stable, that validation is a moving target etc ... it's all
about adaptation and it's always easy to point out whatever without
looking at the past and reality one has / had to deal with.
Yup, the very recent PDF 2.0 specification defines the <H> tags, and
then the UA-2 spec arbitrarily decides that those are now invalid.
Exactly. And the sectioning problem is not solved. It's the usual: one
starts from high-school writing, so a few sections, some itemize, a
simple toc, maybe a figure, then a simple table. The same for
typesetting: these are easy, and then comes the rest. Combine that with
today's (social media) advertising of "we're the best and will do better"
and the usual "free, professional, enterprise support" options (and hope
for the best when they sell themselves or merge) and you will understand
why I seldom (or never) check those out.
My motto is "the problem doesn't change", and maybe there are plenty of ways
to reach some goal; no need to compete and get the most users. We don't
want unhappy, coerced users (or at least don't want to be confronted by that
fact). There are plenty of potential users out there as long as they are
free to choose and not harassed by tex or competing evangelists
(comparing and bragging about being better is often a sign of loss and
desperation anyway; fine when a user says it, but a developer ...). One
should just use what one likes best.
A few decades from now all this tagging will probably be seen as kind of
ridiculous anyway.
Yeah, I'm also fairly skeptical of all this PDF tagging stuff, mainly
because it seems much more compliance-driven than accessibility-driven.
But it's also a classic chicken-and-egg problem between viewer support
and document support, so if/when viewer support gets better, it should
be much more useful.
Sure, but releasing a spec before actually implementing and playing with
it ... I tend to follow the 'third attempt is the best' approach, so I try
not to be driven by release fever. By going ISO, some long-term stability
was signalled. The amount of patching and explaining is, to me, alarming.
Of course there is new stuff, like the balancing mvl that some play with,
but it will likely be official around the meeting, more than a year after
we started concentrating on it: first we make some documents that stress
all of it, then, as usual, it can become stable (and maybe occasionally
improved) .. we're not in some competition.
(Even the new par building and math is not used to full extent by users;
much isn't even advocated but only mentioned in low level manuals;
expect no sales pitches.)
btw, about these nested mcids .. they come from the fact that we wanted
to support math labels in metapost output ... we now just disable that
because, after all, that math is meaningless without the drawing. The easy
solutions: if it doesn't work, just disable it, or claim that it was
unintended usage. But I admit that one should say that beforehand, not
when a user runs out of luck. Normally we try to solve it and avoid loose
ends. Try to predict extreme usage patterns (comes with age, I guess).
Hans
ps. Sorry, too long and too many typos (working on a large screen at 3 m
distance).
ps. About mathml embedding, we already had something like that. We also
did that with tables in the good old pdftex times: embed tables as excel
xml .. one could just click on it and it worked fine; it's probably
still there (on my machine) .. the good old times of hundreds of
thousands of hyperlinks and so on ... 2500+ page documents ... it's easier
today and also faster, but it's not like tex was ever behind (adobe nl
representatives used some context docs to demonstrate possibilities
of pdf they couldn't render themselves).
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the
Wiki!
maillist : [email protected] /
https://mailman.ntg.nl/mailman3/lists/ntg-context.ntg.nl
webpage : https://www.pragma-ade.nl / https://context.aanhet.net (mirror)
archive : https://github.com/contextgarden/context
wiki : https://wiki.contextgarden.net
___________________________________________________________________________________