Hi Max,
On Thu, 2025-07-10 at 10:54 +0200, Hans Hagen via ntg-context wrote:
On 7/10/2025 9:08 AM, Max Chernoff via ntg-context wrote:
sorry ... long answer ... probably much already said once
- I had to manually tag every paragraph with
\startparagraph/\stopparagraph, which was annoying.
Well, that also depends on the concept of a paragraph, I guess .. one can't
make structure from non-structure.
Right, but any text outside a tagged structure automatically fails
validation, so incorrectly tagged paragraphs are better than nothing. And
"indentnext=auto" usually does a decent job visually, so you could
probably use the same heuristics for tagging.
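For reference, this is the kind of explicit markup being discussed: a minimal sketch using the commands mentioned in this thread (\setupbackend, \setuptagging, \startparagraph), with behaviour depending on your ConTeXt version:

```tex
% Minimal sketch of explicit paragraph tagging as discussed above.
\setupbackend[format=pdf/ua-2]  % choose the backend format first
\setuptagging[state=start]      % then enable tagging

\starttext

\startparagraph
    The first paragraph, explicitly tagged.
\stopparagraph

\startparagraph
    The second paragraph, also tagged by hand.
\stopparagraph

\stoptext
```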
The problem is that when we implement(ed) things like that, we have to
make assumptions. I'm not saying that there are no alternatives (they could be
optional), but what belongs to a 'paragraph' is a bit arbitrary; even in
your example you have two paragraphs and a formula acting as one
paragraph. Heuristics can always fail. And keep in mind that in tex
anything can end up anywhere: no endpoints are defined unless one
starts blocking nested tagging!
I probably need to go over the now rather ancient code; I might see
some ways to improve things (also performance-wise) by using some more
luametatex features.
Also, we never had any real usage and examples. I'm not going to tag and
check as a hobby. The basics were implemented long ago (see the tugboat
article) and the logic is not complex, mostly annoying at the pdf level.
I even dare to say it was kind of trivial because we already had
structure to hook into (call it luck). Adding a few things here and
there is not complex either, just boring.
Furthermore:
- only acrobat showed something, but the last version I had (and paid
for) was Pro 8, which I lost at some point after updating a laptop; the
fact that tagging was never supported well elsewhere tells you something
- the standard is a mess and the tagging is pretty bad imo, maybe derived
from how this (non-structure-related) tagging was present / needed in
adobe applications (valid in itself, as that's where pdf came from) that
used it for storing / editing purposes (there's more to tell about that,
but it's ancient history by now)
- things and interpretations and standards still change .. so much for a
standard ... so, if at some point we validate and at another we don't (or
refuse to use tricks that suit specific programs) .. well, if someone
barks that context does not do it right, they are rewriting history and
as a side effect removing themselves from my 'maybe worth listening to'
list (which is actually also true when I read ridiculous comments and
assumptions wrt context in more public places)
[btw, I'm aware that you know more about context inner workings and
intentions than the average tex user, so thanks for occasionally
clarifying that on e.g. SE]
- so when we have time and motivation we pick up on it and adapt ... just
that, adapt ... but within reason (so I adapted the nested mcid handling
and made all NonStruct into Span or Div .. we'll see); it will take
years before that dust settles. And no, I'm unlikely to add all kinds of
pseudo-html directives, css or other crap to the file ... then one
should just make html instead.
- "\setupbackend[format=pdf/ua-2]" needs to come before
"\setuptagging[state=start]", otherwise lots of stuff will silently
break.
Indeed, the format influences some later settings (arbitrary-order
initializations would complicate the code with no real gain).
Right, this is more of a note in case anyone else ever runs into this
issue. But adding a warning might be a good idea (although maybe not
worth the effort).
I think it's mentioned somewhere, but I can check. Maybe issue a warning
when the order is flipped. It has to do with enabling / disabling
features, and doing that in an arbitrary order is kind of messy. (I bet
you know why by looking at the code.)
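Concretely, the ordering requirement above can be sketched like this (a minimal illustration; the exact failure mode depends on your ConTeXt version):

```tex
% correct: pick the backend format before enabling tagging
\setupbackend[format=pdf/ua-2]
\setuptagging[state=start]

% problematic: flipping the two lines makes tagging start before the
% ua-2 specific features are initialized, which can silently break
% things later on
```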
and it comes for free (so
it's not driven by some paid project that can set priorities)
There is the TUG development and accessibility funds
https://tug.org/tc/devfund/grants.html
https://tug.org/twg/accessibility/
but the grants tend to be quite small, especially considering how
complicated/annoying the accessibility work is.
We (i.e. I) never work with grants from user groups, partly because I'm
involved in some and I want to avoid conflicts of interest. We did
occasionally try to get funding for other tex projects (like the font
projects, mplib by taco, swiglib by luigi, some work by idris).
It is kind of interesting that the large-scale users are totally absent:
large publishers (I think they lost interest long ago; there used to be
tex people there; I can't access their content anyway, so why should I
care), their providers (I know only a small dutch one that I have
contact with and that is involved in some font stuff - tricky scripts - in
context; the rest are just consumers that make money from tex and expect
it to be around and developed; they normally sit on their technology
anyway), and service providers (where we were told corporate / investor
policy drives the decisions and processes; again, just assuming tex
keeps developing).
It says something that in the decades we have developed luatex, none of those
critically tex-dependent entities ever contacted the developers (just to
be sure of continuity, to know those involved, as they have quite a
dependency) .. well, they don't care, so they can have it .. for me it's
only users that matter, and they normally don't have the funds, so
basically we pay for it all out of our own pocket. Get me right, I'm talking
substantial amounts here, not some user group membership or a few K. Companies
that pay 5-7 digit salaries for developers don't really care about the tex
part that much apart from using it. It's the small-scale long-term
context users who keep it afloat, by being enthusiastic, challenging
and friends. That's what we keep doing it for, not that the large-scale
side realizes that dependency.
So, unless I run into specific large-scale professional context users,
publisher / corporate usage of tex is non-existent for me (and a waste
of time; the point of no return has passed). And I don't expect user groups to
fund anything; they should just focus on staying around to provide the basic
support, archiving, distributions, care for domestic language and script
demands, and maybe journals. Small tech, so to say.
(Years ago we sometimes had bits and pieces of context dev being paid
work as it was needed: specific kinds of tables. And these are not
IT-conforming projects anyway, money-wise. I have only met very few
professional large-scale users (with a different mindset wrt software dev
too), but they knew their way around and were not your average publishing
kind of people. They also put their jobs at risk by going for tex solutions.
But long-term continuity is bad, as organizations move and merge. We
noticed the same with some educational publishers: merge, ditch, profit-driven
(instead of a full-range content-covering approach). Very few
exceptions, alas.)
FWIW: quite often in our projects tex was a last resort ... all else had
failed ... big money spent ... so then they were willing to accept a
cheap tex solution, even if past experiences were bad (some knew tex
from their education and somehow never saw it as a valid solution), but
again we had to develop the technology beforehand. We only say we can do
something if we know we can (which is why we remained small). Kind of
rewarding too: implementing solutions with mavericks, often interesting
people. But it's always upfront free development then applied to hourly
work (I bet others in the same tex business can confirm that more hours
go in than get paid for, as there are always tricky detailed demands
involved; one seldom gets the simple many-pages stupid rendering: that
goes to those in hiding).
So, funding ... as I mentioned, tagging itself is kind of trivial, but
adapting and making decisions and testing takes time, and for that we
need projects in order to prioritize it, unless it's a fun project (like
Mikeal S's - especially the upcoming - lecture notes, which are also artistic
and educational masterpieces, so worth spending time on). Users probably
don't realize how much is done just because I interact with users that
have challenges or think in a way that fits into our way of thinking.
Like: we're currently doing some mp-related coding and it will never pay
back, but it's a nice challenge, with nice discussions, and it can be
artistic fun too. Of course it can drive high-performance workflows, but
tex and friends never fit into the solution space there.
One can argue the same for many mechanisms: improved math (only of
interest to context users who notice the difference; publishers etc.
don't care .. what has been good-enough hackery for decades is good enough
forever; for journals it gets fixed in post-production anyway, so money
can be made), better par building (idem, probably only some context
users appreciate that), whatever we improve in the engine (also
programming capabilities so that source code looks better) .. a lot is
about 'feel good' (okay for me).
- I've heard that it's actually usually better to put the TeX source in
the Alt text for math instead of the current generated prose, because
most people reading math are familiar with TeX anyway.
Well, I never meet people who are familiar with latex input ... or who
expect that from us ... do you expect me to generate latex math from the
less verbose context math? And what about all kinds of
(educational) stuff inside there? We try to accommodate what users expect
and challenge us with, because that's the world we deal with. LaTeX is
just a different world (to us); little or no overlap.
LaTeX and ConTeXt inline math (_not_ display math) syntaxes are
essentially identical, since they're both essentially "Plain TeX with
\frac instead of \over", so I don't think that many LaTeX users would
struggle with ConTeXt's syntax.
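To make the claim concrete, here is a hedged sketch of the same inline formula in both systems (ConTeXt MkIV/LMTX provides \frac, so the inline source can be byte-identical; display math differs more):

```tex
% The same inline math source works in both systems:
%
%   LaTeX:   The mean is $\frac{a+b}{2}$, as expected.
%   ConTeXt: The mean is $\frac{a+b}{2}$, as expected.
%
% Display math is where the syntaxes diverge (LaTeX equation
% environments vs ConTeXt's \startformula ... \stopformula).
```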
Well, inline math should be trivial, as we have unicode math symbols. It's
unfortunate that we cannot tag math sequences (as with bidi) and that
some alphabets have gaps. I think the tex community failed big here and
still does. But I'm not in any loop .. remember: context is not supposed
to do / be used for math; that's the persistent narrative.
Also, I don't think the majority of our users care about latex or
whatever. They just ran into context and either didn't like it and left, or
saw the use and fun of it and stayed. No one is forced to use tex (or any
syntax). For me, latex is to context what msword is to latex: often sort
of an annoyance. Different worlds and mindsets too.
(And we also have to be immune to some bashing: hard to install,
always evolving, slow, weird version numbers, needs scripts (why lua?),
not that many users compared to latex, no math (needed because no
articles), no manuals, huge showstopping bugs (in the engine) while users
happily use it anyway, assumptions about how we think and why we decide as
we do, etc. The usual social media crap. But still: lua(meta)tex comes
from the context end, right? Users can use and run it, indeed? We write
about it and make all (!) of it public, yes? No one is forced to use it, so?)
Concerning your comment: latex users (I suppose that we are talking
about those who publish articles) never run into context documents, and
if they need a more readable version, context can generate
one; or, when they lack eyesight, sources can be provided, or we can
discuss how to accommodate that ... the stupid tagging in pdf is pretty
suboptimal (and likely also commercially and politically driven). We're
not in that world; it's bad for one's health and mindset. Also, it's kind
of weird to impose something (this EU law thing) that is not stable and
will in the end lead to a lot of invalid (intermediate) stuff (okay, money
to be made by fixing it). Fighting windmills.
And the embedded xml
blob is probably more reliable than any context -> latex math
conversion. When it comes to math, I think most context users are in
education, so that's what we focus on.
Yes, I also agree that focusing on MathML is probably the best way
forwards.
Although even there it's a mess: it came to, went from, and came back into
browsers (mathjax filled that gap well, I think; asciimath is a bit of a
pain). Some things were dropped from mathml because they were hard to
implement (which tells more about the programs, I guess). One can wonder why
that took so long anyway. I saw some drafts that puzzle me. So far we could
always adapt, but it doesn't get prettier over time. I like content mathml:
predictable. It looked like openmath would follow up, but that was a
failure; presentation mathml was always kind of dumb and stays that way,
although they smuggle in some things from content. But we'll adapt and have
to accept the 'older docs and versions are not doing ok' rants.
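For readers unfamiliar with the distinction drawn above, here is a hand-written sketch (not ConTeXt output) of the same expression, a/b, in the two MathML flavours:

```xml
<!-- content MathML: encodes the meaning (a division), so it is
     predictable and easy to interpret programmatically -->
<apply><divide/><ci>a</ci><ci>b</ci></apply>

<!-- presentation MathML: encodes only the layout (a fraction bar),
     leaving the semantics to the reader -->
<mfrac><mi>a</mi><mi>b</mi></mfrac>
```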
If 30 years of history is representative, imagine the next 10 years. Ever
been to a typesetting museum that spans the last 50 years? The best one
can do is just keep producing nice-looking documents and hope for the
best. So rendering is what we focus on; that is the (small but
appreciative) audience we have.
Just watch the youtube videos about the Voyager probes and how they fix things
at a distance, or the Apollo lander computers .. much of what we brag about
today was (conceptually) invented then. That's the time I picked up
computing. And tex is that old, and still kicking, so let's keep it alive.
You need to keep in mind that when we started with all this there were
no programs that did anything useful with tagging,
Viewer support is still very weak---MathML only works with Foxit and a
version of NVDA released less than a month ago.
I never used those viewers and settled for sumatra on windows, and
okular on windows and linux. All platforms? I suppose there is the usual
interaction between the standard (adapt it), pdf-processing programs (can or
can't or don't want to do it) and pdf generation (no comments there).
that the spec was
(and is) not stable, that validation is a moving target etc ... it's all
about adaptation and it's always easy to point out whatever without
looking at the past and reality one has / had to deal with.
Yup, the very recent PDF 2.0 specification defines the <H> tags, and
then the UA-2 spec arbitrarily decides that those are now invalid.
Exactly. And the sectioning problem is not solved. It's the usual: one
starts from high-school writing, so a few sections, some itemize, a
simple toc, maybe a figure, then a simple table. The same for
typesetting: these are easy, and then comes the rest. Combine that with
today's (social media) advertising of "we're the best and will do better"
and the usual "free, professional, enterprise support" options (and hope
for the best when they sell themselves or merge) and you will understand
why I seldom (or never) check those out.
My motto is "the problem doesn't change", and maybe there are plenty of ways
to reach some goal; no need to compete and get the most users. We don't
want unhappy, coerced users (or at least don't want to be confronted by that
fact). There are plenty of potential users out there as long as they are
free to choose and not harassed by tex or competing evangelists
(comparing and bragging about being better is often a sign of loss and
desperation anyway; fine when a user says it, but a developer ...). One
should just use what one likes best.
A few decades from now all this tagging will probably be seen as kind of
ridiculous anyway.
Yeah, I'm also fairly skeptical of all this PDF tagging stuff, mainly
because it seems much more compliance-driven than accessibility-driven.
But it's also a classic chicken-and-egg problem between viewer support
and document support, so if/when viewer support gets better, it should
be much more useful.
Sure, but releasing a spec before actually implementing and playing with
it ... I tend to follow the 'third attempt is the best' approach, so I try
not to be driven by release fever. By going ISO, some long-term stability
was signalled. The amount of patching and explaining is, to me, alarming.
Of course there is new stuff, like the balancing mvl that some play with,
but it will likely be official around the meeting, more than a year after
we started concentrating on it: first we make some documents that stress
all of it, then, as usual, it can become stable (and maybe occasionally
improved) .. we're not in some competition.
(Even the new par building and math is not used to full extent by users;
much isn't even advocated but only mentioned in low level manuals;
expect no sales pitches.)
btw, about these nested mcids .. they come from the fact that we wanted
to support math labels in metapost output ... we now just disable that
because, after all, that math is meaningless without the drawing. The easy
solutions: if it doesn't work, just disable it, or claim that it was
unintended usage. But I admit that one should say that beforehand, not
when a user runs out of luck. Normally we try to solve it and avoid loose
ends. Try to predict extreme usage patterns (comes with age, I guess).
Hans
ps. Sorry, too long and too many typos (working on a large screen at 3 m
distance).
ps. About mathml embedding, we already had something like that. We also
did that with tables in the good old pdftex times: embed tables as excel
xml .. one could just click on it and it worked fine; it's probably
still there (on my machine) .. the good old times of hundreds of
thousands of hyperlinks and so on ... 2500+ page documents ... it's easier
today and also faster, but it's not like tex was ever behind (adobe nl
representatives used some context docs to demonstrate possibilities
of pdf they couldn't render themselves).
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the
Wiki!
maillist : [email protected] /
https://mailman.ntg.nl/mailman3/lists/ntg-context.ntg.nl
webpage : https://www.pragma-ade.nl / https://context.aanhet.net (mirror)
archive : https://github.com/contextgarden/context
wiki : https://wiki.contextgarden.net
___________________________________________________________________________________