[NTG-context] Accessibility and Tagged PDFs: Bugs and Feature Requests

Dr. Dominik Klein Tue, 30 Jun 2015 01:11:24 -0700

Context is the only Tex-based system that allows one to properly tag a PDF. Tagged PDFs are one major requirement for accessibility.

Indeed, in several large organizations/universities, accessibility is mandated by law, and this is a major obstacle for using Tex. In practice, compliance is often assessed with Acrobat Pro's accessibility checker.

Context produces a nice tag structure, but there are some minor issues that prevent compliance with [1], and hence Acrobat Pro complains during the check. The main issues are:

1.) Elements that are not contained in the structure tree are not marked as artifacts. Consider this example:


-------------------------------
\setuptagging[state=start]

\setuppagenumbering
[location=,
 alternative=doublesided]

\setupheadertexts
  [{Chapter~\getmarking[chapternumber]\hskip1em\getmarking[chapter]}]
  [{Header Right}]
  [{Header Left}]
  [{Chapter~\getmarking[chapternumber]\hskip1em\getmarking[chapter]}]

\setupfootertexts
  [Organization Name]
  [pagenumber]
  [pagenumber]
  [Organization Name]

\starttext
\startfrontmatter
something
\stopfrontmatter

\startbodymatter
some more text here
\stopbodymatter
\stoptext
-------------------------------

Header, footer, pagenumber etc. will not be included in the tag structure. Of course this makes perfect sense and is correct; however, according to Section 14.8.2.2.2 of [1], content that is not in the structure tree should be marked as an artifact, i.e.


/Artifact
  BMC
  ..
  EMC

or in an advanced way with /Artifact PropertyList, where the type of artifact can be defined. It would be nice if those elements that are not included in the tag tree would be marked as artifacts by default. The same holds for \startelement[ignore] when one wants to explicitly remove something from the structure tree.
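
To illustrate the two forms of marking, here is a minimal sketch in Python (not Context code) that wraps a page-content fragment as an artifact. The function name mark_artifact and the sample content are made up for illustration; the Type and Subtype values follow my reading of Section 14.8.2.2.2 of [1] and should be checked against the standard.

```python
from typing import Optional


def mark_artifact(content: str,
                  artifact_type: Optional[str] = None,
                  subtype: Optional[str] = None) -> str:
    """Wrap a content-stream fragment so that it is marked as an artifact.

    Hypothetical helper: without a type it emits the plain BMC form,
    otherwise the BDC form with an inline property list.
    """
    if artifact_type is None:
        # Simple form: /Artifact BMC ... EMC
        return f"/Artifact BMC\n{content}\nEMC"
    props = f"/Type /{artifact_type}"
    if subtype is not None:
        props += f" /Subtype /{subtype}"
    # Advanced form: /Artifact << property list >> BDC ... EMC
    return f"/Artifact << {props} >> BDC\n{content}\nEMC"


if __name__ == "__main__":
    # A page number in the footer, marked as a pagination artifact.
    print(mark_artifact("BT (3) Tj ET", "Pagination", "Footer"))
```

In this sketch a header or footer text would pass through mark_artifact before being written to the page stream, while content inside the structure tree would be left alone.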


2.) Images without alternate text:

According to Section 14.9.3 of [1], alternate descriptions in human-readable text should be provided for images. It would be really helpful if these could be defined in the source tex file and then automatically added when creating the object in the structure tree. I.e. it would be nice to have something like:

\placefigure[top][Image Reference]{Caption}{
\externalfigure[cow.pdf][width=10cm][alternate text = "This image shows a beautiful cow."]}

The same holds for formulas: whereas the mathml-like tagging of Context is very advanced, sometimes it might still be helpful to supply a textual description (alt-text = "The definition of the Pythagorean theorem: a^2 + b^2 = c^2").


3.) Tag names of the resulting tag structure:

Section 14.8.4 of [1] defines standard structure types, such as <H>, <P>, <Sect> etc. Context creates a tag tree that uses names directly representing the structure names of the Context language, such as <sectiontitle>. These should, however, be mapped to something standard, such as <H>. Interestingly, these mappings seem to have been considered in strc-tag.mkiv, but I was unable to generate such a tagged PDF. Editing or commenting out things in strc-tag.mkiv didn't work for me. It would be nice if there was a switch somewhere, e.g. \setuptagging[state=start,tagnames=pdf17] - or maybe I overlooked something?
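
One way such a mapping could be expressed is through a /RoleMap dictionary in the structure tree root, which maps custom tag names to standard ones. The Python sketch below only builds that dictionary as text; the particular pairs are my own guesses at sensible targets for the names mentioned in this mail, not what strc-tag.mkiv actually does.

```python
# Assumed mapping from Context tag names (as seen in this mail) to
# standard structure types; the choice of targets is illustrative only.
ROLE_MAP = {
    "sectiontitle": "H",
    "table": "Table",
    "tablerow": "TR",
    "tablecell": "TD",
}


def role_map_entry(mapping: dict) -> str:
    """Serialize a tag-name mapping as a PDF /RoleMap dictionary entry."""
    # Sort the keys so the output is stable between runs.
    pairs = " ".join(f"/{src} /{dst}" for src, dst in sorted(mapping.items()))
    return f"/RoleMap << {pairs} >>"


if __name__ == "__main__":
    print(role_map_entry(ROLE_MAP))
```

With a role map the Context-specific names can stay in the tag tree, and a checker resolves them to the standard types.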

4.) Acrobat Pro always complains that the language for the whole document is not set.


5.) Tables
The generated structure looks something like this:
<table>
 <tablerow>
   <tablecell>
   ...
 <tablerow>
   <tablecell>
 ...

Here, not only are the tag names non-compliant, but the tag structure should also distinguish between the table header (THead) and the table rows (TBody), cf. Section 14.8.4.3.1 of [1]. A simple heuristic would be to always put the first line into THead tags, and the rest of the table into TBody.
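
The heuristic is simple enough to sketch in a few lines of Python. The representation of rows as plain lists and the function name split_table are assumptions made for this example only.

```python
def split_table(rows: list) -> list:
    """Group table rows into THead and TBody using the first-row heuristic.

    The first row becomes the header; all remaining rows form the body.
    Returns a list of (tag, rows) pairs.
    """
    if not rows:
        return []
    groups = [("THead", [rows[0]])]
    if len(rows) > 1:
        groups.append(("TBody", rows[1:]))
    return groups


if __name__ == "__main__":
    table = [["Name", "Weight"], ["Cow", "700 kg"], ["Calf", "40 kg"]]
    for tag, group in split_table(table):
        print(tag, group)
```

A table that has no real header row would be tagged wrongly by this, so the heuristic would probably need a way to switch it off per table.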

6.) It would be nice if a flat tag structure could be created optionally. This is not a required feature according to [1], and in fact a properly nested structure is surely preferable for the final output; for debugging or checking during document creation, however, a flat structure tree is sometimes easier to browse through.
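
What I mean by a flat structure can be shown with a small Python sketch that walks a nested tag tree and lists only the elements that directly hold text, in reading order. The tree layout as (tag, children) pairs and the tag names document, section and paragraph are invented for the example.

```python
def flatten(node: tuple) -> list:
    """Flatten a nested (tag, children) tree into (tag, text) pairs.

    Children are either text strings or nested (tag, children) nodes.
    Container elements without direct text disappear from the result.
    """
    tag, children = node
    flat = []
    for child in children:
        if isinstance(child, str):
            # Text belongs to the element that directly contains it.
            flat.append((tag, child))
        else:
            flat.extend(flatten(child))
    return flat


if __name__ == "__main__":
    tree = ("document", [
        ("section", [
            ("sectiontitle", ["Introduction"]),
            ("paragraph", ["some more text here"]),
        ]),
    ])
    for tag, text in flatten(tree):
        print(f"<{tag}> {text}")
```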

All in all, these seem to be the only issues that prevent accessible PDF documents with Context. For those within an organization where accessibility is legally required for all publications, compliance with at least Acrobat Pro's checks is a huge issue. I do not know how difficult these things are to implement in Context (personally I am just lost in the code), but looking at e.g. tex.stackexchange for questions related to accessibility, this is indeed a major obstacle for several people.


cheers

- Dominik


[1] ISO 32000-1:2008, available at
http://www.adobe.com/devnet/pdf/pdf_reference.html