Hi John,

John Gardner wrote on Sat, Apr 21, 2018 at 04:48:33PM +1000:

> Ingo, I've spent the last 13 years in front-end web development,
> and I've been writing standards-compliant websites for almost
> a decade.

Sounds like you might have valuable input that could end up improving the mandoc -Thtml output.

Note that i did *not* claim that i specialize in anything related to HTML/CSS, which i actually do not. Quite to the contrary, whenever i had questions related to HTML/CSS, i had a hard time finding any developer who knew much about it. So i might finally get some real help, looking forward to that...

>> I see absolutely nothing semantic in there, it looks like a
>> purely presentational style sheet to me.

> ... yes, that's the entire reason CSS exists: to separate
> presentation from content (the latter being tantamount with
> "semantics" as understood by web authors and those of us who
> actively follow modern web standards).

Wait - the point of CSS is to select adequate presentation for content of a given kind or class, right? So the CSS, on its input side, first needs to be told, by HTML elements and attributes, what kind or class of content some text belongs to, and then has to select the presentational attributes using selectors addressing these kinds and classes of HTML elements, right?

What i meant by the above sentence is that the CSS you gave,

  https://rawgit.com/Alhadis/Stylesheets/master/complete/manpage/manpage.css

with the exception of the dfn{} and kbd{} selectors, selects nothing based on the kind or class of content or on semantic function.
What you do with dfn{} handles one single macro, and kbd{} is used for very different kinds of content that need completely different formatting, namely fixed syntax elements like command line options (.Fl) and fixed option arguments (.Cm) on the one hand, but also code examples (.Dl) on the other hand. Consequently, the EXAMPLES section is indeed rendered in a misleading way on your example page; compare to

  https://man.openbsd.org/mandoc.1#EXAMPLES

So, with your stylesheet, almost all semantic information from all macros seems completely unhandled to me - or what am i missing?

> Redundant title attributes on everything.

They are not redundant. Their purpose is to display the semantic function of the word in a tooltip when hovering the mouse over it. I'll gladly do that in a better way if i find one or someone points me in the right direction, but when i read the CSS standard, i failed to find any other way.

As a matter of fact, the very document you quote uses title attributes for exactly the same purpose:

  https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-*

Look at the source code for the elements that show tool tips, like "HTMLElement" or "DOMStringMap".

> Actually, worse than redundant:
> it screws with assistive technologies like screen-readers, which
> might read the contents of a tag to the user using the title
> attribute if one is present.

The title attribute would be read *in addition* to the contents of the element, rather than instead of the contents, right? That would actually be useful. When you cannot get a visual impression of the page as a whole and hear it word by word instead, it is even harder to correctly guess the semantic function of key words, because at first you lack the necessary context of these words, which a visual impression of the page can provide without reading anything.
Besides, listening to a screen reader, you wouldn't hear what is bold or italic, so getting the meta-information across in some different, verbal way seems useful to me. I admit, though, that i work with people who use screen readers relatively rarely, maybe every few months, and i never asked them to test the mandoc output.

> If you want to attach page or application-specific metadata to
> elements, use data-*
>
> <https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-*>
> instead.

Looking at the link given there,

  https://html.spec.whatwg.org/multipage/dom.html#embedding-custom-non-visible-data-with-the-data-*-attributes

i read:

  3.2.6.6 Embedding custom non-visible data with the data-* attributes
  [...]
  These attributes are not intended for use by software that is not
  known to the administrators of the site that uses the attributes.

That is *not* at all what i want. It is vital that any browser, even those i never heard of, is able to show this information to the user. It is neither "non-visible" nor "private" data, but an important part of the output, to be shown directly to the user. So i don't quite understand why you suggest data-*. No user would ever see those attributes, right?

> *Presentational tags used instead of those conveying text-level
> semantics:* You're literally doing what mdoc(7) tells you not to do,
> except in HTML form:
> - -b, -S, -o:
>   Flags/options should be represented using kbd tags, as they describe
>   user input <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/kbd>.

Not doing that is a deliberate compromise. I want the output to be comprehensible even without any stylesheet. For that reason, i cannot use <kbd> for fixed syntax elements: without a style sheet, that would make them indistinguishable from example user input, and that distinction is very important in manual pages, clearly visible even in terminal output.
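The compromise described above can be sketched as a small macro-to-element table. This is a rough Python illustration only, not mandoc's actual code (mandoc is written in C); the table, the render_macro() helper, and the assumption that the title attribute carries the macro name are all hypothetical:

```python
from html import escape

# Hypothetical macro-to-element table, for illustration only; it is NOT
# mandoc's actual implementation. HTML offers a semantic match (<var>)
# for placeholders, but fixed syntax falls back to presentational <b>
# so that it stays distinct from <kbd> example input without any CSS.
MACRO_ELEMENTS = {
    "Fl": "b",    # command line option: fixed syntax
    "Cm": "b",    # fixed option argument: same compromise as .Fl
    "Ar": "var",  # placeholder argument
    "Fa": "var",  # function argument
}


def render_macro(macro: str, text: str) -> str:
    """Wrap text in the element chosen for an mdoc macro.

    The class attribute lets a style sheet select by macro, and the
    title attribute (assumed here to carry the macro name) shows the
    semantic function as a tooltip. Unknown macros fall back to <span>.
    """
    element = MACRO_ELEMENTS.get(macro, "span")
    return (f'<{element} class="{macro}" title="{macro}">'
            f"{escape(text)}</{element}>")


print(render_macro("Fl", "-b"))    # <b class="Fl" title="Fl">-b</b>
print(render_macro("Ar", "file"))  # <var class="Ar" title="Ar">file</var>
```

Even with no style sheet at all, a browser then shows options in bold and placeholders in italics, while a style sheet can still target each macro through its class.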

I could not find any HTML element that is adequate for fixed syntax elements and clearly distinguishable from example code. Can you point me to one?

Besides, i never said that the final output rendering of a document must be perfect code in the final target language. As a matter of fact, that is almost never possible, just like you cannot preserve all the subtleties and beauty of a poem when translating it into a different language. The target language always provides more potential for making distinctions in some areas than the source language (causing clumsy final output that does not use the full potential of the target language because the source language lacks information), and it always provides less potential for making distinctions in other areas (causing information to be lost or represented in non-standard ways).

All i'm saying is that the *source* document must be rich in semantic markup. Most of that is inevitably lost when rendering into any target format, even if the target language is semantically rich itself, like HTML. There is no problem with that, because you must never use the transformed, inevitably degraded document as the starting point of another transformation; you should only display it as it is.

Being forced to use <b> for .Fl and .Cm is a consequence of the fact that i could not find distinct HTML elements for syntax elements and for examples.

> - =*option*:
>   Parameters should use var tags to indicate a placeholder
>   name for an expectant value

Not sure what you are asking for here. For all i can tell, they *do* use <var> tags: i see var.Ar and var.Fa in mandoc.css, and running mandoc(1) shows me that these <var> elements actually get emitted for .Ar macros. Can you show at which place exactly this goes wrong?

> - Use dfn to markup the defining subject's name.
>   For mdoc, this means *Nm*

I dimly remember that i considered that, but decided against it because the default rendering is indistinguishable from <var>, so you get the extremely confusing situation that the topic of the page looks as if it needed to be replaced by something else. It is a similar compromise as for .Fl and .Cm: fall back to presentational formatting that also works without any style sheet.

To summarize so far, i disagree with your explicit statement that the only reason why semantic markup matters in the *source* document is as a means to help getting to a good final formatting and visual result. It has uses beyond that, some of which were mentioned in replies.

But i also disagree with your apparently implicit assumption that the markup in the *target* language must adhere to language purity standards. At *that* stage, all that matters is presentation, and when presentational needs and language purity conflict, at *that* stage, language purity must be sacrificed to achieve the best possible visual result. Obviously, HTML language purity would be important if HTML were the source language.

(Side note: The situation is slightly different for -Tman and -Tmarkdown because the whole point of these output formats is that they *will* get translated again. So in these two cases, language purity is paramount, and visual quality often has to be sacrificed because trying to be too smart would ruin portability, which is the whole point of *that* exercise.)

But you are certainly not supposed to ever process the HTML output of mandoc again. If you need it in a different format, restart from the original document, please.

> - *Inconsistent or incorrect use of sectioning elements*
>   You linked to https://man.openbsd.org/gcc.1 as an example. CTRL+F and
>   search for "Options Controlling the Kind of Output".
>   I'd hotlink the section directly, but you neglected to use an ID
>   attribute or even an anchor element with a name attribute.

That's not my fault.
The original markup is:

  .br
  .ne 5
  .PP
  \fBOptions Controlling the Kind of Output\fP
  .PP

How is mandoc(1) supposed to figure out that that is intended as a section header? It looks like an ordinary, admittedly very short paragraph of text even to a human reader.

> Did you mean to use all those separate <dl> tags as an
> indication of quality output, or was
> that an oversight?

You mean:

  <dl class="Bl-tag">
    <dt class="It-tag"><i>file</i><b>.cc</b></dt>
    <dd class="It-tag"></dd>
  </dl>

Sorry, but that is in the original input file, too:

  .IP "\fIfile\fR\fB.cc\fR" 4
  .IX Item "file.cc"
  .PD 0
  .IP "\fIfile\fR\fB.cp\fR" 4
  .IX Item "file.cp"
  .IP "\fIfile\fR\fB.cxx\fR" 4
  .IX Item "file.cxx"

The document explicitly requests paragraphs with text in the head and empty bodies, and mandoc faithfully renders that. How could it guess that the author actually meant a single list entry with hard line breaks inside the head element? I came across man pages in practice where it was unclear which of the two was intended even to the human eye, though in the case at hand, a human reader *can* probably understand what is meant - but a program can hardly decide that.

> - *Pointless empty elements everywhere*

You mean:

  <i></i><i>source</i><i>.</i><i>suffix</i> <i></i>

The input file is actually forcing that with the following nonsensical low-level roff(7) code:

  \fI\fIsource\fI.\fIsuffix\fI\fR

I say, garbage in, garbage out. The output is correct, by the way, and renders correctly. The empty elements are rendered faithfully and have no effect.

Is your point that the parser should filter such nonsense out? That doesn't seem like a particularly good idea to me.
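Why faithful translation of that input yields empty elements can be shown with a simplified Python model. It assumes, for illustration only, that every font escape closes the currently open font element and that \fB or \fI opens a new one; this is not how mandoc's parser is actually structured, and fonts_to_html() is a hypothetical helper:

```python
import re
from html import escape


def fonts_to_html(roff: str) -> str:
    """Translate \\fB, \\fI and \\fR font escapes into <b> and <i>.

    Simplified model for illustration: each font escape closes the
    currently open font element; \\fB and \\fI then open a new one,
    \\fR returns to roman. Empty elements are not filtered out, so
    redundant escapes in the input survive as empty elements.
    Other escapes (such as \\fP) are not modelled.
    """
    tags = {"B": "b", "I": "i"}
    out = []
    current = None  # name of the currently open font element, if any
    # With a capturing group, re.split puts the escapes at odd indices.
    for index, token in enumerate(re.split(r"(\\f[BIR])", roff)):
        if index % 2:
            if current:
                out.append(f"</{current}>")
            current = tags.get(token[2])  # None for \fR
            if current:
                out.append(f"<{current}>")
        else:
            out.append(escape(token))
    if current:
        out.append(f"</{current}>")
    return "".join(out)


print(fonts_to_html(r"\fI\fIsource\fI.\fIsuffix\fI\fR"))
# <i></i><i>source</i><i>.</i><i>suffix</i><i></i>
print(fonts_to_html(r"\fIfile\fR\fB.cc\fR"))
# <i>file</i><b>.cc</b>
```

The redundant escapes in the gcc.1 input turn into empty <i> pairs, while the well-formed .IP heads quoted above produce none; removing the former would mean adding a separate clean-up pass on top of the translation.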

Such nonsensical input is rare in the first place, so filtering would provide little benefit, while trying to detect and remove it adds code with potential for additional bugs. Besides, the possibilities for insane input are limitless, so you can't possibly filter all insanity out, even if you tried to do so with substantial amounts of additional code.

By the way, the reason for the insanity is that the gcc.1 man(7) code is autogenerated from perlpod(1) code, which is in turn autogenerated from texinfo(5) code, and the POD already contains weird stuff like "F<I<source>.I<suffix>>", which pod2man(1) handles poorly. You can't really expect to win a beauty contest by putting makeup on a pig that has been passed through a meat grinder, glued back together, and passed through another meat grinder. ;)

> - *Class attributes assigned to elements which should be
>   styled using SIMPLE stylesheet declarations*

Can you be more specific? I considered each element very carefully, and for many of them, i could not find good matches in the HTML standard. So i decided to use classes for *all* elements that carry semantic significance, both in those few cases where HTML provides adequate standard elements like <var> and <code> and in the larger number of cases where good matches are not available and i had to fall back to presentational elements or <span>.

> Also, half of your "semantic stylesheet <https://man.openbsd.org/mandoc.1>"
> is redundant and repeating default properties. Many values aren't actually
> doing anything,

There are three reasons for that:

 1. When elements nest, seemingly redundant attributes that match the
    defaults without nesting can suddenly become relevant.

 2. People can edit the stylesheet and add their own rules. In that
    case, attributes that are redundant in the unchanged CSS can
    suddenly become relevant.

 3. Because 1 and 2 apply to a significant fraction of cases, and there
    is a risk of overlooking cases when trying to minimize the default
    CSS, i decided to list all attributes that a given element wants to
    set, for clarity and robustness, even if it can be shown that the
    default would do for a specific case.

Of course, without reference to a specific attribute of a specific rule, i can't say which reason applies, or which combination of multiple reasons.

> and several rulesets are empty altogether.

Of course. The default stylesheet serves a double purpose. It is short and small enough to be used by default out of the box. But it is also intended as a starting point for people who want to customize their rendering, so it provides a complete listing of the classes that mandoc emits.

> I can't go on.
> I'm feeling queasy with fremdschaemen.
> Seriously.

I do have the impression that you might be able to provide useful feedback that could result in specific improvements, but from the above relatively unspecific comments, i so far can't deduce any specific plans regarding what to improve.

All the same, thanks for looking at these matters,
  Ingo