On Wed, May 15, 2024 at 10:18 AM Markus Armbruster <arm...@redhat.com> wrote:
> John Snow <js...@redhat.com> writes: > > > On Wed, May 15, 2024, 7:50 AM Markus Armbruster <arm...@redhat.com> > wrote: > > > >> John Snow <js...@redhat.com> writes: > >> > >> > Prior to this patch, a section like this: > >> > > >> > @name: lorem ipsum > >> > dolor sit amet > >> > consectetur adipiscing elit > >> > > >> > would be parsed as: > >> > > >> > "lorem ipsum\ndolor sit amet\n consectetur adipiscing elit" > >> > > >> > We want to preserve the indentation for even the first body line so > that > >> > the entire block can be parsed directly as rST. This patch would now > >> > parse that segment as: > >> > > >> > "lorem ipsum\n dolor sit amet\n consectetur adipiscing elit" > >> > >> I'm afraid it's less than clear *why* we want to parse the entire block > >> directly as rST. I have just enough insight into what you've built on > >> top of this series to hazard a guess. Bear with me while I try to > >> explain it. > >> > > > > My own summary: qapidoc expects a paragraph, the new generator expects a > > block. > > > > > >> We first parse the doc comment with parser.py into an internal > >> representation. The structural parts become objects, and the remainder > >> becomes text attributes of these objects. Currently, parser.py > >> carefully adjusts indentation for these text attributes. Why? I'll get > >> to that. > >> > >> For your example, parser.py creates an ArgSection object, and sets its > >> @text member to > >> > >> "lorem ipsum\ndolor sit amet\n consectetur adipiscing elit" > >> > >> Printing this string gives us > >> > >> lorem ipsum > >> dolor sit amet > >> consectetur adipiscing elit > >> > >> qapidoc.py then transforms parser.py's IR into Sphinx IR. The objects > >> become (more complicated) Sphinx objects, and their text attributes get > >> re-parsed as rST text into still more Sphinx objects. > >> > >> This re-parse rejects your example with "Unexpected indentation." > >> > > > > Specifically, it'd be an unexpected *unindent*; the indent lacking on the > > first *two* lines is the problem. > > > > > >> Let me use a slightly different one: > >> > >> # @name: lorem ipsum > >> # dolor sit amet > >> # consectetur adipiscing elit > >> > >> Results in this @text member > >> > >> lorem ipsum > >> dolor sit amet > >> consectetur adipiscing elit > >> > >> which is re-parsed as paragraph, i.e. exactly what we want. > >> > > > > It's what we used to want, anyway. > > Yes, I'm describing the current state here. > > >> > This understandably breaks qapidoc.py; > >> > >> Without indentation adjustment, we'd get > >> > >> lorem ipsum > >> dolor sit amet > >> consectetur adipiscing elit > >> > >> which would be re-parsed as a definition list, I guess. This isn't what > >> we want. > >> > >> > so a new function is added > there > >> > to re-dedent the text. > >> > >> Your patch moves the indentation adjustment to another place. No > >> functional change. > >> > >> You move it so you can branch off your new rendering pipeline before the > >> indentation adjustment, because ... > >> > >> > Once the new generator is merged, this function > >> > will not be needed any longer and can be dropped. > >> > >> ... yours doesn't want it. > >> > >> I believe it doesn't want it, because it generates rST (with a QAPI > >> extension) instead of Sphinx objects. For your example, something like > >> > >> :arg name: lorem ipsum > >> dolor sit amet > >> consectetur adipiscing elit > >> > >> For mine: > >> > >> :arg name: lorem ipsum > >> dolor sit amet > >> consectetur adipiscing elit > >> > >> Fair? > >> > > > > Not quite; > > > > Old parsing, new generator: > > > > :arg type name: lorem ipsum > > dolor sit amet > > consectetur apidiscing elit > > > > This is wrong - continuations of a field list must be indented. Unlike > > paragraphs, we want indents to "keep the block". > > > > New parsing, new generator: > > > > :arg type name: lorem ipsum > > dolor sit amet > > consectetur apidiscing elit > > > > indent is preserved, maintaining the block-level element. > > > > I don't have to re-add indents and any nested block elements will be > > preserved correctly. i.e. you can use code examples, nested lists, etc. > in > > argument definitions. > > > > The goal here was "Do not treat this as a paragraph, treat it directly as > > rST and do not modify it needlessly." > > > > It's a lot simpler than trying to manage the indent and injecting spaces > > manually - and adding a temporary dedent to scheduled-for-demolition code > > seemed the nicer place to add the hack. > > Understand. > > A bit more rationale in the commit message would be nice. Perhaps start > with current state ("we deintent"), then describe the patch ("move the > deindent"), then rationale "to get it out of the way of a new thingy I > wrote, and intend to post soonish", then "which will replace qapidoc.py > entirely". > Updated with more info than you require. I hear that American third graders get a free trip to Pizza Hut if they read at least 20 of my commit messages in a school year. > > >> The transition from the doc comment to (extended) rST is straightforward > >> for these examples. > >> > >> I hope it'll be as straightforward (and thus predictable) in other > >> cases, too. > >> > >> > (I verified this patch changes absolutely nothing by comparing the > >> > md5sums of the QMP ref html pages both before and after the change, so > >> > it's certified inert. QAPI test output has been updated to reflect the > >> > new strategy of preserving indents for rST.) > >> > > >> > Signed-off-by: John Snow <js...@redhat.com> > >> > --- > >> > docs/sphinx/qapidoc.py | 36 > +++++++++++++++++++++++++++++----- > >> > scripts/qapi/parser.py | 8 ++++++-- > >> > tests/qapi-schema/doc-good.out | 32 +++++++++++++++--------------- > >> > 3 files changed, 53 insertions(+), 23 deletions(-) > >> > > >> > diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py > >> > index 1655682d4c7..2e3ffcbafb7 100644 > >> > --- a/docs/sphinx/qapidoc.py > >> > +++ b/docs/sphinx/qapidoc.py > >> > @@ -26,6 +26,7 @@ > >> > > >> > import os > >> > import re > >> > +import textwrap > >> > > >> > from docutils import nodes > >> > from docutils.parsers.rst import Directive, directives > >> > @@ -51,6 +52,28 @@ > >> > __version__ = "1.0" > >> > > >> > > >> > +def dedent(text: str) -> str: > >> > + # Temporary: In service of the new QAPI domain, the QAPI doc > parser > >> > + # now preserves indents in args/members/features text. QAPIDoc > does > >> > + # not handle this well, so undo that change here. > >> > + > >> > + # QAPIDoc is being rewritten and will be replaced soon, > >> > + # but this function is here in the interim as transition glue. > >> > >> I'm not sure we need the comment. > > > > OK. Guess I'd rather overcomment than undercomment... easier to delete > than > > add :) > > Right :) > > >> > + > >> > + lines = text.splitlines(True) > >> > + if len(lines) > 1: > >> > + if re.match(r"\s+", lines[0]): > >> > + # First line is indented; description started on > >> > + # the line after the name. dedent the whole block. > >> > + return textwrap.dedent(text) > >> > + else: > >> > >> pylint gripes > >> > >> docs/sphinx/qapidoc.py:65:8: R1705: Unnecessary "else" after > "return", > >> remove the "else" and de-indent the code inside it (no-else-return) > >> > > > > Interesting. What pylint version? > > Fedora 39 > > $ pylint --version > pylint 3.0.4 > astroid 3.0.3 > Python 3.12.3 (main, Apr 17 2024, 00:00:00) [GCC 13.2.1 20240316 (Red Hat > 13.2.1-7)] > Realized from the prior patch I was not actually running pylint on this file, for good reason. Nonetheless, I did fix this particular gripe by using your rewrite. My current best attempt at reducing the noise: *from inside the docs/sphinx directory*: > PYTHONPATH=/home/jsnow/src/qemu/scripts/ pylint --rc-file ../../scripts/qapi/pylintrc qapidoc.py ************* Module qapidoc qapidoc.py:45:0: C0103: Constant name "Use_SSI" doesn't conform to UPPER_CASE naming style (invalid-name) qapidoc.py:49:4: E0611: No name 'AutodocReporter' in module 'sphinx.ext.autodoc' (no-name-in-module) qapidoc.py:77:10: R1708: Do not raise StopIteration in generator, use return statement instead (stop-iteration-return) qapidoc.py:567:11: R1735: Consider using '{"version": __version__, "parallel_read_safe": True, "parallel_write_safe": True, ... }' instead of a call to 'dict'. (use-dict-literal) Hm, actually, maybe this is tractable... > >> > + # Descr started on same line. Dedent line 2+. > >> > + return lines[0] + textwrap.dedent("".join(lines[1:])) > >> > + else: > >> > + # Descr was a single line; dedent entire line. > >> > + return textwrap.dedent(text) > >> > + > >> > + > >> > # fmt: off > >> > # Function borrowed from pydash, which is under the MIT license > >> > def intersperse(iterable, separator): > >> > @@ -169,7 +192,7 @@ def _nodes_for_members(self, doc, what, > base=None, branches=None): > >> > term = self._nodes_for_one_member(section.member) > >> > # TODO drop fallbacks when undocumented members are > outlawed > >> > if section.text: > >> > - defn = section.text > >> > + defn = dedent(section.text) > >> > else: > >> > defn = [nodes.Text('Not documented')] > >> > > >> > @@ -207,7 +230,7 @@ def _nodes_for_enum_values(self, doc): > >> > > termtext.extend(self._nodes_for_ifcond(section.member.ifcond)) > >> > # TODO drop fallbacks when undocumented members are > outlawed > >> > if section.text: > >> > - defn = section.text > >> > + defn = dedent(section.text) > >> > else: > >> > defn = [nodes.Text('Not documented')] > >> > > >> > @@ -242,7 +265,7 @@ def _nodes_for_features(self, doc): > >> > dlnode = nodes.definition_list() > >> > for section in doc.features.values(): > >> > dlnode += self._make_dlitem( > >> > - [nodes.literal('', section.member.name)], > section.text) > >> > + [nodes.literal('', section.member.name)], > dedent(section.text)) > >> > seen_item = True > >> > > >> > if not seen_item: > >> > @@ -265,9 +288,12 @@ def _nodes_for_sections(self, doc): > >> > continue > >> > snode = self._make_section(section.tag) > >> > if section.tag and section.tag.startswith('Example'): > >> > - snode += self._nodes_for_example(section.text) > >> > + snode += > self._nodes_for_example(dedent(section.text)) > >> > else: > >> > - self._parse_text_into_node(section.text, snode) > >> > + self._parse_text_into_node( > >> > + dedent(section.text) if section.tag else > section.text, > >> > + snode, > >> > + ) > >> > nodelist.append(snode) > >> > return nodelist > >> > > >> > diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py > >> > index 7b13a583ac1..8cdd5334ec6 100644 > >> > --- a/scripts/qapi/parser.py > >> > +++ b/scripts/qapi/parser.py > >> > @@ -448,7 +448,10 @@ def get_doc_indented(self, doc: 'QAPIDoc') -> > Optional[str]: > >> > indent = must_match(r'\s*', line).end() > >> > if not indent: > >> > return line > >> > - doc.append_line(line[indent:]) > >> > + > >> > + # Preserve the indent, it's needed for rST formatting. > >> > >> I'm not sure we need the comment. > >> > > > > OK. > > > > > >> > + doc.append_line(line) > >> > + > >> > prev_line_blank = False > >> > while True: > >> > self.accept(False) > >> > @@ -465,7 +468,8 @@ def get_doc_indented(self, doc: 'QAPIDoc') -> > Optional[str]: > >> > self, > >> > "unexpected de-indent (expected at least %d > spaces)" % > >> > indent) > >> > - doc.append_line(line[indent:]) > >> > + # Again, preserve the indent for ReST. > >> > >> Likewise. > >> > >> If you want to document the fact that .get_doc_indented() preserves > >> indentation, a function comment or doc string would the proper place. > >> > > > > Got it. > > > > > >> > + doc.append_line(line) > >> > prev_line_blank = True > >> > > >> > def get_doc_paragraph(self, doc: 'QAPIDoc') -> Optional[str]: > >> > >> Correctness argument: > >> > >> 0. This patch merely moves an indentation adjustment from parser.py to > >> qapidoc.py. > >> > >> Checks out: you remove indentation adjustment in parser.py, add it in > >> qapidoc.py, and that's it. > >> > >> 1. The new indentation adjuster in qapidoc.py behaves just like the old > >> one in parser.py did. > >> > >> parser.py's .get_doc_indented() is used to parse what follows certain > >> sections' first line. It's users store the text on this first line, > >> if any, with leading whitespace stripped, then call > >> .get_doc_indented() to eat the section's remaining lines. All > >> non-blank lines eaten must be indented at least as much as the first > >> non-blank line. .get_doc_indented() appends the lines eaten to the > >> stored text with the first non-blank line's indentation stripped from > >> all the non-blank lines. > >> > >> Your patch drops this stripping of indentation from non-first lines. > >> This is what must now be done in qapidoc.py. > >> > >> If the section's first line has no text, the first non-blank line's > >> indentation must be stripped from all non-blank lines. > >> > >> If the section's first line has text, the next non-blank line's > >> indentation must be stripped from all lines but the first. > >> > >> How can qapidoc.py detect whether the section's first line has text? > >> Fortunately, such a line will be unindented, i.e. start with a > >> non-blank character, and all remaining lines will be blank or > >> indented. > >> > >> qapidoc.py's dedent() seems to remove indentation common to all > >> non-blank lines. Except when there are multiple lines, and the first > >> one is not indented, it removes common indentation from the remaining > >> lines. > >> > >> Common indentation works, because all lines are indented at least as > >> much as the one whose indentation we want to remove. > >> > >> The conditional doesn't quite match the "if the section's first line" > >> above. This simpler dedent() does, and also works in my testing: > >> > >> def dedent(text: str) -> str: > >> lines = text.splitlines(True) > >> if re.match(r"\s+", lines[0]): > >> # First line is indented; description started on > >> # the line after the name. dedent the whole block. > >> return textwrap.dedent(text) > >> # Descr started on same line. Dedent line 2+. > >> return lines[0] + textwrap.dedent("".join(lines[1:])) > >> > >> What do you think? > >> > > > > I try not to on most days. > > I always say "me thinking will cost you extra" ;) > > > I'll check it out, though in practice the generated documents were > already > > identical, so... I'll try yours and verify that's still true. If so, > sure! > Yup, it checks out and the md5sums are still the same. Good enough for me. > > > > > >> 2. The new one is applied exactly when the old one was. > >> > >> The old one is applied to the sections starting with @FOO: and the > >> tagged sections (Returns:, Errors:, ...). > >> > >> The new one is applied in ._nodes_for_members(), > >> ._nodes_for_enum_values(), _nodes_for_features(), and > >> ._nodes_for_sections(). > >> > >> It is not applied to the text of untagged sections, including the > >> body section. > >> > >> Good. > > Thanks! > > [...] > >