parser: preserve indentation in QAPIDoc sections

John Snow Wed, 15 May 2024 10:05:26 -0700

On Wed, May 15, 2024 at 10:18 AM Markus Armbruster <arm...@redhat.com>
wrote:


> John Snow <js...@redhat.com> writes:
>
> > On Wed, May 15, 2024, 7:50 AM Markus Armbruster <arm...@redhat.com>
> wrote:
> >
> >> John Snow <js...@redhat.com> writes:
> >>
> >> > Prior to this patch, a section like this:
> >> >
> >> > @name: lorem ipsum
> >> >    dolor sit amet
> >> >      consectetur adipiscing elit
> >> >
> >> > would be parsed as:
> >> >
> >> > "lorem ipsum\ndolor sit amet\n  consectetur adipiscing elit"
> >> >
> >> > We want to preserve the indentation for even the first body line so
> that
> >> > the entire block can be parsed directly as rST. This patch would now
> >> > parse that segment as:
> >> >
> >> > "lorem ipsum\n   dolor sit amet\n     consectetur adipiscing elit"
> >>
> >> I'm afraid it's less than clear *why* we want to parse the entire block
> >> directly as rST.  I have just enough insight into what you've built on
> >> top of this series to hazard a guess.  Bear with me while I try to
> >> explain it.
> >>
> >
> > My own summary: qapidoc expects a paragraph, the new generator expects a
> > block.
> >
> >
> >> We first parse the doc comment with parser.py into an internal
> >> representation.  The structural parts become objects, and the remainder
> >> becomes text attributes of these objects.  Currently, parser.py
> >> carefully adjusts indentation for these text attributes.  Why?  I'll get
> >> to that.
> >>
> >> For your example, parser.py creates an ArgSection object, and sets its
> >> @text member to
> >>
> >>     "lorem ipsum\ndolor sit amet\n  consectetur adipiscing elit"
> >>
> >> Printing this string gives us
> >>
> >>     lorem ipsum
> >>     dolor sit amet
> >>       consectetur adipiscing elit
> >>
> >> qapidoc.py then transforms parser.py's IR into Sphinx IR.  The objects
> >> become (more complicated) Sphinx objects, and their text attributes get
> >> re-parsed as rST text into still more Sphinx objects.
> >>
> >> This re-parse rejects your example with "Unexpected indentation."
> >>
> >
> > Specifically, it'd be an unexpected *unindent*; the indent lacking on the
> > first *two* lines is the problem.
> >
> >
> >> Let me use a slightly different one:
> >>
> >>     # @name: lorem ipsum
> >>     #    dolor sit amet
> >>     #    consectetur adipiscing elit
> >>
> >> Results in this @text member
> >>
> >>     lorem ipsum
> >>     dolor sit amet
> >>     consectetur adipiscing elit
> >>
> >> which is re-parsed as paragraph, i.e. exactly what we want.
> >>
> >
> > It's what we used to want, anyway.
>
> Yes, I'm describing the current state here.
>
> >> > This understandably breaks qapidoc.py;
> >>
> >> Without indentation adjustment, we'd get
> >>
> >>     lorem ipsum
> >>        dolor sit amet
> >>        consectetur adipiscing elit
> >>
> >> which would be re-parsed as a definition list, I guess.  This isn't what
> >> we want.
> >>
> >> >                                        so a new function is added
> there
> >> > to re-dedent the text.
> >>
> >> Your patch moves the indentation adjustment to another place.  No
> >> functional change.
> >>
> >> You move it so you can branch off your new rendering pipeline before the
> >> indentation adjustment, because ...
> >>
> >> >                        Once the new generator is merged, this function
> >> > will not be needed any longer and can be dropped.
> >>
> >> ... yours doesn't want it.
> >>
> >> I believe it doesn't want it, because it generates rST (with a QAPI
> >> extension) instead of Sphinx objects.  For your example, something like
> >>
> >>     :arg name: lorem ipsum
> >>        dolor sit amet
> >>          consectetur adipiscing elit
> >>
> >> For mine:
> >>
> >>     :arg name: lorem ipsum
> >>        dolor sit amet
> >>        consectetur adipiscing elit
> >>
> >> Fair?
> >>
> >
> > Not quite;
> >
> > Old parsing, new generator:
> >
> > :arg type name: lorem ipsum
> > dolor sit amet
> >   consectetur apidiscing elit
> >
> > This is wrong - continuations of a field list must be indented. Unlike
> > paragraphs, we want indents to "keep the block".
> >
> > New parsing, new generator:
> >
> > :arg type name: lorem ipsum
> >    dolor sit amet
> >      consectetur apidiscing elit
> >
> > indent is preserved, maintaining the block-level element.
> >
> > I don't have to re-add indents and any nested block elements will be
> > preserved correctly. i.e. you can use code examples, nested lists, etc.
> in
> > argument definitions.
> >
> > The goal here was "Do not treat this as a paragraph, treat it directly as
> > rST and do not modify it needlessly."
> >
> > It's a lot simpler than trying to manage the indent and injecting spaces
> > manually - and adding a temporary dedent to scheduled-for-demolition code
> > seemed the nicer place to add the hack.
>
> Understand.
>
> A bit more rationale in the commit message would be nice.  Perhaps start
> with current state ("we deintent"), then describe the patch ("move the
> deindent"), then rationale "to get it out of the way of a new thingy I
> wrote, and intend to post soonish", then "which will replace qapidoc.py
> entirely".
>

Updated with more info than you require. I hear that American third graders
get a free trip to Pizza Hut if they read at least 20 of my commit messages
in a school year.


>
> >> The transition from the doc comment to (extended) rST is straightforward
> >> for these examples.
> >>
> >> I hope it'll be as straightforward (and thus predictable) in other
> >> cases, too.
> >>
> >> > (I verified this patch changes absolutely nothing by comparing the
> >> > md5sums of the QMP ref html pages both before and after the change, so
> >> > it's certified inert. QAPI test output has been updated to reflect the
> >> > new strategy of preserving indents for rST.)
> >> >
> >> > Signed-off-by: John Snow <js...@redhat.com>
> >> > ---
> >> >  docs/sphinx/qapidoc.py         | 36
> +++++++++++++++++++++++++++++-----
> >> >  scripts/qapi/parser.py         |  8 ++++++--
> >> >  tests/qapi-schema/doc-good.out | 32 +++++++++++++++---------------
> >> >  3 files changed, 53 insertions(+), 23 deletions(-)
> >> >
> >> > diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py
> >> > index 1655682d4c7..2e3ffcbafb7 100644
> >> > --- a/docs/sphinx/qapidoc.py
> >> > +++ b/docs/sphinx/qapidoc.py
> >> > @@ -26,6 +26,7 @@
> >> >
> >> >  import os
> >> >  import re
> >> > +import textwrap
> >> >
> >> >  from docutils import nodes
> >> >  from docutils.parsers.rst import Directive, directives
> >> > @@ -51,6 +52,28 @@
> >> >  __version__ = "1.0"
> >> >
> >> >
> >> > +def dedent(text: str) -> str:
> >> > +    # Temporary: In service of the new QAPI domain, the QAPI doc
> parser
> >> > +    # now preserves indents in args/members/features text. QAPIDoc
> does
> >> > +    # not handle this well, so undo that change here.
> >> > +
> >> > +    # QAPIDoc is being rewritten and will be replaced soon,
> >> > +    # but this function is here in the interim as transition glue.
> >>
> >> I'm not sure we need the comment.
> >
> > OK. Guess I'd rather overcomment than undercomment... easier to delete
> than
> > add :)
>
> Right :)
>
> >> > +
> >> > +    lines = text.splitlines(True)
> >> > +    if len(lines) > 1:
> >> > +        if re.match(r"\s+", lines[0]):
> >> > +            # First line is indented; description started on
> >> > +            # the line after the name. dedent the whole block.
> >> > +            return textwrap.dedent(text)
> >> > +        else:
> >>
> >> pylint gripes
> >>
> >>     docs/sphinx/qapidoc.py:65:8: R1705: Unnecessary "else" after
> "return",
> >> remove the "else" and de-indent the code inside it (no-else-return)
> >>
> >
> > Interesting. What pylint version?
>
> Fedora 39
>
> $ pylint --version
> pylint 3.0.4
> astroid 3.0.3
> Python 3.12.3 (main, Apr 17 2024, 00:00:00) [GCC 13.2.1 20240316 (Red Hat
> 13.2.1-7)]
>

Realized from the prior patch I was not actually running pylint on this
file, for good reason. Nonetheless, I did fix this particular gripe by
using your rewrite.

My current best attempt at reducing the noise:

*from inside the docs/sphinx directory*:

> PYTHONPATH=/home/jsnow/src/qemu/scripts/ pylint --rc-file
../../scripts/qapi/pylintrc qapidoc.py
************* Module qapidoc
qapidoc.py:45:0: C0103: Constant name "Use_SSI" doesn't conform to
UPPER_CASE naming style (invalid-name)
qapidoc.py:49:4: E0611: No name 'AutodocReporter' in module
'sphinx.ext.autodoc' (no-name-in-module)
qapidoc.py:77:10: R1708: Do not raise StopIteration in generator, use
return statement instead (stop-iteration-return)
qapidoc.py:567:11: R1735: Consider using '{"version": __version__,
"parallel_read_safe": True, "parallel_write_safe": True, ... }' instead of
a call to 'dict'. (use-dict-literal)

 Hm, actually, maybe this is tractable...


> >> > +            # Descr started on same line. Dedent line 2+.
> >> > +            return lines[0] + textwrap.dedent("".join(lines[1:]))
> >> > +    else:
> >> > +        # Descr was a single line; dedent entire line.
> >> > +        return textwrap.dedent(text)
> >> > +
> >> > +
> >> >  # fmt: off
> >> >  # Function borrowed from pydash, which is under the MIT license
> >> >  def intersperse(iterable, separator):
> >> > @@ -169,7 +192,7 @@ def _nodes_for_members(self, doc, what,
> base=None, branches=None):
> >> >              term = self._nodes_for_one_member(section.member)
> >> >              # TODO drop fallbacks when undocumented members are
> outlawed
> >> >              if section.text:
> >> > -                defn = section.text
> >> > +                defn = dedent(section.text)
> >> >              else:
> >> >                  defn = [nodes.Text('Not documented')]
> >> >
> >> > @@ -207,7 +230,7 @@ def _nodes_for_enum_values(self, doc):
> >> >
> termtext.extend(self._nodes_for_ifcond(section.member.ifcond))
> >> >              # TODO drop fallbacks when undocumented members are
> outlawed
> >> >              if section.text:
> >> > -                defn = section.text
> >> > +                defn = dedent(section.text)
> >> >              else:
> >> >                  defn = [nodes.Text('Not documented')]
> >> >
> >> > @@ -242,7 +265,7 @@ def _nodes_for_features(self, doc):
> >> >          dlnode = nodes.definition_list()
> >> >          for section in doc.features.values():
> >> >              dlnode += self._make_dlitem(
> >> > -                [nodes.literal('', section.member.name)],
> section.text)
> >> > +                [nodes.literal('', section.member.name)],
> dedent(section.text))
> >> >              seen_item = True
> >> >
> >> >          if not seen_item:
> >> > @@ -265,9 +288,12 @@ def _nodes_for_sections(self, doc):
> >> >                  continue
> >> >              snode = self._make_section(section.tag)
> >> >              if section.tag and section.tag.startswith('Example'):
> >> > -                snode += self._nodes_for_example(section.text)
> >> > +                snode +=
> self._nodes_for_example(dedent(section.text))
> >> >              else:
> >> > -                self._parse_text_into_node(section.text, snode)
> >> > +                self._parse_text_into_node(
> >> > +                    dedent(section.text) if section.tag else
> section.text,
> >> > +                    snode,
> >> > +                )
> >> >              nodelist.append(snode)
> >> >          return nodelist
> >> >
> >> > diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> >> > index 7b13a583ac1..8cdd5334ec6 100644
> >> > --- a/scripts/qapi/parser.py
> >> > +++ b/scripts/qapi/parser.py
> >> > @@ -448,7 +448,10 @@ def get_doc_indented(self, doc: 'QAPIDoc') ->
> Optional[str]:
> >> >          indent = must_match(r'\s*', line).end()
> >> >          if not indent:
> >> >              return line
> >> > -        doc.append_line(line[indent:])
> >> > +
> >> > +        # Preserve the indent, it's needed for rST formatting.
> >>
> >> I'm not sure we need the comment.
> >>
> >
> > OK.
> >
> >
> >> > +        doc.append_line(line)
> >> > +
> >> >          prev_line_blank = False
> >> >          while True:
> >> >              self.accept(False)
> >> > @@ -465,7 +468,8 @@ def get_doc_indented(self, doc: 'QAPIDoc') ->
> Optional[str]:
> >> >                      self,
> >> >                      "unexpected de-indent (expected at least %d
> spaces)" %
> >> >                      indent)
> >> > -            doc.append_line(line[indent:])
> >> > +            # Again, preserve the indent for ReST.
> >>
> >> Likewise.
> >>
> >> If you want to document the fact that .get_doc_indented() preserves
> >> indentation, a function comment or doc string would the proper place.
> >>
> >
> > Got it.
> >
> >
> >> > +            doc.append_line(line)
> >> >              prev_line_blank = True
> >> >
> >> >      def get_doc_paragraph(self, doc: 'QAPIDoc') -> Optional[str]:
> >>
> >> Correctness argument:
> >>
> >> 0. This patch merely moves an indentation adjustment from parser.py to
> >>    qapidoc.py.
> >>
> >>    Checks out: you remove indentation adjustment in parser.py, add it in
> >>    qapidoc.py, and that's it.
> >>
> >> 1. The new indentation adjuster in qapidoc.py behaves just like the old
> >>    one in parser.py did.
> >>
> >>    parser.py's .get_doc_indented() is used to parse what follows certain
> >>    sections' first line.  It's users store the text on this first line,
> >>    if any, with leading whitespace stripped, then call
> >>    .get_doc_indented() to eat the section's remaining lines.  All
> >>    non-blank lines eaten must be indented at least as much as the first
> >>    non-blank line.  .get_doc_indented() appends the lines eaten to the
> >>    stored text with the first non-blank line's indentation stripped from
> >>    all the non-blank lines.
> >>
> >>    Your patch drops this stripping of indentation from non-first lines.
> >>    This is what must now be done in qapidoc.py.
> >>
> >>    If the section's first line has no text, the first non-blank line's
> >>    indentation must be stripped from all non-blank lines.
> >>
> >>    If the section's first line has text, the next non-blank line's
> >>    indentation must be stripped from all lines but the first.
> >>
> >>    How can qapidoc.py detect whether the section's first line has text?
> >>    Fortunately, such a line will be unindented, i.e. start with a
> >>    non-blank character, and all remaining lines will be blank or
> >>    indented.
> >>
> >>    qapidoc.py's dedent() seems to remove indentation common to all
> >>    non-blank lines.  Except when there are multiple lines, and the first
> >>    one is not indented, it removes common indentation from the remaining
> >>    lines.
> >>
> >>    Common indentation works, because all lines are indented at least as
> >>    much as the one whose indentation we want to remove.
> >>
> >>    The conditional doesn't quite match the "if the section's first line"
> >>    above.  This simpler dedent() does, and also works in my testing:
> >>
> >> def dedent(text: str) -> str:
> >>     lines = text.splitlines(True)
> >>     if re.match(r"\s+", lines[0]):
> >>         # First line is indented; description started on
> >>         # the line after the name. dedent the whole block.
> >>         return textwrap.dedent(text)
> >>     # Descr started on same line. Dedent line 2+.
> >>     return lines[0] + textwrap.dedent("".join(lines[1:]))
> >>
> >>    What do you think?
> >>
> >
> > I try not to on most days.
>
> I always say "me thinking will cost you extra" ;)
>
> > I'll check it out, though in practice the generated documents were
> already
> > identical, so... I'll try yours and verify that's still true. If so,
> sure!
>

Yup, it checks out and the md5sums are still the same. Good enough for me.


> >
> >
> >> 2. The new one is applied exactly when the old one was.
> >>
> >>    The old one is applied to the sections starting with @FOO: and the
> >>    tagged sections (Returns:, Errors:, ...).
> >>
> >>    The new one is applied in ._nodes_for_members(),
> >>    ._nodes_for_enum_values(), _nodes_for_features(), and
> >>    ._nodes_for_sections().
> >>
> >>    It is not applied to the text of untagged sections, including the
> >>    body section.
> >>
> >>    Good.
>
> Thanks!
>
> [...]
>
>

Re: [PATCH 04/20] qapi/parser: preserve indentation in QAPIDoc sections

Reply via email to