Re: [PATCH 04/20] qapi/parser: preserve indentation in QAPIDoc sections
On Wed, May 15, 2024 at 10:18 AM Markus Armbruster wrote: > John Snow writes: > > > On Wed, May 15, 2024, 7:50 AM Markus Armbruster > wrote: > > > >> John Snow writes: > >> > >> > Prior to this patch, a section like this: > >> > > >> > @name: lorem ipsum > >> >dolor sit amet > >> > consectetur adipiscing elit > >> > > >> > would be parsed as: > >> > > >> > "lorem ipsum\ndolor sit amet\n consectetur adipiscing elit" > >> > > >> > We want to preserve the indentation for even the first body line so > that > >> > the entire block can be parsed directly as rST. This patch would now > >> > parse that segment as: > >> > > >> > "lorem ipsum\n dolor sit amet\n consectetur adipiscing elit" > >> > >> I'm afraid it's less than clear *why* we want to parse the entire block > >> directly as rST. I have just enough insight into what you've built on > >> top of this series to hazard a guess. Bear with me while I try to > >> explain it. > >> > > > > My own summary: qapidoc expects a paragraph, the new generator expects a > > block. > > > > > >> We first parse the doc comment with parser.py into an internal > >> representation. The structural parts become objects, and the remainder > >> becomes text attributes of these objects. Currently, parser.py > >> carefully adjusts indentation for these text attributes. Why? I'll get > >> to that. > >> > >> For your example, parser.py creates an ArgSection object, and sets its > >> @text member to > >> > >> "lorem ipsum\ndolor sit amet\n consectetur adipiscing elit" > >> > >> Printing this string gives us > >> > >> lorem ipsum > >> dolor sit amet > >> consectetur adipiscing elit > >> > >> qapidoc.py then transforms parser.py's IR into Sphinx IR. The objects > >> become (more complicated) Sphinx objects, and their text attributes get > >> re-parsed as rST text into still more Sphinx objects. > >> > >> This re-parse rejects your example with "Unexpected indentation." > >> > > > > Specifically, it'd be an unexpected *unindent*; the indent lacking on the > > first *two* lines is the problem. > > > > > >> Let me use a slightly different one: > >> > >> # @name: lorem ipsum > >> #dolor sit amet > >> #consectetur adipiscing elit > >> > >> Results in this @text member > >> > >> lorem ipsum > >> dolor sit amet > >> consectetur adipiscing elit > >> > >> which is re-parsed as paragraph, i.e. exactly what we want. > >> > > > > It's what we used to want, anyway. > > Yes, I'm describing the current state here. > > >> > This understandably breaks qapidoc.py; > >> > >> Without indentation adjustment, we'd get > >> > >> lorem ipsum > >>dolor sit amet > >>consectetur adipiscing elit > >> > >> which would be re-parsed as a definition list, I guess. This isn't what > >> we want. > >> > >> >so a new function is added > there > >> > to re-dedent the text. > >> > >> Your patch moves the indentation adjustment to another place. No > >> functional change. > >> > >> You move it so you can branch off your new rendering pipeline before the > >> indentation adjustment, because ... > >> > >> >Once the new generator is merged, this function > >> > will not be needed any longer and can be dropped. > >> > >> ... yours doesn't want it. > >> > >> I believe it doesn't want it, because it generates rST (with a QAPI > >> extension) instead of Sphinx objects. For your example, something like > >> > >> :arg name: lorem ipsum > >>dolor sit amet > >> consectetur adipiscing elit > >> > >> For mine: > >> > >> :arg name: lorem ipsum > >>dolor sit amet > >>consectetur adipiscing elit > >> > >> Fair? > >> > > > > Not quite; > > > > Old parsing, new generator: > > > > :arg type name: lorem ipsum > > dolor sit amet > > consectetur apidiscing elit > > > > This is wrong - continuations of a field list must be indented. Unlike > > paragraphs, we want indents to "keep the block". > > > > New parsing, new generator: > > > > :arg type name: lorem ipsum > >dolor sit amet > > consectetur apidiscing elit > > > > indent is preserved, maintaining the block-level element. > > > > I don't have to re-add indents and any nested block elements will be > > preserved correctly. i.e. you can use code examples, nested lists, etc. > in > > argument definitions. > > > > The goal here was "Do not treat this as a paragraph, treat it directly as > > rST and do not modify it needlessly." > > > > It's a lot simpler than trying to manage the indent and injecting spaces > > manually - and adding a temporary dedent to scheduled-for-demolition code > > seemed the nicer place to add the hack. > > Understand. > > A bit more rationale in the commit message would be nice. Perhaps start > with current state ("we deintent"), then describe the patch ("move the > deindent"), then rationale "to get it out of the way of a new thingy I > wrote, and intend to p
Re: [PATCH 04/20] qapi/parser: preserve indentation in QAPIDoc sections
John Snow writes: > On Wed, May 15, 2024, 7:50 AM Markus Armbruster wrote: > >> John Snow writes: >> >> > Prior to this patch, a section like this: >> > >> > @name: lorem ipsum >> >dolor sit amet >> > consectetur adipiscing elit >> > >> > would be parsed as: >> > >> > "lorem ipsum\ndolor sit amet\n consectetur adipiscing elit" >> > >> > We want to preserve the indentation for even the first body line so that >> > the entire block can be parsed directly as rST. This patch would now >> > parse that segment as: >> > >> > "lorem ipsum\n dolor sit amet\n consectetur adipiscing elit" >> >> I'm afraid it's less than clear *why* we want to parse the entire block >> directly as rST. I have just enough insight into what you've built on >> top of this series to hazard a guess. Bear with me while I try to >> explain it. >> > > My own summary: qapidoc expects a paragraph, the new generator expects a > block. > > >> We first parse the doc comment with parser.py into an internal >> representation. The structural parts become objects, and the remainder >> becomes text attributes of these objects. Currently, parser.py >> carefully adjusts indentation for these text attributes. Why? I'll get >> to that. >> >> For your example, parser.py creates an ArgSection object, and sets its >> @text member to >> >> "lorem ipsum\ndolor sit amet\n consectetur adipiscing elit" >> >> Printing this string gives us >> >> lorem ipsum >> dolor sit amet >> consectetur adipiscing elit >> >> qapidoc.py then transforms parser.py's IR into Sphinx IR. The objects >> become (more complicated) Sphinx objects, and their text attributes get >> re-parsed as rST text into still more Sphinx objects. >> >> This re-parse rejects your example with "Unexpected indentation." >> > > Specifically, it'd be an unexpected *unindent*; the indent lacking on the > first *two* lines is the problem. > > >> Let me use a slightly different one: >> >> # @name: lorem ipsum >> #dolor sit amet >> #consectetur adipiscing elit >> >> Results in this @text member >> >> lorem ipsum >> dolor sit amet >> consectetur adipiscing elit >> >> which is re-parsed as paragraph, i.e. exactly what we want. >> > > It's what we used to want, anyway. Yes, I'm describing the current state here. >> > This understandably breaks qapidoc.py; >> >> Without indentation adjustment, we'd get >> >> lorem ipsum >>dolor sit amet >>consectetur adipiscing elit >> >> which would be re-parsed as a definition list, I guess. This isn't what >> we want. >> >> >so a new function is added there >> > to re-dedent the text. >> >> Your patch moves the indentation adjustment to another place. No >> functional change. >> >> You move it so you can branch off your new rendering pipeline before the >> indentation adjustment, because ... >> >> >Once the new generator is merged, this function >> > will not be needed any longer and can be dropped. >> >> ... yours doesn't want it. >> >> I believe it doesn't want it, because it generates rST (with a QAPI >> extension) instead of Sphinx objects. For your example, something like >> >> :arg name: lorem ipsum >>dolor sit amet >> consectetur adipiscing elit >> >> For mine: >> >> :arg name: lorem ipsum >>dolor sit amet >>consectetur adipiscing elit >> >> Fair? >> > > Not quite; > > Old parsing, new generator: > > :arg type name: lorem ipsum > dolor sit amet > consectetur apidiscing elit > > This is wrong - continuations of a field list must be indented. Unlike > paragraphs, we want indents to "keep the block". > > New parsing, new generator: > > :arg type name: lorem ipsum >dolor sit amet > consectetur apidiscing elit > > indent is preserved, maintaining the block-level element. > > I don't have to re-add indents and any nested block elements will be > preserved correctly. i.e. you can use code examples, nested lists, etc. in > argument definitions. > > The goal here was "Do not treat this as a paragraph, treat it directly as > rST and do not modify it needlessly." > > It's a lot simpler than trying to manage the indent and injecting spaces > manually - and adding a temporary dedent to scheduled-for-demolition code > seemed the nicer place to add the hack. Understand. A bit more rationale in the commit message would be nice. Perhaps start with current state ("we deintent"), then describe the patch ("move the deindent"), then rationale "to get it out of the way of a new thingy I wrote, and intend to post soonish", then "which will replace qapidoc.py entirely". >> The transition from the doc comment to (extended) rST is straightforward >> for these examples. >> >> I hope it'll be as straightforward (and thus predictable) in other >> cases, too. >> >> > (I verified this patch changes absolutely nothing by comparing the >> > md5sums of the QMP ref html pages both before
Re: [PATCH 04/20] qapi/parser: preserve indentation in QAPIDoc sections
On Wed, May 15, 2024, 7:50 AM Markus Armbruster wrote: > John Snow writes: > > > Prior to this patch, a section like this: > > > > @name: lorem ipsum > >dolor sit amet > > consectetur adipiscing elit > > > > would be parsed as: > > > > "lorem ipsum\ndolor sit amet\n consectetur adipiscing elit" > > > > We want to preserve the indentation for even the first body line so that > > the entire block can be parsed directly as rST. This patch would now > > parse that segment as: > > > > "lorem ipsum\n dolor sit amet\n consectetur adipiscing elit" > > I'm afraid it's less than clear *why* we want to parse the entire block > directly as rST. I have just enough insight into what you've built on > top of this series to hazard a guess. Bear with me while I try to > explain it. > My own summary: qapidoc expects a paragraph, the new generator expects a block. > We first parse the doc comment with parser.py into an internal > representation. The structural parts become objects, and the remainder > becomes text attributes of these objects. Currently, parser.py > carefully adjusts indentation for these text attributes. Why? I'll get > to that. > > For your example, parser.py creates an ArgSection object, and sets its > @text member to > > "lorem ipsum\ndolor sit amet\n consectetur adipiscing elit" > > Printing this string gives us > > lorem ipsum > dolor sit amet > consectetur adipiscing elit > > qapidoc.py then transforms parser.py's IR into Sphinx IR. The objects > become (more complicated) Sphinx objects, and their text attributes get > re-parsed as rST text into still more Sphinx objects. > > This re-parse rejects your example with "Unexpected indentation." > Specifically, it'd be an unexpected *unindent*; the indent lacking on the first *two* lines is the problem. > Let me use a slightly different one: > > # @name: lorem ipsum > #dolor sit amet > #consectetur adipiscing elit > > Results in this @text member > > lorem ipsum > dolor sit amet > consectetur adipiscing elit > > which is re-parsed as paragraph, i.e. exactly what we want. > It's what we used to want, anyway. > > This understandably breaks qapidoc.py; > > Without indentation adjustment, we'd get > > lorem ipsum >dolor sit amet >consectetur adipiscing elit > > which would be re-parsed as a definition list, I guess. This isn't what > we want. > > >so a new function is added there > > to re-dedent the text. > > Your patch moves the indentation adjustment to another place. No > functional change. > > You move it so you can branch off your new rendering pipeline before the > indentation adjustment, because ... > > >Once the new generator is merged, this function > > will not be needed any longer and can be dropped. > > ... yours doesn't want it. > > I believe it doesn't want it, because it generates rST (with a QAPI > extension) instead of Sphinx objects. For your example, something like > > :arg name: lorem ipsum >dolor sit amet > consectetur adipiscing elit > > For mine: > > :arg name: lorem ipsum >dolor sit amet >consectetur adipiscing elit > > Fair? > Not quite; Old parsing, new generator: :arg type name: lorem ipsum dolor sit amet consectetur apidiscing elit This is wrong - continuations of a field list must be indented. Unlike paragraphs, we want indents to "keep the block". New parsing, new generator: :arg type name: lorem ipsum dolor sit amet consectetur apidiscing elit indent is preserved, maintaining the block-level element. I don't have to re-add indents and any nested block elements will be preserved correctly. i.e. you can use code examples, nested lists, etc. in argument definitions. The goal here was "Do not treat this as a paragraph, treat it directly as rST and do not modify it needlessly." It's a lot simpler than trying to manage the indent and injecting spaces manually - and adding a temporary dedent to scheduled-for-demolition code seemed the nicer place to add the hack. > The transition from the doc comment to (extended) rST is straightforward > for these examples. > > I hope it'll be as straightforward (and thus predictable) in other > cases, too. > > > (I verified this patch changes absolutely nothing by comparing the > > md5sums of the QMP ref html pages both before and after the change, so > > it's certified inert. QAPI test output has been updated to reflect the > > new strategy of preserving indents for rST.) > > > > Signed-off-by: John Snow > > --- > > docs/sphinx/qapidoc.py | 36 +- > > scripts/qapi/parser.py | 8 ++-- > > tests/qapi-schema/doc-good.out | 32 +++--- > > 3 files changed, 53 insertions(+), 23 deletions(-) > > > > diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py > > index 1655682d4c7..2e3ffcbafb7 1006
Re: [PATCH 04/20] qapi/parser: preserve indentation in QAPIDoc sections
John Snow writes: > Prior to this patch, a section like this: > > @name: lorem ipsum >dolor sit amet > consectetur adipiscing elit > > would be parsed as: > > "lorem ipsum\ndolor sit amet\n consectetur adipiscing elit" > > We want to preserve the indentation for even the first body line so that > the entire block can be parsed directly as rST. This patch would now > parse that segment as: > > "lorem ipsum\n dolor sit amet\n consectetur adipiscing elit" I'm afraid it's less than clear *why* we want to parse the entire block directly as rST. I have just enough insight into what you've built on top of this series to hazard a guess. Bear with me while I try to explain it. We first parse the doc comment with parser.py into an internal representation. The structural parts become objects, and the remainder becomes text attributes of these objects. Currently, parser.py carefully adjusts indentation for these text attributes. Why? I'll get to that. For your example, parser.py creates an ArgSection object, and sets its @text member to "lorem ipsum\ndolor sit amet\n consectetur adipiscing elit" Printing this string gives us lorem ipsum dolor sit amet consectetur adipiscing elit qapidoc.py then transforms parser.py's IR into Sphinx IR. The objects become (more complicated) Sphinx objects, and their text attributes get re-parsed as rST text into still more Sphinx objects. This re-parse rejects your example with "Unexpected indentation." Let me use a slightly different one: # @name: lorem ipsum #dolor sit amet #consectetur adipiscing elit Results in this @text member lorem ipsum dolor sit amet consectetur adipiscing elit which is re-parsed as paragraph, i.e. exactly what we want. > This understandably breaks qapidoc.py; Without indentation adjustment, we'd get lorem ipsum dolor sit amet consectetur adipiscing elit which would be re-parsed as a definition list, I guess. This isn't what we want. >so a new function is added there > to re-dedent the text. Your patch moves the indentation adjustment to another place. No functional change. You move it so you can branch off your new rendering pipeline before the indentation adjustment, because ... >Once the new generator is merged, this function > will not be needed any longer and can be dropped. ... yours doesn't want it. I believe it doesn't want it, because it generates rST (with a QAPI extension) instead of Sphinx objects. For your example, something like :arg name: lorem ipsum dolor sit amet consectetur adipiscing elit For mine: :arg name: lorem ipsum dolor sit amet consectetur adipiscing elit Fair? The transition from the doc comment to (extended) rST is straightforward for these examples. I hope it'll be as straightforward (and thus predictable) in other cases, too. > (I verified this patch changes absolutely nothing by comparing the > md5sums of the QMP ref html pages both before and after the change, so > it's certified inert. QAPI test output has been updated to reflect the > new strategy of preserving indents for rST.) > > Signed-off-by: John Snow > --- > docs/sphinx/qapidoc.py | 36 +- > scripts/qapi/parser.py | 8 ++-- > tests/qapi-schema/doc-good.out | 32 +++--- > 3 files changed, 53 insertions(+), 23 deletions(-) > > diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py > index 1655682d4c7..2e3ffcbafb7 100644 > --- a/docs/sphinx/qapidoc.py > +++ b/docs/sphinx/qapidoc.py > @@ -26,6 +26,7 @@ > > import os > import re > +import textwrap > > from docutils import nodes > from docutils.parsers.rst import Directive, directives > @@ -51,6 +52,28 @@ > __version__ = "1.0" > > > +def dedent(text: str) -> str: > +# Temporary: In service of the new QAPI domain, the QAPI doc parser > +# now preserves indents in args/members/features text. QAPIDoc does > +# not handle this well, so undo that change here. > + > +# QAPIDoc is being rewritten and will be replaced soon, > +# but this function is here in the interim as transition glue. I'm not sure we need the comment. > + > +lines = text.splitlines(True) > +if len(lines) > 1: > +if re.match(r"\s+", lines[0]): > +# First line is indented; description started on > +# the line after the name. dedent the whole block. > +return textwrap.dedent(text) > +else: pylint gripes docs/sphinx/qapidoc.py:65:8: R1705: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (no-else-return) > +# Descr started on same line. Dedent line 2+. > +return lines[0] + textwrap.dedent("".join(lines[1:])) > +else: > +# Descr was a single line; dedent entire line. > +return t
[PATCH 04/20] qapi/parser: preserve indentation in QAPIDoc sections
Prior to this patch, a section like this: @name: lorem ipsum dolor sit amet consectetur adipiscing elit would be parsed as: "lorem ipsum\ndolor sit amet\n consectetur adipiscing elit" We want to preserve the indentation for even the first body line so that the entire block can be parsed directly as rST. This patch would now parse that segment as: "lorem ipsum\n dolor sit amet\n consectetur adipiscing elit" This understandably breaks qapidoc.py; so a new function is added there to re-dedent the text. Once the new generator is merged, this function will not be needed any longer and can be dropped. (I verified this patch changes absolutely nothing by comparing the md5sums of the QMP ref html pages both before and after the change, so it's certified inert. QAPI test output has been updated to reflect the new strategy of preserving indents for rST.) Signed-off-by: John Snow --- docs/sphinx/qapidoc.py | 36 +- scripts/qapi/parser.py | 8 ++-- tests/qapi-schema/doc-good.out | 32 +++--- 3 files changed, 53 insertions(+), 23 deletions(-) diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py index 1655682d4c7..2e3ffcbafb7 100644 --- a/docs/sphinx/qapidoc.py +++ b/docs/sphinx/qapidoc.py @@ -26,6 +26,7 @@ import os import re +import textwrap from docutils import nodes from docutils.parsers.rst import Directive, directives @@ -51,6 +52,28 @@ __version__ = "1.0" +def dedent(text: str) -> str: +# Temporary: In service of the new QAPI domain, the QAPI doc parser +# now preserves indents in args/members/features text. QAPIDoc does +# not handle this well, so undo that change here. + +# QAPIDoc is being rewritten and will be replaced soon, +# but this function is here in the interim as transition glue. + +lines = text.splitlines(True) +if len(lines) > 1: +if re.match(r"\s+", lines[0]): +# First line is indented; description started on +# the line after the name. dedent the whole block. +return textwrap.dedent(text) +else: +# Descr started on same line. Dedent line 2+. +return lines[0] + textwrap.dedent("".join(lines[1:])) +else: +# Descr was a single line; dedent entire line. +return textwrap.dedent(text) + + # fmt: off # Function borrowed from pydash, which is under the MIT license def intersperse(iterable, separator): @@ -169,7 +192,7 @@ def _nodes_for_members(self, doc, what, base=None, branches=None): term = self._nodes_for_one_member(section.member) # TODO drop fallbacks when undocumented members are outlawed if section.text: -defn = section.text +defn = dedent(section.text) else: defn = [nodes.Text('Not documented')] @@ -207,7 +230,7 @@ def _nodes_for_enum_values(self, doc): termtext.extend(self._nodes_for_ifcond(section.member.ifcond)) # TODO drop fallbacks when undocumented members are outlawed if section.text: -defn = section.text +defn = dedent(section.text) else: defn = [nodes.Text('Not documented')] @@ -242,7 +265,7 @@ def _nodes_for_features(self, doc): dlnode = nodes.definition_list() for section in doc.features.values(): dlnode += self._make_dlitem( -[nodes.literal('', section.member.name)], section.text) +[nodes.literal('', section.member.name)], dedent(section.text)) seen_item = True if not seen_item: @@ -265,9 +288,12 @@ def _nodes_for_sections(self, doc): continue snode = self._make_section(section.tag) if section.tag and section.tag.startswith('Example'): -snode += self._nodes_for_example(section.text) +snode += self._nodes_for_example(dedent(section.text)) else: -self._parse_text_into_node(section.text, snode) +self._parse_text_into_node( +dedent(section.text) if section.tag else section.text, +snode, +) nodelist.append(snode) return nodelist diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py index 7b13a583ac1..8cdd5334ec6 100644 --- a/scripts/qapi/parser.py +++ b/scripts/qapi/parser.py @@ -448,7 +448,10 @@ def get_doc_indented(self, doc: 'QAPIDoc') -> Optional[str]: indent = must_match(r'\s*', line).end() if not indent: return line -doc.append_line(line[indent:]) + +# Preserve the indent, it's needed for rST formatting. +doc.append_line(line) + prev_line_blank = False while True: self.accept(False) @@ -465,7 +468,8 @@ def get_doc_indented(self, doc: 'QAPIDoc') -> Optional[