On Sat, Oct 28, 2023 at 05:42:50PM +0100, Gavin Smith wrote:
> I managed to disable a lot of the new XS code and get the test suite
> to pass.  I had to leave the XS translation module active due to the
> coupling that now exists between it and the XS parser.
Also, I doubt that any slowdown could come from doing in C the code that
was previously done in Perl in Parsetexi.pm.  To me, having this code in
XS is both more logical and (probably) faster.

> As you can see, my attempt at disabling the new modules reverses most
> of, but not all, of the slowdown.

I think that you can also comment out rebuild_document when none of the
XS is overriding the Perl code, but I have not tested it.

One thing that comes to mind is that I removed simple_parser:
https://git.savannah.gnu.org/cgit/texinfo.git/commit/?id=4a3d02c0fc1932350d925fb957e0758a5290436c
It could explain some of the increase in the time used by gdt.

> I'm still trying to find causes for the remaining slowdown.  I profiled
> with NYTProf and think that build_document is one possibility, as it
> may do more than build_texinfo_tree did.

I do not think so.  The only additional thing it does (storing
identifiers_target) corresponds to the fact that
set_labels_identifiers_target is now done in C instead of in Perl code
as it was previously, but I doubt that it requires much more time.
However, even if it does not do more, doing it twice could be a reason
for the slowdown if the time spent in build_texinfo_tree and in the
other passing of parser results to Perl code is significant.

> For the glibc manual, it is called 2412 times (at least once per
> parser object).  As you know, there is a new parser for every @def*
> command in the Texinfo sources, so per-parser overhead can be
> significant.

I do not get it.  If you are speaking about the translation happening
in complete_indices, calls of gdt_tree -> gdt ->
replace_convert_substrings do not require a new parser; the current
parser is reused.  There is still a parsing and a storing of a document
that is later on removed, plus substitutions in the tree, but this
should still be faster than the same code in Perl.
> I see there are also changes to index sorting, but haven't
> investigated them enough to understand if this would have a
> performance impact.

Hopefully this should have a positive impact, by caching some regexp
results.

> It was important to be able to disable these new modules in order to
> see this remaining slowdown.  I still argue for making it easy to
> cleanly disable these new modules unless or until they do not slow
> down the program as much.

It seems like this could be relatively easy: add a variable that is
tested when loading the XS code, and that's it, unless I missed
something.

> If the promised benefits of the new development never materialised,
> it would mean that the post-7.1 development of texi2any was not worth
> pursuing.

I would be very surprised if there was no speedup of the HTML
converter.  Right now it is very slow; with the main loop in C it
should be much faster.

> This is from my perspective of somebody who is not familiar with the
> new code and doesn't understand how it all works.  I've spent hours
> trying to work this out over the last few days because I view it as a
> threat to the future development of the program.

The slowdown is not that big.  That being said, I agree that it would
be nice to understand why it is slower with XS for
structure/transformations than with Perl.

> If the Perl object for the parse tree is built twice, this is a
> definite problem, and something that needs to be remedied before the
> new XS code can be considered to be in a finished state.

To me it is not in a finished state until the HTML converter main loop
is fully in C when there is no user customization.

-- 
Pat
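P.S.  A minimal sketch of the kind of disable switch I have in mind:
test an environment variable before attempting to load the XS module,
and fall back to the pure-Perl implementation otherwise.  The variable
name TEXINFO_XS_DISABLE and the module names below are hypothetical,
for illustration only, not the actual texi2any implementation.

```perl
use strict;
use warnings;

# Sketch only: decide which parser backend to use.  If the user sets
# TEXINFO_XS_DISABLE, skip the XS module entirely; otherwise try to
# load it and fall back to pure Perl if loading fails.
sub parser_backend {
    return 'perl' if $ENV{TEXINFO_XS_DISABLE};
    # Hypothetical XS module name; eval catches a failed require.
    my $have_xs = eval { require Texinfo::XS::Parsetexi; 1 };
    return $have_xs ? 'xs' : 'perl';
}

print parser_backend(), "\n";
```

The point is that a single early test gates all XS loading, so the
pure-Perl code path stays available for timing comparisons like the
ones in this thread.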
