Joaquin,

Thanks for your reply.
Regarding the data-parsoid route, I can't reproduce the trouble I was having. I suspect I was just getting the /revision/tid part wrong.

Taking a step back, I think part of the problem was that I apparently had an incorrect mental model of how Parsoid works. I was envisioning something that took wikitext, parsed it into a semantic parse tree (kind of like mwparserfromhell does), and then converted that parse tree to HTML. What I was trying to get at was the intermediate parse tree. Looking at https://www.mediawiki.org/wiki/Parsoid/API, this appeared to be the pagebundle format, and I was groping around trying to find the API which exposed it. I looked at the /html routes and thought to myself, "No, that's not what I want. That's the HTML. I want the parse tree." So I was trying things like GET /:domain/v3/page/:format/:title/:revision? with :format set to "pagebundle". For example, I tried

    https://en.wikipedia.org/v3/page/pagebundle/banana

which 404's.

I think the biggest thing that could be done to improve the documentation is to update https://www.mediawiki.org/wiki/Parsoid/API. That's the page you get to most directly when searching for Parsoid documentation.

> On Sep 7, 2020, at 6:05 AM, Joaquin Oltra Hernandez <jhernan...@wikimedia.org> wrote:
>
> Hi Roy,
>
> Some responses inline:
>
> On Fri, Sep 4, 2020 at 6:41 PM Roy Smith <r...@panix.com> wrote:
>
>> I know there's been a ton of work done on Parsoid lately. This is great, and the amount of effort that's gone into this functionality is really appreciated. It's clear that Parsoid is the way of the future, but the documentation of how you get a Parsoid parse tree via an API call is kind of confusing.
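(A side note on the 404 I mentioned above, in case it helps whoever updates the docs: as far as I can tell, those /v3 routes are served by the standalone Parsoid service itself rather than by the wiki's own domain, which would explain why hitting en.wikipedia.org directly fails. Here's a sketch of the two URL shapes as I now understand them; the helper names and the service host are made up, and the "standalone service" reading is my assumption, not something the docs state.)

```javascript
// Sketch of the two URL shapes discussed in this thread.
// The /v3 pattern comes from the Parsoid/API page:
//   GET /:domain/v3/page/:format/:title/:revision?
// My assumption: it is served by the Parsoid service (serviceHost here
// is a placeholder), not by the wiki's main domain.
function v3PageUrl(serviceHost, domain, format, title, revision) {
  const path = [domain, 'v3', 'page', format, encodeURIComponent(title)];
  if (revision !== undefined) path.push(revision); // :revision? is optional
  return `https://${serviceHost}/${path.join('/')}`;
}

// The REST v1 route that does respond on en.wikipedia.org:
function restV1HtmlUrl(domain, title) {
  return `https://${domain}/api/rest_v1/page/html/${encodeURIComponent(title)}`;
}
```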
>> I found https://www.mediawiki.org/wiki/Parsoid/API, which looks like it's long out of date. The last edit was almost 2 years ago. As far as I can tell, most of what it says is obsolete, and refers to a series of /v3 routes which don't actually exist.
>
> This definitely looks outdated, I'll forward your email to the maintainers so maybe they can have a look and update it.
>
>> I also found https://en.wikipedia.org/api/rest_v1/#/Page%20content, which seems more in line with the current reality. But the call I was most interested in, /page/data-parsoid/{title}/{revision}/{tid}, doesn't actually respond (at least not on en.wikipedia.org).
>
> Maybe you can share exactly how you are querying the API and the responses you get, since this does seem to work fine for me (examples below). I think these APIs are the ones VisualEditor uses, so they should work appropriately.
>
> I tried querying https://en.wikipedia.org/api/rest_v1/page/html/Banana first, and got back the response. On it, you can get the revision and "tid" from the ETag header, like it says in the swagger docs:
>
>   ETag header indicating the revision and render timeuuid separated by a slash: "701384379/154d7bca-c264-11e5-8c2f-1b51b33b59fc". This ETag can be passed to the HTML save end point (as base_etag POST parameter), and can also be used to retrieve the exact corresponding data-parsoid metadata, by requesting the specific revision and tid indicated by the ETag.
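For my own notes, here's roughly how pulling the revision and tid out of that ETag looks in browser JavaScript. parseEtag is just a name I made up; the fetch line assumes the REST v1 route from your example.

```javascript
// Split a REST v1 ETag like "701384379/154d7bca-c264-11e5-8c2f-1b51b33b59fc"
// (possibly wrapped in quotes and a W/ weak-validator prefix) into its
// revision and tid parts. parseEtag is a hypothetical helper name.
function parseEtag(etag) {
  const bare = etag.replace(/^W\//, '').replace(/^"|"$/g, '');
  const slash = bare.indexOf('/');
  return {
    revision: bare.slice(0, slash),
    tid: bare.slice(slash + 1),
  };
}

// Browser usage sketch (not run here):
// const resp = await fetch('https://en.wikipedia.org/api/rest_v1/page/html/Banana');
// const { revision, tid } = parseEtag(resp.headers.get('etag'));
```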
> With that information, you can then compose the new API call URL:
>
> https://en.wikipedia.org/api/rest_v1/page/data-parsoid/Banana/975959204/7e3fb2f0-eb7b-11ea-bedb-95397ed6461a
>
> That should successfully respond with the metadata.
>
> I'm not 100% clear on the difference between the data-mw information on the /page/html response vs the one found on the /page/data-parsoid response, but anyhow you should be able to use both endpoints as needed that way.
>
>> Eventually, I discovered (see this thread <https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(technical)&oldid=976731421#Parsing_wikitext_in_javascript?>) that the way to get a Parsoid parse tree is via the https://en.wikipedia.org/api/rest_v1/page/html/ route, and digging the embedded JSON out of data-mw fragments scattered throughout the HTML. This seems counter-intuitive. And kind of awkward, since it's not even a full parse tree; it's just little snippets of parse trees, which I guess correspond to each template expansion?
>
> I looked around and found https://www.mediawiki.org/wiki/Specs/HTML/2.1.0 linked on the Parsoid page, which has extensive documentation on how wikitext <-> HTML is translated. It seems to be more actively maintained. Hopefully this can give you some insight into how the responses relate to the wikitext and how to find what you want.
>
>> So, taking a step backwards, my ultimate goal is to be able to parse the wikitext of a page and discover the template calls, with their arguments. On the server side, I'm doing this in Python with mwparserfromhell, which is fine. But now I need to do it on the client side, in browser-executed javascript.
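For what it's worth, here's roughly what digging template calls out of those data-mw fragments looks like, going by my reading of the template-markup section of the Specs/HTML page (each transclusion's data-mw has a parts array mixing plain strings with template objects carrying a target and params). The helper name and the browser snippet at the end are mine, not from the spec.

```javascript
// Extract template calls from a parsed data-mw object, per the
// Specs/HTML template-markup format: dataMw.parts is an array mixing
// plain strings and { template: { target, params } } objects.
// listTemplateCalls is a hypothetical helper name.
function listTemplateCalls(dataMw) {
  const calls = [];
  for (const part of dataMw.parts || []) {
    if (part && typeof part === 'object' && part.template) {
      const t = part.template;
      const params = {};
      for (const [name, val] of Object.entries(t.params || {})) {
        params[name] = val.wt; // each param value wraps its wikitext in .wt
      }
      calls.push({ name: t.target.wt, params });
    }
  }
  return calls;
}

// Browser usage sketch (not run here): scan the Parsoid HTML for
// elements marked typeof="mw:Transclusion" and parse their data-mw JSON.
// document.querySelectorAll('[typeof~="mw:Transclusion"]').forEach(el => {
//   console.log(listTemplateCalls(JSON.parse(el.getAttribute('data-mw'))));
// });
```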
>> I've looked at a few client-side libraries, but if Parsoid really is ready for prime time, it seems silly not to use it, and it's just a question of finding the right API calls.
>
> You may be interested in the #Template_markup section <https://www.mediawiki.org/wiki/Specs/HTML/2.1.0#Template_markup> from the previous spec, given your problem statement.
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud