I know there's been a ton of work done on Parsoid lately.  This is great, and 
the amount of effort that's gone into this functionality is really appreciated. 
 It's clear that Parsoid is the way of the future, but the documentation of how 
you get a Parsoid parse tree via an API call is kind of confusing.

I found https://www.mediawiki.org/wiki/Parsoid/API, which looks like it's long 
out of date.  The last edit was almost 2 years ago.  As far as I can tell, most 
of what it says is obsolete, and it refers to a series of /v3 routes which 
don't actually exist.

I also found https://en.wikipedia.org/api/rest_v1/#/Page%20content, which seems 
more in line with the current reality.  But the call I was most interested in, 
/page/data-parsoid/{title}/{revision}/{tid}, doesn't actually respond (at least 
not on en.wikipedia.org).

Eventually, I discovered (see this thread: 
https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(technical)&oldid=976731421#Parsing_wikitext_in_javascript?) 
that the way to get a Parsoid parse tree is to fetch the 
https://en.wikipedia.org/api/rest_v1/page/html/ route and dig the embedded JSON 
out of the data-mw attributes scattered throughout the HTML.  This seems 
counter-intuitive, and kind of awkward, since it's not even a full parse tree; 
it's just little snippets of parse trees, which I guess correspond to each 
template expansion?
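
To make the question concrete, here is a rough sketch of what I mean by digging 
the JSON out.  It assumes (based on my reading of the Parsoid output, not on any 
official example) that each transclusion is marked with 
typeof="mw:Transclusion" and carries a data-mw attribute shaped like 
{"parts": [{"template": {"target": {"wt": ...}, "params": {...}}}]}.  The 
function names are my own, and fetchTemplateCalls only works in a browser, since 
it relies on fetch and DOMParser.

```javascript
// Turn one data-mw JSON string into a list of { name, args } template calls.
// Assumed shape: {"parts": [{"template": {"target": {"wt": ...},
//                                         "params": {key: {"wt": ...}}}}, ...]}
function templatesFromDataMw(dataMwJson) {
  const dataMw = JSON.parse(dataMwJson);
  const calls = [];
  for (const part of dataMw.parts || []) {
    // Parts can also be plain strings (wikitext between templates); skip those.
    if (typeof part !== "object" || part === null || !part.template) continue;
    const { target, params } = part.template;
    const args = {};
    for (const [key, value] of Object.entries(params || {})) {
      args[key] = value.wt; // raw wikitext of the argument
    }
    calls.push({ name: target.wt.trim(), args });
  }
  return calls;
}

// Collect template calls from anything that supports querySelectorAll
// (a Document or an Element).
function extractTemplateCalls(root) {
  const calls = [];
  const nodes = root.querySelectorAll('[typeof~="mw:Transclusion"][data-mw]');
  for (const node of nodes) {
    calls.push(...templatesFromDataMw(node.getAttribute("data-mw")));
  }
  return calls;
}

// Browser-only: fetch the Parsoid HTML for a page and extract its template calls.
async function fetchTemplateCalls(title) {
  const url =
    "https://en.wikipedia.org/api/rest_v1/page/html/" + encodeURIComponent(title);
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error("Request failed with HTTP status " + response.status);
  }
  const doc = new DOMParser().parseFromString(await response.text(), "text/html");
  return extractTemplateCalls(doc);
}
```

Calling fetchTemplateCalls("Some page title") from the browser console should 
then resolve to an array of objects with a template name and its arguments as 
raw wikitext, if the assumptions above hold.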

So, taking a step back, my ultimate goal is to be able to parse the wikitext of 
a page and discover the template calls, with their arguments.  On the server 
side, I'm doing this in Python with mwparserfromhell, which is fine.  But now I 
need to do it on the client side, in browser-executed JavaScript.  I've looked 
at a few client-side libraries, but if Parsoid really is ready for prime time, 
it seems silly not to use it, and it's just a question of finding the right API 
calls.



_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
