Hi Joel,

My needs are pretty simple. The basic 'algorithm' of what I want to do is
identify section headers by their names:

    if isinstance(node, Section) and node.name == "External Links":
        finish_node = node

Then, given the location in the document of a section header with a given
name, I want to take all the data in the document up to that point as plain
text. So, for:

"""
{{About|the rock band|their debut album|Foo Fighters (album)|the aerial phenomenon|foo fighter}}
{{pp-move-indef}}
{{pp-semi|small=yes}}
{{Infobox musical artist
| name = Foo Fighters
| image = Foo Fighters 2007.jpg
| [...]}}
'''Foo Fighters''' is an<!--Awards don't belong here--> American [[alternative rock]] band
"""

It becomes:

"Foo Fighters is an American alternative rock band"

I tried using uparser.simple_parse, but the results of the Article.asText()
calls were very disappointing. mwlib.refine.compat.parse_text seems to give
much better results, but the infobox and other templates are still stuck in
the text.

And of course, my pseudocode is wrong: I still need to figure out how to
identify Sections with a certain name, and then collect the nodes between
the head of the document and that node.

All help is greatly appreciated, thanks,
-Travis

On 25 April 2012 19:43, Joel Nothman <jnoth...@student.usyd.edu.au> wrote:
>
> I have been using mwlib for exactly that since 2008, but I haven't checked
> if my scripts work with a more recent version of mwlib. (I mostly use
> mwlib.refine.compat.parse_text.)
>
> I and others may be able to help you with more detail if you give us some
> idea what you would like to get out of the parse. For instance I needed
> standard structured Wikipedia features (category links, template
> information, etc.) as well as tokenised sentences with outgoing links as
> standoff annotations.
>
> - Joel
>
>
> On Thu, 26 Apr 2012 02:35:44 +1000, Travis Briggs <tra...@echonest.com>
> wrote:
>
>> Hello,
>>
>> Is there a way to get an abstract syntax tree from wikitext input
>> using mwlib? The documentation seems to only cover creating PDF or
>> some other documents.
>>
>> Thanks,
>> -Travis
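
For reference, the traversal described at the top of this message (walk the tree in document order, stop at the section with a given name, and keep the plain text seen so far) might look roughly like the sketch below. It deliberately does not use mwlib: the Node, Text and Section classes, the title attribute, and the walk and text_before_section helpers are all hypothetical stand-ins, so the real node classes and the way a section exposes its heading would need to be checked against mwlib itself. It also says nothing about stripping templates such as the infobox.

```python
# Hypothetical stand-in tree, NOT mwlib's node classes.
class Node(object):
    """A parse-tree node holding optional text and child nodes."""
    def __init__(self, children=None, text=""):
        self.children = list(children or [])
        self.text = text


class Text(Node):
    """A leaf node carrying plain text."""


class Section(Node):
    """A section whose heading is stored in a 'title' attribute (assumption)."""
    def __init__(self, title, children=None):
        Node.__init__(self, children)
        self.title = title


def walk(node):
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        for descendant in walk(child):
            yield descendant


def text_before_section(root, title):
    """Join the text of every Text node that precedes the named section."""
    parts = []
    for node in walk(root):
        if isinstance(node, Section) and node.title == title:
            break  # reached the finishing section, stop collecting
        if isinstance(node, Text):
            parts.append(node.text)
    # Collapse runs of whitespace left over from joining fragments.
    return " ".join("".join(parts).split())


doc = Node([
    Text(text="Foo Fighters is an American "),
    Text(text="alternative rock band"),
    Section("History", [Text(text=" Some history text.")]),
    Section("External Links", [Text(text=" Official site")]),
])

print(text_before_section(doc, "External Links"))
```

Because the walk is a single pre-order pass, the break also skips everything after the matching section, including any later sections nested elsewhere in the tree. If no section has the requested name, the sketch simply returns the text of the whole document.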