Hi, I'm sorry for hijacking an old thread.
When you mention token offsets, are those character offsets in the raw
wiki markup? I.e., do they make it possible to say that a given node in
the parse tree represents the markup from position a to position b?
If so, is this a capability in mwlib or
Hi Joel,
My needs are pretty simple. The basic 'algorithm' of what I want to do is to
identify section headers by name:
    if isinstance(node, Section) and node.name == "External Links":
        finish_node = node
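For anyone who wants to try that check outside of mwlib, here is a minimal sketch of it as a depth-first tree walk. Note that Node and Section below are stand-in classes written for this example, not mwlib's own, and the name attribute and children list are assumptions about how the tree is shaped; walk and find_section are hypothetical helper names.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Stand-in for a parse-tree node (hypothetical, not mwlib's class)."""
    children: List["Node"] = field(default_factory=list)


@dataclass
class Section(Node):
    """Stand-in for a section node; `name` is an assumed attribute."""
    name: str = ""


def walk(node: Node) -> Iterator[Node]:
    """Yield the node itself, then all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_section(root: Node, name: str) -> Optional[Section]:
    """Return the first Section whose name matches exactly, or None."""
    for node in walk(root):
        if isinstance(node, Section) and node.name == name:
            return node
    return None


# Build a tiny tree and look up the section of interest.
root = Node(children=[
    Section(name="History"),
    Section(name="External Links", children=[Node()]),
])
finish_node = find_section(root, "External Links")
```

The comparison is an exact, case-sensitive string match, so a header written as "External links" would not be found without normalizing the names first.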
Then, given the location in the document of a section header with a given
name, I
I have been using mwlib for exactly that since 2008, but I haven't checked
whether my scripts still work with a more recent version of mwlib. (I mostly
use mwlib.refine.compat.parse_text.)
I and others may be able to help you with more detail if you give us some
idea what you would like to get out