On 06/09/2011 03:58 AM, A.T.Hofkamp wrote:
> On 09/06/11 00:20, Sam Denton wrote:
>> I'm wanting to parse some Wikipedia pages. Wikipedia template data looks like this:
>>
>>     {{my template|arg one|arg two|keyword=value}}
>>
>> In a template definition, you can use variable expansion, like this:
>>
>>     {{{1|default for arg one}}}
>>
>> I defined my lexer to grab runs of '{' and '}' and return different tokens depending on the length of the run. My problem is, I'm hitting cases where a template's name is a variable expansion, resulting in:
>>
>>     {{{{{keword}}}|arg one}}
>
> If this is the only way they can be nested, you can use scanner states. That is, define a scanner state 'outside template', which matches {{ only. When encountering {{, switch to an 'inside template' scanner state which matches {{{ only. When encountering }}, switch back to the 'outside template' scanner state.
>
> An alternative solution would be to use a scannerless parser. I am, however, not sure whether these exist for Python.
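For illustration, here is a minimal sketch of the scanner-state idea described above. It uses only the standard `re` module so that it runs standalone; PLY's conditional lexing (the `states` tuple plus `lexer.begin()`) provides the same mechanism. The token names, the `RULES` table, and the `tokenize` function are hypothetical names chosen for this example, and the sketch assumes, as the suggestion does, that a variable expansion only ever appears inside a template call.

```python
import re

# One rule list per scanner state: (token name, pattern, next state or None).
# Rules are tried in order, so longer brace runs come before shorter ones.
RULES = {
    "outside": [
        ("TEMPLATE_OPEN", r"\{\{", "inside"),
        ("TEXT", r"[^{]+|\{", None),          # a lone '{' is plain text
    ],
    "inside": [
        ("VAR_OPEN", r"\{\{\{", None),
        ("VAR_CLOSE", r"\}\}\}", None),
        ("TEMPLATE_CLOSE", r"\}\}", "outside"),
        ("PIPE", r"\|", None),
        ("TEXT", r"[^{}|]+|[{}]", None),      # stray single braces are text
    ],
}

COMPILED = {
    state: [(name, re.compile(pattern), nxt) for name, pattern, nxt in rules]
    for state, rules in RULES.items()
}


def tokenize(text):
    """Split text into (token name, lexeme) pairs using two scanner states."""
    state = "outside"
    pos = 0
    tokens = []
    while pos < len(text):
        for name, regex, next_state in COMPILED[state]:
            match = regex.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                if next_state is not None:
                    state = next_state    # switch scanner state
                break
        else:
            raise ValueError(f"cannot tokenize at position {pos}")
    return tokens


if __name__ == "__main__":
    for token in tokenize("{{{{{keword}}}|arg one}}"):
        print(token)
```

Because the 'outside' state never sees `{{{`, the leading run of five braces is split into a template opener followed by a variable opener. Deeper nesting (for example, a template call inside a variable default) would need a stack of states rather than two fixed ones.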
Also, have you investigated the tools listed on this page?

http://www.mediawiki.org/wiki/Alternative_parsers

There are several Python solutions listed.

-- 
-Brian

Brian Clapper, http://www.clapper.org/bmc/
Weiler's Law: Nothing is impossible for the man who doesn't have to do it
himself.
