On 5 Feb, 07:09, [EMAIL PROTECTED] wrote:
> So, I'm parsing a log file that's being written out in
> real time.
>
> <logfile>
> <entry><timestamp>123</timestamp><details>foo</details>
> </entry>
> <entry><timestamp>456</timestamp><details>bar</details>
> </entry>
> <--- no </logfile>, coz the file hasn't yet been closed

This kind of "incomplete" XML (or perhaps ill-formed would be the better term) is reminiscent of XMPP [1,2], where you have a connection which is opened with a start tag and closed with an end tag.

> This is part of an event loop, so I want to have some code
> that looks like this:
>
> when logfile is readable:
>   read one <entry> node, including children
>   but don't try to read past </entry>, so that
>   the read won't block.

I attempt to do this with the XMPP support in libxml2dom [3], although I can't say that the work is exactly complete by any means. Generally, I assume that each "stanza" (similar to an entry here, I think) is complete and can be read, although the technique I use is dubious: I treat each one like a separate document.

I imagine that the designers of XMPP intended that you connect up an event-driven parser to the incoming stream and connect the event handlers to various pieces of logic, with the initial start tag causing a client to become active and the final end tag causing the client to sleep.

Paul

[1] http://www.xmpp.org/rfcs/rfc3920.html
[2] http://www.xmpp.org/rfcs/rfc3921.html
[3] http://www.python.org/pypi/libxml2dom
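
P.S. A rough sketch of the event-driven approach described above, for the log file case. This is not how libxml2dom does it; it assumes the standard library's xml.etree.ElementTree.XMLPullParser (Python 3.4 or later) is acceptable, and the helper name iter_complete_entries and the sample chunks are made up for illustration. The parser is fed whatever text happens to be readable and only reports an <entry> once its end tag has arrived, so nothing waits for </logfile>.

```python
import xml.etree.ElementTree as ET


def iter_complete_entries(parser, chunk):
    """Feed one chunk of text and yield (timestamp, details) for each
    <entry> whose end tag has now been seen.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    parser.feed(chunk)
    # With no explicit event list, XMLPullParser reports only "end" events.
    for _event, elem in parser.read_events():
        if elem.tag == "entry":
            yield elem.findtext("timestamp"), elem.findtext("details")
            elem.clear()  # drop the entry's children once it has been handled


# Simulated reads from the growing log file; the chunk boundaries are
# deliberately placed in the middle of tags and entries.
chunks = [
    "<logfile>\n<entry><timestamp>123</timestamp><det",
    "ails>foo</details>\n</entry>\n<entry><timestamp>456</timestamp>",
    "<details>bar</details>\n</entry>\n",
]

parser = ET.XMLPullParser()
for chunk in chunks:
    for timestamp, details in iter_complete_entries(parser, chunk):
        print(timestamp, details)
```

In an event loop, the body of the outer loop would run each time the file is reported readable, passing in whatever a non-blocking read returned. The parser's close() method is never called here, so the missing </logfile> is never treated as an error.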