On Jun 26, 7:26 pm, "David C. Ullrich" <[EMAIL PROTECTED]> wrote:
> In article <[EMAIL PROTECTED]>,
>  Cédric Lucantis <[EMAIL PROTECTED]> wrote:
>
> > On Thursday 26 June 2008 15:53:06, oyster wrote:
> > > that is, there is no TABLE tag between a TABLE, for example
> > > <table >something with out table tag</table>
> > > what is the RE pattern? thanks
> > >
> > > the following is not right
> > > <table.*?>[^table]*?</table>
> >
> > The construct [abc] does not match a whole word but only one char, so
> > [^table] means "any char which is not t, a, b, l or e".
> >
> > Anyway the inside table word won't match your pattern, as there are '<'
> > and '>' in it, and these chars have to be escaped when used as simple text.
> > So this should work:
> >
> > re.compile(r'<table(|[ ].*)>.*</table>')
> >                    ^ this is to avoid matching a tag name starting with
> >                      table (like <table_ext>)
>
> Doesn't work - for example it matches '<table></table><table></table>'
> (and in fact if the html contains any number of tables it's going
> to match the string starting at the start of the first table and
> ending at the end of the last one.)

Try something like:
re.compile(r'<table\b.*?>.*?</table>', re.DOTALL)
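
For comparison, here is a rough sketch that runs the patterns from this thread against the two-table string mentioned above and against a nested table. The third pattern, INNERMOST, is not from the thread: it is an assumption about what the original question wants (a table with no other TABLE tag inside it), using a negative lookahead so the match can never cross another opening tag. The names GREEDY, LAZY, INNERMOST and matches() are made up for this example.

```python
import re

# Greedy pattern from the quoted message: '.*' runs to the LAST </table>.
GREEDY = re.compile(r'<table(|[ ].*)>.*</table>')

# Non-greedy pattern suggested above: stops at the FIRST </table>.
LAZY = re.compile(r'<table\b.*?>.*?</table>', re.DOTALL)

# Assumption (not from the thread): the body may not step over another
# opening <table ...> tag, so only innermost tables can match.
INNERMOST = re.compile(
    r'<table\b[^>]*>(?:(?!<table\b).)*?</table>',
    re.DOTALL | re.IGNORECASE,
)

two_tables = '<table></table><table></table>'
nested = '<table>a<table>b</table>c</table>'


def matches(pattern, text):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in pattern.finditer(text)]


for name, pattern in (('GREEDY', GREEDY), ('LAZY', LAZY),
                      ('INNERMOST', INNERMOST)):
    print(name, matches(pattern, two_tables), matches(pattern, nested))
```

On the nested string the non-greedy pattern still returns '<table>a<table>b</table>', which contains an inner TABLE tag, while the lookahead variant returns only '<table>b</table>'. None of these is a substitute for a real HTML parser if the input is not well behaved.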