On Jun 26, 7:26 pm, "David C. Ullrich" <[EMAIL PROTECTED]> wrote:
> In article <[EMAIL PROTECTED]>,
>  Cédric Lucantis <[EMAIL PROTECTED]> wrote:
>
> > On Thursday 26 June 2008 15:53:06, oyster wrote:
> > > That is, a TABLE that contains no other TABLE tag inside it, for example
> > > <table >something without table tag</table>
> > > What is the RE pattern? Thanks.
>
> > > The following is not right:
> > > <table.*?>[^table]*?</table>
>
> > The construct [abc] does not match a whole word but only one char, so  
> > [^table] means "any char which is not t, a, b, l or e".
>
> > Anyway, the inner "table" word won't match your pattern, as it has '<'
> > and '>' around it, and these chars have to be escaped when used as plain text.
> > So this should work:
>
> > re.compile(r'<table(|[ ].*)>.*</table>')
> >                     ^ this is to avoid matching a tag name starting with
> >                     table (like <table_ext>)
>
> Doesn't work - for example it matches '<table></table><table></table>'
> (and in fact if the html contains any number of tables it's going
> to match the string starting at the start of the first table and
> ending at the end of the last one.)
>
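A quick check with Python's `re` module reproduces both problems quoted above (the sample strings are made up for illustration): `[^table]` is a one-character class, so it rejects any content containing the letters t, a, b, l or e, and the greedy `.*` runs on to the last `</table>` in the string.

```python
import re

# The original attempt: [^table] is a character class, i.e. ONE character
# that is not t, a, b, l or e. It cannot exclude the word "table".
op_pattern = re.compile(r'<table.*?>[^table]*?</table>')

# The quoted suggestion: the greedy .* runs to the LAST </table> in the string.
greedy_pattern = re.compile(r'<table(|[ ].*)>.*</table>')

# Content containing any of the letters t, a, b, l, e is rejected...
print(op_pattern.search('<table>something</table>'))  # None
# ...while content made only of other letters matches.
print(op_pattern.search('<table>xyz</table>').group(0))  # <table>xyz</table>

two_tables = '<table></table><table></table>'
# One match spanning both tables instead of two separate matches.
print(greedy_pattern.search(two_tables).group(0))  # <table></table><table></table>
```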
Try something like:

import re
re.compile(r'<table\b.*?>.*?</table>', re.DOTALL)
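A small check of the pattern above, using made-up sample strings. The lazy `.*?` stops at the first `</table>`, which fixes the greedy-match problem. On a nested table, though, the match still contains an inner `<table>` tag, so it doesn't fully answer the original question. The `no_nested` pattern is a hypothetical variant, not something proposed in this thread: a negative lookahead refuses to let the body run over another `<table` tag, so only tables with no nested table can match. Treat it as a sketch.

```python
import re

# The suggested pattern: lazy quantifiers stop at the FIRST </table>.
lazy = re.compile(r'<table\b.*?>.*?</table>', re.DOTALL)

two_tables = '<table></table><table></table>'
print(lazy.findall(two_tables))  # ['<table></table>', '<table></table>']

# Caveat: with a nested table the match still contains a <table> tag.
nested = '<table><table>x</table></table>'
print(lazy.search(nested).group(0))  # <table><table>x</table>

# Hypothetical variant: (?!<table\b) rejects any body position that starts
# another <table tag. [^>]* (instead of .*?) keeps the opening tag from
# stretching past its own '>' over a nested <table> when the engine backtracks.
no_nested = re.compile(r'<table\b[^>]*>(?:(?!<table\b).)*?</table>', re.DOTALL)
print(no_nested.findall(nested))  # ['<table>x</table>']

# Attributes and newlines inside the table are fine (re.DOTALL).
html = '<table class="a">\n<tr><td>1</td></tr>\n</table>'
print(no_nested.findall(html))
```

The `\b` after `table` keeps tags like `<table_ext>` from matching, since `_` is a word character. For anything beyond quick scraping, an HTML parser (for example the standard library's `html.parser`) is usually more robust than a regex.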
--
http://mail.python.org/mailman/listinfo/python-list