In message <bd818c7b0901182009w5063efbbofd2afc1841e47...@mail.gmail.com>,
no spam writes:
>I'm using this pattern:
>p5c.compile(".*?<td\\s+.+<span\\s+class=\"nametext\">"+
>".+?<strong>(.+?)</strong></font>.+?Profile\\s+Views",
>Perl5Compiler.SINGLELINE_MASK);
>
>to try and pull genres out of myspace pages.  However some pages like this
...
>How can I prevent these loops?

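Presumably you care only about the capture group (the genre), so
rewrite the expression along the following lines. A negated character
class such as [^<]* cannot run past the next tag, so the matcher has
far fewer positions to retry than with the stacked .+ and .+?
wildcards, which avoids the ambiguous/excessive backtracking: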

p5c.compile("<span\\s+class=\"nametext\">[^<]*</span><br>[^<]*<font[^>]*>"+
            "<strong>([^<]+)</strong></font>",
            Perl5Compiler.SINGLELINE_MASK);
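
For anyone who wants to try the expression without ORO on the
classpath, here is a minimal sketch using java.util.regex instead
(a swapped-in engine, not the ORO API; the pattern syntax is the same
for this expression). The class name GenreExtractor, the method
extractGenre and the sample HTML are hypothetical, made up for
illustration, since the real page markup is not shown in this thread.
With ORO, the analogous step would be Perl5Matcher.contains() followed
by getMatch().group(1).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ORO or of the original poster's code.
final class GenreExtractor {

    // Same expression as above. No "." appears in it, so a
    // single-line/DOTALL flag would make no difference here.
    private static final Pattern GENRE = Pattern.compile(
        "<span\\s+class=\"nametext\">[^<]*</span><br>[^<]*<font[^>]*>"
        + "<strong>([^<]+)</strong></font>");

    // Returns the text of the first capture group, or null if the
    // fragment is not found anywhere in the input.
    static String extractGenre(String html) {
        Matcher m = GENRE.matcher(html);
        // find() searches for the pattern inside the input rather than
        // requiring the whole input to match.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed markup, for illustration only.
        String html = "<td><span class=\"nametext\">Some Band</span><br>\n"
            + "<font size=\"1\"><strong>Rock / Indie</strong></font></td>";
        System.out.println(extractGenre(html));
    }
}
```

Because every quantifier is bounded by the next "<" or ">", a page
that lacks the genre block should fail quickly instead of looping.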


