Hi,

I have a regex that I use to parse HTML files marked up with HTML comments. Performance is fine with a few marked-up sections, but it degrades sharply as the number of matched sections in the page increases.
For example:

<html> <body>
<!-- name="news" -->
The news goes here. Add the template in /_include/templates.html. Could we use miniburst.gif for the icon?
<!-- name="/news" -->
</body> </html>

The key point is that the string "news" is paired across the opening and closing HTML comments. I have found a performance dip when parsing a document with four of these sections in sequence:

<!-- name="A" -->
--- some HTML ---
<!-- name="/A" -->
<!-- name="B" -->
--- some HTML ---
<!-- name="/B" -->
<!-- name="C" -->
--- some HTML ---
<!-- name="/C" -->
<!-- name="D" -->
--- some HTML ---
<!-- name="/D" -->

My pattern can parse three such pairs in 12 seconds, but when I moved to four pairs it took about 3300 seconds. I have an alternative approach that I will use instead, but I'm interested to know whether there is a problem with my expression (which otherwise works). This is my pattern (for the curious):

    private static final String HTML_PATTERN =
        /*
         * <!-- name="news" -->
         *   or
         * <!-- name="news" template="news-tmpl" -->
         *
         * (Remember to escape backslash
         *
         *   \n -> \\n
         *   \w -> \\w
         *
         *   etc)
         *
         */
        "<!--\\s*name\\s*=\\s*\"([\\w\\-]+)\"\\s*(template\\s*=\\s*\"[\\w\\-]+\")?\\s*-->" +
        /*
         * enclosed content
         */
        "((\\s|.)*)" +
        /*
         * <!-- name="/news" -->
         */
        "<!--\\s*name\\s*=\\s*\"/\\1\"\\s*-->"
        ;

Many thanks,
Janek Bogucki
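A likely culprit is the enclosed-content group `((\\s|.)*)`. Spaces and tabs are matched by both `\s` and `.`, so when the engine backtracks it can retry each such character through the other branch, and the number of paths it explores roughly doubles with every whitespace character it has to back over. Because the `*` is greedy, it first runs to the end of the document and then backs up toward the closing marker, so more sections after that marker means more to back over. In Java, a repeated alternation group like this can also overflow the stack on long inputs. The sketch below is a suggested fix, not the alternative approach mentioned above (which isn't described): it keeps the opening and closing markers unchanged and replaces the content group with a reluctant, single-branch `(.*?)` compiled with `Pattern.DOTALL`, so each character can be consumed only one way. The class name `SectionParser` and the sample input are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionParser {
    // Hypothetical rewrite of HTML_PATTERN. The opening and closing markers are
    // unchanged; only the enclosed-content group differs.
    static final Pattern SECTION = Pattern.compile(
        "<!--\\s*name\\s*=\\s*\"([\\w\\-]+)\"\\s*(template\\s*=\\s*\"[\\w\\-]+\")?\\s*-->"
        // Reluctant content group with a single branch. DOTALL lets '.' match
        // newlines too, so \s is no longer needed as an alternative.
        + "(.*?)"
        // Closing marker: the backreference \1 pairs it with the opening name.
        + "<!--\\s*name\\s*=\\s*\"/\\1\"\\s*-->",
        Pattern.DOTALL);

    public static void main(String[] args) {
        // Build a document with four sections in sequence, as in the question.
        StringBuilder html = new StringBuilder("<html> <body>\n");
        for (String name : new String[] {"A", "B", "C", "D"}) {
            html.append("<!-- name=\"").append(name).append("\" -->\n")
                .append("  --- some HTML ---  \n")
                .append("<!-- name=\"/").append(name).append("\" -->\n");
        }
        html.append("</body> </html>\n");

        Matcher m = SECTION.matcher(html);
        while (m.find()) {
            // group(1) = section name, group(2) = optional template attribute,
            // group(3) = enclosed content
            System.out.println(m.group(1) + ": " + m.group(3).trim());
        }
    }
}
```

Running `main` prints one line per section, for example `A: --- some HTML ---`. The content is now group 3 rather than group 3 plus an extra inner group 4. One behavioural difference: the reluctant `*?` stops at the first matching closing comment, whereas the greedy version stops at the last one. With unique section names these give the same result.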