Hi,
  I am seeing some odd regex behavior.

Using the demo applet:
http://jakarta.apache.org/oro/demo.html

I try the following pattern:
<(script|object|applet|style|noscript)[^>]*>[\s\S]*?</\1[^>]*>
or an alternate version (with the single-line flag):
<(script|object|applet|style|noscript)[^>]*>.*?</\1[^>]*>

With the following test input:
   <td height="35" colspan="2" align="center"
class="style1">
    
<script type="text/javascript">
        function spawn(fileName,width,height) {
window.open(fileName,'new','toolbar=0,location=0,directories=0,status=0,menubar=0,scrollbars=0,width='+width+',height='+height+',resizable=0');
}
</script>
<style type="text/css">
        .Copyright { font-size: 10px; font-family: Verdana,
Arial; color: #FFF; padding:2px; margin:0px;
vertical-align:1px; line-height:11px; }
        .Copyright A { color: #FFF; }
</style>
<span class="Copyright">&copy; 2006 <a
href="http://www.domain.com/"; target="_blank">Vantage
Media Corporation</a> - <a
href="JavaScript:spawn('http://www.domain.com/privacy.html','770','501');">Privacy
Statement</a> - <a
href="JavaScript:spawn('http://www.domain.com/feedback/?data=aHR0cDovL2NvbGxlZ2UudXMuY29tL2NlYy9mdXR1cmVkZWdyZWUvZGVzaWduLnBocA','460','520');">Send
Us Feedback</a></span>    </td>
    <td valign="top">&nbsp;</td>
  </tr>
</table>
=================================================

In the applet, the first pattern matches twice. (The second pattern
does not match there, since the applet does not apply the single-line
flag.)
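
For reference, a minimal sketch of why the two patterns should be interchangeable once the single-line flag is set. It uses java.util.regex as a stand-in, on the assumption that Pattern.DOTALL plays the same role as ORO's single-line mask; the class name and the short input are made up for illustration and are not the sample text above.

```java
import java.util.regex.Pattern;

public class DotAllCheck {
    // Hypothetical three-line input; not the sample text from this post.
    static final String INPUT = "<style>\n.a { color: red; }\n</style>";

    // Same shape as the two patterns above, reduced to a single tag name.
    static final String DOT = "<(style)[^>]*>.*?</\\1[^>]*>";
    static final String CLS = "<(style)[^>]*>[\\s\\S]*?</\\1[^>]*>";

    static boolean found(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        // Without the flag, '.' does not match '\n', so the tag body is never crossed.
        System.out.println(found(DOT, 0));
        // With DOTALL, '.' matches '\n' as well.
        System.out.println(found(DOT, Pattern.DOTALL));
        // [\s\S] matches any character, including '\n', with no flag at all.
        System.out.println(found(CLS, 0));
    }
}
```

Under these assumptions the three lines printed are false, true, true, which is consistent with the applet result described above.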

But the following code:
                        
Perl5Compiler s_perlCompiler = new Perl5Compiler();
m_matcher = new Perl5Matcher();
m_matcher.setMultiline(false);

Pattern m_forbiddenTagsWithContentPattern =
    s_perlCompiler.compile(
        "<(script|object|applet|style|noscript)[^>]*>[\\s\\S]*?</\1[^>]*>",
        Perl5Compiler.CASE_INSENSITIVE_MASK
            | Perl5Compiler.READ_ONLY_MASK);

// remove content and tags that include script/applet/object etc
StringSubstitution substitution1 = new StringSubstitution(SPACE);
filteredStr =
    Util.substitute(m_matcher,
                    m_forbiddenTagsWithContentPattern,
                    substitution1,
                    text,
                    Util.SUBSTITUTE_ALL);
// text is set as the above sample text.

The substitution does nothing. I even tried:
PatternMatcherInput input = new PatternMatcherInput(text);
while (m_matcher.contains(input, pattern)) {
    System.out.println("In manual strip method - Found match btw:"
        + input.getMatchBeginOffset() + "," + input.getMatchEndOffset() + ":"
        + input.substring(input.getMatchBeginOffset(),
                          input.getMatchEndOffset()));
}

And the above logs nothing.

I tried compiling the pattern with
Perl5Compiler.SINGLELINE_MASK as well, but that made no difference.
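
One difference between the applet and the Java source that may be worth ruling out: in the string literal passed to compile(), \s and \S are written with doubled backslashes, but the backreference is written as a single \1. In a Java string literal, "\1" is an octal escape for the character U+0001, so the regex compiler would receive that control character rather than a backreference to group 1. The sketch below illustrates the effect with java.util.regex as a stand-in (an assumption, since ORO is not used here); the class name is made up and the input is a shortened version of the sample text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackrefEscapeCheck {
    // Shortened stand-in for the sample text above (assumption: not the full input).
    static final String SAMPLE =
        "<td>\n<script type=\"text/javascript\">\nfunction spawn() {}\n</script>\n"
        + "<style type=\"text/css\">\n.Copyright { color: #FFF; }\n</style>\n</td>";

    static final String HEAD =
        "<(script|object|applet|style|noscript)[^>]*>[\\s\\S]*?</";

    // As in the posted code: "\1" is an octal escape, so the regex contains U+0001.
    static final String SINGLE = HEAD + "\1[^>]*>";
    // Backslash doubled: the regex engine receives the two characters \ and 1.
    static final String DOUBLED = HEAD + "\\1[^>]*>";

    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("single backslash:  " + countMatches(SINGLE, SAMPLE));
        System.out.println("doubled backslash: " + countMatches(DOUBLED, SAMPLE));
    }
}
```

With java.util.regex this reports 0 matches for the single-backslash form and 2 for the doubled form. If ORO handles the literal the same way, that would explain why the applet (where the pattern is typed directly, with no Java escaping) matches while the compiled code does not.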

Any ideas/help would be appreciated.

TIA,
CJ

