Package: libhtml-parser-perl
Version: 3.31
While trying this version with SpamAssassin, I seem to be getting some
odd attributes when IMG tags are parsed. Not all messages do this; only
some.
Here's some input HTML:
<TABLE cellSpacing=0 cellPadding=0 width=560 align=center border=0>
<TBODY>
<TR vAlign=top>
<TD>
<DIV align=center><A
href="http://64.119.218.137/cgi-bin/clickthru?c=1147&m=5162&[EMAIL PROTECTED]"
target=_blank><IMG height=184
src="http://64.119.208.20/ads/responsebase/digibino/2/binoc_01.gif"
width=324 border=0></A><A
href="http://64.119.218.137/cgi-bin/clickthru?c=1147&m=5162&[EMAIL PROTECTED]"
target=_blank><IMG height=184
src="http://64.119.208.20/ads/responsebase/digibino/2/binoc_02.jpg"
width=236 border=0></A><BR><A
href="http://64.119.218.137/cgi-bin/clickthru?c=1147&m=5162&[EMAIL PROTECTED]"
target=_blank><IMG height=232
src="http://64.119.208.20/ads/responsebase/digibino/2/binoc_03.gif"
width=560 border=0></A><BR><A
href="http://64.119.218.137/cgi-bin/clickthru?c=1147&m=5162&[EMAIL PROTECTED]"
target=_blank><IMG height=214
src="http://64.119.208.20/ads/responsebase/digibino/2/binoc_04.gif"
width=560 border=0></A><BR><A
href="http://64.119.218.137/cgi-bin/clickthru?c=1147&m=5162&[EMAIL PROTECTED]"
target=_blank><IMG height=193
src="http://64.119.208.20/ads/responsebase/digibino/2/binoc_05.gif"
width=560 border=0></A><BR><A
href="http://64.119.218.137/cgi-bin/clickthru?c=1147&m=5162&[EMAIL PROTECTED]"
target=_blank><IMG height=129
src="http://64.119.208.20/ads/responsebase/digibino/2/binoc_06.gif"
width=310 border=0></A> <BR><A
href="http://64.119.218.137/cgi-bin/clickthru?c=1147&m=5162&[EMAIL PROTECTED]"
target=_blank><IMG height=60
src="http://64.119.208.20/ads/responsebase/digibino/2/binoc_07.gif"
width=302 border=0></A> </DIV></TD></TR></TBODY></TABLE>
Here's the "print" code inside the tag parser function:
if ($tag =~ m/img/i) {
    print STDERR "html_tests: found image '$tag' (";
    my ($key, $val);
    while (($key, $val) = each %$attr) {
        print STDERR " $key=$val;";
    }
    print STDERR " )\n";
}
Here's the output:
html_tests: found image 'img' ( border=0;
src=http://64.119.208.20/ads/responsebase/digibino/2/binoc_01.gif; height4=height4;
width24=width24; )
html_tests: found image 'img' ( border=0; width#6=width#6;
src=http://64.119.208.20/ads/responsebase/digibino/2/binoc_02.jpg; height4=height4; )
html_tests: found image 'img' ( border=0;
src=http://64.119.208.20/ads/responsebase/digibino/2/binoc_03.gif; widthv0=widthV0;
height#2=height#2; )
html_tests: found image 'img' ( border=0;
src=http://64.119.208.20/ads/responsebase/digibino/2/binoc_04.gif; height!4=height!4;
widthv0=widthV0; )
html_tests: found image 'img' ( border=0;
src=http://64.119.208.20/ads/responsebase/digibino/2/binoc_05.gif; height3=height3;
widthv0=widthV0; )
html_tests: found image 'img' ( width10=width10; border=0;
src=http://64.119.208.20/ads/responsebase/digibino/2/binoc_06.gif; height9=height9; )
html_tests: found image 'img' ( width02=width02; border=0;
src=http://64.119.208.20/ads/responsebase/digibino/2/binoc_07.gif; height`=height`; )
html_tests: found image 'img' ( src=http://64.119.218.137/cgi-bin/view?v=7&mQ62&[EMAIL
PROTECTED]; )
As you can see, the "width" and "height" attributes don't seem to get
parsed correctly, but the others are fine.
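One pattern in the output above may be worth checking. The garbled names look like what you would get if the raw HTML had been run through quoted-printable decoding before parsing: "=32" decodes to "2", "=56" to "V", "=23" to "#", and so on. The sketch below is only a way to test that guess, not a confirmed diagnosis. It uses decode_qp from the core MIME::QuotedPrint module on fragments copied from the input HTML; the helper name qp_view is made up for illustration.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);

# Hypothetical helper: show what a fragment looks like after
# quoted-printable decoding (each "=XX" hex pair becomes one character).
sub qp_view {
    my ($text) = @_;
    return decode_qp($text);
}

# Fragments copied verbatim from the input HTML in this report.
my @fragments = (
    'width=324',   # "=32" is "2"  -> width24
    'width=236',   # "=23" is "#"  -> width#6
    'width=560',   # "=56" is "V"  -> widthV0
    'height=232',  # "=23" is "#"  -> height#2
    'height=214',  # "=21" is "!"  -> height!4
    'width=310',   # "=31" is "1"  -> width10
    'm=5162',      # "=51" is "Q"  -> mQ62
);

for my $fragment (@fragments) {
    printf "%-12s -> %s\n", $fragment, qp_view($fragment);
}
```

Every decoded fragment matches a name in the output, including the "mQ62" in the last src URL. With the "=" consumed, the parser would see a bare attribute name with no value, which would fit the name=name pairs printed. If this guess is right, the message body may be getting decoded as quoted-printable when it was not encoded that way, rather than HTML::Parser misreading the attributes.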
I've gotten even weirder results on other messages:
html_tests: found image 'img' ( dkf=dkf; pup=pup; ehxyky=ehxyky; ey=ey; toemo=toemo;
border=0; src=http://[EMAIL PROTECTED]/img/img.php?a=1&i=lj.gif;
youaqpxlvvmvgkl=youaqpxlvvmvgkl; aqikzxo=aqikzxo; tqt=tqt; j=j; zr=zr;
wrcdmzww=wrcdmzww; )
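For anyone who wants to try this outside SpamAssassin, here is a minimal standalone sketch that feeds one of the IMG tags above straight to HTML::Parser. The handler wiring (api_version 3, a start_h handler with the "tagname, attr" argspec) is an assumption about a typical setup and may not match what SpamAssassin does; the helper name img_attrs is made up for illustration. Keys are sorted so that runs can be compared line by line.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: parse a chunk of HTML and return one hash
# reference of attributes per IMG start tag encountered.
sub img_attrs {
    my ($html) = @_;
    my @found;
    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tagname, $attr) = @_;
                # Copy the hash so later parsing cannot alter it.
                push @found, { %$attr } if $tagname eq 'img';
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;
    return @found;
}

# First IMG tag from the input HTML, with its original line breaks.
my $html = <<'END_HTML';
<IMG height=184
src="http://64.119.208.20/ads/responsebase/digibino/2/binoc_01.gif"
width=324 border=0>
END_HTML

for my $attr (img_attrs($html)) {
    print STDERR "html_tests: found image 'img' (";
    print STDERR " $_=$attr->{$_};" for sort keys %$attr;
    print STDERR " )\n";
}
```

If this prints height=184 and width=324 intact, the parser itself is handling the tag correctly and the problem lies in what is being fed to it.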
Any thoughts?
Brian
( [EMAIL PROTECTED] )