Re: Trying to block out BGCOLOR
[EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> I'm making something, and need to block out BGCOLOR attribute. The problem is, the BGCOLOR could be with or without quotation marks. This is the code I used:
> $article =~ s/ bgcolor=("?)(.*?)("?)//gi

Here is how I would do it, using SAX with a helper module called XML::SAX::Machines:

package Skip::BGCOLOR;
use strict;
use XML::SAX::Base;
use vars qw/@ISA/;
@ISA = qw/XML::SAX::Base/;

sub start_element {
    my($self, $el) = @_;
    for my $property ( keys %{ $el->{Attributes} } ) {
        if ($property =~ /BGCOLOR$/i ) {
            delete( ${ $el->{Attributes} }{$property} );
        }
    }
    $self->SUPER::start_element( $el ); # forward the element downstream
}

sub xml_decl { }   # swallow the XML declaration

1;

package main;
use strict;
use XML::SAX::Machines qw/:all/;

my($pipeline) = Pipeline( 'Skip::BGCOLOR' => \*STDOUT );
$pipeline->parse_string( join('', <DATA>) );
print("\n");

__DATA__
<html>
<head>
<title>No BGCOLORs</title>
</head>
<body bgcolor="red">
<h1 bgcolor="white">No BGCOLORs</h1>
<hr width="75%" />
<div BGCOLOR="blue">No BGCOLORs</div>
</body>
</html>

Much cleaner, and it is guaranteed not to fudge up your markup.

Todd W.

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
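To make the attribute loop in start_element() concrete: as I understand the Perl SAX 2 conventions, each element's Attributes hash is keyed as "{NamespaceURI}LocalName", so a plain HTML bgcolor arrives as "{}bgcolor". The sketch below needs no parser; it builds one element hash by hand and applies the same delete loop through a hypothetical helper, drop_bgcolor(). Anchoring the pattern on the closing brace is my own tweak, so that a name like notbgcolor survives; the /BGCOLOR$/i test above would remove that one too.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the loop in Skip::BGCOLOR::start_element().
# Perl SAX 2 keys the Attributes hash as "{NamespaceURI}LocalName", so a
# plain HTML attribute arrives as "{}bgcolor" (or "{}BGCOLOR").
sub drop_bgcolor {
    my ($el) = @_;
    for my $key ( keys %{ $el->{Attributes} } ) {
        # Match the local name right after the closing brace, so an
        # attribute such as "notbgcolor" is left alone.
        delete $el->{Attributes}{$key} if $key =~ /\}bgcolor$/i;
    }
    return $el;
}

# A hand-built element standing in for what a SAX parser would deliver.
my $el = {
    Name       => 'div',
    Attributes => {
        '{}BGCOLOR' => { Name => 'BGCOLOR', Value => 'blue' },
        '{}width'   => { Name => 'width',   Value => '75%' },
    },
};

drop_bgcolor($el);
print join( ',', sort keys %{ $el->{Attributes} } ), "\n";    # prints {}width
```

Deleting inside the loop is safe here because keys() builds the full list before the first delete runs.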
Re: Trying to block out BGCOLOR
What about if the user writes:

<body text=#123456 bgcolor=#aabbcc>
or
<body bgcolor='#123456'>
or
<body bgcolor= red>

Anyway, the bgcolor can be set or changed again via javascript or CSS. I mean, blocking bgcolor in the body tag cannot solve your potential problem.

But you may find some way to put this in your body tag:
background=white_block.jpg, as the wallpaper is drawn on top of the bgcolor,
or use javascript:
document.bgColor='ff'; // not sure if this runs on NS too

In the Perl way, I can't provide any code here because I don't know when you want to block that bgcolor: at print time, or at the time the html file lands?

Anyway, if you just don't want your users to use bgcolor in the body tag, simply:

$line =~ s/bgcolor/whatever_you_like/;

When the browser doesn't recognize an attribute name, it ignores it. I mean, don't worry about the RHS of the =, only the LHS, unless you are trying to comply with the W3C's html standard.

Regards,
Perl Beginner
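Li's three examples are worth running through a regex: drieux's un_colour() pattern needs a value character right after the =, so bgcolor= red slips through untouched. The sketch below is a hypothetical, looser variant (the name un_colour_loose is made up for illustration) that allows white space around the = and accepts a double-quoted, single-quoted, or bare value. It is still only a regex over text, so the javascript and CSS caveats above still apply.

```perl
use strict;
use warnings;

# Hypothetical looser variant of un_colour(): white space is allowed
# around "=", and the value may be "quoted", 'quoted' or bare.
sub un_colour_loose {
    my ($line) = @_;
    $line =~ s/
        \s* bgcolor \s* = \s*   # the attribute name and "=", spaces allowed
        (?: "[^"]*"             # a double-quoted value
          | '[^']*'             # a single-quoted value
          | [^"'>\s]+           # or a bare value
        )
    //gix;
    return $line;
}

for my $tag ( q{<body text=#123456 bgcolor=#aabbcc>},
              q{<body bgcolor='#123456'>},
              q{<body bgcolor= "red">} ) {
    print un_colour_loose($tag), "\n";
}
# prints:
# <body text=#123456>
# <body>
# <body>
```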
Re: Trying to block out BGCOLOR
Hello World,

Li Ngok Lam's approach looks good to me. The $line =~ s/// approach appears to remove only the bgcolor word correctly, but could get stuck on the different types of colour descriptor used: is it RGB, hex or a word?

Putting a background image attribute in, though, allows you to change the image to a white or transparent gif file quite simply. You can still use the default background where needed.

JimmyG
Re: Trying to block out BGCOLOR
volks,

brief prefix. I believe Li Ngok Lam has found a clear 'issue' in the original request for solving a regex problem. My working assumption was that the OP needed a filter that would clean up a bunch of pre-existing static *.html files because the site had adopted a new scheme, and so these older pages would merely need to be 'cleaned'.

But since some here may also have scratched their heads at the original request, let's step aside for a moment and look at some of the issues.

On Friday, Mar 21, 2003, at 09:05 US/Pacific, Li Ngok Lam wrote:
[..]
> Anyway, the bgcolor can be formed or change again via javascript or CSS.
> I mean, blocking bgcolor in body tag cannot solve your potential problem.

This of course is the 'critical kill' in the OP's problem, in terms of trying to 'control it all' from some CGI script that is 'generating' web pages given various 'input streams'. { hey, we all started some place, and figured out our better ways along the way... }

Let's deal with the CSS/SSI side plays first, as the javascript side is modestly easier to solve.
There are CSS as well as various SSI directives which, were we to seek completeness, would require that a much more complex parser be in play, since it would need to deal with each of them in turn - and DO the 'resolve in place' - e.g. given

<head>
<meta http-equiv="content-type" content="text/html;charset=ISO-8859-1">
<title>Welcome</title>
<link href="../CSS/sitewide.css" rel="stylesheet" media="screen">
</head>

the parser would need to grot through the *.css file and resolve whether there are any bgcolor components; if clean, let it stay, otherwise that part of the text would need to be reconstructed and pushed into the data stream:

<html><head><title> Welcome </title>
<style>
<!--
body { font-family: Arial, Helvetica, Geneva, Swiss, SunSans-Regular }
p { font-size: 12px; font-family: Arial, Helvetica, Geneva, Swiss, SunSans-Regular }
td { font-size: 12px; font-family: Times New Roman, Georgia, Times }
element { }
//-->
</style>
</head>

We of course would not need to put the static 'content-type' in a dynamic stream back to the web browser, since as a perl CGI script we of course need to send out the

print "Content-Type: text/html;charset=ISO-8859-1 $CR$LF"

anyway, right???

> But you may find someway to put this in your body tag :
> background=white_block.jpg,

While we are proposing the idea of replacing, it is important to remember that the 'background' attribute is 'acceptable' in more than just a body tag... But you probably would not want to ship a src such as a jpg file in the process if all you really want to do is redefine it to, say, white, e.g.:

bgcolor="#ff"

The RegEx I proposed, pointed at 'background' instead of 'bgcolor', would of course remove the string background="white_block.jpg" from any 'input' provided, since it really does not care whether the value is alpha-numeric or not; it was designed to remove the stuff after the = as it were...

> as wallpaper goes upper than bgcolor
> or using javascript :
> document.bgColor='ff'; // not sure if this run on NS too
[..]
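In a stylesheet, the counterpart of the bgcolor attribute is the background-color property (or the background shorthand). As a rough sketch of the 'if clean, let it stay, otherwise reconstruct it' step, assuming simple hand-written CSS without comments, a hypothetical strip_css_backgrounds() could drop just those declarations and leave the rest of each rule intact. Anything beyond that really does call for the complete parser described above.

```perl
use strict;
use warnings;

# Hypothetical helper: drop background and background-color declarations
# from a string of CSS, leaving every other declaration alone.
# Only good enough for simple, hand-written stylesheets without comments.
sub strip_css_backgrounds {
    my ($css) = @_;
    $css =~ s/
        (?<![-\w])              # not the tail of a longer property name
        background(?:-color)?   # the property we want to drop
        \s* : [^;}]*            # its value, up to ; or the end of the rule
        ;? \s*                  # the optional trailing semicolon
    //gix;
    return $css;
}

my $css = 'body { background-color: #fff; font-family: Arial } '
        . 'p { font-size: 12px; background: url(x.jpg) }';
print strip_css_backgrounds($css), "\n";
# prints: body { font-family: Arial } p { font-size: 12px; }
```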
This part of the problem is where one needs to expand the RegEx as well, so that one deals with the possible contamination in a javascript element, most likely triggered by the 'onload'...

But the 'patterns'

document.bgColor
document.background

etc., could likewise be 'targeted' for conversion, on the fly and/or 'in place', with the same type of filtering and an appropriate RegEx. The trick in those cases, of course, is that javascript allows white space on either side of the =, so one is looking at the problem of

$line =~ s/document\.bgColor\s*=\s*(['"]?)([^'"\s]+)(['"]?)\s*(;?)//gi ;

with (['"]?) in this case, since single or double quotes would be possible.

HTH.
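A quick way to check the javascript pattern is to run it over a few onload-style fragments. This sketch wraps it in a hypothetical strip_js_bgcolor() sub and keeps the . in document.bgColor escaped, since an unescaped . matches any character. Note that a match inside an onload attribute leaves an empty onload="" behind, which does nothing.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the javascript pattern: remove
# document.bgColor = value; with either quote style and optional spaces.
sub strip_js_bgcolor {
    my ($line) = @_;
    $line =~ s/document\.bgColor\s*=\s*(['"]?)([^'"\s]+)(['"]?)\s*(;?)//gi;
    return $line;
}

for my $js ( q{document.bgColor='ff'; x=1;},
             q{document.bgColor = "red" ;},
             q{<body onload="document.bgColor='ff'">} ) {
    print "[", strip_js_bgcolor($js), "]\n";
}
# prints:
# [ x=1;]
# []
# [<body onload="">]
```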
Re: Trying to block out BGCOLOR
Just so everyone knows, it was for a print-friendly part of a CMS-type script. With all your help, it was solved with a regex. It wasn't just for the body tag; it is for EVERY tag, and I blocked the BGCOLOR, BACKGROUND, STYLE, CLASS, ID, COLOR, and more attributes to make the page totally dull and print friendly.

My problem was with my regex, which was:

$blah =~ s/ bgcolor=("?)(.*?)("?)//gi

Shortly after posting, I solved it myself with

$blah =~ s/ bgcolor=("?)(.*?)( |)/$4/gi;

I doubt that would have held up. My new one, thanks to drieux, is:

$blah =~ s/\s*bgcolor=("?)([^">\s]+)("?)//gi ;

Thank you for your help, everyone.

William Gunther
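William's final filter strips several attributes, not just bgcolor. One way to generalise the un_colour() idea, sketched here with a hypothetical print_friendly() sub and the attribute list from his message, is to build an alternation of names and require white space in front of each one, so that color= is not matched inside bgcolor=. Because style values usually contain spaces, the value part accepts a complete double- or single-quoted string instead of drieux's [^">\s]+ class.

```perl
use strict;
use warnings;

# Hypothetical print-friendly filter: remove every listed attribute
# (names taken from William's message) from every tag in a string of HTML.
# Like any regex over raw HTML, it will also edit plain text that merely
# looks like " name=value".
sub print_friendly {
    my ($html) = @_;
    my $names = join '|', map { quotemeta } qw(bgcolor background style class id color);
    $html =~ s/
        \s+ (?:$names) \s* = \s*              # white space, an unwanted name, "="
        (?: "[^"]*" | '[^']*' | [^"'>\s]+ )   # a quoted or bare value
    //gix;
    return $html;
}

print print_friendly(
    q{<td class="nav" bgcolor=#ccc style="color: red; font-weight: bold" width=40>}
), "\n";
# prints: <td width=40>
```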
RE: Trying to block out BGCOLOR
> I'm making something, and need to block out BGCOLOR attribute. The problem is, the BGCOLOR could be with or without quotation marks. This is the code I used:
> $article =~ s/ bgcolor=("?)(.*?)("?)//gi

so you are saying it could be "bgcolor" or bgcolor ?
how about something simple like:

$article =~ s/bgcolor|\"bgcolor\"//gi;
Re: Trying to block out BGCOLOR
On Thursday, Mar 20, 2003, at 06:00 US/Pacific, Kipp, James wrote:
>> I'm making something, and need to block out BGCOLOR attribute. The problem is, the BGCOLOR could be with or without quotation marks. This is the code I used:
>> $article =~ s/ bgcolor=("?)(.*?)("?)//gi
> so you are saying it could be "bgcolor" or bgcolor ?
> how about something simple like:
> $article =~ s/bgcolor|\"bgcolor\"//gi;

no, the problem is on the other side of the = token, e.g.:

<body bgcolor="#99">
or
<body bgcolor=red>
or
<body bgcolor="red">

and he would like to make that

<body>

I would of course go with, say:

#
#
sub un_colour {
    my ($line) = @_;
    $line =~ s/\s*bgcolor=("?)([^">\s]+)("?)//gi ;
    $line;
} # end of un_colour

since the middle element needs to guard against
a. "
b. >
c. white space

ciao
drieux

---

my $l1 = '<body bgcolor="#99" other="fred"> stuff here <table bgcolor=blue >';
my $l2 = '<body bgcolor=red other="fred">';
my $l3 = '<body bgcolor="red" other="fred">';

foreach my $tag ( $l1 , $l2 , $l3 ) {
    my $answer = un_colour($tag);
    print "#---\n$answer\nfor $tag \n";
}

#
#
sub un_colour {
    my ($line) = @_;
    $line =~ s/\s*bgcolor=("?)([^">\s]+)("?)//gi ;
    $line;
} # end of un_colour
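drieux's harness prints its results for eyeballing. Here the same three strings (with the quotes and angle brackets as I assume they looked before the list archive stripped them) are paired with the output each should produce. A fourth string, with a space after the =, is left unchanged, which is the gap Li Ngok Lam raises elsewhere in the thread.

```perl
use strict;
use warnings;

# drieux's un_colour(): the value may not contain ", > or white space.
sub un_colour {
    my ($line) = @_;
    $line =~ s/\s*bgcolor=("?)([^">\s]+)("?)//gi;
    return $line;
}

my @cases = (
    # [ input, expected output ]
    [ q{<body bgcolor="#99" other="fred"> stuff here <table bgcolor=blue >},
      q{<body other="fred"> stuff here <table >} ],
    [ q{<body bgcolor=red other="fred">},   q{<body other="fred">} ],
    [ q{<body bgcolor="red" other="fred">}, q{<body other="fred">} ],
    [ q{<body bgcolor= red>},               q{<body bgcolor= red>} ],  # space after =: no match
);

for my $case (@cases) {
    my ($in, $want) = @$case;
    my $got = un_colour($in);
    printf "%-4s %s\n", ( $got eq $want ? 'ok' : 'FAIL' ), $got;
}
```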
Re: Trying to block out BGCOLOR
I'm saying it could be bgcolor="COLOR" or bgcolor=COLOR
RE: Trying to block out BGCOLOR
> I'm saying it could be bgcolor="COLOR" or bgcolor=COLOR

Yes, I realize. I believe drieux's solution, or an adaptation of it, is what you need:

> I would of course go with, say:
> #
> #
> sub un_colour {
>     my ($line) = @_;
>     $line =~ s/\s*bgcolor=("?)([^">\s]+)("?)//gi ;
>     $line;
> } # end of un_colour
>
> since the middle element needs to guard against
> a. "
> b. >
> c. white space
>
> ciao
> drieux
Re: Trying to block out BGCOLOR
On Thursday, Mar 20, 2003, at 11:26 US/Pacific, Kipp, James wrote:
>> I'm saying it could be bgcolor="COLOR" or bgcolor=COLOR
> Yes I realize. I believe drieux's solution, or an adaptation of it, is what you need

note: I do subs because it is easier for me to 'loop on them', and if they are worth it, they get stuffed in a perl module somewhere...

[..]
> #
> #
> sub un_colour {
>     my ($line) = @_;
>     $line =~ s/\s*bgcolor=("?)([^">\s]+)("?)//gi ;
>     $line;
> } # end of un_colour

the usage would be

my $new_html_text = un_colour($html_text);

Or you could just use the line itself. If it helps to break out the sequence (written with the /x modifier so that the comments are legal):

s/\s*           # zero or more white space before
  bgcolor=      # the specific text
  ("?)          # first conditional group - the optional opening "
  ([^">\s]+)    # middle group - the colour value
  ("?)          # third conditional group - the optional closing "
//gix

since the middle element needs to guard against
a. "
b. >
c. white space

Note that we are looking for one or more characters of the 'class' [^">\s] - or, in english:

not "            :: let the 3rd group grab this
not >            :: the end of tag token
not white space  :: the end of attribute delimiter

since we are looking for the set of characters that are 'not delimiters' - perchance the bass-end-awkward way of making a set - since COLOR in this context is both:

a. a sequence of alpha characters
b. a # preceded hexit numeric sequence

I figured it would be easier NOT to go with the more complex regex that would need to note that 'if preceded by a #, then must be numeric...' Yeech, way too much work on that side of the trail.

The test case code had to include BOTH the " and the white space components so that it would correctly parse not merely the specific cases we are concerned about, but those cases in their 'natural environment', e.g.:

<body bgcolor=red other="fred">
<body bgcolor="red">
<body bgcolor="red" other="fred">
<body bgcolor="#FF" other="fred">
<body bgcolor=#ff>

remember that bgcolor is an attribute in a tag.
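For comparison, here is roughly what the 'more complex regex' drieux decided against might look like: a hypothetical un_colour_strict() that removes a value only if it is a colour name or a #-prefixed 3- or 6-digit hex code, using the backreference \1 so the closing quote matches the opening one. Note the \# escapes, which /x needs because a bare # starts a comment. The last case shows the cost: a malformed value is left in the markup, which is the opposite of what a cleaning filter wants.

```perl
use strict;
use warnings;

# Hypothetical stricter variant: only remove well-formed colour values.
sub un_colour_strict {
    my ($line) = @_;
    $line =~ s/
        \s* bgcolor=
        ("?)                                          # optional opening quote
        (?: \#[0-9a-f]{6} | \#[0-9a-f]{3} | [a-z]+ )  # hex code or colour name
        \1                                            # same closing quote
        (?= [\s>] )                                   # value must end here
    //gix;
    return $line;
}

print un_colour_strict($_), "\n" for
    q{<body bgcolor=red other="fred">},   # -> <body other="fred">
    q{<td bgcolor="#FFCC00">},            # -> <td>
    q{<td bgcolor=#abc>},                 # -> <td>
    q{<td bgcolor="#12345">};             # left alone: five hex digits
```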
Or allow me to argue the defect in the initial idea

$line =~ s/ *bgcolor=("?)(.*)("?)//gi ;

the problem is that middle group - the match zero or more of anything... A very GREEDY GRAB - since it would take, say,

<body bgcolor=red other="fred">

and make that

<body

since the sequence - with the round braces delimiting the groups - matches

/ bgcolor=()(red other="fred">)()/

which is the most greedy grab possible. Which may have been what you were noticing in the output.

So the simplest solution appeared to be to work out the list of things that were 'delimiters' and then allow anything in the middle group that was not a delimiter...

HTH...

ciao
drieux
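To see what each version of the middle group actually removes, the sketch below runs the greedy (.*), the original post's lazy (.*?), and drieux's delimiter class over the same unquoted tag. The greedy group eats everything to the end of the string; the lazy group is satisfied by an empty match because both quote groups are optional, so it removes only " bgcolor=".

```perl
use strict;
use warnings;

my $tag = q{<body bgcolor=red other="fred">};

# Greedy middle group: runs to the end of the string.
( my $greedy = $tag ) =~ s/ *bgcolor=("?)(.*)("?)//gi;

# Lazy middle group, as in the original post: matches nothing at all.
( my $lazy = $tag ) =~ s/ bgcolor=("?)(.*?)("?)//gi;

# drieux's fix: the middle group may not contain a delimiter.
( my $fixed = $tag ) =~ s/\s*bgcolor=("?)([^">\s]+)("?)//gi;

print "$_\n" for $greedy, $lazy, $fixed;
# prints:
# <body
# <bodyred other="fred">
# <body other="fred">
```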
Trying to block out BGCOLOR
I'm making something, and need to block out the BGCOLOR attribute. The problem is, the BGCOLOR value could be with or without quotation marks. This is the code I used:

$article =~ s/ bgcolor=("?)(.*?)("?)//gi

It doesn't work to my liking, and I was hoping someone else had a better solution.

William