Good point. Actually, with Perl, I've found that it's simpler and usually more convenient to dispatch buckets to HTML::Parser, which is better at catching that sort of thing and can also work on a stream, chunk by chunk. The stream-based filter interface provided by mod_perl makes this easy:

    # $parser has to persist across filter invocations (e.g. kept in $f->ctx)
    while ($f->read(my $buf)) {
        $parser->parse($buf);
    }
    # only flush the parser once the end of the stream has been seen
    $parser->eof if $f->seen_eos;
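For anyone who wants to try the chunk-by-chunk behaviour outside Apache, here is a rough standalone sketch. It assumes HTML::Parser is installed and uses a plain string ($out) as a stand-in for the filter's output; the chunk contents and the @starts bookkeeping are made up for illustration, with an <img> tag deliberately split across chunks and a '>' inside its quoted alt value. Treat it as a sketch of the idea, not as the filter itself.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Stand-in for the filter's output: collect everything in a string.
my $out = '';
my @starts;    # tag names seen by the start handler (illustration only)

my $parser = HTML::Parser->new(
    api_version => 3,
    # Default: pass the original source text of every event through untouched.
    default_h => [ sub { $out .= shift }, 'text' ],
    # Custom logic for opening tags sits in front of the pass-through.
    start_h => [
        sub {
            my ($tagname, $text) = @_;
            push @starts, $tagname;    # "business logic" would go here
            $out .= $text;
        },
        'tagname,text',
    ],
);

# Hypothetical chunks standing in for successive reads from the filter;
# the <img> tag spans all three of them.
my @chunks = ( '<p>see <img src="arrow.gif" ', 'alt=" --> "', '> here</p>' );

$parser->parse($_) for @chunks;    # incomplete markup is buffered by the parser
$parser->eof;                      # flush whatever is still pending

print "$out\n";
print "start tags: @starts\n";
```

The parser holds on to the unfinished tag between parse() calls, so no hand-rolled leftover handling is needed here. Note that text may be delivered in several pieces unless unbroken_text is enabled, which matters if the custom logic needs to see whole text runs.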

You then have callbacks for opening tags, whitespace and closing tags, which all default to $f->write($content), and you can add any custom business logic above that. I can provide some more fleshed-out pseudocode if anyone's interested.

The downside is that it locks you into Perl, which I intentionally wanted to avoid in this particular module. And you still need to make sure you don't get stuck with huge amounts of leftovers between buckets (as a small bonus, HTML::Parser will take care of remembering the leftover data).

I don't see any perfect way to avoid the big common problems, though; you always have to know what you're aiming the filters at and work accordingly...

  Issac

Nick Kew wrote:
> On Tue, 10 Apr 2007 09:20:22 +0300
> Issac Goldstand <[EMAIL PROTECTED]> wrote:
>
>> $buf = ${$f->ctx}{leftover}.$buf if defined(${$f->ctx}{leftover});
>> (prepend f->ctx->leftover onto buf)
>>
>> and anything leftover that doesn't include a full HTML tag goes to
>>
>> ${$f->ctx}{leftover} = $buf || undef;
>
> Define "a full HTML tag".
>
> As in, for instance
>   <img
>       src = "arrow.gif"
>       alt = " --> "
>   >
>
> The point being, it's not a trivial task (and that's without
> putting things like the above in a comment or cdata section
> where its semantics are completely different, etc).
>