Good point.  Actually, with Perl, I've found that it's simpler and
usually more convenient to dispatch buckets to HTML::Parser, which is
better at catching that sort of thing, and can also work in a stream,
chunk by chunk.  The stream-based interface provided by mod_perl really
makes this easy:

    while ($f->read(my $buf)) { $parser->parse($buf); }
    $parser->eof;

You then have callbacks for opening tags, text (whitespace included)
and closing tags, which can all default to $f->write($content), and you
can add any custom business logic above that.  I can provide some more
fleshed-out pseudocode if anyone's interested.
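
As a rough sketch of that callback setup (not the actual module): the
filter is replaced here by a plain list of chunks so it runs outside
Apache, and $write, @img_tags and @chunks are names made up for the
illustration.  The <img> tag is deliberately split across two chunks.

```perl
use strict;
use warnings;
use HTML::Parser;

# Stand-in for the filter's output call ($f->write($content) above):
# collect everything in a string so the sketch runs outside Apache.
my $out = '';
my $write = sub { $out .= $_[0] if defined $_[0] };

my @img_tags;  # hypothetical business logic: record the alt text of each <img>

my $parser = HTML::Parser->new(
    api_version => 3,
    # Start tags: run the custom logic, then pass the original text through.
    start_h => [ sub {
        my ($tagname, $attr, $text) = @_;
        push @img_tags, $attr->{alt} if $tagname eq 'img';
        $write->($text);
    }, 'tagname, attr, text' ],
    # Everything without its own handler (text, end tags, comments, ...)
    # is passed through unchanged.
    default_h => [ sub { $write->($_[0]) }, 'text' ],
);

# Stand-in for the buckets read from the filter; the tag straddles
# the chunk boundary on purpose.
my @chunks = (
    "<p>See <img\n  src = \"arrow.gif\"\n",
    "  alt = \" --> \"\n  > here</p>",
);
$parser->parse($_) for @chunks;
$parser->eof;
```

The start handler should fire only once the whole tag has arrived, so
the custom logic never sees half a tag, and $out should end up as a
copy of the input.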

The downside is that it locks you into Perl, which I intentionally
wanted to avoid in this particular module.  And you still need to make
sure you don't end up with huge amounts of leftover data between
buckets (as a small bonus, HTML::Parser takes care of remembering the
leftover data for you).

I don't see any perfect way to avoid the big common problems, though;
you always have to know what you're aiming the filters at and work
accordingly...

  Issac

Nick Kew wrote:
> On Tue, 10 Apr 2007 09:20:22 +0300
> Issac Goldstand <[EMAIL PROTECTED]> wrote:
> 
>> $buf = ${$f->ctx}{leftover}.$buf if defined(${$f->ctx}{leftover});
>> (prepend f->ctx->leftover onto buf)
>>
>> and anything leftover that doesn't include a full HTML tag goes to
>>
>> ${$f->ctx}{leftover} = $buf || undef;
> 
> Define "a full HTML tag".
> 
> As in, for instance
>       <img
>       src = "arrow.gif"
>       alt = " --> "
>       >
> 
> The point being, it's not a trivial task (and that's without
> putting things like the above in a comment or cdata section
> where its semantics are completely different, etc).
> 
