Re: bucketbrigades with html filter

Jeff Ambrosino Tue, 18 Oct 2005 14:23:34 -0700

Yikes indeed :)

I should have clarified that in our app we don't actually process
embedded tags.  Our app lets users mangle the source HTML using RegEx,
and since users can (and often do) perform filtering like
s/(<body>)(.*?)(<\/body>)/$1<center>$2<\/center>$3/, we need to buffer
it all.
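
To illustrate why a substitution like that one forces full buffering, here is a minimal sketch in Python (not a mod_perl filter). The names center_body, filter_per_chunk and filter_buffered, the sample page and the 16-byte chunk size are all made up for the illustration; the pattern is a direct transcription of the s/// expression above.

```python
import re

# Transcription of s/(<body>)(.*?)(<\/body>)/$1<center>$2<\/center>$3/
# No DOTALL flag, to mirror the original expression (which has no /s).
BODY_RE = re.compile(r"(<body>)(.*?)(</body>)")


def center_body(html):
    """Wrap the body contents in <center> tags (hypothetical helper)."""
    return BODY_RE.sub(r"\1<center>\2</center>\3", html)


def filter_per_chunk(chunks):
    """Apply the substitution to each chunk independently, then join."""
    return "".join(center_body(chunk) for chunk in chunks)


def filter_buffered(chunks):
    """Buffer every chunk first, then apply the substitution once."""
    return center_body("".join(chunks))


# Made-up page, split into 16-byte pieces so that <body> and </body>
# never land in the same chunk (a stand-in for bucket boundaries).
page = "<html><body><p>hello</p></body></html>"
chunks = [page[i:i + 16] for i in range(0, len(page), 16)]

per_chunk = filter_per_chunk(chunks)
buffered = filter_buffered(chunks)

print("per chunk:", per_chunk)
print("buffered: ", buffered)
```

In the per-chunk version no single chunk contains both tags, so the pattern never matches and the page passes through unchanged; only the buffered version applies the user's edit.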

The main benefit I see in the Apache::Clean approach is that it's less
memory intensive for large content/pages...  and of course it's very
good as an example of how to manipulate content from within a filter. 
Aside from that, you're "cleaning" many times on smaller bits of
content vs. once if you buffer the entire page.  I'm curious where you
think the performance tradeoffs are for once-vs-many when the average
page size is 65 KB.  In my specific case, even if we could operate on
chunks, I wager that there's more overhead in running many regexes vs.
one big one.  And I suppose one has to take into account the
per-invocation buffer size (1kb in Apache::Clean) as well as typical
bucket sizes... (8000 bytes?)
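
One way to put numbers on the once-vs-many question is to time it. The following is a rough sketch in Python, not a mod_perl benchmark, so it only hints at the shape of the tradeoff. The tab-to-space cleanup, the sample page and the helper names are assumptions chosen so that per-chunk and buffered processing give identical output; the 65 KB, 1 KB and 8000-byte figures are the ones mentioned above.

```python
import re
import timeit

PAGE_SIZE = 65 * 1024   # average page size from the message
CLEAN_BUFFER = 1024     # per-invocation size attributed to Apache::Clean above
BUCKET_SIZE = 8000      # typical bucket size guessed at above

# Hypothetical cleanup that is safe on any chunk boundary:
# it rewrites one character at a time, so no match can straddle chunks.
TAB_RE = re.compile(r"\t")


def clean(text):
    return TAB_RE.sub(" ", text)


def split(text, size):
    """Cut text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def clean_chunks(chunks):
    """Many small regex runs, one per chunk."""
    return "".join(clean(chunk) for chunk in chunks)


def clean_buffered(chunks):
    """Buffer everything, then one big regex run."""
    return clean("".join(chunks))


# Made-up page content, padded out to exactly PAGE_SIZE characters.
line = "<p>some\ttext\there</p>\n"
page = (line * (PAGE_SIZE // len(line) + 1))[:PAGE_SIZE]

chunks_1k = split(page, CLEAN_BUFFER)
chunks_bucket = split(page, BUCKET_SIZE)

cases = {
    "1 KB chunks": lambda: clean_chunks(chunks_1k),
    "8000-byte chunks": lambda: clean_chunks(chunks_bucket),
    "fully buffered": lambda: clean_buffered(chunks_bucket),
}

for label, func in cases.items():
    seconds = timeit.timeit(func, number=200)
    print(f"{label}: {seconds:.4f}s for 200 pages")
```

At these sizes a page is 65 chunks of 1 KB or 9 buckets of 8000 bytes, so the per-invocation cost is paid 65 or 9 times instead of once. Whether that outweighs the memory cost of holding the whole page would have to be measured inside a real filter.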

JB

On 10/18/05, Geoffrey Young <[EMAIL PROTECTED]> wrote:
> > The way to deal with this is to buffer as much content as you need
> > (maybe the whole page) and then do your work on the buffer.
>
> yikes!
>
> right.  see
>
>   http://search.cpan.org/~geoff/Apache-Clean-2.00_7/
