Yikes indeed :) I should have clarified that in our app we don't actually process embedded tags. Our app lets users mangle the source HTML using RegEx, and since users can (and often do) perform filtering like s/(<body>)(.*?)(<\/body>)/$1<center>$2<\/center>$3/, we need to buffer it all.
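A minimal sketch of why a rule like that forces whole-page buffering, written in Python purely for illustration (the app itself is not Python). The 16-character chunking, the sample page, and the helper names `filter_per_chunk` and `filter_buffered` are all hypothetical; `re.DOTALL` stands in for Perl's `/s` modifier, which is an assumption since the rule above shows no flags.

```python
import re

# Python spelling of s/(<body>)(.*?)(<\/body>)/$1<center>$2<\/center>$3/.
# re.DOTALL lets "." cross newlines (assumed; the original rule shows no flags).
BODY_RULE = re.compile(r"(<body>)(.*?)(</body>)", re.DOTALL)
REPLACEMENT = r"\1<center>\2</center>\3"


def filter_per_chunk(chunks):
    """Apply the rule to each chunk on its own, as a streaming filter would."""
    return "".join(BODY_RULE.sub(REPLACEMENT, chunk) for chunk in chunks)


def filter_buffered(chunks):
    """Collect every chunk first, then apply the rule once to the whole page."""
    return BODY_RULE.sub(REPLACEMENT, "".join(chunks))


page = "<html><body>\n<p>hello</p>\n</body></html>"
# Hypothetical chunking: split the page into 16-character pieces.
chunks = [page[i:i + 16] for i in range(0, len(page), 16)]

# No single chunk holds both <body> and </body>, so the per-chunk pass
# leaves the page unchanged.
print(filter_per_chunk(chunks) == page)
# The buffered pass sees both tags at once and wraps the body in <center>.
print(filter_buffered(chunks))
```

With real bucket sizes the split points differ, but the failure is the same whenever the opening and closing tags land in different chunks.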
The main benefit I see in the Apache::Clean approach is that it's less memory intensive for large content/pages... and of course it's very good as an example of how to manipulate content from within a filter. Aside from that, you're "cleaning" many times on smaller bits of content vs. once if you buffer the entire page.

I'm curious where you think the performance tradeoffs are for once-vs-many when the average page size is 65 KB. In my specific case, even if we could operate on chunks, I wager that there's more overhead in running many regexes vs. one big one. And I suppose one has to take into account the per-invocation buffer size (1 KB in Apache::Clean) as well as typical bucket sizes... (8000 bytes?)

JB

On 10/18/05, Geoffrey Young <[EMAIL PROTECTED]> wrote:
> > The way to deal with this is to buffer as much content as you need
> > (maybe the whole page) and then do your work on the buffer.
>
> yikes!
>
> right. see
>
> http://search.cpan.org/~geoff/Apache-Clean-2.00_7/