I have a crazy idea for you.  Maybe this is overkill but this sounds like
it'd be natural to add to mod_pagespeed <http://modpagespeed.com> as a new
filter.

Here's some code you might use as a template:

https://code.google.com/p/modpagespeed/source/browse/trunk/src/net/instaweb/rewriter/collapse_whitespace_filter.cc

One thing we've thought of doing is providing a generic text-substitution
filter that would take strings in character blocks and apply arbitrary
substitutions to them. The substitutions could be specified in the .conf file:
  ModPagespeedSubstitute "oldString" "newString"

You are right that text-blocks in Apache output filters can be split
arbitrarily across buckets, but mod_pagespeed takes care of that in an
HTML-centric way, breaking up blocks on HTML tokens. A block of free-format
text would be treated as a single atomic token independent of the structure
of the incoming bucket brigade.
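
If you end up writing a plain Apache output filter instead, the split-string
case can be handled by holding back the tail of each chunk until you know it
is not the start of a match. Below is a minimal sketch of that idea in
standalone C++. It is not how mod_pagespeed does it, and the StreamingReplacer
class and its Feed/Finish methods are hypothetical names made up for
illustration. You would call Feed() with the contents of each bucket and
Finish() once you see the end of the stream.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not part of Apache or mod_pagespeed): replaces every
// occurrence of `from` with `to` in text that arrives in arbitrary chunks.
class StreamingReplacer {
 public:
  StreamingReplacer(std::string from, std::string to)
      : from_(std::move(from)), to_(std::move(to)) {}

  // Feed one chunk (e.g. the data read from one bucket). Returns the text
  // that is safe to pass downstream right away.
  std::string Feed(const std::string& chunk) {
    if (from_.empty()) return chunk;  // Nothing to search for.
    pending_ += chunk;

    std::string out;
    std::size_t pos = 0;
    std::size_t hit;
    // Replace every complete match found in the buffered text.
    while ((hit = pending_.find(from_, pos)) != std::string::npos) {
      out.append(pending_, pos, hit - pos);
      out += to_;
      pos = hit + from_.size();
    }

    // Hold back the longest suffix of the remaining text that is also a
    // prefix of from_, because the next chunk might complete the match.
    const std::size_t rest = pending_.size() - pos;
    std::size_t keep = std::min(rest, from_.size() - 1);
    while (keep > 0 &&
           pending_.compare(pending_.size() - keep, keep, from_, 0, keep) != 0) {
      --keep;
    }
    out.append(pending_, pos, rest - keep);
    pending_.erase(0, pending_.size() - keep);
    return out;
  }

  // Call at end of stream: flushes any held-back partial match unchanged.
  std::string Finish() {
    std::string out;
    out.swap(pending_);
    return out;
  }

 private:
  std::string from_;
  std::string to_;
  std::string pending_;  // Held-back tail that may be the start of a match.
};
```

With your case2 input, feeding "<html>oldStr" emits only "<html>" and keeps
"oldStr" buffered; the next chunk completes the match. The buffer never grows
beyond the length of the search string minus one character between calls, so
this works on a huge file without loading it all into memory.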

Let me know if you'd like to discuss this further.

-Josh


On Wed, May 1, 2013 at 8:54 AM, Sindhi Sindhi <sindhi....@gmail.com> wrote:

> Hello,
>
> Thanks a lot for providing answers to my earlier emails with subject
> "Apache C++ equivalent of javax.servlet.Filter". I really appreciate your
> help.
>
> I had another question. My requirement is something like this -
>
> I have a huge html file that I have copied into the Apache htdocs folder.
> In my C++ Apache module, I want to get this html file contents and
> remove/replace some strings.
>
> Say I have an HTML file that has the string "oldString" appearing 3 times in
> the file. My requirement is to replace "oldString" with the new string
> "newString". I have already written a C++ function that has a signature
> like this -
>
> char* processHTML(char* inHTMLString) {
> //
> char* newHTMLWithNewString = <code to replace all occurrences of
> "oldString" with "newString">
> return newHTMLWithNewString;
> }
>
> The above function does a lot more than just string replacement; it has a
> lot of business logic implemented and finally returns the new HTML string.
>
> I want to call processHTML() inside my C++ Apache module. As far as I know,
> Apache maintains internal data structures called buckets and brigades, which
> actually contain the HTML file data. My question is: does the entire HTML
> file content (in my case the HTML file is huge) reside in a single
> bucket? That is, when I fetch one bucket at a time from a brigade, can I be
> sure that the entire HTML file data from <html> to </html> can be found in
> a single bucket? For example, if my HTML file looks like this -
> <html>
> ..
> ..
> oldString
> ... oldString...........oldString..
> ..
> </html>
>
> When I iterate through all buckets of a brigade, will I find my entire HTML
> file content in a single bucket, or can the HTML file content be spread
> across multiple buckets, say like this -
>
> case1:
> bucket-1 contents =
> "<html>
> ..
> ..
> oldString
> ... oldString...........oldString..
> ..
> </html>"
>
> case2:
> bucket-1 contents =
> "<html>
> ..
> ..
> oldStr"
>
> bucket-2 contents =
> "ing
> ... oldString...........oldString..
> ..
> </html>"
>
> If it's case2, then the function processHTML() I have written will not
> work, because it searches for the entire string "oldString" and in case2
> "oldString" is found only partially.
>
> Thanks a lot.
>
