I have a crazy idea for you. Maybe this is overkill, but this sounds like it'd be natural to add to mod_pagespeed <http://modpagespeed.com> as a new filter.
Here's some code you might use as a template:
https://code.google.com/p/modpagespeed/source/browse/trunk/src/net/instaweb/rewriter/collapse_whitespace_filter.cc

One thing we've thought of doing is providing a generic text-substitution filter that would take strings in character blocks and do arbitrary substitutions in them. The substitutions could be specified in the .conf file:

    ModPagespeedSubstitute "oldString" "newString"

You are right that text blocks in Apache output filters can be split arbitrarily across buckets, but mod_pagespeed takes care of that in an HTML-centric way, breaking up blocks on HTML tokens. A block of free-format text would be treated as a single atomic token, independent of the structure of the incoming bucket brigade.

Let me know if you'd like to discuss this further.

-Josh

On Wed, May 1, 2013 at 8:54 AM, Sindhi Sindhi <sindhi....@gmail.com> wrote:

> Hello,
>
> Thanks a lot for providing answers to my earlier emails with subject
> "Apache C++ equivalent of javax.servlet.Filter". I really appreciate your
> help.
>
> I had another question. My requirement is something like this -
>
> I have a huge html file that I have copied into the Apache htdocs folder.
> In my C++ Apache module, I want to get this html file contents and
> remove/replace some strings.
>
> Say I have a HTML file that has the string "oldString" appearing 3 times in
> the file. My requirement is to replace "oldString" with the new string
> "newString". I have already written a C++ function that has a signature
> like this -
>
> char* processHTML(char* inHTMLString) {
>     //
>     char* newHTMLWithNewString = <code to replace all occurrences of
>         "oldString" with "newString">
>     return newHTMLWithNewString;
> }
>
> The above function does a lot more than just string replace, it has a lot of
> business logic implemented and finally returns the new HTML string.
>
> I want to call processHTML() inside my C++ Apache module.
> As far as I know, Apache maintains an internal data structure called
> Buckets and Brigades which actually contain the HTML file data. My
> question is, is the entire HTML file content (in my case the html file is
> huge) residing in a single bucket? Means, when I fetch one bucket at a
> time from a brigade, can I be sure that the entire HTML file data from
> <html> to </html> can be found in a single bucket? For ex. if my html file
> looks like this -
>
> <html>
> ..
> ..
> oldString
> ... oldString...........oldString..
> ..
> </html>
>
> When I iterate through all buckets of a brigade, will I find my entire HTML
> file content in a single bucket OR the HTML file content can be present in
> multiple buckets, say like this -
>
> case1:
> bucket-1 contents =
> "<html>
> ..
> ..
> oldString
> ... oldString...........oldString..
> ..
> </html>"
>
> case2:
> bucket-1 contents =
> "<html>
> ..
> ..
> oldStr"
>
> bucket-2 contents =
> "ing
> ... oldString...........oldString..
> ..
> </html>"
>
> If it's case2, then the function processHTML() I have written will not
> work because it searches for the entire string "oldString" and in case2
> "oldString" is found only partially.
>
> Thanks a lot.
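
To make case2 concrete, here is a minimal sketch, in plain C++ with no Apache or mod_pagespeed APIs, of one common way to do a substitution when the text arrives in arbitrary chunks: hold back the last few bytes of each chunk in case they are the start of a match that finishes in the next chunk. StreamingReplacer and ReplaceChunks are hypothetical names made up for this illustration. This is not how mod_pagespeed handles the problem (it tokenizes the HTML instead), and it only covers a plain string replace, not the extra business logic in processHTML().

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only; not part of Apache or mod_pagespeed.
// Replaces every occurrence of `from` with `to` in a stream of chunks, even
// when an occurrence is split across two chunks.
class StreamingReplacer {
 public:
  StreamingReplacer(std::string from, std::string to)
      : from_(std::move(from)), to_(std::move(to)) {}

  // Feed one chunk (for example, the bytes read from one bucket).
  // Returns the output that is safe to emit right now.
  std::string Feed(const std::string& chunk) {
    if (from_.empty()) return chunk;  // nothing to search for; pass through
    pending_ += chunk;

    std::string out;
    std::size_t pos = 0;
    std::size_t hit;
    // Replace every complete match currently visible.
    while ((hit = pending_.find(from_, pos)) != std::string::npos) {
      out.append(pending_, pos, hit - pos);
      out += to_;
      pos = hit + from_.size();
    }

    // Hold back up to from_.size() - 1 trailing bytes: they might be the
    // beginning of a match that is completed by the next chunk.
    const std::size_t remaining = pending_.size() - pos;
    const std::size_t keep = std::min(remaining, from_.size() - 1);
    out.append(pending_, pos, remaining - keep);
    pending_.erase(0, pending_.size() - keep);
    return out;
  }

  // Call once at end of stream to flush the held-back bytes.
  std::string Finish() {
    std::string out;
    out.swap(pending_);
    return out;
  }

 private:
  std::string from_;
  std::string to_;
  std::string pending_;  // tail of earlier input not yet emitted
};

// Convenience wrapper: run a whole list of chunks through the replacer.
std::string ReplaceChunks(const std::vector<std::string>& chunks,
                          const std::string& from, const std::string& to) {
  StreamingReplacer replacer(from, to);
  std::string result;
  for (const std::string& chunk : chunks) result += replacer.Feed(chunk);
  result += replacer.Finish();
  return result;
}
```

In an Apache output filter, the assumption is that you would call Feed() on the data read from each data bucket and Finish() when you see the EOS bucket. If processHTML() really needs the whole document at once, the simpler (but more memory-hungry) alternative is to append every bucket's data to one buffer and call processHTML() only at EOS.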