I didn't know about mod_substitute or mod_sed :) The ModPagespeedSubstitute command I proposed probably adds nothing to those.
But in any case, that was not sufficient for Sindhi's use case, where he needs to impose data-dependent business logic rather than statically define a substitution in a conf file.

-Josh


On Wed, May 1, 2013 at 11:19 AM, Jim Jagielski <j...@jagunet.com> wrote:

> How is that different from mod_substitute and/or mod_sed?
>
> On May 1, 2013, at 9:22 AM, Joshua Marantz <jmara...@google.com> wrote:
>
> > I have a crazy idea for you. Maybe this is overkill but this sounds like
> > it'd be natural to add to mod_pagespeed <http://modpagespeed.com> as a new
> > filter.
> >
> > Here's some code you might use as a template
> >
> > https://code.google.com/p/modpagespeed/source/browse/trunk/src/net/instaweb/rewriter/collapse_whitespace_filter.cc
> >
> > one thing we've thought of doing is providing a generic text-substitution
> > filter that would take strings in character-blocks and do arbitrary
> > substitutions in them, that could be specified in the .conf file:
> > ModPagespeedSubstitute "oldString" "newString"
> >
> > You are right that text-blocks in Apache output filters can be split
> > arbitrarily across buckets, but mod_pagespeed takes care of that in an
> > HTML-centric way, breaking up blocks on html tokens. A block of free-format
> > text would be treated as a single atomic token independent of the structure
> > of the incoming bucket brigade.
> >
> > Let me know if you'd like to discuss this further.
> >
> > -Josh
> >
> >
> > On Wed, May 1, 2013 at 8:54 AM, Sindhi Sindhi <sindhi....@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> Thanks a lot for providing answers to my earlier emails with subject
> >> "Apache C++ equivalent of javax.servlet.Filter". I really appreciate your
> >> help.
> >>
> >> I had another question. My requirement is something like this -
> >>
> >> I have a huge html file that I have copied into the Apache htdocs folder.
> >> In my C++ Apache module, I want to get this html file contents and
> >> remove/replace some strings.
> >>
> >> Say I have an HTML file that has the string "oldString" appearing 3 times in
> >> the file. My requirement is to replace "oldString" with the new string
> >> "newString". I have already written a C++ function that has a signature
> >> like this -
> >>
> >> char* processHTML(char* inHTMLString) {
> >> //
> >> char* newHTMLWithNewString = <code to replace all occurrences of
> >> "oldString" with "newString">
> >> return newHTMLWithNewString;
> >> }
> >>
> >> The above function does a lot more than just string replace, it has a lot of
> >> business logic implemented and finally returns the new HTML string.
> >>
> >> I want to call processHTML() inside my C++ Apache module. As I know Apache
> >> maintains an internal data structure called Buckets and Brigades which
> >> actually contain the HTML file data. My question is, is the entire HTML
> >> file content (in my case the html file is huge) residing in a single
> >> bucket? Means, when I fetch one bucket at a time from a brigade, can I be
> >> sure that the entire HTML file data from <html> to </html> can be found in
> >> a single bucket? For ex. if my html file looks like this -
> >> <html>
> >> ..
> >> ..
> >> oldString
> >> ... oldString...........oldString..
> >> ..
> >> </html>
> >>
> >> When I iterate through all buckets of a brigade, will I find my entire HTML
> >> file content in a single bucket OR the HTML file content can be present in
> >> multiple buckets, say like this -
> >>
> >> case1:
> >> bucket-1 contents =
> >> "<html>
> >> ..
> >> ..
> >> oldString
> >> ... oldString...........oldString..
> >> ..
> >> </html>"
> >>
> >> case2:
> >> bucket-1 contents =
> >> "<html>
> >> ..
> >> ..
> >> oldStr"
> >>
> >> bucket-2 contents =
> >> "ing
> >> ... oldString...........oldString..
> >> ..
> >> </html>"
> >>
> >> If it's case2, then the function processHTML() I have written will not
> >> work because it searches for the entire string "oldString" and in case2
> >> "oldString" is found only partially.
> >>
> >> Thanks a lot.
> >>
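
For what it's worth, here is a rough sketch of one way to cope with case2 for a fixed search string, without buffering the whole file: carry the last (length of "oldString" - 1) bytes over from one chunk to the next, so a match that straddles a bucket boundary is still seen whole. This is plain C++ and uses no Apache or mod_pagespeed APIs. The StreamingReplacer class and its Feed/Finish methods are hypothetical names invented for this illustration, and I am assuming the filter hands it the bytes of each bucket in order. It only covers literal replacement; data-dependent logic like processHTML() may still need the complete document.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not an Apache or mod_pagespeed API): replaces every
// occurrence of `from` with `to` in data that arrives in arbitrary chunks.
class StreamingReplacer {
 public:
  StreamingReplacer(std::string from, std::string to)
      : from_(std::move(from)), to_(std::move(to)) {}

  // Feed one chunk (for example, the bytes read from one bucket).
  // Returns the output that is safe to pass downstream right now.
  std::string Feed(const std::string& chunk) {
    if (from_.empty()) return chunk;  // nothing to search for
    pending_ += chunk;
    std::string out;
    std::size_t pos = 0;
    // Replace every complete match currently visible in the buffer.
    for (;;) {
      std::size_t hit = pending_.find(from_, pos);
      if (hit == std::string::npos) break;
      out.append(pending_, pos, hit - pos);
      out += to_;
      pos = hit + from_.size();
    }
    // The unmatched remainder may end with the start of a match
    // ("oldStr"), so hold back up to from_.size() - 1 bytes of it.
    std::size_t remaining = pending_.size() - pos;
    std::size_t keep = std::min(remaining, from_.size() - 1);
    out.append(pending_, pos, remaining - keep);
    pending_.erase(0, pending_.size() - keep);
    return out;
  }

  // Call once at end of stream to flush the held-back bytes.
  std::string Finish() {
    std::string out;
    out.swap(pending_);
    return out;
  }

 private:
  std::string from_;
  std::string to_;
  std::string pending_;  // tail carried over to the next chunk
};
```

With from = "oldString" and to = "newString", feeding "<html> oldStr" and then "ing ... oldString" and calling Finish() yields "<html> newString ... newString" in total, even though the first occurrence was split across the two chunks. The cost is that at most 8 bytes are delayed per chunk, rather than the whole file being held in memory.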