Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/
* Transient bucket seems to be working fine in mod_sed. * Added error handling code so that if ap_pass_brigade fails during request processing, error is returned to sed_response_filter / sed_request_filter. Testing : * Compiled with 2.2 branch and make sure there is no regression (against gsed test cases). * Compiled with trunk. Final patch is attached. Regards, Basant. On Mon, Sep 15, 2008 at 12:17:04AM -0700, Basant Kukreja wrote: Hi, Attached is the *rough* patch which uses transient buckets in mod_sed output filter. Testing : I created a 30MB and 300MB text files and ran OutputSed commands on the file. * Before the patch, process size (worker mpm with 1 thread) increased up to 300M for single request. After the patch, process size remains to 3MB to server 300M response output. I also removed 1 extra copying for processing output. I need to add some more error handling to finalize the patch. Any comments are welcome. Regards, Basant. On Thu, Sep 04, 2008 at 09:47:26PM -0500, William A. Rowe, Jr. wrote: Basant Kukreja wrote: Based on your suggestion, I will check what are the other improvements from mod_substitute can be brought into mod_sed. Note that mod_substitute's brigade handling is already based on the work of both Jim and Nick (author of mod_line_edit) - so they are pretty certain that it is the right approach. Good idea to borrow from it. Bill Index: modules/filters/mod_sed.c === --- modules/filters/mod_sed.c (revision 692768) +++ modules/filters/mod_sed.c (working copy) @@ -26,7 +26,8 @@ #include libsed.h static const char *sed_filter_name = Sed; -#define MODSED_OUTBUF_SIZE 4000 +#define MODSED_OUTBUF_SIZE 8000 +#define MAX_TRANSIENT_BUCKETS 50 typedef struct sed_expr_config { @@ -44,11 +45,14 @@ typedef struct sed_filter_ctxt { sed_eval_t eval; +ap_filter_t *f; request_rec *r; apr_bucket_brigade *bb; char *outbuf; char *curoutbuf; int bufsize; +apr_pool_t *tpool; +int numbuckets; } sed_filter_ctxt; module AP_MODULE_DECLARE_DATA sed_module; @@ -71,29 +75,68 @@ sed_cfg-last_error = error; } +/* clear the temporary pool (used for transient buckets) + */ +static void clear_ctxpool(sed_filter_ctxt* ctx) +{ +apr_pool_clear(ctx-tpool); +ctx-outbuf = NULL; +ctx-curoutbuf = NULL; +ctx-numbuckets = 0; +} + +/* alloc_outbuf + * allocate output buffer + */ +static void alloc_outbuf(sed_filter_ctxt* ctx) +{ +ctx-outbuf = apr_palloc(ctx-tpool, ctx-bufsize + 1); +ctx-curoutbuf = ctx-outbuf; +} + +/* append_bucket + * Allocate a new bucket from buf and sz and append to ctx-bb + */ +static void append_bucket(sed_filter_ctxt* ctx, char* buf, int sz) +{ +int rv; +apr_bucket *b; +if (ctx-tpool == ctx-r-pool) { +/* We are not using transient bucket */ +b = apr_bucket_pool_create(buf, sz, ctx-r-pool, + ctx-r-connection-bucket_alloc); +APR_BRIGADE_INSERT_TAIL(ctx-bb, b); +} +else { +/* We are using transient bucket */ +b = apr_bucket_transient_create(buf, sz, +ctx-r-connection-bucket_alloc); +APR_BRIGADE_INSERT_TAIL(ctx-bb, b); +ctx-numbuckets++; +if (ctx-numbuckets = MAX_TRANSIENT_BUCKETS) { +b = apr_bucket_flush_create(ctx-r-connection-bucket_alloc); +APR_BRIGADE_INSERT_TAIL(ctx-bb, b); +rv = ap_pass_brigade(ctx-f-next, ctx-bb); +apr_brigade_cleanup(ctx-bb); +clear_ctxpool(ctx); +} +} +} + /* * flush_output_buffer * Flush the output data (stored in ctx-outbuf) */ -static void flush_output_buffer(sed_filter_ctxt *ctx, char* buf, int sz) +static void flush_output_buffer(sed_filter_ctxt *ctx) { int size = ctx-curoutbuf - ctx-outbuf; char *out; -apr_bucket *b; -if (size + sz = 0) +if ((ctx-outbuf == NULL) || (size =0)) return; -out = apr_palloc(ctx-r-pool, size + sz); -if (size) { -memcpy(out, ctx-outbuf, size); -} -if (buf (sz 0)) { -memcpy(out + size, buf, sz); -} -/* Reset the output buffer position */ +out = apr_palloc(ctx-tpool, size); +memcpy(out, ctx-outbuf, size); +append_bucket(ctx, out, size); ctx-curoutbuf = ctx-outbuf; -b = apr_bucket_pool_create(out, size + sz, ctx-r-pool, - ctx-r-connection-bucket_alloc); -APR_BRIGADE_INSERT_TAIL(ctx-bb, b); } /* This is a call back function. When libsed wants to generate the output, @@ -104,11 +147,38 @@ /* dummy is basically filter context. Context is passed during invocation * of sed_eval_buffer */ +int remainbytes = 0; sed_filter_ctxt *ctx =
Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/
Hi, Attached is the *rough* patch which uses transient buckets in mod_sed output filter. Testing : I created a 30MB and 300MB text files and ran OutputSed commands on the file. * Before the patch, process size (worker mpm with 1 thread) increased up to 300M for single request. After the patch, process size remains to 3MB to server 300M response output. I also removed 1 extra copying for processing output. I need to add some more error handling to finalize the patch. Any comments are welcome. Regards, Basant. On Thu, Sep 04, 2008 at 09:47:26PM -0500, William A. Rowe, Jr. wrote: Basant Kukreja wrote: Based on your suggestion, I will check what are the other improvements from mod_substitute can be brought into mod_sed. Note that mod_substitute's brigade handling is already based on the work of both Jim and Nick (author of mod_line_edit) - so they are pretty certain that it is the right approach. Good idea to borrow from it. Bill Index: modules/filters/mod_sed.c === --- modules/filters/mod_sed.c (revision 692768) +++ modules/filters/mod_sed.c (working copy) @@ -26,7 +26,8 @@ #include libsed.h static const char *sed_filter_name = Sed; -#define MODSED_OUTBUF_SIZE 4000 +#define MODSED_OUTBUF_SIZE 8000 +#define MAX_TRANSIENT_BUCKETS 50 typedef struct sed_expr_config { @@ -44,11 +45,14 @@ typedef struct sed_filter_ctxt { sed_eval_t eval; +ap_filter_t *f; request_rec *r; apr_bucket_brigade *bb; char *outbuf; char *curoutbuf; int bufsize; +apr_pool_t *tpool; +int numbuckets; } sed_filter_ctxt; module AP_MODULE_DECLARE_DATA sed_module; @@ -71,29 +75,68 @@ sed_cfg-last_error = error; } +/* clear the temporary pool (used for transient buckets) + */ +static void clear_ctxpool(sed_filter_ctxt* ctx) +{ +apr_pool_clear(ctx-tpool); +ctx-outbuf = NULL; +ctx-curoutbuf = NULL; +ctx-numbuckets = 0; +} + +/* alloc_outbuf + * allocate output buffer + */ +static void alloc_outbuf(sed_filter_ctxt* ctx) +{ +ctx-outbuf = apr_palloc(ctx-tpool, ctx-bufsize + 1); +ctx-curoutbuf = ctx-outbuf; +} + +/* append_bucket + * Allocate a new bucket from buf and sz and append to ctx-bb + */ +static void append_bucket(sed_filter_ctxt* ctx, char* buf, int sz) +{ +int rv; +apr_bucket *b; +if (ctx-tpool == ctx-r-pool) { +/* We are not using transient bucket */ +b = apr_bucket_pool_create(buf, sz, ctx-r-pool, + ctx-r-connection-bucket_alloc); +APR_BRIGADE_INSERT_TAIL(ctx-bb, b); +} +else { +/* We are using transient bucket */ +b = apr_bucket_transient_create(buf, sz, +ctx-r-connection-bucket_alloc); +APR_BRIGADE_INSERT_TAIL(ctx-bb, b); +ctx-numbuckets++; +if (ctx-numbuckets = MAX_TRANSIENT_BUCKETS) { +b = apr_bucket_flush_create(ctx-r-connection-bucket_alloc); +APR_BRIGADE_INSERT_TAIL(ctx-bb, b); +rv = ap_pass_brigade(ctx-f-next, ctx-bb); +apr_brigade_cleanup(ctx-bb); +clear_ctxpool(ctx); +} +} +} + /* * flush_output_buffer * Flush the output data (stored in ctx-outbuf) */ -static void flush_output_buffer(sed_filter_ctxt *ctx, char* buf, int sz) +static void flush_output_buffer(sed_filter_ctxt *ctx) { int size = ctx-curoutbuf - ctx-outbuf; char *out; -apr_bucket *b; -if (size + sz = 0) +if ((ctx-outbuf == NULL) || (size =0)) return; -out = apr_palloc(ctx-r-pool, size + sz); -if (size) { -memcpy(out, ctx-outbuf, size); -} -if (buf (sz 0)) { -memcpy(out + size, buf, sz); -} -/* Reset the output buffer position */ +out = apr_palloc(ctx-tpool, size); +memcpy(out, ctx-outbuf, size); +append_bucket(ctx, out, size); ctx-curoutbuf = ctx-outbuf; -b = apr_bucket_pool_create(out, size + sz, ctx-r-pool, - ctx-r-connection-bucket_alloc); -APR_BRIGADE_INSERT_TAIL(ctx-bb, b); } /* This is a call back function. When libsed wants to generate the output, @@ -104,11 +147,38 @@ /* dummy is basically filter context. Context is passed during invocation * of sed_eval_buffer */ +int remainbytes = 0; sed_filter_ctxt *ctx = (sed_filter_ctxt *) dummy; -if (((ctx-curoutbuf - ctx-outbuf) + sz) = ctx-bufsize) { -/* flush current buffer */ -flush_output_buffer(ctx, buf, sz); +if (ctx-outbuf == NULL) { +alloc_outbuf(ctx); } +remainbytes = ctx-bufsize - (ctx-curoutbuf - ctx-outbuf); +if (sz = remainbytes) { +if (remainbytes 0) { +memcpy(ctx-curoutbuf, buf, remainbytes); +buf += remainbytes; +sz -= remainbytes; +ctx-curoutbuf += remainbytes; +} +/* buffer is now full */ +append_bucket(ctx,
Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/
Nick Kew wrote: On Thu, 04 Sep 2008 21:47:26 -0500 William A. Rowe, Jr. [EMAIL PROTECTED] wrote: Basant Kukreja wrote: Based on your suggestion, I will check what are the other improvements from mod_substitute can be brought into mod_sed. Note that mod_substitute's brigade handling is already based on the work of both Jim and Nick (author of mod_line_edit) - so they are pretty certain that it is the right approach. Good idea to borrow from it. Um - make that [everyone]. mod_substitute got pulled apart a bit and optimised when Jim dropped it in to trunk, and ISTR several folks contributing to that. Hopefully Rudiger's comments are just the start of the same process of refinement on mod_sed. +1; didn't mean to slight [anyone] - thanks for raising that, Nick :)
Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/
+ OutputSed s/sunday/SUN/g br / I guess it should be InputSed here. Patch for the documentation fix is attached. Regards, Basant. $ svn diff mod_sed.xml Index: mod_sed.xml === --- mod_sed.xml (revision 692275) +++ mod_sed.xml (working copy) @@ -110,8 +110,8 @@ lt;Directory /var/www/docs/sedgt; br / indent AddInputFilter Sed php br / - OutputSed s/monday/MON/g br / - OutputSed s/sunday/SUN/g br / + InputSed s/monday/MON/g br / + InputSed s/sunday/SUN/g br / /indent lt;/Directorygt; br / /indent
Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/
Just a note : sed original code also have ATT copyright. http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ucbcmd/sed/sed1.c http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ucbcmd/sed/sed0.c http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ucbcmd/sed/sed.h Regards, Basant. On Thu, Sep 04, 2008 at 09:20:43AM -0400, Jim Jagielski wrote: Apologies if this was already discussed and resolved, but I see quite a number of: * Copyright (c) 1984 ATT *All Rights Reserved in various files... Can we track the IP of those parts to ensure that we (and Sun) have the required license to use them??
Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/
On Sep 5, 2008, at 4:04 PM, Basant Kumar kukreja wrote: Just a note : sed original code also have ATT copyright. http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ ucbcmd/sed/sed1.c http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ ucbcmd/sed/sed0.c http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ ucbcmd/sed/sed.h Regards, Basant. On Thu, Sep 04, 2008 at 09:20:43AM -0400, Jim Jagielski wrote: Apologies if this was already discussed and resolved, but I see quite a number of: * Copyright (c) 1984 ATT *All Rights Reserved in various files... Can we track the IP of those parts to ensure that we (and Sun) have the required license to use them?? FTR, it is only necessary that Sun do the diligence -- we rely on them to do so as part of the contribution process. Since this is code from the OpenSolaris gate, I know that Sun has a defined legal process to vet their code before publication. From the headers, it looks like this was part of the Unix system licensed to Sun during the big System 5 deal with ATT, and thus I see no problem with accepting the contribution from Sun. However, our license text should be on top of each file, not after the copyright notices. I will fix that when I get a chance. Roy
Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/
On 09/03/2008 01:01 AM, [EMAIL PROTECTED] wrote: Author: niq Date: Tue Sep 2 16:01:47 2008 New Revision: 691418 URL: http://svn.apache.org/viewvc?rev=691418view=rev Log: Commit mod_sed: enable filtering of HTTP Requests and Responses through sed Added: httpd/httpd/trunk/docs/manual/mod/mod_sed.xml httpd/httpd/trunk/modules/filters/libsed.h httpd/httpd/trunk/modules/filters/mod_sed.c httpd/httpd/trunk/modules/filters/regexp.c httpd/httpd/trunk/modules/filters/regexp.h httpd/httpd/trunk/modules/filters/sed.h httpd/httpd/trunk/modules/filters/sed0.c httpd/httpd/trunk/modules/filters/sed1.c Modified: httpd/httpd/trunk/CHANGES httpd/httpd/trunk/modules/filters/config.m4 Modified: httpd/httpd/trunk/CHANGES Added: httpd/httpd/trunk/docs/manual/mod/mod_sed.xml URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/manual/mod/mod_sed.xml?rev=691418view=auto == --- httpd/httpd/trunk/docs/manual/mod/mod_sed.xml (added) +++ httpd/httpd/trunk/docs/manual/mod/mod_sed.xml Tue Sep 2 16:01:47 2008 @@ -0,0 +1,141 @@ +?xml version=1.0? +!DOCTYPE modulesynopsis SYSTEM ../style/modulesynopsis.dtd +?xml-stylesheet type=text/xsl href=../style/manual.en.xsl? +!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the License); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an AS IS BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +-- + +modulesynopsis metafile=mod_sed.xml.meta + +namemod_sed/name +descriptionFiltering Input (request) and Output (response) content using sed commands/description +statusExperimental/status +sourcefilemod_sed.c sed0.c sed1.c regexp.c regexp.h sed.h/sourcefile +identifiersed_module/identifier +compatibilityAvailable in Apache 2.3 and later/compatibility + +summary +p +mod_sed is a in-process content filter. mod_sed filter implement the sed edit +commands implemented by Solaris 10 sed +program as described in a href=http://docs.sun.com/app/docs/doc/816-5165/sed-1b?a=view;man +page/a. However unlike sed, mod_sed doesn't take data from +standard +input. Instead filter act on the entity data sent between client and +server. mod_sed can be used as a input or output filter. mod_sed is a +content filter which means that it can not be used to modify client or +server http headers. +/p +p +mod_sed output filter accept a chunk of data and execute the sed scripts on data and generates the output which is passed to next filter in the filter chain. +/p + +p +mod_sed input filter reads the data from next filter in filter chain and executes the sed scripts and returns the generated data to caller filter in the filter chain. +/p + +p +Both input and output filter only process the data if new line character is seen in the content. At the end of the data, rest of the data is treated as last line. +/p + +pA tutorial article on mod_sed, and why it is more powerful than simple +string or regular expression search and replace, is available in a +href=http://blogs.sun.com/basant/entry/using_mod_sed_to_filter;on +the author's blog/a./p + +/summary + +directivesynopsis +nameOutputSed/name +descriptionSed command for filter the response content/description +syntaxOutputSed varsed-command/var/syntax +contextlistcontextdirectory/contextcontext.htaccess/context +/contextlist + +usage +pThe directiveOutputSed/directive directive specify the sed +command which will be executed on the response. +/p +/usage +/directivesynopsis + +directivesynopsis +nameInputSed/name +descriptionSed command to filter the request data (typically post data)/description +syntaxInputSed varsed-command/var/syntax +contextlistcontextdirectory/contextcontext.htaccess/context +/contextlist + +usage +pThe directiveInputSed/directive directive specify the sed command +which will be executed on the request data e.g POST data. +/p +/usage +/directivesynopsis + +section id=sampleconftitleSample Configuration/title +exampletitleAdding a output filter /title + # In following example, sed filter will replace the string br / + # monday to MON and the string sunday to SUN in html document br / + # before sending to client. br / +indent +lt;Directory /var/www/docs/sedgt; br / + indent + AddOutputFilter Sed html br / + OutputSed
Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/
Apologies if this was already discussed and resolved, but I see quite a number of: * Copyright (c) 1984 ATT *All Rights Reserved in various files... Can we track the IP of those parts to ensure that we (and Sun) have the required license to use them??
Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/
On Thu, 4 Sep 2008 10:12:39 -0400 Jim Jagielski [EMAIL PROTECTED] wrote: On Sep 4, 2008, at 6:38 AM, Ruediger Pluem wrote: More comments possibly later. For the most part, it looks like many of the optimizations in mod_substitute, esp regarding efficient use of buckets, is lacking in mod_sed... Anecdotal and benchmark data suggest that it's competitive in performance terms. Now that it's in svn, we can further improve it. Ruediger's comments look like a start on that. -- Nick Kew Application Development with Apache - the Apache Modules Book http://www.apachetutor.org/