Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/

2008-12-22 Thread Basant Kumar kukreja
* Transient bucket seems to be working fine in mod_sed.
* Added error handling code so that if ap_pass_brigade fails during
  request processing, error is returned to sed_response_filter /
  sed_request_filter.

Testing :
* Compiled with 2.2 branch and make sure there is no regression (against gsed
test cases).
* Compiled with trunk.

Final patch is attached.

Regards,
Basant.



On Mon, Sep 15, 2008 at 12:17:04AM -0700, Basant Kukreja wrote:
 Hi,
 
 Attached is the *rough* patch which uses transient buckets in mod_sed output
 filter.
 
 Testing :
   I created a 30MB and 300MB text files and ran OutputSed commands on the 
 file.
 * Before the patch, process size (worker mpm with 1 thread) increased up to 
 300M 
 for single request.  After the  patch, process size remains to 3MB to server
 300M response output.
 
 I also removed 1 extra copying for processing output.
 
 I need to add some more error handling to finalize the patch. Any comments are
 welcome.
 
 Regards,
 Basant.
 
 On Thu, Sep 04, 2008 at 09:47:26PM -0500, William A. Rowe, Jr. wrote:
  Basant Kukreja wrote:
 
  Based on your suggestion, I will check what are the other improvements from
  mod_substitute can be brought into mod_sed.
 
  Note that mod_substitute's brigade handling is already based on the work of
  both Jim and Nick (author of mod_line_edit) - so they are pretty certain
  that it is the right approach.  Good idea to borrow from it.
 
  Bill

 Index: modules/filters/mod_sed.c
 ===
 --- modules/filters/mod_sed.c (revision 692768)
 +++ modules/filters/mod_sed.c (working copy)
 @@ -26,7 +26,8 @@
  #include libsed.h
  
  static const char *sed_filter_name = Sed;
 -#define MODSED_OUTBUF_SIZE 4000
 +#define MODSED_OUTBUF_SIZE 8000
 +#define MAX_TRANSIENT_BUCKETS 50
  
  typedef struct sed_expr_config
  {
 @@ -44,11 +45,14 @@
  typedef struct sed_filter_ctxt
  {
  sed_eval_t eval;
 +ap_filter_t *f;
  request_rec *r;
  apr_bucket_brigade *bb;
  char *outbuf;
  char *curoutbuf;
  int bufsize;
 +apr_pool_t *tpool;
 +int numbuckets;
  } sed_filter_ctxt;
  
  module AP_MODULE_DECLARE_DATA sed_module;
 @@ -71,29 +75,68 @@
  sed_cfg-last_error = error;
  }
  
 +/* clear the temporary pool (used for transient buckets)
 + */
 +static void clear_ctxpool(sed_filter_ctxt* ctx)
 +{
 +apr_pool_clear(ctx-tpool);
 +ctx-outbuf = NULL;
 +ctx-curoutbuf = NULL;
 +ctx-numbuckets = 0;
 +}
 +
 +/* alloc_outbuf
 + * allocate output buffer
 + */
 +static void alloc_outbuf(sed_filter_ctxt* ctx)
 +{
 +ctx-outbuf = apr_palloc(ctx-tpool, ctx-bufsize + 1);
 +ctx-curoutbuf = ctx-outbuf;
 +}
 +
 +/* append_bucket
 + * Allocate a new bucket from buf and sz and append to ctx-bb
 + */
 +static void append_bucket(sed_filter_ctxt* ctx, char* buf, int sz)
 +{
 +int rv;
 +apr_bucket *b;
 +if (ctx-tpool == ctx-r-pool) {
 +/* We are not using transient bucket */
 +b = apr_bucket_pool_create(buf, sz, ctx-r-pool,
 +   ctx-r-connection-bucket_alloc);
 +APR_BRIGADE_INSERT_TAIL(ctx-bb, b);
 +}
 +else {
 +/* We are using transient bucket */
 +b = apr_bucket_transient_create(buf, sz,
 +ctx-r-connection-bucket_alloc);
 +APR_BRIGADE_INSERT_TAIL(ctx-bb, b);
 +ctx-numbuckets++;
 +if (ctx-numbuckets = MAX_TRANSIENT_BUCKETS) {
 +b = apr_bucket_flush_create(ctx-r-connection-bucket_alloc);
 +APR_BRIGADE_INSERT_TAIL(ctx-bb, b);
 +rv = ap_pass_brigade(ctx-f-next, ctx-bb);
 +apr_brigade_cleanup(ctx-bb);
 +clear_ctxpool(ctx);
 +}
 +}
 +}
 +
  /*
   * flush_output_buffer
   * Flush the  output data (stored in ctx-outbuf)
   */
 -static void flush_output_buffer(sed_filter_ctxt *ctx, char* buf, int sz)
 +static void flush_output_buffer(sed_filter_ctxt *ctx)
  {
  int size = ctx-curoutbuf - ctx-outbuf;
  char *out;
 -apr_bucket *b;
 -if (size + sz = 0)
 +if ((ctx-outbuf == NULL) || (size =0))
  return;
 -out = apr_palloc(ctx-r-pool, size + sz);
 -if (size) {
 -memcpy(out, ctx-outbuf, size);
 -}
 -if (buf  (sz  0)) {
 -memcpy(out + size, buf, sz);
 -}
 -/* Reset the output buffer position */
 +out = apr_palloc(ctx-tpool, size);
 +memcpy(out, ctx-outbuf, size);
 +append_bucket(ctx, out, size);
  ctx-curoutbuf = ctx-outbuf;
 -b = apr_bucket_pool_create(out, size + sz, ctx-r-pool,
 -   ctx-r-connection-bucket_alloc);
 -APR_BRIGADE_INSERT_TAIL(ctx-bb, b);
  }
  
  /* This is a call back function. When libsed wants to generate the output,
 @@ -104,11 +147,38 @@
  /* dummy is basically filter context. Context is passed during invocation
   * of sed_eval_buffer
   */
 +int remainbytes = 0;
  sed_filter_ctxt *ctx = 

Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/

2008-09-15 Thread Basant Kukreja
Hi,

Attached is the *rough* patch which uses transient buckets in mod_sed output
filter.

Testing :
  I created a 30MB and 300MB text files and ran OutputSed commands on the file.
* Before the patch, process size (worker mpm with 1 thread) increased up to 
300M 
for single request.  After the  patch, process size remains to 3MB to server
300M response output.

I also removed 1 extra copying for processing output.

I need to add some more error handling to finalize the patch. Any comments are
welcome.

Regards,
Basant.

On Thu, Sep 04, 2008 at 09:47:26PM -0500, William A. Rowe, Jr. wrote:
 Basant Kukreja wrote:

 Based on your suggestion, I will check what are the other improvements from
 mod_substitute can be brought into mod_sed.

 Note that mod_substitute's brigade handling is already based on the work of
 both Jim and Nick (author of mod_line_edit) - so they are pretty certain
 that it is the right approach.  Good idea to borrow from it.

 Bill
Index: modules/filters/mod_sed.c
===
--- modules/filters/mod_sed.c   (revision 692768)
+++ modules/filters/mod_sed.c   (working copy)
@@ -26,7 +26,8 @@
 #include libsed.h
 
 static const char *sed_filter_name = Sed;
-#define MODSED_OUTBUF_SIZE 4000
+#define MODSED_OUTBUF_SIZE 8000
+#define MAX_TRANSIENT_BUCKETS 50
 
 typedef struct sed_expr_config
 {
@@ -44,11 +45,14 @@
 typedef struct sed_filter_ctxt
 {
 sed_eval_t eval;
+ap_filter_t *f;
 request_rec *r;
 apr_bucket_brigade *bb;
 char *outbuf;
 char *curoutbuf;
 int bufsize;
+apr_pool_t *tpool;
+int numbuckets;
 } sed_filter_ctxt;
 
 module AP_MODULE_DECLARE_DATA sed_module;
@@ -71,29 +75,68 @@
 sed_cfg-last_error = error;
 }
 
+/* clear the temporary pool (used for transient buckets)
+ */
+static void clear_ctxpool(sed_filter_ctxt* ctx)
+{
+apr_pool_clear(ctx-tpool);
+ctx-outbuf = NULL;
+ctx-curoutbuf = NULL;
+ctx-numbuckets = 0;
+}
+
+/* alloc_outbuf
+ * allocate output buffer
+ */
+static void alloc_outbuf(sed_filter_ctxt* ctx)
+{
+ctx-outbuf = apr_palloc(ctx-tpool, ctx-bufsize + 1);
+ctx-curoutbuf = ctx-outbuf;
+}
+
+/* append_bucket
+ * Allocate a new bucket from buf and sz and append to ctx-bb
+ */
+static void append_bucket(sed_filter_ctxt* ctx, char* buf, int sz)
+{
+int rv;
+apr_bucket *b;
+if (ctx-tpool == ctx-r-pool) {
+/* We are not using transient bucket */
+b = apr_bucket_pool_create(buf, sz, ctx-r-pool,
+   ctx-r-connection-bucket_alloc);
+APR_BRIGADE_INSERT_TAIL(ctx-bb, b);
+}
+else {
+/* We are using transient bucket */
+b = apr_bucket_transient_create(buf, sz,
+ctx-r-connection-bucket_alloc);
+APR_BRIGADE_INSERT_TAIL(ctx-bb, b);
+ctx-numbuckets++;
+if (ctx-numbuckets = MAX_TRANSIENT_BUCKETS) {
+b = apr_bucket_flush_create(ctx-r-connection-bucket_alloc);
+APR_BRIGADE_INSERT_TAIL(ctx-bb, b);
+rv = ap_pass_brigade(ctx-f-next, ctx-bb);
+apr_brigade_cleanup(ctx-bb);
+clear_ctxpool(ctx);
+}
+}
+}
+
 /*
  * flush_output_buffer
  * Flush the  output data (stored in ctx-outbuf)
  */
-static void flush_output_buffer(sed_filter_ctxt *ctx, char* buf, int sz)
+static void flush_output_buffer(sed_filter_ctxt *ctx)
 {
 int size = ctx-curoutbuf - ctx-outbuf;
 char *out;
-apr_bucket *b;
-if (size + sz = 0)
+if ((ctx-outbuf == NULL) || (size =0))
 return;
-out = apr_palloc(ctx-r-pool, size + sz);
-if (size) {
-memcpy(out, ctx-outbuf, size);
-}
-if (buf  (sz  0)) {
-memcpy(out + size, buf, sz);
-}
-/* Reset the output buffer position */
+out = apr_palloc(ctx-tpool, size);
+memcpy(out, ctx-outbuf, size);
+append_bucket(ctx, out, size);
 ctx-curoutbuf = ctx-outbuf;
-b = apr_bucket_pool_create(out, size + sz, ctx-r-pool,
-   ctx-r-connection-bucket_alloc);
-APR_BRIGADE_INSERT_TAIL(ctx-bb, b);
 }
 
 /* This is a call back function. When libsed wants to generate the output,
@@ -104,11 +147,38 @@
 /* dummy is basically filter context. Context is passed during invocation
  * of sed_eval_buffer
  */
+int remainbytes = 0;
 sed_filter_ctxt *ctx = (sed_filter_ctxt *) dummy;
-if (((ctx-curoutbuf - ctx-outbuf) + sz) = ctx-bufsize) {
-/* flush current buffer */
-flush_output_buffer(ctx, buf, sz);
+if (ctx-outbuf == NULL) {
+alloc_outbuf(ctx);
 }
+remainbytes = ctx-bufsize - (ctx-curoutbuf - ctx-outbuf);
+if (sz = remainbytes) {
+if (remainbytes  0) {
+memcpy(ctx-curoutbuf, buf, remainbytes);
+buf += remainbytes;
+sz -= remainbytes;
+ctx-curoutbuf += remainbytes;
+}
+/* buffer is now full */
+append_bucket(ctx, 

Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/

2008-09-05 Thread William A. Rowe, Jr.

Nick Kew wrote:

On Thu, 04 Sep 2008 21:47:26 -0500
William A. Rowe, Jr. [EMAIL PROTECTED] wrote:


Basant Kukreja wrote:

Based on your suggestion, I will check what are the other
improvements from mod_substitute can be brought into mod_sed.

Note that mod_substitute's brigade handling is already based on the
work of both Jim and Nick (author of mod_line_edit) - so they are
pretty certain that it is the right approach.  Good idea to borrow
from it.


Um - make that [everyone].  mod_substitute got pulled apart a bit
and optimised when Jim dropped it in to trunk, and ISTR several
folks contributing to that.

Hopefully Rudiger's comments are just the start of the same
process of refinement on mod_sed.


+1; didn't mean to slight [anyone] - thanks for raising that, Nick :)


Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/

2008-09-05 Thread Basant Kumar kukreja
 +   OutputSed s/sunday/SUN/g br /

 I guess it should be InputSed here.


Patch for the documentation fix is attached.

Regards,
Basant.

$ svn diff mod_sed.xml
Index: mod_sed.xml
===
--- mod_sed.xml (revision 692275)
+++ mod_sed.xml (working copy)
@@ -110,8 +110,8 @@
 lt;Directory /var/www/docs/sedgt; br /
indent
AddInputFilter Sed php br /
-   OutputSed s/monday/MON/g br /
-   OutputSed s/sunday/SUN/g br /
+   InputSed s/monday/MON/g br /
+   InputSed s/sunday/SUN/g br /
/indent
 lt;/Directorygt; br /
 /indent


Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/

2008-09-05 Thread Basant Kumar kukreja
Just a note : sed original code also have ATT copyright.
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ucbcmd/sed/sed1.c
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ucbcmd/sed/sed0.c
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ucbcmd/sed/sed.h

Regards,
Basant.

On Thu, Sep 04, 2008 at 09:20:43AM -0400, Jim Jagielski wrote:
 Apologies if this was already discussed and resolved, but I
 see quite a number of:

  *  Copyright (c) 1984 ATT
  *All Rights Reserved

 in various files... Can we track the IP of those parts to
 ensure that we (and Sun) have the required license to use
 them??


Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/

2008-09-05 Thread Roy T. Fielding

On Sep 5, 2008, at 4:04 PM, Basant Kumar kukreja wrote:


Just a note : sed original code also have ATT copyright.
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ 
ucbcmd/sed/sed1.c
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ 
ucbcmd/sed/sed0.c
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ 
ucbcmd/sed/sed.h


Regards,
Basant.

On Thu, Sep 04, 2008 at 09:20:43AM -0400, Jim Jagielski wrote:

Apologies if this was already discussed and resolved, but I
see quite a number of:

 *  Copyright (c) 1984 ATT
 *All Rights Reserved

in various files... Can we track the IP of those parts to
ensure that we (and Sun) have the required license to use
them??


FTR, it is only necessary that Sun do the diligence -- we rely on them
to do so as part of the contribution process.  Since this is code from
the OpenSolaris gate, I know that Sun has a defined legal process
to vet their code before publication.  From the headers, it looks like
this was part of the Unix system licensed to Sun during the big
System 5 deal with ATT, and thus I see no problem with accepting the
contribution from Sun.

However, our license text should be on top of each file, not after the
copyright notices.  I will fix that when I get a chance.

Roy



Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/

2008-09-04 Thread Ruediger Pluem



On 09/03/2008 01:01 AM, [EMAIL PROTECTED] wrote:

Author: niq
Date: Tue Sep  2 16:01:47 2008
New Revision: 691418

URL: http://svn.apache.org/viewvc?rev=691418view=rev
Log:
Commit mod_sed: enable filtering of HTTP Requests and Responses through sed

Added:
httpd/httpd/trunk/docs/manual/mod/mod_sed.xml
httpd/httpd/trunk/modules/filters/libsed.h
httpd/httpd/trunk/modules/filters/mod_sed.c
httpd/httpd/trunk/modules/filters/regexp.c
httpd/httpd/trunk/modules/filters/regexp.h
httpd/httpd/trunk/modules/filters/sed.h
httpd/httpd/trunk/modules/filters/sed0.c
httpd/httpd/trunk/modules/filters/sed1.c
Modified:
httpd/httpd/trunk/CHANGES
httpd/httpd/trunk/modules/filters/config.m4

Modified: httpd/httpd/trunk/CHANGES



Added: httpd/httpd/trunk/docs/manual/mod/mod_sed.xml
URL: 
http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/manual/mod/mod_sed.xml?rev=691418view=auto
==
--- httpd/httpd/trunk/docs/manual/mod/mod_sed.xml (added)
+++ httpd/httpd/trunk/docs/manual/mod/mod_sed.xml Tue Sep  2 16:01:47 2008
@@ -0,0 +1,141 @@
+?xml version=1.0?
+!DOCTYPE modulesynopsis SYSTEM ../style/modulesynopsis.dtd
+?xml-stylesheet type=text/xsl href=../style/manual.en.xsl?
+!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the License); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an AS IS BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+--
+
+modulesynopsis metafile=mod_sed.xml.meta
+
+namemod_sed/name
+descriptionFiltering Input (request) and Output (response) content using sed 
commands/description
+statusExperimental/status
+sourcefilemod_sed.c sed0.c sed1.c regexp.c regexp.h sed.h/sourcefile
+identifiersed_module/identifier
+compatibilityAvailable in Apache 2.3 and later/compatibility
+
+summary
+p
+mod_sed is a in-process content filter. mod_sed filter implement the sed edit
+commands implemented by Solaris 10 sed
+program as described in a 
href=http://docs.sun.com/app/docs/doc/816-5165/sed-1b?a=view;man
+page/a. However unlike sed, mod_sed doesn't take data from
+standard
+input. Instead filter act on the entity data sent between client and
+server. mod_sed can be used as a input or output filter. mod_sed is a
+content filter which means that it can not be used to modify client or
+server http headers.
+/p
+p
+mod_sed output filter accept a chunk of data and execute the sed scripts on 
data and generates the output which is passed to next filter in the filter 
chain.
+/p
+
+p
+mod_sed input filter reads the data from next filter in filter chain and 
executes the sed scripts and returns the generated data to caller filter in the 
filter chain.
+/p
+
+p
+Both input and output filter only process the data if new line character is 
seen in the content. At the end of the data, rest of the data is treated as 
last line.
+/p
+
+pA tutorial article on mod_sed, and why it is more powerful than simple
+string or regular expression search and replace, is available in a
+href=http://blogs.sun.com/basant/entry/using_mod_sed_to_filter;on
+the author's blog/a./p
+
+/summary
+
+directivesynopsis
+nameOutputSed/name
+descriptionSed command for filter the response content/description
+syntaxOutputSed varsed-command/var/syntax
+contextlistcontextdirectory/contextcontext.htaccess/context
+/contextlist
+
+usage
+pThe directiveOutputSed/directive directive specify the sed
+command which will be executed on the response.
+/p
+/usage
+/directivesynopsis
+
+directivesynopsis
+nameInputSed/name
+descriptionSed command to filter the request data (typically post 
data)/description
+syntaxInputSed varsed-command/var/syntax
+contextlistcontextdirectory/contextcontext.htaccess/context
+/contextlist
+
+usage
+pThe directiveInputSed/directive directive specify the sed command
+which will be executed on the request data e.g POST data.
+/p
+/usage
+/directivesynopsis
+
+section id=sampleconftitleSample Configuration/title
+exampletitleAdding a output filter /title
+ # In following example, sed filter will replace the string br /
+ # monday to MON and the string sunday to SUN in html document br 
/
+ # before sending to client. br /
+indent
+lt;Directory /var/www/docs/sedgt; br /
+   indent
+   AddOutputFilter Sed html br /
+   OutputSed 

Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/

2008-09-04 Thread Jim Jagielski

Apologies if this was already discussed and resolved, but I
see quite a number of:

 *  Copyright (c) 1984 ATT
 *All Rights Reserved

in various files... Can we track the IP of those parts to
ensure that we (and Sun) have the required license to use
them??


Re: svn commit: r691418 [1/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/

2008-09-04 Thread Nick Kew
On Thu, 4 Sep 2008 10:12:39 -0400
Jim Jagielski [EMAIL PROTECTED] wrote:

 
 On Sep 4, 2008, at 6:38 AM, Ruediger Pluem wrote:
 
  More comments possibly later.
 
 
 For the most part, it looks like many of the optimizations
 in mod_substitute, esp regarding efficient use of buckets,
 is lacking in mod_sed...

Anecdotal and benchmark data suggest that it's competitive
in performance terms.  Now that it's in svn, we can further
improve it.  Ruediger's comments look like a start on that.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/