Re: Module portability

2021-12-09 Thread Sorin Manolache

On 09/12/2021 02.58, miim wrote:


I am not sure if this is a question for this list or for the APR forum, but it 
seems to me that this forum is more likely to have an answer.

To what extent can I expect a module compiled on system A to be portable to 
system B:

-- if both are running the same distribution of linux, though possibly not the 
same release
-- if system A and system B are both running Apache 2.4.x, though possibly not 
the same release
-- if system A has a custom-built Apache and system B has a distribution 
release of Apache

I'm asking because several people don't want to compile their own modules 
(can't imagine why, it's so easy an Einstein can do it) and asked for 
ready-to-go kits that they can drop in and go.

APR seems to me to be a black box, and if one of my modules calls a shared library ("reslib") that may not be present on another machine, or is present at a different level, it seems to me there are potential problems there; and yet, APR being a black box, it is going to be difficult to force static linking against libraries instead of resident libraries if APR intends to go that way.



Linux distributions ship some 3rd-party modules 
(libapache2-mod-auth-kerb for kerberos authentication for example). I've 
noticed that there are regular updates of apache-2.4 packages without 
offering updates for the 3rd-party modules. So in many situations patch 
version number changes of apache-2.4 do not require rebuilds of the 
3rd-party modules.


The apache source API is guaranteed not to change among releases having 
the same major and minor version numbers.


The immutability of the ABI (binary interface) does not depend only 
on apache; it depends on your build tool chain too. For example, if 
your code contains C++: some time ago there was a change in the ABI of 
C++ standard-library objects (std::string, std::list). (Debian/Ubuntu addressed this 
problem by creating packages with names such as libFOOv5 (note the v5). 
Slowly all libraries migrated to the new ABI and these names were phased 
out. So if you distributed your module as a package, it might fail 
to install, because on one distribution it would depend on libFOO while 
on the other FOO is shipped as libFOOv5.)


Another example: if your module depends on boost or another header-only 
C++ library, then your binary module will contain the code of the 
functions and the static data that it uses from boost. Let us assume 
that you've built it on distribution 1.0 with libboost 1.6 and your code 
uses an object of class X from libboost. On distribution 2.0 someone 
builds another module using boost 1.7 and uses the same object of class 
X. However, class X has a different binary layout in boost 1.7 than in 
boost 1.6. Your module would function correctly as long as it is the 
only module using libboost that is loaded into apache. As soon as you 
have two modules using libboost, the runtime linker will encounter the 
class X symbol twice and retain only one instance. As the layout 
of the retained class X symbol is incompatible with one of the modules 
that uses it, the execution will crash.


There are solutions to such situations. You significantly increase the 
chances of binary compatibility if, when linking your module, you 
instruct the linker to put absolutely all symbols as hidden (private to 
the shared object) except the module structure. So the only global 
symbol of your binary module should be the module structure. (See ld's 
man page, the --version-script option.) Then the runtime linker, when 
loading the two shared objects of the two modules using boost, will load 
two instances of the class X object. They'll have different layouts, 
each used by its respective module and the crash is avoided. (Run "nm 
-aC mod_mymodule.so" to see all symbols of your module and consult nm's 
man page for an explanation of its output. Basically, upper-case letters 
in the second column of the output denote global symbols and lower-case 
letters denote module-local symbols. Ideally the only globally defined 
symbol of your binary should look like this: "012192a0 D 
my_module". Note the upper-case D.)
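As a concrete illustration of the version-script approach: the file and symbol names below are made up for the demonstration, and a plain int stands in for the Apache module structure.

```shell
# Sketch: hide every symbol except the module structure via an ld
# version script. mod.c, mod_mymodule.map, my_module and helper_state
# are hypothetical names, not from the original post.
cat > mod.c <<'EOF'
int helper_state = 42;  /* would be a global symbol without the script */
int my_module = 1;      /* stands in for the Apache module structure */
EOF
cat > mod_mymodule.map <<'EOF'
{
  global: my_module;
  local:  *;
};
EOF
cc -shared -fPIC -Wl,--version-script=mod_mymodule.map -o mod_mymodule.so mod.c
# Only my_module should remain among the defined dynamic symbols:
nm -D mod_mymodule.so | grep -E ' [TDB] '
```

In a real module you would link the .o files of the module the same way; only the module record (the name passed to AP_MODULE_DECLARE_DATA) goes in the global list.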


That said, it is very likely that a module built on one distribution 
is installable and runs correctly on a more recent distribution. The 
situations in which it does not work are really idiosyncratic.


I think you should not worry at all about APR. It is the least likely to 
cause a binary incompatibility between distributions. When the ABI of 
libapr changes it'll be big news.


HTH,
Sorin


Re: Chasing a segfault, part II

2021-10-25 Thread Sorin Manolache

On 26/10/2021 08.18, miim wrote:

ua_pointer = apr_table_get(r->headers_in, "User-Agent");
   /* Find out how long the Apache-supplied string is */
   ualength = strlen(ua_pointer);


If the request does not contain any user-agent then ua_pointer will be 
NULL. strlen of NULL will segfault.


S


Re: Chasing a segfault

2021-10-23 Thread Sorin Manolache

On 23/10/2021 02.49, miim wrote:

I have a relatively simple module which is nonetheless causing Apache to 
intermittently segfault.

I've added debugging trace messages to be sent to the error log, but the lack of anything 
in the log at the time of the segfault leads me to think that the error log is not 
flushed when a message is sent.  For example, a segfault occurs at 00:18:04, last 
previous request was at 00:15:36, so clearly the new request caused the segfault.   But 
not even the "Here I am at the handler entry point" (see below) gets into the 
logfile before the server log reports a segfault taking down Apache.


   /* Retrieve the per-server configuration */
   mod_bc_config *bc_scfg = ap_get_module_config(r->server->module_config,
                                                 &bridcheck_module);
   if (bc_scfg->bc_logdebug & 0x00200)
       ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, r,
                     "mod_bridcheck: Enter bridcheck_handler");


I could turn on core dumping but (a) I am no expert at decoding core dumps and 
(b) I don't want to dump this problem on somebody else.

So ... is there a way to force Apache to flush the error log before proceeding?


Hello,

I think it is not a problem of log flushing. It is just that when a 
segfault occurs the death is sudden because the process is killed by the 
OS and has few chances to handle the error itself.


I am very confident, almost 100% sure, that if you don't see the message 
in the log then the execution has simply not reached it, the segfault 
happened before.


In my opinion it is easier to learn some four or five gdb commands than 
to do anything else when the segfault occurs. There is only one way of 
preventing the death of the process, and that is to place a handler on 
the SIGSEGV signal in your module (see "man signal" or "man sigaction"). 
But there's not much you can do in the signal handler. As said, it is 
much much easier to activate coredumps and learn some commands.


Here's how I do it typically:

In Debian/Ubuntu distributions there is a file named envvars in 
/etc/apache2. If you have such a distribution, edit it as I show below. 
If not, make sure you achieve the same effects by other means.


I put the following two lines:

ulimit -c unlimited
echo 1 > /proc/sys/kernel/core_uses_pid

The first line is an internal shell command saying that there should be 
no size limit on the core file. If you don't have /etc/apache2/envvars 
then this command should be executed in the shell from which you launch 
apache, such that the apache process inherits this configuration.


The second command instructs the kernel to add the process id to the 
name of the core file. Thus, if you have two apache children that dump 
cores at the same time, you'll get two different core files instead of 
a single file into which the kernel writes both cores, making it 
unusable. If you don't have /etc/apache2/envvars you can execute 
this command in any shell; you just need root privileges in order 
to write to /proc/sys/kernel/core_uses_pid.
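To verify that both settings took effect, the following can be run in the shell that will launch apache (reading /proc assumes a Linux system):

```shell
# No size limit on core files for processes started from this shell:
ulimit -c unlimited
# Should print "unlimited":
ulimit -c
# Prints "1" once the echo into /proc has been done as root:
cat /proc/sys/kernel/core_uses_pid
```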


Let us assume you have now the core file and its name is core.12345, 
where 12345 is the process id of the apache child process that died.


Then I start gdb and I execute the following gdb commands at the gdb prompt:

file /usr/sbin/apache2
core-file core.12345
thread apply all bt

The first command loads the apache executable.
The second command loads the core file.
The third command displays the call stacks of all threads of the 
process (bt = backtrace).


You can switch between threads with the command
thread N

where N is the numerical id of the thread you want to switch to.

Once you're in a thread, you can move up and down the call stack with 
the commands "up" and "down". If you compiled your module with debug 
symbols then you can inspect variables with the "print" command, e.g. 
"print bc_scfg". If, for example, the segfault occurred somewhere in a 
libc function, such as malloc, free, strcpy, etc, you may move up the 
call chain to the caller of the libc function, to inspect its arguments.


Besides the necessary "-g" compiler switch for adding debugging symbols, 
I typically add the "-fno-inline -O0" switches. This prevents any code 
optimisation. When I step through a live program in a debugger (not a 
core file, obviously), the instructions are then executed in the 
order written in the program and not rearranged for speed.


You may also debug a live program. "Normal" programs, when debugging, 
are typically launched directly in the debugger. This is not really 
advisable in apache, because it forks. What I do is to let apache start 
normally ("apache2ctl start" or "systemctl start apache2") and then 
attach the debugger to a live apache child process. I launch gdb, then I 
execute the following commands at the gdb prompt:


attach N (where N is the process id of the apache child)
break my_handler (set a breakpoint at one of my functions)
cont (let the process continue running)

Re: IPv4 vs IPv6 addressing

2021-09-15 Thread Sorin Manolache

On 15/09/2021 02.22, miim wrote:


Sorin, thank you.  I now have a small chunk of code that appears to do the job. 
 I do not have access to an IPv6 system to test with but it does identify the 
connection type correctly on my IPv4 system.

I am not sure what APR_UNIX is, but it is referenced in the Apache source.


APR_UNIX denotes the family of Unix sockets. They appear in the file 
system. They are not network sockets, i.e. a remote machine cannot 
connect to a Unix socket. They are used like network sockets for 
communication between processes on the same machine. man 7 unix.


I don't think that apache listens on Unix sockets, but I suppose it 
could. Examples of applications that listen on Unix sockets are the 
docker daemon and the X server.






/* ------------------------------------------------------------------ */
/* Testing code prefatory to including IPv6 support                   */
/*                              BEGINS                                */
/* ------------------------------------------------------------------ */

   switch (r->useragent_addr->family) {

     case APR_INET6:
       ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, r,
                     " Family %d - IPv6", r->useragent_addr->family);
       break;

     case APR_INET:
       ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, r,
                     " Family %d - IPv4", r->useragent_addr->family);
       break;

     case APR_UNIX:
       ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, r,
                     " Family %d - Unix", r->useragent_addr->family);
       break;

     default:
       ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, r,
                     "  Family %d - Unknown", r->useragent_addr->family);
       break;
   }

/* ------------------------------------------------------------------ */
/* Testing code prefatory to including IPv6 support                   */
/*                               ENDS                                 */
/* ------------------------------------------------------------------ */





Re: IPv4 vs IPv6 addressing

2021-09-14 Thread Sorin Manolache

On 14/09/2021 04.58, miim wrote:


I've reviewed the last three years of the list and I can't find a commentary on 
this issue, nor was I able to find one on google.

Consider an incoming request which might have either an IPv4 or an IPv6 address.  The module 
wants to know which one.  It is possible to sscanf the value in r->useragent_ip to see 
which format it matches.  However, this is a relatively expensive operation for a small 
amount of info unless "most" are one or the other; then the test sequence can be 
optimized ... which according to Finagle's Law, will always be the wrong way around on 
somebody else's system.

Is there a more efficient way to do this?



Hello,

I've not tested this, but try inspecting

r->connection->client_addr->family
(or r->useragent_addr->family, it probably points to the same structure)

If family is not initialized (though I think it is), then check 
...->ipaddr_len or ...->salen. Have a look at 
/usr/include/apr-1.0/apr_network_io.h for all the fields of an 
apr_sockaddr_t structure.


Regards,
Sorin



Re: What is the best way of reading of post request body at hooks module from 2 hooks procedures (+) ?

2020-05-18 Thread Sorin Manolache

On 17/05/2020 19.37, CpServiceSPb . wrote:

Could you provide an example of what structure to store it in?
I tried to make a structure, but it was empty by the time execution
reached ap_hook_handler.
I suppose that at the handler step I incorrectly retrieved from that
structure the value saved at the authn step.


Hello,

For example:

your_check_authn_callback(request_rec *r) {
   /* read POST body as you explained */
   MyStruct *obj = apr_pcalloc(r->pool, sizeof(MyStruct));
   /* store what you extracted from the POST body in this obj */
   obj->field1 = ...;
   obj->field2 = ...;
   ...
   /* store your obj in the request configuration */
   ap_set_module_config(r->request_config, &your_module, obj);
}

your_handler_callback(request_rec *r) {
   /* retrieve the object initialized and stored in the auth_callback */
   MyStruct *obj = (MyStruct *)ap_get_module_config(r->request_config, 
&your_module);

   /* use obj->field1, obj->field2, etc according to your needs */
}

You don't have to parse everything in authn_callback, you could just 
read the body, and store it unparsed:


your_check_authn_callback(request_rec *r) {
   /* read POST body as you explained */
   char *body = ...;
   ap_set_module_config(r->request_config, &your_module, body);
   /* extract user/pass/etc from body */
}

your_handler_callback(request_rec *r) {
   /* retrieve the body from where you stored it: */
   char *body = (char *)ap_get_module_config(r->request_config, 
&your_module);

   /* extract the other data from your body */
}

So the body (with ap_get_brigade) is read only once, namely in the first 
callback, the check_authn callback. The second callback just retrieves 
either the raw body from where you stored it or a structure that 
contains data that were extracted from the body in the first callback.


HTH,
Sorin



Sun, 17 May 2020 at 10:32, Sorin Manolache:


On 15/05/2020 23.39, CpServiceSPb . wrote:

I am writing a hook module for Apache2 consisting of two hooked
procedures: ap_hook_check_authn/ap_hook_check_user_id and
ap_hook_handler.
The module reads the content of a POST request, mainly the body, and
extracts the information needed by each of the procedures mentioned
above.
For the 1st one it is the user/password passed by the client in the
POST request body; for the 2nd one it is some other data.
That is, reading the POST request body from brigades/buckets and
saving it into a char buffer is done twice: from ap_hook_check_authn
and from ap_hook_handler.
But after calling the body-reading function for the first time from
ap_hook_check_authn, any further call of the function (from
ap_hook_handler) returns an empty body buffer.
Even another call of the body-reading function from ap_hook_check
already returns an empty body buffer.

I read the POST body into a buffer in the following way:
1. apr_brigade_create
2. a do-while loop until eos_bucket is set to 1
3. within the loop of point 2, ap_get_brigade
4. in for (bucket = APR_BRIGADE_FIRST(bb); bucket !=
APR_BRIGADE_SENTINEL(bb); bucket = APR_BUCKET_NEXT(bucket)):
if the bucket is EOS then eos_bucket is set to 1,
else (a transient bucket) apr_bucket_read is called and all chunks are
read into the buffer
5. finally the body buffer and its length are obtained

But during such a POST body read all buckets are deleted from the
brigade (not by me).


Hello,

I think you are doing everything fine.

You just cannot read the post body several times, because reading it is
consuming it.

So you just have to read it only once (in ap_hook_check_authn) and store
it yourself in a structure that belongs to your module. Then you can
reuse that structure in ap_hook_handler.

HTH,
Sorin



So what is the best way to read the POST body content into a buffer as
many times as necessary, both at the ap_hook_check_authn step and at
the ap_hook_handler step, without losing the content?
Or maybe it is more efficient to read the POST body content once at
ap_hook_check_authn and then pass it to ap_hook_handler?
But I don't understand how.

P. S.:
Calling

r->kept_body = apr_brigade_create(r->pool, r->connection->bucket_alloc);
apr_bucket *bucketnew = apr_bucket_transient_create(bodycontent, bodysize,
r->connection->bucket_alloc);
APR_BRIGADE_INSERT_TAIL(r->kept_body, bucketnew);

after the 1st call of the body-reading function worked some versions
ago (maybe even only in 2.2).
But it doesn't work now.










Re: What is the best way of reading of post request body at hooks module from 2 hooks procedures (+) ?

2020-05-17 Thread Sorin Manolache

On 15/05/2020 23.39, CpServiceSPb . wrote:

I am writing a hook module for Apache2 consisting of two hooked
procedures: ap_hook_check_authn/ap_hook_check_user_id and ap_hook_handler.
The module reads the content of a POST request, mainly the body, and
extracts the information needed by each of the procedures mentioned above.
For the 1st one it is the user/password passed by the client in the POST
request body; for the 2nd one it is some other data.
That is, reading the POST request body from brigades/buckets and saving
it into a char buffer is done twice: from ap_hook_check_authn and from
ap_hook_handler.
But after calling the body-reading function for the first time from
ap_hook_check_authn, any further call of the function (from
ap_hook_handler) returns an empty body buffer.
Even another call of the body-reading function from ap_hook_check
already returns an empty body buffer.

I read the POST body into a buffer in the following way:
1. apr_brigade_create
2. a do-while loop until eos_bucket is set to 1
3. within the loop of point 2, ap_get_brigade
4. in for (bucket = APR_BRIGADE_FIRST(bb); bucket !=
APR_BRIGADE_SENTINEL(bb); bucket = APR_BUCKET_NEXT(bucket)):
if the bucket is EOS then eos_bucket is set to 1,
else (a transient bucket) apr_bucket_read is called and all chunks are
read into the buffer
5. finally the body buffer and its length are obtained

But during such a POST body read all buckets are deleted from the
brigade (not by me).


Hello,

I think you are doing everything fine.

You just cannot read the post body several times, because reading it is 
consuming it.


So you just have to read it only once (in ap_hook_check_authn) and store 
it yourself in a structure that belongs to your module. Then you can 
reuse that structure in ap_hook_handler.


HTH,
Sorin



So what is the best way to read the POST body content into a buffer as
many times as necessary, both at the ap_hook_check_authn step and at the
ap_hook_handler step, without losing the content?
Or maybe it is more efficient to read the POST body content once at
ap_hook_check_authn and then pass it to ap_hook_handler?
But I don't understand how.

P. S.:
Calling

r->kept_body = apr_brigade_create(r->pool, r->connection->bucket_alloc);
apr_bucket *bucketnew = apr_bucket_transient_create(bodycontent, bodysize,
r->connection->bucket_alloc);
APR_BRIGADE_INSERT_TAIL(r->kept_body, bucketnew);

after the 1st call of the body-reading function worked some versions ago
(maybe even only in 2.2).
But it doesn't work now.





Re: Testing module without Apache

2019-11-02 Thread Sorin Manolache

On 02/11/2019 11.00, Ervin Hegedüs wrote:

Hi,

this is just a theoretical question: is there any way to test/use
an Apache module without Apache?



Hello,

AFAIK no, but I didn't research it much.

There are however best practices that try to come as close as possible 
to a "reasonable" degree of confidence in your code.


You can separate your code into an application-specific, 
apache-independent library on one hand and generic, 
application-independent, apache-dependent glue code on the other hand, 
and then test only the apache-independent library and "have 
faith" in your glue code.


The apache-independent library must not refer to ap_* functions. So no 
ap_hook_*, ap_register_*_filter, ap_get|pass_brigade, ap_log_*, 
ap_rwrite, ap_rputs, ap_rprintf and many many other functions in your 
application-specific and apache-independent lib.


Your application-specific lib may however use the data structures of 
apache (request_rec, server_rec, ap_filter_t) and of libapr (apr_tables, 
apr_pools etc). It's your responsibility however to initialize them in 
your test code, outside your lib. You cannot rely on ap_read_request for 
instance in order to initialize your request_rec structure.


HTH,
Sorin


Re: Is it safe to stop reading a bucket brigade in an input filter before the end?

2019-05-19 Thread Sorin Manolache

On 16/05/2019 00.26, Paul Callahan wrote:

I have an apache body filter that copies off data from the incoming
request.   But I terminate the reading of the data if the size of the
accumulated data is over some limit.

This is my function, it just breaks  out of the loop when the max is read.
  It seems to work ok.   Do I need to do any additional cleanup or anything
because I did not go to the end?


Hello,

I suppose your code is ok.

Keep in mind that apache must clear the request body in order to be able 
to parse and serve a new request that arrives on the same, reused TCP 
connection.


So towards the end of the processing of the current request, apache 
calls a function, I think it's called ap_discard_request_body. This 
function will read data from the network socket and pass it to the 
filter chain. So your input filter will be called again (unless you 
called ap_remove_input_filter somewhere), after you've already sent the 
response to the client.


Best regards,
Sorin




Thank you

// returns 0 after desired length is read
int my_append_data(const char *data, apr_size_t len, void *request_ctx);
void my_body_read_done(void *request_ctx);

apr_status_t my_body_input_filter(ap_filter_t *f, apr_bucket_brigade *out_bb,
                                  ap_input_mode_t mode, apr_read_type_e block,
                                  apr_off_t nbytes, void *request_ctx) {
 request_rec *r = f->r;
 conn_rec *c = r->connection;

 apr_bucket_brigade *tmp_bb;
 int ret;

 tmp_bb = apr_brigade_create(r->pool, c->bucket_alloc);
 if (APR_BRIGADE_EMPTY(tmp_bb)) {
 ret = ap_get_brigade(f->next, tmp_bb, mode, block, nbytes);

 if (mode == AP_MODE_EATCRLF || ret != APR_SUCCESS)
 return ret;
 }

 while (!APR_BRIGADE_EMPTY(tmp_bb)) {
 apr_bucket *in_bucket = APR_BRIGADE_FIRST(tmp_bb);
 apr_bucket *out_bucket;
 const char *data;
 apr_size_t len;

 if (APR_BUCKET_IS_EOS(in_bucket)) {
 APR_BUCKET_REMOVE(in_bucket);
 APR_BRIGADE_INSERT_TAIL(out_bb, in_bucket);
 my_body_read_done(request_ctx);
 break;
 }

 ret = apr_bucket_read(in_bucket, &data, &len, block);
 if (ret != APR_SUCCESS) {
 return ret;
 }


 // copy read data up to a limit of 1mb, then stop.
 if (!my_append_data(data, len, request_ctx)) {
 apr_bucket_delete(in_bucket);
 break;
 }

 out_bucket = apr_bucket_heap_create(data, len, 0, c->bucket_alloc);
 APR_BRIGADE_INSERT_TAIL(out_bb, out_bucket);
 apr_bucket_delete(in_bucket);
 }
 return APR_SUCCESS;
}





Re: request_rec.unparsed_uri missing scheme and host. parsed_uri missing most fields

2019-05-14 Thread Sorin Manolache

On 14/05/2019 20.35, Paul Callahan wrote:

Hello,
I'm having trouble getting the full uri of a request from request_rec.
  The comment string for request_rec.unparsed_uri makes it sound like it
should have the entire url, e.g. http://hostname/path?etc.

But it only has the path and the query parameters.

The parsed_uri struct is populated with port, path and query parameters.
  Everything else (scheme, hostname, username, password, etc) is null.

I set a breakpoint in "apr_uri_parse()" and verified the incoming *uri
field only has the path and query parameters.

Is this expected?How can I get the full URI?


Hello,

Yes, it is expected.

When the client (meaning a program, not a human) makes a request, it 
sends the following first line over the network connection:


GET /path?arg1=val1&arg2=val2 HTTP/1.1

(I assume here that it uses the version 1.1 of the HTTP protocol.)

In HTTP/1.1 a "Host" header must be present (it is not present in 
HTTP/1.0, but there is little HTTP/1.0 traffic nowadays).


So you might get

GET /path?arg1=val1&arg2=val2 HTTP/1.1
Host: www.example.com

A browser will decompose the address 
http://www.example.com/path?arg1=val1&arg2=val2 that you type in its 
address bar and generate the two text lines shown above.


But the server will not receive the string 
http://www.example.com/path?arg1=val1&arg2=val2


Moreover, http:// or https:// are not sent by the client. It's the 
server (apache) that determines (reconstructs) the scheme (i.e. http:// 
or https://) from the port and transport protocol (SSL/TLS or plain 
text) used by the request.


The HTTP RFC (https://tools.ietf.org/html/rfc7230) has more details. 
Especially section 5.3 might be of interest to you.


HTH,
Sorin


Re: Modul command directive arguments

2019-04-16 Thread Sorin Manolache

On 15/04/2019 22.39, Ervin Hegedüs wrote:

Hi,

I'm playing with a module, and found an interesting function.

Example:

const command_rec module_directives[] = {
...
 AP_INIT_TAKE23 (
 "DirectiveCmd",
 cmd,
 NULL,
 CMD_SCOPE_ANY,
 "Directive Command"
 ),

...
extern const command_rec module_directives[];

module AP_MODULE_DECLARE_DATA foo_module = {
 STANDARD20_MODULE_STUFF,
 create_directory_config,
 merge_directory_configs,
 NULL,
 NULL,
 module_directives,
 register_hooks
};

And now if there is a command directive in the config, eg:

DirectiveCmd Arg1 "Arg2 re(foo)"

then I get the unescaped form of the 2nd argument: "Arg2 re(\\foo)" (and of
course, it looks like all arguments are unescaped).

(It's new for me, because so far I've always used the "raw" stream reader
functions (eg. fread()) - nevermind :).)

Could anybody please tell me which function parses the config file and
performs this unescaping (inside the Apache API)?


Hello,

The function that extracts words from a line of text is ap_getword_conf 
declared in httpd.h.


It is called from ap_build_config, declared in http_config.h.

The root of the call-chain is ap_read_config, declared in http_config.h.

Best regards,
Sorin


Re: How to read data in a request handler and then return DECLINED without consuming the data in the bucket brigade?

2018-06-04 Thread Sorin Manolache

On 2018-06-04 08:27, Paul Callahan wrote:

In apache modules, my understanding is if a handler declines a request, the
request is passed on to the next suitable handler.   I'm finding though if
I read the bucket_brigade/request body, and then decline the request, the
subsequent handler doesn't get any data.  It is like the act of reading the
bucket brigade consumes it.

I would like to have a request handler read the data, do some task (in this
case just count bytes), and decline the request without consuming the data
for the next handler.


Hello,

As far as I know, there is no simple way to do that.

Other handlers do something similar to what you've done, namely they 
call ap_get_brigade(r->input_filters, ...).


So in order for them to still read something, you'll have to

* write an input filter
* in your handler you read the request body and store it somewhere
* afterwards in your handler you add your input filter to the chain of 
input filters (ap_add_input_filter)

* your handler declines.

Then other handlers will call ap_get_brigade which will call your input 
filter. Your input filter will not call any other filters but will copy 
the stored body to the brigade passed in the call to your filter. Your 
filter will give the illusion to other handlers that they are reading 
from the network.


HTH,
Sorin



Thank you.

 int my_declining_handler(request_rec *r) {
 apr_status_t status;
 int end = 0;
 apr_size_t bytes, count = 0;
 const char *buf;
 apr_bucket *b;
 apr_bucket_brigade *temp_brigade;

 // here: header check for content-length/transfer encoding

 temp_brigade = apr_brigade_create(r->pool,
r->connection->bucket_alloc);
 do {
 status = ap_get_brigade(r->input_filters, temp_brigade,
AP_MODE_READBYTES, APR_BLOCK_READ, BUFLEN);
 if (status == APR_SUCCESS) {
 /* Loop over the contents of temp_brigade */
 for (b = APR_BRIGADE_FIRST(temp_brigade);
  b != APR_BRIGADE_SENTINEL(temp_brigade);
  b = APR_BUCKET_NEXT(b)) {
 if (APR_BUCKET_IS_EOS(b)) {
 end = 1;
 break;
 }
 else if (APR_BUCKET_IS_METADATA(b)) {
 continue;
 }
 bytes = BUFLEN;
 status = apr_bucket_read(b, &buf, &bytes,
  APR_BLOCK_READ);
 count += bytes;

 apr_bucket_delete(b);
 }
 }
 apr_brigade_cleanup(temp_brigade);
 } while (!end && (status == APR_SUCCESS));
 if (status == APR_SUCCESS) {
 return DECLINED;
 } else {
 return HTTP_INTERNAL_SERVER_ERROR;
 }
 }





Re: How to create ssl backend connections in a module?

2017-06-29 Thread Sorin Manolache

On 2017-06-29 19:36, Christoph Rabel wrote:

Hi,

I have written an apache module that sometimes connects to a backend
server. Currently it does that through http, open a socket, send a get
request, get a response, process it. Nothing special.

Now we need to support https too and I am wondering, how that could be
accomplished.
Should I use openssl directly? Does that work? Are there any helper
functions I could use?

I tried to find examples, but it is quite difficult since most of the
examples cover configuration of ssl, not implementation of a ssl socket.

I was also looking at mod_proxy but I don't understand how that stuff with
the worker works. It's a lot of code and in the end I just need to open an
ssl socket and I guess I can do the rest the same way as before.

Any hints are appreciated.
I should support Apache 2.2, but I might be able to weaken that to support
only Apache 2.4, if that makes a huge difference.


How do you do it now, in plain http? I see a few ways in which 
you might do it: using apache subrequests (ap_sub_req_method_uri), using 
mod_proxy (no code, just conf, like ProxyPass), or using a 3rd-party 
library, such as libcurl or libneon.


Or do you do it "manually", i.e. using the syscalls 
socket/connect/write: you write to the socket and implement the http 
protocol yourself?


The good news about the first three options is that they work with ssl 
without code modification. You just configure the URL of the backend and 
it recognizes https and performs the SSL handshake and communication.


In my opinion (but it depends on your use case), the best option is 
mod_proxy. Check this generic way of configuring it:




RewriteEngine On

RewriteCond  some_condition
RewriteRule  .*  https://remote.host/path/to/remote/resource?args [P]

<Proxy https://remote.host/path/to/remote/resource>
ProxyPass https://remote.host/path/to/remote/resource keepalive=On timeout=5
</Proxy>


Your module processes requests to /your_url. If it has to make the 
request to the backend, then it sets some apache note or environment 
variable. The value of this variable is then checked in the RewriteCond. 
If the condition is satisfied then the request to /your_url is proxied 
to the remote.host backend. The response of the backend is then sent to 
your client.


If you want to modify the response of the backend, or to send a 
completely different response to the client (and then you just use some 
data from the backend's response) then you write a filter and you 
activate it with the SetOutputFilter conf directive.


This setup works with http and https. You just put the right scheme in 
the URLs in the conf.


Hope this helps,
Sorin



Tia,

Christoph





Re: Change the content-length header for other filters

2016-12-21 Thread Sorin Manolache

On 2016-12-21 22:10, André Rothe wrote:

Hi,

I have a filter, which changes the content length of a POST request.
There are some key-value-pairs of the request, which the filter removes
before other filters process the request.

But after my filter completes the request processing, I'll get:

Sending error response: The request contained fewer content data than
specified by the content-length header

I have tried to change the header key "Content-Length" and set the
new value like:

apr_table_set(f->r->headers_in, "Content-Length",
apr_psprintf(f->r->pool, "%ld", len));

but it has no effect outside of my filter. The incoming request has a
content length of 1107 bytes. I modify the bucket brigade and it
contains at the end of my filter code only 1074 bytes (which is also
stored into "len").

What can I do to send the new content length along the filter chain?

Thank you
André


Hello,

Could you please give us more details about how the body of the POST 
request is read? Is it read in a third-party handler? Is it read by a 
standard apache module such as mod_proxy? If it's a third-party handler, 
do you happen to have the code?


Why I'm asking: it may happen that the reader (i.e. the code that 
triggers the chain of input filters) first reads the Content-Length 
header and then attempts to read N bytes, where N is the value of the 
Content-Length header. In that case there is no use setting 
Content-Length in your filter, because the reader has already read the 
header's value before your filter had the opportunity to change it.


A well-behaved reader should read until it finds an EOS bucket in the 
retrieved brigade. It should not rely on Content-Length. A trivial 
example of why it should not use Content-Length is request body 
compression. A reader would get the brigade filtered by the INFLATE 
filter of mod_deflate, which contains many more bytes than indicated by 
Content-Length, as that header contains the size of the compressed body.


Best regards,
Sorin



Re: modify request_rec->args

2016-03-25 Thread Sorin Manolache

On 2016-03-25 00:59, Justin Kennedy wrote:

Hello,

I have a simple module, with just a quick_hander, it's sole function is to
check if there is a specific key=value on the query string, and modify the
value, so it gets picked up by a separate module.

For example: if "foo=1" is in r->args, then replace it with "foo=0",
decline the request so it gets picked up by the other module.

In my first attempt, I created a new string and assigned the pointer to
r->args, but it doesn't seem to "stick" when it gets to the second module.
Do I have to modify r->args directly, without changing the pointer? It's
been awhile since I've worked with C strings.



You don't need a module to do that. You can use some mod_rewrite 
directives that you place inside your  or :


RewriteEngine On

RewriteCond %{QUERY_STRING} ^(|.*&)foo=([^&]*)(&.*|$)
RewriteRule (.*) $1?%1foo=new_value%3

--
Sorin



Re: apr_shm_create succeeds then fails on Mac OS X

2015-12-27 Thread Sorin Manolache

On 2015-12-25 19:36, Tapple Gao wrote:

Hi. I’m trying to get mod_tile working on the builtin apache in Mac OS X El 
Capitan. I am running into a problem with apr_shm_create failing to allocate 
memory during ap_hook_post_config:
[Fri Dec 25 12:09:17.898197 2015] [tile:error] [pid 22431] Successfully create 
shared memory segment size 888 on file /tmp/httpd_shm.22431
[Fri Dec 25 12:09:17.898285 2015] [tile:error] [pid 22431] (12)Cannot allocate 
memory: Failed to create shared memory segment size 2401448 on file 
/tmp/httpd_shm_delay.22431

Is there something I need to configure to get this shared memory working, or 
increase the limit? This module is most often run on Ubuntu linux, where it’s 
been running for years to power openstreetmap.org


 /*
  * Create a unique filename using our pid. This information is
  * stashed in the global variable so the children inherit it.
  * TODO get the location from the environment $TMPDIR or somesuch.
  */
 shmfilename = apr_psprintf(pconf, "/tmp/httpd_shm.%ld", (long int)getpid());
 shmfilename_delaypool = apr_psprintf(pconf, "/tmp/httpd_shm_delay.%ld", (long int)getpid());



I think that the location of the shmfile must be on a filesystem of a 
special type, namely tmpfs, which maps in memory and not on disk. 
Execute "mount" and check if you have such filesystems mounted. For 
example on my Linux machine:


$ mount
/dev/sda6 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
tmpfs on /run/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=398160k)
/dev/sda9 on /tmp type ext4 (rw,relatime,data=ordered)

As you see, / and /tmp are on disk partitions while /run/shm has a 
filesystem of type tmpfs.


I suggest to change the code and use a different location from /tmp/... 
On Linux the shared memory is often created in /run/shm/*. I have no 
experience with Mac OS X.


The cleanest way would be to implement what's written in the comment in 
the code above, namely the possibility to specify the path via an 
environment variable or a configuration directive.


Sorin



Re: Apache add content-length

2015-12-09 Thread Sorin Manolache

On 2015-12-09 15:02, Pechoultres Xavier wrote:

Hi everybody,

I wrote an apache (2.2.x) module for special c++ app. It works fine for 
many years but a problem resist to me.

Apache automatically add a content-length header even I send a 
Transfer-Encoding: chunked

I set Transfer-Encoding: chunked in cgi module, I’m sure I do not set a 
content-length here.
I unset Content-Length in my module:  apr_table_unset(r->headers_out, 
"Content-Length");

I’ve tried lot of stuff without success, and Apache always add a 
content-length. This create bug with recent WebCore application.

If somebody can give me the light !


I've checked the sources of the apache server. Apache adds some filters 
to the request processing chain, two of them being 
ap_content_length_filter and ap_http_header_filter. 
ap_content_length_filter executes before ap_http_header_filter.


ap_content_length_filter sets the Content-Length header if it can 
determine the size of the response body. ap_http_header_filter sets 
r->chunked only if the Content-Length header is not set. The response is 
chunked and the "Transfer-Encoding: chunked" header is set only if 
r->chunked = 1.


In short: if apache can determine the size of the response message then 
you cannot force it to send "Transfer-Encoding: chunked" and it will 
always set the Content-Length header.


--
Sorin




Re: Tracking sent responses

2015-11-06 Thread Sorin Manolache

On 2015-11-06 15:12, Julien FROMENT wrote:

Hello,



We would like to use Apache HTTP Server to keep track of exactly what
part of the response was sent over a socket. Apache could call an API
asynchronously with the number of bytes sent over the socket for a given
request.



Here is the pseudo code:

   -- Client send a request

   -- Apache processes the request and send it to the backend server

   ...

   -- The backend server returns the response

   -- Apache sends the response to the client

   -- Apache calls Async API with the number of bytes sent



Relaying the client request to a backend server may be realised with the 
ProxyPass and RewriteRule directives.


Apache may log the number of bytes sent. See 
http://httpd.apache.org/docs/current/mod/mod_log_config.html#formats, 
the %B and %O flags.


The log can be configured to be

1. appended to a file
2. sent to syslog (which in turn may forward it over udp/tcp to a log-host)
3. piped to an external program

The first two options do not require any development from your part. If 
you really need that the number of bytes is sent to an http server, then 
you could write an external program that reads one line at a time from 
standard input and sends the line that it read to an http server.


Regards,
Sorin



Re: server directives are lost when activating module in vhost

2015-10-21 Thread Sorin Manolache

On 2015-10-21 15:45, Justin Kennedy wrote:

Greetings,

I have these two directives specified in the root httpd.conf:
ServerTokens Prod
ServerSignature Off

Those directives are being honored and all is well, until I activate my
module within a virtual host. Once that happens, these directives are
ignored.

Is it possible for my module to be interfering with the other directives
outside of my module configuration? If so, I'm thinking this could this be
happening in my merge configuration hook, even though I only deal with
directives related to my module.

Any ideas?


I cannot see how a module could interfere with the directives of 
another module (unless it handles the same directives).


What happens if you disable your merge hook? Can you post your merge 
callback here?



For debugging, it would be helpful if I could output the value of this
directive in the various methods of my module. How can I access the value
of this directive from within my module?


These directives belong to the "core" module. So, in theory, you should

core_conf_object = ap_get_module_config(conf_vector, &core_module);

and then read the values from the core_conf_object.

In practice however you can't do that because neither the address of 
core_module nor the definition of the core_conf structure are available 
to third-party modules.


--
Sorin


Re: graceful child process exit

2015-09-21 Thread Sorin Manolache

On 2015-09-21 00:45, Massimo Manghi wrote:

Hi,

I'm working on an issue with mod_rivet, a content generator module that
embeds in Apache the interpreter of the Tcl scripting language. I won't
bother you with the details of the problem (but if anyone is interested
I'm ready to answer any questions) and let me put the question in short
form:

we need to give the module the ability to gently exit a child process,
something like the function clean_child_exit I found in both prefork.c
and worker.c MPMs that would do exactly what I need (delete the pChild
pool and trigger the associated cleanup functions) but I could not find
a public interface to it. Is there a public interface to achieve this
functionality? The function ap_mpm_safe_kill at first looked a good
candidate but I could not find documented if it's *the right way*


Have a look at apr_pool_cleanup_register.

I don't have pleasant memories of process pools. The problem is that 
sometimes apache children take a long time to exit. When this happens 
the parent sends them signals in order to stop them, the signals 
becoming progressively stronger (first SIGTERM, then SIGKILL, if I 
remember correctly). I do not remember the details, but I've been 
getting segfaults at process exit, so I've steered clear of process 
pools since.


Check whether the conf pool is what you need. It is cleaned up every 
time the apache configuration is reloaded (the parent apache process 
stays alive, the apache worker children are stopped, and a new generation 
of apache children is created). You can use the second callback of 
apr_pool_cleanup_register to clean up things in the children.


The difference between the conf pool and the process pool is the 
following: the process pool is passed to the post_config and child_init 
callbacks. The conf pool is not passed to child_init. So, in order to 
initialise things in the conf_pool you'll need to set a callback in the 
post_config hook. The post_config hook is executed as root in the parent 
process, before it forks a new generation of children, every time the 
apache configuration is reread (after each apache2ctl graceful). The 
child_init hook is executed in the apache child every time the child 
process is created and it executes without root privileges.



Sorin



Re: You don't have permission to access /asd on this server

2015-07-07 Thread Sorin Manolache

On 2015-07-07 13:58, Prakash Premkumar wrote:


I added my Set Handler as follows


 
 SetHandler example_module
 


when I try to access localhost/ , I get "It Works!" screen.

But when I try to access some path in localhost like localhost/asd I get
the following error

Forbidden

You don't have permission to access /asd on this server. Server unable to
read htaccess file, denying access to be safe

Can you please help me solve this ?


I think the problem is not the code but the conf.

 Your <Location> block does not match /asd. Check 
http://httpd.apache.org/docs/2.4/mod/core.html#location


Use a <Location> that also matches /asd instead.

Sorin







Re: Best practice for handling synchronous signals SIGFPE etc

2015-04-20 Thread Sorin Manolache

On 2015-04-20 21:50, Mark Taylor wrote:

I found that ./server/mpm_unix.c is registering a handler (sig_coredump)
for SIGFPE, SIGSEGV, and other synchronous signals.  I'd like to handle at
least these two in my module, using my own handler.  But then how do I
determine if the handler is called on a request thread or a server
thread? And I'd like to default to still run the sig_coredump() function if
it's signal is not in my module.



Have a look at the man pages of sigaction and getcontext. When you set a 
signal handler you get the old signal handler (3rd argument of 
sigaction), so you can store it in a variable. In your own signal 
handler you do what you intend to do, and at the end you call the old 
signal handler. In this way you can call sig_coredump. However, you have 
to make sure that you set your signal handler _after_ apache has set 
its. Otherwise apache will replace yours.


Have a look at the siginfo_t structure that is passed by the OS to your 
handler. You can get the process ID and the user ID from that structure, 
but apparently not the thread. Anyway, at least you can determine whether 
the signal was raised in the parent or in one of the worker children.


Look also at the ucontext structure (man getcontext) that is passed to 
your signal handler. Maybe you can determine the source of the signal 
from that structure, though I think it's too close to machine code and 
registers to be useful.


Alternately, you could use a thread-local variable 
(https://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Thread-Local.html). The 
first thing you do when you enter each function of your module is to set 
the variable. Whenever you exit a function you reset it. Thus, you may 
determine in your signal handler by inspection of the variable if the 
signal was raised by your module. (This works only if the signal handler 
is executed in the thread where the signal was raised which is not 
always the case. Otherwise you'll set some variable in your thread and 
read another one in the handler. Here's some information: 
http://stackoverflow.com/questions/11679568/signal-handling-with-multiple-threads-in-linux. 
Apparently the handlers for SIGSEGV and SIGFPE are called in the thread 
that raised them but it's not clear.)


Sorin




Re: is ap_hook_log_transaction the wright place where to write my stats code ?

2014-12-07 Thread Sorin Manolache

On 2014-12-07 01:36, nik600 wrote:

Dear all

i've written a custom module to handle the cache of my CMS system.

Basically this module works in ap_hook_translate_name and decides
(following some custom logic) if the request can be served or not from the
cache.

If yes, the r->uri is changed to be served locally from my cache dir
if no, the r->filename is changed to be server from
a proxy:balancer://cluster config

In the ap_hook_log_transaction i'd like to compute the time of content
generation using:


float request_duration_sec = (float)(apr_time_now() - r->request_time) / 1000000;

Is this approach correct?

Is there any other hook more appropriate to do that?

Thanks all in advance.


Hello,

I think your approach is correct. However, have a look at the LogFormat 
directive. The %D format specifier might do exactly what you want, and 
it is implemented in log_transaction.


Have a look at the RewriteCond and RewriteRule directives. The selective 
proxying could be implemented by these directives without writing any code.


E.g.:


RewriteEngine On
RewriteCond ...
RewriteCond ...
RewriteRule .*  http://proxy/path?args keepalive=on [P]


<Proxy http://proxy/path>
...
</Proxy>

I have not tried it with balancers but I think it works and the 
configuration is similar (balancer:// instead of http://).


Sorin



Bye





Re: Mod_proxy is truncating Response Header size

2014-11-26 Thread Sorin Manolache

On 2014-11-26 13:46, KPONVI Komlan (EXT) wrote:

Hi everyone ,

I am having an issue with mod_proxy.

I have to forward request to a server which add a header into the  response.

The size of that header is higher than 8k, and  i notice that mod_proxy
truncate that header before forward back the response to the initial caller.


Try changing the value of this configuration directive:

http://httpd.apache.org/docs/2.4/mod/core.html#limitrequestfieldsize




Did somebody face that kind of problem?

Cordialement,

komlan KPONVI

e-mail : komlan.kponvi-...@ca-technologies.fr









Re: output filter needs to redirect to 503 error status

2014-10-16 Thread Sorin Manolache

On 2014-10-16 22:35, Eric Johanson wrote:

Thank you for the suggestion.  I did try that, but the order in which you
set f->r->status and call ap_pass_brigade doesn't seem to really make a
difference.

Basically what happens is that the browsers don't like the format of the
HTTP response packet.  They complain that there is "extra unexpected data
after the end of the response."  Oddly, when I use the Linux "wget" command,
it doesn't complain.  I may just write a handler hook specifically for
returning error codes.  Then when the filter modules has an error, it can
call ap_internal_redirect to a special page whose exclusive purpose is to be
captured by my error handler to return an HTTP status code.

Any other suggestions?


This "extra unexpected data" message may be caused by a response body 
that has a different (bigger) length than the one announced in the 
Content-Length output header.


If possible, try to not pass any data down the filter chain to the 
network when you want to set a 503 error. Set f->r->status = 503 and 
then pass a brigade that contains only an EOS bucket.


I don't know if it is possible to send an empty body with a correct 
Content-Length (i.e. equal to zero) once your filter has already passed 
some data down the filter chain.


I found some old code of mine that sends a 5xx code with an empty body 
when it detects an error. However my filter buffers all the data sent by 
a backend before it passes the filtered response in a single brigade 
down the filter chain. I.e. my filter returns APR_SUCCESS without 
invoking ap_pass_brigade whenever the brigade passed to my filter does 
not contain an EOS bucket. When the brigade contains an EOS bucket my 
filter passes the entire filtered response down the filter chain. So in 
my case, no data has reached any downstream filters before I detect an 
error.


   // nominal part
   ...
   // error-handling
   request_rec *r = f->r;
   r->status = HTTP_INTERNAL_SERVER_ERROR;
   ap_remove_output_filter(f);
   f->ctx = 0;
   r->eos_sent = 0;
   apr_table_t *tmp = r->headers_out;
   ap_add_output_filter("error_filter", 0, r, r->connection);
   r->headers_out = r->err_headers_out;
   r->err_headers_out = tmp;
   apr_table_clear(r->err_headers_out);
   return ap_pass_brigade(r->output_filters, bb);

Apparently I remove the "useful" filter when I detect an error 
(ap_remove_output_filter(f)) and I replace it with another filter 
(ap_add_output_filter), that produces the error response. I suppose you 
don't have to do this, I suppose that the effect can be achieved in the 
same filter.


I do not really remember why I swap the error and output headers. I 
suppose I clear the error headers in order to get rid of a previously 
computed Content-Length header.


Also I do not remember why I reset the eos_sent flag, but I think this 
is important.


And I'm quite surprised that I pass to the head of the output filter 
chain (ap_pass_brigade(r->output_filters, bb) and not f->next).


I'm sorry that my explanations are incomplete; it's really old code 
and I do not remember the details. But it's still in production and does 
what you want: it sends an empty-bodied response with a 5xx http status 
code.


Sorin




Thanks, -Eric


-Original Message-
From: Sorin Manolache [mailto:sor...@gmail.com]
Sent: Thursday, October 16, 2014 12:59 PM
To: modules-dev@httpd.apache.org
Subject: Re: output filter needs to redirect to 503 error status

On 2014-10-16 15:36, Eric Johanson wrote:

Hi,
I have an output filter module which is working just fine, but I need
to add a feature so that when certain error conditions occur during
processing, the output filter hook function redirects the whole
request to a 503 error status (service unavailable).  Obviously for a
"handler" module this is trivial to accomplish, but it is not clear
how to do this in an output filter module.

My output filter hooked function is defined as follows:
 apr_status_t mts_out_filter(ap_filter_t *f,apr_bucket_brigade
*bb)

I need this function to "do something" that causes the whole request
to be redirected such that the client sees a 503 error status with no
body/content.

Things that I've tried so far:
* Returning HTTP_SERVICE_UNAVAILABLE from the output filter function
after calling "ap_pass_brigade(f->next,bb)"
* Setting f->r->status to HTTP_SERVICE_UNAVAILABLE after calling
"ap_pass_brigade(f->next,bb)"


Try setting f->r->status _before_ calling ap_pass_brigade.

If you get the 503 but you get the default 503 error response body that is
automatically set by apache then replace it by using the ErrorDocument
directive http://httpd.apache.org/docs/2.2/mod/core.html#errordocument (or
http://httpd.apache.org/docs/2.4/mod/

Re: output filter needs to redirect to 503 error status

2014-10-16 Thread Sorin Manolache

On 2014-10-16 15:36, Eric Johanson wrote:

Hi,
I have an output filter module which is working just fine, but I need to add
a feature so that when certain error conditions occur during processing, the
output filter hook function redirects the whole request to a 503 error
status (service unavailable).  Obviously for a "handler" module this is
trivial to accomplish, but it is not clear how to do this in an output
filter module.

My output filter hooked function is defined as follows:
apr_status_t mts_out_filter(ap_filter_t *f,apr_bucket_brigade *bb)

I need this function to "do something" that causes the whole request to be
redirected such that the client sees a 503 error status with no
body/content.

Things that I've tried so far:
* Returning HTTP_SERVICE_UNAVAILABLE from the output filter function after
calling "ap_pass_brigade(f->next,bb)"
* Setting f->r->status to HTTP_SERVICE_UNAVAILABLE after calling
"ap_pass_brigade(f->next,bb)"


Try setting f->r->status _before_ calling ap_pass_brigade.

If you get the 503 but you get the default 503 error response body that 
is automatically set by apache then replace it by using the 
ErrorDocument directive 
http://httpd.apache.org/docs/2.2/mod/core.html#errordocument (or 
http://httpd.apache.org/docs/2.4/mod/core.html#errordocument).


Sorin


* calling "ap_send_error_response(f->r,HTTP_SERVICE_UNAVAILABLE)"

None of these really seem to behave properly.  I just want the client to
receive a 503 error status with no content body.  There must be a way to
achieve this behavior from within an output filter hook?

Any advice is appreciated.
Thanks, -Eric






Re: Sharing information across Apache child processes

2014-09-29 Thread Sorin Manolache

On 2014-09-29 13:39, Rajalakshmi Iyer wrote:

Hello,

I have a requirement whereby my application's configuration information
(comprising a few complex data structures) needs to be shared across the
various Apache child processes.

Currently, the configuration is being individually loaded by each child
process, which makes it hard for configuration changes to propagate.

What is the best way / place to have a common configuration for the
application?

Please advise.


I suppose you want to update the configuration without running "apache2 
-k graceful" (or "apache2ctl graceful").


In this case you could use a segment of memory that is shared across the 
apache children. You'll have to create the shared memory segment before 
the parent forks its children (for example in post_config). The shared 
memory is then inherited by the forked children.


You'll need a method to update the contents of the shared memory segment 
and a multiple-readers-single-writer inter-process exclusion mechanism 
in order to safely read and write from the shared segment.


Sorin



Re: apr_hash_t and global scope

2013-12-11 Thread Sorin Manolache

On 2013-12-11 19:17, Ingo Walz wrote:

Hello modules-dev!

I've encountered a problem with an apr_hash_t in the global scope of my
module.
Let me explain the use-case a bit.

Every time, when an asset (image e.g.) is delivered via apache, it
stores meta information of that file with a simple key/value pair in an
apr_hash_t. That variable is part of the global scope of my module and
allocated in register_hooks section (with the pool available here). If
HTML is delivered that references an asset with meta information
available, it rewrites the HTML based on the data from apr_hash_t.

My problem is: using -X (only one worker) everything is fine and the
module works as expected. But starting httpd with more than one worker,
I sometime have no data on my apr_hash_t, which is expected to be there.
I've tried various things, e.g. using server->process->pool and
child_init to allocate the data, but the issue remains the same. I'm
also using no global/thread_mutex, because I'm never reading and writing
in the same thread (but in theory in the same process) - but I've no
idea yet how hash_set is doing it internally, so this might be still a
todo (is it? Do I really need a global mutex for data across
worker/threads? Can I avoid it?). Using memcache is an option in theory,
but I'd love to avoid it too. Otherwise my module scales much different,
based on the delivered HTML. But anyways, it's most likely an issue with
the "wrong pool" or with a misunderstanding of the scope and the cleanup
of those pools.


I'm making the assumption that you're using Unix and not Windows.

I don't think it is related to pools or their cleanup. It is rather 
because of how Unix processes work. The request that updates your hash 
is served by a thread in one apache child and the request that reads 
from your hash may be served by a thread in another apache child. The 
problem is that each apache child has its own copy of the hash.



What's the best way to use a "everywhere shared apr_hash_t"? Do I need
apr_shm to work properly with the workers (or apr_rmm in my use case)
together with a global_mutex or should I try to dig into socache and
avoid doing this handling manually? Many outdated resources around the
web (and even different internal implementations for the same use-case)
made me feel a bit ... doubtful to come to a final decision. Maybe you
can help me here?! :-)


My advice would be to avoid doing it manually.

Regards,
Sorin



Regards,

Ingo




Re: response handling inside ap_hook_create_request cb function

2013-09-27 Thread Sorin Manolache

On 2013-09-27 11:11, Pon Umapathy Kailash S wrote:

Thanks for your response, Sorin.

My concern in this approach is that - it would require one worker
thread to be held up for as long as this connection is open(and this
might be long + the number of clients can be higher than the worker
threads for my use-case/platform).

Given that the 1st handshake message is over http, i can setup the
connection in a handler hookup function and return a response, freeing
up the worker thread while keeping the connection persistent(plus save
the socket/ssl in a cache shared the worker threads).


Anyway in "normal" apache, i.e. over http, the worker threads are not 
freed up after processing a request. An idle worker thread is assigned 
to a connection when one is opened by a client for the whole duration of 
the connection. You can check that in 
modules/http/http_core.c:ap_process_http_sync_connection. You'll see 
that ap_process_request is called in a loop. The loop is left when the 
connection is closed. If the connection is idle, the worker thread is in 
"KeepAlive" state. You can check this when looking at /server-status.


So it would not make any difference if you assigned a worker to the 
connection in process_connection.


You'll have to take care though not to overuse the connection memory 
pool because the pool is not destroyed while the connection is open, and 
this could be a long time.




Now, when the next set of messages come in(which is not over http), I
would need to intercept these and add my handling(ideally write
something on the socket on which the message came and be done with the
request while keeping the connection persistent unless the message was
a control frame to close).

Regards,
Umapathy


On Fri, Sep 27, 2013 at 12:29 PM, Sorin Manolache  wrote:

On 2013-09-27 03:06, Pon Umapathy Kailash S wrote:


Hi,
Here is a quick background on what I am trying to do(basically adding
support for websockets - in a slightly customised manner as needed for
my app):

- Handle the initial handshake inside a cb function registered as a
handler hook(from here, I compute checksums required and return the
response headers as needed).
   Also, the socket from which the request was read is stored in a cache.

- For subsequent message reception(on the same connection), i have a
function cb registered using ap_hook_create_request(since this is a
different protocol format message). Here, I read and parse the
messages/requests which are coming in from the cached list of
sockets(this is working).

However, once I return from this cb, the connection/socket seems to be
closed. I guess the request is further passed down to hooks down the
line and the connection is closed since the req format is not known.

What would be the best way to handle this scenario?

I have the following in mind:
- let the request not be processed any further(and keep the connection
on).
- create a req structure with dummy http headers that i can later
recognise and handle inside my handler hook to just ignore later on

are there any examples/notes on how these can be achieved?



In my opinion, it is too late to handle non-http in the create_request
callback.

The create_request callback is called from
ap_run_process_connection->ap_process_http_{sync|async}_connection->ap_read_request.

Your create_request callback returns to ap_read_request from where the
request is further processed as an http request.

In my opinion you should short-cut the http processing and hook
ap_hook_process_connection. However, there, in process_connection, you have
no request_rec, you have just a conn_rec. process_connection is called only
once per connection creation. So it should handle all the requests that
arrive while the connection is open.

Sorin







Regards,
Umapathy







Re: response handling inside ap_hook_create_request cb function

2013-09-27 Thread Sorin Manolache

On 2013-09-27 03:06, Pon Umapathy Kailash S wrote:

Hi,
Here is a quick background on what I am trying to do (basically adding
support for websockets - in a slightly customised manner, as needed for
my app):

- Handle the initial handshake inside a cb function registered as a
handler hook (from here, I compute the checksums required and return the
response headers as needed).
  Also, the socket from which the request was read is stored in a cache.

- For subsequent message reception (on the same connection), I have a
cb function registered using ap_hook_create_request (since this is a
different protocol format message). Here, I read and parse the
messages/requests which are coming in from the cached list of
sockets (this is working).

However, once I return from this cb, the connection/socket seems to be
closed. I guess the request is further passed down to hooks down the
line and the connection is closed since the req format is not known.

What would be the best way to handle this scenario?

I have the following in mind:
   - let the request not be processed any further (and keep the connection on).
   - create a req structure with dummy http headers that I can later
recognise and handle inside my handler hook, to just ignore it later on

are there any examples/notes on how these can be achieved?


In my opinion, it is too late to handle non-http in the create_request 
callback.


The create_request callback is called from 
ap_run_process_connection->ap_process_http_{sync|async}_connection->ap_read_request.


Your create_request callback returns to ap_read_request from where the 
request is further processed as an http request.


In my opinion you should short-cut the http processing and hook 
ap_hook_process_connection. However, there, in process_connection, you 
have no request_rec, you have just a conn_rec. process_connection is 
called only once per connection creation. So it should handle all the 
requests that arrive while the connection is open.
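A minimal sketch of that approach (the hook function and the `my_conn_is_websocket` helper are illustrative names, not from this thread) could look like:

```
/* Sketch: short-cut HTTP processing for a custom protocol by
 * registering a process_connection hook ahead of the core HTTP one. */
static int my_process_connection(conn_rec *c)
{
    if (!my_conn_is_websocket(c))    /* hypothetical helper */
        return DECLINED;             /* let default HTTP processing run */

    /* Loop here for the lifetime of the connection: read frames from
     * c->input_filters with ap_get_brigade(), write responses to
     * c->output_filters with ap_pass_brigade(). */
    return OK;
}

static void register_hooks(apr_pool_t *p)
{
    ap_hook_process_connection(my_process_connection, NULL, NULL,
                               APR_HOOK_FIRST);
}
```

Returning DECLINED for connections that are not yours keeps the module transparent for ordinary HTTP traffic.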


Sorin







Regards,
Umapathy





Re: How to determine the "right" vhost in name based vhosting

2013-09-24 Thread Sorin Manolache

On 2013-09-24 13:04, Christoph Gröver wrote:


Hello Sorin,


I suppose you use the server field of the request_rec structure and
not some stored server_rec that was passed to you in post_config or
somewhere else.


Definitely. I have adopted this from some other module and didn't know
there was another way to obtain a server_rec structure.
So I should be looking for a better way to find the right structure.

Thank you very much. This sounds as if it will be the right way.


I fear there's a misunderstanding here: The right way to get the 
server_rec is, in my opinion, from the request_rec structure, i.e. I 
think you should use req->server->server_hostname.


So, given that you already do this, it is puzzling for me why you don't 
get the result that you want.


Apache sets the req->server pointer to the right server_rec structure 
after it has parsed the request headers. (It cannot guess correctly 
before it parses the Host header.)


So make sure you check req->server _after_ apache has initialised it to 
the right server_rec. Apache sets it in the ap_read_request method. 
Almost all of the callbacks provided to the module developers are called 
_after_ ap_read_request, so you should be ok. I think only the 
create_connection callback is run before ap_read_request.


As a "poor man's debugger" technique you could write a post_config 
callback. The last argument of the post_config callback is the head of 
the list of server_recs. You could traverse the list and log to a file 
the server_hostname of all server_recs in the list. Just to check that 
you have the right number of server_recs and that they are correctly 
initialised.
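A sketch of that post_config check (logging to the error log rather than a separate file, for brevity) might look like:

```
/* Sketch: traverse the server_rec list passed to post_config and log
 * each vhost's hostname, as a sanity check. */
static int my_post_config(apr_pool_t *pconf, apr_pool_t *plog,
                          apr_pool_t *ptemp, server_rec *s)
{
    server_rec *srv;
    for (srv = s; srv != NULL; srv = srv->next) {
        ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, srv,
                     "vhost: %s",
                     srv->server_hostname ? srv->server_hostname : "(none)");
    }
    return OK;
}
```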


Sorin





Apache keeps a linked list of server_rec structures. The head of the
list is the server_rec of the whole apache server. The rest of the
list contains one server_rec structure per vhost. For each request
apache picks the right server_rec from the list according to the Host
header and sets r->server to point to the picked object.


This information will also help. Thank you.


Also make sure that your request really arrives in the vhost you
intended. Typically I check this by logging to different files (see
the CustomLog directive) in each vhost.


This is actually the case. I receive the requests in the right vhost.
I have separate logfiles for each vhost.

Thanks for your answers. I guess I will be able to solve the issue with
this information.

With kind regards,





Re: How to determine the "right" vhost in name based vhosting

2013-09-24 Thread Sorin Manolache

On 2013-09-24 11:38, Christoph Gröver wrote:


Hello list, Hello Sorin,

I tested several different Apaches (2.4.x and 2.2.x) and they never did
what I wanted or expected.

If I configure more than one vhost, only the first one's name is returned
in the server->server_hostname field.
The "ServerName" of the second vhost seems to be impossible to determine?

Is there any other way to find the hostname?


I suppose you use the server field of the request_rec structure and not 
some stored server_rec that was passed to you in post_config or 
somewhere else.


Apache keeps a linked list of server_rec structures. The head of the 
list is the server_rec of the whole apache server. The rest of the list 
contains one server_rec structure per vhost. For each request apache 
picks the right server_rec from the list according to the Host header 
and sets r->server to point to the picked object.


Also make sure that your request really arrives in the vhost you 
intended. Typically I check this by logging to different files (see the 
CustomLog directive) in each vhost.


Regards,
Sorin



Re: How to determine the "right" vhost in name based vhosting

2013-09-19 Thread Sorin Manolache

On 2013-09-19 16:39, Christoph Gröver wrote:


Hello,

We usually use name based virtualhosts with something like the following
configuration:

NameVirtualHost  IP:80


   ServerName main.domain.tld
   ServerAlias alias.domain.tld

   ..



   ServerName www.domain.tld
   ServerAlias alt.domain.tld
   ..



I've tested this setup in 2.4.6 and r->server->server_hostname contains 
what you want.
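For instance, a request-time hook reading the canonical vhost name could be as simple as this (hypothetical handler, for illustration):

```
/* Sketch: r->server already points to the vhost selected from the
 * Host header by the time any handler runs. */
static int my_handler(request_rec *r)
{
    ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, r,
                  "handled by vhost %s", r->server->server_hostname);
    return DECLINED;
}
```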


Regards,
Sorin



Now I'm looking for a function which reliably returns
host main.domain.tld if the first vhost is used (even if
it is used as alias.domain.tld) and returns www.domain.tld
if the second one is used (even if under the name alt.domain.tld).


I know of two ways to do this:

1. ap_get_server_name

  This returns the right hostname if "UseCanonicalName" is set.
  But returns just the Host:-Header if it is off - which is the default.

2. server_rec structure

  The element server->server_hostname always returns the first vhost
  available for an ip address. So even if I use www.domain.tld it returns
  main.domain.tld

So the first option depends on "UseCanonicalName", the second does
something else - which is not what I want.

Any other ways of doing this?
Or is the only solution to force "UseCanonicalName" to "on",
without which it won't work?

Can anybody enlighten me as to how this should be done?

Thank you, Greetings





Re: Browser cookie and Apache server calls

2013-06-27 Thread Sorin Manolache

On 2013-06-27 06:28, Sindhi Sindhi wrote:

If I clear the browser cache before I click on the hyperlink, I don't see
this issue. But I do not want to delete the cookies, because the business
logic used by the filter makes use of the cookies that are set. Also I may
not want to delete the cache every time before I click on the hyperlink :(

I added the below lines in httpd.conf file but still see that the page is
cached and hence no server call is made :(
LoadModule headers_module modules/mod_headers.so
Header set Cache-Control "must-revalidate, no-cache"


As general advice, test your modules with a command-line tool first. 
Thus you have a strict control of what you send in your request and you 
see what the server answers. Such a command line tool is "curl". It runs 
under Windows too. It allows you to locate the problem: is it that your 
module does not send the expected headers (Set-Cookie, Cache-Control, 
etc), or is it that the browser does not send them (the Cookie header 
for example). With curl you can specify which headers to send, which 
cookies, and you can simulate browsers by sending all kind of 
cache-related headers (If-Modified-Since, If-None-Match, etc).
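A hypothetical curl invocation along those lines (the host, cookie name/value, and date are placeholders) might be:

```
# Print only the response headers; send a cookie and a conditional header.
curl -s -D - -o /dev/null \
     -H "Cookie: cookieName=cookieValue" \
     -H "If-Modified-Since: Thu, 27 Jun 2013 00:00:00 GMT" \
     http://localhost/index2.html
```

Inspecting the printed headers shows directly whether the server sent Cache-Control, Set-Cookie, or a 304 Not Modified.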


Check http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14 for 
an explanation of all http headers, especially the cache-related ones 
(Cache-Control, Expires, Last-Modified, If-Modified-Since, ETag, etc).


For testing with the browser, there's a plugin for Firefox called 
Firebug. Maybe there's something similar for Chrome, I don't know. 
Firebug displays the http communication between browser and server, 
including the http headers.


If you don't find such a plugin, you could sniff the network traffic 
directly via a tool such as wireshark (works on Windows too and it's free).


You can look if the "Cache-Control" header is always set by your server. 
If it is set then the browser is not supposed to cache the page linked 
from the hyperlink, so it should replay the request every time you click 
on it.


If it is not set and you cannot force apache to set it by the Header 
directive, then force it directly in your module 
(apr_table_set(r->headers_out, "Cache-Control", "must-revalidate, 
no-cache"); apr_table_set(r->err_headers_out, "Cache-Control", 
"must-revalidate, no-cache");)


Sorin





On Thu, Jun 27, 2013 at 2:25 AM, Sorin Manolache  wrote:


On 2013-06-26 22:22, Sindhi Sindhi wrote:


Hi,

I have a C++ Apache filter module for HTML output filtering. I'm seeing a
certain behavior when using my filter module after setting cookies and
would want to know if that's expected.

I have a html page index.html, this page has a hyperlink called "Click
here" and when I click on this link, a second page index2.html should be
launched.

My filter applies some business logic while filtering html files.

I have a cookie.html file that has Javascript to set a cookie (using
document.cookie) in the browser.

I need to do the following:
1. Enable my filter using LoadModule directive and start the server
2. Set a cookie with name=cookieName, value=cookieValue in the browser
using the cookie.html
3. Launch index.html, and then click on "Click here". When this call goes
to the filter module, I have to apply some business logic before
index2.html is rendered on browser.

But when I set the cookie in step2 above, I see that the filter module is
not called because a server call is not made, and the browser opens the
cached index2.html which does not have my business logic applied.

And, if I dont set the cookie mentioned in step2 above, a server call is
made when I click on "Click here" link.

How can I ensure that, when I try to launch an HTML page from a hyperlink,
the call goes to the filter module even when I set a browser cookie?



What happens if you clear your browser's memory and disk cache before you
click on the hyperlink?

If it's a cache issue, then use the mod_headers module and the 'Header set
Cache-Control "must-revalidate, no-cache"' directive to disable browser
caching.

Sorin






My apologies if I'm asking something fundamental, I'm new to how cookies
work with web-servers, any help would be really appreciated.

Thanks.










Re: Browser cookie and Apache server calls

2013-06-26 Thread Sorin Manolache

On 2013-06-26 22:22, Sindhi Sindhi wrote:

Hi,

I have a C++ Apache filter module for HTML output filtering. I'm seeing a
certain behavior when using my filter module after setting cookies and
would want to know if that's expected.

I have a html page index.html, this page has a hyperlink called "Click
here" and when I click on this link, a second page index2.html should be
launched.

My filter applies some business logic while filtering html files.

I have a cookie.html file that has Javascript to set a cookie (using
document.cookie) in the browser.

I need to do the following:
1. Enable my filter using LoadModule directive and start the server
2. Set a cookie with name=cookieName, value=cookieValue in the browser
using the cookie.html
3. Launch index.html, and then click on "Click here". When this call goes
to the filter module, I have to apply some business logic before
index2.html is rendered on browser.

But when I set the cookie in step2 above, I see that the filter module is
not called because a server call is not made, and the browser opens the
cached index2.html which does not have my business logic applied.

And, if I dont set the cookie mentioned in step2 above, a server call is
made when I click on "Click here" link.

How can I ensure that, when I try to launch an HTML page from a hyperlink,
the call goes to the filter module even when I set a browser cookie?


What happens if you clear your browser's memory and disk cache before 
you click on the hyperlink?


If it's a cache issue, then use the mod_headers module and the 'Header 
set Cache-Control "must-revalidate, no-cache"' directive to disable 
browser caching.


Sorin





My apologies if I'm asking something fundamental, I'm new to how cookies
work with web-servers, any help would be really appreciated.

Thanks.





Re: Mutex protection of output bucket brigade

2013-06-12 Thread Sorin Manolache

On 2013-06-12 14:16, Alex Bligh wrote:


But that aside, is it safe to call ap_fwrite() from one thread whilst there's a 
read in the other?


I do not know for sure, but I suppose it is not safe if both the read 
and the write operate on the same brigade.



I'm not calling ap_pass_brigade (at least not directly). I'm doing (roughly)

   ap_filter_t *of = state->r->connection->output_filters;
   ap_fwrite(of, state->obb, (const char *)header, pos); /* Header */

I.e., I'm doing an fwrite to the output filter list, using the bucket brigade I've 
just created. Is that OK?


I've checked the code of ap_fwrite. Apparently it buffers the data in 
the brigade. If the brigade is full, it calls ap_filter_flush, which, in 
turn, calls the ap_pass_brigade and clears the brigade after 
ap_pass_brigade has returned.


So not every call to ap_fwrite will push the data down the filter chain.
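If the writing side needs the data on the wire immediately (as a websocket-style module usually does), an explicit flush after the write helps; a sketch reusing the names from the snippet quoted above:

```
/* Sketch: ap_fwrite() may only buffer; ap_fflush() forces the brigade
 * down the filter chain via ap_pass_brigade() and then clears it. */
ap_fwrite(of, state->obb, (const char *)header, pos);
ap_fflush(of, state->obb);
```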


Now I think I understand what happens with ap_fwrite (if that calls 
ap_pass_brigade), I'm wondering whether (say) the input filter of mod_ssl ever 
talks to its output filter.


I think it's impossible. The brigade is created by you in the spawned 
thread, so I don't see how the SSL decode in the main thread could 
access it. And it's the output brigade that is corrupted, not some 
brigade internal to mod_ssl. As I see it, the output brigade 
(state->obb) is not shared between the threads.


Do you get the same errors when you disable mod_ssl?

S






Re: Mutex protection of output bucket brigade

2013-06-12 Thread Sorin Manolache

On 2013-06-12 11:48, Alex Bligh wrote:


On 12 Jun 2013, at 10:20, Sorin Manolache wrote:


If I understand correctly, the main thread belongs to your module, i.e. it is 
not a concise pseudo-code of the request processing in apache's code.


The main thread is the (presumably single) thread of the prefork mpm process, 
created (I assume) by a fork() in apache's worker code.

The pseudo code was what my long-running request handler does (after creating 
the other thread). IE, broadly speaking my request handler (the main thread if 
you like) does this:

  apr_thread_create; /* create the spawned thread */
  while (!done)
   {
 /* Blocking read */
 apr_brigade_create;
 ap_get_brigade;
 apr_brigade_flatten;

 /* Do stuff with the data */
 blocking_socket_write;
   }
  apr_thread_join; /* wait until the spawned thread has executed */


I don't see where the output brigade appears in the main thread. I think this 
is critical, as the output_bucket_brigade is the data item shared between the 
two threads. ap_get_brigade triggers the execution of the chain of input 
filters. One of these input filters writes to the output brigade?


ap_get_brigade is called with APR_BLOCK_READ.

What I now /believe/ this does (because it's the only way data would actually 
get written) is write the post-processed output bucket brigade to the client 
too. If this were not the case, it's difficult to see how a single-threaded 
application would ever write the output bucket brigade.

Or are you saying the output bucket brigade is only actually written to the 
client during an ap_fwrite()? In which case are all the filters (primarily 
mod_ssl) guaranteed to be thread-safe if a different thread is doing input from 
the one doing output?


"Normally" the output brigade is only written during the 
ap_rprintf/ap_fwrite and the like.


There is no output brigade when the request_rec structure is created. 
The way to write something to the client is via 
ap_pass_brigade(r->output_filter, brigade). Typically the function that 
calls ap_pass_brigade creates the brigade first. So you write something 
to the brigade and you pass it downstream. The last filter writes its 
contents to the network. Be aware that there are filters that buffer the 
brigades and do not send them further down the chain unless the buffers 
are full.
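The typical pattern described above, as a sketch inside a handler:

```
/* Sketch: create a brigade, write to it, pass it downstream.  The EOS
 * bucket tells downstream filters that the response is complete. */
apr_bucket_brigade *bb = apr_brigade_create(r->pool,
                                            r->connection->bucket_alloc);
apr_brigade_puts(bb, NULL, NULL, "response body\n");
APR_BRIGADE_INSERT_TAIL(bb,
        apr_bucket_eos_create(r->connection->bucket_alloc));
ap_pass_brigade(r->output_filters, bb);
```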


If you created a handler that would just return OK without ever calling 
an ap_rprintf/ap_fwrite, the module would create no output brigade and 
the client would get no response.


I think your hypothesis is unlikely. For it to hold, you'd need an input 
filter that is triggered when ap_get_brigade is called, that that input 
filter creates a brigade and stores it somewhere and that the spawned 
thread somehow writes to this stored brigade. So the ap_fwrite in the 
spawned thread would need to write to a brigade created in an input 
filter triggered by ap_get_brigade in the main thread.


To verify this hypothesis, check which is the brigade whose pointers are 
corrupted. I mean, where is it referenced, in which apache module, in 
which apache filter. Then you can inspect the sources of that module to 
see if the brigade is shared between input and output.


S



Re: Mutex protection of output bucket brigade

2013-06-12 Thread Sorin Manolache

On 2013-06-12 10:46, Alex Bligh wrote:

I think I've finally figured out what's going wrong in my module but am unsure 
what to do about it.

The module runs on apache 2.2.22 with mpm prefork. Occasionally I am seeing 
corruption of the output bucket brigade, primarily the ring pointers (link->next 
and link->prev) ending up with strange values.

Having spent some time reading the source, I believe that apache provides no 
protection to these pointers, and it's inherently unsafe for a bucket brigade 
to be used by more than one thread (even if you are careful with allocators), 
unless all callers provide their own mutex protection. As apache itself uses 
the output bucket brigade without mutex protection, the output bucket brigade 
can never be written to by other threads, and therefore ap_fwrite (to this 
brigade) can never safely be called by any thread other than the main thread. First 
question: is this correct?

My module is currently structured as follows.

The main thread creates another thread for each request (the requests are long 
running websocket connections).

The main thread does the following:

   while (!done)
   {
 /* Blocking read */
 apr_brigade_create;
 ap_get_brigade;
 apr_brigade_flatten;

 /* Do stuff with the data */
 blocking_socket_write;
   }

The spawned thread does the following

   while (!done)
   {
  blocking_socket_read;

  /* do stuff with the data */
  ap_fwrite(output_bucket_brigade);

}

Now, what I believe is happening is as follows. The blocking read in the main 
thread at some point calls select(), and does not only do a read, but also 
a write of the data in the output bucket brigade. This removes a bucket from 
the ring. If this is happens at the same time as the ap_fwrite in the spawned 
thread adds something to the output ring, two threads will be accessing the 
ring pointers at once.

What I can't figure out is how to fix this.

I can't put in a mutex to protect the ring pointers, because the access to the 
ring pointers by apache is outside of my module.

I can't hold a mutex across the blocking read in the main thread, because 
otherwise my module won't be able to write data to the output bucket brigade 
whilst there is no input from the apache client; as the apache client may be 
waiting for data to be sent to it, this could cause deadlock.

And I can't obviously see how to do the read in a non-blocking way.

Any ideas?


If I understand correctly, the main thread belongs to your module, i.e. 
it is not a concise pseudo-code of the request processing in apache's code.


I don't see where the output brigade appears in the main thread. I think 
this is critical, as the output_bucket_brigade is the data item shared 
between the two threads. ap_get_brigade triggers the execution of the 
chain of input filters. One of these input filters writes to the output 
brigade?


Sorin




Re: thread safety and mpm-prefork

2013-06-11 Thread Sorin Manolache

On 2013-06-11 23:08, Alex Bligh wrote:

Sorin,

On 11 Jun 2013, at 21:57, Sorin Manolache wrote:


The threadallocatormutex is created from a child of the request pool. The 
request pool and its child-pools are destroyed when the request terminates. Do 
you use the threadpool/threadallocator/threadallocatormutex afterwards?


Nope. It's one long running request, and at the end of the request handler, the 
thread I've created is _join'ed, and the pools are destroyed.

When I torture test it, I can run 10 hours of fullscreen video through it, and 
every 5th or 6th such test results in a core dump (inevitably a bucket brigade 
pointer being unhappy). We have a customer who seems to be talented at making 
things go wrong and who does not get an abort/segv, but a 100% CPU live lock. 
gdb suggests the destruction of the bucket brigade goes around and around - 
again a symptom of bucket brigade linked list pointers being unhappy.



I'm sorry, I ran out of ideas. I suppose that the operations of the two 
threads on the bucket brigade are protected by mutexes...


S


Re: thread safety and mpm-prefork

2013-06-11 Thread Sorin Manolache

On 2013-06-11 22:21, Alex Bligh wrote:

Sorin,

On 11 Jun 2013, at 21:10, Sorin Manolache wrote:


apr_* and mpm_prefork are different software packages and ubuntu distributes 
them separately. So it is almost certain that you have a thread-enabled libapr 
(i.e. compiled with APR_HAS_THREADS). You would not be able to compile the code 
that uses apr_thread_create if your libapr was not compiled with thread support.

mpm_prefork is like any ordinary client of libapr. Just that it does not use 
the threading functionality in libapr. So it cannot disable/optimise out the 
mutexes in libapr.


Thanks.


Please be aware that apr_pools are not thread-safe. Only the creation of 
subpools is thread-safe. So you should create a subpool per thread to stay safe.


I'm doing that. In fact I'm doing:

 if (!(( apr_pool_create(&pool, r->pool) == APR_SUCCESS) &&
       ( apr_thread_mutex_create(&threadallocatormutex,
                                 APR_THREAD_MUTEX_UNNESTED,
                                 pool) == APR_SUCCESS) &&
       ( apr_allocator_create(&threadallocator) == APR_SUCCESS) &&
       ( apr_allocator_mutex_set(threadallocator, threadallocatormutex), 1 ) &&
       ( apr_pool_create_ex(&threadpool, NULL, NULL,
                            threadallocator) == APR_SUCCESS) &&
       /* WARNING: pool has no parent */
       threadpool && threadallocator && threadallocatormutex && pool)) {
 ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, r,
   "tcp_proxy_on_connect could not allocate pool");
 return NULL;
 }



The threadallocatormutex is created from a child of the request pool. 
The request pool and its child-pools are destroyed when the request 
terminates. Do you use the 
threadpool/threadallocator/threadallocatormutex afterwards?



i.e. a subpool, then a new mutex, allocator, and parentless pool using the 
foregoing. I suspect this is well into the land of overkill.

I'm doing similar with creation of bucket brigades. The issue seems to be 
linked to bucket brigade processing (which is unsurprising as that's what's 
written to by one thread but read by the other).

Any help (paid if necessary) welcome. It's an apache licensed tcp proxy module.





Re: thread safety and mpm-prefork

2013-06-11 Thread Sorin Manolache

On 2013-06-11 21:20, Alex Bligh wrote:

I've written a module which I believe to be thread-safe but appears to be
doing something which I have put down to a lack of thread safety in pool
management (somewhere).

Before I tear my hair out here, my module is running with apache 2.2.22
and mpm-prefork on Ubuntu. Do the thread primitives actually do anything in
mpm-prefork? I'm using apr_thread_create to create a thread, then
providing a separate allocator, mutex, pool and similar (all as
recommended). But if the mutex stuff is 'optimised out' of my apr
library - specifically the pool stuff - all this will be in vain.



apr_* and mpm_prefork are different software packages and ubuntu 
distributes them separately. So it is almost certain that you have a 
thread-enabled libapr (i.e. compiled with APR_HAS_THREADS). You would 
not be able to compile the code that uses apr_thread_create if your 
libapr was not compiled with thread support.
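One way to make that assumption explicit in the module source is a compile-time check (sketch):

```
#include "apr.h"

#if !APR_HAS_THREADS
#error "this module requires an APR built with thread support"
#endif
```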


mpm_prefork is like any ordinary client of libapr. Just that it does not 
use the threading functionality in libapr. So it cannot disable/optimise 
out the mutexes in libapr.


Please be aware that apr_pools are not thread-safe. Only the creation of 
subpools is thread-safe. So you should create a subpool per thread to 
stay safe.
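A minimal version of that advice (worker_fn and state are hypothetical names) could look like:

```
/* Sketch: give the spawned thread its own subpool so pool operations
 * in the thread do not race with allocations from the parent's pool. */
apr_pool_t *thread_pool;
apr_thread_t *thread;

if (apr_pool_create(&thread_pool, r->pool) != APR_SUCCESS)
    return HTTP_INTERNAL_SERVER_ERROR;
apr_thread_create(&thread, NULL, worker_fn, state, thread_pool);
```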


Sorin


Re: Module exceptions can be handled before it crashes httpd?

2013-05-27 Thread Sorin Manolache

On 2013-05-27 10:30, Sindhi Sindhi wrote:

Hello,

Is there a way to handle the exceptions (access violation, heap corruption
etc) thrown by an output filter module within the module itself so that it
does not propagate till the httpd.exe server resulting in a server crash?

The C++ output filter module that I have written makes use of native memory
allocation methods like malloc/new in some cases. I have not used the
APR request pool here since the allocations in these  methods are very much
short lived and are called many times within a single request. So rather
than waiting for the request completion and then the pool manager releasing
this memory, I'm using native new/delete calls to do the
allocation/deallocation so that I can release the memory immediately after
use.

The issue is, in some rare case scenarios I saw a httpd.exe crash that was
due to heap corruption and access violation during new/delete calls in
these methods. Is there a way I can gracefully handle these within the
module by catching such exceptions and trying to handle them, rather
that propagating this exception resulting in httpd.exe crash?

Worst case even if no filtering happened due to a crash in the module, I'd
prefer that the filter sent back the original data (that was passed to the
filter when the filter callback was made by the server) down the filter
chain, ofcourse after logging this information for later troubleshooting.



Heap corruption/access violation are, most likely, due to bugs in your 
code. These kinds of errors are totally different from an out-of-memory 
error, for example. They also give you a high degree of uncertainty 
about what your code is doing and unpredictability regarding the impact. It's 
a very bad idea to tolerate them.


If you corrupt some pointers in your own structures only, then let us say 
that you could tolerate them, not in principle (I strongly advise 
against it), but at least technically.


However, if you corrupt some pointers in the structures of apache that 
apache reuses for other requests (for example the server_rec, or the 
conf pool, or the array of pointers to module configuration objects), 
then tolerating these kind of errors is impossible, even from a 
technical point of view.


In the first case (when only your structures are corrupt), you could 
catch (at least in Unix) the signal that Unix throws when it detects 
such a violation. But even then your options are limited, because you 
have no idea what you can do without reproducing the violation in the 
violation handler! You can't rely on your data.


(I don't know how these violations are caught in Windows, but I am sure 
they can be caught.)


However, if you corrupted apache's structures, the access violation may 
occur later, not when apache runs your code, but when it runs its code 
or 3rd party code. Then again, in your handler you would not have any 
clue where the violation comes from and how to handle it.



So my advice is to debug your code. Compile it with debug symbols, 
execute it in a debugger, reproduce the scenarios in which it crashes.


If the errors occur only occasionally, then I suspect one of the 
following cases:


*) concurrency problems. To check for this, start running your module at 
low throughput and steadily increase the throughput. If the error rate 
is low at the beginning but increases with throughput, then it could be 
a concurrency problem. You could also start apache with a single thread. 
If it never crashes with a single thread, then again it could be a 
concurrency problem. Check if the libs that you use are thread-safe or not.


*) data-related problem. Run the same request at high throughput. If it 
never crashes, then maybe it is not a concurrency error, but rather 
dependent on the data that you use for testing. So it could be an 
algorithmic problem in your filter.


*) Try to test your corner cases, especially the case in which a string 
to replace is broken between two invocations of the filter. Think of the 
scenario in which the string to replace is contained in _3_ different 
buffers.


Sorin



Re: Apache: Create server config only once

2013-05-25 Thread Sorin Manolache

On 2013-05-25 15:22, Sindhi Sindhi wrote:

Hi,

I see that the create_server_config callback is called twice for every
Apache server startup. I want to do a lot of initialization in this
callback and allocations in this function are meant to be long lived. And
these buffers are very huge, so I want to ensure these initializations are
done only once per server start. As of now I'm trying to do the following -


There's not much you can do against the double-call of create_*_config.

Here's a rough sketch of how apache works:

create conf pool
create config objects for all modules and virtual hosts
read config
call pre_config callbacks
parse config
call post_config callbacks

loop:
   _clear the conf pool_
   create config objects for all modules and virtual hosts
   read config
   call pre_config callbacks
   parse config
   call post_config callbacks
   create generation of children to handle the requests
   wait on the generation to die
   if the signal was to stop => exit the loop
   else (i.e. reload conf) iterate

So apache invokes create_*_config twice before starting to handle the 
requests. But please note the following points:


1. The conf pool is cleared between the first and second call to 
create_*_config. So if you allocated something in create_*_config when 
it was called the first time you won't find it again when called the 
second time.


2. A new server_rec is constructed each time apache calls "create config 
objects for all modules and virtual hosts". This is why you don't find 
your conf object with ap_get_module_config.


3. The create_*_config callbacks are called not only at the beginning, 
but each time a new generation of children is spawned. An old generation 
is sent the message to shut down when apache is sent a signal, either to 
stop or to restart or to gracefully restart. Typically you want to 
reload the conf because you changed it, so it's normal that apache 
reparses the conf.


What you could do is create the conf in the first call and store it 
somewhere, let's say in a global variable. Then, in the second call, you 
do not try to get it from the server_rec (as the server_rec is brand new 
anyway, you would not find it there), but you get it from where you 
stored it in the first call.


So something like that:

static MyConf *my_global_conf;

void *
create_server_config() {
    if (my_global_conf != NULL)
        return my_global_conf;
    my_global_conf = new MyConf;
    return my_global_conf;
}

Note that if you adopt this approach you should not allocate anything in 
the pool that is passed to create_server_config because the pool is 
cleared between invocations.


Also there are two problems with this approach:

1. You'll have one single object for all virtual hosts.
2. You cannot distinguish between the second call and subsequent calls. 
So you cannot do a conf reload (a graceful restart) anymore because all 
invocations of create_server_config except the first one will return the 
old my_global_conf and will not react to changes in the configuration. 
So you will be forced to do server restarts (as opposed to graceful 
restarts) in order to load a new configuration.


My advice is to live with the double invocation because you gain more 
than you lose. It spares you the two problems mentioned above, and you 
pay only by waiting a little longer at startup. Even if apache did not 
call the create_*_config functions twice before serving requests, you 
would still have to live with several invocations of the create_*_config 
callbacks if you want to support conf reloads. And conf reloads are very 
useful: you can reload the conf without losing requests.


In order not to leak, place a cleanup function in the list of cleanup 
callbacks of the conf pool. For example:


apr_status_t my_cleanup(void *data);

void *
create_server_config(apr_pool_t *pconf, ...) {
    MyConf *cnf = new MyConf;
    apr_pool_cleanup_register(pconf, cnf, &my_cleanup,
                              &apr_pool_cleanup_null);
    return cnf;
}

apr_status_t
my_cleanup(void *data) {
    MyConf *cnf = reinterpret_cast<MyConf *>(data);
    delete cnf;
    return APR_SUCCESS;
}

my_cleanup will be called when the conf pool is cleared, i.e. before 
each new reparsing of the conf and creation of each new generation of 
children.


Sorin



typedef struct
{
    int bEnabled;                  // Enable or disable the module.
    MyFilterInit *myFilterInitObj; // A class that has methods to do all
                                   // huge initializations
    bool serverConfigured;
} MyFilterConfig;

static int serverConfigHit = 0;

static void *CreateServerConfig(apr_pool_t *pool, server_rec *virtServer) {
    MyFilterConfig *pExistingConfig = (MyFilterConfig *)
        ap_get_module_config(virtServer, &tag_filter_module);

    if (serverConfigHit == 0) {
        MyFilterConfig *pConfig = (MyFilterConfig *)
            apr_pcalloc(pool, sizeof *pConfig);
        // This does all the huge initializations
        pConfig->myFilterInitObj = new MyFilterInit();
        serverConfigHit = serverConfigHit + 1;
        return pConfig;
    }
    return pExistingConfig;
}

But I see an issue here. The second time when CreateServerC

Re: Apache pool management

2013-05-25 Thread Sorin Manolache

On 2013-05-25 10:05, Sindhi Sindhi wrote:

You have answered all my questions and thanks a lot. Had two questions
more, appreciate your response.

1.
As of now, my httpd.conf file has the below lines-
# Server-pool management (MPM specific)
#Include conf/extra/httpd-mpm.conf

This means Apache does not read the httpd-mpm.conf file during startup. And
so it uses the default settings for "Max number of requests it can support"
and "Max number of threads it creates per child process"

Where can I find what default values Apache uses for the following -
- Up to how many concurrent requests will Apache support by default
- Max number of threads that one child process creates

For ex. if I want Apache to handle up to 400 concurrent requests at a time,
how will I know that this 400 is within the default settings that Apache
uses.



Have a look here, depending on your version of apache.
http://httpd.apache.org/docs/2.2/mod/mpm_common.html
http://httpd.apache.org/docs/2.4/mod/mpm_common.html

The number of threads per process is given in ThreadsPerChild.

You typically tweak

ServerLimit (the maximum number of children that are simultaneously alive)
ThreadLimit (the upper limit on ThreadsPerChild for the lifetime of the 
server)
StartServers (how many children are created upon startup)
ThreadsPerChild
MaxRequestWorkers (or MaxClients in 2.2) (the maximum number of requests 
that are served simultaneously)
MaxConnectionsPerChild (or MaxRequestsPerChild in 2.2) (after a child 
has served that many connections, it exits and is potentially replaced 
with a new child; this mitigates memory leaks)
MinSpareThreads and MaxSpareThreads, the minimum and maximum number of 
spare threads.


There are some constraints on the arguments of these directives, which I 
don't know by heart. I think that MaxRequestWorkers <= ThreadsPerChild * 
ServerLimit and that MaxRequestWorkers should be a multiple of 
ThreadsPerChild, but as I said, I'm not sure. If you get them wrong, 
apache adjusts the values automatically and informs you about it upon 
startup.
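As an illustration only (the numbers below are assumptions to be tuned 
for your hardware and workload, not recommendations), a worker-MPM sizing 
for roughly 400 concurrent requests that satisfies MaxRequestWorkers = 
ServerLimit * ThreadsPerChild could look like:

```apache
# Illustrative sizing for ~400 concurrent requests (worker MPM).
<IfModule mpm_worker_module>
    ServerLimit         16
    StartServers         2
    ThreadsPerChild     25
    MinSpareThreads     25
    MaxSpareThreads     75
    MaxRequestWorkers  400    # 16 children * 25 threads each
</IfModule>
```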


I am not sure, maybe others on the list can confirm or deny, but I think 
that apache does not distinguish between threads and processes in 
windows: http://httpd.apache.org/docs/2.4/mod/mpm_winnt.html or 
http://httpd.apache.org/docs/2.2/mod/mpm_winnt.html. So I think that 
ServerLimit = 1 in Windows and probably MaxConnectionsPerChild is not 
used or does not exist.


You may also have a look at KeepAlive On|Off, MaxKeepAliveRequests and 
KeepAliveTimeout (http://httpd.apache.org/docs/2.4/mod/core.html)


If my module is on the internet with hundreds of thousands of possible 
client IPs that issue one request and then leave, I set KeepAlive Off. 
If my module is on the intranet and is accessed by a couple of 
webservices that continuously issue requests, I set it On with a short 
timeout. Performance-wise the KeepAlive directive makes a huge difference.
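For instance (the values are illustrative, not recommendations), the 
intranet scenario above might use something like:

```apache
# Few clients issuing many consecutive requests: reuse connections,
# but reclaim idle ones quickly.
KeepAlive On
MaxKeepAliveRequests 500
KeepAliveTimeout 2
```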



2.
My understanding is, once a request is completely processed, Apache frees
the pool of only this request and does not free any other request's pool.
And other request pools will be freed only when those requests are
completely processed. Kindly confirm my understanding to be correct.


Yes, it is correct.


Sorin



Re: Apache pool management

2013-05-24 Thread Sorin Manolache

On 2013-05-24 20:00, Sindhi Sindhi wrote:

Hi,

I did an initial study to find out more information about Apache APR pools.
Some links say pools are not thread safe, some say they are really useful,
so I'm a bit confused. Kindly advise.

In certain links like the below, I read that pools are not thread safe.
Does this mean that when multiple requests are being handled by multiple
threads in Apache, data of one request could get overwritten/read/freed by
other requests?
http://mail-archives.apache.org/mod_mbox/apr-dev/200502.mbox/%3c1f1d9820502241330123f9...@mail.gmail.com%3E

A lot of other links like the below state that pools are a big advantage in
Apache memory management.
http://structure.usc.edu/svn/svn.developer.pools.html
http://dev.ariel-networks.com/apr/apr-tutorial/html/apr-tutorial-3.html

To summarize my requirement, I have written a C++ output filter module that
filters HTML. And this module will be invoked by multiple requests that hit
the server at the same time. My concern about using the APR pool is, if I
use the request pools, when one request (say request1) is in the middle of
processing, I do not want this request's data to be corrupted by other
requests'(say request2) - like, request2 overwriting on request1's data or
request2 freeing the memory allocated by request1.

If you can answer the below questions that will help me make a better use
of APR pools, kindly reply.
1. Can I use the request pools in such a way that, each request owns its
pool and cannot be read/written/freed by other requests/pool manager? What
settings in httpd-mpm.conf will make this possible?


Yes. I think it works out-of-the-box and does not need any special settings.


2. I get the "ap_filter_t *f" as an input to my filter module, and I'm
using the f->r->pool to allocate memory that is used by my filter module.
So is this pool unique for every request?


Yes. Each request owns its pool. Concurrent requests are handled in 
different threads, but the threads cannot access another thread's 
request structure, and implicitly cannot access another request's pool.


I don't know if pools are thread-safe. However, as stated here 
http://apr.apache.org/docs/apr/1.4/group__apr__pools.html, sub-pool 
creation is thread-safe. The pool owned by a request is a sub-pool of 
the connection pool. So apache creates, in a thread-safe manner, the 
request pool for your exclusive use. So it should be safe. Personally 
I've never had concurrency issues with request pools.


Sorin



Would appreciate a response.

Thanks.





Re: Get name of config file for module

2013-05-22 Thread Sorin Manolache

On 2013-05-21 23:52, Sean Beck wrote:

Sorin,

Is there a way to figure out the name of the config file in code so I can
log it? Or even just the path to where it is.

Also, I'm confused because you said there is no such thing as a
module-specific configuration file, but then you said configuration files
can be split per-module. Does Apache just read out of httpd.conf on
start-up and in httpd.conf I would use the Include directive to include my
different configurations for each module?


Yes, exactly. Apache knows its config file, either because it's 
hard-coded when apache was compiled, or because the hard-coded path was 
overwritten by command line arguments (the -f switch). Then, the "root" 
config file, so to say, contains Include directives to other files. But 
it's only syntactic sugar for conceptually separating configurations for 
the benefit of the server administrator. Apache has no way of mapping 
the included files to modules.
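A sketch of that convention (the snippet file names below are 
hypothetical; as said, apache attaches no meaning to the file-to-module 
mapping):

```apache
# In the "root" httpd.conf:
Include conf/extra/mod_foo.conf   # directives handled (mostly) by mod_foo
Include conf/extra/mod_bar.conf   # directives handled (mostly) by mod_bar
```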


I don't think there is a way to get the names of the included files. And 
I don't think there is a way either to get the name of the root 
configuration file, as far as I could tell from a brief look in apache's 
sources.


Sorin




Thanks


On Tue, May 21, 2013 at 2:32 PM, Sorin Manolache  wrote:


On 2013-05-21 21:36, Sean Beck wrote:


Hi all,

I have written a module and now would like to log the name of the actual
config file being used by the module (there will be multiple modules on
the
server with their own config files. I looked through
https://httpd.apache.org/docs/2.4/developer/modguide.html but am still
struggling to understand how Apache knows what the config file is for each
module and how I can check what the name of the actual name of the file is
for each module.



There is no such thing as a module-specific configuration file. In
principle a single file could contain the configuration for all modules.

The configuration files are split per-module for convenience only, in
order to be able to activate modules independently.

The configuration _directives_ are defined per-module (but actually
nothing stops you from defining the same configuration directive in several
modules).

Apache defines a directive called "Include". This directive allows for the
splitting of the entire configuration into several files. But from apache's
point of view there is no correspondence between the files into which the
configuration is split and the modules that define the directives found in
those files.

Sorin








Re: Get name of config file for module

2013-05-21 Thread Sorin Manolache

On 2013-05-21 21:36, Sean Beck wrote:

Hi all,

I have written a module and now would like to log the name of the actual
config file being used by the module (there will be multiple modules on the
server with their own config files. I looked through
https://httpd.apache.org/docs/2.4/developer/modguide.html but am still
struggling to understand how Apache knows what the config file is for each
module and how I can check what the name of the actual name of the file is
for each module.


There is no such thing as a module-specific configuration file. In 
principle a single file could contain the configuration for all modules.


The configuration files are split per-module for convenience only, in 
order to be able to activate modules independently.


The configuration _directives_ are defined per-module (but actually 
nothing stops you from defining the same configuration directive in 
several modules).


Apache defines a directive called "Include". This directive allows for 
the splitting of the entire configuration into several files. But from 
apache's point of view there is no correspondence between the files into 
which the configuration is split and the modules that define the 
directives found in those files.


Sorin



Re: C++ Apache module fails to load

2013-05-11 Thread Sorin Manolache

On 2013-05-11 08:35, Sindhi Sindhi wrote:

Hello,

I have created a C++ Apache module that performs text filtering operations.
I used Visual Studio 2010 to build/compile the source code to generate the
DLL. I have Apache httpd-2.4.4-win64 installed from
http://www.apachelounge.com/download/win64/. I have a Windows 7 64-bit OS.
Loading and executing this module works absolutely fine on my laptop.

But if I copy the same DLL on to a different laptop having same operating
system and Apache configuration, I get the following error when I try to
start httpd.exe -


httpd.exe: Syntax error on line 174 of C:/Apache24/conf/httpd.conf: Cannot

load modules/MyFilter.dll into server: The specified module could not be
found.

>
> Could you please advise?

I have no clue but check that you have exactly the same configuration 
and the same directory tree. Check the arguments of the LoadModule 
directive and check that MyFilter.dll is placed where LoadModule expects 
it. Try replacing the relative paths with absolute paths. The relative 
paths are relative to the ServerRoot, so check that the ServerRoot 
directive has the same arguments. Check that the ServerRoot or other 
configuration directives are not overwritten by some command-line 
arguments in the apache-launching script, if any.


Sorin



Re: Setting and accessing module specific directory

2013-05-11 Thread Sorin Manolache

On 2013-05-11 08:22, Sindhi Sindhi wrote:

Thank you.
Had one more question. What is the right way of handling this - in which
location in Apache server directory should I be keeping my filter specific
files?

As of now the XML and other files that are read by my filter are placed in
a new folder say "MyFilter" inside "httpd-2.4.4-win64\Apache24\htdocs". Is
this the right way of doing it?


I think a more "elegant" solution would be to have the location 
configurable.


For that, you define a module-specific configuration directive. (I am 
not familiar with apache 2.4, all my advice is based on apache 2.2, but 
I suppose there is a sufficient degree of backwards compatibility.)


For defining configuration directives, you have to take the following steps:

1. You define one or two callback functions, create_dir_config and/or 
create_server_config. You put their addresses in the corresponding places 
in your module structure. These callback functions are called when 
apache starts up and they should create and return an 
application-specific configuration object that is opaque for apache; it 
sees it as a void pointer.


The server-wide configuration object (created by create_server_config) 
contains configuration data that apply to the whole virtual host.


The directory-wide configuration object (created by create_dir_config) 
contains configuration data that apply to a <Location> or <Directory>.


So, if you have a server with 3 virtual hosts, you have 3 server 
configuration objects per module. However you may have many more 
directory configuration objects, depending on your Locations and 
Directories.


2. You add one entry to the cmds array of your module structure for each 
configuration directive that you want to define. Each cmds array entry 
has a placeholder for your configuration directive-specific callback. 
The callback function is invoked by apache when it parses the 
configuration and it encounters the configuration directive specified by 
your module.


3. You define the configuration directive callback. The callback gets 
the arguments of the configuration directive. What the callback should 
do is to retrieve somehow the configuration object created by the 
callback in step 1 and initialise it with the arguments passed by apache 
(the configuration values).


The directory-wide configuration object is passed in the second argument 
of the configuration callback. You just have to cast the void pointer to 
a pointer to the type of your configuration object.


The server-wide configuration object is retrieved as follows:

ap_get_module_config(params->server->module_config, &my_module)

where params is the cmd_parms argument of the configuration directive 
callback and my_module is the name of your module structure object.


Note that you don't need to use both a server-wide and a directory-wide 
configuration object. The simplest is to use a server-wide. However, a 
directory-wide gives you more configuration flexibility.


The callbacks in steps 1-3 are not called as part of request processing. 
It's more an apache startup and configuration thing. Once apache starts 
processing the requests, it is properly configured and the values of 
your configuration directives are stored in the configuration objects. 
During request processing you'll just have to retrieve these values from 
the configuration objects. This is done in step 4:


4. You retrieve the configuration object of your module from the 
request_rec stucture.


For the server-wide:
MyConf *cnf = (MyConf *)ap_get_module_config(r->server->module_config, 
&my_module);


For the directory-wide:
MyConf *cnf = (MyConf *)ap_get_module_config(r->per_dir_config, &my_module);



Also, I have downloaded the 64-bit HTTP server from -

http://www.apachelounge.com/download/win64/httpd-2.4.4-win64.zip
Is this the right official website to download Apache server?


I think the official builds are downloaded from here:
http://httpd.apache.org/download.cgi




On Wed, May 8, 2013 at 11:04 PM, Sorin Manolache  wrote:


On 2013-05-08 17:02, Sindhi Sindhi wrote:


Hi,

I have written a C++ Apache module that performs filtering of HTML
content.

There are some XML files which are read by this filter when it does the
filtering. During run-time when this filter is invoked, I'd want the
filter
pick up these XML files and read them. I was thinking these XML files can
be placed in the server location "C:\Program
Files\httpd-2.4.4-win64\Apache24\htdocs\MyFilter\". where MyFilter
folder
will contain all the XML files needed by my filter. How do I access this
location programatically during run-time when the filter is invoked?

How can I get the absolute path of the server installation location
(C:\Program Files\httpd-2.4.4-win64\Apache24\htdocs\) so that I can
append
"MyFilter\" to the same to get the location of the XML 

Re: Setting and accessing module specific directory

2013-05-08 Thread Sorin Manolache

On 2013-05-08 17:02, Sindhi Sindhi wrote:

Hi,

I have written a C++ Apache module that performs filtering of HTML content.

There are some XML files which are read by this filter when it does the
filtering. During run-time when this filter is invoked, I'd want the filter
pick up these XML files and read them. I was thinking these XML files can
be placed in the server location "C:\Program
Files\httpd-2.4.4-win64\Apache24\htdocs\MyFilter\". where MyFilter folder
will contain all the XML files needed by my filter. How do I access this
location programatically during run-time when the filter is invoked?

How can I get the absolute path of the server installation location
(C:\Program Files\httpd-2.4.4-win64\Apache24\htdocs\) so that I can append
"MyFilter\" to the same to get the location of the XML files?

I have specified the below in httpd.conf file:
DocumentRoot "C:/Program Files/httpd-2.4.4-win64/Apache24/htdocs"
ServerRoot "C:/Program Files/httpd-2.4.4-win64/Apache24"

The signature of my filter looks like this -
static apr_status_t myHtmlFilter(ap_filter_t *f, apr_bucket_brigade *pbbIn)

I will need the document root or server root or any other path variable
that I can access from my filter that will give me the absolute path
"C:/Program Files/httpd-2.4.4-win64/Apache24"


const char *ap_server_root in http_main.h (apache 2.2. It could be 
similar in 2.4) contains the ServerRoot.


ap_server_root_relative in http_config.h allows you to compose paths 
relative to the ServerRoot.


Sorin



Thanks.





Re: Apache Buckets and Brigade

2013-05-01 Thread Sorin Manolache

On 2013-05-01 14:54, Sindhi Sindhi wrote:

Hello,

Thanks a lot for providing answers to my earlier emails with subject
"Apache C++ equivalent of javax.servlet.Filter". I really appreciate your
help.

I had another question. My requirement is something like this -

I have a huge html file that I have copied into the Apache htdocs folder.
In my C++ Apache module, I want to get this html file contents and
remove/replace some strings.

Say I have a HTML file that has the string "oldString" appearing 3 times in
the file. My requirement is to replace "oldString" with the new string
"newString". I have already written a C++ function that has a signature
like this -

char* processHTML(char* inHTMLString) {
//
char* newHTMLWithNewString = 
return newHTMLWithNewString;
}

The above function does a lot more than just string replace, it has lot of
business logic implemented and finally returns the new HTML string.

I want to call processHTML() inside my C++ Apache module. As I know Apache
maintains an internal data structure called Buckets and Brigades which
actually contain the HTML file data. My question is, is the entire HTML
file content (in my case the html file is huge) residing in a single
bucket? Means, when I fetch one bucket at a time from a brigade, can I be
sure that the entire HTML file data from <html> to </html> can be found in
a single bucket? For ex. if my html file looks like this -

..
..
oldString
... oldString...oldString..
..


When I iterate through all buckets of a brigade, will I find my entire HTML
file content in a single bucket OR the HTML file content can be present in
multiple buckets, say like this -

case1:
bucket-1 contents =
"
..
..
oldString
... oldString...oldString..
..
"

case2:
bucket-1 contents =
"
..
..
oldStr"

bucket-2 contents =
"ing
... oldString...oldString..
..
"

If it's case2, then the function processHTML() I have written will not
work because it searches for the entire string "oldString" and in case2
"oldString" is found only partially.



Unfortunately there is no guarantee that the whole file is in one brigade.

Even if it were in one brigade, there is no guarantee that it is in a 
single bucket.


So you can have case2.

In my experience the buckets that I've seen have about 8 kilobytes. So 
you will not consume too much memory if you "flatten" the bucket brigade 
into one buffer and then perform the replacement in the buffer. (see 
apr_brigade_flatten). However, you have to provide for the case in which 
oldString is split between two brigades.


Sorin



Thanks a lot.





Re: Apache C++ equivalent of javax.servlet.Filter

2013-05-01 Thread Sorin Manolache

On 2013-05-01 12:21, Sindhi Sindhi wrote:

Hi,

I'm developing a C++ module for Apache(httpd.exe) server. This C++ module
intends to function the same as a Java module that we earlier developed for
Tomcat. I'll be very thankful to you if you could answer the following
queries I have about finding the Apache(httpd.exe server) C++ equivalents
for the below Java classes.

This Java code has references to the following -

1. javax.servlet.Filter
In Java we have a class CustomFilter that implements javax.servlet.Filter.
This CustomFilter class has an init() method that will be called when
Tomcat starts. How can I achieve this for Apache(httpd.exe server)? means,
how can I make a function call when Apache(httpd.exe server) starts.


There are three possibilities:

1. the pre_config hook. This is invoked before apache parses its 
configuration. I think this is not what you want.


2. the post_config hook. This is invoked after apache parses its 
configuration but before the children processes are invoked. This could 
be what you want.


pre_config and post_config are called twice when apache starts and each 
time the configuration is reloaded.


pre_config and post_config are called in the apache parent process with 
the privileges of the apache parent process.


3. the child_init hook. This is invoked whenever apache creates a new 
child process. This is invoked with the privileges of the apache 
children processes.


That was the answer to your question. However, in apache filters are 
typically initialised differently. Each filter may have an init 
function. If the filter has such an init function (i.e. if it's not 
null), then the init filter function is invoked automatically by apache 
after the fixups callback and before the handler callback. So the init 
function, if it exists, is invoked once per request.



2. javax.servlet.FilterConfig
The above mentioned init() method takes in an argument of type
FilterConfig. What is the Apache C++ equivalent of FilterConfig?


The ap_filter_t structure has a void pointer field called ctx. This is 
null by default. You can make it point to whatever object you want. You 
can initialise this ctx in the init function of the filter or when the 
filter is invoked for the first time. (Please note that a filter may be 
invoked several times for the same request, so you'll have to take care 
to distinguish between the several invocations for the same request and 
between the invocations triggered by different requests.)


I'm not sure what's the role of the FilterConfig object in Java 
servlets. Maybe you need a configuration object of your apache module 
and not a filter context. A filter context serves mainly to store a 
state between consecutive invocations of the filter for the same 
request. So the data in the filter context, as it is a state, changes. 
It's not really a configuration, which is more a read-only object.



3. The interface javax.servlet.Filter also has the method
doFilter(ServletRequest request, ServletResponse response, FilterChain
chain). In Apache C++ I can get the filter chain using the structure
ap_filter_t. But how will I get the objects of
ServletRequest/HttpServletRequest and ServletResponse/HttpServletResponse
in C++ module? Means what are the corresponding structures I can access in
Apache C++ module.

The main filter callback function in my C++ module looks like this -

EXTERN_C_FUNC apr_status_t filterOutFilter (
ap_filter_t *filterChain,
apr_bucket_brigade *inBucketList)



The request_rec *r field of the ap_filter_t structure plays the role of 
the ServletRequest.


The ap_filter_t *next field of the ap_filter_t structure plays the role 
of the FilterChain.


There is no ServletResponse object. Your filter gets this "bucket 
brigade" which is basically a linked list of "buckets". The buckets 
contain data coming from the upstream filters or from the handler.


So a typical filter would parse the bucket brigade (i.e. traverse the 
linked list of buckets and process the data they contain) and generate a 
new, filtered (transformed) bucket brigade. Then it would pass it to 
downstream filters.


Something like

for (all buckets in input brigade) {
   read data in bucket
   transform the data, optionally using
   the state stored in f->ctx
   optionally update the state in f->ctx
   append the transformed data to the output brigade
   if (we reached the end-of-stream)
  // if upstream filters were well-behaved
  // this would be the last invocation of the filter
  // for this request
  return ap_pass_brigade(f->next, output_brigade)
}
// we parsed the whole brigade without reaching
// to the end-of-stream => the filter will be invoked again
// later with the next part of the data coming from upstream
// filters
return ap_pass_brigade(f->next, output_brigade);

Note, as I said previously, that the filter may be called several times 
for the same request.


After passing a brigade containing an EOS (end-of-stre

Re: retrieving module configs from other modules

2013-04-16 Thread Sorin Manolache

On 2013-04-15 23:12, Tim Traver wrote:

Hi all,

is there a way to access the module config tables of another module?

For instance, if I want to see (from my module) what the path to the SSL
cert is in the ssl_module, how can I go about doing that?

And are there any tutorials on how to retrieve and set config values?

Thanks,

Tim


I think you have two options:

1) You handle exactly the same configuration directive in your module in 
order to "intercept" what other modules see. This is possible because 
apache invokes the configuration handling callbacks of all modules that 
have registered for that directive. So in theory your module can 
intercept the whole configuration of apache and all its modules.


2) You read the configuration object of the other module. However you'll 
need two pieces of information in order to do so: the name of the other 
module's module structure object and the layout of its configuration 
object.


For example you'd do:

OtherModConf *conf = (OtherModConf *)ap_get_module_config(conf_vector, 
&other_mod_obj);

use(conf->conf_value_of_interest);

I think you cannot know other_mod_obj unless you search in the other 
module's sources. (You could check the exported symbols of the binary as 
well.)


And you may know OtherModConf only if the other module publishes its 
structure in a header file intended to be included in modules such as yours.



Sorin



Re: Apache fails to start when Xerces library is used in a C++ module

2013-03-31 Thread Sorin Manolache

On 2013-03-31 11:06, Sindhi Sindhi wrote:

Hello,

I have written a C++ module to invoke the Xerces C++ XML library to parse a
XML file. I'm unable to start httpd.exe with these changes. Here are the
details -

a) Apache server version: httpd-2.4.4-win64
b) Xerces version: xerces-c-3.1.1-x86_64-windows-vc-10.0
c) Development envt: Visual Studio 2010 with SP1

Following are the settings I have made in Visual Studio so that the C++
module refers to the Xerces library:
1. Additional Include Directories =
E:\xerces-c-3.1.1-x86_64-windows-vc-10.0\xerces-c-3.1.1-x86_64-windows-vc-10.0\include

2. Additional Dependencies = xerces-c_3.lib and xerces-c_static_3.lib

3. Additional library directories =
E:\xerces-c-3.1.1-x86_64-windows-vc-10.0\xerces-c-3.1.1-x86_64-windows-vc-10.0\lib

4. Debugging -> Environment:
PATH=E:\xerces-c-3.1.1-x86_64-windows-vc-10.0\xerces-c-3.1.1-x86_64-windows-vc-10.0\bin

5. In the Operating System environment variables, added the path
E:\xerces-c-3.1.1-x86_64-windows-vc-10.0\xerces-c-3.1.1-x86_64-windows-vc-10.0\bin
to the environment variable PATH

6. The code in my C++ module to invoke the Xerces library routine goes like
this -

try {
XMLPlatformUtils::Initialize();  // Initialize Xerces infrastructure
}
catch( XMLException& e ) {
char* message = XMLString::transcode( e.getMessage() );
XMLString::release( &message );
}
XMLPlatformUtils::Terminate();

7. Added the below in httpd.conf file -
LoadModule filter_module modules/XercesDLL.dll
AddOutputFilterByType TagFilter text/html text/plain text/css

8. If i try to launch httpd.exe from command prompt, I see the below error -

httpd.exe

httpd.exe: Syntax error on line 172 of
E:/httpd-2.4.4-win64/Apache24/conf/httpd.conf: Cannot load
modules/XercesDLL.dll into server: The specified module could not be found.

Even if I comment out the above C++ code in step 6, Apache still fails to
start. That means Apache is failing to load the Xerces library version I'm
using, irrespective of the way I'm invoking the library.

However, If I write a standalone DLL that invokes the above Xerces library
version, and invoke this DLL from an EXE then I'm successfully able to
parse the XML. This means, the Xerces library fails to get loaded only by
the Apache server for some reason.

I think it's got something to do with the C to C++ linkage, though I'm not sure.

Any help is highly appreciated.



Hello,

Try

LoadFile /absolute/path/to/the/xerces/lib1
LoadFile /absolute/path/to/the/xerces/lib2
...
LoadModule filter_module /absolute/path/to/XercesDLL.dll

For example

LoadFile 
E:\xerces-c-3.1.1-x86_64-windows-vc-10.0\xerces-c-3.1.1-x86_64-windows-vc-10.0\lib\xerces-c_3.lib
LoadFile 
E:\xerces-c-3.1.1-x86_64-windows-vc-10.0\xerces-c-3.1.1-x86_64-windows-vc-10.0\lib\xerces-c_static_3.lib

LoadModule filter_module E:\workspace\XercesDLL.dll

Maybe you do not need both LoadFiles. Experiment.
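
Note that LoadFile loads shared libraries at runtime; on Windows that means 
the .dll files under bin\, not the .lib import libraries under lib\ (those 
are only used at link time). So the following shape is more likely to work 
(the exact DLL name is an assumption based on the usual Xerces-C 3.1 layout):

```
LoadFile "E:/xerces-c-3.1.1-x86_64-windows-vc-10.0/xerces-c-3.1.1-x86_64-windows-vc-10.0/bin/xerces-c_3_1.dll"
LoadModule filter_module "E:/workspace/XercesDLL.dll"
```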

Sorin



Re: How to control block size of input_filter data

2013-03-12 Thread Sorin Manolache

On 2013-03-12 10:52, Hoang-Vu Dang wrote:

Thank you for the quick reply !

The context is what I am looking into right now, and it is indeed the
right solution to my original question. I'd just like to know a little bit
more detail, if you don't mind; you said:

"I typically destroy it by placing a callback in the cleanup hook of the
req->pool. "


Now I remember: I use C++, so I need to create and destroy the context. 
But if you allocate your context from an apr_pool you don't need to 
bother about destroying it, because it is destroyed automatically. 
Sorry for confusing you.


Just for information, I create/destroy the contexts like that:

int
flt_init_function(ap_filter_t *flt) {
   // I use C++; if you allocated ctx from a pool, you don't even need
   // this destruction callback
   flt->ctx = new MyContext();

   // destroy_ctx is called when r->pool is destroyed, i.e. at the very
   // end of the request processing, after the response has been sent to
   // the client and the request logged.
   apr_pool_cleanup_register(flt->r->pool, flt->ctx,
                             (apr_status_t (*)(void *))destroy_ctx,
                             apr_pool_cleanup_null);

   return OK;
}

and

apr_status_t
destroy_ctx(MyContext *ctx) {
   delete ctx;
   return APR_SUCCESS;
}

The filter function could be something like:

apr_status_t
input_filter(ap_filter_t *f, apr_bucket_brigade *bb, ap_input_mode_t mode,
             apr_read_type_e block, apr_off_t bytes) {

    MyContext *ctx = (MyContext *)f->ctx;

    switch (ctx->state()) {
    case FIRST_INVOCATION:
        ...
        break;
    case NTH_INVOCATION:
        ...
        break;
    case FOUND_EOS:
        ...
        break;
    ...
    }
}




What exactly is the callback function that I need to look for? When it
executes, can we be sure that all the data has been processed and that our
ctx is still in that state?

Best, Vu

On 03/12/2013 10:36 AM, Sorin Manolache wrote:

On 2013-03-12 10:16, Hoang-Vu Dang wrote:

Hi all,

When I write an input_filter, I notice that the data sent from client is
not always available in one chunk if it's large.

In other words, the input_filter() function will be called multiple
times per request. My question is how to control this (for
example, the size of the chunk before it breaks into two)? What should
we look into in order to check whether two filter calls belong to the
same request?



You can keep the state from one filter invocation to the other in
f->ctx, the filter's context.

There are many ways to do this.

One way I've seen is to check if f->ctx is NULL (if it was NULL then
it would mean that it is the first invocation of the filter). If it's
NULL, we build the context. Subsequent invocations have the context !=
NULL. You'll have to destroy the context at the end of the request. I
typically destroy it by placing a callback in the cleanup hook of the
req->pool.

Another way to destroy it, but in my opinion a wrong way, is to
destroy it when you encounter EOS in the data processed by the filter.
I'd say it's wrong because a wrongly written filter could send data
_after_ an EOS bucket and then you could not distinguish between a new
request and a request sending data after EOS.

Another way to initialize the context is by placing a filter init
function when you declare the filter and to initialize the context in
this function. This is more elegant in my opinion, because the context
is already initialized when the filter is called the first time.

The filter context could be any structure, so you can track the filter
processing state in it.

Regards,
Sorin





Re: How to control block size of input_filter data

2013-03-12 Thread Sorin Manolache

On 2013-03-12 10:16, Hoang-Vu Dang wrote:

Hi all,

When I write an input_filter, I notice that the data sent from client is
not always available in one chunk if it's large.

In other words, the input_filter() function will be called multiple
times per request. My question is how to control this (for
example, the size of the chunk before it breaks into two)? What should
we look into in order to check whether two filter calls belong to the
same request?



You can keep the state from one filter invocation to the other in 
f->ctx, the filter's context.


There are many ways to do this.

One way I've seen is to check if f->ctx is NULL (if it was NULL then it 
would mean that it is the first invocation of the filter). If it's NULL, 
we build the context. Subsequent invocations have the context != NULL. 
You'll have to destroy the context at the end of the request. I 
typically destroy it by placing a callback in the cleanup hook of the 
req->pool.


Another way to destroy it, but in my opinion a wrong way, is to destroy 
it when you encounter EOS in the data processed by the filter. I'd say 
it's wrong because a wrongly written filter could send data _after_ an 
EOS bucket and then you could not distinguish between a new request and 
a request sending data after EOS.


Another way to initialize the context is by placing a filter init 
function when you declare the filter and to initialize the context in 
this function. This is more elegant in my opinion, because the context 
is already initialized when the filter is called the first time.


The filter context could be any structure, so you can track the filter 
processing state in it.


Regards,
Sorin



Re: How to write apache module in C through which to retrieve POST form data from browser?

2013-01-17 Thread Sorin Manolache

On 2013-01-17 03:59, Dhiren123 wrote:

  Same thing: I put an HTML page (GET method) in the /www folder and through my apache
module I communicate with the browser successfully.
But when I connect to the apache module through the POST method, the action
page is not found by the server.
I don't know what actually happens or where I made the mistake. Can anybody
suggest something?



I think there are two methods for getting the POST data, I don't know 
which one is the "correct" one.


1. You write an input filter. (Check the appropriate documentation how 
to write input filters.) You either add it with ap_add_input_filter or 
you use the SetInputFilter directive of the core module. Then the input 
filter executes ap_get_brigade. The "brigade" is a linked list of 
"buckets", i.e. some data holders. You loop through the linked list of 
buckets until you get an EOS bucket (EOS = end-of-stream). Note that the 
brigade that you got may not contain the entire POST data. In this case 
the brigade does not have an EOS bucket and the filter will be called 
several times by apache. In any case the filter might be called several 
times, so write your code accordingly. (I think this is explained in the 
filter tutorial. See the apr_buckets.h header also.)
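
A rough sketch of method 1, assuming a filter named my_input_filter (a 
placeholder) that pulls the body from the filters below it and walks the 
brigade; the "consume" step is yours to fill in:

```c
static apr_status_t my_input_filter(ap_filter_t *f, apr_bucket_brigade *bb,
                                    ap_input_mode_t mode,
                                    apr_read_type_e block,
                                    apr_off_t readbytes)
{
    apr_status_t rv = ap_get_brigade(f->next, bb, mode, block, readbytes);
    if (rv != APR_SUCCESS)
        return rv;

    for (apr_bucket *b = APR_BRIGADE_FIRST(bb);
         b != APR_BRIGADE_SENTINEL(bb);
         b = APR_BUCKET_NEXT(b)) {
        if (APR_BUCKET_IS_EOS(b)) {
            /* end of the request body; may only arrive in a later
             * invocation of the filter */
            break;
        }
        const char *data;
        apr_size_t len;
        if (apr_bucket_read(b, &data, &len, block) == APR_SUCCESS) {
            /* consume data[0 .. len-1] here */
        }
    }
    return APR_SUCCESS;
}
```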


2. You write a loop in which you call ap_get_client_block. I am not sure 
about the stopping criterion of the loop. You could count the size of 
the data that you read and compare it with Content-Length but this does 
not work in the rare cases in which the POST data is compressed. I don't 
know much about this method, but you can lookup documentation on 
ap_get_client_block.
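
And a minimal sketch of method 2, inside a content handler (the consumption 
of the data is left out; REQUEST_CHUNKED_DECHUNK could be passed instead if 
you want to accept chunked bodies):

```c
static int read_post_handler(request_rec *r)
{
    char buf[8192];
    long nread;

    /* Tell httpd how to treat the request body. */
    int rc = ap_setup_client_block(r, REQUEST_CHUNKED_ERROR);
    if (rc != OK)
        return rc;

    if (ap_should_client_block(r)) {
        /* ap_get_client_block returns 0 at end of body, -1 on error. */
        while ((nread = ap_get_client_block(r, buf, sizeof(buf))) > 0) {
            /* consume buf[0 .. nread-1] here */
        }
        if (nread < 0)
            return HTTP_INTERNAL_SERVER_ERROR;
    }
    return OK;
}
```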


I hope this helps.

Sorin




Re: Read file file before web server startup

2012-11-12 Thread Sorin Manolache

On 2012-11-13 00:41, Idel Fuschini wrote:

Hi,
I need to read a configuration file (XML) and put its data in a hash array before
the webserver starts.
Is it possible, and if yes, how?


You define a module configuration object using the create_server_config 
callback of your module structure.


You define a configuration directive that specifies the name of the XML 
file (see the command_rec array of the module structure). In the 
command_rec array entry that defines your configuration directive, you 
associate the configuration directive with a callback function. The 
callback function is called when apache encounters the directive.


The argument of the configuration directive (in your case the XML file 
name) is passed by apache as an argument to the callback function.


You get the configuration object in the configuration directive callback 
using the ap_get_module_config function. You read the XML file in the 
callback function and initialise the configuration object.
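
Put together, the skeleton might look like this (MyXMLFile, my_module and 
parse_xml_into are placeholder names; the XML parsing itself is whatever 
library you choose, and create_srv_cfg must be wired into the 
create-server-config slot of the module structure):

```c
typedef struct {
    apr_hash_t *data;   /* filled from the XML file */
} my_srv_cfg;

static void *create_srv_cfg(apr_pool_t *p, server_rec *s)
{
    my_srv_cfg *cfg = apr_pcalloc(p, sizeof(*cfg));
    cfg->data = apr_hash_make(p);
    return cfg;
}

/* Called when apache encounters "MyXMLFile /path/to/file.xml". */
static const char *set_xml_file(cmd_parms *cmd, void *dummy, const char *fname)
{
    my_srv_cfg *cfg = (my_srv_cfg *)
        ap_get_module_config(cmd->server->module_config, &my_module);
    /* read and parse the file here, at configuration time */
    if (parse_xml_into(cfg->data, fname, cmd->pool) != APR_SUCCESS)
        return "MyXMLFile: cannot read or parse the XML file";
    return NULL;  /* NULL means "no error" */
}

static const command_rec my_cmds[] = {
    AP_INIT_TAKE1("MyXMLFile", set_xml_file, NULL, RSRC_CONF,
                  "XML file read at startup"),
    { NULL }
};
```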



Sorin



Re: Forcing Apache to exit on startup

2012-10-22 Thread Sorin Manolache

On 2012-10-22 21:29, Joshua Marantz wrote:

Hi,

Our module has multiple configuration parameters.  There is a case where if
you have option A and option B, then you must also specify option C,
otherwise bad things can happen that are a lot easier to debug on startup
than they are after the server is running.

I know how to force 'apachectl' to exit with a nice error message if I
don't like the value of option A or option B.  I can do that in the
function-pointer I provide in my option-table by returning a non-null char*
message.

But in this case I want to exit nicely from the function I have registered
with ap_hook_child_init, having noticed that option A and B are set but not
C.  Is that possible?

By "nicely", I mean that the user types:

% sudo /etc/init.d/apachectl restart
Error in pagespeed.conf: if you have option A and B specified, you must
specify a value for option C.

At that point either the server would not be running, or it would still be
in whatever state it was previously in.

Is this possible?

Currently our solution is to log an error and call abort(), and it's not
very nice!


Don't do it in child_init. I don't think the child can stop the server.

I think apachectl spawns apache with an appropriate -k argument (e.g. 
"apache2 -k stop") to start/stop/reload apache. However, I don't think 
that the child has sufficient privileges to stop the parent by exec'ing 
"apache2 -k stop". Nor can it send the apropriate signal to the parent.


But you can exit nicely in post_config, which is run (as root) after the 
conf is read but before the children are spawned. It suffices to return 
a non-ok code for apache to exit. I _think_, I'm not sure, that what you 
log in post_config does not go on the console but goes to the server 
log. But maybe there's a way to put the error message on the console too.
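
A sketch of that approach (my_cfg, my_module and the option names are 
placeholders). Returning anything other than OK from post_config makes 
httpd abort startup:

```c
static int my_post_config(apr_pool_t *pconf, apr_pool_t *plog,
                          apr_pool_t *ptemp, server_rec *s)
{
    my_cfg *cfg = (my_cfg *)ap_get_module_config(s->module_config,
                                                 &my_module);

    if (cfg->option_a && cfg->option_b && !cfg->option_c) {
        ap_log_error(APLOG_MARK, APLOG_EMERG, 0, s,
                     "if you have option A and B specified, you must "
                     "specify a value for option C");
        return HTTP_INTERNAL_SERVER_ERROR;  /* any non-OK stops startup */
    }
    return OK;
}

static void register_hooks(apr_pool_t *p)
{
    ap_hook_post_config(my_post_config, NULL, NULL, APR_HOOK_MIDDLE);
}
```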


Sorin



Re: "Close" HTTP connection callback/hook

2012-10-16 Thread Sorin Manolache

On 2012-10-16 15:50, Evgeny Shvidky wrote:

Hi,

I am implementing a new module on C.
I need to perform some functionality when a user closes a HTTP connection 
before he received any response for his request.
How can I know when a HTTP user request state has been changed/closed?
Is there any callback/hook for this functionality I can register?


As far as I know, there is no callback for this functionality.

You can check r->connection->aborted or the APR_ECONNABORTED return code 
of ap_pass_brigade when you attempt to write to the client.
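
For example, in a handler or output filter (f and bb assumed from the 
surrounding code; do_cleanup is a placeholder for your own functionality):

```c
apr_status_t rv = ap_pass_brigade(f->next, bb);
if (rv == APR_ECONNABORTED || f->r->connection->aborted) {
    /* the client closed the connection before receiving the response */
    do_cleanup(f->r);
    return APR_ECONNABORTED;
}
```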



Sorin



Re: apache rewrite and reverse proxy help

2012-10-14 Thread Sorin Manolache

On 2012-10-14 01:12, Abdel wrote:

I have a back-end website running on Tomcat with the following url
http://local.domain.com/app. External users access the website through an apache
proxy with the following url http://www.domain.com/user1 (user1, user2,
etc.; it's a uri specific to each user). I want to use apache rewrite and/or
reverse proxy directives to translate a url like
http://www.domain.com/user1 into http://local.domain.com/app?user=user1.
Can someone help me, please?


Try

RewriteEngine On
RewriteRule ^/(user[^/]*)$ http://local.domain.com/app?user=$1 [P]

<Proxy http://local.domain.com/app>
ProxyPass http://local.domain.com/app keepalive=On
</Proxy>






Re: Empty module configuration in one hook stage

2012-10-02 Thread Sorin Manolache

On 2012-10-02 12:02, Christoph Gröver wrote:


Hello list,

I have a strange problem.

In a function that is called by setting it up in the hook phase
"post_read_request" the module configuration is just empty (only
defaults set).

In a function that is hooked by the "access_checker" the module
configuration is fully set up. The code to get the config is the same on
both occasions, of course.

I don't know whether this has always been the case or is a recent
development.

Without getting into too much detail, I have seen example modules use
the phase "post_read_request", so this should be working, right?
Any idea why this is happening?

Perhaps wrong definition of the config creation?



The per-directory configuration has not yet been merged into the default 
server configuration in the post_read_request hook.


Only callbacks that are executed after the walk_config hook have the 
merged per-directory configuration.



You should be able, though, to have the merged per-server configuration 
in post_read_request.
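
In code (my_cfg and my_module are placeholders):

```c
/* Safe in post_read_request: the per-server configuration. */
my_cfg *scfg = (my_cfg *)ap_get_module_config(r->server->module_config,
                                              &my_module);

/* Only valid after the config walk (access_checker and later hooks):
 * the merged per-directory configuration. */
my_cfg *dcfg = (my_cfg *)ap_get_module_config(r->per_dir_config,
                                              &my_module);
```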


Sorin



Re: Modules Communicating

2012-08-22 Thread Sorin Manolache

On 2012-08-22 09:31, Adi Selitzky wrote:

Hi!
I am writing my own module that handles all incoming requests. In some cases, I 
want this request to be handled by mod_python which I installed. In these 
cases, my module should change the requested file extension to .py, which is 
configured to be handled by mod_python.

AddHandler mod_python .py

I have two questions:
1. How can I set the modules order, so my module will handle the request first, 
change its url, and then mod_python will handle it?


const char *succ[] = {"mod_python.c", NULL};
ap_hook_handler(&your_handler, NULL, succ, APR_HOOK_FIRST);


2. Which field in the request_rec should I change so it will take effect? I 
tried to change the URL key in subprocess_env table, but the request was not 
handled by mod_python.


"AddHandler mod_python .py" simply sets r->handler to "mod_python" if 
r->filename contains .py. So I guess that the python processing is 
triggered by r->handler being "mod_python".


So you can try setting r->handler = "mod_python" and then return 
DECLINED from your handler and forget about appending .py.
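
A sketch of such a handler (wants_python is a placeholder for your own test):

```c
static int my_handler(request_rec *r)
{
    if (wants_python(r)) {
        r->handler = "mod_python";  /* what AddHandler would have set */
        return DECLINED;            /* let mod_python's handler run */
    }
    /* ... otherwise handle the request yourself, or decline ... */
    return DECLINED;
}
```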



Sorin


Re: forward request with proxy_http in custom module

2012-08-17 Thread Sorin Manolache

On 2012-08-17 16:25, nik600 wrote:

Dear all

I'm trying to code a custom module that will implement some logic. This
is the concept of the module:

*

/*
* some stuff...
*/
if(condition){

/*return a custom result*/

return OK;

}else{
/*forward the request to another server*/

r->filename = "proxy:http://www.google.it/";
r->proxyreq = PROXYREQ_PROXY;
r->handler  = "proxy-server";  

return OK;
}
*

But it seems that when I go into the else branch the proxy request
isn't handled.

proxy and proxy_http are enabled and correctly working.

Is this code correct to forward a request and make my module working
as a proxy_http ?



Try

if (condition) {
 ...
 return OK;
} else {
 return DECLINED;
}

and make sure your handler runs before mod_proxy's:

static const char *succ[] = {"mod_proxy.c", NULL};
ap_hook_handler(&your_handler, NULL, succ, APR_HOOK_MIDDLE);

Then put a ProxyPass in your conf:


<Location />
   ProxyPass http://www.google.it/ keepalive=On
</Location>


Also make sure you do not check on r->handler. Even if you set 
"SetHandler your_handler", ProxyPass will overwrite it with "proxy-server".



Sorin



Thanks in advance





Re: How to access string in a module that is set in configuration directive?

2012-06-30 Thread Sorin Manolache

On 2012-06-30 22:33, oh...@cox.net wrote:

Hi,

I got my 1st module working and now I need to add support for a configuration directive 
that sets a string that my module needs in various places, i.e., if my directive is 
"SetMyVar", and httpd.conf has:

SetMyVar"foo123"

then, in my module, I need to able to access the string, "foo123", that got set via the 
"SetMyVar" directive.  This directive would only be used at the server level.

I think that I know how to do the code to set the variable into my module, but 
the question that I have is how do I *access* it after it is set?

Here's the code I have to handle the directive:

/*
  * Stuff for Directive handling
  */

// Struct for Directives
typedef struct txt_cfg {
 const char * MyVar;
} txt_cfg;


// Function to set MyVar
static const char * txt_set_MyVar(cmd_parms* cmd, void* cfg, const char* val) {
 printf("In set_MyVar: Setting MyVar to [%s]\n", val);
 ((txt_cfg*) cfg)->MyVar = val;
 return NULL;  /* NULL means "no error" */
} // end txt_set_MyVar()


//
static const command_rec txt_cmds[] = {
 AP_INIT_TAKE1("SetMyVar", txt_set_MyVar, NULL, OR_ALL,
 "This is MyVar"),
 { NULL }
};
.
.
.
module AP_MODULE_DECLARE_DATA my_module =
{
 STANDARD20_MODULE_STUFF,
 NULL,   /* dir config creater */
 NULL,   /* dir merger --- default is to override */
 NULL,   /* server config */
 NULL,   /* merge server configs */
 txt_cmds,   /* command apr_table_t */
 register_hooks  /* register hooks */
};


Can anyone tell me how, in my module code, I can access that "MyVar" string?



You don't get a valid configuration object (txt_cfg) in the cfg argument 
of the txt_set_MyVar unless you create that configuration object. As 
your code looks now, cfg in txt_set_MyVar should be null.


You'll have to set a callback function for creating the configuration 
object. The configuration object configuration functions are 
dir_config_creater (first after STANDARD20_MODULE_STUFF in my_module) 
and server_config (third after  STANDARD20_MODULE_STUFF).


The dir_config creator will put your configuration object in 
r->per_dir_config and you get it with


txt_cfg *cfg = (txt_cfg *)ap_get_module_config(r->per_dir_config, 
&my_module);


The server_config_creator will put your configuration object in 
r->server->module_config and you get it with


txt_cfg *cfg = (txt_cfg *)ap_get_module_config(r->server->module_config, 
&my_module);



Have a look in include/http_config.h at all those OR_* macros and 
especially at RSRC_CONF and ACCESS_CONF.


RSRC_CONF is used mainly to define server-wide directives. ACCESS_CONF 
is used to define directory-wide configuration directives.


In your cfg argument of the txt_set_MyVar configuration directive 
handler, you'll get the _directory-wide_ configuration object (if you've 
created one by using the dir_config_creator in module my_module). If you 
want to set your variable just for the directory, you can keep 
txt_set_MyVar as it is now. Later in the module, you can retrieve the 
value as I showed above, i.e. from r->per_dir_config.


If you want to set it server-wide, then don't use cfg. Write

txt_cfg *srv_cfg = (txt_cfg 
*)ap_get_module_config(cmd->server->module_config, &my_module);


Later in the module, you can retrieve the value as I showed above, i.e. 
from r->server->module_config.


Remember that in any case, you have to create the respective 
configuration object (via the callbacks in my_module).
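
Put together, a server-wide sketch could look like this (names reused from 
the post; register_hooks and txt_cmds assumed to exist as above):

```c
static void *create_srv_cfg(apr_pool_t *p, server_rec *s)
{
    /* zero-initialised txt_cfg; MyVar starts out NULL */
    return apr_pcalloc(p, sizeof(txt_cfg));
}

static const char *txt_set_MyVar(cmd_parms *cmd, void *cfg, const char *val)
{
    txt_cfg *srv_cfg = (txt_cfg *)
        ap_get_module_config(cmd->server->module_config, &my_module);
    srv_cfg->MyVar = val;
    return NULL;
}

module AP_MODULE_DECLARE_DATA my_module =
{
    STANDARD20_MODULE_STUFF,
    NULL,            /* dir config creater */
    NULL,            /* dir merger --- default is to override */
    create_srv_cfg,  /* server config */
    NULL,            /* merge server configs */
    txt_cmds,        /* command apr_table_t */
    register_hooks   /* register hooks */
};

/* Later, e.g. in a handler:
 * txt_cfg *cfg = (txt_cfg *)
 *     ap_get_module_config(r->server->module_config, &my_module); */
```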


S



Thanks,
Jim






Re: ssl_var_lookup snippet was Re: Confused about modules processing order...

2012-06-26 Thread Sorin Manolache

On 2012-06-26 22:17, oh...@cox.net wrote:


 Sorin Manolache  wrote:

On 2012-06-26 19:56, oh...@cox.net wrote:

You cannot wait until mod_ssl runs its fixups, you have to hook one of
the hooks that execute earlier than webgate's check_user_id or
auth_checker. (You have to hook one of the hooks (1)-(4).) There, in
your hook, you have to get yourself the values of the server
certificates, client certificate, etc, everything that mod_ssl would
have given you, but too late.

"

I guess that what I'm seeing is exactly what you said would happen, i.e., my 
check_user_id hook function is being called, but none of the SSL vars are 
populated (since, as you said mod_ssl doesn't populate them until the fixup 
phase).

What mechanisms/methods could I use to get those SSL vars ("you have to get yourself 
the values of the server certificates, client certificate, etc, ") at this point?


I don't know, unfortunately. Have a look at the sources
(modules/ssl/ssl_engine_kernel.c, ssl_hook_Fixup) to see how mod_ssl
does it.

Apparently mod_ssl uses ssl_var_lookup defined in ssl_engine_vars.c.
Maybe you can use it in check_user_id already.

Sorin



Sorin,

THANKS for that pointer to ssl_var_lookup.

As a very small payback (VERY small) for your help (and others), and for the record, I 
put the following code (assembled from various places) in the ap_headers_early, and it 
seems to work ("somewhat").


static apr_status_t ap_headers_early(request_rec *r)
{

printf("In ap_headers_early\n");

printf("\n\nIn ap_headers_early: About to call ssl_var_lookup\n");

typedef char* (*ssl_var_lookup_t)(apr_pool_t*, server_rec*, conn_rec*, 
request_rec*, char*);

ssl_var_lookup_t ssl_var_lookup = 0;

ssl_var_lookup = (ssl_var_lookup_t)apr_dynamic_fn_retrieve("ssl_var_lookup");

if (ssl_var_lookup == NULL)   /* mod_ssl is not loaded */
    return DECLINED;

const char * foo = ssl_var_lookup(r->pool, r->server, r->connection, r, 
"SSL_CLIENT_CERT");

printf("In ap_headers_early: SSL_CLIENT_CERT=[%s]\n", foo);
.
.

and it seems to work perfectly!!


Do you think that such calls would work in ANY hook?  In other words, would I 
be at my leisure to use that in ANY of the module hooks?


No, it won't work in any hook, in my opinion. The availability of the 
data depends on the phase (hook) in which you run the ssl_var_lookup.


I think, though I'm not sure, that the data are gathered in the 
post_read_request hook. If so, ssl_var_lookup would work in any hook 
that is called after post_read_request.


ap_headers_early is run in post_read_request. My intuition is that 
putting your code there is slightly too early. This is because the 
directory-wide configuration of the request is not yet correctly set in 
this phase and URL rewrite rules have not yet been applied, although I 
don't know if this would affect your functionality.


I'd put the code either in header_parser or in check_user_id and I'd try 
to make sure that my check_user_id is run before webgate's check_user_id.


I'd go for header_parser as it is always run for main requests. 
check_user_id is run only when some conditions are satisfied (check the 
ap_process_request_internal in server/request.c).


If you go for check_user_id, make sure that it is run before Oracle's 
check_user_id. In order to do so, you can use APR_HOOK_FIRST 
(ap_hook_check_user_id(&my_check_user_id, NULL, NULL, APR_HOOK_FIRST)), 
or you can use something like


static const char *successor[] = {nameoftheoraclesourcefile, NULL};
ap_hook_check_user_id(&my_check_user_id, NULL, successor, APR_HOOK_MIDDLE);

(See how mod_ssl places its post_read_request _after_ mod_setenvif's in 
modules/ssl/mod_ssl.c)


Also, I would not change mod_headers, I would write my own module in 
which I'd place my header_parser hook.


Sorin


Re: Confused about modules processing order...

2012-06-26 Thread Sorin Manolache

On 2012-06-26 19:56, oh...@cox.net wrote:

You cannot wait until mod_ssl runs its fixups, you have to hook one of
the hooks that execute earlier than webgate's check_user_id or
auth_checker. (You have to hook one of the hooks (1)-(4).) There, in
your hook, you have to get yourself the values of the server
certificates, client certificate, etc, everything that mod_ssl would
have given you, but too late.

"

I guess that what I'm seeing is exactly what you said would happen, i.e., my 
check_user_id hook function is being called, but none of the SSL vars are 
populated (since, as you said mod_ssl doesn't populate them until the fixup 
phase).

What mechanisms/methods could I use to get those SSL vars ("you have to get yourself 
the values of the server certificates, client certificate, etc, ") at this point?


I don't know, unfortunately. Have a look at the sources 
(modules/ssl/ssl_engine_kernel.c, ssl_hook_Fixup) to see how mod_ssl 
does it.


Apparently mod_ssl uses ssl_var_lookup defined in ssl_engine_vars.c. 
Maybe you can use it in check_user_id already.


Sorin


Re: Confused about modules processing order...

2012-06-26 Thread Sorin Manolache

On 2012-06-26 13:55, oh...@cox.net wrote:





And for webgate, I see:

Registering hooks for apache2entry_web_gate.cpp
 Hooked post_config
 Hooked handler
 Hooked check_user_id
 Hooked auth_checker




The original mod_headers code has a hook for fixups.  If I added an "after" 
string in the code that registers my fixup function, with the name of the webgate, would 
that cause my modified mod_headers to run before the webgate?


As you see in the debug messages obtained with SHOW_HOOKS=1, the webgate 
does not place any callback on the fixups hook.


The relative order of the callbacks in question is:

1) post_read_request
2) other callbacks (e.g. translate_name, header_parser)
3) access_checker
4) check_user_id
5) auth_checker
6) fixups
7) insert_filter
8) handler

mod_ssl hooks (1), (3-6), and (8) but it initialises the environment 
only in the fixups hook (6).


webgate hooks (4), (5), and (8). So putting your code in (6) is already 
too late if it is webgate's (4) or (5) that you want to precede.


There's no way in which your fixups callback can run earlier than 
webgate's check_user_id or auth_checker simply because the latter are 
run by apache earlier than fixups.



Also can you clarify/expand on what you mean by " you'll have to get those variables 
yourself"?  I think that I'm currently getting them using env->setproc or something 
like that.


What I mean is:

*) apparently you need the variables before webgate's check_user_id or 
auth_checker.
*) but mod_ssl initialises them in fixups, i.e. _after_ check_user_id 
and auth_checker


You cannot wait until mod_ssl runs its fixups, you have to hook one of 
the hooks that execute earlier than webgate's check_user_id or 
auth_checker. (You have to hook one of the hooks (1)-(4).) There, in 
your hook, you have to get yourself the values of the server 
certificates, client certificate, etc, everything that mod_ssl would 
have given you, but too late.



Please note that what I say holds under the condition that it is 
webgate's check_user_id and auth_checker that you want to precede. If it 
is webgate's handler, then your code already runs before webgate's handler.



Sorin

P.S. For the order of hooks, check
modules/http/http_core.c, ap_process_http_connection
server/protocol.c, ap_read_request
server/request.c, ap_process_request_internal


Re: Confused about modules processing order...

2012-06-26 Thread Sorin Manolache

On 2012-06-26 13:14, oh...@cox.net wrote:


 Sorin Manolache  wrote:

On 2012-06-26 03:49, oh...@cox.net wrote:


Hi,

I have my small prototype module, which I implemented starting with the 
mod_headers.c source, working.  The changes that I did to the original source 
were to add some code in the insert_filter hook to inject an additional header 
into the request.

That seems to work ok with a "vanilla" Apache configuration.

I want to be able to make my modified module work together with another module, provided 
by Oracle (the Oracle Access Manager webgate, aka "webgate").

However, after I add the directives into the Apache httpd.conf to enable the 
webgate, it appears that, on incoming requests, the webgate processing occurs, 
but my code in the modified mod_headers module is not called at all :(!!


Here's the last part of my modified mod_headers.c:

static void register_hooks(apr_pool_t *p)
{
  printf("mod_headers-jl V0.13 - use LIBCURL instead of OAM ASDK-process 
response from callCurl\n");
  printf("In register_hooks\n");
  ap_register_output_filter("FIXUP_HEADERS_OUT", ap_headers_output_filter,
NULL, AP_FTYPE_CONTENT_SET);
  ap_register_output_filter("FIXUP_HEADERS_ERR", ap_headers_error_filter,
NULL, AP_FTYPE_CONTENT_SET);
  ap_hook_pre_config(header_pre_config,NULL,NULL,APR_HOOK_MIDDLE);
  ap_hook_post_config(header_post_config,NULL,NULL,APR_HOOK_MIDDLE);
  ap_hook_insert_filter(ap_headers_insert_output_filter, NULL, NULL, 
APR_HOOK_LAST);
  ap_hook_insert_error_filter(ap_headers_insert_error_filter,
  NULL, NULL, APR_HOOK_LAST);
  ap_hook_fixups(ap_headers_fixup, NULL, NULL, APR_HOOK_LAST);
  ap_hook_post_read_request(ap_headers_early, NULL, NULL, APR_HOOK_FIRST);
}

module AP_MODULE_DECLARE_DATA headers_module =
{
  STANDARD20_MODULE_STUFF,
  create_headers_dir_config,  /* dir config creater */
  merge_headers_config,   /* dir merger --- default is to override */
  NULL,   /* server config */
  NULL,   /* merge server configs */
  headers_cmds,   /* command apr_table_t */
  register_hooks  /* register hooks */
};


The code I added is in the "ap_headers_insert_output_filter()" function.


I did an "export SHOW_HOOKS=1" and ran the Apache, and I see this for the 
modified mod_headers:

Registering hooks for mod_headers.c
mod_headers-jl V0.13 - use LIBCURL instead of OAM ASDK-process response from 
callCurl
In register_hooks
Hooked pre_config
Hooked post_config
Hooked insert_filter
Hooked insert_error_filter
Hooked fixups
Hooked post_read_request


And for webgate, I see:

Registering hooks for apache2entry_web_gate.cpp
Hooked post_config
Hooked handler
Hooked check_user_id
Hooked auth_checker


I thought that the handler functions are called almost last part of the 
processing (content generation), and my code is hooked to insert_filter, which 
I thought occurs earlier than content generation, so shouldn't my code get 
processed BEFORE Apache attempts to process the webgate functions?

How can I get my code to process before the webgate does?



insert_filter is run between the fixups and the handler hooks.

Try to identify who is producing the variables that you need, in which
phase they are available at the earliest. Then identify which part of
web_gate hijacks the processing such that your code is not executed
anymore. I suppose it is one of web_gate's auth_checker or
check_user_id. If it was the web_gate handler then your code would have
run before.

Sorin



Hi Sorin,

I posted a later msg that I've been trying do something along the lines that 
you said:

"I've been doing more testing, and it appears that the insert_filter hook (the
"ap_headers_insert_output_filter()" function) is the only place where I can put
my code where it has access to the variables that it needs to do the processing
that I'm doing.

The problem is that if that other Oracle module is enabled in the Apache, it
runs before my code, and I can't get the insert_filter hook (my function) to get
processed before the Oracle module "


The SSL variables are set in the fixups hook by mod_ssl. The fixups hook 
is run _after_ check_user_id and auth_checker. So you cannot rely on 
mod_ssl to populate the environment with the variables. I guess you'll 
have to get those variables yourself, before Oracle's check_user_id and 
auth_checker hooks.


Sorin


Re: Confused about modules processing order...

2012-06-26 Thread Sorin Manolache

On 2012-06-26 03:49, oh...@cox.net wrote:


Hi,

I have my small prototype module, which I implemented starting with the 
mod_headers.c source, working.  The changes that I did to the original source 
were to add some code in the insert_filter hook to inject an additional header 
into the request.

That seems to work ok with a "vanilla" Apache configuration.

I want to be able to make my modified module work together with another module, provided 
by Oracle (the Oracle Access Manager webgate, aka "webgate").

However, after I add the directives into the Apache httpd.conf to enable the 
webgate, it appears that, on incoming requests, the webgate processing occurs, 
but my code in the modified mod_headers module is not called at all :(!!


Here's the last part of my modified mod_headers.c:

static void register_hooks(apr_pool_t *p)
{
 printf("mod_headers-jl V0.13 - use LIBCURL instead of OAM ASDK-process response from callCurl\n");
 printf("In register_hooks\n");
 ap_register_output_filter("FIXUP_HEADERS_OUT", ap_headers_output_filter,
   NULL, AP_FTYPE_CONTENT_SET);
 ap_register_output_filter("FIXUP_HEADERS_ERR", ap_headers_error_filter,
   NULL, AP_FTYPE_CONTENT_SET);
 ap_hook_pre_config(header_pre_config,NULL,NULL,APR_HOOK_MIDDLE);
 ap_hook_post_config(header_post_config,NULL,NULL,APR_HOOK_MIDDLE);
 ap_hook_insert_filter(ap_headers_insert_output_filter, NULL, NULL, 
APR_HOOK_LAST);
 ap_hook_insert_error_filter(ap_headers_insert_error_filter,
 NULL, NULL, APR_HOOK_LAST);
 ap_hook_fixups(ap_headers_fixup, NULL, NULL, APR_HOOK_LAST);
 ap_hook_post_read_request(ap_headers_early, NULL, NULL, APR_HOOK_FIRST);
}

module AP_MODULE_DECLARE_DATA headers_module =
{
 STANDARD20_MODULE_STUFF,
 create_headers_dir_config,  /* dir config creator */
 merge_headers_config,   /* dir merger --- default is to override */
 NULL,   /* server config */
 NULL,   /* merge server configs */
 headers_cmds,   /* command apr_table_t */
 register_hooks  /* register hooks */
};


The code I added is in the "ap_headers_insert_output_filter()" function.


I did an "export SHOW_HOOKS=1" and ran the Apache, and I see this for the 
modified mod_headers:

Registering hooks for mod_headers.c
mod_headers-jl V0.13 - use LIBCURL instead of OAM ASDK-process response from 
callCurl
In register_hooks
   Hooked pre_config
   Hooked post_config
   Hooked insert_filter
   Hooked insert_error_filter
   Hooked fixups
   Hooked post_read_request


And for webgate, I see:

Registering hooks for apache2entry_web_gate.cpp
   Hooked post_config
   Hooked handler
   Hooked check_user_id
   Hooked auth_checker


I thought that the handler functions are called almost last part of the 
processing (content generation), and my code is hooked to insert_filter, which 
I thought occurs earlier than content generation, so shouldn't my code get 
processed BEFORE Apache attempts to process the webgate functions?

How can I get my code to process before the webgate does?



insert_filter is run between the fixups and the handler hooks.

Try to identify who is producing the variables that you need, and in which 
phase they are available at the earliest. Then identify which part of 
web_gate hijacks the processing such that your code is not executed 
anymore. I suppose it is one of web_gate's auth_checker or 
check_user_id hooks. If it were the web_gate handler then your code would 
have run before.


Sorin


Re: Setting response timeout

2012-06-25 Thread Sorin Manolache

On 2012-06-25 14:42, Rajalakshmi Iyer wrote:

Hello,

I am working on an Apache module where the response times for each request
are very stringent. I want to be able to return a custom error deck in case
I am not able to process the request within the threshold instead of a
timeout response. I have added a check in the request handler code to check
if a threshold time has elapsed and return a custom error in that case.
However it appears that this code is not invoked at all for most requests.



From what I see in Apache code, the request / response is actually
synchronous. So the next request is not read in until the response for the
first one is sent out.


Yes, it is synchronous _at thread level_, i.e. each thread must finish 
processing its request before it is able to process a new one. But the 
server has several threads.




So, if a previous request takes a longer time, the response times for
subsequent requests is affected.


No, subsequent requests are taken up by other threads/processes of the 
server.




Any guidelines on how to handle this would be greatly appreciated.


It depends on what is eating up the response time and on the granularity 
of the timeout.


*) If it's processing time on the server, you'll need a timer (see for 
example boost::asio::deadline_timer).


*) If it's the response of a backend server you are waiting on, then 
the timeout mechanism depends on the way in which the request is made to 
the backend. If you're using mod_proxy, then you could use the timeout 
directive of ProxyPass if its coarse granularity suits your 
requirements. I suppose other http client libs have their own way to 
specify timeouts.


Sorin


Re: Anyone have some example code doing simple HTTP GET request from within a module?

2012-06-23 Thread Sorin Manolache

On 2012-06-23 04:47, oh...@cox.net wrote:

Hi,

Per earlier threads on this list, I've been working on an Apache module.  For 
the time being, I'm kind of stuck because of the problems that I've run into 
with trying to integrate my module with a 3rd party library, so just for my 
module, which is mainly a proof-of-concept, I'd like to have my module do an 
HTTP GET request.

So, I was wondering if anyone has some simple example code for doing that from 
within a module, maybe using libcurl, or just natively using sockets?

I'm trying to do this myself, and I've been looking at using libcurl, but most of the 
examples that I've seen use the "easy" setup, so if someone has something like 
that that can be shared, it'd be a big help.  Conversely, if I figure it out, I'll post 
some working snippets here :)...

I'll say the same thing as Ben: try with Apache itself, either mod_proxy or 
ap_run_sub_request. That is, if you make one outgoing request per incoming 
request. If you want several outgoing requests per incoming request, 
preferably in parallel, then go with some 3rd-party library.


I have some in-house C++ wrappers for libcurl (curl_multi_* + timeouts + 
client pools), but they are not straightforward to use, a lot of setup 
is involved, and they are not thoroughly tested.


S


Re: Followup to earlier thread about "How to compiling/link/use Apache module that uses shared library?"

2012-06-22 Thread Sorin Manolache

On 2012-06-22 21:22, oh...@cox.net wrote:


Does that confirm that they statically linked stuff from openssl (and 
libcrypto) into libobaccess.so?


I think so.

Also you can run nm -aC libobaccess.so. The symbols marked with "U" are 
undefined => they are external to the lib. The functions marked with "T" 
or "t" are defined by the lib => their code is in the binary.


Functions marked with upper-case ("T") are exported, i.e. another module 
may use the function. Functions marked with lower-case ("t") are not 
exported; those functions can be called only by other functions in the 
same module. If the libcrypto functions in libobaccess were not exported 
(marked with a lower-case letter) you wouldn't have a problem: the 
functions in libobaccess would execute the libcrypto functions in 
libobaccess and the functions in mod_ssl would execute the libcrypto 
functions in your system's libcrypto.


But I suppose that's not the case.


Assuming that's the case, is there any way around this?


The easiest way would be to have a libobaccess _dynamically_ linked with 
libcrypto. In this case, the first module between mod_ssl and your 
module that loads would load libcrypto. When the second module loads, 
the loader tries to resolve the undefined symbols of the second module 
and it will find them in the already loaded libcrypto.


If you cannot obtain a libobaccess dynamically linked with libcrypto, you 
_could_ try to recompile mod_ssl such that it does not export any 
libcrypto functions, but I don't know if that is possible.


S


Re: Followup to earlier thread about "How to compiling/link/use Apache module that uses shared library?"

2012-06-22 Thread Sorin Manolache

On 2012-06-22 17:35, oh...@cox.net wrote:



Sorry.  I meant to say:

"So, my code calls ObConfig_initialize() then it appears that that calls
ObConfig::initialize() which is presumably a C++ function. "



We develop our apache modules in C++ on a regular basis and they 
interact with other modules written in plain C and there's no problem.


What I think happens in your case is:

I suspect that the Oracle lib was _statically_ linked with libcrypto. So 
the code of some version of libcrypto is in the libobaccess binary. Then 
mod_ssl is _dynamically_ linked with libcrypto. I suspect that the two 
libcryptos have different versions and they are possibly incompatible => 
segfaults at all kinds of mallocs/frees. I think it has nothing to do 
with new/delete vs malloc/free.


S


Re: How to compiling/link/use Apache module that uses shared library?

2012-06-21 Thread Sorin Manolache
And I forgot to say: run gdb in some sort of environment where you see 
your current source code line and a couple of surrounding lines. You 
could achieve this with the "list" command, but I prefer running gdb in 
emacs and let emacs do the nice listing of source code in a different panel.


S


Re: How to compiling/link/use Apache module that uses shared library?

2012-06-21 Thread Sorin Manolache

On 2012-06-21 22:22, oh...@cox.net wrote:


[root@apachemodule bin]# gdb httpd
GNU gdb Red Hat Linux (6.3.0.0-1.162.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library 
"/lib64/tls/libthread_db.so.1".

(gdb) b header_post_config
Function "header_post_config" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (header_post_config) pending.
(gdb) run -X
Starting program: /apps/httpd/bin/httpd -X
[Thread debugging using libthread_db enabled]
[New Thread 182897612000 (LWP 8741)]
Breakpoint 2 at 0x2a97a69060: file mod_headers.c, line 1026.
Pending breakpoint "header_post_config" resolved
mod_headers-jl V0.09 - start calling OAM API
In register_hooks
In create_headers_dir_config
In create_headers_dir_config
In header_cmd
In header_inout_cmd
In parse_format_tag
In parse_misc_string
In create_headers_dir_config
In header_cmd
In header_inout_cmd
In parse_format_tag
In parse_misc_string
[Switching to Thread 182897612000 (LWP 8741)]

Breakpoint 2, header_post_config (pconf=0x573138, plog=0x5a52c8, 
ptemp=0x5a72d8, s=0x59d3a8) at mod_headers.c:1026
1026printf("In header_post_config\n");
(gdb) s
1025{
(gdb) s
1026printf("In header_post_config\n");
(gdb) n
In header_post_config
1027header_ssl_lookup = APR_RETRIEVE_OPTIONAL_FN(ssl_var_lookup);
(gdb) n
1029}
(gdb) n
0x004360c7 in ap_run_post_config (pconf=0x573138, plog=0x5a52c8, 
ptemp=0x5a72d8, s=0x59d3a8) at config.c:91
91  AP_IMPLEMENT_HOOK_RUN_ALL(int, post_config,
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
0x003518d6c1e1 in BN_num_bits () from /lib64/libcrypto.so.4
(gdb)


I have no idea what the above means :(  But it looks like something blew up 
in libcrypto?

Jim



Navigate the help of gdb (help, help data, help running).

Check the "bt" (backtrace), "up", and "down" commands. From your 
segfault point in libcrypto you can go "up" the stack until you reach 
one of your functions.


When you have reached your code after several "up"s, do a "print" of the 
variables (especially pointers that you suspect are null but shouldn't 
be) that could have triggered the error.


S


Re: How to compiling/link/use Apache module that uses shared library?

2012-06-21 Thread Sorin Manolache

On 2012-06-21 22:04, oh...@cox.net wrote:


 Ben Noordhuis  wrote:

On Thu, Jun 21, 2012 at 8:43 PM,   wrote:

I tried that, which allowed me to start Apache, but am getting a segfault.


Run it through gdb and inspect the backtrace. Compiling with debug
symbols and optimizations disabled (-g -O0) will help.



Sorin,

The apxs already has "-g" and "-O2", it looks like.  How do I change that to "-O0"? 
 Or, do I just run the gcc compile manually?


Try adding

-Wc,-O0 -Wc,-g -Wc,-fno-inline -Wl,-g

to your apxs command line.


Also how do I "run it through gdb", since apachectl is a script?


You don't run apachectl, you run the apache binary.

gdb

Then in gdb type:
file /path/to/your/httpd_or_apache2
set args -d /path/to/your/server_root_dir -f /path/to/your/conf/file -X

If apachectl sets some environment variables first, which are then used 
in the conf file (as in the standard debian installation) then you'll 
have to set them manually in gdb, e.g.


set environment APACHE_RUN_USER www-data
set environment APACHE_RUN_GROUP www-data

Last, type

run

For example, on my debian I'd do

file /usr/sbin/apache2
set args -d /etc/apache2 -f /etc/apache2/apache2.conf -X
And a couple of set environments and then run.

S



Sorry for the questions, but not too familiar with this stuff :(...

Jim






Re: How to compiling/link/use Apache module that uses shared library?

2012-06-21 Thread Sorin Manolache

On 2012-06-21 19:47, oh...@cox.net wrote:


I've tried using "-l" pointing directly to the .so, libobaccess.so, but when I 
do that, it says it can't find the .so:

[root@apachemodule build-mod_headers]# ./compile-mod-headers.sh
/apps/httpd/build/libtool --silent --mode=compile gcc -prefer-pic   -DLINUX=2 
-D_REENTRANT -D_GNU_SOURCE -g -O2 -pthread -I/apps/httpd/include  
-I/apps/httpd/include   -I/apps/httpd/include   -c -o mod_headers.lo 
mod_headers.c && touch mod_headers.slo
/apps/httpd/build/libtool --silent --mode=link gcc -o mod_headers.la  
-l/apps/netpoint/AccessServerSDK/oblix/lib/libobaccess.so -rpath 
/apps/httpd/modules -module -avoid-version mod_headers.lo
/usr/bin/ld: cannot find 
-l/apps/netpoint/AccessServerSDK/oblix/lib/libobaccess.so
collect2: ld returned 1 exit status
apxs:Error: Command failed with rc=65536


Try -lobaccess

S


Re: How to *add* a cookie in module?

2012-06-19 Thread Sorin Manolache

On 2012-06-19 07:26, oh...@cox.net wrote:

Hi,

I spoke too soon :(  The apr_table_mergen puts a comma (",") in between each cookie 
name/value pair, rather than a semicolon (";").

So, does anyone know how I can accomplish the merge of the cookie headers, but 
with semicolons in between the name/value pairs?


AFAIK there's no direct way to add a cookie to the Cookie request 
header. If you're sure that you already have a Cookie request header, 
you can use the "RequestHeader edit" directive (RequestHeader edit 
Cookie "(.*)" "$1; my_cookie"). However, afaik there's no directive 
implementing something like if_present_edit_else_set.


You'll have to do it manually:

const char *cookie = apr_table_get(r->headers_in, "Cookie");
if (cookie == NULL)
   apr_table_set(r->headers_in, "Cookie", my_cookie);
else
   apr_table_setn(r->headers_in, "Cookie", apr_pstrcat(r->pool, cookie, 
"; ", my_cookie, NULL));


--
Sorin




Re: How to access client certificate PEM and incoming request headers in a module?

2012-06-18 Thread Sorin Manolache

On 2012-06-18 08:53, oh...@cox.net wrote:

Hi,

I'll look at ssl_var_lookup a little later, but I'm still messing around with 
mod_headers.c, tweaking it to understand how THAT is working :)...

I added a call to header_request_env_var(r, "REMOTE_URI"), just to see what it 
got (running Apache in single-process mode):

printf("REMOTE_URI=[%s]\n", header_request_env_var(r, "REMOTE_URI") );

Then I pointed a browser to http:///test, where /test was a <Location> block 
with a RequestHeader directive (to trigger mod_headers), but I got:

REMOTE_URI=[(null)]

Shouldn't that be showing:

REMOTE_URI=[/test]



How did you set the REMOTE_URI environment variable? I've grepped the 
apache sources and there's no occurrence of REMOTE_URI, so I assume that 
apache does not set it as part of its request processing.


S


Re: Protocol converter module

2012-06-06 Thread Sorin Manolache

On 2012-06-05 20:39, Robert Mitschke wrote:


The Alternative would be to have all blocking on the Same pipe. For each
connection Handler I want to wake up I make a Note in shared mem then write
a Single Byte to the pipe causing all blocking threads to wake up. These
will Check in the shared mem whether they are meant and only if they are
will read a Byte from the pipe.


I was thinking about the second approach too. But its problem is that 
when you write something to the pipe, the operating system wakes up only 
_one_ of the blocked processes, chosen effectively at random. Not all of 
them will wake up. And that's the problem.


Sorin


Re: Protocol converter module

2012-06-05 Thread Sorin Manolache

On 2012-06-05 13:47, Robert Mitschke wrote:

Dear Sorin,

again thanks for your valuable response. The architecture is now clearing
up in my head which makes me a happier man after days of reading Apache
code (which in turn is a valuable and interesting exercise by itself).


I see a problem with the "select" approach.

When the handler writes to the pipe in order to wake up some thread, it 
has no control over which thread wakes up.


So if the wrong thread wakes up, it will check the shared memory, it 
will not find its ID there and it will do nothing. On the other hand, 
the right thread keeps sleeping so it has no chance to see that its ID 
showed up in the shared memory.


So I think it is better if the threads wait on a condition variable and 
they are woken up via a notify_all. However, I have no idea how you can 
combine waiting on the condition variable _and_ on client data on the 
socket.


Ideally a thread should sleep if (1) no activity is detected on the 
socket _and_ (2) it is not told by an external request to push data to 
the client.


I think I could have misled you along a wrong path. I'm sorry.

Sorin





Doing it as a protocol handler I guess does work. In the protocol handler I
could then still call the hook for processing a http request that I am creating
based on my proprietary protocol right?



When you say "process a http request" that you create you mean process its
response, don't you? Yes, you can get the response. For my SMPP module I
did it with mod_proxy, but I suppose you can do it with any http client lib.



No, what I would like to do is write an adapter from my binary tcp
bidirectional protocol and derive xml based http requests from it so that I
can process them using standard application server infrastructure.
Therefore, whenever I receive a message from my client, I would want to
create a request_rec on my own and send that up for handling. When the
application server wants to send me something to send to the client it can
either put that into a response of a request that I sent, or it can send me
a http request that is not related to a message from the client.

Therefore, when I receive a message from my client in the protocol handler
I would like to still be able to create an (fake) http request and have
that handled as a normal http request. Ideally I would like to maintain the
ability of reverse proxying so that the request does not need to be handled
in my apache instance locally but could by forwarded to  any other server
of my choice.

For the messages that the application server wants to send me outside of
normal http responses I would of course need to create a handler that would
then handle the request by notifying the protocol handler using the pipes
as suggested by you.




I have also thought about the shared memory approach to communicate between
the individual children. How would I go about listening on input from
shared memory without doing a polling approach?



You could open a pipe in post_config (i.e. before the parent forks its
worker children). The pipe descriptors are then inherited by all children.
In process_connection you perform a timed select on two descriptors: the
socket from the non-http client and the reading end of the pipe. When you
get a triggering third-party http request, in your http handler you write
something to the shared memory to the writing end of the pipe. This wakes
up one of the non-http handlers which can check if the triggering request
was for the client it handles and then it can proceed with pushing on the
non-http socket.

How do I go about implementing this select? I have searched through the
code but could not find out a way to actually get a handle to the socket.
In the code all that is handled are network buckets. How would I gain
access to the socket handle in process_connection? I would need that handle
to select on it.

It is great that I will not even need to use a separate thread, this way
the architecture is much cleaner now.

Best regards,
Robert





Re: Protocol converter module

2012-06-05 Thread Sorin Manolache

On 2012-06-05 13:47, Robert Mitschke wrote:

How do I go about implementing this select? I have searched through the
code but could not find out a way to actually get a handle to the socket.
In the code all that is handled are network buckets. How would I gain
access to the socket handle in process_connection? I would need that handle
to select on it.



Place a callback on the pre_connection hook. The 2nd argument to 
pre_connection is an apr_socket_t.


int
pre_connection_callback(conn_rec *c, void *sock) {
   apr_os_sock_t os_fd;
   apr_os_sock_get(&os_fd, (apr_socket_t *)sock);
   // store os_fd somehwere (for example in your c->conn_config)
   // os_fd is your socket descriptor
   return OK;
}


Re: Protocol converter module

2012-06-05 Thread Sorin Manolache

On 2012-06-05 10:09, Robert Mitschke wrote:

Dear Sorin,

thank you very much for your thoughts and shared experience on this. Doing
it as a protocol handler I guess does work. In the protocol handler I could
then still call the hook for processing a http request that I am creating
based on my proprietary protocol right?


When you say "process a http request" that you create you mean process 
its response, don't you? Yes, you can get the response. For my SMPP 
module I did it with mod_proxy, but I suppose you can do it with any http 
client lib.



I have also thought about the shared memory approach to communicate between
the individual children. How would I go about listening on input from
shared memory without doing a polling approach?


You could open a pipe in post_config (i.e. before the parent forks its 
worker children). The pipe descriptors are then inherited by all 
children. In process_connection you perform a timed select on two 
descriptors: the socket from the non-http client and the reading end of 
the pipe. When you get a triggering third-party http request, in your 
http handler you write something to the shared memory to the writing end 
of the pipe. This wakes up one of the non-http handlers which can check 
if the triggering request was for the client it handles and then it can 
proceed with pushing on the non-http socket.



I guess in my usual process connection I would create some kind of a table
of the open connections. Then I would spawn a separate thread in my
initialization that listens to input on the shared memory. I need to
respond to such input even if there is no incoming data on the connection
and the normal connection processing is not taking place. If that input is
there my self created thread determines on which connection to push out the
data and pushes it out by simply calling the corresponding output filter
chain, correct?


No, I don't think the thread is necessary. As I said above, each 
non-http-processing thread performs a select that contains the pipe 
reading end in its descriptor set.


When the select wakes up on socket activity => it handles the 
non-http-data coming from the client.


When the select wakes up on pipe activity => it checks the shared memory 
if it should write something to its client.


When it wakes up on a timeout => it writes something to its client.


Is there an apr construct that I can use to block this thread on instead of
frequent polling? I guess reusing the mpm somehow to give me a multitude of
threads that will serve my shared memory input (or I could use something
like a pipe right?) is out of the question right? Could I wrap a connection
filter around my internal (shared memory or pipe) connection and have the
mpm serve my requests in this way?

Does anyone know of an existing module that solves a similar problem that I
could look at?

Again thank you for your input,
Robert

2012/6/5 Sorin Manolache


On 2012-06-04 18:40, Robert Mitschke wrote:


Hi everybody,

I am attempting to write a module that implements a binary protocol that
is
not http and is not fully request and response based using apache. I have
looked at mode echo and some others and I have Nick Kew's book.

I want my module to convert incoming messages into http requests so that
Apache is going to serve them using normal application server
infrastructure. This is what I, based on the info I have can easily do
using an input and output filter.

What I also need the protocol to be able to do is to send messages to the
client with no incoming data from the client. This may be based on a
timeout or based on a request coming from somewhere else (a tier-2
application server sending me a request on a totally different
connection).

From what I have read so far, I could not find a hook that allows me to do


so. The only way that I could figure out how to do that is to modify
http_core.c and in ap_process_http_sync_connection query for either the
timeout or the separate event to have occurred using some shared memory
technique. This however does not feel right to me. I would ideally like to
keep using http_core as it is without touching it.

Is there a way for me to wake up and trigger the input filter chain even when
there is no data on the actual connection? I could then create a request
from the context of my persistent connection for a handler that I have
written that triggers the output filter chain to send the correct message.
Or even better is there a way I can trigger the output filter chain? Are
there hooks for this purpose?

I would very much appreciate a hint in the right direction.



Hello,

As your protocol is not http, I think that you should not execute
ap_process_http_*_connection. ap_process_http_*_connection is a callback
placed on the process_connection hook. I would suggest that you place your
own protocol-specific callback on the process_connection hook. In your
callback you get the socket descriptor and you perform "select" syscalls 
with timeout on the descriptor in a loop to get the timeout behaviour.

Re: Protocol converter module

2012-06-05 Thread Sorin Manolache

On 2012-06-04 18:40, Robert Mitschke wrote:

Hi everybody,

I am attempting to write a module that implements a binary protocol that is
not http and is not fully request and response based using apache. I have
looked at mode echo and some others and I have Nick Kew's book.

I want my module to convert incoming messages into http requests so that
Apache is going to serve them using normal application server
infrastructure. This is what I, based on the info I have can easily do
using an input and output filter.

What I also need the protocol to be able to do is to send messages to the
client with no incoming data from the client. This may be based on a
timeout or based on a request coming from somewhere else (a tier-2
application server sending me a request on a totally different connection).


From what I have read so far, I could not find a hook that allows me to do
so. The only way that I could figure out how to do that is to modify
http_core.c and in ap_process_http_sync_connection query for either the
timeout or the separate event to have occurred using some shared memory
technique. This however does not feel right to me. I would ideally like to
keep using http_core as it is without touching it.

Is there a way for me to wake up and trigger the input filter chain even when
there is no data on the actual connection? I could then create a request
from the context of my persistent connection for a handler that I have
written that triggers the output filter chain to send the correct message.
Or even better is there a way I can trigger the output filter chain? Are
there hooks for this purpose?

I would very much appreciate a hint in the right direction.



Hello,

As your protocol is not http, I think that you should not execute 
ap_process_http_*_connection. ap_process_http_*_connection is a callback 
placed on the process_connection hook. I would suggest that you place 
your own protocol-specific callback on the process_connection hook. In 
your callback you get the socket descriptor and you perform "select" 
syscalls with timeout on the descriptor in a loop to get the timeout 
behaviour.


If you want to push data upon an incoming http request from a 
third-party, I think you cannot avoid the shared memory approach. The 
process that handles your http request has somehow to communicate with 
the process in which you handle the non-http connection to your client.


I've written something similar for SMPP and I remember I considered the 
filter implementation alternative but ultimately I did it without filters.


Sorin


Best regards,
Robert





Re: Change Request-Header before mod_rewrite

2012-06-04 Thread Sorin Manolache

On 2012-06-04 22:53, Marc apocalypse17 wrote:

Hi all,

I just developed my first apache module following the tutorial on the apache 
website. The module is responsible for adding one header value to the active 
request, which must be checked in a mod_rewrite RewriteCond.
The problem is, that this value never reaches the mod_rewrite Rule. The Header 
just behaves the same as the original request. Does anyone know why? What am I 
doing wrong?

My module looks like this:

static int helloworld_handler(request_rec* r){
 if (!r->main) {
 apr_table_setn(r->headers_in, "X-CUSTOM-HEADER", "1");
 }
 return DECLINED;
}

static void register_hooks(apr_pool_t* pool){
 ap_hook_handler(helloworld_handler, NULL, NULL, APR_HOOK_FIRST);
}

module AP_MODULE_DECLARE_DATA helloworld_module = {
 STANDARD20_MODULE_STUFF,
 NULL,
 NULL,
 NULL,
 NULL,
 example_directives,
 register_hooks
};

The .htacces file looks like this:

RewriteEngine on
RewriteCond %{HTTP:X-CUSTOM-HEADER} 1 [NC]
RewriteRule from.html to.html

The RewriteRule never executes. It always shows the content of 
from.html.


A server-wide RewriteRule is executed in the translate_name hook.

A directory-wide RewriteRule is executed in the fixups hook.

So you'll have to place a callback on the translate_name or on the 
fixups hook and make sure your callback is executed before mod_rewrite's.


You could also try to set the header using the RequestHeader directive 
of the mod_headers module. Use the option "early". 
http://httpd.apache.org/docs/2.2/mod/mod_headers.html#requestheader
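
For reference, a minimal sketch of that mod_headers alternative (the header name is taken from the original post):

```apache
# "early" runs the directive in post_read_request, i.e. before both the
# server-wide (translate_name) and per-directory (fixups) rewrite phases,
# so the RewriteCond %{HTTP:X-CUSTOM-HEADER} test can see it:
RequestHeader set X-CUSTOM-HEADER "1" early
```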


--
Sorin





Thank you in advance,
Marc





Re: Windows service startup configuration - how to find out the "working" process?

2012-04-26 Thread Sorin Manolache

On 2012-04-26 11:57, Waldemar Klein wrote:

Thanks, that already helped a lot.

I had this "post_config hook" thing somewhere in the back of my mind,
but didn't connect it with my problem as a possible solution. While
googling for it I also found child_init hook. I created logs to see
when those are called, and it seems the child_init hook is ONLY called
in the last, the "real working" configuration. I tested this with
direct calling (-X), as service (-k start/stop/restart) and also when
the service is started on boot, and it seems to work the same way
always. So, I'll just transfer all the heavy work to the child_init
hook.

In the process I also found out that I can store data in the pool
itself (i.e. not only allocating memory, but in a hash or table linked
to the pool), which I wasn't aware was possible. This was my first plan
to use to store a boolean variable, but the way child_init behaves, I
won't need it; still, it might come in handy in the future. This is done
with apr_pool_userdata_set and apr_pool_userdata_get.

This is only for Apache on Windows, I didn't try it on Linux. If I ever
have to get it running on Linux, I'll cross that bridge when I come to
it.



Again, I don't know how the permissions work in Windows. In Linux, the 
child_init hook is called by the non-privileged apache children. The 
post_config is called by the privileged parent.


Also, post_config is not called by the children. They get the 
modifications made in post_config through the inheritance that comes with 
process spawning, because the parent runs post_config _before_ spawning its children.


If you put your heavy lifting in child_init, it will be executed every 
time the parent creates a child, so child creation will be slow. 
Children are often killed and recreated (they may have a finite upper 
bound on the number of requests they process), so children are recreated 
regularly even when your load is constant and no extra capacity is needed.


I would put it in post_config. It is run a couple of times, so apache 
restarts/reloads could be slow, but at least you know that once the 
server starts, it runs smoothly. At the end of the day, if you go for 
child_init, you'll run the heavy processing many more times.
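
A common sketch for a "run once" guard in post_config (untested, and written with the Linux process model in mind, so it may need adapting on Windows; the key string is arbitrary): apache runs post_config twice on startup because it first does a configuration check, so the first pass only sets a flag in the process pool's userdata.

```c
static int my_post_config(apr_pool_t *pconf, apr_pool_t *plog,
                          apr_pool_t *ptemp, server_rec *s)
{
    void *done = NULL;
    const char *key = "my_module_post_config_done"; /* arbitrary key */

    apr_pool_userdata_get(&done, key, s->process->pool);
    if (done == NULL) {
        /* first pass (config check): just mark that we've been here */
        apr_pool_userdata_set((const void *)1, key,
                              apr_pool_cleanup_null, s->process->pool);
        return OK;
    }
    /* second pass: do the heavy lifting here */
    return OK;
}
```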


Sorin


Re: Windows service startup configuration - how to find out the "working" process?

2012-04-25 Thread Sorin Manolache

On 04/26/12 00:03, Waldemar Klein wrote:

Hello.


Looking at the startup procedure of Apache on Windows (as service), I
noticed that the httpd.conf is read and processed (i.e. procedures
ran) in total 4 times. After startup, only one of them is actually
used later to handle requests.

I have a module which has a rather expensive (reading and processing a
few big files into memory) operation on startup. It is a waste of
resources to do this 4 times, when it is used only once.

What I found out so far:
- the first process (pid A) runs as the user who entered the command
httpd -k (re)start. Processes httpd.conf once. This will be terminated
after startup is complete. It would not use any memory later on, but
it still needs the processing time at startup.
- the second (pid B) (runs as "SYSTEM") seems to be the "control"
process and only starts another child process (pid C); processes
httpd.conf once. Writes its pid to logs/httpd.pid. Stays as a process but
doesn't answer requests.
- the third (pid C) processes the httpd.conf twice; only the second
run is actually used.

I can think of a few quite ugly hacks to solve this:
- writing the pids to files and only doing the configuration work when
I already see my own pid in the file; this must be the 2nd run of
the process with pid C (can also clean up then). Problematic with
crashes during startup.
- only prepare the config (store filenames) and finish (read and
process files) the first time I actually get to work. Problem is,
first request might take some time, because the "work" must be done.
Could be solved by a dummy request right after startup. (it's still an
ugly hack :) )
- since I can identify the first process (different user), I could
start a counter, and only read my files when the counter reaches 4.
Problem: this doesn't work if apache is started with -X. (Also I
haven't checked yet what the user of the 1st process is if it's
started at boot or via a scheduled task)
I can think of a few more problems that might arise, and I am even
more afraid of the problems that I can not think of right now :)

Now my question is, is there a good method to find out if we (the
configuration procedures) are in the "working" configuration? This
would be the third process and the 2nd processing of the httpd.conf,
which will be actually used when the module gets to work.


Try not to do the heavy lifting during the configuration parsing. Put 
the heavy stuff in the post_config hook. You can combine it with a 
boolean variable whose state you switch after you have done the heavy 
lifting once. I think it doesn't matter when you parse the conf. If you 
do it early, the child processes inherit the address space of the parent 
in which you already did your processing. The differences between the 
various parsings are the process permissions and the availability of the 
log files.


Please note that I didn't work with apache on Windows, so my advice is 
based on not-so-educated extrapolations.


Sorin


Re: Using apr_hash_t within shared memory

2012-03-21 Thread Sorin Manolache

On 2012-03-21 12:31, Rajalakshmi Iyer wrote:

Hello,

I want to be able to store a hash map (apr_hash_t) in the Apache shared
memory (created using apr_shm_create). This map will be created once and
will be shared by all child processes.

However, when I try to access the hash stored in the shared memory, I get a
segmentation fault (because the hash appears empty).

I have ensured that I use the apr_shm_baseaddr_get to get the starting
address for the map in shared memory correctly.

However that does not seem to help.

Note that I have tried the same code with simple data structures like char*
etc and it works. Do I need to do something more to get a hash into and out
of the shared memory?

Thanks in advance!


I don't think you can put it into shared memory. It uses an apr_pool in 
order to allocate new data when you add elements to the hash. The 
apr_pool will allocate non-shared memory.


If your code is C++ or can be migrated to C++, have a look here: 
http://www.boost.org/doc/libs/1_49_0/doc/html/interprocess/allocators_containers.html#interprocess.allocators_containers.containers_explained


Sorin


Re: Calling another URL from output filter

2012-03-16 Thread Sorin Manolache

On 2012-03-16 02:57, Swaminathan Bhaskar wrote:

Hi Sorin

Can you share your code for this case so that I can take a look to get an
understanding? Can I call a uri on a different server before the proxy
to the requested server?




My conf on pqr.mydomain.com looks something like


  RewriteEngine On
  RewriteRule .*  http://xyz.mydomain.com/fetchdata [P]


<Proxy http://xyz.mydomain.com/fetchdata>
# access control directives (Order, Allow, Deny, etc)
   ProxyPass http://xyz.mydomain.com/fetchdata keepalive=On
</Proxy>

<Proxy http://abc.mydomain.com/signon>
# access control directives (Order, Allow, Deny, etc)
# RequestHeader directives to block/enable/edit the
# request headers to the signon server
   ProxyPass http://abc.mydomain.com/signon keepalive=On
</Proxy>

RewriteCond %{IS_SUBREQ} true
RewriteRule /internal_signon http://abc.mydomain.com/signon [P]


So the client accesses http://pqr.mydomain.com/my_url. My module makes a 
subrequest to /internal_signon. The rewrite rule redirects to 
http://abc.mydomain.com/signon. My module consumes the response of 
http://abc.mydomain.com/signon without producing any output. Then my 
handler returns DECLINED. Because it declines, the proxy module takes 
over and makes the request to http://xyz.mydomain.com/fetchdata.


The request to /internal_signon is made in a fixups hook. My fixups hook 
has to execute before mod_rewrite's fixup hook. So I register it as follows:


static const char * const fixups_succ[] = {"mod_rewrite.c", 
"mod_headers.c", NULL};

ap_hook_fixups(&fixups, NULL, fixups_succ, APR_HOOK_MIDDLE);

My fixups callback does something like this:

int fixups(request_rec *r) {
  if (!ap_is_initial_req(r))
 return DECLINED;
  // other conditions to see if we should handle it
  Data data;
  make_request("/internal_signon", r, &data);
  // use the data.
  return OK;
}

void make_request(const char *url, request_rec *r, Data *data) {
  request_rec *newreq = ap_sub_req_lookup_uri(url, r, NULL);
  if (NULL == newreq)
; // err_handler

  ap_set_module_config(newreq->request_config, &mymodule, data);

  ap_filter_t *flt = ap_add_output_filter("subreqflt", 0, newreq, 
newreq->connection);

  if (NULL == flt) {
ap_destroy_sub_req(newreq);
// err_handler;
  }

  int proxy_ret_code = ap_run_sub_req(newreq);
  int ret_code = newreq->status;

  ap_destroy_sub_req(newreq);

  if (ap_is_HTTP_ERROR(proxy_ret_code) || ap_is_HTTP_ERROR(ret_code)) {
// err_handler
  }
}


The filter "subreqflt" gets the data object

Data *data = (Data *)ap_get_module_config(flt->r->request_config, 
&mymodule);


It parses the response, extracts from it the relevant data and puts them 
in the "data" object.


The subrequest response parsing is a "normal" filter. The only 
difference is that it never does "return ap_pass_brigade(flt->next, 
bb)"; it always does "return APR_SUCCESS". Thus, downstream filters, 
like the one that sends the response to the client, are not called. The 
filter acts like a sink for the subrequest data.



Sorin



Rgds
Bhaskar

On 03/05/2012 04:26 AM, Sorin Manolache wrote:

On 2012-03-04 19:19, Swaminathan Bhaskar wrote:


Hello,

How can I call another url from an output filter - here is the scenario:
when a client accesses abc.mydomain.com/signon, this url authenticates
the user and we don't want the response going back to the client;
rather, we call another url, xyz.mydomain.com/fetchdata, which will
return some data ... we want to send the data from the second url and
the response headers from the first url merged back to the client. So
my thought is to intercept the response from the first url in an output
filter and make a call from the output filter to the second url. What
function call would allow me to make the call to the second url? Any
help appreciated.



Hello,

We did something similar but we didn't issue the 2nd request from the
output filter of the first.

The first request was made by the apache subrequest API (the
ap_sub_req_lookup_uri and ap_run_sub_req functions). The output filter
ran until it encountered an end-of-stream. It did not make any other
request. The output filter of the subrequest did not pass any brigade
to downstream filters. It simply parsed the response, stored relevant
data in some structure and returned APR_SUCCESS to its upstream filters.

Next, after ap_run_sub_req returned, we invoked the 2nd URL via the
proxy module. The useful data returned by the 1st URL was taken from
the structure in which the subrequest stored it.

Calling a 2nd URL from output filters is a bit tricky, as you have
filter-chains invoked from within a filter-chain, so we preferred to
call the 2nd URL only after the request to the first completed.

Regards,
Sorin







Re: problems about mod_fastcgi

2012-03-07 Thread Sorin Manolache

On 03/07/12 04:12, Rui Hu wrote:

hi,

I am writing a module whose function depends on variable r->uri. But r->uri
is modified when I activate mod_fastcgi.

1. I just want to make sure whether mod_fastcgi modifies r->uri.
2. How do I know if mod_fastcgi is activated in my own module? So I can
handle the request differently depending on this information.



Try using the fields in r->parsed_uri instead of r->uri.

Or place a callback on an early hook (post_read_request, for example) 
and copy r->uri to a structure that is specific to your module so that 
it is guaranteed that no other module changes it.
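
A sketch of that second suggestion (untested; `mymodule` and the context struct are placeholders for your module's names):

```c
typedef struct {
    const char *original_uri;  /* r->uri as seen before any rewriting */
} my_req_ctx;

static int save_uri(request_rec *r)
{
    my_req_ctx *ctx = apr_pcalloc(r->pool, sizeof(*ctx));
    ctx->original_uri = apr_pstrdup(r->pool, r->uri);
    ap_set_module_config(r->request_config, &mymodule, ctx);
    return DECLINED;
}

static void register_hooks(apr_pool_t *pool)
{
    /* run as early as possible, before anyone changes r->uri */
    ap_hook_post_read_request(save_uri, NULL, NULL, APR_HOOK_FIRST);
}
```

Later hooks can then read the saved value back with `ap_get_module_config(r->request_config, &mymodule)`.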


Please be aware that r->uri can be changed by rewrite rules too.

Sorin


Re: Calling another URL from output filter

2012-03-05 Thread Sorin Manolache

On 2012-03-04 19:19, Swaminathan Bhaskar wrote:


Hello,

How can I call another url from an output filter - here is the scenario:
when a client accesses abc.mydomain.com/signon, this url authenticates the
user and we don't want the response going back to the client; rather, we
call another url, xyz.mydomain.com/fetchdata, which will return some data
... we want to send the data from the second url and the response headers
from the first url merged back to the client. So my thought is to intercept
the response from the first url in an output filter and make a call from
the output filter to the second url. What function call would allow me to
make the call to the second url? Any help appreciated.



Hello,

We did something similar but we didn't issue the 2nd request from the 
output filter of the first.


The first request was made by the apache subrequest API (the 
ap_sub_req_lookup_uri and ap_run_sub_req functions). The output filter 
ran until it encountered an end-of-stream. It did not make any other 
request. The output filter of the subrequest did not pass any brigade to 
downstream filters. It simply parsed the response, stored relevant data 
in some structure and returned APR_SUCCESS to its upstream filters.


Next, after ap_run_sub_req returned, we invoked the 2nd URL via the 
proxy module. The useful data returned by the 1st URL was taken from the 
structure in which the subrequest stored it.


Calling a 2nd URL from output filters is a bit tricky, as you have 
filter-chains invoked from within a filter-chain, so we preferred to 
call the 2nd URL only after the request to the first completed.


Regards,
Sorin



Re: one problem about MPM worker

2012-03-02 Thread Sorin Manolache

On 2012-03-02 16:44, Rui Hu wrote:

hi,

My apache is running in worker MPM mode. And I wrote a module which uses
ap_get_module_config and ap_set_module_config to get & set core_module's
data once a request comes. So I am worried about the following three
potential problems.

1. Does every request have a copy of core_module's configuration data?


No. They have pointers to configuration objects.


2. Is there any built-in synchronization mechanism existing in these two
functions?


No.


3. Should I add locking mechanism myself?


If I answer strictly to the question, I would say no. ap_set_module_config 
does something like array_of_pointers_to_conf_data_objects[module_index] 
= address_of_conf_object.


So it is an assignment in an array. Normally this is done atomically.

If I digress, I would say that I am not sure your approach is safe.

First, configurations are most often read-only. Apache loads them at 
startup and they do not change during the life-time of the server.


If you want to use request-specific data, you have other means, such as 
r->request_config, r->notes, r->subprocess_env.


Second, it is good practice that modules create and use their own 
configuration objects and do not change the configuration objects of 
other modules.
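
For instance, the r->notes suggestion is a plain string-to-string table tied to the request, so no locking or conf-object mutation is needed (hedged sketch; the note name is arbitrary):

```c
/* stash per-request state without touching any module's configuration */
apr_table_set(r->notes, "my_module_flag", "1");

/* later, in another hook of the same request: */
const char *flag = apr_table_get(r->notes, "my_module_flag");
if (flag != NULL && flag[0] == '1') {
    /* ... request was marked earlier ... */
}
```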


Sorin


Re: thread ID

2012-03-01 Thread Sorin Manolache

On 03/02/12 00:21, Ben Noordhuis wrote:

On Thu, Mar 1, 2012 at 17:29,  wrote:

Hello,

I would need a memory buffer associated per worker thread (in the worker
MPM) or to each process (in the prefork MPM).

In order to do that, I would need a map thread<->buffer. So, I would
need a sort of thread ID/key/handle that stays the same during the
lifetime of the thread and no two threads in the same process can have
the same ID/key/handle.

What is the most portable way to get this thread ID?

I thought of r->connection->id. It works but it is not very portable as
it is not guaranteed that two connections created by the same thread
will have the same id. They do for now.

If r->connection->sbh was not opaque it would be great, because
sbh->thread_num would be exactly what I need.

I could also use pthread_self. It works too but, in general, it is not
guaranteed that the worker threads are pthreads.


Thank you for your help.

Sorin


What about apr_os_thread_current()? It returns an opaque value that's a
pthread_t on Unices and a pseudo-HANDLE on Windows. Read this[1] to
understand what that means.

As a recovering standards lawyer I should probably point out that
pthread_t is an opaque type that's not guaranteed to be convertible to
a numeric value (or to anything, really). That said, I've never seen a
pthreads implementation where that wasn't the case.

[1] 
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683182%28v=vs.85%29.aspx


Thank you, it's what I need.

Sorin




Re: Threads and signals in a module

2012-02-29 Thread Sorin Manolache

On 2012-02-28 22:05, Ben Rockefeller wrote:

Hello,

I have a module which creates a thread using pthread_create and then
registers for SIGRTMIN+4 signal from another process. Problem is I do not
see a signal callback when the signal is sent to me. I only see it when I
try to kill apache where just before dying it hits the signal callback.

So something in Apache is blocking the signal. I am using worker MPM.
What could be the issue? I do not have much leeway in changing the design
of the module (the threading model comes from a separate lib which I link
into the module)...but I can change to use a different signal than
SIGRTMIN+4, etc.
I can also change apache and recompile apache if needed.

Please let me know if you have any suggestions. I have been stuck at this
for a while now :-(

Thanks



In which hook do you create your thread and you register your signal 
handler?


Right after executing the child_init callbacks, apache blocks the 
threads from receiving most signals.


Check the list of blocked/ignored/caught signals using "ps axs" in a shell.

Sorin



Re: about setting r->headers_out

2012-02-29 Thread Sorin Manolache

On 02/29/12 07:52, Rui Hu wrote:

hi,

I want to set the "Content-Type" and "Cache-Control" fields in my private
module. So I hooked fixups and used apr_table_setn to set
r->headers_out, but nothing happened. What am I missing?

Thanks for your help!


Try to set r->err_headers_out.

For content-type you could check the configuration directive DefaultType.

You could also set Cache-Control with the Headers directive. Check its 
"always" option too.


You can also combine the Headers directive with environment variables 
set in r->subprocess_env. Check 
http://httpd.apache.org/docs/2.0/mod/mod_headers.html#header
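
Combining the two suggestions might look like this (a sketch; the environment variable name is arbitrary). The module sets the variable with `apr_table_set(r->subprocess_env, "force_no_cache", "1")` and the configuration does the rest:

```apache
# Send the header only on requests where the module set the env var;
# "always" also covers error responses.
Header always set Cache-Control "no-cache" env=force_no_cache
```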


Regards,
Sorin


Re: child init/ exit and the apr_cleanup_register

2011-11-22 Thread Sorin Manolache
On Tue, Nov 22, 2011 at 09:19, michaelr  wrote:
> Hello again,
>
> maybe another stupid question, but I could not find any
> example in the modules dir of the httpd source.
>
> Let's say I have a child_init function which opens a
> filehandle. This filehandle should stay open until the child
> ends.
>
> In mod_example.c they register a cleanup function to call a
> function on child exit.
>
> static apr_status_t child_exit ( void *data )
>        {
>        //close file handle...
>
>        return OK;
>        }
>
> static void child_init ( apr_pool_t *p, server_rec *s )
>        {
>        //open file handle...
>
>        apr_pool_cleanup_register(p, s, NULL, child_exit) ;
>        }
>
> I understand the cleanup as a function which runs on pool_cleanup.
> This could happen at any time, which I can't control - right?
>
> When I return an HTTP_INTERNAL_SERVER_ERROR in my handler function,
> for example, the cleanup gets called too, but the child is still
> alive. The filehandle is closed and on the next request I run into
> some kind of trouble.


The cleanup callback should not be called when you finish processing a
request. It should be called only when the child exits.

Register the cleanup function as follows:

apr_pool_cleanup_register(p, NULL, child_exit, apr_pool_cleanup_null);

The third argument (child_exit) is invoked when the pool is destroyed.
The 4th is the child cleanup, invoked in child processes spawned to
exec external programs; apr_pool_cleanup_null is a no-op placeholder
for it.

You can pass the file handle in the second argument (where I've put
NULL). Then it will show up as the data argument in child_exit.
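
Put together, that advice looks something like this (untested sketch; how the handle is opened is elided). Note that files opened with apr_file_open on p are closed automatically when p is destroyed, so the explicit registration mainly matters for resources APR does not manage itself:

```c
static apr_status_t child_exit(void *data)
{
    apr_file_t *fh = data;
    apr_file_close(fh);  /* runs when the child's pool is destroyed */
    return APR_SUCCESS;
}

static void child_init(apr_pool_t *p, server_rec *s)
{
    apr_file_t *fh = NULL;
    /* ... open the file handle into fh ... */
    apr_pool_cleanup_register(p, fh, child_exit, apr_pool_cleanup_null);
}
```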


S

>
> Is there a way to define a 'real' exit function or can i force child
> shutdown in above example? What's the right way to do it correctly?
>
> Thanks a lot and greetings
>  Michael
>
>
>
>
>


Re: mod_proxy retry

2011-11-04 Thread Sorin Manolache
On Fri, Nov 4, 2011 at 01:18, Jodi Bosa  wrote:
> thanks, but unfortunately mod_include's #include virtual does not
> appear to support requests to external servers (must not contain a scheme
> or hostname - only path and query string).
>
>
> Regardless, I also tried ap_sub_req_lookup_uri():
>
>
>   static apr_status_t x_response_filter(ap_filter_t *f,
>                                         apr_bucket_brigade *bb)
>   {
>        request_rec *r = f->r;
>        int status;
>
>        request_rec *subr = ap_sub_req_lookup_uri("http://www.apache.org",
>                                                  r, f->next);
>        status = ap_run_sub_req(subr);
>        ap_destroy_sub_req(subr);
>   }
>
>
> which seemed to succeed in make_sub_request() but later on mod_proxy's
> proxy_handler() failed because r->proxyreq was NULL.
>

You can still do it with ap_sub_req_lookup_uri. Just use something like

RewriteEngine On
RewriteRule /dummy/path http://external.proxy/path2 [P]

and then ap_sub_req_method_uri("GET", "/dummy/path", r, f->next);

The [P] in the rewriterule makes sure you have r->proxyreq not null.

S


Re: Developing Authn/Authz Modules

2011-10-03 Thread Sorin Manolache
On Sat, Oct 1, 2011 at 23:05, Suneet Shah  wrote:
> Hello,
>
> I am trying to build my apache module which needs to carry out
> authentication and authorization functions based on the value of a cookie.
> To start with, I have just created a shell with the intent that I wanted the
> functions for authentication and authorization being called.
> However, it does not appear that these functions are being called. I have
> pasted by configuration and code below.
>
> When I try to access  http://localhost/test_rpc/ I get the login.html that
> is defined in my ErrorDocument below.
> But when I look in the log file, I see the following.
> Since its looking for a userId, I am wondering if there is an error in my
> configuration
>
> [Sat Oct 01 16:37:29 2011] [debug] prefork.c(996): AcceptMutex: sysvsem
> (default: sysvsem)
> [Sat Oct 01 16:38:08 2011] [error] [client 127.0.0.1] access to
> /test_rpc/header.jsp failed, reason: verification of user id '' not
> configured

You have not hooked check_user_id. In this case the default
check_user_id of mod_authn_default is called. The mod_authn_default
module rejects the request by default and gives you the "verification
of user id ''" log line.

Hook check_user_id instead of auth_checker. Set r->user in
check_user_id. I think setting r->user is not mandatory but it gives
you more precise log messages.

Use return OK (OK is 0) and not return HTTP_OK (HTTP_OK is 200) in your hooks.
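
A sketch of that shape (untested; the cookie validation and the user name are placeholders). Returning HTTP_UNAUTHORIZED from check_user_id is what triggers the ErrorDocument 401 page:

```c
static int check_cookie_user(request_rec *r)
{
    const char *cookie = apr_table_get(r->headers_in, "Cookie");
    if (cookie == NULL) {
        /* no token: reject, which serves the ErrorDocument 401 */
        return HTTP_UNAUTHORIZED;
    }
    /* ... validate the token from the cookie here ... */
    r->user = apr_pstrdup(r->pool, "user-from-cookie"); /* placeholder */
    return OK;  /* OK is 0, not HTTP_OK (200) */
}

static void register_hooks(apr_pool_t *p)
{
    ap_hook_check_user_id(check_cookie_user, NULL, NULL, APR_HOOK_MIDDLE);
}
```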

S

>
> Any guidance on what I am doing wrong would be greatly appreciate.
>
> Regards
> Suneet
>
>
> -- Configuration in Httpd.conf
>
> 
>   IAM_CookieName IAM_PARAM
>   IAM_TokenParam tkn
>   IAM_Service_base_url "http://localhost:8080/";
>   ErrorDocument 401 "/login.html"
>   AuthType IAMToken
>   AuthName "IAM Login"
>   AuthCookie_Authoritative On
>  
>
> 
>    ProxyPass http://localhost:9080/test_rpc
>
>    require tkn
> 
>
> - Module Code
> static int authz_dbd_check(request_rec *r) {
>
>    ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server, "authz_dbd_check
> called");
>    return HTTP_OK;
> }
>
> static int check_token(request_rec *r) {
>
>     ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server, "check_token
> called.");
>    return OK;
> }
>
> static void authz_dbd_hooks(apr_pool_t *p)
> {
>    ap_hook_auth_checker(check_token, NULL, NULL, APR_HOOK_MIDDLE);
>    ap_hook_auth_checker(authz_dbd_check, NULL, NULL, APR_HOOK_MIDDLE);
> }
> module AP_MODULE_DECLARE_DATA authz_dbd_module =
> {
>    STANDARD20_MODULE_STUFF,
>    authz_dbd_cr_cfg,
>    NULL,
>    NULL,
>    NULL,
>    authz_dbd_cmds,
>    authz_dbd_hooks
> };
>


Re: Question on sub requests and output filter context.

2011-09-18 Thread Sorin Manolache
On Thu, Sep 15, 2011 at 12:52, Martin Townsend
 wrote:
> Hi,
>
> I have an output filter that parses custom tags to retrieve data from an
> application running on the same device.
>
> Everything was working well until I tried to move some HTML into Server Side
> Include pages.  Snippet below:
>
> 
> 
> 
>
> 
> 
>
> The first three commands will populate hash tables that are saved in my
> output filters context.
> The HTML in the included pages then use custom tags to query the hash tables
> but for some reason the hash tables are NULL.
>
> Having stepped through with the debugger I can see that the pointer to the
> output filter when processing the main HTML page is different to the one
> when parsing custom tags in SSI pages.  Looking through mod_include I can
> see it creates a sub request for include and sub requests call
> make_sub_request to create a new filter.  Should this new filter also
> inherit the output filters context?  Am I doing something wrong with my use
> of mod_include?  I've tried moving my filter so it's after mod_include but
> still the same problem.
>
> I'm using Server version: Apache/2.2.19 (Unix) on an  ARM board.
>
> Best Regards,
> Martin.
>
>

How do you construct the context of your filter? At the first
invocation of the filter or in the init function of the filter?

In the second case, it could be that you construct the context twice,
the first time in the main request processing and the second time in
the subrequest processing.

In my opinion, apache uses the same filter structure in both the main
and the sub request. In mod_includes apache creates a subrequest,
passing f->next to it. Thus, the first filter in the filter chain of
the subrequest is the filter succeeding the INCLUDES filter. In my
opinion, if you place your filter before the INCLUDES filter, your
filter should not be called in the subrequest if yours is a
AP_FTYPE_RESOURCE filter. If you place your filter after the INCLUDES
filter, the hash tables you mention are not initialised at the time
when your filter processes the responses of the includes subrequests.
I am not sure of what I'm saying because I have no experience in how
mod_includes interacts with other filters. Anyway, I hope this helps.

Have a look in server/request.c at make_sub_request. The subrequest
inherits the protocol filters of the main request, but not all of the
non-protocol output filters of the main request. Maybe you should make
your filter a AP_FTYPE_PROTOCOL filter such that it is not removed
from the chain by mod_includes.

S

