Hi Chris,
Thank you for the detailed review, this really helps - appreciate it very much. I do agree with your concerns on memory leaks and the difficulty of dealing with that in the absence of a pool.
   And improvements you have suggested makes quite sense.
However I need more time to understand some of the stuff - I am traveling this week, so it will take some time.
   May be others can have look in the mean time :)

Thanks,
Samisa...
Chris Darroch wrote:
Hi --

   I've been working fairly intensively with Axis2/C over the last month
or two in order to build some applications, and I thought I might share
two major concerns that have arisen for me.  I apologize if any of these
points have been discussed before!

   Before I explain in detail, if I could just put in a request for
someone to review the latest patches I've attached to AXIS2C-380
(https://issues.apache.org/jira/browse/AXIS2C-380) regarding a number
of mod_axis2 cleanups and fixups.  I've tried to fix and clean up
the configuration directives, add axis2_error_init(), move
axiom_xml_reader_init(), and enable valid SOAP faults from application
services and also when services and operations are not found.  Thanks
in advance!

   In brief, my two concerns are that (A) Axis2/C doesn't use the APR
libraries (in particular, it doesn't use APR memory pools), and (B)
that it relies on extensive use of indirect function calls through
tables of function pointers instead of making simple function calls.
I'll walk through my thinking on both issues below.


A) There's a lot of code duplication between the Axis2/C util library
and APR.  That's too bad, because it means that bug fixes to APR
won't be automatically inherited by Axis2/C.  From a presentation
I attended at last month's ApacheCon U.S. I learned that the original
plan was to provide an abstraction layer above a portability library
like APR, and that that got delayed by time constraints.  I don't
know myself if such an extra layer is really valuable or not, but I
would have thought that it was better to at least use APR than to
reproduce portions of it.

   At any rate, I'm not suggesting that much can be done about this
in the short term, at least for a 1.0 release.  However, I am
concerned in particular about the absence of APR memory pools and
the use instead of AXIS2_MALLOC() and AXIS2_FREE().

   My concern comes about because I'm hoping to run mod_axis2 in
a production system.  That means that I'm worried about gradual
memory leaks that occur when memory is allocated but not deallocated
appropriately.  Given the size of the Axis2/C codebase, I think it's
a reasonable assumption that there are a number of places where
this occurs -- and that's not a major criticism, since this is one
of the classic hard things about C programming.

   APR memory pools make life a lot easier.  When writing an
Apache httpd module, there's basically two kinds of pools one wants
to consider using -- either a pool that lasts for the lifetime of an
httpd child process, or a pool that lasts for the lifetime of an
HTTP request.  (I'm leaving aside for now the threading issues that
can arise when dealing with process-lifetime pools when using a
threaded MPM.)

   The attached patchset is a first hack at making mod_axis2 use
these two kinds of pools.  It works, sort of, but after a few
requests the child processes segfault.  (It doesn't include
patches to rampart or guththila, both of which seems to contain
a few calls to AXIS2_REALLOC which would need to be rewritten.)
The segfaults always seems to happen at the following level of
the calling stack (for me, anyway):

axis2_engine_receive()
  AXIS2_MSG_RECV_RECEIVE()
    axis2_raw_xml_in_out_msg_recv_receive_sync()
      axis2_core_utils_create_out_msg_ctx()
        AXIS2_MSG_CTX_SET_SVC_CTX()
          axis2_msg_ctx_set_svc_ctx()
            axis2_msg_ctx_set_svc()

   I don't have time to pursue this further right now, but I think
it would be really valuable before the 1.0 release to ensure that
mod_axis2 worked using some kind of similar technique.

   It seems to me that what might be the logical next steps would be:

1) Add to the env structure a second allocator, named something like
   resident_allocator or config_allocator or something.
2) Analyze which data structures in Axis2/C were effectively
   "configuration" data and needed to stay in memory for the lifetime
   of a process.  This should include any such data that needs to
   be modified or deleted over the lifetime of the server process.
3) For each such structure, ensure that it is created, modified,
   and deleted using only the env->resident_allocator.  This could
   be accomplished a variety of ways (a private sub-allocator,
   for example).  The main concern is thread-safety; if multiple
   threads might be managing this structure at the same time,
   a thread mutex will need to be used as well.

   At this point, everything else should be data which can be
cleaned up automatically when the httpd request->pool is cleared
at the end of each HTTP request.  Then you could even go through
the code and remove all the calls to AXIS2_FREE() if you really
felt like it!  The key advantage is that it means that each
httpd server process is guaranteed to only increase in size if
the "configuration" data grows, and this should presumably be by
only a trivial amount.


B) I mentioned to a few people I met at a presentation on Axis2/C
at ApacheCon my concern about the all the function pointers in
all of the "ops" structures.  The heavy use of all this indirection
means that the compiler and linker can't help detect when you're
about to call a function but your function pointer is NULL.
Making simple, normal function calls means that you can rely on
the compiler and linker to give you a lot of helpful errors about
such things before you reach runtime.

   Now, again I understand that this isn't going to be something
that can change for a 1.0 release.  However, maybe something can
be done that would help a bit.  Let me point to an example;
unfortunately I don't have the exact details to hand any more, but
I think it still illustrates my concern.

   When I was working through the problems I described in AXIS2C-326
and AXIS2C-322 (https://issues.apache.org/jira/browse/AXIS2C-326 and
http://issues.apache.org/jira/browse/AXIS2C-322) I came across
a situation where a valid WSDL file, fed into woden, caused a
segfault.  It turned out to be because the woden code was expecting
an attribute where the WSDL spec says you don't absolutely need one.
I no longer have all the details at hand but it doesn't really matter
for the purposes of discussion.

   Ideally, obviously, woden should accept the WSDL and proceed.
But, if it really isn't going to work without the attribute -- if
the WSDL is just nonsensical, or whatever -- it should at least
print a nice message like "expecting attribute 'foo' at line 123
of file 'bar.wsdl'".

   To do that, the first thing it needs to do is not crash.  There
were two segfault problems I had to work through.  The first is
a relatively simple coding error that happens to appear in a bunch
of places in woden; see my nodes in AXIS2C-384
(https://issues.apache.org/jira/browse/AXIS2C-384) about that.

   The code where I recall the problem turned up looked like this
(I might be off by a few lines but that doesn't matter to the larger
point):

int_msg_ref = woden_wsdl10_interface_msg_ref_to_interface_msg_ref_element(
                  int_msg_ref, env);
intf_msg_qname = WODEN_WSDL10_INTERFACE_MSG_REF_ELEMENT_GET_QNAME(
                     int_msg_ref, env);

   The WODEN_WSDL10_INTERFACE_MSG_REF_ELEMENT_GET_QNAME macro is
defined like this (with some whitespace for clarity):

#define WODEN_WSDL10_INTERFACE_MSG_REF_ELEMENT_GET_QNAME(
            interface_msg_ref_element, env)
  (((woden_wsdl10_interface_msg_ref_element_t *) interface_msg_ref_element)->
            ops->get_qname(interface_msg_ref_element, env))

   The AXIS2C-322/384 issue meant that int_msg_ref was unexpectedly
NULL and so the macro would cause a segfault.  It seems to me like
this sort of thing could happen quite a bit, given the way the
code is structured as a whole, and that's the crux of my concern here.

   At ApacheCon, someone pointed out that the Axis2/C code does
a lot of parameter checking, but that's not going to help here.
Suppose that woden_wsdl10_interface_msg_ref_get_qname(), which is
what the macro calls, performed a nice argument check:

AXIS2_PARAM_CHECK(env->error, interface_msg_ref, AXIS2_FAILURE);

that still won't help, because the macro is still going to fail
if either its interface_msg_ref_element argument is NULL, or if
the ops structure pointer is NULL, or if the get_qname function
pointer is NULL.

   In a more traditional C program, you might have something like:

int_msg_ref = woden_wsdl10_interface_msg_ref_to_interface_msg_ref_element(
                  int_msg_ref, env);
intf_msg_qname = woden_wsdl10_interface_msg_ref_get_qname(
                     int_msg_ref, env);

and if the get_qname() function did an AXIS2_ENV_CHECK() and an
AXIS2_PARAM_CHECK() on its two arguments, you'd be nicely protected
against segfaults.  The compiler and linker would caution you if
were calling a function that wasn't defined.

    Plus, if AXIS2_PARAM_CHECK() did an AXIS2_LOG_ERROR()
then you'd even get an error message showing exactly which file
and line of code encountered a NULL function argument.

   With the current indirect function calling scheme, you're at
the mercy of the various "constructor" functions; if they fail to
fill in function pointers then you're in trouble.  You're also
going to fail if you pass a NULL to a macro.  And in all these
cases there's nothing in the logs to help someone.

   What I might propose is a compromise, since clearly the function
pointer technique is used extensively and can't just be replaced.
Here are my proposals:

1) AXIS2_ENV_CHECK() and AXIS2_PARAM_CHECK() should call
AXIS2_LOG_ERROR() in the case where the argument is NULL.  Because
these are macros they'll expand so that AXIS2_LOG_SI gives file
and line information on the place where the NULL argument was
encountered.

2) All the macros that perform indirect function calling through
the many ops structures should be expanded to first call
AXIS2_ENV_CHECK() and AXIS2_PARAM_CHECK() on their key env
and "interface" structure pointer arguments.  It might be good
to also check for a NULL ops structure pointer and a NULL function
pointer in the ops structure.

   This would mean that whenever you called one of these many
macros you'd be automatically protected against failure due to
NULLs, and you'd get a helpful error message too.


   So ... a long email, but hopefully not too unpleasant a read.
I'm going to have to largely ignore email next week while I work
toward a deadline, but please do reply to the list or to me
in person with comments, criticisms, flames, etc.  Thanks for
listening!

Chris.



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to