Re: one word syncronize once more

2007-07-20 Thread Darryl Miles

Greg Ames wrote:

please see rev. 558039.  requests_this_child does not need to be 100% accurate. 
 the cure below is worse than the disease.

Greg

-requests_this_child--; /* FIXME: should be synchronized - aaron */
+apr_atomic_dec32(&requests_this_child); /* much slower than important */



Maybe someone could reconfirm to the list what exactly the disease is ?


If this counter is not accurate (meaning it may lose 
increments/decrements) does the world cave in ?  Maybe someone could 
explain how and where this particular variable requests_this_child is 
used.


For example if it's just to provide guideline information to allow 
termination after 1 requests then that's less of a problem than if 
someone configured it to terminate after 2 requests.  Being 0.0001% out 
is less of a problem than 100% out.


For example if the counter is used in some way to allow apache to 
restart/logrotate, and it does not come back to zero reliably, then 
apache will remain hung trying to shut down (as is often the case in my 
real world experience).  In that case the disease would be a hung 
webserver during shutdown, which is much more fatal.


I'm not suggesting this particular counter requests_this_child is a 
direct cause of hung apache instances, but I'm asking those that 
understand the disease(s) if they could explain exactly what they are.





I'm not familiar with the apr_atomic_dec32() API; is this correctly 
optimized to a single asm instruction lock decl 0xaddr32 on Intel IA32 
?  How many clock cycles do you think that operation takes, and what 
technical understanding makes you think such a cure should be utilized 
with caution ?


My understanding of the cure is that it's very very light weight; so 
light weight I'd challenge you to prove the mythical performance issue 
in the situation where it's being used in apache.
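
For reference, a minimal sketch of the call in question (assuming the 
APR 1.x signature, where apr_atomic_dec32() takes a pointer to the 
counter and returns zero once the value reaches zero on decrement):

#include <apr_atomic.h>

static apr_uint32_t requests_this_child;

/* somewhere in the worker's per-request loop */
if (!apr_atomic_dec32(&requests_this_child)) {
    /* counter reached zero: this child has served its request quota */
}

On IA32 with a reasonable compiler that should indeed reduce to a 
LOCK-prefixed decrement plus a flags test.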



To give this some weight, I was recently involved in a multi-threaded 
ethernet packet processing application dealing with hundreds of mbits, 
and each packet that came in updated multiple counters using the IA32 
LOCK prefix, yet there was zero noticeable contention.  Indeed this is 
the purpose of the assembly primitive: to implement in hardware 
something that is most efficiently done there.



Darryl



Re: one word syncronize

2007-06-20 Thread Darryl Miles

sebb wrote:

On 14/06/07, Dmytro Fedonin [EMAIL PROTECTED] wrote:

Looking through 'server/mpm/worker/worker.c' I have found such a
combination of TODO/FIXME comments:
1)
/* TODO: requests_this_child should be synchronized - aaron */
if (requests_this_child <= 0) {
2)
requests_this_child--; /* FIXME: should be synchronized - aaron */

And I can not see any point here. These are one word CPU operations,
thus there is no way to preempt inside this kind of operation. So, one
CPU is safe by nature of basic operation. If we have several CPUs they
will synchronize caches any way, thus we will never get inconsistent
state here. We can only lose time trying to synchronize it in code. Am I
not right?


The decrement operation is a read-modify-write cycle, so it is possible 
for 2 CPUs to overlap their operations and end up with an observable 
lost decrement, since they both read the same initial value.
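
A standalone illustration of the lost-update effect (not httpd code, 
just pthreads; on an SMP machine the final value will usually not be the 
expected zero):

#include <pthread.h>
#include <stdio.h>

#define THREADS 4
#define ITERS   1000000L

static volatile long counter = THREADS * ITERS;

static void *worker(void *arg)
{
    long i;
    (void)arg;
    for (i = 0; i < ITERS; i++)
        counter--;              /* read-modify-write: not atomic */
    return NULL;
}

int main(void)
{
    pthread_t t[THREADS];
    int i;
    for (i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    printf("counter = %ld (expected 0)\n", counter);
    return 0;
}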


On IA32/x86 the DEC assembly instruction can be given the LOCK prefix. 
This makes the CPU keep the memory bus locked for the duration of the 
instruction, so there is no way for CPU2 to perform a read access until 
CPU1 releases control of the memory bus when it completes the 
instruction.  This is effectively what atomic_dec() enforces.


The amount of performance lost by using atomic_xxx() really is minimal; 
with any luck it is only that cache-line that remains locked, not the 
entire memory bus.
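
As a sketch of what such an atomic decrement boils down to (GCC assumed; 
the __sync builtin emits a LOCK-prefixed instruction on x86, and the 
inline asm shows a hand-written IA32 equivalent):

#include <stdint.h>

/* roughly what apr_atomic_dec32() compiles down to on IA32 with GCC */
uint32_t atomic_dec32(volatile uint32_t *mem)
{
    return __sync_sub_and_fetch(mem, 1);    /* LOCK-prefixed on x86 */
}

/* hand-written IA32 equivalent, for illustration only */
void atomic_dec32_asm(volatile uint32_t *mem)
{
    __asm__ __volatile__("lock; decl %0" : "+m" (*mem));
}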




The decrement operation may be handled as load, decrement, store on
some architectures, so can be pre-empted by a different CPU.


There is no other way to handle it :)  Memory itself can't perform 
arithmetic operations, so the decrement always happens inside the ALU 
inside the CPU.


It is true that non-SMP aware CPUs might hold the memory bus for the 
duration of the 'decrement' (aka modify) phase of the operation, since 
there is no reason to give it up when they are the only user of memory.


This becomes a performance bottleneck for any SMP capable CPU which has 
a cache that can operate at full CPU clock speed: the 'decrement' (aka 
modify) phase is going to require at least 1 clock cycle to perform, so 
why not let another CPU make use of the memory bus in the meantime.




Also some hardware architectures (e.g. HP Alpha) have an unusual
memory model. One CPU may see memory updates in a different order from
another CPU. Software that relies on the updates being seen across all
CPUs must use the appropriate memory synchronisation instructions.

I don't know if these considerations apply to this code.


Memory update ordering applies when considering how 2 or more distinct 
machine words are updated with respect to each other, when those updates 
are observed from another CPU.


The example here concerns a single machine word being updated on SMP 
systems.
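
For completeness, a sketch of the two-word case where ordering does 
matter (GCC builtins assumed; on a weakly ordered CPU such as Alpha the 
flag store could otherwise become visible before the data store):

static int data;
static volatile int ready;

void producer(void)
{
    data = 42;
    __sync_synchronize();   /* full memory barrier: data visible before flag */
    ready = 1;
}

int consumer(void)
{
    while (!ready)
        ;                   /* spin until the flag is seen */
    __sync_synchronize();   /* matching barrier before reading data */
    return data;
}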



Darryl



Re: Creating a thread safe module and the problem of calling of 'CRYPTO_set_locking_callback' twice!

2006-12-11 Thread Darryl Miles

William A. Rowe, Jr. wrote:

Darryl Miles wrote:

Your thinking is correct: there is a problem.  Those OpenSSL functions
are not documented in my man page but exist in the library.  Yes, there
is a read-test-write race window by using those APIs alone.


Nope.  This is set when the server process is running in single process,
single thread mode, long before the server 'opens up' and spawns off it's
worker threads.


Understood on the single thread situation.  But what about module 
initialization order guarantees, and possible future modules which may 
also use OpenSSL ?



The goal is to ensure the CRYPTO_*() init functions are called, and 
called exactly once, before OpenSSL's first use, and only when OpenSSL 
is being used.


* mod_core (no mod_ssl, no mod_frank: no need to do anything, no openssl users)
* mod_core + mod_ssl (no mod_frank)
* mod_core + mod_frank (no mod_ssl)
* mod_core + mod_ssl + mod_frank
* mod_core + mod_frank + mod_ssl (opposite initialization order)

There may not be worker threads, and apache-httpd may not support 
dynamic runtime loading / unloading of modules (at this time), but it's 
possible to deal with all those concerns cleanly.




What I DO agree with is that these callbacks should be locked in much
earlier than post_config.

I'm happy to see these callbacks locked in at the time we register the
module itself.


Is the module registration order the same for a given canonical 
configuration (i.e. it's not subject to httpd.conf LoadModule directive 
ordering) ?


Also, can one module ask the core what other modules are loaded, and 
ask them whether they too are users of OpenSSL ?  I.e. whatever system 
is used to address this problem would need to deal with the possible 
existence of a future mod_frank_v2 (that does not exist today but may do 
in the future) in a way that does not require mod_ssl to be recompiled 
or upgraded to know about mod_frank_v2 when the time comes to release 
mod_frank_v2.



Darryl


Re: Creating a thread safe module and the problem of calling of 'CRYPTO_set_locking_callback' twice!

2006-12-07 Thread Darryl Miles

Frank wrote:

Joe Orton wrote:

On Wed, Dec 06, 2006 at 06:20:55PM +, Darryl Miles wrote:
[...]

Is there an API to get the current value ?



Yes, CRYPTO_get_locking_callback/CRYPTO_get_id_callback.
[...]


I already know that this functions exists. But what if my module gets 
inited before mod_ssl, which doesn't use the get-functions to determine 
that something is already there? I was in the hope to see a clean 
general purpose solution. :-)


Your thinking is correct: there is a problem.  Those OpenSSL functions 
are not documented in my man page but exist in the library.  Yes, there 
is a read-test-write race window by using those APIs alone.
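
For reference, that check-then-set usage (the callbacks are assumed to 
be defined elsewhere in the module) looks roughly like this, with the 
race window sitting between the get and the set:

#include <openssl/crypto.h>

extern void my_locking_callback(int mode, int n, const char *file, int line);
extern unsigned long my_id_callback(void);

void maybe_install_openssl_callbacks(void)
{
    /* only install our callbacks if nobody else has done so already */
    if (CRYPTO_get_locking_callback() == NULL)
        CRYPTO_set_locking_callback(my_locking_callback);
    if (CRYPTO_get_id_callback() == NULL)
        CRYPTO_set_id_callback(my_id_callback);
}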



After this long and informative discussion I really think there is need 
for a ssl_thread_init_if_not_already_done inside Apache. (Besides the 
correction/removal of OpenSSL's stupid global locking mechanism.)


I disagree; I should not need to compile apache mod_core in a way that 
has to understand the quirks of OpenSSL.  So adding a function 
ssl_thread_init_if_not_already_done() to apache core is too specific to 
that one problem.


I'm saying it's possible to solve a bunch of problems like this in a 
more generic way.  This helps you and the next man.





Maybe there is some (small) re-design of the Apache code needed?


Agreed, something needs to be added.  I'm saying there is no need to 
make it specific to OpenSSL.  Serializing the initialization can be made 
generic such that these objectives are met:


* When building Apache mod_core with minimal functionality it is not 
necessary to build it with a special API function to handle OpenSSL.  So 
we get the co-ordination mechanism by default.


* I can build my mod_ssl or mod_frank at any point later without having 
to rebuild apache mod_core as well.


* It can provide a generic mechanism which can be used to solve the same 
problem wherever a common subordinate library is linked with by a module 
but is not required by mod_core.


* The overhead of adding the handful of new functions to apache mod_core 
should be low.  Subjective, but...


* Because the solution is generic it can be reused for other purposes 
where one module has to co-ordinate with another module at runtime over 
the use of a shared resource, but due to the modular nature of apache 
it's impossible to tell if/when/what other modules need to co-ordinate 
their behavior until runtime.


* Only those modules that use OpenSSL directly need to be linked against 
OpenSSL.


* Only those modules that link with OpenSSL need OpenSSL around at build 
time.  So the helper functions below would be repeated within each 
mod_* that uses OpenSSL.


Those are rough concerns to address.



All Apache should provide is an accessible keyed namespace to attach 
arbitrary pieces of memory to, where the attachment functions provide 
some thread-safety guarantees.  Maybe there is already a mechanism to 
achieve most of that within apache.


All these functions must be thread-safe with respect to themselves.  The 
key itself could be a string or an integer.  Or even both, if you wanted 
to abstract the key the way X11 atoms work: you lookup/create your key 
independently, get back a reference (usually a cardinal integer starting 
at 1), and then use this API with that reference (not the key directly).


* void *get(key) - gets the current (void *) value for key.

* void *attach(key, newdata) - overrides the current key value, with an 
optional exchange (by returning the old value).


* void *attach_smart_exchange(key, newdata) - does an exchange, but will 
only exchange a NULL for a non-NULL or a non-NULL for a NULL.  This is 
used to set without overwriting.  It always returns the pre-existing 
value; by checking that value in the context of how you called it, you 
can work out whether you were successful or not (a possible 
implementation is sketched below).
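
As a rough sketch of the core side (every name here is hypothetical, 
this is not an existing Apache API), attach_smart_exchange() could be 
one lookup and one conditional store under a core-internal lock:

void *ap_newapi_attach_smart_exchange(const char *key, void *newdata)
{
    void *old;

    core_namespace_lock();                    /* hypothetical core-private lock */
    old = core_namespace_lookup(key);         /* hypothetical keyed lookup */
    if ((old == NULL) != (newdata == NULL)) {
        /* only NULL <-> non-NULL exchanges are performed */
        core_namespace_store(key, newdata);   /* hypothetical keyed store */
    }
    core_namespace_unlock();
    return old;                               /* always the pre-existing value */
}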



Sample hypothetical usage in self-documenting non compilable code:

struct openssl_init_struct {
    long magic;          /* do versioning here */
    int boolean_flag;
    mutex_t mutex;
} *ois;

ois = calloc_from_global_pool(1, sizeof(*ois));
ois->magic = OPENSSL_INIT_STRUCT_V1; /* magic constant */
ois->boolean_flag = FALSE;
mutex_init(ois->mutex);
/* as first use we lock before attachment */
mutex_lock(ois->mutex);

/* could be an integer based key system */
const char *label_key = "mylabel_openssl_initialization_serializer";

/* This ap_newapi_attach_smart_exchange function is a thread-safe atomic
 * attachment of mydata to the global namespace; it will only attach if
 * the current value for the key does not exist or is NULL already, which
 * are essentially treated identically (does not exist vs NULL value) */

void *retval =
    apache_core_handle->ap_newapi_attach_smart_exchange(label_key, ois);

if (retval == NULL) {
    /* we won the attachment, we already locked it and we own that lock,
     * fall thru to the code below */

} else {
    free_from_global_pool(ois); /* FIXME: destruct mutex etc.. properly */

    /* we lost, go get the winner, you can optimize

Re: Creating a thread safe module and the problem of calling of 'CRYPTO_set_locking_callback' twice!

2006-12-06 Thread Darryl Miles

Frank wrote:

William A. Rowe, Jr. wrote:

Nick Kew wrote:
[...]
An SSL_CTX can't be cross-threaded.  If the scope of use of that CTX is
restricted to one thread at a time, then yes, OpenSSL has been threadsafe
for a very very long time.


You mean if I were able to create one SSL_CTX for every thread then I do 
not have to use the both thread-safe-maker callbacks?


I don't think this is true.  But correct my understanding too if I am 
wrong.  Cross-threaded might confuse someone into thinking there may be 
some apartment threading rules to obey; there aren't.



An SSL * can't have a method invoked on the same instance at the same 
time.  So long as you serialize your method calls (the SSL_*() family) 
to that same instance, any thread can call that method.  It is unusual 
to need to do so.


But SSL_CTX * is the template context, specifically designed to be 
shared and used across multiple threads if need be, providing you make 
correct use of 'CRYPTO_set_locking_callback' and 'CRYPTO_set_id_callback' 
and friends as part of your application initialization.  This allows for 
(amongst other things) the obviously parallel usage of 
SSL_new(SSL_CTX *) when creating new connections.
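
For reference, a minimal sketch of that application initialization 
against the classic (pre-1.1.0) OpenSSL callback API, using plain 
pthreads rather than the APR mutexes mod_ssl's ssl_util_thread_setup() 
uses:

#include <stdlib.h>
#include <pthread.h>
#include <openssl/crypto.h>

static pthread_mutex_t *ssl_locks;

static void locking_cb(int mode, int n, const char *file, int line)
{
    if (mode & CRYPTO_LOCK)
        pthread_mutex_lock(&ssl_locks[n]);
    else
        pthread_mutex_unlock(&ssl_locks[n]);
}

static unsigned long id_cb(void)
{
    /* this cast-to-long is the portability wart mentioned elsewhere */
    return (unsigned long) pthread_self();
}

void ssl_thread_setup(void)
{
    int i;
    ssl_locks = malloc(CRYPTO_num_locks() * sizeof(*ssl_locks));
    for (i = 0; i < CRYPTO_num_locks(); i++)
        pthread_mutex_init(&ssl_locks[i], NULL);
    CRYPTO_set_id_callback(id_cb);
    CRYPTO_set_locking_callback(locking_cb);
}

Call ssl_thread_setup() once, while still single-threaded, and the 
SSL_CTX can then be shared by the worker threads.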



Maybe the openssl-users list would be a better place for assistance.


Darryl



Re: Creating a thread safe module and the problem of calling of 'CRYPTO_set_locking_callback' twice!

2006-12-06 Thread Darryl Miles

Nick Kew wrote:

Unless OpenSSL nomenclature is rather confusing here, an SSL_CTX
sounds like the kind of thing you would instantiate per-connection
or per-request.  Does your module act on a request or a connection?


Maybe a bit of background reading and examination of reference 
implementations would be a better help for you right now.



SSL_CTX_new(3): SSL_CTX_new - create a new SSL_CTX object as framework 
for TLS/SSL enabled functions


SSL_new(3): SSL_new - create a new SSL structure for a connection


The SSL_CTX is a template/configuration holder to stamp out your 
connection instances from.  This saves configuring certificates, cipher 
specs, etc... for every connection.
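
A minimal sketch of that split (error checking omitted; the certificate 
paths and client_fd are placeholders):

#include <openssl/ssl.h>

/* once, at startup: the shared template */
SSL_CTX *make_server_ctx(void)
{
    SSL_CTX *ctx = SSL_CTX_new(SSLv23_server_method());
    SSL_CTX_use_certificate_file(ctx, "server.crt", SSL_FILETYPE_PEM);
    SSL_CTX_use_PrivateKey_file(ctx, "server.key", SSL_FILETYPE_PEM);
    return ctx;
}

/* per accepted connection, possibly from many threads at once */
SSL *make_connection(SSL_CTX *ctx, int client_fd)
{
    SSL *ssl = SSL_new(ctx);
    SSL_set_fd(ssl, client_fd);
    SSL_accept(ssl);
    return ssl;
}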



Darryl


Re: vote on concept of ServerTokens Off

2006-12-06 Thread Darryl Miles

Jeff Trawick wrote:

I know... that's why I asked :)


We're up to two great answers to disable some output from the server
that isn't required by the HTTP protocol anyway:

1) modify the source
2) install third-party module


ROFL.  Please add to the list:

3) Start a new apache-httpd fork.  apache-phewbits  :D


Re: Creating a thread safe module and the problem of calling of 'CRYPTO_set_locking_callback' twice!

2006-12-06 Thread Darryl Miles

Frank wrote:


EVP_CIPHER_CTX ctx;
EVP_CIPHER_CTX_init(&ctx);
EVP_EncryptInit(&ctx, EVP_bf_cbc(), key, iv);
EVP_EncryptUpdate(&ctx, outbuf, &olen, inbuff, n);
EVP_EncryptFinal(&ctx, outbuf + olen, &tlen);

Because 'EVP_CIPHER_CTX_init' is 'slow', I want to call it once! (Yes! I 
can call it for every request and then (I think) I am on the safe side, 
but I do not want this because there are MANY requests!)
So my code has to be thread safe, as Apache might be compiled with 
thread support! To make it thread safe 
http://www.openssl.org/docs/crypto/threads.html told me:


These functions are generally thread safe AFAIK, so long as you are not 
using an engine / crypto card or something shared.  If it's just a basic 
memory-in memory-out transform it should be pretty clean of locking.


It's only stuff like the SSL session cache, the SSL_CTX and the 
registration schemes (cipher registration, digest registration, etc...) 
which use locking, as they are shared concurrently.


But I understand your point.  It's a necessity because the API contract 
says that is what you must do.




OpenSSL can safely be used in multi-threaded applications provided that 
at least two callback functions are set.


This means the two functions 'CRYPTO_set_locking_callback' and 
'CRYPTO_set_id_callback'!


These two functions are being called from mod_ssl by the 
ssl_init_Module-function (via ssl_util_thread_setup, which creates some 
thread mutexes and calls the both functions) without testing whether 
they have already being called or not.


My question is: How does this interfere with my module? How can I ensure 
that only one of us (mod_ssl or my module) is calling these both 
functions? I cannot believe that there is no problem when my module 
creates some thread mutexes and mod_ssl does it too...


Of course it is necessary to arbitrate the loading and unloading of the 
OpenSSL libraries.  Here loading really means the initial configuration 
that OpenSSL requires to initialize itself before it achieves maximal 
thread-safety.


This is true of any programming paradigm.  If something is thread-unsafe 
until you configure it correctly to be thread-safe, you have to 
serialize all usage up to the point it becomes safe.  This includes 
calling the functions that initialize the safety mechanism.


You also can't go and change the locking policy while there may be one 
or more active users of the old/existing policy.  From a practical 
standpoint it's impossible to tell how many users there are once the 
first user gets stuck into application work.  So you are correct, you 
can't blindly go overwriting the policy with 
CRYPTO_set_locking_callback() to point to your mutexes.




 P.S.: I still think there is need for a test routine like
 'ssl_is_thread_safe_maker_on()'.

Agreed.  The problem is reduced to serializing the setting of the 
locking policy, which is reduced to the apache http server framework 
providing a non-performance-critical locking mechanism and a boolean 
flag.  Then each mod_* user locks that external lock, checks the 
external flag, and if the flag is not set, configures the locking policy 
in openssl, sets the flag and unlocks.  The apache framework needs to be 
the arbitrator here by holding the lock and flag in a namespace 
accessible by all modules.
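
Sketched out (the ap_newapi_* framework calls are hypothetical, and the 
callbacks are assumed to be defined in the module as usual):

#include <openssl/crypto.h>

extern void my_locking_callback(int mode, int n, const char *file, int line);
extern unsigned long my_id_callback(void);

void my_module_openssl_setup(void)
{
    ap_newapi_lock("openssl_init");                 /* hypothetical framework lock */
    if (!ap_newapi_get_flag("openssl_init_done")) { /* hypothetical shared flag */
        CRYPTO_set_id_callback(my_id_callback);
        CRYPTO_set_locking_callback(my_locking_callback);
        ap_newapi_set_flag("openssl_init_done", 1);
    }
    ap_newapi_unlock("openssl_init");
}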


It does not matter which application wins the race to set the policy, 
just so long as a policy is set up.  But it would be helpful if both 
mod_ssl and mod_frank agreed on the same policy (OpenSSL kind of gives 
you a few options here).



Darryl



Re: Creating a thread safe module and the problem of calling of 'CRYPTO_set_locking_callback' twice!

2006-12-06 Thread Darryl Miles

Joe Orton wrote:
What I do with OpenSSL in neon is to check that the existing callback is 
NULL before registering a new callback; and likewise to check that the 
ID callback is the one neon previously registered before un-registering 
it later.  If everybody did that it would be relatively safe.


Is there an API to get the current value ?

It shouldn't be too hard to attach a piece of memory to a keyed 
namespace which the core module maintains.  3 primitives should do the 
trick: data = get(key), set(key, data), set_if_null(key, data), where 
the set_if_null() operation in particular must be thread-safe (an atomic 
test-and-set).


This would add little impact/bloat to the core module, but would be user 
extendable to allow other such things.  Someone just needs to maintain 
an assigned key value list.


A notice board for modules to talk to each other.


The OpenSSL guys have actually obviated the ID callback for some future 
release, it was entirely unportable because of the cast-to-long issue 
anyway; but the locking callback remains.


(void *) in the latest releases AFAIK.  This is to address this very 
concern.



Darryl


Re: [PATCH 40026] ServerTokens Off

2006-08-21 Thread Darryl Miles

Mads Toftum wrote:

+1 - looking at the number of IIS targeted worms that keep hitting my
apache installs seem to suggest that obscuring the server name will at
most lead to a false sense of security. Besides, if you really care, I'm
pretty sure it wouldn't be all that hard to guess what server it is by
looking at all the rest of the headers.


Looking at the way the TCPIP stack behaves under normal and error 
conditions.


Looking at the way the HTTP server behaves under normal and error 
conditions.


Looking at the way the file serving behaves under normal and error 
conditions.


Looking at the way any scripting technology behaves under normal and 
error conditions.


You can't hide everything, and why waste your own CPU cycles trying to 
imitate another platform's quirks when you could be serving documents 
with them ?  Another major point about OSS security is that it can 
withstand source code disclosure _AND_ still be secure.  Maybe other 
server implementations just aren't in the same league of security.


Darryl


Re: [PATCH 40026] ServerTokens Off

2006-08-12 Thread Darryl Miles

Joshua Slive wrote:

<note>Setting <directive>ServerTokens</directive> to less than
<code>minimal</code> is not recommended because it makes it more
difficult to debug interoperational problems.</note>

And my +1 isn't very strong.  I have no problem with saying that this
small bit of advertising is the tiny price that you pay for using our
free software.  But just to make this never-ending issue go away, I'd
say put it in.


It should also be pointed out in the documentation that those thinking 
of setting it to Off for the purpose of security by obscurity (hiding 
the implementation and version number) should realize that this concept 
has no technical merit in the HTTP server situation.  Call this an 
education clause in the documentation which may head off inappropriate 
usage by less clueful users.


With regards to the price that you pay ... I take it that you are 
reading that from the karma equalization policy and not from any legal 
policy, since one of the fundamental points of the Apache Foundation is 
that advertisement is not one of the prices you pay.



Darryl


Re: mod_proxy_balancer/mod_proxy_ajp TODO

2006-06-22 Thread Darryl Miles

Henri Gomez wrote:

Well, we could always indicate some sort of CPU power for a remote (a 
sort of bogomips) and use this in computation.


Why should the CPU power matter ?  What if the high power CPU is getting 
all the complex requests and the lower power CPU is ending up with 
simple requests (static content) ?



You are better off implementing it as control packets over AJP that can 
advertise that host's current willingness to take on new requests on a 
32/64bit scale.  Call this a flow control back pressure packet: a 
stateless beacon of information which may or may not be used by the 
client AJP.


Then have a pluggable implementation at the server end for calculating 
that value and the frequency of advertising it.  All the apache LB has 
to do is work from that load calculation.  All existing AJP 
implementations have to do is ignore the packet.


In the case of different horse-power CPUs, that factor is better fed 
into the server AJP algorithm by biasing the advertised willingness to 
take on new requests once a certain low level is reached.  Only the 
server end of the AJP knows the true request rate and how near that 
system is to breaking point.


This scheme also works where apache may not see all the work the backend 
is doing, like if you have different apache front end clusters linked to 
the same single backend cluster.



Darryl


Re: mod_proxy_balancer/mod_proxy_ajp TODO

2006-06-22 Thread Darryl Miles

Henri Gomez wrote:

The TomcatoMips indicator was just something to tell that it's not the
raw CPU power which is important, but the estimated LOAD capacity of
an instance.



But it's still apache working out the TomcatoMips.  I think that 
approach is still flawed.


I'm saying only the server end of the AJP knows the true situation.  The 
current setup presumes that the running apache instance has all the 
facts necessary to determine balancing, when all it knows about is the 
work it has given the backend and the rate at which it is getting it 
back.



I'm thinking both ends, apache and tomcat, should make load calculations 
based on what they know at hand.  As far as I know there is no provision 
in AJP to announce willingness to serve.  Both ends should feed their 
available information and configuration biases into their respective 
algorithms and come out with a result that can be compared against each 
other.  The worker would then announce the info down the connector to 
apache as necessary (there may be a minimum % change threshold to dampen 
information flap).  There would probably need to be a random magic 
number and a wrapping sequence number in the packet to help the apache 
end spot obvious problems.
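
A hypothetical shape for such an announcement packet (this is not part 
of the real AJP protocol; field names and sizes are purely illustrative):

#include <stdint.h>

/* hypothetical "willingness to serve" beacon, emitted periodically by the
 * backend worker down the AJP connector */
struct ajp_willingness_beacon {
    uint32_t magic;        /* random magic chosen at backend startup */
    uint32_t sequence;     /* wrapping sequence number, to spot drops/restarts */
    uint32_t willingness;  /* 0 = refuse new work ... UINT32_MAX = fully willing */
};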


This would allow kernel load avg / io load (and anything else) to be 
periodically taken into account at the tomcat end.  It would be expected 
that each member of the backend tomcat cluster is using the same 
algorithm to announce willingness, otherwise you get disparity when 
apache comes to make a decision.



So I suppose it's just the framework to allow an LB worker to announce 
its willingness to serve that I am calling for here.  Not any specific 
algorithm; that issue can be toyed with until the end of time.



An initial implementation would need to experiment and work out:
 * How that willingness value impacts/biases the existing apache LB 
calculations.
 * Guidelines on how to configure the algorithm at each end based on 
known factors (like CPUs, average background workload, relative IO 
performance).


I'm thinking that with that you can hit the widest audience with a 
usable default without giving much thought to configuration.  This is 
the type of approach kernels take these days: you only have to tweak and 
think about configuration in extreme scenarios, but for the most part it 
works well out of the box.



Darryl


mod_proxy_xxxxx last resort fallback redirect ?

2006-06-17 Thread Darryl Miles


I'm interested in your comments (good and bad) on implementing a new 
option to ProxyPass which would make apache perform a redirect when the 
proxy server or balancer cluster is not available.  This mimics the 
functionality of a dedicated hardware load balancer by issuing a HTTP 
redirect response to a holding page ("This service is not available", or 
"We are updating our website").


Can anyone see any reason why this is a bad idea to implement within 
apache, or see any obstacles in my way ?



The last time I looked at the internals of apache it had a sub-request 
facility; I would like to be able to redirect to both external and 
internal pages.  External being a HTTP 3xx response with a Location: 
header, which would be configured with an absolute url "http://...". 
Internal being a fall through to the configured mapping and content 
handler for the URL, just as if the ProxyPass directive wasn't there; I 
was thinking along the lines of this being configured with a 
"fallthrough:" scheme prefix.  This means it would be possible to use:

DocumentRoot /opt/apache/htdocs
ProxyPass / balancer://group1/ timeout=5 maxattempts=3 fallback-redirect=fallthrough:/holding.html


Then if the balancer://group1 cluster fails the document at 
/opt/apache/htdocs/holding.html is served instead.



Request for comments,

Darryl

--
Darryl L. Miles