Re: getpid() not unique for threads on Linux 2.6 + NPTL

2006-03-31 Thread Balazs Scheidler
On Fri, 2006-03-31 at 13:52 -0300, Leandro Santi wrote:
> Balazs Scheidler, 2006-03-31:
> 
> > The problem with
> > the current situation is that everything _seems_ to work well, but
> > whenever load hits the application it crashes and it is not easy to
> > debug especially when one is looking for an error in his own code :)
> 
> IMHO, the sooner the problem is detected, the better. Even if this
> implies a brutal crash of the application. 
> 
> On Linux, the current CRYPTO_thread_id() behavior with multithreaded
> applications hides the fact that the application is *broken*. For
> example, MySQL with OpenSSL has been broken *for years*. The problem was
> much harder to trigger on Linux because of the default
> CRYPTO_thread_id() behavior. Platforms without the getpid() <->
> pthread_self() bijection (Solaris, NetBSD >= 2, ...) happily crashed
> sooner and, more importantly, the problem got fixed sooner as well.

I would rather see a big "assertion failed" message and an abort() in this
case too. The default CRYPTO_thread_id() implementation is simply broken.

A possible solution would be to stop handling this race in
ERR_get_state() and do an assert(0) with a nice error message instead.
The code in question:

/* If a race occurred in this function and we came second, tmpp
 * is the first one that we just replaced. */
if (tmpp)
    ERR_STATE_free(tmpp);

If getpid() is unique per thread, this race will never happen. If it is
not, then instead of freeing an error state which _is being used_ in
another thread, we should do an assert(0).

This way the default would work for single-threaded applications and
would quickly fail with multi-threaded ones. The race above always
indicates a problem which should not be allowed.
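The idea can be sketched outside of OpenSSL roughly as follows. This is
only an illustration under stated assumptions, not the real
ERR_get_state() code: struct err_state, slot_insert() and
check_displaced() are hypothetical stand-ins for the error state, the
hash table insert, and the proposed check.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for OpenSSL's per-thread error state. */
struct err_state {
    unsigned long tid;   /* thread id the state was registered under */
};

/* Hypothetical one-slot "table": stores s and returns the entry it
 * displaced, or NULL if the slot was empty. */
static struct err_state *slot_insert(struct err_state **slot,
                                     struct err_state *s)
{
    struct err_state *old = *slot;
    *slot = s;
    return old;
}

/* Proposed behavior: a displaced entry means two threads share one
 * thread id, so fail loudly instead of freeing a state in use. */
static void check_displaced(const struct err_state *tmpp)
{
    if (tmpp != NULL) {
        fprintf(stderr,
                "thread id %lu is shared by more than one thread; "
                "set a thread id callback with CRYPTO_set_id_callback()\n",
                tmpp->tid);
        abort();
    }
}
```

A caller would pass the return value of the insert straight to the
check, so the first insert for an id passes silently and a second insert
under the same id aborts.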

-- 
Bazsi

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   [EMAIL PROTECTED]


Re: getpid() not unique for threads on Linux 2.6 + NPTL

2006-03-31 Thread Leandro Santi
Balazs Scheidler, 2006-03-31:

> The problem with
> the current situation is that everything _seems_ to work well, but
> whenever load hits the application it crashes and it is not easy to
> debug especially when one is looking for an error in his own code :)

IMHO, the sooner the problem is detected, the better. Even if this
implies a brutal crash of the application. 

On Linux, the current CRYPTO_thread_id() behavior with multithreaded
applications hides the fact that the application is *broken*. For 
example, MySQL with OpenSSL has been broken *for years*. The problem was
much harder to trigger on Linux because of the default
CRYPTO_thread_id() behavior. Platforms without the getpid() <->
pthread_self() bijection (Solaris, NetBSD >= 2, ...) happily crashed
sooner and, more importantly, the problem got fixed sooner as well.

Leandro


getpid() not unique for threads on Linux 2.6 + NPTL

2006-03-31 Thread Balazs Scheidler
Hi,

I've been debugging my multithreaded application using OpenSSL for three
consecutive days. It crashed after processing anywhere from 5 to 200 SSL
transactions.

The reason turned out to be the default CRYPTO_thread_id()
implementation, which uses getpid() as a thread identifier. This used to
work on non-NPTL LinuxThreads, but it does not work on anything using
Linux 2.6 and NPTL, because in this combination getpid() returns the
same value for all threads.

I have solved the problem myself by supplying a custom callback via
CRYPTO_set_id_callback() which uses the pthread id. However, this
problem probably affects a lot of OpenSSL users whose applications run
flawlessly on Linux 2.4: after upgrading to a newer distribution, the
application will crash randomly, as the issue causes heap corruption.
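A minimal sketch of such a callback, assuming Linux/glibc where
pthread_t is an integer type that fits in an unsigned long. The name
pthread_id_callback is hypothetical, and the registration call is shown
only in a comment because it needs the OpenSSL headers.

```c
#include <pthread.h>

/* Hypothetical callback name. Casting pthread_t to unsigned long is an
 * assumption that holds on Linux/glibc; it is not portable to every
 * platform. */
static unsigned long pthread_id_callback(void)
{
    return (unsigned long) pthread_self();
}

/* Registration, done once at startup before any other thread uses
 * OpenSSL (requires <openssl/crypto.h>):
 *
 *     CRYPTO_set_id_callback(pthread_id_callback);
 */
```

The locking callback still has to be registered separately. This only
replaces the getpid()-based default thread id.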

A clean solution that does not depend on pthreads would be to use
gettid(2) when available. It returns the id of the calling thread
instead of the pid of the whole application. (gettid() is equal to
getpid() in a non-threaded application.)

The problem with gettid() is that it is not available in the libc, and
one has to use crude hacks to be able to use it. The following C program
works for me:

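#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <linux/unistd.h>

/* _syscall0() comes from the 2.6-era <linux/unistd.h>. It expands to a
 * definition of gettid() that invokes the system call directly. */
pid_t gettid(void);

_syscall0(pid_t,gettid)

int main()
{
  printf("%d\n", (int) gettid());
  return 0;
}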
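On systems whose headers do not provide the _syscall0() macro, the same
system call can probably be reached through syscall(2). A minimal
sketch, assuming Linux with glibc. The name current_tid is hypothetical
and was chosen so that it cannot clash with a gettid() the libc may
declare itself.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for the syscall() declaration */
#endif
#include <sys/syscall.h>     /* SYS_gettid */
#include <sys/types.h>       /* pid_t */
#include <unistd.h>          /* syscall(), getpid() */

/* Hypothetical wrapper: returns the kernel thread id of the caller. */
static pid_t current_tid(void)
{
    return (pid_t) syscall(SYS_gettid);
}
```

In the main thread of a process current_tid() equals getpid(). In any
other thread it returns a different value, which is what a thread id
callback needs.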

OpenSSL's CRYPTO_thread_id() code already contains a couple of #ifdefs
anyway. The problem with the current situation is that everything
_seems_ to work well, but whenever load hits the application it crashes,
and it is not easy to debug, especially when one is looking for an error
in his own code :)

I would have opened a bug ticket but I have not found an obvious bug
tracking system, so I decided to send it to openssl-dev in the hope that
this is the right list. Please enlighten me if I am mistaken.

-- 
Bazsi
