We had this in the past. I'm not sure and would have to search the archives, but I vaguely remember that this was a threading bug in the Solaris version. Could you please try 7.4.2 or CVS head, where this should be fixed? Alternatively, you could try with threading disabled.
I verified last night that this problem also occurs with 7.4.2. I did some more extensive testing on the solution in my previous follow-up email. That is definitely the problem - configure is setting "-pthread" instead of "-lpthread" in config.status. After manually correcting this in config.status, everything works properly.
As stated before, this is not true. If you don't compile with -D_REENTRANT, /usr/include/errno.h declares errno as
extern int errno;
instead of the thread safe
extern int *___errno();
#define errno *(___errno())
At least it does so here on Solaris 8. That leads to libpq using the global errno variable, which might or might not be the one where "your" error is in a multithreaded program. I mailed the correct solution as a follow up to the other thread earlier today as a patch against 7.4.2.
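A minimal sketch of how one could check which errno definition a build actually picked up. The helper names (record_errno_address, errno_is_per_thread) are made up for illustration and are not part of libpq or ecpg. The check compares the address of errno as seen from two threads: with the thread-safe definition each thread gets its own location, with the plain "extern int errno" both see the same one.

```c
/* Assumption: _REENTRANT has to be defined before the first system header. */
#ifndef _REENTRANT
#define _REENTRANT
#endif

#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: store the address this thread uses for errno. */
static void *record_errno_address(void *arg)
{
    *(int **) arg = &errno;
    return NULL;
}

/*
 * Returns 1 if a second thread sees a different errno location than the
 * caller, 0 if both share one global errno, -1 if the thread could not
 * be created or joined.
 */
int errno_is_per_thread(void)
{
    pthread_t   tid;
    int        *worker_errno = NULL;

    if (pthread_create(&tid, NULL, record_errno_address, &worker_errno) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;

    return worker_errno != &errno;
}
```

Calling errno_is_per_thread() from a small test program built with the same CFLAGS as libpq should return 1 if the thread-safe definition is in effect.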
I don't know enough about configure to know how to fix it. It is properly setting -lpthread on Linux.
Just linking against the right libraries does not do it here. Solaris is not Linux.
Jan
It's also not clear why the symptoms occur, since the build does not abort with an unsatisfied external. It must be picking up the pthread externals from somewhere else. The only difference I can see in the ldd output is the order of the libraries. An ldd of ecpglib shows:
Good:
gcc -shared -h libecpg.so.4 execute.o typename.o descriptor.o data.o error.o prepare.o memory.o connect.o misc.o -L../../../../src/port -L/mhinteg/trees/4/sun32_fixes/ported/openssl -L../pgtypeslib -lpgtypes -L../../../../src/interfaces/libpq -lpq -lssl -lcrypto -lm -lpthread -R/home/wrp/local/pgsql.7.4.2/lib -o libecpg.so.4.1
rm -f libecpg.so.4
ln -s libecpg.so.4.1 libecpg.so.4
rm -f libecpg.so
ln -s libecpg.so.4.1 libecpg.so
% ldd libecpg.so
libpgtypes.so.1 => /home/wrp/local/pgsql.7.4.2/lib/libpgtypes.so.1
libpq.so.3 => /home/wrp/local/pgsql.7.4.2/lib/libpq.so.3
libssl.so.0.9.7 => /mhinteg/trees/4/sun32_fixes/ported/openssl/libssl.so.0.9.7
libcrypto.so.0.9.7 => /mhinteg/trees/4/sun32_fixes/ported/openssl/libcrypto.so.0.9.7
libm.so.1 => /usr/lib/libm.so.1
libpthread.so.1 => /usr/lib/libpthread.so.1
libresolv.so.2 => /usr/lib/libresolv.so.2
libsocket.so.1 => /usr/lib/libsocket.so.1
libnsl.so.1 => /usr/lib/libnsl.so.1
libdl.so.1 => /usr/lib/libdl.so.1
libc.so.1 => /usr/lib/libc.so.1
libmp.so.2 => /usr/lib/libmp.so.2
libthread.so.1 => /usr/lib/libthread.so.1
/usr/platform/SUNW,Ultra-Enterprise/lib/libc_psr.so.1
Bad:
gcc -shared -h libecpg.so.4 execute.o typename.o descriptor.o data.o error.o prepare.o memory.o connect.o misc.o -L../../../../src/port -L/mhinteg/trees/4/sun32_fixes/ported/openssl -L../pgtypeslib -lpgtypes -L../../../../src/interfaces/libpq -lpq -lssl -lcrypto -lm -pthread -R/home/wrp/local/pgsql.7.4.2/lib -o libecpg.so.4.1
gcc: unrecognized option `-pthread'
rm -f libecpg.so.4
ln -s libecpg.so.4.1 libecpg.so.4
rm -f libecpg.so
ln -s libecpg.so.4.1 libecpg.so
% !ldd
ldd libecpg.so
libpgtypes.so.1 => /home/wrp/local/pgsql.7.4.2/lib/libpgtypes.so.1
libpq.so.3 => /home/wrp/local/pgsql.7.4.2/lib/libpq.so.3
libssl.so.0.9.7 => /mhinteg/trees/4/sun32_fixes/ported/openssl/libssl.so.0.9.7
libcrypto.so.0.9.7 => /mhinteg/trees/4/sun32_fixes/ported/openssl/libcrypto.so.0.9.7
libm.so.1 => /usr/lib/libm.so.1
libresolv.so.2 => /usr/lib/libresolv.so.2
libsocket.so.1 => /usr/lib/libsocket.so.1
libnsl.so.1 => /usr/lib/libnsl.so.1
libpthread.so.1 => /usr/lib/libpthread.so.1
libdl.so.1 => /usr/lib/libdl.so.1
libc.so.1 => /usr/lib/libc.so.1
libmp.so.2 => /usr/lib/libmp.so.2
libthread.so.1 => /usr/lib/libthread.so.1
/usr/platform/SUNW,Ultra-Enterprise/lib/libc_psr.so.1
I realize it isn't entirely meaningful without the source code to know exactly where I put the print statements, but here is my debug output running the previously enclosed test program. You can see that it is allocating a new sqlca structure when it shouldn't be.
Good:
% ./testit
ECPGget_sqlca: after pthread_getspecific: address of sqlca = 0x0
ECPGINIT: address of sqlca = 0x23b98
ECPGget_sqlca: before return: address of sqlca = 0x23b98
ECPGINIT: address of sqlca = 0x23b98
In ECPGconnect
ECPGconnect: address of sqlca = 0x23b98
Before connection check
bad connection
ECPGconnect: address of sqlca = 0x23b98
ECPGget_sqlca: after pthread_getspecific: address of sqlca = 0x23b98
ECPGget_sqlca: before return: address of sqlca = 0x23b98
In error.c - code = -402
ECPGraise: address of sqlca = 0x23b98
After ECPGraise, sqlca->sqlcode = -402
ECPGconnect: address of sqlca = 0x23b98
Before return false, sqlca->sqlcode = -402
ECPGget_sqlca: after pthread_getspecific: address of sqlca = 0x23b98
ECPGget_sqlca: before return: address of sqlca = 0x23b98
ECPGget_sqlca: after pthread_getspecific: address of sqlca = 0x23b98
ECPGget_sqlca: before return: address of sqlca = 0x23b98
Connect failure: -402
Bad:
% ./testit
ECPGget_sqlca: after pthread_getspecific: address of sqlca = 0x0
ECPGINIT: address of sqlca = 0x23900
ECPGget_sqlca: before return: address of sqlca = 0x23900
ECPGINIT: address of sqlca = 0x23900
In ECPGconnect
ECPGconnect: address of sqlca = 0x23900
Before connection check
bad connection
ECPGconnect: address of sqlca = 0x23900
ECPGget_sqlca: after pthread_getspecific: address of sqlca = 0x0
ECPGINIT: address of sqlca = 0x251b0
ECPGget_sqlca: before return: address of sqlca = 0x251b0
In error.c - code = -402
ECPGraise: address of sqlca = 0x251b0
After ECPGraise, sqlca->sqlcode = 0
ECPGconnect: address of sqlca = 0x23900
Before return false, sqlca->sqlcode = 0
ECPGget_sqlca: after pthread_getspecific: address of sqlca = 0x0
ECPGINIT: address of sqlca = 0x25248
ECPGget_sqlca: before return: address of sqlca = 0x25248
ECPGget_sqlca: after pthread_getspecific: address of sqlca = 0x0
ECPGINIT: address of sqlca = 0x252e0
ECPGget_sqlca: before return: address of sqlca = 0x252e0
ECPGINIT: address of sqlca = 0x252e0
ECPGget_sqlca: after pthread_getspecific: address of sqlca = 0x0
ECPGINIT: address of sqlca = 0x25378
ECPGget_sqlca: before return: address of sqlca = 0x25378
In error.c - code = -220
ECPGraise: address of sqlca = 0x25378
ECPGget_sqlca: after pthread_getspecific: address of sqlca = 0x0
ECPGINIT: address of sqlca = 0x25410
ECPGget_sqlca: before return: address of sqlca = 0x25410
ECPGget_sqlca: after pthread_getspecific: address of sqlca = 0x0
ECPGINIT: address of sqlca = 0x254a8
ECPGget_sqlca: before return: address of sqlca = 0x254a8
ECPGget_sqlca: after pthread_getspecific: address of sqlca = 0x0
ECPGINIT: address of sqlca = 0x25540
ECPGget_sqlca: before return: address of sqlca = 0x25540
SELECT error code: 0
systemNum = -4261248
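For readers without the ecpglib source at hand, here is a rough sketch of the thread-specific-data pattern the traces are exercising. It is not the actual ECPGget_sqlca() code; struct thread_state and get_thread_state() are hypothetical stand-ins. The point is that a value stored with pthread_setspecific() should come back from pthread_getspecific() on the next call in the same thread, which is what the Good trace shows (0x23b98 every time) and the Bad trace does not (0x0 every time, so a fresh structure is allocated on each call).

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-thread sqlca structure. */
struct thread_state
{
    long        code;
};

static pthread_key_t  state_key;
static pthread_once_t state_key_once = PTHREAD_ONCE_INIT;
static int            state_key_ok = 0;

/* Runs exactly once per process; free() is the per-thread destructor. */
static void create_state_key(void)
{
    state_key_ok = (pthread_key_create(&state_key, free) == 0);
}

/*
 * Return the calling thread's state, allocating it on first use.
 * Returns NULL if the key or the structure could not be set up.
 */
struct thread_state *get_thread_state(void)
{
    struct thread_state *st;

    pthread_once(&state_key_once, create_state_key);
    if (!state_key_ok)
        return NULL;

    st = pthread_getspecific(state_key);
    if (st == NULL)
    {
        /* First call in this thread: allocate and remember the block. */
        st = calloc(1, sizeof(*st));
        if (st == NULL)
            return NULL;
        if (pthread_setspecific(state_key, st) != 0)
        {
            free(st);
            return NULL;
        }
    }
    return st;
}
```

With working pthread functions, two consecutive calls from the same thread return the same pointer, so an error code written through the first is still visible through the second.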
I just got this in response to a post to pgsql-general on a different Solaris problem. This sounds like the same problem as I'm seeing. I've sent him my solution. Hopefully it will solve his symptoms also.
One other problem I am looking into (and why I tried to compile with thread safety in the first place) is that this somehow did not turn on -D_REENTRANT in the CFLAGS for libpq. And that leads to libpq not using the threadsafe definition of errno, leading to serious communication trouble in the end (pqReadData() failing with ENOENT while the real error is a harmless EAGAIN from a nonblocking recv()).
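To illustrate why the wrong errno matters here, a small sketch of a nonblocking read that treats EAGAIN as "no data yet" instead of as a failure. This is not libpq's pqReadData(); read_nonblocking() and its would_block flag are invented for the example. If errno resolves to the wrong variable, the EAGAIN test below can never match reliably and a harmless condition gets reported as a real error.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from a nonblocking socket.
 * Returns the byte count (0 at end of stream) or -1 on a real error.
 * Sets *would_block to 1 and returns 0 when no data is available yet.
 */
ssize_t read_nonblocking(int fd, void *buf, size_t len, int *would_block)
{
    ssize_t     n;

    *would_block = 0;

    /* Retry if a signal interrupted the call before any data arrived. */
    do
    {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);

    /* EAGAIN/EWOULDBLOCK only mean "try again later". */
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
    {
        *would_block = 1;
        return 0;
    }
    return n;
}
```

A caller would normally wait in select() or poll() when would_block is set and then call the helper again.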
Jan
Wes
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #