Re: Hadoop, C API, and fork

2012-03-21 Thread Harsh J
Tareq,

Which archived message are you asking about? Could you post a
link to the earlier conversation?

On Wed, Mar 21, 2012 at 7:01 AM, Tareq Aljabban dee.jay23...@gmail.com wrote:
 Hi Patrick,
 I have exactly the same situation. Did you find a way to solve this?
 Thanks in advance



-- 
Harsh J


Hadoop, C API, and fork

2010-04-06 Thread Patrick Donnelly
Hi,

I have a distributed file server front end to Hadoop that uses the
libhdfs C API to talk to Hadoop. Normally the file server forks on
each new client connection, but this does not work with the libhdfs
shared library (which is loaded using dlopen). If the server runs in
single-process mode (no forking, so it can handle only one client at
a time), everything works fine.

I have tried changing the server so that it disconnects from Hadoop
before forking and has both processes reconnect after the fork.
Essentially, the server does:

hdfsDisconnect(...);
pid = fork();
hdfsConnect(...);  /* runs in both parent and child */
if (pid == 0)
  ...
else
  ...

This causes the child process to hang in hdfsConnect() with the following backtrace:

(gdb) bt
#0  0x0034d160ad09 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x2ace492559f7 in os::PlatformEvent::park ()
   from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#2  0x2ace4930a5da in ObjectMonitor::wait ()
   from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#3  0x2ace49307b13 in ObjectSynchronizer::wait ()
   from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#4  0x2ace490cf5fb in JVM_MonitorWait ()
   from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#5  0x2ace49c87f50 in ?? ()
#6  0x0001 in ?? ()
#7  0x2ace4cd84d10 in ?? ()
#8  0x3f80 in ?? ()
#9  0x2ace49c8841d in ?? ()
#10 0x7fff0b4d04c0 in ?? ()
#11 0x in ?? ()

Leaving the connection open in the server:

pid = fork();
if (pid == 0)
  ...
else
  ...

This also makes the child hang:

(gdb) bt
#0  0x0034d160ad09 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x2b3d7193d9f7 in os::PlatformEvent::park ()
   from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#2  0x2b3d719f25da in ObjectMonitor::wait ()
   from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#3  0x2b3d719efb13 in ObjectSynchronizer::wait ()
   from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#4  0x2b3d717b75fb in JVM_MonitorWait ()
   from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#5  0x2b3d7236ff50 in ?? ()
#6  0x in ?? ()


Does anyone have suggestions for debugging or fixing this?

Thanks for any help,

-- 
- Patrick Donnelly


Re: Hadoop, C API, and fork

2010-04-06 Thread Brian Bockelman
Hey Patrick,

Calling fork() in a multi-threaded process (and any process using libhdfs
is multi-threaded, since libhdfs runs an embedded JVM) is pretty shaky.
You might want to start by reading the notes on multi-threaded processes
in the POSIX specification of fork():

http://www.opengroup.org/onlinepubs/95399/functions/fork.html
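To make the POSIX notes concrete, here is a minimal, Linux-only sketch (it reads the Linux-specific /proc/self/status file, and the function names are made up for illustration). It starts one extra thread, forks, and compares thread counts: the child contains only the thread that called fork(). A JVM's internal threads vanish in the child the same way, which is consistent with the child blocking forever in pthread_cond_wait() in both backtraces above.

```c
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Linux-specific: returns the "Threads:" value from /proc/self/status,
   or -1 on error. Uses open()/read() rather than stdio so the child
   avoids stdio locks after fork(). */
static int current_thread_count(void) {
    char buf[4096];
    int fd = open("/proc/self/status", O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t len = read(fd, buf, sizeof buf - 1);
    close(fd);
    if (len <= 0)
        return -1;
    buf[len] = '\0';
    const char *p = strstr(buf, "Threads:");
    if (p == NULL)
        return -1;
    p += strlen("Threads:");
    while (*p == ' ' || *p == '\t')
        p++;
    int n = 0;
    while (*p >= '0' && *p <= '9')
        n = n * 10 + (*p++ - '0');
    return n;
}

/* Stand-in for a JVM's background threads: blocks until the process ends. */
static void *idle_thread(void *arg) {
    (void)arg;
    for (;;)
        pause();
    return NULL;
}

/* Starts one extra thread, then forks. Stores the thread count seen by the
   parent and by the child (passed back as the child's exit status).
   Returns 0 on success, -1 on error. */
int compare_thread_counts(int *in_parent, int *in_child) {
    pthread_t tid;
    if (pthread_create(&tid, NULL, idle_thread, NULL) != 0)
        return -1;
    pthread_detach(tid);
    *in_parent = current_thread_count();

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(current_thread_count()); /* only the forking thread exists here */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    *in_child = WEXITSTATUS(status);
    return 0;
}
```

On a typical Linux system the parent reports at least two threads while the child reports exactly one.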

You might have better luck playing around with pthread_atfork, or thinking 
about other possible designs :)
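A minimal sketch of the usual pthread_atfork() idiom, assuming a hypothetical server-side mutex (state_lock, active_clients, and the helper functions are illustrative names, not part of libhdfs). The prepare handler takes the lock just before fork(), and the parent and child handlers release it afterwards, so the child never inherits a mutex held by a thread that no longer exists. Note that this only protects locks your own code registers handlers for; it cannot recreate the JVM's threads or repair the JVM's internal state, so on its own it would not make libhdfs usable in the child.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical server state guarded by a mutex (not part of libhdfs). */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int active_clients = 0;

/* Runs in the parent just before fork(): hold the lock so no other thread
   is in the middle of updating the state when memory is copied. */
static void atfork_prepare(void) {
    int rc = pthread_mutex_lock(&state_lock);
    assert(rc == 0);
    (void)rc;
}

/* Runs in the parent after fork(): release the lock taken in prepare. */
static void atfork_parent(void) {
    int rc = pthread_mutex_unlock(&state_lock);
    assert(rc == 0);
    (void)rc;
}

/* Runs in the child after fork(): the child's only thread is the copy of
   the thread that took the lock, so it can release it. */
static void atfork_child(void) {
    int rc = pthread_mutex_unlock(&state_lock);
    assert(rc == 0);
    (void)rc;
}

/* Register the handlers once, e.g. at server start-up. Returns 0 on success. */
int install_fork_handlers(void) {
    return pthread_atfork(atfork_prepare, atfork_parent, atfork_child);
}

/* Child-side work: the mutex is usable because atfork_child unlocked it. */
static void handle_client_in_child(void) {
    pthread_mutex_lock(&state_lock);
    int n = ++active_clients;
    pthread_mutex_unlock(&state_lock);
    _exit(n);
}

/* Forks one handler process and returns its exit status, or -1 on error. */
int spawn_handler(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        handle_client_in_child(); /* does not return */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Each forked child starts from the parent's counter (0), increments its own copy, and exits with status 1; the parent's counter is never changed.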

If you really, really want to do this, you can also try playing around with the 
internals of libhdfs.  Basically, use native JNI calls to shut down the JVM 
after you disconnect, then fork, then re-initialize everything.  No idea if 
this would work.
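Destroying and re-creating the JVM inside one process may not be possible at all: the JNI Invocation API documentation states that creating multiple VMs in a single process is not supported. A related design sidesteps the problem, assuming the parent never calls into libhdfs: fork first, and let each child make its first libhdfs call (and, since the library is loaded with dlopen(), possibly load it) only after the fork. The sketch below shows just the process structure; client_handler, spawn_client_process, wait_client_process, and greet_client are hypothetical names, and greet_client stands in for the real per-client work without touching HDFS.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Signature of a per-client handler (hypothetical). A real handler would make
   its first libhdfs call, e.g. hdfsConnect(), here in the child, so the JVM
   and its threads are created after fork() rather than inherited. */
typedef int (*client_handler)(int client_fd);

/* Forks one process per client. The parent never touches libhdfs, so there
   is no JVM in the parent for fork() to copy in a broken state.
   Returns the child's pid, or -1 if fork() failed. */
pid_t spawn_client_process(int client_fd, client_handler handler) {
    pid_t pid = fork();
    if (pid == 0)
        _exit(handler(client_fd)); /* child: serve this client, then exit */
    if (pid > 0)
        close(client_fd);          /* parent: the child owns the connection */
    return pid;
}

/* Waits for a client process; returns its exit status, or -1 on error. */
int wait_client_process(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Illustrative handler only: replies to the client without touching HDFS. */
int greet_client(int client_fd) {
    const char msg[] = "hello\n";
    ssize_t n = write(client_fd, msg, sizeof msg - 1);
    close(client_fd);
    return n == (ssize_t)(sizeof msg - 1) ? 0 : 1;
}
```

A real server would reap children asynchronously (for example with a SIGCHLD handler or a periodic waitpid(-1, ..., WNOHANG) loop) rather than blocking in wait_client_process().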

Brian

On Apr 6, 2010, at 9:51 AM, Patrick Donnelly wrote:

 Hi,
 
 I have a distributed file server front end to Hadoop that uses the
 libhdfs C API to talk to Hadoop. Normally the file server will fork on
 a new client connection but this does not work with the libhdfs shared
 library (it is loaded using dlopen). If the server is in single
 process mode (no forking and can handle only one client at a time)
 then everything works fine.
 


