Hi everyone

 

I have a problem enabling our distributed system to access HDFS.

 

The background:

In our system, we run 4~6 segment instances on each physical node, and each 
segment forks a new process to handle each new session. So when a client 
connects to our system, 4~6 processes on one physical node work on that 
session. We need to handle 250 sessions at the same time, which means 
1000~1500 processes on one physical node. That is fine for our system.

 

Now we want to enable our system to access HDFS, using libhdfs. It works fine 
when we only handle a few concurrent sessions, but when too many clients 
connect to the system, the system cannot access HDFS anymore.

The reason is that libhdfs creates one JVM per process, and our machine cannot 
afford 1000~1500 JVMs. libhdfs reports an error that it cannot create the JVM 
due to memory limits.

 

We tried to work around this problem by using a few processes as proxies that 
access HDFS and exchange data with the other processes, but the proxy 
processes could become a performance bottleneck.

 

My questions:

1)      Can different processes share a JVM when using libhdfs to access HDFS?

2)      Is there any other good solution? Maybe I have just failed to find it.

3)      Is there any ongoing project to implement a C/C++ HDFS client that 
does not need a JVM? Would it be a good idea to start such a project? If so, 
I would be very glad to contribute my time.

Thanks

Best Regards

 

------------------------------

 

Zhanwei Wang