Hi everyone,

I have run into a problem while enabling our distributed system to access HDFS.

Background: in our system we run 4 to 6 segment instances on each physical node, and each segment forks a new process to handle each new session. So when a client connects to our system, 4 to 6 processes work on that session on each physical node. We need to handle 250 sessions at the same time, which means 1000 to 1500 processes on one physical node. Our system handles that fine.

Now we want our system to access HDFS, and we use libhdfs to do that. It works well when there are only a few concurrent sessions, but when too many clients connect, the system can no longer access HDFS. The reason is that libhdfs creates one JVM per process, and our machines cannot afford 1000 to 1500 JVMs. libhdfs reports that it cannot create the JVM because of memory limits.

We tried to work around this by using a few processes as proxies that access HDFS and exchange data with the other processes, but the proxy processes could become a performance bottleneck.

My questions:

1) Can different processes share a JVM when we use libhdfs to access HDFS?
2) Is there any other good solution? Maybe I am just too stupid to find it.
3) Is there any ongoing project to implement a C/C++ HDFS client without a JVM? Would it be a good idea to start such a project? If so, I would be very glad to contribute my time.

Thanks

Best Regards
------------------------------
Zhanwei Wang