We have a project that needs to support similarity queries against items in a very large data set. One approach we have tried is to use HBase as the data repository and Hadoop as the query execution engine. We adopted Hadoop because MapReduce is a very good model of our underlying task and the programming was straightforward. However, we found that Hadoop always allocates a new JVM for each individual task on a node. This is inefficient for us because the whole Hadoop platform is dedicated to a relatively stable set of parametrized queries, and security and strict isolation between tasks are not our main concern. To save the task setup time, I wonder whether there is an existing mechanism in Hadoop, or an extension of Hadoop in another open source project, that would let us keep our classes resident in a JVM on the job node, with the task nodes waiting for requests.
-- View this message in context: http://www.nabble.com/about-hadoop-jvm-allocation-in-job-excution-tp25458201p25458201.html Sent from the Hadoop core-user mailing list archive at Nabble.com.