We have a project that needs to support similarity queries against items
in a very large data set. One approach we have tried is to use HBase as
the data repository and Hadoop as the query execution engine. We adopted
Hadoop because MapReduce is a very good model of our underlying task and
the programming was straightforward. However, we found that Hadoop always
allocates a new JVM for each individual task on a node. This is
inefficient for us because in our case the whole Hadoop platform is
dedicated to a few relatively stable, parameterized queries, and security
and strict isolation between tasks are not our main concern. To save the
task setup time, I wonder whether there is an existing mechanism in Hadoop,
or an extension of Hadoop in another open source project, that would let us
keep our classes resident in a JVM on the job node, with task nodes waiting
for requests.
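
To make the request concrete, here is roughly the behaviour we are after,
as a minimal plain-Java sketch and not Hadoop code. The names
ResidentWorker, Request, and countMatches are made up for illustration, and
the substring count only stands in for our real similarity scoring. The
point is that the data and classes are loaded once and the worker then
waits for parameterized requests instead of being started per task.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;

// Hypothetical long-lived worker: state is loaded once, then requests are served in a loop.
class ResidentWorker {
    // Hypothetical request type: a query parameter plus a future that carries the reply.
    static final class Request {
        final String query;
        final CompletableFuture<Integer> reply = new CompletableFuture<>();
        Request(String query) { this.query = query; }
    }

    private final BlockingQueue<Request> inbox = new ArrayBlockingQueue<>(16);
    private final List<String> items; // stays in memory between requests

    ResidentWorker(List<String> items) {
        this.items = items;
        Thread t = new Thread(this::serve, "resident-worker");
        t.setDaemon(true); // do not keep the JVM alive once main returns
        t.start();
    }

    // Wait for requests forever; no per-request JVM or class-loading cost.
    private void serve() {
        try {
            while (true) {
                Request r = inbox.take();
                r.reply.complete(countMatches(r.query));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Placeholder for similarity scoring: counts items containing the query string.
    int countMatches(String query) {
        int n = 0;
        for (String item : items) {
            if (item.contains(query)) n++;
        }
        return n;
    }

    // Hand a query to the resident worker and get a future for its answer.
    CompletableFuture<Integer> submit(String query) throws InterruptedException {
        Request r = new Request(query);
        inbox.put(r);
        return r.reply;
    }

    public static void main(String[] args) throws Exception {
        ResidentWorker w = new ResidentWorker(List.of("apple pie", "apple tart", "plum cake"));
        System.out.println(w.submit("apple").get()); // prints 2
        System.out.println(w.submit("cake").get());  // prints 1
    }
}
```

In our setting the worker would live on each node and the requests would
carry the query parameters, so only the first request pays the setup cost.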


-- 
View this message in context: 
http://www.nabble.com/about-hadoop-jvm-allocation-in-job-excution-tp25458201p25458201.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
