Sofia,

You can access any HDFS file from a normal Java application so long as your
classpath and some configuration are set up correctly.  That is all the
hadoop jar command does.  It is a shell script that sets up the environment for
Java to work with Hadoop.  Look at the example for the Tool class:

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Tool.html
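
Roughly, that boils down to a skeleton like the one below (the class name here
is made up; ToolRunner parses the standard -conf/-D arguments and hands you the
resulting Configuration through getConf()):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class RtreeQueryTool extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();   // filled in by ToolRunner from the command line and classpath
        // ... talk to HDFS here instead of setting up a JobConf ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new RtreeQueryTool(), args));
    }
}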

If you delete the JobConf stuff, you can then just talk to the FileSystem by
doing the following:

Path p = new Path("URI OF FILE TO OPEN");
FileSystem fs = p.getFileSystem(conf);   // conf is the Configuration from getConf()
InputStream in = fs.open(p);

Now you can use in to read your data.  Just be sure to close it when you are 
done.
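
For reference, here is a self-contained sketch of that read path as a plain
Java program (the class name is made up, and it assumes the Hadoop
configuration files are on the classpath):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCat {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // loads core-site.xml / hdfs-site.xml from the classpath
        Path p = new Path(args[0]);                 // e.g. an hdfs:// URI of the file to read
        FileSystem fs = p.getFileSystem(conf);
        BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(p)));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            in.close();   // close the stream when you are done
        }
    }
}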

--Bobby Evans



On 7/25/11 4:40 PM, "Sofia Georgiakaki" <geosofie_...@yahoo.com> wrote:

Good evening,

I have built an Rtree on HDFS, in order to improve the query performance of
high-selectivity spatial queries.
The Rtree is composed of a number of HDFS files (each one created by one
Reducer, so that the number of files is equal to the number of reducers),
where each file is a subtree of the root of the Rtree.
I am investigating how to use the Rtree efficiently, with respect to the
locality of each file on HDFS (data placement).


I would like to ask if it is possible to read a file which is on HDFS from a
Java application (not MapReduce).
In case this is not possible (as I believe), either I should download the files
to the local filesystem (which is not a solution, since the files could be very
large), or run the queries using Hadoop.
In order to maximise the gain, I should probably process a batch of queries
during each job, and run each query on a node that is "near" the files that
are involved in handling the specific query.

Can I find the node where each file is located (or at least most of its
blocks), and run on that node a reducer that handles these queries? Could the
function DFSClient.getBlockLocations() help?
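
(For illustration, a minimal sketch of getting that information through the
public FileSystem API rather than the internal DFSClient; the file path is a
placeholder:)

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockHosts {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path p = new Path(args[0]);   // one of the Rtree subtree files on HDFS
        FileSystem fs = p.getFileSystem(conf);
        FileStatus status = fs.getFileStatus(p);
        // Ask for the block locations covering the whole file.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + ", length " + block.getLength()
                    + ", hosts " + Arrays.toString(block.getHosts()));
        }
    }
}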

Thank you in advance,
Sofia
