To add to what Bobby said, you can get a file's block locations with fs.getFileBlockLocations() if you want to decide where to read from based on locality.
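
Something like this (a rough sketch from memory, not compiled; the path is a placeholder, same as in Bobby's snippet below):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Path p = new Path("URI OF FILE TO OPEN");
FileSystem fs = p.getFileSystem(new Configuration());
FileStatus stat = fs.getFileStatus(p);

// One BlockLocation per block, spanning the whole file.
BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
for (BlockLocation block : blocks) {
  // Each block reports the datanodes that hold a replica of it.
  // You could tally these and run the query on the host that
  // holds the largest share of the file.
  for (String host : block.getHosts()) {
    System.out.println(block.getOffset() + " -> " + host);
  }
}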
-Joey

On Mon, Jul 25, 2011 at 3:00 PM, Robert Evans <ev...@yahoo-inc.com> wrote:
> Sofia,
>
> You can access any HDFS file from a normal Java application so long as
> your classpath and some configuration is set up correctly. That is all
> the hadoop jar command does: it is a shell script that sets up the
> environment for Java to work with Hadoop. Look at the example for the
> Tool class:
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Tool.html
>
> If you delete the JobConf stuff you can then just talk to the FileSystem
> by doing the following:
>
> Configuration conf = new Configuration();
> Path p = new Path("URI OF FILE TO OPEN");
> FileSystem fs = p.getFileSystem(conf);
> InputStream in = fs.open(p);
>
> Now you can use in to read your data. Just be sure to close it when you
> are done.
>
> --Bobby Evans
>
>
> On 7/25/11 4:40 PM, "Sofia Georgiakaki" <geosofie_...@yahoo.com> wrote:
>
> Good evening,
>
> I have built an R-tree on HDFS in order to improve the query performance
> of high-selectivity spatial queries.
> The R-tree is composed of a number of HDFS files (each one created by one
> reducer, so the number of files equals the number of reducers), where
> each file is a subtree of the root of the R-tree.
> I am investigating how to use the R-tree efficiently with respect to the
> locality of each file on HDFS (data placement).
>
> I would like to ask whether it is possible to read a file that is on HDFS
> from a Java application (not MapReduce).
> In case this is not possible (as I believe), I should either download the
> files to the local filesystem (which is not a solution, since the files
> could be very large) or run the queries using Hadoop.
> To maximise the gain, I should probably process a batch of queries during
> each job, and run each query on a node that is "near" the files involved
> in handling that specific query.
>
> Can I find the node where each file is located (or at least most of its
> blocks), and run on that node a reducer that handles these queries? Could
> the function DFSClient.getBlockLocations() help?
>
> Thank you in advance,
> Sofia

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434