I think what you're saying is that you are mostly interested in data locality. I don't think it's done yet, but it would be pretty easy to make HBase provide start keys as well as region locations for the splits of a MapReduce job. In theory, that would give you all the pieces you need to run locality-aware processing.
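To make the idea concrete, here is a rough sketch in plain Java. The names (`TableSplit`, `schedule`) are hypothetical, not the actual Hadoop/HBase API: each split would carry a region's start key plus the hostname of the region server holding it, and the scheduler would prefer to place the map task for that split on that same host.

```java
import java.util.ArrayList;
import java.util.List;

public class LocalityDemo {
    // Hypothetical model of one table split: a region's key range
    // plus the location (hostname) of the region server holding it.
    static class TableSplit {
        final String startKey;   // first row key served by this region
        final String location;   // hostname of the region server
        TableSplit(String startKey, String location) {
            this.startKey = startKey;
            this.location = location;
        }
    }

    // Prefer to run each split on the host that serves its region;
    // fall back to any free host when that one has no open slot.
    static String schedule(TableSplit split, List<String> hostsWithFreeSlots) {
        if (hostsWithFreeSlots.contains(split.location)) {
            return split.location;          // data-local assignment
        }
        return hostsWithFreeSlots.get(0);   // non-local fallback
    }

    public static void main(String[] args) {
        List<TableSplit> splits = new ArrayList<>();
        splits.add(new TableSplit("", "regionserver1"));
        splits.add(new TableSplit("row5000", "regionserver2"));

        List<String> free = List.of("regionserver1", "regionserver2");
        for (TableSplit s : splits) {
            System.out.println("[" + s.startKey + "] -> " + schedule(s, free));
        }
    }
}
```

This is the same trick MapReduce already uses for HDFS blocks: the split only has to *report* its preferred locations, and the JobTracker does the rest.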
-Bryan
On Apr 24, 2008, at 10:16 AM, Leon Mergen wrote:
Hello,
I'm sorry if a question like this has been asked before, but I was unable to find an answer for this anywhere on Google; if it is off-topic, I apologize in advance.
I'm trying to look a bit into the future and predict scalability problems for the company I work for: we're using PostgreSQL and processing many writes per second (access logs, currently around 250, but this will increase significantly in the future). Furthermore, we perform data mining on this data and ideally need to have it stored in a structured form (the data is searched in various ways). In other words: a very interesting problem.
Now, I'm trying to understand a bit of the Hadoop/HBase architecture: as I understand it, HDFS, MapReduce and HBase are sufficiently decoupled that the use case I was hoping for is not available; however, I'm still going to ask:

Is it possible to store this data in HBase, and thus have all access logs distributed amongst many different servers, and start MapReduce jobs on those actual servers, which process all the data on those servers? In other words, the data never leaves the actual servers?
If this isn't possible, is it because someone simply never took the time to implement such a thing, or is it hard to fit into the design (for example, because the JobTracker needs to be aware of the physical locations of all the data, since you don't want to analyze the same (replicated) data twice)?
From what I understand from playing with Hadoop for the past few days, the idea is that you fetch your MapReduce data from HDFS rather than BigTable, or am I mistaken?
Thanks for your time!
Regards,
Leon Mergen