Hello,

----- Original Message ----

> From: Amandeep Khurana <[email protected]>

> HBase has its own file format. Reading data from it in your own job will not
> be trivial to write, but not impossible.

Are you referring to HTable, HFile, etc.?

> Why would you want to use the underlying data files in the MR jobs? Any
> limitation in using the HBase api?

Are you referring to writing an MR job that uses TableInputFormat and
TableOutputFormat, as described at
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink ?

I think that would work.
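
Something like the following is what I had in mind -- a rough, untested
sketch against the 0.20 org.apache.hadoop.hbase.mapreduce API (the table
name "mytable" and the trivial row-counting mapper are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CountRows {

  // One map() call per HBase row: key = row key, value = Result.
  // Here we only bump a counter, so the output types are NullWritable.
  static class RowMapper extends TableMapper<NullWritable, NullWritable> {
    enum Counters { ROWS }
    @Override
    protected void map(ImmutableBytesWritable row, Result values, Context ctx) {
      ctx.getCounter(Counters.ROWS).increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new HBaseConfiguration();  // 0.20-style constructor
    Job job = new Job(conf, "count-rows");
    job.setJarByClass(CountRows.class);
    // Wires up TableInputFormat, the Scan, and the mapper in one call.
    TableMapReduceUtil.initTableMapperJob("mytable", new Scan(),
        RowMapper.class, NullWritable.class, NullWritable.class, job);
    job.setOutputFormatClass(NullOutputFormat.class);  // map-only, no output
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}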

But I'd also like to be able to run Hive/Pig scripts over the data, and I
*think* neither of them supports reading it from HBase.  They can obviously
read files in HDFS, though, which is why I was asking.  So it sounds like
anything that wants to read HBase's data behind its back, without going
through the HBase API, would have to know how to read HFile & friends?
(And again, I think/assume Hive and Pig don't know how to do that.)
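
If one did have to go behind HBase's back, I imagine it would look roughly
like this (untested, and assuming the 0.20 HFile reader API -- it has moved
around between releases; the store-file path is a placeholder, real files
live under /hbase/<table>/<region>/<family>/ with opaque names):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;

public class DumpHFile {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path(args[0]);  // one store file under /hbase/...
    HFile.Reader reader = new HFile.Reader(fs, path, null, false);
    reader.loadFileInfo();          // reads trailer/file info; needed before scanning
    HFileScanner scanner = reader.getScanner();
    if (scanner.seekTo()) {         // position at the first KeyValue
      do {
        KeyValue kv = scanner.getKeyValue();
        System.out.println(kv);     // prints row/family/qualifier/timestamp
      } while (scanner.next());
    }
    reader.close();
  }
}

Even then, I assume a raw HFile reader only sees flushed store files -- it
would miss edits still sitting in the memstore, and it would have to merge
multiple store files per region/family and honor deletes itself.  I guess
that's the non-trivial part you meant.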

Thanks,
Otis

> On Wed, Jan 13, 2010 at 8:06 PM, Otis Gospodnetic <[email protected]> wrote:
> 
> > Hello,
> >
> > If I import data into HBase, can I still run a hand-written MapReduce job
> > over that data in HDFS?
> > That is, not using TableInputFormat to read the data back out via HBase.
> >
> > Similarly, can one run Hive or Pig scripts against that data, but again,
> > without Hive or Pig reading the data via HBase, but rather getting to it
> > directly via HDFS?  I'm asking because I'm wondering whether storing data in
> > HBase means I can no longer use Hive and Pig to run my ad-hoc jobs.
> >
> > Thanks,
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> >
