Re: MR on HDFS data inserted via HBase?

Otis Gospodnetic Thu, 14 Jan 2010 08:17:10 -0800

Yeah, I'm JIRA Watch-ing them.  Thanks.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

----- Original Message ----
> From: Andrew Purtell <[email protected]>
> To: [email protected]
> Sent: Thu, January 14, 2010 5:29:31 AM
> Subject: Re: MR on HDFS data inserted via HBase?
> 
> There is some work on a SerDe for Hive for HBase ongoing:
> 
>     https://issues.apache.org/jira/browse/HIVE-705
> 
>     https://issues.apache.org/jira/browse/HIVE-806
> 
>   - Andy
> 
> 
> ----- Original Message ----
> > From: Amandeep Khurana 
> > To: [email protected]
> > Sent: Wed, January 13, 2010 8:36:15 PM
> > Subject: Re: MR on HDFS data inserted via HBase?
> > 
> > Yes, by api I mean TableInputFormat and TableOutputFormat.
> > 
> > Pig has a connector to HBase. Not sure if Hive has one yet.
> > 
> > 
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> > 
> > 
> > On Wed, Jan 13, 2010 at 8:28 PM, Otis Gospodnetic <
> > [email protected]> wrote:
> > 
> > > Hello,
> > >
> > >
> > > ----- Original Message ----
> > >
> > > > From: Amandeep Khurana 
> > >
> > > > HBase has its own file format. Reading data from it in your own job will
> > > > not be trivial to write, but not impossible.
> > >
> > > You are referring to HTable, HFile, etc.?
> > >
> > > > Why would you want to use the underlying data files in the MR jobs? Any
> > > > limitation in using the HBase api?
> > >
> > > Are you referring to writing a MR job that makes use of TableInputFormat
> > > and TableOutputFormat as mentioned on
> > > http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink?
> > >
> > > I think that would work.
> > >
> > > But I'd also like to be able to run Hive/Pig scripts over the data, and I
> > > *think* neither supports reading it from HBase.  But they can obviously
> > > read it from files in HDFS, which is why I was asking.  But it sounds like
> > > anything that wants to read HBase's data behind its back, without going
> > > through the HBase API, would have to know how to read from HFile &
> > > friends?
> > > (and again, I think/assume Hive and Pig don't know how to do that)
> > >
> > > Thanks,
> > > Otis
> > >
> > > > On Wed, Jan 13, 2010 at 8:06 PM, Otis Gospodnetic <
> > > > [email protected]> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > > If I import data into HBase, can I still run a hand-written MapReduce
> > > > > > job over that data in HDFS?
> > > > > > That is, not using TableInputFormat to read the data back out via
> > > > > > HBase.
> > > > > >
> > > > > > Similarly, can one run Hive or Pig scripts against that data, but
> > > > > > again, without Hive or Pig reading the data via HBase, but rather
> > > > > > getting to it directly via HDFS?  I'm asking because I'm wondering
> > > > > > whether storing data in HBase means I can no longer use Hive and Pig
> > > > > > to run my ad-hoc jobs.
> > > > >
> > > > > Thanks,
> > > > > Otis
> > > > > --
> > > > > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> > > > >
> > > > >
> > >
> > >
