Yeah, I'm JIRA Watch-ing them. Thanks.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

----- Original Message ----
> From: Andrew Purtell <[email protected]>
> To: [email protected]
> Sent: Thu, January 14, 2010 5:29:31 AM
> Subject: Re: MR on HDFS data inserted via HBase?
>
> There is some work on a SerDe for Hive for HBase ongoing:
>
> https://issues.apache.org/jira/browse/HIVE-705
>
> https://issues.apache.org/jira/browse/HIVE-806
>
> - Andy
>
>
> ----- Original Message ----
> > From: Amandeep Khurana
> > To: [email protected]
> > Sent: Wed, January 13, 2010 8:36:15 PM
> > Subject: Re: MR on HDFS data inserted via HBase?
> >
> > Yes, by API I mean TableInputFormat and TableOutputFormat.
> >
> > Pig has a connector to HBase. Not sure if Hive has one yet.
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
> >
> > On Wed, Jan 13, 2010 at 8:28 PM, Otis Gospodnetic <[email protected]> wrote:
> >
> > > Hello,
> > >
> > >
> > > ----- Original Message ----
> > >
> > > > From: Amandeep Khurana
> > >
> > > > HBase has its own file format. Reading data from it in your own job will
> > > > not be trivial to write, but not impossible.
> > >
> > > You are referring to HTable, HFile, etc.?
> > >
> > > > Why would you want to use the underlying data files in the MR jobs? Any
> > > > limitation in using the HBase API?
> > >
> > > Are you referring to writing an MR job that makes use of TableInputFormat
> > > and TableOutputFormat as mentioned on
> > >
> > > http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink?
> > >
> > > I think that would work.
> > >
> > > But I'd also like to be able to run Hive/Pig scripts over the data, and I
> > > *think* neither supports reading it from HBase. But they can obviously read
> > > it from files in HDFS, that's why I was asking. But it sounds like anything
> > > wanting to read HBase's data from behind its back, without going through
> > > HBase's API, would have to know how to read from HFile & friends?
> > > (and again, I think/assume Hive and Pig don't know how to do that)
> > >
> > > Thanks,
> > > Otis
> > >
> > > > On Wed, Jan 13, 2010 at 8:06 PM, Otis Gospodnetic <[email protected]> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > If I import data into HBase, can I still run a hand-written MapReduce job
> > > > > over that data in HDFS?
> > > > > That is, not using TableInputFormat to read the data back out via HBase.
> > > > >
> > > > > Similarly, can one run Hive or Pig scripts against that data, but again,
> > > > > without Hive or Pig reading the data via HBase, but rather getting to it
> > > > > directly via HDFS? I'm asking because I'm wondering whether storing data in
> > > > > HBase means I can no longer use Hive and Pig to run my ad-hoc jobs.
> > > > >
> > > > > Thanks,
> > > > > Otis
> > > > > --
> > > > > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
