Re: [ANNOUNCE] New Hive PMC Member - Sushanth Sowmyan

2015-08-04 Thread Nick Dimiduk
Nice work! Congratulations Sushanth!

On Wed, Jul 22, 2015 at 9:45 AM, Carl Steinbach c...@apache.org wrote:

 I am pleased to announce that Sushanth Sowmyan has been elected to the Hive
 Project Management Committee. Please join me in congratulating Sushanth!

 Thanks.

 - Carl



Re: ApacheCON EU HBase Track Submissions

2015-06-30 Thread Nick Dimiduk
Get your submissions in, the deadline is imminent!

On Thu, Jun 25, 2015 at 11:30 AM, Nick Dimiduk ndimi...@apache.org wrote:

 Hello developers, users, speakers,

 As part of ApacheCON's inaugural Apache: Big Data, I'm hoping to see a
 HBase: NoSQL + SQL track come together. The idea is to showcase the
 growing ecosystem of applications and tools built on top of and around
 Apache HBase. To have a track, we need content, and that's where YOU come
 in.

 CFP for ApacheCon closes in one week, July 1. Get your HBase + Hive talks
 submitted so we can pull together a full day of great HBase ecosystem
 talks! Already planning to submit a talk on Hive? Work in HBase and we'll
 get it promoted as part of the track!

 Thanks,
 Nick

 ApacheCON EU
 Sept 28 - Oct 2
 Corinthia Hotel, Budapest, Hungary (a beautiful venue in an awesome city!)
 http://apachecon.eu/
 CFP link: http://events.linuxfoundation.org/cfp/dashboard



Re: ApacheCON EU HBase Track Submissions

2015-06-25 Thread Nick Dimiduk
On Thu, Jun 25, 2015 at 12:13 PM, Owen O'Malley omal...@apache.org wrote:

 Actually, Apache: Big Data Europe CFP closes 10 July.

 The CFP is:
 http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp


My message was regarding a track for ApacheCON EU Core, for which the CFP
closes July 1. Communities are not given the opportunity to organize tracks
for the big data conference.

On Thu, Jun 25, 2015 at 11:30 AM, Nick Dimiduk ndimi...@apache.org wrote:

  Hello developers, users, speakers,
 
  As part of ApacheCON's inaugural Apache: Big Data, I'm hoping to see a
  HBase: NoSQL + SQL track come together. The idea is to showcase the
  growing ecosystem of applications and tools built on top of and around
  Apache HBase. To have a track, we need content, and that's where YOU come
  in.
 
  CFP for ApacheCon closes in one week, July 1. Get your HBase + Hive talks
  submitted so we can pull together a full day of great HBase ecosystem
  talks! Already planning to submit a talk on Hive? Work in HBase and we'll
  get it promoted as part of the track!
 
  Thanks,
  Nick
 
  ApacheCON EU
  Sept 28 - Oct 2
  Corinthia Hotel, Budapest, Hungary (a beautiful venue in an awesome
 city!)
  http://apachecon.eu/
  CFP link: http://events.linuxfoundation.org/cfp/dashboard
 



ApacheCON EU HBase Track Submissions

2015-06-25 Thread Nick Dimiduk
Hello developers, users, speakers,

As part of ApacheCON's inaugural Apache: Big Data, I'm hoping to see a
HBase: NoSQL + SQL track come together. The idea is to showcase the
growing ecosystem of applications and tools built on top of and around
Apache HBase. To have a track, we need content, and that's where YOU come
in.

CFP for ApacheCon closes in one week, July 1. Get your HBase + Hive talks
submitted so we can pull together a full day of great HBase ecosystem
talks! Already planning to submit a talk on Hive? Work in HBase and we'll
get it promoted as part of the track!

Thanks,
Nick

ApacheCON EU
Sept 28 - Oct 2
Corinthia Hotel, Budapest, Hungary (a beautiful venue in an awesome city!)
http://apachecon.eu/
CFP link: http://events.linuxfoundation.org/cfp/dashboard


Re: [DISCUSS] Supporting Hadoop-1 and experimental features

2015-05-22 Thread Nick Dimiduk
On Fri, May 22, 2015 at 1:19 PM, Alan Gates alanfga...@gmail.com wrote:

 I see your point on saying the contributor may not understand where best
 to put the patch, and thus the committer decides.  However, it would be
 very disappointing for a contributor who uses branch-1 to build a new
 feature only to have the committer put it only in master.  So I would
 modify your modification to say at the discretion of the contributor and
 Hive committers.


For what it's worth, this is more or less how HBase works. All features land
first in master and then percolate backwards to open, active branches
where it's acceptable to do so. Since our 1.0 release, we're trying to
make 1.0+ follow semantic versioning more closely. This means that new
features never land in a released minor branch. Bug fixes are applied to
all applicable branches; sometimes this means older release branches and
not master. Sometimes that means contributors are forced to upgrade in
order to take advantage of their contribution in an Apache release (they're
fine to run their own patched builds as they like; it's open source). Right
now we have:

master - (unreleased, development branch for eventual 2.0)
branch-1 - (unreleased, development branch for 1.x series, soon to be
branch basis for 1.2)
branch-1.1 - (released branch, accepting only bug fixes for 1.1.x line)
branch-1.0 - (released branch, accepting only bug fixes for 1.0.x line)

When we're ready, branch-1.2 will fork from branch-1 and branch-1 will
become the development branch for 1.3. Eventually we'll decide it's time for
2.0 and master will be branched, creating branch-2. branch-2 will follow
the same process.

We also maintain active branches for 0.98.x and 0.94.x. These branches are
different, following our old model of receiving backward-compatible new
features in .x versions. 0.94 is basically retired now, only getting bug
fixes. 0.94 is hadoop-1 only, 0.98 supports both hadoop-1 and hadoop-2
(maybe we've retired hadoop-2 support here in the .12 release?), and 1.x
supports hadoop-2 only. 2.0 is undecided, but presumably will be hadoop-2
and hadoop-3 if we can extend our shim layer for it.

We have separate release managers for 0.94, 0.98, 1.0, and 1.1, and we're
discussing preparations for 1.2. They enforce commits against their
respective branches.
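
For anyone unfamiliar with that flow, here is a rough sketch of the
master-first cycle; the JIRA id, commit hash, and branch names below are
purely illustrative:

  # a feature or fix always lands on master first
  git checkout master
  git commit -am "HIVE-NNNNN: change lands in master first"

  # bug fixes (never new features) are then cherry-picked back to the
  # released branches they apply to
  git checkout branch-1.1
  git cherry-pick -x <commit-hash-of-the-fix>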


   kulkarni.swar...@gmail.com
  May 22, 2015 at 11:41
 +1 on the new proposal. Feedback below:

  New features must be put into master.  Whether to put them into
 branch-1 is at the discretion of the developer.

 How about we change this to *All* features must be put into master.
 Whether to put them into branch-1 is at the discretion of the *committer*.
 The reason, I think, is that going forward, for us to sustain a happy and
 healthy community, it's imperative to make it easy not only for users but
 also for developers and committers to contribute and commit patches. To me,
 as a Hive contributor, it would be hard to determine which branch my code
 belongs in. Also, IMO (and I might be wrong), many committers have their own
 areas of expertise, and it's very hard for them to immediately determine
 which branch a patch should go to unless it is very well documented
 somewhere. Putting all code into master would be an easy approach to follow,
 with cherry-picking to other branches done afterwards. Even if people forget
 to do that, we can always go back to master and port the patches out to
 these branches. So we have a master branch, branch-1 for stable code,
 branch-2 for experimental and bleeding-edge code, and so on. Once branch-2
 is stable, we deprecate branch-1, create branch-3, and move on.

 Another reason I say this is that, in my experience, a pretty significant
 amount of work in Hive is still bug fixes, and I think that is what users
 care most about (correctness above anything else). With this approach, it
 would be very obvious which branches to commit these to.




 --
 Swarnim
Chris Drome cdr...@yahoo-inc.com.INVALID
  May 22, 2015 at 0:49
 I understand the motivation and benefits of creating a branch-2 where more
 disruptive work can go on without affecting branch-1. While not necessarily
 against this approach, from Yahoo's standpoint, I do have some questions
 (concerns).
 Upgrading to a new version of Hive requires a significant commitment of
 time and resources to stabilize and certify a build for deployment to our
 clusters. Given the size of our clusters and scale of datasets, we have to
 be particularly careful about adopting new functionality. However, at the
 same time we are interested in testing and making new features and
 functionality available. That said, we would have to rely on branch-1 for
 the immediate future.
 One concern is that branch-1 would be left to stagnate, at which point
 there would be no option but for users to move to branch-2 as branch-1
 would be effectively end-of-lifed. I'm not sure how long this would take,
 but it would eventually happen as a direct result of the 

Re: [DISCUSS] Hive/HBase Integration

2015-05-11 Thread Nick Dimiduk
I'm particularly interested in how the extraction of ORC will necessitate
creation of a more expressive API for Hive's storage tier.
StorageHandlerV2 should be considered with HBase in mind as well as ORC.

On Sat, May 9, 2015 at 9:37 PM, kulkarni.swar...@gmail.com 
kulkarni.swar...@gmail.com wrote:

 Hello all,

 So last week, Brock Noland, Nick Dimiduk, and I got a chance to present
 some of the work we have been doing in the Hive/HBase integration space at
 HBaseCon 2015 (slides here[1] for anyone interested). One of the
 interesting things that we noted at this conference was that even though
 this was an HBase conference, *SQL on HBase* was by far the most popular
 theme with talks on Apache Phoenix, Trafodion, Apache Kylin, Apache Drill
 and a SQL-On-HBase panel to compare these and other technologies.

 I personally feel that with the existing work we have come a long way, but
 we still have work to do and need more love to make this a top-notch
 feature of Hive. However, I am curious to know what the community thinks
 about it, and where they see this integration standing in the time to come
 compared with all the other upcoming technologies.

 Thanks,
 Swarnim

 [1]

 https://docs.google.com/presentation/d/1K2A2NMsNbmKWuG02aUDxsLo0Lal0lhznYy8SB6HjC9U/edit#slide=id.p



Re: VOTE: move to git

2015-04-15 Thread Nick Dimiduk
+1

On Wed, Apr 15, 2015 at 3:18 PM, Jimmy Xiang jxi...@cloudera.com wrote:

 +1

 On Wed, Apr 15, 2015 at 3:09 PM, Vivek Shrivastava 
 vivshrivast...@gmail.com
  wrote:

  +1
  On Apr 15, 2015 6:06 PM, Vaibhav Gumashta vgumas...@hortonworks.com
  wrote:
 
   +1
  
   On 4/15/15, 2:55 PM, Prasanth Jayachandran
   pjayachand...@hortonworks.com wrote:
  
   +1
   
On Apr 15, 2015, at 2:48 PM, Gopal Vijayaraghavan 
 gop...@apache.org
   wrote:
   
+1
   
End the merge madness.
   
On 4/15/15, 11:46 PM, Sergey Shelukhin ser...@apache.org wrote:
   
Hi.
We’ve been discussing this some time ago; this time I'd like to
 start
   an
official vote about moving Hive project to git from svn.
   
I volunteer to facilitate the move; that seems to be just filing
  INFRA
jira, and following instructions such as verifying that the new
 repo
  is
sane.
   
Please vote:
+1 move to git
0 don’t care
-1 stay on svn
   
+1.
   
   
   
   
   
   
  
  
 



Re: ORC separate project

2015-04-01 Thread Nick Dimiduk
I think the storage-api would be very helpful for HBase integration as well.

On Wed, Apr 1, 2015 at 11:22 AM, Owen O'Malley omal...@apache.org wrote:



 On Wed, Apr 1, 2015 at 10:10 AM, Alan Gates alanfga...@gmail.com wrote:



   Carl Steinbach cwsteinb...@gmail.com
  April 1, 2015 at 0:01

 Hi Owen,

 I think you're referring to the following questions I asked last week on
 the PMC mailing list:

 1) How much if any of the code for vectorization/sargs/ACID will migrate
 over to the new ORC project.

 2) Will Hive contributors encounter situations where they are required to
 make changes to ORC in order to complete work on projects related to
 vectorization/sargs/ACID or other Hive features?

  What I'd like to see here is well defined interfaces in Hive so that any
 storage format that wants can implement them.  Hopefully that means things
 like interfaces and utility classes for acid, sargs, and vectorization move
 into this new Hive module storage-api.  Then Orc, Parquet, etc. can depend
 on this module without needing to pull in all of Hive.

 Then Hive contributors would only be forced to make changes in Orc when
 they want to implement something in Orc.


 Agreed. The goal of the new module is to keep a clean separation between
 the code for ORC and Hive, so that vectorization, sargs, and acid are kept
 in Hive and are not moved to or duplicated in the ORC project.

 .. Owen



Re: ORC separate project

2015-03-19 Thread Nick Dimiduk
This is a great plan, +1!

On Thursday, March 19, 2015, Owen O'Malley omal...@apache.org wrote:

 All,
 Over the last year, there have been a fair number of projects that want
 to integrate with ORC, but don't want a dependence on Hive's exec jar.
 Additionally, we've been working on a C++ reader (and soon writer) and it
 would be great to host them both in the same project. Toward that end, I'd
 like to create a separate ORC project at Apache. There will be lots of
 technical details to work out, but I wanted to give the Hive community a
 chance to discuss it. Do any of the Hive committers want to be included on
 the proposal?

 Of the current Hive committers, my list looks like:
 * Alan
 * Gunther
 * Prasanth
 * Lefty
 * Owen
 * Sergey
 * Gopal
 * Kevin

 Did I miss anyone?

 Thanks!
Owen



HS2 standalone JDBC jar not standalone

2015-03-02 Thread Nick Dimiduk
Heya,

I'd like to use jmeter against HS2/JDBC and I'm finding the standalone
jar isn't actually standalone. It appears to include a number of
dependencies, but not the Hadoop Common stuff. Is there a packaging of this
jar that is actually standalone? Are there instructions for using this
standalone jar as it is?

Thanks,
Nick

jmeter.JMeter: Uncaught exception:  java.lang.NoClassDefFoundError:
org/apache/hadoop/conf/Configuration
at
org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:394)
at
org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:188)
at
org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:164)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:233)
at
org.apache.avalon.excalibur.datasource.JdbcConnectionFactory.init(JdbcConnectionFactory.java:138)
at
org.apache.avalon.excalibur.datasource.ResourceLimitingJdbcDataSource.configure(ResourceLimitingJdbcDataSource.java:311)
at
org.apache.jmeter.protocol.jdbc.config.DataSourceElement.initPool(DataSourceElement.java:235)
at
org.apache.jmeter.protocol.jdbc.config.DataSourceElement.testStarted(DataSourceElement.java:108)
at
org.apache.jmeter.engine.StandardJMeterEngine.notifyTestListenersOfStart(StandardJMeterEngine.java:214)
at
org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:336)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 13 more
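
For anyone hitting the same thing, a rough workaround I'd expect to do the
trick (jar names and paths below are illustrative, not exact) is to put the
missing Hadoop jars on JMeter's classpath next to the driver:

  # copy the "standalone" JDBC jar plus the Hadoop common jars it still
  # needs into JMeter's lib/ directory before defining the JDBC connection
  cp $HIVE_HOME/lib/hive-jdbc-*-standalone.jar $JMETER_HOME/lib/
  cp $HADOOP_HOME/share/hadoop/common/hadoop-common-*.jar $JMETER_HOME/lib/
  cp $HADOOP_HOME/share/hadoop/common/lib/*.jar $JMETER_HOME/lib/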


Re: HS2 standalone JDBC jar not standalone

2015-03-02 Thread Nick Dimiduk
Thanks Alexander!

On Mon, Mar 2, 2015 at 10:31 AM, Alexander Pivovarov apivova...@gmail.com
wrote:

 yes, we even have a ticket for that
 https://issues.apache.org/jira/browse/HIVE-9600

 btw can anyone test jdbc driver with kerberos enabled?
 https://issues.apache.org/jira/browse/HIVE-9599


 On Mon, Mar 2, 2015 at 10:01 AM, Nick Dimiduk ndimi...@gmail.com wrote:

 Heya,

 I'd like to use jmeter against HS2/JDBC and I'm finding the standalone
 jar isn't actually standalone. It appears to include a number of
 dependencies, but not the Hadoop Common stuff. Is there a packaging of this
 jar that is actually standalone? Are there instructions for using this
 standalone jar as it is?

 Thanks,
 Nick

 jmeter.JMeter: Uncaught exception:  java.lang.NoClassDefFoundError:
 org/apache/hadoop/conf/Configuration
 at
 org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:394)
 at
 org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:188)
 at
 org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:164)
 at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
 at java.sql.DriverManager.getConnection(DriverManager.java:571)
 at java.sql.DriverManager.getConnection(DriverManager.java:233)
 at
 org.apache.avalon.excalibur.datasource.JdbcConnectionFactory.init(JdbcConnectionFactory.java:138)
 at
 org.apache.avalon.excalibur.datasource.ResourceLimitingJdbcDataSource.configure(ResourceLimitingJdbcDataSource.java:311)
 at
 org.apache.jmeter.protocol.jdbc.config.DataSourceElement.initPool(DataSourceElement.java:235)
 at
 org.apache.jmeter.protocol.jdbc.config.DataSourceElement.testStarted(DataSourceElement.java:108)
 at
 org.apache.jmeter.engine.StandardJMeterEngine.notifyTestListenersOfStart(StandardJMeterEngine.java:214)
 at
 org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:336)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.conf.Configuration
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 13 more





Re: [ANNOUNCE] New Hive PMC Member - Sergey Shelukhin

2015-02-26 Thread Nick Dimiduk
Nice work Sergey!

On Wednesday, February 25, 2015, Carl Steinbach c...@apache.org wrote:

 I am pleased to announce that Sergey Shelukhin has been elected to the
 Hive Project Management Committee. Please join me in congratulating Sergey!

 Thanks.

 - Carl




[jira] [Updated] (HIVE-4765) Improve HBase bulk loading facility

2015-02-05 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-4765:
---
Fix Version/s: 1.2.0

Planting a stake to get this in for 1.2.0.

 Improve HBase bulk loading facility
 ---

 Key: HIVE-4765
 URL: https://issues.apache.org/jira/browse/HIVE-4765
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Fix For: 1.2.0

 Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, 
 HIVE-4765.D11463.1.patch


 With some patches, bulk loading process for HBase could be simplified a lot.
 {noformat}
 CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:key,cf2:value")
 STORED AS
   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
 LOCATION '/tmp/export';
 SET mapred.reduce.tasks=4;
 set hive.optimize.sampling.orderby=true;
 INSERT OVERWRITE TABLE hbase_export
 SELECT * from (SELECT union_kv(key,key,value,":key,cf1:key,cf2:value") as 
 (rowkey,union) FROM src) A ORDER BY rowkey,union;
 hive> !hadoop fs -lsr /tmp/export;
   
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf1
 -rw-r--r--   1 navis supergroup   4317 2013-06-20 11:05 
 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
 -rw-r--r--   1 navis supergroup   5868 2013-06-20 11:05 
 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
 -rw-r--r--   1 navis supergroup   5214 2013-06-20 11:05 
 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
 -rw-r--r--   1 navis supergroup   4290 2013-06-20 11:05 
 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf2
 -rw-r--r--   1 navis supergroup   6744 2013-06-20 11:05 
 /tmp/export/cf2/409673b517d94e16920e445d07710f52
 -rw-r--r--   1 navis supergroup   4975 2013-06-20 11:05 
 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
 -rw-r--r--   1 navis supergroup   6096 2013-06-20 11:05 
 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
 -rw-r--r--   1 navis supergroup   4890 2013-06-20 11:05 
 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9591) Add support for OrderedByte encodings

2015-02-05 Thread Nick Dimiduk (JIRA)
Nick Dimiduk created HIVE-9591:
--

 Summary: Add support for OrderedByte encodings
 Key: HIVE-9591
 URL: https://issues.apache.org/jira/browse/HIVE-9591
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk


HBase has added out-of-the-box support for order-preserving data encoding for 
many common primitive types (HBASE-8201). The StorageHandler should add support 
for these encodings, which will increase the range of predicates that can be 
pushed down to HBase.

I propose adding a {{#o / #orderedbytes}} specifier to the column mappings to 
enable this hint.
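
As a rough sketch of what that could look like once implemented (the {{#o}} 
suffix below is the proposed syntax, not something that works today, and the 
table and column names are made up):

{noformat}
CREATE TABLE hbase_ordered(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#o,cf:val#orderedbytes");
{noformat}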



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler

2015-02-05 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-7805:
---
Priority: Major  (was: Minor)

 Support running multiple scans in hbase-handler
 ---

 Key: HIVE-7805
 URL: https://issues.apache.org/jira/browse/HIVE-7805
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.14.0
Reporter: Andrew Mains
Assignee: Andrew Mains
 Attachments: HIVE-7805.1.patch, HIVE-7805.patch


 Currently, the HiveHBaseTableInputFormat only supports running a single scan. 
 This can be less efficient than running multiple disjoint scans in certain 
 cases, particularly when using a composite row key. For instance, given a row 
 key schema of:
 {code}
 struct<bucket int, time timestamp>
 {code}
 if one wants to push down the predicate:
 {code}
 bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670
 {code}
 it's much more efficient to run a scan for each bucket over the time range 
 (particularly if there's a large amount of data per day). With a single scan, 
 the MR job has to process the data for all time for buckets in between 1 and 
 100.
 Hive should allow HBaseKeyFactory implementations to decompose a predicate into 
 one or more scans in order to take advantage of this fact.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7805) Support running multiple scans in hbase-handler

2015-02-05 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307771#comment-14307771
 ] 

Nick Dimiduk commented on HIVE-7805:


Bumping priority for a nice, easy performance gain.

 Support running multiple scans in hbase-handler
 ---

 Key: HIVE-7805
 URL: https://issues.apache.org/jira/browse/HIVE-7805
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.14.0
Reporter: Andrew Mains
Assignee: Andrew Mains
 Attachments: HIVE-7805.1.patch, HIVE-7805.patch


 Currently, the HiveHBaseTableInputFormat only supports running a single scan. 
 This can be less efficient than running multiple disjoint scans in certain 
 cases, particularly when using a composite row key. For instance, given a row 
 key schema of:
 {code}
 struct<bucket int, time timestamp>
 {code}
 if one wants to push down the predicate:
 {code}
 bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670
 {code}
 it's much more efficient to run a scan for each bucket over the time range 
 (particularly if there's a large amount of data per day). With a single scan, 
 the MR job has to process the data for all time for buckets in between 1 and 
 100.
 Hive should allow HBaseKeyFactory implementations to decompose a predicate into 
 one or more scans in order to take advantage of this fact.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9521) Drop support for Java6

2015-02-02 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302555#comment-14302555
 ] 

Nick Dimiduk commented on HIVE-9521:


Test failure looks unrelated and console output looks clean. Can I get a 
+1/commit? :)

 Drop support for Java6
 --

 Key: HIVE-9521
 URL: https://issues.apache.org/jira/browse/HIVE-9521
 Project: Hive
  Issue Type: Improvement
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 1.2.0

 Attachments: HIVE-9521.00.patch


 As logical continuation of HIVE-4583, let's start using java7 syntax as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9521) Drop support for Java6

2015-01-29 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-9521:
---
Attachment: HIVE-9521.00.patch

 Drop support for Java6
 --

 Key: HIVE-9521
 URL: https://issues.apache.org/jira/browse/HIVE-9521
 Project: Hive
  Issue Type: Improvement
Reporter: Nick Dimiduk
 Fix For: 1.2.0

 Attachments: HIVE-9521.00.patch


 As logical continuation of HIVE-4583, let's start using java7 syntax as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9521) Drop support for Java6

2015-01-29 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-9521:
---
Status: Patch Available  (was: Open)

 Drop support for Java6
 --

 Key: HIVE-9521
 URL: https://issues.apache.org/jira/browse/HIVE-9521
 Project: Hive
  Issue Type: Improvement
Reporter: Nick Dimiduk
 Fix For: 1.2.0

 Attachments: HIVE-9521.00.patch


 As logical continuation of HIVE-4583, let's start using java7 syntax as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9521) Drop support for Java6

2015-01-29 Thread Nick Dimiduk (JIRA)
Nick Dimiduk created HIVE-9521:
--

 Summary: Drop support for Java6
 Key: HIVE-9521
 URL: https://issues.apache.org/jira/browse/HIVE-9521
 Project: Hive
  Issue Type: Improvement
Reporter: Nick Dimiduk
 Fix For: 1.2.0


As logical continuation of HIVE-4583, let's start using java7 syntax as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9504) [beeline] ZipException when using !scan

2015-01-28 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-9504:
---
Attachment: (was: HIVE-9504.00.patch)

 [beeline] ZipException when using !scan
 ---

 Key: HIVE-9504
 URL: https://issues.apache.org/jira/browse/HIVE-9504
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 0.15.0

 Attachments: HIVE-9504.00.patch


 Notice this while mucking around:
 {noformat}
 0: jdbc:hive2://localhost:1/ !scan
 java.util.zip.ZipException: error in opening zip file
 at java.util.zip.ZipFile.open(Native Method)
 at java.util.zip.ZipFile.init(ZipFile.java:220)
 at java.util.zip.ZipFile.init(ZipFile.java:150)
 at java.util.jar.JarFile.init(JarFile.java:166)
 at java.util.jar.JarFile.init(JarFile.java:130)
 at 
 org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579)
 at org.apache.hive.beeline.Commands.scan(Commands.java:278)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at 
 org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740)
 at 
 org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9504) [beeline] ZipException when using !scan

2015-01-28 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-9504:
---
Attachment: HIVE-9504.00.patch

 [beeline] ZipException when using !scan
 ---

 Key: HIVE-9504
 URL: https://issues.apache.org/jira/browse/HIVE-9504
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 0.15.0

 Attachments: HIVE-9504.00.patch


 Notice this while mucking around:
 {noformat}
 0: jdbc:hive2://localhost:1/ !scan
 java.util.zip.ZipException: error in opening zip file
 at java.util.zip.ZipFile.open(Native Method)
 at java.util.zip.ZipFile.init(ZipFile.java:220)
 at java.util.zip.ZipFile.init(ZipFile.java:150)
 at java.util.jar.JarFile.init(JarFile.java:166)
 at java.util.jar.JarFile.init(JarFile.java:130)
 at 
 org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579)
 at org.apache.hive.beeline.Commands.scan(Commands.java:278)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at 
 org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740)
 at 
 org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9504) [beeline] ZipException when using !scan

2015-01-28 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-9504:
---
Status: Patch Available  (was: Open)

 [beeline] ZipException when using !scan
 ---

 Key: HIVE-9504
 URL: https://issues.apache.org/jira/browse/HIVE-9504
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 0.15.0

 Attachments: HIVE-9504.00.patch


 Notice this while mucking around:
 {noformat}
 0: jdbc:hive2://localhost:1/ !scan
 java.util.zip.ZipException: error in opening zip file
 at java.util.zip.ZipFile.open(Native Method)
 at java.util.zip.ZipFile.init(ZipFile.java:220)
 at java.util.zip.ZipFile.init(ZipFile.java:150)
 at java.util.jar.JarFile.init(JarFile.java:166)
 at java.util.jar.JarFile.init(JarFile.java:130)
 at 
 org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579)
 at org.apache.hive.beeline.Commands.scan(Commands.java:278)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at 
 org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740)
 at 
 org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9504) [beeline] ZipException when using !scan

2015-01-28 Thread Nick Dimiduk (JIRA)
Nick Dimiduk created HIVE-9504:
--

 Summary: [beeline] ZipException when using !scan
 Key: HIVE-9504
 URL: https://issues.apache.org/jira/browse/HIVE-9504
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 0.15.0


Notice this while mucking around:

{noformat}
0: jdbc:hive2://localhost:1/ !scan
java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.init(ZipFile.java:220)
at java.util.zip.ZipFile.init(ZipFile.java:150)
at java.util.jar.JarFile.init(JarFile.java:166)
at java.util.jar.JarFile.init(JarFile.java:130)
at 
org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128)
at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589)
at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579)
at org.apache.hive.beeline.Commands.scan(Commands.java:278)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740)
at 
org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9504) [beeline] ZipException when using !scan

2015-01-28 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-9504:
---
Attachment: HIVE-9504.00.patch

 [beeline] ZipException when using !scan
 ---

 Key: HIVE-9504
 URL: https://issues.apache.org/jira/browse/HIVE-9504
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 0.15.0

 Attachments: HIVE-9504.00.patch


 Notice this while mucking around:
 {noformat}
 0: jdbc:hive2://localhost:1/ !scan
 java.util.zip.ZipException: error in opening zip file
 at java.util.zip.ZipFile.open(Native Method)
 at java.util.zip.ZipFile.init(ZipFile.java:220)
 at java.util.zip.ZipFile.init(ZipFile.java:150)
 at java.util.jar.JarFile.init(JarFile.java:166)
 at java.util.jar.JarFile.init(JarFile.java:130)
 at 
 org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579)
 at org.apache.hive.beeline.Commands.scan(Commands.java:278)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at 
 org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740)
 at 
 org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Creating a branch for hbase metastore work

2015-01-22 Thread Nick Dimiduk
+1

On Thursday, January 22, 2015, Brock Noland br...@cloudera.com wrote:

 +1

 On Thu, Jan 22, 2015 at 8:19 PM, Alan Gates ga...@hortonworks.com
 javascript:; wrote:
  I've been working on a prototype of using HBase to store Hive's metadata.
  Basically I've built a new implementation of RawStore that writes to
 HBase
  rather than DataNucleus.  I want to see if I can build something that
 has a
  much more straightforward schema than DN and that is much faster.
 
  I'd like to get this out in public so others can look at it and
 contribute,
  but it's nowhere near ready for real time.  So I propose to create a
 branch
  and put the code there.  Any objections?
 
  Alan.
 
  --
  CONFIDENTIALITY NOTICE
  NOTICE: This message is intended for the use of the individual or entity
 to
  which it is addressed and may contain information that is confidential,
  privileged and exempt from disclosure under applicable law. If the
 reader of
  this message is not the intended recipient, you are hereby notified that
 any
  printing, copying, dissemination, distribution, disclosure or forwarding
 of
  this communication is strictly prohibited. If you have received this
  communication in error, please contact the sender immediately and delete
 it
  from your system. Thank You.



Re: adding public domain Java files to Hive source

2015-01-21 Thread Nick Dimiduk
I guess the PMC should be responsive to this kind of question.

Can you not depend on a library containing these files? Why include the
source directly?

On Wed, Jan 21, 2015 at 3:16 PM, Sergey Shelukhin ser...@hortonworks.com
wrote:

 Ping? Where do I write about such matters, if not here?

 On Wed, Jan 14, 2015 at 11:43 AM, Sergey Shelukhin ser...@hortonworks.com
 
 wrote:

  Suppose I want to use a Java source within Hive that has this header (I
  don't now, but I was considering it and may want it later ;)):
 
  /*
   * Written by Doug Lea with assistance from members of JCP JSR-166
   * Expert Group and released to the public domain, as explained at
   * http://creativecommons.org/licenses/publicdomain
   */
 
  As far as I see the class is not available in binary distribution, and
  there are projects on github that use it as is and add their license on
 top.
  Can I add it to the Apache (Hive) codebase?
  Should the Apache license header be added? Should the original header be
  retained?
 
 
 

 --
 CONFIDENTIALITY NOTICE
 NOTICE: This message is intended for the use of the individual or entity to
 which it is addressed and may contain information that is confidential,
 privileged and exempt from disclosure under applicable law. If the reader
 of this message is not the intended recipient, you are hereby notified that
 any printing, copying, dissemination, distribution, disclosure or
 forwarding of this communication is strictly prohibited. If you have
 received this communication in error, please contact the sender immediately
 and delete it from your system. Thank You.



Re: Undeliverable mail: Re: adding public domain Java files to Hive source

2015-01-21 Thread Nick Dimiduk
Thank you Ashutosh!

On Wed, Jan 21, 2015 at 4:10 PM, Ashutosh Chauhan hashut...@apache.org
wrote:

 Done. I have removed two offending ids from list.

 On Wed, Jan 21, 2015 at 3:22 PM, Nick Dimiduk ndimi...@gmail.com wrote:

  Seriously, these guys are still spamming this list? Why hasn't the
 dev-list
  admin booted these receivers yet? It's been *months*.
 
  On Wed, Jan 21, 2015 at 3:19 PM, mailer-dae...@mail.mailbrush.com
 wrote:
 
   Failed to deliver to 'bsc...@ebuddy.com'
   SMTP module(domain mail-in.ebuddy.com:25) reports:
host mail-in.ebuddy.com:25 says:
550 5.1.1 User unknown
  
  
   Original-Recipient: rfc822;bsc...@ebuddy.com
   Final-Recipient: rfc822;bsc...@ebuddy.com
   Action: failed
   Status: 5.0.0
  
  
 



Re: Undeliverable mail: Re: adding public domain Java files to Hive source

2015-01-21 Thread Nick Dimiduk
Seriously, these guys are still spamming this list? Why hasn't the dev-list
admin booted these receivers yet? It's been *months*.

On Wed, Jan 21, 2015 at 3:19 PM, mailer-dae...@mail.mailbrush.com wrote:

 Failed to deliver to 'bsc...@ebuddy.com'
 SMTP module(domain mail-in.ebuddy.com:25) reports:
  host mail-in.ebuddy.com:25 says:
  550 5.1.1 User unknown


 Original-Recipient: rfc822;bsc...@ebuddy.com
 Final-Recipient: rfc822;bsc...@ebuddy.com
 Action: failed
 Status: 5.0.0




What's the status of AccessServer?

2014-12-17 Thread Nick Dimiduk
Hi folks,

I'm looking for ways to expose Apache Phoenix [0] to a wider audience. One
potential way to do that is to follow in Hive's footsteps with an HS2
protocol-compatible service. I've done some prototyping along these lines
and see that it's quite feasible. Along the way I came across this proposal
for refactoring HS2 into the AccessServer [1].

What's the state of the AccessServer project? Is anyone working on it? Is
there a relationship between this effort and Calcite's Avatica [2]? The
system proposed in the AccessServer doc seems to fit nicely with
Calcite's objectives.

Thanks,
Nick

[0]: http://phoenix.apache.org
[1]:
https://cwiki.apache.org/confluence/display/Hive/AccessServer+Design+Proposal
[2]:
http://mail-archives.apache.org/mod_mbox/calcite-dev/201412.mbox/%3CCAMCtme%2BpVsVYP%2B-J1jDPk-fNCtAHj3f0eXif_hUG_Xy81Ufxsw%40mail.gmail.com%3E


[jira] [Commented] (HIVE-8809) Activate maven profile hadoop-2 by default

2014-12-04 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234532#comment-14234532
 ] 

Nick Dimiduk commented on HIVE-8809:


Using activeByDefault causes issues -- if you specify some other unrelated 
profiles (thrift generation, for instance), you end up disabling your default 
profile. Better to use a property flag.

 Activate maven profile hadoop-2 by default
 --

 Key: HIVE-8809
 URL: https://issues.apache.org/jira/browse/HIVE-8809
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Priority: Minor
 Attachments: HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, 
 dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, 
 dep_without_hadoop_2.txt


 For every maven command, the profile needs to be specified explicitly. It would 
 be better to activate the hadoop-2 profile by default, as HIVE QA uses the 
 hadoop-2 profile. With this change, both of the following commands will be equivalent:
 {code}
 mvn clean install -DskipTests
 mvn clean install -DskipTests -Phadoop-2
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8809) Activate maven profile hadoop-2 by default

2014-12-04 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-8809:
---
Attachment: HIVE-8809.01.patch

Over on HBase, we have the property hadoop.profile and check its value. See 
also http://java.dzone.com/articles/maven-profile-best-practices

Give this patch a spin. For hadoop1 build, add {{-Dhadoop.profile=1}}.
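
To make the intended usage concrete, roughly (the exact property name follows 
the patch; the command lines here are just a sketch):

{code}
# hadoop-2 remains the default, no flag needed
mvn clean install -DskipTests
# hadoop-1 is selected by overriding the property instead of passing -P
mvn clean install -DskipTests -Dhadoop.profile=1
{code}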

 Activate maven profile hadoop-2 by default
 --

 Key: HIVE-8809
 URL: https://issues.apache.org/jira/browse/HIVE-8809
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Priority: Minor
 Attachments: HIVE-8809.01.patch, HIVE-8809.1.patch, 
 dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, 
 dep_with_hadoop_2.txt, dep_without_hadoop_2.txt


 For every maven command, the profile needs to be specified explicitly. It would 
 be better to activate the hadoop-2 profile by default, as HIVE QA uses the 
 hadoop-2 profile. With this change, both of the following commands will be equivalent:
 {code}
 mvn clean install -DskipTests
 mvn clean install -DskipTests -Phadoop-2
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [PROPOSAL] Hivemall incubation

2014-11-21 Thread Nick Dimiduk
Hi Makoto,

I cannot speak for Hive PMC, only as a data tool user and occasional
contributor. I think the idea is very much a good one. Incubator takes a
lot of work because it's all about establishing a vibrant developer and
user community for the project. Community before code, as they say.

I would also encourage you to consider joining forces with DataFu, rather
than competing. I think there's a real appetite for a holistic toolbox of
patterns and implementations that can span these projects. From my
understanding, there's nothing about DataFu that's unique to Pig; they just
need the work done to abstract away the Pig bits and implement the Hive
interfaces.

Is there anything about Hivemall that's unique to Hive, that wouldn't be
applicable to Pig as well?

+Casey, as I believe he has some interest in seeing DataFu reach a wider
audience as well.

Good on you.
Nick

On Friday, November 21, 2014, Makoto Yui yuin...@gmail.com wrote:

 Hi all,

 I am the principal developer of Hivemall, a scalable machine learning
 library for Apache Hive.

   https://github.com/myui/hivemall

 When I presented a talk at the last Hadoop Summit in San Jose [1],
 several audience members asked me about the possibility of changing the
 software license of Hivemall to Apache License v2, and the sustainability
 of the project was their major concern.

 Since then, I have been considering proposing Hivemall as an Apache
 Incubator project. The position of Hivemall relative to Hive would be
 similar to that of DataFu (an Apache Incubator project) relative to Apache Pig.

 I believe that adding machine learning functionality on top of Apache Hive
 could extend the application range of Apache Hive, and Hivemall could help
 existing Hive users in their learning-scale data analytics projects.

 I have got approval from my employer (AIST) to change the license of
 Hivemall to Apache License version 2 and to donate the code to the Apache
 Foundation. Now, I am willing to propose Hivemall as an Apache
 incubator project, together with Hivemall contributors in NTT corp.

 I think that the current Hivemall codebase is a bit too large to be
 included in Hive contrib, and thus it would be better as a separate
 incubator project. I would like to propose that Hivemall eventually
 graduate as a subproject of Apache Hive.

 Is the strategy possible from the Hive PMC point of view?
 http://incubator.apache.org/guides/graduation.html#subproject-or-top-level

 Before formulating a proposal, I would like to hear Hive developers’
 opinions (e.g., possibilities, +1/-1, and missing pieces for incubation)
 on incubating Hivemall.

 BTW, I found this JIRA issue mentioning Hivemall.
 https://issues.apache.org/jira/browse/HIVE-7940

 Is there a possibility of cooperating with them in proposing Hivemall as an
 Apache Incubator project? According to the incubation guides, I need a
 mentor/champion for incubation.
 http://incubator.apache.org/guides/proposal.html#formulating

 Your help toward the incubation will be much appreciated.

 Thanks,
 Makoto

 [1] http://www.slideshare.net/myui/hivemall-hadoop-summit-2014-san-jose

 -- *** Makoto YUI
 m@aist.go.jp javascript:; Information Technology Research
 Institute, AIST.
 http://staff.aist.go.jp/m.yui/ ***



Re: [PROPOSAL] Hivemall incubation

2014-11-21 Thread Nick Dimiduk
Thank you for humoring my questions. I do not know the mind of the DataFu
community. Your observations are quite clear; I have no further concerns.

-n

On Friday, November 21, 2014, Makoto Yui yuin...@gmail.com wrote:

 Hi Nick,

 Thank you for the comments.

 (2014/11/22 3:42), Nick Dimiduk wrote:

 I would also encourage you to consider joining forces with DataFu,
 rather than competing. I think there's a real appetite for a holistic
 toolbox of patterns and implementations that can span these projects.
  From my understanding, there's nothing about DataFu that's unique to
 Pig; they just need the work done to abstract away the Pig bits and
 implement the Hive interfaces.


 My current understanding of DataFu is that it is a collection of UDFs for
 Apache Pig. Though a Hive interface is not yet supported in DataFu, is that
 direction (extending DataFu for Hive) a consensus in the DataFu community?

 My concern is that merging the Hivemall codebase into DataFu would make the
 build and packaging process of DataFu complex and the target/objective of
 the project unclear.

 I do not think that Hivemall competes with DataFu because
 1) There are users who prefer Pig and Hive respectively, and
 2) Pig/DataFu is useful for tasks to which HiveQL is unsuited (e.g., complex
 feature engineering steps). After preprocessing using DataFu, Hivemall can
 be applied for classification/regression in a scalable way in Hive.

  Is there anything about Hivemall that's unique to Hive, that wouldn't be
 applicable to Pig as well?


 The techniques used in Hivemall (e.g., training data amplification that
 emulates iterative training and machine learning algorithms as
 table-generating functions) could be applicable to Apache Pig.

 However, I am not a heavy user of Pig, and porting Hivemall to Pig requires
 a bunch of work. So, I am currently planning to stick with HiveQL
 interfaces (Hive, HCatalog, and Tez for the software stack of Hivemall) in
 developing Hivemall, because a SQL-like interface is friendly to a broader
 range of developers.

 Thanks,
 Makoto

 --
 ***
 Makoto YUI m@aist.go.jp
 Information Technology Research Institute, AIST.
 https://staff.aist.go.jp/m.yui/index_e.html
 ***



[jira] [Commented] (HIVE-8808) HiveInputFormat caching cannot work with all input formats

2014-11-11 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206979#comment-14206979
 ] 

Nick Dimiduk commented on HIVE-8808:


Yes, it looks like our input formats are stateful. We're set up to specify a 
config object and that's closed over to provide custom record readers, based on 
which mapred API is being implemented.

 HiveInputFormat caching cannot work with all input formats
 --

 Key: HIVE-8808
 URL: https://issues.apache.org/jira/browse/HIVE-8808
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland

 In {{HiveInputFormat}} we implement instance caching (see 
 {{getInputFormatFromCache}}). In HS2, this assumes that InputFormats are 
 stateless but I don't think this assumption is true, especially with regards 
 to HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8808) HiveInputFormat caching cannot work with all input formats

2014-11-11 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206983#comment-14206983
 ] 

Nick Dimiduk commented on HIVE-8808:


FYI, this may be changing with HBase 1.0 as we're reworking the way connection 
management is handled. I'm behind on my reviews, so I don't know if this 
assumption will change.

 HiveInputFormat caching cannot work with all input formats
 --

 Key: HIVE-8808
 URL: https://issues.apache.org/jira/browse/HIVE-8808
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland

 In {{HiveInputFormat}} we implement instance caching (see 
 {{getInputFormatFromCache}}). In HS2, this assumes that InputFormats are 
 stateless but I don't think this assumption is true, especially with regards 
 to HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-2828) make timestamp accessible in the hbase KeyValue

2014-10-10 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167773#comment-14167773
 ] 

Nick Dimiduk commented on HIVE-2828:


Left a comment on phabricator. +1

 make timestamp accessible in the hbase KeyValue 
 

 Key: HIVE-2828
 URL: https://issues.apache.org/jira/browse/HIVE-2828
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.5.patch, HIVE-2828.6.patch.txt, 
 HIVE-2828.7.patch.txt, HIVE-2828.8.patch.txt


 Originated from HIVE-2781 and not accepted, but I think this could be helpful 
 to someone.
 By using the special column notation ':timestamp' in HBASE_COLUMNS_MAPPING, a 
 user can access the timestamp value of the hbase KeyValue.
 {code}
 CREATE TABLE hbase_table (key int, value string, time timestamp)
   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string,:timestamp")
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: hive unit test report question

2014-09-07 Thread Nick Dimiduk
IMHO, it would be better to wire up the integration suite via the failsafe
plugin (surefire's counterpart for integration tests) and link the modules
correctly. This is on (admittedly, near the bottom of) my todo list. See also
the HBase poms for an example.

-n
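
P.S. Once wired that way, a single invocation from the root should cover both
suites -- roughly (assuming the itests modules are attached to the reactor and
their tests bound to failsafe's integration-test/verify phases):

  # surefire runs unit tests in the test phase, failsafe runs the
  # integration tests in the integration-test phase
  mvn clean verify -Phadoop-2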

On Saturday, September 6, 2014, wzc wzc1...@gmail.com wrote:

 hi all:
  I would like to create a jenkins job to run both the hive unit tests and the
 integration tests. Right now it seems that I have to execute multiple maven
 goals in different poms:

 mvn clean install  surefire-report:report -Daggregate=true   -Phadoop-2
  cd itests
  mvn clean install  surefire-report:report -Daggregate=true   -Phadoop-2


 I would like to use one maven jenkins job, and right now I can't figure out
 how to configure the job properly to execute maven goals in different poms
 (maybe I can add a post-build step to execute another shell?). Each hive
 ptest2 job can run all tests, and I would like to know the configuration it uses.

 Any help is appreciated.

 Thanks.







 2014-01-14 14:05 GMT+08:00 Shanyu Zhao shz...@microsoft.com:

  Thanks guys for your help!
 
  I found Eugene's comments particularly helpful. With
  -Daggregate=true I can now see aggregated unit test results.
 
  Btw, I didn't mean to run itests, I just want to run all unit tests. I
  think in the FAQ they made it clear that itests are disconnected from the
  top level pom.xml.
 
  Shanyu
 
  -Original Message-
  From: Eugene Koifman [mailto:ekoif...@hortonworks.com]
  Sent: Monday, January 13, 2014 4:06 PM
  To: dev@hive.apache.org
  Subject: Re: hive unit test report question
 
  I think you want to add
  -Daggregate=true
  you should then have target/site/surefire-report.html in the module where
  you ran the command
 
 
 
   On Mon, Jan 13, 2014 at 2:54 PM, Szehon Ho sze...@cloudera.com wrote:
 
   Hi Shanyu,
  
   Are you running in /itests?  The unit tests are in there, and are not
   run if you are running from the root.
  
   Thanks
   Szehon
  
  
    On Mon, Jan 13, 2014 at 1:59 PM, Shanyu Zhao shz...@microsoft.com wrote:
  
Hi,
   
     I was trying to build hive trunk, run all unit tests and generate reports,
     but I'm not sure what's the correct command line. I was using:
     mvn clean install -Phadoop-2 -DskipTests
     mvn test surefire-report:report -Phadoop-2
     But the reports in the root folder and several other projects (such as
     metastore) are empty with no test results. And I couldn't find a
     summary page for all unit tests.
   
I was trying to avoid mvn site because it seems to take forever to
finish. Am I using the correct commands? How can I get a report like
the one in the precommit report:
   
   http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/827/testRep
   ort/
?
   
I really appreciate your help!
   
Shanyu
   
  
 
 



[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility

2014-08-20 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104398#comment-14104398
 ] 

Nick Dimiduk commented on HIVE-4765:


Ping [~navis], [~sushanth].

Any chance we can get some action on this one for 0.14 release? It's definitely 
better than what's available.

 Improve HBase bulk loading facility
 ---

 Key: HIVE-4765
 URL: https://issues.apache.org/jira/browse/HIVE-4765
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, 
 HIVE-4765.D11463.1.patch


 With some patches, bulk loading process for HBase could be simplified a lot.
 {noformat}
 CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
 WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:key,cf2:value)
 STORED AS
   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
 LOCATION '/tmp/export';
 SET mapred.reduce.tasks=4;
 set hive.optimize.sampling.orderby=true;
 INSERT OVERWRITE TABLE hbase_export
 SELECT * from (SELECT union_kv(key,key,value,:key,cf1:key,cf2:value) as 
 (rowkey,union) FROM src) A ORDER BY rowkey,union;
 hive !hadoop fs -lsr /tmp/export;
   
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf1
 -rw-r--r--   1 navis supergroup   4317 2013-06-20 11:05 
 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
 -rw-r--r--   1 navis supergroup   5868 2013-06-20 11:05 
 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
 -rw-r--r--   1 navis supergroup   5214 2013-06-20 11:05 
 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
 -rw-r--r--   1 navis supergroup   4290 2013-06-20 11:05 
 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf2
 -rw-r--r--   1 navis supergroup   6744 2013-06-20 11:05 
 /tmp/export/cf2/409673b517d94e16920e445d07710f52
 -rw-r--r--   1 navis supergroup   4975 2013-06-20 11:05 
 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
 -rw-r--r--   1 navis supergroup   6096 2013-06-20 11:05 
 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
 -rw-r--r--   1 navis supergroup   4890 2013-06-20 11:05 
 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
 {noformat}





Re: Timeline for release of Hive 0.14

2014-08-20 Thread Nick Dimiduk
It'd be great to get HIVE-4765 included in 0.14. The proposed changes are a
big improvement for us HBase folks. Would someone mind having a look in
that direction?

Thanks,
Nick


On Tue, Aug 19, 2014 at 3:20 PM, Thejas Nair the...@hortonworks.com wrote:

 +1
 Sounds good to me.
 Its already almost 4 months since the last release. It is time to
 start preparing for the next one.
 Thanks for volunteering!


 On Tue, Aug 19, 2014 at 2:02 PM, Vikram Dixit vik...@hortonworks.com
 wrote:
  Hi Folks,
 
  I was thinking that it was about time that we had a release of hive 0.14
  given our commitment to having a release of hive on a periodic basis. We
  could cut a branch and start working on a release in say 2 weeks time
  around September 5th (Friday). After branching, we can focus on
 stabilizing
  for the release and hopefully have an RC in about 2 weeks post that. I
  would like to volunteer myself for the duties of the release manager for
  this version if the community agrees.
 
  Thanks
  Vikram.
 



Re: Mail bounces from ebuddy.com

2014-08-20 Thread Nick Dimiduk
Not quite taken care of. I'm still getting spam about these addresses.


On Mon, Aug 18, 2014 at 9:18 AM, Lars Francke lars.fran...@gmail.com
wrote:

 Thanks Alan and Ashutosh for taking care of this!


 On Mon, Aug 18, 2014 at 5:45 PM, Ashutosh Chauhan hashut...@apache.org
 wrote:

  Thanks, Alan for the hint. I just unsubscribed those two email addresses
  from ebuddy.
 
 
  On Mon, Aug 18, 2014 at 8:23 AM, Alan Gates ga...@hortonworks.com
 wrote:
 
    Anyone who is an admin on the list (I don't know who the admins are) can do
   this by doing user-unsubscribe-USERNAME=ebuddy@hive.apache.org
 where
   USERNAME is the name of the bouncing user (see
   http://untroubled.org/ezmlm/ezman/ezman1.html )
  
   Alan.
  
  
  
 Thejas Nair the...@hortonworks.com
August 17, 2014 at 17:02
   I don't know how to do this.
  
   Carl, Ashutosh,
   Do you guys know how to remove these two invalid emails from the
 mailing
   list ?
  
  
 Lars Francke lars.fran...@gmail.com
August 17, 2014 at 15:41
   Hmm great, I see others mentioning this as well. I'm happy to contact
  INFRA
   but I'm not sure if they are even needed or if someone from the Hive
 team
   can do this?
  
  
    On Fri, Aug 8, 2014 at 3:43 AM, Lefty Leverenz leftylever...@gmail.com
  
 Lefty Leverenz leftylever...@gmail.com
August 7, 2014 at 18:43
   (Excuse the spam.) Actually I'm getting two bounces per message, but
  gmail
   concatenates them so I didn't notice the second one.
  
   -- Lefty
  
  
    On Thu, Aug 7, 2014 at 9:36 PM, Lefty Leverenz leftylever...@gmail.com
  
 Lefty Leverenz leftylever...@gmail.com
August 7, 2014 at 18:36
   Curious, I've only been getting one bounce per message. Anyway thanks
 for
   bringing this up.
  
   -- Lefty
  
  
  
 Lars Francke lars.fran...@gmail.com
August 7, 2014 at 4:38
   Hi,
  
   every time I send a mail to dev@ I get two bounce mails from two
 people
  at
   ebuddy.com. I don't want to post the E-Mail addresses publicly but I
 can
   send them on if needed (and it can be triggered easily by just replying
  to
   this mail I guess).
  
   Could we maybe remove them from the list?
  
   Cheers,
   Lars
  
  
  
 



[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler

2014-08-15 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099180#comment-14099180
 ] 

Nick Dimiduk commented on HIVE-7068:


This is really cool, nice work fellas! It's a shame to see so many of the 
StorageHandler warts repeated here too, but that's how it is.

Does it make sense to try to share more code between the accumulo and hbase 
modules? Column mapping stuff looks pretty much identical to me, and maybe the 
hbase module could benefit from some of the comparator work? Nothing critical 
for this patch, but could be good for follow-on work.

I'm with [~navis] on this one, +1 for getting it committed and setting users loose
to play!

 Integrate AccumuloStorageHandler
 

 Key: HIVE-7068
 URL: https://issues.apache.org/jira/browse/HIVE-7068
 Project: Hive
  Issue Type: New Feature
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch


 [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to 
 HBase. Some [initial 
 work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done 
 to support querying an Accumulo table using Hive already. It is not a 
 complete solution as, most notably, the current implementation presently 
 lacks support for INSERTs.
 I would like to polish up the AccumuloStorageHandler (presently based on 
 0.10), implement missing basic functionality and compare it to the 
 HBaseStorageHandler (to ensure that we follow the same general usage 
 patterns).
 I've also been in communication with [~bfem] (the initial author) who 
 expressed interest in working on this again. I hope to coordinate efforts 
 with him.





[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility

2014-08-13 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096390#comment-14096390
 ] 

Nick Dimiduk commented on HIVE-4765:


Hi [~navis]. Have you had time to look at this lately? It would sure be better 
than the mostly-broken instructions on 
https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad

 Improve HBase bulk loading facility
 ---

 Key: HIVE-4765
 URL: https://issues.apache.org/jira/browse/HIVE-4765
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, 
 HIVE-4765.D11463.1.patch


 With some patches, bulk loading process for HBase could be simplified a lot.
 {noformat}
 CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
 WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:key,cf2:value)
 STORED AS
   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
 LOCATION '/tmp/export';
 SET mapred.reduce.tasks=4;
 set hive.optimize.sampling.orderby=true;
 INSERT OVERWRITE TABLE hbase_export
 SELECT * from (SELECT union_kv(key,key,value,:key,cf1:key,cf2:value) as 
 (rowkey,union) FROM src) A ORDER BY rowkey,union;
 hive !hadoop fs -lsr /tmp/export;
   
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf1
 -rw-r--r--   1 navis supergroup   4317 2013-06-20 11:05 
 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
 -rw-r--r--   1 navis supergroup   5868 2013-06-20 11:05 
 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
 -rw-r--r--   1 navis supergroup   5214 2013-06-20 11:05 
 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
 -rw-r--r--   1 navis supergroup   4290 2013-06-20 11:05 
 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf2
 -rw-r--r--   1 navis supergroup   6744 2013-06-20 11:05 
 /tmp/export/cf2/409673b517d94e16920e445d07710f52
 -rw-r--r--   1 navis supergroup   4975 2013-06-20 11:05 
 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
 -rw-r--r--   1 navis supergroup   6096 2013-06-20 11:05 
 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
 -rw-r--r--   1 navis supergroup   4890 2013-06-20 11:05 
 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
 {noformat}





[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-12 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Release Note: 
Hive can now execute queries against HBase table snapshots. This feature is 
available for any table defined using the HBaseStorageHandler. It requires at 
least HBase 0.98.3.

To query against a snapshot instead of the online table, specify the snapshot 
name via hive.hbase.snapshot.name. The snapshot will be restored into a unique 
directory under /tmp. This location can be overridden by setting a path via 
hive.hbase.snapshot.restoredir.
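
For example (a minimal sketch; it assumes a snapshot named foo_snap has already
been taken of an HBase-backed table foo):

{noformat}
set hive.hbase.snapshot.name=foo_snap;
select count(*) from foo;
{noformat}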

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, 
 HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, 
 HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.





[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure

2014-08-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088450#comment-14088450
 ] 

Nick Dimiduk commented on HIVE-7618:


Option (1) would pave the way for migrating all these baked-in storage systems 
over to the StorageHandler interface -- a healthy thing for the long-term 
success of Hive, IMHO. That's a larger conversation though.

 TestDDLWithRemoteMetastoreSecondNamenode unit test failure
 --

 Key: HIVE-7618
 URL: https://issues.apache.org/jira/browse/HIVE-7618
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7618.1.patch, HIVE-7618.2.patch


 Looks like TestDDLWithRemoteMetastoreSecondNamenode started failing after 
 HIVE-6584 was committed.
 {noformat}
 TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode:272-createTableAndCheck:201-createTableAndCheck:219
  Table should be located in the second filesystem expected:[hdfs] but 
 was:[pfile]
 {noformat}





[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure

2014-08-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088453#comment-14088453
 ] 

Nick Dimiduk commented on HIVE-7618:


Anyway, +1 for your patch v2. It's 80% of what we'd want for option (1). It would
be best if DDLTask didn't need that nasty hack.

 TestDDLWithRemoteMetastoreSecondNamenode unit test failure
 --

 Key: HIVE-7618
 URL: https://issues.apache.org/jira/browse/HIVE-7618
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7618.1.patch, HIVE-7618.2.patch


 Looks like TestDDLWithRemoteMetastoreSecondNamenode started failing after 
 HIVE-6584 was committed.
 {noformat}
 TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode:272-createTableAndCheck:201-createTableAndCheck:219
  Table should be located in the second filesystem expected:[hdfs] but 
 was:[pfile]
 {noformat}





[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-05 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086367#comment-14086367
 ] 

Nick Dimiduk commented on HIVE-6584:


Restore location is optional. It defaults to /tmp. The restore process creates 
a uniquely named (random uuid) directory under this path for any given restore, 
so users who never set this value will not conflict with each other.

It would be nice if hive had some kind of post-job hook that could be used to 
clean up the restoredir artifacts after the input format is finished with them.
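
A sketch of overriding that default (the directory below is only an illustration):

{noformat}
set hive.hbase.snapshot.name=foo_snap;
set hive.hbase.snapshot.restoredir=/user/hive/snapshot-restore;
select count(*) from foo;
{noformat}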

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, 
 HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, 
 HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.





[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure

2014-08-05 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087080#comment-14087080
 ] 

Nick Dimiduk commented on HIVE-7618:


This will cause a failure in HIVE-6584. The reason is that StorageHandler tables 
don't have a location, so makeLocationQualified constructs locations that don't 
exist. IIRC, this resulted in an HBase table being assigned a location in the 
warehouse, which then confuses other pieces.

 TestDDLWithRemoteMetastoreSecondNamenode unit test failure
 --

 Key: HIVE-7618
 URL: https://issues.apache.org/jira/browse/HIVE-7618
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7618.1.patch


 Looks like TestDDLWithRemoteMetastoreSecondNamenode started failing after 
 HIVE-6584 was committed.
 {noformat}
 TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode:272-createTableAndCheck:201-createTableAndCheck:219
  Table should be located in the second filesystem expected:[hdfs] but 
 was:[pfile]
 {noformat}





[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure

2014-08-05 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087082#comment-14087082
 ] 

Nick Dimiduk commented on HIVE-7618:


Can the test be updated to explicitly set a foreign hdfs for its location?

 TestDDLWithRemoteMetastoreSecondNamenode unit test failure
 --

 Key: HIVE-7618
 URL: https://issues.apache.org/jira/browse/HIVE-7618
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7618.1.patch


 Looks like TestDDLWithRemoteMetastoreSecondNamenode started failing after 
 HIVE-6584 was committed.
 {noformat}
 TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode:272-createTableAndCheck:201-createTableAndCheck:219
  Table should be located in the second filesystem expected:[hdfs] but 
 was:[pfile]
 {noformat}





[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-04 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14084852#comment-14084852
 ] 

Nick Dimiduk commented on HIVE-6584:


Thanks folks! Any chance of getting a commit this week? :)

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, 
 HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, 
 HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.





[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility

2014-08-04 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085004#comment-14085004
 ] 

Nick Dimiduk commented on HIVE-4765:


Bump. Patch still applies to master, with a little fuzz.

Is the new SerDe and Union business necessary? It would be really great to 
integrate this into the StorageHandler as an online switch, as I was aiming for 
on HIVE-2365. Swapping out the output format at runtime seems to work alright, 
and it saves the user from having to define another table, repeat the column 
mapping, etc.
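
Roughly the kind of switch I have in mind (a sketch only; the property names below
are illustrative, along the lines of what HIVE-6473 does for the output format,
and hbase_table stands in for an existing HBaseStorageHandler table):

{noformat}
set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/export/cf1;
INSERT OVERWRITE TABLE hbase_table SELECT key, value FROM src;
{noformat}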

 Improve HBase bulk loading facility
 ---

 Key: HIVE-4765
 URL: https://issues.apache.org/jira/browse/HIVE-4765
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, 
 HIVE-4765.D11463.1.patch


 With some patches, bulk loading process for HBase could be simplified a lot.
 {noformat}
 CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
 WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:key,cf2:value)
 STORED AS
   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
 LOCATION '/tmp/export';
 SET mapred.reduce.tasks=4;
 set hive.optimize.sampling.orderby=true;
 INSERT OVERWRITE TABLE hbase_export
 SELECT * from (SELECT union_kv(key,key,value,:key,cf1:key,cf2:value) as 
 (rowkey,union) FROM src) A ORDER BY rowkey,union;
 hive !hadoop fs -lsr /tmp/export;
   
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf1
 -rw-r--r--   1 navis supergroup   4317 2013-06-20 11:05 
 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
 -rw-r--r--   1 navis supergroup   5868 2013-06-20 11:05 
 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
 -rw-r--r--   1 navis supergroup   5214 2013-06-20 11:05 
 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
 -rw-r--r--   1 navis supergroup   4290 2013-06-20 11:05 
 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf2
 -rw-r--r--   1 navis supergroup   6744 2013-06-20 11:05 
 /tmp/export/cf2/409673b517d94e16920e445d07710f52
 -rw-r--r--   1 navis supergroup   4975 2013-06-20 11:05 
 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
 -rw-r--r--   1 navis supergroup   6096 2013-06-20 11:05 
 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
 -rw-r--r--   1 navis supergroup   4890 2013-06-20 11:05 
 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
 {noformat}





[jira] [Created] (HIVE-7572) Enable LOAD DATA into StorageHandler tables

2014-07-31 Thread Nick Dimiduk (JIRA)
Nick Dimiduk created HIVE-7572:
--

 Summary: Enable LOAD DATA into StorageHandler tables
 Key: HIVE-7572
 URL: https://issues.apache.org/jira/browse/HIVE-7572
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: Nick Dimiduk


One annoyance when working with the HBaseStorageHandler is its inaccessibility to 
local data. Populating an HBase table from local test data, for instance, is a 
multi-step process:

{noformat}
# create a hive table you HAVE to populate
 CREATE TABLE src(key int, value string);
# populate the intermediate hive table
 LOAD DATA LOCAL INPATH '/path/to/hive/data/files/kv1.txt' OVERWRITE INTO 
 TABLE src;
# create the hbase table you WANT to populate
 CREATE TABLE hbase_src(key INT, value STRING) STORED BY 
 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES 
 ('hbase.columns.mapping' = ':key,cf:val') TBLPROPERTIES ('hbase.table.name' = 
 'hbase_src');
# copy data into hbase
 INSERT OVERWRITE TABLE hbase_src SELECT * FROM src;
{noformat}

This multi-step process could be simplified and isn't limited to 
HBaseStorageHandler -- any StorageHandler implementation will suffer this 
problem.
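
The single-step form this issue asks for would look something like the following
(hypothetical; LOAD DATA does not currently work against StorageHandler tables):

{noformat}
LOAD DATA LOCAL INPATH '/path/to/hive/data/files/kv1.txt' OVERWRITE INTO TABLE hbase_src;
{noformat}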





Re: Review Request 23824: Add HiveHBaseTableSnapshotInputFormat

2014-07-31 Thread nick dimiduk

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23824/
---

(Updated July 31, 2014, 6:33 p.m.)


Review request for hive, Ashutosh Chauhan, Navis Ryu, Sushanth Sowmyan, and 
Swarnim Kulkarni.


Changes
---

Updating with patch v14 from JIRA.


Bugs: HIVE-6584
https://issues.apache.org/jira/browse/HIVE-6584


Repository: hive-git


Description
---

HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
This allows a MR job to consume a stable, read-only view of an HBase table 
directly off of HDFS. Bypassing the online region server API provides a nice 
performance boost for the full scan. HBASE-10642 is backporting that feature to 
0.94/0.96 and also adding a mapred implementation. Once that's available, we 
should add an input format. A follow-on patch could work out how to integrate 
this functionality into the StorageHandler, similar to how HIVE-6473 integrates 
the HFileOutputFormat into existing table definitions.

See JIRA for further conversation.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 15bc0a3 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java 998c15c 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 
dbf5e51 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseTableSnapshotInputFormatUtil.java
 PRE-CREATION 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseInputFormatUtil.java
 PRE-CREATION 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
 1032cc9 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java
 PRE-CREATION 
  hbase-handler/src/test/queries/positive/hbase_handler_snapshot.q PRE-CREATION 
  hbase-handler/src/test/results/positive/external_table_ppd.q.out 6f1adf4 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
b92db11 
  hbase-handler/src/test/results/positive/hbase_handler_snapshot.q.out 
PRE-CREATION 
  hbase-handler/src/test/templates/TestHBaseCliDriver.vm 01d596a 
  itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseQTestUtil.java 
96a0de2 
  itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 
cdc0a65 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java ccfb58f 
  pom.xml b3216e1 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 40d910c 

Diff: https://reviews.apache.org/r/23824/diff/


Testing
---

Unit tests, local-mode testing, pseudo-distributed mode testing, and tested on 
a small distributed cluster. Tests included hbase versions 0.98.3 and the HEAD 
of 0.98 branch.


Thanks,

nick dimiduk



[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-31 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.14.patch

Once more, for old time's sake, eh [~navis]? ;)

Attaching an updated patch that isolates the snapshot classes from the 
StorageHandler. I tried this out against a local-mode HBase built against the 
tag 0.96.0RC5 (I didn't see a release tag, surprisingly...). Regular online 
operations work as expected. When I {{set hive.hbase.snapshot.name=foo;}}, I 
get a nice error message in my stacktrace:

{noformat}
FAILED: RuntimeException This version of HBase does not support Hive over table 
snapshots. Please upgrade to at least HBase 0.98.3 or later. See HIVE-6584 for 
details.
{noformat}

I hope this meets your requirement.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, 
 HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, 
 HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.





[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-31 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081249#comment-14081249
 ] 

Nick Dimiduk commented on HIVE-6584:


I updated RB as well, the interesting addition is 
HBaseTableSnapshotInputFormatUtil.java and its use: 
https://reviews.apache.org/r/23824/diff/1-2/#7

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, 
 HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, 
 HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.





[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-28 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076331#comment-14076331
 ] 

Nick Dimiduk commented on HIVE-6584:


Thanks for having a look, [~navis]. As it is, this patch requires HBASE-11137, 
which has not been back-ported to 0.96. There's no technical reason not to 
back-port it, simply that 0.96 is in maintenance mode only and we're 
encouraging folks to upgrade from 0.96.2 to 0.98.x.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, 
 HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.





[jira] [Commented] (HIVE-7496) Exclude conf/hive-default.xml.template in version control and include it dist profile

2014-07-28 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076335#comment-14076335
 ] 

Nick Dimiduk commented on HIVE-7496:


Hurray! :)

 Exclude conf/hive-default.xml.template in version control and include it dist 
 profile
 -

 Key: HIVE-7496
 URL: https://issues.apache.org/jira/browse/HIVE-7496
 Project: Hive
  Issue Type: Task
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7496.1.patch.txt, HIVE-7496.2.patch.txt








[jira] [Created] (HIVE-7534) remove reflection from HBaseSplit

2014-07-28 Thread Nick Dimiduk (JIRA)
Nick Dimiduk created HIVE-7534:
--

 Summary: remove reflection from HBaseSplit
 Key: HIVE-7534
 URL: https://issues.apache.org/jira/browse/HIVE-7534
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Priority: Minor


HIVE-6584 does some reflection voodoo to work around the lack of HBASE-11555 
for version hbase-0.98.3. This ticket is to bump the hbase dependency version 
and clean up that code once hbase-0.98.5 has released.





[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-28 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076782#comment-14076782
 ] 

Nick Dimiduk commented on HIVE-6584:


Thanks for having a look, [~sushanth]!

bq. The one thing I'd change before committing is a word-wrap for the ASF 
header in conf/hive-default.xml.template, to retain old newline behaviour 
there. But otherwise, looks good to me.

I believe HIVE-7496 drops conf/hive-default.xml.template all together.

bq. We'll need to update those TODOs in a bit once we upgrade to a newer 
version of HBase (0.98.5+) to pick up HBASE-11555.

I opened HIVE-7534 to track this.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, 
 HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.





[jira] [Updated] (HIVE-7534) remove reflection from HBaseSplit

2014-07-28 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-7534:
---

Affects Version/s: 0.14.0

 remove reflection from HBaseSplit
 -

 Key: HIVE-7534
 URL: https://issues.apache.org/jira/browse/HIVE-7534
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.14.0
Reporter: Nick Dimiduk
Priority: Minor

 HIVE-6584 does some reflection voodoo to work around the lack of HBASE-11555 
 for version hbase-0.98.3. This ticket is to bump the hbase dependency version 
 and clean up that code once hbase-0.98.5 has released.





[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-28 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.13.patch

Attaching lucky v13: rebased onto trunk.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.13.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, 
 HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, 
 HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.





Re: Error running unit tests from eclipse (weird classpath issue)

2014-07-23 Thread Nick Dimiduk
I'm sure google knows; I don't.

For what it's worth, I run tests via maven and attach the debugger to the
remote process when necessary.

-n

On Tuesday, July 22, 2014, Pavel Chadnov pavelchad...@gmail.com wrote:

 How can I do it in eclipse? I do this in the console and it works.

 On Tuesday, July 22, 2014, Nick Dimiduk ndimi...@gmail.com wrote:

 Are you specifying a hadoop profile via eclipse? Ie, from maven,
 -Phadoop-2.


 On Tue, Jul 22, 2014 at 4:03 PM, Pavel Chadnov pavelchad...@gmail.com
 wrote:

 Hey Guys,


 I'm trying to run Hive unit tests in eclipse and have a few failures. One of
 the interesting ones throws the exception shown below when run from eclipse;
 it passes fine from the console.


 java.lang.IncompatibleClassChangeError: Implementing class

 at java.lang.ClassLoader.defineClass1(Native Method)

 at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

 at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

 ...

 ...

 at java.lang.Class.forName(Class.java:190)

 at
 org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:120)

 at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:115)

 at

 org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:80)

 at
 org.apache.hadoop.hive.conf.HiveConf$ConfVars.clinit(HiveConf.java:254)

 at
 org.apache.hadoop.hive.ql.exec.Utilities.getPlanPath(Utilities.java:652)

 at
 org.apache.hadoop.hive.ql.exec.Utilities.setPlanPath(Utilities.java:641)

 at
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:584)

 at
 org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:575)

 at

 org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:568)

 at

 org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat.setUp(TestSymlinkTextInputFormat.java:84)

 at junit.framework.TestCase.runBare(TestCase.java:132)


 I tried adding the hadoop-shims project to the classpath manually, but no luck.
 Would really appreciate any help here.


 Thanks,

 Pavel




 --
 Regards,
 Pavel Chadnov




conf/hive-default.xml.template conflict-central

2014-07-23 Thread Nick Dimiduk
Hi there,

I've noticed that the above-mentioned file has become a constant source of
conflicts when applying patches. Is it possible to remove it from source
control and depend on the build process to generate it locally? This
appears to be what's happening anyway.

Thanks,
Nick


Re: conf/hive-default.xml.template conflict-central

2014-07-23 Thread Nick Dimiduk
Thanks Lefty.


On Wed, Jul 23, 2014 at 2:42 PM, Lefty Leverenz leftylever...@gmail.com
wrote:

 Interesting idea, Nick.  How about adding it to the discussion
 
 https://issues.apache.org/jira/browse/HIVE-6037?focusedCommentId=14061647page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14061647
 
 on HIVE-6037?

 -- Lefty


 On Wed, Jul 23, 2014 at 5:27 PM, Nick Dimiduk ndimi...@gmail.com wrote:

  Hi there,
 
  I've noticed that the above-mentioned file has become a constant source
 of
  conflicts when applying patches. Is it possible to remove it from source
  control and depend on the build process to generate it locally? This
  appears to be what's happening anyway.
 
  Thanks,
  Nick
 



[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf

2014-07-23 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072506#comment-14072506
 ] 

Nick Dimiduk commented on HIVE-6037:


Is it possible to remove the generated file from source control and depend on 
the build process to generate it locally? Why must it be committed?

 Synchronize HiveConf with hive-default.xml.template and support show conf
 -

 Key: HIVE-6037
 URL: https://issues.apache.org/jira/browse/HIVE-6037
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: CHIVE-6037.3.patch.txt, HIVE-6037-0.13.0, 
 HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, 
 HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, 
 HIVE-6037.16.patch.txt, HIVE-6037.17.patch, HIVE-6037.18.patch.txt, 
 HIVE-6037.19.patch.txt, HIVE-6037.19.patch.txt, HIVE-6037.2.patch.txt, 
 HIVE-6037.20.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, 
 HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, 
 HIVE-6037.9.patch.txt, HIVE-6037.patch


 see HIVE-5879





[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-22 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.12.patch

Patch v12 should fix the two test failures. One comes from changes made in 
HBASE-11335. The other has to do with assumptions around default filesystem 
path that are unrelated to HBase.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, 
 HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.





Review Request 23824: Add HiveHBaseTableSnapshotInputFormat

2014-07-22 Thread nick dimiduk

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23824/
---

Review request for hive, Ashutosh Chauhan, Navis Ryu, Sushanth Sowmyan, and 
Swarnim Kulkarni.


Bugs: HIVE-6584
https://issues.apache.org/jira/browse/HIVE-6584


Repository: hive-git


Description
---

HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
This allows a MR job to consume a stable, read-only view of an HBase table 
directly off of HDFS. Bypassing the online region server API provides a nice 
performance boost for the full scan. HBASE-10642 is backporting that feature to 
0.94/0.96 and also adding a mapred implementation. Once that's available, we 
should add an input format. A follow-on patch could work out how to integrate 
this functionality into the StorageHandler, similar to how HIVE-6473 integrates 
the HFileOutputFormat into existing table definitions.

See JIRA for further conversation.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 593c566 
  conf/hive-default.xml.template ba922d0 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java 998c15c 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 
dbf5e51 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseInputFormatUtil.java
 PRE-CREATION 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
 1032cc9 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java
 PRE-CREATION 
  hbase-handler/src/test/queries/positive/hbase_handler_snapshot.q PRE-CREATION 
  hbase-handler/src/test/results/positive/external_table_ppd.q.out 6f1adf4 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
b92db11 
  hbase-handler/src/test/results/positive/hbase_handler_snapshot.q.out 
PRE-CREATION 
  hbase-handler/src/test/templates/TestHBaseCliDriver.vm 01d596a 
  itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseQTestUtil.java 
96a0de2 
  itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 
cdc0a65 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 2fefa06 
  pom.xml b5a5697 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java c80a2a3 

Diff: https://reviews.apache.org/r/23824/diff/


Testing
---

Unit tests, local-mode testing, pseudo-distributed mode testing, and tested on 
a small distributed cluster. Tests included hbase versions 0.98.3 and the HEAD 
of 0.98 branch.


Thanks,

nick dimiduk



[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-22 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071133#comment-14071133
 ] 

Nick Dimiduk commented on HIVE-6584:


I think these failed tests are unrelated.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, 
 HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.





Re: Error running unit tests from eclipse (weird classpath issue)

2014-07-22 Thread Nick Dimiduk
Are you specifying a hadoop profile via eclipse? Ie, from maven, -Phadoop-2.


On Tue, Jul 22, 2014 at 4:03 PM, Pavel Chadnov pavelchad...@gmail.com
wrote:

 Hey Guys,


 I'm trying to run Hive unit tests in eclipse and have a few failures. One of
 the interesting ones throws the exception shown below when run from eclipse;
 it passes fine from the console.


 java.lang.IncompatibleClassChangeError: Implementing class

 at java.lang.ClassLoader.defineClass1(Native Method)

 at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

 ...

 ...

 at java.lang.Class.forName(Class.java:190)

 at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:120)

 at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:115)

 at
 org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:80)

 at
 org.apache.hadoop.hive.conf.HiveConf$ConfVars.clinit(HiveConf.java:254)

 at org.apache.hadoop.hive.ql.exec.Utilities.getPlanPath(Utilities.java:652)

 at org.apache.hadoop.hive.ql.exec.Utilities.setPlanPath(Utilities.java:641)

 at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:584)

 at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:575)

 at
 org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:568)

 at

 org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat.setUp(TestSymlinkTextInputFormat.java:84)

 at junit.framework.TestCase.runBare(TestCase.java:132)


 I tried adding the hadoop-shims project to the classpath manually, but no luck.
 Would really appreciate any help here.


 Thanks,

 Pavel



[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-21 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.10.patch

Updated the patch once more. This has been tested on a distributed cluster as 
well; things are working correctly.

{noformat}
HADOOP_CLASSPATH=/path/to/high-scale-lib-1.1.1.jar hive -e "set 
hive.hbase.snapshot.name=foo_snap; select count(*) from foo;"
{noformat}

Optionally, you can set {{hive.hbase.snapshot.restoredir}} to something other 
than the default.
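
To make that concrete, a minimal sketch combining the two settings; the snapshot 
name and table are the same ones used above, the restore directory value is 
illustrative, and this assumes the snapshot already exists and the table is 
registered through the HBaseStorageHandler:

{noformat}
set hive.hbase.snapshot.name=foo_snap;
set hive.hbase.snapshot.restoredir=/tmp/hbase_snapshot_restore;
select count(*) from foo;
{noformat}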

I also opened HBASE-11555 so we can do away with the reflection stuff after a 
later release.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, 
 HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, 
 HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-21 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069319#comment-14069319
 ] 

Nick Dimiduk commented on HIVE-6584:


HBASE-11557 will remove the requirement of specifying high-scale-lib.jar in 
HADOOP_CLASSPATH.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, 
 HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, 
 HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-21 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.11.patch

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, 
 HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, 
 HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-18 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066930#comment-14066930
 ] 

Nick Dimiduk commented on HIVE-6584:


[~tenggyut]:

bq. 1. HBaseStorageHandler.getInputFormatClass(): I am afraid that the returned 
input format will always be HiveHBaseTableInputFormat (at least according to my 
test)

My patch has the logic necessary to perform the switch at runtime. It does 
indeed work with the latest patch.

bq. 2. In the method HBaseStorageHandler.preCreateTable, Hive will check 
whether the HBase table exists or not, regardless of whether the external table 
that Hive is going to create is based on an actual table or on a snapshot.

I'm not sure about this. Anyway that's not related to this feature. 
HBaseStorageHandler has no means of creating/dropping table snapshots. If 
you're seeing some issue here with StorageHandler DDL operations, please file a 
separate JIRA.

bq. 3. The TableSnapshotRegionSplit used in TableSnapshotInputFormat is a 
direct subclass of InputSplit, not a subclass of TableSplit.

Nor should it be. The TableSnapshotRegionSplit is tracking different 
information from TableSplit.

bq. 4. There is no public setScan method in 
TableSnapshotInputFormat.RecordReader; instead it translates a string into a 
Scan instance using mapreduce.TableMapReduceUtil.convertStringToScan.

Indeed, there is a disparity between HBase's mapred and mapreduce 
implementations. I opened HBASE-11179 for some cleanup on the HBase side. The 
convertStringToScan details are HBase-private API as of 0.96. I opened 
HBASE-11163 to make the necessary scanner support available in the mapred API, 
but it has not yet been implemented.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, 
 HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-17 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.9.patch

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, 
 HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-16 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063719#comment-14063719
 ] 

Nick Dimiduk commented on HIVE-6584:


Ouch. Most of these tests run/pass for me locally. Will investigate further. 
I'm also curious why the {{explain}} commands in {{hbase_handler_snapshot.q}} 
are not including the Input/OutputFormats.

[~sushanth], [~ashutoshc] any ideas on this latter issue?

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, 
 HIVE-6584.7.patch, HIVE-6584.8.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility

2014-07-16 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064347#comment-14064347
 ] 

Nick Dimiduk commented on HIVE-4765:


This looks like a nice improvement [~navis]!

 Improve HBase bulk loading facility
 ---

 Key: HIVE-4765
 URL: https://issues.apache.org/jira/browse/HIVE-4765
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, 
 HIVE-4765.D11463.1.patch


 With some patches, the bulk loading process for HBase could be simplified a lot.
 {noformat}
 CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:key,cf2:value")
 STORED AS
   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
 LOCATION '/tmp/export';
 SET mapred.reduce.tasks=4;
 set hive.optimize.sampling.orderby=true;
 INSERT OVERWRITE TABLE hbase_export
 SELECT * from (SELECT union_kv(key, key, value, ":key,cf1:key,cf2:value") as 
 (rowkey,union) FROM src) A ORDER BY rowkey,union;
 hive> !hadoop fs -lsr /tmp/export;
   
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf1
 -rw-r--r--   1 navis supergroup   4317 2013-06-20 11:05 
 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
 -rw-r--r--   1 navis supergroup   5868 2013-06-20 11:05 
 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
 -rw-r--r--   1 navis supergroup   5214 2013-06-20 11:05 
 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
 -rw-r--r--   1 navis supergroup   4290 2013-06-20 11:05 
 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf2
 -rw-r--r--   1 navis supergroup   6744 2013-06-20 11:05 
 /tmp/export/cf2/409673b517d94e16920e445d07710f52
 -rw-r--r--   1 navis supergroup   4975 2013-06-20 11:05 
 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
 -rw-r--r--   1 navis supergroup   6096 2013-06-20 11:05 
 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
 -rw-r--r--   1 navis supergroup   4890 2013-06-20 11:05 
 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-15 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.8.patch

Attaching my updated patch. It includes changes to the hbase test drivers so 
that there are snapshots available for testing from q files.

[~tenggyut]: I'll have a look over your patch tomorrow. Maybe we can put our 
stuff together and get a working new feature :)

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, 
 HIVE-6584.7.patch, HIVE-6584.8.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-20 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039467#comment-14039467
 ] 

Nick Dimiduk commented on HIVE-6584:


Can you regenerate your patch, rooted in the trunk directory instead of above 
it? That's the reason this patch fails the buildbot.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-13 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.4.patch

Rebased onto trunk and fixed two broken hbase tests.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-12 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029497#comment-14029497
 ] 

Nick Dimiduk commented on HIVE-6584:


Thanks for the insightful comments, [~tenggyut].

bq. 1. HBaseStorageHandler.getInputFormatClass(): I am afraid that the returned 
input format will always be HiveHBaseTableInputFormat (at least according to my 
test)

I was afraid of this in my initial design thinking, but my experiments proved 
otherwise. Can you elaborate on your tests? I'd like to reproduce this issue if 
I'm able.

bq. 2. In the method HBaseStorageHandler.preCreateTable, Hive will check 
whether the HBase table exists or not, regardless of whether the external table 
that Hive is going to create is based on an actual table or on a snapshot.

I haven't yet looked at the use-case of consuming a snapshot for which there is 
no table in HBase. I planned to approach this kind of feature in follow-on 
work; the goal here is to get just the basics working.

bq. 3, 4 [snip]

These are both true.

bq. So I suggest adding a subclass of HBaseStorageHandler (and other necessary 
classes), say HBaseSnapshotStorageHandler, to deal with the HBase snapshot 
situation.

A goal of this patch is to be able to query snapshots created from online 
tables that are already registered with Hive via the HBaseStorageHandler. 
Implementing an HBaseSnapshotStorageHandler would require a separate table 
registration for the snapshot, which I think is undesirable. Regarding the 
HBase snapshot situation, let's make it better on the HBase side. What do you 
recommend?
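
To illustrate the distinction, a hedged sketch of the intended usage under this 
design follows; the DDL, column mapping, and snapshot name are illustrative and 
are not taken from the patch:

{noformat}
-- one registration, against the live HBase table
CREATE EXTERNAL TABLE foo(key string, val string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val");

-- normal online read through the region servers
select count(*) from foo;

-- read the same registered table from a snapshot; no second registration
set hive.hbase.snapshot.name=foo_snap;
select count(*) from foo;
{noformat}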

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-11 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.3.patch

Ping. Rebased onto trunk.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-11 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Fix Version/s: 0.14.0

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-11 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Status: Patch Available  (was: Open)

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7197) Enabled and address flakiness of hbase_bulk.m

2014-06-09 Thread Nick Dimiduk (JIRA)
Nick Dimiduk created HIVE-7197:
--

 Summary: Enabled and address flakiness of hbase_bulk.m
 Key: HIVE-7197
 URL: https://issues.apache.org/jira/browse/HIVE-7197
 Project: Hive
  Issue Type: Test
  Components: HBase Handler
Reporter: Nick Dimiduk
Priority: Minor


There's a nice e2e test for the existing bulkload workflow, but it's disabled. 
We should turn it on and fix what's broken.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-06-09 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025318#comment-14025318
 ] 

Nick Dimiduk commented on HIVE-6473:


bq. Removed enabling of hbase_bulk.m; it mostly passes but is flakey for me. 
Will address it in a follow-on ticket.

Opened HIVE-7197.

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.
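
A minimal hedged sketch of the workflow this enables, using the 
hive.hbase.generatehfiles and hfile.family.path settings from the release-note 
updates below; the table name, column family, and output path here are 
illustrative:

{noformat}
-- write HFiles through an existing HBaseStorageHandler table instead of
-- registering a second table with an HFile output format
set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/cf;
INSERT OVERWRITE TABLE hbase_table SELECT key, value FROM src;
-- the HFiles written under /tmp/cf are then bulk-loaded in a separate step,
-- e.g. with HBase's LoadIncrementalHFiles tool
{noformat}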



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7197) Enable and address flakiness of hbase_bulk.m

2014-06-09 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-7197:
---

Summary: Enable and address flakiness of hbase_bulk.m  (was: Enabled and 
address flakiness of hbase_bulk.m)

 Enable and address flakiness of hbase_bulk.m
 

 Key: HIVE-7197
 URL: https://issues.apache.org/jira/browse/HIVE-7197
 Project: Hive
  Issue Type: Test
  Components: HBase Handler
Reporter: Nick Dimiduk
Priority: Minor

 There's a nice e2e test for the existing bulkload workflow, but it's disabled. 
 We should turn it on and fix what's broken.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-2365) SQL support for bulk load into HBase

2014-06-09 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-2365:
---

Status: Patch Available  (was: Open)

 SQL support for bulk load into HBase
 

 Key: HIVE-2365
 URL: https://issues.apache.org/jira/browse/HIVE-2365
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: John Sichi
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-2365.2.patch.txt, HIVE-2365.3.patch, 
 HIVE-2365.3.patch, HIVE-2365.WIP.00.patch, HIVE-2365.WIP.01.patch, 
 HIVE-2365.WIP.01.patch


 Support SQL as simple as this for bulk load from Hive into HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-06-09 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6473:
---

Release Note: 
Allows direct creation of HFiles and location for them as part of 
HBaseStorageHandler write if the following properties are specified in the HQL:

set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/columnfamily_name;

  was:
Allows direct creation of HFiles and location for them as part of 
HBaseStorageHandler write if the following properties are specified in the HQL:

set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/hfilelocn;


 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-06-09 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6473:
---

Release Note: 
Allows direct creation of HFiles and location for them as part of 
HBaseStorageHandler write if the following properties are specified in the HQL:

set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/columnfamily_name;

hfile.family.path can also be set as a table property, HQL value takes 
precedence.

  was:
Allows direct creation of HFiles and location for them as part of 
HBaseStorageHandler write if the following properties are specified in the HQL:

set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/columnfamily_name;


 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-06-09 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025376#comment-14025376
 ] 

Nick Dimiduk commented on HIVE-6473:


Made a small change. Thanks Sushanth!

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-06-06 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6473:
---

Fix Version/s: 0.14.0

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-06-06 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6473:
---

Attachment: HIVE-6473.6.patch

Rebased onto trunk again. Removed enabling of hbase_bulk.m; it mostly passes 
but is flakey for me. Will address it in a follow-on ticket.

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-2365) SQL support for bulk load into HBase

2014-06-06 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-2365:
---

Attachment: HIVE-2365.3.patch

Rebased onto HIVE-6473 patch v6.

 SQL support for bulk load into HBase
 

 Key: HIVE-2365
 URL: https://issues.apache.org/jira/browse/HIVE-2365
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: John Sichi
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-2365.2.patch.txt, HIVE-2365.3.patch, 
 HIVE-2365.3.patch, HIVE-2365.WIP.00.patch, HIVE-2365.WIP.01.patch, 
 HIVE-2365.WIP.01.patch


 Support SQL as simple as this for bulk load from Hive into HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-06-03 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6473:
---

Attachment: HIVE-6473.5.patch

Test still passes locally for me:

{noformat}
---
Test set: org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver
---
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.33 sec - in 
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver
{noformat}

Re-attaching patch for build bot.

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-2365) SQL support for bulk load into HBase

2014-06-03 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-2365:
---

Status: Open  (was: Patch Available)

Canceling patch until dependency HIVE-6473 is committed.

 SQL support for bulk load into HBase
 

 Key: HIVE-2365
 URL: https://issues.apache.org/jira/browse/HIVE-2365
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: John Sichi
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-2365.2.patch.txt, HIVE-2365.3.patch, 
 HIVE-2365.WIP.00.patch, HIVE-2365.WIP.01.patch, HIVE-2365.WIP.01.patch


 Support SQL as simple as this for bulk load from Hive into HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-06-03 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017258#comment-14017258
 ] 

Nick Dimiduk commented on HIVE-6473:


Ping [~sushanth], [~swarnim] mind taking another look?

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-05-22 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.2.patch

Rebased onto trunk, mostly clean, though there were some changes to merge since 
HIVE-6411.

Also updated pom.xml to hbase-0.98.3. This will be released by the end of the 
month and will include the dependency, HBASE-11137.

I'm still looking for advice for adding tests.

Please have a look, [~sushanth], [~ashutoshc], [~sershe], [~swarnim], [~navis].

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-05-22 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6473:
---

Attachment: HIVE-6473.4.patch

Ping. Rebased onto trunk.

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, HIVE-6473.4.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-2365) SQL support for bulk load into HBase

2014-05-22 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-2365:
---

Attachment: HIVE-2365.3.patch

Rebased onto the latest patch on HIVE-6473. This way it's easier to see what 
has changed, and how, in order to use the MoveTask to handle 
LoadIncrementalHFiles. All three modified tests pass locally for me.
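
A hedged sketch of the intended end state: the INSERT writes HFiles and the 
MoveTask then invokes LoadIncrementalHFiles, so no separate hbase CLI step is 
needed afterwards. The table, path, and query are illustrative (not the actual 
q-file contents), and carrying over the HIVE-6473 settings here is an 
assumption:

{noformat}
-- with HIVE-6473 alone: the INSERT below writes HFiles under hfile.family.path
-- and a separate LoadIncrementalHFiles invocation finishes the bulk load
-- with this patch: the MoveTask runs LoadIncrementalHFiles itself, so the rows
-- are queryable from HBase once the statement returns
set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/cf;
insert overwrite table hbase_table select key, value from src order by key;
{noformat}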

 SQL support for bulk load into HBase
 

 Key: HIVE-2365
 URL: https://issues.apache.org/jira/browse/HIVE-2365
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: John Sichi
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-2365.2.patch.txt, HIVE-2365.3.patch, 
 HIVE-2365.WIP.00.patch, HIVE-2365.WIP.01.patch, HIVE-2365.WIP.01.patch


 Support SQL as simple as this for bulk load from Hive into HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 18492: HIVE-6473: Allow writing HFiles via HBaseStorageHandler table

2014-05-22 Thread nick dimiduk

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18492/
---

(Updated May 22, 2014, 3:53 p.m.)


Review request for hive.


Changes
---

patch v4 from JIRA.


Bugs: HIVE-6473
https://issues.apache.org/jira/browse/HIVE-6473


Repository: hive-git


Description
---

From the JIRA:

Generating HFiles for bulkload into HBase could be more convenient. Right now 
we require the user to register a new table with the appropriate output format. 
This patch allows the exact same functionality, but through an existing table 
managed by the HBaseStorageHandler.


Diffs (updated)
-

  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 
255ffa2 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java 
be1210e 
  hbase-handler/src/test/queries/negative/generatehfiles_require_family_path.q 
PRE-CREATION 
  hbase-handler/src/test/queries/positive/hbase_bulk.m f8bb47d 
  hbase-handler/src/test/queries/positive/hbase_bulk.q PRE-CREATION 
  hbase-handler/src/test/queries/positive/hbase_handler_bulk.q PRE-CREATION 
  
hbase-handler/src/test/results/negative/generatehfiles_require_family_path.q.out
 PRE-CREATION 
  hbase-handler/src/test/results/positive/hbase_bulk.q.out PRE-CREATION 
  hbase-handler/src/test/results/positive/hbase_handler_bulk.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/18492/diff/


Testing
---


Thanks,

nick dimiduk


