Re: [ANNOUNCE] New Hive PMC Member - Sushanth Sowmyan
Nice work! Congratulations Sushanth! On Wed, Jul 22, 2015 at 9:45 AM, Carl Steinbach c...@apache.org wrote: I am pleased to announce that Sushanth Sowmyan has been elected to the Hive Project Management Committee. Please join me in congratulating Sushanth! Thanks. - Carl
Re: ApacheCON EU HBase Track Submissions
Get your submissions in, the deadline is imminent! On Thu, Jun 25, 2015 at 11:30 AM, Nick Dimiduk ndimi...@apache.org wrote: Hello developers, users, speakers, As part of ApacheCON's inaugural Apache: Big Data, I'm hoping to see a HBase: NoSQL + SQL track come together. The idea is to showcase the growing ecosystem of applications and tools built on top of and around Apache HBase. To have a track, we need content, and that's where YOU come in. CFP for ApacheCon closes in one week, July 1. Get your HBase + Hive talks submitted so we can pull together a full day of great HBase ecosystem talks! Already planning to submit a talk on Hive? Work in HBase and we'll get it promoted as part of the track! Thanks, Nick ApacheCON EU Sept 28 - Oct 2 Corinthia Hotel, Budapest, Hungary (a beautiful venue in an awesome city!) http://apachecon.eu/ CFP link: http://events.linuxfoundation.org/cfp/dashboard
Re: ApacheCON EU HBase Track Submissions
On Thu, Jun 25, 2015 at 12:13 PM, Owen O'Malley omal...@apache.org wrote: Actually, Apache: Big Data Europe CFP closes 10 July. The CFP is: http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp My message was regarding a track for ApacheCON EU Core, for which the CFP closes July 1. Communities are not given the opportunity to organize tracks for the big data conference.
ApacheCON EU HBase Track Submissions
Hello developers, users, speakers, As part of ApacheCON's inaugural Apache: Big Data, I'm hoping to see an HBase: NoSQL + SQL track come together. The idea is to showcase the growing ecosystem of applications and tools built on top of and around Apache HBase. To have a track, we need content, and that's where YOU come in. CFP for ApacheCon closes in one week, July 1. Get your HBase + Hive talks submitted so we can pull together a full day of great HBase ecosystem talks! Already planning to submit a talk on Hive? Work in HBase and we'll get it promoted as part of the track! Thanks, Nick ApacheCON EU Sept 28 - Oct 2 Corinthia Hotel, Budapest, Hungary (a beautiful venue in an awesome city!) http://apachecon.eu/ CFP link: http://events.linuxfoundation.org/cfp/dashboard
Re: [DISCUSS] Supporting Hadoop-1 and experimental features
On Fri, May 22, 2015 at 1:19 PM, Alan Gates alanfga...@gmail.com wrote: I see your point on saying the contributor may not understand where best to put the patch, and thus the committer decides. However, it would be very disappointing for a contributor who uses branch-1 to build a new feature only to have the committer put it only in master. So I would modify your modification to say at the discretion of the contributor and Hive committers. For what it's worth, this is more or less how HBase works. All features land first in master and then percolate backwards to open, active branches where it's acceptable to do so. Since our 1.0 release, we're trying to make 1.0+ follow semantic versioning more closely. This means that new features never land in a released minor branch. Bug fixes are applied to all applicable branches; sometimes this means older release branches and not master. Sometimes that means contributors are forced to upgrade in order to take advantage of their contribution in an Apache release (they're fine to run their own patched builds as they like; it's open source). Right now we have:
master - (unreleased, development branch for eventual 2.0)
branch-1 - (unreleased, development branch for 1.x series, soon to be branch basis for 1.2)
branch-1.1 - (released branch, accepting only bug fixes for 1.1.x line)
branch-1.0 - (released branch, accepting only bug fixes for 1.0.x line)
When we're ready, branch-1.2 will fork from branch-1 and branch-1 will become the development branch for 1.3. Eventually we'll decide it's time for 2.0 and master will be branched, creating branch-2. branch-2 will follow the same process. We also maintain active branches for 0.98.x and 0.94.x. These branches are different, following our old model of receiving backward-compatible new features in .x versions. 0.94 is basically retired now, only getting bug fixes.
0.94 is only hadoop-1, 0.98 supports both hadoop-1 and hadoop-2 (maybe we've retired hadoop-2 support here in the .12 release?), 1.x supports hadoop-2 only. 2.0 is undecided, but presumably will be hadoop-2 and hadoop-3 if we can extend our shim layer for it. We have separate release managers for 0.94, 0.98, 1.0, and 1.1, and we're discussing preparations for 1.2. They enforce commits against their respective branches. kulkarni.swar...@gmail.com May 22, 2015 at 11:41 +1 on the new proposal. Feedback below: New features must be put into master. Whether to put them into branch-1 is at the discretion of the developer. How about we change this to: *All* features must be put into master. Whether to put them into branch-1 is at the discretion of the *committer*. The reason, I think, is that going forward, for us to sustain as a happy and healthy community, it's imperative for us to make it easy not only for the users, but also for developers and committers to contribute/commit patches. To me, as a hive contributor, it would be hard to determine which branch my code belongs in. Also, IMO (and I might be wrong), many committers have their own areas of expertise and it's also very hard for them to immediately determine which branch a patch should go to unless it is very well documented somewhere. Putting all code into master would be an easy approach to follow, and then cherry-picking to other branches can be done. So even if people forget to do that, we can always go back to master and port the patches out to these branches. So we have a master branch, a branch-1 for stable code, branch-2 for experimental and bleeding edge code, and so on. Once branch-2 is stable, we deprecate branch-1, create branch-3 and move on. Another reason I say this is because, in my experience, a pretty significant amount of work in hive is still bug fixes, and I think that is what the user cares most about (correctness above anything else).
So with this approach, it might be very obvious what branches to commit a change to. -- Swarnim Chris Drome cdr...@yahoo-inc.com.INVALID May 22, 2015 at 0:49 I understand the motivation and benefits of creating a branch-2 where more disruptive work can go on without affecting branch-1. While not necessarily against this approach, from Yahoo's standpoint, I do have some questions (concerns). Upgrading to a new version of Hive requires a significant commitment of time and resources to stabilize and certify a build for deployment to our clusters. Given the size of our clusters and scale of datasets, we have to be particularly careful about adopting new functionality. However, at the same time we are interested in testing and making available new features and functionality. That said, we would have to rely on branch-1 for the immediate future. One concern is that branch-1 would be left to stagnate, at which point there would be no option but for users to move to branch-2, as branch-1 would be effectively end-of-lifed. I'm not sure how long this would take, but it would eventually happen as a direct result of the
Re: [DISCUSS] Hive/HBase Integration
I'm particularly interested in how the extraction of ORC will necessitate creation of a more expressive API for Hive's storage tier. StorageHandlerV2 should be considered with HBase in mind as well as ORC. On Sat, May 9, 2015 at 9:37 PM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Hello all, So last week, Brock Noland, Nick Dimiduk, and I got a chance to present some of the work we have been doing in the Hive/HBase integration space at HBaseCon 2015 (slides here[1] for anyone interested). One of the interesting things that we noted at this conference was that even though this was an HBase conference, *SQL on HBase* was by far the most popular theme, with talks on Apache Phoenix, Trafodion, Apache Kylin, Apache Drill and a SQL-On-HBase panel to compare these and other technologies. I personally feel that with the existing work we have come a long way, but we still have work to do and the integration would need more love to make this a top-notch feature of Hive. However, I was curious to know what the community thinks about it and where they see this integration standing in the time to come, compared with all the other upcoming techs? Thanks, Swarnim [1] https://docs.google.com/presentation/d/1K2A2NMsNbmKWuG02aUDxsLo0Lal0lhznYy8SB6HjC9U/edit#slide=id.p
Re: VOTE: move to git
+1 On Wed, Apr 15, 2015 at 3:18 PM, Jimmy Xiang jxi...@cloudera.com wrote: +1 On Wed, Apr 15, 2015 at 3:09 PM, Vivek Shrivastava vivshrivast...@gmail.com wrote: +1 On Apr 15, 2015 6:06 PM, Vaibhav Gumashta vgumas...@hortonworks.com wrote: +1 On 4/15/15, 2:55 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: +1 On Apr 15, 2015, at 2:48 PM, Gopal Vijayaraghavan gop...@apache.org wrote: +1 End the merge madness. On 4/15/15, 11:46 PM, Sergey Shelukhin ser...@apache.org wrote: Hi. We discussed this some time ago; this time I'd like to start an official vote about moving the Hive project to git from svn. I volunteer to facilitate the move; that seems to be just filing an INFRA jira and following instructions, such as verifying that the new repo is sane. Please vote: +1 move to git 0 don't care -1 stay on svn +1.
Re: ORC separate project
I think the storage-api would be very helpful for HBase integration as well. On Wed, Apr 1, 2015 at 11:22 AM, Owen O'Malley omal...@apache.org wrote: On Wed, Apr 1, 2015 at 10:10 AM, Alan Gates alanfga...@gmail.com wrote: Carl Steinbach cwsteinb...@gmail.com April 1, 2015 at 0:01 Hi Owen, I think you're referring to the following questions I asked last week on the PMC mailing list: 1) How much, if any, of the code for vectorization/sargs/ACID will migrate over to the new ORC project? 2) Will Hive contributors encounter situations where they are required to make changes to ORC in order to complete work on projects related to vectorization/sargs/ACID or other Hive features? What I'd like to see here is well-defined interfaces in Hive so that any storage format that wants to can implement them. Hopefully that means things like interfaces and utility classes for acid, sargs, and vectorization move into this new Hive module, storage-api. Then Orc, Parquet, etc. can depend on this module without needing to pull in all of Hive. Then Hive contributors would only be forced to make changes in Orc when they want to implement something in Orc. Agreed. The goal of the new module is to keep a clean separation between the code for ORC and Hive, so that vectorization, sargs, and acid are kept in Hive and are not moved to or duplicated in the ORC project. - Owen
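The module layout Alan and Owen describe can be pictured in miniature. All names below are hypothetical, invented purely to show the dependency direction the thread proposes (a small storage-api module owning the interfaces; format modules implementing them without depending on hive-exec); this is not the real storage-api code.

```java
// Hypothetical miniature of the storage-api idea. Hive would own small
// interfaces; each format (ORC, Parquet, ...) would implement them without
// depending on hive-exec. All class names here are illustrative only.

// --- would live in the "storage-api" module ---
interface SearchArgument {                 // a predicate to push down to the format
    boolean test(long value);
}

interface RowReader {
    boolean hasNext();
    long next();                           // toy model: a row is just a long
}

// --- would live in a format module that depends only on storage-api ---
class ToyFormatReader implements RowReader {
    private final long[] rows;
    private final SearchArgument sarg;
    private int i = 0;

    ToyFormatReader(long[] rows, SearchArgument sarg) {
        this.rows = rows;
        this.sarg = sarg;
        advance();
    }

    private void advance() {               // skip rows the sarg rejects
        while (i < rows.length && !sarg.test(rows[i])) i++;
    }

    public boolean hasNext() { return i < rows.length; }

    public long next() { long v = rows[i++]; advance(); return v; }
}

public class StorageApiSketch {
    public static void main(String[] args) {
        // The "engine" sees only the storage-api types, not the format class.
        RowReader r = new ToyFormatReader(new long[]{1, 5, 9, 12}, v -> v > 4);
        long sum = 0;
        while (r.hasNext()) sum += r.next();
        System.out.println(sum);           // 5 + 9 + 12 = 26
    }
}
```

The point of the shape is the one Alan makes: the format module never imports anything outside the small interface module, so a standalone ORC (or Parquet) can ship against storage-api alone.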
Re: ORC separate project
This is a great plan, +1! On Thursday, March 19, 2015, Owen O'Malley omal...@apache.org wrote: All, Over the last year, there have been a fair number of projects that want to integrate with ORC, but don't want a dependence on Hive's exec jar. Additionally, we've been working on a C++ reader (and soon writer) and it would be great to host them both in the same project. Toward that end, I'd like to create a separate ORC project at Apache. There will be lots of technical details to work out, but I wanted to give the Hive community a chance to discuss it. Do any of the Hive committers want to be included on the proposal? Of the current Hive committers, my list looks like: * Alan * Gunther * Prasanth * Lefty * Owen * Sergey * Gopal * Kevin Did I miss anyone? Thanks! Owen
HS2 standalone JDBC jar not standalone
Heya, I'd like to use jmeter against HS2/JDBC and I'm finding the standalone jar isn't actually standalone. It appears to include a number of dependencies, but not the Hadoop Common stuff. Is there a packaging of this jar that is actually standalone? Are there instructions for using this standalone jar as it is? Thanks, Nick
jmeter.JMeter: Uncaught exception: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
  at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:394)
  at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:188)
  at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:164)
  at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
  at java.sql.DriverManager.getConnection(DriverManager.java:571)
  at java.sql.DriverManager.getConnection(DriverManager.java:233)
  at org.apache.avalon.excalibur.datasource.JdbcConnectionFactory.<init>(JdbcConnectionFactory.java:138)
  at org.apache.avalon.excalibur.datasource.ResourceLimitingJdbcDataSource.configure(ResourceLimitingJdbcDataSource.java:311)
  at org.apache.jmeter.protocol.jdbc.config.DataSourceElement.initPool(DataSourceElement.java:235)
  at org.apache.jmeter.protocol.jdbc.config.DataSourceElement.testStarted(DataSourceElement.java:108)
  at org.apache.jmeter.engine.StandardJMeterEngine.notifyTestListenersOfStart(StandardJMeterEngine.java:214)
  at org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:336)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
  ... 13 more
Re: HS2 standalone JDBC jar not standalone
Thanks Alexander! On Mon, Mar 2, 2015 at 10:31 AM, Alexander Pivovarov apivova...@gmail.com wrote: yes, we even have a ticket for that https://issues.apache.org/jira/browse/HIVE-9600 btw can anyone test jdbc driver with kerberos enabled? https://issues.apache.org/jira/browse/HIVE-9599
Re: [ANNOUNCE] New Hive PMC Member - Sergey Shelukhin
Nice work Sergey! On Wednesday, February 25, 2015, Carl Steinbach c...@apache.org wrote: I am pleased to announce that Sergey Shelukhin has been elected to the Hive Project Management Committee. Please join me in congratulating Sergey! Thanks. - Carl
[jira] [Updated] (HIVE-4765) Improve HBase bulk loading facility
[ https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-4765: --- Fix Version/s: 1.2.0 Planting a stake to get this in for 1.2.0.
Improve HBase bulk loading facility --- Key: HIVE-4765 URL: https://issues.apache.org/jira/browse/HIVE-4765 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Fix For: 1.2.0 Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, HIVE-4765.D11463.1.patch
With some patches, the bulk loading process for HBase could be simplified a lot.
{noformat}
CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf1:key,cf2:value')
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
LOCATION '/tmp/export';

SET mapred.reduce.tasks=4;
SET hive.optimize.sampling.orderby=true;

INSERT OVERWRITE TABLE hbase_export
SELECT * FROM (
  SELECT union_kv(key, key, value, ':key,cf1:key,cf2:value') AS (rowkey, union)
  FROM src
) A ORDER BY rowkey, union;

hive> !hadoop fs -lsr /tmp/export;
drwxr-xr-x - navis supergroup 0 2013-06-20 11:05 /tmp/export/cf1
-rw-r--r-- 1 navis supergroup 4317 2013-06-20 11:05 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
-rw-r--r-- 1 navis supergroup 5868 2013-06-20 11:05 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
-rw-r--r-- 1 navis supergroup 5214 2013-06-20 11:05 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
-rw-r--r-- 1 navis supergroup 4290 2013-06-20 11:05 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
drwxr-xr-x - navis supergroup 0 2013-06-20 11:05 /tmp/export/cf2
-rw-r--r-- 1 navis supergroup 6744 2013-06-20 11:05 /tmp/export/cf2/409673b517d94e16920e445d07710f52
-rw-r--r-- 1 navis supergroup 4975 2013-06-20 11:05 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
-rw-r--r-- 1 navis supergroup 6096 2013-06-20 11:05 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
-rw-r--r-- 1 navis supergroup 4890 2013-06-20 11:05 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9591) Add support for OrderedByte encodings
Nick Dimiduk created HIVE-9591: -- Summary: Add support for OrderedByte encodings Key: HIVE-9591 URL: https://issues.apache.org/jira/browse/HIVE-9591 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk HBase has added out-of-the-box support for order-preserving data encoding for many common primitive types (HBASE-8201). The StorageHandler should add support for these encodings, which will increase the range of predicates that can be pushed down to HBase. I propose adding a {{#o / #orderedbytes}} specifier to the column mappings to enable this hint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
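HBase's real OrderedBytes format (HBASE-8201) covers many types and encodings; the property that makes it useful for pushdown is that unsigned byte-wise comparison of the encoded values matches numeric comparison of the original values. A toy long encoder (illustration only, not the OrderedBytes wire format) shows the core trick:

```java
// Illustration only: the real org.apache.hadoop.hbase.util.OrderedBytes
// format is more elaborate. This shows the essential idea behind
// order-preserving encodings: flip the sign bit so that unsigned byte
// comparison (how HBase compares row keys) agrees with numeric order.
public class OrderedEncodingSketch {

    // Encode a signed long into 8 big-endian bytes whose unsigned
    // lexicographic order matches the numeric order of the inputs.
    static byte[] encodeLong(long v) {
        long biased = v ^ Long.MIN_VALUE;  // maps MIN_VALUE..MAX_VALUE onto 0..2^64-1
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) (biased & 0xFF);
            biased >>>= 8;
        }
        return out;
    }

    // Unsigned lexicographic comparison, as HBase compares row keys.
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long[] vals = {Long.MIN_VALUE, -42, -1, 0, 1, 42, Long.MAX_VALUE};
        for (int i = 0; i + 1 < vals.length; i++) {
            // Encoded order must agree with numeric order at every step.
            if (compareBytes(encodeLong(vals[i]), encodeLong(vals[i + 1])) >= 0) {
                throw new AssertionError("order not preserved at " + vals[i]);
            }
        }
        System.out.println("byte order matches numeric order");
    }
}
```

Because of this property, a predicate like `col > 5` can be rewritten as a byte-range scan over the encoded keys, which is what lets the StorageHandler push it down to HBase.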
[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler
[ https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-7805: --- Priority: Major (was: Minor)
Support running multiple scans in hbase-handler --- Key: HIVE-7805 URL: https://issues.apache.org/jira/browse/HIVE-7805 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.14.0 Reporter: Andrew Mains Assignee: Andrew Mains Attachments: HIVE-7805.1.patch, HIVE-7805.patch
Currently, the HiveHBaseTableInputFormat only supports running a single scan. This can be less efficient than running multiple disjoint scans in certain cases, particularly when using a composite row key. For instance, given a row key schema of:
{code}
struct<bucket int, time timestamp>
{code}
if one wants to push down the predicate:
{code}
bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670
{code}
it's much more efficient to run a scan for each bucket over the time range (particularly if there's a large amount of data per day). With a single scan, the MR job has to process the data for all time for buckets in between 1 and 100. Hive should allow HBaseKeyFactory implementations to decompose a predicate into one or more scans in order to take advantage of this fact. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
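The pruning the ticket describes can be shown with a back-of-the-envelope model (toy data and toy scan accounting, not the hbase-handler API):

```java
import java.util.Arrays;

// Toy model of the ticket's argument: rows are keyed by (bucket, time),
// stored in key order. A single scan spanning the lowest to the highest
// requested bucket must read every intermediate bucket's rows for all time,
// while one disjoint scan per bucket reads only that bucket's time range.
public class MultiScanSketch {

    // Rows a single scan must process: everything with bucket between the
    // min and max requested buckets (it cannot prune on time in between).
    static int singleScanRowCount(long[][] rows, int[] buckets) {
        int min = Arrays.stream(buckets).min().getAsInt();
        int max = Arrays.stream(buckets).max().getAsInt();
        int n = 0;
        for (long[] r : rows) if (r[0] >= min && r[0] <= max) n++;
        return n;
    }

    // Rows touched by one scan per requested bucket over [lo, hi).
    static int disjointScanRowCount(long[][] rows, int[] buckets, long lo, long hi) {
        int n = 0;
        for (int b : buckets)
            for (long[] r : rows)
                if (r[0] == b && r[1] >= lo && r[1] < hi) n++;
        return n;
    }

    public static void main(String[] args) {
        // 100 buckets x 10 time points = 1000 rows.
        long[][] rows = new long[1000][];
        int i = 0;
        for (int b = 1; b <= 100; b++)
            for (long t = 0; t < 10; t++)
                rows[i++] = new long[]{b, t};

        int[] buckets = {1, 10, 100};
        System.out.println("single scan: " + singleScanRowCount(rows, buckets)
            + " rows; disjoint scans: " + disjointScanRowCount(rows, buckets, 3, 7) + " rows");
    }
}
```

With buckets {1, 10, 100} and a 4-point time window, the single scan touches all 1000 rows while the three disjoint scans touch 12, which is the gap the patch aims to exploit.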
[jira] [Commented] (HIVE-7805) Support running multiple scans in hbase-handler
[ https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307771#comment-14307771 ] Nick Dimiduk commented on HIVE-7805: Bumping priority for a nice, easy performance gain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9521) Drop support for Java6
[ https://issues.apache.org/jira/browse/HIVE-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302555#comment-14302555 ] Nick Dimiduk commented on HIVE-9521: Test failure looks unrelated and console output looks clean. Can I get a +1/commit? :) Drop support for Java6 -- Key: HIVE-9521 URL: https://issues.apache.org/jira/browse/HIVE-9521 Project: Hive Issue Type: Improvement Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 1.2.0 Attachments: HIVE-9521.00.patch As a logical continuation of HIVE-4583, let's start using java7 syntax as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9521) Drop support for Java6
[ https://issues.apache.org/jira/browse/HIVE-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-9521: --- Attachment: HIVE-9521.00.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9521) Drop support for Java6
[ https://issues.apache.org/jira/browse/HIVE-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-9521: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9521) Drop support for Java6
Nick Dimiduk created HIVE-9521: -- Summary: Drop support for Java6 Key: HIVE-9521 URL: https://issues.apache.org/jira/browse/HIVE-9521 Project: Hive Issue Type: Improvement Reporter: Nick Dimiduk Fix For: 1.2.0 As a logical continuation of HIVE-4583, let's start using java7 syntax as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
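For readers wondering what "java7 syntax" buys, a few of the language features the ticket unlocks can be sketched in one place (illustrative snippet, not code from the actual patch):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Illustrates some Java 7 language features that dropping Java 6 enables:
// the diamond operator, try-with-resources, and strings in switch.
public class Java7Sketch {

    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();        // diamond operator
        // try-with-resources closes the reader automatically
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = r.readLine()) != null) lines.add(line);
        } catch (IOException e) {
            throw new RuntimeException(e);             // cannot happen for StringReader
        }
        return lines;
    }

    static String classify(String s) {
        switch (s) {                                   // strings in switch
            case "a": return "first";
            case "b": return "second";
            default:  return "other";
        }
    }

    public static void main(String[] args) {
        int million = 1_000_000;                       // underscores in numeric literals
        System.out.println(readLines("a\nb").size() + " lines; "
            + classify("a") + "; " + million);
    }
}
```

None of this compiles under a Java 6 source level, which is why the build has to stop targeting it first.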
[jira] [Updated] (HIVE-9504) [beeline] ZipException when using !scan
[ https://issues.apache.org/jira/browse/HIVE-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-9504: --- Attachment: (was: HIVE-9504.00.patch)
[beeline] ZipException when using !scan --- Key: HIVE-9504 URL: https://issues.apache.org/jira/browse/HIVE-9504 Project: Hive Issue Type: Bug Components: Beeline Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9504.00.patch
Noticed this while mucking around:
{noformat}
0: jdbc:hive2://localhost:1/ !scan
java.util.zip.ZipException: error in opening zip file
  at java.util.zip.ZipFile.open(Native Method)
  at java.util.zip.ZipFile.<init>(ZipFile.java:220)
  at java.util.zip.ZipFile.<init>(ZipFile.java:150)
  at java.util.jar.JarFile.<init>(JarFile.java:166)
  at java.util.jar.JarFile.<init>(JarFile.java:130)
  at org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128)
  at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589)
  at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579)
  at org.apache.hive.beeline.Commands.scan(Commands.java:278)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:483)
  at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
  at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935)
  at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778)
  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740)
  at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470)
  at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:483)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
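The trace shows `ClassNameCompleter` opening every classpath entry as a zip and letting the first `ZipException` abort the whole `!scan`. One plausible fix direction, sketched standalone here (this is not the actual beeline patch; the class and method names are invented for illustration), is to skip entries that are not valid zip archives:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of a defensive jar scan: try to open each candidate
// as a zip, and skip (rather than propagate) entries that fail to open.
public class JarScanSketch {

    static List<Path> readableJars(List<Path> candidates) {
        List<Path> ok = new ArrayList<>();
        for (Path p : candidates) {
            try (ZipFile zf = new ZipFile(p.toFile())) {
                ok.add(p);                 // opened cleanly: a real archive
            } catch (IOException e) {
                // ZipException (not a zip) or other I/O trouble: skip this entry
            }
        }
        return ok;
    }

    // Build one valid jar and one bogus "jar" (plain text), then scan both.
    static int demoCount() {
        try {
            Path bogus = Files.createTempFile("not-a-jar", ".jar");
            Files.write(bogus, "plain text, not a zip".getBytes(StandardCharsets.UTF_8));

            Path good = Files.createTempFile("real", ".jar");
            try (ZipOutputStream z = new ZipOutputStream(Files.newOutputStream(good))) {
                z.putNextEntry(new ZipEntry("X.class"));
                z.closeEntry();
            }
            return readableJars(List.of(good, bogus)).size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("readable jars: " + demoCount());  // bogus file is skipped
    }
}
```

The corresponding real fix is tracked in HIVE-9600, as Alexander notes elsewhere in this archive.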
[jira] [Updated] (HIVE-9504) [beeline] ZipException when using !scan
[ https://issues.apache.org/jira/browse/HIVE-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-9504: --- Attachment: HIVE-9504.00.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9504) [beeline] ZipException when using !scan
[ https://issues.apache.org/jira/browse/HIVE-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-9504: --- Status: Patch Available (was: Open) [beeline] ZipException when using !scan --- Key: HIVE-9504 URL: https://issues.apache.org/jira/browse/HIVE-9504 Project: Hive Issue Type: Bug Components: Beeline Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9504.00.patch Notice this while mucking around: {noformat} 0: jdbc:hive2://localhost:1/ !scan java.util.zip.ZipException: error in opening zip file at java.util.zip.ZipFile.open(Native Method) at java.util.zip.ZipFile.init(ZipFile.java:220) at java.util.zip.ZipFile.init(ZipFile.java:150) at java.util.jar.JarFile.init(JarFile.java:166) at java.util.jar.JarFile.init(JarFile.java:130) at org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128) at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589) at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579) at org.apache.hive.beeline.Commands.scan(Commands.java:278) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935) at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
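The attached patch itself is not reproduced in this archive, so the following is only a hedged illustration of the failure mode, not Hive's actual fix: the stack trace shows a classpath entry that is not a valid zip archive being handed to java.util.jar.JarFile, whose constructor throws ZipException. A defensive scanner (the class SafeJarScanner below is hypothetical) would skip such entries rather than let the whole !scan command abort:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical sketch of a defensive jar scan. A classpath entry that is
// not a readable zip/jar makes new JarFile(f) throw ZipException (a
// subclass of IOException); catching it lets the scan continue.
public class SafeJarScanner {
    // Returns only the entries that can actually be opened as jars,
    // silently skipping anything that is not a valid zip file.
    public static List<File> scannableJars(List<File> candidates) {
        List<File> usable = new ArrayList<>();
        for (File f : candidates) {
            try (JarFile jar = new JarFile(f)) {
                usable.add(f); // opened cleanly: safe to scan for drivers
            } catch (IOException e) {
                // ZipException or unreadable file: skip instead of failing
            }
        }
        return usable;
    }
}
```

A scanner structured this way degrades gracefully when, say, a truncated download or a directory with a .jar suffix appears on the classpath.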
[jira] [Created] (HIVE-9504) [beeline] ZipException when using !scan
Nick Dimiduk created HIVE-9504: -- Summary: [beeline] ZipException when using !scan Key: HIVE-9504 URL: https://issues.apache.org/jira/browse/HIVE-9504 Project: Hive Issue Type: Bug Components: Beeline Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 0.15.0 Notice this while mucking around: {noformat} 0: jdbc:hive2://localhost:1/ !scan java.util.zip.ZipException: error in opening zip file at java.util.zip.ZipFile.open(Native Method) at java.util.zip.ZipFile.init(ZipFile.java:220) at java.util.zip.ZipFile.init(ZipFile.java:150) at java.util.jar.JarFile.init(JarFile.java:166) at java.util.jar.JarFile.init(JarFile.java:130) at org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128) at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589) at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579) at org.apache.hive.beeline.Commands.scan(Commands.java:278) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935) at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} -- This message was 
sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9504) [beeline] ZipException when using !scan
[ https://issues.apache.org/jira/browse/HIVE-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-9504: --- Attachment: HIVE-9504.00.patch [beeline] ZipException when using !scan --- Key: HIVE-9504 URL: https://issues.apache.org/jira/browse/HIVE-9504 Project: Hive Issue Type: Bug Components: Beeline Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9504.00.patch Notice this while mucking around: {noformat} 0: jdbc:hive2://localhost:1/ !scan java.util.zip.ZipException: error in opening zip file at java.util.zip.ZipFile.open(Native Method) at java.util.zip.ZipFile.init(ZipFile.java:220) at java.util.zip.ZipFile.init(ZipFile.java:150) at java.util.jar.JarFile.init(JarFile.java:166) at java.util.jar.JarFile.init(JarFile.java:130) at org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128) at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589) at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579) at org.apache.hive.beeline.Commands.scan(Commands.java:278) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935) at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Creating a branch for hbase metastore work
+1 On Thursday, January 22, 2015, Brock Noland br...@cloudera.com wrote: +1 On Thu, Jan 22, 2015 at 8:19 PM, Alan Gates ga...@hortonworks.com wrote: I've been working on a prototype of using HBase to store Hive's metadata. Basically I've built a new implementation of RawStore that writes to HBase rather than DataNucleus. I want to see if I can build something that has a much more straightforward schema than DN and that is much faster. I'd like to get this out in public so others can look at it and contribute, but it's nowhere near ready for prime time. So I propose to create a branch and put the code there. Any objections? Alan. -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: adding public domain Java files to Hive source
I guess the PMC should be responsive to this kind of question. Can you not depend on a library containing these files? Why include the source directly? On Wed, Jan 21, 2015 at 3:16 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Ping? Where do I write about such matters if not here? On Wed, Jan 14, 2015 at 11:43 AM, Sergey Shelukhin ser...@hortonworks.com wrote: Suppose I want to use a Java source file within Hive that has this header (I don't now, but I was considering it and may want it later ;)): /* * Written by Doug Lea with assistance from members of JCP JSR-166 * Expert Group and released to the public domain, as explained at * http://creativecommons.org/licenses/publicdomain */ As far as I can see, the class is not available in a binary distribution, and there are projects on github that use it as is and add their license on top. Can I add it to the Apache (Hive) codebase? Should the Apache license header be added? Should the original header be retained?
Re: Undeliverable mail: Re: adding public domain Java files to Hive source
Thank you Ashutosh! On Wed, Jan 21, 2015 at 4:10 PM, Ashutosh Chauhan hashut...@apache.org wrote: Done. I have removed two offending ids from list. On Wed, Jan 21, 2015 at 3:22 PM, Nick Dimiduk ndimi...@gmail.com wrote: Seriously, these guys are still spamming this list? Why hasn't the dev-list admin booted these receivers yet? It's been *months*. On Wed, Jan 21, 2015 at 3:19 PM, mailer-dae...@mail.mailbrush.com wrote: Failed to deliver to 'bsc...@ebuddy.com' SMTP module(domain mail-in.ebuddy.com:25) reports: host mail-in.ebuddy.com:25 says: 550 5.1.1 User unknown Original-Recipient: rfc822;bsc...@ebuddy.com Final-Recipient: rfc822;bsc...@ebuddy.com Action: failed Status: 5.0.0
Re: Undeliverable mail: Re: adding public domain Java files to Hive source
Seriously, these guys are still spamming this list? Why hasn't the dev-list admin booted these receivers yet? It's been *months*. On Wed, Jan 21, 2015 at 3:19 PM, mailer-dae...@mail.mailbrush.com wrote: Failed to deliver to 'bsc...@ebuddy.com' SMTP module(domain mail-in.ebuddy.com:25) reports: host mail-in.ebuddy.com:25 says: 550 5.1.1 User unknown Original-Recipient: rfc822;bsc...@ebuddy.com Final-Recipient: rfc822;bsc...@ebuddy.com Action: failed Status: 5.0.0
What's the status of AccessServer?
Hi folks, I'm looking for ways to expose Apache Phoenix [0] to a wider audience. One potential way to do that is to follow in the Hive footsteps with a HS2 protocol-compatible service. I've done some prototyping along these lines and see that it's quite feasible. Along the way I came across this proposal for refactoring HS2 into the AccessServer [1]. What's the state of the AccessServer project? Is anyone working on it? Is there a relationship between this effort and Calcite's Avatica [2]? The system proposed in the AccessServer doc seems to fit nicely in line with Calcite's objectives. Thanks, Nick [0]: http://phoenix.apache.org [1]: https://cwiki.apache.org/confluence/display/Hive/AccessServer+Design+Proposal [2]: http://mail-archives.apache.org/mod_mbox/calcite-dev/201412.mbox/%3CCAMCtme%2BpVsVYP%2B-J1jDPk-fNCtAHj3f0eXif_hUG_Xy81Ufxsw%40mail.gmail.com%3E
[jira] [Commented] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234532#comment-14234532 ] Nick Dimiduk commented on HIVE-8809: Using activeByDefault causes issues -- if you specify some other unrelated profiles (thrift generation, for instance), you end up disabling your default profile. Better to use a property flag. Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Minor Attachments: HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, dep_without_hadoop_2.txt For every maven command profile needs to be specified explicitly. It will be better to activate hadoop-2 profile by default as HIVE QA uses hadoop-2 profile. With this change both the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-8809: --- Attachment: HIVE-8809.01.patch Over on HBase, we have the property hadoop.profile and check its value. See also http://java.dzone.com/articles/maven-profile-best-practices Give this patch a spin. For a hadoop-1 build, add {{-Dhadoop.profile=1}}. Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Minor Attachments: HIVE-8809.01.patch, HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, dep_without_hadoop_2.txt For every maven command profile needs to be specified explicitly. It will be better to activate hadoop-2 profile by default as HIVE QA uses hadoop-2 profile. With this change both the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
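For readers unfamiliar with the hadoop.profile convention referenced in the comments above, a minimal sketch of property-activated profiles might look like the following (profile ids and contents are illustrative, not the actual Hive or HBase pom). Unlike activeByDefault, a profile activated by the absence of a property is not silently switched off when an unrelated profile is enabled on the command line:

```xml
<profiles>
  <!-- Default: active whenever -Dhadoop.profile is NOT given. Unlike
       activeByDefault, selecting an unrelated profile (e.g. thrift
       generation) does not deactivate this one. -->
  <profile>
    <id>hadoop-2</id>
    <activation>
      <property><name>!hadoop.profile</name></property>
    </activation>
    <!-- hadoop-2 dependencies would go here -->
  </profile>
  <!-- Opt-in: mvn clean install -Dhadoop.profile=1 -->
  <profile>
    <id>hadoop-1</id>
    <activation>
      <property><name>hadoop.profile</name><value>1</value></property>
    </activation>
    <!-- hadoop-1 dependencies would go here -->
  </profile>
</profiles>
```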
Re: [PROPOSAL] Hivemall incubation
Hi Makoto, I cannot speak for the Hive PMC, only as a data tool user and occasional contributor. I think the idea is very much a good one. The Incubator takes a lot of work because it's all about establishing a vibrant developer and user community for the project. Community before code, as they say. I would also encourage you to consider joining forces with DataFu, rather than competing. I think there's a real appetite for a holistic toolbox of patterns and implementations that can span these projects. From my understanding, there's nothing about DataFu that's unique to Pig; they just need the work done to abstract away the Pig bits and implement the Hive interfaces. Is there anything about Hivemall that's unique to Hive, that wouldn't be applicable to Pig as well? +Casey, as I believe he has some interest in seeing DataFu reach a wider audience as well. Good on you. Nick On Friday, November 21, 2014, Makoto Yui yuin...@gmail.com wrote: Hi all, I am the principal developer of Hivemall, a scalable machine learning library for Apache Hive. https://github.com/myui/hivemall When I presented a talk at the last Hadoop Summit in San Jose [1], several audience members asked me about the possibility of changing the software license of Hivemall to Apache License v2, and the sustainability of the project was their major concern. Since then, I have been considering proposing Hivemall as an Apache Incubator project. The position of Hivemall relative to Hive would be similar to that of DataFu (an Apache Incubator project) relative to Apache Pig. I believe that adding machine learning functionality on top of Apache Hive could extend the application range of Apache Hive, and Hivemall could help existing Hive users in their learning-scale data analytics projects. I have received approval from my employer (AIST) to change the license of Hivemall to Apache License version 2 and to donate the code to the Apache Foundation. And now, I am willing to propose Hivemall as an Apache incubator project, together with Hivemall contributors at NTT corp.
I think the current Hivemall codebase is a bit too large to be included in Hive contrib, and thus it would be better off as a separate incubator project. I would like to propose that Hivemall eventually graduate as a subproject of Apache Hive. Is that strategy possible from the Hive PMC's point of view? http://incubator.apache.org/guides/graduation.html#subproject-or-top-level Before formulating a proposal, I would like to hear Hive developers' opinions (e.g., possibilities, +1/-1, and missing pieces for incubation) on incubating Hivemall. BTW, I found this JIRA issue mentioning Hivemall. https://issues.apache.org/jira/browse/HIVE-7940 Is there a possibility of cooperating with them in proposing Hivemall as an Apache Incubator project? According to the incubation guides, I need a mentor/champion for incubation. http://incubator.apache.org/guides/proposal.html#formulating Your help toward the incubation will be much appreciated. Thanks, Makoto [1] http://www.slideshare.net/myui/hivemall-hadoop-summit-2014-san-jose -- *** Makoto YUI m@aist.go.jp Information Technology Research Institute, AIST. http://staff.aist.go.jp/m.yui/ ***
Re: [PROPOSAL] Hivemall incubation
Thank you for humoring my questions. I do not know the mind of the DataFu community. Your observations are quite clear; I have no further concerns. -n On Friday, November 21, 2014, Makoto Yui yuin...@gmail.com wrote: Hi Nick, Thank you for the comments. (2014/11/22 3:42), Nick Dimiduk wrote: I would also encourage you to consider joining forces with DataFu, rather than competing. I think there's a real appetite for a holistic toolbox of patterns and implementations that can span these projects. From my understanding, there's nothing about DataFu that's unique to Pig; they just need the work done to abstract away the Pig bits and implement the Hive interfaces. My current understanding of DataFu is that it is a collection of UDFs for Apache Pig. Though a Hive interface is not yet supported in DataFu, is that direction (extending DataFu for Hive) a consensus in the DataFu community? My concern is that merging the Hivemall codebase into DataFu would make the building and packaging process of DataFu complex and the target/objective of the project unclear. I do not think that Hivemall competes with DataFu because 1) there are users who prefer Pig and Hive respectively, and 2) Pig/DataFu is useful for what HiveQL is unsuited for (e.g., complex feature engineering steps). After preprocessing using DataFu, Hivemall can be applied for classification/regression in a scalable way in Hive. Is there anything about Hivemall that's unique to Hive, that wouldn't be applicable to Pig as well? The techniques used in Hivemall (e.g., training data amplification that emulates iterative training, and machine learning algorithms as table-generating functions) could be applicable to Apache Pig. However, I am not a heavy user of Pig, and porting Hivemall to Pig requires a lot of work. So, I am currently considering sticking with HiveQL interfaces (Hive, HCatalog, and Tez for the software stack of Hivemall) in developing Hivemall because a SQL-like interface is friendly to a broader range of developers.
Thanks, Makoto -- *** Makoto YUI m@aist.go.jp Information Technology Research Institute, AIST. https://staff.aist.go.jp/m.yui/index_e.html ***
[jira] [Commented] (HIVE-8808) HiveInputFormat caching cannot work with all input formats
[ https://issues.apache.org/jira/browse/HIVE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206979#comment-14206979 ] Nick Dimiduk commented on HIVE-8808: Yes, it looks like our input formats are stateful. We're set up to specify a config object and that's closed over to provide custom record readers, based on which mapred API is being implemented. HiveInputFormat caching cannot work with all input formats -- Key: HIVE-8808 URL: https://issues.apache.org/jira/browse/HIVE-8808 Project: Hive Issue Type: Bug Reporter: Brock Noland In {{HiveInputFormat}} we implement instance caching (see {{getInputFormatFromCache}}). In HS2, this assumes that InputFormats are stateless but I don't think this assumption is true, especially with regards to HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8808) HiveInputFormat caching cannot work with all input formats
[ https://issues.apache.org/jira/browse/HIVE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206983#comment-14206983 ] Nick Dimiduk commented on HIVE-8808: FYI, this may be changing with HBase 1.0 as we're reworking the way connection management is handled. I'm behind on my reviews, so I don't know if this assumption will change. HiveInputFormat caching cannot work with all input formats -- Key: HIVE-8808 URL: https://issues.apache.org/jira/browse/HIVE-8808 Project: Hive Issue Type: Bug Reporter: Brock Noland In {{HiveInputFormat}} we implement instance caching (see {{getInputFormatFromCache}}). In HS2, this assumes that InputFormats are stateless but I don't think this assumption is true, especially with regards to HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
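As a sketch of the hazard discussed in these comments (illustrative Java, not Hive's actual HiveInputFormat code): an instance cache keyed by class hands the same object to every caller, so any state an input format captures at construction time, such as a Configuration, leaks from the first query into all later ones:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal, hypothetical model of the caching hazard: a class-keyed
// instance cache combined with a format that holds per-query state.
public class InputFormatCache {
    // Stand-in for an InputFormat that closes over per-query config.
    static class StatefulFormat {
        final String capturedConfig; // state fixed at construction time
        StatefulFormat(String conf) { this.capturedConfig = conf; }
    }

    private final Map<Class<?>, StatefulFormat> cache = new HashMap<>();

    // Create once on first request, then reuse forever; later callers
    // get an instance still holding the FIRST caller's configuration.
    StatefulFormat getFromCache(String currentQueryConfig) {
        return cache.computeIfAbsent(StatefulFormat.class,
            k -> new StatefulFormat(currentQueryConfig));
    }
}
```

Caching like this is safe only if the cached objects are genuinely stateless; a stateful HBase-backed format would serve the second query with the first query's captured state.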
[jira] [Commented] (HIVE-2828) make timestamp accessible in the hbase KeyValue
[ https://issues.apache.org/jira/browse/HIVE-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167773#comment-14167773 ] Nick Dimiduk commented on HIVE-2828: Left a comment on phabricator. +1 make timestamp accessible in the hbase KeyValue Key: HIVE-2828 URL: https://issues.apache.org/jira/browse/HIVE-2828 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Trivial Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.5.patch, HIVE-2828.6.patch.txt, HIVE-2828.7.patch.txt, HIVE-2828.8.patch.txt Originated from HIVE-2781 and not accepted, but I think this could be helpful to someone. By using special column notation ':timestamp' in HBASE_COLUMNS_MAPPING, user might access timestamp value in hbase KeyValue. {code} CREATE TABLE hbase_table (key int, value string, time timestamp) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf:string,:timestamp) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: hive unit test report question
IMHO, it would be better to wire up the integration suite via the failsafe plugin (the surefire counterpart for integration tests) and link the modules correctly. This is on (admittedly, near the bottom of) my todo list. See also the HBase poms for an example. -n On Saturday, September 6, 2014, wzc wzc1...@gmail.com wrote: hi all: I would like to create a jenkins job to run both the hive unit tests and integration tests. Right now it seems that I have to execute multiple maven goals in different poms: mvn clean install surefire-report:report -Daggregate=true -Phadoop-2 cd itests mvn clean install surefire-report:report -Daggregate=true -Phadoop-2 I would like to use one maven jenkins job, and right now I can't figure out how to configure the job properly to execute maven goals in different poms (maybe I can add a post-build step to execute another shell?). Each hive ptest2 job can run all tests, and I would like to know the configuration it uses. Any help is appreciated. Thanks. 2014-01-14 14:05 GMT+08:00 Shanyu Zhao shz...@microsoft.com: Thanks guys for your help! I found Eugene's comments particularly helpful. With -Daggregate=true I can now see aggregated unit test results. Btw, I didn't mean to run itests, I just want to run all unit tests. I think in the FAQ they made it clear that itests are disconnected from the top level pom.xml. Shanyu -Original Message- From: Eugene Koifman [mailto:ekoif...@hortonworks.com] Sent: Monday, January 13, 2014 4:06 PM To: dev@hive.apache.org Subject: Re: hive unit test report question I think you want to add -Daggregate=true; you should then have target/site/surefire-report.html in the module where you ran the command On Mon, Jan 13, 2014 at 2:54 PM, Szehon Ho sze...@cloudera.com wrote: Hi Shanyu, Are you running in /itests? The unit tests are in there, and are not run if you are running from the root.
Thanks Szehon On Mon, Jan 13, 2014 at 1:59 PM, Shanyu Zhao shz...@microsoft.com wrote: Hi, I was trying to build hive trunk, run all unit tests and generate reports, but I'm not sure what's the correct command line. I was using: mvn clean install -Phadoop-2 -DskipTests mvn test surefire-report:report -Phadoop-2 But the reports in the root folder and several other projects (such as metastore) are empty with no test results. And I couldn't find a summary page for all unit tests. I was trying to avoid mvn site because it seems to take forever to finish. Am I using the correct commands? How can I get a report like the one in the precommit report: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/827/testReport/ ? I really appreciate your help! Shanyu
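Nick's suggestion above, wiring the itests through the failsafe plugin so a single build runs both phases, might look roughly like this pom fragment (standard plugin coordinates; the Hive module wiring is assumed, not shown). Surefire keeps running *Test classes in the test phase, while failsafe picks up *IT classes during integration-test:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <!-- run *IT.java tests, then fail the build in verify so the
             post-integration-test phase can still run cleanup -->
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With this in place, a single `mvn clean verify` would exercise unit and integration tests together, which fits the one-Jenkins-job goal in the quoted thread.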
[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility
[ https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104398#comment-14104398 ] Nick Dimiduk commented on HIVE-4765: Ping [~navis], [~sushanth]. Any chance we can get some action on this one for 0.14 release? It's definitely better than what's available. Improve HBase bulk loading facility --- Key: HIVE-4765 URL: https://issues.apache.org/jira/browse/HIVE-4765 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, HIVE-4765.D11463.1.patch With some patches, bulk loading process for HBase could be simplified a lot. {noformat} CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe' WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:key,cf2:value) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter' LOCATION '/tmp/export'; SET mapred.reduce.tasks=4; set hive.optimize.sampling.orderby=true; INSERT OVERWRITE TABLE hbase_export SELECT * from (SELECT union_kv(key,key,value,:key,cf1:key,cf2:value) as (rowkey,union) FROM src) A ORDER BY rowkey,union; hive !hadoop fs -lsr /tmp/export; drwxr-xr-x - navis supergroup 0 2013-06-20 11:05 /tmp/export/cf1 -rw-r--r-- 1 navis supergroup 4317 2013-06-20 11:05 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835 -rw-r--r-- 1 navis supergroup 5868 2013-06-20 11:05 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d -rw-r--r-- 1 navis supergroup 5214 2013-06-20 11:05 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8 -rw-r--r-- 1 navis supergroup 4290 2013-06-20 11:05 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10 drwxr-xr-x - navis supergroup 0 2013-06-20 11:05 /tmp/export/cf2 -rw-r--r-- 1 navis supergroup 6744 2013-06-20 11:05 /tmp/export/cf2/409673b517d94e16920e445d07710f52 -rw-r--r-- 
1 navis supergroup 4975 2013-06-20 11:05 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e -rw-r--r-- 1 navis supergroup 6096 2013-06-20 11:05 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4 -rw-r--r-- 1 navis supergroup 4890 2013-06-20 11:05 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Timeline for release of Hive 0.14
It'd be great to get HIVE-4765 included in 0.14. The proposed changes are a big improvement for us HBase folks. Would someone mind having a look in that direction? Thanks, Nick On Tue, Aug 19, 2014 at 3:20 PM, Thejas Nair the...@hortonworks.com wrote: +1 Sounds good to me. It's already almost 4 months since the last release. It is time to start preparing for the next one. Thanks for volunteering! On Tue, Aug 19, 2014 at 2:02 PM, Vikram Dixit vik...@hortonworks.com wrote: Hi Folks, I was thinking that it was about time that we had a release of hive 0.14 given our commitment to having a release of hive on a periodic basis. We could cut a branch and start working on a release in say 2 weeks time around September 5th (Friday). After branching, we can focus on stabilizing for the release and hopefully have an RC in about 2 weeks post that. I would like to volunteer myself for the duties of the release manager for this version if the community agrees. Thanks Vikram.
Re: Mail bounces from ebuddy.com
Not quite taken care of. I'm still getting spam about these addresses. On Mon, Aug 18, 2014 at 9:18 AM, Lars Francke lars.fran...@gmail.com wrote: Thanks Alan and Ashutosh for taking care of this! On Mon, Aug 18, 2014 at 5:45 PM, Ashutosh Chauhan hashut...@apache.org wrote: Thanks, Alan for the hint. I just unsubscribed those two email addresses from ebuddy. On Mon, Aug 18, 2014 at 8:23 AM, Alan Gates ga...@hortonworks.com wrote: Anyone who is an admin on the list (I don't know who the admins are) can do this by doing user-unsubscribe-USERNAME=ebuddy@hive.apache.org where USERNAME is the name of the bouncing user (see http://untroubled.org/ezmlm/ezman/ezman1.html ) Alan. Thejas Nair the...@hortonworks.com August 17, 2014 at 17:02 I don't know how to do this. Carl, Ashutosh, do you guys know how to remove these two invalid emails from the mailing list? Lars Francke lars.fran...@gmail.com August 17, 2014 at 15:41 Hmm great, I see others mentioning this as well. I'm happy to contact INFRA but I'm not sure if they are even needed or if someone from the Hive team can do this? Lefty Leverenz leftylever...@gmail.com August 7, 2014 at 18:43 (Excuse the spam.) Actually I'm getting two bounces per message, but gmail concatenates them so I didn't notice the second one. -- Lefty Lefty Leverenz leftylever...@gmail.com August 7, 2014 at 18:36 Curious, I've only been getting one bounce per message. Anyway thanks for bringing this up. -- Lefty Lars Francke lars.fran...@gmail.com August 7, 2014 at 4:38 Hi, every time I send a mail to dev@ I get two bounce mails from two people at ebuddy.com. I don't want to post the E-Mail addresses publicly but I can send them on if needed (and it can be triggered easily by just replying to this mail I guess). Could we maybe remove them from the list?
Cheers, Lars
[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099180#comment-14099180 ] Nick Dimiduk commented on HIVE-7068: This is really cool, nice work fellas! It's a shame to see so many of the StorageHandler warts repeated here too, but that's how it is. Does it make sense to try to share more code between the accumulo and hbase modules? Column mapping stuff looks pretty much identical to me, and maybe the hbase module could benefit from some of the comparator work? Nothing critical for this patch, but could be good for follow-on work. I'm with [~navis] on this one, +1 for getting it committed setting users loose to play! Integrate AccumuloStorageHandler Key: HIVE-7068 URL: https://issues.apache.org/jira/browse/HIVE-7068 Project: Hive Issue Type: New Feature Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to HBase. Some [initial work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done to support querying an Accumulo table using Hive already. It is not a complete solution as, most notably, the current implementation presently lacks support for INSERTs. I would like to polish up the AccumuloStorageHandler (presently based on 0.10), implement missing basic functionality and compare it to the HBaseStorageHandler (to ensure that we follow the same general usage patterns). I've also been in communication with [~bfem] (the initial author) who expressed interest in working on this again. I hope to coordinate efforts with him. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility
[ https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096390#comment-14096390 ] Nick Dimiduk commented on HIVE-4765: Hi [~navis]. Have you had time to look at this lately? It would sure be better than the mostly-broken instructions on https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad Improve HBase bulk loading facility --- Key: HIVE-4765 URL: https://issues.apache.org/jira/browse/HIVE-4765 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, HIVE-4765.D11463.1.patch With some patches, the bulk loading process for HBase could be simplified a lot.
{noformat}
CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf1:key,cf2:value')
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
LOCATION '/tmp/export';

SET mapred.reduce.tasks=4;
SET hive.optimize.sampling.orderby=true;

INSERT OVERWRITE TABLE hbase_export
SELECT * FROM (
  SELECT union_kv(key, key, value, ':key,cf1:key,cf2:value') AS (rowkey, union) FROM src
) A ORDER BY rowkey, union;

hive> !hadoop fs -lsr /tmp/export;
drwxr-xr-x   - navis supergroup    0 2013-06-20 11:05 /tmp/export/cf1
-rw-r--r--   1 navis supergroup 4317 2013-06-20 11:05 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
-rw-r--r--   1 navis supergroup 5868 2013-06-20 11:05 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
-rw-r--r--   1 navis supergroup 5214 2013-06-20 11:05 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
-rw-r--r--   1 navis supergroup 4290 2013-06-20 11:05 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
drwxr-xr-x   - navis supergroup    0 2013-06-20 11:05 /tmp/export/cf2
-rw-r--r--   1 navis supergroup 6744 2013-06-20 11:05 /tmp/export/cf2/409673b517d94e16920e445d07710f52
-rw-r--r--   1 navis supergroup 4975 2013-06-20 11:05 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
-rw-r--r--   1 navis supergroup 6096 2013-06-20 11:05 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
-rw-r--r--   1 navis supergroup 4890 2013-06-20 11:05 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Release Note: Hive can now execute queries against HBase table snapshots. This feature is available for any table defined using the HBaseStorageHandler. It requires at least HBase 0.98.3. To query against a snapshot instead of the online table, specify the snapshot name via hive.hbase.snapshot.name. The snapshot will be restored into a unique directory under /tmp. This location can be overridden by setting a path via hive.hbase.snapshot.restoredir. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
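A short HiveQL session illustrating the release note above (the table and snapshot names here are hypothetical, not taken from the thread):

{noformat}
-- Assumes an HBase-backed table defined via the HBaseStorageHandler
-- and an existing HBase snapshot named 'orders_snap' (hypothetical names).
SET hive.hbase.snapshot.name=orders_snap;               -- read the snapshot instead of the live table
SET hive.hbase.snapshot.restoredir=/user/hive/restore;  -- optional; defaults to /tmp

SELECT count(*) FROM hbase_orders;                      -- full scan served off HDFS, bypassing region servers
{noformat}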
[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure
[ https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088450#comment-14088450 ] Nick Dimiduk commented on HIVE-7618: Option (1) would pave the way for migrating all these baked-in storage systems over to the StorageHandler interface -- a healthy thing for the long-term success of Hive, IMHO. That's a larger conversation though. TestDDLWithRemoteMetastoreSecondNamenode unit test failure -- Key: HIVE-7618 URL: https://issues.apache.org/jira/browse/HIVE-7618 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7618.1.patch, HIVE-7618.2.patch Looks like TestDDLWithRemoteMetastoreSecondNamenode started failing after HIVE-6584 was committed. {noformat} TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode:272-createTableAndCheck:201-createTableAndCheck:219 Table should be located in the second filesystem expected:[hdfs] but was:[pfile] {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure
[ https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088453#comment-14088453 ] Nick Dimiduk commented on HIVE-7618: Anyway, +1 for your patch v2. It's 80% of what we'd want for option (1) anyway. Would be best if DDLTask didn't need that nasty hack.
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086367#comment-14086367 ] Nick Dimiduk commented on HIVE-6584: Restore location is optional. It defaults to /tmp. The restore process creates a uniquely named (random uuid) directory under this path for any given restore, so users who never set this value will not conflict with each other. It would be nice if hive had some kind of post-job hook that could be used to clean up the restoredir artifacts after the input format is finished with them.
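The uniquely named restore directory described above can be sketched in a few lines. This is an illustrative model only, not Hive's actual implementation; the function name is made up for the example:

```python
import uuid
from pathlib import PurePosixPath

def make_restore_dir(base="/tmp"):
    """Return a unique restore path under `base`, so concurrent snapshot
    restores never collide (illustrative sketch, not Hive's real code)."""
    return str(PurePosixPath(base) / uuid.uuid4().hex)

a = make_restore_dir()
b = make_restore_dir()
print(a)        # e.g. /tmp/3f2a...
print(a != b)   # every call yields a distinct directory
```

Because each restore lands in its own uuid-named directory, two users who both leave hive.hbase.snapshot.restoredir unset still get disjoint paths under /tmp.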
[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure
[ https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087080#comment-14087080 ] Nick Dimiduk commented on HIVE-7618: This will cause failure in HIVE-6584. Reason being, StorageHandler tables don't have a location, so this makeLocationQualified constructs locations that don't exist. IIRC, this resulted in an HBase table being assigned a location in the warehouse, which then confuses other pieces.
[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure
[ https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087082#comment-14087082 ] Nick Dimiduk commented on HIVE-7618: Can the test be updated to explicitly set a foreign hdfs for its location?
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14084852#comment-14084852 ] Nick Dimiduk commented on HIVE-6584: Thanks folks! Any chance of getting a commit this week? :)
[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility
[ https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085004#comment-14085004 ] Nick Dimiduk commented on HIVE-4765: Bump. Patch still applies to master, with a little fuzz. Is the new SerDe and Union business necessary? It would be really great to integrate this into the StorageHandler as an online switch, as I was aiming for on HIVE-2365. Swapping out the output format at runtime seems to work alright, and it saves the user from having to define another table, repeat the column mapping stuff, etc.
[jira] [Created] (HIVE-7572) Enable LOAD DATA into StorageHandler tables
Nick Dimiduk created HIVE-7572: -- Summary: Enable LOAD DATA into StorageHandler tables Key: HIVE-7572 URL: https://issues.apache.org/jira/browse/HIVE-7572 Project: Hive Issue Type: Improvement Components: StorageHandler Reporter: Nick Dimiduk One annoyance when working with the HBaseStorageHandler is its inaccessibility to local data. Populating an HBase table from local test data, for instance, is a multi-step process:
{noformat}
# create a hive table you HAVE to populate
CREATE TABLE src(key int, value string);

# populate the intermediate hive table
LOAD DATA LOCAL INPATH '/path/to/hive/data/files/kv1.txt' OVERWRITE INTO TABLE src;

# create the hbase table you WANT to populate
CREATE TABLE hbase_src(key INT, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:val')
TBLPROPERTIES ('hbase.table.name' = 'hbase_src');

# copy data into hbase
INSERT OVERWRITE TABLE hbase_src SELECT * FROM src;
{noformat}
This multi-step process could be simplified, and it isn't limited to the HBaseStorageHandler -- any StorageHandler implementation will suffer this problem. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23824: Add HiveHBaseTableSnapshotInputFormat
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23824/ --- (Updated July 31, 2014, 6:33 p.m.) Review request for hive, Ashutosh Chauhan, Navis Ryu, Sushanth Sowmyan, and Swarnim Kulkarni. Changes --- Updating with patch v14 from JIRA. Bugs: HIVE-6584 https://issues.apache.org/jira/browse/HIVE-6584 Repository: hive-git Description --- HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a mapred implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. See JIRA for further conversation. 
Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 15bc0a3 hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java 998c15c hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java dbf5e51 hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseTableSnapshotInputFormatUtil.java PRE-CREATION hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseInputFormatUtil.java PRE-CREATION hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 1032cc9 hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java PRE-CREATION hbase-handler/src/test/queries/positive/hbase_handler_snapshot.q PRE-CREATION hbase-handler/src/test/results/positive/external_table_ppd.q.out 6f1adf4 hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out b92db11 hbase-handler/src/test/results/positive/hbase_handler_snapshot.q.out PRE-CREATION hbase-handler/src/test/templates/TestHBaseCliDriver.vm 01d596a itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseQTestUtil.java 96a0de2 itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java cdc0a65 itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java ccfb58f pom.xml b3216e1 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 40d910c Diff: https://reviews.apache.org/r/23824/diff/ Testing --- Unit tests, local-mode testing, pseudo-distributed mode testing, and tested on a small distributed cluster. Tests included hbase versions 0.98.3 and the HEAD of 0.98 branch. Thanks, nick dimiduk
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.14.patch Once more, for old time's sake, eh [~navis]? ;) Attaching an updated patch that isolates the snapshot classes from the StorageHandler. I tried this out against a local-mode HBase built against the tag 0.96.0RC5 (I didn't see a release tag, surprisingly...). Regular online operations work as expected. When I {{set hive.hbase.snapshot.name=foo;}}, I get a nice error message in my stacktrace: {noformat} FAILED: RuntimeException This version of HBase does not support Hive over table snapshots. Please upgrade to at least HBase 0.98.3 or later. See HIVE-6584 for details. {noformat} I hope this meets your requirement.
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081249#comment-14081249 ] Nick Dimiduk commented on HIVE-6584: I updated RB as well, the interesting addition is HBaseTableSnapshotInputFormatUtil.java and its use: https://reviews.apache.org/r/23824/diff/1-2/#7
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076331#comment-14076331 ] Nick Dimiduk commented on HIVE-6584: Thanks for having a look, [~navis]. As it is, this patch requires HBASE-11137, which has not been back-ported to 0.96. There's no technical reason not to back-port it, simply that 0.96 is in maintenance mode only and we're encouraging folks to upgrade from 0.96.2 to 0.98.x.
[jira] [Commented] (HIVE-7496) Exclude conf/hive-default.xml.template in version control and include it dist profile
[ https://issues.apache.org/jira/browse/HIVE-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076335#comment-14076335 ] Nick Dimiduk commented on HIVE-7496: Hurray! :) Exclude conf/hive-default.xml.template in version control and include it dist profile - Key: HIVE-7496 URL: https://issues.apache.org/jira/browse/HIVE-7496 Project: Hive Issue Type: Task Reporter: Navis Assignee: Navis Priority: Minor Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7496.1.patch.txt, HIVE-7496.2.patch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7534) remove reflection from HBaseSplit
Nick Dimiduk created HIVE-7534: -- Summary: remove reflection from HBaseSplit Key: HIVE-7534 URL: https://issues.apache.org/jira/browse/HIVE-7534 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Priority: Minor HIVE-6584 does some reflection voodoo to work around the lack of HBASE-11555 for version hbase-0.98.3. This ticket is to bump the hbase dependency version and clean up that code once hbase-0.98.5 has released. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076782#comment-14076782 ] Nick Dimiduk commented on HIVE-6584: Thanks for having a look, [~sushanth]! bq. The one thing I'd change before committing is a word-wrap for the ASF header in conf/hive-default.xml.template, to retain old newline behaviour there. But otherwise, looks good to me. I believe HIVE-7496 drops conf/hive-default.xml.template altogether. bq. We'll need to update those TODOs in a bit once we upgrade to a newer version of HBase (0.98.5+) to pick up HBASE-11555. I opened HIVE-7534 to track this.
[jira] [Updated] (HIVE-7534) remove reflection from HBaseSplit
[ https://issues.apache.org/jira/browse/HIVE-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-7534: --- Affects Version/s: 0.14.0
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.13.patch Attaching patch v13: rebased onto trunk.
Re: Error running unit tests from eclipse (weird classpath issue)
I'm sure google knows; I don't. For what it's worth, I run tests via maven and attach the debugger to the remote process when necessary. -n On Tuesday, July 22, 2014, Pavel Chadnov pavelchad...@gmail.com wrote: How can I do it in eclipse? I do this in console and it works On Tuesday, July 22, 2014, Nick Dimiduk ndimi...@gmail.com wrote: Are you specifying a hadoop profile via eclipse? I.e., from maven, -Phadoop-2. On Tue, Jul 22, 2014 at 4:03 PM, Pavel Chadnov pavelchad...@gmail.com wrote: Hey Guys, I'm trying to run Hive unit tests in eclipse and have a few failures. An interesting one throws the exception shown below when run from eclipse; it passes fine from the console.

java.lang.IncompatibleClassChangeError: Implementing class
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    ...
    at java.lang.Class.forName(Class.java:190)
    at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:120)
    at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:115)
    at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:80)
    at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:254)
    at org.apache.hadoop.hive.ql.exec.Utilities.getPlanPath(Utilities.java:652)
    at org.apache.hadoop.hive.ql.exec.Utilities.setPlanPath(Utilities.java:641)
    at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:584)
    at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:575)
    at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:568)
    at org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat.setUp(TestSymlinkTextInputFormat.java:84)
    at junit.framework.TestCase.runBare(TestCase.java:132)

I tried adding the hadoop-shims project to the classpath manually, but no luck. Would really appreciate any help here. Thanks, Pavel -- Regards, Pavel Chadnov
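Nick's maven-plus-debugger workflow might look something like this (the test class is just an example from the trace above; 5005 is surefire's default debug port):

{noformat}
# run a single test class under maven with the hadoop-2 profile
mvn test -Dtest=TestSymlinkTextInputFormat -Phadoop-2

# same, but surefire pauses until a debugger attaches (port 5005 by default)
mvn test -Dtest=TestSymlinkTextInputFormat -Phadoop-2 -Dmaven.surefire.debug

# then in eclipse: Run > Debug Configurations > Remote Java Application,
# host localhost, port 5005, and step through the forked test JVM
{noformat}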
conf/hive-default.xml.template conflict-central
Hi there, I've noticed that the above-mentioned file has become a constant source of conflicts when applying patches. Is it possible to remove it from source control and depend on the build process to generate it locally? This appears to be what's happening anyway. Thanks, Nick
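Acting on this proposal might look like the following dry-run sketch; the exact commands are assumptions, since the message doesn't prescribe a mechanism:

```shell
# Dry-run sketch (commands assumed): untrack the generated template and rely
# on the maven build to regenerate it locally. The path would also be added
# to .gitignore so the regenerated file stays untracked.
LOG=""
run() { LOG="$LOG$*;"; echo "+ $*"; }   # record and print instead of executing
run git rm --cached conf/hive-default.xml.template   # untrack, keep local copy
run mvn clean package -DskipTests                    # build regenerates the file
```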
Re: conf/hive-default.xml.template conflict-central
Thanks Lefty. On Wed, Jul 23, 2014 at 2:42 PM, Lefty Leverenz leftylever...@gmail.com wrote: Interesting idea, Nick. How about adding it to the discussion https://issues.apache.org/jira/browse/HIVE-6037?focusedCommentId=14061647&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14061647 on HIVE-6037? -- Lefty On Wed, Jul 23, 2014 at 5:27 PM, Nick Dimiduk ndimi...@gmail.com wrote: Hi there, I've noticed that the above-mentioned file has become a constant source of conflicts when applying patches. Is it possible to remove it from source control and depend on the build process to generate it locally? This appears to be what's happening anyway. Thanks, Nick
[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf
[ https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072506#comment-14072506 ] Nick Dimiduk commented on HIVE-6037: Is it possible to remove the generated file from source control and depend on the build process to generate it locally? Why must it be committed? Synchronize HiveConf with hive-default.xml.template and support show conf - Key: HIVE-6037 URL: https://issues.apache.org/jira/browse/HIVE-6037 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor Labels: TODOC14 Fix For: 0.14.0 Attachments: CHIVE-6037.3.patch.txt, HIVE-6037-0.13.0, HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, HIVE-6037.16.patch.txt, HIVE-6037.17.patch, HIVE-6037.18.patch.txt, HIVE-6037.19.patch.txt, HIVE-6037.2.patch.txt, HIVE-6037.20.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, HIVE-6037.9.patch.txt, HIVE-6037.patch see HIVE-5879 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.12.patch Patch v12 should fix the two test failures. One comes from changes made in HBASE-11335. The other has to do with assumptions around the default filesystem path that are unrelated to HBase. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 23824: Add HiveHBaseTableSnapshotInputFormat
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23824/ --- Review request for hive, Ashutosh Chauhan, Navis Ryu, Sushanth Sowmyan, and Swarnim Kulkarni. Bugs: HIVE-6584 https://issues.apache.org/jira/browse/HIVE-6584 Repository: hive-git Description --- HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a mapred implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. See JIRA for further conversation. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 593c566 conf/hive-default.xml.template ba922d0 hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java 998c15c hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java dbf5e51 hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseInputFormatUtil.java PRE-CREATION hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 1032cc9 hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java PRE-CREATION hbase-handler/src/test/queries/positive/hbase_handler_snapshot.q PRE-CREATION hbase-handler/src/test/results/positive/external_table_ppd.q.out 6f1adf4 hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out b92db11 hbase-handler/src/test/results/positive/hbase_handler_snapshot.q.out PRE-CREATION hbase-handler/src/test/templates/TestHBaseCliDriver.vm 01d596a itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseQTestUtil.java 96a0de2 
itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java cdc0a65 itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 2fefa06 pom.xml b5a5697 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java c80a2a3 Diff: https://reviews.apache.org/r/23824/diff/ Testing --- Unit tests, local-mode testing, pseudo-distributed mode testing, and testing on a small distributed cluster. Tests covered hbase version 0.98.3 and the HEAD of the 0.98 branch. Thanks, nick dimiduk
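Per the HIVE-6584 discussion, usage of the feature is expected to be a session setting plus an ordinary query against the already-registered table; a hypothetical invocation (snapshot and table names assumed):

```shell
# Hypothetical session (names assumed): point the session at an HBase snapshot,
# then query the existing Hive table as usual; no new table registration needed.
SNAPSHOT="foo_snap"   # an existing snapshot of the HBase table backing 'foo'
CMD="hive -e \"set hive.hbase.snapshot.name=$SNAPSHOT; select count(*) from foo;\""
echo "$CMD"
```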
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071133#comment-14071133 ] Nick Dimiduk commented on HIVE-6584: I think these failed tests are unrelated. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Error running unit tests from eclipse (weird classpath issue)
Are you specifying a hadoop profile via eclipse? I.e., from maven, -Phadoop-2. On Tue, Jul 22, 2014 at 4:03 PM, Pavel Chadnov pavelchad...@gmail.com wrote: Hey Guys, I'm trying to run Hive unit tests in eclipse and have a few failures. One of the interesting ones throws the exception shown below when run from eclipse; the same test passes fine from the console. java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) ... ... at java.lang.Class.forName(Class.java:190) at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:120) at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:115) at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:80) at org.apache.hadoop.hive.conf.HiveConf$ConfVars.clinit(HiveConf.java:254) at org.apache.hadoop.hive.ql.exec.Utilities.getPlanPath(Utilities.java:652) at org.apache.hadoop.hive.ql.exec.Utilities.setPlanPath(Utilities.java:641) at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:584) at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:575) at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:568) at org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat.setUp(TestSymlinkTextInputFormat.java:84) at junit.framework.TestCase.runBare(TestCase.java:132) I tried manually adding the hadoop-shims project to the classpath, but no luck. Would really appreciate any help here. Thanks, Pavel
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.10.patch Updated the patch once more. This has been tested on a distributed cluster as well; things are working correctly. {noformat} HADOOP_CLASSPATH=/path/to/high-scale-lib-1.1.1.jar hive -e "set hive.hbase.snapshot.name=foo_snap; select count(*) from foo;" {noformat} Optionally, you can set {{hive.hbase.snapshot.restoredir}} to something other than the default. I also opened HBASE-11555 so we can do away with the reflection stuff after a later release. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069319#comment-14069319 ] Nick Dimiduk commented on HIVE-6584: HBASE-11557 will remove the requirement of specifying high-scale-lib.jar in HADOOP_CLASSPATH. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.11.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066930#comment-14066930 ] Nick Dimiduk commented on HIVE-6584: [~tenggyut]: bq. 1. HBaseStorageHandler.getInputFormatClass(): I am afraid that the returned inputformat will always be HiveHBaseTableInputFormat (at least according to my test) My patch has the logic necessary to perform the switch at runtime. It does indeed work with the latest patch. bq. 2. In the method HBaseStorageHandler.preCreateTable, hive will check whether the HBase table exists or not, regardless of whether the external table hive is going to create is based on an actual table or a snapshot. I'm not sure about this. In any case, that's not related to this feature: HBaseStorageHandler has no means of creating/dropping table snapshots. If you're seeing some issue here with StorageHandler DDL operations, please file a separate JIRA. bq. 3. The TableSnapshotRegionSplit used in TableSnapshotInputFormat is a direct subclass of InputSplit, not a subclass of TableSplit Nor should it be. TableSnapshotRegionSplit tracks different information from TableSplit. bq. 4. There is no public setScan method in TableSnapshotInputFormat.RecordReader; instead it translates a string into a Scan instance using mapreduce.TableMapReduceUtil.convertStringToScan. Indeed, there is a disparity between HBase's mapred and mapreduce implementations. I opened HBASE-11179 for some cleanup on the HBase side. convertStringToScan details are HBase-private API as of 0.96. I opened HBASE-11163 to make the necessary scanner support available in the mapred API, but it has not yet been implemented. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.9.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063719#comment-14063719 ] Nick Dimiduk commented on HIVE-6584: Ouch. Most of these tests run/pass for me locally. Will investigate further. I'm also curious why the {{explain}} commands in {{hbase_handler_snapshot.q}} are not including the Input/OutputFormats. [~sushanth], [~ashutoshc] any ideas on this latter issue? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility
[ https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064347#comment-14064347 ] Nick Dimiduk commented on HIVE-4765: This looks like a nice improvement [~navis]! Improve HBase bulk loading facility --- Key: HIVE-4765 URL: https://issues.apache.org/jira/browse/HIVE-4765 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, HIVE-4765.D11463.1.patch With some patches, the bulk loading process for HBase could be simplified a lot. {noformat} CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:key,cf2:value") STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter' LOCATION '/tmp/export'; SET mapred.reduce.tasks=4; SET hive.optimize.sampling.orderby=true; INSERT OVERWRITE TABLE hbase_export SELECT * FROM (SELECT union_kv(key, key, value, ':key,cf1:key,cf2:value') AS (rowkey, union) FROM src) A ORDER BY rowkey, union; hive> !hadoop fs -lsr /tmp/export; drwxr-xr-x - navis supergroup 0 2013-06-20 11:05 /tmp/export/cf1 -rw-r--r-- 1 navis supergroup 4317 2013-06-20 11:05 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835 -rw-r--r-- 1 navis supergroup 5868 2013-06-20 11:05 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d -rw-r--r-- 1 navis supergroup 5214 2013-06-20 11:05 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8 -rw-r--r-- 1 navis supergroup 4290 2013-06-20 11:05 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10 drwxr-xr-x - navis supergroup 0 2013-06-20 11:05 /tmp/export/cf2 -rw-r--r-- 1 navis supergroup 6744 2013-06-20 11:05 /tmp/export/cf2/409673b517d94e16920e445d07710f52 -rw-r--r-- 1 navis supergroup 4975 2013-06-20 11:05 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e 
-rw-r--r-- 1 navis supergroup 6096 2013-06-20 11:05 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4 -rw-r--r-- 1 navis supergroup 4890 2013-06-20 11:05 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.8.patch Attaching my updated patch. It includes changes to the hbase test drivers so that there are snapshots available for testing from q files. [~tenggyut]: I'll have a look over your patch tomorrow. Maybe we can put our stuff together and get a working new feature :) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039467#comment-14039467 ] Nick Dimiduk commented on HIVE-6584: Can you regenerate your patch, rooted in the trunk directory instead of above it? That's the reason this patch fails the buildbot. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.4.patch Rebased onto trunk and fixed two broken hbase tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029497#comment-14029497 ] Nick Dimiduk commented on HIVE-6584: Thanks for the insightful comments, [~tenggyut]. bq. 1. HBaseStorageHandler.getInputFormatClass(): I am afraid that the returned inputformat will always be HiveHBaseTableInputFormat (at least according to my test) I was afraid of this in my initial design thinking, but my experiments proved otherwise. Can you elaborate on your tests? I'd like to reproduce this issue if I'm able. bq. 2. In the method HBaseStorageHandler.preCreateTable, hive will check whether the HBase table exists or not, regardless of whether the external table hive is going to create is based on an actual table or a snapshot. I haven't yet looked at the use-case of consuming a snapshot for which there is no table in HBase. I planned to approach this kind of feature in follow-on work; the goal here is to get just the basics working. bq. 3, 4 [snip] These are both true. bq. So I suggest adding a subclass of HBaseStorageHandler (and other necessary classes), say HBaseSnapshotStorageHandler, to deal with the hbase snapshot situation. A goal of this patch is to be able to query snapshots created from online tables already registered with Hive using the HBaseStorageHandler. Implementing HBaseSnapshotStorageHandler would require a separate table registration for the snapshot; I think that's undesirable. Regarding the hbase snapshot situation, let's make it better on the HBase side. What do you recommend? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.3.patch Ping. Rebased onto trunk. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Fix Version/s: 0.14.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Status: Patch Available (was: Open)
[jira] [Created] (HIVE-7197) Enabled and address flakiness of hbase_bulk.m
Nick Dimiduk created HIVE-7197: -- Summary: Enabled and address flakiness of hbase_bulk.m Key: HIVE-7197 URL: https://issues.apache.org/jira/browse/HIVE-7197 Project: Hive Issue Type: Test Components: HBase Handler Reporter: Nick Dimiduk Priority: Minor There's a nice e2e test for existing bulkload workflow, but it's disabled. We should turn it on and fix what's broken. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025318#comment-14025318 ] Nick Dimiduk commented on HIVE-6473: bq. Removed enabling of hbase_bulk.m; it mostly passes but is flaky for me. Will address it in a follow-on ticket. Opened HIVE-7197. Allow writing HFiles via HBaseStorageHandler table -- Key: HIVE-6473 URL: https://issues.apache.org/jira/browse/HIVE-6473 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7197) Enable and address flakiness of hbase_bulk.m
[ https://issues.apache.org/jira/browse/HIVE-7197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-7197: --- Summary: Enable and address flakiness of hbase_bulk.m (was: Enabled and address flakiness of hbase_bulk.m)
[jira] [Updated] (HIVE-2365) SQL support for bulk load into HBase
[ https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-2365: --- Status: Patch Available (was: Open) SQL support for bulk load into HBase Key: HIVE-2365 URL: https://issues.apache.org/jira/browse/HIVE-2365 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: John Sichi Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-2365.2.patch.txt, HIVE-2365.3.patch, HIVE-2365.3.patch, HIVE-2365.WIP.00.patch, HIVE-2365.WIP.01.patch, HIVE-2365.WIP.01.patch Support SQL as simple as this for bulk load from Hive into HBase. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6473: --- Release Note: Allows direct creation of HFiles and location for them as part of HBaseStorageHandler write if the following properties are specified in the HQL: set hive.hbase.generatehfiles=true; set hfile.family.path=/tmp/columnfamily_name; was: Allows direct creation of HFiles and location for them as part of HBaseStorageHandler write if the following properties are specified in the HQL: set hive.hbase.generatehfiles=true; set hfile.family.path=/tmp/hfilelocn;
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6473: --- Release Note: Allows direct creation of HFiles and location for them as part of HBaseStorageHandler write if the following properties are specified in the HQL: set hive.hbase.generatehfiles=true; set hfile.family.path=/tmp/columnfamily_name; hfile.family.path can also be set as a table property, HQL value takes precedence. was: Allows direct creation of HFiles and location for them as part of HBaseStorageHandler write if the following properties are specified in the HQL: set hive.hbase.generatehfiles=true; set hfile.family.path=/tmp/columnfamily_name;
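The release note quotes the two session properties that switch a write into HFile-generation mode. A minimal sketch of a session using them might look like the following; the table names and output path here are made up for illustration.

```sql
-- Minimal sketch of the release-note properties in use.
-- hbase_table, staging, and /tmp/cf are illustrative names only.
SET hive.hbase.generatehfiles=true;
SET hfile.family.path=/tmp/cf;  -- output directory, named for the column family

-- An INSERT through the existing HBaseStorageHandler-managed table now
-- emits HFiles under /tmp/cf, ready for HBase's bulk-load tooling,
-- instead of issuing puts against the online region servers.
INSERT OVERWRITE TABLE hbase_table
SELECT rowkey, val FROM staging;
```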
[jira] [Commented] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025376#comment-14025376 ] Nick Dimiduk commented on HIVE-6473: Made a small change. Thanks Sushanth!
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6473: --- Fix Version/s: 0.14.0
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6473: --- Attachment: HIVE-6473.6.patch Rebased onto trunk again. Removed enabling of hbase_bulk.m; it mostly passes but is flaky for me. Will address it in a follow-on ticket.
[jira] [Updated] (HIVE-2365) SQL support for bulk load into HBase
[ https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-2365: --- Attachment: HIVE-2365.3.patch Rebased onto HIVE-6473 patch v6.
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6473: --- Attachment: HIVE-6473.5.patch Test still passes locally for me: {noformat} --- Test set: org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver --- Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.33 sec - in org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver {noformat} Re-attaching patch for build bot.
[jira] [Updated] (HIVE-2365) SQL support for bulk load into HBase
[ https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-2365: --- Status: Open (was: Patch Available) Canceling patch until dependency HIVE-6473 is committed.
[jira] [Commented] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017258#comment-14017258 ] Nick Dimiduk commented on HIVE-6473: Ping [~sushanth], [~swarnim] mind taking another look?
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.2.patch Rebased onto trunk, mostly clean, though there were some changes to merge since HIVE-6411. Also updated pom.xml to hbase-0.98.3. This will be released by the end of the month and will include the dependency, HBASE-11137. I'm still looking for advice for adding tests. Please have a look, [~sushanth], [~ashutoshc], [~sershe], [~swarnim], [~navis].
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6473: --- Attachment: HIVE-6473.4.patch Ping. Rebased onto trunk.
[jira] [Updated] (HIVE-2365) SQL support for bulk load into HBase
[ https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-2365: --- Attachment: HIVE-2365.3.patch Rebased onto the latest patch on HIVE-6473. This way it's easier to see what's changed and how in order to use the MoveTask to handle LoadIncrementalHFiles. All three modified tests pass locally for me.
Re: Review Request 18492: HIVE-6473: Allow writing HFiles via HBaseStorageHandler table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18492/ --- (Updated May 22, 2014, 3:53 p.m.) Review request for hive. Changes --- patch v4 from JIRA. Bugs: HIVE-6473 https://issues.apache.org/jira/browse/HIVE-6473 Repository: hive-git Description --- From the JIRA: Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. Diffs (updated) - hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 255ffa2 hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java be1210e hbase-handler/src/test/queries/negative/generatehfiles_require_family_path.q PRE-CREATION hbase-handler/src/test/queries/positive/hbase_bulk.m f8bb47d hbase-handler/src/test/queries/positive/hbase_bulk.q PRE-CREATION hbase-handler/src/test/queries/positive/hbase_handler_bulk.q PRE-CREATION hbase-handler/src/test/results/negative/generatehfiles_require_family_path.q.out PRE-CREATION hbase-handler/src/test/results/positive/hbase_bulk.q.out PRE-CREATION hbase-handler/src/test/results/positive/hbase_handler_bulk.q.out PRE-CREATION Diff: https://reviews.apache.org/r/18492/diff/ Testing --- Thanks, nick dimiduk