from:"Nick Dimiduk"


 [ 
https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-4765:
---
Fix Version/s: 1.2.0

Planting a stake to get this in for 1.2.0.

 Improve HBase bulk loading facility
 ---

 Key: HIVE-4765
 URL: https://issues.apache.org/jira/browse/HIVE-4765
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Fix For: 1.2.0

 Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, 
 HIVE-4765.D11463.1.patch


 With some patches, bulk loading process for HBase could be simplified a lot.
 {noformat}
 CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
 WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:key,cf2:value)
 STORED AS
   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
 LOCATION '/tmp/export';
 SET mapred.reduce.tasks=4;
 set hive.optimize.sampling.orderby=true;
 INSERT OVERWRITE TABLE hbase_export
 SELECT * from (SELECT union_kv(key,key,value,:key,cf1:key,cf2:value) as 
 (rowkey,union) FROM src) A ORDER BY rowkey,union;
 hive !hadoop fs -lsr /tmp/export;
   
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf1
 -rw-r--r--   1 navis supergroup   4317 2013-06-20 11:05 
 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
 -rw-r--r--   1 navis supergroup   5868 2013-06-20 11:05 
 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
 -rw-r--r--   1 navis supergroup   5214 2013-06-20 11:05 
 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
 -rw-r--r--   1 navis supergroup   4290 2013-06-20 11:05 
 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf2
 -rw-r--r--   1 navis supergroup   6744 2013-06-20 11:05 
 /tmp/export/cf2/409673b517d94e16920e445d07710f52
 -rw-r--r--   1 navis supergroup   4975 2013-06-20 11:05 
 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
 -rw-r--r--   1 navis supergroup   6096 2013-06-20 11:05 
 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
 -rw-r--r--   1 navis supergroup   4890 2013-06-20 11:05 
 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HIVE-9591) Add support for OrderedByte encodings

Nick Dimiduk created HIVE-9591:
--

 Summary: Add support for OrderedByte encodings
 Key: HIVE-9591
 URL: https://issues.apache.org/jira/browse/HIVE-9591
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk


HBase has added out-of-the-box support for order-preserving data encoding for 
many common primitive types (HBASE-8201). The StorageHandler should add support 
for these encodings, which will increase the range of predicates that can be 
pushed down to HBase.

I propose adding a {{#o / #orderedbytes}} specifier to the column mappings to 
enable this hint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler


 [ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-7805:
---
Priority: Major  (was: Minor)

 Support running multiple scans in hbase-handler
 ---

 Key: HIVE-7805
 URL: https://issues.apache.org/jira/browse/HIVE-7805
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.14.0
Reporter: Andrew Mains
Assignee: Andrew Mains
 Attachments: HIVE-7805.1.patch, HIVE-7805.patch


 Currently, the HiveHBaseTableInputFormat only supports running a single scan. 
 This can be less efficient than running multiple disjoint scans in certain 
 cases, particularly when using a composite row key. For instance, given a row 
 key schema of:
 {code}
 structbucket int, time timestamp
 {code}
 if one wants to push down the predicate:
 {code}
 bucket IN (1, 10, 100) AND timestamp = 1408333927 AND timestamp  1408506670
 {code}
 it's much more efficient to run a scan for each bucket over the time range 
 (particularly if there's a large amount of data per day). With a single scan, 
 the MR job has to process the data for all time for buckets in between 1 and 
 100.
 hive should allow HBaseKeyFactory's to decompose a predicate into one or more 
 scans in order to take advantage of this fact.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-7805) Support running multiple scans in hbase-handler


[ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307771#comment-14307771
 ] 

Nick Dimiduk commented on HIVE-7805:


Bumping priority for a nice, easy performance gain.

 Support running multiple scans in hbase-handler
 ---

 Key: HIVE-7805
 URL: https://issues.apache.org/jira/browse/HIVE-7805
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.14.0
Reporter: Andrew Mains
Assignee: Andrew Mains
 Attachments: HIVE-7805.1.patch, HIVE-7805.patch


 Currently, the HiveHBaseTableInputFormat only supports running a single scan. 
 This can be less efficient than running multiple disjoint scans in certain 
 cases, particularly when using a composite row key. For instance, given a row 
 key schema of:
 {code}
 structbucket int, time timestamp
 {code}
 if one wants to push down the predicate:
 {code}
 bucket IN (1, 10, 100) AND timestamp = 1408333927 AND timestamp  1408506670
 {code}
 it's much more efficient to run a scan for each bucket over the time range 
 (particularly if there's a large amount of data per day). With a single scan, 
 the MR job has to process the data for all time for buckets in between 1 and 
 100.
 hive should allow HBaseKeyFactory's to decompose a predicate into one or more 
 scans in order to take advantage of this fact.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9521) Drop support for Java6

2015-02-02 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302555#comment-14302555
 ] 

Nick Dimiduk commented on HIVE-9521:


Test failure looks unrelated and console output looks clean. Can I get a 
+1/commit? :)

 Drop support for Java6
 --

 Key: HIVE-9521
 URL: https://issues.apache.org/jira/browse/HIVE-9521
 Project: Hive
  Issue Type: Improvement
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 1.2.0

 Attachments: HIVE-9521.00.patch


 As logical continuation of HIVE-4583, let's start using java7 syntax as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9521) Drop support for Java6

2015-01-29 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-9521:
---
Attachment: HIVE-9521.00.patch

 Drop support for Java6
 --

 Key: HIVE-9521
 URL: https://issues.apache.org/jira/browse/HIVE-9521
 Project: Hive
  Issue Type: Improvement
Reporter: Nick Dimiduk
 Fix For: 1.2.0

 Attachments: HIVE-9521.00.patch


 As logical continuation of HIVE-4583, let's start using java7 syntax as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9521) Drop support for Java6

2015-01-29 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-9521:
---
Status: Patch Available  (was: Open)

 Drop support for Java6
 --

 Key: HIVE-9521
 URL: https://issues.apache.org/jira/browse/HIVE-9521
 Project: Hive
  Issue Type: Improvement
Reporter: Nick Dimiduk
 Fix For: 1.2.0

 Attachments: HIVE-9521.00.patch


 As logical continuation of HIVE-4583, let's start using java7 syntax as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HIVE-9521) Drop support for Java6

2015-01-29 Thread Nick Dimiduk (JIRA)

Nick Dimiduk created HIVE-9521:
--

 Summary: Drop support for Java6
 Key: HIVE-9521
 URL: https://issues.apache.org/jira/browse/HIVE-9521
 Project: Hive
  Issue Type: Improvement
Reporter: Nick Dimiduk
 Fix For: 1.2.0


As logical continuation of HIVE-4583, let's start using java7 syntax as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9504) [beeline] ZipException when using !scan


 [ 
https://issues.apache.org/jira/browse/HIVE-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-9504:
---
Attachment: (was: HIVE-9504.00.patch)

 [beeline] ZipException when using !scan
 ---

 Key: HIVE-9504
 URL: https://issues.apache.org/jira/browse/HIVE-9504
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 0.15.0

 Attachments: HIVE-9504.00.patch


 Notice this while mucking around:
 {noformat}
 0: jdbc:hive2://localhost:1/ !scan
 java.util.zip.ZipException: error in opening zip file
 at java.util.zip.ZipFile.open(Native Method)
 at java.util.zip.ZipFile.init(ZipFile.java:220)
 at java.util.zip.ZipFile.init(ZipFile.java:150)
 at java.util.jar.JarFile.init(JarFile.java:166)
 at java.util.jar.JarFile.init(JarFile.java:130)
 at 
 org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579)
 at org.apache.hive.beeline.Commands.scan(Commands.java:278)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at 
 org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740)
 at 
 org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9504) [beeline] ZipException when using !scan


 [ 
https://issues.apache.org/jira/browse/HIVE-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-9504:
---
Attachment: HIVE-9504.00.patch

 [beeline] ZipException when using !scan
 ---

 Key: HIVE-9504
 URL: https://issues.apache.org/jira/browse/HIVE-9504
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 0.15.0

 Attachments: HIVE-9504.00.patch


 Notice this while mucking around:
 {noformat}
 0: jdbc:hive2://localhost:1/ !scan
 java.util.zip.ZipException: error in opening zip file
 at java.util.zip.ZipFile.open(Native Method)
 at java.util.zip.ZipFile.init(ZipFile.java:220)
 at java.util.zip.ZipFile.init(ZipFile.java:150)
 at java.util.jar.JarFile.init(JarFile.java:166)
 at java.util.jar.JarFile.init(JarFile.java:130)
 at 
 org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579)
 at org.apache.hive.beeline.Commands.scan(Commands.java:278)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at 
 org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740)
 at 
 org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9504) [beeline] ZipException when using !scan


 [ 
https://issues.apache.org/jira/browse/HIVE-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-9504:
---
Status: Patch Available  (was: Open)

 [beeline] ZipException when using !scan
 ---

 Key: HIVE-9504
 URL: https://issues.apache.org/jira/browse/HIVE-9504
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 0.15.0

 Attachments: HIVE-9504.00.patch


 Notice this while mucking around:
 {noformat}
 0: jdbc:hive2://localhost:1/ !scan
 java.util.zip.ZipException: error in opening zip file
 at java.util.zip.ZipFile.open(Native Method)
 at java.util.zip.ZipFile.init(ZipFile.java:220)
 at java.util.zip.ZipFile.init(ZipFile.java:150)
 at java.util.jar.JarFile.init(JarFile.java:166)
 at java.util.jar.JarFile.init(JarFile.java:130)
 at 
 org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579)
 at org.apache.hive.beeline.Commands.scan(Commands.java:278)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at 
 org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740)
 at 
 org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HIVE-9504) [beeline] ZipException when using !scan

Nick Dimiduk created HIVE-9504:
--

 Summary: [beeline] ZipException when using !scan
 Key: HIVE-9504
 URL: https://issues.apache.org/jira/browse/HIVE-9504
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 0.15.0


Notice this while mucking around:

{noformat}
0: jdbc:hive2://localhost:1/ !scan
java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.init(ZipFile.java:220)
at java.util.zip.ZipFile.init(ZipFile.java:150)
at java.util.jar.JarFile.init(JarFile.java:166)
at java.util.jar.JarFile.init(JarFile.java:130)
at 
org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128)
at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589)
at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579)
at org.apache.hive.beeline.Commands.scan(Commands.java:278)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740)
at 
org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9504) [beeline] ZipException when using !scan


 [ 
https://issues.apache.org/jira/browse/HIVE-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-9504:
---
Attachment: HIVE-9504.00.patch

 [beeline] ZipException when using !scan
 ---

 Key: HIVE-9504
 URL: https://issues.apache.org/jira/browse/HIVE-9504
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 0.15.0

 Attachments: HIVE-9504.00.patch


 Notice this while mucking around:
 {noformat}
 0: jdbc:hive2://localhost:1/ !scan
 java.util.zip.ZipException: error in opening zip file
 at java.util.zip.ZipFile.open(Native Method)
 at java.util.zip.ZipFile.init(ZipFile.java:220)
 at java.util.zip.ZipFile.init(ZipFile.java:150)
 at java.util.jar.JarFile.init(JarFile.java:166)
 at java.util.jar.JarFile.init(JarFile.java:130)
 at 
 org.apache.hive.beeline.ClassNameCompleter.getClassNames(ClassNameCompleter.java:128)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1589)
 at org.apache.hive.beeline.BeeLine.scanDrivers(BeeLine.java:1579)
 at org.apache.hive.beeline.Commands.scan(Commands.java:278)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at 
 org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:935)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:778)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:740)
 at 
 org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:470)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Creating a branch for hbase metastore work

2015-01-22 Thread Nick Dimiduk

+1

On Thursday, January 22, 2015, Brock Noland br...@cloudera.com wrote:

 +1

 On Thu, Jan 22, 2015 at 8:19 PM, Alan Gates ga...@hortonworks.com
 javascript:; wrote:
  I've been working on a prototype of using HBase to store Hive's metadata.
  Basically I've built a new implementation of RawStore that writes to
 HBase
  rather than DataNucleus.  I want to see if I can build something that
 has a
  much more straightforward schema than DN and that is much faster.
 
  I'd like to get this out in public so other can look at it and
 contribute,
  but it's no where near ready for real time.  So I propose to create a
 branch
  and put the code there.  Any objections?
 
  Alan.
 
  --
  CONFIDENTIALITY NOTICE
  NOTICE: This message is intended for the use of the individual or entity
 to
  which it is addressed and may contain information that is confidential,
  privileged and exempt from disclosure under applicable law. If the
 reader of
  this message is not the intended recipient, you are hereby notified that
 any
  printing, copying, dissemination, distribution, disclosure or forwarding
 of
  this communication is strictly prohibited. If you have received this
  communication in error, please contact the sender immediately and delete
 it
  from your system. Thank You.

Re: adding public domain Java files to Hive source

2015-01-21 Thread Nick Dimiduk

I guess the PMC should be responsive to this kind of question.

Can you not depend on a library containing these files? Why include the
source directly?

On Wed, Jan 21, 2015 at 3:16 PM, Sergey Shelukhin ser...@hortonworks.com
wrote:

 Ping? Where do I write about such matters if not here.

 On Wed, Jan 14, 2015 at 11:43 AM, Sergey Shelukhin ser...@hortonworks.com
 
 wrote:

  Suppose I want to use a Java source within Hive that has this header (I
  don't now, but I was considering it and may want it later ;)):
 
  /*
   * Written by Doug Lea with assistance from members of JCP JSR-166
   * Expert Group and released to the public domain, as explained at
   * http://creativecommons.org/licenses/publicdomain
   */
 
  As far as I see the class is not available in binary distribution, and
  there are projects on github that use it as is and add their license on
 top.
  Can I add it to Apache (Hive) codebase?
  Should Apache license header be added? Should the original header be
  retained?
 
 
 

 --
 CONFIDENTIALITY NOTICE
 NOTICE: This message is intended for the use of the individual or entity to
 which it is addressed and may contain information that is confidential,
 privileged and exempt from disclosure under applicable law. If the reader
 of this message is not the intended recipient, you are hereby notified that
 any printing, copying, dissemination, distribution, disclosure or
 forwarding of this communication is strictly prohibited. If you have
 received this communication in error, please contact the sender immediately
 and delete it from your system. Thank You.

Re: Undeliverable mail: Re: adding public domain Java files to Hive source

2015-01-21 Thread Nick Dimiduk

Thank you Ashutosh!

On Wed, Jan 21, 2015 at 4:10 PM, Ashutosh Chauhan hashut...@apache.org
wrote:

 Done. I have removed two offending ids from list.

 On Wed, Jan 21, 2015 at 3:22 PM, Nick Dimiduk ndimi...@gmail.com wrote:

  Seriously, these guys are still spamming this list? Why hasn't the
 dev-list
  admin booted these receivers yet? It's been *months*.
 
  On Wed, Jan 21, 2015 at 3:19 PM, mailer-dae...@mail.mailbrush.com
 wrote:
 
   Failed to deliver to 'bsc...@ebuddy.com'
   SMTP module(domain mail-in.ebuddy.com:25) reports:
host mail-in.ebuddy.com:25 says:
550 5.1.1 User unknown
  
  
   Original-Recipient: rfc822;bsc...@ebuddy.com
   Final-Recipient: rfc822;bsc...@ebuddy.com
   Action: failed
   Status: 5.0.0

Re: Undeliverable mail: Re: adding public domain Java files to Hive source

2015-01-21 Thread Nick Dimiduk

Seriously, these guys are still spamming this list? Why hasn't the dev-list
admin booted these receivers yet? It's been *months*.

On Wed, Jan 21, 2015 at 3:19 PM, mailer-dae...@mail.mailbrush.com wrote:

 Failed to deliver to 'bsc...@ebuddy.com'
 SMTP module(domain mail-in.ebuddy.com:25) reports:
  host mail-in.ebuddy.com:25 says:
  550 5.1.1 User unknown


 Original-Recipient: rfc822;bsc...@ebuddy.com
 Final-Recipient: rfc822;bsc...@ebuddy.com
 Action: failed
 Status: 5.0.0

What's the status of AccessServer?

2014-12-17 Thread Nick Dimiduk

Hi folks,

I'm looking for ways to expose Apache Phoenix [0] to a wider audience. One
potential way to do that is to follow in the Hive footsteps with a HS2
protocol-compatible service. I've done some prototyping along these lines
and see that it's quite feasible. Along the way I came across this proposal
for refactoring HS2 into the AccessServer [1].

What's the state of the AccessServer project? Is anyone working on it? Is
there a relationship between this effort and Calcite's Avatica [2]? The
system proposed in the AccessServer doc seems to fit nicely in line with
Calcite's objectives.

Thanks,
Nick

[0]: http://phoenix.apache.org
[1]:
https://cwiki.apache.org/confluence/display/Hive/AccessServer+Design+Proposal
[2]:
http://mail-archives.apache.org/mod_mbox/calcite-dev/201412.mbox/%3CCAMCtme%2BpVsVYP%2B-J1jDPk-fNCtAHj3f0eXif_hUG_Xy81Ufxsw%40mail.gmail.com%3E

[jira] [Commented] (HIVE-8809) Activate maven profile hadoop-2 by default

2014-12-04 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234532#comment-14234532
 ] 

Nick Dimiduk commented on HIVE-8809:


Using activeByDefault causes issues -- if you specify some other unrelated 
profiles (thrift generation, for instance), you end up disabling your default 
profile. Better to use a property flag.

 Activate maven profile hadoop-2 by default
 --

 Key: HIVE-8809
 URL: https://issues.apache.org/jira/browse/HIVE-8809
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Priority: Minor
 Attachments: HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, 
 dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, 
 dep_without_hadoop_2.txt


 For every maven command profile needs to be specified explicitly. It will be 
 better to activate hadoop-2 profile by default as HIVE QA uses hadoop-2 
 profile. With this change both the following commands will be equivalent
 {code}
 mvn clean install -DskipTests
 mvn clean install -DskipTests -Phadoop-2
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-8809) Activate maven profile hadoop-2 by default

2014-12-04 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-8809:
---
Attachment: HIVE-8809.01.patch

Over on HBase, we have the property hadoop.profile and check it's value. See 
also http://java.dzone.com/articles/maven-profile-best-practices

Give this patch a spin. For hadoop1 build, add {{-Dhadoop.profile=1}}.

 Activate maven profile hadoop-2 by default
 --

 Key: HIVE-8809
 URL: https://issues.apache.org/jira/browse/HIVE-8809
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Priority: Minor
 Attachments: HIVE-8809.01.patch, HIVE-8809.1.patch, 
 dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, 
 dep_with_hadoop_2.txt, dep_without_hadoop_2.txt


 For every maven command profile needs to be specified explicitly. It will be 
 better to activate hadoop-2 profile by default as HIVE QA uses hadoop-2 
 profile. With this change both the following commands will be equivalent
 {code}
 mvn clean install -DskipTests
 mvn clean install -DskipTests -Phadoop-2
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [PROPOSAL] Hivemall incubation

2014-11-21 Thread Nick Dimiduk

Hi Makoto,

I cannot speak for Hive PMC, only as a data tool user and occasional
contributor. I think the idea is very much a good one. Incubator takes a
lot of work because it's all about establishing a vibrant developer and
user community for the project. Community before code, as they say.

I would also encourage you to consider joining forces with DataFu, rather
than competing. I think there's a real appetite a wholistic toolbox of
patterns and implementations that can span these projects. From my
understanding, there's nothing about DataFu that's unique to Pig, they just
need the work done to abstract away the Pig bits and implement the Hive
interfaces.

Is there anything about Hivemall that's unique to Hive, that wouldn't be
applicable to Pig as well?

+Casey, as I believe he has some interest in seeing DataFu reach a wider
audience as well.

Good on you.
Nick

On Friday, November 21, 2014, Makoto Yui yuin...@gmail.com wrote:

 Hi all,

 I am the principal developer of Hivemall, a scalable machine learning
 library for Apache Hive.

   https://github.com/myui/hivemall

 When I presented a talk at the last Hadoop Summit in San Jose [1],
 several audiences asked me the possibility to change the software
 license of Hivemall to Apache License v2 and then sustainability of the
 project was their major concerns.

 Since then, I am wondering to propose Hivemall as an Apache Incubator
 project. The position of Hivemall for Hive would become similar one to
 DataFu (an Apache Incubator project) for Apache Pig.

 I believe that adding machine learning functionality over Apache Hive
 could extend application range of Apache Hive and Hivemall could help
 existing Hive users in their learning-scale data analytics projects.

 I have got approved from my employer (AIST) to change the license of
 Hivemall to Apache License version 2 and the donating the code to Apache
 Foundation. And now, I am willing to propose Hivemall as an Apache
 incubator project, together with Hivemall contributors in NTT corp.

 I am considering that the current Hivemall codebase is bits large to be
 included in Hive contrib and thus it is better to be a separated
 incubator project. I would like to propose Hivemall to be graduated as a
 subproject of Apache Hive.

 Is the strategy possible from the Hive PMC point of view?
 http://incubator.apache.org/guides/graduation.html#subproject-or-top-level

 Before formulating a proposal, I would like to hear Hive developers’
 opinion (e.g., possibilities, +1/-1, and missing pieces for incubations)
 on incubating Hivemall.

 BTW, I found this JIRA issue mentioning Hivemall.
 https://issues.apache.org/jira/browse/HIVE-7940

 Is there a possibility to cooperate with them in proposing Hivemall to
 Apache Incubator project? According the incubation guides, I need a
 mentor/champion for incubating.
 http://incubator.apache.org/guides/proposal.html#formulating

 Your help toward the incubation will be much appreciated.

 Thanks,
 Makoto

 [1] http://www.slideshare.net/myui/hivemall-hadoop-summit-2014-san-jose

 -- *** Makoto YUI
 m@aist.go.jp javascript:; Information Technology Research
 Institute, AIST.
 http://staff.aist.go.jp/m.yui/ ***

Re: [PROPOSAL] Hivemall incubation

2014-11-21 Thread Nick Dimiduk

Thank you for humoring my questions. I do not know the mind of the DataFu
community. Your observations are quite clear; I have no further concerns.

-n

On Friday, November 21, 2014, Makoto Yui yuin...@gmail.com wrote:

 Hi Nick,

 Thank you for the comments.

 (2014/11/22 3:42), Nick Dimiduk wrote:

 I would also encourage you to consider joining forces with DataFu,
 rather than competing. I think there's a real appetite a wholistic
 toolbox of patterns and implementations that can span these projects.
  From my understanding, there's nothing about DataFu that's unique to
 Pig, they just need the work done to abstract away the Pig bits and
 implement the Hive interfaces.


 My current understanding of DataFu is that it is UDF collections for
 Apache Pig. Though Hive interface is not yet supported in DataFu, is the
 direction (to extend DataFu for Hive) a consensus in DataFu community?

 My concern is that merging Hivemall codebase to DataFu makes the building
 and packing process of DataFu complex and the target/objective of the
 project unclear.

 I do not think that Hivemall competes with DataFu because
 1) There are users who prefer Pig and Hive respectively, and
 2) Pig/DataFu is useful for what HiveQL is unsuited (e.g., complex feature
 engineering steps). After preprocessing using DataFu, Hivemall can be
 applied for classification/regression in a scalable way in Hive.

  Is there anything about Hivemall that's unique to Hive, that wouldn't be
 applicable to Pig as well?


 The techniques used in Hivemall (e.g., training data amplification that
 emulates iterative training and machine learning algorithms as
 table-generating functions) could be appreciable to Apache Pig.

 However, I am not a heavy user of Pig and porting Hivemall to Pig requires
 a bunch of works. So, I am currently considering to stick with HiveQL
 interfaces (Hive, HCatalog, and Tez for the software stack of Hivemall) in
 developing Hivemall because SQL-like interface is friendly to a broader
 range of developers.

 Thanks,
 Makoto

 --
 ***
 Makoto YUI m@aist.go.jp
 Information Technology Research Institute, AIST.
 https://staff.aist.go.jp/m.yui/index_e.html
 ***

[jira] [Commented] (HIVE-8808) HiveInputFormat caching cannot work with all input formats

2014-11-11 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206979#comment-14206979
 ] 

Nick Dimiduk commented on HIVE-8808:


Yes, it looks like our input formats are stateful. We're set up to specify a 
config object and that's closed over to provide custom record readers, based on 
which mapred API is being implemented.

 HiveInputFormat caching cannot work with all input formats
 --

 Key: HIVE-8808
 URL: https://issues.apache.org/jira/browse/HIVE-8808
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland

 In {{HiveInputFormat}} we implement instance caching (see 
 {{getInputFormatFromCache}}). In HS2, this assumes that InputFormats are 
 stateless but I don't think this assumption is true, especially with regards 
 to HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-8808) HiveInputFormat caching cannot work with all input formats

2014-11-11 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206983#comment-14206983
 ] 

Nick Dimiduk commented on HIVE-8808:


FYI, this may be changing with HBase 1.0 as we're reworking the way connection 
management is handled. I'm behind on my reviews, so I don't know if this 
assumption will change.

 HiveInputFormat caching cannot work with all input formats
 --

 Key: HIVE-8808
 URL: https://issues.apache.org/jira/browse/HIVE-8808
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland

 In {{HiveInputFormat}} we implement instance caching (see 
 {{getInputFormatFromCache}}). In HS2, this assumes that InputFormats are 
 stateless but I don't think this assumption is true, especially with regards 
 to HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-2828) make timestamp accessible in the hbase KeyValue

2014-10-10 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167773#comment-14167773
 ] 

Nick Dimiduk commented on HIVE-2828:


Left a comment on phabricator. +1

 make timestamp accessible in the hbase KeyValue 
 

 Key: HIVE-2828
 URL: https://issues.apache.org/jira/browse/HIVE-2828
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2828.D1989.5.patch, HIVE-2828.6.patch.txt, 
 HIVE-2828.7.patch.txt, HIVE-2828.8.patch.txt


 Originated from HIVE-2781 and not accepted, but I think this could be helpful 
 to someone.
 By using special column notation ':timestamp' in HBASE_COLUMNS_MAPPING, user 
 might access timestamp value in hbase KeyValue.
 {code}
 CREATE TABLE hbase_table (key int, value string, time timestamp)
   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf:string,:timestamp)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: hive unit test report question

2014-09-07 Thread Nick Dimiduk

IMHO, would be better to wire up the integration suite via failsafe plugin
(surefire for IT) and link the modules correctly. This is on (admittedly,
near the bottom of) my todo list. See also HBase poms for an example.

-n

On Saturday, September 6, 2014, wzc wzc1...@gmail.com wrote:

 hi all:
  I would like to create a jenkins job to run both hive ut and integration
 test. Right now it seems that I have to execute mulitple maven goals in
 different poms:

 mvn clean install  surefire-report:report -Daggregate=true   -Phadoop-2
  cd itests
  mvn clean install  surefire-report:report -Daggregate=true   -Phadoop-2


 I would like to use one maven jenkins job and right now I cant figure out
 how to configure job propery to execute  maven goals  in different poms
 (maybe I can add post-build step to execute another shell?).  Each hive
 ptest2 job can run all tests and I would like to know the configure it use.

 Any help is appreciated.

 Thanks.







 2014-01-14 14:05 GMT+08:00 Shanyu Zhao shz...@microsoft.com
 javascript:;:

  Thanks guys for your help!
 
  I found Eugene's comments are particularly helpful. With
  -Daggregate=true I now can see an aggregated unit test results.
 
  Btw, I didn't mean to run itests, I just want to run all unit tests. I
  think in the FAQ they made it clear that itests are disconnected from the
  top level pom.xml.
 
  Shanyu
 
  -Original Message-
  From: Eugene Koifman [mailto:ekoif...@hortonworks.com javascript:;]
  Sent: Monday, January 13, 2014 4:06 PM
  To: dev@hive.apache.org javascript:;
  Subject: Re: hive unit test report question
 
  I think you want to add
  -Daggregate=true
  you should then have target/site/surefire-report.html in the module where
  you ran the command
 
 
 
  On Mon, Jan 13, 2014 at 2:54 PM, Szehon Ho sze...@cloudera.com
 javascript:; wrote:
 
   Hi Shanyu,
  
   Are you running in /itests?  The unit tests are in there, and are not
   run if you are running from the root.
  
   Thanks
   Szehon
  
  
   On Mon, Jan 13, 2014 at 1:59 PM, Shanyu Zhao shz...@microsoft.com
 javascript:;
  wrote:
  
Hi,
   
I was trying to build hive trunk, run all unit tests and generate
   reports,
but I'm not sure what's the correct command line. I was using:
mvn clean install -Phadoop-2 -DskipTests mvn test
surefire-report:report -Phadoop-2 But the reports in the root folder
and several other projects (such as
metastore) are empty with no test results. And I couldn't find a
summary page for all unit tests.
   
I was trying to avoid mvn site because it seems to take forever to
finish. Am I using the correct commands? How can I get a report like
the one in the precommit report:
   
   http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/827/testRep
   ort/
?
   
I really appreciate your help!
   
Shanyu
   
  
 
  --
  CONFIDENTIALITY NOTICE
  NOTICE: This message is intended for the use of the individual or entity
  to which it is addressed and may contain information that is
 confidential,
  privileged and exempt from disclosure under applicable law. If the reader
  of this message is not the intended recipient, you are hereby notified
 that
  any printing, copying, dissemination, distribution, disclosure or
  forwarding of this communication is strictly prohibited. If you have
  received this communication in error, please contact the sender
 immediately
  and delete it from your system. Thank You.

[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility

2014-08-20 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104398#comment-14104398
 ] 

Nick Dimiduk commented on HIVE-4765:


Ping [~navis], [~sushanth].

Any chance we can get some action on this one for 0.14 release? It's definitely 
better than what's available.

 Improve HBase bulk loading facility
 ---

 Key: HIVE-4765
 URL: https://issues.apache.org/jira/browse/HIVE-4765
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, 
 HIVE-4765.D11463.1.patch


 With some patches, bulk loading process for HBase could be simplified a lot.
 {noformat}
 CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
 WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:key,cf2:value)
 STORED AS
   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
 LOCATION '/tmp/export';
 SET mapred.reduce.tasks=4;
 set hive.optimize.sampling.orderby=true;
 INSERT OVERWRITE TABLE hbase_export
 SELECT * from (SELECT union_kv(key,key,value,:key,cf1:key,cf2:value) as 
 (rowkey,union) FROM src) A ORDER BY rowkey,union;
 hive !hadoop fs -lsr /tmp/export;
   
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf1
 -rw-r--r--   1 navis supergroup   4317 2013-06-20 11:05 
 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
 -rw-r--r--   1 navis supergroup   5868 2013-06-20 11:05 
 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
 -rw-r--r--   1 navis supergroup   5214 2013-06-20 11:05 
 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
 -rw-r--r--   1 navis supergroup   4290 2013-06-20 11:05 
 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf2
 -rw-r--r--   1 navis supergroup   6744 2013-06-20 11:05 
 /tmp/export/cf2/409673b517d94e16920e445d07710f52
 -rw-r--r--   1 navis supergroup   4975 2013-06-20 11:05 
 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
 -rw-r--r--   1 navis supergroup   6096 2013-06-20 11:05 
 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
 -rw-r--r--   1 navis supergroup   4890 2013-06-20 11:05 
 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Re: Timeline for release of Hive 0.14

2014-08-20 Thread Nick Dimiduk

It'd be great to get HIVE-4765 included in 0.14. The proposed changes are a
big improvement for us HBase folks. Would someone mind having a look in
that direction?

Thanks,
Nick


On Tue, Aug 19, 2014 at 3:20 PM, Thejas Nair the...@hortonworks.com wrote:

 +1
 Sounds good to me.
 Its already almost 4 months since the last release. It is time to
 start preparing for the next one.
 Thanks for volunteering!


 On Tue, Aug 19, 2014 at 2:02 PM, Vikram Dixit vik...@hortonworks.com
 wrote:
  Hi Folks,
 
  I was thinking that it was about time that we had a release of hive 0.14
  given our commitment to having a release of hive on a periodic basis. We
  could cut a branch and start working on a release in say 2 weeks time
  around September 5th (Friday). After branching, we can focus on
 stabilizing
  for the release and hopefully have an RC in about 2 weeks post that. I
  would like to volunteer myself for the duties of the release manager for
  this version if the community agrees.
 
  Thanks
  Vikram.
 
  --
  CONFIDENTIALITY NOTICE
  NOTICE: This message is intended for the use of the individual or entity
 to
  which it is addressed and may contain information that is confidential,
  privileged and exempt from disclosure under applicable law. If the reader
  of this message is not the intended recipient, you are hereby notified
 that
  any printing, copying, dissemination, distribution, disclosure or
  forwarding of this communication is strictly prohibited. If you have
  received this communication in error, please contact the sender
 immediately
  and delete it from your system. Thank You.

 --
 CONFIDENTIALITY NOTICE
 NOTICE: This message is intended for the use of the individual or entity to
 which it is addressed and may contain information that is confidential,
 privileged and exempt from disclosure under applicable law. If the reader
 of this message is not the intended recipient, you are hereby notified that
 any printing, copying, dissemination, distribution, disclosure or
 forwarding of this communication is strictly prohibited. If you have
 received this communication in error, please contact the sender immediately
 and delete it from your system. Thank You.

Re: Mail bounces from ebuddy.com

2014-08-20 Thread Nick Dimiduk

Not quite taken care of. I'm still getting spam about these addresses.


On Mon, Aug 18, 2014 at 9:18 AM, Lars Francke lars.fran...@gmail.com
wrote:

 Thanks Alan and Ashutosh for taking care of this!


 On Mon, Aug 18, 2014 at 5:45 PM, Ashutosh Chauhan hashut...@apache.org
 wrote:

  Thanks, Alan for the hint. I just unsubscribed those two email addresses
  from ebuddy.
 
 
  On Mon, Aug 18, 2014 at 8:23 AM, Alan Gates ga...@hortonworks.com
 wrote:
 
   Anyone who is an admin on the list (I don't who the admins are) can do
   this by doing user-unsubscribe-USERNAME=ebuddy@hive.apache.org
 where
   USERNAME is the name of the bouncing user (see
   http://untroubled.org/ezmlm/ezman/ezman1.html )
  
   Alan.
  
  
  
 Thejas Nair the...@hortonworks.com
August 17, 2014 at 17:02
   I don't know how to do this.
  
   Carl, Ashutosh,
   Do you guys know how to remove these two invalid emails from the
 mailing
   list ?
  
  
 Lars Francke lars.fran...@gmail.com
August 17, 2014 at 15:41
   Hmm great, I see others mentioning this as well. I'm happy to contact
  INFRA
   but I'm not sure if they are even needed or if someone from the Hive
 team
   can do this?
  
  
   On Fri, Aug 8, 2014 at 3:43 AM, Lefty Leverenz 
 leftylever...@gmail.com
   leftylever...@gmail.com
  
 Lefty Leverenz leftylever...@gmail.com
August 7, 2014 at 18:43
   (Excuse the spam.) Actually I'm getting two bounces per message, but
  gmail
   concatenates them so I didn't notice the second one.
  
   -- Lefty
  
  
   On Thu, Aug 7, 2014 at 9:36 PM, Lefty Leverenz 
 leftylever...@gmail.com
   leftylever...@gmail.com
  
 Lefty Leverenz leftylever...@gmail.com
August 7, 2014 at 18:36
   Curious, I've only been getting one bounce per message. Anyway thanks
 for
   bringing this up.
  
   -- Lefty
  
  
  
 Lars Francke lars.fran...@gmail.com
August 7, 2014 at 4:38
   Hi,
  
   every time I send a mail to dev@ I get two bounce mails from two
 people
  at
   ebuddy.com. I don't want to post the E-Mail addresses publicly but I
 can
   send them on if needed (and it can be triggered easily by just replying
  to
   this mail I guess).
  
   Could we maybe remove them from the list?
  
   Cheers,
   Lars
  
  
   --
   Sent with Postbox http://www.getpostbox.com
  
   CONFIDENTIALITY NOTICE
   NOTICE: This message is intended for the use of the individual or
 entity
   to which it is addressed and may contain information that is
  confidential,
   privileged and exempt from disclosure under applicable law. If the
 reader
   of this message is not the intended recipient, you are hereby notified
  that
   any printing, copying, dissemination, distribution, disclosure or
   forwarding of this communication is strictly prohibited. If you have
   received this communication in error, please contact the sender
  immediately
   and delete it from your system. Thank You.

[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler

2014-08-15 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099180#comment-14099180
]

Nick Dimiduk commented on HIVE-7068:

This is really cool, nice work fellas! It's a shame to see so many of the
StorageHandler warts repeated here too, but that's how it is.

Does it make sense to try to share more code between the accumulo and hbase
modules? Column mapping stuff looks pretty much identical to me, and maybe the
hbase module could benefit from some of the comparator work? Nothing critical
for this patch, but could be good for follow-on work.

I'm with [~navis] on this one, +1 for getting it committed setting users loose
to play!

Integrate AccumuloStorageHandler

Key: HIVE-7068
URL: https://issues.apache.org/jira/browse/HIVE-7068
Project: Hive
Issue Type: New Feature
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 0.14.0

Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch

[Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to
HBase. Some [initial
work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done
to support querying an Accumulo table using Hive already. It is not a
complete solution as, most notably, the current implementation presently
lacks support for INSERTs.
I would like to polish up the AccumuloStorageHandler (presently based on
0.10), implement missing basic functionality and compare it to the
HBaseStorageHandler (to ensure that we follow the same general usage
patterns).
I've also been in communication with [~bfem] (the initial author) who
expressed interest in working on this again. I hope to coordinate efforts
with him.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility

2014-08-13 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096390#comment-14096390
 ] 

Nick Dimiduk commented on HIVE-4765:


Hi [~navis]. Have you had time to look at this lately? It would sure be better 
than the mostly-broken instructions on 
https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad

 Improve HBase bulk loading facility
 ---

 Key: HIVE-4765
 URL: https://issues.apache.org/jira/browse/HIVE-4765
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, 
 HIVE-4765.D11463.1.patch


 With some patches, bulk loading process for HBase could be simplified a lot.
 {noformat}
 CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
 WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:key,cf2:value)
 STORED AS
   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
 LOCATION '/tmp/export';
 SET mapred.reduce.tasks=4;
 set hive.optimize.sampling.orderby=true;
 INSERT OVERWRITE TABLE hbase_export
 SELECT * from (SELECT union_kv(key,key,value,:key,cf1:key,cf2:value) as 
 (rowkey,union) FROM src) A ORDER BY rowkey,union;
 hive !hadoop fs -lsr /tmp/export;
   
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf1
 -rw-r--r--   1 navis supergroup   4317 2013-06-20 11:05 
 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
 -rw-r--r--   1 navis supergroup   5868 2013-06-20 11:05 
 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
 -rw-r--r--   1 navis supergroup   5214 2013-06-20 11:05 
 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
 -rw-r--r--   1 navis supergroup   4290 2013-06-20 11:05 
 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf2
 -rw-r--r--   1 navis supergroup   6744 2013-06-20 11:05 
 /tmp/export/cf2/409673b517d94e16920e445d07710f52
 -rw-r--r--   1 navis supergroup   4975 2013-06-20 11:05 
 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
 -rw-r--r--   1 navis supergroup   6096 2013-06-20 11:05 
 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
 -rw-r--r--   1 navis supergroup   4890 2013-06-20 11:05 
 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-12 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Nick Dimiduk updated HIVE-6584:
---

Release Note:
Hive can now execute queries against HBase table snapshots. This feature is
available for any table defined using the HBaseStorageHandler. It requires at
least HBase 0.98.3.

To query against a snapshot instead of the online table, specify the snapshot
name via hive.hbase.snapshot.name. The snapshot will be restored into a unique
directory under /tmp. This location can be overridden by setting a path via
hive.hbase.snapshot.restoredir.

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Labels: TODOC14
Fix For: 0.14.0

Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch,
HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch,
HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch,
HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch,
HIVE-6584.8.patch, HIVE-6584.9.patch

HBASE-8369 provided mapreduce support for reading from HBase table snapsopts.
This allows a MR job to consume a stable, read-only view of an HBase table
directly off of HDFS. Bypassing the online region server API provides a nice
performance boost for the full scan. HBASE-10642 is backporting that feature
to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's
available, we should add an input format. A follow-on patch could work out
how to integrate this functionality into the StorageHandler, similar to how
HIVE-6473 integrates the HFileOutputFormat into existing table definitions.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure

2014-08-06 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088450#comment-14088450
 ] 

Nick Dimiduk commented on HIVE-7618:


Option (1) would pave the way for migrating all these baked-in storage systems 
over to the StorageHandler interface -- a healthy thing for the long-term 
success of Hive, IMHO. That's a larger conversation though.

 TestDDLWithRemoteMetastoreSecondNamenode unit test failure
 --

 Key: HIVE-7618
 URL: https://issues.apache.org/jira/browse/HIVE-7618
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7618.1.patch, HIVE-7618.2.patch


 Looks like TestDDLWithRemoteMetastoreSecondNamenode started failing after 
 HIVE-6584 was committed.
 {noformat}
 TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode:272-createTableAndCheck:201-createTableAndCheck:219
  Table should be located in the second filesystem expected:[hdfs] but 
 was:[pfile]
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure

2014-08-06 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088453#comment-14088453
 ] 

Nick Dimiduk commented on HIVE-7618:


Anyway, +1 for your patch v2. It's 80% of what we'd want for option (1) anyway. 
Would be best if DDLTask didn't need that nasty hack.

 TestDDLWithRemoteMetastoreSecondNamenode unit test failure
 --

 Key: HIVE-7618
 URL: https://issues.apache.org/jira/browse/HIVE-7618
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7618.1.patch, HIVE-7618.2.patch


 Looks like TestDDLWithRemoteMetastoreSecondNamenode started failing after 
 HIVE-6584 was committed.
 {noformat}
 TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode:272-createTableAndCheck:201-createTableAndCheck:219
  Table should be located in the second filesystem expected:[hdfs] but 
 was:[pfile]
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-05 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086367#comment-14086367
]

Nick Dimiduk commented on HIVE-6584:

Restore location is optional. It defaults to /tmp. The restore process creates
a uniquely named (random uuid) directory under this path for any give restore,
so users who never set this value will not conflict with each other.

It would be nice if hive had some kind of post-job hook that could be used to
clean up the restoredir artifacts after the input format is finished with them.

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure

2014-08-05 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087080#comment-14087080
 ] 

Nick Dimiduk commented on HIVE-7618:


This will cause failure in HIVE-6584. Reason being, StorageHandler tables don't 
have a location, so this makeLocationQualified constructs locations that don't 
exist. IIRC, this resulted in an HBase table being assigned a location in the 
warehouse, which then confuses other pieces.

 TestDDLWithRemoteMetastoreSecondNamenode unit test failure
 --

 Key: HIVE-7618
 URL: https://issues.apache.org/jira/browse/HIVE-7618
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7618.1.patch


 Looks like TestDDLWithRemoteMetastoreSecondNamenode started failing after 
 HIVE-6584 was committed.
 {noformat}
 TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode:272-createTableAndCheck:201-createTableAndCheck:219
  Table should be located in the second filesystem expected:[hdfs] but 
 was:[pfile]
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure

2014-08-05 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087082#comment-14087082
 ] 

Nick Dimiduk commented on HIVE-7618:


Can the test be updated to explicitly set a foreign hdfs for its location?

 TestDDLWithRemoteMetastoreSecondNamenode unit test failure
 --

 Key: HIVE-7618
 URL: https://issues.apache.org/jira/browse/HIVE-7618
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7618.1.patch


 Looks like TestDDLWithRemoteMetastoreSecondNamenode started failing after 
 HIVE-6584 was committed.
 {noformat}
 TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode:272-createTableAndCheck:201-createTableAndCheck:219
  Table should be located in the second filesystem expected:[hdfs] but 
 was:[pfile]
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-08-04 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14084852#comment-14084852
]

Nick Dimiduk commented on HIVE-6584:

Thanks folks! Any chance of getting a commit this week? :)

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility

2014-08-04 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085004#comment-14085004
 ] 

Nick Dimiduk commented on HIVE-4765:


Bump. Patch still applies to master, with a little fuzz.

Is the new SerDe and Union business necessary? It would be really great to 
integrate this into the StorageHandler as an online switch, like as I was 
aiming for on HIVE-2365. Swapping out the output format at runtime seems to 
work alright and it saves the user from having to define another table, repeat 
the column mapping stuff, etc.

 Improve HBase bulk loading facility
 ---

 Key: HIVE-4765
 URL: https://issues.apache.org/jira/browse/HIVE-4765
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, 
 HIVE-4765.D11463.1.patch


 With some patches, bulk loading process for HBase could be simplified a lot.
 {noformat}
 CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
 WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:key,cf2:value)
 STORED AS
   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
 LOCATION '/tmp/export';
 SET mapred.reduce.tasks=4;
 set hive.optimize.sampling.orderby=true;
 INSERT OVERWRITE TABLE hbase_export
 SELECT * from (SELECT union_kv(key,key,value,:key,cf1:key,cf2:value) as 
 (rowkey,union) FROM src) A ORDER BY rowkey,union;
 hive !hadoop fs -lsr /tmp/export;
   
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf1
 -rw-r--r--   1 navis supergroup   4317 2013-06-20 11:05 
 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
 -rw-r--r--   1 navis supergroup   5868 2013-06-20 11:05 
 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
 -rw-r--r--   1 navis supergroup   5214 2013-06-20 11:05 
 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
 -rw-r--r--   1 navis supergroup   4290 2013-06-20 11:05 
 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf2
 -rw-r--r--   1 navis supergroup   6744 2013-06-20 11:05 
 /tmp/export/cf2/409673b517d94e16920e445d07710f52
 -rw-r--r--   1 navis supergroup   4975 2013-06-20 11:05 
 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
 -rw-r--r--   1 navis supergroup   6096 2013-06-20 11:05 
 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
 -rw-r--r--   1 navis supergroup   4890 2013-06-20 11:05 
 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (HIVE-7572) Enable LOAD DATA into StorageHandler tables

2014-07-31 Thread Nick Dimiduk (JIRA)

Nick Dimiduk created HIVE-7572:
--

 Summary: Enable LOAD DATA into StorageHandler tables
 Key: HIVE-7572
 URL: https://issues.apache.org/jira/browse/HIVE-7572
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: Nick Dimiduk


Once annoyance when working with HBaseStorageHandler is its inaccessibility to 
local data. Populating an HBase table from local test data, for instance, is a 
multi-step process:

{noformat}
# create a hive table you HAVE to populate
 CREATE TABLE src(key int, value string);
# populate the intermediate hive table
 LOAD DATA LOCAL INPATH '/path/to/hive/data/files/kv1.txt' OVERWRITE INTO 
 TABLE src;
# create the hbase table you WANT to populate
 CREATE TABLE hbase_src(key INT, value STRING) STORED BY 
 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES 
 ('hbase.columns.mapping' = ':key,cf:val') TBLPROPERTIES ('hbase.table.name' = 
 'hbase_src');
# copy data into hbase
 INSERT OVERWRITE TABLE hbase_src SELECT * FROM src;
{noformat}

This multi-step process could be simplified and isn't limited to 
HBaseStorageHandler -- any StorageHandler implementation will suffer this 
problem.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Re: Review Request 23824: Add HiveHBaseTableSnapshotInputFormat

2014-07-31 Thread nick dimiduk


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23824/
---

(Updated July 31, 2014, 6:33 p.m.)


Review request for hive, Ashutosh Chauhan, Navis Ryu, Sushanth Sowmyan, and 
Swarnim Kulkarni.


Changes
---

Updating with patch v14 from JIRA.


Bugs: HIVE-6584
https://issues.apache.org/jira/browse/HIVE-6584


Repository: hive-git


Description
---

HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
This allows a MR job to consume a stable, read-only view of an HBase table 
directly off of HDFS. Bypassing the online region server API provides a nice 
performance boost for the full scan. HBASE-10642 is backporting that feature to 
0.94/0.96 and also adding a mapred implementation. Once that's available, we 
should add an input format. A follow-on patch could work out how to integrate 
this functionality into the StorageHandler, similar to how HIVE-6473 integrates 
the HFileOutputFormat into existing table definitions.

See JIRA for further conversation.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 15bc0a3 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java 998c15c 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 
dbf5e51 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseTableSnapshotInputFormatUtil.java
 PRE-CREATION 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseInputFormatUtil.java
 PRE-CREATION 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
 1032cc9 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java
 PRE-CREATION 
  hbase-handler/src/test/queries/positive/hbase_handler_snapshot.q PRE-CREATION 
  hbase-handler/src/test/results/positive/external_table_ppd.q.out 6f1adf4 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
b92db11 
  hbase-handler/src/test/results/positive/hbase_handler_snapshot.q.out 
PRE-CREATION 
  hbase-handler/src/test/templates/TestHBaseCliDriver.vm 01d596a 
  itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseQTestUtil.java 
96a0de2 
  itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 
cdc0a65 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java ccfb58f 
  pom.xml b3216e1 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 40d910c 

Diff: https://reviews.apache.org/r/23824/diff/


Testing
---

Unit tests, local-mode testing, pseudo-distributed mode testing, and tested on 
a small distributed cluster. Tests included hbase versions 0.98.3 and the HEAD 
of 0.98 branch.


Thanks,

nick dimiduk

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-31 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.14.patch

Once more, for old time's sake, eh [~navis]? ;)

Attaching and updated patch that isolates the snapshot classes from the
StorageHandler. I tried this out against a local-mode HBase built against the
tag 0.96.0RC5 (I didn't see a release tag, surprisingly...). Regular online
operations work as expected. When I {{set hive.hbase.snapshot.name=foo;}}, I
get a nice error message in my stacktrace:

{noformat}
FAILED: RuntimeException This version of HBase does not support Hive over table
snapshots. Please upgrade to at least HBase 0.98.3 or later. See HIVE-6584 for
details.
{noformat}

I hope this meets your requirement.

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-31 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081249#comment-14081249
 ] 

Nick Dimiduk commented on HIVE-6584:


I updated RB as well, the interesting addition is 
HBaseTableSnapshotInputFormatUtil.java and its use: 
https://reviews.apache.org/r/23824/diff/1-2/#7

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.13.patch, HIVE-6584.14.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, 
 HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, 
 HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076331#comment-14076331
]

Nick Dimiduk commented on HIVE-6584:

Thanks for having a look, [~navis]. As it is, this patch requires HBASE-11137,
which has not been pack-ported to 0.96. There's no technical reason not to
back-port it, simply that 0.96 is in maintenance mode only and we're
encouraging folks to upgrade from 0.96.2 to 0.98.x.

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch,
HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch,
HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch,
HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7496) Exclude conf/hive-default.xml.template in version control and include it dist profile


[ 
https://issues.apache.org/jira/browse/HIVE-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076335#comment-14076335
 ] 

Nick Dimiduk commented on HIVE-7496:


Hurray! :)

 Exclude conf/hive-default.xml.template in version control and include it dist 
 profile
 -

 Key: HIVE-7496
 URL: https://issues.apache.org/jira/browse/HIVE-7496
 Project: Hive
  Issue Type: Task
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7496.1.patch.txt, HIVE-7496.2.patch.txt






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (HIVE-7534) remove reflection from HBaseSplit

Nick Dimiduk created HIVE-7534:
--

 Summary: remove reflection from HBaseSplit
 Key: HIVE-7534
 URL: https://issues.apache.org/jira/browse/HIVE-7534
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Priority: Minor


HIVE-6584 does some reflection voodoo to work around the lack of HBASE-11555 
for version hbase-0.98.3. This ticket is to bump the hbase dependency version 
and clean up that code once hbase-0.98.5 has released.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076782#comment-14076782
]

Nick Dimiduk commented on HIVE-6584:

Thanks for having a look, [~sushanth]!

bq. The one thing I'd change before committing is a word-wrap for the ASF
header in conf/hive-default.xml.template, to retain old newline behaviour
there. But otherwise, looks good to me.

I believe HIVE-7496 drops conf/hive-default.xml.template all together.

bq. We'll need to update those TODOs in a bit once we upgrade to a newer
version of HBase (0.98.5+) to pick up HBASE-11555.

I opened HIVE-7534 to track this.

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-7534) remove reflection from HBaseSplit


 [ 
https://issues.apache.org/jira/browse/HIVE-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-7534:
---

Affects Version/s: 0.14.0

 remove reflection from HBaseSplit
 -

 Key: HIVE-7534
 URL: https://issues.apache.org/jira/browse/HIVE-7534
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.14.0
Reporter: Nick Dimiduk
Priority: Minor

 HIVE-6584 does some reflection voodoo to work around the lack of HBASE-11555 
 for version hbase-0.98.3. This ticket is to bump the hbase dependency version 
 and clean up that code once hbase-0.98.5 has released.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat


 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.13.patch

Attaching luck v13: rebased onto trunk.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.13.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, 
 HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, 
 HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Re: Error running unit tests from eclipse (weird classpath issue)

2014-07-23 Thread Nick Dimiduk

I'm sure google knows; I don't.

For what it's worth, I run tests via maven and attach the debugger to the
remote process when necessary.

-n

On Tuesday, July 22, 2014, Pavel Chadnov pavelchad...@gmail.com wrote:

 How can I do it in eclipse. I do this in console and it works

 On Tuesday, July 22, 2014, Nick Dimiduk ndimi...@gmail.com
 javascript:_e(%7B%7D,'cvml','ndimi...@gmail.com'); wrote:

 Are you specifying a hadoop profile via eclipse? Ie, from maven,
 -Phadoop-2.


 On Tue, Jul 22, 2014 at 4:03 PM, Pavel Chadnov pavelchad...@gmail.com
 wrote:

 Hey Guys,


 I'm trying to run Hive unit tests on eclipse and have few failures. One
 of
 the interesting one is throwing this exception as shown below when ran
 from
 eclipse, this one passes fine from the console.


 java.lang.IncompatibleClassChangeError: Implementing class

 at java.lang.ClassLoader.defineClass1(Native Method)

 at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

 at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

 ...

 ...

 at java.lang.Class.forName(Class.java:190)

 at
 org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:120)

 at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:115)

 at

 org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:80)

 at
 org.apache.hadoop.hive.conf.HiveConf$ConfVars.clinit(HiveConf.java:254)

 at
 org.apache.hadoop.hive.ql.exec.Utilities.getPlanPath(Utilities.java:652)

 at
 org.apache.hadoop.hive.ql.exec.Utilities.setPlanPath(Utilities.java:641)

 at
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:584)

 at
 org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:575)

 at

 org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:568)

 at

 org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat.setUp(TestSymlinkTextInputFormat.java:84)

 at junit.framework.TestCase.runBare(TestCase.java:132)


 I tried adding hadoop-shims project in the classpath by manually adding
 them but no luck. Would really appreciate any help here.


 Thanks,

 Pavel




 --
 Regards,
 Pavel Chadnov

conf/hive-default.xml.template conflict-central

2014-07-23 Thread Nick Dimiduk

Hi there,

I've noticed that the above-mentioned file has become a constant source of
conflicts when applying patches. Is it possible to remove it from source
control and depend on the build process to generate it locally? This
appears to be what's happening anyway.

Thanks,
Nick

Re: conf/hive-default.xml.template conflict-central

2014-07-23 Thread Nick Dimiduk

Thanks Lefty.


On Wed, Jul 23, 2014 at 2:42 PM, Lefty Leverenz leftylever...@gmail.com
wrote:

 Interesting idea, Nick.  How about adding it to the discussion
 
 https://issues.apache.org/jira/browse/HIVE-6037?focusedCommentId=14061647page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14061647
 
 on HIVE-6037?

 -- Lefty


 On Wed, Jul 23, 2014 at 5:27 PM, Nick Dimiduk ndimi...@gmail.com wrote:

  Hi there,
 
  I've noticed that the above-mentioned file has become a constant source
 of
  conflicts when applying patches. Is it possible to remove it from source
  control and depend on the build process to generate it locally? This
  appears to be what's happening anyway.
 
  Thanks,
  Nick

[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf

2014-07-23 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072506#comment-14072506
 ] 

Nick Dimiduk commented on HIVE-6037:


Is it possible to remove the generated file from source control and depend on 
the build process to generate it locally? Why must it be committed?

 Synchronize HiveConf with hive-default.xml.template and support show conf
 -

 Key: HIVE-6037
 URL: https://issues.apache.org/jira/browse/HIVE-6037
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: CHIVE-6037.3.patch.txt, HIVE-6037-0.13.0, 
 HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, 
 HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, 
 HIVE-6037.16.patch.txt, HIVE-6037.17.patch, HIVE-6037.18.patch.txt, 
 HIVE-6037.19.patch.txt, HIVE-6037.19.patch.txt, HIVE-6037.2.patch.txt, 
 HIVE-6037.20.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, 
 HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, 
 HIVE-6037.9.patch.txt, HIVE-6037.patch


 see HIVE-5879



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-22 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.12.patch

Patch v12 should fix the two test failures. One comes from changes made in
HBASE-11335. The other has to do with assumptions around default filesystem
path that are unrelated to HBase.

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

--
This message was sent by Atlassian JIRA
(v6.2#6252)

Review Request 23824: Add HiveHBaseTableSnapshotInputFormat

2014-07-22 Thread nick dimiduk


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23824/
---

Review request for hive, Ashutosh Chauhan, Navis Ryu, Sushanth Sowmyan, and 
Swarnim Kulkarni.


Bugs: HIVE-6584
https://issues.apache.org/jira/browse/HIVE-6584


Repository: hive-git


Description
---

HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
This allows a MR job to consume a stable, read-only view of an HBase table 
directly off of HDFS. Bypassing the online region server API provides a nice 
performance boost for the full scan. HBASE-10642 is backporting that feature to 
0.94/0.96 and also adding a mapred implementation. Once that's available, we 
should add an input format. A follow-on patch could work out how to integrate 
this functionality into the StorageHandler, similar to how HIVE-6473 integrates 
the HFileOutputFormat into existing table definitions.

See JIRA for further conversation.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 593c566 
  conf/hive-default.xml.template ba922d0 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java 998c15c 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 
dbf5e51 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseInputFormatUtil.java
 PRE-CREATION 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
 1032cc9 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableSnapshotInputFormat.java
 PRE-CREATION 
  hbase-handler/src/test/queries/positive/hbase_handler_snapshot.q PRE-CREATION 
  hbase-handler/src/test/results/positive/external_table_ppd.q.out 6f1adf4 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
b92db11 
  hbase-handler/src/test/results/positive/hbase_handler_snapshot.q.out 
PRE-CREATION 
  hbase-handler/src/test/templates/TestHBaseCliDriver.vm 01d596a 
  itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseQTestUtil.java 
96a0de2 
  itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 
cdc0a65 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 2fefa06 
  pom.xml b5a5697 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java c80a2a3 

Diff: https://reviews.apache.org/r/23824/diff/


Testing
---

Unit tests, local-mode testing, pseudo-distributed mode testing, and tested on 
a small distributed cluster. Tests included hbase versions 0.98.3 and the HEAD 
of 0.98 branch.


Thanks,

nick dimiduk

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-22 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071133#comment-14071133
 ] 

Nick Dimiduk commented on HIVE-6584:


I think these failed tests are unrelated.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.12.patch, 
 HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, 
 HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Re: Error running unit tests from eclipse (weird classpath issue)

2014-07-22 Thread Nick Dimiduk

Are you specifying a hadoop profile via eclipse? Ie, from maven, -Phadoop-2.


On Tue, Jul 22, 2014 at 4:03 PM, Pavel Chadnov pavelchad...@gmail.com
wrote:

 Hey Guys,


 I'm trying to run Hive unit tests on eclipse and have few failures. One of
 the interesting one is throwing this exception as shown below when ran from
 eclipse, this one passes fine from the console.


 java.lang.IncompatibleClassChangeError: Implementing class

 at java.lang.ClassLoader.defineClass1(Native Method)

 at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

 ...

 ...

 at java.lang.Class.forName(Class.java:190)

 at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:120)

 at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:115)

 at
 org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:80)

 at
 org.apache.hadoop.hive.conf.HiveConf$ConfVars.clinit(HiveConf.java:254)

 at org.apache.hadoop.hive.ql.exec.Utilities.getPlanPath(Utilities.java:652)

 at org.apache.hadoop.hive.ql.exec.Utilities.setPlanPath(Utilities.java:641)

 at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:584)

 at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:575)

 at
 org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:568)

 at

 org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat.setUp(TestSymlinkTextInputFormat.java:84)

 at junit.framework.TestCase.runBare(TestCase.java:132)


 I tried adding hadoop-shims project in the classpath by manually adding
 them but no luck. Would really appreciate any help here.


 Thanks,

 Pavel

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-21 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.10.patch

Updated the patch once more. This has been tested on a distributed cluster as
well, things are working correctly.

{noformat}
HADOOP_CLASSPATH=/path/to/high-scale-lib-1.1.1.jar hive -e set
hive.hbase.snapshot.name=foo_snap; select count(*) from foo;
{noformat}

Optionally you can specify {{hive.hbase.snapshot.restoredir}} to something
other than the default.

I also opened HBASE-11555 so we can do away with the reflection stuff after a
later release.

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch,
HIVE-6584.10.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch,
HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch,
HIVE-6584.9.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-21 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069319#comment-14069319
 ] 

Nick Dimiduk commented on HIVE-6584:


HBASE-11557 will remove the requirement of specifying high-scale-lib.jar in 
HADOOP_CLASSPATH.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch, 
 HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, HIVE-6584.8.patch, 
 HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-21 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.11.patch

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, 
 HIVE-6584.10.patch, HIVE-6584.11.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, 
 HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, HIVE-6584.7.patch, 
 HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-18 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066930#comment-14066930
]

Nick Dimiduk commented on HIVE-6584:

[~tenggyut]:

bq. 1. HBaseStorageHandler.getInputFormatClass(): i am afraid that the returned
inputformat will always be HiveHBaseTabelInputFormat (at least according to my
test)

My patch has the logic necessary to perform the switch at runtime. It does
indeed work with the latest patch.

bq. 2. in the method HBaseStorageHandler.preCreateTable, hive will check
whether the HBase table exist or not, regardless the external table that hive
gonna create is based on actual table or a snapshot.

I'm not sure about this. Anyway that's not related to this feature.
HBaseStorageHandler has no means of creating/dropping table snapshots. If
you're seeing some issue here with StorageHandler DDL operations, please file a
separate JIRA.

bq. 3. the TableSnapshotRegionSplit used in TableSnapshotInputFormat is a
direct subclass of InputSplit, not a subclass of tablesplit

Nor should it be. The TableSnapshotRegionSplit is tracking different
information from TableSplit.

bq. 4. there is no public setScan method in
TableSnapshotInputFormat.RecordReader, instead it will translate a string into
a scan instance by using mapreduce.TableMapReduceUitls.convertStringToScan.

Indeed, there is disparity between the HBase's mapred and mapreduce
implementations. I opened HBASE-11179 for some cleanup on the HBase side.
convertStringToScan details are HBase-private API as of 0.96. I opened
HBASE-11163 to make necessary scanner support available in mapred API, but it's
not yet been implemented.

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch,
HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch,
HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-17 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.9.patch

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, 
 HIVE-6584.7.patch, HIVE-6584.8.patch, HIVE-6584.9.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-16 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063719#comment-14063719
 ] 

Nick Dimiduk commented on HIVE-6584:


Ouch. Most of these tests run/pass for me locally. Will investigate further. 
I'm also curious why the {{explain}} commands in {{hbase_handler_snapshot.q}} 
are not including the Input/OutputFormats.

[~sushanth], [~ashutoshc] any ideas on this latter issue?

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch, 
 HIVE-6584.7.patch, HIVE-6584.8.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility

2014-07-16 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064347#comment-14064347
 ] 

Nick Dimiduk commented on HIVE-4765:


This looks like a nice improvement [~navis]!

 Improve HBase bulk loading facility
 ---

 Key: HIVE-4765
 URL: https://issues.apache.org/jira/browse/HIVE-4765
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, 
 HIVE-4765.D11463.1.patch


 With some patches, bulk loading process for HBase could be simplified a lot.
 {noformat}
 CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
 WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:key,cf2:value)
 STORED AS
   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
 LOCATION '/tmp/export';
 SET mapred.reduce.tasks=4;
 set hive.optimize.sampling.orderby=true;
 INSERT OVERWRITE TABLE hbase_export
 SELECT * from (SELECT union_kv(key,key,value,:key,cf1:key,cf2:value) as 
 (rowkey,union) FROM src) A ORDER BY rowkey,union;
 hive !hadoop fs -lsr /tmp/export;
   
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf1
 -rw-r--r--   1 navis supergroup   4317 2013-06-20 11:05 
 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
 -rw-r--r--   1 navis supergroup   5868 2013-06-20 11:05 
 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
 -rw-r--r--   1 navis supergroup   5214 2013-06-20 11:05 
 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
 -rw-r--r--   1 navis supergroup   4290 2013-06-20 11:05 
 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf2
 -rw-r--r--   1 navis supergroup   6744 2013-06-20 11:05 
 /tmp/export/cf2/409673b517d94e16920e445d07710f52
 -rw-r--r--   1 navis supergroup   4975 2013-06-20 11:05 
 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
 -rw-r--r--   1 navis supergroup   6096 2013-06-20 11:05 
 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
 -rw-r--r--   1 navis supergroup   4890 2013-06-20 11:05 
 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-07-15 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.8.patch

Attaching my updated patch. It includes changes to the hbase test drivers so
that there are snapshots available to testing from q files.

[~tenggyut]: I'll have a look over your patch tomorrow. Maybe we can put our
stuff together and get a working new feature :)

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch,
HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch, HIVE-6584.6.patch,
HIVE-6584.7.patch, HIVE-6584.8.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-20 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039467#comment-14039467
 ] 

Nick Dimiduk commented on HIVE-6584:


Can you regenerate your patch, rooted in the trunk directory instead of above 
it? That's the reason this patch fails the buildbot.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch, HIVE-6584.5.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-13 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.4.patch

Rebased onto trunk and fixed two broken hbase tests.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-12 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029497#comment-14029497
]

Nick Dimiduk commented on HIVE-6584:

Thanks for the insightful comments, [~tenggyut].

bq. 1. HBaseStorageHandler.getInputFormatClass(): i am afraid that the returned
inputformat will always be HiveHBaseTabelInputFormat (at least according to my
test)

I was afraid of this in my initial design thinking, but my experiments proved
otherwise. Can you elaborate on your tests? I'd like to reproduce this issue if
I'm able.

I haven't yet looked at the use-case of consuming a snapshot for which there is
no table in HBase. I planned to approach this kind of feature in follow-on
work; the goal here is to get jus the basics working.

bq. 3, 4 [snip]

These are both true.

bq. So I suggest adding a subclass of HBaseStorageHandler(and other necessary
classes) ,say HBaseSnapshotStorageHandler, to deal with the hbase snapshot
situation.

A goal of this patch is to be able to query snapshots created from online
tables already registered with Hive using the HBaseStorageHandler. Implementing
HBaseSnapshotStorageHandler requires a separate table registration for the
snapshot. I think that's undesirable. Regarding the hbase snapshot situation,
let's make it better on the HBase side. What do you recommend?

Add HiveHBaseTableSnapshotInputFormat
-

Key: HIVE-6584
URL: https://issues.apache.org/jira/browse/HIVE-6584
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 0.14.0

Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch,
HIVE-6584.3.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-11 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.3.patch

Ping. Rebased onto trunk.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-11 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Fix Version/s: 0.14.0

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-11 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Status: Patch Available  (was: Open)

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapsopts. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (HIVE-7197) Enabled and address flakiness of hbase_bulk.m

Nick Dimiduk created HIVE-7197:
--

 Summary: Enabled and address flakiness of hbase_bulk.m
 Key: HIVE-7197
 URL: https://issues.apache.org/jira/browse/HIVE-7197
 Project: Hive
  Issue Type: Test
  Components: HBase Handler
Reporter: Nick Dimiduk
Priority: Minor


There's a nice e2e test for existing bulkload workflow, but it's disabled. We 
should turn it on and fix what's broken.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table


[ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025318#comment-14025318
 ] 

Nick Dimiduk commented on HIVE-6473:


bq. Removed enabling of hbase_bulk.m; it mostly passes but is flakey for me. 
Will address it in a follow-on ticket.

Opened HIVE-7197.

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-7197) Enable and address flakiness of hbase_bulk.m


 [ 
https://issues.apache.org/jira/browse/HIVE-7197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-7197:
---

Summary: Enable and address flakiness of hbase_bulk.m  (was: Enabled and 
address flakiness of hbase_bulk.m)

 Enable and address flakiness of hbase_bulk.m
 

 Key: HIVE-7197
 URL: https://issues.apache.org/jira/browse/HIVE-7197
 Project: Hive
  Issue Type: Test
  Components: HBase Handler
Reporter: Nick Dimiduk
Priority: Minor

 There's a nice e2e test for existing bulkload workflow, but it's disabled. We 
 should turn it on and fix what's broken.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-2365) SQL support for bulk load into HBase


 [ 
https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-2365:
---

Status: Patch Available  (was: Open)

 SQL support for bulk load into HBase
 

 Key: HIVE-2365
 URL: https://issues.apache.org/jira/browse/HIVE-2365
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: John Sichi
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-2365.2.patch.txt, HIVE-2365.3.patch, 
 HIVE-2365.3.patch, HIVE-2365.WIP.00.patch, HIVE-2365.WIP.01.patch, 
 HIVE-2365.WIP.01.patch


 Support the as simple as this SQL for bulk load from Hive into HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table


 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6473:
---

Release Note: 
Allows direct creation of HFiles and location for them as part of 
HBaseStorageHandler write if the following properties are specified in the HQL:

set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/columnfamily_name;

  was:
Allows direct creation of HFiles and location for them as part of 
HBaseStorageHandler write if the following properties are specified in the HQL:

set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/hfilelocn;


 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table


 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6473:
---

Release Note: 
Allows direct creation of HFiles and location for them as part of 
HBaseStorageHandler write if the following properties are specified in the HQL:

set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/columnfamily_name;

hfile.family.path can also be set as a table property, HQL value takes 
precedence.

  was:
Allows direct creation of HFiles and location for them as part of 
HBaseStorageHandler write if the following properties are specified in the HQL:

set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/columnfamily_name;


 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table