Re: HBase table map to hive
Hi! You can use a Hive MAP column for a whole HBase column family, or for a prefix of the column qualifier:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveMAPtoHBaseColumnFamily

--
Kind regards,
Wojciech Indyk
http://datacentric.pl

2016-04-04 14:13 GMT+02:00 ram kumar :
> Hi,
>
> I have an HBase table whose column names change (increase) over time.
> Is there a way to map such an HBase table to a Hive table,
> inferring the schema from the HBase table?
>
> Thanks
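A minimal sketch of such a mapping (table and family names here are illustrative, not from the original question):

  CREATE EXTERNAL TABLE hbase_hive_map (rowkey string, cf map<string,string>)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:")
  TBLPROPERTIES ("hbase.table.name" = "my_hbase_table");

Every qualifier under the family cf then appears as a key of the Hive map, so columns added to HBase later show up without changing the Hive schema.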
confluence access
Hello! Please grant me write access to the Confluence wiki (user: woj_in), per https://issues.apache.org/jira/browse/HIVE-11329?focusedCommentId=14740243&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14740243

--
Kind regards,
Wojciech Indyk
Re: hbase column without prefix
Hello! I've reported a bug for this issue: https://issues.apache.org/jira/browse/HIVE-11329
What do you think? I can prepare a patch.

Kind regards,
Wojciech Indyk

2015-07-07 9:51 GMT+02:00 Wojciech Indyk :
> Hi!
> I use HBase column regex matching to create a map column in Hive, like:
> "hbase.columns.mapping" = ":key,s:ap_.*"
> Then I have values in the column:
> {"ap_col1":"23","ap_col2":"7"}
> Is it possible to cut the prefix ap_ to get values like below?
> {"col1":"23","col2":"7"}
>
> Kind regards,
> Wojciech Indyk
hbase prefix column mapping with no value
Hello! I have a few columns in HBase that share a prefix. The columns have no value, like:

pre_a -> timestamp=123, value=
pre_b -> timestamp=123, value=

I want to map the prefix to a MAP column in Hive, as in https://issues.apache.org/jira/browse/HIVE-3725. Using a map does not work: the map in Hive comes back empty. No value type for the map works when the HBase value is empty; the mapping works only with non-empty values. Is it a bug? Should I report it?

Kind regards,
Wojciech Indyk
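A minimal sketch of the failing setup described above; the column family name is not given in the original message, so cf (and the table names) are illustrative:

  CREATE EXTERNAL TABLE hbase_empty_values (rowkey string, pre map<string,string>)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:pre_.*")
  TBLPROPERTIES ("hbase.table.name" = "my_hbase_table");

  -- Per the report above, this comes back as {} when the stored
  -- HBase cell values are empty:
  SELECT pre FROM hbase_empty_values;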
hbase column without prefix
Hi! I use HBase column regex matching to create a map column in Hive, like:

"hbase.columns.mapping" = ":key,s:ap_.*"

Then I have values in the column:

{"ap_col1":"23","ap_col2":"7"}

Is it possible to cut the prefix ap_ to get values like below?

{"col1":"23","col2":"7"}

Kind regards,
Wojciech Indyk
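For context, a full mapping using that regex might look like this (table name illustrative):

  CREATE EXTERNAL TABLE hbase_prefixed (rowkey string, ap map<string,string>)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,s:ap_.*")
  TBLPROPERTIES ("hbase.table.name" = "my_hbase_table");

  -- Map keys keep the full qualifier, prefix included:
  SELECT ap FROM hbase_prefixed;   -- {"ap_col1":"23","ap_col2":"7"}

(HIVE-11329, filed in the reply above, asks for an option to strip the prefix from the map keys.)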
Re: ORC String error
Hi! The workaround “set hive.optimize.index.filter=false" doesn't work: a select max(length(url)) query still fails with ArrayIndexOutOfBoundsException.

Kind regards,
Wojciech Indyk

2014-08-11 8:49 GMT+02:00 Prasanth Jayachandran :
> Hi
>
> My suspicion is that the error is caused by this issue:
> https://issues.apache.org/jira/browse/HIVE-6320
> Applying this patch should resolve the issue. The alternative workaround
> would be to “set hive.optimize.index.filter=false"
>
> Thanks
> Prasanth Jayachandran
>
> On Aug 10, 2014, at 11:45 PM, Wojciech Indyk wrote:
> [quoted original message and stack trace snipped; see "ORC String error" below]
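For reference, the suggested workaround is applied in the Hive session before the failing query (table name illustrative):

  set hive.optimize.index.filter=false;
  SELECT max(length(url)) FROM logs_orc;

In this case the setting did not help, which points back at the ORC reader bug tracked in HIVE-6320.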
ORC String error
Hi! I use CDH 5.1.0 (previously 5.0.3) with Hive 0.12. I created an ORC table with Snappy compression, consisting of some integer and string columns. I imported a few ~40GB gz files into HDFS, mounted them as an external table, then inserted the external table into the ORC table.

Unfortunately, when I try to process two string columns (url and refererurl) I get an error:

Error: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 625920
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:305)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 625920
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:303)
    ... 11 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 625920
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDictionaryTreeReader.next(RecordReaderImpl.java:1060)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.next(RecordReaderImpl.java:892)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1193)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2240)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:105)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:56)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
    ... 15 more

The error occurs for one mapper per file. E.g. I have two 40GB files in my ORC table and Hive creates 300 mappers for the query; only 2 mappers fail with the error above (each with a different array index). When I process other columns (both int and string types), the processing finishes correctly.
I see the error is related to StringTreeReader. What is the default delimiter for ORC columns? Maybe the delimiter occurs inside the offending string record? But I think that shouldn't cause an IndexOutOfBounds... Is there any limit on string length in ORC? I know the default ORC stripe size is 128MB, but I don't expect a string as huge as 100MB.

Kind regards,
Wojciech Indyk
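The setup described above corresponds roughly to the following; all names and the column list are illustrative, not the actual DDL:

  -- External table over the imported .gz files:
  CREATE EXTERNAL TABLE logs_raw (id int, url string, refererurl string)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/data/logs';

  -- ORC table with Snappy compression:
  CREATE TABLE logs_orc (id int, url string, refererurl string)
  STORED AS ORC TBLPROPERTIES ("orc.compress" = "SNAPPY");

  INSERT OVERWRITE TABLE logs_orc SELECT * FROM logs_raw;

  -- Fails in StringDictionaryTreeReader with ArrayIndexOutOfBoundsException:
  SELECT max(length(url)) FROM logs_orc;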