Re: HBase table map to hive

2016-04-04 Thread Wojciech Indyk
Hi!
You can use a Hive MAP column on your whole column family or on a
prefix of the column qualifiers:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveMAPtoHBaseColumnFamily
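For illustration, a minimal sketch of that mapping (the table, family
and value types here are illustrative assumptions, not from your setup):

  CREATE TABLE hbase_hive_map(row_key int, value map<string,int>)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES (
    -- "cf:" with no qualifier maps the whole column family into the
    -- MAP; "cf:prefix_.*" would map only qualifiers with that prefix
    "hbase.columns.mapping" = ":key,cf:"
  );

New qualifiers added to the family later simply show up as new keys in
the MAP, with no DDL change needed.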

--
Kind regards/ Pozdrawiam,
Wojciech Indyk
http://datacentric.pl


2016-04-04 14:13 GMT+02:00 ram kumar :
> Hi,
>
> I have an HBase table whose column names change (grow in number)
> over time.
> Is there a way to map such an HBase table to a Hive table,
> inferring the schema from the HBase table?
>
> Thanks


confluence access

2015-09-11 Thread Wojciech Indyk
Hello!
Please grant me write access to Confluence (user woj_in), per
https://issues.apache.org/jira/browse/HIVE-11329?focusedCommentId=14740243&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14740243

--
Kind regards/ Pozdrawiam,
Wojciech Indyk


Re: hbase column without prefix

2015-07-21 Thread Wojciech Indyk
Hello!
I've filed a bug for this issue:
https://issues.apache.org/jira/browse/HIVE-11329
What do you think? I can prepare a patch.

Kind regards
Wojciech Indyk


2015-07-07 9:51 GMT+02:00 Wojciech Indyk :
> Hi!
> I use HBase column regex matching to create a map column in Hive, like:
> "hbase.columns.mapping" = ":key,s:ap_.*"
> which gives me values in the column like:
> {"ap_col1":"23","ap_col2":"7"}
> Is it possible to cut the prefix ap_ and get values like below?
> {"col1":"23","col2":"7"}
>
> Kind regards
> Wojciech Indyk
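For reference, a minimal DDL sketch of the mapping discussed in this
thread (the table name and value types are illustrative assumptions;
the mapping string is the one quoted above):

  CREATE TABLE hbase_prefix_map(key string, ap map<string,string>)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES (
    -- the trailing .* (HIVE-3725) pulls every qualifier in family s
    -- that starts with ap_ into the MAP; the full qualifier, prefix
    -- included, becomes the map key, which is what HIVE-11329
    -- proposes to make optional
    "hbase.columns.mapping" = ":key,s:ap_.*"
  );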


hbase prefix column mapping with no value

2015-07-20 Thread Wojciech Indyk
Hello!
I have a few columns in HBase that share a prefix. The columns have no value, like:
pre_a -> timestamp=123, value=
pre_b -> timestamp=123, value=

I want to map the prefix to a Hive MAP column, as in
https://issues.apache.org/jira/browse/HIVE-3725 .
Using a map does not work: the map in Hive comes out empty. No value
type for the map works when the value in HBase is empty; the mapping
works only with non-empty values. Is it a bug? Should I report it?

Kind regards
Wojciech Indyk
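For concreteness, a minimal sketch of the setup described above (the
table and family names are illustrative assumptions):

  -- In the HBase shell, the cells exist but their values are empty:
  --   put 'tbl', 'row1', 'f:pre_a', ''
  --   put 'tbl', 'row1', 'f:pre_b', ''
  CREATE EXTERNAL TABLE hbase_empty_vals(key string, pre map<string,string>)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:pre_.*")
  TBLPROPERTIES ("hbase.table.name" = "tbl");
  -- SELECT pre FROM hbase_empty_vals;
  -- reportedly returns an empty map here, even though the qualifiers
  -- exist (their values are just empty)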


hbase column without prefix

2015-07-07 Thread Wojciech Indyk
Hi!
I use HBase column regex matching to create a map column in Hive, like:
"hbase.columns.mapping" = ":key,s:ap_.*"
which gives me values in the column like:
{"ap_col1":"23","ap_col2":"7"}
Is it possible to cut the prefix ap_ and get values like below?
{"col1":"23","col2":"7"}

Kind regards
Wojciech Indyk


Re: ORC String error

2014-08-11 Thread Wojciech Indyk
Hi!
The workaround "set hive.optimize.index.filter=false" doesn't work.
I still get the ArrayIndexOutOfBoundsException from a "select
max(length(url))" query.
Kind regards
Wojciech Indyk


2014-08-11 8:49 GMT+02:00 Prasanth Jayachandran :
> Hi
>
> My suspicion is that the error is caused by this issue:
> https://issues.apache.org/jira/browse/HIVE-6320
> Applying that patch should resolve the issue. The alternative
> workaround would be to "set hive.optimize.index.filter=false".
>
> Thanks
> Prasanth Jayachandran
>
> On Aug 10, 2014, at 11:45 PM, Wojciech Indyk wrote:
>
> Hi!
> I use CDH 5.1.0 (5.0.3 until recently) with Hive 0.12.
> I created an ORC table with Snappy compression, consisting of some
> integer and string columns. I imported a few ~40GB gz files into
> HDFS, mapped them as an external table, and then inserted the
> external table into the ORC table.
>
> Unfortunately, when I try to process two string columns (url and
> refererurl) I get an error:
> Error: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 625920
>   at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:305)
>   at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 625920
>   at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
>   at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>   at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
>   at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:303)
>   ... 11 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 625920
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDictionaryTreeReader.next(RecordReaderImpl.java:1060)
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.next(RecordReaderImpl.java:892)
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1193)
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2240)
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:105)
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:56)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
>   ... 15 more
>
> The error occurs for one mapper per file. E.g. I have two 40GB files
> in my ORC table; Hive creates 300 mappers for a query, and only 2
> mappers fail with the error above (each with a different array index).
> When I process other columns (both int and string types) the
> processing finishes correctly. I see the error is related to
> StringTreeReader.
> What is the default delimiter for ORC columns? Maybe the delimiter
> occurs inside the failing string record? But I think that shouldn't
> cause IndexOutOfBounds...
> Is there any limit on string length in ORC? I know ORC's default
> stripe size is 128MB, but I don't expect strings as huge as 100MB.
>
> Kind regards
> Wojciech Indyk
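For reference, a minimal sketch of the probe and the workaround
discussed in this thread (the table name logs_orc is an illustrative
assumption; url is the column from the report):

  SET hive.optimize.index.filter=false;
  -- workaround suggested above (HIVE-6320); reportedly does not help here
  SELECT max(length(url)) FROM logs_orc;
  -- still fails with java.lang.ArrayIndexOutOfBoundsException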

ORC String error

2014-08-10 Thread Wojciech Indyk
Hi!
I use CDH 5.1.0 (5.0.3 until recently) with Hive 0.12.
I created an ORC table with Snappy compression, consisting of some
integer and string columns. I imported a few ~40GB gz files into
HDFS, mapped them as an external table, and then inserted the
external table into the ORC table.
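For concreteness, a minimal sketch of that setup (table, column and
path names are illustrative assumptions):

  CREATE EXTERNAL TABLE logs_ext(url string, refererurl string, hits int)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/data/logs_gz';

  CREATE TABLE logs_orc(url string, refererurl string, hits int)
  STORED AS ORC TBLPROPERTIES ("orc.compress" = "SNAPPY");

  INSERT OVERWRITE TABLE logs_orc SELECT * FROM logs_ext;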

Unfortunately, when I try to process two string columns (url and
refererurl) I get an error:
Error: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 625920
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:305)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 625920
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
  at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
  at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
  at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
  at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:303)
  ... 11 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 625920
  at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDictionaryTreeReader.next(RecordReaderImpl.java:1060)
  at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.next(RecordReaderImpl.java:892)
  at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1193)
  at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2240)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:105)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:56)
  at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
  ... 15 more

The error occurs for one mapper per file. E.g. I have two 40GB files
in my ORC table; Hive creates 300 mappers for a query, and only 2
mappers fail with the error above (each with a different array index).
When I process other columns (both int and string types) the
processing finishes correctly. I see the error is related to
StringTreeReader.
What is the default delimiter for ORC columns? Maybe the delimiter
occurs inside the failing string record? But I think that shouldn't
cause IndexOutOfBounds...
Is there any limit on string length in ORC? I know ORC's default
stripe size is 128MB, but I don't expect strings as huge as 100MB.

Kind regards
Wojciech Indyk

