[ 
https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123690#comment-15123690
 ] 

Ilya Kats commented on HIVE-6147:
---------------------------------

I'm trying to create a table in Hive 0.14 that points to an HBase table with 
one column family ("c") and one column ("b") that contains a schema-less, 
Avro-serialized object:
{code:sql}
CREATE EXTERNAL TABLE customers
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,c:b", 
  "c.b.serialization.type"="avro", 
  "c.b.avro.schema.url"="hdfs:/....../Customer.avsc") 
TBLPROPERTIES ("hbase.table.name" = "customers", 
"hbase.struct.autogenerate"="true", 
"hive.serialization.extend.nesting.levels"="true");
{code}

The DDL above creates the table successfully, but queries fail with the 
following error:
{code}
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
16/01/29 15:36:55 [main]: ERROR CliDriver: Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:82)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:571)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:563)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
        ... 12 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorException: An error occurred retrieving schema from bytes
        at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.retrieveSchemaFromBytes(AvroLazyObjectInspector.java:331)
        at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.deserializeStruct(AvroLazyObjectInspector.java:287)
        at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.getStructFieldData(AvroLazyObjectInspector.java:142)
        at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:109)
        at org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldData(DelegatedStructObjectInspector.java:88)
        at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
        ... 17 more
Caused by: java.io.IOException: Not a data file.
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
        at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
        at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.retrieveSchemaFromBytes(AvroLazyObjectInspector.java:328)
        ... 25 more
{code}

It seems that there is a problem in the following code in 
AvroLazyObjectInspector:
{code}
...
private Object deserializeStruct(Object struct, String fieldName) {
...
if (readerSchema == null) {
...
} else {
      // a reader schema was provided
      if (schemaRetriever != null) {
        // a schema retriever has been provided as well. Attempt to read the write schema from the
        // retriever
        ws = schemaRetriever.retrieveWriterSchema(data);

        if (ws == null) {
          throw new IllegalStateException(
              "Null writer schema retrieved from schemaRetriever for field [" + fieldName + "]");
        }
      } else {
        // attempt retrieving the schema from the data
        ws = retrieveSchemaFromBytes(data);   
      }

      rs = readerSchema;

      try {
        avroWritable.readFields(data, ws, rs);
      } catch (IOException ioe) {
        throw new AvroObjectInspectorException("Error deserializing avro payload", ioe);
      }
    }
...
}
...
{code}
because it tries to retrieve the writer schema from the data ({{ws = 
retrieveSchemaFromBytes(data)}}) even when a schema URL (reader schema) has 
been provided. Is there a way to make it work for schema-less Avro data?
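
For reference, the innermost "Not a data file." exception is what Avro's 
DataFileStream raises when its input does not begin with the Avro object 
container file magic bytes ('O', 'b', 'j', 1). A raw binary-encoded datum 
(schema-less data) has no such header, so there is no embedded writer schema 
to retrieve. Below is a minimal sketch, using only the JDK, of a check that 
could be run against a cell value to confirm which kind of payload it holds. 
The class and method names are hypothetical and not part of Hive or Avro; 
reading the bytes from HBase is left out.

```java
import java.util.Arrays;

/** Diagnostic sketch (hypothetical helper, not a Hive or Avro class). */
final class AvroPayloadCheck {
    // Avro object container files start with these four bytes.
    private static final byte[] CONTAINER_MAGIC = {(byte) 'O', (byte) 'b', (byte) 'j', (byte) 1};

    private AvroPayloadCheck() {
    }

    /**
     * Returns true if the value starts with the container-file magic,
     * which means a writer schema is expected in the header that follows.
     * Returns false for null, short, or headerless (raw datum) values.
     */
    static boolean looksLikeContainerFile(byte[] value) {
        if (value == null || value.length < CONTAINER_MAGIC.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(value, CONTAINER_MAGIC.length);
        return Arrays.equals(prefix, CONTAINER_MAGIC);
    }
}
```

If this returns false for the values in c:b, the data is headerless and 
retrieveSchemaFromBytes cannot succeed on it, which would be consistent with 
the trace above.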


> Support avro data stored in HBase columns
> -----------------------------------------
>
>                 Key: HIVE-6147
>                 URL: https://issues.apache.org/jira/browse/HIVE-6147
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Swarnim Kulkarni
>            Assignee: Swarnim Kulkarni
>              Labels: TODOC14
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, 
> HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt, 
> HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt
>
>
> Presently, the HBase Hive integration supports querying only primitive data 
> types in columns. It would be nice to be able to store and query Avro objects 
> in HBase columns by making them visible as structs to Hive. This will allow 
> Hive to perform ad hoc analysis of HBase data which can be deeply structured.


