[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-12-18 Thread Jonathan Bender (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251986#comment-14251986
 ] 

Jonathan Bender commented on HIVE-7049:
---

It seems we can get away with the following patch: confirm that fileSchema (a.k.a. the writer's schema) is actually a union type before looking up the branch that the reader schema expects. If it is not a union, just use the schema as is (it should be promoted to a union by Avro).

This worked for me in local testing.

```diff
diff --git a/src/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java b/src/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
index ce933ff..032761c 100644
--- a/src/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
+++ b/src/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
@@ -265,9 +265,12 @@ private Object deserializeNullableUnion(Object datum, Schema fileSchema, Schema
 if(schema.getType().equals(Schema.Type.NULL)) {
   return null;
 }
+Schema writerSchema = fileSchema;
+if (writerSchema != null && writerSchema.getType().equals(Schema.Type.UNION)) {
+  writerSchema = writerSchema.getTypes().get(tag);
+}
 
-return worker(datum, fileSchema == null ? null : fileSchema.getTypes().get(tag), schema,
-SchemaToTypeInfo.generateTypeInfo(schema));
+return worker(datum, writerSchema, schema, SchemaToTypeInfo.generateTypeInfo(schema));
 
   }
```
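To make the guard in this patch concrete, here is a minimal, dependency-free sketch in Java. It does not use the real `org.apache.avro.Schema`: `MiniSchema` and `WriterBranch` are hypothetical stand-ins that only model what the stack traces in this thread show (calling `getTypes()` on a non-union schema throws "Not a union"). `tag` is assumed to be the index of the reader-union branch the datum resolved to.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for org.apache.avro.Schema, reduced to the behavior
// seen in the stack traces: only a union schema has branch types.
final class MiniSchema {
    enum Type { NULL, STRING, LONG, UNION }

    final Type type;
    private final List<MiniSchema> branches;

    private MiniSchema(Type type, List<MiniSchema> branches) {
        this.type = type;
        this.branches = branches;
    }

    static MiniSchema primitive(Type type) {
        return new MiniSchema(type, Collections.<MiniSchema>emptyList());
    }

    static MiniSchema union(MiniSchema... branches) {
        return new MiniSchema(Type.UNION, Arrays.asList(branches));
    }

    // Like the failing call in the traces: asking a non-union for its types throws.
    List<MiniSchema> getTypes() {
        if (type != Type.UNION) {
            throw new IllegalStateException(
                "Not a union: \"" + type.name().toLowerCase(Locale.ROOT) + "\"");
        }
        return branches;
    }
}

// Hypothetical helper contrasting the pre-patch and patched writer-schema lookup.
final class WriterBranch {
    // Pre-patch: assumes a non-null file (writer) schema is always a union.
    static MiniSchema original(MiniSchema fileSchema, int tag) {
        return fileSchema == null ? null : fileSchema.getTypes().get(tag);
    }

    // Patched: index into the writer schema only when it really is a union;
    // otherwise hand the non-union writer schema through unchanged.
    static MiniSchema patched(MiniSchema fileSchema, int tag) {
        MiniSchema writerSchema = fileSchema;
        if (writerSchema != null && writerSchema.type == MiniSchema.Type.UNION) {
            writerSchema = writerSchema.getTypes().get(tag);
        }
        return writerSchema;
    }
}
```

With the guard, a plain `"string"` writer schema passes through unchanged, while a `["null","string"]` writer schema still yields its `"string"` branch. Like the patch, the sketch indexes a writer union with the reader's `tag`, which assumes both unions list their branches in the same order.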

> Unable to deserialize AVRO data when file schema and record schema are different and nullable
> -
>
> Key: HIVE-7049
> URL: https://issues.apache.org/jira/browse/HIVE-7049
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-7049.1.patch
>
>
> It mainly happens when:
> 1) the file schema and the record schema are not the same, and
> 2) the record schema is nullable but the file schema is not.
> The likely code location is in class AvroDeserializer:
>  
> {noformat}
>  if(AvroSerdeUtils.isNullableType(recordSchema)) {
>   return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType);
> }
> {noformat}
> In the above code snippet, recordSchema is checked for nullability, but the file schema is not.
> I tested with these values:
> {noformat}
> recordSchema= ["null","string"]
> fileSchema= "string"
> {noformat}
> And I got the following exception (with my debugged code version):
> {noformat}
> org.apache.avro.AvroRuntimeException: Not a union: "string"
>     at org.apache.avro.Schema.getTypes(Schema.java:272)
>     at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275)
>     at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205)
>     at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188)
>     at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174)
>     at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487)
>     at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407)
> {noformat}
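The failure described in the report can be reproduced without Hive or Avro. In this hypothetical sketch a schema is modeled as either a `String` (a single type such as `"string"`) or a `List<String>` (a union such as `["null","string"]`). `isNullableType` only approximates `AvroSerdeUtils.isNullableType`, and `fileBranchFor` mimics the quoted dispatch, which checks the record schema but then treats the file schema as a union.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the HIVE-7049 dispatch. A schema is either a String
// (one named type) or a List<String> (a union of named types).
final class Hive7049Repro {
    // The record schema from the report: ["null","string"].
    static final List<String> NULLABLE_STRING = Arrays.asList("null", "string");

    // Rough approximation of AvroSerdeUtils.isNullableType: a union containing "null".
    static boolean isNullableType(Object schema) {
        return schema instanceof List && ((List<?>) schema).contains("null");
    }

    // Mirrors the "Not a union" failure of Avro's Schema.getTypes() in the traces.
    static List<?> getTypes(Object schema) {
        if (!(schema instanceof List)) {
            throw new IllegalStateException("Not a union: \"" + schema + "\"");
        }
        return (List<?>) schema;
    }

    // Only the record (reader) schema is checked; the file (writer) schema is
    // then indexed as if it were a union too, which fails for "string".
    static Object fileBranchFor(Object recordSchema, Object fileSchema, int tag) {
        if (isNullableType(recordSchema)) {
            return getTypes(fileSchema).get(tag);
        }
        return fileSchema;
    }
}
```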



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-09-10 Thread sudhir mallem (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129380#comment-14129380
 ] 

sudhir mallem commented on HIVE-7049:
-

I'm seeing a similar issue, but with the "long" data type.

Hive version: 0.13.1

Here is the error:
{code}
exception java.io.IOException:org.apache.avro.AvroRuntimeException: Not a union: "long"
{code}

Here is the error from the log:
{code}
2014-09-10 23:45:05,679 ERROR CliDriver (SessionState.java:printError(545)) - Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: Not a union: "long"
java.io.IOException: org.apache.avro.AvroRuntimeException: Not a union: "long"
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:636)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:534)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1519)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.avro.AvroRuntimeException: Not a union: "long"
    at org.apache.avro.Schema.getTypes(Schema.java:266)
    at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:269)
    at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:200)
    at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188)
    at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:99)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:620)
{code}




[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000673#comment-14000673
 ] 

Xuefu Zhang commented on HIVE-7049:
---

{code}
+if (AvroSerdeUtils.isNullableType(recordSchema)) {
+  Schema tmpFileSchema = fileSchema;
+  if (tmpFileSchema == null || !AvroSerdeUtils.isNullableType(tmpFileSchema)) {
+   tmpFileSchema = null;
+  }
+  return deserializeNullableUnion(datum, tmpFileSchema, recordSchema, columnType);
 }
{code}

If fileSchema is not null but AvroSerdeUtils.isNullableType(tmpFileSchema) returns false, tmpFileSchema is set to null, so you pass null as fileSchema to deserializeNullableUnion(datum, tmpFileSchema, recordSchema, columnType). This doesn't seem right.
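Xuefu's objection can be checked with a tiny truth table. This hypothetical sketch models only the `tmpFileSchema` logic from the quoted patch, for a record schema already known to be nullable; a union is represented as a `List<String>` and a single type as a `String`. It is not Hive code, and `isNullableType` only approximates `AvroSerdeUtils.isNullableType`.

```java
import java.util.List;

// Hypothetical model of the tmpFileSchema logic in the proposed patch, for the
// case where the record schema has already been found to be nullable.
final class ProposedPatch {
    // Rough approximation of AvroSerdeUtils.isNullableType: a union containing "null".
    static boolean isNullableType(Object schema) {
        return schema instanceof List && ((List<?>) schema).contains("null");
    }

    // Returns the file schema that would be handed to deserializeNullableUnion.
    static Object fileSchemaPassedDown(Object fileSchema) {
        Object tmpFileSchema = fileSchema;
        if (tmpFileSchema == null || !isNullableType(tmpFileSchema)) {
            // A non-null, non-union writer schema such as "string" is discarded here.
            tmpFileSchema = null;
        }
        return tmpFileSchema;
    }
}
```

The `"string"` case is the one in question: a usable writer schema is replaced by `null`, so anything downstream that relies on writer-side details loses them.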




[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-16 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000642#comment-14000642
 ] 

Mohammad Kamrul Islam commented on HIVE-7049:
-

Null is passed only if the record schema is nullable but the file schema is not.
Do you see any use case for decimal too?

>Thus, we might need to fix in a different way.

Do you want me to fix it differently, or are you looking to address the decimal case differently?





[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-16 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999687#comment-13999687
 ] 

Mohammad Kamrul Islam commented on HIVE-7049:
-

[~xuefuz]: can you please help me understand the problem mentioned in the previous comment?




[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1410#comment-1410
 ] 

Xuefu Zhang commented on HIVE-7049:
---

It seems that your patch tries to fix the issue by ignoring the file schema (passing NULL down). The file schema is needed to read decimal data correctly, so we might need to fix this in a different way.



[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-15 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998371#comment-13998371
 ] 

Mohammad Kamrul Islam commented on HIVE-7049:
-

Thanks @xzhang.
>However, the fix in your patch seems having a problem with decimal, which may 
>need more deliberation.

What is the (potential) problem with decimal?
Do you have a proposal for how to address the decimal problem?





[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-15 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997714#comment-13997714
 ] 

Xuefu Zhang commented on HIVE-7049:
---

[~kamrul] If Hive can support the AVRO schema resolutions you mentioned, I don't see any obstacles. However, the fix in your patch seems to have a problem with decimal, which may need more deliberation.



[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-14 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997378#comment-13997378
 ] 

Mohammad Kamrul Islam commented on HIVE-7049:
-

Thanks [~xuefuz] for the comments.

I believe this is a valid Avro schema evolution.
Please see the following rules, copied from the link:
http://avro.apache.org/docs/1.7.6/spec.html#Schema+Resolution
{noformat}
* if reader's is a union, but writer's is not
The first schema in the reader's union that matches the writer's schema is 
recursively resolved against it. If none match, an error is signalled.
* if writer's is a union, but reader's is not
If the reader's schema matches the selected writer's schema, it is recursively 
resolved against it. If they do not match, an error is signalled.
{noformat}
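The first quoted rule (reader is a union, writer is not) amounts to a first-match search over the reader's branches. The sketch below is a simplified, hypothetical illustration that matches branch names exactly; the real Avro resolver also accepts type promotions (for example `int` to `long`) and recurses into named and complex types, none of which is modeled here.

```java
import java.util.List;

// Hypothetical, simplified version of the "reader's is a union, but writer's
// is not" rule: pick the first reader branch that matches the writer's type.
final class UnionResolution {
    static int firstMatchingBranch(List<String> readerUnion, String writerType) {
        for (int i = 0; i < readerUnion.size(); i++) {
            // Exact name match only; Avro's type promotions are not modeled.
            if (readerUnion.get(i).equals(writerType)) {
                return i;
            }
        }
        // "If none match, an error is signalled."
        throw new IllegalArgumentException(
            "No branch of " + readerUnion + " matches writer type \"" + writerType + "\"");
    }
}
```

Under this rule, data written with `"string"` and read with `["null","string"]` resolves to branch 1, and the `"long"` case from the later report resolves the same way against `["null","long"]`.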

Moreover, I tested a similar scenario using plain Avro code, where I wrote data with the schema "string" and read it back with ["null","string"].



[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996047#comment-13996047
 ] 

Hive QA commented on HIVE-7049:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12644526/HIVE-7049.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/188/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/188/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12644526



[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996660#comment-13996660
 ] 

Xuefu Zhang commented on HIVE-7049:
---

Thanks for bringing this up. I'm wondering if the situation you described is a schema incompatibility rather than a bug. The record schema says that a field is a union (nullable), while the file schema says that it is not a union, which seems to suggest that the data is not compatible with the schema. While we may need to provide a better error message for this, ignoring the file schema (by passing NULL down) will very likely break decimal support,
which needs the file schema to read data correctly.
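One reason the file schema can matter for decimals, sketched under assumptions: in Avro's `decimal` logical type the bytes hold only the unscaled integer (big-endian two's complement), while the scale is recorded in the schema. Assuming Hive takes that scale from the writer's (file) schema, dropping the file schema leaves the reader unable to interpret the bytes. `DecimalBytes` is a hypothetical helper built on `java.math` only, not Hive's deserializer.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper illustrating Avro's decimal logical type encoding:
// the bytes carry only the unscaled value; the scale lives in the schema.
final class DecimalBytes {
    // Writer side: big-endian two's-complement bytes of the unscaled value.
    static byte[] encode(BigDecimal value) {
        return value.unscaledValue().toByteArray();
    }

    // Reader side: the same bytes mean different numbers under different scales.
    static BigDecimal decode(byte[] unscaledBytes, int scale) {
        return new BigDecimal(new BigInteger(unscaledBytes), scale);
    }
}
```

Encoding 123.45 and decoding with scale 2 recovers 123.45, while decoding the same bytes with scale 0 yields 12345, so a reader that has lost the writer's scale cannot reliably return the correct value.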



[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-12 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995896#comment-13995896
 ] 

Mohammad Kamrul Islam commented on HIVE-7049:
-

RB at: https://reviews.apache.org/r/21353/
