[ https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578783#comment-13578783 ]
Michael Malak commented on HIVE-3528:
-------------------------------------

Sean:

OK, I've researched the problem further.

There is in fact a null-struct test case in line 14 of
https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt

The test script at
https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q
does indeed work when I run it locally. But in that test, the query gets all of its data from a test table verbatim:

    INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;

If we instead substitute a hard-coded null for the struct directly in the query, it fails:

    INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, bytes1, fixed1 FROM test_serializer;

with the following error:

    FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different 'as_avro': Cannot convert column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.

Note, though, that substituting a hard-coded null for string1 (and restoring struct1 to the query) does work:

    INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, bytes1, fixed1 FROM test_serializer;

I will be entering an all-new JIRA for this.

> Avro SerDe doesn't handle serializing Nullable types that require access to a Schema
> ------------------------------------------------------------------------------------
>
> Key: HIVE-3528
> URL: https://issues.apache.org/jira/browse/HIVE-3528
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Labels: avro
> Fix For: 0.11.0
>
> Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt
>
>
> Deserialization properly handles hiding Nullable Avro types, including complex types like record, map, array, etc. However, when Serialization attempts to write out these types it erroneously makes use of the UNION schema that contains NULL and the other type.
> This results in Schema mis-match errors for Record, Array, Enum, Fixed, and Bytes.
> Here's a [review board of unit tests that express the problem|https://reviews.apache.org/r/7431/], as well as one that supports the case that it's only when the schema is needed.
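
For readers unfamiliar with the serialization problem described in the issue, here is a minimal Python sketch of the idea. It is not the AvroSerDe code and does not use the Avro library. Schemas are modelled as plain JSON-like objects, where a nullable type is the two-branch union `["null", T]`. The names `unwrap_nullable`, `serialize`, `SchemaMismatch`, and the record name `struct1_record` are hypothetical. The field names are taken from the error message in the comment above. The toy serializer consults the schema only for record-like values, which mirrors the report that the failure appears only when the schema is needed.

```python
class SchemaMismatch(Exception):
    """Raised when a value is written against a schema of the wrong kind."""


def schema_type(schema):
    # In Avro's JSON form a union is a list, a complex type is a dict,
    # and a primitive is a bare string such as "int".
    if isinstance(schema, list):
        return "union"
    if isinstance(schema, dict):
        return schema["type"]
    return schema


def unwrap_nullable(schema):
    # Return T for ["null", T] or [T, "null"]; leave anything else untouched.
    if isinstance(schema, list) and len(schema) == 2 and "null" in schema:
        non_null = [branch for branch in schema if branch != "null"]
        if len(non_null) == 1:
            return non_null[0]
    return schema


def serialize(value, schema, unwrap=True):
    # Toy serializer (assumption: only dict values need the schema).
    if value is None:
        return None
    if unwrap:
        schema = unwrap_nullable(schema)
    if isinstance(value, dict):
        # Record-like value: the field list must come from a record schema.
        if schema_type(schema) != "record":
            raise SchemaMismatch(
                "expected record schema, got " + schema_type(schema)
            )
        return {
            field["name"]: serialize(value[field["name"]], field["type"], unwrap)
            for field in schema["fields"]
        }
    # Primitive value: the schema is never consulted in this sketch.
    return value


# Hypothetical nullable struct column, with field names from the error message.
STRUCT = {
    "type": "record",
    "name": "struct1_record",
    "fields": [
        {"name": "sint", "type": "int"},
        {"name": "sboolean", "type": "boolean"},
        {"name": "sstring", "type": "string"},
    ],
}
NULLABLE_STRUCT = ["null", STRUCT]
```

Calling `serialize(row, NULLABLE_STRUCT, unwrap=False)` with a dict `row` raises `SchemaMismatch`, because the serializer is handed the union instead of the record schema. With the default `unwrap=True` the same call succeeds. Primitive values pass either way in this sketch, since their schema is never inspected.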