[ https://issues.apache.org/jira/browse/HIVE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Malak updated HIVE-4022:
--------------------------------

    Description:
Originally thought to be Avro-specific, and first noted with respect to HIVE-3528 ("Avro SerDe doesn't handle serializing Nullable types that require access to a Schema"), it turns out that even native Hive tables cannot store NULL in a STRUCT field, or for the entire STRUCT itself, at least when the NULL is specified directly in the INSERT statement. Again, this affects both Avro-backed tables and native Hive tables.

*** For native Hive tables:

The following:

echo 1,2 >twovalues.csv
hive
CREATE TABLE tc (x INT, y INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH 'twovalues.csv' INTO TABLE tc;
CREATE TABLE oc (z STRUCT<a: int, b: int>);
INSERT INTO TABLE oc SELECT null FROM tc;

produces the error:

FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from void to struct<a:int,b:int>.

The following:

INSERT INTO TABLE oc SELECT named_struct('a', null, 'b', null) FROM tc;

produces the error:

FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from struct<a:void,b:void> to struct<a:int,b:int>.

*** For Avro:

In HIVE-3528 there is in fact a null-struct test case, in line 14 of
https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt

The test script at
https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q
does indeed work.
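For the native-table case, one possible workaround sketch — untested here, and assuming Hive's IF() derives the expression's type from the non-null branch rather than propagating void — is to give the literal NULL an explicit struct type:

```sql
-- Hypothetical workaround sketch (untested): the false branch is never taken,
-- but its named_struct() literal may let the planner type the whole expression
-- as struct<a:int,b:int> instead of void, so the column types line up.
INSERT INTO TABLE oc
SELECT IF(false, named_struct('a', 1, 'b', 2), null) FROM tc;
```

If IF() also types the NULL branch as void, this sketch would not help; it is offered only as a direction to test, not as a confirmed fix.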
But in that test, the query gets all of its data from a test table verbatim:

INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;

If instead we put a hard-coded null for the struct directly into the query, it fails:

INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, bytes1, fixed1 FROM test_serializer;

with the following error:

FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different 'as_avro': Cannot convert column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.
Note, though, that substituting a hard-coded null for string1 (and restoring struct1 into the query) does work:

INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, bytes1, fixed1 FROM test_serializer;

        Summary: Structs and struct fields cannot be NULL in INSERT statements  (was: Avro SerDe queries don't handle hard-coded nulls for optional/nullable structs)

> Structs and struct fields cannot be NULL in INSERT statements
> -------------------------------------------------------------
>
>                 Key: HIVE-4022
>                 URL: https://issues.apache.org/jira/browse/HIVE-4022
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Michael Malak
>
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira