Huaisi Xu has posted comments on this change.

Change subject: IMPALA-3092: Return NULL for Avro missing fields w/o default
......................................................................

Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/2361/1//COMMIT_MSG
Commit Message:

Line 8: without a default value
> Instead of copying the JIRA title, describe what this patch actually does.
Done


Line 11: ****(can someone help fill this?) slots. With this patch
> We used to require that any missing fields missing in an Avro file have a c
Done


http://gerrit.cloudera.org:8080/#/c/2361/1/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

Line 217: "WriteDefaultValue() doesn't support default records yet, should have failed";
> You'll hit this DCHECK if there's a missing record with a default value, si
Could you elaborate on this? I think the frontend already takes care of this:
https://github.com/cloudera/Impala/commit/2b4d7ecd22e914f0f4c9d612f76de22fe1d484d9#diff-c060f9513a0936a6112829eac25692f8R69

I had a table with properties:

'avro.schema.literal' = '{"type": "record", "name": "a", "fields":[
  {"type": ["string", "null"], "name": "a"},
  {"type": ["string", "null"], "name": "b"},
  {"name": "c", "type": {"type":"record", "name":"ac", "fields": [
    {"type": "int", "name": "aa"},
    {"type": "int", "name": "bb"}]},
   "default": {"aa": 222, "bb": 333}}
]}'


Line 252: if (!default_value) {
> if (default_value == NULL)
Done


Line 253: RawValue::Write(NULL, avro_header_->template_tuple, slot_desc, NULL);
> We usually do template_tuple->SetNull(slot_desc->null_indicator_offset()) d
Done


http://gerrit.cloudera.org:8080/#/c/2361/1/tests/query_test/test_avro.py
File tests/query_test/test_avro.py:

Line 12: class TestAvroAddColumn(ImpalaTestSuite):
> yes I just understood... thank you!
I have the table defined as:

CREATE EXTERNAL TABLE avro_missing_columns_test (col1 string, col2 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='{
"name": "a",
"type": "record",
"fields": [
  {"name":"boolean1", "type":"boolean"},
  {"name":"int1", "type":"int"},
  {"name":"long1", "type":"long"},
  {"name":"float1", "type":"float"},
  {"name":"double1", "type":"double"},
  {"name":"string1", "type":"string"},
  {"name":"string2", "type": ["string", "null"]},
  {"name":"string3", "type": ["null", "string"]}
]}')
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '${hiveconf:hive.metastore.warehouse.dir}/avro_schema_resolution_test/';

What do you mean by adding a record field? Do you mean adding it as a nested complex type? The frontend does not allow this; see my comment on the .cc file.


-- 
To view, visit http://gerrit.cloudera.org:8080/2361
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie86421fc3da51086e566998af317faa62ba9789b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Huaisi Xu <[email protected]>
Gerrit-Reviewer: Huaisi Xu <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-HasComments: Yes
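
For reference, the Line 252/253 exchange on hdfs-avro-scanner.cc is about marking a slot NULL in the template tuple when a field is missing from the file and has no default. The following is a minimal sketch of that idea only. SlotDesc, Tuple, and FillMissingField are hypothetical stand-ins invented for illustration; they are not Impala's actual classes or signatures, and the sketch handles only an int slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a slot descriptor: where the value lives in the
// tuple and which bit marks the slot as NULL. Not Impala's SlotDescriptor.
struct SlotDesc {
  int value_offset;       // byte offset of the slot's value in the tuple
  int null_byte_offset;   // byte that holds the null-indicator bit
  uint8_t null_bit_mask;  // bit within that byte
};

// Hypothetical fixed-layout tuple: a byte buffer with null-indicator bits.
class Tuple {
 public:
  explicit Tuple(std::size_t size) : bytes_(size, 0) {}

  void SetNull(const SlotDesc& s) {
    bytes_[s.null_byte_offset] |= s.null_bit_mask;
  }
  bool IsNull(const SlotDesc& s) const {
    return (bytes_[s.null_byte_offset] & s.null_bit_mask) != 0;
  }
  void WriteInt(const SlotDesc& s, int32_t v) {
    std::memcpy(&bytes_[s.value_offset], &v, sizeof(v));
  }
  int32_t ReadInt(const SlotDesc& s) const {
    int32_t v;
    std::memcpy(&v, &bytes_[s.value_offset], sizeof(v));
    return v;
  }

 private:
  std::vector<uint8_t> bytes_;
};

// Fills the template tuple for a field that is absent from the file schema:
// no default means the slot becomes NULL, otherwise the default is written.
void FillMissingField(Tuple* template_tuple, const SlotDesc& slot,
                      const int32_t* default_value) {
  assert(template_tuple != nullptr);
  if (default_value == nullptr) {
    template_tuple->SetNull(slot);
  } else {
    template_tuple->WriteInt(slot, *default_value);
  }
}
```

Setting the null bit directly keeps the "no default" case explicit, which is the point of the reviewer's suggestion to prefer SetNull over writing a NULL value through a general-purpose write call.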
