Huaisi Xu has posted comments on this change.

Change subject: IMPALA-3092: Return NULL for Avro missing fields w/o default
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/2361/1//COMMIT_MSG
Commit Message:

Line 8: without a default value
> Instead of copying the JIRA title, describe what this patch actually does. 
Done


Line 11: ****(can someone help fill this?) slots. With this patch
> We used to require that any missing fields missing in an Avro file have a c
Done


http://gerrit.cloudera.org:8080/#/c/2361/1/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

Line 217:             "WriteDefaultValue() doesn't support default records yet, should have failed";
> You'll hit this DCHECK if there's a missing record with a default value, si
Could you elaborate on this? I think the frontend already takes care of it:
https://github.com/cloudera/Impala/commit/2b4d7ecd22e914f0f4c9d612f76de22fe1d484d9#diff-c060f9513a0936a6112829eac25692f8R69

I had a table with these properties:
'avro.schema.literal' = '{"type": "record", "name": "a", "fields":[
{"type": ["string", "null"], "name": "a"},
{"type": ["string", "null"], "name": "b"},
{"name": "c", "type": {"type":"record", "name":"ac", "fields": [{"type": "int", "name": "aa"}, {"type": "int", "name": "bb"}]}, "default": {"aa": 222, "bb": 333}}
]}'
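
To make the case concrete, here is a minimal Python sketch (not Impala code) that parses the schema literal above and lists the top-level fields that are records and also carry a default, which is the combination the DCHECK is about. The helper name record_fields_with_default is hypothetical and exists only for this illustration.

```python
import json

# The schema literal from the table properties above, reflowed for readability.
SCHEMA_LITERAL = """
{"type": "record", "name": "a", "fields": [
  {"type": ["string", "null"], "name": "a"},
  {"type": ["string", "null"], "name": "b"},
  {"name": "c",
   "type": {"type": "record", "name": "ac", "fields": [
     {"type": "int", "name": "aa"}, {"type": "int", "name": "bb"}]},
   "default": {"aa": 222, "bb": 333}}
]}
"""


def record_fields_with_default(schema):
    """Return names of top-level fields that are records and have a default.

    Hypothetical helper for illustration only; it does not mirror any
    Impala frontend or backend function.
    """
    names = []
    for field in schema["fields"]:
        ftype = field["type"]
        # A nested record is written as a JSON object whose "type" is "record".
        is_record = isinstance(ftype, dict) and ftype.get("type") == "record"
        if is_record and "default" in field:
            names.append(field["name"])
    return names


schema = json.loads(SCHEMA_LITERAL)
print(record_fields_with_default(schema))
```

With this schema only field "c" qualifies; the two union-typed string fields have no default at all.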


Line 252:   if (!default_value) {
> if (default_value == NULL)
Done


Line 253:     RawValue::Write(NULL, avro_header_->template_tuple, slot_desc, NULL);
> We usually do template_tuple->SetNull(slot_desc->null_indicator_offset()) d
Done


http://gerrit.cloudera.org:8080/#/c/2361/1/tests/query_test/test_avro.py
File tests/query_test/test_avro.py:

Line 12: class TestAvroAddColumn(ImpalaTestSuite):
> yes I just understood... thank you!
The table definition I have is:
CREATE EXTERNAL TABLE avro_missing_columns_test (col1 string, col2 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='{
"name": "a",
"type": "record",
"fields": [
  {"name":"boolean1", "type":"boolean"},
  {"name":"int1",     "type":"int"},
  {"name":"long1",    "type":"long"},
  {"name":"float1",   "type":"float"},
  {"name":"double1",  "type":"double"},
  {"name":"string1",  "type":"string"},
  {"name":"string2",  "type": ["string", "null"]},
  {"name":"string3",  "type": ["null", "string"]}
]}')
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '${hiveconf:hive.metastore.warehouse.dir}/avro_schema_resolution_test/';

What do you mean by adding a record field? Do you mean adding a field that is a nested complex type? The frontend does not allow this; see my comment on the .cc file.
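
For reference, the behaviour named in the change subject can be sketched in Python roughly as follows. This is only an illustration under my own assumptions, not the scanner code: resolve_missing_fields is a hypothetical name, fields are plain dicts shaped like the Avro schema JSON, and None stands in for SQL NULL.

```python
def resolve_missing_fields(table_fields, file_field_names):
    """Pick a value for each table-schema field that the data file lacks.

    Hypothetical helper for illustration. A missing field gets its declared
    default if it has one, otherwise None (standing in for NULL).
    """
    resolved = {}
    for field in table_fields:
        if field["name"] in file_field_names:
            continue  # present in the file, so the value comes from the data
        # dict.get returns None when no "default" key is declared
        resolved[field["name"]] = field.get("default")
    return resolved


# Assumed example: the file was written with only "string1".
table_fields = [
    {"name": "string1", "type": "string"},
    {"name": "string2", "type": ["string", "null"]},
    {"name": "int1", "type": "int", "default": 7},
]
print(resolve_missing_fields(table_fields, {"string1"}))
```

Note that this sketch cannot tell an explicit null default apart from a missing default, since both end up as None.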


-- 
To view, visit http://gerrit.cloudera.org:8080/2361
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie86421fc3da51086e566998af317faa62ba9789b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Huaisi Xu <[email protected]>
Gerrit-Reviewer: Huaisi Xu <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-HasComments: Yes
