Steven Willis created ARROW-9336: ------------------------------------ Summary: Creating RecordBatch with structs missing keys results in a malformed table Key: ARROW-9336 URL: https://issues.apache.org/jira/browse/ARROW-9336 Project: Apache Arrow Issue Type: Bug Components: Ruby Affects Versions: 0.17.1 Reporter: Steven Willis
Using {{::Arrow::RecordBatch.new(schema, data)}} (which uses the {{RecordBatchBuilder}}) appears to handle when a record is missing an entry for a top level column, but it doesn't handle when a record is missing an entry within a struct column. For example, I'd expect the following code to print out {{true}} for each {{puts}}, but 2 of them are {{false}}: {code:ruby} require 'parquet' require 'arrow' schema = [ {name: "a", type: "string"}, {name: "b", type: "struct", fields: [ {name: "c", type: "string"}, {name: "d", type: "string"}, ] }, ] arrow_schema = ::Arrow::Schema.new(schema) record_batch = ::Arrow::RecordBatch.new( arrow_schema, [ {"a" => "a", "b" => {"c" => "c", }}, { "b" => {"c" => "c", }}, { "b" => { "d" => "d"}}, ] ) table = record_batch.to_table puts(table['a'][0] == 'a') puts(table['a'][1].nil?) puts(table['a'][2].nil?) puts(table['b'][0].key?('c')) puts(table['b'][0]['c'] == 'c') puts(table['b'][0].key?('d')) puts(table['b'][0]['d'].nil?) # False ? puts(!table['b'][0].key?('e')) puts(table['b'][1].key?('c')) puts(table['b'][1]['c'] == 'c') puts(table['b'][1].key?('d')) puts(table['b'][1]['d'].nil?) puts(!table['b'][1].key?('e')) puts(table['b'][2].key?('c')) puts(table['b'][2]['c'].nil?) puts(table['b'][2].key?('d')) puts(table['b'][2]['d'] == 'd') # False ? puts(!table['b'][2].key?('e')) {code} I'd expect {{puts(table)}} to print this representation: {noformat} a b 0 a {"c"=>"c", "d"=>nil} 1 {"c"=>"c", "d"=>nil} 2 {"c"=>nil, "d"=>"d"} {noformat} But it prints this instead: {noformat} a b 0 a {"c"=>"c", "d"=>"d"} 1 {"c"=>"c", "d"=>nil} 2 {"c"=>nil, "d"=>nil} {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)