[ https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Grove updated ARROW-8258:
------------------------------
Description:

I discovered this bug with the following query:
{code:java}
> SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")")
{code}
The Parquet reader detects this schema when reading from the file:
{code:java}
Schema {
    fields: [
        Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false }
    ],
    metadata: {}
}
{code}
The struct array read from the file contains:
{code:java}
[PrimitiveArray<UInt64>
[
  1567318008000000,
  1567319357000000,
  1567320092000000,
  1567321151000000,
{code}
When the Parquet Arrow reader creates the record batch, the following validation logic fails:
{code:java}
for i in 0..columns.len() {
    if columns[i].len() != len {
        return Err(ArrowError::InvalidArgumentError(
            "all columns in a record batch must have the same length".to_string(),
        ));
    }
    if columns[i].data_type() != schema.field(i).data_type() {
        return Err(ArrowError::InvalidArgumentError(format!(
            "column types must match schema types, expected {:?} but found {:?} at column index {}",
            schema.field(i).data_type(),
            columns[i].data_type(),
            i)));
    }
}
{code}
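The failing check can be reproduced in isolation. Below is a minimal, self-contained Rust sketch of that validation; the `DataType` enum, `Column` struct, and `validate_columns` helper are illustrative stand-ins for this issue, not the actual arrow crate API.

```rust
// Toy stand-ins for the Arrow types involved in the failing check.
#[derive(Debug, PartialEq, Clone)]
enum DataType {
    UInt64,
    TimestampMicrosecond,
}

struct Column {
    data_type: DataType,
    len: usize,
}

// Mirrors the record-batch validation quoted above: every column must have
// the expected length and a data type matching the schema field.
fn validate_columns(
    schema_types: &[DataType],
    columns: &[Column],
    len: usize,
) -> Result<(), String> {
    for (i, col) in columns.iter().enumerate() {
        if col.len != len {
            return Err(
                "all columns in a record batch must have the same length".to_string(),
            );
        }
        if col.data_type != schema_types[i] {
            return Err(format!(
                "column types must match schema types, expected {:?} but found {:?} at column index {}",
                schema_types[i], col.data_type, i
            ));
        }
    }
    Ok(())
}

fn main() {
    // The schema says Timestamp(Microsecond), but the reader produced
    // a UInt64 array, so the type check rejects the batch.
    let schema = [DataType::TimestampMicrosecond];
    let cols = [Column { data_type: DataType::UInt64, len: 4 }];
    let err = validate_columns(&schema, &cols, 4).unwrap_err();
    println!("{}", err);
}
```

This makes the failure mode clear: the column data itself is fine (the values are plausible epoch microseconds), and only the declared data type of the array disagrees with the schema.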
[Rust] [Parquet] ArrowReader fails on some timestamp types
----------------------------------------------------------

                Key: ARROW-8258
                URL: https://issues.apache.org/jira/browse/ARROW-8258
            Project: Apache Arrow
         Issue Type: Bug
         Components: Rust
           Reporter: Andy Grove
           Assignee: Andy Grove
           Priority: Major
            Fix For: 0.17.0

--
This message was sent by Atlassian Jira
(v8.3.4#803005)