[ https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Grove updated ARROW-8258:
------------------------------
Description:

I discovered this bug with the following query:
{code:java}
> SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")")
{code}
The Parquet reader detects this schema when reading from the file:
{code:java}
Schema {
    fields: [
        Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false }
    ],
    metadata: {}
}
{code}
The struct array read from the file contains:
{code:java}
[PrimitiveArray<UInt64>
[
  1567318008000000,
  1567319357000000,
  1567320092000000,
  1567321151000000,
{code}
When the Parquet Arrow reader creates the record batch, the following validation logic fails:
{code:java}
for i in 0..columns.len() {
    if columns[i].len() != len {
        return Err(ArrowError::InvalidArgumentError(
            "all columns in a record batch must have the same length".to_string(),
        ));
    }
    if columns[i].data_type() != schema.field(i).data_type() {
        return Err(ArrowError::InvalidArgumentError(format!(
            "column types must match schema types, expected {:?} but found {:?} at column index {}",
            schema.field(i).data_type(),
            columns[i].data_type(),
            i)));
    }
}
{code}
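The failing check can be reproduced in isolation. Below is a minimal, self-contained Rust sketch of that validation; the `DataType` enum, `Column` struct, and `validate_columns` helper are illustrative stand-ins for this issue, not the actual arrow crate API.

```rust
// Toy stand-ins for the Arrow types involved in the failing check.
#[derive(Debug, PartialEq, Clone)]
enum DataType {
    UInt64,
    TimestampMicrosecond,
}

struct Column {
    data_type: DataType,
    len: usize,
}

// Mirrors the record-batch validation quoted above: every column must have
// the expected length and a data type matching the schema field.
fn validate_columns(
    schema_types: &[DataType],
    columns: &[Column],
    len: usize,
) -> Result<(), String> {
    for (i, col) in columns.iter().enumerate() {
        if col.len != len {
            return Err(
                "all columns in a record batch must have the same length".to_string(),
            );
        }
        if col.data_type != schema_types[i] {
            return Err(format!(
                "column types must match schema types, expected {:?} but found {:?} at column index {}",
                schema_types[i], col.data_type, i
            ));
        }
    }
    Ok(())
}

fn main() {
    // The schema says Timestamp(Microsecond), but the reader produced
    // a UInt64 array, so the type check rejects the batch.
    let schema = [DataType::TimestampMicrosecond];
    let cols = [Column { data_type: DataType::UInt64, len: 4 }];
    let err = validate_columns(&schema, &cols, 4).unwrap_err();
    println!("{}", err);
}
```

This makes the failure mode clear: the column data itself is fine (the values are plausible epoch microseconds), and only the declared data type of the array disagrees with the schema.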
[Rust] [Parquet] ArrowReader fails on some timestamp types
----------------------------------------------------------

                Key: ARROW-8258
                URL: https://issues.apache.org/jira/browse/ARROW-8258
            Project: Apache Arrow
         Issue Type: Bug
         Components: Rust
           Reporter: Andy Grove
           Assignee: Andy Grove
           Priority: Major
            Fix For: 0.17.0

--
This message was sent by Atlassian Jira
(v8.3.4#803005)