[jira] [Commented] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types

2020-03-29 Thread Renjie Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070666#comment-17070666
 ] 

Renjie Liu commented on ARROW-8258:
---

I think the root cause is here:
[https://github.com/apache/arrow/blob/master/rust/parquet/src/arrow/array_reader.rs#L220]

The array reader converts only the data buffer and leaves the array's data 
type unchanged, so it no longer matches the schema. I'll submit a PR to fix 
it this week.
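
To illustrate the mismatch, here is a minimal sketch that uses simplified 
stand-in types rather than the real arrow or parquet crate APIs. The names 
Column, convert_buffer_only, convert_buffer_and_type and check_type are 
hypothetical and exist only for this example. It assumes the reader keeps the 
UInt64 type reported in the error message while the schema expects a 
timestamp, and it is not the actual fix.

```rust
// Simplified stand-ins for the arrow crate's types. Every name below is
// hypothetical and exists only for this illustration.
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Microsecond,
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    UInt64,
    Timestamp(TimeUnit),
}

#[derive(Debug)]
struct Column {
    data_type: DataType,
    values: Vec<i64>,
}

/// Models the suspected bug: the values are carried over, but the column
/// keeps the physical type instead of the type declared in the schema.
fn convert_buffer_only(raw: &[u64]) -> Column {
    Column {
        data_type: DataType::UInt64,
        values: raw.iter().map(|v| *v as i64).collect(),
    }
}

/// Models one possible fix: carry over the values and also tag the column
/// with the target type taken from the schema.
fn convert_buffer_and_type(raw: &[u64], target: DataType) -> Column {
    Column {
        data_type: target,
        values: raw.iter().map(|v| *v as i64).collect(),
    }
}

/// Same comparison as the record batch validation quoted in the issue.
fn check_type(column: &Column, expected: &DataType, index: usize) -> Result<(), String> {
    if &column.data_type != expected {
        return Err(format!(
            "column types must match schema types, expected {:?} but found {:?} at column index {}",
            expected, column.data_type, index
        ));
    }
    Ok(())
}

fn main() {
    let raw = [156731800800u64, 156731935700];
    let expected = DataType::Timestamp(TimeUnit::Microsecond);

    // Buffer-only conversion fails the type check.
    let buggy = convert_buffer_only(&raw);
    println!("{:?}", check_type(&buggy, &expected, 0));

    // Converting buffer and type together passes it.
    let fixed = convert_buffer_and_type(&raw, expected.clone());
    println!("{:?}", check_type(&fixed, &expected, 0));

    // The values are identical in both cases; only the type tag differs.
    assert_eq!(buggy.values, fixed.values);
}
```

In this sketch the two conversions produce identical values, which matches 
the symptom in the issue: the data looks plausible, and only the type 
comparison rejects the column.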

> [Rust] [Parquet] ArrowReader fails on some timestamp types
> --
>
> Key: ARROW-8258
> URL: https://issues.apache.org/jira/browse/ARROW-8258
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
> Fix For: 0.17.0
>
>
> I discovered this bug with this query
> {code:java}
> > SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") 
> {code}
> The parquet reader detects this schema when reading from the file:
> {code:java}
> Schema { 
>   fields: [
>     Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false }
>   ], 
>   metadata: {} 
> } {code}
> The struct array read from the file contains:
> {code:java}
> [PrimitiveArray
> [
>   156731800800,
>   156731935700,
>   156732009200,
>   156732115100, {code}
> When the Parquet arrow reader creates the record batch, the following 
> validation logic fails:
> {code:java}
> for i in 0..columns.len() {
>     if columns[i].len() != len {
>         return Err(ArrowError::InvalidArgumentError(
>             "all columns in a record batch must have the same length".to_string(),
>         ));
>     }
>     if columns[i].data_type() != schema.field(i).data_type() {
>         return Err(ArrowError::InvalidArgumentError(format!(
>             "column types must match schema types, expected {:?} but found {:?} at column index {}",
>             schema.field(i).data_type(),
>             columns[i].data_type(),
>             i)));
>     }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types

2020-03-29 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070472#comment-17070472
 ] 

Andy Grove commented on ARROW-8258:
---

[~liurenjie1024] [~sunchao] I may need some help with this one.



