[jira] [Updated] (ARROW-13120) [Rust][Parquet] Cannot read multiple batches from parquet with string list column

2021-06-21 Thread Morgan Cassels (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morgan Cassels updated ARROW-13120:
---
Component/s: Rust

> [Rust][Parquet] Cannot read multiple batches from parquet with string list 
> column
> -
>
> Key: ARROW-13120
> URL: https://issues.apache.org/jira/browse/ARROW-13120
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Morgan Cassels
>Priority: Major
> Attachments: test.parquet
>
>
> This issue only occurs when the batch size is smaller than the number of rows
> in the table. The attached Parquet file `test.parquet` has 31430 rows and a
> single column containing string lists. The issue does not appear to occur for
> Parquet files with integer list columns.
>   
> {code:java}
> #[test]
> fn failing_test() {
>     let parquet_file_reader = get_test_reader("test.parquet");
>     let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
>     let mut record_batches = Vec::new();
>     let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
>     for batch in record_batch_reader {
>         record_batches.push(batch);
>     }
> }
> {code}
>  
> {code:java}
>  arrow::arrow_reader::tests::failing_test stdout 
> thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
> infallable creation of GenericListArray from ArrayDataRef failed: 
> InvalidArgumentError("offsets do not start at zero")', 
> arrow/src/array/array_list.rs:195:45
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> {code}
>  
>  
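
For context on the panic above: Arrow list arrays store all child values in one shared buffer and describe each list with an offsets buffer. The plain-Rust sketch below (illustrative only, not the arrow-rs API) shows one plausible reading of the error: a batch that keeps pointing into the shared values buffer without rebasing its offsets ends up with a first offset other than zero, which is exactly the condition the GenericListArray constructor rejects.

{code:java}
// Illustrative sketch only (plain Rust, not arrow-rs types): a list-of-string
// array laid out as a shared values buffer plus an offsets buffer.
fn main() {
    // Three lists: ["a", "b"], ["c"], ["d", "e", "f"]
    let values = vec!["a", "b", "c", "d", "e", "f"];
    let offsets = vec![0usize, 2, 3, 6];

    // List i spans values[offsets[i]..offsets[i + 1]].
    for i in 0..offsets.len() - 1 {
        println!("list {}: {:?}", i, &values[offsets[i]..offsets[i + 1]]);
    }

    // A later batch that reuses the same values buffer without rebasing its
    // offsets starts at a non-zero position -- the "offsets do not start at
    // zero" condition reported in the panic.
    let second_batch_offsets = &offsets[1..]; // [2, 3, 6]
    assert_ne!(second_batch_offsets[0], 0);
}
{code}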





[jira] [Updated] (ARROW-13120) [Rust][Parquet] Cannot read multiple batches from parquet with string list column

2021-06-18 Thread Morgan Cassels (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morgan Cassels updated ARROW-13120:
---
Description: 
This issue only occurs when the batch size is smaller than the number of rows in
the table. The attached Parquet file `test.parquet` has 31430 rows and a single
column containing string lists. The issue does not appear to occur for Parquet
files with integer list columns.

  
{code:java}
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
{code}
 
{code:java}
 arrow::arrow_reader::tests::failing_test stdout 
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
infallable creation of GenericListArray from ArrayDataRef failed: 
InvalidArgumentError("offsets do not start at zero")', 
arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}
 

 

  was:
This issue only occurs when the batch size is smaller than the number of rows in
the table. The attached Parquet file `test.parquet` has 31430 rows and a single
column containing string lists. The issue does not appear to occur for Parquet
files with integer list columns.

 

 
{code:java}
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
{code}
 
{code:java}
 arrow::arrow_reader::tests::failing_test stdout 
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
infallable creation of GenericListArray from ArrayDataRef failed: 
InvalidArgumentError("offsets do not start at zero")', 
arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}
 

 


> [Rust][Parquet] Cannot read multiple batches from parquet with string list 
> column
> -
>
> Key: ARROW-13120
> URL: https://issues.apache.org/jira/browse/ARROW-13120
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Morgan Cassels
>Priority: Major
> Attachments: test.parquet
>
>
> This issue only occurs when the batch size is smaller than the number of rows
> in the table. The attached Parquet file `test.parquet` has 31430 rows and a
> single column containing string lists. The issue does not appear to occur for
> Parquet files with integer list columns.
>   
> {code:java}
> #[test]
> fn failing_test() {
>     let parquet_file_reader = get_test_reader("test.parquet");
>     let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
>     let mut record_batches = Vec::new();
>     let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
>     for batch in record_batch_reader {
>         record_batches.push(batch);
>     }
> }
> {code}
>  
> {code:java}
>  arrow::arrow_reader::tests::failing_test stdout 
> thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
> infallable creation of GenericListArray from ArrayDataRef failed: 
> InvalidArgumentError("offsets do not start at zero")', 
> arrow/src/array/array_list.rs:195:45
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> {code}
>  
>  





[jira] [Updated] (ARROW-13120) [Rust][Parquet] Cannot read multiple batches from parquet with string list column

2021-06-18 Thread Morgan Cassels (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morgan Cassels updated ARROW-13120:
---
Description: 
This issue only occurs when the batch size is smaller than the number of rows in
the table. The attached Parquet file `test.parquet` has 31430 rows and a single
column containing string lists. The issue does not appear to occur for Parquet
files with integer list columns.

 

 
{code:java}
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
{code}
 
{code:java}
 arrow::arrow_reader::tests::failing_test stdout 
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
infallable creation of GenericListArray from ArrayDataRef failed: 
InvalidArgumentError("offsets do not start at zero")', 
arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}
 

 

  was:
This issue only occurs when the batch size is smaller than the number of rows in
the table. The attached Parquet file `test.parquet` has 31430 rows and a single
column containing string lists. The issue does not appear to occur for Parquet
files with integer list columns.

 

```
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
```
```
 arrow::arrow_reader::tests::failing_test stdout 

thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
infallable creation of GenericListArray from ArrayDataRef failed: 
InvalidArgumentError("offsets do not start at zero")', 
arrow/src/array/array_list.rs:195:45

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

```


> [Rust][Parquet] Cannot read multiple batches from parquet with string list 
> column
> -
>
> Key: ARROW-13120
> URL: https://issues.apache.org/jira/browse/ARROW-13120
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Morgan Cassels
>Priority: Major
> Attachments: test.parquet
>
>
> This issue only occurs when the batch size is smaller than the number of rows
> in the table. The attached Parquet file `test.parquet` has 31430 rows and a
> single column containing string lists. The issue does not appear to occur for
> Parquet files with integer list columns.
>  
>  
> {code:java}
> #[test]
> fn failing_test() {
>     let parquet_file_reader = get_test_reader("test.parquet");
>     let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
>     let mut record_batches = Vec::new();
>     let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
>     for batch in record_batch_reader {
>         record_batches.push(batch);
>     }
> }
> {code}
>  
> {code:java}
>  arrow::arrow_reader::tests::failing_test stdout 
> thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
> infallable creation of GenericListArray from ArrayDataRef failed: 
> InvalidArgumentError("offsets do not start at zero")', 
> arrow/src/array/array_list.rs:195:45
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> {code}
>  
>  





[jira] [Created] (ARROW-13120) [Rust][Parquet] Cannot read multiple batches from parquet with string list column

2021-06-18 Thread Morgan Cassels (Jira)
Morgan Cassels created ARROW-13120:
--

 Summary: [Rust][Parquet] Cannot read multiple batches from parquet 
with string list column
 Key: ARROW-13120
 URL: https://issues.apache.org/jira/browse/ARROW-13120
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Morgan Cassels
 Attachments: test.parquet

This issue only occurs when the batch size is smaller than the number of rows in
the table. The attached Parquet file `test.parquet` has 31430 rows and a single
column containing string lists. The issue does not appear to occur for Parquet
files with integer list columns.

 

```
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
```
```
 arrow::arrow_reader::tests::failing_test stdout 

thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
infallable creation of GenericListArray from ArrayDataRef failed: 
InvalidArgumentError("offsets do not start at zero")', 
arrow/src/array/array_list.rs:195:45

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

```





[jira] [Updated] (ARROW-11940) [Rust][Datafusion] Support joins on TimestampMillisecond columns

2021-03-11 Thread Morgan Cassels (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morgan Cassels updated ARROW-11940:
---
Description: 
Joining DataFrames on a TimestampMillisecond column gives the following error:

```

'called `Result::unwrap()` on an `Err` value: Internal("Unsupported data type 
in hasher")

arrow/rust/datafusion/src/physical_plan/hash_join.rs:252:30

'

```

  was:
Joining DataFrames on a TimestampMillisecond column gives the following error:

```

'called `Result::unwrap()` on an `Err` value: Internal("Unsupported data type 
in hasher")'

```


> [Rust][Datafusion] Support joins on TimestampMillisecond columns
> 
>
> Key: ARROW-11940
> URL: https://issues.apache.org/jira/browse/ARROW-11940
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - DataFusion
>Reporter: Morgan Cassels
>Priority: Major
>
> Joining DataFrames on a TimestampMillisecond column gives the following error:
> ```
> 'called `Result::unwrap()` on an `Err` value: Internal("Unsupported data type 
> in hasher")
> arrow/rust/datafusion/src/physical_plan/hash_join.rs:252:30
> '
> ```
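
For context, the message suggests the join hasher dispatches on the key column's data type and rejects any type without an explicit arm. The toy sketch below is illustrative only (not the actual hash_join.rs code); it shows that shape, and supporting TimestampMillisecond keys amounts to giving that type its own arm.

{code:java}
// Toy sketch only: how a join key type without an explicit arm can surface as
// "Unsupported data type in hasher". Names here are illustrative.
#[derive(Debug)]
enum DataType {
    Int64,
    Utf8,
    TimestampMillisecond,
}

fn hash_join_key(data_type: &DataType, raw: u64) -> Result<u64, String> {
    match data_type {
        // Supported key types get hashed (constants stand in for real hashing).
        DataType::Int64 => Ok(raw ^ 0x9e37_79b9),
        DataType::Utf8 => Ok(raw ^ 0x85eb_ca6b),
        // Everything else falls through to the reported error.
        other => Err(format!("Internal(\"Unsupported data type in hasher\"): {:?}", other)),
    }
}

fn main() {
    assert!(hash_join_key(&DataType::Int64, 42).is_ok());
    // A TimestampMillisecond join key needs its own arm to be supported.
    assert!(hash_join_key(&DataType::TimestampMillisecond, 42).is_err());
}
{code}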





[jira] [Created] (ARROW-11940) [Rust][Datafusion] Support joins on TimestampMillisecond columns

2021-03-11 Thread Morgan Cassels (Jira)
Morgan Cassels created ARROW-11940:
--

 Summary: [Rust][Datafusion] Support joins on TimestampMillisecond 
columns
 Key: ARROW-11940
 URL: https://issues.apache.org/jira/browse/ARROW-11940
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust - DataFusion
Reporter: Morgan Cassels


Joining DataFrames on a TimestampMillisecond column gives the following error:

```

'called `Result::unwrap()` on an `Err` value: Internal("Unsupported data type 
in hasher")'

```





[jira] [Updated] (ARROW-10329) [Rust][Datafusion] Datafusion queries involving a column name that begins with a number produces unexpected results

2020-10-16 Thread Morgan Cassels (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morgan Cassels updated ARROW-10329:
---
Summary: [Rust][Datafusion] Datafusion queries involving a column name that 
begins with a number produces unexpected results  (was: Datafusion queries 
involving a column name that begins with a number produces unexpected results)

> [Rust][Datafusion] Datafusion queries involving a column name that begins 
> with a number produces unexpected results
> ---
>
> Key: ARROW-10329
> URL: https://issues.apache.org/jira/browse/ARROW-10329
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Morgan Cassels
>Priority: Major
>
> This bug can be worked around by wrapping column names in quotes.
> Example:
> {{let query = "SELECT 16_20mph, 21_25mph FROM foo;"}}
> {{let logical_plan = ctx.create_logical_plan(query)?;}}
> {{logical_plan.schema().fields() now has fields: [_20mph, _25mph]}}
> The resulting table produced by this query looks like:
> ||{{_20mph}}||{{_25mph}}||
> |16|21|
> |16|21|
> Every row is identical, where the column value is equal to the initial number 
> that appears in the column name.
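
For reference, the quoting workaround mentioned above can be written as in the sketch below. It is a hedged sketch reusing the `ctx.create_logical_plan` call from the report; the context `ctx` and the table `foo` are assumed to exist. Unquoted, the parser most likely reads `16_20mph` as the numeric literal 16 followed by `_20mph` as an implicit alias, which would explain both the renamed fields and the constant column values described above.

{code:java}
// Workaround sketch (ctx and table `foo` are assumed, as in the report above).
// Unquoted, `SELECT 16_20mph, 21_25mph FROM foo;` yields fields [_20mph, _25mph]
// filled with the constants 16 and 21; double-quoting keeps the full names.
let query = r#"SELECT "16_20mph", "21_25mph" FROM foo;"#;
let logical_plan = ctx.create_logical_plan(query)?;
{code}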





[jira] [Updated] (ARROW-10329) Datafusion queries involving a column name that begins with a number produces unexpected results

2020-10-16 Thread Morgan Cassels (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morgan Cassels updated ARROW-10329:
---
Description: 
This bug can be worked around by wrapping column names in quotes.

Example:

{{let query = "SELECT 16_20mph, 21_25mph FROM foo;"}}

{{let logical_plan = ctx.create_logical_plan(query)?;}}

{{logical_plan.schema().fields() now has fields: [_20mph, _25mph]}}

The resulting table produced by this query looks like:
||{{_20mph}}||{{_25mph}}||
|16|21|
|16|21|

Every row is identical, where the column value is equal to the initial number 
that appears in the column name.

  was:
This bug can be worked around by wrapping column names in quotes.

Example:

{{let query = "SELECT 16_20mph, 21_25mph FROM foo;"}}

{{let logical_plan = ctx.create_logical_plan(query)?;}}

{{logical_plan.schema().fields() }}now has fields: {{_20mph, _25mph}}

The resulting table produced by this query looks like:
||{{_20mph}}||{{_25mph}}||
|16|21|
|16|21|

Every row is identical, where the column value is equal to the initial number 
that appears in the column name.


> Datafusion queries involving a column name that begins with a number produces 
> unexpected results
> 
>
> Key: ARROW-10329
> URL: https://issues.apache.org/jira/browse/ARROW-10329
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Morgan Cassels
>Priority: Major
>
> This bug can be worked around by wrapping column names in quotes.
> Example:
> {{let query = "SELECT 16_20mph, 21_25mph FROM foo;"}}
> {{let logical_plan = ctx.create_logical_plan(query)?;}}
> {{logical_plan.schema().fields() now has fields: [_20mph, _25mph]}}
> The resulting table produced by this query looks like:
> ||{{_20mph}}||{{_25mph}}||
> |16|21|
> |16|21|
> Every row is identical, where the column value is equal to the initial number 
> that appears in the column name.





[jira] [Created] (ARROW-10329) Datafusion queries involving a column name that begins with a number produces unexpected results

2020-10-16 Thread Morgan Cassels (Jira)
Morgan Cassels created ARROW-10329:
--

 Summary: Datafusion queries involving a column name that begins 
with a number produces unexpected results
 Key: ARROW-10329
 URL: https://issues.apache.org/jira/browse/ARROW-10329
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: Morgan Cassels


This bug can be worked around by wrapping column names in quotes.

Example:

{{let query = "SELECT 16_20mph, 21_25mph FROM foo;"}}

{{let logical_plan = ctx.create_logical_plan(query)?;}}

{{logical_plan.schema().fields() }}now has fields: {{_20mph, _25mph}}

The resulting table produced by this query looks like:
||{{_20mph}}||{{_25mph}}||
|16|21|
|16|21|

Every row is identical, where the column value is equal to the initial number 
that appears in the column name.





[jira] [Created] (ARROW-9696) [Rust] [Datafusion] nested binary expressions broken

2020-08-11 Thread Morgan Cassels (Jira)
Morgan Cassels created ARROW-9696:
-

 Summary: [Rust] [Datafusion] nested binary expressions broken
 Key: ARROW-9696
 URL: https://issues.apache.org/jira/browse/ARROW-9696
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Morgan Cassels
Assignee: Morgan Cassels


Nested binary expressions were previously supported but were broken by the
sqlparser upgrade.
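
For context, a hypothetical example of the kind of query this covers (the table `t` and its columns are illustrative only):

{code:java}
// Hypothetical nested binary expression: binary operators whose operands are
// themselves binary expressions, the construct affected by the regression.
let sql = "SELECT a FROM t WHERE (b > 1 AND c < 2) OR d = 3";
{code}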


