[jira] [Updated] (ARROW-13120) [Rust][Parquet] Cannot read multiple batches from parquet with string list column
[ https://issues.apache.org/jira/browse/ARROW-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Morgan Cassels updated ARROW-13120:
---
    Component/s: Rust

> [Rust][Parquet] Cannot read multiple batches from parquet with string list
> column
> -
>
> Key: ARROW-13120
> URL: https://issues.apache.org/jira/browse/ARROW-13120
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Reporter: Morgan Cassels
> Priority: Major
> Attachments: test.parquet
>
>
> This issue only occurs when the batch size < the number of rows in the table.
> The attached parquet `test.parquet` has 31430 rows and a single column
> containing string lists. This issue does not appear to occur for parquets
> with integer list columns.
>
> {code:java}
> #[test]
> fn failing_test() {
>     let parquet_file_reader = get_test_reader("test.parquet");
>     let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
>     let mut record_batches = Vec::new();
>     let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
>     for batch in record_batch_reader {
>         record_batches.push(batch);
>     }
> }
> {code}
>
> {code:java}
> arrow::arrow_reader::tests::failing_test stdout
> thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected
> infallable creation of GenericListArray from ArrayDataRef failed:
> InvalidArgumentError("offsets do not start at zero")',
> arrow/src/array/array_list.rs:195:45
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13120) [Rust][Parquet] Cannot read multiple batches from parquet with string list column
[ https://issues.apache.org/jira/browse/ARROW-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Morgan Cassels updated ARROW-13120:
---
    Description:
This issue only occurs when the batch size < the number of rows in the table. The attached parquet `test.parquet` has 31430 rows and a single column containing string lists. This issue does not appear to occur for parquets with integer list columns.

{code:java}
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
{code}

{code:java}
arrow::arrow_reader::tests::failing_test stdout
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected infallable creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("offsets do not start at zero")', arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}

  was:
This issue only occurs when the batch size < the number of rows in the table. The attached parquet `test.parquet` has 31430 rows and a single column containing string lists. This issue does not appear to occur for parquets with integer list columns.

{code:java}
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
{code}

{code:java}
arrow::arrow_reader::tests::failing_test stdout
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected infallable creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("offsets do not start at zero")', arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}
[jira] [Updated] (ARROW-13120) [Rust][Parquet] Cannot read multiple batches from parquet with string list column
[ https://issues.apache.org/jira/browse/ARROW-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Morgan Cassels updated ARROW-13120:
---
    Description:
This issue only occurs when the batch size < the number of rows in the table. The attached parquet `test.parquet` has 31430 rows and a single column containing string lists. This issue does not appear to occur for parquets with integer list columns.

{code:java}
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
{code}

{code:java}
arrow::arrow_reader::tests::failing_test stdout
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected infallable creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("offsets do not start at zero")', arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}

  was:
This issue only occurs when the batch size < the number of rows in the table. The attached parquet `test.parquet` has 31430 rows and a single column containing string lists. This issue does not appear to occur for parquets with integer list columns.

```
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
```

```
arrow::arrow_reader::tests::failing_test stdout
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected infallable creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("offsets do not start at zero")', arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
[jira] [Created] (ARROW-13120) [Rust][Parquet] Cannot read multiple batches from parquet with string list column
Morgan Cassels created ARROW-13120:
--

Summary: [Rust][Parquet] Cannot read multiple batches from parquet with string list column
Key: ARROW-13120
URL: https://issues.apache.org/jira/browse/ARROW-13120
Project: Apache Arrow
Issue Type: Bug
Reporter: Morgan Cassels
Attachments: test.parquet

This issue only occurs when the batch size < the number of rows in the table. The attached parquet `test.parquet` has 31430 rows and a single column containing string lists. This issue does not appear to occur for parquets with integer list columns.

```
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
```

```
arrow::arrow_reader::tests::failing_test stdout
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected infallable creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("offsets do not start at zero")', arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
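For readers unfamiliar with the invariant named in the panic message ("offsets do not start at zero"), here is a minimal sketch using only the standard library. The report does not state a root cause. As an assumption for illustration only, the sketch supposes that a later batch carries offsets relative to the whole column rather than to the batch. `rebase_offsets` is a hypothetical helper, not a function from the arrow or parquet crates.

```rust
/// Rebase list offsets so that the first offset is zero.
/// Hypothetical helper for illustration; not part of the arrow or parquet crates.
fn rebase_offsets(offsets: &[i32]) -> Vec<i32> {
    match offsets.first() {
        // Subtract the first offset from every entry.
        Some(&base) => offsets.iter().map(|o| o - base).collect(),
        // No offsets at all: nothing to rebase.
        None => Vec::new(),
    }
}

fn main() {
    // Assumed example: three lists whose child values begin at index 5
    // of a shared values buffer, with lengths 2, 0 and 3.
    let later_batch = [5, 7, 7, 10];
    let rebased = rebase_offsets(&later_batch);
    println!("{:?}", rebased);
}
```

The list lengths (the differences between neighbouring offsets) are unchanged by rebasing; only the starting point moves to zero.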
[jira] [Updated] (ARROW-11940) [Rust][Datafusion] Support joins on TimestampMillisecond columns
[ https://issues.apache.org/jira/browse/ARROW-11940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Morgan Cassels updated ARROW-11940:
---
    Description:
Joining DataFrames on a TimestampMillisecond column gives error:

```
'called `Result::unwrap()` on an `Err` value: Internal("Unsupported data type in hasher")
arrow/rust/datafusion/src/physical_plan/hash_join.rs:252:30
'
```

  was:
Joining DataFrames on a TimestampMillisecond column gives error:

```
'called `Result::unwrap()` on an `Err` value: Internal("Unsupported data type in hasher")'
```

> [Rust][Datafusion] Support joins on TimestampMillisecond columns
>
>
> Key: ARROW-11940
> URL: https://issues.apache.org/jira/browse/ARROW-11940
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust - DataFusion
> Reporter: Morgan Cassels
> Priority: Major
>
> Joining DataFrames on a TimestampMillisecond column gives error:
> ```
> 'called `Result::unwrap()` on an `Err` value: Internal("Unsupported data type
> in hasher")
> arrow/rust/datafusion/src/physical_plan/hash_join.rs:252:30
> '
> ```
[jira] [Created] (ARROW-11940) [Rust][Datafusion] Support joins on TimestampMillisecond columns
Morgan Cassels created ARROW-11940:
--

Summary: [Rust][Datafusion] Support joins on TimestampMillisecond columns
Key: ARROW-11940
URL: https://issues.apache.org/jira/browse/ARROW-11940
Project: Apache Arrow
Issue Type: New Feature
Components: Rust - DataFusion
Reporter: Morgan Cassels

Joining DataFrames on a TimestampMillisecond column gives error:

```
'called `Result::unwrap()` on an `Err` value: Internal("Unsupported data type in hasher")'
```
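To make the request concrete, here is a rough sketch of an equality hash join keyed on timestamps, using only the standard library. It relies on the fact that Arrow stores millisecond timestamps as 64-bit integers, so the keys can be hashed like any other `i64`. The function names `build_index` and `hash_join` are hypothetical and do not correspond to DataFusion's `hash_join.rs`.

```rust
use std::collections::HashMap;

/// Build side: map each timestamp (milliseconds, as i64) to the rows holding it.
/// Hypothetical helper for illustration; not DataFusion's implementation.
fn build_index(keys: &[i64]) -> HashMap<i64, Vec<usize>> {
    let mut index: HashMap<i64, Vec<usize>> = HashMap::new();
    for (row, &ts) in keys.iter().enumerate() {
        index.entry(ts).or_default().push(row);
    }
    index
}

/// Probe side: return (left_row, right_row) pairs whose timestamps are equal.
fn hash_join(left: &[i64], right: &[i64]) -> Vec<(usize, usize)> {
    let index = build_index(left);
    let mut pairs = Vec::new();
    for (right_row, ts) in right.iter().enumerate() {
        if let Some(left_rows) = index.get(ts) {
            for &left_row in left_rows {
                pairs.push((left_row, right_row));
            }
        }
    }
    pairs
}

fn main() {
    // Assumed sample data: timestamps in milliseconds.
    let left = [1000_i64, 2000, 1000];
    let right = [1000_i64, 3000];
    println!("{:?}", hash_join(&left, &right));
}
```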
[jira] [Updated] (ARROW-10329) [Rust][Datafusion] Datafusion queries involving a column name that begins with a number produces unexpected results
[ https://issues.apache.org/jira/browse/ARROW-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Morgan Cassels updated ARROW-10329:
---
    Summary: [Rust][Datafusion] Datafusion queries involving a column name that begins with a number produces unexpected results  (was: Datafusion queries involving a column name that begins with a number produces unexpected results)

> [Rust][Datafusion] Datafusion queries involving a column name that begins
> with a number produces unexpected results
> ---
>
> Key: ARROW-10329
> URL: https://issues.apache.org/jira/browse/ARROW-10329
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust - DataFusion
> Reporter: Morgan Cassels
> Priority: Major
>
> This bug can be worked around by wrapping column names in quotes.
> Example:
> {{let query = "SELECT 16_20mph, 21_25mph FROM foo;"}}
> {{let logical_plan = ctx.create_logical_plan(query)?;}}
> {{logical_plan.schema().fields() now has fields: [_20mph, _25mph]}}
> The resulting table produced by this query looks like:
> ||{{_20mph}}||{{_25mph}}||
> |16|21|
> |16|21|
> Every row is identical, where the column value is equal to the initial number
> that appears in the column name.
[jira] [Updated] (ARROW-10329) Datafusion queries involving a column name that begins with a number produces unexpected results
[ https://issues.apache.org/jira/browse/ARROW-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Morgan Cassels updated ARROW-10329:
---
    Description:
This bug can be worked around by wrapping column names in quotes.

Example:

{{let query = "SELECT 16_20mph, 21_25mph FROM foo;"}}
{{let logical_plan = ctx.create_logical_plan(query)?;}}
{{logical_plan.schema().fields() now has fields: [_20mph, _25mph]}}

The resulting table produced by this query looks like:

||{{_20mph}}||{{_25mph}}||
|16|21|
|16|21|

Every row is identical, where the column value is equal to the initial number that appears in the column name.

  was:
This bug can be worked around by wrapping column names in quotes.

Example:

{{let query = "SELECT 16_20mph, 21_25mph FROM foo;"}}
{{let logical_plan = ctx.create_logical_plan(query)?;}}
{{logical_plan.schema().fields() }}now has fields: {{_20mph, _25mph}}

The resulting table produced by this query looks like:

||{{_20mph}}||{{_25mph}}||
|16|21|
|16|21|

Every row is identical, where the column value is equal to the initial number that appears in the column name.
[jira] [Created] (ARROW-10329) Datafusion queries involving a column name that begins with a number produces unexpected results
Morgan Cassels created ARROW-10329:
--

Summary: Datafusion queries involving a column name that begins with a number produces unexpected results
Key: ARROW-10329
URL: https://issues.apache.org/jira/browse/ARROW-10329
Project: Apache Arrow
Issue Type: Bug
Components: Rust - DataFusion
Reporter: Morgan Cassels

This bug can be worked around by wrapping column names in quotes.

Example:

{{let query = "SELECT 16_20mph, 21_25mph FROM foo;"}}
{{let logical_plan = ctx.create_logical_plan(query)?;}}
{{logical_plan.schema().fields() }}now has fields: {{_20mph, _25mph}}

The resulting table produced by this query looks like:

||{{_20mph}}||{{_25mph}}||
|16|21|
|16|21|

Every row is identical, where the column value is equal to the initial number that appears in the column name.
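The workaround mentioned in the report (wrapping column names in quotes) could look roughly like the sketch below. It assumes double quotes, the standard SQL delimited-identifier syntax, since the report does not say which quote character was used. `quote_ident` and `build_select` are hypothetical helpers, not DataFusion APIs; the resulting string would be passed to `ctx.create_logical_plan` as in the report.

```rust
/// Wrap a column name in double quotes, doubling any embedded double quote.
/// Hypothetical helper; assumes standard SQL delimited identifiers.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Build a SELECT statement with every column name quoted.
/// The table name is inserted as given.
fn build_select(columns: &[&str], table: &str) -> String {
    let quoted: Vec<String> = columns.iter().map(|c| quote_ident(c)).collect();
    format!("SELECT {} FROM {};", quoted.join(", "), table)
}

fn main() {
    // Column names from the report that begin with a number.
    let query = build_select(&["16_20mph", "21_25mph"], "foo");
    println!("{}", query);
}
```

With the names quoted, the leading digits are no longer separated from the rest of the identifier, which is consistent with the symptom described above (a column named `_20mph` whose every value is 16).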
[jira] [Created] (ARROW-9696) [Rust] [Datafusion] nested binary expressions broken
Morgan Cassels created ARROW-9696:
-

Summary: [Rust] [Datafusion] nested binary expressions broken
Key: ARROW-9696
URL: https://issues.apache.org/jira/browse/ARROW-9696
Project: Apache Arrow
Issue Type: Bug
Components: Rust, Rust - DataFusion
Reporter: Morgan Cassels
Assignee: Morgan Cassels

Nested binary expressions were previously supported but were broken by the sqlparser upgrade.