[ https://issues.apache.org/jira/browse/ARROW-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Morgan Cassels updated ARROW-13120:
-----------------------------------
    Description: 
This issue occurs only when the batch size is smaller than the number of rows in the table. 
The attached Parquet file `test.parquet` has 31430 rows and a single column 
containing lists of strings. The issue does not appear to occur for Parquet files 
with integer list columns.
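Reading 31430 rows with a batch size of 1024 means the reader has to return more than one batch, and every batch after the first starts partway through the column. The short plain-Rust sketch below only does that arithmetic. `batch_ranges` is a hypothetical helper for illustration, not part of the parquet or arrow crates, and it assumes every batch except the last one is full.

```rust
/// Hypothetical helper (not part of the parquet/arrow crates): returns the
/// (start_row, len) of each record batch when `total_rows` rows are read
/// with the given `batch_size`, assuming all batches but the last are full.
fn batch_ranges(total_rows: usize, batch_size: usize) -> Vec<(usize, usize)> {
    assert!(batch_size > 0, "batch size must be positive");
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total_rows {
        // The final batch holds whatever rows are left over.
        let len = batch_size.min(total_rows - start);
        ranges.push((start, len));
        start += len;
    }
    ranges
}

fn main() {
    let ranges = batch_ranges(31430, 1024);
    // 30 full batches of 1024 rows plus one final batch of 710 rows.
    println!("{} batches, last = {:?}", ranges.len(), ranges.last().unwrap());
}
```

With a batch size of at least 31430, `batch_ranges` returns a single range starting at row 0, which matches the report that the failure only appears once the table is split across batches.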

  
{code:java}
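#[test]
fn failing_test() {
    // Test helper that opens the attached test.parquet.
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
    let mut record_batches = Vec::new();
    // Batch size 1024 is smaller than the 31430 rows in the file,
    // so the reader has to return several batches.
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
    // Panics during iteration with "offsets do not start at zero".
    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}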
{code}
 
{code:java}
---- arrow::arrow_reader::tests::failing_test stdout ----
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected infallable creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("offsets do not start at zero")', arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}
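The panic comes from the `GenericListArray` conversion rejecting list offsets whose first entry is not zero. In the Arrow list layout, a list array with n entries carries n + 1 offsets into its child values array. One plausible reading of this report, not confirmed above, is that for the second and later batches the reader builds list offsets that continue from where the previous batch ended instead of restarting at zero. The sketch below shows, in plain Rust with no arrow dependency, what rebasing one batch's offsets to start at zero looks like. `rebase_offsets` is a hypothetical helper, not the crate's API and not its actual fix.

```rust
/// Hypothetical helper (not the parquet crate's fix): given the list offsets
/// for a whole column and the range of list slots [start, start + len) that
/// belong to one batch, return that batch's offsets rebased to start at zero.
fn rebase_offsets(column_offsets: &[i32], start: usize, len: usize) -> Vec<i32> {
    // A batch of `len` lists needs `len + 1` offsets; panics if out of bounds.
    let window = &column_offsets[start..=start + len];
    let base = window[0];
    // Shift every offset so the batch's first list begins at value 0.
    window.iter().map(|&o| o - base).collect()
}

fn main() {
    // Three lists of strings with 2, 0 and 3 elements respectively.
    let column_offsets = [0, 2, 2, 5];
    // A "batch" holding lists 1 and 2; unrebased, its offsets are [2, 2, 5].
    let batch = rebase_offsets(&column_offsets, 1, 2);
    println!("{:?}", batch); // [0, 0, 3]
}
```

The child values for that batch would need to be sliced to match (here, values 2..5 of the column), so the rebased offsets index into the batch's own values rather than the whole column's.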
 

 


> [Rust][Parquet] Cannot read multiple batches from parquet with string list column
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-13120
>                 URL: https://issues.apache.org/jira/browse/ARROW-13120
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Morgan Cassels
>            Priority: Major
>         Attachments: test.parquet
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
