[
https://issues.apache.org/jira/browse/ARROW-16642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alenka Frim updated ARROW-16642:
--------------------------------
Summary: [C++] An Error Occured While Reading Parquet File Using C++ -
GetRecordBatchReader -Corrupt snappy compressed data. (was: An Error Occured
While Reading Parquet File Using C++ - GetRecordBatchReader -Corrupt snappy
compressed data. )
> [C++] An Error Occured While Reading Parquet File Using C++ -
> GetRecordBatchReader -Corrupt snappy compressed data.
> --------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-16642
> URL: https://issues.apache.org/jira/browse/ARROW-16642
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 8.0.0
> Environment: C++, arrow 7.0.0, snappy 1.1.8, arrow 8.0.0,
> pyarrow 7.0.0, ubuntu 9.4.0, python3.8
> Reporter: yurikoomiga
> Priority: Major
> Labels: pull-request-available
> Attachments: test_std_02.py
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Hi all,
> I am reading a Parquet file with Arrow as follows:
> ```
> auto st = parquet::arrow::FileReader::Make(
>     arrow::default_memory_pool(),
>     parquet::ParquetFileReader::Open(_parquet, _properties),
>     &_reader);
> arrow::Status status = _reader->GetRecordBatchReader(
>     {_current_group}, _parquet_column_ids, &_rb_batch);
> _reader->set_batch_size(65536);
> _reader->set_use_threads(true);
> status = _rb_batch->ReadNext(&_batch);
> ```
> The status is not OK, and the following error occurs:
> `IOError: Corrupt snappy compressed data.`
> When I comment out the statement `_reader->set_use_threads(true);`, the
> program runs normally and I can read the Parquet file without problems.
> The error only occurs when I read multiple columns with
> `_reader->set_use_threads(true);` set. Reading a single column does not
> trigger the error.
> The test Parquet file was created with pyarrow. I use only 1 row group,
> and it has 3000000 records.
> The Parquet file has 20 columns, including int and string types.
> You can create a test Parquet file using the attached Python script.
> In my case, I read the columns at indexes 0, 1, 2, 3, 4, 5, 6.
> Reading the file: C++, arrow 7.0.0, snappy 1.1.8
> Writing the file: python3.8, pyarrow 7.0.0
> Looking forward to your reply.
> Thank you!
--
This message was sent by Atlassian Jira
(v8.20.7#820007)