alamb edited a comment on pull request #8007:
URL: https://github.com/apache/arrow/pull/8007#issuecomment-676450197
As the code that I removed in this PR added to support empty pages in
8a61570a8bd64ae6f6e47ce8efd2287c3c2feded (but it didn't seem to have any
tests), I was worried that this deletion would cause some sort of reversion in
behavior. To ensure it didn't, I added a test for empty pages in commit
cfdea8349a8f41229a4c8fb2cfddec3bb11df112
To try and ensure that the test covers the empty pages case, I and ran the
new test on commit ed1f771dccdde623ce85e212eccb2b573185c461 (the one right
before 8a61570a8bd64ae6f6e47ce8efd2287c3c2feded) and the test fails, thus
leading me to conclude that the code removed in this PR is unnecessary and zero
page parquet files are still supported.
Log of what I did:
Get commit right before support for empty pages:
```
alamb@MacBook-Pro rust % git checkout
ed1f771dccdde623ce85e212eccb2b573185c461
git checkout ed1f771dccdde623ce85e212eccb2b573185c461
M testing
Note: switching to 'ed1f771dccdde623ce85e212eccb2b573185c461'.
...
HEAD is now at ed1f771dc ARROW-8717: [CI][Packaging] Add build dependency on
boost to homebrew
```
Apply new test for no pages:
```
alamb@MacBook-Pro rust % git cherry-pick
cfdea8349a8f41229a4c8fb2cfddec3bb11df112
git cherry-pick cfdea8349a8f41229a4c8fb2cfddec3bb11df112
Auto-merging rust/parquet/src/arrow/array_reader.rs
[detached HEAD ab3bcdd8f] Add a test for reading empty pages
Date: Wed Aug 19 10:16:31 2020 -0400
1 file changed, 57 insertions(+), 1 deletion(-)
```
Then I had to edit the test a little to use `.len` instead of `.is_empty`
(which did not exist in the old commit):
```
diff --git a/rust/parquet/src/arrow/array_reader.rs
b/rust/parquet/src/arrow/array_reader.rs
index d70619518..54c929c05 100644
--- a/rust/parquet/src/arrow/array_reader.rs
+++ b/rust/parquet/src/arrow/array_reader.rs
@@ -967,7 +967,7 @@ mod tests {
// expect no values to be read
let array = array_reader.next_batch(50).unwrap();
- assert!(array.is_empty());
+ assert!(array.len() == 0);
}
#[test]
```
And then I ran the test, and it fails:
```
cd arrow/rust/parquet &&
PARQUET_TEST_DATA=`pwd`/../../cpp/submodules/parquet-testing/data
ARROW_TEST_DATA=`pwd`/../../testing/data cargo test
...
---- arrow::array_reader::tests::test_primitive_array_reader_empty_pages
stdout ----
thread 'arrow::array_reader::tests::test_primitive_array_reader_empty_pages'
panicked at 'called `Result::unwrap()` on an `Err` value: General("Can\'t build
array without pages!")', parquet/src/arrow/array_reader.rs:962:32
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
FYI @houqp and @paddyhoran
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]