mapleFU opened a new pull request, #14351: URL: https://github.com/apache/arrow/pull/14351
This patch adds CRC support for writing and reading DATA_PAGE. CRC for dictionary pages and DATA_PAGE_V2 will be added in upcoming patches.

* [x] Implement CRC in writing DATA_PAGE
* [x] Implement CRC in reading DATA_PAGE
* [x] Add config options for writing page CRCs and for checking them on read
* [ ] Test DATA_PAGE with CRC; the tests may be borrowed from `parquet-mr`

I also have some questions. In thirdparty, Arrow imports `crc32c`, which is extracted from leveldb's CRC library. However, it seems that our standard uses crc32, which has a different magic number. So I use `boost/crc`, which is already used in Gandiva.

The default for `enable crc` in the parquet-mr writer is true, but here I use `false`, because setting it to true may slow down the writer.
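To illustrate the crc32 versus crc32c difference mentioned above, here is a minimal, dependency-free sketch. It is not the code in this patch (which uses `boost/crc`); the names `ComputeCrc`, `kCrc32Poly` and `kCrc32cPoly` are hypothetical and exist only for this example. It assumes the "magic number" refers to the generator polynomial, and uses the common reflected form with an initial value and final XOR of `0xFFFFFFFF`.

```cpp
#include <cstddef>
#include <cstdint>

// Reflected generator polynomials. These constant names are illustrative
// assumptions, not identifiers from Arrow or from this patch.
constexpr std::uint32_t kCrc32Poly = 0xEDB88320u;   // CRC-32 (as in gzip/zlib)
constexpr std::uint32_t kCrc32cPoly = 0x82F63B78u;  // CRC-32C (Castagnoli)

// Bit-at-a-time reflected CRC. Slow, but it makes clear that the two
// variants differ only in the polynomial they are given.
inline std::uint32_t ComputeCrc(const std::uint8_t* data, std::size_t len,
                                std::uint32_t poly) {
  std::uint32_t crc = 0xFFFFFFFFu;  // initial value
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right by one; apply the polynomial if the dropped bit was set.
      crc = (crc >> 1) ^ ((crc & 1u) ? poly : 0u);
    }
  }
  return crc ^ 0xFFFFFFFFu;  // final XOR
}
```

For the ASCII bytes `123456789`, `ComputeCrc` with `kCrc32Poly` gives `0xCBF43926` and with `kCrc32cPoly` gives `0xE3069283`, so a checksum written with one variant will not validate under the other. That is why the existing `crc32c` library cannot simply be reused here.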
