Zoltan Ivanfi has uploaded a new change for review.
http://gerrit.cloudera.org:8080/4835
Change subject: Remove seemingly incorrect DCHECK-s.
..
Remove seemingly incorrect DCHECK-s.
The first conditional DCHECK means that if a page's size is 0 then it's
compressed size is also 0. This, however, seems to be a false
assumption, as the compressed output may include metadata, such as
length or checksum.
The GZIP compressor, for example, states that an input of 0 bytes
requires 23 bytes when compressed. The Snappy compressor also mentions
storing length information in the compressed output. The compressed size
estimation in the LZ4 compressor also contains a constant part.
The "Last page might be empty" comment and the second DCHECK also seems
to be based on a false assumption. If a value doesn't fit on the current
page, AppendRow creates a possible bigger new page and tries writing the
data in the new page instead. This means that if the data is bigger than
the page size, then the current page is finalized and a new page is
added, even if the original page was empty. In other words, empty pages
can occur in the middle of the pages_ array as well, not only at the end
of it.
Change-Id: I52e1b1354e9ea056b49331e75e53759952a81b76
---
M be/src/exec/hdfs-parquet-table-writer.cc
1 file changed, 1 insertion(+), 3 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/4835/1
--
To view, visit http://gerrit.cloudera.org:8080/4835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I52e1b1354e9ea056b49331e75e53759952a81b76
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi