Juan Yu has posted comments on this change.

Change subject: IMPALA-3038: Add multistream gzip/bzip2 test coverage
......................................................................
Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2543/7/be/src/util/decompress-test.cc
File be/src/util/decompress-test.cc:

Line 255:   // Repeatedly pick random-size input data(~1MB), compress it, then concatenate
> What does the ~1MB mean? I think this is why I got confused about L270 earl
I'm trying to simulate pbzip2, which splits a large input into smaller chunks, compresses them in parallel, and then concatenates the results.

I take raw_input (which is 1 MB), shorten it by a random amount to get a variable length, then compress it. I repeat this to get multiple streams:

    int len = RAW_INPUT_SIZE - (rand() % 1024);
    compressor->ProcessBlock(false, len, raw_input, &compressed_length, &compressed_stream);

The total compressed output is capped at 16 MB (the cap is chosen so that the output is larger than the 8 MB IO buffer). For the raw input I generate, the compression ratio is about 2:1, so I limit the total uncompressed input to no more than 32 MB.

Line 266:   EXPECT_OK(Codec::CreateCompressor(&mem_pool_, true, format, &compressor));
> Move created compressor above comment
Done

-- 
To view, visit http://gerrit.cloudera.org:8080/2543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b0e1971145dd457e71fc9c00ce7c06fff8dea88
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Juan Yu <j...@cloudera.com>
Gerrit-Reviewer: Juan Yu <j...@cloudera.com>
Gerrit-Reviewer: Skye Wanderman-Milne <s...@cloudera.com>
Gerrit-HasComments: Yes
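
For readers following the reply to Line 255, the stream-generation loop it describes might look roughly like the sketch below. This is not the code from the patch. `BuildMultiStream`, `MultiStream`, `CompressFn`, and the two `MAX_*` constants are hypothetical names introduced only for illustration, and the compressor is passed in as a callback rather than using Impala's `Codec` interface. The sizes (1 MB raw buffer, 32 MB uncompressed cap, 16 MB compressed cap) are taken from the comment above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <vector>

// Sizes quoted in the review reply; the MAX_* names are hypothetical.
constexpr int64_t RAW_INPUT_SIZE = 1024 * 1024;               // 1 MB raw buffer
constexpr int64_t MAX_RAW_TOTAL = 32LL * 1024 * 1024;         // uncompressed cap
constexpr int64_t MAX_COMPRESSED_TOTAL = 16LL * 1024 * 1024;  // compressed cap

// Stand-in for a codec call (an assumption, not Impala's API): compresses
// `len` bytes into one complete, self-contained stream.
using CompressFn =
    std::function<std::vector<uint8_t>(const uint8_t* data, int64_t len)>;

struct MultiStream {
  std::vector<uint8_t> compressed;  // all streams, concatenated back to back
  int64_t raw_total = 0;            // total uncompressed bytes consumed
  int num_streams = 0;
};

// Repeatedly compresses a slightly shortened copy of raw_input and appends
// each resulting stream, stopping before either size cap would be exceeded.
MultiStream BuildMultiStream(const std::vector<uint8_t>& raw_input,
                             const CompressFn& compress) {
  assert(raw_input.size() >= 1024);  // keeps every chunk length positive
  MultiStream out;
  while (true) {
    // Drop up to 1023 trailing bytes so the streams differ in length.
    int64_t len = static_cast<int64_t>(raw_input.size()) - (std::rand() % 1024);
    if (out.raw_total + len > MAX_RAW_TOTAL) break;
    std::vector<uint8_t> stream = compress(raw_input.data(), len);
    int64_t new_size =
        static_cast<int64_t>(out.compressed.size() + stream.size());
    if (new_size > MAX_COMPRESSED_TOTAL) break;
    out.compressed.insert(out.compressed.end(), stream.begin(), stream.end());
    out.raw_total += len;
    ++out.num_streams;
  }
  return out;
}
```

With a compressor that achieves about 2:1 on this input, the loop ends up producing roughly 32 streams and a concatenated buffer close to 16 MB, which is what lets the result span more than one 8 MB IO buffer.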