Attila Jeges has uploaded a new patch set (#3).

Change subject: IMPALA-3079: Fix sequence file writer
......................................................................
IMPALA-3079: Fix sequence file writer

Before the fix, sequence file writer produced corrupt files in some
cases.

Steps to reproduce:

SET ALLOW_UNSUPPORTED_FORMATS=1;
create table store_sales_seq_snap like tpcds_parquet.store_sales
stored as SEQUENCEFILE;
insert into store_sales_seq_snap partition(ss_sold_date_sk)
select * from tpcds_parquet.store_sales
where ss_sold_date_sk between 2450816 and 2451200;

The insert statement produces a corrupt file that cannot be read back.

This change fixes:
- The implementation of zero-compressed encoding in ReadWriteUtil
  class.
- The calculation of block sizes in SnappyBlockCompressor class.
- Creating record/block compressed sequence files in
  HdfsSequenceTableWriter class.

Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
---
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/util/compress.cc
M be/src/util/decompress-test.cc
M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
M tests/query_test/test_compressed_formats.py
8 files changed, 385 insertions(+), 77 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/3
--
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
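
For readers unfamiliar with the zero-compressed encoding named in the commit message: it is the variable-length integer format that Hadoop defines in WritableUtils.writeVLong. The sketch below illustrates that format in Python. It is not the code from read-write-util.h, and the function names write_vlong and read_vlong are hypothetical, chosen only for this illustration.

```python
from typing import Tuple


def write_vlong(value: int) -> bytes:
    """Encode a signed 64-bit integer in Hadoop's zero-compressed format."""
    # Values in [-112, 127] are stored directly in a single byte.
    if -112 <= value <= 127:
        return bytes([value & 0xFF])
    # Otherwise the first byte is a marker encoding the sign and the number
    # of payload bytes: -113..-120 for positive, -121..-128 for negative.
    marker = -112
    if value < 0:
        value = ~value  # store the one's complement of negative values
        marker = -120
    num_bytes = (value.bit_length() + 7) // 8
    marker -= num_bytes
    # The payload follows the marker in big-endian order.
    return bytes([marker & 0xFF]) + value.to_bytes(num_bytes, "big")


def read_vlong(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one value starting at pos; return (value, next position)."""
    first = buf[pos] - 256 if buf[pos] > 127 else buf[pos]  # as signed byte
    if first >= -112:
        return first, pos + 1
    negative = first < -120
    num_bytes = (-120 - first) if negative else (-112 - first)
    magnitude = int.from_bytes(buf[pos + 1:pos + 1 + num_bytes], "big")
    return (~magnitude if negative else magnitude), pos + 1 + num_bytes
```

An off-by-one in the marker byte or in the payload length shifts every field that follows in the file, which fits the symptom described above of a file that cannot be read back.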