Hello Marcel Kornacker, Alex Behm, Tim Armstrong,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/4863

to look at the new patch set (#19).

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
......................................................................

IMPALA-2523: Make HdfsTableSink aware of clustered input

IMPALA-2521 introduced clustering for insert statements. This change
makes the HdfsTableSink aware of clustered inputs, so that partitions
are opened, written, and closed one by one.

This change also adds/modifies tests in several ways:

- clustered insert tests switch from selecting all rows from
  alltypessmall to alltypes. Together with varying settings for
  batch_size, this results in a larger number of row batches being
  written.
- clustered insert tests select from alltypes instead of
  functional.alltypes to make sure we also select from various input
  formats.
- clustered insert tests have been added to select from alltypestiny to
  create inserts with 1 and 2 rows per partition respectively.
- exhaustive insert tests now use different values for batch_size: 1,
  16, 0 (meaning default, 1024). This is limited to uncompressed
  parquet files, to maintain a reasonable runtime. On my machine
  execution of test.insert took 1778 seconds, compared to 1002 seconds
  with just the default row batch size.
- There is additional testing in test_insert_behaviour.py to make sure
  that insertion over several row batches only creates one file per
  partition.
- It renames the test_insert method to make it unique in the file and
  allow for effective filtering with -k.
- It adds tests to the Analyzer test suite.
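
For readers unfamiliar with the idea, the following is a minimal Python sketch of what "opened, written, and closed one by one" could look like for clustered input. It is not the HdfsTableSink implementation: the class ClusteredSink, the helper batches, and the (partition key, payload) row shape are all hypothetical names chosen for illustration. It assumes rows arrive grouped by partition key, and shows why one file per partition results regardless of the row batch size.

```python
from typing import Dict, Hashable, Iterator, List, Optional, Tuple

# Hypothetical row shape for this sketch: (partition key, payload).
Row = Tuple[Hashable, str]


class ClusteredSink:
    """Toy sink that keeps at most one partition "file" open at a time.

    Assumes the input is clustered, i.e. all rows of a partition are adjacent.
    """

    def __init__(self) -> None:
        # partition key -> list of files, each file being a list of payloads
        self.files: Dict[Hashable, List[List[str]]] = {}
        self._current_key: Optional[Hashable] = None
        self._current_file: Optional[List[str]] = None

    def write_batch(self, batch: List[Row]) -> None:
        for key, payload in batch:
            # A key change means the previous partition is complete.
            if self._current_file is None or key != self._current_key:
                self._close_current()
                self._open(key)
            self._current_file.append(payload)

    def _open(self, key: Hashable) -> None:
        self._current_key = key
        self._current_file = []
        self.files.setdefault(key, []).append(self._current_file)

    def _close_current(self) -> None:
        self._current_key = None
        self._current_file = None

    def close(self) -> None:
        self._close_current()


def batches(rows: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Split rows into consecutive batches of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def insert_clustered(rows: List[Row], batch_size: int) -> ClusteredSink:
    """Feed clustered rows to a fresh sink in batches and return the sink."""
    sink = ClusteredSink()
    for batch in batches(rows, batch_size):
        sink.write_batch(batch)
    sink.close()
    return sink
```

Because the open partition survives across calls to write_batch, a partition that spans several row batches still ends up in a single file, which is the property the test_insert_behaviour.py bullet above describes.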

Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/hdfs-text-table-writer.h
M bin/impala-config.sh
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M tests/query_test/test_insert.py
M tests/query_test/test_insert_behaviour.py
26 files changed, 457 insertions(+), 161 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4863/19
--
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <z...@cloudera.com>