David Knupp has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5177
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
......................................................................

IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales

This patch changes the way we load the tpcds.store_sales test data. Previously, we relied on a force_reload to build the table partitions from the data that had been copied to HDFS from the warehouse snapshot. This worked on the local mini-cluster, but for reasons not yet understood, it selectively duplicated data when run on a remote cluster.

This patch doesn't solve the mystery of why data duplication occurs on remote clusters, but it does resolve the immediate concern of loading test data by using Impala's recover partitions feature to automatically recognize the partitions in the HDFS directories. We just needed to add an ALTER TABLE store_sales RECOVER PARTITIONS statement to the tpcds schema template file.

Tested by dropping the tpcds table from a remote cluster setup, reloading the table, and running the tests in test_tpcds_queries.py. Tests that had been failing before are now passing.

Change-Id: Iaae97d1d44201aeeacacdd39adbae35753512950
---
M testdata/datasets/tpcds/tpcds_schema_template.sql
1 file changed, 2 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/5177/1

--
To view, visit http://gerrit.cloudera.org:8080/5177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaae97d1d44201aeeacacdd39adbae35753512950
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp <dkn...@cloudera.com>