[Impala-ASF-CR] IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

2022-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18233 )

Change subject: IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading
..

IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

When loading testdata for TPC-H/TPC-DS, we first run a preload script to
generate local data, and then upload them to HDFS to be used by Hive.
The preload script currently always generates the data, which is
time-consuming in large scale factors.

This patch modifies the preload scripts to check if the last run
succeeded, and reuse the data if it does. Otherwise, generate the data
and leave a success marker in the data directory.

Tests:
 - Verified the scripts locally.

Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
Reviewed-on: http://gerrit.cloudera.org:8080/18233
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M testdata/datasets/tpcds/preload
M testdata/datasets/tpch/preload
2 files changed, 14 insertions(+), 0 deletions(-)
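
The diff itself is not part of this notification. As a rough illustration only, the marker-file pattern described in the commit message could look something like the bash sketch below. The directory path, the `_SUCCESS` marker name, and the `generate_data` and `preload` function names are all assumptions made for this example and are not taken from the actual patch.

```shell
#!/bin/bash
# Hypothetical sketch of a "reuse local data" preload step.
# None of the names below come from the real Impala preload scripts.
set -euo pipefail

# Directory holding the locally generated data (assumed location).
DATA_DIR="${DATA_DIR:-/tmp/tpch_sketch_data}"
# Assumed marker file; it is written only after generation has finished.
SUCCESS_MARKER="${DATA_DIR}/_SUCCESS"

# Stand-in for the real data generator; it just writes one tiny file.
generate_data() {
  mkdir -p "${DATA_DIR}"
  echo "1|sample row" > "${DATA_DIR}/lineitem.tbl"
}

preload() {
  # If the previous run left a marker, skip the expensive generation.
  if [[ -f "${SUCCESS_MARKER}" ]]; then
    echo "Reusing existing data in ${DATA_DIR}"
    return 0
  fi
  generate_data
  # Write the marker last so an interrupted run is not mistaken for success.
  touch "${SUCCESS_MARKER}"
  echo "Generated data in ${DATA_DIR}"
}

preload
```

Because `set -e` aborts the script when generation fails, the marker is only ever created after a complete run. Deleting the marker (or the whole data directory) forces regeneration on the next run.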

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
Gerrit-Change-Number: 18233
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

2022-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18233 )

Change subject: IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading
..


Patch Set 3: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
Gerrit-Change-Number: 18233
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 17 Feb 2022 20:28:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

2022-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18233 )

Change subject: IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7852/ 
DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
Gerrit-Change-Number: 18233
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 17 Feb 2022 13:48:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

2022-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18233 )

Change subject: IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading
..


Patch Set 3: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
Gerrit-Change-Number: 18233
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 17 Feb 2022 13:48:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

2022-02-17 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18233 )

Change subject: IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading
..


Patch Set 2: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
Gerrit-Change-Number: 18233
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 17 Feb 2022 08:40:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

2022-02-16 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18233 )

Change subject: IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18233/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18233/1//COMMIT_MSG@11
PS1, Line 11:  generates t
> nit: generates
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
Gerrit-Change-Number: 18233
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 17 Feb 2022 01:03:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

2022-02-16 Thread Quanlong Huang (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18233

to look at the new patch set (#2).

Change subject: IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading
..

IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

When loading testdata for TPC-H/TPC-DS, we first run a preload script to
generate local data, and then upload them to HDFS to be used by Hive.
The preload script currently always generates the data, which is
time-consuming in large scale factors.

This patch modifies the preload scripts to check if the last run
succeeded, and reuse the data if it does. Otherwise, generate the data
and leave a success marker in the data directory.

Tests:
 - Verified the scripts locally.

Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
---
M testdata/datasets/tpcds/preload
M testdata/datasets/tpch/preload
2 files changed, 14 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/18233/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
Gerrit-Change-Number: 18233
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

2022-02-16 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18233 )

Change subject: IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18233/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18233/1//COMMIT_MSG@11
PS1, Line 11:  re-generate
nit: generates




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
Gerrit-Change-Number: 18233
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 16 Feb 2022 18:27:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

2022-02-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18233 )

Change subject: IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10158/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
Gerrit-Change-Number: 18233
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 15 Feb 2022 08:50:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

2022-02-15 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18233 )


Change subject: IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading
..

IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

When loading testdata for TPC-H/TPC-DS, we first run a preload script to
generate local data, and then upload them to HDFS to be used by Hive.
The preload script currently always re-generate the data, which is
time-consuming in large scale factors.

This patch modifies the preload scripts to check if the last run
succeeded, and reuse the data if it does. Otherwise, generate the data
and leave a success marker in the data directory.

Tests:
 - Verified the scripts locally.

Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
---
M testdata/datasets/tpcds/preload
M testdata/datasets/tpch/preload
2 files changed, 14 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/18233/1

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
Gerrit-Change-Number: 18233
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang