Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................

Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4327/1//COMMIT_MSG
Commit Message:

PS1, Line 13: D
> ds
Done


http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/bin/load-tpc-kudu.py
File testdata/bin/load-tpc-kudu.py:

PS1, Line 50: with
> IIRC this syntax breaks on py 2.4, which we shouldn't be using for these te
Hm, I've seen other scripts (e.g. load_nested.py) already using the same syntax. Maybe Michael has a recommendation here.


PS1, Line 96: 'tpch', 'tpcds', 'TPCDS', 'TPCH'
> are both cases necessary?
I just added it for usability in case someone decides to specify the workload in upper case. Removed.


PS1, Line 100: parser.add_argument("-b", "--buckets", default="9",
             :       help="Number of buckets to partition Kudu tables (only for hash-based).")
> Seems fine for now, but maybe we could have #buckets as a multiple of the #
Left a TODO for now, so we can revisit later depending on how we want to test this.


http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

Line 1: ---- Template SQL statements to create and load TPCDS tables in
> can you explain a bit about how you picked the PKs? While we probably need
Good points. In general, I followed the spec in setting the PK columns. Added a TODO to have two different variables for buckets, one for fact tables and one for dimension tables.


PS1, Line 2: KUDU.
> prev line
Done


http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/datasets/tpch/tpch_kudu_template.sql
File testdata/datasets/tpch/tpch_kudu_template.sql:

Line 1: ---- Template SQL statements to create and load TPCH tables in
> remove the tpch tables in tpch_schema_template.sql?
Added a TODO to do this in a follow-up patch.


PS1, Line 2: KUDU
> prev line
Done


http://gerrit.cloudera.org:8080/#/c/4327/1/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS1, Line 900: engine=''
> I wasn't sure what engine meant until I looked at the usage. I'm wondering
Yeah, I over-generalized this one. Changed it to something more explicit. Done


PS1, Line 1382: if not args.tpcds_db and not args.tpch_db and not args.random_db \
              :     and not args.tpch_nested_db and not args.tpch_kudu_db \
              :     and not args.tpcds_kudu_db and not args.query_file_path:
              :   raise Exception("At least one of --tpcds-db, --tpch-db, --tpch-kudu-db,"
              :       "--tpcds-kudu-db, --tpch-nested-db, --random-db, --query-file-path is required")
> Hmm cumbersome... Maybe someone with more python experience knows a better
Hm, maybe Michael has a suggestion here.


-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-HasComments: Yes
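
Regarding the cumbersome check at PS1, Line 1382 of tests/stress/concurrent_select.py: one possible simplification, offered only as a sketch and not as what the patch does, is to keep the attribute names in a list and test them with `any()`. The attribute names come from the quoted check; `DB_ARGS` and `require_one_of` are hypothetical names introduced here for illustration, and the flag spellings in the message are derived from the attribute names.

```python
import argparse

# Attribute names taken from the check quoted at PS1, Line 1382.
# DB_ARGS itself is a hypothetical name, not part of the patch.
DB_ARGS = ["tpcds_db", "tpch_db", "tpch_kudu_db", "tpcds_kudu_db",
           "tpch_nested_db", "random_db", "query_file_path"]


def require_one_of(args, names):
  """Hypothetical helper: raise unless at least one named attribute is set."""
  if not any(getattr(args, name, None) for name in names):
    # Derive the flag spelling from the attribute name, e.g. tpch_db -> --tpch-db.
    flags = ", ".join("--" + name.replace("_", "-") for name in names)
    raise Exception("At least one of %s is required" % flags)


if __name__ == "__main__":
  parser = argparse.ArgumentParser()
  for name in DB_ARGS:
    parser.add_argument("--" + name.replace("_", "-"))
  # Passes silently because one of the options is given.
  require_one_of(parser.parse_args(["--tpch-kudu-db", "tpch_kudu"]), DB_ARGS)
```

Adding another database option would then only require a new entry in the list, and the error message stays in sync with the check.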