Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3739: Enable stress tests on Kudu ......................................................................
Patch Set 1: (11 comments) http://gerrit.cloudera.org:8080/#/c/4327/1//COMMIT_MSG Commit Message: PS1, Line 13: D ds Line 18: 3. Created SQL files with TPC-DS queries to be executed in Kudu. (Comment that TPC-H already exists?) http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/bin/load-tpc-kudu.py File testdata/bin/load-tpc-kudu.py: PS1, Line 50: with IIRC this syntax breaks on py 2.4, which we shouldn't be using for these tests but i think still barfs if this file gets referenced by other code. This might be fine as long as we're confident this will never get pulled into other modules that end up run as part of normal tests. PS1, Line 96: 'tpch', 'tpcds', 'TPCDS', 'TPCH' are both cases necessary? PS1, Line 100: parser.add_argument("-b", "--buckets", default="9", : help="Number of buckets to partition Kudu tables (only for hash-based).") Seems fine for now, but maybe we could have #buckets as a multiple of the #nodes (which I think we can get from one of the test infra classes), at least for the big tables. Maybe dimension tables are always a constant. http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/datasets/tpcds/tpcds_kudu_template.sql File testdata/datasets/tpcds/tpcds_kudu_template.sql: Line 1: ---- Template SQL statements to create and load TPCDS tables in can you explain a bit about how you picked the PKs? While we probably need to think more about modeling these later, I think we may need to have a way to set the #buckets for big tables and some other value for smaller tables. Maybe we can just have 2 variables for now? I think it matters because it directly affects the number of scan ranges we create, and only having 9 ranges for the bigger tables isn't enough on an 8 node cluster to expose some real issues (I'd think). PS1, Line 2: KUDU. prev line http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/datasets/tpch/tpch_kudu_template.sql File testdata/datasets/tpch/tpch_kudu_template.sql: Line 1: ---- Template SQL statements to create and load TPCH tables in remove the tpch tables in tpch_schema_template.sql? PS1, Line 2: KUDU prev line http://gerrit.cloudera.org:8080/#/c/4327/1/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: PS1, Line 900: engine='' I wasn't sure what engine meant until I looked at the usage. I'm wondering if there's another way to do this, maybe after I look at the file structure I'll have an idea. If nothing else, please comment this. PS1, Line 1382: if not args.tpcds_db and not args.tpch_db and not args.random_db \ : and not args.tpch_nested_db and not args.tpch_kudu_db \ : and not args.tpcds_kudu_db and not args.query_file_path: : raise Exception("At least one of --tpcds-db, --tpch-db, --tpch-kudu-db," : "--tpcds-kudu-db, --tpch-nested-db, --random-db, --query-file-path is required") Hmm cumbersome... Maybe someone with more python experience knows a better way to handle these (opt/argparse feature?) -- To view, visit http://gerrit.cloudera.org:8080/4327 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-HasComments: Yes
