Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
......................................................................


Patch Set 1:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4327/1//COMMIT_MSG
Commit Message:

PS1, Line 13: D
ds


Line 18: 3. Created SQL files with TPC-DS queries to be executed in Kudu.
(Comment that TPC-H already exists?)


http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/bin/load-tpc-kudu.py
File testdata/bin/load-tpc-kudu.py:

PS1, Line 50: with
IIRC this syntax breaks on py 2.4, which we shouldn't be using for these tests 
but i think still barfs if this file gets referenced by other code. This might 
be fine as long as we're confident this will never get pulled into other 
modules that end up run as part of normal tests.


PS1, Line 96: 'tpch', 'tpcds', 'TPCDS', 'TPCH'
are both cases necessary?


PS1, Line 100:   parser.add_argument("-b", "--buckets", default="9",
             :       help="Number of buckets to partition Kudu tables (only for 
hash-based).")
Seems fine for now, but maybe we could have #buckets as a multiple of the 
#nodes (which I think we can get from one of the test infra classes), at least 
for the big tables. Maybe dimension tables are always a constant.


http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

Line 1: ---- Template SQL statements to create and load TPCDS tables in
can you explain a bit about how you picked the PKs? While we probably need to 
think more about modeling these later, I think we may need to have a way to set 
the #buckets for big tables and some other value for smaller tables. Maybe we 
can just have 2 variables for now? I think it matters because it directly 
affects the number of scan ranges we create, and only having 9 ranges for the 
bigger tables isn't enough on an 8 node cluster to expose some real issues (I'd 
think).


PS1, Line 2: KUDU.
prev line


http://gerrit.cloudera.org:8080/#/c/4327/1/testdata/datasets/tpch/tpch_kudu_template.sql
File testdata/datasets/tpch/tpch_kudu_template.sql:

Line 1: ---- Template SQL statements to create and load TPCH tables in
remove the tpch tables in tpch_schema_template.sql?


PS1, Line 2: KUDU
prev line


http://gerrit.cloudera.org:8080/#/c/4327/1/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS1, Line 900: engine=''
I wasn't sure what engine meant until I looked at the usage. I'm wondering if 
there's another way to do this, maybe after I look at the file structure I'll 
have an idea. If nothing else, please comment this.


PS1, Line 1382:   if not args.tpcds_db and not args.tpch_db and not 
args.random_db \
              :       and not args.tpch_nested_db and not args.tpch_kudu_db \
              :       and not args.tpcds_kudu_db and not args.query_file_path:
              :     raise Exception("At least one of --tpcds-db, --tpch-db, 
--tpch-kudu-db,"
              :         "--tpcds-kudu-db, --tpch-nested-db, --random-db, 
--query-file-path is required")
Hmm cumbersome... Maybe someone with more python experience knows a better way 
to handle these (opt/argparse feature?)


-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-HasComments: Yes

Reply via email to