Alex Behm has posted comments on this change.

Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress 
test
......................................................................


Patch Set 1:

(16 comments)

Looks like it's going in the right direction.

Most comments are questions.

Might be good to outline the flow of a new Kudu stress run so it's easier to 
follow for reviewers. Maybe in a top-level comment inside concurrent_select.py

http://gerrit.cloudera.org:8080/#/c/5093/1/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 680:     # Query to run before worst_case_sql to get the worst case 
performance
What's worst_case_sql? Can you give a little more flavor to the comment? I'm 
not really sure why this setup query is needed. Also consider that the setup 
query itself may fail/crash which might complicate things.


Line 688:     self.options = {}
needs a comment


Line 691:     self.modifies_table = False
let's introduce a stmt_type enum that can have values like 
SELECT/INSERT/UPSERT/DELETE/COMPUTE_STATS etc.


Line 694:     self.MIN_INT = 10
var names are not very telling, what are these numbers?


Line 832:       raise
seems like we intentionally did not raise before?


Line 942:           LOG.debug("Result set empty for query with id %s",
Another possible use of stmt_type


Line 1239:     try:
we can use the stmt_type here to only run explain for those stmts that support 
it (i.e. anything except compute stats).


Line 1356: def clean_up_database(cursor):
reset_database?

"clean up" sounds like DROP DATABASE CASCADE


Line 1357:   LOG.info("Cleaning up {0} database".format(cursor.db_name))
needs comment


Line 1360:     if not table.name.endswith("_original"):
seems simpler to move the modified tables into a new database, then you don't 
need to check "_original" everywhere


Line 1371:       if len(table.primary_keys) < 1:
add comment: only run for Kudu tables which must have a primary key


Line 1373:       # For now we will not handle the case in a special way where 
several columns are a
Why? Seems easy enough to do.


Line 1376:       if primary_key.exact_type not in (TinyInt, SmallInt, BigInt):
Why this restriction?


Line 1383:       if table.name + "_original" in set(table.name for table in 
tables):
What's the motivation for this? We might be able to achieve the same thing in a 
different way. Having a query represent two queries seems somewhat abstraction 
breaking.

Also, for UPSERT it's ok of the keys already exist. I think wen to explicitly 
also test the path where PKs already exist.


Line 1433:       drop_query.sql = "DROP STATS {0}".format(table.name)
why drop? we're not checking the results anyway


Line 1438:       compute_query.name = "compute_stats_{0}".format(table.name)
can you add the mt_dop option as part of the name?


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to