[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test
Taras Bobrovytsky has posted comments on this change. Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress test .. Patch Set 1: (23 comments) http://gerrit.cloudera.org:8080/#/c/5093/1/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: Line 680: # Query to run before worst_case_sql to get the worst case performance > What's worst_case_sql? Can you give a little more flavor to the comment? I' I updated the patch significantly (based on our discussion), this is no longer here. Line 688: self.options = {} > needs a comment Done Line 691: self.modifies_table = False > let's introduce a stmt_type enum that can have values like SELECT/INSERT/UP Done Line 694: self.MIN_INT = 10 > var names are not very telling, what are these numbers? This not relevant any more. Removed. PS1, Line 695: self.MIN_INT = 20 > You've defined self.MIN_INT twice. Yeah, like I said it was a rough patch. These numbers are not needed any more. PS1, Line 698: def sql(self): > This whole set of methods could use docstrings to explain to someone roughl This method was removed. PS1, Line 701: return str(randint(10, 20)) > I would like for us to make our randomized tests more repeatable. To that e After talking to Alex, I removed any randomness from the Query class. Now 1 query object always represents a single sql statement. PS1, Line 713: result = re.sub(pattern, "10", self._sql) > Was the hard coding of 10 here meant to actually be self.MIN_INT? Yes exactly, that was the intention. However, this is no longer relevant. Line 832: raise > seems like we intentionally did not raise before? this was a mistake, removed. Line 942: LOG.debug("Result set empty for query with id %s", > Another possible use of stmt_type No longer relevant. Line 1239: try: > we can use the stmt_type here to only run explain for those stmts that supp Done Line 1356: def clean_up_database(cursor): > reset_database? Done Line 1357: LOG.info("Cleaning up {0} database".format(cursor.db_name)) > needs comment Added method docstring. Line 1360: if not table.name.endswith("_original"): > seems simpler to move the modified tables into a new database, then you don I am not sure if it's simpler. It seems to me it's simpler to have everything in a single database. We can maybe discuss this offline. Line 1371: if len(table.primary_keys) < 1: > add comment: only run for Kudu tables which must have a primary key Done Line 1373: # For now we will not handle the case in a special way where several columns are a > Why? Seems easy enough to do. I am not really sure if it's necessary to handle this in some special way. Do you have any suggestions? Line 1376: if primary_key.exact_type not in (TinyInt, SmallInt, BigInt): > Why this restriction? We want to be able to apply the "modulo operation". Added comment. Added comment. PS1, Line 1379: query.modifies_table = False > Why is this False? It's relative to the original state. For example, if we run a query If we have an unmodified lineitem item table, then we run an upsert query on it, it should remain unchanged. Added comment in class. Line 1383: if table.name + "_original" in set(table.name for table in tables): > What's the motivation for this? We might be able to achieve the same thing As discussed in person, I agree that 1 Query object should represent 1 query. Fixed. PS1, Line 1392: def generate_delete_queries(cursor): > Lots of duplicated logic here and in generate_upsert_queries(). I think add Rewrote to eliminate code duplication. Line 1433: drop_query.sql = "DROP STATS {0}".format(table.name) > why drop? we're not checking the results anyway Done Line 1438: compute_query.name = "compute_stats_{0}".format(table.name) > can you add the mt_dop option as part of the name? Done PS1, Line 1664: if query._sql in queries_with_runtime_info_by_db_and_sql[query.db_name]: > It doesn't seem right to be examining a private attribute from the outside. Yeah, I agree, this was a rough patch. Fixed now. -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress test .. Preview: IMPALA-4467: Add support for CRUD operations in stress test NOTE: This is a preview, I'm only looking for high level comments about the overall design of the feature. - Added support for upsert and delete statements. Update and insert still need to be done. - Added support for compute stats. - Added support for query options to queries. Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M tests/stress/concurrent_select.py M tests/util/calculation_util.py 2 files changed, 254 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/2 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test
Michael Brown has posted comments on this change. Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress test .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/5093/1/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: PS1, Line 695: self.MIN_INT = 20 You've defined self.MIN_INT twice. PS1, Line 698: def sql(self): This whole set of methods could use docstrings to explain to someone roughly how the interface works. It's more of a burden on the reader to understand the interface by reading how it is called. PS1, Line 701: return str(randint(10, 20)) I would like for us to make our randomized tests more repeatable. To that end, I think we should, at least in new code, do the following: - provide a seed command line option in all entry points. This could probably go into cli_options since I'd like to use this for the random query generator in new code - if the seed needs to be "randomly chosen" for more chaos, or is seeded based on the timer, that's fine. However we should print the seed chosen so the queries can be repeated. - no longer use the module-level functions and instead initialize a random.Random() object. See https://docs.python.org/2/library/random.html. PS1, Line 713: result = re.sub(pattern, "10", self._sql) Was the hard coding of 10 here meant to actually be self.MIN_INT? PS1, Line 1379: query.modifies_table = False Why is this False? PS1, Line 1392: def generate_delete_queries(cursor): Lots of duplicated logic here and in generate_upsert_queries(). I think addressing Alex's point about about breaking abstractions should also consider this: the code is mostly the same except for the "setup" SQL that's needed before. PS1, Line 1664: if query._sql in queries_with_runtime_info_by_db_and_sql[query.db_name]: It doesn't seem right to be examining a private attribute from the outside. We should use the public interface and the public interface should be providing what callers need. -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test
Alex Behm has posted comments on this change. Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress test .. Patch Set 1: (16 comments) Looks like it's going in the right direction. Most comments are questions. Might be good to outline the flow of a new Kudu stress run so it's easier to follow for reviewers. Maybe in a top-level comment inside concurrent_select.py http://gerrit.cloudera.org:8080/#/c/5093/1/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: Line 680: # Query to run before worst_case_sql to get the worst case performance What's worst_case_sql? Can you give a little more flavor to the comment? I'm not really sure why this setup query is needed. Also consider that the setup query itself may fail/crash which might complicate things. Line 688: self.options = {} needs a comment Line 691: self.modifies_table = False let's introduce a stmt_type enum that can have values like SELECT/INSERT/UPSERT/DELETE/COMPUTE_STATS etc. Line 694: self.MIN_INT = 10 var names are not very telling, what are these numbers? Line 832: raise seems like we intentionally did not raise before? Line 942: LOG.debug("Result set empty for query with id %s", Another possible use of stmt_type Line 1239: try: we can use the stmt_type here to only run explain for those stmts that support it (i.e. anything except compute stats). Line 1356: def clean_up_database(cursor): reset_database? "clean up" sounds like DROP DATABASE CASCADE Line 1357: LOG.info("Cleaning up {0} database".format(cursor.db_name)) needs comment Line 1360: if not table.name.endswith("_original"): seems simpler to move the modified tables into a new database, then you don't need to check "_original" everywhere Line 1371: if len(table.primary_keys) < 1: add comment: only run for Kudu tables which must have a primary key Line 1373: # For now we will not handle the case in a special way where several columns are a Why? Seems easy enough to do. Line 1376: if primary_key.exact_type not in (TinyInt, SmallInt, BigInt): Why this restriction? Line 1383: if table.name + "_original" in set(table.name for table in tables): What's the motivation for this? We might be able to achieve the same thing in a different way. Having a query represent two queries seems somewhat abstraction breaking. Also, for UPSERT it's ok of the keys already exist. I think wen to explicitly also test the path where PKs already exist. Line 1433: drop_query.sql = "DROP STATS {0}".format(table.name) why drop? we're not checking the results anyway Line 1438: compute_query.name = "compute_stats_{0}".format(table.name) can you add the mt_dop option as part of the name? -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test
Taras Bobrovytsky has uploaded a new change for review. http://gerrit.cloudera.org:8080/5093 Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress test .. Preview: IMPALA-4467: Add support for CRUD operations in stress test NOTE: This is a preview, I'm only looking for high level comments about the overall design of the feature. - Added support for upsert and delete statements. Update and insert still need to be done. - Added support for compute stats. - Added support for query options to queries. Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M tests/stress/concurrent_select.py M tests/util/calculation_util.py 2 files changed, 257 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/1 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test
Taras Bobrovytsky has uploaded a new change for review. http://gerrit.cloudera.org:8080/5092 Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress test .. Preview: IMPALA-4467: Add support for CRUD operations in stress test NOTE: This is a preview, I'm only looking for high level comments about the overall design of the feature. - Added support for upsert and delete statements. Update and insert still need to be done. - Added support for compute stats. - Added support for query options to queries. Change-Id: I4efd32e938d4d825840f43b747203f3cba2a3c2e --- M tests/stress/concurrent_select.py M tests/util/calculation_util.py 2 files changed, 248 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/5092/1 -- To view, visit http://gerrit.cloudera.org:8080/5092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I4efd32e938d4d825840f43b747203f3cba2a3c2e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test
Taras Bobrovytsky has abandoned this change. Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress test .. Abandoned Pushed the wrong code review -- To view, visit http://gerrit.cloudera.org:8080/5092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I4efd32e938d4d825840f43b747203f3cba2a3c2e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky