Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test ......................................................................
Patch Set 2: (25 comments) http://gerrit.cloudera.org:8080/#/c/5093/2//COMMIT_MSG Commit Message: Line 8: > It would help to be more expansive in the commit message here. How would so Done PS2, Line 11: - Update impyla version in order to be able to have access to query : error text for DML queries. > Did you regression test the Impyla update? We have system tests in Kudu tha I plan to send this through GVO, so it will be tested there. http://gerrit.cloudera.org:8080/#/c/5093/2/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: PS2, Line 279: self._select_probability = 0.5 > Nit: I think this should default to None, because the interface to run_quer Done PS2, Line 331: queries have completed. 'select_probabilty' > 1. spelling: probability Done PS2, Line 725: # set_up_sql accoplishes this task. > spelling: accomplishes Done PS2, Line 735: # If we run this query on a table in initial state, the table remains unchanged if : # this is False. (For example running a query like : # "upsert into lineitem select * from lineitem_original" leaves lineitem unmodified if : # it is in original state.) : self.modifies_table = False > Be a little more explicit here. This claim presumes lineitem's data was alr Yes, insert and upsert should have modifies_table set to false. After insert and upsert queries are executed, the table will remain unmodified. Imagine we have 2 identical tables with identical data: lineitem and lineitem_original. Running a query like: upsert into lineitem select * from lineitem original. This query does not change the table at all. After the query is executed, the table is identical to how it was before the query. PS2, Line 740: # Type of query. Can have the following values: SELECT, COMPUTE_STATS, INSERT, UPDATE, : # UPSERT, DELETE. : self.query_type = 'SELECT' > Non-blocking suggestion: instead of raw strings create a class that has the Good suggestion. Done. PS2, Line 794: if run_set_up and query.set_up_sql: > If only one of these is true, is that a programming error? Do we need to as No, we only run set up during the binary search phase. Added comment to the docstring. PS2, Line 1061: # TODO: > It would be good if TODOs were bound to Jiras. Can you file one, please? Done PS2, Line 1411: _orginal > spelling: original Done PS2, Line 1417: set(table.name for table in tables): > What is the purpose of creating a set here? For performance, it's faster to check if an element exists in a set than a list. It doesn't really matter though since the list of table names is going to be small anyways. Should I leave it as is? PS2, Line 1427: """Generate insert, upsert, update, delete DML statements. : : For each table in the database that cursor is connected to, create several queries, : one for each mod value in 'dml_mod_values'. This value controls which rows will be : affected. The generated queries assume that for each table in the database, there : exists a table with a '_original' suffix that has unmodified, for example, tpch data. : """ > Non-blocking commment. TL;DR: Move L1429-L1432 3 spaces to the left. I agree, I don't like this type of indentation as well. Done. PS2, Line 1434: tables = [cursor.describe_table(t) for t in cursor.list_table_names()] > Save an indent level after L1437 and change this expression to only include Done PS2, Line 1443: numberical > just say integer, since decimals etc. are also numerical Done PS2, Line 1449: insert_query.modifies_table = False > Add a comment explaining why this is False. It seems to me an INSERT absolu Yes an insert can modify the table, but not if the table is in the original state. If let's say lineitem and lineitem_original are identical, then a query like insert into table lineitem select * from lineitem_original Where x should remain lineitem unchanged. PS2, Line 1464: for col in table.cols if not col.is_primary_key > You could use table.updatable_column_names instead. Not sure, is it always going to be the case that a column is updatable if and only if it's not a primary key? PS2, Line 1466: "UPDATE a SET {3} FROM {0} a JOIN {0}_original b " : "ON a.{1} = b.{1} + 1 WHERE a.{1} % {2} = 0").format( : table.name, primary_key.name, mod_value, update_list) > fairly complex format string. Please use keys in places of numbers. You ca Done PS2, Line 1471: upsert_query.modifies_table = False > Add a comment explaining why this is False. It seems to me an UPSERT absolu Done PS2, Line 1505: return result > Should you assert that len(result) > 0 ? The continues on L1439 and L1445, Done PS2, Line 1541: search_pattern = r"\.(.*) \(" > It would be safer to include "CREATE TABLE" in this pattern. Done PS2, Line 1613: parser.add_argument( : "--mem-limit-padding-abs", type=int, default=0, : help="Pad query mem limits found by solo execution with this value (in megabytes)" : " running concurrently. After padding queries will not be expected to fail" : " due to mem limit exceeded.") > Please document a scenario in which someone would want to use this. It's hard to say when someone would want to use this. It's just a knob that can be adjusted if necessary. I updated the comment a little. PS2, Line 1623: "--reset-database" > Nit; this resets all databases, so it should be "--reset-databases" Done PS2, Line 1628: parser.add_argument( : "--dml-mod-values", nargs="+", type=int, default=[11], : help="List of mod values to use for the DML queries.") > Please provide guidance on what this means and how someone would use it. Done PS2, Line 1811: if args.reset_database: : for database in set(query.db_name for query in queries): : with impala.cursor(db_name=database) as cursor: : reset_database(cursor) > Won't this reset *all* databases, not just ones on which you want to run DM Yes, but this might be the behavior we desire. For example, we just finished a DML run on the tpch database. Now we want to run a standard job with only select queries + correctness verification). We need to reset the database before starting the run. PS2, Line 1898: if args.reset_database: : for database in set(query.db_name for query in queries): : with impala.cursor(db_name=database) as cursor: : reset_database(cursor) > Can you add a comment describing why this is called again? Done -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Michael Brown <mi...@cloudera.com> Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-HasComments: Yes