Alex Behm has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test ......................................................................
Patch Set 4: (13 comments) Looks pretty good! http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG Commit Message: Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML Seems ok for now, but from a perspective we should consider changing these mod values to %rows of the original table. I think ultimately a user wants to control the "size" of the DML operations and that is more naturally expressed via percent of rows imo. The mod values only seem meaningful if you also know the table size, and applying the same mod values to different tables seems strange. http://gerrit.cloudera.org:8080/#/c/5093/4/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: Line 368: self._select_probability = select_probability check that this is indeed a float between 0 and 1 Line 747: self.modifies_table = False This flag is pretty confusing. What do we use it for? Even in your example, the table is "modified" in the sense that the upsert changes the state of the table, but not the user-visible contents. Also, after we've deleted some rows, an insert/upsert may even change the user-visible contents. If we need this flag we should come up with a more accurate name. Line 1297: return add comment why we need to skip compute stats Line 1428: if table.name + "_original" in set(table.name for table in tables): What does it mean if this is not true? Maybe add a LOG.debug message to explain Line 1429: cursor.execute("SHOW CREATE TABLE " + table.name) Add a note or TODO that this will not work for certain types of Kudu tables, e.g., if they have range partitions. I assume that the Kudu tables used in our testing have simple hash partitioning based on the PK? Sounds like a potential coverage gap. On the other hand, for more complex partition schemes, this SHOW CREATE TABLE trick will not work. Line 1440: For each table in the database that cursor is connected to, create several queries, Mention the limitations of this function: 1. Only generates DML statements against Kudu tables, and ignores non-Kudu tables. 2. Requires that the type of the first column of the primary key is an integer type 3. ... Line 1443: exists a table with a '_original' suffix that has unmodified, for example, tpch data. that has unmodified -> that is never modified Line 1450: continue LOG.debug what is happening here Line 1452: if primary_key.exact_type not in (TinyInt, SmallInt, BigInt): what about Int? Line 1456: continue LOG.debug what is happening here because some of these limitations are not really obvious Line 1481: "UPDATE a SET {update_list} FROM {table_name} a JOIN {table_name}_original b " This might give you strange results for tables with multiple primary keys. Maybe we should explicitly limit this functionality to tables with a single primary-key column? Seems like a fine limitation for now, but better to be explicit about what works as expected, and what probably does not. Line 1559: cursor.execute("SHOW CREATE TABLE " + table_name) same deal with SHOW CREATE TABLE as above, add a note or TODO that this process will not work for some Kudu tables -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Michael Brown <mi...@cloudera.com> Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-HasComments: Yes