Alex Behm has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
......................................................................


Patch Set 4:

(13 comments)

Looks pretty good!

http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG
Commit Message:

Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML
Seems ok for now, but from a perspective we should consider changing these mod 
values to %rows of the original table. I think ultimately a user wants to 
control the "size" of the DML operations and that is more naturally expressed 
via percent of rows imo. The mod values only seem meaningful if you also know 
the table size, and applying the same mod values to different tables seems 
strange.


http://gerrit.cloudera.org:8080/#/c/5093/4/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 368:     self._select_probability = select_probability
check that this is indeed a float between 0 and 1


Line 747:     self.modifies_table = False
This flag is pretty confusing. What do we use it for?

Even in your example, the table is "modified" in the sense that the upsert 
changes the state of the table, but not the user-visible contents.

Also, after we've deleted some rows, an insert/upsert may even change the 
user-visible contents.

 If we need this flag we should come up with a more accurate name.


Line 1297:       return
add comment why we need to skip compute stats


Line 1428:     if table.name + "_original" in set(table.name for table in 
tables):
What does it mean if this is not true? Maybe add a LOG.debug message to explain


Line 1429:       cursor.execute("SHOW CREATE TABLE " + table.name)
Add a note or TODO that this will not work for certain types  of Kudu tables, 
e.g., if they have range partitions.

I assume that the Kudu tables used in our testing have simple hash partitioning 
based on the PK? Sounds like a potential coverage gap. On the other hand, for 
more complex partition schemes, this SHOW CREATE TABLE trick will not work.


Line 1440:   For each table in the database that cursor is connected to, create 
several queries,
Mention the limitations of this function:
1. Only generates DML statements against Kudu tables, and ignores non-Kudu 
tables.
2. Requires that the type of the first column of the primary key is an integer 
type
3. ...


Line 1443:   exists a table with a '_original' suffix that has unmodified, for 
example, tpch data.
that has unmodified -> that is never modified


Line 1450:       continue
LOG.debug what is happening here


Line 1452:     if primary_key.exact_type not in (TinyInt, SmallInt, BigInt):
what about Int?


Line 1456:       continue
LOG.debug what is happening here because some of these limitations are not 
really obvious


Line 1481:           "UPDATE a SET {update_list} FROM {table_name} a JOIN 
{table_name}_original b "
This might give you strange results for tables with multiple primary keys. 
Maybe we should explicitly limit this  functionality to tables with a single 
primary-key column?

Seems like a fine limitation for now, but better to be explicit about what 
works as expected, and what probably does not.


Line 1559:       cursor.execute("SHOW CREATE TABLE " + table_name)
same deal with SHOW CREATE TABLE as above, add a note or TODO that this process 
will not work for some Kudu tables


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to