Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
......................................................................


Patch Set 2:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/5093/2//COMMIT_MSG
Commit Message:

Line 8: 
> It would help to be more expansive in the commit message here. How would so
Done


PS2, Line 11: - Update impyla version in order to be able to have access to 
query
            :   error text for DML queries.
> Did you regression test the Impyla update? We have system tests in Kudu tha
I plan to send this through GVO, so it will be tested there.


http://gerrit.cloudera.org:8080/#/c/5093/2/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS2, Line 279:     self._select_probability = 0.5
> Nit: I think this should default to None, because the interface to run_quer
Done


PS2, Line 331:        queries have completed. 'select_probabilty' 
> 1. spelling: probability
Done


PS2, Line 725:     # set_up_sql accoplishes this task.
> spelling: accomplishes
Done


PS2, Line 735:     # If we run this query on a table in initial state, the 
table remains unchanged if
             :     # this is False. (For example running a query like
             :     # "upsert into lineitem select * from lineitem_original" 
leaves lineitem unmodified if
             :     # it is in original state.)
             :     self.modifies_table = False
> Be a little more explicit here. This claim presumes lineitem's data was alr
Yes, insert and upsert should have modifies_table set to false. After insert 
and upsert queries are executed, the table will remain unmodified.

Imagine we have 2 identical tables with identical data: lineitem and 
lineitem_original.

Running a query like:
upsert into lineitem select * from lineitem original.

This query does not change the table at all. After the query is executed, the 
table is identical to how it was before the query.


PS2, Line 740:     # Type of query. Can have the following values: SELECT, 
COMPUTE_STATS, INSERT, UPDATE,
             :     # UPSERT, DELETE.
             :     self.query_type = 'SELECT'
> Non-blocking suggestion: instead of raw strings create a class that has the
Good suggestion. Done.


PS2, Line 794:         if run_set_up and query.set_up_sql:
> If only one of these is true, is that a programming error? Do we need to as
No, we only run set up during the binary search phase. Added comment to the 
docstring.


PS2, Line 1061:   # TODO:
> It would be good if TODOs were bound to Jiras. Can you file one, please?
Done


PS2, Line 1411: _orginal
> spelling: original
Done


PS2, Line 1417: set(table.name for table in tables):
> What is the purpose of creating a set here?
For performance, it's faster to check if an element exists in a set than a 
list. It doesn't really matter though since the list of table names is going to 
be small anyways. Should I leave it as is?


PS2, Line 1427:   """Generate insert, upsert, update, delete DML statements.
              : 
              :      For each table in the database that cursor is connected 
to, create several queries,
              :      one for each mod value in 'dml_mod_values'. This value 
controls which rows will be
              :      affected. The generated queries assume that for each table 
in the database, there
              :      exists a table with a '_original' suffix that has 
unmodified, for example, tpch data.
              :   """
> Non-blocking commment. TL;DR: Move L1429-L1432 3 spaces to the left.
I agree, I don't like this type of indentation as well. Done.


PS2, Line 1434:   tables = [cursor.describe_table(t) for t in 
cursor.list_table_names()]
> Save an indent level after L1437 and change this expression to only include
Done


PS2, Line 1443: numberical
> just say integer, since decimals etc. are also numerical
Done


PS2, Line 1449:         insert_query.modifies_table = False
> Add a comment explaining why this is False. It seems to me an INSERT absolu
Yes an insert can modify the table, but not if the table is in the original 
state. If let's say lineitem and lineitem_original are identical, then a query  
like insert into table lineitem select * from lineitem_original Where x should 
remain lineitem unchanged.


PS2, Line 1464:             for col in table.cols if not col.is_primary_key
> You could use table.updatable_column_names instead.
Not sure, is it always going to be the case that a column is updatable if and 
only if it's not a primary key?


PS2, Line 1466:             "UPDATE a SET {3} FROM {0} a JOIN {0}_original b "
              :             "ON a.{1} = b.{1} + 1 WHERE a.{1} % {2} = 
0").format(
              :                 table.name, primary_key.name, mod_value, 
update_list)
> fairly complex format string. Please use keys in places of numbers.  You ca
Done


PS2, Line 1471:         upsert_query.modifies_table = False
> Add a comment explaining why this is False. It seems to me an UPSERT absolu
Done


PS2, Line 1505:   return result
> Should you assert that len(result) > 0 ? The continues on L1439 and L1445, 
Done


PS2, Line 1541:       search_pattern = r"\.(.*) \("
> It would be safer to include "CREATE TABLE" in this pattern.
Done


PS2, Line 1613:   parser.add_argument(
              :       "--mem-limit-padding-abs", type=int, default=0,
              :       help="Pad query mem limits found by solo execution with 
this value (in megabytes)"
              :       " running concurrently. After padding queries will not be 
expected to fail"
              :       " due to mem limit exceeded.")
> Please document a scenario in which someone would want to use this.
It's hard to say when someone would want to use this. It's just a knob that can 
be adjusted if necessary. I updated the comment a little.


PS2, Line 1623:       "--reset-database"
> Nit; this resets all databases, so it should be "--reset-databases"
Done


PS2, Line 1628:   parser.add_argument(
              :       "--dml-mod-values", nargs="+", type=int, default=[11],
              :       help="List of mod values to use for the DML queries.")
> Please provide guidance on what this means and how someone would use it.
Done


PS2, Line 1811:   if args.reset_database:
              :     for database in set(query.db_name for query in queries):
              :       with impala.cursor(db_name=database) as cursor:
              :         reset_database(cursor)
> Won't this reset *all* databases, not just ones on which you want to run DM
Yes, but this might be the behavior we desire. For example, we just finished a 
DML run on the tpch database. Now we want to run a standard job with only 
select queries + correctness verification). We need to reset the database 
before starting the run.


PS2, Line 1898:   if args.reset_database:
              :     for database in set(query.db_name for query in queries):
              :       with impala.cursor(db_name=database) as cursor:
              :         reset_database(cursor)
> Can you add a comment describing why this is called again?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to