[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

Michael Brown (Code Review) Tue, 06 Dec 2016 17:40:12 -0800

Michael Brown has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
......................................................................



Patch Set 2:

(26 comments)

http://gerrit.cloudera.org:8080/#/c/5093/2//COMMIT_MSG
Commit Message:

Line 8: 
It would help to be more expansive in the commit message here. How would 
someone run DML in the stress test? Do you want to show some useful usage? What 
assumptions have you made? Etc.


PS2, Line 11: - Update impyla version in order to be able to have access to 
query
            :   error text for DML queries.
Did you regression test the Impyla update? We have system tests in Kudu that 
use it. I can say I smoke tested Impyla 0.14.0 on the random query generator 
and data generator and it seems OK.


PS2, Line 13: - Made flake8 fixes. flake8 on this file is clean.
It looks great. Thank you!


http://gerrit.cloudera.org:8080/#/c/5093/2/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS2, Line 279:     self._select_probability = 0.5
Nit: I think this should default to None, because the interface to 
run_queries() requires for this to be set anyway. I think setting this to None 
would find bugs where we missed setting it to a valid value. With defaulting to 
0.5, you hide that.


PS2, Line 331:        queries have completed. 'select_probabilty' 
1. spelling: probability

2. Please mention valid values/types for select_probability

3. Please document verify_results


PS2, Line 725:     # set_up_sql accoplishes this task.
spelling: accomplishes


PS2, Line 735:     # If we run this query on a table in initial state, the 
table remains unchanged if
             :     # this is False. (For example running a query like
             :     # "upsert into lineitem select * from lineitem_original" 
leaves lineitem unmodified if
             :     # it is in original state.)
             :     self.modifies_table = False
Be a little more explicit here. This claim presumes lineitem's data was already 
copied into lineitem_original, right? Out of that context, this is confusing as 
is the below where Insert and Upsert have modifies_table set to False.


PS2, Line 740:     # Type of query. Can have the following values: SELECT, 
COMPUTE_STATS, INSERT, UPDATE,
             :     # UPSERT, DELETE.
             :     self.query_type = 'SELECT'
Non-blocking suggestion: instead of raw strings create a class that has these 
as enumerated values.

class QueryType(object)
  SELECT, COMPUTE_STATS, ... = xrange(6)

The reason for this is that if you mistype one of the identifiers, Python will 
fail with a NameError and find your bug. If you mistype a raw string, Python 
will just run the string comparison anyway. It lets bugs be found more directly


PS2, Line 794:         if run_set_up and query.set_up_sql:
If only one of these is true, is that a programming error? Do we need to assert 
if that's the case?


PS2, Line 1061:   # TODO:
It would be good if TODOs were bound to Jiras. Can you file one, please?


PS2, Line 1411: _orginal
spelling: original


PS2, Line 1417: set(table.name for table in tables):
What is the purpose of creating a set here?


PS2, Line 1427:   """Generate insert, upsert, update, delete DML statements.
              : 
              :      For each table in the database that cursor is connected 
to, create several queries,
              :      one for each mod value in 'dml_mod_values'. This value 
controls which rows will be
              :      affected. The generated queries assume that for each table 
in the database, there
              :      exists a table with a '_original' suffix that has 
unmodified, for example, tpch data.
              :   """
Non-blocking commment. TL;DR: Move L1429-L1432 3 spaces to the left.

Details:

I'm not a fan of this form of docstring for OCD aesthetic reasons, and I see 
high amount of it in tests/comparison, so it's on my mind lately. I went to 
look at PEP-257 and saw that the valid multiline forms are:

  """first line
  [required empty line that Gerrit does not preserve]
  other lines
  """

or

  """
  first line
  [required empty line that Gerrit does not preserve]
  other lines
  """

The reason this is so is that the way object.__doc__ gets rendered depends on 
line number. For all lines in a docstring except the first, leading space 
relative to the indentation is preserved.

So the way it's done in much of our Python code base is wrong:

  >>> def function():
  ...   """This is my docstring
  ...
  ...      ....on multiple lines
  ...   """
  ...   pass
  ...
  >>> print function.__doc__
  This is my docstring

       ....on multiple lines

  >>>

https://www.python.org/dev/peps/pep-0257/#handling-docstring-indentation


PS2, Line 1434:   tables = [cursor.describe_table(t) for t in 
cursor.list_table_names()]
Save an indent level after L1437 and change this expression to only include 
describe(t) if t.endswith.("_original)


PS2, Line 1443: numberical
just say integer, since decimals etc. are also numerical


PS2, Line 1449:         insert_query.modifies_table = False
Add a comment explaining why this is False. It seems to me an INSERT absolutely 
can modify a table.


PS2, Line 1464:             for col in table.cols if not col.is_primary_key
You could use table.updatable_column_names instead.


PS2, Line 1466:             "UPDATE a SET {3} FROM {0} a JOIN {0}_original b "
              :             "ON a.{1} = b.{1} + 1 WHERE a.{1} % {2} = 
0").format(
              :                 table.name, primary_key.name, mod_value, 
update_list)
fairly complex format string. Please use keys in places of numbers.  You can do 
"{key}".format(key=value)


PS2, Line 1471:         upsert_query.modifies_table = False
Add a comment explaining why this is False. It seems to me an UPSERT absolutely 
can modify a table.


PS2, Line 1505:   return result
Should you assert that len(result) > 0 ? The continues on L1439 and L1445, plus 
the input data, could result in a case where no DML statements get added. It 
would be a shame to silently run the test. Presumably the caller wanted to run 
DML statements.


PS2, Line 1541:       search_pattern = r"\.(.*) \("
It would be safer to include "CREATE TABLE" in this pattern.


PS2, Line 1613:   parser.add_argument(
              :       "--mem-limit-padding-abs", type=int, default=0,
              :       help="Pad query mem limits found by solo execution with 
this value (in megabytes)"
              :       " running concurrently. After padding queries will not be 
expected to fail"
              :       " due to mem limit exceeded.")
Please document a scenario in which someone would want to use this.


PS2, Line 1623:       "--reset-database"
Nit; this resets all databases, so it should be "--reset-databases"


PS2, Line 1628:   parser.add_argument(
              :       "--dml-mod-values", nargs="+", type=int, default=[11],
              :       help="List of mod values to use for the DML queries.")
Please provide guidance on what this means and how someone would use it.


PS2, Line 1811:   if args.reset_database:
              :     for database in set(query.db_name for query in queries):
              :       with impala.cursor(db_name=database) as cursor:
              :         reset_database(cursor)
Won't this reset *all* databases, not just ones on which you want to run DML 
against?


PS2, Line 1898:   if args.reset_database:
              :     for database in set(query.db_name for query in queries):
              :       with impala.cursor(db_name=database) as cursor:
              :         reset_database(cursor)
Can you add a comment describing why this is called again?


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

Reply via email to