[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test

2016-12-05 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress 
test
..


Patch Set 1:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/5093/1/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 680: # Query to run before worst_case_sql to get the worst case 
performance
> What's worst_case_sql? Can you give a little more flavor to the comment? I'
I updated the patch significantly (based on our discussion), this is no longer 
here.


Line 688: self.options = {}
> needs a comment
Done


Line 691: self.modifies_table = False
> let's introduce a stmt_type enum that can have values like SELECT/INSERT/UP
Done


Line 694: self.MIN_INT = 10
> var names are not very telling, what are these numbers?
This not relevant any more. Removed.


PS1, Line 695: self.MIN_INT = 20
> You've defined self.MIN_INT twice.
Yeah, like I said it was a rough patch. These numbers are not needed any more.


PS1, Line 698:   def sql(self):
> This whole set of methods could use docstrings to explain to someone roughl
This method was removed.


PS1, Line 701:   return str(randint(10, 20))
> I would like for us to make our randomized tests more repeatable. To that e
After talking to Alex, I removed any randomness from the Query class. Now 1 
query object always represents a single sql statement.


PS1, Line 713: result = re.sub(pattern, "10", self._sql)
> Was the hard coding of 10 here meant to actually be self.MIN_INT?
Yes exactly, that was the intention. However, this is no longer relevant.


Line 832:   raise
> seems like we intentionally did not raise before?
this was a mistake, removed.


Line 942:   LOG.debug("Result set empty for query with id %s",
> Another possible use of stmt_type
No longer relevant.


Line 1239: try:
> we can use the stmt_type here to only run explain for those stmts that supp
Done


Line 1356: def clean_up_database(cursor):
> reset_database?
Done


Line 1357:   LOG.info("Cleaning up {0} database".format(cursor.db_name))
> needs comment
Added method docstring.


Line 1360: if not table.name.endswith("_original"):
> seems simpler to move the modified tables into a new database, then you don
I am not sure if it's simpler. It seems to me it's simpler to have everything 
in a single database. We can maybe discuss this offline.


Line 1371:   if len(table.primary_keys) < 1:
> add comment: only run for Kudu tables which must have a primary key
Done


Line 1373:   # For now we will not handle the case in a special way where 
several columns are a
> Why? Seems easy enough to do.
I am not really sure if it's necessary to handle this in some special way. Do 
you have any suggestions?


Line 1376:   if primary_key.exact_type not in (TinyInt, SmallInt, BigInt):
> Why this restriction?
We want to be able to apply the "modulo operation". Added comment. Added 
comment.


PS1, Line 1379:   query.modifies_table = False
> Why is this False?
It's relative to the original state. For example, if we run a query  If we have 
an unmodified lineitem item table, then we run an upsert query on it, it should 
remain unchanged. Added comment in class.


Line 1383:   if table.name + "_original" in set(table.name for table in 
tables):
> What's the motivation for this? We might be able to achieve the same thing 
As discussed in person, I agree that 1 Query object should represent 1 query. 
Fixed.


PS1, Line 1392: def generate_delete_queries(cursor):
> Lots of duplicated logic here and in generate_upsert_queries(). I think add
Rewrote to eliminate code duplication.


Line 1433:   drop_query.sql = "DROP STATS {0}".format(table.name)
> why drop? we're not checking the results anyway
Done


Line 1438:   compute_query.name = "compute_stats_{0}".format(table.name)
> can you add the mt_dop option as part of the name?
Done


PS1, Line 1664: if query._sql in 
queries_with_runtime_info_by_db_and_sql[query.db_name]:
> It doesn't seem right to be examining a private attribute from the outside.
Yeah, I agree, this was a rough patch. Fixed now.


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test

2016-11-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress 
test
..

Preview: IMPALA-4467: Add support for CRUD operations in stress test

NOTE: This is a preview, I'm only looking for high level comments about
the overall design of the feature.

- Added support for upsert and delete statements. Update and insert
  still need to be done.
- Added support for compute stats.
- Added support for query options to queries.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M tests/stress/concurrent_select.py
M tests/util/calculation_util.py
2 files changed, 254 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test

2016-11-18 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress 
test
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5093/1/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS1, Line 695: self.MIN_INT = 20
You've defined self.MIN_INT twice.


PS1, Line 698:   def sql(self):
This whole set of methods could use docstrings to explain to someone roughly 
how the interface works. It's more of a burden on the reader to understand the 
interface by reading how it is called.


PS1, Line 701:   return str(randint(10, 20))
I would like for us to make our randomized tests more repeatable. To that end, 
I think we should, at least in new code, do the following:

- provide a seed command line option in all entry points. This could probably 
go into cli_options since I'd like to use this for the random query generator 
in new code

- if the seed needs to be "randomly chosen" for more chaos, or is seeded based 
on the timer, that's fine. However we should print the seed chosen so the 
queries can be repeated.

- no longer use the module-level functions and instead initialize a 
random.Random() object. See https://docs.python.org/2/library/random.html.


PS1, Line 713: result = re.sub(pattern, "10", self._sql)
Was the hard coding of 10 here meant to actually be self.MIN_INT?


PS1, Line 1379:   query.modifies_table = False
Why is this False?


PS1, Line 1392: def generate_delete_queries(cursor):
Lots of duplicated logic here and in generate_upsert_queries(). I think 
addressing Alex's point about about breaking abstractions should also consider 
this: the code is mostly the same except for the "setup" SQL that's needed 
before.


PS1, Line 1664: if query._sql in 
queries_with_runtime_info_by_db_and_sql[query.db_name]:
It doesn't seem right to be examining a private attribute from the outside. We 
should use the public interface and the public interface should be providing 
what callers need.


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test

2016-11-15 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress 
test
..


Patch Set 1:

(16 comments)

Looks like it's going in the right direction.

Most comments are questions.

Might be good to outline the flow of a new Kudu stress run so it's easier to 
follow for reviewers. Maybe in a top-level comment inside concurrent_select.py

http://gerrit.cloudera.org:8080/#/c/5093/1/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 680: # Query to run before worst_case_sql to get the worst case 
performance
What's worst_case_sql? Can you give a little more flavor to the comment? I'm 
not really sure why this setup query is needed. Also consider that the setup 
query itself may fail/crash which might complicate things.


Line 688: self.options = {}
needs a comment


Line 691: self.modifies_table = False
let's introduce a stmt_type enum that can have values like 
SELECT/INSERT/UPSERT/DELETE/COMPUTE_STATS etc.


Line 694: self.MIN_INT = 10
var names are not very telling, what are these numbers?


Line 832:   raise
seems like we intentionally did not raise before?


Line 942:   LOG.debug("Result set empty for query with id %s",
Another possible use of stmt_type


Line 1239: try:
we can use the stmt_type here to only run explain for those stmts that support 
it (i.e. anything except compute stats).


Line 1356: def clean_up_database(cursor):
reset_database?

"clean up" sounds like DROP DATABASE CASCADE


Line 1357:   LOG.info("Cleaning up {0} database".format(cursor.db_name))
needs comment


Line 1360: if not table.name.endswith("_original"):
seems simpler to move the modified tables into a new database, then you don't 
need to check "_original" everywhere


Line 1371:   if len(table.primary_keys) < 1:
add comment: only run for Kudu tables which must have a primary key


Line 1373:   # For now we will not handle the case in a special way where 
several columns are a
Why? Seems easy enough to do.


Line 1376:   if primary_key.exact_type not in (TinyInt, SmallInt, BigInt):
Why this restriction?


Line 1383:   if table.name + "_original" in set(table.name for table in 
tables):
What's the motivation for this? We might be able to achieve the same thing in a 
different way. Having a query represent two queries seems somewhat abstraction 
breaking.

Also, for UPSERT it's ok of the keys already exist. I think wen to explicitly 
also test the path where PKs already exist.


Line 1433:   drop_query.sql = "DROP STATS {0}".format(table.name)
why drop? we're not checking the results anyway


Line 1438:   compute_query.name = "compute_stats_{0}".format(table.name)
can you add the mt_dop option as part of the name?


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test

2016-11-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5093

Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress 
test
..

Preview: IMPALA-4467: Add support for CRUD operations in stress test

NOTE: This is a preview, I'm only looking for high level comments about
the overall design of the feature.

- Added support for upsert and delete statements. Update and insert
  still need to be done.
- Added support for compute stats.
- Added support for query options to queries.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M tests/stress/concurrent_select.py
M tests/util/calculation_util.py
2 files changed, 257 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test

2016-11-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5092

Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress 
test
..

Preview: IMPALA-4467: Add support for CRUD operations in stress test

NOTE: This is a preview, I'm only looking for high level comments about
the overall design of the feature.

- Added support for upsert and delete statements. Update and insert
  still need to be done.
- Added support for compute stats.
- Added support for query options to queries.

Change-Id: I4efd32e938d4d825840f43b747203f3cba2a3c2e
---
M tests/stress/concurrent_select.py
M tests/util/calculation_util.py
2 files changed, 248 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/5092/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5092
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4efd32e938d4d825840f43b747203f3cba2a3c2e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test

2016-11-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has abandoned this change.

Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress 
test
..


Abandoned

Pushed the wrong code review

-- 
To view, visit http://gerrit.cloudera.org:8080/5092
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I4efd32e938d4d825840f43b747203f3cba2a3c2e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky