[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update and delete statements.
- Add support for compute stats with mt_dop query options.
- Update the impyla version to get access to the query error text for
  DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add an
'_original' suffix to the table name. The DML queries only modify the
non-original table; the original table is never modified. The original
tables can be used to restore the non-original tables to their initial
state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.
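
A minimal sketch of the reset idea described above, in Python. The helper names and the delete-then-insert approach are assumptions made for illustration; the patch itself may restore tables in a different way.

```python
ORIGINAL_SUFFIX = "_original"


def original_table_name(table_name):
    """Return the name of the untouched copy of a table."""
    return table_name + ORIGINAL_SUFFIX


def reset_statements(db_name, table_name):
    """Build SQL that restores a table from its '_original' copy.

    Assumption: the table is emptied and then refilled from the copy.
    The actual patch may reset tables differently.
    """
    table = "{0}.{1}".format(db_name, table_name)
    original = "{0}.{1}".format(db_name, original_table_name(table_name))
    return [
        "DELETE FROM {0}".format(table),
        "INSERT INTO {0} SELECT * FROM {1}".format(table, original),
    ]
```

Because only the non-original table is ever modified, the statements can be rerun before or after the binary search without touching the copy.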

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML
queries are generated. The DML operations touch table rows where
primary_key % mod_value = 0. So, the smaller the mod value, the more
rows are affected. The DML queries are generated in such a way that the
data for the insert, upsert, and update queries is taken from the table
with the _original suffix. The stress test generates DML queries only
for Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.
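
As a rough illustration of the mod-value scheme, the sketch below builds one statement of each kind for a single mod value and counts how many keys a mod value selects. The function names, the single updated column, and the exact SQL text are assumptions; they are not taken from concurrent_select.py and have not been checked against the statements the patch generates.

```python
def count_touched_rows(num_rows, mod_value):
    """Count keys in 1..num_rows where key % mod_value == 0."""
    return sum(1 for key in range(1, num_rows + 1) if key % mod_value == 0)


def generate_dml_statements(table, primary_key, column, mod_value):
    """Return one hypothetical statement per DML type for a mod value."""
    original = table + "_original"
    where = "{0} % {1} = 0".format(primary_key, mod_value)
    return {
        # Insert and upsert take their rows from the untouched copy.
        "insert": "INSERT INTO {0} SELECT * FROM {1} WHERE {2}".format(
            table, original, where),
        "upsert": "UPSERT INTO {0} SELECT * FROM {1} WHERE {2}".format(
            table, original, where),
        # Update copies one column back from the untouched copy.
        "update": (
            "UPDATE {t} SET {c} = {o}.{c} FROM {t} JOIN {o} "
            "ON {t}.{pk} = {o}.{pk} WHERE {t}.{pk} % {m} = 0").format(
                t=table, o=original, c=column, pk=primary_key, m=mod_value),
        "delete": "DELETE FROM {0} WHERE {1}".format(table, where),
    }
```

With 1000 consecutive keys, count_touched_rows gives 90 rows for mod value 11 and 58 rows for mod value 17, so each mod value selects roughly 1/mod_value of the table.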

Here's an example of a full call with the new options that runs the
stress test on the local mini cluster:
./concurrent_select.py \
--tpch-kudu-db=tpch_kudu \
--generate-dml-queries \
--dml-mod-values 11 13 17 \
--generate-compute-stats-queries \
--select-probability=0.5 \
--mem-limit-padding-pct=25 \
--mem-limit-padding-abs=50 \
--reset-databases-before-binary-search \
--reset-databases-after-binary-search
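
For readers unfamiliar with how a flag such as --dml-mod-values 11 13 17 takes several values, here is a small argparse sketch. The flag names come from the call above; the types, defaults, and the build_parser helper are assumptions and may not match the real definitions in concurrent_select.py.

```python
from argparse import ArgumentParser


def build_parser():
    """Hypothetical parser covering only the flags in the example call."""
    parser = ArgumentParser()
    parser.add_argument("--tpch-kudu-db")
    parser.add_argument("--generate-dml-queries", action="store_true")
    # nargs="+" collects every following integer into one list.
    parser.add_argument("--dml-mod-values", nargs="+", type=int, default=[11])
    parser.add_argument("--generate-compute-stats-queries", action="store_true")
    parser.add_argument("--select-probability", type=float, default=0.1)
    parser.add_argument("--mem-limit-padding-pct", type=int, default=0)
    parser.add_argument("--mem-limit-padding-abs", type=int, default=0)
    parser.add_argument(
        "--reset-databases-before-binary-search", action="store_true")
    parser.add_argument(
        "--reset-databases-after-binary-search", action="store_true")
    return parser
```

Parsing the example call with this parser yields dml_mod_values as the list [11, 13, 17] and both reset options as True.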

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Reviewed-on: http://gerrit.cloudera.org:8080/5093
Reviewed-by: Taras Bobrovytsky 
Tested-by: Impala Public Jenkins
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 720 insertions(+), 279 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Taras Bobrovytsky: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 12: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 12:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/129/

-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-19 Thread Taras Bobrovytsky (Code Review)
Hello Michael Brown, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5093

to look at the new patch set (#12).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 720 insertions(+), 279 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/12
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-19 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 12: Code-Review+2

Carrying the +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-19 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5093/11/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 1716:   " also be used in order reset the databases before running 
other (non stress) tests"
> also be used to reset ...
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-19 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 11: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5093/11/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 1716:   " also be used in order reset the databases before running 
other (non stress) tests"
also be used to reset ...


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-19 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 11: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-19 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5093/9/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS9, Line 1871:   def populate_all_queries(queries):
> Sorry, to be clear, although I feel main() is too long, I'm not asking you 
Done


http://gerrit.cloudera.org:8080/#/c/5093/10/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS10, Line 1687: 
> nit: needs a space
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-19 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#11).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 720 insertions(+), 279 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-19 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5093/9/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS9, Line 1871:   def populate_all_queries(queries):
> The code already has methods that do that, like load_random_queries_and_pop
Sorry, to be clear, although I feel main() is too long, I'm not asking you to 
take steps to drastically shorten it. I simply want the nested function made 
separate to somewhat reduce a main() that is already too long.


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-19 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5093/9/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS9, Line 1871:   if args.reset_databases_before_bin
> We would also have to pass in other variables into this function, such as i
The code already has methods that do that, like 
load_random_queries_and_populate_runtime_info(). I still feel this main() 
method is entirely too long, and adding a long nested function into the middle 
of it makes readability even worse. Nested functions, closures, etc. are better 
suited as small declarations that are easily readable. This is difficult to 
follow.

Alex, you've been reviewing this code as well. Do you have any opinion?


http://gerrit.cloudera.org:8080/#/c/5093/10/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS10, Line 1687: other(non stress)
nit: needs a space


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5093/9//COMMIT_MSG
Commit Message:

Line 14: 
> It would be helpful to have 1-2 examples of a full concurrent_select.py cal
Done


http://gerrit.cloudera.org:8080/#/c/5093/9/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS9, Line 1682:   help="If True, databases will be reset to their original 
state after the binary"
  :   " search.")
> On L1971 you say it "may be a good idea" to use this option. I think it mig
Done. Improved the help text.


PS9, Line 1871:   def populate_all_queries(queries):
> It's odd for a function of this size to live in a scope of its size. Can yo
We would also have to pass in other variables into this function, such as 
impala, args, runtime_info_path. This seems kind of excessive to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-16 Thread Taras Bobrovytsky (Code Review)
Hello Michael Brown, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5093

to look at the new patch set (#10).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 717 insertions(+), 279 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-16 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5093/9//COMMIT_MSG
Commit Message:

Line 14: 
It would be helpful to have 1-2 examples of a full concurrent_select.py call 
with DML statement-related options included.


http://gerrit.cloudera.org:8080/#/c/5093/9/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS9, Line 1682:   help="If True, databases will be reset to their original 
state after the binary"
  :   " search.")
On L1971 you say it "may be a good idea" to use this option. I think it might 
help more if this help option says that too, or uses even stronger language 
like "it is suggested to use this option if you plan on running other tests on 
the same data", or something similar.


PS9, Line 1871:   def populate_all_queries(queries):
It's odd for a function of this size to live in a scope of its size. Can you 
factor this out into a top level function instead? My main objection to the 
design is that queries_with_runtime_info_by_db is needed in this function but 
isn't referenced. The function and parent scope are sufficiently long to make 
it harder to read.
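
A sketch of the refactoring requested here, with populate_all_queries as a top-level function that receives everything it needs as parameters. The parameter names follow this thread; the function body, the dictionary layout, and the simplified main() are placeholders, not the real code.

```python
def populate_all_queries(queries, impala, args, runtime_info_path,
                         queries_with_runtime_info_by_db):
    """Top-level version: every input is an explicit parameter.

    The body is a stand-in; the real function measures runtime info.
    """
    populated = []
    for query in queries:
        cached = queries_with_runtime_info_by_db.get(query["db_name"], {})
        if query["sql"] in cached:
            # Reuse runtime info that was recorded earlier.
            populated.append(cached[query["sql"]])
        else:
            # Placeholder for measuring the query through `impala`.
            populated.append(dict(query, required_mem_mb=None))
    return populated


def main(queries, impala=None, args=None, runtime_info_path=None):
    """Simplified main(): it only wires the pieces together."""
    queries_with_runtime_info_by_db = {}
    return populate_all_queries(
        queries, impala, args, runtime_info_path,
        queries_with_runtime_info_by_db)
```

The longer signature is the cost of this design. In exchange, the dependency on queries_with_runtime_info_by_db becomes visible at the call site instead of being captured silently from the enclosing scope.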


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#8).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 691 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-14 Thread Taras Bobrovytsky (Code Review)
Hello Michael Brown, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5093

to look at the new patch set (#8).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries will only make
modifications to the non-original table; the original table will never
be modified. The original tables can be used to bring the non-original
table back to its initial state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML
queries are generated. The DML operations will touch table rows where
primary_key % mod_value = 0. So, the larger the mod value, the fewer
rows are affected (roughly 100 / mod_value percent of the table). The
DML queries are generated in such a way that the data for the insert,
upsert, and update queries is taken from the table with the _original
suffix. The stress test generates DML queries only for Kudu
databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 691 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG
Commit Message:

Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML
> In order to calculate %rows, we don't actually need to know how many rows a
Ahh yes, you are right. Thanks for clarifying. Not sure what I was thinking. I 
was clearly wrong.


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#7).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries will only make
modifications to the non-original table; the original table will never
be modified. The original tables can be used to bring the non-original
table back to its initial state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML
queries are generated. The DML operations will touch table rows where
primary_key % mod_value = 0. So, the larger the mod value, the fewer
rows are affected (roughly 100 / mod_value percent of the table). The
DML queries are generated in such a way that the data for the insert,
upsert, and update queries is taken from the table with the _original
suffix. The stress test generates DML queries only for Kudu
databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 689 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG
Commit Message:

Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML
> I agree you can achieve the same thing with mod values in principle. Let me
In order to calculate %rows, we don't actually need to know how many rows are 
in the table. %rows = 100 / mod_value. This is why I think it still makes sense 
to keep the mod_values. Another way to think of mod_value is if mod_value = N, 
then the DML statement should affect every Nth row.
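
The relationship stated above can be checked with a small sketch. It
assumes a dense integer primary key (1, 2, 3, ...), which is an
assumption of this example rather than something the thread states, and
the helper names are made up for illustration.

```python
# Hypothetical helpers illustrating %rows = 100 / mod_value.
def affected_percent(mod_value):
    """Approximate share of rows (in percent) matched by pk % mod_value = 0."""
    return 100.0 / mod_value


def count_affected(primary_keys, mod_value):
    """Count the keys that a predicate pk % mod_value = 0 would select."""
    return sum(1 for key in primary_keys if key % mod_value == 0)


keys = range(1, 1001)  # assumed dense keys 1..1000
for mod_value in (11, 13, 17):
    print("mod {0}: {1} of 1000 rows, about {2:.1f}%".format(
        mod_value, count_affected(keys, mod_value), affected_percent(mod_value)))
```

Note that the table size never enters affected_percent, which is the
point being made: a larger mod value simply selects a smaller share of
the rows.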


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5093/6/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 1471:   # TODO: IMPALA-: Add support for tables with multiple primary keys.
> fill in real JIRA or just leave the TODO
Yes, I'll fill in the Jira. Michael wants every TODO to have a Jira attached.


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5093/6/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 1471:   # TODO: IMPALA-: Add support for tables with multiple primary keys.
fill in real JIRA or just leave the TODO


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries will only make
modifications to the non-original table; the original table will never
be modified. The original tables can be used to bring the non-original
table back to its initial state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML
queries are generated. The DML operations will touch table rows where
primary_key % mod_value = 0. So, the larger the mod value, the fewer
rows are affected (roughly 100 / mod_value percent of the table). The
DML queries are generated in such a way that the data for the insert,
upsert, and update queries is taken from the table with the _original
suffix. The stress test generates DML queries only for Kudu
databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 689 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-14 Thread Taras Bobrovytsky (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5093

to look at the new patch set (#6).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries will only make
modifications to the non-original table; the original table will never
be modified. The original tables can be used to bring the non-original
table back to its initial state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML
queries are generated. The DML operations will touch table rows where
primary_key % mod_value = 0. So, the larger the mod value, the fewer
rows are affected (roughly 100 / mod_value percent of the table). The
DML queries are generated in such a way that the data for the insert,
upsert, and update queries is taken from the table with the _original
suffix. The stress test generates DML queries only for Kudu
databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 689 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5093/4/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 1481:   "UPDATE a SET {update_list} FROM {table_name} a JOIN {table_name}_original b "
> To me it's not really about validating the results, but more about predicta
Done. Skipping creation of DML queries for tables with more than 1 primary key 
column.


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-13 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5093/5/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 745: self.population_order = 0
Much clearer, thanks!


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-13 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG
Commit Message:

Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML
> Actually a mod value is equivalent to %rows:
I agree you can achieve the same thing with mod values in principle. Let me 
rephrase my points:
* Users of this test may not know the number of rows in the target table up 
front. So before I can begin to run this test, I must first look up the number 
of rows and then compute the mod values to achieve a desired %rows.
* A single mod value does not represent the same %rows for different tables 
(which could have a different number of rows). My understanding is that we run 
on multiple test tables with the same mod values.
* Just like you said, if users were allowed to specify %rows, the framework 
could internally translate that into a mod value based on the #rows of the 
table(s). Seems easier for users.
* Further, the "concept" of %rows would still apply even for tables with sparse 
primary keys, or where there are multiple primary-key columns. The internal 
mechanism for translating %rows into predicates would be different, of course, 
but the concept of "mod values" does not seem very intuitive for those cases.

We definitely don't need to do this now, but it might be worth recording the 
above improvement in a JIRA.
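
A minimal sketch of the suggested translation from a desired %rows to a
mod value follows. The function name is hypothetical, and the sketch
assumes a single, densely populated integer primary key; sparse or
multi-column keys would need a different mechanism, as noted above.

```python
# Hypothetical helper: not part of the stress test, shown only to illustrate
# how a user-facing "percent of rows" could map to an internal mod value.
def percent_to_mod_value(percent):
    """Return the mod value N so that pk % N = 0 selects about `percent` % of rows."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100], got {0}".format(percent))
    # Every Nth row is selected, so N is the reciprocal of the fraction.
    return max(1, int(round(100.0 / percent)))


for percent in (5, 9, 10, 100):
    print("{0}% of rows -> mod value {1}".format(
        percent, percent_to_mod_value(percent)))
```

Because the result is rounded to an integer, the achieved share is only
approximate for percentages that do not divide 100 evenly.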


http://gerrit.cloudera.org:8080/#/c/5093/4/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 1429:   cursor.execute("SHOW CREATE TABLE " + table.name)
> Yes, to my knowledge at this time we only use Kudu tables with a simple has
Thanks.

See AnalyzeDDLTest#TestCreateManagedKuduTable for examples of more advanced 
partitioning schemes.


Line 1481:   "UPDATE a SET {update_list} FROM {table_name} a JOIN {table_name}_original b "
> Maybe it's okay to keep it as is? It can potentially result in many rows ha
To me it's not really about validating the results, but more about 
predictability of the test's behavior. As a user, when I provide a list of mod 
values as an input I have a certain expectation of the "work" that those 
translate to. For tables with several primary-key columns this update (and the 
delete/upsert below) may be modifying far more rows than I expected based on 
the mod values I gave. Also consider that the join in this update could really 
blow up.

What's the benefit of leaving it as is?

I think it would be better to be explicit about the limitations. Adding a check 
here seems easy enough.
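
One possible shape for such a check is sketched below. The function name
and the input mapping are hypothetical and are not taken from
concurrent_select.py; the idea is only to make the limitation explicit
by splitting tables into those with exactly one primary-key column and
those that are skipped.

```python
# Hypothetical sketch of an explicit single-primary-key check.
def split_tables_by_primary_key(primary_keys_by_table):
    """Return (eligible, skipped).

    eligible maps table name -> its single primary-key column;
    skipped lists tables whose primary key is not exactly one column.
    """
    eligible = {}
    skipped = []
    for table_name, pk_columns in sorted(primary_keys_by_table.items()):
        if len(pk_columns) == 1:
            eligible[table_name] = pk_columns[0]
        else:
            # Modulo predicates on one column of a composite key could touch
            # far more rows than the mod value suggests, so skip the table.
            skipped.append(table_name)
    return eligible, skipped


eligible, skipped = split_tables_by_primary_key({
    "customer": ["c_custkey"],
    "lineitem": ["l_orderkey", "l_linenumber"],
})
print("DML queries for: {0}".format(sorted(eligible)))
print("Skipped (multiple primary-key columns): {0}".format(skipped))
```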


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-13 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries will only make
modifications to the non-original table; the original table will never
be modified. The original tables can be used to bring the non-original
table back to its initial state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML
queries are generated. The DML operations will touch table rows where
primary_key % mod_value = 0. So, the larger the mod value, the fewer
rows are affected (roughly 100 / mod_value percent of the table). The
DML queries are generated in such a way that the data for the insert,
upsert, and update queries is taken from the table with the _original
suffix. The stress test generates DML queries only for Kudu
databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 684 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-13 Thread Taras Bobrovytsky (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5093

to look at the new patch set (#5).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries will only make
modifications to the non-original table; the original table will never
be modified. The original tables can be used to bring the non-original
table back to its initial state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML
queries are generated. The DML operations will touch table rows where
primary_key % mod_value = 0. So, the larger the mod value, the fewer
rows are affected (roughly 100 / mod_value percent of the table). The
DML queries are generated in such a way that the data for the insert,
upsert, and update queries is taken from the table with the _original
suffix. The stress test generates DML queries only for Kudu
databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 684 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-13 Thread Taras Bobrovytsky (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5093

to look at the new patch set (#5).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries will only make
modifications to the non-original table; the original table will never
be modified. The original tables can be used to bring the non-original
table back to its initial state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML
queries are generated. The DML operations will touch table rows where
primary_key % mod_value = 0. So, the larger the mod value, the fewer
rows are affected (roughly 100 / mod_value percent of the table). The
DML queries are generated in such a way that the data for the insert,
upsert, and update queries is taken from the table with the _original
suffix. The stress test generates DML queries only for Kudu
databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 676 insertions(+), 262 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-09 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 4: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries will only make
modifications to the non-original table; the original table will never
be modified. The original tables can be used to bring the non-original
table back to its initial state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML
queries are generated. The DML operations will touch table rows where
primary_key % mod_value = 0. So, the larger the mod value, the fewer
rows are affected (roughly 100 / mod_value percent of the table). The
DML queries are generated in such a way that the data for the insert,
upsert, and update queries is taken from the table with the _original
suffix. The stress test generates DML queries only for Kudu
databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 652 insertions(+), 262 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5093/2/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS2, Line 1464:   insert_query.modifies_table = False
> It's possible updatale_column_names is a bad name. In any case, it currentl
I think it would be clearer to leave it as is.


http://gerrit.cloudera.org:8080/#/c/5093/3/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS3, Line 1461: this will still be still
> Some extra words here.
Done


PS3, Line 1657: quereis
> spelling: queries
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-09 Thread Taras Bobrovytsky (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5093

to look at the new patch set (#4).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update, and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries only modify the
non-original table; the original table is never modified. The original
tables can be used to restore the non-original tables to their initial
state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML
queries are generated. The DML operations touch table rows where
primary_key % mod_value = 0, so the larger the mod value, the fewer rows
are affected. The DML queries are generated in such a way that the
data for the insert, upsert, and update queries is taken from the table
with the _original suffix. The stress test generates DML queries only
for Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.
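
As a rough illustration of the scheme described above (not the code in
concurrent_select.py), the sketch below builds four DML statements per mod
value for one table. The helper name sketch_dml_queries, the exact SQL text,
the single-column primary key, and the table and column names are all
assumptions made for the example; only the '_original' suffix, the
primary_key % mod_value = 0 predicate, and the four statement types come from
the description. It is written for Python 3.

```python
def sketch_dml_queries(table, primary_key, column, mod_values):
    """Build illustrative DML strings (hypothetical helper, assumed SQL shapes)."""
    original = table + "_original"
    queries = []
    for mod in mod_values:
        # Rows are selected by the predicate primary_key % mod_value = 0.
        where = "{0} % {1} = 0".format(primary_key, mod)
        select = "SELECT * FROM {0} WHERE {1}".format(original, where)
        queries.extend([
            # Insert, upsert, and update take their data from the _original copy.
            "INSERT INTO {0} {1}".format(table, select),
            "UPSERT INTO {0} {1}".format(table, select),
            "UPDATE {0} SET {1} = o.{1} FROM {0} JOIN {2} o ON {0}.{3} = o.{3} "
            "WHERE {0}.{3} % {4} = 0".format(table, column, original, primary_key, mod),
            "DELETE FROM {0} WHERE {1}".format(table, where),
        ])
    return queries


queries = sketch_dml_queries("orders", "o_orderkey", "o_comment", [11, 13, 17])

# Count how many keys in 1..1000 satisfy the predicate for each mod value.
rows_touched = {
    mod: sum(1 for key in range(1, 1001) if key % mod == 0)
    for mod in (11, 13, 17)
}
```

Here queries holds 12 statements (4 per mod value), and rows_touched gives the
number of matching keys in a hypothetical key range of 1 to 1000.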

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 652 insertions(+), 262 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-08 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update, and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries only modify the
non-original table; the original table is never modified. The original
tables can be used to restore the non-original tables to their initial
state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with the
following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML
queries are generated. The DML operations touch table rows where
primary_key % mod_value = 0, so the larger the mod value, the fewer rows
are affected. The DML queries are generated in such a way that the
data for the insert, upsert, and update queries is taken from the table
with the _original suffix. The stress test generates DML queries only
for Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu
--tpch-db=tpch_100 --generate-dml-queries would only generate queries
for the tpch_100_kudu database.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 652 insertions(+), 262 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-06 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..


Patch Set 2:

(26 comments)

http://gerrit.cloudera.org:8080/#/c/5093/2//COMMIT_MSG
Commit Message:

Line 8: 
It would help to be more expansive in the commit message here. How would 
someone run DML in the stress test? Do you want to show some useful usage? What 
assumptions have you made? Etc.


PS2, Line 11: - Update impyla version in order to be able to have access to query
            :   error text for DML queries.
Did you regression test the Impyla update? We have system tests in Kudu that 
use it. I can say I smoke tested Impyla 0.14.0 on the random query generator 
and data generator and it seems OK.


PS2, Line 13: - Made flake8 fixes. flake8 on this file is clean.
It looks great. Thank you!


http://gerrit.cloudera.org:8080/#/c/5093/2/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS2, Line 279: self._select_probability = 0.5
Nit: I think this should default to None, because the interface to 
run_queries() requires this to be set anyway. I think setting this to None 
would find bugs where we missed setting it to a valid value. Defaulting to 
0.5 hides that.
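
A minimal sketch of the idea, assuming a hypothetical stand-in class (the
names QueryRunnerSketch and pick_query_kind are invented for illustration and
are not in the patch): with None as the default, a forgotten assignment fails
loudly instead of silently behaving like 0.5.

```python
class QueryRunnerSketch(object):
    """Hypothetical stand-in for the class under review."""

    def __init__(self):
        # None marks "not configured yet"; a real value must be set before use.
        self._select_probability = None

    def pick_query_kind(self, random_value):
        if self._select_probability is None:
            raise ValueError("select_probability was never set")
        return "select" if random_value < self._select_probability else "dml"


runner = QueryRunnerSketch()
try:
    runner.pick_query_kind(0.3)
    unset_error = None
except ValueError as e:
    unset_error = str(e)

runner._select_probability = 0.5
kind = runner.pick_query_kind(0.3)
```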


PS2, Line 331:queries have completed. 'select_probabilty' 
1. spelling: probability

2. Please mention valid values/types for select_probability

3. Please document verify_results


PS2, Line 725: # set_up_sql accoplishes this task.
spelling: accomplishes


PS2, Line 735: # If we run this query on a table in initial state, the table remains unchanged if
             : # this is False. (For example running a query like
             : # "upsert into lineitem select * from lineitem_original" leaves lineitem unmodified if
             : # it is in original state.)
             : self.modifies_table = False
Be a little more explicit here. This claim presumes lineitem's data was already 
copied into lineitem_original, right? Out of that context, this is confusing, 
as is the code below where Insert and Upsert have modifies_table set to False.


PS2, Line 740: # Type of query. Can have the following values: SELECT, COMPUTE_STATS, INSERT, UPDATE,
             : # UPSERT, DELETE.
             : self.query_type = 'SELECT'
Non-blocking suggestion: instead of raw strings, create a class that has these 
as enumerated values.

class QueryType(object):
  SELECT, COMPUTE_STATS, ... = xrange(6)

The reason for this is that if you mistype one of the identifiers, Python will 
fail with an AttributeError (or a NameError for a bare name) and find your bug. 
If you mistype a raw string, Python will just run the string comparison anyway. 
It lets bugs be found more directly.
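
A small sketch of this suggestion, written for Python 3 (so range replaces
xrange). The six member names are taken from the comment quoted at Line 740;
the helper is_dml and the deliberately misspelled UPSRT are assumptions added
for illustration.

```python
class QueryType(object):
    # Each name is bound to a distinct small integer.
    SELECT, COMPUTE_STATS, INSERT, UPDATE, UPSERT, DELETE = range(6)


def is_dml(query_type):
    """Hypothetical helper: True for the four DML query types."""
    return query_type in (
        QueryType.INSERT, QueryType.UPDATE, QueryType.UPSERT, QueryType.DELETE)


# A mistyped member fails immediately when accessed through the class.
try:
    QueryType.UPSRT
    caught = None
except AttributeError as e:
    caught = type(e).__name__

# A mistyped raw string is compared without any complaint.
silent_mismatch = ("UPSRT" == "UPSERT")
```

Accessed through the class, the misspelled member raises AttributeError, while
the misspelled string simply compares unequal and the bug goes unnoticed.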


PS2, Line 794: if run_set_up and query.set_up_sql:
If only one of these is true, is that a programming error? Do we need to assert 
if that's the case?


PS2, Line 1061:   # TODO:
It would be good if TODOs were bound to Jiras. Can you file one, please?


PS2, Line 1411: _orginal
spelling: original


PS2, Line 1417: set(table.name for table in tables):
What is the purpose of creating a set here?


PS2, Line 1427:   """Generate insert, upsert, update, delete DML statements.
              : 
              :  For each table in the database that cursor is connected to, create several queries,
              :  one for each mod value in 'dml_mod_values'. This value controls which rows will be
              :  affected. The generated queries assume that for each table in the database, there
              :  exists a table with a '_original' suffix that has unmodified, for example, tpch data.
              :   """
Non-blocking comment. TL;DR: Move L1429-L1432 3 spaces to the left.

Details:

I'm not a fan of this form of docstring for OCD aesthetic reasons, and I see a 
high amount of it in tests/comparison, so it's on my mind lately. I went to 
look at PEP-257 and saw that the valid multiline forms are:

  """first line
  [required empty line that Gerrit does not preserve]
  other lines
  """

or

  """
  first line
  [required empty line that Gerrit does not preserve]
  other lines
  """

The reason this is so is that the way object.__doc__ gets rendered depends on 
line number. For all lines in a docstring except the first, leading space 
relative to the indentation is preserved.

So the way it's done in much of our Python code base is wrong:

  >>> def function():
  ...   """This is my docstring
  ...
  ...  on multiple lines
  ...   """
  ...   pass
  ...
  >>> print function.__doc__
  This is my docstring

   on multiple lines

  >>>

https://www.python.org/dev/peps/pep-0257/#handling-docstring-indentation
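
To make the indentation handling concrete, here is a small Python 3 sketch
using inspect.cleandoc from the standard library, which removes only the
margin common to all lines after the first. The two sample strings are
invented for illustration; they are plain strings rather than real docstrings
so the result does not depend on how a given interpreter stores __doc__.

```python
import inspect

# Body lines indented further than the closing-quote level, plus one line
# at the closing-quote level: only the common margin (2 spaces) is removed.
mixed = "Summary line\n\n     detail A\n  detail B\n  "
mixed_cleaned = inspect.cleandoc(mixed)

# Body lines all aligned with each other: the whole margin is removed.
aligned = "Summary line\n\n  detail A\n  detail B\n  "
aligned_cleaned = inspect.cleandoc(aligned)

# The raw text keeps every leading space on lines after the first.
raw_second_body_line = mixed.split("\n")[2]
```

In the mixed case, "detail A" keeps three leading spaces after cleaning, which
is the kind of stray indentation the comment above is about.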


PS2, Line 1434:   tables = [cursor.describe_table(t) for t in cursor.list_table_names()]
Save an indent level after L1437 and change this expression to only include 
describe_table(t) if t.endswith("_original").
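
A sketch of the filtered comprehension being suggested. FakeCursor is a
hypothetical stand-in that only mimics the two method names visible in the
quoted line (describe_table and list_table_names); its return values and the
table names are assumptions made for the example.

```python
class FakeCursor(object):
    """Hypothetical stand-in for the cursor used in the quoted line."""

    def __init__(self, names):
        self._names = names

    def list_table_names(self):
        return list(self._names)

    def describe_table(self, name):
        # The real method returns a richer table object; a dict is enough here.
        return {"name": name}


cursor = FakeCursor(["lineitem", "lineitem_original", "orders", "orders_original"])

# Filter inside the comprehension so no nested "if" is needed afterwards.
tables = [
    cursor.describe_table(t)
    for t in cursor.list_table_names()
    if t.endswith("_original")
]
```

Filtering in the comprehension also avoids describing tables that would be
skipped later anyway.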



[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

2016-12-05 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update, and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query
  error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 564 insertions(+), 206 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 

