[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4467: Add support for DML statements in stress test

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update and delete statements.
- Add support for compute stats with mt_dop query options.
- Update the impyla version to get access to the query error text
  for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

For every Kudu table in the databases, we make a copy and add a
'_original' suffix to the table name. The DML queries only modify the
non-original table; the original table is never modified. The
original tables can be used to bring the non-original tables back to
their initial state. Two flags were added for doing this:
--reset-databases-before-binary-search and
--reset-databases-after-binary-search.

The DML queries are generated based on the mod values passed in with
the following flag: --dml-mod-values 11 13 17. For each mod value, 4
DML queries are generated. The DML operations touch table rows where
primary_key % mod_value = 0, so roughly one row in every mod_value is
affected: the larger the mod value, the fewer rows are affected. The
DML queries are generated in such a way that the data for the insert,
upsert, and update queries is taken from the table with the _original
suffix. The stress test generates DML queries only for Kudu
databases. For example,
--tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries
would only generate queries for the tpch_100_kudu database.

Here's an example of a full call with the new options that runs the
stress test on the local mini cluster:

  ./concurrent_select.py \
    --tpch-kudu-db=tpch_kudu \
    --generate-dml-queries \
    --dml-mod-values 11 13 17 \
    --generate-compute-stats-queries \
    --select-probability=0.5 \
    --mem-limit-padding-pct=25 \
    --mem-limit-padding-abs=50 \
    --reset-databases-before-binary-search \
    --reset-databases-after-binary-search

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Reviewed-on: http://gerrit.cloudera.org:8080/5093
Reviewed-by: Taras Bobrovytsky
Tested-by: Impala Public Jenkins
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 720 insertions(+), 279 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Taras Bobrovytsky: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Taras Bobrovytsky
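To make the mod-value scheme in the commit message concrete, here is a minimal sketch of how four DML statements per mod value could be built. The helper name generate_dml_queries, the orders / o_orderkey / o_comment example and the exact SQL text are assumptions made for illustration; they are not taken from concurrent_select.py.

```python
def generate_dml_queries(table, primary_key, columns, mod_value):
    """Build one insert, upsert, update and delete statement for a mod value.

    Hypothetical helper: the SQL below only illustrates the idea described
    in the commit message (touch rows where primary_key % mod_value = 0 and
    read replacement data from the '<table>_original' copy).
    """
    original = table + "_original"
    predicate = "{pk} % {mod} = 0".format(pk=primary_key, mod=mod_value)
    select_cols = ", ".join([primary_key] + list(columns))
    # Source rows always come from the never-modified '_original' table.
    source = "SELECT {cols} FROM {orig} WHERE {pred}".format(
        cols=select_cols, orig=original, pred=predicate)
    assignments = ", ".join("{c} = o.{c}".format(c=c) for c in columns)
    return {
        "insert": "INSERT INTO {t} {src}".format(t=table, src=source),
        "upsert": "UPSERT INTO {t} {src}".format(t=table, src=source),
        "update": (
            "UPDATE t SET {a} FROM {t} t JOIN {o} o ON t.{pk} = o.{pk} "
            "WHERE t.{pk} % {m} = 0"
        ).format(a=assignments, t=table, o=original, pk=primary_key,
                 m=mod_value),
        # Only the delete needs no data from the '_original' table.
        "delete": "DELETE FROM {t} WHERE {pred}".format(
            t=table, pred=predicate),
    }


if __name__ == "__main__":
    for mod in (11, 13, 17):
        queries = generate_dml_queries(
            "orders", "o_orderkey", ["o_comment"], mod)
        for kind in sorted(queries):
            print("{0}: {1}".format(kind, queries[kind]))
```

With three mod values this yields twelve statements in total, four per value, which matches the count given in the commit message.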
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 12: Verified+1
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 12:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/129/
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Hello Michael Brown, Alex Behm,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/5093

to look at the new patch set (#12).

Change subject: IMPALA-4467: Add support for DML statements in stress test

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 720 insertions(+), 279 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/12
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 12: Code-Review+2

Carrying the +2.
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has uploaded a new patch set (#12).

Change subject: IMPALA-4467: Add support for DML statements in stress test

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 720 insertions(+), 279 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/12
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5093/11/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 1716: " also be used in order reset the databases before running other (non stress) tests"
> also be used to reset ...
Done
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Alex Behm has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 11: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5093/11/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 1716: " also be used in order reset the databases before running other (non stress) tests"
also be used to reset ...
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Michael Brown has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 11: Code-Review+1
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5093/9/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS9, Line 1871: def populate_all_queries(queries):
> Sorry, to be clear, although I feel main() is too long, I'm not asking you
Done

http://gerrit.cloudera.org:8080/#/c/5093/10/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS10, Line 1687:
> nit: needs a space
Done
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has uploaded a new patch set (#11).

Change subject: IMPALA-4467: Add support for DML statements in stress test

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 720 insertions(+), 279 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/11
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Hello Michael Brown, Alex Behm,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/5093

to look at the new patch set (#11).

Change subject: IMPALA-4467: Add support for DML statements in stress test

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 720 insertions(+), 279 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/11
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Michael Brown has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5093/9/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS9, Line 1871: def populate_all_queries(queries):
> The code already has methods that do that, like load_random_queries_and_pop
Sorry, to be clear, although I feel main() is too long, I'm not
asking you to take steps to drastically shorten it. I simply want the
nested function made separate to somewhat reduce a main() that is
already too long.
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Michael Brown has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5093/9/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS9, Line 1871: if args.reset_databases_before_bin
> We would also have to pass in other variables into this function, such as i
The code already has methods that do that, like
load_random_queries_and_populate_runtime_info(). I still feel this
main() method is entirely too long, and adding a long nested function
into the middle of it makes readability even worse. Nested functions,
closures, etc. are better suited as small declarations that are
easily readable. This is difficult to follow.

Alex, you've been reviewing this code as well. Do you have any
opinion?

http://gerrit.cloudera.org:8080/#/c/5093/10/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS10, Line 1687: other(non stress)
nit: needs a space
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5093/9//COMMIT_MSG
Commit Message:

Line 14:
> It would be helpful to have 1-2 examples of a full concurrent_select.py cal
Done

http://gerrit.cloudera.org:8080/#/c/5093/9/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS9, Line 1682: help="If True, databases will be reset to their original state after the binary"
              : " search.")
> On L1971 you say it "may be a good idea" to use this option. I think it mig
Done. Improved the help text.

PS9, Line 1871: def populate_all_queries(queries):
> It's odd for a function of this size to live in a scope of its size. Can yo
We would also have to pass in other variables into this function,
such as impala, args, runtime_info_path. This seems kind of excessive
to me.
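For readers following the help-text discussion, here is a rough sketch of how the flags named in this thread could be declared with argparse. Only the flag names and the first sentence of the last help string come from the thread; build_parser, the default for --dml-mod-values, the remaining help wording and the use of store_true are assumptions, not the actual definitions in concurrent_select.py.

```python
from argparse import ArgumentParser


def build_parser():
    """Hypothetical parser covering only the DML-related flags."""
    parser = ArgumentParser()
    parser.add_argument(
        "--generate-dml-queries", action="store_true",
        help="Generate DML queries for the Kudu databases (assumed wording).")
    parser.add_argument(
        "--dml-mod-values", nargs="+", type=int, default=[11],
        help="Mod values used to pick rows where primary_key %% mod_value = 0"
             " (the default here is an assumption).")
    parser.add_argument(
        "--reset-databases-before-binary-search", action="store_true",
        help="If True, databases will be reset to their original state before"
             " the binary search (assumed wording).")
    parser.add_argument(
        "--reset-databases-after-binary-search", action="store_true",
        help="If True, databases will be reset to their original state after"
             " the binary search. Suggested if other tests will run on the"
             " same data (second sentence is an assumed paraphrase).")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([
        "--generate-dml-queries",
        "--dml-mod-values", "11", "13", "17",
        "--reset-databases-after-binary-search",
    ])
    print(args.dml_mod_values)
    print(args.reset_databases_after_binary_search)
```

Using nargs="+" lets the space-separated form from the commit message (--dml-mod-values 11 13 17) parse into a list of integers.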
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Hello Michael Brown, Alex Behm,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/5093

to look at the new patch set (#10).

Change subject: IMPALA-4467: Add support for DML statements in stress test

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 717 insertions(+), 279 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/10
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Michael Brown has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5093/9//COMMIT_MSG
Commit Message:

Line 14:
It would be helpful to have 1-2 examples of a full
concurrent_select.py call with DML statement-related options
included.

http://gerrit.cloudera.org:8080/#/c/5093/9/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS9, Line 1682: help="If True, databases will be reset to their original state after the binary"
              : " search.")
On L1971 you say it "may be a good idea" to use this option. I think
it might help more if this help option says that too, or uses even
stronger language like "it is suggested to use this option if you
plan on running other tests on the same data", or something similar.

PS9, Line 1871: def populate_all_queries(queries):
It's odd for a function of this size to live in a scope of its size.
Can you factor this out into a top level function instead? My main
objection to the design is that queries_with_runtime_info_by_db is
needed in this function but isn't referenced. The function and parent
scope are sufficiently long to make it harder to read.
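The refactoring being requested can be sketched roughly as follows: the nested function becomes a top-level one, and everything it used to capture from main() arrives as an explicit parameter. Only the names populate_all_queries, queries_with_runtime_info_by_db, impala, args and runtime_info_path come from the thread; the dictionary shapes, the mem_limit_mb field and the function body are placeholders invented for this illustration, not the real logic.

```python
def populate_all_queries(queries, impala, args, runtime_info_path,
                         queries_with_runtime_info_by_db):
    """Hypothetical top-level replacement for a function nested in main().

    The dependency on queries_with_runtime_info_by_db is now visible in the
    signature instead of being silently captured from the enclosing scope.
    """
    populated = []
    for query in queries:
        known = queries_with_runtime_info_by_db.get(query["db"], {})
        if query["sql"] in known:
            # Runtime info was collected earlier: reuse it.
            populated.append(known[query["sql"]])
        else:
            # Placeholder for the real measurement step, which would need
            # impala, args and runtime_info_path.
            populated.append(dict(query, mem_limit_mb=None))
    return populated


def main():
    queries = [
        {"db": "tpch_kudu", "sql": "select 1"},
        {"db": "tpch_kudu", "sql": "select 2"},
    ]
    known = {
        "tpch_kudu": {
            "select 1": {"db": "tpch_kudu", "sql": "select 1",
                         "mem_limit_mb": 64},
        },
    }
    # main() stays short: it only wires the pieces together.
    return populate_all_queries(
        queries, impala=None, args=None,
        runtime_info_path="/tmp/runtime_info.json",
        queries_with_runtime_info_by_db=known)


if __name__ == "__main__":
    for entry in main():
        print(entry)
```

The longer parameter list is the cost raised in the author's reply; the benefit argued by the reviewer is that main() shrinks and the function's inputs can be read off its signature.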
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has uploaded a new patch set (#8).

Change subject: IMPALA-4467: Add support for DML statements in stress test

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 691 insertions(+), 269 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/8
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Hello Michael Brown, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5093 to look at the new patch set (#8). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries only modify the non-original table; the original table is never modified. The original tables can be used to bring the non-original tables back to their initial state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML queries are generated. The DML operations touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the fewer rows are affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries only for Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database.
Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 691 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/8 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
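The query generation described in the commit message can be sketched roughly as follows. This is an illustration only, not the code in tests/stress/concurrent_select.py: the helper name `generate_dml_queries`, its parameters, the example table `orders`, and the exact SQL text are assumptions. Only the `primary_key % mod_value = 0` predicate, the four statement kinds, and the use of the `_original` table as the data source come from the change description, and the statements have not been validated against Impala.

```python
def generate_dml_queries(table_name, primary_key, updatable_columns, mod_values):
    """Build 4 DML statements per mod value (hypothetical helper, not the real code)."""
    queries = {}
    for mod_value in mod_values:
        # Only rows whose primary key is a multiple of mod_value are touched.
        predicate = "{0} % {1} = 0".format(primary_key, mod_value)
        # Assumed shape of the update list: copy each column back from the
        # untouched _original table.
        update_list = ", ".join(
            "{0} = b.{0}".format(col) for col in updatable_columns)
        queries[mod_value] = [
            "INSERT INTO {t} SELECT * FROM {t}_original WHERE {p}".format(
                t=table_name, p=predicate),
            "UPSERT INTO {t} SELECT * FROM {t}_original WHERE {p}".format(
                t=table_name, p=predicate),
            "UPDATE a SET {u} FROM {t} a JOIN {t}_original b "
            "ON a.{pk} = b.{pk} WHERE a.{p}".format(
                u=update_list, t=table_name, pk=primary_key, p=predicate),
            "DELETE FROM {t} WHERE {p}".format(t=table_name, p=predicate),
        ]
    return queries


if __name__ == "__main__":
    # Mirrors --dml-mod-values 11 13 17 for one hypothetical single-key table.
    generated = generate_dml_queries(
        "orders", "o_orderkey", ["o_totalprice"], [11, 13, 17])
    for mod_value in sorted(generated):
        for statement in generated[mod_value]:
            print(statement)
```

Because the insert, upsert, and update statements read from the `_original` copy, the modified table can always be rebuilt from it, which is what the two reset flags rely on.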
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Alex Behm has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 7: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG Commit Message: Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML > In order to calculate %rows, we don't actually need to know how many rows a Ahh yes, you are right. Thanks for clarifying. Not sure what I was thinking. I was clearly wrong. -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has uploaded a new patch set (#7). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries only modify the non-original table; the original table is never modified. The original tables can be used to bring the non-original tables back to their initial state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML queries are generated. The DML operations touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the fewer rows are affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries only for Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database.
Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 689 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/7 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG Commit Message: Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML > I agree you can achieve the same thing with mod values in principle. Let me In order to calculate %rows, we don't actually need to know how many rows are in the table. %rows = 100 / mod_value. This is why I think it still makes sense to keep the mod_values. Another way to think of mod_value is if mod_value = N, then the DML statement should affect every Nth row. -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
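The relationship stated in the comment above, %rows = 100 / mod_value, can be checked with a short sketch. The function names here are made up for illustration, and the result only holds under the assumption that the primary key is a dense, evenly spread integer sequence.

```python
def percent_rows_affected(mod_value):
    """Approximate share of rows touched, per %rows = 100 / mod_value."""
    if mod_value < 1:
        raise ValueError("mod_value must be a positive integer")
    return 100.0 / mod_value


def keys_touched(primary_keys, mod_value):
    """Return the keys matching the predicate primary_key % mod_value = 0."""
    return [key for key in primary_keys if key % mod_value == 0]


if __name__ == "__main__":
    # Hypothetical table with dense keys 1..1000 and the mod values
    # used in the commit message example.
    keys = range(1, 1001)
    for mod_value in (11, 13, 17):
        touched = keys_touched(keys, mod_value)
        print(mod_value, len(touched), round(percent_rows_affected(mod_value), 2))
```

A larger mod value selects every Nth row for a larger N, so it touches fewer rows, not more.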
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5093/6/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: Line 1471: # TODO: IMPALA-: Add support for tables with multiple primary keys. > fill in real JIRA or just leave the TODO Yes, I'll fill in the Jira. Michael wants every TODO to have a Jira attached. -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Alex Behm has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5093/6/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: Line 1471: # TODO: IMPALA-: Add support for tables with multiple primary keys. fill in real JIRA or just leave the TODO -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has uploaded a new patch set (#6). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries only modify the non-original table; the original table is never modified. The original tables can be used to bring the non-original tables back to their initial state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML queries are generated. The DML operations touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the fewer rows are affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries only for Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database.
Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 689 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/6 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Hello Michael Brown, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5093 to look at the new patch set (#6). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries only modify the non-original table; the original table is never modified. The original tables can be used to bring the non-original tables back to their initial state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML queries are generated. The DML operations touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the fewer rows are affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries only for Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database.
Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 689 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/6 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5093/4/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: Line 1481: "UPDATE a SET {update_list} FROM {table_name} a JOIN {table_name}_original b " > To me it's not really about validating the results, but more about predicta Done. Skipping creation of DML queries for tables with more than 1 primary key column. -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
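The resolution above (no DML queries for tables with more than one primary key column) amounts to a simple filter. The sketch below is hypothetical: the `Table` record, its `primary_key_names` field, and the function name are assumptions for illustration and are not taken from concurrent_select.py.

```python
from collections import namedtuple

# Hypothetical table description; the real stress test has its own table objects.
Table = namedtuple("Table", ["name", "primary_key_names"])


def tables_eligible_for_dml(tables):
    """Keep only tables with exactly one primary key column."""
    eligible = []
    for table in tables:
        if len(table.primary_key_names) != 1:
            # With several key columns, "pk % mod_value = 0" on one column
            # could match far more rows than the mod value suggests, so skip.
            continue
        eligible.append(table)
    return eligible


if __name__ == "__main__":
    tables = [
        Table("orders", ["o_orderkey"]),
        Table("lineitem", ["l_orderkey", "l_linenumber"]),
    ]
    print([table.name for table in tables_eligible_for_dml(tables)])
```

Being explicit about the limitation keeps the amount of work per mod value predictable, which was the reviewer's main concern.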
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Alex Behm has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5093/5/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: Line 745: self.population_order = 0 Much clearer, thanks! -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Alex Behm has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 4: (3 comments)
http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG Commit Message:
Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML
> Actually a mod value is equivalent to %rows:
I agree you can achieve the same thing with mod values in principle. Let me rephrase my points:
* Users of this test may not know the number of rows in the target table up front. So before I can begin to run this test, I must first look up the number of rows and then compute the mod values to achieve a desired %rows.
* A single mod value does not represent the same %rows for different tables (which could have a different number of rows). My understanding is that we run on multiple test tables with the same mod values.
* Just like you said, if users were allowed to specify %rows, the framework could internally translate that into a mod value based on the #rows of the table(s). Seems easier for users.
* Further, the "concept" of %rows would still apply even for tables with sparse primary keys, or where there are multiple primary-key columns. The internal mechanism for translating %rows into predicates would be different, of course, but the concept of "mod values" does not seem very intuitive for those cases.
We definitely don't need to do this now, but it might be worth recording the above improvement in a JIRA.
http://gerrit.cloudera.org:8080/#/c/5093/4/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py:
Line 1429: cursor.execute("SHOW CREATE TABLE " + table.name)
> Yes, to my knowledge at this time we only use Kudu tables with a simple has
Thanks. You can look at AnalyzeDDLTest#TestCreateManagedKuduTable for examples of more advanced partitioning schemes.
Line 1481: "UPDATE a SET {update_list} FROM {table_name} a JOIN {table_name}_original b "
> Maybe it's okay to keep it as is? It can potentially result in many rows ha
To me it's not really about validating the results, but more about predictability of the test's behavior. As a user, when I provide a list of mod values as an input I have a certain expectation of the "work" that those translate to. For tables with several primary-key columns this update (and the delete/upsert below) may be modifying far more rows than I expected based on the mod values I gave. Also consider that the join in this update could really blow up. What's the benefit of leaving it as is? I think it would be better to be explicit about the limitations. Adding a check here seems easy enough.
-- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
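The translation suggested in the comment above, from a user-facing %rows to an internal mod value, might look like the sketch below. It is not part of the change under review: `mod_value_for_percent` is a made-up name, and the mapping only approximates the requested share when the primary key is a single, dense integer column. Sparse or multi-column keys would need a different mechanism, as the comment notes.

```python
def mod_value_for_percent(percent_rows):
    """Pick the mod value whose 1/N share of rows is closest to percent_rows."""
    if not 0 < percent_rows <= 100:
        raise ValueError("percent_rows must be in (0, 100]")
    # Every Nth row is roughly (100 / N) percent of a densely keyed table.
    return max(1, int(round(100.0 / percent_rows)))


if __name__ == "__main__":
    for percent in (1, 9, 10, 50, 100):
        mod_value = mod_value_for_percent(percent)
        # Print the requested share, the chosen mod value, and the achieved share.
        print(percent, mod_value, round(100.0 / mod_value, 2))
```

Only shares of the form 100 / N can be hit exactly, so a requested 9% becomes mod value 11, which touches roughly 9.09% of rows.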
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries only modify the non-original table; the original table is never modified. The original tables can be used to bring the non-original tables back to their initial state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML queries are generated. The DML operations touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the fewer rows are affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries only for Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database.
Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 684 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/5 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Hello Michael Brown, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5093 to look at the new patch set (#5). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries only modify the non-original table; the original table is never modified. The original tables can be used to bring the non-original tables back to their initial state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML queries are generated. The DML operations touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the fewer rows are affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries only for Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database.
Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 684 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/5 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Hello Michael Brown, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5093 to look at the new patch set (#5). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries only modify the non-original table; the original table is never modified. The original tables can be used to bring the non-original tables back to their initial state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML queries are generated. The DML operations touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the fewer rows are affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries only for Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database.
Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 676 insertions(+), 262 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/5 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Michael Brown has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries only modify the non-original table; the original table is never modified. The original tables can be used to bring the non-original tables back to their initial state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML queries are generated. The DML operations touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the fewer rows are affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries only for Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database.
Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 652 insertions(+), 262 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/4 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/5093/2/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: PS2, Line 1464: insert_query.modifies_table = False > It's possible updatale_column_names is a bad name. In any case, it currentl I think it would be more clear to leave it as is. http://gerrit.cloudera.org:8080/#/c/5093/3/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: PS3, Line 1461: this will still be still > Some extra words here. Done PS3, Line 1657: quereis > spelling: queries Done -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Hello Michael Brown, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5093 to look at the new patch set (#4). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries only modify the non-original table; the original table is never modified. The original tables can be used to bring the non-original tables back to their initial state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value, 4 DML queries are generated. The DML operations touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the fewer rows are affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries only for Kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database.
Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 652 insertions(+), 262 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/4 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to be able to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries will only make modifications to the non original table, the original table will never be modified. The orignal tables could be used to bring the non-original table to the inital state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value 4 DML queries are generated. The DML operations will touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the more rows would be affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries for only kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database. 
Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 652 insertions(+), 262 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/3
--
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Taras Bobrovytsky
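As a rough illustration of the generation scheme described in the commit message, the sketch below builds four statements per mod value for a single table. It is not the code in concurrent_select.py: the function name generate_dml_queries, its parameters, the choice of a single-column primary key, and the exact SQL text are all assumptions made for this example, and the SQL has not been checked against Impala.

```python
def generate_dml_queries(table, primary_key, update_column, mod_values):
    """Build illustrative insert/upsert/update/delete statements per mod value.

    Hypothetical helper: the names and SQL shapes are assumptions, not the
    stress test's real output. Data for insert, upsert, and update is read
    from the '<table>_original' copy, which is never modified.
    """
    original = table + "_original"
    queries = {}
    for mod in mod_values:
        # Rows are selected by primary_key % mod_value = 0.
        predicate = f"{primary_key} % {mod} = 0"
        queries[mod] = {
            "insert": f"INSERT INTO {table} SELECT * FROM {original} WHERE {predicate}",
            "upsert": f"UPSERT INTO {table} SELECT * FROM {original} WHERE {predicate}",
            "update": (
                f"UPDATE {table} SET {update_column} = o.{update_column} "
                f"FROM {table} JOIN {original} o "
                f"ON {table}.{primary_key} = o.{primary_key} "
                f"WHERE o.{predicate}"
            ),
            "delete": f"DELETE FROM {table} WHERE {predicate}",
        }
    return queries


if __name__ == "__main__":
    generated = generate_dml_queries("lineitem", "l_orderkey", "l_comment", [11, 13, 17])
    for mod, statements in sorted(generated.items()):
        for kind, sql in statements.items():
            print(mod, kind, sql)
```

With --dml-mod-values 11 13 17 this shape yields twelve statements per table, four for each mod value.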
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Michael Brown has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

Patch Set 2:

(26 comments)

http://gerrit.cloudera.org:8080/#/c/5093/2//COMMIT_MSG
Commit Message:

Line 8:
It would help to be more expansive in the commit message here. How would someone run DML in the stress test? Do you want to show some useful usage? What assumptions have you made? Etc.

PS2, Line 11: - Update impyla version in order to be able to have access to query
: error text for DML queries.
Did you regression test the Impyla update? We have system tests in Kudu that use it. I can say I smoke tested Impyla 0.14.0 on the random query generator and data generator and it seems OK.

PS2, Line 13: - Made flake8 fixes. flake8 on this file is clean.
It looks great. Thank you!

http://gerrit.cloudera.org:8080/#/c/5093/2/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS2, Line 279: self._select_probability = 0.5
Nit: I think this should default to None, because the interface to run_queries() requires this to be set anyway. I think setting this to None would find bugs where we missed setting it to a valid value. With defaulting to 0.5, you hide that.

PS2, Line 331: queries have completed. 'select_probabilty'
1. spelling: probability
2. Please mention valid values/types for select_probability
3. Please document verify_results

PS2, Line 725: # set_up_sql accoplishes this task.
spelling: accomplishes

PS2, Line 735: # If we run this query on a table in initial state, the table remains unchanged if
: # this is False. (For example running a query like
: # "upsert into lineitem select * from lineitem_original" leaves lineitem unmodified if
: # it is in original state.)
: self.modifies_table = False
Be a little more explicit here. This claim presumes lineitem's data was already copied into lineitem_original, right?
Out of that context, this is confusing, as is the code below where Insert and Upsert have modifies_table set to False.

PS2, Line 740: # Type of query. Can have the following values: SELECT, COMPUTE_STATS, INSERT, UPDATE,
: # UPSERT, DELETE.
: self.query_type = 'SELECT'
Non-blocking suggestion: instead of raw strings, create a class that has these as enumerated values.

  class QueryType(object):
    SELECT, COMPUTE_STATS, ... = xrange(6)

The reason for this is that if you mistype one of the identifiers, Python will fail with a NameError and find your bug. If you mistype a raw string, Python will just run the string comparison anyway. It lets bugs be found more directly.

PS2, Line 794: if run_set_up and query.set_up_sql:
If only one of these is true, is that a programming error? Do we need to assert if that's the case?

PS2, Line 1061: # TODO:
It would be good if TODOs were bound to Jiras. Can you file one, please?

PS2, Line 1411: _orginal
spelling: original

PS2, Line 1417: set(table.name for table in tables):
What is the purpose of creating a set here?

PS2, Line 1427: """Generate insert, upsert, update, delete DML statements.
:
: For each table in the database that cursor is connected to, create several queries,
: one for each mod value in 'dml_mod_values'. This value controls which rows will be
: affected. The generated queries assume that for each table in the database, there
: exists a table with a '_original' suffix that has unmodified, for example, tpch data.
: """
Non-blocking comment.

TL;DR: Move L1429-L1432 3 spaces to the left.

Details: I'm not a fan of this form of docstring for OCD aesthetic reasons, and I see a high amount of it in tests/comparison, so it's on my mind lately.
I went to look at PEP-257 and saw that the valid multiline forms are:

"""first line
[required empty line that Gerrit does not preserve]
other lines
"""

or

"""
first line
[required empty line that Gerrit does not preserve]
other lines
"""

The reason this is so is that the way object.__doc__ gets rendered depends on line number. For all lines in a docstring except the first, leading space relative to the indentation is preserved. So the way it's done in much of our Python code base is wrong:

>>> def function():
...   """This is my docstring
...
...      on multiple lines
...      """
...   pass
...
>>> print function.__doc__
This is my docstring

     on multiple lines

>>>

https://www.python.org/dev/peps/pep-0257/#handling-docstring-indentation

PS2, Line 1434: tables = [cursor.describe_table(t) for t in cursor.list_table_names()]
Save an indent level after L1437 and change this expression to only include describe(t) if t.endswith("_original")
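The enumerated query-type suggestion made earlier in this review (PS2, Line 740) could look roughly like the sketch below. It is written for Python 3, so it uses range() where the review comment used xrange(). The member names come from the quoted code comment, while the helpers is_dml and typo_is_caught are hypothetical and exist only to show the behavior. Note that in this form a mistyped member name is reported as an AttributeError when the attribute is looked up on the class.

```python
class QueryType:
    """Illustrative enumeration of the query types named in the review comment."""
    SELECT, COMPUTE_STATS, INSERT, UPDATE, UPSERT, DELETE = range(6)


def is_dml(query_type):
    """Hypothetical helper: True for the four DML query types."""
    return query_type in (
        QueryType.INSERT, QueryType.UPDATE, QueryType.UPSERT, QueryType.DELETE)


def typo_is_caught():
    """Show that a mistyped member fails loudly instead of comparing unequal."""
    try:
        QueryType.UPSRT  # deliberate typo for UPSERT
    except AttributeError:
        return True
    return False


if __name__ == "__main__":
    print(is_dml(QueryType.UPSERT))   # a DML type
    print(is_dml(QueryType.SELECT))   # not a DML type
    print(typo_is_caught())           # the typo raises instead of passing silently
    print("UPSRT" == "UPSERT")        # a mistyped raw string just compares unequal
```

The design point is the one the reviewer makes: with raw strings, a typo silently produces a comparison that is never true, whereas a typo in an attribute name stops the program at the line that contains it.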
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: IMPALA-4467: Add support for DML statements in stress test
..

IMPALA-4467: Add support for DML statements in stress test

- Add support for insert, upsert, update and delete statements.
- Add support for compute stats with mt_dop query options.
- Update impyla version in order to be able to have access to query error text for DML queries.
- Made flake8 fixes. flake8 on this file is clean.

Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
---
M infra/python/deps/requirements.txt
M tests/stress/concurrent_select.py
M tests/util/parse_util.py
3 files changed, 564 insertions(+), 206 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/2
--
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Taras Bobrovytsky