Hello Bharath Vissapragada, Michael Ho, Quanlong Huang, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14012

to look at the new patch set (#3).

Change subject: IMPALA-4551: Limit the size of SQL statements
......................................................................

IMPALA-4551: Limit the size of SQL statements

Various BI tools generate and run SQL. When used incorrectly or
misconfigured, the tools can generate extremely large SQL statements.
Some of these statements reach tens of megabytes. Large SQL statements
impose costs throughout execution, including statement rewrite logic
in the frontend and codegen in the backend. The resource usage of
these statements can impact the stability of the system or the ability
to run other SQL statements.

This implements two new query options that provide controls to reject
large SQL statements.
 - The first, MAX_STATEMENT_LENGTH_BYTES, is a cap on the total size
   of the SQL statement (in bytes). It is applied before any parsing
   or analysis. It uses a default value of 16MB.
 - The second, STATEMENT_EXPRESSION_LIMIT, is a limit on the total
   number of expressions in a statement or any views that it
   references. The limit is applied upon the first round of analysis,
   but it is not reapplied when statement rewrite rules are applied.
   Certain expressions such as literals in IN lists or VALUES clauses
   are not analyzed and do not count towards the limit. It uses a
   default value of 250,000.

The two are complementary. Since enforcing the statement expression
limit requires parsing and analyzing the statement,
MAX_STATEMENT_LENGTH_BYTES sets an upper bound on the size of the
statement that needs to be parsed and analyzed. Testing confirms that
even statements approaching 16MB get through the first round of
analysis within a few seconds and then are rejected.

This also changes the logging in tests/common/impala_connection.py to
limit the total SQL size that it will print to 128KB. This prevents
the JUnitXML (which includes this logging) from being too large.
Existing tests do not run SQL larger than about 80KB, so this only
applies to tests added in this change that run multi-MB SQL statements
to verify limits.

Testing:
 - This adds frontend tests that verify the low-level semantics of how
   expressions are counted and verify that the expression limits are
   enforced.
 - This adds end-to-end tests that verify both
   MAX_STATEMENT_LENGTH_BYTES and STATEMENT_EXPRESSION_LIMIT at their
   default values.
 - There is also an end-to-end test that runs in exhaustive mode and
   runs a SQL statement with close to 250,000 expressions.

Change-Id: I5675fb4a08c1dc51ae5bcf467cbb969cc064602c
---
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M tests/common/impala_connection.py
M tests/query_test/test_exprs.py
13 files changed, 377 insertions(+), 9 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/14012/3
--
To view, visit http://gerrit.cloudera.org:8080/14012
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5675fb4a08c1dc51ae5bcf467cbb969cc064602c
Gerrit-Change-Number: 14012
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
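
For readers skimming the commit message, here is a rough sketch of how the two limits relate. It is not the code in this patch: the names admit_statement, StatementRejected, and count_expressions are hypothetical, 16MB is assumed to mean 16 * 1024 * 1024 bytes, and the comparisons are assumed to be strict. The only point it illustrates is the ordering described above: the byte cap is checked before any parsing, so the more expensive expression count only runs on statements that are already bounded in size.

```python
# Illustrative sketch only; names and exact comparison semantics are assumptions.
MAX_STATEMENT_LENGTH_BYTES = 16 * 1024 * 1024  # "16MB" default, assumed binary MB
STATEMENT_EXPRESSION_LIMIT = 250000            # default from the commit message


class StatementRejected(Exception):
    """Hypothetical error raised when a statement exceeds one of the limits."""


def admit_statement(sql, count_expressions,
                    max_bytes=MAX_STATEMENT_LENGTH_BYTES,
                    expr_limit=STATEMENT_EXPRESSION_LIMIT):
    """Apply the byte cap first, then the expression limit.

    count_expressions is a caller-supplied function standing in for the
    parse-and-analyze step; it is only invoked if the byte cap passes.
    """
    # Stage 1: cheap size check, done before any parsing or analysis.
    size = len(sql.encode("utf-8"))
    if size > max_bytes:
        raise StatementRejected(
            "statement is %d bytes, limit is %d" % (size, max_bytes))

    # Stage 2: expression count, which requires analyzing the statement.
    num_exprs = count_expressions(sql)
    if num_exprs > expr_limit:
        raise StatementRejected(
            "statement has %d expressions, limit is %d" % (num_exprs, expr_limit))

    return size, num_exprs
```

A caller would pass its own counting function, for example admit_statement(sql, my_counter); an oversized statement is rejected without my_counter ever being called.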