Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/24018 )

Change subject: IMPALA-14768: Add the operation type to the lineage graph
......................................................................

IMPALA-14768: Add the operation type to the lineage graph

This patch makes Impala produce the operation type of the completed
query in the corresponding lineage event so that it would be easier for
data lineage tools like Apache Atlas to derive the operation type of a
given query.

Note that currently Apache Atlas determines the operation type of a
given Impala query by matching the field of 'queryText' in the lineage
event against predefined regular expressions. Refer to
https://github.com/apache/atlas/blob/2957ff2/addons/impala-bridge/src/main/java/org/apache/atlas/impala/hook/ImpalaOperationParser.java#L49-L77
for more details.

However, such an approach is not robust. Recall that the string in
'queryText' is produced by Impala server replacing each newline in the
original query string with a space, which is followed by redaction.
Thus, 'queryText' may not be a valid SQL statement afterward.

    string stmt = replace_all_copy(query_ctx->client_request.stmt, "\n", " ");
    Redact(&stmt);
    // 'redacted_stmt' will be the string Impala uses to populate
    // 'queryText' of the lineage event.
    query_ctx->client_request.__set_redacted_stmt((const string) stmt);

For instance, when the original query string contains a one-line SQL
comment, it could be difficult for one to decide where that one-line SQL
comment ends if every newline in the original query string is already
replaced with a space.

Therefore, after this patch, it would be much easier for data lineage
tools to determine the operation type since it will be directly provided
in the lineage log.

On the other hand, apart from the field of 'operationType_', this patch
also makes PlannerTest#testLineage() check the field of 'queryStr_' of
ColumnLineageGraph when testLineage() compares the actual lineage graph
with the expected one in lineage.test run in the frontend test.
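To illustrate the one-line comment problem described in the commit message, here is a minimal standalone sketch. It is not Impala or Atlas code: FlattenNewlines() and StripLineComments() are hypothetical helpers written only for this example, redaction is left out, and the comment stripper ignores string literals and block comments for brevity.

```cpp
#include <algorithm>
#include <string>

// Hypothetical stand-in for the newline replacement quoted above: every
// newline becomes a space. Redaction is deliberately omitted.
std::string FlattenNewlines(std::string stmt) {
  std::replace(stmt.begin(), stmt.end(), '\n', ' ');
  return stmt;
}

// Hypothetical helper a lineage tool might use before pattern matching:
// removes "--" one-line comments, assuming each comment runs to the next
// newline or to the end of the string. The terminating newline is kept.
std::string StripLineComments(const std::string& stmt) {
  std::string out;
  std::string::size_type i = 0;
  while (i < stmt.size()) {
    if (stmt.compare(i, 2, "--") == 0) {
      std::string::size_type end = stmt.find('\n', i);
      if (end == std::string::npos) break;  // comment runs to end of input
      i = end;                              // resume at the newline
    } else {
      out.push_back(stmt[i]);
      ++i;
    }
  }
  return out;
}
```

With the input "-- note\nINSERT OVERWRITE t SELECT 1", stripping comments from the original string leaves the INSERT OVERWRITE statement intact. Stripping them after FlattenNewlines() yields an empty string, because the comment no longer has a newline to end on and swallows the statement. A consumer of 'queryText' is in the second situation, which is why an explicit operation type in the lineage event is more dependable than parsing the text.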
Testing:
 - Added a new test case to lineage.test run in end-to-end test to show
   that Impala could produce a lineage event for INSERT OVERWRITE.
 - Updated lineage.test run in end-to-end and frontend tests to make
   sure each lineage event comes with its respective operation type.

Change-Id: Icb94120a9bb1b994d4e681ea98521035bcc6510e
Reviewed-on: http://gerrit.cloudera.org:8080/24018
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/util/lineage-util.h
M common/thrift/LineageGraph.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ParsedStatement.java
M fe/src/main/java/org/apache/impala/analysis/ParsedStatementImpl.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteParsedStatement.java
M testdata/workloads/functional-planner/queries/PlannerTest/lineage.test
M testdata/workloads/functional-query/queries/QueryTest/lineage.test
14 files changed, 458 insertions(+), 14 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/24018
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icb94120a9bb1b994d4e681ea98521035bcc6510e
Gerrit-Change-Number: 24018
Gerrit-PatchSet: 9
Gerrit-Owner: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Daniel Vanko <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
