Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/24018

to look at the new patch set (#4).

Change subject: IMPALA-14768: Add the operation type to the lineage graph
......................................................................

IMPALA-14768: Add the operation type to the lineage graph

This patch makes Impala include the operation type of the completed
query in the corresponding lineage event so that data lineage tools
like Apache Atlas can derive the operation type of a given query more
easily. Currently, Apache Atlas determines the operation type of a
given Impala query by matching the 'queryText' field of the lineage
event against predefined regular expressions. Refer to
https://github.com/apache/atlas/blob/2957ff2/addons/impala-bridge/src/main/java/org/apache/atlas/impala/hook/ImpalaOperationParser.java#L49-L77
for more details.

However, such an approach is not robust. Recall that the Impala server
produces the string in 'queryText' by replacing each newline in the
original query string with a space and then redacting the result.
Thus, 'queryText' may no longer be a valid SQL statement.

  string stmt =
    replace_all_copy(query_ctx->client_request.stmt, "\n", " ");
  Redact(&stmt);
  // 'redacted_stmt' will be the string Impala uses to populate
  // 'queryText' of the lineage event.
  query_ctx->client_request.__set_redacted_stmt((const string) stmt);

For instance, when the original query string contains a one-line SQL
comment, it can be difficult to decide where that comment ends once
every newline in the original query string has been replaced with a
space.
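
The ambiguity can be illustrated with a minimal, self-contained sketch.
It is not Impala or Atlas code: FlattenNewlines() and FirstKeyword() are
hypothetical helpers, the sample query is made up, and redaction is
omitted. FlattenNewlines() mimics only the newline replacement shown
above, and FirstKeyword() is a naive keyword detector that treats a "--"
comment as ending at the next newline.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: mimics only the newline-to-space replacement step.
std::string FlattenNewlines(std::string stmt) {
  std::replace(stmt.begin(), stmt.end(), '\n', ' ');
  return stmt;
}

// Hypothetical helper: skips leading whitespace and one-line "--" comments,
// then returns the first alphabetic keyword in upper case. Returns an empty
// string if a comment never ends or no keyword is found.
std::string FirstKeyword(const std::string& stmt) {
  std::string::size_type i = 0;
  while (i < stmt.size()) {
    if (std::isspace(static_cast<unsigned char>(stmt[i]))) {
      ++i;
    } else if (stmt.compare(i, 2, "--") == 0) {
      // A one-line comment ends at the next newline, if there is one.
      std::string::size_type eol = stmt.find('\n', i);
      if (eol == std::string::npos) return "";
      i = eol + 1;
    } else {
      break;
    }
  }
  std::string::size_type j = i;
  while (j < stmt.size() && std::isalpha(static_cast<unsigned char>(stmt[j]))) {
    ++j;
  }
  std::string keyword = stmt.substr(i, j - i);
  std::transform(keyword.begin(), keyword.end(), keyword.begin(),
                 [](unsigned char c) { return std::toupper(c); });
  return keyword;
}

// Made-up example query with a leading one-line comment.
void DemonstrateAmbiguity() {
  const std::string original =
      "-- nightly load\ninsert overwrite t select * from s";
  // With the newline intact, the comment ends before the statement.
  assert(FirstKeyword(original) == "INSERT");
  // Once flattened, the comment appears to swallow the whole statement.
  assert(FirstKeyword(FlattenNewlines(original)).empty());
}
```

In this sketch the flattened text gives the detector no way to tell
where the comment stops, which is the kind of guesswork an explicit
operation type in the lineage event avoids.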

Therefore, after this patch, data lineage tools can determine the
operation type much more easily, since it is provided directly in the
lineage log.

Testing:
 - Added a new test case to lineage.test to show that Impala produces
   a lineage event for INSERT OVERWRITE.
 - Updated lineage.test to make sure each lineage event comes with its
   respective operation type.

Change-Id: Icb94120a9bb1b994d4e681ea98521035bcc6510e
---
M be/src/util/lineage-util.h
M common/thrift/LineageGraph.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ParsedStatement.java
M fe/src/main/java/org/apache/impala/analysis/ParsedStatementImpl.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteParsedStatement.java
M testdata/workloads/functional-query/queries/QueryTest/lineage.test
12 files changed, 378 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/24018/4
--
To view, visit http://gerrit.cloudera.org:8080/24018
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb94120a9bb1b994d4e681ea98521035bcc6510e
Gerrit-Change-Number: 24018
Gerrit-PatchSet: 4
Gerrit-Owner: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
