Fang-Yu Rao created IMPALA-14768:
------------------------------------

             Summary: Add operation type to the lineage graph
                 Key: IMPALA-14768
                 URL: https://issues.apache.org/jira/browse/IMPALA-14768
             Project: IMPALA
          Issue Type: Task
          Components: Frontend
            Reporter: Fang-Yu Rao
            Assignee: Fang-Yu Rao


Currently, a lineage event logged by Impala does not include any 
information about the operation type.
{code}
{"queryText":"create table test_db_01.test_tbl_01 (id int)","queryId":"b44da06a10682ce9:286bd74300000000","hash":"7debad31b299d7cccdf78a67968eb39d","user":"[email protected]","timestamp":1771622004,"endTime":1771622005,"edges":[],"vertices":[]}
{code}
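
To make the gap concrete, here is a minimal Python sketch that parses the event above and lists its top-level keys. The event text is copied from this report with the wrapped line rejoined; the variable names are only illustrative.

```python
import json

# Lineage event from this report, with the wrapped line rejoined.
# The "user" value is kept exactly as it appears in the archived message.
event_text = (
    '{"queryText":"create table test_db_01.test_tbl_01 (id int)",'
    '"queryId":"b44da06a10682ce9:286bd74300000000",'
    '"hash":"7debad31b299d7cccdf78a67968eb39d",'
    '"user":"[email protected]",'
    '"timestamp":1771622004,"endTime":1771622005,'
    '"edges":[],"vertices":[]}'
)

event = json.loads(event_text)
top_level_keys = sorted(event)
print(top_level_keys)

# None of the keys names the kind of statement, so a consumer has only
# queryText to work from.
has_operation_hint = any("operation" in key.lower() for key in event)
print(has_operation_hint)
```

A consumer that needs the operation type therefore has to infer it from queryText.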

However, some lineage event processing tools, e.g., Atlas, require this piece 
of information. To derive it, tools like 
https://github.com/apache/atlas/blob/master/addons/impala-bridge/src/main/java/org/apache/atlas/impala/hook/ImpalaLineageHook.java
 rely on the regular expressions in 
https://github.com/apache/atlas/blob/14246fe/addons/impala-bridge/src/main/java/org/apache/atlas/impala/hook/ImpalaOperationParser.java#L30-L65
 applied to the query text of the logged lineage event. But such regular 
expressions cannot determine the operation type in all cases, for example 
when the SQL statement contains a one-line comment.
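
The failure mode can be sketched in Python as follows. The patterns and operation names below are hypothetical stand-ins written for this illustration; they are not copied from Atlas's ImpalaOperationParser, which may behave differently in detail.

```python
import re

# Hypothetical prefix-anchored patterns, for illustration only.
# These are NOT the patterns used by Atlas.
PATTERNS = [
    ("CREATE_TABLE", re.compile(r"^\s*create\s+table\b", re.IGNORECASE)),
    ("CREATE_VIEW", re.compile(r"^\s*create\s+view\b", re.IGNORECASE)),
    ("INSERT", re.compile(r"^\s*insert\b", re.IGNORECASE)),
]


def classify(query_text):
    """Return the first operation whose pattern matches the start of the text."""
    for name, pattern in PATTERNS:
        if pattern.match(query_text):
            return name
    return "UNKNOWN"


plain = "create table test_db_01.test_tbl_01 (id int)"
# Same statement, preceded by a one-line comment.
commented = "-- nightly setup\ncreate table test_db_01.test_tbl_01 (id int)"

print(classify(plain))      # CREATE_TABLE
print(classify(commented))  # UNKNOWN
```

Both inputs are the same statement, yet only the first is classified, because the comment moves the keyword away from the position the pattern is anchored to.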

One solution to the aforementioned issue is to make sure the query text of a 
lineage event is a valid SQL statement (IMPALA-14741).

An alternative is for Impala to add a field to its lineage graph that 
indicates the operation type. Once Impala is able to log the operation type in a 
lineage event, 
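
As a rough sketch of how a consumer might use such a field, the Python snippet below prefers an explicit value and falls back to text matching only when it is absent. The field name `operationType`, the operation names, and the fallback pattern are all assumptions made for this example; the issue does not specify any of them.

```python
import re

# Hypothetical field name; the issue does not say what the field will be called.
OPERATION_FIELD = "operationType"

# Hypothetical fallback pattern for events that lack the field.
FALLBACK = re.compile(r"^\s*(create\s+table|create\s+view|insert)\b", re.IGNORECASE)


def operation_of(event):
    """Prefer the explicit field; otherwise guess from the query text."""
    explicit = event.get(OPERATION_FIELD)
    if explicit:
        return explicit
    match = FALLBACK.match(event.get("queryText", ""))
    if match is None:
        return "UNKNOWN"
    # Normalize e.g. "create   table" to "CREATE_TABLE".
    return re.sub(r"\s+", "_", match.group(1)).upper()


commented_sql = "-- nightly setup\ncreate table t (id int)"
without_field = {"queryText": commented_sql}
with_field = {"queryText": commented_sql, OPERATION_FIELD: "CREATE_TABLE"}

print(operation_of(without_field))  # UNKNOWN
print(operation_of(with_field))     # CREATE_TABLE
```

With an explicit field, the result no longer depends on how the query text happens to be written.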



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
