[
https://issues.apache.org/jira/browse/IMPALA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fang-Yu Rao updated IMPALA-14768:
---------------------------------
Description:
Currently, a lineage event log produced by Impala does not include the
operation type of the statement.
{code}
{"queryText":"create table test_db_01.test_tbl_01 (id int)","queryId":"b44da06a10682ce9:286bd74300000000","hash":"7debad31b299d7cccdf78a67968eb39d","user":"[email protected]","timestamp":1771622004,"endTime":1771622005,"edges":[],"vertices":[]}
{code}
However, some lineage event processing tools, e.g., Atlas, require this piece
of information. Tools like
https://github.com/apache/atlas/blob/master/addons/impala-bridge/src/main/java/org/apache/atlas/impala/hook/ImpalaLineageHook.java
rely on regular expressions in
https://github.com/apache/atlas/blob/14246fe/addons/impala-bridge/src/main/java/org/apache/atlas/impala/hook/ImpalaOperationParser.java#L30-L65
to derive the operation type of the logged lineage event. But such regular
expressions cannot determine the operation type in all cases. One example is
a SQL statement that contains a one-line comment.
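The failure mode can be sketched as follows. This is a minimal illustration in
Python, not the actual Atlas code: the pattern CREATE_TABLE and the helper
guess_operation_type are hypothetical, and the real expressions in
ImpalaOperationParser may differ. It only assumes that the pattern is anchored
at the start of the query text.

```python
import re

# Hypothetical pattern (not copied from Atlas): the statement must begin,
# after optional whitespace, with the keywords CREATE TABLE.
CREATE_TABLE = re.compile(r"^\s*create\s+table\b", re.IGNORECASE)


def guess_operation_type(query_text):
    """Return 'CREATETABLE' if the text starts with CREATE TABLE, else 'UNKNOWN'."""
    return "CREATETABLE" if CREATE_TABLE.match(query_text) else "UNKNOWN"


plain = "create table test_db_01.test_tbl_01 (id int)"
# Same statement, preceded by a one-line comment.
commented = "-- nightly job\ncreate table test_db_01.test_tbl_01 (id int)"

print(guess_operation_type(plain))
print(guess_operation_type(commented))
```

With an anchored pattern like this one, the leading comment prevents the match,
so the second statement is not recognized as a CREATE TABLE.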
One solution to the aforementioned issue is to make sure the query text of a
lineage event is a valid SQL statement (IMPALA-14741).
An alternative is for Impala to add a field to its lineage graph that
indicates the operation type. Once Impala logs the operation type in a
lineage event, we could change the logic in the Atlas hook that derives the
operation type.
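A consumer of such an event could then prefer the explicit field and fall back
to parsing the query text only when the field is absent. The sketch below is an
assumption, not an agreed design: the field name "operationType", the value
"CREATETABLE", and the helpers resolve_operation_type and regex_fallback are
all hypothetical, and Impala does not emit such a field today.

```python
import json


def resolve_operation_type(event_line, fallback):
    """Prefer the explicit field; otherwise derive the type from the query text."""
    event = json.loads(event_line)
    # "operationType" is a hypothetical field name, not part of today's lineage log.
    explicit = event.get("operationType")
    if explicit is not None:
        return explicit
    return fallback(event["queryText"])


def regex_fallback(query_text):
    """Hypothetical stand-in for regex-based detection that fails on this input."""
    return "UNKNOWN"


with_field = '{"queryText":"-- note\\ncreate table t (id int)","operationType":"CREATETABLE"}'
without_field = '{"queryText":"-- note\\ncreate table t (id int)"}'

print(resolve_operation_type(with_field, regex_fallback))
print(resolve_operation_type(without_field, regex_fallback))
```

Keeping the fallback lets a consumer continue to process events written by
Impala versions that do not log the field.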
> Add operation type to the lineage graph
> ---------------------------------------
>
> Key: IMPALA-14768
> URL: https://issues.apache.org/jira/browse/IMPALA-14768
> Project: IMPALA
> Issue Type: Task
> Components: Frontend
> Reporter: Fang-Yu Rao
> Assignee: Fang-Yu Rao
> Priority: Major