[ 
https://issues.apache.org/jira/browse/ATLAS-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nixon Rodrigues resolved ATLAS-3655.
------------------------------------
    Resolution: Fixed

The PR has been merged to master and branch-2.0. Thanks to [~vladglinskiy] for the PR.

> Create 'spark_application' type to avoid 'spark_process' from being updated 
> for multiple operations
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-3655
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3655
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>         Attachments: Screenshot from 2020-03-03 16-09-39.png
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Create a 'spark_application' type to prevent 'spark_process' from being updated 
> for multiple operations. Currently, the Spark Atlas Connector uses 
> 'spark_process' as the top-level type for a Spark session, so the same entity is 
> updated for every operation within that session.
> The following statements:
> {code:java}
> spark.sql("create table table_1(col1 int,col2 string)");
> spark.sql("create table table_2 as select * from table_1");
> {code}
> result in the following correct lineage:
> table_1 ------> spark_process1 -------> table_2
> However, executing similar statements in the same Spark session:
> {code:java}
> spark.sql("create table table_3(col1 int,col2 string)"); 
> spark.sql("create table table_4 as select * from table_3");
> {code}
> results in the same 'spark_process' entity being updated, so the lineage now 
> connects all four tables (see the screenshot in the attachments).
>  
> The proposal is to create a 'spark_application' entity and associate all 
> 'spark_process' entities (created within that session) to it.
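
The difference between the two models can be sketched in plain Java. This is only an illustration of the idea in the description, not the Spark Atlas Connector's code or the Atlas API: the classes `ProcessEntity` and `ApplicationEntity`, the method names, and the `session-1` identifier are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LineageSketch {
    // Hypothetical stand-in for a 'spark_process' entity: one lineage node
    // with a set of input tables and a set of output tables.
    static final class ProcessEntity {
        final String name;
        final Set<String> inputs = new LinkedHashSet<>();
        final Set<String> outputs = new LinkedHashSet<>();

        ProcessEntity(String name) { this.name = name; }

        void record(String input, String output) {
            inputs.add(input);
            outputs.add(output);
        }
    }

    // Hypothetical stand-in for the proposed 'spark_application' entity:
    // it groups the processes created within one Spark session.
    static final class ApplicationEntity {
        final String sessionId;
        final List<ProcessEntity> processes = new ArrayList<>();

        ApplicationEntity(String sessionId) { this.sessionId = sessionId; }

        ProcessEntity newProcess(String input, String output) {
            ProcessEntity p = new ProcessEntity("spark_process" + (processes.size() + 1));
            p.record(input, output);
            processes.add(p);
            return p;
        }
    }

    // Behaviour described in the issue: a single process per session,
    // updated by every operation, so unrelated tables end up linked.
    static ProcessEntity sharedProcessPerSession() {
        ProcessEntity p = new ProcessEntity("spark_process1");
        p.record("table_1", "table_2");
        p.record("table_3", "table_4");
        return p;
    }

    // Proposed behaviour: one process per operation, all associated
    // with the same application entity.
    static ApplicationEntity processPerOperation() {
        ApplicationEntity app = new ApplicationEntity("session-1");
        app.newProcess("table_1", "table_2");
        app.newProcess("table_3", "table_4");
        return app;
    }

    public static void main(String[] args) {
        ProcessEntity shared = sharedProcessPerSession();
        System.out.println(shared.name + ": " + shared.inputs + " -> " + shared.outputs);
        for (ProcessEntity p : processPerOperation().processes) {
            System.out.println(p.name + ": " + p.inputs + " -> " + p.outputs);
        }
    }
}
```

In the shared model the single process lists both `table_1` and `table_3` as inputs, which is what makes the lineage graph connect all four tables. With one process per operation, each lineage edge stays separate while the application entity still records that both ran in the same session.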



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
