[ https://issues.apache.org/jira/browse/ATLAS-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nixon Rodrigues resolved ATLAS-3655.
------------------------------------
    Resolution: Fixed

PR merged on master & branch-2.0, thanks [~vladglinskiy] for PR

> Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-3655
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3655
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>         Attachments: Screenshot from 2020-03-03 16-09-39.png
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Create 'spark_application' type to avoid 'spark_process' from being updated
> for multiple operations. Currently, Spark Atlas Connector uses
> 'spark_process' as a top-level type for a Spark session, thus it's being
> updated for multiple operations within the same session.
> The following statements:
> {code:java}
> spark.sql("create table table_1(col1 int,col2 string)");
> spark.sql("create table table_2 as select * from table_1");
> {code}
> result in the next correct lineage:
> table1 ------> spark_process1 -------> table2
> but executing similar statements in the same spark session:
> {code:java}
> spark.sql("create table table_3(col1 int,col2 string)");
> spark.sql("create table table_4 as select * from table_3");
> {code}
> result in the same 'spark_process' being updated and the lineage now connects
> all the 4 tables (see screenshot in the attachments).
>
> The proposal is to create a 'spark_application' entity and associate all
> 'spark_process' entities (created within that session) to it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
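
To illustrate the lineage problem described in the issue, here is a minimal, self-contained sketch in plain Java. It does not use the Atlas or Spark Atlas Connector APIs. The class names (LineageSketch, Process, Application) and the methods (record, sharedProcess, linked) are hypothetical stand-ins invented for this example, and the real 'spark_application' type definition may be structured differently. The sketch only contrasts one process per session with one process per operation grouped under an application.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LineageSketch {

    /** Hypothetical stand-in for a 'spark_process' entity: one lineage node. */
    static final class Process {
        final String name;
        final Set<String> inputs = new LinkedHashSet<>();
        final Set<String> outputs = new LinkedHashSet<>();

        Process(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for the proposed 'spark_application' entity. */
    static final class Application {
        final String sessionId;
        final List<Process> processes = new ArrayList<>();

        Application(String sessionId) {
            this.sessionId = sessionId;
        }

        /** Records one operation as its own process owned by this application. */
        Process record(String input, String output) {
            Process p = new Process(sessionId + "_process_" + (processes.size() + 1));
            p.inputs.add(input);
            p.outputs.add(output);
            processes.add(p);
            return p;
        }
    }

    /** Behaviour described in the issue: one process per session, updated by every operation. */
    static Process sharedProcess(String sessionId, String[][] operations) {
        Process p = new Process(sessionId + "_process");
        for (String[] op : operations) {
            p.inputs.add(op[0]);
            p.outputs.add(op[1]);
        }
        return p;
    }

    /** True if any single process has 'from' as an input and 'to' as an output. */
    static boolean linked(List<Process> processes, String from, String to) {
        for (Process p : processes) {
            if (p.inputs.contains(from) && p.outputs.contains(to)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Each pair is {source table, target table} for one CTAS statement.
        String[][] operations = {{"table_1", "table_2"}, {"table_3", "table_4"}};

        Process shared = sharedProcess("session", operations);

        Application app = new Application("session");
        for (String[] op : operations) {
            app.record(op[0], op[1]);
        }

        System.out.println("shared process links table_1 to table_4: "
                + linked(List.of(shared), "table_1", "table_4"));
        System.out.println("per-operation processes link table_1 to table_4: "
                + linked(app.processes, "table_1", "table_4"));
    }
}
```

With the single shared process, table_1 appears connected to table_4 even though no statement relates them. With one process per operation under the application, only the table_1 to table_2 and table_3 to table_4 edges exist.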