[ https://issues.apache.org/jira/browse/ATLAS-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ayub Khan updated ATLAS-626:
----------------------------
    Description:
As part of HIVE-7090, Hive supports session-level temporary tables and manages their life cycle. These temporary tables are used to run additional queries and are cleaned up at the end of the session.

Inserting data into a table creates such a temporary table, whose metadata is synced to Atlas. Once the session expires, the table is cleaned up in Hive but still exists in Atlas.

1. What is the use case for storing the metadata of this temporary table? Are temporary tables important?
2. Impact: the number of metadata objects in Atlas may keep growing if insert operations are frequent in production.

Lineage snapshot link: https://monosnap.com/file/DtnPZA85Ug0Q27arTOqY1FnhbDQdXU

{noformat}
0: jdbc:hive2://localhost:10000/default> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
| abc12312  |
| h3        |
| h5        |
+-----------+--+
3 rows selected (0.231 seconds)
0: jdbc:hive2://localhost:10000/default> insert into table default.h5 values ( "efg1", "abc1", 1231, 123121);
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_local737234864_0006
INFO : The url to track the job: http://localhost:8080/
INFO : Job running in-process (local Hadoop)
INFO : 2016-04-04 15:21:20,381 Stage-1 map = 100%, reduce = 0%
INFO : Ended Job = job_local737234864_0006
INFO : Stage-4 is selected by condition resolver.
INFO : Stage-3 is filtered out by condition resolver.
INFO : Stage-5 is filtered out by condition resolver.
INFO : Moving data to: hdfs://localhost:9000/user/hive/warehouse/h5/.hive-staging_hive_2016-04-04_15-21-17_057_941878903735098303-9/-ext-10000 from hdfs://localhost:9000/user/hive/warehouse/h5/.hive-staging_hive_2016-04-04_15-21-17_057_941878903735098303-9/-ext-10002
INFO : Loading data to table default.h5 from hdfs://localhost:9000/user/hive/warehouse/h5/.hive-staging_hive_2016-04-04_15-21-17_057_941878903735098303-9/-ext-10000
INFO : Table default.h5 stats: [numFiles=5, numRows=5, totalSize=106, rawDataSize=101]
No rows affected (3.72 seconds)
0: jdbc:hive2://localhost:10000/default> show tables;
+------------------------+--+
|        tab_name        |
+------------------------+--+
| abc12312               |
| h3                     |
| h5                     |
| values__tmp__table__1  |
+------------------------+--+
4 rows selected (0.196 seconds)
0: jdbc:hive2://localhost:10000/default> describe extended values__tmp__table__1;
+-----------------------------+-----------+----------+--+
|          col_name           | data_type | comment  |
+-----------------------------+-----------+----------+--+
| tmp_values_col1             | string    |          |
| tmp_values_col2             | string    |          |
| tmp_values_col3             | string    |          |
| tmp_values_col4             | string    |          |
|                             | NULL      | NULL     |
| Detailed Table Information  | Table(tableName:values__tmp__table__1, dbName:default, owner:apathan, createTime:1459763477, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:tmp_values_col1, type:string, comment:), FieldSchema(name:tmp_values_col2, type:string, comment:), FieldSchema(name:tmp_values_col3, type:string, comment:), FieldSchema(name:tmp_values_col4, type:string, comment:)], location:hdfs://localhost:9000/tmp/hive/apathan/d461c5c3-931a-4aa3-9124-997b75f10c11/_tmp_space.db/Values__Tmp__Table__1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null), temporary:true)  |          |
+-----------------------------+-----------+----------+--+
6 rows selected (0.174 seconds)
0: jdbc:hive2://localhost:10000/default>
{noformat}

  was:
As part of HIVE-7090, Hive supports session-level temporary tables and manages their life cycle. These temporary tables are used to run additional queries and are cleaned up at the end of the session.

Inserting data into a table creates such a temporary table, whose metadata is synced to Atlas. Once the session expires, the table is cleaned up in Hive but still exists in Atlas.

1. What is the use case for storing the metadata of this temporary table? Are temporary tables important?
2. Impact: the number of metadata objects in Atlas may keep growing if insert operations are frequent in production.
3. Insert queries fired across databases are shown in lineage, i.e., the temporary table is shown in lineage only when it is created in a different database than the target database. Is this expected?

Lineage snapshot link: https://monosnap.com/file/DtnPZA85Ug0Q27arTOqY1FnhbDQdXU

{noformat}
0: jdbc:hive2://localhost:10000/default> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
| abc12312  |
| h3        |
| h5        |
+-----------+--+
3 rows selected (0.231 seconds)
0: jdbc:hive2://localhost:10000/default> insert into table default.h5 values ( "efg1", "abc1", 1231, 123121);
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_local737234864_0006
INFO : The url to track the job: http://localhost:8080/
INFO : Job running in-process (local Hadoop)
INFO : 2016-04-04 15:21:20,381 Stage-1 map = 100%, reduce = 0%
INFO : Ended Job = job_local737234864_0006
INFO : Stage-4 is selected by condition resolver.
INFO : Stage-3 is filtered out by condition resolver.
INFO : Stage-5 is filtered out by condition resolver.
INFO : Moving data to: hdfs://localhost:9000/user/hive/warehouse/h5/.hive-staging_hive_2016-04-04_15-21-17_057_941878903735098303-9/-ext-10000 from hdfs://localhost:9000/user/hive/warehouse/h5/.hive-staging_hive_2016-04-04_15-21-17_057_941878903735098303-9/-ext-10002
INFO : Loading data to table default.h5 from hdfs://localhost:9000/user/hive/warehouse/h5/.hive-staging_hive_2016-04-04_15-21-17_057_941878903735098303-9/-ext-10000
INFO : Table default.h5 stats: [numFiles=5, numRows=5, totalSize=106, rawDataSize=101]
No rows affected (3.72 seconds)
0: jdbc:hive2://localhost:10000/default> show tables;
+------------------------+--+
|        tab_name        |
+------------------------+--+
| abc12312               |
| h3                     |
| h5                     |
| values__tmp__table__1  |
+------------------------+--+
4 rows selected (0.196 seconds)
0: jdbc:hive2://localhost:10000/default> describe extended values__tmp__table__1;
+-----------------------------+-----------+----------+--+
|          col_name           | data_type | comment  |
+-----------------------------+-----------+----------+--+
| tmp_values_col1             | string    |          |
| tmp_values_col2             | string    |          |
| tmp_values_col3             | string    |          |
| tmp_values_col4             | string    |          |
|                             | NULL      | NULL     |
| Detailed Table Information  | Table(tableName:values__tmp__table__1, dbName:default, owner:apathan, createTime:1459763477, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:tmp_values_col1, type:string, comment:), FieldSchema(name:tmp_values_col2, type:string, comment:), FieldSchema(name:tmp_values_col3, type:string, comment:), FieldSchema(name:tmp_values_col4, type:string, comment:)], location:hdfs://localhost:9000/tmp/hive/apathan/d461c5c3-931a-4aa3-9124-997b75f10c11/_tmp_space.db/Values__Tmp__Table__1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null), temporary:true)  |          |
+-----------------------------+-----------+----------+--+
6 rows selected (0.174 seconds)
0: jdbc:hive2://localhost:10000/default>
{noformat}


> Hive temporary table metadata is captured in atlas.
> ---------------------------------------------------
>
>                 Key: ATLAS-626
>                 URL: https://issues.apache.org/jira/browse/ATLAS-626
>             Project: Atlas
>          Issue Type: Bug
>   Affects Versions: trunk
>            Reporter: Ayub Khan
>             Fix For: trunk