[ https://issues.apache.org/jira/browse/SPARK-21579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107296#comment-16107296 ]
Takeshi Yamamuro edited comment on SPARK-21579 at 7/31/17 1:16 PM:
-------------------------------------------------------------------

I think this is expected behaviour in Spark: `cache1` is a sub-tree of `cache2`, so if you uncache `cache1`, Spark also uncaches `cache2`.

{code}
scala> Seq(("name1", 28)).toDF("name", "age").createOrReplaceTempView("base")
scala> sql("cache table cache1 as select * from base where age >= 25")
scala> sql("cache table cache2 as select * from cache1 where name = 'p1'")

scala> spark.table("cache1").explain
== Physical Plan ==
InMemoryTableScan [name#5, age#6]
   +- InMemoryRelation [name#5, age#6], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), `cache1`
         +- *Project [_1#2 AS name#5, _2#3 AS age#6]
            +- *Filter (_2#3 >= 25)
               +- LocalTableScan [_1#2, _2#3]

scala> spark.table("cache2").explain
== Physical Plan ==
InMemoryTableScan [name#5, age#6]
   +- InMemoryRelation [name#5, age#6], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), `cache2`
         +- *Filter (isnotnull(name#5) && (name#5 = p1))
            +- InMemoryTableScan [name#5, age#6], [isnotnull(name#5), (name#5 = p1)]
                  +- InMemoryRelation [name#5, age#6], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), `cache1`
                        +- *Project [_1#2 AS name#5, _2#3 AS age#6]
                           +- *Filter (_2#3 >= 25)
                              +- LocalTableScan [_1#2, _2#3]
{code}

You would probably be better off doing this instead:

{code}
scala> sql("create temporary view tmp1 as select * from base where age >= 25")
scala> sql("cache table cache1 as select * from tmp1")
scala> sql("cache table cache2 as select * from tmp1 where name = 'p1'")
scala> spark.catalog.dropTempView("cache1")

// scala> sql("select * from cache1").explain --> throws AnalysisException `Table or view not found`

scala> sql("select * from cache2").explain
== Physical Plan ==
InMemoryTableScan [name#5, age#6]
   +- InMemoryRelation [name#5, age#6], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), `cache2`
         +- *Project [_1#2 AS name#5, _2#3 AS age#6]
            +- *Filter (((_2#3 >= 25) && isnotnull(_1#2)) && (_1#2 = p1))
               +- LocalTableScan [_1#2, _2#3]
{code}
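A quick way to observe the cascading invalidation without the storage page is `spark.catalog.isCached`, the public `Catalog` API for checking whether a table's data is materialized. A minimal sketch of the sequence above; the commented results are what I would expect on the affected versions, not captured output:

{code}
scala> Seq(("name1", 28)).toDF("name", "age").createOrReplaceTempView("base")
scala> sql("cache table cache1 as select * from base where age >= 25")
scala> sql("cache table cache2 as select * from cache1 where name = 'p1'")

// Both query results are materialized by CACHE TABLE at this point
scala> spark.catalog.isCached("cache1")   // expected: true
scala> spark.catalog.isCached("cache2")   // expected: true

scala> spark.catalog.dropTempView("cache1")

// cache2's plan contains cache1's InMemoryRelation as a sub-tree, so its
// cached data is dropped along with cache1 on the affected versions
scala> spark.catalog.isCached("cache2")   // expected: false
{code}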
> dropTempView has a critical BUG
> -------------------------------
>
>                 Key: SPARK-21579
>                 URL: https://issues.apache.org/jira/browse/SPARK-21579
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: ant_nebula
>            Priority: Critical
>
> When I dropTempView `dwd_table1` only, the dependent table `dwd_table2` also disappears from http://127.0.0.1:4040/storage/.
> This affects versions 2.1.1 and 2.2.0; 2.1.0 does not have this problem.
> {code:java}
> val spark = SparkSession.builder.master("local").appName("sparkTest").getOrCreate()
> val rows = Seq(Row("p1", 30), Row("p2", 20), Row("p3", 25), Row("p4", 10), Row("p5", 40), Row("p6", 15))
> val schema = new StructType().add(StructField("name", StringType)).add(StructField("age", IntegerType))
> val rowRDD = spark.sparkContext.parallelize(rows, 3)
> val df = spark.createDataFrame(rowRDD, schema)
> df.createOrReplaceTempView("ods_table")
> spark.sql("cache table ods_table")
> spark.sql("cache table dwd_table1 as select * from ods_table where age>=25")
> spark.sql("cache table dwd_table2 as select * from dwd_table1 where name='p1'")
> spark.catalog.dropTempView("dwd_table1")
> //spark.catalog.dropTempView("ods_table")
> spark.sql("select * from dwd_table2").show()
> {code}
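For the snippet quoted above, the workaround from the comment would look roughly like this (a sketch, not verified against the affected versions; the `dwd_base` view name is introduced here for illustration):

{code}
// Keep the shared filter as a plain (uncached) temp view so the two cached
// tables no longer share an InMemoryRelation sub-tree.
spark.sql("create temporary view dwd_base as select * from ods_table where age >= 25")
spark.sql("cache table dwd_table1 as select * from dwd_base")
spark.sql("cache table dwd_table2 as select * from dwd_base where name = 'p1'")

// Dropping dwd_table1 should now leave dwd_table2's cached data intact.
spark.catalog.dropTempView("dwd_table1")
spark.sql("select * from dwd_table2").show()
{code}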