[jira] [Updated] (SPARK-24669) Managed table was not cleared of path after drop database cascade

Dong Jiang (JIRA) Wed, 27 Jun 2018 13:00:07 -0700


     [ https://issues.apache.org/jira/browse/SPARK-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Dong Jiang updated SPARK-24669:
-------------------------------
    Description: 
I can reproduce the problem with the following steps:
# Create a managed table using the path option
# Drop the table by dropping its parent database with CASCADE
# Re-create the database and the table with a different path
# The new table shows data from the old path, not the new path

{code}
echo "first" > /tmp/first.csv
echo "second" > /tmp/second.csv
spark-shell
spark.version
res0: String = 2.3.0
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options (path='/tmp/first.csv')")
spark.table("foo.first").show()
+-----+
|   id|
+-----+
|first|
+-----+
spark.sql("drop database foo cascade")
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options (path='/tmp/second.csv')")
"note, the path is different now, pointing to second.csv, but still showing data from first file"
spark.table("foo.first").show()
+-----+
|   id|
+-----+
|first|
+-----+
"now, if I drop the table explicitly, instead of via dropping database cascade, then it will be the correct result"
spark.sql("drop table foo.first")
spark.sql("create table foo.first (id string) using csv options (path='/tmp/second.csv')")
spark.table("foo.first").show()
+------+
|    id|
+------+
|second|
+------+
{code}
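
Since dropping the table explicitly gives the correct result above, one possible workaround is to drop every table individually before dropping the database. The sketch below is only an illustration under that assumption: the helper names `dropStatements` and `dropDatabaseExplicitly` are hypothetical and not part of Spark, and the approach has not been verified beyond the single-table case shown in this report. Views are left to the final CASCADE.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper (not a Spark API): builds the statements that drop each
// table explicitly and then drop the database itself.
def dropStatements(db: String, tables: Seq[String]): Seq[String] =
  tables.map(t => s"DROP TABLE IF EXISTS `${db}`.`${t}`") :+
    s"DROP DATABASE IF EXISTS `${db}` CASCADE"

// Hypothetical helper: looks up the non-temporary, non-view tables of `db`
// through the catalog and runs the statements built above.
def dropDatabaseExplicitly(spark: SparkSession, db: String): Unit = {
  val tables = spark.catalog
    .listTables(db)
    .collect()
    .filter(t => !t.isTemporary && t.tableType != "VIEW")
    .map(_.name)
    .toSeq
  dropStatements(db, tables).foreach(stmt => spark.sql(stmt))
}
```

In spark-shell, this would be called as `dropDatabaseExplicitly(spark, "foo")` in place of `spark.sql("drop database foo cascade")`.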

  was:
I can reproduce the problem with the following steps:
# Create a managed table using the path option
# Drop the table by dropping its parent database with CASCADE
# Re-create the database and the table with a different path
# The new table shows data from the old path, not the new path

{code}
echo "first" > /tmp/first.csv
echo "second" > /tmp/second.csv
spark-shell
spark.version
res0: String = 2.3.0
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options (path='/tmp/first.csv')")
spark.table("foo.first").show()
+-----+
|   id|
+-----+
|first|
+-----+
spark.sql("drop database foo cascade")
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options (path='/tmp/second.csv')")
"note, the path is different now, pointing to second.csv, but still showing data from first file"
spark.table("foo.first").show()
+-----+
|   id|
+-----+
|first|
+-----+
"now, if I drop the table explicitly, then it will be correct"
spark.sql("drop table foo.first")
spark.sql("create table foo.first (id string) using csv options (path='/tmp/second.csv')")
spark.table("foo.first").show()
+------+
|    id|
+------+
|second|
+------+
{code}


> Managed table was not cleared of path after drop database cascade
> -----------------------------------------------------------------
>
>                 Key: SPARK-24669
>                 URL: https://issues.apache.org/jira/browse/SPARK-24669
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Dong Jiang
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
