Salil Surendran created SPARK-18833:
---------------------------------------

             Summary: Changing partition location using the 'ALTER TABLE .. SET 
LOCATION' command via beeline doesn't get reflected in Spark
                 Key: SPARK-18833
                 URL: https://issues.apache.org/jira/browse/SPARK-18833
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.2
            Reporter: Salil Surendran


After changing the partition location of a table with the 'ALTER TABLE ... SET 
LOCATION' command via beeline, spark-shell doesn't find any of the data in the 
table, even though the data can be read via beeline. To reproduce, do the following:

=== At Hive side: ===
hive> CREATE EXTERNAL TABLE testA (id STRING, name STRING) PARTITIONED BY (idP 
STRING) STORED AS PARQUET LOCATION '/user/root/A/' ;
hive> CREATE EXTERNAL TABLE testB (id STRING, name STRING) PARTITIONED BY (idP 
STRING) STORED AS PARQUET LOCATION '/user/root/B/' ;
hive> CREATE EXTERNAL TABLE testC (id STRING, name STRING) PARTITIONED BY (idP 
STRING) STORED AS PARQUET LOCATION '/user/root/C/' ;

hive> insert into table testA PARTITION (idP='1') values 
('1',"test"),('2',"test2");

hive> ALTER TABLE testB ADD IF NOT EXISTS PARTITION(idP='1');
hive> ALTER TABLE testB PARTITION (idP='1') SET LOCATION '/user/root/A/idp=1/';

hive> select * from testA;
OK
1 test 1
2 test2 1


hive> select * from testB;
OK
1 test 1
2 test2 1

Conclusion: in Hive, changing the partition location to the directory where the 
Parquet files are present works; testB returns the same rows as testA.


=== At Spark side: ===
scala> import org.apache.spark.sql.hive.HiveContext
scala> val hiveContext = new HiveContext(sc)

scala> hiveContext.refreshTable("testB")

scala> hiveContext.sql("select * from testB").count
res2: Long = 0
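
A suggested sanity check (a sketch, not part of the original reproduction run): confirm 
that Spark at least sees the partition itself, which would point to the per-partition 
location being the problem rather than the partition metadata being missing entirely.

scala> // suggested check (assumption, not from the original run): list partitions Spark knows about
scala> hiveContext.sql("SHOW PARTITIONS testB").show()
// expected to list idp=1 if the partition metadata itself is visible to Spark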

scala> hiveContext.sql("ALTER TABLE testC ADD IF NOT EXISTS PARTITION(idP='1')")
res3: org.apache.spark.sql.DataFrame = [result: string]

scala> hiveContext.sql("ALTER TABLE testC PARTITION (idP='1') SET LOCATION 
'/user/root/A/idp=1/' ")
res4: org.apache.spark.sql.DataFrame = [result: string]

scala> hiveContext.sql("select * from testC").count
res6: Long = 0

scala> hiveContext.refreshTable("testC")

scala> hiveContext.sql("select * from testC").count
res8: Long = 0 
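
A further suggested diagnostic (a sketch using the same paths as the reproduction above, 
not part of the original run): reading the relocated partition directory directly should 
return the rows if the Parquet files themselves are readable by Spark, which would isolate 
the problem to how the partition location from the metastore is resolved.

scala> // suggested diagnostic: read the relocated partition directory directly, bypassing the metastore
scala> val direct = hiveContext.read.parquet("/user/root/A/idp=1/")
scala> direct.count()
// if this returns 2, the Parquet files are readable and the issue lies in how the
// metastore partition location is picked up by Spark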


