MaxGekk opened a new pull request #31379:
URL: https://github.com/apache/spark/pull/31379


   ### What changes were proposed in this pull request?
   Invoke `CatalogImpl.refreshTable()` in the v1 implementation of the `ALTER TABLE .. SET LOCATION` command to refresh cached table data.
   
   ### Why are the changes needed?
   The example below illustrates the issue:
   
   - Create a source table:
   ```sql
   spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY (part);
   spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0;
   spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0);
   default      src_tbl false   Partition Values: [part=0]
   Location: file:/Users/maximgekk/proj/refresh-cache-set-location/spark-warehouse/src_tbl/part=0
   ...
   ```
   - Create and cache a destination table, then set a new location for its empty partition (part=0):
   ```sql
   spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY (part);
   spark-sql> ALTER TABLE dst_tbl ADD PARTITION (part=0);
   spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1;
   spark-sql> CACHE TABLE dst_tbl;
   spark-sql> SELECT * FROM dst_tbl;
   1    1
   spark-sql> ALTER TABLE dst_tbl PARTITION (part=0) SET LOCATION '/Users/maximgekk/proj/refresh-cache-set-location/spark-warehouse/src_tbl/part=0';
   spark-sql> SELECT * FROM dst_tbl;
   1    1
   ```
   The last query still returns only the cached row; it does not include the data at the new location of partition `part=0`.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. After the changes, the example above works correctly:
   ```sql
   spark-sql> ALTER TABLE dst_tbl PARTITION (part=0) SET LOCATION '/Users/maximgekk/proj/refresh-cache-set-location/spark-warehouse/src_tbl/part=0';
   spark-sql> SELECT * FROM dst_tbl;
   0    0
   1    1
   ```
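
The before/after behavior can be sketched with a small toy model. This is not Spark code: `ToyCatalog`, `run_example`, and the `refresh_on_set_location` flag are hypothetical names made up for illustration, and the real change is simply the `CatalogImpl.refreshTable()` call described in this PR. The sketch only assumes that a cached table holds a snapshot of its rows until something refreshes it.

```python
# Toy model only (not Spark internals): a catalog that caches table scans.
class ToyCatalog:
    def __init__(self, refresh_on_set_location):
        self.storage = {}      # location -> list of rows stored there
        self.partitions = {}   # table -> {partition value -> location}
        self.cache = {}        # table -> cached snapshot of rows
        self.refresh_on_set_location = refresh_on_set_location

    def scan(self, table):
        """Read rows from every partition location of the table."""
        rows = []
        for part in sorted(self.partitions[table]):
            location = self.partitions[table][part]
            rows.extend(self.storage.get(location, []))
        return rows

    def cache_table(self, table):
        self.cache[table] = self.scan(table)

    def refresh_table(self, table):
        # Re-read the data only if the table is currently cached.
        if table in self.cache:
            self.cache[table] = self.scan(table)

    def set_location(self, table, part, location):
        self.partitions[table][part] = location
        if self.refresh_on_set_location:  # models the behavior after the fix
            self.refresh_table(table)

    def select_all(self, table):
        if table in self.cache:
            return self.cache[table]
        return self.scan(table)


def run_example(refresh_on_set_location):
    """Replay the dst_tbl scenario from the PR description."""
    cat = ToyCatalog(refresh_on_set_location)
    cat.storage = {
        "src_tbl/part=0": [(0, 0)],
        "dst_tbl/part=0": [],
        "dst_tbl/part=1": [(1, 1)],
    }
    cat.partitions = {"dst_tbl": {0: "dst_tbl/part=0", 1: "dst_tbl/part=1"}}
    cat.cache_table("dst_tbl")
    cat.set_location("dst_tbl", 0, "src_tbl/part=0")
    return cat.select_all("dst_tbl")


print(run_example(False))  # stale snapshot, as before the fix
print(run_example(True))   # refreshed snapshot, as after the fix
```

With the flag off, the model returns only the row cached before the location change; with it on, the row from the new partition location appears as well, mirroring the two `SELECT` outputs shown in this description.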
   
   
   ### How was this patch tested?
   Added a new test to `org.apache.spark.sql.hive.CachedTableSuite`:
   ```
   $ build/sbt -Phive -Phive-thriftserver "test:testOnly *CachedTableSuite"
   ```
   
   Authored-by: Max Gekk <max.g...@gmail.com>
   Signed-off-by: HyukjinKwon <gurwls...@apache.org>
   (cherry picked from commit d242166b8fd741fdd46d9048f847b2fd6e1d07b1)
   Signed-off-by: Max Gekk <max.g...@gmail.com>

