This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new d28ca9c  [SPARK-35935][SQL] Prevent failure of `MSCK REPAIR TABLE` on table refreshing
d28ca9c is described below

commit d28ca9cc9808828118be64a545c3478160fdc170
Author: Max Gekk <max.g...@gmail.com>
AuthorDate: Wed Jun 30 09:44:52 2021 +0300

    [SPARK-35935][SQL] Prevent failure of `MSCK REPAIR TABLE` on table refreshing
    
    ### What changes were proposed in this pull request?
    In the PR, I propose to catch all non-fatal exceptions coming from `refreshTable()` at the final stage of table repairing, and output an error message instead of failing with an exception.
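
    The pattern described above can be sketched in isolation as follows. This is a minimal illustration of catching non-fatal exceptions with `scala.util.control.NonFatal`, not the actual Spark internals; the helper name `refreshSafely` and its signature are hypothetical, standing in for the `refreshTable()` call guarded in this PR:

    ```scala
    import scala.util.control.NonFatal

    // Illustrative sketch: run a refresh action for a table, swallow any
    // non-fatal error, and return it to the caller (Spark would logError it)
    // instead of letting it fail the whole command.
    def refreshSafely(tableName: String)(refresh: String => Unit): Option[Throwable] = {
      try {
        refresh(tableName)
        None // refresh succeeded, nothing to report
      } catch {
        case NonFatal(e) =>
          // Fatal errors (e.g. OutOfMemoryError) still propagate; only
          // recoverable failures are captured here.
          Some(e)
      }
    }
    ```

    With this shape, a failing refresh yields `Some(e)` that can be logged, while fatal JVM errors are deliberately left uncaught.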
    
    ### Why are the changes needed?
    1. The uncaught exceptions from table refreshing might be considered a regression compared to previous Spark versions. Table refreshing was introduced by https://github.com/apache/spark/pull/31066.
    2. This should improve user experience with Spark SQL. For instance, when `MSCK REPAIR TABLE` is performed in a chain of SQL commands, catching the exception is difficult or even impossible.
    
    ### Does this PR introduce _any_ user-facing change?
    Yes. Before the changes, the `MSCK REPAIR TABLE` command could fail with the exception described in SPARK-35935. After the changes, the same command outputs an error message and completes successfully.
    
    ### How was this patch tested?
    By existing test suites.
    
    Closes #33137 from MaxGekk/msck-repair-catch-except.
    
    Authored-by: Max Gekk <max.g...@gmail.com>
    Signed-off-by: Max Gekk <max.g...@gmail.com>
---
 .../scala/org/apache/spark/sql/execution/command/ddl.scala     | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
index 0876b5f..06c6847 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
@@ -675,7 +675,15 @@ case class RepairTableCommand(
     // This is always the case for Hive format tables, but is not true for Datasource tables created
     // before Spark 2.1 unless they are converted via `msck repair table`.
     spark.sessionState.catalog.alterTable(table.copy(tracksPartitionsInCatalog = true))
-    spark.catalog.refreshTable(tableIdentWithDB)
+    try {
+      spark.catalog.refreshTable(tableIdentWithDB)
+    } catch {
+      case NonFatal(e) =>
+        logError(s"Cannot refresh the table '$tableIdentWithDB'. A query of the table " +
+          "might return wrong result if the table was cached. To avoid such issue, you should " +
+          "uncache the table manually via the UNCACHE TABLE command after table recovering will " +
+          "complete fully.", e)
+    }
     logInfo(s"Recovered all partitions: added ($addedAmount), dropped ($droppedAmount).")
     Seq.empty[Row]
   }
