This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 7546b4405d5 [SPARK-42164][CORE] Register partitioned-table-related classes to KryoSerializer
7546b4405d5 is described below

commit 7546b4405d5b35626e98b28bfc8031d2100172d1
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Mon Jan 23 21:39:30 2023 -0800

    [SPARK-42164][CORE] Register partitioned-table-related classes to KryoSerializer
    
    ### What changes were proposed in this pull request?
    
    This PR aims to register partitioned-table-related classes to `KryoSerializer`.
    Specifically, `CREATE TABLE` and `MSCK REPAIR TABLE` use these classes.
    
    ### Why are the changes needed?
    
    To support partitioned tables more easily with `KryoSerializer`. Previously, these commands failed as follows.
    
    ```
    java.lang.IllegalArgumentException: Class is not registered:
    org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation
    ```
    
    ```
    java.lang.IllegalArgumentException: Class is not registered:
    org.apache.spark.util.HadoopFSUtils$SerializableFileStatus
    ```
    
    ```
    java.lang.IllegalArgumentException: Class is not registered:
    org.apache.spark.sql.execution.command.PartitionStatistics
    ```
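    
    Before this patch, a possible workaround (a sketch, not part of this change) was to register the failing classes by name through the existing `spark.kryo.classesToRegister` configuration when building the session:
    
    ```
    // Sketch of a pre-patch workaround; assumes a Spark build on the classpath.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.kryo.registrationRequired", "true")
      // Register the problematic classes explicitly by name.
      .config("spark.kryo.classesToRegister",
        "org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation," +
        "org.apache.spark.util.HadoopFSUtils$SerializableFileStatus," +
        "org.apache.spark.sql.execution.command.PartitionStatistics")
      .getOrCreate()
    ```
    
    With this patch the registration is built in, so no such configuration is needed.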
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Pass the CIs and manual tests.
    
    **TEST TABLE**
    ```
    $ tree /tmp/t
    /tmp/t
    ├── p=1
    │   └── users.orc
    ├── p=10
    │   └── users.orc
    ├── p=11
    │   └── users.orc
    ├── p=2
    │   └── users.orc
    ├── p=3
    │   └── users.orc
    ├── p=4
    │   └── users.orc
    ├── p=5
    │   └── users.orc
    ├── p=6
    │   └── users.orc
    ├── p=7
    │   └── users.orc
    ├── p=8
    │   └── users.orc
    └── p=9
        └── users.orc
    ```
    
    **CREATE PARTITIONED TABLES AND RECOVER PARTITIONS**
    ```
    $ bin/spark-shell -c spark.kryo.registrationRequired=true -c spark.serializer=org.apache.spark.serializer.KryoSerializer -c spark.sql.sources.parallelPartitionDiscovery.threshold=1
    
    scala> sql("CREATE TABLE t USING ORC LOCATION '/tmp/t'").show()
    ++
    ||
    ++
    ++
    
    scala> sql("MSCK REPAIR TABLE t").show()
    ++
    ||
    ++
    ++
    ```
    
    Closes #39713 from dongjoon-hyun/SPARK-42164.
    
    Authored-by: Dongjoon Hyun <dongj...@apache.org>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
index b7035feba84..5499732660b 100644
--- a/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
@@ -510,6 +510,10 @@ private[serializer] object KryoSerializer {
   // SQL / ML / MLlib classes once and then re-use that filtered list in newInstance() calls.
   private lazy val loadableSparkClasses: Seq[Class[_]] = {
     Seq(
+      "org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation",
+      "[Lorg.apache.spark.util.HadoopFSUtils$SerializableBlockLocation;",
+      "org.apache.spark.util.HadoopFSUtils$SerializableFileStatus",
+
       "org.apache.spark.sql.catalyst.expressions.BoundReference",
       "org.apache.spark.sql.catalyst.expressions.SortOrder",
       "[Lorg.apache.spark.sql.catalyst.expressions.SortOrder;",
@@ -536,6 +540,7 @@ private[serializer] object KryoSerializer {
       "org.apache.spark.sql.types.DecimalType",
       "org.apache.spark.sql.types.Decimal$DecimalAsIfIntegral$",
       "org.apache.spark.sql.types.Decimal$DecimalIsFractional$",
+      "org.apache.spark.sql.execution.command.PartitionStatistics",
       "org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTaskResult",
       "org.apache.spark.sql.execution.joins.EmptyHashedRelation$",
       "org.apache.spark.sql.execution.joins.LongHashedRelation",

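The `[L...;` entries in the first hunk are JVM array-class names: with `spark.kryo.registrationRequired=true`, Kryo treats an array type as a class distinct from its element type, so arrays must be registered under the JVM's internal name. A small illustration (not part of the patch):

```
// The JVM names array classes with the "[L<element>;" form, which is why the
// registration list contains entries like
// "[Lorg.apache.spark.util.HadoopFSUtils$SerializableBlockLocation;".
object ArrayClassName extends App {
  println(classOf[Array[String]].getName)  // prints "[Ljava.lang.String;"
}
```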
