ConeyLiu commented on issue #25470: [SPARK-28751][Core][WIP] Improve java 
serializer deserialization performance
URL: https://github.com/apache/spark/pull/25470#issuecomment-524615974
 
 
   Hi @cloud-fan, it's difficult to reuse the `ObjectInputStream`. 
`ObjectOutputStream` writes a class descriptor as follows:
   ```java
       /**
        * Writes representation of given class descriptor to stream.
        */
       private void writeClassDesc(ObjectStreamClass desc, boolean unshared)
           throws IOException
       {
           int handle;
           if (desc == null) {
               writeNull();
           } else if (!unshared && (handle = handles.lookup(desc)) != -1) {
               writeHandle(handle);
           } else if (desc.isProxy()) {
               writeProxyDesc(desc, unshared);
           } else {
               writeNonProxyDesc(desc, unshared);
           }
       }
   ```
   It writes the full class descriptor (including the class name) the first 
time a class is encountered; after that it writes only `TC_REFERENCE` and the 
handle id.
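
   As a rough illustration of this behaviour (a sketch, not code from the PR), 
the following writes two instances of the same class to one 
`ObjectOutputStream` and compares how many bytes each write produces. The 
`ClassDescHandleDemo` and `Point` names are hypothetical and exist only for 
this example.
   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.ObjectOutputStream;
   import java.io.Serializable;
   import java.io.UncheckedIOException;

   class ClassDescHandleDemo {
       // Hypothetical payload type, used only for this illustration.
       static class Point implements Serializable {
           private static final long serialVersionUID = 1L;
           final int x;
           final int y;
           Point(int x, int y) { this.x = x; this.y = y; }
       }

       /** Returns {bytes for the first Point, bytes for the second Point}. */
       static int[] bytesPerObject() {
           try {
               ByteArrayOutputStream buf = new ByteArrayOutputStream();
               ObjectOutputStream out = new ObjectOutputStream(buf);
               out.flush();
               int header = buf.size(); // stream magic and version

               // First write: full class descriptor plus field data.
               out.writeObject(new Point(1, 2));
               out.flush();
               int first = buf.size() - header;

               // Second write: the descriptor is replaced by a handle reference.
               out.writeObject(new Point(3, 4));
               out.flush();
               int second = buf.size() - header - first;

               return new int[] {first, second};
           } catch (IOException e) {
               throw new UncheckedIOException(e);
           }
       }

       public static void main(String[] args) {
           int[] sizes = bytesPerObject();
           System.out.println("first=" + sizes[0] + " bytes, second=" + sizes[1] + " bytes");
       }
   }
   ```
   The second write should be noticeably smaller, because the class 
descriptor is sent only once per stream.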
   
   `ObjectInputStream` does the same in reverse:
   ```java
       /**
        * Reads in and returns (possibly null) class descriptor.  Sets passHandle
        * to class descriptor's assigned handle.  If class descriptor cannot be
        * resolved to a class in the local VM, a ClassNotFoundException is
        * associated with the class descriptor's handle.
        */
       private ObjectStreamClass readClassDesc(boolean unshared)
           throws IOException
       {
           byte tc = bin.peekByte();
           ObjectStreamClass descriptor;
           switch (tc) {
               case TC_NULL:
                   descriptor = (ObjectStreamClass) readNull();
                   break;
               case TC_REFERENCE:
                   descriptor = (ObjectStreamClass) readHandle(unshared);
                   break;
               case TC_PROXYCLASSDESC:
                   descriptor = readProxyDesc(unshared);
                   break;
               case TC_CLASSDESC:
                   descriptor = readNonProxyDesc(unshared);
                   break;
               default:
                   throw new StreamCorruptedException(
                       String.format("invalid type code: %02X", tc));
           }
           if (descriptor != null) {
               validateDescriptor(descriptor);
           }
           return descriptor;
       }
   ```
   If the class has already been encountered, we read it from the handle 
(`descriptor = (ObjectStreamClass) readHandle(unshared);`), so we don't need 
to resolve the class by name again.
   
   However, the mapping between handle id and class must be identical in 
`ObjectInputStream` and `ObjectOutputStream`. If we reuse an 
`ObjectInputStream`, it keeps its previous handle cache, which no longer 
matches the handles assigned by the writer and so breaks the mapping. If we 
instead reset the stream before reusing it (for example via 
`ObjectOutputStream.reset`, which makes the reader clear its handle table 
too), we still need to resolve each class by name again. So it's difficult to 
reuse the `ObjectInputStream`.
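
   To make the cost of a reset concrete, here is a minimal sketch (again not 
code from the PR) that counts `resolveClass` calls on the reading side. 
`ResolveCountDemo`, `CountingInput` and `Point` are hypothetical names 
introduced for this example; it assumes the writer calls 
`ObjectOutputStream.reset()` between the two objects.
   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import java.io.ObjectInputStream;
   import java.io.ObjectOutputStream;
   import java.io.ObjectStreamClass;
   import java.io.Serializable;

   class ResolveCountDemo {
       // Hypothetical payload type, used only for this illustration.
       static class Point implements Serializable {
           private static final long serialVersionUID = 1L;
           final int x;
           Point(int x) { this.x = x; }
       }

       /** ObjectInputStream that counts how often a class is resolved by name. */
       static class CountingInput extends ObjectInputStream {
           int resolveCalls = 0;

           CountingInput(InputStream in) throws IOException {
               super(in);
           }

           @Override
           protected Class<?> resolveClass(ObjectStreamClass desc)
                   throws IOException, ClassNotFoundException {
               resolveCalls++;
               return super.resolveClass(desc);
           }
       }

       /** Writes two Points, optionally resetting in between, and counts resolves. */
       static int resolveCalls(boolean resetBetween) {
           try {
               ByteArrayOutputStream buf = new ByteArrayOutputStream();
               try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                   out.writeObject(new Point(1));
                   if (resetBetween) {
                       out.reset(); // clears the handle table on both sides
                   }
                   out.writeObject(new Point(2));
               }
               try (CountingInput in =
                        new CountingInput(new ByteArrayInputStream(buf.toByteArray()))) {
                   in.readObject();
                   in.readObject();
                   return in.resolveCalls;
               }
           } catch (IOException | ClassNotFoundException e) {
               throw new IllegalStateException(e);
           }
       }

       public static void main(String[] args) {
           System.out.println("without reset: " + resolveCalls(false));
           System.out.println("with reset:    " + resolveCalls(true));
       }
   }
   ```
   Without the reset the class is resolved once and then found through its 
handle; with the reset it has to be resolved by name a second time.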
   
   The current approach keeps a cache of resolved classes, similar to the one 
used inside `ObjectInputStream`. Unlike that one, this cache can be shared 
across multiple `ObjectInputStream` instances.
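
   The general idea could look something like the sketch below. This is not 
the PR's implementation: `SharedClassCacheDemo`, `CachingInput` and `Point` 
are hypothetical names, and the cache is keyed by class name only, which 
assumes every stream resolves against the same class loader.
   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import java.io.ObjectInputStream;
   import java.io.ObjectOutputStream;
   import java.io.ObjectStreamClass;
   import java.io.Serializable;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.atomic.AtomicInteger;

   class SharedClassCacheDemo {
       // Hypothetical payload type, used only for this illustration.
       static class Point implements Serializable {
           private static final long serialVersionUID = 1L;
           final int x;
           Point(int x) { this.x = x; }
       }

       /** ObjectInputStream that consults a cache shared between streams. */
       static class CachingInput extends ObjectInputStream {
           private final Map<String, Class<?>> cache;
           private final AtomicInteger misses;

           CachingInput(InputStream in, Map<String, Class<?>> cache,
                        AtomicInteger misses) throws IOException {
               super(in);
               this.cache = cache;
               this.misses = misses;
           }

           @Override
           protected Class<?> resolveClass(ObjectStreamClass desc)
                   throws IOException, ClassNotFoundException {
               String name = desc.getName();
               Class<?> cached = cache.get(name);
               if (cached != null) {
                   return cached; // resolved earlier, possibly by another stream
               }
               misses.incrementAndGet();
               Class<?> resolved = super.resolveClass(desc);
               cache.putIfAbsent(name, resolved);
               return resolved;
           }
       }

       /** Reads the same payload through several fresh streams; returns cache misses. */
       static int missesAcrossStreams(int streams) {
           try {
               ByteArrayOutputStream buf = new ByteArrayOutputStream();
               try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                   out.writeObject(new Point(7));
               }
               byte[] bytes = buf.toByteArray();

               Map<String, Class<?>> cache = new ConcurrentHashMap<>();
               AtomicInteger misses = new AtomicInteger();
               for (int i = 0; i < streams; i++) {
                   try (CachingInput in = new CachingInput(
                            new ByteArrayInputStream(bytes), cache, misses)) {
                       Point p = (Point) in.readObject();
                       if (p.x != 7) {
                           throw new IllegalStateException("unexpected value");
                       }
                   }
               }
               return misses.get();
           } catch (IOException | ClassNotFoundException e) {
               throw new IllegalStateException(e);
           }
       }

       public static void main(String[] args) {
           System.out.println("cache misses over 3 streams: " + missesAcrossStreams(3));
       }
   }
   ```
   Each new stream still reads the class descriptor from the bytes, but only 
the first one pays for the by-name lookup.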
