[ https://issues.apache.org/jira/browse/SPARK-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-7180:
-----------------------------
    Description: 
Simple reproduction:
{code}
class Parent extends Serializable {
  val a = "a"
  val b = "b"
}

class Child extends Parent with Serializable {
  val c = Array(1)
  val d = Array(2)
  val e = Array(3)
  val f = Array(4)
  val g = Array(5)
  val o = new Object
}

// ArrayOutOfBoundsException
SparkEnv.get.closureSerializer.newInstance().serialize(new Child)
{code}
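
For the record, the repro hinges on the non-serializable field `o`: serializing it fails with a NotSerializableException, which is what routes the failure through SerializationDebugger in the first place. If you don't have a SparkContext running for SparkEnv.get, a sketch along these lines should hit the same path by constructing the JavaSerializer directly (assuming the default spark.serializer.extraDebugInfo=true):
{code}
import org.apache.spark.SparkConf
import org.apache.spark.serializer.JavaSerializer

// The closure serializer is a JavaSerializer by default, so this exercises the
// same code path: serialization trips on the non-serializable `o`, and
// SerializationDebugger then crashes while trying to improve the exception.
val ser = new JavaSerializer(new SparkConf()).newInstance()
ser.serialize(new Child)  // ArrayOutOfBoundsException instead of a useful message
{code}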

I dug into this a little and found that we are trying to fill the fields of `Parent` with the values of `Child`. See the following output, which I generated by adding printlns everywhere:
{code}
* Visiting object org.apache.spark.serializer.Child@2c3299f6 of type org.apache.spark.serializer.Child
  - Found 2 class data slot descriptions
  - Looking at desc #1: org.apache.spark.serializer.Parent: static final long serialVersionUID = 3254964199136071914L;
    - Found 2 fields
      - Ljava/lang/String; a
      - Ljava/lang/String; b
    - getObjFieldValues: 
      - [I@23faa614
      - [I@1cad7d80
      - [I@420a6d35
      - [I@3a87d472
      - [I@2b8ca663
      - java.lang.Object@1effc3eb
{code}
SerializationDebugger#visitSerializable found the two fields that belong to the parent, but it tried to cram the child's six values into them. This mismatch in the number of fields is what throws the ArrayOutOfBoundsException. The culprit is this line: https://github.com/apache/spark/blob/4d9e560b5470029143926827b1cb9d72a0bfbeff/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L150, which runs reflection on the object `Child` even when it's considering the description for `Parent`.
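
To make the mismatch concrete outside of Spark, here is a minimal standalone sketch (an illustration using the `Parent`/`Child` classes above, not Spark's code) that reflects on the same package-private ObjectStreamClass methods the debugger uses; on JDK 9+ it would additionally need --add-opens java.base/java.io=ALL-UNNAMED:
{code}
import java.io.ObjectStreamClass

// Size the value buffer from Parent's slot description but fill it through
// Child's descriptor: exactly the mismatch described above.
object MismatchDemo {
  def main(args: Array[String]): Unit = {
    val getNumObjFields = classOf[ObjectStreamClass].getDeclaredMethod("getNumObjFields")
    val getObjFieldValues = classOf[ObjectStreamClass]
      .getDeclaredMethod("getObjFieldValues", classOf[Object], classOf[Array[Object]])
    getNumObjFields.setAccessible(true)
    getObjFieldValues.setAccessible(true)

    val parentDesc = ObjectStreamClass.lookup(classOf[Parent])  // describes a, b only
    val childDesc = ObjectStreamClass.lookup(classOf[Child])    // describes c, d, e, f, g, o

    // Buffer sized for Parent's 2 object fields...
    val values = new Array[Object](getNumObjFields.invoke(parentDesc).asInstanceOf[Int])
    // ...filled with Child's 6 values: ArrayIndexOutOfBoundsException
    // (surfacing here wrapped in an InvocationTargetException).
    getObjFieldValues.invoke(childDesc, new Child, values)
  }
}
{code}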

  was:
This is most likely not specific to FunSuite, but when I try to serialize one (don't ask why), e.g. ExecutorAllocationManagerSuite, I get an ArrayOutOfBoundsException.

I dug into this a little and found that `SerializationDebugger#visitSerializable` incorrectly associates one class's fields with another class's values. For instance, here is the output I generated by adding printlns everywhere:

{code}
* Visiting ExecutorAllocationManagerSuite (org.apache.spark.ExecutorAllocationManagerSuite)
*** visiting serializable object (class org.apache.spark.ExecutorAllocationManagerSuite, ExecutorAllocationManagerSuite)
  Final object = ExecutorAllocationManagerSuite + org.apache.spark.ExecutorAllocationManagerSuite
  Final object description org.apache.spark.ExecutorAllocationManagerSuite: static final long serialVersionUID = 5565470274968132811L;
  Slot descs 2
    - org.scalatest.FunSuite: static final long serialVersionUID = -5883370421614863475L;
      > fields = 4
        >> Lorg/scalatest/Suite$NoArgTest$; NoArgTest$module
        >> Lorg/scalatest/Assertions$AssertionsHelper; assertionsHelper
        >> Lorg/scalatest/Engine; org$scalatest$FunSuiteLike$$engine
        >> Ljava/lang/String; styleName
      > numObjFields = 4
    - org.apache.spark.ExecutorAllocationManagerSuite: static final long serialVersionUID = 5565470274968132811L;
      > fields = 5
        >> Z invokeBeforeAllAndAfterAllEvenIfNoTestsAreExpected
        >> Z org$scalatest$BeforeAndAfter$$runHasBeenInvoked
        >> Lscala/collection/mutable/ListBuffer; org$apache$spark$ExecutorAllocationManagerSuite$$contexts
        >> Ljava/util/concurrent/atomic/AtomicReference; org$scalatest$BeforeAndAfter$$afterFunctionAtomic
        >> Ljava/util/concurrent/atomic/AtomicReference; org$scalatest$BeforeAndAfter$$beforeFunctionAtomic
      > numObjFields = 3
{code}

We can see that ExecutorAllocationManagerSuite has two class data slot descriptions. The first one refers to the fields that belong to FunSuite, and the second to the fields that belong to ExecutorAllocationManagerSuite.

Later, however, when we are looking at the fields that belong to FunSuite, we try to assign the values of ExecutorAllocationManagerSuite's fields to them. This is because the object we run reflection on is of type ExecutorAllocationManagerSuite, and the mismatch in field count causes the ArrayOutOfBoundsException.

The offending line is: 
https://github.com/apache/spark/blob/4d9e560b5470029143926827b1cb9d72a0bfbeff/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L150
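
For contrast, a hedged sketch of what a correct association looks like (illustration only, not a proposed patch): pair each class data slot with values read from that slot's own class, so the counts always line up:
{code}
import java.io.ObjectStreamClass

// Walk the hierarchy and, for each class data slot, read only the fields that
// the slot's own class declares. FunSuite's slot then gets its own 4 values
// and the suite's slot its own, with no length mismatch.
def dumpPerSlot(obj: AnyRef): Unit = {
  var cls: Class[_] = obj.getClass
  while (cls != null && ObjectStreamClass.lookup(cls) != null) {
    val slotDesc = ObjectStreamClass.lookup(cls)
    println(s"${slotDesc.getName}: ${slotDesc.getFields.length} fields")
    for (f <- slotDesc.getFields) {
      val field = cls.getDeclaredField(f.getName)
      field.setAccessible(true)
      println(s"  ${f.getName} = ${field.get(obj)}")
    }
    cls = cls.getSuperclass
  }
}
{code}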

Sorry, I tried my best to explain this problem.


> SerializationDebugger fails with ArrayOutOfBoundsException
> ----------------------------------------------------------
>
>                 Key: SPARK-7180
>                 URL: https://issues.apache.org/jira/browse/SPARK-7180
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.0
>            Reporter: Andrew Or
>


