[ 
https://issues.apache.org/jira/browse/SPARK-10407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-10407:
-------------------------------
    Description: 
My long-running web server eventually uses a SparkContext, and over time I 
came across stack overflow errors that could only be cleared by restarting 
the server.

{code}
java.lang.StackOverflowError: null
        at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2307) ~[na:1.7.0_45]
        at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2718) ~[na:1.7.0_45]
        at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2742) ~[na:1.7.0_45]
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1979) ~[na:1.7.0_45]
        at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500) ~[na:1.7.0_45]
...
...
        at org.apache.commons.lang3.SerializationUtils.clone(SerializationUtils.java:96) ~[commons-lang3-3.3.jar:3.3]
        at org.apache.spark.scheduler.DAGScheduler.submitJob(DAGScheduler.scala:516) ~[spark-core_2.10-1.4.1-palantir1.jar:1.4.1-palantir1]
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:529) ~[spark-core_2.10-1.4.1-palantir1.jar:1.4.1-palantir1]
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1770) ~[spark-core_2.10-1.4.1-palantir1.jar:1.4.1-palantir1]
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1788) ~[spark-core_2.10-1.4.1-palantir1.jar:1.4.1-palantir1]
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1803) ~[spark-core_2.10-1.4.1-palantir1.jar:1.4.1-palantir1]
        at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1276) ~[spark-core_2.10-1.4.1-palantir1.jar:1.4.1-palantir1]
...
{code}

The bottom of the trace shows that a Properties object is being serialized 
when the overflow happens. I traced the origin of that object: it comes from 
SparkContext.localProperties, an InheritableThreadLocal field.

Debugging further, I found that localProperties.childValue() wraps its 
parent Properties object in another Properties object and returns the 
wrapper. The problem is that on every call to childValue, the Properties 
passed in from the parent had deeper and deeper nesting of wrapped 
Properties. This doesn't make sense, since my application doesn't create 
threads recursively or anything like that, so I'm marking this issue as 
minor: it shouldn't affect the average application.
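
To illustrate the failure mode, here is a minimal standalone sketch (my own names, not Spark's source): wrapping a parent via {{new Properties(parent)}} stores the parent in the protected {{defaults}} field, so each wrap adds one level of nesting, and serializing the resulting chain recurses once per level.

```java
import java.util.Properties;

public class NestedPropertiesDemo {

    // Subclass only to expose the protected `defaults` link for inspection.
    static class Wrapped extends Properties {
        Wrapped(Properties parent) { super(parent); }
        Properties parent() { return defaults; }
    }

    // Mimics a childValue() that wraps the parent instead of copying it.
    static Properties wrap(Properties parent) { return new Wrapped(parent); }

    // Counts how many Properties objects are chained through `defaults`.
    static int chainDepth(Properties p) {
        int depth = 1;
        while (p instanceof Wrapped) {
            p = ((Wrapped) p).parent();
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        Properties root = new Properties();
        root.setProperty("spark.job.description", "demo");

        Properties p = root;
        for (int i = 0; i < 5; i++) {
            p = wrap(p); // one nesting level per simulated childValue call
        }

        // Lookups still work, because get() falls through the defaults chain...
        System.out.println(p.getProperty("spark.job.description")); // demo
        // ...but the object graph is now 6 Properties deep; serializing it
        // recurses once per level, which overflows the stack at large depths.
        System.out.println(chainDepth(p)); // 6
    }
}
```

With thousands of accumulated levels, the recursive descent in ObjectInputStream/ObjectOutputStream is what produces the StackOverflowError above.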

On the other hand, there is no real reason for childValue to build the 
child's properties by nesting. Instead, the Properties returned by 
childValue should be flattened and, more importantly, should be a deep copy 
of the parent. I'm also concerned that the parent thread could modify the 
wrapped Properties object while the child thread is using it, creating 
possible race conditions, since Properties is not thread-safe.



> Possible Stack-overflow using InheritableThreadLocal nested-properties for 
> SparkContext.localProperties
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10407
>                 URL: https://issues.apache.org/jira/browse/SPARK-10407
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Matt Cheah
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
