Re: spark on yarn-standalone, throws StackOverflowError and fails sometimes and succeeds for the rest

2014-05-16 Thread Xiangrui Meng
Could you try `println(result.toDebugString)` right after
`val result = ...` and attach the output? -Xiangrui
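
A minimal sketch of what that suggestion prints, assuming a local Spark installation on the classpath. The RDD here is a stand-in built with `parallelize` and a few `map` calls, not the `model.predict` result from the thread, and `LineageDebug`, `lineageOf`, and `steps` are hypothetical names used only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LineageDebug {
  // Build a small RDD whose lineage grows by one step per loop iteration
  // and return the textual description of that lineage.
  def lineageOf(sc: SparkContext, steps: Int): String = {
    var rdd = sc.parallelize(1 to 100)
    for (_ <- 1 to steps) {
      rdd = rdd.map(_ + 1) // each map adds one more parent to the lineage
    }
    rdd.toDebugString // declared without parentheses on RDD
  }

  def main(args: Array[String]): Unit = {
    // Local mode is an assumption for this sketch; the thread runs on YARN.
    val conf = new SparkConf().setMaster("local[2]").setAppName("lineage-debug")
    val sc = new SparkContext(conf)
    try println(lineageOf(sc, 5))
    finally sc.stop()
  }
}
```

The printed text lists each RDD in the chain together with its parents, so an unusually long listing for `result` would point at a deep lineage.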

On Fri, May 9, 2014 at 8:20 AM, phoenix bai mingzhi...@gmail.com wrote:
 After a couple of tests, I found that if I use:

 val result = model.predict(prdctpairs)
 result.map(x => x.user + "," + x.product + "," + x.rating).saveAsTextFile(output)

 it always fails with the above error, and the exception seems to repeat itself.

 But if I do:

 val result = model.predict(prdctpairs)
 result.cache()
 result.map(x => x.user + "," + x.product + "," + x.rating).saveAsTextFile(output)

 it succeeds.

 Could anyone help explain why the cache() is necessary?

 Thanks



 On Fri, May 9, 2014 at 6:45 PM, phoenix bai mingzhi...@gmail.com wrote:

 Hi all,

 My Spark code is running on yarn-standalone.

 The last three lines of the code are as below:

 val result = model.predict(prdctpairs)
 result.map(x => x.user + "," + x.product + "," + x.rating).saveAsTextFile(output)
 sc.stop()

 The same code sometimes runs successfully and gives the right result,
 while from time to time it throws a StackOverflowError and fails.

 And I don't have a clue how I should debug it.

spark on yarn-standalone, throws StackOverflowError and fails sometimes and succeeds for the rest

2014-05-14 Thread phoenix bai
Hi all,

My Spark code is running on yarn-standalone.

The last three lines of the code are as below:

val result = model.predict(prdctpairs)
result.map(x => x.user + "," + x.product + "," + x.rating).saveAsTextFile(output)
sc.stop()

The same code sometimes runs successfully and gives the right result, while
from time to time it throws a StackOverflowError and fails.

And I don't have a clue how I should debug it.

Below is the error (the start and end portions, to be exact):


14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-17]
MapOutputTrackerMasterActor: Asked to send map output locations for shuffle
44 to sp...@rxx43.mc10.site.net:43885
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-17]
MapOutputTrackerMaster: Size of output statuses for shuffle 44 is 148 bytes
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-35]
MapOutputTrackerMasterActor: Asked to send map output locations for shuffle
45 to sp...@rxx43.mc10.site.net:43885
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-35]
MapOutputTrackerMaster: Size of output statuses for shuffle 45 is 453 bytes
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-20]
MapOutputTrackerMasterActor: Asked to send map output locations for shuffle
44 to sp...@rxx43.mc10.site.net:56767
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-29]
MapOutputTrackerMasterActor: Asked to send map output locations for shuffle
45 to sp...@rxx43.mc10.site.net:56767
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-29]
MapOutputTrackerMasterActor: Asked to send map output locations for shuffle
44 to sp...@rxx43.mc10.site.net:49879
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-29]
MapOutputTrackerMasterActor: Asked to send map output locations for shuffle
45 to sp...@rxx43.mc10.site.net:49879
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-17]
TaskSetManager: Starting task 946.0:17 as TID 146 on executor 6:
rx15.mc10.site.net (PROCESS_LOCAL)
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-17]
TaskSetManager: Serialized task 946.0:17 as 6414 bytes in 0 ms
14-05-09 17:55:51 WARN [Result resolver thread-0] TaskSetManager: Lost TID
133 (task 946.0:4)
14-05-09 17:55:51 WARN [Result resolver thread-0] TaskSetManager: Loss was
due to java.lang.StackOverflowError
java.lang.StackOverflowError
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)



at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-5]
TaskSetManager: Starting task 946.0:4 as TID 147 on executor 6:
r15.mc10.site.net (PROCESS_LOCAL)
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-5]
TaskSetManager: Serialized task 946.0:4 as 6414 bytes in 0 ms
14-05-09 17:55:51 WARN [Result resolver thread-1] TaskSetManager: Lost TID
139 (task 946.0:10)
14-05-09 17:55:51 INFO [Result resolver thread-1] TaskSetManager: Loss was
due to java.lang.StackOverflowError [duplicate 1]
14-05-09 17:55:51 INFO [spark-akka.actor.default-dispatcher-5]
CoarseGrainedSchedulerBackend: Executor 4 disconnected, so removing it
14-05-09 17:55:51 ERROR
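
The frames at the end of the trace above repeat in a cycle (readObject0, readOrdinaryObject, readSerialData, defaultReadFields), which is what Java deserialization looks like when it recurses once per level of a nested object graph. The sketch below reproduces that failure mode with plain java.io and no Spark at all. `Node`, `chain`, and `roundTrips` are hypothetical names made up for illustration. Whether the object graph in this job is deep for the same reason (for example, a long RDD lineage behind `model.predict`) is an assumption that the thread does not confirm.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for an object that holds a reference to its parent,
// which holds a reference to its own parent, and so on. Not a Spark class.
final case class Node(id: Int, parent: Node)

object DeepChainDemo {
  // Build a chain of `depth` nodes with a loop, so that construction
  // itself never recurses.
  def chain(depth: Int): Node = {
    var node: Node = null
    var i = 0
    while (i < depth) {
      node = Node(i, node)
      i += 1
    }
    node
  }

  // Serialize and deserialize `root` with Java serialization.
  // Returns true on success and false if the recursion ran out of stack.
  def roundTrips(root: Node): Boolean =
    try {
      val bytes = new ByteArrayOutputStream()
      val out = new ObjectOutputStream(bytes)
      out.writeObject(root) // recurses once per nested `parent` field
      out.close()
      val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
      in.readObject()       // the same recursion, in the read direction
      in.close()
      true
    } catch {
      case _: StackOverflowError => false
    }

  def main(args: Array[String]): Unit = {
    println(s"depth 10 round-trips: ${roundTrips(chain(10))}")
    println(s"depth 200000 round-trips: ${roundTrips(chain(200000))}")
  }
}
```

The depth at which this starts to fail depends on the thread stack size, so a job whose object graph sits near that limit could plausibly fail on some runs and pass on others. That would be consistent with the intermittent behaviour reported here, but it is not established by the thread.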