[jira] [Commented] (SPARK-21928) ML LogisticRegression training occasionally produces java.lang.ClassNotFoundException when attempting to load custom Kryo registrator class

2017-09-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175385#comment-16175385
 ] 

Apache Spark commented on SPARK-21928:
--

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/19313

> ML LogisticRegression training occasionally produces 
> java.lang.ClassNotFoundException when attempting to load custom Kryo 
> registrator class
> ---
>
> Key: SPARK-21928
> URL: https://issues.apache.org/jira/browse/SPARK-21928
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.1, 2.2.0
>Reporter: John Brock
>Assignee: Imran Rashid
> Fix For: 2.2.1, 2.3.0
>
>
> I unfortunately can't reliably reproduce this bug; it happens only 
> occasionally, when training a logistic regression model with very large 
> datasets. The training will often proceed through several {{treeAggregate}} 
> calls without any problems, and then suddenly workers will start running into 
> this {{java.lang.ClassNotFoundException}}.
> After doing some debugging, it seems that whenever this error happens, Spark 
> is trying to use the {{sun.misc.Launcher$AppClassLoader}} {{ClassLoader}} 
> instance instead of the usual 
> {{org.apache.spark.util.MutableURLClassLoader}}. {{MutableURLClassLoader}} 
> can see my custom Kryo registrator, but the {{AppClassLoader}} instance can't.
> When this error does pop up, it's usually accompanied by the task seeming to 
> hang, and I need to kill Spark manually.
> I'm running a Spark application in cluster mode via spark-submit, and I have 
> a custom Kryo registrator. The JAR is built with {{sbt assembly}}.
> Exception message:
> {noformat}
> 17/08/29 22:39:04 ERROR TransportRequestHandler: Error opening block 
> StreamChunkId{streamId=542074019336, chunkIndex=0} for request from 
> /10.0.29.65:34332
> org.apache.spark.SparkException: Failed to register classes with Kryo
> at 
> org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:139)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:292)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.(KryoSerializer.scala:277)
> at 
> org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:186)
> at 
> org.apache.spark.serializer.SerializerManager.dataSerializeStream(SerializerManager.scala:169)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$dropFromMemory$3.apply(BlockManager.scala:1382)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$dropFromMemory$3.apply(BlockManager.scala:1377)
> at org.apache.spark.storage.DiskStore.put(DiskStore.scala:69)
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1377)
> at 
> org.apache.spark.storage.memory.MemoryStore.org$apache$spark$storage$memory$MemoryStore$$dropBlock$1(MemoryStore.scala:524)
> at 
> org.apache.spark.storage.memory.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:545)
> at 
> org.apache.spark.storage.memory.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:539)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> org.apache.spark.storage.memory.MemoryStore.evictBlocksToFreeSpace(MemoryStore.scala:539)
> at 
> org.apache.spark.memory.StorageMemoryPool.acquireMemory(StorageMemoryPool.scala:92)
> at 
> org.apache.spark.memory.StorageMemoryPool.acquireMemory(StorageMemoryPool.scala:73)
> at 
> org.apache.spark.memory.StaticMemoryManager.acquireStorageMemory(StaticMemoryManager.scala:72)
> at 
> org.apache.spark.storage.memory.MemoryStore.putBytes(MemoryStore.scala:147)
> at 
> org.apache.spark.storage.BlockManager.maybeCacheDiskBytesInMemory(BlockManager.scala:1143)
> at 
> org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doGetLocalBytes(BlockManager.scala:594)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$getLocalBytes$2.apply(BlockManager.scala:559)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$getLocalBytes$2.apply(BlockManager.scala:559)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.storage.BlockManager.getLocalBytes(BlockManager.scala:559)
> at 
> org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:353)
> at 
> org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$1.apply(NettyBlockRpcServer.scala:61)
> at 
> org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$1.apply(NettyBlockRpcServer.scala:60)
> at 
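
For context, a minimal custom Kryo registrator of the kind the description mentions might look like the sketch below (class name and registered types are hypothetical); the stack trace above is the failure path Spark takes when the classloader asked to load such a registrator can't see the assembly jar.

{code:scala}
// Hypothetical registrator, bundled into the sbt-assembly jar and named
// via spark.kryo.registrator. Only classloaders whose search path includes
// that jar (e.g. Spark's MutableURLClassLoader) can load it; the plain JVM
// AppClassLoader cannot, which is what produces the
// ClassNotFoundException the report describes.
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Register whatever the application actually serializes.
    kryo.register(classOf[Array[Double]])
  }
}
{code}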

[jira] [Commented] (SPARK-21928) ML LogisticRegression training occasionally produces java.lang.ClassNotFoundException when attempting to load custom Kryo registrator class

2017-09-21 Thread John Brock (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175066#comment-16175066
 ] 

John Brock commented on SPARK-21928:


It does! I see this in the log right before an executor got stuck:
{{17/08/31 19:56:25 INFO MemoryStore: 3 blocks selected for dropping (284.2 MB bytes)}}


[jira] [Commented] (SPARK-21928) ML LogisticRegression training occasionally produces java.lang.ClassNotFoundException when attempting to load custom Kryo registrator class

2017-09-20 Thread Imran Rashid (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174247#comment-16174247
 ] 

Imran Rashid commented on SPARK-21928:
--

I believe I figured out why things get stuck sometimes; filed SPARK-22083.  
[~jbrock], if you happen to still have logs from a case where you see this, can 
you check if the stuck executor shows something like "INFO MemoryStore: 2 
blocks selected for dropping" (or maybe more than 2)?


[jira] [Commented] (SPARK-21928) ML LogisticRegression training occasionally produces java.lang.ClassNotFoundException when attempting to load custom Kryo registrator class

2017-09-19 Thread John Brock (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172123#comment-16172123
 ] 

John Brock commented on SPARK-21928:


[~irashid], thanks for taking a look. I see the same thing as that user -- the 
executor gets stuck, causing the application to fail. Before I found the 
workaround I mentioned above with {{spark.driver.extraClassPath}} and 
{{spark.executor.extraClassPath}}, I was using speculation to kill off the 
hanging tasks, although this wasn't always enough (e.g., if a stage got stuck 
before reaching the speculation threshold), and it sometimes caused 
long-running but non-stuck tasks to be killed.
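
For reference, a sketch of that workaround in Scala, with a hypothetical jar 
path and registrator class name. In cluster mode the two extraClassPath 
settings have to be supplied at submit time (e.g. as {{--conf}} flags to 
spark-submit), because the driver JVM is already running by the time 
application code builds a SparkConf; the snippet just spells out the same 
settings.

{code:scala}
// Sketch of the workaround described above; paths and class names are
// hypothetical. Putting the assembly jar on the driver and executor extra
// classpaths makes the registrator visible even to the JVM's
// AppClassLoader, not just Spark's MutableURLClassLoader.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.kryo.registrator", "com.example.MyKryoRegistrator")
  .set("spark.driver.extraClassPath", "/opt/app/app-assembly.jar")
  .set("spark.executor.extraClassPath", "/opt/app/app-assembly.jar")
{code}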


[jira] [Commented] (SPARK-21928) ML LogisticRegression training occasionally produces java.lang.ClassNotFoundException when attempting to load custom Kryo registrator class

2017-09-19 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172115#comment-16172115
 ] 

Apache Spark commented on SPARK-21928:
--

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/19280


[jira] [Commented] (SPARK-21928) ML LogisticRegression training occasionally produces java.lang.ClassNotFoundException when attempting to load custom Kryo registrator class

2017-09-19 Thread Imran Rashid (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171715#comment-16171715
 ] 

Imran Rashid commented on SPARK-21928:
--

Hi [~jbrock],

thanks for reporting this.  I have another bug report which I think is similar.  
I don't think this actually has anything to do with ML; it can happen anytime 
cached RDDs get sent over the network (perhaps that is a bit more likely with 
ML workloads).

I do have one question for you, though -- in the other bug report, the user 
says that when they hit this error, the executor gets stuck, which eventually 
leads to the application failing.  However, I haven't been able to reproduce 
that so far -- there are some errors, but from what I see, the executor just 
gives up fetching the remote data and regenerates it locally.  What have you 
observed when this happens?

more details:

From SPARK-13990 & SPARK-13926, Spark's SerializerManager has [its own 
instance of a 
KryoSerializer|https://github.com/apache/spark/blob/581200af717bcefd11c9930ac063fe53c6fd2fde/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala#L42]
 which does not have the defaultClassLoader set on it. For normal task 
execution, that doesn't cause problems, because the serializer falls back to 
the current thread's context classloader, which is set anyway.

However, netty maintains its own thread pool, and those threads don't change 
their classloader to include the extra user jars needed for the custom Kryo 
registrator. That only matters when cached RDDs are sent across the network, 
which won't happen often, because Spark tries to execute tasks where the RDDs 
are already cached.  (You'll notice your stack trace includes netty stuff 
underneath.)

This doesn't affect the shuffle path, because the serde is never done in the 
threads created by netty.

I think a fix for this should be fairly straightforward: we just need to set 
the classloader on that extra Kryo instance.
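
To make that concrete, a rough sketch of the idea (not the merged patch), 
using the {{setDefaultClassLoader}} hook that Spark's {{Serializer}} API 
already exposes:

{code:scala}
// Sketch only, not the actual fix: the direction described above is to give
// the SerializerManager's private KryoSerializer a default classloader that
// can see the user jars, so serde running on netty threads resolves the
// custom registrator instead of falling back to the JVM AppClassLoader.
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

val conf = new SparkConf()
  .set("spark.kryo.registrator", "com.example.MyKryoRegistrator") // hypothetical

val kryo = new KryoSerializer(conf)
// Serializer.setDefaultClassLoader is part of Spark's serializer API;
// here we hand it the loader that can see the user's classes.
kryo.setDefaultClassLoader(Thread.currentThread().getContextClassLoader)
{code}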
