[jira] [Commented] (SPARK-20853) spark.ui.reverseProxy=true leads to hanging communication to master

2017-08-03 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113670#comment-16113670
 ] 

Josh Bacon commented on SPARK-20853:


For the record, I'm experiencing the exact same behavior as described by 
[~tmckay]. If the total number of workers and drivers exceeds 9 (each with 
spark.ui.reverseProxy enabled), the Master UI becomes unresponsive. Removing 
workers or running drivers to get back below that threshold makes the Master 
UI responsive again.
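
For reference, this is with the reverse-proxy settings quoted in the issue 
description below enabled on every node (where exactly they are set, e.g. 
spark-defaults.conf, is our deployment detail):

spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/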

> spark.ui.reverseProxy=true leads to hanging communication to master
> ---
>
> Key: SPARK-20853
> URL: https://issues.apache.org/jira/browse/SPARK-20853
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0
> Environment: ppc64le GNU/Linux, POWER8, only master node is reachable 
> externally other nodes are in an internal network
>Reporter: Benno Staebler
>  Labels: network, web-ui
>
> When *reverse proxy is enabled*
> {quote}
> spark.ui.reverseProxy=true
> spark.ui.reverseProxyUrl=/
> {quote}
>  first of all, any invocation of the Spark master Web UI hangs forever, both 
> locally (e.g. http://192.168.10.16:25001) and via the external URL, with no 
> data received. 
> One, sometimes two Spark applications succeed without error, and then workers 
> start throwing exceptions:
> {quote}
> Caused by: java.io.IOException: Failed to connect to /192.168.10.16:25050
> {quote}
> The application dies during creation of SparkContext:
> {quote}
> 2017-05-22 16:11:23 INFO  StandaloneAppClient$ClientEndpoint:54 - Connecting 
> to master spark://node0101:25000...
> 2017-05-22 16:11:23 INFO  TransportClientFactory:254 - Successfully created 
> connection to node0101/192.168.10.16:25000 after 169 ms (132 ms spent in 
> bootstraps)
> 2017-05-22 16:11:43 INFO  StandaloneAppClient$ClientEndpoint:54 - Connecting 
> to master spark://node0101:25000...
> 2017-05-22 16:12:03 INFO  StandaloneAppClient$ClientEndpoint:54 - Connecting 
> to master spark://node0101:25000...
> 2017-05-22 16:12:23 ERROR StandaloneSchedulerBackend:70 - Application has 
> been killed. Reason: All masters are unresponsive! Giving up.
> 2017-05-22 16:12:23 WARN  StandaloneSchedulerBackend:66 - Application ID is 
> not initialized yet.
> 2017-05-22 16:12:23 INFO  Utils:54 - Successfully started service 
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 25056.
> .
> Caused by: java.lang.IllegalArgumentException: requirement failed: Can only 
> call getServletHandlers on a running MetricsSystem
> {quote}
> *This definitively does not happen without reverse proxy enabled!*





[jira] [Commented] (SPARK-16784) Configurable log4j settings

2017-04-11 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964471#comment-15964471
 ] 

Josh Bacon commented on SPARK-16784:


[~tscholak]  I have not created a follow-up issue for this. For your situation 
I'd suggest trying --driver-java-options='..' instead of 
--conf='spark.driver.extraJavaOptions=...', because the latter is applied after 
the driver JVM has actually started (too late for log4j). For my use case, I 
abandoned attempts to configure log4j for executors, but I was able to work 
with driver/application logs in both cluster and client mode (standalone) by 
baking my log4j.properties files into my app's uber jar resources. I think what 
this issue really needs is a new feature for distributing files such as 
log4j.properties across the cluster before the actual Spark driver starts, 
which I'd imagine is not pressing enough to warrant actual development at the 
moment.
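
To make the driver-side workaround concrete, a spark-submit along these lines 
should pick up the baked-in configuration for the driver (the jar, class, and 
properties-file names below are hypothetical placeholders):

spark-submit \
  --master spark://your-master:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --driver-java-options '-Dlog4j.configuration=jar:file:my-app-uber.jar!/driver-log4j.properties -Dlog4j.debug' \
  my-app-uber.jar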

> Configurable log4j settings
> ---
>
> Key: SPARK-16784
> URL: https://issues.apache.org/jira/browse/SPARK-16784
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Michael Gummelt
>
> I often want to change the logging configuration on a single spark job.  This 
> is easy in client mode.  I just modify log4j.properties.  It's difficult in 
> cluster mode, because I need to modify the log4j.properties in the 
> distribution in which the driver runs.  I'd like a way of setting this 
> dynamically, such as a java system property.  Some brief searching showed 
> that log4j doesn't seem to accept such a property, but I'd like to open up 
> this idea for further comment.  Maybe we can find a solution.






[jira] [Commented] (SPARK-16784) Configurable log4j settings

2017-03-13 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907766#comment-15907766
 ] 

Josh Bacon commented on SPARK-16784:


From what I've seen, this limitation is also experienced on the Spark 
Standalone Cluster Manager.

There doesn't appear to be a way to supply custom log4j files for the driver 
and executor JVMs on a per-application/submission basis. Spark appears to 
provision configuration files (via the --files option) after the 
driver/executor JVMs are already started. A work-around exists: include the 
log4j files in the classpath of your application's uber jar (i.e. 
/src/main/resources/), and then append the following spark-submit options: 

--driver-java-options 
'-Dlog4j.configuration=jar:file:your-application-uber-.jar!/your-custom-driver-log4j.properties
 -Dlog4j.debug'

Unfortunately this does not appear to work for executor log4j, because the 
executor JVM appears to start before your-application-uber.jar is provisioned. 
In the driver's case, the provisioning takes place before the driver JVM 
starts, so you're able to reference the uber jar by its relative path in the 
driver's working directory.
 
THIS DOESN'T WORK:
--conf 
'spark.executor.extraJavaOptions=-Dlog4j.configuration=jar:file:your-application.jar!/your-custom-executor-log4j.properties
 -Dlog4j.debug'
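
For reference, the file baked into the uber jar is an ordinary log4j 1.x 
properties file; a minimal sketch (contents illustrative only, not my exact 
file) would be:

log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n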

I'm not familiar with the internals, but if this warrants a new jira ticket, 
let me know and I'll create one with a proper description!
Thanks



> Configurable log4j settings
> ---
>
> Key: SPARK-16784
> URL: https://issues.apache.org/jira/browse/SPARK-16784
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Michael Gummelt
>
> I often want to change the logging configuration on a single spark job.  This 
> is easy in client mode.  I just modify log4j.properties.  It's difficult in 
> cluster mode, because I need to modify the log4j.properties in the 
> distribution in which the driver runs.  I'd like a way of setting this 
> dynamically, such as a java system property.  Some brief searching showed 
> that log4j doesn't seem to accept such a property, but I'd like to open up 
> this idea for further comment.  Maybe we can find a solution.






[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-29 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785715#comment-15785715
 ] 

Josh Bacon commented on SPARK-18737:


I think this issue may be related to the following issues:
https://issues.apache.org/jira/browse/SPARK-18560
https://issues.apache.org/jira/browse/SPARK-18617




> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2.0.1, we see the Kryo deserialization exception, and over time the Spark 
> streaming job stops processing since too many tasks failed.
> Our action was to use conf.set("spark.serializer", 
> "org.apache.spark.serializer.JavaSerializer") and to disable Kryo class 
> registration with conf.set("spark.kryo.registrationRequired", false). We hope 
> to identify the root cause of the exception. 
> However, setting the serializer to JavaSerializer is obviously ignored by the 
> Spark internals. Despite the setting, we still see the exception printed in 
> the log and tasks fail. The occurrence seems to be non-deterministic, but to 
> become more frequent over time.
> Several questions we could not answer during our troubleshooting:
> 1. How can the debug log for Kryo be enabled? -- We tried following the 
> minilog documentation, but no output can be found.
> 2. Is the serializer setting effective 

[jira] [Comment Edited] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-28 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783543#comment-15783543
 ] 

Josh Bacon edited comment on SPARK-18737 at 12/28/16 7:39 PM:
--

Hi Sean, 
We've performed more tests and are experiencing the same issues with the 
following minimal code reproduction (Spark 2.0.2 w/ prebuilt Hadoop 2.7):

{code:title=Bar.scala|borderStyle=solid}
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kinesis.KinesisUtils

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

object StreamingFromKinesisTest {

  def main(args: Array[String]) {
    val endpointUrl = "https://kinesis.us-west-2.amazonaws.com";
    val streamName = args(0);
    val appName = args(1); //DynamoDB name
    val region = "us-west-2";
    val sparkSession = SparkSession.builder.appName("StreamingFromKinesisTest").getOrCreate();
    val batchInterval = Seconds(10);
    val streamingContext = new StreamingContext(sparkSession.sparkContext, batchInterval);
    val kinesisStreams = (0 until 2).map { _ =>
      KinesisUtils.createStream(streamingContext, appName, streamName, endpointUrl, region, InitialPositionInStream.TRIM_HORIZON, batchInterval, StorageLevel.MEMORY_AND_DISK_2);
    };
    val streamOfArrayBytes = streamingContext.union(kinesisStreams);
    val streamStrings = streamOfArrayBytes.map(arrayBytes => new String(arrayBytes));
    streamStrings.foreachRDD((rddString, timestamp) => {
      println(timestamp);
      if (!rddString.isEmpty()) {
        println("Success!");
      }
    });
    streamingContext.start();
    streamingContext.awaitTerminationOrTimeout(600)
  }
}
{code}
{panel:title=Executor Log Snippet|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE}
16/12/28 11:02:40 INFO BlockManager: Removing RDD 15
16/12/28 11:02:40 INFO BlockManager: Removing RDD 13
16/12/28 11:02:40 INFO BlockManager: Removing RDD 14
16/12/28 11:02:53 INFO CoarseGrainedExecutorBackend: Got assigned task 72
16/12/28 11:02:53 INFO Executor: Running task 0.0 in stage 4.0 (TID 72)
16/12/28 11:02:53 INFO TorrentBroadcast: Started reading broadcast variable 4
16/12/28 11:02:53 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in 
memory (estimated size 1762.0 B, free 366.3 MB)
16/12/28 11:02:53 INFO TorrentBroadcast: Reading broadcast variable 4 took 10 ms
16/12/28 11:02:53 INFO MemoryStore: Block broadcast_4 stored as values in 
memory (estimated size 2.6 KB, free 366.3 MB)
16/12/28 11:02:53 INFO TransportClientFactory: Successfully created connection 
to /172.21.50.111:5000 after 22 ms (21 ms spent in bootstraps)
16/12/28 11:02:54 INFO BlockManager: Found block input-1-1482951722353 remotely
16/12/28 11:02:54 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 72)
com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
13994
at 
com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
at 
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
at 
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:389)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at 
org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1324)
at 
org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1324)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
at 

[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-28 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783543#comment-15783543
 ] 

Josh Bacon commented on SPARK-18737:


Hi Sean, 
We've performed more tests and are experiencing the same issues with the 
following minimal code reproduction (Spark 2.0.2 w/ prebuilt Hadoop 2.7):

{code:title=Bar.scala|borderStyle=solid}
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kinesis.KinesisUtils

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

object StreamingFromKinesisTest {

  def main(args: Array[String]) {
    val endpointUrl = "https://kinesis.us-west-2.amazonaws.com";
    val streamName = args(0);
    val appName = args(1); //DynamoDB name
    val region = "us-west-2";
    val sparkSession = SparkSession.builder.appName("StreamingFromKinesisTest").getOrCreate();
    val batchInterval = Seconds(10);
    val streamingContext = new StreamingContext(sparkSession.sparkContext, batchInterval);
    val kinesisStreams = (0 until 2).map { _ =>
      KinesisUtils.createStream(streamingContext, appName, streamName, endpointUrl, region, InitialPositionInStream.TRIM_HORIZON, batchInterval, StorageLevel.MEMORY_AND_DISK_2);
    };
    val streamOfArrayBytes = streamingContext.union(kinesisStreams);
    val streamStrings = streamOfArrayBytes.map(arrayBytes => new String(arrayBytes));
    streamStrings.foreachRDD((rddString, timestamp) => {
      println(timestamp);
      if (!rddString.isEmpty()) {
        println("Success!");
      }
    });
    streamingContext.start();
    streamingContext.awaitTerminationOrTimeout(600)
  }
}
{code}
{panel:title=Executor Log Snippet|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE}
16/12/28 11:02:40 INFO BlockManager: Removing RDD 15
16/12/28 11:02:40 INFO BlockManager: Removing RDD 13
16/12/28 11:02:40 INFO BlockManager: Removing RDD 14
16/12/28 11:02:53 INFO CoarseGrainedExecutorBackend: Got assigned task 72
16/12/28 11:02:53 INFO Executor: Running task 0.0 in stage 4.0 (TID 72)
16/12/28 11:02:53 INFO TorrentBroadcast: Started reading broadcast variable 4
16/12/28 11:02:53 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in 
memory (estimated size 1762.0 B, free 366.3 MB)
16/12/28 11:02:53 INFO TorrentBroadcast: Reading broadcast variable 4 took 10 ms
16/12/28 11:02:53 INFO MemoryStore: Block broadcast_4 stored as values in 
memory (estimated size 2.6 KB, free 366.3 MB)
16/12/28 11:02:53 INFO TransportClientFactory: Successfully created connection 
to /172.21.50.111:5000 after 22 ms (21 ms spent in bootstraps)
16/12/28 11:02:54 INFO BlockManager: Found block input-1-1482951722353 remotely
16/12/28 11:02:54 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 72)
com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
13994
at 
com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
at 
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
at 
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:389)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at 
org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1324)
at 
org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1324)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
at 

[jira] [Comment Edited] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-27 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781450#comment-15781450
 ] 

Josh Bacon edited comment on SPARK-18737 at 12/27/16 11:16 PM:
---

Thanks for the quick reply,

We've cut our code down to a minimum and are only using JavaDStream types to 
call isEmpty (which triggers the KryoException), so we do not have any classes 
to register, yet we are still experiencing KryoExceptions. Nonetheless, 
enabling JavaSerialization and setting requireRegister to false do not appear 
to have any effect.

Are there any other details you'd like me to provide to help identify this 
issue?

We are using the library: org.apache.spark.streaming.kinesis.KinesisUtils


was (Author: jbacon):
Thanks for the quick reply,

We've cut our code down to a minimum and are only using JavaDStream types to 
call isEmpty (which triggers the KryoException), so we do not have any classes 
to register, yet we are still experiencing KryoExceptions. Nonetheless, 
enabling JavaSerialization and setting requireRegister to false do not appear 
to have any effect.

Are there any other details you'd like me to provide to help identify this 
issue?

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 

[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-27 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781450#comment-15781450
 ] 

Josh Bacon commented on SPARK-18737:


Thanks for the quick reply,

We've cut our code down to a minimum and are only using JavaDStream types to 
call isEmpty (which triggers the KryoException), so we do not have any classes 
to register, yet we are still experiencing KryoExceptions. Nonetheless, 
enabling JavaSerialization and setting requireRegister to false do not appear 
to have any effect.

Are there any other details you'd like me to provide to help identify this 
issue?
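
For clarity, here is a minimal sketch of the driver-side settings we are 
applying, mirroring the conf.set calls quoted in the issue description below 
(the app name is just a placeholder):

{code}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setAppName("StreamingFromKinesisTest")
  // Request Java serialization instead of Kryo...
  .set("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
  // ...and, as an independent second attempt, relax Kryo class registration.
  .set("spark.kryo.registrationRequired", "false")

val sparkSession = SparkSession.builder.config(conf).getOrCreate()
{code}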

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2.0.1, we see the Kryo deserialization exception, and over time the Spark 
> streaming job stops processing since too many tasks failed.
> Our action was to use conf.set("spark.serializer", 
> "org.apache.spark.serializer.JavaSerializer") and to disable Kryo class 
> registration with conf.set("spark.kryo.registrationRequired", false). We hope 
> to identify the root cause of the exception. 
> However, setting the serializer to JavaSerializer is obviously ignored by the 
> Spark internals. Despite the setting, we still see the exception printed in 
> the log and tasks fail. The occurrence seems to be 

[jira] [Comment Edited] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-27 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781418#comment-15781418
 ] 

Josh Bacon edited comment on SPARK-18737 at 12/27/16 10:33 PM:
---

My team is experiencing the exact same issue as described by the OP for our 
streaming jobs using the KinesisUtils library. The code worked in 1.6 
previously, but now experiences KryoExceptions in 2.0 (Unregistered Class Id) 
no matter whether JavaSerialization is enabled instead or requireRegister is 
set to false. Errors occur in a non-deterministic manner. We currently see no 
work-around for our streaming jobs in Spark 2.0.1.


was (Author: jbacon):
My team is experiencing the exact same issue as described by the OP for our 
streaming jobs using the KinesisUtils library. The code worked in 1.6 
previously, but now experiences KryoExceptions in 2.0 (Unregistered Class Id) 
no matter whether JavaSerialization is enabled instead or requireRegister is 
set to false. Errors occur in a non-deterministic manner as well. We currently 
see no work-around for our streaming jobs in Spark 2.0.1.

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2.0.1, we see the Kryo deserialization exception, and over time the Spark 
> streaming job stops processing since too many 

[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-27 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781418#comment-15781418
 ] 

Josh Bacon commented on SPARK-18737:


My team is experiencing the exact same issue as described by the OP for our 
streaming jobs using the KinesisUtils library. The code worked in 1.6 
previously, but now experiences KryoExceptions in 2.0 (Unregistered Class Id) 
no matter whether JavaSerialization is enabled instead or requireRegister is 
set to false. Errors occur in a non-deterministic manner as well. We currently 
see no work-around for our jobs in Spark 2.0.1.

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2.0.1, we see the Kryo deserialization exception, and over time the Spark 
> streaming job stops processing since too many tasks failed.
> Our action was to use conf.set("spark.serializer", 
> "org.apache.spark.serializer.JavaSerializer") and to disable Kryo class 
> registration with conf.set("spark.kryo.registrationRequired", false). We hope 
> to identify the root cause of the exception. 
> However, setting the serializer to JavaSerializer is obviously ignored by the 
> Spark internals. Despite the setting, we still see the exception printed in 
> the log and tasks fail. The occurrence seems to be non-deterministic, but to 
> become 

[jira] [Comment Edited] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-27 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781418#comment-15781418
 ] 

Josh Bacon edited comment on SPARK-18737 at 12/27/16 10:32 PM:
---

My team is experiencing the exact same issue as described by the OP for our 
streaming jobs using the KinesisUtils library. The code worked in 1.6 
previously, but now experiences KryoExceptions in 2.0 (Unregistered Class Id) 
no matter whether JavaSerialization is enabled instead or requireRegister is 
set to false. Errors occur in a non-deterministic manner as well. We currently 
see no work-around for our streaming jobs in Spark 2.0.1.


was (Author: jbacon):
My team is experiencing the exact same issue as described by the OP for our 
streaming jobs using the KinesisUtils library. The code worked in 1.6 
previously, but now experiences KryoExceptions in 2.0 (Unregistered Class Id) 
no matter whether JavaSerialization is enabled instead or requireRegister is 
set to false. Errors occur in a non-deterministic manner as well. We currently 
see no work-around for our jobs in Spark 2.0.1.

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2.0.1, we see the Kryo deserialization exception, and over time the Spark 
> streaming job stops processing since too many tasks