I'm running a streaming job that has two calls to updateStateByKey. When run in standalone mode, both calls to updateStateByKey behave as expected. When run on a cluster, however, it appears that the first call is not being checkpointed, as shown in this DAG image:
http://i.imgur.com/zmQ8O2z.png

The middle column continues to grow one level deeper every batch until I get a stack overflow error. I'm guessing it's a problem of the stateRDD not being persisted, but I can't imagine why it wouldn't be. I thought updateStateByKey was supposed to handle that for you internally. Any ideas?

In case anyone is interested, here are excerpts of the stack trace from the stack overflow:

Job aborted due to stage failure: Task 7 in stage 195811.0 failed 4 times, most recent failure: Lost task 7.3 in stage 195811.0 (TID 213529, ip-10-168-177-216.ec2.internal): java.lang.StackOverflowError
    at java.lang.Exception.<init>(Exception.java:102)
    at java.lang.ReflectiveOperationException.<init>(ReflectiveOperationException.java:89)
    at java.lang.reflect.InvocationTargetException.<init>(InvocationTargetException.java:72)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1897)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    ...
And:

scala.collection.immutable.$colon$colon in readObject at line 362
scala.collection.immutable.$colon$colon in readObject at line 366
scala.collection.immutable.$colon$colon in readObject at line 362
scala.collection.immutable.$colon$colon in readObject at line 362
scala.collection.immutable.$colon$colon in readObject at line 366
scala.collection.immutable.$colon$colon in readObject at line 362
scala.collection.immutable.$colon$colon in readObject at line 362
...
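For context, here is a minimal sketch of a job shaped like the one described: two chained updateStateByKey calls, with checkpointing configured explicitly on both state streams. It assumes Spark 1.3 or later (so pair-DStream operations need no extra implicit import). The object name `StateStreams`, the `sumCounts` update function, the key mapping, the `wire` helper and its checkpoint path parameter are all hypothetical. The `foreachRDD` call on the first state stream is an assumed workaround, not a confirmed fix. It gives that stream its own output action so each of its RDDs is the final RDD of some job. The underlying assumption is that the first stream's checkpoint can be skipped when its RDDs are only reached as parents of the second stream's RDDs, which are themselves marked for checkpointing.

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object StateStreams {
  // Running total per key: (values that arrived this batch, previous state) => new state.
  def sumCounts(values: Seq[Int], state: Option[Int]): Option[Int] =
    Some(values.sum + state.getOrElse(0))

  // Hypothetical wiring of two chained updateStateByKey calls (assumes Spark 1.3+).
  def wire(ssc: StreamingContext,
           input: DStream[(String, Int)],
           checkpointDir: String): DStream[(String, Int)] = {
    // updateStateByKey requires a checkpoint directory; on a cluster this should be
    // shared storage (e.g. HDFS or S3), not a node-local path.
    ssc.checkpoint(checkpointDir)

    // First stateful stream. Spark requires the checkpoint interval to be a multiple
    // of the batch interval; 10 seconds assumes e.g. a 1, 2, 5 or 10 second batch.
    val firstState = input.updateStateByKey[Int](sumCounts _)
    firstState.checkpoint(Seconds(10))

    // Assumed workaround: an output action on the first state stream, so its RDDs are
    // computed by their own job each batch rather than only as parents of secondState.
    firstState.foreachRDD { rdd => rdd.count() }

    // Second stateful stream built on the first, keyed by a hypothetical coarser key.
    val secondState = firstState
      .map { case (key, total) => (key.take(1), total) }
      .updateStateByKey[Int](sumCounts _)
    secondState.checkpoint(Seconds(10))
    secondState
  }
}
```

The returned stream still needs its own output operation (for example `print()`) before `ssc.start()`; the checkpoint path passed to `wire` is a placeholder.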