Just an update on this,

Looking into Spark logs seems that some partitions are not found and
recomputed. Gives the impression that those are related with the delayed
updatestatebykey calls.

I'm seeing something like:
log line 1 - Partition rdd_132_1 not found, computing it
....
log line N - Found block rdd_132_1 locally
Log line N+1 - Goes into the updatestatebykey X times has many objects with
delayed update
Log line M - Done Checkpointing RDD 126 to hdfs://....

This happens for Y amount of partitions as many seconds the updatestatebykey
call is delayed.

Tnks,
Rod



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/UpdatestateByKey-assumptions-tp10858p10859.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Reply via email to