Hi all,

What is the correct way to schedule multiple jobs inside the foreachRDD
method and, importantly, await their results to ensure those jobs have
completed successfully?
E.g.:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

kafkaDStream.foreachRDD { rdd =>
  val rdd1 = rdd.map(...)
  val rdd2 = rdd1.map(...)

  // Kick off both writes concurrently
  val job1Future = Future {
    rdd1.saveToCassandra(...)
  }

  val job2Future = Future {
    rdd1.foreachPartition(iter => /* save to Kafka */)
  }

  // Block until both jobs have finished
  Await.result(
    Future.sequence(Seq(job1Future, job2Future)),
    Duration.Inf)

  // commit Kafka offsets
}

In this code I'm scheduling two actions in futures and awaiting them. I
need to be sure, when I commit Kafka offsets at the end of the batch
processing, that job1 and job2 have actually executed successfully. Does
the given approach provide that guarantee? I.e., if one of the jobs fails,
will the entire batch be marked as failed too?
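
For context, this is roughly how I intend to wire the offset commit itself,
assuming the spark-streaming-kafka-0-10 integration (HasOffsetRanges /
CanCommitOffsets). My assumption is that a failed future makes Await.result
rethrow, so the commit line is never reached on failure:

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

kafkaDStream.foreachRDD { rdd =>
  // Offsets must be read from the original Kafka RDD, before any map()
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // ... run job1Future / job2Future and Await.result as above ...
  // If Await.result throws, the exception should propagate out of
  // foreachRDD and fail the batch, so the line below is never executed.

  // Only commit once both writes have completed successfully
  kafkaDStream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}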


Thanks,
Andrii
