I'm receiving the SPARK-5063 error (RDD transformations and actions can
only be invoked by the driver, not inside of other transformations)
whenever I try and restore from a checkpoint in spark streaming on my app.

I'm using python3 and my RDDs are inside a queuestream DStream.

This is the little chunk of code causing issues:

-----

p_batches = [sc.parallelize(batch) for batch in task_batches]

sieving_tasks = ssc.queueStream(p_batches)
sieving_tasks.checkpoint(20)
relations = sieving_tasks.map(lambda s: run_sieving_command(s, poly,
poly_path, fb_paths))
relations.reduce(lambda a, b: (a[0] | b[0], a[1] + b[1])
).foreachRDD(lambda s: update_state(out_files, counts, s))
ssc.checkpoint(s3n_path)

-----

Thanks again!



--

Shaanan Cohney
PhD Student
University of Pennsylvania


shaan...@gmail.com

Reply via email to