Re: PySpark Streaming S3 checkpointing

2017-08-03 Thread Riccardo Ferrari
Hi Steve, Thank you for your answer, much appreciated. Reading the code seems that: - Python StreamingContext.getOrCreate calls Scala StreamingContextPythonHelper().tryRecoverFromCheckpoint(

Re: PySpark Streaming S3 checkpointing

2017-08-02 Thread Steve Loughran
On 2 Aug 2017, at 10:34, Riccardo Ferrari > wrote: Hi list! I am working on a pyspark streaming job (ver 2.2.0) and I need to enable checkpointing. At high level my python script goes like this: class StreamingJob(): def __init__(..): ...

PySpark Streaming S3 checkpointing

2017-08-02 Thread Riccardo Ferrari
Hi list! I am working on a pyspark streaming job (ver 2.2.0) and I need to enable checkpointing. At high level my python script goes like this: class StreamingJob(): def __init__(..): ... sparkContext._jsc.hadoopConfiguration().set('fs.s3a.access.key',)