Re: Use Shared Variable in PySpark Executors

2018-09-22 Thread Soheil Pourbafrani
OK, I'll do that. Thanks. On Sat, Sep 22, 2018 at 7:09 PM Jörn Franke wrote: > Do you want to calculate it and share it once with all other executors? > Then a broadcast variable may be interesting for you. > > > On 22. Sep 2018, at 16:33, Soheil Pourbafrani wrote: > > > > Hi, I want to do

Re: Use Shared Variable in PySpark Executors

2018-09-22 Thread Jörn Franke
Do you want to calculate it and share it once with all other executors? Then a broadcast variable may be interesting for you. > On 22. Sep 2018, at 16:33, Soheil Pourbafrani wrote: > > Hi, I want to do some processing with PySpark and save the results in a > variable of type tuple that should
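
To make the broadcast suggestion concrete, here is a minimal sketch, assuming the shared tuple is a vocabulary small enough to build on the driver. The helper names (`build_vocabulary`, `vectorize`, `run_with_spark`) and the local master setting are made up for illustration and are not from the thread; only `SparkContext.broadcast` and `.value` are the actual Spark calls being shown.

```python
from collections import Counter


def build_vocabulary(docs):
    """Return the distinct words of all documents as a sorted tuple."""
    return tuple(sorted({word for doc in docs for word in doc.lower().split()}))


def vectorize(doc, vocabulary):
    """Map one document to a term-count vector aligned with `vocabulary`."""
    counts = Counter(doc.lower().split())
    return tuple(counts[word] for word in vocabulary)


def run_with_spark(docs):
    """Broadcast the vocabulary once, then vectorize each document on the executors."""
    # Imported here so the two helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    # Assumption: a local session for experimentation; a real job would
    # normally get its master from spark-submit instead.
    spark = SparkSession.builder.master("local[*]").appName("vsm-broadcast").getOrCreate()
    sc = spark.sparkContext
    try:
        # The tuple is shipped to each executor once and is read-only there.
        vocab_bc = sc.broadcast(build_vocabulary(docs))
        return (
            sc.parallelize(docs)
            .map(lambda doc: vectorize(doc, vocab_bc.value))
            .collect()
        )
    finally:
        spark.stop()
```

Calling `run_with_spark(["spark is fast", "spark is shared"])` on a machine with PySpark installed should return one count vector per document. If the vocabulary itself has to be computed from a large RDD, it would first need to be collected to the driver before it can be broadcast.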

Use Shared Variable in PySpark Executors

2018-09-22 Thread Soheil Pourbafrani
Hi, I want to do some processing with PySpark and save the results in a variable of type tuple that should be shared among the executors for further processing. It's a text-mining task, and I want to use the vector space model, so I want to calculate the vector of all words (that

Watermarking without aggregation with Structured Streaming

2018-09-22 Thread peay
Hello, I am trying to use watermarking without aggregation, to filter out records that are just too late, instead of appending them to the output. My understanding is that aggregation is required for `withWatermark` to have any effect. Is that correct? I am looking for something along the

Structured Streaming together with Cassandra Queries

2018-09-22 Thread Martin Engen
Hello, I have a case where I am continuously receiving a batch of sensor data that is stored in a Cassandra table (through Kafka). Every week or so, I want to manually enter additional data into the system, and I want this to trigger some calculations merging the manually entered data, and