RE: How to do sliding window operation on RDDs in Pyspark?

2018-10-04 Thread zakhavan

Thank you. It helps. Zeinab

RE: How to do sliding window operation on RDDs in Pyspark?

2018-10-02 Thread Taylor Cox

Have a look at this guide here: https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html You should be able to send your sensor data to a Kafka topic, which Spark will subscribe to. You may need to use an Input DStream to connect

RE: How to do sliding window operation on RDDs in Pyspark?

2018-10-02 Thread zakhavan

Thank you, Taylor, for your reply. The second solution doesn't work for my case since my text files are getting updated every second. Actually, my input data is live: I'm getting 2 streams of data from 2 seismic sensors, and I write them into 2 text files for simplicity and this is

RE: How to do sliding window operation on RDDs in Pyspark?

2018-10-02 Thread Taylor Cox

Hey Zeinab, we may have to take a small step back here. The sliding window approach (i.e., the window operation) is unique to data stream mining, so it makes sense that window() is restricted to DStream. It looks like you're not using a stream mining approach. From what I can see in your code,
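
For readers unfamiliar with the window operation mentioned above, here is a minimal plain-Python sketch of the idea, not Spark code and not anything from the thread itself. It assumes the data arrives as a sequence of micro-batches and that the window length and slide interval are counted in batches; the function name sliding_windows and its parameters are hypothetical, chosen only for illustration. Spark's DStream window() expresses the same two quantities as durations rather than batch counts.

```python
from collections import deque


def sliding_windows(batches, window_length, slide_interval):
    """Yield the combined contents of the most recent `window_length`
    batches once every `slide_interval` batches.

    Hypothetical helper for illustration only; both parameters are
    counted in batches, not in seconds.
    """
    recent = deque(maxlen=window_length)  # oldest batch drops off automatically
    for count, batch in enumerate(batches, start=1):
        recent.append(list(batch))
        if count % slide_interval == 0:
            # Flatten the batches currently inside the window.
            yield [item for b in recent for item in b]


if __name__ == "__main__":
    readings = [[1], [2], [3], [4], [5], [6]]  # one small batch per interval
    print(list(sliding_windows(readings, window_length=3, slide_interval=2)))
    # prints [[1, 2], [2, 3, 4], [4, 5, 6]]
```

The first window is shorter than the rest because only two batches have arrived when it is emitted. A static RDD has no notion of batches arriving over time, which is one way to see why the operation is tied to streams.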