Is there any window operation for RDDs in Pyspark? like for DStreams

2018-11-20 Thread zakhavan
Hello, I have two RDDs and my goal is to calculate the Pearson correlation between them using a sliding window. I want to have 200 samples in each window from rdd1 and rdd2, calculate the correlation between them, and then slide the window by 120 samples and calculate the correlation between
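A minimal sketch of one way to do this with plain RDD operations, assuming one sample per line and two local input files (sensor1.txt / sensor2.txt are placeholders, and window_ids is a helper of mine, not code from the thread):

import numpy as np
from pyspark import SparkContext

WINDOW, STEP = 200, 120   # samples per window, slide between windows

def window_ids(i):
    # All window indices w whose range [w*STEP, w*STEP + WINDOW) contains element i.
    first = max(0, (i - WINDOW) // STEP + 1)
    last = i // STEP
    return range(first, last + 1)

sc = SparkContext(appName="WindowedPearson")
rdd1 = sc.textFile("sensor1.txt").map(float)
rdd2 = sc.textFile("sensor2.txt").map(float)

# Pair the two series element by element via their positions.
indexed1 = rdd1.zipWithIndex().map(lambda vi: (vi[1], vi[0]))
indexed2 = rdd2.zipWithIndex().map(lambda vi: (vi[1], vi[0]))
pairs = indexed1.join(indexed2)          # (index, (v1, v2))

# Assign each pair to every window it falls into, then compute Pearson's r
# per window with numpy (trailing windows may hold fewer than 200 samples).
windowed = pairs.flatMap(lambda kv: [(w, kv[1]) for w in window_ids(kv[0])])
corr_per_window = windowed.groupByKey().mapValues(
    lambda vals: float(np.corrcoef(*zip(*vals))[0, 1]))

print(sorted(corr_per_window.collect()))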

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on wo

2018-10-09 Thread zakhavan
Hello, I'm trying to calculate the Pearson correlation between two DStreams using a sliding window in PySpark. But I keep getting the following error: Traceback (most recent call last): File "/home/zeinab/spark-2.3.1-bin-hadoop2.7/examples/src/main/python/streaming/Cross-Corr.py", line 63, in
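The usual cause of this exception is a closure that captures the SparkContext, often indirectly through another RDD, inside a transformation. A minimal sketch of the pattern and the fix (not the original Cross-Corr.py):

from pyspark import SparkContext

sc = SparkContext(appName="DriverOnlyContext")
ref = sc.parallelize([1.0, 2.0, 3.0])
data = sc.parallelize([10.0, 20.0, 30.0])

# Fails with the exception above: the closure drags the RDD `ref`
# (and with it the SparkContext) onto the executors.
# bad = data.map(lambda x: x + ref.count())

# Works: compute the driver-side value first and ship only a plain Python
# value, here via a broadcast variable.
ref_count = sc.broadcast(ref.count())
good = data.map(lambda x: x + ref_count.value)
print(good.collect())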

RE: How to do sliding window operation on RDDs in Pyspark?

2018-10-04 Thread zakhavan
Thank you. It helps. Zeinab

RE: How to do sliding window operation on RDDs in Pyspark?

2018-10-02 Thread zakhavan
Thank you, Taylor, for your reply. The second solution doesn't work for my case since my text files are getting updated every second. Actually, my input data is live: I'm getting 2 streams of data from 2 seismic sensors and then writing them into 2 text files for simplicity, and this is
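For live input like this, one common approach (sketched below with placeholder directories, not the thread's actual code) is to write each new batch of samples as a new file into a monitored directory and let Spark Streaming pick it up with textFileStream, since appends to an existing file are not detected:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="SensorStreams")
ssc = StreamingContext(sc, 1)   # 1-second batches

# textFileStream only picks up files that newly appear in the directory.
sensor1 = ssc.textFileStream("file:///tmp/sensor1/").map(float)
sensor2 = ssc.textFileStream("file:///tmp/sensor2/").map(float)

sensor1.pprint()
sensor2.pprint()

ssc.start()
ssc.awaitTermination()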

How to do sliding window operation on RDDs in Pyspark?

2018-10-02 Thread zakhavan
Hello, I have 2 text files in the following form and my goal is to calculate the Pearson correlation between them using a sliding window in PySpark: 123.00 -12.00 334.00 . . . First I read these 2 text files and store them as RDDs, and then I apply the window operation on each RDD, but I keep
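Plain RDDs have no built-in sliding-window operation; window()/slideDuration belong to DStreams. A minimal sketch of the DStream version, with a placeholder input directory and batch interval:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="WindowOnDStream")
ssc = StreamingContext(sc, 1)   # 1-second batches

values = ssc.textFileStream("file:///tmp/samples/").map(float)

# Every 3 seconds, look at the samples received in the last 5 seconds.
windowed = values.window(windowDuration=5, slideDuration=3)
windowed.count().pprint()

ssc.start()
ssc.awaitTermination()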

How does mapPartitions function work in Spark streaming on DStreams?

2018-08-09 Thread zakhavan
Hello, I am running a Spark Streaming program on a dataset which is a sequence of numbers in text file format. I read the text file and load it into a Kafka topic, then run the Spark Streaming program on the DStream, and finally write the result into an output text file. But I'm getting
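A minimal sketch of mapPartitions on a DStream (topic and broker names are placeholders, and KafkaUtils here assumes the spark-streaming-kafka-0-8 package): the function receives an iterator over one partition of each batch RDD and must return an iterable.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="MapPartitionsDemo")
ssc = StreamingContext(sc, 2)

stream = KafkaUtils.createDirectStream(
    ssc, ["numbers"], {"metadata.broker.list": "localhost:9092"})
values = stream.map(lambda kv: float(kv[1]))   # records arrive as (key, value)

def partition_mean(it):
    nums = list(it)
    # Return an iterable: one mean per non-empty partition.
    return [sum(nums) / len(nums)] if nums else []

values.mapPartitions(partition_mean).pprint()

ssc.start()
ssc.awaitTermination()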

Re: re: streaming, batch / spark 2.2.1

2018-08-02 Thread zakhavan
Yes, I am loading a text file from my local machine into a Kafka topic using the script below, and I'd like to calculate the number of samples per second consumed by the Kafka consumer. if __name__ == "__main__": print("hello spark") sc = SparkContext(appName="STALTA") ssc =
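One way to estimate samples per second on the consumer side (a sketch with a placeholder socket source, not the STALTA script itself) is to count the records in each micro-batch and divide by the batch interval:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

BATCH_SECONDS = 1

sc = SparkContext(appName="STALTA")
ssc = StreamingContext(sc, BATCH_SECONDS)

lines = ssc.socketTextStream("localhost", 9999)   # placeholder source

def report_rate(time, rdd):
    # Count records in this batch and convert to a per-second rate.
    n = rdd.count()
    print("{}: {} samples -> {:.1f} samples/sec".format(time, n, n / float(BATCH_SECONDS)))

lines.foreachRDD(report_rate)

ssc.start()
ssc.awaitTermination()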

Re: re: streaming, batch / spark 2.2.1

2018-08-02 Thread zakhavan
Hello, I just had a question. Could you refer me to a link, or tell me how you calculated figures such as: *300K msg/sec to a Kafka broker, 220 bytes per message*? I'm loading a text file with 36000 records into a Kafka topic and I'd like to calculate the data rate (#samples per sec) in Kafka.
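A rough producer-side measurement can be taken while loading the file, for example with the kafka-python client (topic and file names below are placeholders, not taken from the thread):

import time
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

start = time.time()
count = 0
with open("samples.txt") as f:
    for line in f:
        # Send one record per line of the input file.
        producer.send("seismic", line.strip().encode("utf-8"))
        count += 1
producer.flush()

elapsed = time.time() - start
print("{} records in {:.2f}s -> {:.0f} msg/sec".format(count, elapsed, count / elapsed))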

Run STA/LTA python function using spark streaming: java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute

2018-07-10 Thread zakhavan
Hello, I'm trying to run the STA/LTA Python code, which I got from the ObsPy website, using Spark Streaming and plot the events, but I keep getting the following error: "java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute" Here is the code:
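This error means the streaming job was started without any output operation registered on a DStream; Spark Streaming refuses to start unless at least one of pprint, saveAsTextFiles, foreachRDD, etc. is called. A minimal sketch of the shape the program needs (the filter is only a stand-in for the STA/LTA trigger, and the socket source is a placeholder):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="STALTA")
ssc = StreamingContext(sc, 1)

samples = ssc.socketTextStream("localhost", 9999).map(float)   # placeholder source
triggered = samples.filter(lambda x: x > 3.0)                  # stand-in for the STA/LTA trigger

# Without a line like one of these, ssc.start() raises
# "No output operations registered, so nothing to execute".
triggered.pprint()
# triggered.saveAsTextFiles("file:///tmp/events/part")

ssc.start()
ssc.awaitTermination()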