Hello,
I have two RDDs and my goal is to calculate the Pearson correlation
between them using a sliding window. I want each window to hold 200 samples
from rdd1 and rdd2, calculate the correlation between them, and then slide
the window by 120 samples and calculate the correlation between
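The window arithmetic described above can be sketched without Spark. This is a minimal NumPy illustration, not the poster's actual job: it assumes the two streams are available as equal-length arrays, and uses the window of 200 and slide of 120 from the question.

```python
import numpy as np

def windowed_pearson(x, y, window=200, slide=120):
    """Pearson correlation over sliding windows.

    x, y: equal-length 1-D sequences of samples.
    Returns a list of (start_index, correlation) pairs, one per window.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    results = []
    # Each window covers [start, start + window); consecutive windows
    # start `slide` samples apart, so neighbouring windows overlap by 80.
    for start in range(0, len(x) - window + 1, slide):
        wx = x[start:start + window]
        wy = y[start:start + window]
        # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r(x, y)
        r = np.corrcoef(wx, wy)[0, 1]
        results.append((start, r))
    return results
```

With 400 samples, windows start at 0 and 120 (a window starting at 240 would run past the data, so it is dropped); porting this to Spark means expressing the same slicing with window/slide durations on the DStreams.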
Hello,
I'm trying to calculate the Pearson correlation between two DStreams using
a sliding window in PySpark, but I keep getting the following error:
Traceback (most recent call last):
File
"/home/zeinab/spark-2.3.1-bin-hadoop2.7/examples/src/main/python/streaming/Cross-Corr.py",
line 63, in
Thank you. It helps.
Zeinab
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Thank you, Taylor, for your reply. The second solution doesn't work for my
case since my text files are getting updated every second. Actually, my
input data is live: I'm getting two streams of data from two seismic
sensors, and I write them into two text files for simplicity, and this is
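For files that are appended to every second, one plain-Python way to keep up (outside of Spark) is to remember the byte offset between polls. This is a hypothetical helper, not part of the original script:

```python
def read_new_samples(path, offset=0):
    """Read any lines appended to `path` since byte `offset`.

    Returns (samples, new_offset); call again later with new_offset
    to pick up whatever the sensor has written in the meantime.
    """
    with open(path, "r") as f:
        f.seek(offset)          # skip everything already consumed
        chunk = f.read()
        new_offset = f.tell()   # remember where we stopped
    samples = [float(line) for line in chunk.splitlines() if line.strip()]
    return samples, new_offset
```

One reason this matters: Spark's textFileStream watches a directory for newly created files and does not notice appends to an existing file, so continuously growing files need either this kind of manual offset tracking or rotation into fresh files per batch.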
Hello,
I have two text files in the following form, and my goal is to calculate the
Pearson correlation between them using a sliding window in PySpark:
123.00
-12.00
334.00
.
.
.
First I read these two text files and store them as RDDs, and then I apply
the window operation on each RDD, but I keep
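A detail that often trips this setup up is aligning the two sources sample-by-sample before correlating. A minimal plain-Python sketch of the parsing and pairing (in Spark the analogous pairing would be done by zipping the two RDDs, e.g. with zipWithIndex; the file format is the one-number-per-line layout shown above):

```python
def load_samples(path):
    """Parse one-number-per-line text (e.g. 123.00, -12.00, 334.00) into floats."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]

def paired_samples(path_a, path_b):
    """Pair the i-th sample of file A with the i-th sample of file B.

    zip truncates to the shorter file, so a sensor that lags behind
    simply contributes fewer pairs instead of raising an error.
    """
    return list(zip(load_samples(path_a), load_samples(path_b)))
```

Each window's correlation is then computed over a slice of these pairs rather than over the two files independently.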
Hello,
I am running a Spark Streaming program on a dataset which is a sequence of
numbers in text file format. I read the text file and load it into a Kafka
topic, then run the Spark Streaming program on the DStream, and finally
write the result into an output text file. But I'm getting
Yes, I am loading a text file from my local machine into a Kafka topic using
the script below, and I'd like to calculate the number of samples per second
consumed by the Kafka consumer.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    print("hello spark")
    sc = SparkContext(appName="STALTA")
    ssc = StreamingContext(sc, 1)  # batch interval in seconds; original value truncated here
Hello,
I just had a question. Could you refer me to a link, or tell me how you
calculated figures such as: *300K msg/sec to a kafka broker, 220 bytes per
message*? I'm loading a text file with 36000 records into a Kafka topic and
I'd like to calculate the data rate (samples per second) in Kafka.
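A figure like messages per second is usually measured on the consumer side by counting records between two wall-clock timestamps. A sketch of just the arithmetic (in practice t_start and t_end would come from time.time() around the consume loop; the function names are illustrative):

```python
def message_rate(n_messages, t_start, t_end):
    """Messages per second between two wall-clock timestamps (in seconds)."""
    elapsed = t_end - t_start
    if elapsed <= 0:
        raise ValueError("t_end must be after t_start")
    return n_messages / elapsed

def throughput_bytes(n_messages, bytes_per_message, t_start, t_end):
    """Approximate bytes per second, given a fixed average message size."""
    return message_rate(n_messages, t_start, t_end) * bytes_per_message
```

For example, 36000 records consumed over 12 seconds is 3000 samples/sec, and the quoted 300K msg/sec at 220 bytes per message works out to 66 MB/sec.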
Hello,
I'm trying to run the STA/LTA Python code, which I got from the ObsPy
website, using Spark Streaming, and plot the events, but I keep getting the
following error:
"java.lang.IllegalArgumentException: requirement failed: No output
operations registered, so nothing to execute"
Here is the code: