Structured Streaming together with Cassandra Queries

2018-09-22 Thread Martin Engen
Hello,

I have a case where I am continuously receiving sensor data, which is being
stored into a Cassandra table (through Kafka). Every week or so, I want to
manually enter additional data into the system, and I want this to trigger
some calculations merging the manually entered data with the week's worth of
streamed sensor data.

Is there a way to make dynamic Cassandra queries based on data coming into 
Spark?

Example: pressure readings are continuously stored into Cassandra, and at the
end of the week I enter a week's worth of temperatures into the system (one
day/row at a time).
I want each of these rows to trigger a query to Cassandra for the pressures of
that specific day, and then run some calculations on the result.

I have been looking at using Structured Streaming with the
spark-cassandra-connector, but I can't find a way to use data from an incoming
streaming row as a parameter of the Cassandra query. I seem to have to query
for 'everything', and then filter in Spark.
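
For context, the closest I've gotten is to re-read the table and join it
against each micro-batch. A rough sketch of that (keyspace, table, topic, and
column names are placeholders, not my real schema, and it assumes Spark's
foreachBatch sink):

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

val spark = SparkSession.builder
  .appName("manual-entry-trigger")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .getOrCreate()
import spark.implicits._

// Manually entered temperatures arriving via Kafka, one day per row,
// as "2018-09-17,4.2" style CSV strings.
val manualEntries = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "manual-temperatures")
  .load()
  .selectExpr("CAST(value AS STRING) AS line")
  .select(split($"line", ",").getItem(0).cast("date").as("day"),
          split($"line", ",").getItem(1).cast("double").as("temperature"))

manualEntries.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Re-read the pressure table for every micro-batch. The join works,
    // but I can't find a way to push the batch's days down into the
    // Cassandra query, so this scans and filters instead of doing
    // targeted per-day lookups.
    val pressures = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "sensors", "table" -> "pressures"))
      .load()

    batch.join(pressures, Seq("day"))
      .groupBy($"day")
      .agg(first($"temperature").as("temperature"),
           avg($"pressure").as("avg_pressure"))
      .show()
  }
  .start()
  .awaitTermination()

The join itself produces the right rows, but it reads far more from Cassandra
than it needs to.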

Any ideas or tips for how to solve this?


Re: Structured Streaming, Reading and Updating a variable

2018-05-16 Thread Martin Engen
Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)


Any ideas about how to handle this error?


Thanks,
Martin Engen

From: Lalwani, Jayesh <jayesh.lalw...@capitalone.com>
Sent: Tuesday, May 15, 2018 9:59 PM
To: Martin Engen; user@spark.apache.org
Subject: Re: Structured Streaming, Reading and Updating a variable


Do you have a code sample and a detailed error message/exception to show?



From: Martin Engen <martin.en...@outlook.com>
Date: Tuesday, May 15, 2018 at 9:24 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Structured Streaming, Reading and Updating a variable



Hello,

I'm working with Structured Streaming, and I need a way to keep a running
average based on the last 24 hours of data. To help with this, I can use
exponential smoothing, which means I really only need to carry one value over
from the previous calculation into the new one, and update this variable as
the calculations proceed.

Implementing this has been a much bigger challenge than I ever imagined.

I've tried using accumulators, and I've tried querying/storing data to
Cassandra after every calculation. Both methods worked somewhat locally, but I
don't seem to be able to use either on the Spark worker nodes, as I get the
error "java.lang.NoClassDefFoundError: Could not initialize class" for both
the accumulator and the Cassandra connection library.

How can you read/update a variable while doing calculations with Structured
Streaming?

Thank you



Structured Streaming, Reading and Updating a variable

2018-05-15 Thread Martin Engen
Hello,

I'm working with Structured Streaming, and I need a way to keep a running
average based on the last 24 hours of data.
To help with this, I can use exponential smoothing, which means I really only
need to carry one value over from the previous calculation into the new one,
and update this variable as the calculations proceed.
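
Concretely, the update I have in mind is the standard exponential moving
average, where alpha is a smoothing factor I still need to tune:

    smoothed_new = alpha * observation + (1 - alpha) * smoothed_old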

Implementing this has been a much bigger challenge than I ever imagined.


I've tried using accumulators, and I've tried querying/storing data to
Cassandra after every calculation. Both methods worked somewhat locally, but I
don't seem to be able to use either on the Spark worker nodes, as I get the
error "java.lang.NoClassDefFoundError: Could not initialize class" for both
the accumulator and the Cassandra connection library.

How can you read/update a variable while doing calculations with Structured
Streaming?
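
For what it's worth, I've also been reading about the arbitrary stateful API
(mapGroupsWithState). A sketch of what I imagine it would look like (the
Reading/Smoothed schema, the rate-source stand-in, and the alpha value are all
placeholders, not my real job):

import java.sql.Timestamp
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Placeholder schema for one sensor reading and its smoothed output.
case class Reading(sensorId: String, value: Double, eventTime: Timestamp)
case class Smoothed(sensorId: String, average: Double)

val spark = SparkSession.builder.appName("ema-state").getOrCreate()
import spark.implicits._

val alpha = 0.1  // smoothing factor, still to be tuned

// Stand-in stream for the example; in the real job this comes from Kafka.
val readings: Dataset[Reading] = spark.readStream
  .format("rate").option("rowsPerSecond", "5").load()
  .select(($"value" % 3).cast("string").as("sensorId"),
          ($"value" % 100).cast("double").as("value"),
          $"timestamp".as("eventTime"))
  .as[Reading]

// Keep one Double of state per sensor and fold each micro-batch into it.
val smoothed = readings
  .groupByKey(_.sensorId)
  .mapGroupsWithState[Double, Smoothed](GroupStateTimeout.NoTimeout) {
    (sensorId: String, rows: Iterator[Reading], state: GroupState[Double]) =>
      // Seed from the stored value, or from the first observation
      // (rows is non-empty whenever this is invoked without timeouts).
      var avg = state.getOption.getOrElse(rows.next().value)
      rows.foreach(r => avg = alpha * r.value + (1 - alpha) * avg)
      state.update(avg)
      Smoothed(sensorId, avg)
  }

smoothed.writeStream
  .outputMode(OutputMode.Update())
  .format("console")
  .start()
  .awaitTermination()

My understanding is that the state store would persist this value across
micro-batches (and restarts, via checkpointing), which is exactly what I was
trying to fake with accumulators/Cassandra. Is this the intended approach?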

Thank you