Hi all,
We've upgraded Avro from 1.7 to 1.8 to support date/timestamp/decimal
types in the newly added Avro data source in the upcoming Spark 2.4, and also
to make Avro work with Parquet.
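For context, a minimal sketch of how the new built-in source is used (the paths are placeholders; in 2.4 the module ships separately and is pulled in with e.g. --packages org.apache.spark:spark-avro_2.11:2.4.0):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("avro-example").getOrCreate()

// Date/timestamp/decimal logical types now map to the corresponding Catalyst types.
val df = spark.read.format("avro").load("/path/to/events.avro")
df.printSchema()
df.write.format("avro").save("/path/to/events-copy.avro")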
Since Avro 1.8 is not binary compatible with Avro 1.7 (see
https://issues.apache.org/jira/browse/AVRO-1502),
Hi
I would like to know more about Spark CEP (Complex Event Processing). Are
there any simple (and also more complex) examples with input data (e.g. log
files)? Is Spark CEP based on Siddhi? If so, would it be better to use Siddhi
directly?
I know CEP engines are intended for streaming data,
Hi people,
I would like to share some of my experience with data processing using
stateful Structured Streaming in Apache Spark, especially in cases where
OutOfMemory errors occur because the built-in state store provider tries to
keep all of the data in memory. So, I've
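For anyone hitting the same wall: the provider is pluggable via the spark.sql.streaming.stateStore.providerClass setting. A minimal sketch, assuming a hypothetical custom implementation of the StateStoreProvider trait named com.example.RocksDbStateStoreProvider:

import org.apache.spark.sql.SparkSession

// The default, HDFSBackedStateStoreProvider, keeps each partition's working
// state in executor memory; a custom provider can spill to disk instead.
val spark = SparkSession.builder()
  .appName("stateful-streaming")
  .config("spark.sql.streaming.stateStore.providerClass",
          "com.example.RocksDbStateStoreProvider")
  .getOrCreate()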
Hi,
Or you could just use Structured Streaming:
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
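For example, a minimal Kafka read with Structured Streaming (broker address and topic name are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("structured-kafka").getOrCreate()

// Subscribe to the topic and expose the message payload as a string column.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "mytopic")
  .load()
  .selectExpr("CAST(value AS STRING) AS value")

// Print each micro-batch to the console.
val query = stream.writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()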
Regards,
Gourav Sengupta
On Tue, Aug 14, 2018 at 10:51 AM, Gerard Maas wrote:
> Hi Aakash,
>
> In Spark Streaming, foreachRDD provides you access to the data in
Hi! I'm working on a project and I'm trying to find out whether I can pass
data from ZeroMQ straight into Python's streaming API. I saw some links, but I
didn't see anything concrete on how to use it with Python. Can anybody
please point me in the right direction?
Hi Aakash,
In Spark Streaming, foreachRDD gives you access to the data in
each micro-batch.
You can transform that RDD into a DataFrame and implement the flow you
describe.
e.g.:
var historyRDD: RDD[MyType] = sparkContext.emptyRDD
// create the Kafka DStream ...
dstream.foreachRDD { rdd =>
  // fold each micro-batch into the running history
  historyRDD = historyRDD.union(rdd)
  // with MyType a case class and spark.implicits._ in scope:
  val df = historyRDD.toDF()
}
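One caveat with this pattern: historyRDD's lineage grows with every batch, so in a long-running job you would typically cache it and checkpoint it periodically to keep the DAG (and recovery time) bounded.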
Hi all,
The requirement is to process a file using Spark Streaming, fed from a Kafka
topic, and once all the transformations are done, turn it into a static
DataFrame and pass it into Spark ML model tuning (a rough sketch of this flow
follows after the steps below).
As of now, I have been doing it in the following fashion:
1) Read the file using Kafka
2)
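For reference, a hedged sketch of the flow described above, using Structured Streaming's foreachBatch (available from Spark 2.4 onward); the topic, broker address, model, and column names are all placeholders:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val spark = SparkSession.builder().appName("kafka-to-ml").getOrCreate()

// 1) Read the stream from the Kafka topic.
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "file-topic")
  .load()
  .selectExpr("CAST(value AS STRING) AS line")

// 2) Treat each micro-batch as a static DataFrame and hand it to ML tuning.
val tuneOnBatch = (batchDf: DataFrame, batchId: Long) => {
  val prepared = batchDf // ... apply the transformations here
  val lr = new LinearRegression()
  val grid = new ParamGridBuilder()
    .addGrid(lr.regParam, Array(0.01, 0.1))
    .build()
  val cv = new CrossValidator()
    .setEstimator(lr)
    .setEvaluator(new RegressionEvaluator())
    .setEstimatorParamMaps(grid)
  // val model = cv.fit(prepared) // needs "features"/"label" columns first
}

val query = raw.writeStream.foreachBatch(tuneOnBatch).start()
query.awaitTermination()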