[SPARK-24771] Upgrade AVRO version from 1.7.7 to 1.8

2018-08-14 Thread Wenchen Fan
Hi all, We've upgraded Avro from 1.7 to 1.8 to support date/timestamp/decimal types in the newly added Avro data source in the upcoming Spark 2.4, and also to make Avro work with Parquet. Since Avro 1.8 is not binary compatible with Avro 1.7 (see https://issues.apache.org/jira/browse/AVRO-1502),
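
For context, a minimal sketch of the built-in Avro source this refers to (the paths are placeholders, and in Spark 2.4 the spark-avro module must be on the classpath, e.g. via --packages org.apache.spark:spark-avro_2.11:2.4.0):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("avro-types").getOrCreate()

    // Read Avro files with the source added for Spark 2.4. Date, timestamp
    // and decimal columns map to Avro logical types, which is what the
    // Avro 1.8 upgrade enables.
    val df = spark.read.format("avro").load("/path/to/input.avro")
    df.printSchema()

    // Writing preserves those logical types as well.
    df.write.format("avro").save("/path/to/output")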

Spark CEP

2018-08-14 Thread Esa Heikkinen
Hi, I would like to know more about Spark CEP (Complex Event Processing). Do any simple (but also complex) examples exist, with input data (log files?)? Is Spark CEP based on Siddhi? If so, is it better to use Siddhi directly? I know CEP engines are intended for streaming data,

Custom state store provider based on RocksDB

2018-08-14 Thread Alexander Chermenin
Hi people, I would like to share some of my experience with data processing using stateful Structured Streaming in Apache Spark, especially in cases where OutOfMemory errors arise because the built-in state store provider tries to keep all of the data in memory. So, I've
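
For reference, a custom provider is wired in through a single SQL conf. A minimal sketch, assuming a RocksDB-backed implementation whose class name below is hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("custom-state-store")
      // Spark loads this class instead of the default in-memory
      // HDFSBackedStateStoreProvider; it must implement
      // org.apache.spark.sql.execution.streaming.state.StateStoreProvider.
      .config("spark.sql.streaming.stateStore.providerClass",
        "com.example.streaming.RocksDbStateStoreProvider") // hypothetical
      .getOrCreate()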

Re: How to convert Spark Streaming to Static Dataframe on the fly and pass it to a ML Model as batch

2018-08-14 Thread Gourav Sengupta
Hi, or you could just use Structured Streaming: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html Regards, Gourav Sengupta On Tue, Aug 14, 2018 at 10:51 AM, Gerard Maas wrote: > Hi Aakash, > > In Spark Streaming, foreachRDD provides you access to the data in
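
A minimal sketch of the Structured Streaming route Gourav suggests (bootstrap servers and topic name are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("sss-kafka").getOrCreate()

    // Read Kafka as an unbounded DataFrame instead of a DStream.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "mytopic")
      .load()
      .selectExpr("CAST(value AS STRING) AS value")

    stream.writeStream.format("console").start().awaitTermination()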

Sending data from ZeroMQ to Spark Streaming API with Python

2018-08-14 Thread oreogundipe
Hi! I'm working on a project and I'm trying to find out if I can pass data from my ZeroMQ straight into Python's streaming API. I saw some links, but I didn't see anything concrete about how to use it with Python. Can anybody please point me in the right direction?

Re: How to convert Spark Streaming to Static Dataframe on the fly and pass it to a ML Model as batch

2018-08-14 Thread Gerard Maas
Hi Aakash, In Spark Streaming, foreachRDD provides you access to the data in each micro-batch. You can transform that RDD into a DataFrame and implement the flow you describe, e.g.: var historyRDD: RDD[mytype] = sparkContext.emptyRDD // create Kafka DStream ... dstream.foreachRDD { rdd => val
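
Since the message is truncated in this digest, here is a hedged completion of the same pattern; mytype is replaced with String, the socket source is a stand-in for the Kafka DStream, and scoreWithModel is a hypothetical batch ML entry point:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val spark = SparkSession.builder().appName("stream-to-batch").getOrCreate()
    val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

    // Stand-in source; Gerard's example creates a Kafka DStream here.
    val dstream = ssc.socketTextStream("localhost", 9999)

    var historyRDD: RDD[String] = spark.sparkContext.emptyRDD[String]

    def scoreWithModel(df: DataFrame): Unit = df.show() // hypothetical

    dstream.foreachRDD { rdd =>
      // Accumulate each micro-batch into the running history.
      historyRDD = historyRDD.union(rdd).cache()
      // Convert the accumulated RDD to a static DataFrame for batch ML.
      import spark.implicits._
      scoreWithModel(historyRDD.toDF("value"))
    }

    ssc.start()
    ssc.awaitTermination()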

How to convert Spark Streaming to Static Dataframe on the fly and pass it to a ML Model as batch

2018-08-14 Thread Aakash Basu
Hi all, The requirement is to process a file using Spark Streaming fed from a Kafka topic and, once all the transformations are done, turn it into a batch of static DataFrames and pass it into Spark ML model tuning. As of now, I have been doing it in the fashion below: 1) Read the file using Kafka 2)
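
For reference, Spark 2.4 (upcoming at the time of this thread) adds foreachBatch to Structured Streaming, which hands over each micro-batch as a static DataFrame directly. A minimal sketch, with placeholder Kafka settings and a hypothetical tuneModel step:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder().appName("batch-ml").getOrCreate()

    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("subscribe", "mytopic")                      // placeholder
      .load()
      .selectExpr("CAST(value AS STRING) AS value")

    stream.writeStream
      .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
        // batchDF is an ordinary static DataFrame for this micro-batch.
        batchDF.persist()
        // tuneModel(batchDF) // hypothetical Spark ML tuning entry point
        batchDF.unpersist()
      }
      .start()
      .awaitTermination()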