Re: Chaining Spark Streaming Jobs

2017-11-02 Thread Sunita Arvind
Sorry Michael, I ended up using Kafka and missed your message. Yes, I did specify the schema with read.schema and that's when I got: at

Re: Chaining Spark Streaming Jobs

2017-09-18 Thread Michael Armbrust
You specify the schema when loading a dataframe by calling spark.read.schema(...)... On Tue, Sep 12, 2017 at 4:50 PM, Sunita Arvind wrote: > Hi Michael, > > I am wondering what I am doing wrong. I get error like: > > Exception in thread "main"
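For a streaming file source the analogous call is on readStream; a minimal sketch, with an illustrative JSON schema and path that are not taken from the thread:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

    // Sketch: a streaming file source requires an explicit schema,
    // supplied via .schema(...) before load/json/parquet.
    val spark = SparkSession.builder().appName("schema-example").getOrCreate()

    // Hypothetical log schema, purely for illustration.
    val logSchema = new StructType()
      .add("timestamp", TimestampType)
      .add("level", StringType)
      .add("message", StringType)

    val logs = spark.readStream
      .schema(logSchema)                 // avoids "Schema must be specified..."
      .json("s3://my-bucket/incoming/")  // illustrative path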

Re: Chaining Spark Streaming Jobs

2017-09-13 Thread Sunita Arvind
Thanks for your suggestion Vincent. I don't have much experience with Akka as such. I will explore this option. On Tue, Sep 12, 2017 at 11:01 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote: > What about chaining with akka or akka stream and the fair scheduler? > > On 13 Sept.

Re: Chaining Spark Streaming Jobs

2017-09-13 Thread vincent gromakowski
What about chaining with akka or akka stream and the fair scheduler? On 13 Sept 2017 at 01:51, "Sunita Arvind" wrote: Hi Michael, I am wondering what I am doing wrong. I get error like: Exception in thread "main" java.lang.IllegalArgumentException: Schema must be
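For reference, enabling the fair scheduler that the suggestion relies on is a matter of Spark configuration; a minimal sketch, with an illustrative pool name and allocation-file path:

    import org.apache.spark.sql.SparkSession

    // Sketch: run jobs from the same SparkContext under the FAIR scheduler
    // instead of the default FIFO, so chained jobs can share executors.
    val spark = SparkSession.builder()
      .appName("fair-scheduler-example")
      .config("spark.scheduler.mode", "FAIR")
      .config("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml") // optional pool definitions
      .getOrCreate()

    // Submit subsequent work in a named pool; the pool name is hypothetical.
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", "streaming")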

Re: Chaining Spark Streaming Jobs

2017-09-12 Thread Sunita Arvind
Hi Michael, I am wondering what I am doing wrong. I get an error like: Exception in thread "main" java.lang.IllegalArgumentException: Schema must be specified when creating a streaming source DataFrame. If some files already exist in the directory, then depending on the file format you may be able

Re: Chaining Spark Streaming Jobs

2017-09-08 Thread Sunita Arvind
Thanks for your response Praneeth. We did consider Kafka; however, cost was the only holdback, as we might need a larger cluster, and the existing cluster is on premise while my app is in the cloud, so the same cluster cannot be used. But I agree it does sound like a good alternative. Regards Sunita

Re: Chaining Spark Streaming Jobs

2017-09-08 Thread Praneeth Gayam
With a file stream you will have to deal with the following: 1. The file(s) must not be changed once created, so if the files are being continuously appended, the new data will not be read. Refer 2.
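One common way to handle point 1 is to write each file to a staging directory and move it into the watched directory only once it is complete; a rough sketch, assuming illustrative local paths and a filesystem where rename is atomic:

    import java.nio.file.{Files, Paths, StandardCopyOption}

    // Sketch: the streaming file source only ever sees complete files,
    // because they appear in the watched directory in a single atomic move.
    val staging = Paths.get("/data/staging/part-0001.json")
    val watched = Paths.get("/data/incoming/part-0001.json")

    // ... write the full contents to `staging` first ...

    Files.move(staging, watched, StandardCopyOption.ATOMIC_MOVE)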

Re: Chaining Spark Streaming Jobs

2017-09-07 Thread Sunita Arvind
Thanks for your response Michael. Will try it out. Regards Sunita On Wed, Aug 23, 2017 at 2:30 PM Michael Armbrust wrote: > If you use structured streaming and the file sink, you can have a > subsequent stream read using the file source. This will maintain exactly >

Re: Chaining Spark Streaming Jobs

2017-08-23 Thread Michael Armbrust
If you use structured streaming and the file sink, you can have a subsequent stream read using the file source. This will maintain exactly once processing even if there are hiccups or failures. On Mon, Aug 21, 2017 at 2:02 PM, Sunita Arvind wrote: > Hello Spark Experts,
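A rough sketch of that pattern, using a placeholder rate source as the upstream stream and illustrative local paths (none of this is code from the thread):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{LongType, StructType, TimestampType}

    val spark = SparkSession.builder().appName("chained-streams").getOrCreate()

    // Placeholder upstream stream; in the thread this would be the Kafka-fed job.
    val upstream = spark.readStream.format("rate").load()

    // First query: write with the file sink; the checkpoint location is what
    // gives exactly-once semantics for the emitted files.
    val firstQuery = upstream.writeStream
      .format("parquet")
      .option("checkpointLocation", "/tmp/checkpoints/stage1") // illustrative
      .start("/tmp/stage1-output")

    // Second query: read the same directory back as a streaming file source.
    // A schema must be supplied; this one matches the rate source's columns.
    val stage1Schema = new StructType()
      .add("timestamp", TimestampType)
      .add("value", LongType)

    val downstream = spark.readStream
      .schema(stage1Schema)
      .parquet("/tmp/stage1-output")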

Chaining Spark Streaming Jobs

2017-08-21 Thread Sunita Arvind
Hello Spark Experts, I have a design question w.r.t. Spark Streaming. I have a streaming job that consumes protocol-buffer-encoded real-time logs from a Kafka cluster on premise. My Spark application runs on EMR (AWS) and persists data onto S3. Before I persist, I need to strip the header and convert
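For context, a minimal sketch of the first stage described above, using the Structured Streaming Kafka source; the broker list, topic, and paths are placeholders, and the header-stripping/protobuf decoding is only indicated by a comment:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("kafka-to-s3").getOrCreate()

    // Illustrative Kafka source; broker and topic names are placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "logs")
      .load()

    // The `value` column carries the protobuf-encoded payload; stripping the
    // header and decoding it (e.g. via a UDF) would happen here and is omitted.
    val payloads = raw.select(col("value"))

    // Persist to S3 with the file sink; the checkpoint makes the output exactly-once.
    val query = payloads.writeStream
      .format("parquet")
      .option("checkpointLocation", "s3://my-bucket/checkpoints/ingest/") // placeholder
      .start("s3://my-bucket/ingest-output/")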