Hi,

Thank you for all the answers. Below is the code I am trying out:

    import scala.concurrent.duration._  // provides the 100.seconds syntax

    // Read CSV files from /tmp/input as a stream
    val records = sparkSession.read.format("csv").stream("/tmp/input")

    // Write the stream out as Parquet, asking for a 100-second trigger
    val re = records.write
      .format("parquet")
      .trigger(ProcessingTime(100.seconds))
      .option("checkpointLocation", "/tmp/checkpoint")
      .startStream("/tmp/output")

    re.awaitTermination()

In the above code, I assume the batch interval is 100 seconds? But it doesn't seem to be that way.

On Fri, May 6, 2016 at 3:14 PM, Sachin Aggarwal <different.sac...@gmail.com> wrote:

> Hi Madhukara,
>
> What I understood from the code is that whenever runBatch returns, they
> trigger constructBatch, so whatever the processing time for a batch is
> will be your batch time if you don't specify a trigger.
>
> One flaw I see in this is that if your processing time keeps increasing
> with the amount of data, then this batch interval keeps on increasing. They
> must put some boundary or some logic in place to prevent such a case.
>
> Here is one pull request I found related to this:
> https://github.com/apache/spark/pull/12725
>
>
> On Fri, May 6, 2016 at 2:50 PM, Deepak Sharma <deepakmc...@gmail.com>
> wrote:
>
>> With Structured Streaming, Spark would provide APIs over the Spark SQL engine.
>> Once you have the structured stream and a DataFrame created out of
>> it, you can do ad-hoc querying on the DF, which means you are actually
>> querying the stream without having to store or transform it.
>> I have not used it yet, but it seems it will start streaming data
>> from the source as soon as you define it.
>>
>> Thanks
>> Deepak
>>
>>
>> On Fri, May 6, 2016 at 1:37 PM, madhu phatak <phatak....@gmail.com>
>> wrote:
>>
>>> Hi,
>>> As I was playing with the new structured streaming API, I noticed that Spark
>>> starts processing as and when the data appears. It no longer seems like
>>> micro-batch processing. Will Spark structured streaming be event-based
>>> processing?
>>>
>>> --
>>> Regards,
>>> Madhukara Phatak
>>> http://datamantra.io/
>>>
>>
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
>> www.keosha.net
>>
>
> --
>
> Thanks & Regards
>
> Sachin Aggarwal
> 7760502772
>

--
Regards,
Madhukara Phatak
http://datamantra.io/
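
P.S. To make the timing question in this thread concrete, here is a small sketch in plain Scala (no Spark involved) of the two cases discussed: with no trigger, the next batch starts as soon as the previous one returns, so the batch interval is simply the processing time; with a processing-time trigger, the next batch waits for the next interval boundary. The object `TriggerSketch`, the function `batchStartTimes`, and the rule "after a batch finishes, wait for the next multiple of the interval" are my assumptions for illustration only and are not taken from Spark's code.

```scala
object TriggerSketch {

  /** Simulated start times (in ms) of successive batches.
    *
    * processingMs: how long each batch takes to process (hypothetical values).
    * intervalMs:   0 models "no trigger" (start the next batch as soon as the
    *               previous one returns); a positive value models a
    *               processing-time trigger that waits for the next multiple
    *               of intervalMs after a batch finishes (an assumption).
    */
  def batchStartTimes(processingMs: Seq[Long],
                      intervalMs: Long,
                      startMs: Long = 0L): Seq[Long] = {
    // Given when a batch finished, decide when the next one may start.
    def nextStart(finishedMs: Long): Long =
      if (intervalMs <= 0) finishedMs
      else (finishedMs / intervalMs) * intervalMs + intervalMs

    // scanLeft yields one start time per batch plus one trailing value
    // (the start of a batch that never runs), which init drops.
    processingMs
      .scanLeft(startMs)((started, cost) => nextStart(started + cost))
      .init
  }

  def main(args: Array[String]): Unit = {
    // Three batches taking 30 s, 250 s and 40 s.
    val costs = Seq(30000L, 250000L, 40000L)

    // No trigger: batches start at 0, 30000 and 280000 ms.
    println(batchStartTimes(costs, 0L))

    // 100 s trigger: batches start at 0, 100000 and 400000 ms; the slow
    // second batch overruns its slot and pushes the third one back.
    println(batchStartTimes(costs, 100000L))
  }
}
```

Under these assumptions, the first batch always starts immediately, which may be part of why processing appears to begin as soon as data shows up even when a 100-second trigger is set.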