Re: [Spark Streaming] Starting Spark Streaming application from a specific position in Kinesis stream

2017-02-19 Thread Neil Maheshwari
Thank you! I will look at the repository. > On Feb 19, 2017, at 2:13 PM, Sam Elamin wrote: > > Just doing a bit of research, it seems we've been beaten to the punch; there's > already a connector you can use here. > > Give it a go and feel free to give the committer feedback

Re: [Spark Streaming] Starting Spark Streaming application from a specific position in Kinesis stream

2017-02-19 Thread Sam Elamin
Just doing a bit of research, it seems we've been beaten to the punch; there's already a connector you can use here. Give it a go and feel free to give the committer feedback, or better yet, send some PRs if it needs them :) On Sun, Feb 19, 2017

Re: [Spark Streaming] Starting Spark Streaming application from a specific position in Kinesis stream

2017-02-19 Thread Sam Elamin
Hey Neil, No worries! Happy to help you write it if you want; just link me to the repo and we can write it together. Would be fun! Regards, Sam On Sun, 19 Feb 2017 at 21:21, Neil Maheshwari wrote: > Thanks for the advice Sam. I will look into implementing a

Re: [Spark Streaming] Starting Spark Streaming application from a specific position in Kinesis stream

2017-02-19 Thread Neil Maheshwari
Thanks for the advice, Sam. I will look into implementing a Structured Streaming connector. > On Feb 19, 2017, at 11:54 AM, Sam Elamin wrote: > > Hi Neil, > > My advice would be to write a Structured Streaming connector. The new > Structured Streaming APIs were

Re: [Spark Streaming] Starting Spark Streaming application from a specific position in Kinesis stream

2017-02-19 Thread Sam Elamin
Hi Neil, My advice would be to write a Structured Streaming connector. The new Structured Streaming APIs were brought in to handle exactly the issues you describe. See this blog. There isn't a structured streaming

Re: [Spark Streaming] Starting Spark Streaming application from a specific position in Kinesis stream

2017-02-19 Thread Neil Maheshwari
Thanks for your response, Ayan. This could be an option. One complication I see with that approach is that I do not want to miss any records that fall between the data we have batched to the data store and the checkpoint. I would still need a mechanism for recording the sequence number of the
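The gap Neil describes can be closed by recording, per shard, the highest Kinesis sequence number included in each batch written to the data store, then resuming strictly after it on restart (Kinesis's GetShardIterator supports an AFTER_SEQUENCE_NUMBER iterator type for this). The sketch below is a plain-Python model under stated assumptions, not Spark or AWS SDK code: the `(shard_id, sequence_number, payload)` record layout and the `ResumeTracker` class are hypothetical, and sequence numbers are assumed to increase within a shard.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (shard_id, sequence_number, payload).
# Kinesis sequence numbers are decimal strings, so compare them as ints.
Record = Tuple[str, str, dict]


@dataclass
class ResumeTracker:
    """Remembers the last sequence number persisted per shard (hypothetical helper)."""
    committed: Dict[str, str] = field(default_factory=dict)

    def commit_batch(self, batch: Iterable[Record]) -> None:
        # Call only after the batch has been durably written to the data store.
        for shard_id, seq, _ in batch:
            prev = self.committed.get(shard_id)
            if prev is None or int(seq) > int(prev):
                self.committed[shard_id] = seq

    def unprocessed(self, records: Iterable[Record]) -> List[Record]:
        # Keep only records strictly after the committed position of their shard,
        # mirroring a shard iterator opened AFTER_SEQUENCE_NUMBER.
        out = []
        for shard_id, seq, payload in records:
            last = self.committed.get(shard_id)
            if last is None or int(seq) > int(last):
                out.append((shard_id, seq, payload))
        return out
```

For example, after committing sequence numbers 100 and 105 on `shard-0`, replaying the stream returns only the `shard-0` records after 105, plus everything from shards that were never committed.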

Re: Efficient Spark-Sql queries when only nth Column changes

2017-02-19 Thread Patrick
Hi, Thanks all. I checked both approaches; grouping sets worked better for me because I didn't want to cache the data, as I am allocating a large fraction of memory to shuffle operations. However, I could only use grouping sets with HiveContext. I am using Spark 1.5 and I think SQLContext doesn't
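For readers unfamiliar with grouping sets: in HiveQL, which HiveContext accepts, a query such as `SELECT a, b, SUM(c) FROM t GROUP BY a, b GROUPING SETS ((a), (b))` computes several aggregations in one pass, roughly like a UNION of separate GROUP BY queries, with columns outside a given set reported as NULL. The plain-Python model below only illustrates those semantics; the function name, rows, and column names are made up, and it says nothing about how Spark actually executes the query.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def grouping_sets(rows: List[Dict], columns: Sequence[str],
                  sets: Sequence[Sequence[str]], measure: str) -> Dict[Tuple, float]:
    """Sum `measure` once per grouping set; columns outside a set become None (SQL NULL)."""
    result: Dict[Tuple, float] = defaultdict(float)
    for row in rows:
        for gset in sets:
            # Build the output key: keep columns in this set, null out the rest.
            key = tuple(row[c] if c in gset else None for c in columns)
            result[key] += row[measure]
    return dict(result)
```

Unlike SQL, this model cannot distinguish a genuine NULL from a rolled-up column; SQL engines expose a grouping ID for that purpose.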

Re: [Spark Streaming] Starting Spark Streaming application from a specific position in Kinesis stream

2017-02-19 Thread ayan guha
Hi, AFAIK, Kinesis does not provide any mechanism other than checkpointing to restart. That makes sense, as it keeps it generic. Question: why can't you warm up your data from a data store? Say every 30 mins you run a job to aggregate your data to a data store for that hour. When you restart the
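Ayan's warm-up idea can be sketched as per-hour minimum prices keyed by group: a periodic job persists a snapshot, and on restart the application seeds its state from that snapshot and folds in the records that arrived afterwards. This is a minimal plain-Python model; the `(group, epoch_seconds, price)` record shape and the `hour_bucket`/`fold` helpers are hypothetical, and persisting the snapshot to an actual data store is left out.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical state shape: (group, hour_start_epoch_seconds) -> minimum price seen.
State = Dict[Tuple[str, int], float]


def hour_bucket(ts: int) -> int:
    """Truncate an epoch-seconds timestamp to the start of its hour."""
    return ts - ts % 3600


def fold(state: State, records: Iterable[Tuple[str, int, float]]) -> State:
    """Merge (group, epoch_seconds, price) records into per-hour minimums.

    Returns a new dict; the input state (e.g. a loaded snapshot) is not modified.
    """
    out = dict(state)
    for group, ts, price in records:
        key = (group, hour_bucket(ts))
        out[key] = min(price, out.get(key, price))
    return out
```

Because `min` is idempotent, replaying records that were already folded into the snapshot does not change the result, which eases the concern about overlap between the snapshot and the restart position.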

[Spark Streaming] Starting Spark Streaming application from a specific position in Kinesis stream

2017-02-19 Thread Neil Maheshwari
Hello, I am building a Spark streaming application that ingests data from an Amazon Kinesis stream. My application keeps track of the minimum price over a window for groups of similar tickets. When I deploy the application, I would like it to start processing at the start of the previous
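In Spark Streaming, a per-group windowed minimum like the one described here would typically be expressed with `reduceByKeyAndWindow` using `min` (with no inverse function, since `min` has none). To show the logic on its own, the sketch below keeps a sliding-window minimum per group with a monotonic deque. It is a plain-Python model, not Spark code; the `WindowedMin` class is hypothetical and assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class WindowedMin:
    """Minimum price per group over the last `window` seconds (monotonic deque per group)."""

    def __init__(self, window: int) -> None:
        self.window = window
        # Each deque holds (timestamp, price) with prices strictly increasing front to back,
        # so the front is always the current minimum.
        self.queues: Dict[str, Deque[Tuple[int, float]]] = defaultdict(deque)

    def add(self, group: str, ts: int, price: float) -> None:
        q = self.queues[group]
        # Drop older entries that can never be the minimum again.
        while q and q[-1][1] >= price:
            q.pop()
        q.append((ts, price))

    def minimum(self, group: str, now: int) -> Optional[float]:
        q = self.queues[group]
        # Expire entries outside the window (now - window, now].
        while q and q[0][0] <= now - self.window:
            q.popleft()
        return q[0][1] if q else None
```

Each event is pushed and popped at most once, so maintaining the window costs amortized constant time per event.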