Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-12 Thread lucas.g...@gmail.com
"Building data products is a very different discipline from that of building software." That is a fundamentally incorrect assumption. There will always be a need for figuring out how to apply said principles, but saying 'we're different' has always turned out to be incorrect and I have seen no

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-12 Thread Steve Loughran
On 12 Apr 2017, at 17:25, Gourav Sengupta wrote: Hi, Your answer is like saying, I know how to code in assembly level language and I am going to build the next GUI in assembly level code and I think that there is a genuine

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-12 Thread Gourav Sengupta
Hi, Your answer is like saying, I know how to code in assembly level language and I am going to build the next GUI in assembly level code and I think that there is a genuine functional requirement to see a color of a button in green on the screen. Perhaps it may be pertinent to read the first

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-12 Thread Steve Loughran
On 11 Apr 2017, at 20:46, Gourav Sengupta wrote: And once again JAVA programmers are trying to solve a data analytics and data warehousing problem using programming paradigms. It is genuinely a pain to see this happen. While I'm

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-11 Thread Sumona Routh
Hi Sam, I would absolutely be interested in reading a blog write-up of how you are doing this. We have pieced together a relatively decent pipeline ourselves, in Jenkins, but have many kinks to work out. We also have some new requirements to start running side-by-side comparisons of different

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-11 Thread Gourav Sengupta
And once again JAVA programmers are trying to solve a data analytics and data warehousing problem using programming paradigms. It is genuinely a pain to see this happen. Regards, Gourav On Tue, Apr 11, 2017 at 2:20 PM, Sam Elamin wrote:
> Hi Steve
>
> Thanks for the

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-11 Thread Sam Elamin
Hi Steve, Thanks for the detailed response. I think this problem doesn't have an industry-standard solution as of yet, and I am sure a lot of people would benefit from the discussion. I realise now what you are saying, so thanks for clarifying. That said, let me try and explain how we approached the

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-11 Thread Steve Loughran
On 7 Apr 2017, at 18:40, Sam Elamin wrote: Definitely agree with Gourav there. I wouldn't want Jenkins to run my workflow. Seems to me that you would only be using Jenkins for its scheduling capabilities. Maybe I was just looking at

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-07 Thread Sam Elamin
Definitely agree with Gourav there. I wouldn't want Jenkins to run my workflow. Seems to me that you would only be using Jenkins for its scheduling capabilities. Yes, you can run tests, but you wouldn't want it to run your orchestration of jobs. What happens if Jenkins goes down for any particular

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-07 Thread Gourav Sengupta
Hi Steve, Why would you ever do that? You are suggesting the use of a CI tool as a workflow and orchestration engine. Regards, Gourav Sengupta On Fri, Apr 7, 2017 at 4:07 PM, Steve Loughran wrote:
> If you have Jenkins set up for some CI workflow, that can do scheduled

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-07 Thread Steve Loughran
If you have Jenkins set up for some CI workflow, that can do scheduled builds and tests. Works well if you can do some build test before even submitting it to a remote cluster. On 7 Apr 2017, at 10:15, Sam Elamin wrote: Hi Shyla You

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-07 Thread Sam Elamin
Hi Shyla, You have multiple options really, some of which have already been listed, but let me try and clarify. Assuming you have a Spark application in a jar, you have a variety of options. You have to have an existing Spark cluster that is either running on EMR or somewhere else. *Super simple /

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-06 Thread Gourav Sengupta
Hi Shyla, why would you want to schedule a Spark job in EC2 instead of EMR? Regards, Gourav On Fri, Apr 7, 2017 at 1:04 AM, shyla deshpande wrote:
> I want to run a spark batch job maybe hourly on AWS EC2. What is the
> easiest way to do this? Thanks

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-06 Thread Yash Sharma
Hi Shyla, We could suggest based on what you're trying to do exactly. But with the given information: if you have your Spark job ready, you could schedule it via any scheduling framework like Airflow or Celery or Cron, based on how simple or complex you want your workflow to be. Cheers, Yash On
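
For the simplest of the options named above (cron), a minimal sketch might look like the following. It is an illustration rather than anything posted in the thread: the jar path, main class, script location, and log file are all hypothetical placeholders, and it assumes `spark-submit` is on the PATH of the EC2 instance. The helper only assembles the command line; a cron entry then calls the script once an hour.

```python
import shlex
import subprocess


def build_spark_submit_command(jar_path, main_class, master="local[*]", app_args=()):
    """Return the argv list for a spark-submit invocation.

    --class and --master are standard spark-submit options; every value
    passed in here is a placeholder chosen for illustration.
    """
    return [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        jar_path,
        *app_args,
    ]


def run_job(command):
    """Print the command, run it, and return the process exit code."""
    print("Running:", " ".join(shlex.quote(part) for part in command))
    return subprocess.run(command).returncode


# Hypothetical usage inside a script such as /opt/jobs/run_spark_job.py:
#   cmd = build_spark_submit_command("/opt/jobs/my-batch-job.jar",
#                                    "com.example.BatchJob")
#   raise SystemExit(run_job(cmd))
#
# Hypothetical crontab entry that runs the script at minute 0 of every hour:
#   0 * * * * /usr/bin/python3 /opt/jobs/run_spark_job.py >> /var/log/spark_job.log 2>&1
```

Cron gives scheduling only: there are no retries, no dependency handling, and no alerting, which is where a workflow tool such as Airflow would come in if the job grows more complex.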

What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-06 Thread shyla deshpande
I want to run a Spark batch job maybe hourly on AWS EC2. What is the easiest way to do this? Thanks
