Re: Testing spark with AWS spot instances

2016-03-27 Thread Alexander Pivovarov
I use spot instances for a 100-slave cluster (r3.2xlarge in us-west-1).
The jobs I run usually take about 15 hours, and the cluster is stable and
fast. One or two machines might be terminated, but that is a very rare
event and Spark can handle it.



Re: Testing spark with AWS spot instances

2016-03-25 Thread Sven Krasser
When a spot instance terminates, you lose all data (RDD partitions) stored
in the executors that ran on that instance. Spark can recreate the lost
partitions from the input data, but if that requires going through multiple
preceding shuffles, a good chunk of the job will need to be redone.
-Sven
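
To make the recomputation cost described above concrete, here is a minimal
sketch in plain Python. It is a toy model, not Spark's API: the function name
`stages_to_recompute` and the idea of numbering stages along a linear chain
are assumptions made for illustration. It treats a job as a sequence of
stages separated by shuffles and asks how far back the job must go when the
output of one stage is lost. In Spark itself, the usual way to shorten that
path is to checkpoint an RDD to reliable storage with
`SparkContext.setCheckpointDir` and `RDD.checkpoint()`.

```python
# Toy model (not Spark's API): a job is a linear chain of stages, each
# separated from the next by a shuffle. Stage 0 stands for the input data,
# which is assumed to live on durable storage and is never lost.

def stages_to_recompute(lost_stage, durable_stages=()):
    """Return the stages that must be rerun to rebuild `lost_stage`.

    `durable_stages` lists stages whose output survives an instance
    termination, for example because it was checkpointed.
    """
    durable = set(durable_stages) | {0}
    # Walk back to the nearest stage whose output is still available.
    restart_from = max(s for s in durable if s <= lost_stage)
    return list(range(restart_from + 1, lost_stage + 1))


# No checkpoints: losing stage 5 means redoing everything from the input.
print(stages_to_recompute(5))

# With stage 3 checkpointed, only the stages after it are rerun.
print(stages_to_recompute(5, durable_stages=[3]))
```

The model overstates the cost in one respect: Spark recomputes only the
partitions that were actually lost, not whole stages. The shape of the
problem is the same, though. The more shuffles that sit between the lost
data and the nearest durable copy, the more work is repeated.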


-- 
www.skrasser.com 


Testing spark with AWS spot instances

2016-03-24 Thread Dillian Murphey
I'm very new to Apache Spark. I'm just a user, not a developer.

I'm running a cluster with many spot instances. Am I correct in
understanding that Spark can handle an unlimited number of spot instance
failures and restarts?  Sometimes all the spot instances will disappear
without warning, and then they come back.  Can I trust Spark to pick up all
jobs where it left off?

I'm noticing some instability with my system. I suspect it could be
disk or RAM issues.  When I add a lot of slaves, I run low on RAM on my
master.  Maybe that's part of the problem, but I just want to confirm my
understanding.