Re: delay between removing the block manager of an executor, and marking that as lost

2015-03-04 Thread Akhil Das
You can look at the following settings:

- spark.akka.timeout
- spark.akka.heartbeat.pauses

Both are described at http://spark.apache.org/docs/1.2.0/configuration.html
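
As a rough sketch of how these two properties could be passed at submit time, the snippet below only builds `--conf key=value` arguments for `spark-submit`. The values (60 and 120), the helper name `conf_args`, and the application file `my_app.py` are placeholders chosen for illustration, not recommendations; check the configuration page above for the units and defaults of each property.

```python
import shlex

# Placeholder values for illustration only; these are assumptions,
# not tuned or recommended settings.
overrides = {
    "spark.akka.timeout": "60",
    "spark.akka.heartbeat.pauses": "120",
}


def conf_args(settings):
    """Turn a dict of Spark properties into spark-submit --conf arguments."""
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


# my_app.py is a hypothetical application file.
command = ["spark-submit"] + conf_args(overrides) + ["my_app.py"]

# Print the command line so it can be reviewed before running it.
print(shlex.join(command))
```

Whether lowering these values actually shortens the delay in this setup is something to verify on a test job first.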

Thanks
Best Regards

On Tue, Mar 3, 2015 at 4:46 PM, twinkle sachdeva twinkle.sachd...@gmail.com
 wrote:

 Hi,

 Is there any relation between removing the block manager of an executor and
 marking that executor as lost?

 In my setup, even after the block manager is removed (after failing to do some
 operation), it takes more than 20 minutes for the executor to be marked as lost.

 Following are the logs:

 *15/03/03 10:26:49 WARN storage.BlockManagerMaster: Failed to remove
 broadcast 20 with removeFromMaster = true - Ask timed out on
 [Actor[akka.tcp://sparkExecutor@TMO-DN73:54363/user/BlockManagerActor1#-966525686]]
 after [3 ms]}*

 *15/03/03 10:27:41 WARN storage.BlockManagerMasterActor: Removing
 BlockManager BlockManagerId(1, TMO-DN73, 4) with no recent heart beats:
 76924ms exceeds 45000ms*

 *15/03/03 10:27:41 INFO storage.BlockManagerMasterActor: Removing block
 manager BlockManagerId(1, TMO-DN73, 4)*

 *15/03/03 10:49:10 ERROR cluster.YarnClusterScheduler: Lost executor 1 on
 TMO-DN73: remote Akka client disassociated*

 How can I make this happen faster?

 Thanks,
 Twinkle



delay between removing the block manager of an executor, and marking that as lost

2015-03-03 Thread twinkle sachdeva
Hi,

Is there any relation between removing the block manager of an executor and
marking that executor as lost?

In my setup, even after the block manager is removed (after failing to do some
operation), it takes more than 20 minutes for the executor to be marked as lost.

Following are the logs:

*15/03/03 10:26:49 WARN storage.BlockManagerMaster: Failed to remove
broadcast 20 with removeFromMaster = true - Ask timed out on
[Actor[akka.tcp://sparkExecutor@TMO-DN73:54363/user/BlockManagerActor1#-966525686]]
after [3 ms]}*

*15/03/03 10:27:41 WARN storage.BlockManagerMasterActor: Removing
BlockManager BlockManagerId(1, TMO-DN73, 4) with no recent heart beats:
76924ms exceeds 45000ms*

*15/03/03 10:27:41 INFO storage.BlockManagerMasterActor: Removing block
manager BlockManagerId(1, TMO-DN73, 4)*

*15/03/03 10:49:10 ERROR cluster.YarnClusterScheduler: Lost executor 1 on
TMO-DN73: remote Akka client disassociated*

How can I make this happen faster?

Thanks,
Twinkle
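
For reference, the gap between the two events can be worked out from the timestamps in the log lines above. This is a minimal sketch: it assumes the log prefix is in `yy/MM/dd HH:mm:ss` form and that both lines come from the same clock; the helper name `gap_seconds` is only illustrative.

```python
from datetime import datetime

# Timestamps copied from the log lines quoted above.
removed = "15/03/03 10:27:41"  # "Removing block manager BlockManagerId(1, TMO-DN73, 4)"
lost = "15/03/03 10:49:10"     # "Lost executor 1 on TMO-DN73"

# Assumed layout of the log timestamp prefix: yy/MM/dd HH:mm:ss.
FMT = "%y/%m/%d %H:%M:%S"


def gap_seconds(start, end):
    """Return the number of whole seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


delay = gap_seconds(removed, lost)
print(f"{delay} s ({delay // 60} min {delay % 60} s)")  # prints "1289 s (21 min 29 s)"
```

So the executor was reported as lost roughly 21 and a half minutes after its block manager had been removed.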