RE: What is a Block Manager?

2014-08-27 Thread Liu, Raymond
The framework has this information in order to manage cluster status, and some of 
it (e.g. the number of workers) is also available through the Spark metrics system.
From the user application's point of view, though, can you give an example of why 
you need this information and what you plan to do with it?

Best Regards,
Raymond Liu

From: Victor Tso-Guillen [mailto:v...@paxata.com] 
Sent: Wednesday, August 27, 2014 1:40 PM
To: Liu, Raymond
Cc: user@spark.apache.org
Subject: Re: What is a Block Manager?

We're a single-app deployment, so we want to launch as many executors as the 
system has workers. We accomplish this by not configuring the maximum for the 
application. However, is there really no way to inspect which machines, executor 
IDs, number of workers, etc. are available in the context? I'd imagine that there 
would be something in the SparkContext or in the listener, but all I see in the 
listener is block managers being added and removed. Wouldn't one care about 
workers being added and removed at least as much as about block managers?

On Tue, Aug 26, 2014 at 6:58 PM, Liu, Raymond raymond@intel.com wrote:
Basically, a Block Manager manages the storage for most of the data in Spark, to 
name a few: blocks that represent cached RDD partitions, intermediate shuffle 
data, broadcast data, etc. There is one per executor, and in standalone mode you 
normally have one executor per worker.

You don't control how many workers you have at runtime, but you can, to some 
extent, manage how many executors your application will launch; check the 
documentation for each deployment mode for details. (But control where they run? 
Hardly. YARN mode does some work based on data locality, but this is done by the 
framework, not the user program.)

Best Regards,
Raymond Liu

From: Victor Tso-Guillen [mailto:v...@paxata.com]
Sent: Tuesday, August 26, 2014 11:42 PM
To: user@spark.apache.org
Subject: What is a Block Manager?

I'm curious not only about what they do, but what their relationship is to the 
rest of the system. I find that I get listener events for n block managers 
added where n is also the number of workers I have available to the 
application. Is this a stable constant?

Also, are there ways to determine at runtime how many workers I have and where 
they are?

Thanks,
Victor



Re: What is a Block Manager?

2014-08-27 Thread Victor Tso-Guillen
I have long-lived state that I'd like to maintain on the executors. I'd like
to initialize it during some bootstrap phase, and to update the master when
such an executor leaves the cluster.
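
One way to hold such state, sketched here under assumptions: initialize it lazily the first time a task touches it in a given executor process, then reuse it for every later task there. On the JVM side a common idiom for this is a Scala object (a per-JVM singleton); the Python below shows the same once-per-process, thread-safe initialization. ExecutorLocalState and bootstrap are hypothetical names, not a Spark API, and the sketch covers only the executor side; learning that an executor has left would have to come from the driver, e.g. from the listener's block-manager removal events.

```python
import threading
from typing import Callable, Dict, Optional


class ExecutorLocalState:
    """Hypothetical holder for state that lives as long as the executor process.

    The first caller in the process runs the bootstrap; later callers (e.g. later
    tasks scheduled on the same executor) get the same object back."""

    _lock = threading.Lock()
    _state: Optional[Dict[str, object]] = None

    @classmethod
    def get(cls, bootstrap: Callable[[], Dict[str, object]]) -> Dict[str, object]:
        if cls._state is None:
            with cls._lock:
                # Re-check inside the lock so concurrent tasks bootstrap only once.
                if cls._state is None:
                    cls._state = bootstrap()
        return cls._state


# Usage: simulate three tasks running on the same executor process.
calls = []


def bootstrap() -> Dict[str, object]:
    calls.append(1)  # count how many times initialization actually runs
    return {"initialized": True}


def task(partition: int):
    state = ExecutorLocalState.get(bootstrap)
    return (partition, state["initialized"])


results = [task(p) for p in range(3)]
```

The bootstrap runs once per process no matter how many tasks call get, which is the property long-lived executor state needs.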






What is a Block Manager?

2014-08-26 Thread Victor Tso-Guillen
I'm curious not only about what they do, but what their relationship is to
the rest of the system. I find that I get listener events for n block
managers added where n is also the number of workers I have available to
the application. Is this a stable constant?

Also, are there ways to determine at runtime how many workers I have and
where they are?

Thanks,
Victor


RE: What is a Block Manager?

2014-08-26 Thread Liu, Raymond
Basically, a Block Manager manages the storage for most of the data in Spark, to 
name a few: blocks that represent cached RDD partitions, intermediate shuffle 
data, broadcast data, etc. There is one per executor, and in standalone mode you 
normally have one executor per worker.
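
A toy, pure-Python model of the paragraph above may help: one block store per executor, holding the three kinds of blocks mentioned. The class and function names are hypothetical and this is not Spark's BlockManager; only the block-name formats (rdd_<rddId>_<partition>, shuffle_<shuffleId>_<mapId>_<reduceId>, broadcast_<broadcastId>) mirror the names Spark gives its block IDs. The real component also tracks storage levels, moves data between memory and disk, and serves blocks to other executors, none of which is modeled here.

```python
from typing import Dict, List, Optional


class ToyBlockManager:
    """Hypothetical model: one instance per executor, holding that executor's blocks."""

    def __init__(self, executor_id: str, host: str) -> None:
        self.executor_id = executor_id
        self.host = host
        self._blocks: Dict[str, bytes] = {}

    def put(self, block_id: str, data: bytes) -> None:
        self._blocks[block_id] = data

    def get(self, block_id: str) -> Optional[bytes]:
        # Returns None when the block is not stored on this executor.
        return self._blocks.get(block_id)

    def block_ids(self) -> List[str]:
        return sorted(self._blocks)


def rdd_block(rdd_id: int, partition: int) -> str:
    return f"rdd_{rdd_id}_{partition}"


def shuffle_block(shuffle_id: int, map_id: int, reduce_id: int) -> str:
    return f"shuffle_{shuffle_id}_{map_id}_{reduce_id}"


def broadcast_block(broadcast_id: int) -> str:
    return f"broadcast_{broadcast_id}"


# Standalone mode, normally: one executor (and so one block manager) per worker.
workers = ["worker-a", "worker-b"]
block_managers = {
    str(i): ToyBlockManager(executor_id=str(i), host=host)
    for i, host in enumerate(workers)
}

# Cached partitions of RDD 3 end up on whichever executor computed them.
block_managers["0"].put(rdd_block(3, 0), b"...")
block_managers["1"].put(rdd_block(3, 1), b"...")
# Executor 0 also holds map output for shuffle 0 and a copy of broadcast 5.
block_managers["0"].put(shuffle_block(0, 0, 1), b"...")
block_managers["0"].put(broadcast_block(5), b"...")
```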

You don't control how many workers you have at runtime, but you can, to some 
extent, manage how many executors your application will launch; check the 
documentation for each deployment mode for details. (But control where they run? 
Hardly. YARN mode does some work based on data locality, but this is done by the 
framework, not the user program.)
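
As a rough illustration of how the executor count follows from configuration in standalone mode, here is a simplified model of "spread out" core assignment (the behavior controlled by spark.deploy.spreadOut, on by default): cores are handed out one at a time, round-robin across workers, until the application's cap (spark.cores.max) or the cluster's free cores run out, and every worker that receives a core hosts an executor. It assumes at most one executor per worker per application and ignores memory requirements; assign_cores and executors are hypothetical helpers, not the Master's actual scheduling code. In YARN mode the count is instead requested explicitly, e.g. with spark-submit --num-executors.

```python
from typing import Dict, List, Optional


def assign_cores(free_cores: Dict[str, int], cores_max: Optional[int] = None) -> Dict[str, int]:
    """Simplified spread-out allocation: one core at a time, round-robin across
    workers, until the cap (if any) or the cluster's free cores are used up.
    Returns the number of cores granted on each worker."""
    total_free = sum(free_cores.values())
    to_assign = total_free if cores_max is None else min(cores_max, total_free)
    assigned = {w: 0 for w in free_cores}
    workers = list(free_cores)
    pos = 0
    while to_assign > 0:
        w = workers[pos]
        if assigned[w] < free_cores[w]:  # skip workers that are already full
            assigned[w] += 1
            to_assign -= 1
        pos = (pos + 1) % len(workers)
    return assigned


def executors(assigned: Dict[str, int]) -> List[str]:
    # Under the one-executor-per-worker assumption, any worker with cores gets one.
    return sorted(w for w, cores in assigned.items() if cores > 0)


cluster = {"worker-a": 4, "worker-b": 4, "worker-c": 4}
uncapped = assign_cores(cluster)             # no maximum configured
capped = assign_cores(cluster, cores_max=2)  # small cap
```

With no cap every worker ends up with an executor; with a small cap only some do, which matches the observation that leaving the maximum unset yields one executor per worker.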

Best Regards,
Raymond Liu





Re: What is a Block Manager?

2014-08-26 Thread Victor Tso-Guillen
We're a single-app deployment, so we want to launch as many executors as the
system has workers. We accomplish this by not configuring the maximum for the
application. However, is there really no way to inspect which machines,
executor IDs, number of workers, etc. are available in the context? I'd
imagine that there would be something in the SparkContext or in the listener,
but all I see in the listener is block managers being added and removed.
Wouldn't one care about workers being added and removed at least as much as
about block managers?
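
On the listener question: SparkListener does provide onBlockManagerAdded and onBlockManagerRemoved callbacks, and each event carries a BlockManagerId with the executor ID, host and port. Because there is one block manager per executor, a driver-side registry fed from those events can answer "how many executors, and on which hosts". The sketch below models only that bookkeeping in plain Python; BlockManagerInfo, ExecutorRegistry and the driver-ID filter are hypothetical stand-ins, not Spark's API. The driver has a block manager of its own, so depending on when the listener is registered it may see one extra "added" event; the driver identifiers used here are an assumption and vary across Spark versions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BlockManagerInfo:
    """Hypothetical stand-in for the executor ID / host / port carried by an event."""
    executor_id: str
    host: str
    port: int


# Assumed identifiers for the driver's own block manager (check your Spark version).
DRIVER_IDS = {"driver", "<driver>"}


class ExecutorRegistry:
    """Hypothetical driver-side bookkeeping fed by block-manager added/removed events."""

    def __init__(self) -> None:
        self._live: Dict[str, BlockManagerInfo] = {}

    def on_added(self, info: BlockManagerInfo) -> None:
        if info.executor_id not in DRIVER_IDS:  # count executors only
            self._live[info.executor_id] = info

    def on_removed(self, info: BlockManagerInfo) -> None:
        self._live.pop(info.executor_id, None)

    def count(self) -> int:
        return len(self._live)

    def hosts(self) -> List[str]:
        return sorted({i.host for i in self._live.values()})


reg = ExecutorRegistry()
reg.on_added(BlockManagerInfo("<driver>", "driver-host", 40000))
reg.on_added(BlockManagerInfo("0", "worker-a", 41000))
reg.on_added(BlockManagerInfo("1", "worker-b", 41000))
reg.on_removed(BlockManagerInfo("0", "worker-a", 41000))  # executor 0 leaves
```

A removal callback like on_removed is also the natural place to notify the rest of the application that an executor's long-lived state is gone.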

