Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

2014-07-12 Thread Yan Fang
Thank you, Tathagata. That explains it.

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108





Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

2014-07-11 Thread Yan Fang
Hi Praveen,

Thank you for the answer. That's interesting, because when I bring up only
one executor for the Spark Streaming application, the log and the UI show
that only the receiver is working and no other tasks are happening. Could it
be that the receiving task simply eats all the resources, rather than that
one executor can only run one receiver?

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108


On Fri, Jul 11, 2014 at 6:06 AM, Praveen Seluka psel...@qubole.com wrote:

 Here are my answers, but I am just getting started with Spark Streaming, so
 please correct me if I am wrong.
 1) Yes.
 2) Receivers run on executors. It is actually a job that is submitted, where
 the number of tasks equals the number of receivers. An executor can run more
 than one task at the same time, so you could have more receivers than
 executors, though I don't think that is recommended.
 3) As said in 2), the executor where the receiver task runs can also be used
 for map/reduce tasks. In yarn-cluster mode, the driver program actually runs
 as the application master (it lives in the first container that is
 launched), and the application master is not an executor, so it is not used
 for other operations.
 4) The driver runs in a separate container. I think the same executor can be
 used for both the receiver and the processing tasks (I am not very sure
 about this part).




Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

2014-07-11 Thread Tathagata Das
The same executor can be used for both receiving and processing,
irrespective of the deployment mode (YARN, Spark standalone, etc.). It boils
down to the number of cores / task slots that the executor has. Each
receiver is like a long-running task, so each of them occupies a slot. If
there are free slots in the executor, then other tasks can be run on them.

So if you are finding that the other tasks are not being run, check how many
cores / task slots the executor has, and whether there are more task slots
than the number of input DStreams / receivers you are launching.
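
For example, a minimal sketch (host names and ports made up): two socket
receivers pin two slots, so the app needs more than two slots in total for
any processing to happen.

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // local[4] = 4 task slots: 2 pinned by the receivers, 2 left free.
  // With local[2], the receivers would starve all processing tasks.
  val conf = new SparkConf().setAppName("receiver-slots").setMaster("local[4]")
  val ssc = new StreamingContext(conf, Seconds(1))

  val s1 = ssc.socketTextStream("host1", 9999) // receiver 1, occupies a slot
  val s2 = ssc.socketTextStream("host2", 9999) // receiver 2, occupies a slot
  s1.union(s2).count().print()                 // runs on the free slots

  ssc.start()
  ssc.awaitTermination()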

@Praveen, your answers were pretty much spot on. Thanks for chipping in!






Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

2014-07-11 Thread Yan Fang
Hi Tathagata,

Thank you. Is a task slot equivalent to a core? Or can one core actually run
multiple tasks at the same time?

Best,

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108




Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

2014-07-11 Thread Tathagata Das
A task slot is equivalent to a core, so one core can only run one task
at a time.
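
So the arithmetic is simple. For example (the numbers are made up):

  // 2 executors x 2 cores each = 4 task slots in total.
  // 1 receiver is a long-running task that pins 1 slot,
  // leaving 3 slots for the batch processing tasks.
  val totalSlots = 2 * 2
  val freeSlots  = totalSlots - 1 // one slot per receiver
  println(s"slots free for processing: $freeSlots") // prints 3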

TD




How are the executors used in Spark Streaming in terms of receiver and driver program?

2014-07-10 Thread Yan Fang
Hi all,

I am working to improve the parallelism of my Spark Streaming application,
but I have trouble understanding how the executors are used and how the
application is distributed.

1. In YARN, is one executor equal to one container?

2. I saw the statement that a streaming receiver runs on one worker machine
(note that each input DStream creates a single receiver (running on a worker
machine) that receives a single stream of data). Does the worker machine
mean an executor or a physical machine? If I have more receivers than
executors, will it still work? (A simplified sketch of what I mean follows
the questions below.)

3. Is the executor that holds the receiver also used for other operations,
such as map and reduce, or is it fully occupied by the receiver? Similarly,
if I run in yarn-cluster mode, is the executor running the driver program
used by other operations too?

4. So if I have a driver program (cluster mode) and a streaming receiver, do
I have to have at least 2 executors, because the program and the streaming
receiver have to be on different executors?
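
To make question 2 concrete, here is a simplified sketch of the kind of
setup I mean (the hosts and ports are made up):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("my-streaming-app")
  val ssc = new StreamingContext(conf, Seconds(1))

  // Three input DStreams = three receivers. If YARN gives me only two
  // executors, will this still work?
  val lines1 = ssc.socketTextStream("host1", 9999)
  val lines2 = ssc.socketTextStream("host2", 9999)
  val lines3 = ssc.socketTextStream("host3", 9999)
  lines1.union(lines2).union(lines3).count().print()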

Thank you. Sorry for asking so many questions, but I do want to understand
how Spark Streaming distributes work in order to assign reasonable
resources. Thank you again.

Best,

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108