Re: clarification for some spark on yarn configuration options

2014-09-23 Thread Greg Hill
Thanks for looking into it.  I'm trying to avoid making the user pass in any 
parameters by configuring it to use the right values for the cluster size by 
default, hence my reliance on the configuration.  I'd rather just use 
spark-defaults.conf than the environment variables, and looking at the code you 
modified, I don't see any place it's picking up spark.driver.memory either.  Is 
that a separate bug?

Greg


From: Andrew Or and...@databricks.com
Date: Monday, September 22, 2014 8:11 PM
To: Nishkam Ravi nr...@cloudera.com
Cc: Greg greg.h...@rackspace.com, user@spark.apache.org
Subject: Re: clarification for some spark on yarn configuration options

Hi Greg,

From browsing the code quickly I believe SPARK_DRIVER_MEMORY is not actually 
picked up in cluster mode. This is a bug and I have opened a PR to fix it: 
https://github.com/apache/spark/pull/2500.
For now, please use --driver-memory instead, which should work for both client 
and cluster mode.

Thanks for pointing this out,
-Andrew


Re: clarification for some spark on yarn configuration options

2014-09-23 Thread Andrew Or
Yes... good find. I have filed a JIRA here:
https://issues.apache.org/jira/browse/SPARK-3661 and will get to fixing it
shortly. Both of these fixes will be available in 1.1.1. Until both of
these are merged in, it appears that the only way you can do it now is
through --driver-memory.
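
For reference, a minimal yarn-cluster invocation using that flag looks roughly
like this (the memory size, class, and jar path are placeholders):

  # memory size, class, and jar path are placeholders
  ./bin/spark-submit \
    --master yarn-cluster \
    --driver-memory 2g \
    --class com.example.MyApp \
    /path/to/my-app.jar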

-Andrew

2014-09-23 7:23 GMT-07:00 Greg Hill greg.h...@rackspace.com:

  Thanks for looking into it.  I'm trying to avoid making the user pass in
 any parameters by configuring it to use the right values for the cluster
 size by default, hence my reliance on the configuration.  I'd rather just
 use spark-defaults.conf than the environment variables, and looking at the
 code you modified, I don't see any place it's picking up
 spark.driver.memory either.  Is that a separate bug?

  Greg


   From: Andrew Or and...@databricks.com
 Date: Monday, September 22, 2014 8:11 PM
 To: Nishkam Ravi nr...@cloudera.com
 Cc: Greg greg.h...@rackspace.com, user@spark.apache.org 
 user@spark.apache.org

 Subject: Re: clarification for some spark on yarn configuration options

   Hi Greg,

  From browsing the code quickly I believe SPARK_DRIVER_MEMORY is not
 actually picked up in cluster mode. This is a bug and I have opened a PR to
 fix it: https://github.com/apache/spark/pull/2500.
 For now, please use --driver-memory instead, which should work for both
 client and cluster mode.

  Thanks for pointing this out,
 -Andrew

 2014-09-22 14:04 GMT-07:00 Nishkam Ravi nr...@cloudera.com:

 Maybe try --driver-memory if you are using spark-submit?

  Thanks,
 Nishkam

 On Mon, Sep 22, 2014 at 1:41 PM, Greg Hill greg.h...@rackspace.com
 wrote:

  Ah, I see.  It turns out that my problem is that that comparison is
 ignoring SPARK_DRIVER_MEMORY and comparing to the default of 512m.  Is that
 a bug that's since fixed?  I'm on 1.0.1 and using 'yarn-cluster' as the
 master.  'yarn-client' seems to pick up the values and works fine.

  Greg

   From: Nishkam Ravi nr...@cloudera.com
 Date: Monday, September 22, 2014 3:30 PM
 To: Greg greg.h...@rackspace.com
 Cc: Andrew Or and...@databricks.com, user@spark.apache.org 
 user@spark.apache.org

 Subject: Re: clarification for some spark on yarn configuration options

   Greg, if you look carefully, the code is enforcing that the
 memoryOverhead be lower (and not higher) than spark.driver.memory.

  Thanks,
 Nishkam

 On Mon, Sep 22, 2014 at 1:26 PM, Greg Hill greg.h...@rackspace.com
 wrote:

  I thought I had this all figured out, but I'm getting some weird
 errors now that I'm attempting to deploy this on production-size servers.
 It's complaining that I'm not allocating enough memory to the
 memoryOverhead values.  I tracked it down to this code:


 https://github.com/apache/spark/blob/ed1980ffa9ccb87d76694ba910ef22df034bca49/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala#L70

  Unless I'm reading it wrong, those checks are enforcing that you set
 spark.yarn.driver.memoryOverhead to be higher than spark.driver.memory, but
 that makes no sense to me since that memory is just supposed to be what
 YARN needs on top of what you're allocating for Spark.  My understanding
 was that the overhead values should be quite a bit lower (and by default
 they are).

  Also, why must the executor be allocated less memory than the
 driver's memory overhead value?

  What am I misunderstanding here?

  Greg

   From: Andrew Or and...@databricks.com
 Date: Tuesday, September 9, 2014 5:49 PM
 To: Greg greg.h...@rackspace.com
 Cc: user@spark.apache.org user@spark.apache.org
 Subject: Re: clarification for some spark on yarn configuration options

   Hi Greg,

  SPARK_EXECUTOR_INSTANCES is the total number of workers in the
 cluster. The equivalent spark.executor.instances is just another way to
 set the same thing in your spark-defaults.conf. Maybe this should be
 documented. :)

  spark.yarn.executor.memoryOverhead is just an additional margin
 added to spark.executor.memory for the container. In addition to the
 executor's memory, the container in which the executor is launched needs
 some extra memory for system processes, and this is what this overhead
 (somewhat of a misnomer) is for. The same goes for the driver equivalent.

  spark.driver.memory behaves differently depending on which version
 of Spark you are using. If you are using Spark 1.1+ (this was released very
 recently), you can directly set spark.driver.memory and this will take
 effect. Otherwise, setting this doesn't actually do anything for client
 deploy mode, and you have two alternatives: (1) set the environment
 variable equivalent SPARK_DRIVER_MEMORY in spark-env.sh, and (2) if you are
 using Spark submit (or bin/spark-shell, or bin/pyspark, which go through
 bin/spark-submit), pass the --driver-memory command line argument.

  If you want your PySpark application (driver) to pick up extra class
 path, you can pass the --driver-class-path to Spark submit. If you are
 using Spark 1.1+, you may set spark.driver.extraClassPath

Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Greg Hill
I thought I had this all figured out, but I'm getting some weird errors now 
that I'm attempting to deploy this on production-size servers.  It's 
complaining that I'm not allocating enough memory to the memoryOverhead values. 
 I tracked it down to this code:

https://github.com/apache/spark/blob/ed1980ffa9ccb87d76694ba910ef22df034bca49/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala#L70

Unless I'm reading it wrong, those checks are enforcing that you set 
spark.yarn.driver.memoryOverhead to be higher than spark.driver.memory, but 
that makes no sense to me since that memory is just supposed to be what YARN 
needs on top of what you're allocating for Spark.  My understanding was that 
the overhead values should be quite a bit lower (and by default they are).

Also, why must the executor be allocated less memory than the driver's memory 
overhead value?

What am I misunderstanding here?

Greg


Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Nishkam Ravi
Greg, if you look carefully, the code is enforcing that the memoryOverhead
be lower (and not higher) than spark.driver.memory.
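Put differently, something like the following (values are only illustrative) is
the intended shape, with the overhead a small slice on top of the driver memory:

  # spark-defaults.conf -- values are illustrative
  spark.driver.memory              2g
  spark.yarn.driver.memoryOverhead 384   # MB for the container, on top of the 2g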

Thanks,
Nishkam


Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Greg Hill
Gah, ignore me again.  I was reading the logic backwards.  For some reason it 
isn't picking up my SPARK_DRIVER_MEMORY environment variable and is using the 
default of 512m.  Probably an environmental issue.

Greg


Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Nishkam Ravi
Maybe try --driver-memory if you are using spark-submit?

Thanks,
Nishkam

On Mon, Sep 22, 2014 at 1:41 PM, Greg Hill greg.h...@rackspace.com wrote:

  Ah, I see.  It turns out that my problem is that that comparison is
 ignoring SPARK_DRIVER_MEMORY and comparing to the default of 512m.  Is that
 a bug that's since fixed?  I'm on 1.0.1 and using 'yarn-cluster' as the
 master.  'yarn-client' seems to pick up the values and works fine.

  Greg


Re: clarification for some spark on yarn configuration options

2014-09-09 Thread Andrew Or
Hi Greg,

SPARK_EXECUTOR_INSTANCES is the total number of workers in the cluster. The
equivalent spark.executor.instances is just another way to set the same
thing in your spark-defaults.conf. Maybe this should be documented. :)
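
For example, either of the following (the count is arbitrary) asks for the same
number of executors:

  # spark-env.sh
  export SPARK_EXECUTOR_INSTANCES=10

  # or, equivalently, in spark-defaults.conf
  spark.executor.instances 10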

spark.yarn.executor.memoryOverhead is just an additional margin added to
spark.executor.memory for the container. In addition to the executor's
memory, the container in which the executor is launched needs some extra
memory for system processes, and this is what this overhead (somewhat of
a misnomer) is for. The same goes for the driver equivalent.
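
As a rough sketch with made-up sizes, the container YARN allocates is about the
sum of the two:

  # spark-defaults.conf -- sizes are made up
  spark.executor.memory              4g
  spark.yarn.executor.memoryOverhead 512   # extra MB for the container, on top of the 4g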

spark.driver.memory behaves differently depending on which version of
Spark you are using. If you are using Spark 1.1+ (this was released very
recently), you can directly set spark.driver.memory and this will take
effect. Otherwise, setting this doesn't actually do anything for client
deploy mode, and you have two alternatives: (1) set the environment
variable equivalent SPARK_DRIVER_MEMORY in spark-env.sh, and (2) if you are
using Spark submit (or bin/spark-shell, or bin/pyspark, which go through
bin/spark-submit), pass the --driver-memory command line argument.
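
Concretely, any one of these works (2g is just an example):

  # spark-defaults.conf (takes effect on Spark 1.1+)
  spark.driver.memory 2g

  # spark-env.sh (pre-1.1, client deploy mode)
  export SPARK_DRIVER_MEMORY=2g

  # or pass --driver-memory 2g on the spark-submit / spark-shell / pyspark command line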

If you want your PySpark application (driver) to pick up extra class path,
you can pass the --driver-class-path to Spark submit. If you are using
Spark 1.1+, you may set spark.driver.extraClassPath in your
spark-defaults.conf. There is also an environment variable you could set
(SPARK_CLASSPATH), though this is now deprecated.
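
For example (the jar, class, and paths are placeholders):

  # on the command line
  ./bin/spark-submit --driver-class-path /path/to/extra.jar \
    --class com.example.MyApp /path/to/my-app.jar

  # or, on Spark 1.1+, in spark-defaults.conf
  spark.driver.extraClassPath /path/to/extra.jar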

Let me know if you have more questions about these options,
-Andrew


2014-09-08 6:59 GMT-07:00 Greg Hill greg.h...@rackspace.com:

  Is SPARK_EXECUTOR_INSTANCES the total number of workers in the cluster
 or the workers per slave node?

  Is spark.executor.instances an actual config option?  I found that in a
 commit, but it's not in the docs.

  What is the difference between spark.yarn.executor.memoryOverhead and
 spark.executor.memory?  Same question for the 'driver' variant, but I assume
 it's the same answer.

  Is there a spark.driver.memory option that's undocumented or do you have
 to use the environment variable SPARK_DRIVER_MEMORY?

  What config option or environment variable do I need to set to get
 pyspark interactive to pick up the yarn class path?  The ones that work for
 spark-shell and spark-submit don't seem to work for pyspark.

  Thanks in advance.

  Greg