Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Andrew Or
Yeah, we could make it log a warning instead.

2015-07-15 14:29 GMT-07:00 Kelly, Jonathan :

>  Thanks! Is there an existing JIRA I should watch?
>
>
>  ~ Jonathan
>
>   From: Sandy Ryza 
> Date: Wednesday, July 15, 2015 at 2:27 PM
> To: Jonathan Kelly 
> Cc: "user@spark.apache.org" 
> Subject: Re: Unable to use dynamicAllocation if spark.executor.instances
> is set in spark-defaults.conf
>
>   Hi Jonathan,
>
>  This is a problem that has come up for us as well, because we'd like
> dynamic allocation to be turned on by default in some setups, but not break
> existing users with these properties.  I'm hoping to figure out a way to
> reconcile these by Spark 1.5.
>
>  -Sandy
>
> On Wed, Jul 15, 2015 at 3:18 PM, Kelly, Jonathan 
> wrote:
>
>>   Would there be any problem in having spark.executor.instances (or
>> --num-executors) be completely ignored (i.e., even for non-zero values) if
>> spark.dynamicAllocation.enabled is true (i.e., rather than throwing an
>> exception)?
>>
>>  I can see how the exception would be helpful if, say, you tried to pass
>> both "-c spark.executor.instances" (or --num-executors) *and* "-c
>> spark.dynamicAllocation.enabled=true" to spark-submit on the command line
>> (as opposed to having one of them in spark-defaults.conf and one of them in
>> the spark-submit args), but currently there doesn't seem to be any way to
>> distinguish between arguments that were actually passed to spark-submit and
>> settings that simply came from spark-defaults.conf.
>>
>>  If there were a way to distinguish them, I think the ideal situation
>> would be for the validation exception to be thrown only if
>> spark.executor.instances and spark.dynamicAllocation.enabled=true were both
>> passed via spark-submit args or were both present in spark-defaults.conf,
>> but passing spark.dynamicAllocation.enabled=true to spark-submit would take
>> precedence over spark.executor.instances configured in spark-defaults.conf,
>> and vice versa.
>>
>>
>>  Jonathan Kelly
>>
>> Elastic MapReduce - SDE
>>
>> Blackfoot (SEA33) 06.850.F0
>>
>>   From: Jonathan Kelly 
>> Date: Tuesday, July 14, 2015 at 4:23 PM
>> To: "user@spark.apache.org" 
>> Subject: Unable to use dynamicAllocation if spark.executor.instances is
>> set in spark-defaults.conf
>>
>>   I've set up my cluster with a pre-calculated value for
>> spark.executor.instances in spark-defaults.conf such that I can run a job
>> and have it maximize the utilization of the cluster resources by default.
>> However, if I want to run a job with dynamicAllocation (by passing -c
>> spark.dynamicAllocation.enabled=true to spark-submit), I get this exception:
>>
>>  Exception in thread "main" java.lang.IllegalArgumentException:
>> Explicitly setting the number of executors is not compatible with
>> spark.dynamicAllocation.enabled!
>> at
>> org.apache.spark.deploy.yarn.ClientArguments.parseArgs(ClientArguments.scala:192)
>> at
>> org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:59)
>> at
>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:54)
>>  …
>>
>>  The exception makes sense, of course, but ideally I would like it to
>> ignore what I've put in spark-defaults.conf for spark.executor.instances if
>> I've enabled dynamicAllocation. The most annoying thing about this is that
>> if I have spark.executor.instances present in spark-defaults.conf, I cannot
>> figure out any way to spark-submit a job with
>> spark.dynamicAllocation.enabled=true without getting this error. That is,
>> even if I pass "-c spark.executor.instances=0 -c
>> spark.dynamicAllocation.enabled=true", I still get this error because the
>> validation in ClientArguments.parseArgs() that's checking for this
>> condition simply checks for the presence of spark.executor.instances rather
>> than whether or not its value is > 0.
>>
>>  Should the check be changed to allow spark.executor.instances to be set
>> to 0 if spark.dynamicAllocation.enabled is true? That would be an OK
>> compromise, but I'd really prefer to be able to enable dynamicAllocation
>> simply by setting spark.dynamicAllocation.enabled=true rather than by also
>> having to set spark.executor.instances to 0.
>>
>>
>>  Thanks,
>>
>> Jonathan
>>
>
>
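The three behaviors discussed in this thread — the current presence-based check, the zero-value compromise Jonathan raises, and the warn-instead-of-throw behavior Andrew suggests — can be modeled as follows. This is an illustrative Python sketch only; Spark's actual validation is Scala code in ClientArguments.parseArgs, and every name below is hypothetical.

```python
# Illustrative model of the YARN client validation discussed in this thread.
# Not Spark's actual code; configuration values are plain strings, as in a
# properties file.

def validate_presence_based(conf):
    """Current behavior: fail if spark.executor.instances is present at all
    while dynamic allocation is enabled, even when its value is 0."""
    if (conf.get("spark.dynamicAllocation.enabled") == "true"
            and "spark.executor.instances" in conf):
        raise ValueError(
            "Explicitly setting the number of executors is not compatible "
            "with spark.dynamicAllocation.enabled!")

def validate_value_aware(conf):
    """Compromise floated above: only a non-zero executor count conflicts,
    so spark.executor.instances=0 is allowed through."""
    if (conf.get("spark.dynamicAllocation.enabled") == "true"
            and int(conf.get("spark.executor.instances", "0")) > 0):
        raise ValueError(
            "A non-zero spark.executor.instances is not compatible with "
            "spark.dynamicAllocation.enabled!")

def validate_warn_only(conf, warnings):
    """Warn-and-ignore variant: record a warning and keep going instead of
    failing, ignoring the static executor count."""
    if (conf.get("spark.dynamicAllocation.enabled") == "true"
            and "spark.executor.instances" in conf):
        warnings.append(
            "WARN: ignoring spark.executor.instances because "
            "spark.dynamicAllocation.enabled=true")
```

With spark.executor.instances=0 and dynamic allocation enabled, the first variant still fails (the behavior complained about above), while the second passes and the third merely records a warning.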


Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Kelly, Jonathan
Thanks! Is there an existing JIRA I should watch?

~ Jonathan




Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Sandy Ryza
Hi Jonathan,

This is a problem that has come up for us as well, because we'd like
dynamic allocation to be turned on by default in some setups, but not break
existing users with these properties.  I'm hoping to figure out a way to
reconcile these by Spark 1.5.

-Sandy



Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Kelly, Jonathan
Would there be any problem in having spark.executor.instances (or 
--num-executors) be completely ignored (i.e., even for non-zero values) if 
spark.dynamicAllocation.enabled is true (i.e., rather than throwing an 
exception)?

I can see how the exception would be helpful if, say, you tried to pass both 
"-c spark.executor.instances" (or --num-executors) *and* "-c 
spark.dynamicAllocation.enabled=true" to spark-submit on the command line (as 
opposed to having one of them in spark-defaults.conf and one of them in the 
spark-submit args), but currently there doesn't seem to be any way to 
distinguish between arguments that were actually passed to spark-submit and 
settings that simply came from spark-defaults.conf.

If there were a way to distinguish them, I think the ideal situation would be 
for the validation exception to be thrown only if spark.executor.instances and 
spark.dynamicAllocation.enabled=true were both passed via spark-submit args or 
were both present in spark-defaults.conf, but passing 
spark.dynamicAllocation.enabled=true to spark-submit would take precedence over 
spark.executor.instances configured in spark-defaults.conf, and vice versa.

Jonathan Kelly
Elastic MapReduce - SDE
Blackfoot (SEA33) 06.850.F0



Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Kelly, Jonathan
bump

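The source-aware precedence Jonathan proposes earlier in the thread — raise only when the conflicting pair comes from the same source, and otherwise let an explicit spark-submit argument override spark-defaults.conf, and vice versa — could be modeled like this. Again a hypothetical Python sketch, not Spark's actual configuration machinery:

```python
# Hypothetical model of source-aware config resolution: settings from
# spark-defaults.conf and from spark-submit arguments are kept separate,
# and a conflict is only fatal when both halves come from the same source.

DEFAULTS, ARGS = "spark-defaults.conf", "spark-submit args"

def resolve(defaults, args):
    """Merge the two sources per the precedence proposed in the thread."""
    # Fatal only when both conflicting settings were given in ONE source.
    for source, conf in ((DEFAULTS, defaults), (ARGS, args)):
        if (conf.get("spark.dynamicAllocation.enabled") == "true"
                and "spark.executor.instances" in conf):
            raise ValueError(f"conflicting settings both given in {source}")

    merged = dict(defaults)
    merged.update(args)  # spark-submit args take precedence over defaults

    # Explicitly enabling dynamic allocation on the command line drops the
    # static executor count inherited from the defaults file.
    if args.get("spark.dynamicAllocation.enabled") == "true":
        merged.pop("spark.executor.instances", None)
    # And vice versa: an explicit executor count on the command line drops a
    # dynamic-allocation setting inherited from the defaults file.
    if "spark.executor.instances" in args:
        merged.pop("spark.dynamicAllocation.enabled", None)
    return merged
```

Under this model, "-c spark.dynamicAllocation.enabled=true" on the command line would cleanly win over spark.executor.instances in spark-defaults.conf, while passing both on the command line would still fail fast.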