[jira] [Commented] (SPARK-31555) Improve cache block migration

2020-05-18 Thread Dale Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110732#comment-17110732
 ] 

Dale Richardson commented on SPARK-31555:
-

Hi [~holden], happy to have a go at this.

> Improve cache block migration
> -
>
> Key: SPARK-31555
> URL: https://issues.apache.org/jira/browse/SPARK-31555
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Priority: Major
>
> We should explore the following improvements to cache block migration:
> 1) Peer selection (right now it may overbalance certain peers)
> 2) Do we need to configure the number of blocks to be migrated at the same 
> time?
> 3) Are there any blocks we don't need to replicate (e.g. they are already 
> stored on the desired number of executors even once we remove the executors 
> slated for decommissioning)?
> 4) Do we want to prioritize migrating blocks with no replicas?
> 5) Log the attempt number for debugging 
> 6) Clarify the logic for determining the number of replicas
> 7) Consider using TestUtils.waitUntilExecutorsUp in tests rather than counting 
> to wait for the executors to come up (see the sketch below). IMHO this is the 
> least important.
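
For item 7, a minimal sketch of what this could look like in a suite, assuming 
TestUtils.waitUntilExecutorsUp(sc, numExecutors, timeoutMs) is available on the 
test classpath as in recent Spark sources (the helper name ensureExecutors and 
the 60s timeout are just illustrative):

{code}
import org.apache.spark.{SparkContext, TestUtils}

// Instead of polling an executor count in a loop, block until the expected
// number of executors has registered, or fail the test after the timeout.
def ensureExecutors(sc: SparkContext, expected: Int): Unit = {
  TestUtils.waitUntilExecutorsUp(sc, expected, 60000L) // timeout in ms
}
{code}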






[jira] [Created] (SPARK-31520) Add a readiness probe to spark pod definitions

2020-04-22 Thread Dale Richardson (Jira)
Dale Richardson created SPARK-31520:
---

 Summary: Add a readiness probe to spark pod definitions
 Key: SPARK-31520
 URL: https://issues.apache.org/jira/browse/SPARK-31520
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.5
Reporter: Dale Richardson


Add a readiness probe so that Kubernetes can communicate basic Spark pod 
state to the end user via the get/describe pod commands.

A basic TCP/SYN probe of the RPC port should be enough to indicate that a Spark 
process is running.
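
A minimal sketch of the check such a probe performs, written as plain Scala for 
illustration (the helper name, host and port are assumptions; in practice this 
would be expressed as a tcpSocket readiness probe in the pod spec rather than 
in Spark code):

{code}
import java.net.{InetSocketAddress, Socket}

// What a TCP readiness probe effectively does: attempt a TCP handshake
// against the pod's RPC port and report success or failure.
def rpcPortOpen(host: String, port: Int, timeoutMs: Int = 1000): Boolean = {
  val socket = new Socket()
  try {
    socket.connect(new InetSocketAddress(host, port), timeoutMs)
    true
  } catch {
    case _: java.io.IOException => false
  } finally {
    socket.close()
  }
}

// e.g. rpcPortOpen("localhost", 7078) -- 7078 as the driver RPC port is an assumption
{code}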






[jira] [Comment Edited] (SPARK-31329) Modify executor monitor to allow for moving shuffle blocks

2020-04-21 Thread Dale Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088569#comment-17088569
 ] 

Dale Richardson edited comment on SPARK-31329 at 4/21/20, 11:36 AM:


Hi Holden,

I've been thinking about this issue as well.

Blocks can't move, but they can be replicated.  Any reason we can't just 
replicate the blocks out, allow the existing code paths to update the block 
locations with the block manager master, then unregister the current blocks?

We should chat and coordinate efforts.


was (Author: tigerquoll):
Hi Holden,

I've been thinking about this issue as well.

Blocks can't move, but they can be replicated.  Any reason we can't just 
replicate the blocks out, allow the existing code paths to update the block 
locations with the block manager master, then unregister the current blocks?

> Modify executor monitor to allow for moving shuffle blocks
> --
>
> Key: SPARK-31329
> URL: https://issues.apache.org/jira/browse/SPARK-31329
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Minor
>
> To enable Spark-20629 we need to revisit code that assumes shuffle blocks 
> don't move. Currently, the executor monitor assumes that shuffle blocks are 
> immovable. We should modify this code.






[jira] [Commented] (SPARK-31329) Modify executor monitor to allow for moving shuffle blocks

2020-04-21 Thread Dale Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088569#comment-17088569
 ] 

Dale Richardson commented on SPARK-31329:
-

Hi Holden,

I've been thinking about this issue as well.

Blocks can't move, but they can be replicated.  Any reason we can't just 
replicate the blocks out, allow the existing code paths to update the block 
locations with the block manager master, then unregister the current blocks?
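
A rough sketch of the sequence being proposed, with hypothetical helper names 
(the real BlockManager/BlockManagerMaster entry points are mostly private and 
differ from these):

{code}
// Illustrative pseudocode only -- these helpers are hypothetical stand-ins,
// not the actual BlockManager / BlockManagerMaster API.
type BlockId = String
def replicateToPeer(blockId: BlockId): Unit = ???     // hypothetical
def unregisterLocalCopy(blockId: BlockId): Unit = ??? // hypothetical

def migrateByReplication(blockId: BlockId): Unit = {
  replicateToPeer(blockId)      // 1. push a replica to a peer executor
  // 2. existing replication paths report the new location to the master
  unregisterLocalCopy(blockId)  // 3. drop and unregister the local copy
}
{code}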

> Modify executor monitor to allow for moving shuffle blocks
> --
>
> Key: SPARK-31329
> URL: https://issues.apache.org/jira/browse/SPARK-31329
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Minor
>
> To enable Spark-20629 we need to revisit code that assumes shuffle blocks 
> don't move. Currently, the executor monitor assumes that shuffle blocks are 
> immovable. We should modify this code.






[jira] [Commented] (SPARK-22387) propagate session configs to data source read/write options

2018-09-06 Thread Dale Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606527#comment-16606527
 ] 

Dale Richardson commented on SPARK-22387:
-

We've missed shared configs, such as those required for Kerberos.

> propagate session configs to data source read/write options
> ---
>
> Key: SPARK-22387
> URL: https://issues.apache.org/jira/browse/SPARK-22387
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Wenchen Fan
>Assignee: Jiang Xingbo
>Priority: Major
> Fix For: 2.3.0
>
>
> This is an open discussion. The general idea is that we should allow users to 
> set some common configs in the session conf so that they don't need to type 
> them again and again for each data source operation.
> Proposal 1:
> Propagate every session config which starts with {{spark.datasource.config.}} 
> to the data source options. The downside is that users may only want to set 
> some common configs for a specific data source.
> Proposal 2:
> Propagate session configs which start with 
> {{spark.datasource.config.myDataSource.}} only to {{myDataSource}} 
> operations. One downside is that some data sources may not have a short name, 
> which makes the config key pretty long, e.g. 
> {{spark.datasource.config.com.company.foo.bar.key1}}.
> Proposal 3:
> Introduce a trait `WithSessionConfig` which defines a session config key 
> prefix. We can then pick session configs with this key prefix and propagate 
> them to that particular data source.
> Another thing worth thinking about: it is really annoying if users have a 
> typo in a config key and spend a lot of time figuring out why things don't 
> work as expected. We should allow data sources to validate the given options 
> and throw an exception if an option can't be recognized.
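
A minimal sketch of the prefix-based propagation in proposal 2, assuming a 
hypothetical short name "myDataSource" (the helper below is illustrative, not 
the actual DataSourceV2 plumbing):

{code}
import org.apache.spark.sql.SparkSession

// Collect session configs under spark.datasource.config.myDataSource.* and
// strip the prefix so they can be merged into the read/write options.
def sessionOptionsFor(spark: SparkSession, shortName: String): Map[String, String] = {
  val prefix = s"spark.datasource.config.$shortName."
  spark.conf.getAll
    .filter { case (k, _) => k.startsWith(prefix) }
    .map { case (k, v) => k.stripPrefix(prefix) -> v }
}

// e.g. spark.conf.set("spark.datasource.config.myDataSource.user", "alice")
//      sessionOptionsFor(spark, "myDataSource")  // Map("user" -> "alice")
{code}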






[jira] [Created] (SPARK-25329) Support passing Kerberos configuration information

2018-09-03 Thread Dale Richardson (JIRA)
Dale Richardson created SPARK-25329:
---

 Summary: Support passing Kerberos configuration information
 Key: SPARK-25329
 URL: https://issues.apache.org/jira/browse/SPARK-25329
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.1
Reporter: Dale Richardson


The current V2 Datasource API provides support for querying a portion of the 
Spark configuration namespace (spark.datasource.*) via the SessionConfigSupport 
API. This was designed with the assumption that configuration information for 
each v2 data source should be kept separate from that of every other source.

Unfortunately, there are some cross-cutting concerns such as authentication 
that touch multiple data sources - this means that common configuration items 
need to be shared amongst multiple data sources.

In particular, Kerberos setup can use the following configuration items:
 # userPrincipal, spark configuration: spark.yarn.principal
 # userKeytabPath, spark configuration: spark.yarn.keytab
 # krb5ConfPath: java.security.krb5.conf
 # Kerberos debugging flag: sun.security.krb5.debug
 # spark.security.credentials.${service}.enabled
 # JAAS config: java.security.auth.login.config ??
 # ZKServerPrincipal ??

So potential solutions to pass this information to various data sources are:
 # Pass the entire SparkContext object to data sources (not likely)
 # Pass the entire SparkConfig Map object to data sources
 # Pass all required configuration via environment variables
 # Extend SessionConfigSupport to support passing specific white-listed 
configuration values
 # Add a specific data source v2 API "SupportsKerberos" so that a data source 
can indicate that it supports Kerberos and also provide the means to pass 
needed configuration info.
 # Expand out all Kerberos configuration items to be in each data source config 
namespace that needs it.

If the data source requires TLS support then we also need to support passing 
all the configuration values under "spark.ssl.*".
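
As an illustration of option 6 (expanding the Kerberos items into each data 
source's config namespace so the existing SessionConfigSupport path can pick 
them up), a hedged sketch -- the key list and the target prefix are 
assumptions, not an agreed design:

{code}
import org.apache.spark.sql.SparkSession

// Mirror well-known Kerberos settings into spark.datasource.<name>.* so a
// source whose SessionConfigSupport keyPrefix is <name> would see them as
// options. Key names and prefix are illustrative assumptions.
def mirrorKerberosConf(spark: SparkSession, sourceName: String): Unit = {
  val kerberosKeys = Seq("spark.yarn.principal", "spark.yarn.keytab")
  kerberosKeys.foreach { key =>
    spark.conf.getOption(key).foreach { value =>
      spark.conf.set(s"spark.datasource.$sourceName.${key.split('.').last}", value)
    }
  }
}
{code}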

 






[jira] [Comment Edited] (SPARK-21418) NoSuchElementException: None.get in DataSourceScanExec with sun.io.serialization.extendedDebugInfo=true

2017-09-13 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165448#comment-16165448
 ] 

Dale Richardson edited comment on SPARK-21418 at 9/13/17 11:18 PM:
---

I'm getting the same stackdump in 2.2.0 while using apache-avro (current 
snapshot) in structured streaming with a globbed hdfs input path.  I do not get 
this error if I use a non-globbed input path.

I should note that this is without setting 
sun.io.serialization.extendedDebugInfo at all.


was (Author: tigerquoll):
I'm getting the same stackdump in 2.2.0 while using apache-avro (current 
snapshot) in structured streaming with a globbed hdfs input path.  I do not get 
this error if I use a non-globbed input path.

> NoSuchElementException: None.get in DataSourceScanExec with 
> sun.io.serialization.extendedDebugInfo=true
> ---
>
> Key: SPARK-21418
> URL: https://issues.apache.org/jira/browse/SPARK-21418
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Daniel Darabos
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 2.2.1, 2.3.0
>
>
> I don't have a minimal reproducible example yet, sorry. I have the following 
> lines in a unit test for our Spark application:
> {code}
> val df = mySparkSession.read.format("jdbc")
>   .options(Map("url" -> url, "dbtable" -> "test_table"))
>   .load()
> df.show
> println(df.rdd.collect)
> {code}
> The output shows the DataFrame contents from {{df.show}}. But the {{collect}} 
> fails:
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 
> serialization failed: java.util.NoSuchElementException: None.get
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:

[jira] [Commented] (SPARK-21418) NoSuchElementException: None.get in DataSourceScanExec with sun.io.serialization.extendedDebugInfo=true

2017-09-13 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165448#comment-16165448
 ] 

Dale Richardson commented on SPARK-21418:
-

I'm getting the same stackdump in 2.2.0 while using apache-avro (current 
snapshot) in structured streaming with a globbed hdfs input path.  I do not get 
this error if I use a non-globbed input path.

> NoSuchElementException: None.get in DataSourceScanExec with 
> sun.io.serialization.extendedDebugInfo=true
> ---
>
> Key: SPARK-21418
> URL: https://issues.apache.org/jira/browse/SPARK-21418
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Daniel Darabos
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 2.2.1, 2.3.0
>
>
> I don't have a minimal reproducible example yet, sorry. I have the following 
> lines in a unit test for our Spark application:
> {code}
> val df = mySparkSession.read.format("jdbc")
>   .options(Map("url" -> url, "dbtable" -> "test_table"))
>   .load()
> df.show
> println(df.rdd.collect)
> {code}
> The output shows the DataFrame contents from {{df.show}}. But the {{collect}} 
> fails:
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 
> serialization failed: java.util.NoSuchElementException: None.get
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream

[jira] [Updated] (SPARK-6593) Provide option for HadoopRDD to skip corrupted files

2015-03-29 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6593:
---
Description: 
When reading a large number of gzip files from HDFS, e.g. with 
sc.textFile("hdfs:///user/cloudera/logs*.gz"), if the hadoop input libraries 
report an exception then the entire job is canceled. As the default behaviour 
this is probably for the best, but in some circumstances where you know it will 
be ok, it would be nice to have the option to skip the corrupted file and 
continue the job.
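
As a rough illustration of the requested behaviour, a per-file workaround 
sketch (the helper name is hypothetical, and forcing a small read per file is 
only a crude way to surface a corrupt gzip stream early; later Spark versions 
added spark.files.ignoreCorruptFiles / spark.sql.files.ignoreCorruptFiles for 
this):

{code}
import scala.util.Try
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Read each gzip file as its own RDD and drop the ones whose first read
// fails, instead of letting one corrupt file cancel the whole job.
def readSkippingCorrupt(sc: SparkContext, files: Seq[String]): RDD[String] = {
  val readable = files.flatMap { path =>
    val rdd = sc.textFile(path)
    // Force a small read so a corrupt stream surfaces here, not mid-job.
    Try(rdd.take(1)).toOption.map(_ => rdd)
  }
  if (readable.isEmpty) sc.emptyRDD[String] else sc.union(readable)
}
{code}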


  was:
When reading a large amount of files from HDFS eg. with  
sc.textFile("hdfs:///user/cloudera/logs*.gz"). If the hadoop input libraries 
report an exception then the entire job is canceled. As default behaviour this 
is probably for the best, but it would be nice in some circumstances where you 
know it will be ok to have the option to skip the corrupted file and continue 
the job. 



> Provide option for HadoopRDD to skip corrupted files
> 
>
> Key: SPARK-6593
> URL: https://issues.apache.org/jira/browse/SPARK-6593
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Dale Richardson
>Priority: Minor
>
> When reading a large number of gzip files from HDFS, e.g. with 
> sc.textFile("hdfs:///user/cloudera/logs*.gz"), if the hadoop input libraries 
> report an exception then the entire job is canceled. As the default behaviour 
> this is probably for the best, but in some circumstances where you know it 
> will be ok, it would be nice to have the option to skip the corrupted file 
> and continue the job.






[jira] [Updated] (SPARK-6593) Provide option for HadoopRDD to skip corrupted files

2015-03-29 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6593:
---
Description: 
When reading a large number of files from HDFS, e.g. with 
sc.textFile("hdfs:///user/cloudera/logs*.gz"), if the hadoop input libraries 
report an exception then the entire job is canceled. As the default behaviour 
this is probably for the best, but in some circumstances where you know it will 
be ok, it would be nice to have the option to skip the corrupted file and 
continue the job.


  was:
When reading a large amount of files from HDFS eg. with  
sc.textFile("hdfs:///user/cloudera/logs*.gz"). If the hadoop input libraries 
report an exception then the entire job is canceled. As default behaviour this 
is probably for the best, but it would be nice in some circumstances where you 
know it will be ok to have the option to skip the corrupted portion and 
continue the job. 



> Provide option for HadoopRDD to skip corrupted files
> 
>
> Key: SPARK-6593
> URL: https://issues.apache.org/jira/browse/SPARK-6593
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Dale Richardson
>Priority: Minor
>
> When reading a large number of files from HDFS, e.g. with 
> sc.textFile("hdfs:///user/cloudera/logs*.gz"), if the hadoop input libraries 
> report an exception then the entire job is canceled. As the default behaviour 
> this is probably for the best, but in some circumstances where you know it 
> will be ok, it would be nice to have the option to skip the corrupted file 
> and continue the job.






[jira] [Updated] (SPARK-6593) Provide option for HadoopRDD to skip corrupted files

2015-03-29 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6593:
---
Summary: Provide option for HadoopRDD to skip corrupted files  (was: 
Provide option for HadoopRDD to skip bad data splits.)

> Provide option for HadoopRDD to skip corrupted files
> 
>
> Key: SPARK-6593
> URL: https://issues.apache.org/jira/browse/SPARK-6593
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Dale Richardson
>Priority: Minor
>
> When reading a large number of files from HDFS, e.g. with 
> sc.textFile("hdfs:///user/cloudera/logs*.gz"), if the hadoop input libraries 
> report an exception then the entire job is canceled. As the default behaviour 
> this is probably for the best, but in some circumstances where you know it 
> will be ok, it would be nice to have the option to skip the corrupted portion 
> and continue the job.






[jira] [Commented] (SPARK-6593) Provide option for HadoopRDD to skip corrupted files

2015-03-29 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385723#comment-14385723
 ] 

Dale Richardson commented on SPARK-6593:


Changed the title and description to focus more closely on my particular use 
case, which is corrupted gzip files.

> Provide option for HadoopRDD to skip corrupted files
> 
>
> Key: SPARK-6593
> URL: https://issues.apache.org/jira/browse/SPARK-6593
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Dale Richardson
>Priority: Minor
>
> When reading a large number of files from HDFS, e.g. with 
> sc.textFile("hdfs:///user/cloudera/logs*.gz"), if the hadoop input libraries 
> report an exception then the entire job is canceled. As the default behaviour 
> this is probably for the best, but in some circumstances where you know it 
> will be ok, it would be nice to have the option to skip the corrupted portion 
> and continue the job.






[jira] [Updated] (SPARK-6593) Provide option for HadoopRDD to skip bad data splits.

2015-03-29 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6593:
---
Description: 
When reading a large number of files from HDFS, e.g. with 
sc.textFile("hdfs:///user/cloudera/logs*.gz"), if the hadoop input libraries 
report an exception then the entire job is canceled. As the default behaviour 
this is probably for the best, but in some circumstances where you know it will 
be ok, it would be nice to have the option to skip the corrupted portion and 
continue the job.


  was:
When reading a large amount of files from HDFS eg. with  
sc.textFile("hdfs:///user/cloudera/logs*.gz"). If a single split is corrupted 
then the entire job is canceled. As default behaviour this is probably for the 
best, but it would be nice in some circumstances where you know it will be ok 
to have the option to skip the corrupted portion and continue the job. 



> Provide option for HadoopRDD to skip bad data splits.
> -
>
> Key: SPARK-6593
> URL: https://issues.apache.org/jira/browse/SPARK-6593
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Dale Richardson
>Priority: Minor
>
> When reading a large number of files from HDFS, e.g. with 
> sc.textFile("hdfs:///user/cloudera/logs*.gz"), if the hadoop input libraries 
> report an exception then the entire job is canceled. As the default behaviour 
> this is probably for the best, but in some circumstances where you know it 
> will be ok, it would be nice to have the option to skip the corrupted portion 
> and continue the job.






[jira] [Comment Edited] (SPARK-6593) Provide option for HadoopRDD to skip bad data splits.

2015-03-29 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385716#comment-14385716
 ] 

Dale Richardson edited comment on SPARK-6593 at 3/29/15 11:35 AM:
--

With a gz file, for example, the entire file is a single split, so a corrupted 
gz file will kill the entire job, with no way of catching and remediating the 
error.


was (Author: tigerquoll):
With a gz file for example, the entire file is a split. so a corrupted gz file 
will kill the entire job.

> Provide option for HadoopRDD to skip bad data splits.
> -
>
> Key: SPARK-6593
> URL: https://issues.apache.org/jira/browse/SPARK-6593
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Dale Richardson
>Priority: Minor
>
> When reading a large number of files from HDFS, e.g. with 
> sc.textFile("hdfs:///user/cloudera/logs*.gz"), if a single split is corrupted 
> then the entire job is canceled. As the default behaviour this is probably 
> for the best, but in some circumstances where you know it will be ok, it 
> would be nice to have the option to skip the corrupted portion and continue 
> the job.






[jira] [Commented] (SPARK-6593) Provide option for HadoopRDD to skip bad data splits.

2015-03-29 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385716#comment-14385716
 ] 

Dale Richardson commented on SPARK-6593:


With a gz file, for example, the entire file is a single split, so a corrupted 
gz file will kill the entire job.

> Provide option for HadoopRDD to skip bad data splits.
> -
>
> Key: SPARK-6593
> URL: https://issues.apache.org/jira/browse/SPARK-6593
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Dale Richardson
>Priority: Minor
>
> When reading a large number of files from HDFS, e.g. with 
> sc.textFile("hdfs:///user/cloudera/logs*.gz"), if a single split is corrupted 
> then the entire job is canceled. As the default behaviour this is probably 
> for the best, but in some circumstances where you know it will be ok, it 
> would be nice to have the option to skip the corrupted portion and continue 
> the job.






[jira] [Created] (SPARK-6593) Provide option for HadoopRDD to skip bad data splits.

2015-03-29 Thread Dale Richardson (JIRA)
Dale Richardson created SPARK-6593:
--

 Summary: Provide option for HadoopRDD to skip bad data splits.
 Key: SPARK-6593
 URL: https://issues.apache.org/jira/browse/SPARK-6593
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.3.0
Reporter: Dale Richardson
Priority: Minor


When reading a large number of files from HDFS, e.g. with 
sc.textFile("hdfs:///user/cloudera/logs*.gz"), if a single split is corrupted 
then the entire job is canceled. As the default behaviour this is probably for 
the best, but in some circumstances where you know it will be ok, it would be 
nice to have the option to skip the corrupted portion and continue the job.







[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-21 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support for bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of scale, e.g. MB, GB, 
seconds, minutes, hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
numCores: Number of cores assigned to the JVM
physicalMemoryBytes: Memory size of hosting machine 
JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for flexibility in specifying time intervals or byte quantities in 
appropriate and easy-to-follow units, e.g. 1 week rather than 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (e.g. questions such as ‘is this particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick whatever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to system 
attributes, e.g.
--cores "numCores - 1"
spark.executor.memory = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


A safety feature has been added so that the expression evaluator is only used 
if the configuration string contains a magic character (currently '!') as its 
first character.
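
A minimal sketch of the unit-normalisation part of the proposal (the supported 
units come from the description above; the parser itself is illustrative, not 
the proposed implementation):

{code}
// Illustrative unit normalisation only -- not the proposed evaluator.
// Accepts strings such as "1.5 GB" or "1 week" and returns bytes / seconds.
val byteUnits = Map("kb" -> (1L << 10), "mb" -> (1L << 20), "gb" -> (1L << 30))
val timeUnits = Map("seconds" -> 1L, "minutes" -> 60L, "hours" -> 3600L,
  "days" -> 86400L, "weeks" -> 604800L)

def normalise(value: String): Option[Long] = {
  val Array(num, unit) = value.trim.split("\\s+", 2)
  val u = unit.toLowerCase
  val scale = byteUnits.get(u).orElse(timeUnits.get(u.stripSuffix("s") + "s"))
  scale.map(s => (num.toDouble * s).toLong)
}

// normalise("1 week") == Some(604800); normalise("1.5 GB") == Some(1610612736)
{code}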


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support for bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of scale eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
numCores: Number of cores assigned to the JVM
physicalMemoryBytes: Memory size of hosting machine 
JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
--cores "numCores - 1"
spark.executor.memory = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support for bracketed expressions 
> and standard precedence rules.
> * Support for and normalisation of common units of scale, e.g. MB, GB, 
> seconds, minutes, hours, days and weeks.
> * Allow for 

[jira] [Commented] (SPARK-6396) Add timeout control for broadcast

2015-03-18 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366849#comment-14366849
 ] 

Dale Richardson commented on SPARK-6396:


No problems.

> Add timeout control for broadcast
> -
>
> Key: SPARK-6396
> URL: https://issues.apache.org/jira/browse/SPARK-6396
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager, Spark Core
>Affects Versions: 1.3.0, 1.3.1
>Reporter: Jun Fang
>Priority: Minor
>
> TorrentBroadcast uses the fetchBlockSync method of BlockTransferService, 
> which calls Await.result(result.future, Duration.Inf). In a production 
> environment this may cause a hang when the driver and executors are in 
> different data centers. A timeout here would be better. 
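
A hedged sketch of the kind of change being suggested -- bounding the wait 
instead of using Duration.Inf (the method name and timeout value are 
illustrative, not the actual BlockTransferService code):

{code}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// On expiry, Await.result throws a java.util.concurrent.TimeoutException
// that the caller can handle or use to trigger a retry.
def fetchWithTimeout[T](result: Future[T], timeout: FiniteDuration = 120.seconds): T =
  Await.result(result, timeout)
{code}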






[jira] [Commented] (SPARK-6396) Add timeout control for broadcast

2015-03-18 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366799#comment-14366799
 ] 

Dale Richardson commented on SPARK-6396:


If nobody else is looking at this one I'll have a look at it.

> Add timeout control for broadcast
> -
>
> Key: SPARK-6396
> URL: https://issues.apache.org/jira/browse/SPARK-6396
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager, Spark Core
>Affects Versions: 1.3.0, 1.3.1
>Reporter: Jun Fang
>Priority: Minor
>
> TorrentBroadcast uses the fetchBlockSync method of BlockTransferService, 
> which calls Await.result(result.future, Duration.Inf). In a production 
> environment this may cause a hang when the driver and executors are in 
> different data centers. A timeout here would be better. 






[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support for bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of scale, e.g. MB, GB, 
seconds, minutes, hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
numCores: Number of cores assigned to the JVM
physicalMemoryBytes: Memory size of hosting machine 
JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for flexibility in specifying time intervals or byte quantities in 
appropriate and easy-to-follow units, e.g. 1 week rather than 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (e.g. questions such as ‘is this particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick whatever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to system 
attributes, e.g.
--cores "numCores - 1"
spark.executor.memory = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support for bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of scale eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
numCores: Number of cores assigned to the JVM
physicalMemoryBytes: Memory size of hosting machine 
JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support for bracketed expressions 
> and standard precedence rules.
> * Support for and normalisation of common units of scale, e.g. MB, GB, 
> seconds, minutes, hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> numCores: Number of cores assigned to the JVM
> physicalMemoryBytes: Memory size

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support for bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of scale, e.g. MB, GB, 
seconds, minutes, hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
numCores: Number of cores assigned to the JVM
physicalMemoryBytes: Memory size of hosting machine 
JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for flexibility in specifying time intervals or byte quantities in 
appropriate and easy-to-follow units, e.g. 1 week rather than 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (e.g. questions such as ‘is this particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick whatever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to system 
attributes, e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
numCores: Number of cores assigned to the JVM
physicalMemoryBytes: Memory size of hosting machine 
JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support for bracketed expressions 
> and standard precedence rules.
> * Support for and normalisation of common units of scale, e.g. MB, GB, 
> seconds, minutes, hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> numCores: Number of cores assigned to the JVM
> physicalMemoryBytes: Me

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support for bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference, e.g. MB, GB, 
seconds, minutes, hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
numCores: Number of cores assigned to the JVM
physicalMemoryBytes: Memory size of hosting machine 
JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for flexibility in specifying time intervals or byte quantities in 
appropriate and easy-to-follow units, e.g. 1 week rather than 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (e.g. questions such as ‘is this particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick whatever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to system 
attributes, e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine 
 **   JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

**  JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
** JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support for bracketed expressions 
> and standard precedence rules.
> * Support for and normalisation of common units of reference, e.g. MB, GB, 
> seconds, minutes, hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> numCores: Number of cores assigned to the JVM
> phys

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support for bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference, e.g. MB, GB, 
seconds, minutes, hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine 
 **   JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

**  JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
** JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for flexibility in specifying time intervals or byte quantities in 
appropriate and easy-to-follow units, e.g. 1 week rather than 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (e.g. questions such as ‘is this particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick whatever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to system 
attributes, e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine 
** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

**  JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
** JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support bracketed expressions and 
> standard precedence rules.
> * Support for and normalisation of common units of reference eg. MB, GB, 
> seconds,minutes,hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> ** numCores: Number of cores assigne

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine 
** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

**  JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
** JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine
** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

**  JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
** JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support bracketed expressions and 
> standard precedence rules.
> * Support for and normalisation of common units of reference eg. MB, GB, 
> seconds,minutes,hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> ** numCores: Number of cores assigned to

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine
** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

**  JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
** JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine
**  JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

**  JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
** JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support bracketed expressions and 
> standard precedence rules.
> * Support for and normalisation of common units of reference eg. MB, GB, 
> seconds,minutes,hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> ** numCores: Number of cores assigned to

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine
**  JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

**  JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
** JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine
** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM


** JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
**JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support bracketed expressions and 
> standard precedence rules.
> * Support for and normalisation of common units of reference eg. MB, GB, 
> seconds,minutes,hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> ** numCores: Number of cores assigned to 

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine
** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

** JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
**JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine
** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM


** JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM

**JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support bracketed expressions and 
> standard precedence rules.
> * Support for and normalisation of common units of reference eg. MB, GB, 
> seconds,minutes,hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> ** numCores: Number of cores assigned to th

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine
** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM


** JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
**JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine
** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM

** JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM
**JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support bracketed expressions and 
> standard precedence rules.
> * Support for and normalisation of common units of reference eg. MB, GB, 
> seconds,minutes,hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> ** numCores: Number of cores assigned to the

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine
** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM


** JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM

**JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine
** JVMtotalMemoryBytes: current bytes of memory allocated to the JVM

** JVMmaxMemoryBytes:Maximum number of bytes of memory available to the JVM
**JVMfreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support bracketed expressions and 
> standard precedence rules.
> * Support for and normalisation of common units of reference eg. MB, GB, 
> seconds,minutes,hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> ** numCores: Number of cores assigned to th

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine
** JVMtotalMemoryBytes: current bytes of memory allocated to the JVM

** JVMmaxMemoryBytes:Maximum number of bytes of memory available to the JVM
**JVMfreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine


** JVMtotalMemoryBytes:  current bytes of memory allocated to the JVM

** JVMmaxMemoryBytes:    Maximum number of bytes of memory available to the JVM
** JVMfreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support bracketed expressions and 
> standard precedence rules.
> * Support for and normalisation of common units of reference eg. MB, GB, 
> seconds,minutes,hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> ** numCores: Number of cores assigned to 

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM
** physicalMemoryBytes: Memory size of hosting machine


** JVMtotalMemoryBytes:  current bytes of memory allocated to the JVM

** JVMmaxMemoryBytes:    Maximum number of bytes of memory available to the JVM
** JVMfreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM

** physicalMemoryBytes:  Memory size of hosting machine

** JVMtotalMemoryBytes:  current bytes of memory allocated to the JVM

** JVMmaxMemoryBytes:    Maximum number of bytes of memory available to the JVM
** JVMfreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support bracketed expressions and 
> standard precedence rules.
> * Support for and normalisation of common units of reference eg. MB, GB, 
> seconds,minutes,hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> ** numCores: Number of cores assigne

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM

** physicalMemoryBytes:  Memory size of hosting machine

** JVMtotalMemoryBytes:  current bytes of memory allocated to the JVM

** JVMmaxMemoryBytes:    Maximum number of bytes of memory available to the JVM
** JVMfreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM

** physicalMemoryBytes:  Memory size of hosting machine

** JVMtotalMemoryBytes:  current bytes of memory allocated to the JVM

** JVMmaxMemoryBytes:    Maximum number of bytes of memory available to the JVM
** JVMfreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support bracketed expressions and 
> standard precedence rules.
> * Support for and normalisation of common units of reference eg. MB, GB, 
> seconds,minutes,hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> ** numCores: Number of cores assigne

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

* Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

* Allow for the referencing of basic environmental information currently 
defined as:
** numCores: Number of cores assigned to the JVM

** physicalMemoryBytes:  Memory size of hosting machine

** JVMtotalMemoryBytes:  current bytes of memory allocated to the JVM

** JVMmaxMemoryBytes:    Maximum number of bytes of memory available to the JVM
** JVMfreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

* Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
* Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

* Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

* Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

* Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

*Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

*Allow for the referencing of basic environmental information currently defined 
as:
**numCores: Number of cores assigned to the JVM

**physicalMemoryBytes:  Memory size of hosting machine
**JVMtotalMemoryBytes:  current bytes of memory allocated to the JVM
**JVMmaxMemoryBytes:    Maximum number of bytes of memory available to the JVM
**JVMfreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

*Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
*Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

*Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

*Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

*Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support bracketed expressions and 
> standard precedence rules.
> * Support for and normalisation of common units of reference eg. MB, GB, 
> seconds,minutes,hours, days and weeks.
> * Allow for the referencing of basic environmental information currently 
> defined as:
> ** numCores: Number of cores assigned to the JV

[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-6214:
---
Description: 
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

* Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

*Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

*Allow for the referencing of basic environmental information currently defined 
as:
**numCores: Number of cores assigned to the JVM

**physicalMemoryBytes:  Memory size of hosting machine
**JVMtotalMemoryBytes:  current bytes of memory allocated to the JVM
**JVMmaxMemoryBytes:    Maximum number of bytes of memory available to the JVM
**JVMfreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

*Allow for the limited referencing of other configuration values when 
specifying values. (Other configuration values must be initialised and 
explicitly passed into the expression evaluator for this functionality to be 
enabled).
 
Such a feature would have the following end-user benefits:
*Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

*Have a consistent means of entering configuration information regardless of 
the configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

*Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

*Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8


  was:
This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

Allow for basic arithmetic (+-/*) with support bracketed expressions and 
standard precedence rules.

Support for and normalisation of common units of reference eg. MB, GB, 
seconds,minutes,hours, days and weeks.

Allow for the referencing of basic environmental information currently defined 
as:
numCores: Number of cores assigned to the JVM
physicalMemoryBytes:  Memory size of hosting machine
JVMtotalMemoryBytes:  current bytes of memory allocated to the JVM
JVMmaxMemoryBytes:    Maximum number of bytes of memory available to the JVM
JVMfreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

Allow for the limited referencing of other configuration values when specifying 
values. (Other configuration values must be initialised and explicitly passed 
into the expression evaluator for this functionality to be enabled).
 
Such a feature would have the following end-user benefits:

Allow for the flexibility in specifying time intervals or byte quantities in 
appropriate and easy to follow units  e.g. 1 week rather rather then 604800 
seconds

Have a consistent means of entering configuration information regardless of the 
configuration option being added. (eg questions such as ‘is the particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick what ever unit makes sense for the magnitude of the value they are 
specifying)

Allow for the scaling of a configuration option in relation to a system 
attributes. e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

Being able to scale multiple configuration options together eg:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8



> Allow configuration options to use a simple expression language
> ---
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Dale Richardson
>Priority: Minor
>
> This is a proposal to allow for configuration options to be specified via a 
> simple expression language.  This language would have the following features:

> * Allow for basic arithmetic (+-/*) with support bracketed expressions and 
> standard precedence rules.
> *Support for and normalisation of common units of reference eg. MB, GB, 
> seconds,minutes,hours, days and weeks.
> *Allow for the referencing of basic environmental information currently 
> defined as:
> **numCores: Number of cores assigned to the JVM

> **physicalMemoryBytes:  Memo

[jira] [Created] (SPARK-6214) Allow configuration options to use a simple expression language

2015-03-07 Thread Dale Richardson (JIRA)
Dale Richardson created SPARK-6214:
--

 Summary: Allow configuration options to use a simple expression 
language
 Key: SPARK-6214
 URL: https://issues.apache.org/jira/browse/SPARK-6214
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Dale Richardson
Priority: Minor


This is a proposal to allow for configuration options to be specified via a 
simple expression language.  This language would have the following features:

Allow for basic arithmetic (+-/*) with support for bracketed expressions and 
standard precedence rules.

Support for and normalisation of common units of reference, e.g. MB, GB, 
seconds, minutes, hours, days and weeks.

Allow for the referencing of basic environmental information currently defined 
as:
numCores: Number of cores assigned to the JVM
physicalMemoryBytes:  Memory size of hosting machine
JVMtotalMemoryBytes:  current bytes of memory allocated to the JVM
JVMmaxMemoryBytes:    Maximum number of bytes of memory available to the JVM
JVMfreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes

Allow for the limited referencing of other configuration values when specifying 
values. (Other configuration values must be initialised and explicitly passed 
into the expression evaluator for this functionality to be enabled).
 
Such a feature would have the following end-user benefits:

Allow for flexibility in specifying time intervals or byte quantities in 
appropriate and easy-to-follow units, e.g. 1 week rather than 604800 
seconds.

Have a consistent means of entering configuration information regardless of the 
configuration option being added (e.g. questions such as ‘is this particular 
option specified in ms or seconds?’ become irrelevant, because the user can 
pick whatever unit makes sense for the magnitude of the value they are 
specifying).

Allow for the scaling of a configuration option in relation to system 
attributes, e.g.:
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

Being able to scale multiple configuration options together, e.g.:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5388) Provide a stable application submission gateway in standalone cluster mode

2015-02-05 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306899#comment-14306899
 ] 

Dale Richardson commented on SPARK-5388:


Hey Andrew, the protocol is definitely starting to look a bit more REST-like!

HTTP DELETE should be used for your kill request - it is considered best 
practice.

The primary resource you are dealing with is a submission - this should form 
the base of your URL structure.
For a REST protocol, actions/verbs are used to affect these resources - so they 
are mapped to the HTTP operations of GET/POST/DELETE/HEAD/OPTIONS etc., 
against the resources defined by the full URL.

Full URLs serve to identify the resources that these actions are performed on. 
GET/DELETE are used where the full identity of the resource is known at the 
time of generating the request; POST is used when you may not know the address 
of the resource at the time of generating the request (e.g. when submitting a 
program to run, you will not know the submission id because it is returned by 
the request).

So, taking this into account:
RequestSubmitDriver → POST /submission/create
RequestKillDriver → DELETE /submission/[submissionId]
RequestDriverStatus → GET /submission/[submissionId]/status  - The resource is 
the submission, so the current status of the submission is a sub-resource of 
the submission; other sub-entries such as 
/submission/[submissionId]/performanceCounters 
could be added in the future without affecting existing clients.
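
Purely to make that mapping concrete, a rough client-side sketch follows. The 
host and port (master-host:6066), the example submission id (driver-001) and 
the JSON payload are placeholder assumptions for illustration, not part of any 
existing Spark API.

{code:scala}
// Rough client-side sketch of the verb/resource mapping above.
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

object SubmissionClientSketch {
  def request(method: String, path: String, body: Option[String] = None): Int = {
    val conn = new URL(s"http://master-host:6066$path")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod(method)
    body.foreach { json =>
      conn.setDoOutput(true)
      conn.setRequestProperty("Content-Type", "application/json")
      conn.getOutputStream.write(json.getBytes(StandardCharsets.UTF_8))
    }
    conn.getResponseCode   // e.g. 200 on success, 404 for an unknown submission
  }

  def main(args: Array[String]): Unit = {
    // POST: create a submission (the response body would carry the new id)
    request("POST", "/submission/create", Some("""{"appResource": "..."}"""))
    // GET: status is a sub-resource of the submission
    request("GET", "/submission/driver-001/status")
    // DELETE: kill the submission
    request("DELETE", "/submission/driver-001")
  }
}
{code}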




> Provide a stable application submission gateway in standalone cluster mode
> --
>
> Key: SPARK-5388
> URL: https://issues.apache.org/jira/browse/SPARK-5388
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Attachments: stable-spark-submit-in-standalone-mode-2-4-15.pdf
>
>
> The existing submission gateway in standalone mode is not compatible across 
> Spark versions. If you have a newer version of Spark submitting to an older 
> version of the standalone Master, it is currently not guaranteed to work. The 
> goal is to provide a stable REST interface to replace this channel.
> For more detail, please see the most recent design doc attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5388) Provide a stable application submission gateway

2015-01-25 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290847#comment-14290847
 ] 

Dale Richardson edited comment on SPARK-5388 at 1/26/15 5:39 AM:
-

Hi Andrew,
I think the idea is well worth considering. 

I have a question if there is an intention for other entities (such as job 
servers) to communicate with the master at all? If so then the proposed gateway 
is semantically defined at a fairly low level (just RPC over JSON/HTTP). This 
is fine if the interface is not going to be exposed to anybody who is not a 
spark developer with detailed knowledge of spark internals. Did you use the 
term “REST” to simply mean RPC over JSON/HTTP?

Creating a REST interface is more than an HTTP RPC gateway. If the interface is 
going to be exposed to 3rd parties (such as developers of Job servers and web 
notebooks etc) then there is a benefit to simplifying some of the exposed 
application semantics, and exposing an API that is more integrated with HTTP’s 
protocol semantics which most people are already familiar with - this is what a 
true REST interface does and if you are defining an endpoint for others to use 
it is a very powerful concept that allows other people to quickly grasp how to 
properly use the exposed interface.

A rough sketch of a more “REST”ed version of the API would be:

*Submit_driver_request*
HTTP POST JSON body of request http://host:port/SparkMaster?SubmitDriver
Responds with standard HTTP Response including allocated DRIVER_ID if driver 
submission ok, http error codes with spark specific error if not.

*Get status of DRIVER*
HTTP GET http://host:port/SparkMaster/Drivers/
Responds with JSON body containing information on driver execution.  If no 
record of driver_id, then http error code 404 (Not found) returned.

*Kill Driver request*
HTTP DELETE http://host:port/SparkMaster/Drivers/
Responds with JSON body containing information on driver kill request, or http 
error code if an error occurs.

I would be happy to prototype something like this up to test the concept out 
for you if you are looking for something more than just RPC over JSON/HTTP.




was (Author: tigerquoll):
Hi Andrew,
I think the idea is well worth considering. In response to the requirement of 
making it easier for client<-> master communications to pass through 
restrictive fire-walling, have you considered just using Akka's REST gateway ( 
http://doc.akka.io/docs/akka/1.3.1/scala/http.html )?

I also have a question if there is an intention for other entities (such as job 
servers) to communicate with the master at all? If so then the proposed gateway 
is semantically defined at a fairly low level (just RPC over JSON/HTTP). This 
is fine if the interface is not going to be exposed to anybody who is not a 
spark developer with detailed knowledge of spark internals. Did you use the 
term “REST” to simply mean RPC over JSON/HTTP?

Creating a REST interface is more than an HTTP RPC gateway. If the interface is 
going to be exposed to 3rd parties (such as developers of Job servers and web 
notebooks etc) then there is a benefit to simplifying some of the exposed 
application semantics, and exposing an API that is more integrated with HTTP’s 
protocol semantics which most people are already familiar with - this is what a 
true REST interface does and if you are defining an endpoint for others to use 
it is a very powerful concept that allows other people to quickly grasp how to 
properly use the exposed interface.

A rough sketch of a more “REST”ed version of the API would be:

*Submit_driver_request*
HTTP POST JSON body of request http://host:port/SparkMaster?SubmitDriver
Responds with standard HTTP Response including allocated DRIVER_ID if driver 
submission ok, http error codes with spark specific error if not.

*Get status of DRIVER*
HTTP GET http://host:port/SparkMaster/Drivers/
Responds with JSON body containing information on driver execution.  If no 
record of driver_id, then http error code 404 (Not found) returned.

*Kill Driver request*
HTTP DELETE http://host:port/SparkMaster/Drivers/
Responds with JSON body containing information on driver kill request, or http 
error code if an error occurs.

I would be happy to prototype something like this up to test the concept out 
for you if you are looking for something more than just RPC over JSON/HTTP.



> Provide a stable application submission gateway
> ---
>
> Key: SPARK-5388
> URL: https://issues.apache.org/jira/browse/SPARK-5388
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Attachments: Stable Spark Standalone Submission.pdf
>
>
> The existing submission gateway 

[jira] [Comment Edited] (SPARK-5388) Provide a stable application submission gateway

2015-01-25 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290847#comment-14290847
 ] 

Dale Richardson edited comment on SPARK-5388 at 1/26/15 5:30 AM:
-

Hi Andrew,
I think the idea is well worth considering. In response to the requirement of 
making it easier for client<-> master communications to pass through 
restrictive fire-walling, have you considered just using Akka's REST gateway ( 
http://doc.akka.io/docs/akka/1.3.1/scala/http.html )?

I also have a question if there is an intention for other entities (such as job 
servers) to communicate with the master at all? If so then the proposed gateway 
is semantically defined at a fairly low level (just RPC over JSON/HTTP). This 
is fine if the interface is not going to be exposed to anybody who is not a 
spark developer with detailed knowledge of spark internals. Did you use the 
term “REST” to simply mean RPC over JSON/HTTP?

Creating a REST interface is more than an HTTP RPC gateway. If the interface is 
going to be exposed to 3rd parties (such as developers of Job servers and web 
notebooks etc) then there is a benefit to simplifying some of the exposed 
application semantics, and exposing an API that is more integrated with HTTP’s 
protocol semantics which most people are already familiar with - this is what a 
true REST interface does and if you are defining an endpoint for others to use 
it is a very powerful concept that allows other people to quickly grasp how to 
properly use the exposed interface.

A rough sketch of a more “REST”ed version of the API would be:

*Submit_driver_request*
HTTP POST JSON body of request http://host:port/SparkMaster?SubmitDriver
Responds with standard HTTP Response including allocated DRIVER_ID if driver 
submission ok, http error codes with spark specific error if not.

*Get status of DRIVER*
HTTP GET http://host:port/SparkMaster/Drivers/
Responds with JSON body containing information on driver execution.  If no 
record of driver_id, then http error code 404 (Not found) returned.

*Kill Driver request*
HTTP DELETE http://host:port/SparkMaster/Drivers/
Responds with JSON body containing information on driver kill request, or http 
error code if an error occurs.

I would be happy to prototype something like this up to test the concept out 
for you if you are looking for something more than just RPC over JSON/HTTP.




was (Author: tigerquoll):
Hi Andrew,
I think the idea is well worth considering. In response to the requirement of 
making it easier for client<-> master communications to pass through 
restrictive fire-walling, have you considered just using Akka's REST gateway 
(http://doc.akka.io/docs/akka/1.3.1/scala/http.html)?

I also have a question if there is an intention for other entities (such as job 
servers) to communicate with the master at all? If so then the proposed gateway 
is semantically defined at a fairly low level (just RPC over JSON/HTTP). This 
is fine if the interface is not going to be exposed to anybody who is not a 
spark developer with detailed knowledge of spark internals. Did you use the 
term “REST” to simply mean RPC over JSON/HTTP?

Creating a REST interface is more than an HTTP RPC gateway. If the interface is 
going to be exposed to 3rd parties (such as developers of Job servers and web 
notebooks etc) then there is a benefit to simplifying some of the exposed 
application semantics, and exposing an API that is more integrated with HTTP’s 
protocol semantics which most people are already familiar with - this is what a 
true REST interface does and if you are defining an endpoint for others to use 
it is a very powerful concept that allows other people to quickly grasp how to 
properly use the exposed interface.

A rough sketch of a more “REST”ed version of the API would be:

*Submit_driver_request*
HTTP POST JSON body of request http://host:port/SparkMaster?SubmitDriver
Responds with standard HTTP Response including allocated DRIVER_ID if driver 
submission ok, http error codes with spark specific error if not.

*Get status of DRIVER*
HTTP GET http://host:port/SparkMaster/Drivers/
Responds with JSON body containing information on driver execution.  If no 
record of driver_id, then http error code 404 (Not found) returned.

*Kill Driver request*
HTTP DELETE http://host:port/SparkMaster/Drivers/
Responds with JSON body containing information on driver kill request, or http 
error code if an error occurs.

I would be happy to prototype something like this up to test the concept out 
for you if you are looking for something more than just RPC over JSON/HTTP.



> Provide a stable application submission gateway
> ---
>
> Key: SPARK-5388
> URL: https://issues.apache.org/jira/browse/SPARK-5388
> Project: Spark
>  Issue Type: Bug
>  Componen

[jira] [Commented] (SPARK-5388) Provide a stable application submission gateway

2015-01-24 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290847#comment-14290847
 ] 

Dale Richardson commented on SPARK-5388:


Hi Andrew,
I think the idea is well worth considering. In response to the requirement of 
making it easier for client<-> master communications to pass through 
restrictive fire-walling, have you considered just using Akka's REST gateway 
(http://doc.akka.io/docs/akka/1.3.1/scala/http.html)?

I also have a question if there is an intention for other entities (such as job 
servers) to communicate with the master at all? If so then the proposed gateway 
is semantically defined at a fairly low level (just RPC over JSON/HTTP). This 
is fine if the interface is not going to be exposed to anybody who is not a 
spark developer with detailed knowledge of spark internals. Did you use the 
term “REST” to simply mean RPC over JSON/HTTP?

Creating a REST interface is more than an HTTP RPC gateway. If the interface is 
going to be exposed to 3rd parties (such as developers of Job servers and web 
notebooks etc) then there is a benefit to simplifying some of the exposed 
application semantics, and exposing an API that is more integrated with HTTP’s 
protocol semantics which most people are already familiar with - this is what a 
true REST interface does and if you are defining an endpoint for others to use 
it is a very powerful concept that allows other people to quickly grasp how to 
properly use the exposed interface.

A rough sketch of a more “REST”ed version of the API would be:

*Submit_driver_request*
HTTP POST JSON body of request http://host:port/SparkMaster?SubmitDriver
Responds with standard HTTP Response including allocated DRIVER_ID if driver 
submission ok, http error codes with spark specific error if not.

*Get status of DRIVER*
HTTP GET http://host:port/SparkMaster/Drivers/
Responds with JSON body containing information on driver execution.  If no 
record of driver_id, then http error code 404 (Not found) returned.

*Kill Driver request*
HTTP DELETE http://host:port/SparkMaster/Drivers/
Responds with JSON body containing information on driver kill request, or http 
error code if an error occurs.

I would be happy to prototype something like this up to test the concept out 
for you if you are looking for something more than just RPC over JSON/HTTP.



> Provide a stable application submission gateway
> ---
>
> Key: SPARK-5388
> URL: https://issues.apache.org/jira/browse/SPARK-5388
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Attachments: Stable Spark Standalone Submission.pdf
>
>
> The existing submission gateway in standalone mode is not compatible across 
> Spark versions. If you have a newer version of Spark submitting to an older 
> version of the standalone Master, it is currently not guaranteed to work. The 
> goal is to provide a stable REST interface to replace this channel.
> The first cut implementation will target standalone cluster mode because 
> there are very few messages exchanged. The design, however, will be general 
> enough to eventually support this for other cluster managers too. Note that 
> this is not necessarily required in YARN because we already use YARN's stable 
> interface to submit applications there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4787) Resource unreleased during failure in SparkContext initialization

2014-12-25 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258974#comment-14258974
 ] 

Dale Richardson commented on SPARK-4787:


Pretty simple fix - Can somebody assign this to me?

> Resource unreleased during failure in SparkContext initialization
> -
>
> Key: SPARK-4787
> URL: https://issues.apache.org/jira/browse/SPARK-4787
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Jacky Li
> Fix For: 1.3.0
>
>
> When a client creates a SparkContext, there are currently many vals to 
> initialize during object initialization. But when there is a failure 
> initializing these vals, like throwing an exception, the resources in this 
> SparkContext are not released properly. 
> For example, the SparkUI object is created and bound to the HTTP server during 
> initialization using
> {{ui.foreach(_.bind())}}
> but if anything goes wrong after this code (say throwing an exception when 
> creating the DAGScheduler), the SparkUI server is not stopped, thus the port 
> bind will fail again in the client when creating another SparkContext. So 
> basically this leads to a situation where the client cannot create another 
> SparkContext in the same process, which I think is not reasonable.
> So, I suggest refactoring the SparkContext code to release resources when 
> there is a failure during initialization.
>  
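
For what it's worth, a minimal sketch of the kind of cleanup being suggested: 
track everything that has been started and undo it if a later initialization 
step throws. The classes below are placeholders standing in for SparkUI, 
DAGScheduler, etc., not the actual SparkContext internals.

{code:scala}
// Sketch of the cleanup pattern the description asks for.
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

final class FakeServer(name: String) {
  def stop(): Unit = println(s"stopped $name")
}

object SafeInit {
  def initialize(): (FakeServer, FakeServer) = {
    val started = ArrayBuffer.empty[FakeServer]

    def start(name: String, failing: Boolean = false): FakeServer = {
      if (failing) throw new RuntimeException(s"could not start $name")
      val s = new FakeServer(name)
      started += s                 // remember it so it can be undone on failure
      s
    }

    try {
      val ui = start("SparkUI")                             // analogous to ui.foreach(_.bind())
      val scheduler = start("DAGScheduler", failing = true) // a later step throws...
      (ui, scheduler)
    } catch {
      case NonFatal(e) =>
        // ...so everything started so far is stopped before rethrowing,
        // leaving no bound ports behind for the next SparkContext.
        started.reverse.foreach(s => try s.stop() catch { case NonFatal(_) => () })
        throw e
    }
  }
}
{code}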



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3620) Refactor config option handling code for spark-submit

2014-09-22 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144095#comment-14144095
 ] 

Dale Richardson commented on SPARK-3620:


Due to typesafe conf being based on a JSON-like tree structure of config 
values, it will never support config variables with non-unique prefixes. So 
I've gone back to using property objects.
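
To make that limitation concrete, a small illustrative snippet (it assumes the 
com.typesafe:config library on the classpath; the behaviour noted in the 
comments is my expectation of the HOCON merge rules, not verified against every 
version): in the tree model a path cannot be both a leaf value and an object, 
so a pair like spark.speculation / spark.speculation.interval collides.

{code:scala}
// Illustrative only: the non-unique-prefix problem in Typesafe Config's tree model.
import com.typesafe.config.ConfigFactory

object PrefixClash extends App {
  val conf = ConfigFactory.parseString(
    "spark.speculation = true\nspark.speculation.interval = 100")

  println(conf.getInt("spark.speculation.interval"))  // 100
  // The second assignment turned spark.speculation into an object, so the
  // original boolean is no longer readable as a value - expected to throw
  // a ConfigException here rather than return true.
  println(conf.getBoolean("spark.speculation"))
}
{code}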

> Refactor config option handling code for spark-submit
> -
>
> Key: SPARK-3620
> URL: https://issues.apache.org/jira/browse/SPARK-3620
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Dale Richardson
>Assignee: Dale Richardson
>Priority: Minor
>
> I'm proposing it's time to refactor the configuration argument handling code 
> in spark-submit. The code has grown organically in a short period of time, 
> handles a pretty complicated logic flow, and is now pretty fragile. Some 
> issues that have been identified:
> 1. Hand-crafted property file readers that do not support the property file 
> format as specified in 
> http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
> 2. ResolveURI not called on paths read from conf/prop files
> 3. inconsistent means of merging / overriding values from different sources 
> (Some get overridden by file, others by manual settings of field on object, 
> Some by properties)
> 4. Argument validation should be done after combining config files, system 
> properties and command line arguments, 
> 5. Alternate conf file location not handled in shell scripts
> 6. Some options can only be passed as command line arguments
> 7. Defaults for options are hard-coded (and sometimes overridden multiple 
> times) in many places throughout the code, e.g. master = local[*]
> Initial proposal is to use typesafe conf to read in the config information 
> and merge the various config sources
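
A rough sketch of what that merge step could look like with Typesafe Config, 
assuming a precedence of command line > system properties > user conf file > 
built-in defaults; the file and resource names below are placeholders, and this 
is not the actual spark-submit code.

{code:scala}
// Sketch of merging config sources with Typesafe Config; earlier configs win
// and withFallback only fills in keys that are still missing.
import java.io.File
import com.typesafe.config.{Config, ConfigFactory}

object ConfMerge {
  def mergedConf(cmdLineOverrides: java.util.Map[String, String]): Config = {
    val commandLine = ConfigFactory.parseMap(cmdLineOverrides)
    val systemProps = ConfigFactory.systemProperties()
    val userFile    = ConfigFactory.parseFile(new File("conf/spark-defaults.conf"))
    val defaults    = ConfigFactory.parseResources("spark-reference.conf")

    commandLine
      .withFallback(systemProps)
      .withFallback(userFile)
      .withFallback(defaults)
      .resolve()
  }
}
{code}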



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3620) Refactor config option handling code for spark-submit

2014-09-21 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-3620:
---
Description: 
I'm proposing it's time to refactor the configuration argument handling code in 
spark-submit. The code has grown organically in a short period of time, handles 
a pretty complicated logic flow, and is now pretty fragile. Some issues that 
have been identified:

1. Hand-crafted property file readers that do not support the property file 
format as specified in 
http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)

2. ResolveURI not called on paths read from conf/prop files

3. inconsistent means of merging / overriding values from different sources 
(Some get overridden by file, others by manual settings of field on object, 
Some by properties)

4. Argument validation should be done after combining config files, system 
properties and command line arguments, 

5. Alternate conf file location not handled in shell scripts

6. Some options can only be passed as command line arguments

7. Defaults for options are hard-coded (and sometimes overridden multiple 
times) in many places throughout the code, e.g. master = local[*]

Initial proposal is to use typesafe conf to read in the config information and 
merge the various config sources

  was:
I'm proposing it's time to refactor the configuration argument handling code in 
spark-submit. The code has grown organically in a short period of time, handles 
a pretty complicated logic flow, and is now pretty fragile. Some issues that 
have been identified:

1. Hand-crafted property file readers that do not support the property file 
format as specified in 
http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)

2. ResolveURI not called on paths read from conf/prop files

3. inconsistent means of merging / overriding values from different sources 
(Some get overridden by file, others by manual settings of field on object, 
Some by properties)

4. Argument validation should be done after combining config files, system 
properties and command line arguments, 

5. Alternate conf file location not handled in shell scripts

6. Some options can only be passed as command line arguments

7. Defaults for options are hard-coded (and sometimes overridden multiple 
times) in many places throughout the code, e.g. master = local[*]


> Refactor config option handling code for spark-submit
> -
>
> Key: SPARK-3620
> URL: https://issues.apache.org/jira/browse/SPARK-3620
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Dale Richardson
>Assignee: Dale Richardson
>Priority: Minor
>
> I'm proposing it's time to refactor the configuration argument handling code 
> in spark-submit. The code has grown organically in a short period of time, 
> handles a pretty complicated logic flow, and is now pretty fragile. Some 
> issues that have been identified:
> 1. Hand-crafted property file readers that do not support the property file 
> format as specified in 
> http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
> 2. ResolveURI not called on paths read from conf/prop files
> 3. inconsistent means of merging / overriding values from different sources 
> (Some get overridden by file, others by manual settings of field on object, 
> Some by properties)
> 4. Argument validation should be done after combining config files, system 
> properties and command line arguments, 
> 5. Alternate conf file location not handled in shell scripts
> 6. Some options can only be passed as command line arguments
> 7. Defaults for options are hard-coded (and sometimes overridden multiple 
> times) in many places throughout the code, e.g. master = local[*]
> Initial proposal is to use typesafe conf to read in the config information 
> and merge the various config sources



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3620) Refactor config option handling code for spark-submit

2014-09-21 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142394#comment-14142394
 ] 

Dale Richardson edited comment on SPARK-3620 at 9/21/14 9:30 AM:
-

Seems to be discussion about moving to typesafe config and back again for 
version 0.9
http://apache-spark-developers-list.1001551.n3.nabble.com/Moving-to-Typesafe-Config-td381.html


was (Author: tigerquoll):
Seems to be discussion about moving to typesafe config and back again at
http://apache-spark-developers-list.1001551.n3.nabble.com/Moving-to-Typesafe-Config-td381.html

> Refactor config option handling code for spark-submit
> -
>
> Key: SPARK-3620
> URL: https://issues.apache.org/jira/browse/SPARK-3620
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Dale Richardson
>Assignee: Dale Richardson
>Priority: Minor
>
> I'm proposing it's time to refactor the configuration argument handling code 
> in spark-submit. The code has grown organically in a short period of time, 
> handles a pretty complicated logic flow, and is now pretty fragile. Some 
> issues that have been identified:
> 1. Hand-crafted property file readers that do not support the property file 
> format as specified in 
> http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
> 2. ResolveURI not called on paths read from conf/prop files
> 3. inconsistent means of merging / overriding values from different sources 
> (Some get overridden by file, others by manual settings of field on object, 
> Some by properties)
> 4. Argument validation should be done after combining config files, system 
> properties and command line arguments, 
> 5. Alternate conf file location not handled in shell scripts
> 6. Some options can only be passed as command line arguments
> 7. Defaults for options are hard-coded (and sometimes overridden multiple 
> times) in many places throughout the code, e.g. master = local[*]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3620) Refactor config option handling code for spark-submit

2014-09-21 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142395#comment-14142395
 ] 

Dale Richardson commented on SPARK-3620:


Also some notes about config properties that do not have unique prefixes at
http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html

It seems the following options have non-unique prefixes, which means that some 
typesafe conf functionality may be broken
spark.locality.wait 
spark.locality.wait.node 
spark.locality.wait.process 
spark.locality.wait.rack 

spark.speculation 
spark.speculation.interval 
spark.speculation.multiplier 
spark.speculation.quantile 

> Refactor config option handling code for spark-submit
> -
>
> Key: SPARK-3620
> URL: https://issues.apache.org/jira/browse/SPARK-3620
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Dale Richardson
>Assignee: Dale Richardson
>Priority: Minor
>
> I'm proposing it's time to refactor the configuration argument handling code 
> in spark-submit. The code has grown organically in a short period of time, 
> handles a pretty complicated logic flow, and is now pretty fragile. Some 
> issues that have been identified:
> 1. Hand-crafted property file readers that do not support the property file 
> format as specified in 
> http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
> 2. ResolveURI not called on paths read from conf/prop files
> 3. inconsistent means of merging / overriding values from different sources 
> (Some get overridden by file, others by manual settings of field on object, 
> Some by properties)
> 4. Argument validation should be done after combining config files, system 
> properties and command line arguments, 
> 5. Alternate conf file location not handled in shell scripts
> 6. Some options can only be passed as command line arguments
> 7. Defaults for options are hard-coded (and sometimes overridden multiple 
> times) in many places throughout the code, e.g. master = local[*]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3620) Refactor config option handling code for spark-submit

2014-09-21 Thread Dale Richardson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142394#comment-14142394
 ] 

Dale Richardson commented on SPARK-3620:


Seems to be discussion about moving to typesafe config and back again at
http://apache-spark-developers-list.1001551.n3.nabble.com/Moving-to-Typesafe-Config-td381.html

> Refactor config option handling code for spark-submit
> -
>
> Key: SPARK-3620
> URL: https://issues.apache.org/jira/browse/SPARK-3620
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Dale Richardson
>Assignee: Dale Richardson
>Priority: Minor
>
> I'm proposing it's time to refactor the configuration argument handling code 
> in spark-submit. The code has grown organically in a short period of time, 
> handles a pretty complicated logic flow, and is now pretty fragile. Some 
> issues that have been identified:
> 1. Hand-crafted property file readers that do not support the property file 
> format as specified in 
> http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
> 2. ResolveURI not called on paths read from conf/prop files
> 3. inconsistent means of merging / overriding values from different sources 
> (Some get overridden by file, others by manual settings of field on object, 
> Some by properties)
> 4. Argument validation should be done after combining config files, system 
> properties and command line arguments, 
> 5. Alternate conf file location not handled in shell scripts
> 6. Some options can only be passed as command line arguments
> 7. Defaults for options are hard-coded (and sometimes overridden multiple 
> times) in many places throughout the code, e.g. master = local[*]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3620) Refactor config option handling code for spark-submit

2014-09-20 Thread Dale Richardson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Richardson updated SPARK-3620:
---
Summary: Refactor config option handling code for spark-submit  (was: 
Refactor parameter handling code for spark-submit)

> Refactor config option handling code for spark-submit
> -
>
> Key: SPARK-3620
> URL: https://issues.apache.org/jira/browse/SPARK-3620
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Dale Richardson
>Assignee: Dale Richardson
>Priority: Minor
>
> I'm proposing it's time to refactor the configuration argument handling code 
> in spark-submit. The code has grown organically in a short period of time, 
> handles a pretty complicated logic flow, and is now pretty fragile. Some 
> issues that have been identified:
> 1. Hand-crafted property file readers that do not support the property file 
> format as specified in 
> http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
> 2. ResolveURI not called on paths read from conf/prop files
> 3. inconsistent means of merging / overriding values from different sources 
> (Some get overridden by file, others by manual settings of field on object, 
> Some by properties)
> 4. Argument validation should be done after combining config files, system 
> properties and command line arguments, 
> 5. Alternate conf file location not handled in shell scripts
> 6. Some options can only be passed as command line arguments
> 7. Defaults for options are hard-coded (and sometimes overridden multiple 
> times) in many places throughout the code, e.g. master = local[*]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3620) Refactor parameter handling code for spark-submit

2014-09-20 Thread Dale Richardson (JIRA)
Dale Richardson created SPARK-3620:
--

 Summary: Refactor parameter handling code for spark-submit
 Key: SPARK-3620
 URL: https://issues.apache.org/jira/browse/SPARK-3620
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 1.1.0, 1.0.0
Reporter: Dale Richardson
Priority: Minor


I'm proposing it's time to refactor the configuration argument handling code in 
spark-submit. The code has grown organically in a short period of time, handles 
a pretty complicated logic flow, and is now pretty fragile. Some issues that 
have been identified:

1. Hand-crafted property file readers that do not support the property file 
format as specified in 
http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)

2. ResolveURI not called on paths read from conf/prop files

3. inconsistent means of merging / overriding values from different sources 
(Some get overridden by file, others by manual settings of field on object, 
Some by properties)

4. Argument validation should be done after combining config files, system 
properties and command line arguments, 

5. Alternate conf file location not handled in shell scripts

6. Some options can only be passed as command line arguments

7. Defaults for options are hard-coded (and sometimes overridden multiple 
times) in many places throughout the code, e.g. master = local[*]
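
Regarding issue 1 above, a short sketch (not the existing spark-submit code) of 
reading the file through java.util.Properties, so that continuation lines, 
escapes and comments are handled per the linked specification; the helper name 
is made up for the example.

{code:scala}
// Load a conf/prop file with the standard Properties reader instead of a
// hand-rolled parser, then copy it into an immutable Scala map.
import java.io.{File, FileInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.Properties
import scala.collection.JavaConverters._

object PropsLoader {
  def loadSparkProps(file: File): Map[String, String] = {
    val reader = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)
    try {
      val props = new Properties()
      props.load(reader)   // handles \ continuations, escapes and # comments
      props.stringPropertyNames().asScala.map(k => k -> props.getProperty(k)).toMap
    } finally {
      reader.close()
    }
  }
}
{code}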



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org