[jira] [Commented] (SPARK-31555) Improve cache block migration
[ https://issues.apache.org/jira/browse/SPARK-31555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110732#comment-17110732 ] Dale Richardson commented on SPARK-31555: - Hi [~holden], happy to have a go at this. > Improve cache block migration > - > > Key: SPARK-31555 > URL: https://issues.apache.org/jira/browse/SPARK-31555 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Holden Karau >Priority: Major > > We should explore the following improvements to cache block migration: > 1) Peer selection (right now may overbalance on certain peers) > 2) Do we need to configure the number of blocks to be migrated at the same > time > 3) Are there any blocks we don't need to replicate (e.g. they are already > stored on the desired number of executors even once we remove the executors > slated for decommissioning). > 4) Do we want to prioritize migrating blocks with no replicas > 5) Log the attempt number for debugging > 6) Clarify the logic for determining the number of replicas > 7) Consider using TestUtils.waitUntilExecutorsUp in tests rather than count > to wait for the executors to come up. imho this is the least important. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
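On point 1, one way to avoid overbalancing is to rotate through candidate peers instead of repeatedly choosing from the head of the peer list. A minimal illustrative sketch in plain Python (not Spark's BlockManager code; every name here is hypothetical):

```python
import itertools

def make_peer_selector(peers):
    """Round-robin over live peers so replication targets are spread evenly,
    instead of repeatedly picking the same (e.g. first-listed) peer."""
    cycle = itertools.cycle(peers)

    def next_peer(exclude=frozenset()):
        # Skip peers that already hold the block or are slated for decommissioning.
        for _ in range(len(peers)):
            p = next(cycle)
            if p not in exclude:
                return p
        return None  # no eligible peer left

    return next_peer
```

Spark's actual peer selection would also have to account for block size and executor storage levels; this only illustrates the balancing idea.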
[jira] [Created] (SPARK-31520) Add a readiness probe to spark pod definitions
Dale Richardson created SPARK-31520: --- Summary: Add a readiness probe to spark pod definitions Key: SPARK-31520 URL: https://issues.apache.org/jira/browse/SPARK-31520 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.4.5 Reporter: Dale Richardson Add a readiness probe so that Kubernetes can communicate basic Spark pod state to the end user via get/describe pod commands. A basic TCP/SYN probe of the RPC port should be enough to indicate that a Spark process is running.
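In a driver pod spec, such a probe might look roughly like the following (a sketch only; the port number and timing values are assumptions — the port must match whatever spark.driver.port resolves to in your deployment):

```yaml
readinessProbe:
  tcpSocket:
    port: 7078        # assumption: the driver RPC port (spark.driver.port)
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
```

A tcpSocket probe only confirms the port accepts connections, which matches the "basic TCP/SYN probe" suggested in the issue.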
[jira] [Comment Edited] (SPARK-31329) Modify executor monitor to allow for moving shuffle blocks
[ https://issues.apache.org/jira/browse/SPARK-31329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088569#comment-17088569 ] Dale Richardson edited comment on SPARK-31329 at 4/21/20, 11:36 AM: Hi Holden, I've been thinking about this issue as well. Blocks can't move, but they can be replicated. Any reason we can't just replicate the blocks out, allow the existing code paths to update the block locations with the block manager master, then unregister the current blocks? We should chat and coordinate efforts. was (Author: tigerquoll): Hi Holden, I've been thinking about this issue as well. Blocks can't move, but they can be replicated. Any reason we can't just replicate the blocks out, allow the existing code paths to update the block locations with the block manager master, then unregister the current blocks? > Modify executor monitor to allow for moving shuffle blocks > -- > > Key: SPARK-31329 > URL: https://issues.apache.org/jira/browse/SPARK-31329 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Minor > > To enable Spark-20629 we need to revisit code that assumes shuffle blocks > don't move. Currently, the executor monitor assumes that shuffle blocks are > immovable. We should modify this code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31329) Modify executor monitor to allow for moving shuffle blocks
[ https://issues.apache.org/jira/browse/SPARK-31329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088569#comment-17088569 ] Dale Richardson commented on SPARK-31329: - Hi Holden, I've been thinking about this issue as well. Blocks can't move, but they can be replicated. Any reason we can't just replicate the blocks out, allow the existing code paths to update the block locations with the block manager master, then unregister the current blocks? > Modify executor monitor to allow for moving shuffle blocks > -- > > Key: SPARK-31329 > URL: https://issues.apache.org/jira/browse/SPARK-31329 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Minor > > To enable Spark-20629 we need to revisit code that assumes shuffle blocks > don't move. Currently, the executor monitor assumes that shuffle blocks are > immovable. We should modify this code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
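The replicate-then-unregister sequence described in the comment can be sketched abstractly (plain Python dicts standing in for the executors and the block manager master; all names are hypothetical):

```python
def migrate_block(block_id, executors, locations, src, dst):
    """executors: executor name -> {block_id: bytes};
    locations: block_id -> set of executor names known to the (simulated) master.
    Replicate the block out, let the master record the new location via the
    existing update path, then unregister and drop the original copy."""
    executors[dst][block_id] = executors[src][block_id]  # replicate out
    locations[block_id].add(dst)       # existing code path: master learns new location
    locations[block_id].discard(src)   # unregister the current replica
    del executors[src][block_id]       # free space on the decommissioning executor
```

The ordering matters: the new location is registered before the old one is removed, so the block is never unreachable mid-migration.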
[jira] [Commented] (SPARK-22387) propagate session configs to data source read/write options
[ https://issues.apache.org/jira/browse/SPARK-22387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606527#comment-16606527 ] Dale Richardson commented on SPARK-22387: - We've missed shared configs such as those required for Kerberos. > propagate session configs to data source read/write options > --- > > Key: SPARK-22387 > URL: https://issues.apache.org/jira/browse/SPARK-22387 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Jiang Xingbo >Priority: Major > Fix For: 2.3.0 > > > This is an open discussion. The general idea is we should allow users to set > some common configs in session conf so that they don't need to type them > again and again for each data source operation. > Proposal 1: > propagate every session config which starts with {{spark.datasource.config.}} > to data source options. The downside is, users may only want to set some > common configs for a specific data source. > Proposal 2: > propagate session configs which start with > {{spark.datasource.config.myDataSource.}} only to {{myDataSource}} > operations. One downside is, some data sources may not have a short name, > which makes the config key pretty long, e.g. > {{spark.datasource.config.com.company.foo.bar.key1}}. > Proposal 3: > Introduce a trait `WithSessionConfig` which defines a session config key > prefix. Then we can pick session configs with this key prefix and propagate > them to this particular data source. > Another thing worth thinking about: sometimes it's really annoying if > users have a typo in the config key and spend a lot of time figuring out why > things don't work as expected. We should allow data sources to validate the > given options and throw an exception if an option can't be recognized.
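Proposal 2's per-source propagation amounts to prefix filtering plus namespace stripping, which can be illustrated in a few lines of plain Python (a sketch of the idea, not Spark's implementation; the function name is hypothetical):

```python
def propagate_session_configs(session_conf, source_name,
                              prefix="spark.datasource.config."):
    """Return only the configs under this data source's namespace,
    with the namespace stripped, as per Proposal 2."""
    ns = prefix + source_name + "."
    return {k[len(ns):]: v for k, v in session_conf.items() if k.startswith(ns)}
```

With this scheme, a config set once in the session conf reaches every read/write on that source, while other sources' configs are filtered out.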
[jira] [Created] (SPARK-25329) Support passing Kerberos configuration information
Dale Richardson created SPARK-25329: --- Summary: Support passing Kerberos configuration information Key: SPARK-25329 URL: https://issues.apache.org/jira/browse/SPARK-25329 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.1 Reporter: Dale Richardson The current V2 Datasource API provides support for querying a portion of the SparkConfig namespace (spark.datasource.*) via the SessionConfigSupport API. This was designed with the assumption that all configuration information for v2 data sources should be separate from each other. Unfortunately, there are some cross-cutting concerns such as authentication that touch multiple data sources - this means that common configuration items need to be shared amongst multiple data sources. In particular, Kerberos setup can use the following configuration items: # userPrincipal, spark configuration: spark.yarn.principal # userKeytabPath, spark configuration: spark.yarn.keytab # krb5ConfPath: java.security.krb5.conf # kerberos debugging flag: sun.security.krb5.debug # spark.security.credentials.${service}.enabled # JAAS config: java.security.auth.login.config ?? # ZKServerPrincipal ?? So potential solutions to pass this information to various data sources are: # Pass the entire SparkContext object to data sources (not likely) # Pass the entire SparkConfig Map object to data sources # Pass all required configuration via environment variables # Extend SessionConfigSupport to support passing specific white-listed configuration values # Add a specific data source v2 API "SupportsKerberos" so that a data source can indicate that it supports Kerberos and also provide the means to pass needed configuration info. # Expand out all Kerberos configuration items to be in each data source config namespace that needs it.
If the data source requires TLS support, we also need to support passing all the configuration values under "spark.ssl.*".
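Option 4 (extending SessionConfigSupport with a white-list of shared keys) can be sketched in plain Python (illustrative only; the key set and function name are assumptions drawn from the list in the issue, not an existing API):

```python
# Assumption: an illustrative subset of the cross-cutting keys named in the issue.
SHARED_KERBEROS_KEYS = {
    "spark.yarn.principal",
    "spark.yarn.keytab",
    "java.security.krb5.conf",
}

def configs_for_source(session_conf, source_prefix,
                       shared_keys=SHARED_KERBEROS_KEYS):
    """A data source receives its own namespaced configs plus an explicit
    white-list of shared, cross-cutting keys (e.g. Kerberos settings)."""
    out = {k: v for k, v in session_conf.items() if k.startswith(source_prefix)}
    out.update({k: session_conf[k] for k in shared_keys if k in session_conf})
    return out
```

The white-list keeps the per-source isolation of SessionConfigSupport while letting authentication settings be defined once rather than duplicated into every data source's namespace (option 6).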
[jira] [Comment Edited] (SPARK-21418) NoSuchElementException: None.get in DataSourceScanExec with sun.io.serialization.extendedDebugInfo=true
[ https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165448#comment-16165448 ] Dale Richardson edited comment on SPARK-21418 at 9/13/17 11:18 PM: --- I'm getting the same stackdump in 2.2.0 while using apache-avro (current snapshot) in structured streaming with a globbed hdfs input path. I do not get this error if I use a non-globbed input path. I should note that this is without setting sun.io.serialization.extendedDebugInfo at all. was (Author: tigerquoll): I'm getting the same stackdump in 2.2.0 while using apache-avro (current snapshot) in structured streaming with a globbed hdfs input path. I do not get this error if I use a non-globbed input path. > NoSuchElementException: None.get in DataSourceScanExec with > sun.io.serialization.extendedDebugInfo=true > --- > > Key: SPARK-21418 > URL: https://issues.apache.org/jira/browse/SPARK-21418 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Daniel Darabos >Assignee: Sean Owen >Priority: Minor > Fix For: 2.2.1, 2.3.0 > > > I don't have a minimal reproducible example yet, sorry. I have the following > lines in a unit test for our Spark application: > {code} > val df = mySparkSession.read.format("jdbc") > .options(Map("url" -> url, "dbtable" -> "test_table")) > .load() > df.show > println(df.rdd.collect) > {code} > The output shows the DataFrame contents from {{df.show}}. 
But the {{collect}} > fails: > {noformat} > org.apache.spark.SparkException: Job aborted due to stage failure: Task > serialization failed: java.util.NoSuchElementException: None.get > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:347) > at scala.None$.get(Option.scala:345) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70) > at > org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54) > at > org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556) > at > 
org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451) > at > org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576) > at > org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480) > at > org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477) > at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:
[jira] [Commented] (SPARK-21418) NoSuchElementException: None.get in DataSourceScanExec with sun.io.serialization.extendedDebugInfo=true
[ https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165448#comment-16165448 ] Dale Richardson commented on SPARK-21418: - I'm getting the same stackdump in 2.2.0 while using apache-avro (current snapshot) in structured streaming with a globbed hdfs input path. I do not get this error if I use a non-globbed input path. > NoSuchElementException: None.get in DataSourceScanExec with > sun.io.serialization.extendedDebugInfo=true > --- > > Key: SPARK-21418 > URL: https://issues.apache.org/jira/browse/SPARK-21418 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Daniel Darabos >Assignee: Sean Owen >Priority: Minor > Fix For: 2.2.1, 2.3.0 > > > I don't have a minimal reproducible example yet, sorry. I have the following > lines in a unit test for our Spark application: > {code} > val df = mySparkSession.read.format("jdbc") > .options(Map("url" -> url, "dbtable" -> "test_table")) > .load() > df.show > println(df.rdd.collect) > {code} > The output shows the DataFrame contents from {{df.show}}. 
But the {{collect}} > fails: > {noformat} > org.apache.spark.SparkException: Job aborted due to stage failure: Task > serialization failed: java.util.NoSuchElementException: None.get > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:347) > at scala.None$.get(Option.scala:345) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70) > at > org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54) > at > org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556) > at > 
org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451) > at > org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576) > at > org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480) > at > org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477) > at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream
[jira] [Updated] (SPARK-6593) Provide option for HadoopRDD to skip corrupted files
[ https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-6593: --- Description: When reading a large number of gzip files from HDFS, e.g. with sc.textFile("hdfs:///user/cloudera/logs*.gz"), if the Hadoop input libraries report an exception then the entire job is canceled. As default behaviour this is probably for the best, but it would be nice in some circumstances, where you know it will be ok, to have the option to skip the corrupted file and continue the job. was: When reading a large amount of files from HDFS eg. with sc.textFile("hdfs:///user/cloudera/logs*.gz"). If the hadoop input libraries report an exception then the entire job is canceled. As default behaviour this is probably for the best, but it would be nice in some circumstances where you know it will be ok to have the option to skip the corrupted file and continue the job. > Provide option for HadoopRDD to skip corrupted files > > > Key: SPARK-6593 > URL: https://issues.apache.org/jira/browse/SPARK-6593 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Dale Richardson >Priority: Minor > > When reading a large number of gzip files from HDFS, e.g. with > sc.textFile("hdfs:///user/cloudera/logs*.gz"), if the Hadoop input libraries > report an exception then the entire job is canceled. As default behaviour > this is probably for the best, but it would be nice in some circumstances, > where you know it will be ok, to have the option to skip the corrupted file > and continue the job.
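The desired skip-and-continue behaviour can be illustrated outside Spark with Python's gzip module (a sketch of the idea only; nothing here is Spark or Hadoop API):

```python
import gzip

def read_lines_skipping_corrupt(files):
    """files: mapping of name -> raw gzip bytes. Yield decoded lines,
    skipping (instead of failing the whole job on) any file that
    does not decompress cleanly."""
    for name, raw in files.items():
        try:
            text = gzip.decompress(raw).decode("utf-8")
        except (OSError, EOFError):  # corrupt header or truncated stream
            print(f"skipping corrupt file: {name}")
            continue
        yield from text.splitlines()
```

Because each gzip file is a single, unsplittable input split, one corrupt file would otherwise abort everything; the try/except confines the damage to that file.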
[jira] [Updated] (SPARK-6593) Provide option for HadoopRDD to skip corrupted files
[ https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-6593: --- Description: When reading a large amount of files from HDFS eg. with sc.textFile("hdfs:///user/cloudera/logs*.gz"). If the hadoop input libraries report an exception then the entire job is canceled. As default behaviour this is probably for the best, but it would be nice in some circumstances where you know it will be ok to have the option to skip the corrupted file and continue the job. was: When reading a large amount of files from HDFS eg. with sc.textFile("hdfs:///user/cloudera/logs*.gz"). If the hadoop input libraries report an exception then the entire job is canceled. As default behaviour this is probably for the best, but it would be nice in some circumstances where you know it will be ok to have the option to skip the corrupted portion and continue the job. > Provide option for HadoopRDD to skip corrupted files > > > Key: SPARK-6593 > URL: https://issues.apache.org/jira/browse/SPARK-6593 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Dale Richardson >Priority: Minor > > When reading a large amount of files from HDFS eg. with > sc.textFile("hdfs:///user/cloudera/logs*.gz"). If the hadoop input libraries > report an exception then the entire job is canceled. As default behaviour > this is probably for the best, but it would be nice in some circumstances > where you know it will be ok to have the option to skip the corrupted file > and continue the job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6593) Provide option for HadoopRDD to skip corrupted files
[ https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-6593: --- Summary: Provide option for HadoopRDD to skip corrupted files (was: Provide option for HadoopRDD to skip bad data splits.) > Provide option for HadoopRDD to skip corrupted files > > > Key: SPARK-6593 > URL: https://issues.apache.org/jira/browse/SPARK-6593 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Dale Richardson >Priority: Minor > > When reading a large amount of files from HDFS eg. with > sc.textFile("hdfs:///user/cloudera/logs*.gz"). If the hadoop input libraries > report an exception then the entire job is canceled. As default behaviour > this is probably for the best, but it would be nice in some circumstances > where you know it will be ok to have the option to skip the corrupted portion > and continue the job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6593) Provide option for HadoopRDD to skip corrupted files
[ https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385723#comment-14385723 ] Dale Richardson commented on SPARK-6593: Changed the title and description to focus more closely on my particular use case, which is corrupted gzip files. > Provide option for HadoopRDD to skip corrupted files > > > Key: SPARK-6593 > URL: https://issues.apache.org/jira/browse/SPARK-6593 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Dale Richardson >Priority: Minor > > When reading a large amount of files from HDFS eg. with > sc.textFile("hdfs:///user/cloudera/logs*.gz"). If the hadoop input libraries > report an exception then the entire job is canceled. As default behaviour > this is probably for the best, but it would be nice in some circumstances > where you know it will be ok to have the option to skip the corrupted portion > and continue the job.
[jira] [Updated] (SPARK-6593) Provide option for HadoopRDD to skip bad data splits.
[ https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-6593: --- Description: When reading a large amount of files from HDFS eg. with sc.textFile("hdfs:///user/cloudera/logs*.gz"). If the hadoop input libraries report an exception then the entire job is canceled. As default behaviour this is probably for the best, but it would be nice in some circumstances where you know it will be ok to have the option to skip the corrupted portion and continue the job. was: When reading a large amount of files from HDFS eg. with sc.textFile("hdfs:///user/cloudera/logs*.gz"). If a single split is corrupted then the entire job is canceled. As default behaviour this is probably for the best, but it would be nice in some circumstances where you know it will be ok to have the option to skip the corrupted portion and continue the job. > Provide option for HadoopRDD to skip bad data splits. > - > > Key: SPARK-6593 > URL: https://issues.apache.org/jira/browse/SPARK-6593 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Dale Richardson >Priority: Minor > > When reading a large amount of files from HDFS eg. with > sc.textFile("hdfs:///user/cloudera/logs*.gz"). If the hadoop input libraries > report an exception then the entire job is canceled. As default behaviour > this is probably for the best, but it would be nice in some circumstances > where you know it will be ok to have the option to skip the corrupted portion > and continue the job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-6593) Provide option for HadoopRDD to skip bad data splits.
[ https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385716#comment-14385716 ] Dale Richardson edited comment on SPARK-6593 at 3/29/15 11:35 AM: -- With a gz file, for example, the entire file is a split, so a corrupted gz file will kill the entire job - with no way of catching and remediating the error. was (Author: tigerquoll): With a gz file for example, the entire file is a split. so a corrupted gz file will kill the entire job. > Provide option for HadoopRDD to skip bad data splits. > - > > Key: SPARK-6593 > URL: https://issues.apache.org/jira/browse/SPARK-6593 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Dale Richardson >Priority: Minor > > When reading a large amount of files from HDFS eg. with > sc.textFile("hdfs:///user/cloudera/logs*.gz"). If a single split is corrupted > then the entire job is canceled. As default behaviour this is probably for > the best, but it would be nice in some circumstances where you know it will > be ok to have the option to skip the corrupted portion and continue the job.
[jira] [Commented] (SPARK-6593) Provide option for HadoopRDD to skip bad data splits.
[ https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385716#comment-14385716 ] Dale Richardson commented on SPARK-6593: With a gz file, for example, the entire file is a split, so a corrupted gz file will kill the entire job. > Provide option for HadoopRDD to skip bad data splits. > - > > Key: SPARK-6593 > URL: https://issues.apache.org/jira/browse/SPARK-6593 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Dale Richardson >Priority: Minor > > When reading a large amount of files from HDFS eg. with > sc.textFile("hdfs:///user/cloudera/logs*.gz"). If a single split is corrupted > then the entire job is canceled. As default behaviour this is probably for > the best, but it would be nice in some circumstances where you know it will > be ok to have the option to skip the corrupted portion and continue the job.
[jira] [Created] (SPARK-6593) Provide option for HadoopRDD to skip bad data splits.
Dale Richardson created SPARK-6593: -- Summary: Provide option for HadoopRDD to skip bad data splits. Key: SPARK-6593 URL: https://issues.apache.org/jira/browse/SPARK-6593 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.3.0 Reporter: Dale Richardson Priority: Minor When reading a large amount of files from HDFS eg. with sc.textFile("hdfs:///user/cloudera/logs*.gz"). If a single split is corrupted then the entire job is canceled. As default behaviour this is probably for the best, but it would be nice in some circumstances where you know it will be ok to have the option to skip the corrupted portion and continue the job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language
[ https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-6214:
---
Description:
This is a proposal to allow configuration options to be specified via a simple expression language. The language would have the following features:
* Basic arithmetic (+ - * /), with support for bracketed expressions and standard precedence rules.
* Support for, and normalisation of, common units of scale, e.g. MB, GB, seconds, minutes, hours, days and weeks.
* Referencing of basic environmental information, currently defined as:
** numCores: number of cores assigned to the JVM
** physicalMemoryBytes: memory size of the hosting machine
** JVMTotalMemoryBytes: bytes of memory currently allocated to the JVM
** JVMMaxMemoryBytes: maximum number of bytes of memory available to the JVM
** JVMFreeMemoryBytes: JVMMaxMemoryBytes - JVMTotalMemoryBytes
* Limited referencing of other configuration values when specifying a value (the other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled).

Such a feature would have the following end-user benefits:
* Flexibility to specify time intervals or byte quantities in appropriate, easy-to-follow units, e.g. 1 week rather than 604800 seconds.
* A consistent means of entering configuration information regardless of the option being set (questions such as "is this particular option specified in ms or seconds?" become irrelevant, because the user can pick whatever unit makes sense for the magnitude of the value being specified).
* Scaling a configuration option in relation to system attributes, e.g.:
--cores "numCores - 1"
spark.executor.memory = physicalMemoryBytes - 1.5 GB
* Scaling multiple configuration options together, e.g.:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8

A safety feature has been added so that the expression evaluator is only used if the configuration string contains a magic character (currently '!') as its first character.

> Allow configuration options to use a simple expression language
> ---------------------------------------------------------------
>
> Key: SPARK-6214
> URL: https://issues.apache.org/jira/browse/SPARK-6214
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Dale Richardson
> Priority: Minor
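The proposed evaluator can be sketched concretely. Below is a minimal, hypothetical Python version (Spark itself is Scala, and the function name and unit table here are illustrative assumptions, not the proposed implementation): it rewrites unit suffixes such as "1.5 GB" into plain numbers, then evaluates only arithmetic over names the caller supplies (numCores, physicalMemoryBytes, ...), rejecting everything else.

```python
import ast
import operator
import re

# Illustrative unit table: byte units normalise to bytes, time units to seconds.
UNITS = {
    "KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30,
    "second": 1, "seconds": 1, "minutes": 60, "hours": 3600,
    "days": 86400, "week": 604800, "weeks": 604800,
}

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.USub: operator.neg}


def eval_config_expr(text, env):
    """Evaluate e.g. '0.75 * physicalMemoryBytes' or 'numCores - 1'.

    `env` maps environment names to numbers. Only literals, names from
    `env`, and the four arithmetic operators are allowed.
    """
    # Rewrite '1.5 GB' as '(1.5*1073741824)' before parsing; longest unit
    # names first so 'weeks' is not shadowed by 'week'.
    alts = "|".join(sorted(UNITS, key=len, reverse=True))
    pat = re.compile(r"(\d+(?:\.\d+)?)\s*(%s)\b" % alts)
    text = pat.sub(lambda m: "(%s*%d)" % (m.group(1), UNITS[m.group(2)]), text)

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]  # KeyError for unknown names
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("disallowed syntax: %s" % ast.dump(node))

    return walk(ast.parse(text, mode="eval"))
```

Walking a whitelisted AST rather than calling eval() gives the "limited referencing" property the proposal asks for: only values explicitly passed in `env` are reachable, and anything beyond arithmetic (calls, attributes, subscripts) is rejected.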
[jira] [Commented] (SPARK-6396) Add timeout control for broadcast
[ https://issues.apache.org/jira/browse/SPARK-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366849#comment-14366849 ] Dale Richardson commented on SPARK-6396:
No problems.

> Add timeout control for broadcast
> ---------------------------------
>
> Key: SPARK-6396
> URL: https://issues.apache.org/jira/browse/SPARK-6396
> Project: Spark
> Issue Type: Improvement
> Components: Block Manager, Spark Core
> Affects Versions: 1.3.0, 1.3.1
> Reporter: Jun Fang
> Priority: Minor
>
> TorrentBroadcast uses the fetchBlockSync method of BlockTransferService.scala, which calls Await.result(result.future, Duration.Inf). In a production environment this may cause a hang when the driver and an executor are in different data centers. A timeout here would be better.
[jira] [Commented] (SPARK-6396) Add timeout control for broadcast
[ https://issues.apache.org/jira/browse/SPARK-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366799#comment-14366799 ] Dale Richardson commented on SPARK-6396:
If nobody else is looking at this one I'll have a look at it.

> Add timeout control for broadcast
> ---------------------------------
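The fix the ticket asks for is simply to bound the wait. A minimal Python sketch of the idea (Spark's actual code uses Scala's Await.result; fetch_block_sync here is a hypothetical stand-in, not the BlockTransferService API):

```python
import concurrent.futures as cf


def fetch_block_sync(fetch, timeout_s):
    """Run a blocking fetch with a bounded wait instead of waiting forever.

    Bounding the wait turns a cross-datacenter hang into a retryable
    TimeoutError rather than an indefinite stall.
    """
    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch)
        try:
            return future.result(timeout=timeout_s)
        except cf.TimeoutError:
            future.cancel()  # best effort; the worker thread may still run,
            # and executor shutdown will wait for it to finish
            raise TimeoutError("block fetch exceeded %.1fs" % timeout_s)
```

The caller can then retry the fetch from another peer, which is only possible once the infinite wait is replaced with a timeout.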
[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language
[ https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-6214: --- Description: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: * Allow for basic arithmetic (+-/*) with support for bracketed expressions and standard precedence rules. * Support for and normalisation of common units of scale eg. MB, GB, seconds,minutes,hours, days and weeks. * Allow for the referencing of basic environmental information currently defined as: numCores: Number of cores assigned to the JVM physicalMemoryBytes: Memory size of hosting machine JVMTotalMemoryBytes: current bytes of memory allocated to the JVM JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes * Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: * Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds * Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) * Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
--cores "numCores - 1" spark.executor.memory = physicalMemoryBytes - 1.5 GB * Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 was: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: * Allow for basic arithmetic (+-/*) with support for bracketed expressions and standard precedence rules. * Support for and normalisation of common units of scale eg. MB, GB, seconds,minutes,hours, days and weeks. * Allow for the referencing of basic environmental information currently defined as: numCores: Number of cores assigned to the JVM physicalMemoryBytes: Memory size of hosting machine JVMTotalMemoryBytes: current bytes of memory allocated to the JVM JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes * Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: * Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds * Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) * Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB * Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 > Allow configuration options to use a simple expression language > --- > > Key: SPARK-6214 > URL: https://issues.apache.org/jira/browse/SPARK-6214 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Dale Richardson >Priority: Minor > > This is a proposal to allow for configuration options to be specified via a > simple expression language. This language would have the following features: > * Allow for basic arithmetic (+-/*) with support for bracketed expressions > and standard precedence rules. > * Support for and normalisation of common units of scale eg. MB, GB, > seconds,minutes,hours, days and weeks. > * Allow for the referencing of basic environmental information currently > defined as: > numCores: Number of cores assigned to the JVM > physicalMemoryBytes: Memory size
[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language
[ https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-6214: --- Description: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: * Allow for basic arithmetic (+-/*) with support for bracketed expressions and standard precedence rules. * Support for and normalisation of common units of scale eg. MB, GB, seconds,minutes,hours, days and weeks. * Allow for the referencing of basic environmental information currently defined as: numCores: Number of cores assigned to the JVM physicalMemoryBytes: Memory size of hosting machine JVMTotalMemoryBytes: current bytes of memory allocated to the JVM JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes * Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: * Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds * Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) * Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB * Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 was: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: * Allow for basic arithmetic (+-/*) with support bracketed expressions and standard precedence rules. * Support for and normalisation of common units of reference eg. MB, GB, seconds,minutes,hours, days and weeks. * Allow for the referencing of basic environmental information currently defined as: numCores: Number of cores assigned to the JVM physicalMemoryBytes: Memory size of hosting machine JVMTotalMemoryBytes: current bytes of memory allocated to the JVM JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes * Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: * Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds * Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) * Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB * Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 > Allow configuration options to use a simple expression language > --- > > Key: SPARK-6214 > URL: https://issues.apache.org/jira/browse/SPARK-6214 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Dale Richardson >Priority: Minor > > This is a proposal to allow for configuration options to be specified via a > simple expression language. This language would have the following features: > * Allow for basic arithmetic (+-/*) with support for bracketed expressions > and standard precedence rules. > * Support for and normalisation of common units of scale eg. MB, GB, > seconds,minutes,hours, days and weeks. > * Allow for the referencing of basic environmental information currently > defined as: > numCores: Number of cores assigned to the JVM > physicalMemoryBytes: Me
[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language
[ https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-6214: --- Description: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: * Allow for basic arithmetic (+-/*) with support bracketed expressions and standard precedence rules. * Support for and normalisation of common units of reference eg. MB, GB, seconds,minutes,hours, days and weeks. * Allow for the referencing of basic environmental information currently defined as: numCores: Number of cores assigned to the JVM physicalMemoryBytes: Memory size of hosting machine JVMTotalMemoryBytes: current bytes of memory allocated to the JVM JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes * Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: * Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds * Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) * Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB * Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 was: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: * Allow for basic arithmetic (+-/*) with support bracketed expressions and standard precedence rules. * Support for and normalisation of common units of reference eg. MB, GB, seconds,minutes,hours, days and weeks. * Allow for the referencing of basic environmental information currently defined as: ** numCores: Number of cores assigned to the JVM ** physicalMemoryBytes: Memory size of hosting machine ** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM ** JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM ** JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes * Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: * Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds * Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) * Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB * Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 > Allow configuration options to use a simple expression language > --- > > Key: SPARK-6214 > URL: https://issues.apache.org/jira/browse/SPARK-6214 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Dale Richardson >Priority: Minor > > This is a proposal to allow for configuration options to be specified via a > simple expression language. This language would have the following features: > * Allow for basic arithmetic (+-/*) with support bracketed expressions and > standard precedence rules. > * Support for and normalisation of common units of reference eg. MB, GB, > seconds,minutes,hours, days and weeks. > * Allow for the referencing of basic environmental information currently > defined as: > numCores: Number of cores assigned to the JVM > phys
[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language
[ https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-6214: --- Description: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: * Allow for basic arithmetic (+-/*) with support bracketed expressions and standard precedence rules. * Support for and normalisation of common units of reference eg. MB, GB, seconds,minutes,hours, days and weeks. * Allow for the referencing of basic environmental information currently defined as: ** numCores: Number of cores assigned to the JVM ** physicalMemoryBytes: Memory size of hosting machine ** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM ** JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM ** JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes * Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: * Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds * Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) * Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB * Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 was: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: * Allow for basic arithmetic (+-/*) with support bracketed expressions and standard precedence rules. * Support for and normalisation of common units of reference eg. MB, GB, seconds,minutes,hours, days and weeks. * Allow for the referencing of basic environmental information currently defined as: ** numCores: Number of cores assigned to the JVM ** physicalMemoryBytes: Memory size of hosting machine ** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM ** JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM ** JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes * Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: * Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds * Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) * Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB * Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 > Allow configuration options to use a simple expression language > --- > > Key: SPARK-6214 > URL: https://issues.apache.org/jira/browse/SPARK-6214 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Dale Richardson >Priority: Minor > > This is a proposal to allow for configuration options to be specified via a > simple expression language. This language would have the following features: > * Allow for basic arithmetic (+-/*) with support bracketed expressions and > standard precedence rules. > * Support for and normalisation of common units of reference eg. MB, GB, > seconds,minutes,hours, days and weeks. > * Allow for the referencing of basic environmental information currently > defined as: > ** numCores: Number of cores assigne
[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language
[ https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-6214: --- Description: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: * Allow for basic arithmetic (+-/*) with support bracketed expressions and standard precedence rules. * Support for and normalisation of common units of reference eg. MB, GB, seconds,minutes,hours, days and weeks. * Allow for the referencing of basic environmental information currently defined as: ** numCores: Number of cores assigned to the JVM ** physicalMemoryBytes: Memory size of hosting machine ** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM ** JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM ** JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes * Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: * Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds * Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) * Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB * Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 was: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: * Allow for basic arithmetic (+-/*) with support bracketed expressions and standard precedence rules. * Support for and normalisation of common units of reference eg. MB, GB, seconds,minutes,hours, days and weeks. * Allow for the referencing of basic environmental information currently defined as: ** numCores: Number of cores assigned to the JVM ** physicalMemoryBytes: Memory size of hosting machine ** JVMTotalMemoryBytes: current bytes of memory allocated to the JVM ** JVMMaxMemoryBytes:Maximum number of bytes of memory available to the JVM ** JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes * Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: * Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds * Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) * Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB * Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 > Allow configuration options to use a simple expression language > --- > > Key: SPARK-6214 > URL: https://issues.apache.org/jira/browse/SPARK-6214 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Dale Richardson >Priority: Minor > > This is a proposal to allow for configuration options to be specified via a > simple expression language. This language would have the following features: > * Allow for basic arithmetic (+-/*) with support bracketed expressions and > standard precedence rules. > * Support for and normalisation of common units of reference eg. MB, GB, > seconds,minutes,hours, days and weeks. > * Allow for the referencing of basic environmental information currently > defined as: > ** numCores: Number of cores assigned to
[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language
[ https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-6214: --- Description: This is a proposal to allow configuration options to be specified via a simple expression language. The language would have the following features: * Basic arithmetic (+ - * /), with support for bracketed expressions and standard precedence rules. * Support for, and normalisation of, common units of reference, e.g. MB, GB, seconds, minutes, hours, days and weeks. * Referencing of basic environmental information, currently defined as: ** numCores: number of cores assigned to the JVM ** physicalMemoryBytes: memory size of the hosting machine ** JVMTotalMemoryBytes: bytes of memory currently allocated to the JVM ** JVMMaxMemoryBytes: maximum number of bytes of memory available to the JVM ** JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes * Limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled.) Such a feature would have the following end-user benefits: * Flexibility to specify time intervals or byte quantities in appropriate, easy-to-follow units, e.g. 1 week rather than 604800 seconds. * A consistent means of entering configuration information regardless of the option being added (questions such as 'is this particular option specified in ms or seconds?' become irrelevant, because the user can pick whatever unit makes sense for the magnitude of the value they are specifying). * Scaling a configuration option in relation to system attributes, e.g.
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB * Being able to scale multiple configuration options together, e.g.: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8
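The proposal above can be sketched as a small evaluator. This is a hypothetical illustration only, not Spark's implementation: the environment names (`numCores`, `physicalMemoryBytes`) come from the proposal, while the unit table, regex-based substitution, and restricted `eval` are assumptions made here to keep the sketch minimal.

```python
import re

# Normalisation factors for the units named in the proposal (bytes and seconds).
UNITS = {
    "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4,
    "second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800,
}

UNIT_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(kb|mb|gb|tb|seconds?|minutes?|hours?|days?|weeks?)\b",
    re.IGNORECASE,
)

def evaluate(expr, env):
    """Evaluate a config expression against an environment dict."""
    # 1. Replace "<number> <unit>" with its normalised numeric value.
    def normalise(m):
        value = float(m.group(1))
        unit = m.group(2).lower().rstrip("s")  # "weeks" -> "week"
        return str(value * UNITS[unit])
    expr = UNIT_RE.sub(normalise, expr)
    # 2. Substitute environmental references such as numCores.
    for name, value in env.items():
        expr = expr.replace(name, str(value))
    # 3. Only arithmetic remains; eval with no builtins keeps it contained.
    return eval(expr, {"__builtins__": {}})

env = {"numCores": 8, "physicalMemoryBytes": 16 * 1024 ** 3}
print(evaluate("numCores - 1", env))                # 7
print(evaluate("0.75 * physicalMemoryBytes", env))  # 12884901888.0
print(evaluate("1 week", env))                      # 604800.0
```

A real implementation would use a proper parser rather than textual substitution (so that precedence, bracketing, and references to other configuration values can be validated), but the sketch shows the three capabilities the proposal lists: unit normalisation, environment references, and plain arithmetic.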
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB * Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 was: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: * Allow for basic arithmetic (+-/*) with support bracketed expressions and standard precedence rules. *Support for and normalisation of common units of reference eg. MB, GB, seconds,minutes,hours, days and weeks. *Allow for the referencing of basic environmental information currently defined as: **numCores: Number of cores assigned to the JVM **physicalMemoryBytes: Memory size of hosting machine **JVMtotalMemoryBytes: current bytes of memory allocated to the JVM **JVMmaxMemoryBytes:Maximum number of bytes of memory available to the **JVM JVMfreeMemoryBytes: maxMemoryBytes - totalMemoryBytes *Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: *Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds *Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) *Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB *Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 > Allow configuration options to use a simple expression language > --- > > Key: SPARK-6214 > URL: https://issues.apache.org/jira/browse/SPARK-6214 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Dale Richardson >Priority: Minor > > This is a proposal to allow for configuration options to be specified via a > simple expression language. This language would have the following features: > * Allow for basic arithmetic (+-/*) with support bracketed expressions and > standard precedence rules. > * Support for and normalisation of common units of reference eg. MB, GB, > seconds,minutes,hours, days and weeks. > * Allow for the referencing of basic environmental information currently > defined as: > ** numCores: Number of cores assigned to the JV
[jira] [Updated] (SPARK-6214) Allow configuration options to use a simple expression language
[ https://issues.apache.org/jira/browse/SPARK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-6214: --- Description: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: * Allow for basic arithmetic (+-/*) with support bracketed expressions and standard precedence rules. *Support for and normalisation of common units of reference eg. MB, GB, seconds,minutes,hours, days and weeks. *Allow for the referencing of basic environmental information currently defined as: **numCores: Number of cores assigned to the JVM **physicalMemoryBytes: Memory size of hosting machine **JVMtotalMemoryBytes: current bytes of memory allocated to the JVM **JVMmaxMemoryBytes:Maximum number of bytes of memory available to the **JVM JVMfreeMemoryBytes: maxMemoryBytes - totalMemoryBytes *Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: *Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds *Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) *Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB *Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 was: This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: Allow for basic arithmetic (+-/*) with support bracketed expressions and standard precedence rules. Support for and normalisation of common units of reference eg. MB, GB, seconds,minutes,hours, days and weeks. Allow for the referencing of basic environmental information currently defined as: numCores: Number of cores assigned to the JVM physicalMemoryBytes: Memory size of hosting machine JVMtotalMemoryBytes: current bytes of memory allocated to the JVM JVMmaxMemoryBytes:Maximum number of bytes of memory available to the JVM JVMfreeMemoryBytes: maxMemoryBytes - totalMemoryBytes Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 > Allow configuration options to use a simple expression language > --- > > Key: SPARK-6214 > URL: https://issues.apache.org/jira/browse/SPARK-6214 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Dale Richardson >Priority: Minor > > This is a proposal to allow for configuration options to be specified via a > simple expression language. This language would have the following features: > * Allow for basic arithmetic (+-/*) with support bracketed expressions and > standard precedence rules. > *Support for and normalisation of common units of reference eg. MB, GB, > seconds,minutes,hours, days and weeks. > *Allow for the referencing of basic environmental information currently > defined as: > **numCores: Number of cores assigned to the JVM > **physicalMemoryBytes: Memo
[jira] [Created] (SPARK-6214) Allow configuration options to use a simple expression language
Dale Richardson created SPARK-6214: -- Summary: Allow configuration options to use a simple expression language Key: SPARK-6214 URL: https://issues.apache.org/jira/browse/SPARK-6214 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Dale Richardson Priority: Minor This is a proposal to allow for configuration options to be specified via a simple expression language. This language would have the following features: Allow for basic arithmetic (+-/*) with support bracketed expressions and standard precedence rules. Support for and normalisation of common units of reference eg. MB, GB, seconds,minutes,hours, days and weeks. Allow for the referencing of basic environmental information currently defined as: numCores: Number of cores assigned to the JVM physicalMemoryBytes: Memory size of hosting machine JVMtotalMemoryBytes: current bytes of memory allocated to the JVM JVMmaxMemoryBytes:Maximum number of bytes of memory available to the JVM JVMfreeMemoryBytes: maxMemoryBytes - totalMemoryBytes Allow for the limited referencing of other configuration values when specifying values. (Other configuration values must be initialised and explicitly passed into the expression evaluator for this functionality to be enabled). Such a feature would have the following end-user benefits: Allow for the flexibility in specifying time intervals or byte quantities in appropriate and easy to follow units e.g. 1 week rather rather then 604800 seconds Have a consistent means of entering configuration information regardless of the configuration option being added. (eg questions such as ‘is the particular option specified in ms or seconds?’ become irrelevant, because the user can pick what ever unit makes sense for the magnitude of the value they are specifying) Allow for the scaling of a configuration option in relation to a system attributes. e.g. 
SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB Being able to scale multiple configuration options together eg: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
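The arithmetic-with-units idea in the proposal can be sketched outside of Spark. The following is a minimal, hypothetical evaluator for unit-suffixed expressions with environment references; the unit table, function name, and environment keys are illustrative and not any proposed Spark API:

```python
import re

# Hypothetical sketch of the proposed unit-aware expression idea (not Spark code):
# substitute environment references, normalise unit suffixes to base values,
# then evaluate the remaining arithmetic.
UNITS = {
    "mb": 1024 ** 2, "gb": 1024 ** 3,          # byte quantities
    "seconds": 1, "minutes": 60, "hours": 3600,
    "days": 86400, "weeks": 604800,            # time quantities (in seconds)
}

def evaluate(expr, env=None):
    """Replace environment references and unit suffixes, then evaluate."""
    env = env or {}
    for name, value in env.items():            # e.g. numCores, physicalMemoryBytes
        expr = expr.replace(name, str(value))  # naive substitution; fine for a sketch
    # turn "1.5 GB" into "(1.5 * 1073741824)"
    def sub(m):
        return "(%s * %d)" % (m.group(1), UNITS[m.group(2).lower()])
    expr = re.sub(r"([0-9.]+)\s*(mb|gb|seconds|minutes|hours|days|weeks)",
                  sub, expr, flags=re.IGNORECASE)
    return eval(expr, {"__builtins__": {}})    # arithmetic only in this sketch

print(evaluate("numCores - 1", {"numCores": 8}))                   # 7
print(evaluate("2 weeks"))                                         # 1209600
print(evaluate("physicalMemoryBytes - 1.5 GB",
               {"physicalMemoryBytes": 8 * 1024 ** 3}))
```

The ticket's examples (SPARK_WORKER_CORES = numCores - 1, SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB) evaluate directly under this scheme once the environment values are supplied.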
[jira] [Commented] (SPARK-5388) Provide a stable application submission gateway in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306899#comment-14306899 ] Dale Richardson commented on SPARK-5388: Hey Andrew, the protocol is definitely starting to look a bit more REST-like! HTTP DELETE should be used for your kill request - it is considered best practice. The primary resource you are dealing with is a submission - this should form the base of your URL structure. In a REST protocol, the actions/verbs that affect these resources are mapped to the HTTP operations GET/POST/DELETE/HEAD/OPTIONS etc., performed against the resources defined by the full URL. Full URLs serve to identify the resources that these actions are performed on. GET/DELETE are used where the full identity of the resource is known at the time of generating the request; POST is used when you may not know the address of the resource at the time of generating the request (e.g. when submitting a program to run, you will not know the submission id because it is returned by the request). So, taking this into account: RequestSubmitDriver → POST /submission/create RequestKillDriver → DELETE /submission/[submissionId] RequestDriverStatus → GET /submission/[submissionId]/status - The resource is the submission, so the current status of the submission is a sub-resource of the submission; other sub-entries such as /submission/[submissionId]/performanceCounters could be added in the future without affecting existing clients. > Provide a stable application submission gateway in standalone cluster mode > -- > > Key: SPARK-5388 > URL: https://issues.apache.org/jira/browse/SPARK-5388 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Blocker > Attachments: stable-spark-submit-in-standalone-mode-2-4-15.pdf > > > The existing submission gateway in standalone mode is not compatible across > Spark versions. 
If you have a newer version of Spark submitting to an older > version of the standalone Master, it is currently not guaranteed to work. The > goal is to provide a stable REST interface to replace this channel. > For more detail, please see the most recent design doc attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
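The action-to-verb mapping suggested in the comment above can be sketched as a small routing table; the action names come from the comment, but the function and id values are illustrative, not the actual Spark REST protocol:

```python
# Toy routing table for the proposed verb mapping:
# POST when the resource id is not yet known; GET/DELETE when it is.
def route(action, submission_id=None):
    if action == "RequestSubmitDriver":
        return ("POST", "/submission/create")
    if action == "RequestKillDriver":
        return ("DELETE", "/submission/%s" % submission_id)
    if action == "RequestDriverStatus":
        return ("GET", "/submission/%s/status" % submission_id)
    raise ValueError("unknown action: %s" % action)

print(route("RequestSubmitDriver"))            # ('POST', '/submission/create')
print(route("RequestKillDriver", "d-42"))      # ('DELETE', '/submission/d-42')
print(route("RequestDriverStatus", "d-42"))    # ('GET', '/submission/d-42/status')
```

Note how status is addressed as a sub-resource of the submission, so new sub-resources can be added later without changing existing routes.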
[jira] [Comment Edited] (SPARK-5388) Provide a stable application submission gateway
[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290847#comment-14290847 ] Dale Richardson edited comment on SPARK-5388 at 1/26/15 5:39 AM: - Hi Andrew, I think the idea is well worth considering. I have a question: is there an intention for other entities (such as job servers) to communicate with the master at all? If so, then the proposed gateway is semantically defined at a fairly low level (just RPC over JSON/HTTP). This is fine if the interface is not going to be exposed to anybody who is not a Spark developer with detailed knowledge of Spark internals. Did you use the term “REST” to simply mean RPC over JSON/HTTP? Creating a REST interface is more than an HTTP RPC gateway. If the interface is going to be exposed to third parties (such as developers of job servers, web notebooks, etc.) then there is a benefit to simplifying some of the exposed application semantics and exposing an API that is more integrated with HTTP’s protocol semantics, which most people are already familiar with - this is what a true REST interface does, and if you are defining an endpoint for others to use it is a very powerful concept that allows other people to quickly grasp how to properly use the exposed interface. A rough sketch of a more “REST”ed version of the API would be: *Submit_driver_request* HTTP POST JSON body of request http://host:port/SparkMaster?SubmitDriver Responds with a standard HTTP response including the allocated DRIVER_ID if driver submission is OK, or HTTP error codes with a Spark-specific error if not. *Get status of DRIVER* HTTP GET http://host:port/SparkMaster/Drivers/ Responds with a JSON body containing information on driver execution. If there is no record of the driver_id, then HTTP error code 404 (Not Found) is returned. 
*Kill Driver request* HTTP DELETE http://host:port/SparkMaster/Drivers/ Responds with JSON body containing information on driver kill request, or http error code if an error occurs. I would be happy to prototype something like this up to test the concept out for you if you are looking for something more than just RPC over JSON/HTTP. was (Author: tigerquoll): Hi Andrew, I think the idea is well worth considering. In response to the requirement of making it easier for client<-> master communications to pass through restrictive fire-walling, have you considered just using Akka's REST gateway ( http://doc.akka.io/docs/akka/1.3.1/scala/http.html )? I also have a question if there is an intention for other entities (such as job servers) to communicate with the master at all? If so then the proposed gateway is semantically defined at a fairly low level (just RPC over JSON/HTTP). This is fine if the interface is not going to be exposed to anybody who is not a spark developer with detailed knowledge of spark internals. Did you use the term “REST” to simply mean RPC over JSON/HTTP? Creating a REST interface is more then a HTTP RPC gateway. If the interface is going to be exposed to 3rd parties (such as developers of Job servers and web notebooks etc) then there is a benefit to simplifying some of the exposed application semantics, and exposing an API that is more integrated with HTTP’s protocol semantics which most people are already familiar with - this is what a true REST interface does and if you are defining an endpoint for others to use it is a very powerful concept that allows other people to quickly grasp how to properly use the exposed interface. A rough sketch of a more “REST”ed version of the API would be: *Submit_driver_request* HTTP POST JSON body of request http://host:port/SparkMaster?SubmitDriver Responds with standard HTTP Response including allocated DRIVER_ID if driver submission ok, http error codes with spark specific error if not. 
*Get status of DRIVER* HTTP GET http://host:port/SparkMaster/Drivers/ Responds with JSON body containing information on driver execution. If no record of driver_id, then http error code 404 (Not found) returned. *Kill Driver request* HTTP DELETE http://host:port/SparkMaster/Drivers/ Responds with JSON body containing information on driver kill request, or http error code if an error occurs. I would be happy to prototype something like this up to test the concept out for you if you are looking for something more than just RPC over JSON/HTTP. > Provide a stable application submission gateway > --- > > Key: SPARK-5388 > URL: https://issues.apache.org/jira/browse/SPARK-5388 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Blocker > Attachments: Stable Spark Standalone Submission.pdf > > > The existing submission gateway
[jira] [Comment Edited] (SPARK-5388) Provide a stable application submission gateway
[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290847#comment-14290847 ] Dale Richardson edited comment on SPARK-5388 at 1/26/15 5:30 AM: - Hi Andrew, I think the idea is well worth considering. In response to the requirement of making it easier for client<-> master communications to pass through restrictive fire-walling, have you considered just using Akka's REST gateway ( http://doc.akka.io/docs/akka/1.3.1/scala/http.html )? I also have a question if there is an intention for other entities (such as job servers) to communicate with the master at all? If so then the proposed gateway is semantically defined at a fairly low level (just RPC over JSON/HTTP). This is fine if the interface is not going to be exposed to anybody who is not a spark developer with detailed knowledge of spark internals. Did you use the term “REST” to simply mean RPC over JSON/HTTP? Creating a REST interface is more then a HTTP RPC gateway. If the interface is going to be exposed to 3rd parties (such as developers of Job servers and web notebooks etc) then there is a benefit to simplifying some of the exposed application semantics, and exposing an API that is more integrated with HTTP’s protocol semantics which most people are already familiar with - this is what a true REST interface does and if you are defining an endpoint for others to use it is a very powerful concept that allows other people to quickly grasp how to properly use the exposed interface. A rough sketch of a more “REST”ed version of the API would be: *Submit_driver_request* HTTP POST JSON body of request http://host:port/SparkMaster?SubmitDriver Responds with standard HTTP Response including allocated DRIVER_ID if driver submission ok, http error codes with spark specific error if not. *Get status of DRIVER* HTTP GET http://host:port/SparkMaster/Drivers/ Responds with JSON body containing information on driver execution. 
If no record of driver_id, then http error code 404 (Not found) returned. *Kill Driver request* HTTP DELETE http://host:port/SparkMaster/Drivers/ Responds with JSON body containing information on driver kill request, or http error code if an error occurs. I would be happy to prototype something like this up to test the concept out for you if you are looking for something more than just RPC over JSON/HTTP. was (Author: tigerquoll): Hi Andrew, I think the idea is well worth considering. In response to the requirement of making it easier for client<-> master communications to pass through restrictive fire-walling, have you considered just using Akka's REST gateway (http://doc.akka.io/docs/akka/1.3.1/scala/http.html)? I also have a question if there is an intention for other entities (such as job servers) to communicate with the master at all? If so then the proposed gateway is semantically defined at a fairly low level (just RPC over JSON/HTTP). This is fine if the interface is not going to be exposed to anybody who is not a spark developer with detailed knowledge of spark internals. Did you use the term “REST” to simply mean RPC over JSON/HTTP? Creating a REST interface is more then a HTTP RPC gateway. If the interface is going to be exposed to 3rd parties (such as developers of Job servers and web notebooks etc) then there is a benefit to simplifying some of the exposed application semantics, and exposing an API that is more integrated with HTTP’s protocol semantics which most people are already familiar with - this is what a true REST interface does and if you are defining an endpoint for others to use it is a very powerful concept that allows other people to quickly grasp how to properly use the exposed interface. 
A rough sketch of a more “REST”ed version of the API would be: *Submit_driver_request* HTTP POST JSON body of request http://host:port/SparkMaster?SubmitDriver Responds with standard HTTP Response including allocated DRIVER_ID if driver submission ok, http error codes with spark specific error if not. *Get status of DRIVER* HTTP GET http://host:port/SparkMaster/Drivers/ Responds with JSON body containing information on driver execution. If no record of driver_id, then http error code 404 (Not found) returned. *Kill Driver request* HTTP DELETE http://host:port/SparkMaster/Drivers/ Responds with JSON body containing information on driver kill request, or http error code if an error occurs. I would be happy to prototype something like this up to test the concept out for you if you are looking for something more than just RPC over JSON/HTTP. > Provide a stable application submission gateway > --- > > Key: SPARK-5388 > URL: https://issues.apache.org/jira/browse/SPARK-5388 > Project: Spark > Issue Type: Bug > Componen
[jira] [Commented] (SPARK-5388) Provide a stable application submission gateway
[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290847#comment-14290847 ] Dale Richardson commented on SPARK-5388: Hi Andrew, I think the idea is well worth considering. In response to the requirement of making it easier for client<-> master communications to pass through restrictive fire-walling, have you considered just using Akka's REST gateway (http://doc.akka.io/docs/akka/1.3.1/scala/http.html)? I also have a question if there is an intention for other entities (such as job servers) to communicate with the master at all? If so then the proposed gateway is semantically defined at a fairly low level (just RPC over JSON/HTTP). This is fine if the interface is not going to be exposed to anybody who is not a spark developer with detailed knowledge of spark internals. Did you use the term “REST” to simply mean RPC over JSON/HTTP? Creating a REST interface is more then a HTTP RPC gateway. If the interface is going to be exposed to 3rd parties (such as developers of Job servers and web notebooks etc) then there is a benefit to simplifying some of the exposed application semantics, and exposing an API that is more integrated with HTTP’s protocol semantics which most people are already familiar with - this is what a true REST interface does and if you are defining an endpoint for others to use it is a very powerful concept that allows other people to quickly grasp how to properly use the exposed interface. A rough sketch of a more “REST”ed version of the API would be: *Submit_driver_request* HTTP POST JSON body of request http://host:port/SparkMaster?SubmitDriver Responds with standard HTTP Response including allocated DRIVER_ID if driver submission ok, http error codes with spark specific error if not. *Get status of DRIVER* HTTP GET http://host:port/SparkMaster/Drivers/ Responds with JSON body containing information on driver execution. 
If no record of driver_id, then http error code 404 (Not found) returned. *Kill Driver request* HTTP DELETE http://host:port/SparkMaster/Drivers/ Responds with JSON body containing information on driver kill request, or http error code if an error occurs. I would be happy to prototype something like this up to test the concept out for you if you are looking for something more than just RPC over JSON/HTTP. > Provide a stable application submission gateway > --- > > Key: SPARK-5388 > URL: https://issues.apache.org/jira/browse/SPARK-5388 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Blocker > Attachments: Stable Spark Standalone Submission.pdf > > > The existing submission gateway in standalone mode is not compatible across > Spark versions. If you have a newer version of Spark submitting to an older > version of the standalone Master, it is currently not guaranteed to work. The > goal is to provide a stable REST interface to replace this channel. > The first cut implementation will target standalone cluster mode because > there are very few messages exchanged. The design, however, will be general > enough to eventually support this for other cluster managers too. Note that > this is not necessarily required in YARN because we already use YARN's stable > interface to submit applications there. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
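The submit/status/kill semantics sketched in the comment above (POST allocates an id, GET reads status, DELETE kills, unknown ids return 404) can be illustrated with a toy in-memory dispatcher; the paths follow the comment's sketch, but everything else here is hypothetical and unrelated to the eventual Spark implementation:

```python
# Toy dispatcher illustrating the comment's proposed REST semantics:
# POST creates, GET reads (404 if missing), DELETE kills.
drivers = {}
next_id = [0]

def handle(method, path, body=None):
    if method == "POST" and path == "/SparkMaster?SubmitDriver":
        next_id[0] += 1
        driver_id = "driver-%d" % next_id[0]   # id is allocated by the server,
        drivers[driver_id] = "RUNNING"         # hence POST rather than GET/DELETE
        return 200, {"driverId": driver_id}
    driver_id = path.rsplit("/", 1)[-1]
    if driver_id not in drivers:
        return 404, {"error": "no record of %s" % driver_id}
    if method == "GET":
        return 200, {"driverId": driver_id, "state": drivers[driver_id]}
    if method == "DELETE":
        drivers[driver_id] = "KILLED"
        return 200, {"driverId": driver_id, "state": "KILLED"}
    return 405, {"error": "method not allowed"}

print(handle("POST", "/SparkMaster?SubmitDriver"))    # (200, {'driverId': 'driver-1'})
print(handle("GET", "/SparkMaster/Drivers/driver-1")) # state RUNNING
print(handle("GET", "/SparkMaster/Drivers/missing"))  # 404 Not Found
```

The key point of the sketch is that error handling rides on HTTP status codes rather than a custom error envelope, which is what distinguishes a REST interface from RPC over JSON/HTTP.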
[jira] [Commented] (SPARK-4787) Resource unreleased during failure in SparkContext initialization
[ https://issues.apache.org/jira/browse/SPARK-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258974#comment-14258974 ] Dale Richardson commented on SPARK-4787: Pretty simple fix - can somebody assign this to me? > Resource unreleased during failure in SparkContext initialization > - > > Key: SPARK-4787 > URL: https://issues.apache.org/jira/browse/SPARK-4787 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Jacky Li > Fix For: 1.3.0 > > > When a client creates a SparkContext, there are currently many vals to > initialize during object initialization, but when initializing these vals > fails, e.g. by throwing an exception, the resources held by the SparkContext > are not released properly. > For example, the SparkUI object is created and bound to the HTTP server during > initialization using > {{ui.foreach(_.bind())}} > but if anything goes wrong after this code (say, an exception is thrown while > creating the DAGScheduler), the SparkUI server is not stopped, so the port bind > will fail again in the client when creating another SparkContext. Basically > this leads to a situation where the client cannot create another > SparkContext in the same process, which I think is not reasonable. > So I suggest refactoring the SparkContext code to release resources when > there is a failure during initialization. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
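The cleanup pattern the issue asks for (stop already-acquired resources when a later initialization step throws) can be sketched generically; the class and method names below are illustrative stand-ins, not Spark's actual internals:

```python
class PortInUse(Exception):
    pass

class ToyUI:
    """Stand-in for SparkUI: binding twice to the same port fails."""
    bound_ports = set()
    def __init__(self, port):
        if port in ToyUI.bound_ports:
            raise PortInUse(port)       # mimics the second bind failing
        ToyUI.bound_ports.add(port)
        self.port = port
    def stop(self):
        ToyUI.bound_ports.discard(self.port)

class ToyContext:
    """Stand-in for SparkContext: releases the UI if a later init step throws."""
    def __init__(self, fail_later=False):
        self.ui = ToyUI(4040)           # resource acquired early in init
        try:
            if fail_later:              # stands in for e.g. DAGScheduler creation failing
                raise RuntimeError("init failed after UI bind")
        except Exception:
            self.ui.stop()              # release what was already acquired
            raise

# Without the try/except above, the second context would hit PortInUse.
try:
    ToyContext(fail_later=True)
except RuntimeError:
    pass
ctx = ToyContext()                      # succeeds because the port was released
ctx.ui.stop()
```

The same shape (acquire, wrap the remaining init in try/except, release on failure, re-raise) applies to each resource the constructor touches.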
[jira] [Commented] (SPARK-3620) Refactor config option handling code for spark-submit
[ https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144095#comment-14144095 ] Dale Richardson commented on SPARK-3620: Because Typesafe Config is based on a JSON-like tree structure of config values, it will never support non-common prefixes on config variables, so I've gone back to using Properties objects. > Refactor config option handling code for spark-submit > - > > Key: SPARK-3620 > URL: https://issues.apache.org/jira/browse/SPARK-3620 > Project: Spark > Issue Type: Improvement > Components: Deploy >Affects Versions: 1.0.0, 1.1.0 >Reporter: Dale Richardson >Assignee: Dale Richardson >Priority: Minor > > I'm proposing it's time to refactor the configuration argument handling code > in spark-submit. The code has grown organically in a short period of time, > handles a pretty complicated logic flow, and is now pretty fragile. Some > issues that have been identified: > 1. Hand-crafted property file readers that do not support the property file > format as specified in > http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader) > 2. ResolveURI not called on paths read from conf/prop files > 3. Inconsistent means of merging/overriding values from different sources > (some get overridden by file, others by manually setting a field on an object, > some by properties) > 4. Argument validation should be done after combining config files, system > properties and command line arguments > 5. Alternate conf file location not handled in shell scripts > 6. Some options can only be passed as command line arguments > 7. Defaults for options are hard-coded (and sometimes overridden multiple > times) in many places throughout the code, e.g. 
master = local[*] > Initial proposal is to use typesafe conf to read in the config information > and merge the various config sources -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
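The tree-structure limitation described in the comment above can be shown with a minimal sketch, using plain Python dicts to stand in for the JSON-like tree. This is an illustration of the constraint, not the Typesafe Config API: a dotted path can name either a leaf value or an object with children, never both.

```python
# Illustrative sketch (not the Typesafe Config API) of why a JSON-like tree
# cannot hold "spark.speculation" as both a value and a parent object.

def insert(tree, dotted_key, value):
    parts = dotted_key.split(".")
    node = tree
    for part in parts[:-1]:
        child = node.setdefault(part, {})
        if not isinstance(child, dict):
            raise ValueError(f"{part!r} is already a value, cannot hold children")
        node = child
    leaf = parts[-1]
    if isinstance(node.get(leaf), dict):
        raise ValueError(f"{leaf!r} is already an object, cannot become a value")
    node[leaf] = value

tree = {}
insert(tree, "spark.speculation", "true")  # stored as a leaf value
try:
    insert(tree, "spark.speculation.interval", "100")
    conflict = None
except ValueError as e:
    conflict = str(e)

print(conflict)  # 'speculation' is already a value, cannot hold children
```

A flat Properties object has no such constraint, which is presumably why falling back to it sidesteps the problem.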
[jira] [Updated] (SPARK-3620) Refactor config option handling code for spark-submit
[ https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-3620:
---
Description: I'm proposing it's time to refactor the configuration argument handling code in spark-submit. The code has grown organically in a short period of time, handles a pretty complicated logic flow, and is now pretty fragile. Some issues that have been identified:
1. Hand-crafted property file readers that do not support the property file format as specified in http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
2. ResolveURI not called on paths read from conf/prop files
3. Inconsistent means of merging/overriding values from different sources (some get overridden by file, others by manually setting a field on an object, some by properties)
4. Argument validation should be done after combining config files, system properties and command line arguments
5. Alternate conf file location not handled in shell scripts
6. Some options can only be passed as command line arguments
7. Defaults for options are hard-coded (and sometimes overridden multiple times) in many places throughout the code, e.g. master = local[*]
Initial proposal is to use Typesafe Config to read in the config information and merge the various config sources
was: I'm proposing it's time to refactor the configuration argument handling code in spark-submit. The code has grown organically in a short period of time, handles a pretty complicated logic flow, and is now pretty fragile. Some issues that have been identified:
1. Hand-crafted property file readers that do not support the property file format as specified in http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
2. ResolveURI not called on paths read from conf/prop files
3. Inconsistent means of merging/overriding values from different sources (some get overridden by file, others by manually setting a field on an object, some by properties)
4. Argument validation should be done after combining config files, system properties and command line arguments
5. Alternate conf file location not handled in shell scripts
6. Some options can only be passed as command line arguments
7. Defaults for options are hard-coded (and sometimes overridden multiple times) in many places throughout the code, e.g. master = local[*]
> Refactor config option handling code for spark-submit
> -
>
> Key: SPARK-3620
> URL: https://issues.apache.org/jira/browse/SPARK-3620
> Project: Spark
> Issue Type: Improvement
> Components: Deploy
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Dale Richardson
> Assignee: Dale Richardson
> Priority: Minor
>
> I'm proposing it's time to refactor the configuration argument handling code in spark-submit. The code has grown organically in a short period of time, handles a pretty complicated logic flow, and is now pretty fragile. Some issues that have been identified:
> 1. Hand-crafted property file readers that do not support the property file format as specified in http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
> 2. ResolveURI not called on paths read from conf/prop files
> 3. Inconsistent means of merging/overriding values from different sources (some get overridden by file, others by manually setting a field on an object, some by properties)
> 4. Argument validation should be done after combining config files, system properties and command line arguments
> 5. Alternate conf file location not handled in shell scripts
> 6. Some options can only be passed as command line arguments
> 7. Defaults for options are hard-coded (and sometimes overridden multiple times) in many places throughout the code, e.g. master = local[*]
> Initial proposal is to use Typesafe Config to read in the config information and merge the various config sources
[jira] [Comment Edited] (SPARK-3620) Refactor config option handling code for spark-submit
[ https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142394#comment-14142394 ] Dale Richardson edited comment on SPARK-3620 at 9/21/14 9:30 AM:
-
There seems to have been discussion about moving to Typesafe Config and back again for version 0.9: http://apache-spark-developers-list.1001551.n3.nabble.com/Moving-to-Typesafe-Config-td381.html
was (Author: tigerquoll): Seems to be discussion about moving to typesafe config and back again at http://apache-spark-developers-list.1001551.n3.nabble.com/Moving-to-Typesafe-Config-td381.html
> Refactor config option handling code for spark-submit
> -
>
> Key: SPARK-3620
> URL: https://issues.apache.org/jira/browse/SPARK-3620
> Project: Spark
> Issue Type: Improvement
> Components: Deploy
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Dale Richardson
> Assignee: Dale Richardson
> Priority: Minor
>
> I'm proposing it's time to refactor the configuration argument handling code in spark-submit. The code has grown organically in a short period of time, handles a pretty complicated logic flow, and is now pretty fragile. Some issues that have been identified:
> 1. Hand-crafted property file readers that do not support the property file format as specified in http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
> 2. ResolveURI not called on paths read from conf/prop files
> 3. Inconsistent means of merging/overriding values from different sources (some get overridden by file, others by manually setting a field on an object, some by properties)
> 4. Argument validation should be done after combining config files, system properties and command line arguments
> 5. Alternate conf file location not handled in shell scripts
> 6. Some options can only be passed as command line arguments
> 7. Defaults for options are hard-coded (and sometimes overridden multiple times) in many places throughout the code, e.g. master = local[*]
[jira] [Commented] (SPARK-3620) Refactor config option handling code for spark-submit
[ https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142395#comment-14142395 ] Dale Richardson commented on SPARK-3620: Also some notes about config properties that do not have unique prefixes at http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html It seems the following options have non-unique prefixes, which means that some Typesafe Config functionality may be broken:
spark.locality.wait
spark.locality.wait.node
spark.locality.wait.process
spark.locality.wait.rack
spark.speculation
spark.speculation.interval
spark.speculation.multiplier
spark.speculation.quantile
> Refactor config option handling code for spark-submit
> -
>
> Key: SPARK-3620
> URL: https://issues.apache.org/jira/browse/SPARK-3620
> Project: Spark
> Issue Type: Improvement
> Components: Deploy
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Dale Richardson
> Assignee: Dale Richardson
> Priority: Minor
>
> I'm proposing it's time to refactor the configuration argument handling code in spark-submit. The code has grown organically in a short period of time, handles a pretty complicated logic flow, and is now pretty fragile. Some issues that have been identified:
> 1. Hand-crafted property file readers that do not support the property file format as specified in http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
> 2. ResolveURI not called on paths read from conf/prop files
> 3. Inconsistent means of merging/overriding values from different sources (some get overridden by file, others by manually setting a field on an object, some by properties)
> 4. Argument validation should be done after combining config files, system properties and command line arguments
> 5. Alternate conf file location not handled in shell scripts
> 6. Some options can only be passed as command line arguments
> 7. Defaults for options are hard-coded (and sometimes overridden multiple times) in many places throughout the code, e.g. master = local[*]
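The collision in the key list above can be checked mechanically: a key is problematic for a JSON-like config tree exactly when it is also a dotted prefix of another key, since the corresponding tree node would have to be both a value and an object. A small illustrative script (not Spark code):

```python
# Quick check that the option names listed in the comment really do collide in
# a JSON-like tree: flag any key that is a dotted prefix of another key.

keys = [
    "spark.locality.wait",
    "spark.locality.wait.node",
    "spark.locality.wait.process",
    "spark.locality.wait.rack",
    "spark.speculation",
    "spark.speculation.interval",
    "spark.speculation.multiplier",
    "spark.speculation.quantile",
]

conflicting = sorted(
    k for k in keys
    if any(other.startswith(k + ".") for other in keys if other != k)
)
print(conflicting)  # ['spark.locality.wait', 'spark.speculation']
```

So of the eight options, `spark.locality.wait` and `spark.speculation` are the two that cannot live alongside their dotted descendants in a tree-shaped store.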
[jira] [Commented] (SPARK-3620) Refactor config option handling code for spark-submit
[ https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142394#comment-14142394 ] Dale Richardson commented on SPARK-3620: Seems to be discussion about moving to typesafe config and back again at http://apache-spark-developers-list.1001551.n3.nabble.com/Moving-to-Typesafe-Config-td381.html
> Refactor config option handling code for spark-submit
> -
>
> Key: SPARK-3620
> URL: https://issues.apache.org/jira/browse/SPARK-3620
> Project: Spark
> Issue Type: Improvement
> Components: Deploy
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Dale Richardson
> Assignee: Dale Richardson
> Priority: Minor
>
> I'm proposing it's time to refactor the configuration argument handling code in spark-submit. The code has grown organically in a short period of time, handles a pretty complicated logic flow, and is now pretty fragile. Some issues that have been identified:
> 1. Hand-crafted property file readers that do not support the property file format as specified in http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
> 2. ResolveURI not called on paths read from conf/prop files
> 3. Inconsistent means of merging/overriding values from different sources (some get overridden by file, others by manually setting a field on an object, some by properties)
> 4. Argument validation should be done after combining config files, system properties and command line arguments
> 5. Alternate conf file location not handled in shell scripts
> 6. Some options can only be passed as command line arguments
> 7. Defaults for options are hard-coded (and sometimes overridden multiple times) in many places throughout the code, e.g. master = local[*]
[jira] [Updated] (SPARK-3620) Refactor config option handling code for spark-submit
[ https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Richardson updated SPARK-3620:
---
Summary: Refactor config option handling code for spark-submit (was: Refactor parameter handling code for spark-submit)
> Refactor config option handling code for spark-submit
> -
>
> Key: SPARK-3620
> URL: https://issues.apache.org/jira/browse/SPARK-3620
> Project: Spark
> Issue Type: Improvement
> Components: Deploy
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Dale Richardson
> Assignee: Dale Richardson
> Priority: Minor
>
> I'm proposing it's time to refactor the configuration argument handling code in spark-submit. The code has grown organically in a short period of time, handles a pretty complicated logic flow, and is now pretty fragile. Some issues that have been identified:
> 1. Hand-crafted property file readers that do not support the property file format as specified in http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
> 2. ResolveURI not called on paths read from conf/prop files
> 3. Inconsistent means of merging/overriding values from different sources (some get overridden by file, others by manually setting a field on an object, some by properties)
> 4. Argument validation should be done after combining config files, system properties and command line arguments
> 5. Alternate conf file location not handled in shell scripts
> 6. Some options can only be passed as command line arguments
> 7. Defaults for options are hard-coded (and sometimes overridden multiple times) in many places throughout the code, e.g. master = local[*]
[jira] [Created] (SPARK-3620) Refactor parameter handling code for spark-submit
Dale Richardson created SPARK-3620:
--
Summary: Refactor parameter handling code for spark-submit
Key: SPARK-3620
URL: https://issues.apache.org/jira/browse/SPARK-3620
Project: Spark
Issue Type: Improvement
Components: Deploy
Affects Versions: 1.1.0, 1.0.0
Reporter: Dale Richardson
Priority: Minor
I'm proposing it's time to refactor the configuration argument handling code in spark-submit. The code has grown organically in a short period of time, handles a pretty complicated logic flow, and is now pretty fragile. Some issues that have been identified:
1. Hand-crafted property file readers that do not support the property file format as specified in http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
2. ResolveURI not called on paths read from conf/prop files
3. Inconsistent means of merging/overriding values from different sources (some get overridden by file, others by manually setting a field on an object, some by properties)
4. Argument validation should be done after combining config files, system properties and command line arguments
5. Alternate conf file location not handled in shell scripts
6. Some options can only be passed as command line arguments
7. Defaults for options are hard-coded (and sometimes overridden multiple times) in many places throughout the code, e.g. master = local[*]
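Issue 3 in the list above asks for one predictable override order across config sources. A conventional scheme, sketched here in Python as an assumption rather than the precedence spark-submit actually settled on, is: defaults, then conf file, then system properties, then command-line arguments, with later sources winning.

```python
# Hedged sketch of a single consistent resolution order for issue 3:
# command-line arguments override system properties, which override the conf
# file, which overrides hard-coded defaults. Keys and values are illustrative.

defaults  = {"spark.master": "local[*]"}
conf_file = {"spark.master": "spark://host:7077", "spark.app.name": "from-file"}
sys_props = {"spark.app.name": "from-sysprop"}
cmd_line  = {}

# Later dicts win: merge lowest precedence first, highest precedence last.
resolved = {**defaults, **conf_file, **sys_props, **cmd_line}

print(resolved["spark.master"])    # spark://host:7077
print(resolved["spark.app.name"])  # from-sysprop
```

Validating arguments against `resolved`, rather than against each source separately, is exactly the ordering issue 4 asks for.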