[jira] [Comment Edited] (SPARK-1532) provide option for more restrictive firewall rule in ec2/spark_ec2.py
[ https://issues.apache.org/jira/browse/SPARK-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973819#comment-13973819 ]

Art Peel edited comment on SPARK-1532 at 4/19/14 6:11 AM:
----------------------------------------------------------
https://github.com/apache/spark/pull/445 (subsequently closed and replaced by https://github.com/apache/spark/pull/453)

was (Author: foundart):
https://github.com/apache/spark/pull/445

> provide option for more restrictive firewall rule in ec2/spark_ec2.py
> ---------------------------------------------------------------------
>
> Key: SPARK-1532
> URL: https://issues.apache.org/jira/browse/SPARK-1532
> Project: Spark
> Issue Type: Improvement
> Components: EC2
> Affects Versions: 0.9.0
> Reporter: Art Peel
> Priority: Minor
>
> When ec2/spark_ec2.py sets up firewall rules for various ports, it uses an
> extremely lenient hard-coded value for allowed IP addresses: '0.0.0.0/0'
> It would be very useful for deployments to allow specifying the allowed IP
> addresses as a command-line option to ec2/spark_ec2.py. This new
> configuration parameter should have as its default the current hard-coded
> value, '0.0.0.0/0', so the functionality of ec2/spark_ec2.py will change only
> for those users who specify the new option.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (SPARK-1532) provide option for more restrictive firewall rule in ec2/spark_ec2.py
[ https://issues.apache.org/jira/browse/SPARK-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974747#comment-13974747 ]

Art Peel commented on SPARK-1532:
---------------------------------
The original pull request failed on Travis CI due to a timeout compiling Scala code. That seems extremely unlikely to have resulted from my changes to ec2/spark_ec2.py, so I have generated a new pull request: https://github.com/apache/spark/pull/453
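The proposed change can be sketched as a command-line option whose default preserves today's behavior. This is a minimal illustration only; the option name, use of argparse, and the validation helper are assumptions here, not what the actual pull request does (spark_ec2.py has its own option-parsing conventions).

```python
import argparse
import ipaddress

def parse_args(argv):
    # Hypothetical sketch of the proposed flag; the real spark_ec2.py
    # may use different names and a different option parser.
    parser = argparse.ArgumentParser(prog="spark_ec2.py")
    parser.add_argument(
        "--authorized-address",
        default="0.0.0.0/0",  # the current hard-coded value stays the default
        help="CIDR block allowed through the generated security-group rules",
    )
    args = parser.parse_args(argv)
    # Fail fast on a malformed CIDR instead of creating a bad firewall rule.
    ipaddress.ip_network(args.authorized_address, strict=False)
    return args

# Omitting the option keeps today's wide-open behavior; passing it narrows access.
print(parse_args([]).authorized_address)                                      # 0.0.0.0/0
print(parse_args(["--authorized-address", "10.0.0.0/8"]).authorized_address)  # 10.0.0.0/8
```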
[jira] [Updated] (SPARK-1482) Potential resource leaks in saveAsHadoopDataset and saveAsNewAPIHadoopDataset
[ https://issues.apache.org/jira/browse/SPARK-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated SPARK-1482:
---------------------------------
Assignee: Shixiong Zhu

> Potential resource leaks in saveAsHadoopDataset and saveAsNewAPIHadoopDataset
> -----------------------------------------------------------------------------
>
> Key: SPARK-1482
> URL: https://issues.apache.org/jira/browse/SPARK-1482
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Shixiong Zhu
> Assignee: Shixiong Zhu
> Priority: Minor
> Labels: easyfix
> Fix For: 1.0.0
>
> "writer.close" should be put in the "finally" block to avoid potential
> resource leaks.
[jira] [Resolved] (SPARK-1482) Potential resource leaks in saveAsHadoopDataset and saveAsNewAPIHadoopDataset
[ https://issues.apache.org/jira/browse/SPARK-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia resolved SPARK-1482.
----------------------------------
Resolution: Fixed
Fix Version/s: 1.0.0

https://github.com/apache/spark/pull/400
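The pattern behind the fix is language-agnostic: put the close call in a finally block so the writer is released even when a write throws. A small sketch of that pattern (the FakeWriter and function names are illustrative, not Spark code, which is Scala):

```python
class FakeWriter:
    """Stand-in for a Hadoop record writer, just to show the pattern."""
    def __init__(self):
        self.closed = False
        self.written = []
    def write(self, rec):
        if rec is None:
            raise ValueError("bad record")
        self.written.append(rec)
    def close(self):
        self.closed = True

def save_records(records, writer):
    # The essence of the fix: close() sits in a finally block, so the
    # writer is released even if write() raises partway through.
    try:
        for rec in records:
            writer.write(rec)
    finally:
        writer.close()

w = FakeWriter()
try:
    save_records([1, None], w)
except ValueError:
    pass
print(w.closed)  # True: closed despite the failing record
```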
[jira] [Updated] (SPARK-1538) SparkUI forgets about all persisted RDD's not directly associated with stages
[ https://issues.apache.org/jira/browse/SPARK-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-1538:
-----------------------------
Summary: SparkUI forgets about all persisted RDD's not directly associated with stages (was: SparkUI forgets about all persisted RDD's not associated with stages)

> SparkUI forgets about all persisted RDD's not directly associated with stages
> -----------------------------------------------------------------------------
>
> Key: SPARK-1538
> URL: https://issues.apache.org/jira/browse/SPARK-1538
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 0.9.1
> Reporter: Andrew Or
> Priority: Blocker
> Fix For: 1.0.0
>
> The following command creates two RDDs in one Stage:
> sc.parallelize(1 to 1000, 4).persist.map(_ + 1).count
> More specifically, parallelize creates one, and map creates another. If we
> persist only the first one, it does not actually show up on the StorageTab of
> the SparkUI.
> This is because StageInfo only keeps around information for the last RDD
> associated with the stage, but forgets about all of its parents. The proposal
> here is to have StageInfo climb the RDD dependency ladder to keep a list of
> all associated RDDInfos.
[jira] [Updated] (SPARK-1538) SparkUI forgets about all persisted RDD's not directly associated with the Stage
[ https://issues.apache.org/jira/browse/SPARK-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-1538:
-----------------------------
Summary: SparkUI forgets about all persisted RDD's not directly associated with the Stage (was: SparkUI forgets about all persisted RDD's not directly associated with stages)
[jira] [Created] (SPARK-1538) SparkUI forgets about all persisted RDD's not associated with stages
Andrew Or created SPARK-1538:
-----------------------------

Summary: SparkUI forgets about all persisted RDD's not associated with stages
Key: SPARK-1538
URL: https://issues.apache.org/jira/browse/SPARK-1538
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 0.9.1
Reporter: Andrew Or
Priority: Blocker
Fix For: 1.0.0

The following command creates two RDDs in one Stage:

sc.parallelize(1 to 1000, 4).persist.map(_ + 1).count

More specifically, parallelize creates one, and map creates another. If we persist only the first one, it does not actually show up on the StorageTab of the SparkUI.

This is because StageInfo only keeps around information for the last RDD associated with the stage, but forgets about all of its parents. The proposal here is to have StageInfo climb the RDD dependency ladder to keep a list of all associated RDDInfos.
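"Climbing the RDD dependency ladder" amounts to a graph walk from the stage's last RDD through its parent dependencies, collecting info for every ancestor instead of only the final RDD. A toy sketch of that traversal, assuming a made-up dict representation of RDDs (Spark's actual StageInfo/RDDInfo classes are Scala and look different):

```python
def collect_rdd_infos(last_rdd):
    # Walk parent dependencies from the stage's last RDD, collecting
    # every ancestor's info. The dict shape here is purely illustrative.
    seen, stack, infos = set(), [last_rdd], []
    while stack:
        rdd = stack.pop()
        if rdd["id"] in seen:
            continue
        seen.add(rdd["id"])
        infos.append(rdd["id"])
        stack.extend(rdd["parents"])
    return infos

# Mirrors the example: parallelize creates one RDD (the persisted one),
# and map creates another on top of it.
parallelized = {"id": 0, "parents": []}
mapped = {"id": 1, "parents": [parallelized]}
print(sorted(collect_rdd_infos(mapped)))  # [0, 1]: the persisted parent is no longer lost
```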
[jira] [Created] (SPARK-1537) Add integration with Yarn's Application Timeline Server
Marcelo Vanzin created SPARK-1537:
----------------------------------

Summary: Add integration with Yarn's Application Timeline Server
Key: SPARK-1537
URL: https://issues.apache.org/jira/browse/SPARK-1537
Project: Spark
Issue Type: New Feature
Components: YARN
Reporter: Marcelo Vanzin

It would be nice to have Spark integrate with Yarn's Application Timeline Server (see YARN-321, YARN-1530). This would allow users running Spark on Yarn to have a single place to go for all their history needs, and avoid having to manage a separate service (Spark's built-in server).

At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, although there is still some ongoing work. But the basics are there, and I wouldn't expect them to change (much) at this point.
[jira] [Updated] (SPARK-1536) Add multiclass classification support to MLlib
[ https://issues.apache.org/jira/browse/SPARK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manish Amde updated SPARK-1536:
-------------------------------
Description:
The current decision tree implementation in MLlib only supports binary classification. This task involves adding multiclass classification support to the decision tree implementation.

The task involves:
- Choosing a good strategy for multiclass classification among multiple options:
-- adding multiclass support to impurity, though it won't work well with categorical features since the centroid-based ordering assumptions won't hold true
-- error-correcting output codes
-- one-vs-all
- Code implementation
- Unit tests
- Functional tests
- Performance tests
- Documentation

was:
The current decision tree implementation in MLlib only supports binary classification. This task involves adding multiclass classification support to the decision tree implementation.

The task involves:
- Finding the best strategy for multiclass classification among multiple options:
-- adding multiclass support to impurity, though it won't work well with categorical features since the centroid-based ordering assumptions won't hold true
-- error-correcting output codes
-- one-vs-all
- Code implementation
- Unit tests
- Functional tests
- Performance tests
- Documentation

> Add multiclass classification support to MLlib
> ----------------------------------------------
>
> Key: SPARK-1536
> URL: https://issues.apache.org/jira/browse/SPARK-1536
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Affects Versions: 0.9.0
> Reporter: Manish Amde
>
> The current decision tree implementation in MLlib only supports binary
> classification. This task involves adding multiclass classification support
> to the decision tree implementation.
[jira] [Updated] (SPARK-1536) Add multiclass classification support to MLlib
[ https://issues.apache.org/jira/browse/SPARK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manish Amde updated SPARK-1536:
-------------------------------
Description:
The current decision tree implementation in MLlib only supports binary classification. This task involves adding multiclass classification support to the decision tree implementation.

The task involves:
+ Finding the best strategy for multiclass classification among multiple options:
- adding multiclass support to impurity, though it won't work well with categorical features since the centroid-based ordering assumptions won't hold true
- error-correcting output codes
- one-vs-all
+ Code implementation
+ Unit tests
+ Functional tests
+ Performance tests
+ Documentation

was:
The current decision tree implementation in MLlib only supports binary classification. This task involves adding multiclass classification support to the decision tree implementation.
[jira] [Updated] (SPARK-1536) Add multiclass classification support to MLlib
[ https://issues.apache.org/jira/browse/SPARK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manish Amde updated SPARK-1536:
-------------------------------
Description:
The current decision tree implementation in MLlib only supports binary classification. This task involves adding multiclass classification support to the decision tree implementation.

The task involves:
- Finding the best strategy for multiclass classification among multiple options:
-- adding multiclass support to impurity, though it won't work well with categorical features since the centroid-based ordering assumptions won't hold true
-- error-correcting output codes
-- one-vs-all
- Code implementation
- Unit tests
- Functional tests
- Performance tests
- Documentation

was:
The current decision tree implementation in MLlib only supports binary classification. This task involves adding multiclass classification support to the decision tree implementation.

The task involves:
- Finding the best strategy for multiclass classification among multiple options:
-- adding multiclass support to impurity, though it won't work well with categorical features since the centroid-based ordering assumptions won't hold true
- error-correcting output codes
- one-vs-all
+ Code implementation
+ Unit tests
+ Functional tests
+ Performance tests
+ Documentation
[jira] [Updated] (SPARK-1536) Add multiclass classification support to MLlib
[ https://issues.apache.org/jira/browse/SPARK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manish Amde updated SPARK-1536:
-------------------------------
Description:
The current decision tree implementation in MLlib only supports binary classification. This task involves adding multiclass classification support to the decision tree implementation.

The task involves:
- Finding the best strategy for multiclass classification among multiple options:
- adding multiclass support to impurity, though it won't work well with categorical features since the centroid-based ordering assumptions won't hold true
- error-correcting output codes
- one-vs-all
+ Code implementation
+ Unit tests
+ Functional tests
+ Performance tests
+ Documentation

was:
The current decision tree implementation in MLlib only supports binary classification. This task involves adding multiclass classification support to the decision tree implementation.

The task involves:
+ Finding the best strategy for multiclass classification among multiple options:
- adding multiclass support to impurity, though it won't work well with categorical features since the centroid-based ordering assumptions won't hold true
- error-correcting output codes
- one-vs-all
+ Code implementation
+ Unit tests
+ Functional tests
+ Performance tests
+ Documentation
[jira] [Updated] (SPARK-1536) Add multiclass classification support to MLlib
[ https://issues.apache.org/jira/browse/SPARK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manish Amde updated SPARK-1536:
-------------------------------
Description:
The current decision tree implementation in MLlib only supports binary classification. This task involves adding multiclass classification support to the decision tree implementation.

The task involves:
- Finding the best strategy for multiclass classification among multiple options:
-- adding multiclass support to impurity, though it won't work well with categorical features since the centroid-based ordering assumptions won't hold true
- error-correcting output codes
- one-vs-all
+ Code implementation
+ Unit tests
+ Functional tests
+ Performance tests
+ Documentation

was:
The current decision tree implementation in MLlib only supports binary classification. This task involves adding multiclass classification support to the decision tree implementation.

The task involves:
- Finding the best strategy for multiclass classification among multiple options:
- adding multiclass support to impurity, though it won't work well with categorical features since the centroid-based ordering assumptions won't hold true
- error-correcting output codes
- one-vs-all
+ Code implementation
+ Unit tests
+ Functional tests
+ Performance tests
+ Documentation
[jira] [Created] (SPARK-1536) Add multiclass classification support to MLlib
Manish Amde created SPARK-1536:
-------------------------------

Summary: Add multiclass classification support to MLlib
Key: SPARK-1536
URL: https://issues.apache.org/jira/browse/SPARK-1536
Project: Spark
Issue Type: New Feature
Components: MLlib
Affects Versions: 0.9.0
Reporter: Manish Amde

The current decision tree implementation in MLlib only supports binary classification. This task involves adding multiclass classification support to the decision tree implementation.
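One of the strategies listed above, one-vs-all, reduces multiclass classification to one binary problem per class. A toy sketch of the reduction, assuming any binary learner is plugged in (the helper names and the distance-to-mean "learner" are illustrative only, not MLlib's API):

```python
def train_one_vs_all(examples, labels, train_binary):
    # Train one binary model per class: positives are that class,
    # negatives are everything else.
    models = {}
    for c in sorted(set(labels)):
        binary = [1 if y == c else 0 for y in labels]
        models[c] = train_binary(examples, binary)
    return models

def predict(models, x):
    # Pick the class whose binary model scores the example highest.
    return max(models, key=lambda c: models[c](x))

def train_binary(xs, ys):
    # Trivial stand-in learner: score by distance to the mean positive.
    pos = [x for x, y in zip(xs, ys) if y == 1]
    m = sum(pos) / len(pos)
    return lambda x: -abs(x - m)

examples = [0.0, 0.1, 5.0, 5.1, 10.0]
labels = [0, 0, 1, 1, 2]
models = train_one_vs_all(examples, labels, train_binary)
print(predict(models, 0.05), predict(models, 9.5))  # 0 2
```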
[jira] [Commented] (SPARK-1229) train on array (in addition to RDD)
[ https://issues.apache.org/jira/browse/SPARK-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974521#comment-13974521 ]

Aliaksei Litouka commented on SPARK-1229:
-----------------------------------------
May I start working on this issue? Please assign it to me.

> train on array (in addition to RDD)
> -----------------------------------
>
> Key: SPARK-1229
> URL: https://issues.apache.org/jira/browse/SPARK-1229
> Project: Spark
> Issue Type: Story
> Components: MLlib
> Reporter: Arshak Navruzyan
>
> Since the predict method accepts either an RDD or an Array, for consistency
> so should train (particularly since RDD.takeSample() returns an Array).
[jira] [Resolved] (SPARK-1184) Update the distribution tar.gz to include spark-assembly jar
[ https://issues.apache.org/jira/browse/SPARK-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Grover resolved SPARK-1184.
--------------------------------
Resolution: Fixed

> Update the distribution tar.gz to include spark-assembly jar
> ------------------------------------------------------------
>
> Key: SPARK-1184
> URL: https://issues.apache.org/jira/browse/SPARK-1184
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 0.9.0
> Reporter: Mark Grover
> Assignee: Mark Grover
> Fix For: 0.9.0
>
> This JIRA tracks 3 things:
> 1. There seems to be something going on in our assembly generation logic
> because of which there are two assembly jars. Something like:
> {code}spark-assembly_2.10-1.0.0-SNAPSHOT.jar{code}
> and
> {code}spark-assembly_2.10-1.0.0-SNAPSHOT-hadoop2.0.5-alpha.jar{code}
> The former is pretty bogus and doesn't contain any class files and should be
> gotten rid of. The latter contains all the good stuff; it is essentially the
> uber jar generated by the maven-shade-plugin.
> 2. The current bigtop-dist profile that builds the maven assembly (a .tar.gz
> file) using the maven-assembly-plugin includes the bogus jar and not the
> legit spark-assembly jar. We should get rid of the first one from this
> assembly (which would happen when we fix #1) and put the legit uber jar in it.
> 3. Also, the bigtop-dist profile is meant to exclude the hadoop-related jars
> from the distribution. It does a good job of doing so for org.apache.hadoop
> jars but misses the avro and zookeeper jars that are also provided by hadoop
> land.
[jira] [Commented] (SPARK-1184) Update the distribution tar.gz to include spark-assembly jar
[ https://issues.apache.org/jira/browse/SPARK-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974475#comment-13974475 ]

Mark Grover commented on SPARK-1184:
------------------------------------
Committed quite a while ago: https://github.com/apache/spark/commit/cda381f88cc03340fdf7b2d681699babbae2a56e

Resolving.
[jira] [Reopened] (SPARK-1459) EventLoggingListener does not work with "file://" target dir
[ https://issues.apache.org/jira/browse/SPARK-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin reopened SPARK-1459:
-----------------------------------
Sorry, got confused. This PR is still pending.

> EventLoggingListener does not work with "file://" target dir
> ------------------------------------------------------------
>
> Key: SPARK-1459
> URL: https://issues.apache.org/jira/browse/SPARK-1459
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.0.0
> Reporter: Marcelo Vanzin
>
> Bug is simple; FileLogger tries to pass a URL to FileOutputStream's
> constructor, and that fails. I'll upload a PR soon.
[jira] [Resolved] (SPARK-1459) EventLoggingListener does not work with "file://" target dir
[ https://issues.apache.org/jira/browse/SPARK-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin resolved SPARK-1459.
-----------------------------------
Resolution: Fixed

This was commit 69047506. (If someone with permissions could set me as the assignee, that would be great.)
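The underlying bug class is worth spelling out: file-opening APIs such as Java's FileOutputStream (or Python's open) expect a filesystem path, not a "file://" URL, so the scheme must be stripped first. A minimal sketch of that idea in Python; the helper name is illustrative and this is not the actual Scala fix:

```python
from urllib.parse import urlparse

def to_local_path(log_dir):
    # Accept both a plain path and a file:// URL, returning a path that
    # a file-opening API can actually use. Illustrative helper only.
    parsed = urlparse(log_dir)
    if parsed.scheme in ("", "file"):
        return parsed.path or log_dir
    raise ValueError("not a local filesystem target: %s" % log_dir)

print(to_local_path("file:///tmp/spark-events"))  # /tmp/spark-events
print(to_local_path("/tmp/spark-events"))         # /tmp/spark-events (unchanged)
```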
[jira] [Created] (SPARK-1535) jblas's DoubleMatrix(double[]) ctor creates garbage; avoid
Tor Myklebust created SPARK-1535:
---------------------------------

Summary: jblas's DoubleMatrix(double[]) ctor creates garbage; avoid
Key: SPARK-1535
URL: https://issues.apache.org/jira/browse/SPARK-1535
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 0.9.0
Reporter: Tor Myklebust
Priority: Trivial

The DoubleMatrix constructor that wraps a double[] and presents it as a row vector in jblas-1.2.3 new's a double[] and then immediately discards it. It is straightforward to replace uses of this constructor with the (int, int, double...) constructor; perhaps this should be done until jblas-1.2.4 is released.
[jira] [Resolved] (SPARK-1523) improve the readability of code in AkkaUtil
[ https://issues.apache.org/jira/browse/SPARK-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu resolved SPARK-1523.
----------------------------
Resolution: Fixed
Fix Version/s: 1.1.0

> improve the readability of code in AkkaUtil
> -------------------------------------------
>
> Key: SPARK-1523
> URL: https://issues.apache.org/jira/browse/SPARK-1523
> Project: Spark
> Issue Type: Improvement
> Reporter: Nan Zhu
> Assignee: Nan Zhu
> Priority: Trivial
> Fix For: 1.1.0
>
> Actually it is separated from https://github.com/apache/spark/pull/85 as
> suggested by Reynold.
> Compare
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/AkkaUtils.scala#L122
> and
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/AkkaUtils.scala#L117:
> the first one uses get and then toLong, the second one uses getLong. Better
> to make them consistent.
> Very, very small fix.
[jira] [Resolved] (SPARK-1483) Rename minSplits to minPartitions in public APIs
[ https://issues.apache.org/jira/browse/SPARK-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu resolved SPARK-1483.
----------------------------
Resolution: Fixed

> Rename minSplits to minPartitions in public APIs
> ------------------------------------------------
>
> Key: SPARK-1483
> URL: https://issues.apache.org/jira/browse/SPARK-1483
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Matei Zaharia
> Assignee: Nan Zhu
> Priority: Critical
> Fix For: 1.0.0
>
> The parameter name is part of the public API in Scala and Python, since you
> can pass named parameters to a method, so we should rename it to this more
> descriptive term. Everywhere else we refer to "splits" as partitions.
[jira] [Created] (SPARK-1534) spark-submit for yarn prints warnings even though calling as expected
Thomas Graves created SPARK-1534:
---------------------------------

Summary: spark-submit for yarn prints warnings even though calling as expected
Key: SPARK-1534
URL: https://issues.apache.org/jira/browse/SPARK-1534
Project: Spark
Issue Type: Bug
Components: YARN
Reporter: Thomas Graves

I am calling spark-submit to submit an application to spark on yarn (cluster mode) and it is still printing warnings:

$ ./bin/spark-submit examples/target/scala-2.10/spark-examples_2.10-assembly-1.0.0-SNAPSHOT.jar --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi --arg yarn-cluster --properties-file ./spark-conf.properties

WARNING: This client is deprecated and will be removed in a future version of Spark. Use ./bin/spark-submit with "--master yarn"
--args is deprecated. Use --arg instead.

The "--args is deprecated" warning is coming out because SparkSubmit itself needs to be updated to use --arg. Similarly, I think the Client.scala class for yarn needs to have the "Use ./bin/spark-submit with --master yarn" warning removed, since SparkSubmit also calls it directly. I think that last warning was meant for users invoking spark-class directly.
[jira] [Updated] (SPARK-1485) Implement AllReduce
[ https://issues.apache.org/jira/browse/SPARK-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-1485:
---------------------------------
Affects Version/s: (was: 1.0.0)

> Implement AllReduce
> -------------------
>
> Key: SPARK-1485
> URL: https://issues.apache.org/jira/browse/SPARK-1485
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Xiangrui Meng
> Assignee: Xiangrui Meng
>
> The current implementations of machine learning algorithms rely on the driver
> for some computation and data broadcasting. This will create a bottleneck at
> the driver for both computation and communication, especially in multi-model
> training. An efficient implementation of AllReduce (or AllAggregate) can help
> free the driver:
> allReduce(RDD[T], (T, T) => T): RDD[T]
> This JIRA is created for discussing how to implement AllReduce efficiently
> and possible alternatives.
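The signature allReduce(RDD[T], (T, T) => T): RDD[T] means: combine values with an associative operator and leave the combined result visible to every partition. A toy model of those semantics over plain lists of partitions; this is only a sketch of what the operation computes, not of the efficient driver-free implementation the issue asks for:

```python
from functools import reduce

def all_reduce(partitions, op):
    # Reduce within each partition, combine the partial results, and
    # hand the total back to every partition. A real implementation
    # would do the combine step peer-to-peer (e.g., a tree or ring)
    # instead of in one place, which is the whole point of the issue.
    partials = [reduce(op, part) for part in partitions]
    total = reduce(op, partials)
    return [total for _ in partitions]

# Every partition ends up holding the full reduction.
print(all_reduce([[1, 2], [3], [4, 5]], lambda a, b: a + b))  # [15, 15, 15]
```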
[jira] [Updated] (SPARK-1533) The (kill) button in the web UI is visible to everyone.
[ https://issues.apache.org/jira/browse/SPARK-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-1533:
---------------------------------
Priority: Blocker (was: Major)

> The (kill) button in the web UI is visible to everyone.
> -------------------------------------------------------
>
> Key: SPARK-1533
> URL: https://issues.apache.org/jira/browse/SPARK-1533
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Xiangrui Meng
> Priority: Blocker
>
> We can kill jobs from the web UI now, which is great. But there is no
> authentication in standalone mode, e.g., on clusters created by spark-ec2,
> so anyone can visit a standalone server and kill jobs.
[jira] [Created] (SPARK-1533) The (kill) button in the web UI is visible to everyone.
Xiangrui Meng created SPARK-1533:
---------------------------------

Summary: The (kill) button in the web UI is visible to everyone.
Key: SPARK-1533
URL: https://issues.apache.org/jira/browse/SPARK-1533
Project: Spark
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Xiangrui Meng

We can kill jobs from the web UI now, which is great. But there is no authentication in standalone mode, e.g., on clusters created by spark-ec2, so anyone can visit a standalone server and kill jobs.