[jira] [Commented] (SPARK-1222) Logistic Regression (+ regularized variants)

2014-04-07 Thread Martin Jaggi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962007#comment-13962007
 ] 

Martin Jaggi commented on SPARK-1222:
-

This is resolved, right?

 Logistic Regression (+ regularized variants)
 

 Key: SPARK-1222
 URL: https://issues.apache.org/jira/browse/SPARK-1222
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Ameet Talwalkar
Assignee: Shivaram Venkataraman

 Implement Logistic Regression using the SGD optimization primitives.
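For context, a minimal Python sketch of the per-example SGD update for L2-regularized logistic regression (MLlib's actual implementation is Scala and uses its own optimization primitives; every name below is illustrative only):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_logistic(data, dim, lr=0.1, reg=0.01, epochs=500, seed=0):
    """Train weights by stochastic gradient descent on (x, y) pairs, y in {0, 1}."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        x, y = data[rng.randrange(len(data))]          # pick one example at random
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j in range(dim):                           # gradient of log-loss + L2 term
            w[j] -= lr * ((p - y) * x[j] + reg * w[j])
    return w
```

The regularized variants mentioned in the title swap the `reg * w[j]` term for the gradient (or proximal step) of the chosen penalty.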



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1433) Upgrade Mesos dependency to 0.17.0

2014-04-07 Thread Sandeep Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-1433:
-

Description: 
HBase 0.14.0 was released 6 months ago.
Upgrade Mesos dependency to 0.17.0

  was:
HBase 0.14.0 was released 6 months ago.
Upgrade HBase dependency to 0.17.0


 Upgrade Mesos dependency to 0.17.0
 --

 Key: SPARK-1433
 URL: https://issues.apache.org/jira/browse/SPARK-1433
 Project: Spark
  Issue Type: Task
Reporter: Sandeep Singh
Priority: Minor

 HBase 0.14.0 was released 6 months ago.
 Upgrade Mesos dependency to 0.17.0





[jira] [Updated] (SPARK-1433) Upgrade Mesos dependency to 0.17.0

2014-04-07 Thread Sandeep Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-1433:
-

Description: 
Mesos 0.14.0 was released 6 months ago.
Upgrade Mesos dependency to 0.17.0

  was:
HBase 0.14.0 was released 6 months ago.
Upgrade Mesos dependency to 0.17.0


 Upgrade Mesos dependency to 0.17.0
 --

 Key: SPARK-1433
 URL: https://issues.apache.org/jira/browse/SPARK-1433
 Project: Spark
  Issue Type: Task
Reporter: Sandeep Singh
Priority: Minor

 Mesos 0.14.0 was released 6 months ago.
 Upgrade Mesos dependency to 0.17.0





[jira] [Updated] (SPARK-1434) Make labelParser Java friendly.

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-1434:
-

Component/s: MLlib

 Make labelParser Java friendly.
 ---

 Key: SPARK-1434
 URL: https://issues.apache.org/jira/browse/SPARK-1434
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Minor
 Fix For: 1.0.0


 MLUtils#loadLibSVMData uses an anonymous function for the label parser. Java 
 users won't like it. So I made a trait LabelParser and provided two 
 implementations: binary and multiclass.
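A rough Python analogue of the change (the real trait and its implementations are Scala; the class names here are illustrative, not MLlib's API):

```python
class LabelParser:
    """Named abstraction replacing the anonymous label-parsing function."""
    def parse(self, s: str) -> float:
        raise NotImplementedError

class BinaryLabelParser(LabelParser):
    """Map any positive label to 1.0 and everything else to 0.0."""
    def parse(self, s: str) -> float:
        return 1.0 if float(s) > 0 else 0.0

class MulticlassLabelParser(LabelParser):
    """Keep the numeric label value as-is."""
    def parse(self, s: str) -> float:
        return float(s)
```

A named type like this is callable from Java without constructing Scala function objects, which is the point of the change.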





[jira] [Updated] (SPARK-1432) Potential memory leak in stageIdToExecutorSummaries in JobProgressTracker

2014-04-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1432:
---

Assignee: Davis Shepherd

 Potential memory leak in stageIdToExecutorSummaries in JobProgressTracker
 -

 Key: SPARK-1432
 URL: https://issues.apache.org/jira/browse/SPARK-1432
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 0.9.0
Reporter: Davis Shepherd
Assignee: Davis Shepherd
 Fix For: 1.0.0, 0.9.2


 JobProgressTracker continuously cleans up old metadata as per the 
 spark.ui.retainedStages configuration parameter. It seems, however, that not 
 all metadata maps are being cleaned; in particular, stageIdToExecutorSummaries 
 could grow in an unbounded manner in a long-running application.
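The fix amounts to trimming every per-stage map, not just some of them, whenever old stage metadata is retired. A hedged Python sketch of that eviction policy (names hypothetical, not Spark's code):

```python
from collections import OrderedDict

def trim_if_necessary(stage_maps, retained_stages):
    """Evict metadata for the oldest stages once more than `retained_stages`
    stage ids are tracked, mirroring how spark.ui.retainedStages is meant to
    bound the UI's bookkeeping. Every per-stage map must be included in
    `stage_maps`, or the omitted one leaks."""
    for m in stage_maps:
        while len(m) > retained_stages:
            m.pop(next(iter(m)))  # OrderedDict: the first key is the oldest
```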





[jira] [Resolved] (SPARK-1432) Potential memory leak in stageIdToExecutorSummaries in JobProgressTracker

2014-04-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1432.


   Resolution: Fixed
Fix Version/s: 0.9.2
   1.0.0

 Potential memory leak in stageIdToExecutorSummaries in JobProgressTracker
 -

 Key: SPARK-1432
 URL: https://issues.apache.org/jira/browse/SPARK-1432
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 0.9.0
Reporter: Davis Shepherd
Assignee: Davis Shepherd
 Fix For: 1.0.0, 0.9.2


 JobProgressTracker continuously cleans up old metadata as per the 
 spark.ui.retainedStages configuration parameter. It seems, however, that not 
 all metadata maps are being cleaned; in particular, stageIdToExecutorSummaries 
 could grow in an unbounded manner in a long-running application.





[jira] [Commented] (SPARK-1021) sortByKey() launches a cluster job when it shouldn't

2014-04-07 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962026#comment-13962026
 ] 

Matei Zaharia commented on SPARK-1021:
--

Note that if we do this, we'll need a similar fix in Python, which may be 
trickier.

 sortByKey() launches a cluster job when it shouldn't
 

 Key: SPARK-1021
 URL: https://issues.apache.org/jira/browse/SPARK-1021
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.8.0, 0.9.0
Reporter: Andrew Ash
  Labels: starter

 The sortByKey() method is listed as a transformation, not an action, in the 
 documentation.  But it launches a cluster job regardless.
 http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html
 Some discussion on the mailing list suggested that this is a problem with the 
 rdd.count() call inside Partitioner.scala's rangeBounds method.
 https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/Partitioner.scala#L102
 Josh Rosen suggests that rangeBounds should be made into a lazy variable:
 {quote}
 I wonder whether making RangePartitioner.rangeBounds into a lazy val would 
 fix this 
 (https://github.com/apache/incubator-spark/blob/6169fe14a140146602fb07cfcd13eee6efad98f9/core/src/main/scala/org/apache/spark/Partitioner.scala#L95).
   We'd need to make sure that rangeBounds() is never called before an action 
 is performed.  This could be tricky because it's called in the 
 RangePartitioner.equals() method.  Maybe it's sufficient to just compare the 
 number of partitions, the ids of the RDDs used to create the 
 RangePartitioner, and the sort ordering.  This still supports the case where 
 I range-partition one RDD and pass the same partitioner to a different RDD.  
 It breaks support for the case where two range partitioners created on 
 different RDDs happened to have the same rangeBounds(), but it seems unlikely 
 that this would really harm performance since it's probably unlikely that the 
 range partitioners are equal by chance.
 {quote}
 Can we please make this happen?  I'll send a PR on GitHub to start the 
 discussion and testing.
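The suggestion quoted above can be sketched in Python: defer the expensive bounds computation until first use, and compare partitioners by cheap identity fields so that equality never forces it. This is a toy model of the idea, not Spark's Scala code; the sampling "job" is stood in for by a plain callable:

```python
import functools

class RangePartitioner:
    def __init__(self, num_partitions, rdd_id, data_source, ascending=True):
        self.num_partitions = num_partitions
        self.rdd_id = rdd_id
        self.ascending = ascending
        self._source = data_source  # callable returning the keys; stands in for the RDD

    @functools.cached_property
    def range_bounds(self):
        # The expensive part: in Spark this launches a sampling job over the RDD.
        keys = sorted(self._source(), reverse=not self.ascending)
        step = max(1, len(keys) // self.num_partitions)
        return keys[step::step][: self.num_partitions - 1]

    def __eq__(self, other):
        # Compare only cheap identity fields (partition count, source RDD id,
        # ordering), so equality never triggers the sampling job -- the
        # trade-off discussed in the quote above.
        return (isinstance(other, RangePartitioner)
                and self.num_partitions == other.num_partitions
                and self.rdd_id == other.rdd_id
                and self.ascending == other.ascending)
```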





[jira] [Updated] (SPARK-1403) Mesos on Spark does not set Thread's context class loader

2014-04-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1403:
---

Summary: Mesos on Spark does not set Thread's context class loader  (was:  
java.lang.ClassNotFoundException - spark on mesos)

 Mesos on Spark does not set Thread's context class loader
 -

 Key: SPARK-1403
 URL: https://issues.apache.org/jira/browse/SPARK-1403
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
 Environment: ubuntu 12.04 on vagrant
Reporter: Bharath Bhushan
Priority: Blocker

 I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark 
 executor on the mesos slave throws a java.lang.ClassNotFoundException for 
 org.apache.spark.serializer.JavaSerializer.
 The lengthy discussion is here: 
 http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513





[jira] [Updated] (SPARK-1403) Spark on Mesos does not set Thread's context class loader

2014-04-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1403:
---

Summary: Spark on Mesos does not set Thread's context class loader  (was: 
Mesos on Spark does not set Thread's context class loader)

 Spark on Mesos does not set Thread's context class loader
 -

 Key: SPARK-1403
 URL: https://issues.apache.org/jira/browse/SPARK-1403
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
 Environment: ubuntu 12.04 on vagrant
Reporter: Bharath Bhushan
Priority: Blocker

 I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark 
 executor on the mesos slave throws a java.lang.ClassNotFoundException for 
 org.apache.spark.serializer.JavaSerializer.
 The lengthy discussion is here: 
 http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513





[jira] [Commented] (SPARK-1403) Spark on Mesos does not set Thread's context class loader

2014-04-07 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962032#comment-13962032
 ] 

Patrick Wendell commented on SPARK-1403:


The underlying issue here is that we've made assumptions in various parts of 
the codebase that the context classloader is set on a thread. In general, we 
should relax these assumptions and just fall back to the classloader that loaded 
Spark. As a workaround, this patch:

https://github.com/apache/spark/pull/322/files

just manually sets the classloader to the system class loader.

 Spark on Mesos does not set Thread's context class loader
 -

 Key: SPARK-1403
 URL: https://issues.apache.org/jira/browse/SPARK-1403
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
 Environment: ubuntu 12.04 on vagrant
Reporter: Bharath Bhushan
Priority: Blocker

 I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark 
 executor on the mesos slave throws a java.lang.ClassNotFoundException for 
 org.apache.spark.serializer.JavaSerializer.
 The lengthy discussion is here: 
 http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513





[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-04-07 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962048#comment-13962048
 ] 

Xiangrui Meng commented on SPARK-1406:
--

I think we should support PMML import/export in MLlib. PMML also provides 
feature transformations, for which MLlib has very limited support at this time. 
The questions are 1) how we leverage existing PMML packages, and 2) how many 
people volunteer.

Sean, it would be super helpful if you could share some experience with Oryx's 
PMML support, since I'm also not sure whether this is the right time to start.

 PMML model evaluation support via MLib
 --

 Key: SPARK-1406
 URL: https://issues.apache.org/jira/browse/SPARK-1406
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Thomas Darimont

 It would be useful if Spark provided support for evaluating PMML models 
 (http://www.dmg.org/v4-2/GeneralStructure.html).
 This would allow analytical models created with a statistical modeling tool 
 like R, SAS, SPSS, etc. to be used with Spark (MLlib), which would perform 
 the actual model evaluation for a given input tuple. The PMML model would 
 then just contain the parameterization of an analytical model.
 Other projects like JPMML-Evaluator do a similar thing.
 https://github.com/jpmml/jpmml/tree/master/pmml-evaluator





[jira] [Resolved] (SPARK-1218) Minibatch SGD with random sampling

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1218.
--

   Resolution: Fixed
Fix Version/s: 0.9.0

Fixed in 0.9.0 or an earlier version.

 Minibatch SGD with random sampling
 --

 Key: SPARK-1218
 URL: https://issues.apache.org/jira/browse/SPARK-1218
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Ameet Talwalkar
Assignee: Shivaram Venkataraman
 Fix For: 0.9.0


 Takes a gradient function as input.  At each iteration, we run stochastic 
 gradient descent locally on each worker with a fraction of the data points 
 selected randomly and with replacement (i.e., sampled points may overlap 
 across iterations).
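The sampling-with-replacement scheme described above can be sketched in Python (the shipped implementation is Scala and runs per-worker; this toy version runs on a local list, and all names are illustrative):

```python
import random

def minibatch_sgd(points, grad, w0, step=0.1, fraction=0.4, iters=300, seed=0):
    """Each iteration draws fraction*len(points) points WITH replacement
    (so the same point may be picked within and across iterations),
    averages the gradient over the sample, and takes a step."""
    rng = random.Random(seed)
    w = w0
    k = max(1, int(fraction * len(points)))
    for _ in range(iters):
        batch = [points[rng.randrange(len(points))] for _ in range(k)]
        g = sum(grad(w, p) for p in batch) / k
        w -= step * g
    return w
```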





[jira] [Resolved] (SPARK-1217) Add proximal gradient updater.

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1217.
--

   Resolution: Fixed
Fix Version/s: 0.9.0

 Add proximal gradient updater.
 --

 Key: SPARK-1217
 URL: https://issues.apache.org/jira/browse/SPARK-1217
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Reporter: Ameet Talwalkar
 Fix For: 0.9.0


 Add proximal gradient updater, in particular for L1 regularization.
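The proximal step for the L1 penalty is coordinate-wise soft-thresholding: take the plain gradient step, then shrink each coordinate toward zero. A small Python sketch of the math (illustrative only, not MLlib's Updater API):

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def proximal_update(weights, gradient, step, l1_reg):
    """One proximal-gradient step for L1 regularization: gradient step,
    then soft-thresholding with threshold step * l1_reg. Coordinates that
    land inside the threshold become exactly zero, which is what makes
    L1 produce sparse models."""
    return [soft_threshold(w - step * g, step * l1_reg)
            for w, g in zip(weights, gradient)]
```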





[jira] [Resolved] (SPARK-1219) Minibatch SGD with disjoint partitions

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1219.
--

Resolution: Fixed

Implemented in 0.9.0 or an earlier version.

 Minibatch SGD with disjoint partitions
 --

 Key: SPARK-1219
 URL: https://issues.apache.org/jira/browse/SPARK-1219
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Ameet Talwalkar

 Takes a gradient function as input.  At each iteration, we run stochastic 
 gradient descent locally on each worker with a fraction (alpha) of the data 
 points selected randomly and disjointly (i.e., we ensure that we touch all 
 datapoints after at most 1/alpha iterations).





[jira] [Resolved] (SPARK-1099) Allow inferring number of cores with local[*]

2014-04-07 Thread Aaron Davidson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Davidson resolved SPARK-1099.
---

Resolution: Fixed

 Allow inferring number of cores with local[*]
 -

 Key: SPARK-1099
 URL: https://issues.apache.org/jira/browse/SPARK-1099
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Reporter: Aaron Davidson
Assignee: Aaron Davidson
Priority: Minor
 Fix For: 1.0.0


 It seems reasonable that the default number of cores used by Spark's local 
 mode (when no value is specified) should be drawn from the spark.cores.max 
 configuration parameter (which, conveniently, is now settable as a 
 command-line option in spark-shell).
 For the sake of consistency, it's probable that this change would also entail 
 making the default number of cores, when spark.cores.max is NOT specified, be 
 as many logical cores as are on the machine (which is what standalone mode 
 does). This too seems reasonable, as Spark is inherently a distributed system 
 and I think it's expected that it should use multiple cores by default. 
 However, it is a behavioral change, and thus requires caution.
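A Python sketch of the master-string parsing this enables (SparkContext's actual parsing is Scala, and the exact pattern here is an assumption, not Spark's regex):

```python
import os
import re

def parse_local_master(master):
    """Infer thread count from a local master URL: 'local' -> 1,
    'local[4]' -> 4, 'local[*]' -> all logical cores on the machine."""
    if master == "local":
        return 1
    m = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if not m:
        raise ValueError("not a local master URL: %s" % master)
    return os.cpu_count() if m.group(1) == "*" else int(m.group(1))
```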


