date:20160121

[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10841#issuecomment-173537102
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10841#issuecomment-173537104
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49873/
Test PASSed.



[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10723#issuecomment-173538635
  
**[Test build #49872 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49872/consoleFull)**
 for PR 10723 at commit 
[`9922ccc`](https://github.com/apache/spark/commit/9922cccde7ecdfd5e850552b78dd5742a0a4a6a3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...

2016-01-21 Thread yanboliang

Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/10639#discussion_r50390322
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/glm/Families.scala ---
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.glm
+
+import org.apache.spark.rdd.RDD
+
+/**
+ * A description of the error distribution and link function to be used in 
the model.
+ * @param link a link function instance
+ */
+private[ml] abstract class Family(val link: Link) extends Serializable {
--- End diff --

I think ```Families``` can be used by 
[SPARK-12811](https://issues.apache.org/jira/browse/SPARK-12811), which provides an 
Estimator interface for GLMs, so I moved it to a new folder named ```glm```.
There are two ways to support GLMs here:
* Implement ```reweightFunc``` for each ```Family/Link``` directly from its 
mathematical formula.
* Implement the ```Family``` framework as I have done, plus a factory 
method that outputs the ```reweightFunc``` for the given arguments.

The former executes more efficiently; the latter is easier 
to understand. Looking forward to your comments.
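
To make the second option concrete, here is a minimal, Spark-free sketch of a ```Family``` whose factory derives the reweight function from its ```Link```. All names and signatures below are illustrative assumptions, not the code in this PR; in particular the function takes a plain `(label, eta)` pair rather than `(Instance, WeightedLeastSquaresModel)`. It uses the standard IRLS working response `z = eta + (y - mu) * g'(mu)` and working weight `w = 1 / (g'(mu)^2 * V(mu))`.

```scala
// Hypothetical sketch: names and signatures are assumptions for illustration only.
trait Link extends Serializable {
  def link(mu: Double): Double    // eta = g(mu)
  def deriv(mu: Double): Double   // g'(mu)
  def unlink(eta: Double): Double // mu = g^{-1}(eta)
}

object Log extends Link {
  def link(mu: Double): Double = math.log(mu)
  def deriv(mu: Double): Double = 1.0 / mu
  def unlink(eta: Double): Double = math.exp(eta)
}

object Logit extends Link {
  def link(mu: Double): Double = math.log(mu / (1.0 - mu))
  def deriv(mu: Double): Double = 1.0 / (mu * (1.0 - mu))
  def unlink(eta: Double): Double = 1.0 / (1.0 + math.exp(-eta))
}

abstract class Family(val link: Link) extends Serializable {
  /** Variance function V(mu) of the error distribution. */
  def variance(mu: Double): Double

  /** Factory: (label, linear predictor) => (working response, working weight). */
  def reweightFunc: (Double, Double) => (Double, Double) = { (y, eta) =>
    val mu = link.unlink(eta)
    val d = link.deriv(mu)
    (eta + (y - mu) * d, 1.0 / (d * d * variance(mu)))
  }
}

object Poisson extends Family(Log) {
  def variance(mu: Double): Double = mu
}

object Binomial extends Family(Logit) {
  def variance(mu: Double): Double = mu * (1.0 - mu)
}
```

The trade-off mentioned above shows up here: each call goes through `unlink`, `deriv` and `variance`, whereas a hand-written ```reweightFunc``` per family could fold these into one closed-form expression.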



[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10639#issuecomment-173545758
  
**[Test build #49875 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49875/consoleFull)**
 for PR 10639 at commit 
[`2191d2a`](https://github.com/apache/spark/commit/2191d2a8ee1a8def5dc942ce03718826da2f5813).



[GitHub] spark pull request: [SPARK-12401][SQL] Add integration tests for p...

2016-01-21 Thread maropu

Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/10596#issuecomment-173553803
  
@liancheng @yhuai ping



[GitHub] spark pull request: spark-6106:Support user group mapping and grou...

2016-01-21 Thread alpivonka

Github user alpivonka commented on the pull request:

https://github.com/apache/spark/pull/5325#issuecomment-173561257
  
I would like to bring up an opportunity for reuse.
Within mapred-site.xml (in our implementation) we use the 
properties below to control access to MapReduce logs, etc.
Why reinvent the wheel? Most often the users/groups for MR and Spark 
would be some or all of the same users/groups.
My suggestion is to create a common set of ACL properties shared between MR and 
Spark, instead of maintaining two separate lists:
mapreduce.cluster.acls.enabled = true
mapreduce.job.acl-view-job=mapred,hue,* 
mapreduce.job.acl-modify-job=mapred,hue,* 



[GitHub] spark pull request: [SPARK-11327] [MESOS] Dispatcher does not resp...

2016-01-21 Thread dragos

Github user dragos commented on a diff in the pull request:

https://github.com/apache/spark/pull/10370#discussion_r50402728
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
@@ -440,6 +446,9 @@ private[spark] class MesosClusterScheduler(
 .mkString(",")
   options ++= Seq("--py-files", formattedFiles)
 }
+desc.schedulerProperties
+  .filter { case (key, _) => !replicatedOptionsBlacklist.contains(key) }
+  .foreach { case (key, value) => options ++= Seq("--conf", s"""$key="$value"""") }
--- End diff --

That's a good point, `CommandInfo` is using `/bin/sh` to launch the 
command. :confused: 

Spaces should be OK, but nothing else will be escaped correctly. Skimming 
through the Spark properties, I think the only ones that could pose problems are 
`spark.authenticate.secret` and the other passwords (SSL, etc.). Still, this 
needs a solution.
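
One possible direction, sketched here as an assumption rather than what this PR does: wrap each value in single quotes before it reaches `/bin/sh`, rewriting any embedded single quote as `'\''`. Inside single quotes a POSIX shell treats every other character literally, so `$`, backticks, double quotes and spaces need no further handling. `ShellEscape.quote` is a hypothetical helper name.

```scala
// Hypothetical helper, not part of the PR: quote a value for a POSIX shell.
object ShellEscape {
  /** Wraps `value` in single quotes, rewriting each embedded ' as '\'' . */
  def quote(value: String): String =
    "'" + value.replace("'", "'\\''") + "'"
}
```

With such a helper the option could be built as `Seq("--conf", s"$key=${ShellEscape.quote(value)}")` instead of relying on double quotes.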



[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10635#issuecomment-173579793
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10635#issuecomment-173579799
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49876/
Test PASSed.



[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...

2016-01-21 Thread yanboliang

Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/10639#discussion_r50390974
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala
 ---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.optim
+
+import org.apache.spark.Logging
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.rdd.RDD
+
+/**
+ * Model fitted by [[IterativelyReweightedLeastSquares]].
+ * @param coefficients model coefficients
+ * @param intercept model intercept
+ */
+private[ml] class IterativelyReweightedLeastSquaresModel(
+val coefficients: DenseVector,
+val intercept: Double) extends Serializable
+
+/**
+ * Implements the method of iteratively reweighted least squares (IRLS) 
which is used to solve
+ * certain optimization problems by an iterative method. In each step of 
the iterations, it
+ * involves solving a weighted least squares (WLS) problem by 
[[WeightedLeastSquares]].
+ * It can be used to find maximum likelihood estimates of a generalized 
linear model (GLM),
+ * find M-estimator in robust regression and some other optimization 
problems.
+ *
+ * @param initialModel the initial guess model.
+ * @param reweightFunc the reweight function which is used to update 
offsets and weights
+ * at each iteration.
+ * @param fitIntercept whether to fit intercept.
+ * @param regParam L2 regularization parameter used by WLS.
+ * @param maxIter maximum number of iterations.
+ * @param tol the convergence tolerance.
+ */
+private[ml] class IterativelyReweightedLeastSquares(
+val initialModel: WeightedLeastSquaresModel,
+val reweightFunc: (Instance, WeightedLeastSquaresModel) => (Double, 
Double),
+val fitIntercept: Boolean,
+val regParam: Double,
+val maxIter: Int,
+val tol: Double) extends Logging with Serializable {
+
+  def fit(instances: RDD[Instance]): 
IterativelyReweightedLeastSquaresModel = {
+
+var converged = false
+var iter = 0
+
+var offsetsAndWeights: RDD[(Double, Double)] = null
--- End diff --

R's glm has an argument named ```offset```, but ```offsetsAndWeights``` is 
```private```. I hope it won't confuse users; or should we rename it to 
something better?
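
For readers unfamiliar with the loop described in the doc comment above, here is a minimal, Spark-free sketch of IRLS on the smallest case I could think of: an intercept-only Poisson GLM with a log link. It is an illustration under those assumptions, not the implementation in this PR, and `IrlsSketch` is a hypothetical name. Each iteration recomputes the working response and weight, then solves a weighted least squares problem, which for a single constant regressor reduces to a weighted mean.

```scala
// Hypothetical sketch: intercept-only Poisson regression (log link) fitted by IRLS.
object IrlsSketch {
  def fitIntercept(ys: Seq[Double], maxIter: Int = 25, tol: Double = 1e-10): Double = {
    var beta = 0.0
    var iter = 0
    var converged = false
    while (iter < maxIter && !converged) {
      val mu = math.exp(beta)
      // Reweight step: working response z = eta + (y - mu) / mu, working weight w = mu.
      val zw = ys.map(y => (beta + (y - mu) / mu, mu))
      // WLS step on a constant regressor: beta = sum(w * z) / sum(w).
      val next = zw.map { case (z, w) => w * z }.sum / zw.map(_._2).sum
      converged = math.abs(next - beta) < tol
      beta = next
      iter += 1
    }
    beta
  }
}
```

For this model the maximum likelihood estimate is the log of the sample mean, so `IrlsSketch.fitIntercept(Seq(1.0, 2.0, 3.0, 6.0))` should approach `math.log(3.0)`.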



[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10639#issuecomment-173554879
  
**[Test build #49875 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49875/consoleFull)**
 for PR 10639 at commit 
[`2191d2a`](https://github.com/apache/spark/commit/2191d2a8ee1a8def5dc942ce03718826da2f5813).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10639#issuecomment-173554998
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49875/
Test PASSed.



[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10639#issuecomment-173554996
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10635#issuecomment-173556798
  
**[Test build #49876 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49876/consoleFull)**
 for PR 10635 at commit 
[`57a57fc`](https://github.com/apache/spark/commit/57a57fcc7cc8fc5cda05f327a970c566ae620320).



[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...

2016-01-21 Thread yanboliang

Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/10639#issuecomment-173548548
  
@mengxr Thanks for your comments. Regarding the extra content in 
```WeightedLeastSquares```, such as ```diagInvAWA```: it 
will be used to generate the statistical summary of IRLS or GLM. We can discuss it 
in follow-up work. 



[GitHub] spark pull request: [SPARK-12686][SQL] Support group-by push down ...

2016-01-21 Thread maropu

Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/10631#issuecomment-173552264
  
@rxin ping



[GitHub] spark pull request: [SPARK-12476][SQL] Implement JdbcRelation#unha...

2016-01-21 Thread maropu

Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/10427#issuecomment-173552368
  
@yhuai ping



[GitHub] spark pull request: [SPARK-2827][GraphX] Add collectDegreeDist to ...

2016-01-21 Thread maropu

Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/10521#issuecomment-173553650
  
@andrewor14 ping



[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...

2016-01-21 Thread maropu

Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/10635#issuecomment-173553404
  
@marmbrus ping



[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...

2016-01-21 Thread dsdinter

Github user dsdinter commented on the pull request:

https://github.com/apache/spark/pull/9495#issuecomment-173553569
  
It seems this is an issue in OJDBC that started to happen after Oracle 11g:

http://stackoverflow.com/questions/2133679/why-would-number-columns-scale-and-or-precision-differ-in-jdbc-from-oracle-10-t




[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10841#issuecomment-173536939
  
**[Test build #49873 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49873/consoleFull)**
 for PR 10841 at commit 
[`f4100bc`](https://github.com/apache/spark/commit/f4100bc6dd165d025c92fc2853e6b3b075991791).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10723#issuecomment-173538837
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49872/
Test FAILed.



[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10723#issuecomment-173538836
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10635#issuecomment-173579529
  
**[Test build #49876 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49876/consoleFull)**
 for PR 10635 at commit 
[`57a57fc`](https://github.com/apache/spark/commit/57a57fcc7cc8fc5cda05f327a970c566ae620320).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50369722
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else 
numFeatures)
 
-if ($(fitIntercept)) {
-  /*
- For binary logistic regression, when we initialize the 
coefficients as zeros,
- it will converge faster if we initialize the intercept such 
that
- it follows the distribution of the labels.
-
- {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
- }}}, hence
- {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
- }}}
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
!= numFeatures) {
+  val vec = optInitialModel.get.coefficients
+  logWarning(
+s"Initial coefficients provided ${vec} did not match the 
expected size ${numFeatures}")
+}
+
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
== numFeatures) {
+  val initialCoefficientsWithInterceptArray = 
initialCoefficientsWithIntercept.toArray
+  optInitialModel.get.coefficients.foreachActive { case (index, 
value) =>
+initialCoefficientsWithInterceptArray(index) = value
+  }
+  if ($(fitIntercept)) {
+initialCoefficientsWithInterceptArray(numFeatures) == 
optInitialModel.get.intercept
+  }
+} else if ($(fitIntercept)) {
+  /**
+   * For binary logistic regression, when we initialize the 
coefficients as zeros,
+   * it will converge faster if we initialize the intercept such 
that
+   * it follows the distribution of the labels.
+
+   * {{{
+   * P(0) = 1 / (1 + \exp(b)), and
+   * P(1) = \exp(b) / (1 + \exp(b))
+   * }}}, hence
+   * {{{
+   * b = \log{P(1) / P(0)} = \log{count_1 / count_0}
+   * }}}
*/
-  initialCoefficientsWithIntercept.toArray(numFeatures) = math.log(
-histogram(1) / histogram(0))
+  initialCoefficientsWithIntercept.toArray(numFeatures)
+  = math.log(histogram(1) / histogram(0))
 }
 
 val states = optimizer.iterations(new CachedDiffFunction(costFun),
   initialCoefficientsWithIntercept.toBreeze.toDenseVector)
 
-/*
-   Note that in Logistic Regression, the objective history (loss + 
regularization)
-   is log-likelihood which is invariance under feature 
standardization. As a result,
-   the objective history from optimizer is the same as the one in 
the original space.
+/**
+ * Note that in Logistic Regression, the objective history (loss + 
regularization)
--- End diff --

reverse the style change
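
As an aside on the comment block touched by this hunk, the intercept initialization it describes can be checked numerically. The snippet below is a standalone sketch with hypothetical names (`InterceptInit`, `initialIntercept`, `p1`), not code from the PR: with `b = log(count_1 / count_0)`, the predicted `P(1) = exp(b) / (1 + exp(b))` equals the observed fraction of positive labels.

```scala
// Hypothetical sketch of the intercept initialization described in the diff comment.
object InterceptInit {
  /** b = log(count_1 / count_0), given the label counts. */
  def initialIntercept(count0: Double, count1: Double): Double =
    math.log(count1 / count0)

  /** P(1) = exp(b) / (1 + exp(b)) when all coefficients are zero. */
  def p1(b: Double): Double = math.exp(b) / (1.0 + math.exp(b))
}
```

For example, with 75 negative and 25 positive labels, `p1(initialIntercept(75.0, 25.0))` comes out at approximately 0.25.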



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50369730
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -374,11 +395,11 @@ class LogisticRegression @Since("1.2.0") (
   throw new SparkException(msg)
 }
 
-/*
-   The coefficients are trained in the scaled space; we're 
converting them back to
-   the original space.
-   Note that the intercept in scaled space and original space is 
the same;
-   as a result, no scaling is needed.
+/**
+ * The coefficients are trained in the scaled space; we're 
converting them back to
--- End diff --

ditto



[GitHub] spark pull request: [SPARK-12952] EMLDAOptimizer initialize() shou...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10863#issuecomment-173492042
  
**[Test build #49867 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49867/consoleFull)**
 for PR 10863 at commit 
[`a41f95d`](https://github.com/apache/spark/commit/a41f95d71c75fec493b722099b90628dc550f720).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50372397
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else 
numFeatures)
 
-if ($(fitIntercept)) {
-  /*
- For binary logistic regression, when we initialize the 
coefficients as zeros,
- it will converge faster if we initialize the intercept such 
that
- it follows the distribution of the labels.
-
- {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
- }}}, hence
- {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
- }}}
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
!= numFeatures) {
+  val vec = optInitialModel.get.coefficients
--- End diff --

It's used on L348 in the log warning.
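
For reference, the comment removed in the diff above derives the initial intercept from the label counts. A minimal sketch of that arithmetic in plain Scala, with no Spark dependency; the object and method names are made up for illustration and are not taken from the PR:

```scala
object InterceptInit {
  /** Log-odds of the positive class: b = log(count1 / count0). */
  def initialIntercept(count0: Double, count1: Double): Double = {
    require(count0 > 0 && count1 > 0, "both classes must be present")
    math.log(count1 / count0)
  }

  /** P(1) = exp(b) / (1 + exp(b)) for a model with all-zero coefficients. */
  def probabilityOfOne(b: Double): Double = math.exp(b) / (1.0 + math.exp(b))

  def main(args: Array[String]): Unit = {
    // Illustrative counts: 75 negative labels, 25 positive labels.
    val b = initialIntercept(count0 = 75.0, count1 = 25.0)
    val p1 = probabilityOfOne(b)
    println(s"b = $b, P(1) = $p1")
  }
}
```

With 25 positive labels out of 100, the intercept is log(1/3) and the predicted P(1) equals the observed label frequency of 0.25, which is the starting point the comment argues converges faster.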



[GitHub] spark pull request: [SPARK-12908][ML] Add warning message for Logi...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10862#issuecomment-173498061
  
**[Test build #49869 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49869/consoleFull)**
 for PR 10862 at commit 
[`e68cc38`](https://github.com/apache/spark/commit/e68cc38134ca78d5e8425aad4b1b5fd36c781ccc).



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-173506673
  
**[Test build #49871 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49871/consoleFull)**
 for PR 10788 at commit 
[`46ae406`](https://github.com/apache/spark/commit/46ae406e7d9935ba2d75a092e98622578fb4ce15).



[GitHub] spark pull request: [SPARK-12908][ML] Add warning message for Logi...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10862#issuecomment-173511498
  
**[Test build #49869 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49869/consoleFull)**
 for PR 10862 at commit 
[`e68cc38`](https://github.com/apache/spark/commit/e68cc38134ca78d5e8425aad4b1b5fd36c781ccc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10723#issuecomment-173525273
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49868/
Test FAILed.



[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10723#issuecomment-173525271
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12789]Support order by index

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10731#issuecomment-173493053
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12789]Support order by index

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10731#issuecomment-173493055
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49864/
Test PASSed.



[GitHub] spark pull request: [SPARK-12789]Support order by index

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10731#issuecomment-173492642
  
**[Test build #49864 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49864/consoleFull)**
 for PR 10731 at commit 
[`e61429f`](https://github.com/apache/spark/commit/e61429fec35c0f0983ff5e1bfeea11a1cef42690).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50372852
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), userSuppliedWeights = false)
+  }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, userSuppliedWeights = true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+lr.setStandardization(useFeatureScaling)
+if (userSuppliedWeights) {
+  val uid = Identifiable.randomUID("logreg-static")
+  lr.setInitialModel(new 
org.apache.spark.ml.classification.LogisticRegressionModel(
+uid, initialWeights, 1.0))
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
--- End diff --

Good point. In a previous version of the code we passed `handlePersistence` 
down to avoid this; I've updated it to do the same here.



[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10841#issuecomment-173506101
  
**[Test build #49873 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49873/consoleFull)**
 for PR 10841 at commit 
[`f4100bc`](https://github.com/apache/spark/commit/f4100bc6dd165d025c92fc2853e6b3b075991791).



[GitHub] spark pull request: [SPARK-10524][ML] Use the soft prediction to o...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8734#issuecomment-173506110
  
**[Test build #49874 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49874/consoleFull)**
 for PR 8734 at commit 
[`a37d3d8`](https://github.com/apache/spark/commit/a37d3d8fc026a7a42405b5a16814e23c6fcfa3be).



[GitHub] spark pull request: [SPARK-12953][Examples]RDDRelation writer set ...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10864#issuecomment-173506209
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50370169
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), userSuppliedWeights = false)
+  }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, userSuppliedWeights = true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+lr.setStandardization(useFeatureScaling)
+if (userSuppliedWeights) {
+  val uid = Identifiable.randomUID("logreg-static")
+  lr.setInitialModel(new 
org.apache.spark.ml.classification.LogisticRegressionModel(
+uid, initialWeights, 1.0))
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
--- End diff --

Will this cause double caching? Say the input RDD is already cached, so 
`handlePersistence` is false. Then the `df == StorageLevel.NONE` check in ml's 
LoR code will be true, and the data will be cached twice.
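
The scenario can be sketched without Spark. Everything below is a hypothetical stand-in (`Level`, `Dataset`, `trainInferring`, `trainWithFlag` are illustrative names, not Spark or PR APIs); it only models the control flow: the outer code decides from the input's storage level, while the inner trainer either re-derives the decision from the converted dataset or accepts the caller's flag.

```scala
object DoubleCachingSketch {
  // Stand-in for Spark's StorageLevel; only two levels are modelled.
  sealed trait Level
  case object NoneLevel extends Level
  case object MemoryAndDisk extends Level

  // Stand-in for an RDD or DataFrame that counts persist() calls.
  final class Dataset(var level: Level) {
    var persistCalls = 0
    def persist(): Unit = { level = MemoryAndDisk; persistCalls += 1 }
  }

  // Hypothetical inner trainer that infers the decision from its own input.
  def trainInferring(df: Dataset): Unit =
    if (df.level == NoneLevel) df.persist()

  // Hypothetical inner trainer that trusts the flag passed by the caller.
  def trainWithFlag(df: Dataset, handlePersistence: Boolean): Unit =
    if (handlePersistence) df.persist()

  /** Returns how many times the converted dataset was persisted. */
  def run(inputCached: Boolean, passFlagDown: Boolean): Int = {
    val input = new Dataset(if (inputCached) MemoryAndDisk else NoneLevel)
    // Assumption: the converted dataset starts with no storage level.
    val df = new Dataset(NoneLevel)
    val handlePersistence = input.level == NoneLevel
    if (passFlagDown) {
      trainWithFlag(df, handlePersistence)
    } else {
      if (handlePersistence) df.persist()
      trainInferring(df)
    }
    df.persistCalls
  }

  def main(args: Array[String]): Unit = {
    val inferred = run(inputCached = true, passFlagDown = false)
    val flagged = run(inputCached = true, passFlagDown = true)
    println(s"cached input, inner check: $inferred extra persist call(s)")
    println(s"cached input, flag passed down: $flagged extra persist call(s)")
  }
}
```

When the input is already cached, the inferring variant persists the converted dataset anyway, so the same data is held twice; passing the flag down leaves it untouched.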



[GitHub] spark pull request: [SPARK-12908][ML] Add warning message for Logi...

2016-01-21 Thread dbtsai

Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/10862#issuecomment-173494166
  
Jenkins, retest this please.



[GitHub] spark pull request: [SPARK-10524][ML] Use the soft prediction to o...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8734#issuecomment-173518885
  
**[Test build #49874 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49874/consoleFull)**
 for PR 8734 at commit 
[`a37d3d8`](https://github.com/apache/spark/commit/a37d3d8fc026a7a42405b5a16814e23c6fcfa3be).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10524][ML] Use the soft prediction to o...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8734#issuecomment-173519027
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-10524][ML] Use the soft prediction to o...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8734#issuecomment-173519029
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49874/
Test PASSed.



[GitHub] spark pull request: [SPARK-12755][CORE] Stop the event logger befo...

2016-01-21 Thread mallman

Github user mallman commented on the pull request:

https://github.com/apache/spark/pull/10700#issuecomment-173490509
  
Here are my current thoughts. Josh says this functionality is going to be 
removed in Spark 2.0. The bug this PR is designed to address manifests itself 
in Spark 1.5 in three ways (I'm aware of):

1. Misleading log messages from the Master (reported above).
2. Incomplete (aka "in progress") application event logs, which can be 
further divided into two scenarios:
2.a. Incomplete uncompressed event log files. The log processor can recover 
these files.
2.b. Incomplete compressed event log files. The compression output is 
truncated and unreadable by normal means. The history server reports a 
corrupted event log. I cannot definitively tie that symptom to this bug, but it 
agrees with my experience.

The most problematic of these is unrecoverable event logs. I've been 
frustrated by this before and turned off event log compression as a workaround. 
Since deploying a build with this patch to one of our dev clusters I haven't 
seen this problem again.
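
The difference between scenarios 2.a and 2.b can be illustrated with a small sketch. It uses the JDK's gzip classes purely as a stand-in (not claimed to be the codec Spark uses for event logs), and the event lines are made up; it only shows that a cut-off plain-text log still yields complete lines while a cut-off compressed stream fails when read to the end.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.util.Try

object TruncatedLogSketch {
  // Made-up event lines, one JSON object per line.
  val events: String =
    (1 to 200).map(i => s"""{"Event":"Example","id":$i}""").mkString("\n")

  def gzip(text: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    out.write(text.getBytes(UTF_8))
    out.close()
    buffer.toByteArray
  }

  /** Reads the whole stream; fails if the compressed data ends early. */
  def gunzip(bytes: Array[Byte]): Try[String] = Try {
    val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    val out = new ByteArrayOutputStream()
    val chunk = new Array[Byte](1024)
    var n = in.read(chunk)
    while (n != -1) { out.write(chunk, 0, n); n = in.read(chunk) }
    new String(out.toByteArray, UTF_8)
  }

  /** Number of complete lines in a plain-text prefix (last line may be cut). */
  def completeLines(prefix: String): Int = prefix.split('\n').length - 1

  def main(args: Array[String]): Unit = {
    val plain = events.getBytes(UTF_8)
    val compressed = gzip(events)
    // Simulate a writer that stopped before it could flush and close.
    val cutPlain = new String(plain.take(plain.length / 2), UTF_8)
    val cutCompressed = compressed.take(compressed.length / 2)
    val recovered = completeLines(cutPlain)
    val fullOk = gunzip(compressed).isSuccess
    val cutOk = gunzip(cutCompressed).isSuccess
    println(s"complete lines in truncated plain log: $recovered")
    println(s"full compressed log readable: $fullOk")
    println(s"truncated compressed log readable: $cutOk")
  }
}
```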

I don't see a simple way to write a test to support this PR.

Overall, I feel we should close this PR but keep a reference to it from 
Jira with a comment that Spark 1.5 and 1.6 users can try this patch (at their 
own risk) to address the described symptoms if they wish to. It's going into 
our own Spark 1.x builds.

I'll close this PR and the associated Jira issue within the next few days 
unless someone objects or wishes to continue discussion.

Thanks.



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50370273
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), userSuppliedWeights = false)
+  }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, userSuppliedWeights = true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+lr.setStandardization(useFeatureScaling)
+if (userSuppliedWeights) {
+  val uid = Identifiable.randomUID("logreg-static")
+  lr.setInitialModel(new 
org.apache.spark.ml.classification.LogisticRegressionModel(
+uid, initialWeights, 1.0))
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  df.persist(StorageLevel.MEMORY_AND_DISK)
+}
+// Train our model
+val mlLogisticRegresionModel = lr.train(df)
+// unpersist if we persisted
+if (handlePersistence) {
+  df.unpersist()
+}
+// convert the model
+val weights = mlLogisticRegresionModel.weights match {
--- End diff --

```scala
val weights = Vectors.dense(mlLogisticRegresionModel.coefficients.toArray)
```



[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10705#issuecomment-173490560
  
**[Test build #49861 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49861/consoleFull)**
 for PR 10705 at commit 
[`12ed084`](https://github.com/apache/spark/commit/12ed0841b5d5cf171e9db9325bf9f61f3dd8046b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10723#issuecomment-173494001
  
**[Test build #49868 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49868/consoleFull)**
 for PR 10723 at commit 
[`3e5a229`](https://github.com/apache/spark/commit/3e5a22948558e79777568b5e2f7d14f93705cf3d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10723#issuecomment-173507314
  
**[Test build #49872 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49872/consoleFull)**
 for PR 10723 at commit 
[`9922ccc`](https://github.com/apache/spark/commit/9922cccde7ecdfd5e850552b78dd5742a0a4a6a3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-173519318
  
**[Test build #49871 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49871/consoleFull)**
 for PR 10788 at commit 
[`46ae406`](https://github.com/apache/spark/commit/46ae406e7d9935ba2d75a092e98622578fb4ce15).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-173519472
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49871/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10841#issuecomment-173525219
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49870/



[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10723#issuecomment-173525124
  
**[Test build #49868 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49868/consoleFull)**
 for PR 10723 at commit 
[`3e5a229`](https://github.com/apache/spark/commit/3e5a22948558e79777568b5e2f7d14f93705cf3d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10841#issuecomment-173525217
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10841#issuecomment-173525109
  
**[Test build #49870 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49870/consoleFull)**
 for PR 10841 at commit 
[`cdfd0be`](https://github.com/apache/spark/commit/cdfd0be8ef6d4ee3b4a6656910e1a3cb049e1320).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-7997][Core]Remove Akka from Spark Core ...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10854#issuecomment-173527896
  
**[Test build #49860 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49860/consoleFull)**
 for PR 10854 at commit 
[`39f21de`](https://github.com/apache/spark/commit/39f21de507271314c1b08f9d6a9c0fc0a12396a4).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-7997][Core]Remove Akka from Spark Core ...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10854#issuecomment-173527952
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-7997][Core]Remove Akka from Spark Core ...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10854#issuecomment-173527954
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49860/



[GitHub] spark pull request: [SPARK-12904][SQL] Strength reduction for inte...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10845#issuecomment-173491009
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49863/



[GitHub] spark pull request: [SPARK-12904][SQL] Strength reduction for inte...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10845#issuecomment-173491008
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50370414
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), userSuppliedWeights = false)
+  }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, userSuppliedWeights = true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+lr.setStandardization(useFeatureScaling)
+if (userSuppliedWeights) {
+  val uid = Identifiable.randomUID("logreg-static")
+  lr.setInitialModel(new 
org.apache.spark.ml.classification.LogisticRegressionModel(
+uid, initialWeights, 1.0))
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  df.persist(StorageLevel.MEMORY_AND_DISK)
+}
+// Train our model
+val mlLogisticRegresionModel = lr.train(df)
+// unpersist if we persisted
+if (handlePersistence) {
+  df.unpersist()
+}
+// convert the model
+val weights = mlLogisticRegresionModel.weights match {
+  case x: DenseVector => x
+  case y: Vector => Vectors.dense(y.toArray)
+}
+createModel(weights, mlLogisticRegresionModel.intercept)
+  }
+  optimizer.getUpdater() match {
--- End diff --

When `optimizer.getRegParam() == 0.0`, run the old version.
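
The suggestion above could be sketched as a small decision function. This is only an illustration in plain Scala, so it runs without Spark: `ImplChoice`, `chooseImpl`, and the `knownUpdater` flag are hypothetical names that do not appear in the PR, and the reasoning in the comments (with a zero regularization parameter nothing is penalized, so there is no intercept penalty to avoid) is one reading of the comment rather than something it states. Other conditions mentioned in the diff are left out.

```scala
object ImplChoice {
  sealed trait Impl
  case object Ml extends Impl     // delegate to the org.apache.spark.ml implementation
  case object Mllib extends Impl  // keep the existing mllib code path

  // Hypothetical helper: every input is a plain value, so no Spark classes are needed.
  def chooseImpl(numOfLinearPredictor: Int, regParam: Double, knownUpdater: Boolean): Impl =
    if (numOfLinearPredictor != 1) Mllib  // the ml version only handles binary classification
    else if (regParam == 0.0) Mllib       // no regularization: run the old version
    else if (knownUpdater) Ml             // known updater: avoid penalizing the intercept
    else Mllib                            // unknown updater: default to mllib
}
```

For example, `ImplChoice.chooseImpl(1, 0.0, knownUpdater = true)` selects `Mllib`, while the same call with a regularization parameter of `0.1` selects `Ml`.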



[GitHub] spark pull request: [SPARK-12904][SQL] Strength reduction for inte...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10845#issuecomment-173490658
  
**[Test build #49863 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49863/consoleFull)**
 for PR 10845 at commit 
[`7202c54`](https://github.com/apache/spark/commit/7202c546d025fc2c5cf71856c7e64fce8e85444f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10705#issuecomment-173490745
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49861/



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai

Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-173493541
  
LGTM except for some styling issues and a concern about caching twice. Thanks.



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50371017
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), userSuppliedWeights = false)
+  }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, userSuppliedWeights = true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+lr.setStandardization(useFeatureScaling)
+if (userSuppliedWeights) {
+  val uid = Identifiable.randomUID("logreg-static")
+  lr.setInitialModel(new 
org.apache.spark.ml.classification.LogisticRegressionModel(
+uid, initialWeights, 1.0))
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  df.persist(StorageLevel.MEMORY_AND_DISK)
+}
+// Train our model
+val mlLogisticRegresionModel = lr.train(df)
+// unpersist if we persisted
+if (handlePersistence) {
+  df.unpersist()
+}
+// convert the model
+val weights = mlLogisticRegresionModel.weights match {
+  case x: DenseVector => x
+  case y: Vector => Vectors.dense(y.toArray)
+}
+createModel(weights, mlLogisticRegresionModel.intercept)
+  }
+  optimizer.getUpdater() match {
--- End diff --

Okay, this will make the test harder to write. I don't care about this one for now.



[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10841#issuecomment-173500763
  
**[Test build #49870 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49870/consoleFull)**
 for PR 10841 at commit 
[`cdfd0be`](https://github.com/apache/spark/commit/cdfd0be8ef6d4ee3b4a6656910e1a3cb049e1320).



[GitHub] spark pull request: [SPARK-12755][CORE] Stop the event logger befo...

2016-01-21 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10700#issuecomment-173505157
  
Are there downsides to merging this to master, even if the related 
functionality is about to be removed? It passes tests, seems to improve the 
ordering of shutdown, and can be backported to fix an actual minor issue in 
previous releases. Tests would be cool, but you're correct that this one could 
be really hard to trigger. I see no reason to close this?



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50372566
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else 
numFeatures)
 
-if ($(fitIntercept)) {
-  /*
- For binary logistic regression, when we initialize the 
coefficients as zeros,
- it will converge faster if we initialize the intercept such 
that
- it follows the distribution of the labels.
-
- {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
- }}}, hence
- {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
- }}}
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
!= numFeatures) {
+  val vec = optInitialModel.get.coefficients
+  logWarning(
+s"Initial coefficients provided ${vec} did not match the 
expected size ${numFeatures}")
+}
+
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
== numFeatures) {
+  val initialCoefficientsWithInterceptArray = 
initialCoefficientsWithIntercept.toArray
+  optInitialModel.get.coefficients.foreachActive { case (index, 
value) =>
+initialCoefficientsWithInterceptArray(index) = value
+  }
+  if ($(fitIntercept)) {
+initialCoefficientsWithInterceptArray(numFeatures) == 
optInitialModel.get.intercept
+  }
+} else if ($(fitIntercept)) {
+  /**
+   * For binary logistic regression, when we initialize the 
coefficients as zeros,
+   * it will converge faster if we initialize the intercept such 
that
+   * it follows the distribution of the labels.
+
--- End diff --

OK, looking at the rest of the comments in the file and the style guide, they 
seem to mostly have the `*`, so I'll put them back in (not having them also 
breaks auto-indent, but that's an Emacs bug).
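
For reference, the formula in the block comment under discussion, b = log(count_1 / count_0), can be checked with a few lines of plain Scala. This is a minimal sketch: `InterceptInit`, `initialIntercept`, and `probabilityOfOne` are illustrative names that are not taken from the PR, and the label counts in the usage note are made up.

```scala
object InterceptInit {
  // b = log(P(1) / P(0)) = log(count_1 / count_0)
  def initialIntercept(count1: Double, count0: Double): Double =
    math.log(count1 / count0)

  // P(1) = exp(b) / (1 + exp(b)) when all coefficients are zero
  def probabilityOfOne(b: Double): Double =
    math.exp(b) / (1.0 + math.exp(b))
}
```

With 30 positive and 10 negative labels the intercept is log(3), and `probabilityOfOne` of that value is approximately 0.75, the fraction of positive labels.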



[GitHub] spark pull request: fix error when run RDDRelation.main():"path fi...

2016-01-21 Thread shijinkui

GitHub user shijinkui opened a pull request:

https://github.com/apache/spark/pull/10864

fix error when run RDDRelation.main():"path file:/Users/sjk/pair.parq…

https://issues.apache.org/jira/browse/SPARK-12953

Fix the error raised when running RDDRelation.main():
"path file:/Users/sjk/pair.parquet already exists"

The fix sets DataFrameWriter's mode to SaveMode.Overwrite.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shijinkui/spark set_mode

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10864.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10864


commit 958a419877e36ad0d3987e83e56b6007937334e8
Author: shijinkui 
Date:   2016-01-21T08:56:26Z

fix error when run RDDRelation.main():"path file:/Users/sjk/pair.parquet 
already exists"

Setting DataFrameWriter's mode to `SaveMode.Overwrite`





[GitHub] spark pull request: [SPARK-12908][ML] Add warning message for Logi...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10862#issuecomment-173511839
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49869/



[GitHub] spark pull request: [SPARK-12908][ML] Add warning message for Logi...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10862#issuecomment-173511835
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-173519471
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12952] EMLDAOptimizer initialize() shou...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10863#issuecomment-173492160
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12952] EMLDAOptimizer initialize() shou...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10863#issuecomment-173492161
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49867/



[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10152#issuecomment-173607998
  
**[Test build #2431 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2431/consoleFull)**
 for PR 10152 at commit 
[`e938208`](https://github.com/apache/spark/commit/e938208d9c85515f62b41635a8445b8ab31f55f2).



[GitHub] spark pull request: improved error message for java type inference...

2016-01-21 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10865#issuecomment-173608093
  
(Link this to your JIRA -- see guidance here first for how to open a PR: 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark )



[GitHub] spark pull request: [SPARK-12534][DOC] update documentation to lis...

2016-01-21 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10491#issuecomment-173606818
  
Merged to master



[GitHub] spark pull request: improved error message for java type inference...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10865#issuecomment-173608775
  
**[Test build #2432 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2432/consoleFull)**
 for PR 10865 at commit 
[`f11f1c7`](https://github.com/apache/spark/commit/f11f1c738771339e4031c313f759fa24f3b3).



[GitHub] spark pull request: improved error message for java type inference...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10865#issuecomment-173608886
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-12760] [DOCS] inaccurate description fo...

2016-01-21 Thread srowen

GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/10866

[SPARK-12760] [DOCS] inaccurate description for difference between local vs 
cluster mode in closure handling

Clarify that modifying a driver local variable won't have the desired 
effect in cluster modes, and may or may not work as intended in local mode

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-12760

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10866.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10866


commit b62e31804685209fc0443430c9ddb32c5d5a3299
Author: Sean Owen 
Date:   2016-01-21T15:51:55Z

Clarify that modifying a driver local variable won't have the desired 
effect in cluster modes, and may or may not work as intended in local mode





[GitHub] spark pull request: [SPARK-11137][Streaming] Make StreamingContext...

2016-01-21 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10807#issuecomment-173609582
  
@felixcheung WDYT?



[GitHub] spark pull request: improved error message for java type inference...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10865#issuecomment-173609383
  
**[Test build #2432 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2432/consoleFull)**
 for PR 10865 at commit 
[`f11f1c7`](https://github.com/apache/spark/commit/f11f1c738771339e4031c313f759fa24f3b3).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `throw new UnsupportedOperationException(s\"Cannot infer type for Java class $`



[GitHub] spark pull request: SPARK-11565 Replace deprecated DigestUtils.sha...

2016-01-21 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9532#issuecomment-173609858
  
@gliptak are you able to follow up on this?



[GitHub] spark pull request: [SPARK-12534][DOC] update documentation to lis...

2016-01-21 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10491



[GitHub] spark pull request: improved error message for java type inference...

2016-01-21 Thread andygrove

GitHub user andygrove opened a pull request:

https://github.com/apache/spark/pull/10865

improved error message for java type inference failure



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/codefutures/spark SPARK-12932

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10865.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10865


commit f11f1c738771339e4031c313f759fa24f3b3
Author: Andy Grove 
Date:   2016-01-21T15:33:22Z

improved error message





[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10152#issuecomment-173621659
  
**[Test build #2431 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2431/consoleFull)**
 for PR 10152 at commit 
[`e938208`](https://github.com/apache/spark/commit/e938208d9c85515f62b41635a8445b8ab31f55f2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12760] [DOCS] inaccurate description fo...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10866#issuecomment-173623032
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49877/
Test PASSed.



[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...

2016-01-21 Thread mortada

GitHub user mortada opened a pull request:

https://github.com/apache/spark/pull/10867

[SPARK-12760] [DOCS] invalid lambda expression in python example for …

…local vs cluster

@srowen thanks for the PR at https://github.com/apache/spark/pull/10866! 
sorry it took me a while.

This is related to https://github.com/apache/spark/pull/10866. The 
assignment in the lambda expression in the Python example is invalid: a 
lambda body must be a single expression, and `counter += x` is a statement.

```
In [1]: data = [1, 2, 3, 4, 5]
In [2]: counter = 0
In [3]: rdd = sc.parallelize(data)
In [4]: rdd.foreach(lambda x: counter += x)
  File "", line 1
rdd.foreach(lambda x: counter += x)
   ^
SyntaxError: invalid syntax
``` 
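
The syntax error above can be reproduced without a Spark context. The following is a minimal sketch in plain Python; the helper name `compiles` and the two source strings are illustrative and are not part of the Spark docs or of this patch. It only checks syntax, so it says nothing about how the code behaves on a cluster.

```python
# A lambda body must be a single expression; augmented assignment (+=)
# is a statement, so the lambda form is rejected by the parser.
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# The form used in the original docs example.
lambda_version = "f = lambda x: counter += x"

# The same logic written as a named function with a global declaration.
def_version = (
    "def increment_counter(x):\n"
    "    global counter\n"
    "    counter += x\n"
)

lambda_ok = compiles(lambda_version)
def_ok = compiles(def_version)
print(lambda_ok, def_ok)
```

Only the `def` form parses, which is why the example needs a named function rather than a lambda.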

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mortada/spark doc_python_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10867.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10867


commit fc9f16a2ffb5846ecc03c4df584f611e6728573d
Author: Mortada Mehyar 
Date:   2016-01-21T16:51:28Z

[SPARK-12760] [DOCS] invalid lambda expression in python example for local 
vs cluster





[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10867#issuecomment-173643528
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49880/
Test FAILed.



[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10867#issuecomment-173643525
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-10873] Support column sort and search f...

2016-01-21 Thread tgravescs

Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/10648#issuecomment-173630521
  
Jenkins, test this please



[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...

2016-01-21 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10867#issuecomment-173635004
  
Does it still execute without error on a cluster? (even if it doesn't 
actually increment the counter in the way someone might expect.) Certainly if 
it doesn't compile we need to change this, but want to make sure the result 
with "global" executes too.



[GitHub] spark pull request: [SPARK-10873] Support column sort and search f...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10648#issuecomment-173632677
  
**[Test build #49878 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49878/consoleFull)**
 for PR 10648 at commit 
[`ad6ce01`](https://github.com/apache/spark/commit/ad6ce01e849591d152ec04bd86109cbced291e6a).



[GitHub] spark pull request: [HOTFIX][BUILD][TEST-MAVEN]Remove duplicate de...

2016-01-21 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10868#issuecomment-173638249
  
CC @JoshRosen 



[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...

2016-01-21 Thread mortada

Github user mortada commented on the pull request:

https://github.com/apache/spark/pull/10867#issuecomment-173648674
  
@srowen I tested the Python code in cluster mode (5 EC2 workers). It runs 
without error, and the counter on the driver stays at 0:

```
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block 
manager 172.31.10.56:35937 with 6.6 GB RAM, BlockManagerId(4, 172.31.10.56, 
35937)
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block 
manager 172.31.10.55:59871 with 6.6 GB RAM, BlockManagerId(0, 172.31.10.55, 
59871)
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block 
manager 172.31.10.53:39162 with 6.6 GB RAM, BlockManagerId(1, 172.31.10.53, 
39162)
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block 
manager 172.31.10.54:59145 with 6.6 GB RAM, BlockManagerId(2, 172.31.10.54, 
59145)
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block 
manager 172.31.10.57:35000 with 6.6 GB RAM, BlockManagerId(3, 172.31.10.57, 
35000)
In [1]: data = [1, 2, 3, 4, 5]

In [2]: counter = 0

In [3]: rdd = sc.parallelize(data)

In [4]: def increment_counter(x):
   ...:     global counter
   ...:     counter += x
   ...:

In [5]: rdd.foreach(increment_counter)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
on 172.31.10.55:59871 (size: 3.2 KB, free: 6.6 GB)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
on 172.31.10.56:35937 (size: 3.2 KB, free: 6.6 GB)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
on 172.31.10.57:35000 (size: 3.2 KB, free: 6.6 GB)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
on 172.31.10.53:39162 (size: 3.2 KB, free: 6.6 GB)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
on 172.31.10.54:59145 (size: 3.2 KB, free: 6.6 GB)
(other output skipped)

In [6]: print("Counter value: ", counter)
Counter value:  0
```
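
For readers wondering why the driver still prints 0, here is a rough simulation in plain Python, with no Spark involved. It assumes each executor works on its own copy of the closure's variables, which is the behaviour the docs change in https://github.com/apache/spark/pull/10866 describes. The names `driver_state`, `run_on_worker` and `partitions` are hypothetical, and `copy.deepcopy` only stands in for shipping the closure to an executor.

```python
import copy

# State that lives on the "driver".
driver_state = {"counter": 0}

def run_on_worker(state, partition):
    """Simulate one executor: it receives a copy of the driver's state."""
    worker_state = copy.deepcopy(state)  # stand-in for closure serialization
    for x in partition:
        worker_state["counter"] += x     # updates the worker's copy only
    return worker_state["counter"]

# The data [1, 2, 3, 4, 5] split into two hypothetical partitions.
partitions = [[1, 2], [3, 4, 5]]
worker_totals = [run_on_worker(driver_state, p) for p in partitions]

print("Counter value on driver:", driver_state["counter"])
print("Sum of per-worker counters:", sum(worker_totals))
```

The per-worker updates never reach the driver's variable, which matches the `Counter value:  0` output above. In PySpark itself, the usual way to get a global sum is an accumulator (`sc.accumulator(0)` with `accum.add(x)` inside `foreach`).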




[GitHub] spark pull request: [SPARK-12760] [DOCS] inaccurate description fo...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10866#issuecomment-173619459
  
**[Test build #49877 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49877/consoleFull)**
 for PR 10866 at commit 
[`b62e318`](https://github.com/apache/spark/commit/b62e31804685209fc0443430c9ddb32c5d5a3299).



[GitHub] spark pull request: [SPARK-12760] [DOCS] inaccurate description fo...

2016-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10866#issuecomment-173623028
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12760] [DOCS] inaccurate description fo...

2016-01-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10866#issuecomment-173622876
  
**[Test build #49877 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49877/consoleFull)**
 for PR 10866 at commit 
[`b62e318`](https://github.com/apache/spark/commit/b62e31804685209fc0443430c9ddb32c5d5a3299).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


