[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10841#issuecomment-173537102 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10841#issuecomment-173537104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49873/ Test PASSed.
[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10723#issuecomment-173538635 **[Test build #49872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49872/consoleFull)** for PR 10723 at commit [`9922ccc`](https://github.com/apache/spark/commit/9922cccde7ecdfd5e850552b78dd5742a0a4a6a3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/10639#discussion_r50390322 --- Diff: mllib/src/main/scala/org/apache/spark/ml/glm/Families.scala --- @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.glm + +import org.apache.spark.rdd.RDD + +/** + * A description of the error distribution and link function to be used in the model. + * @param link a link function instance + */ +private[ml] abstract class Family(val link: Link) extends Serializable { --- End diff -- I think ```Families``` can be used by [SPARK-12811](https://issues.apache.org/jira/browse/SPARK-12811), which provides an Estimator interface for GLMs, so I moved it to a new folder named ```glm```. Here we have two ways to support GLMs: * Implement ```reweightFunc``` for each ```Family/Link``` directly based on the mathematical formula. * Implement the ```Family``` framework as I have done, plus a factory method that outputs ```reweightFunc``` according to its argument. The former has better execution efficiency; the latter is easier to understand. Looking forward to your comments.
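The second option in the comment above (a ```Family``` framework plus a factory that produces ```reweightFunc```) could look roughly like the sketch below. It is Spark-free and every name in it (`LogitLink`, `Binomial`, `Reweight`) is hypothetical; the only thing taken from the PR is the idea of a `Family` that holds a `Link`. The PR's `reweightFunc` takes an `Instance` and a `WeightedLeastSquaresModel`; this sketch simplifies that to a label and a linear predictor, and uses the standard IRLS working response `z = eta + (y - mu) * g'(mu)` and working weight `w = 1 / (V(mu) * g'(mu)^2)`.

```scala
// Hypothetical sketch: these names are illustrative, not the PR's API.
trait Link {
  def deriv(mu: Double): Double   // g'(mu)
  def unlink(eta: Double): Double // inverse link, g^{-1}(eta)
}

object LogitLink extends Link {
  def deriv(mu: Double): Double = 1.0 / (mu * (1.0 - mu))
  def unlink(eta: Double): Double = 1.0 / (1.0 + math.exp(-eta))
}

abstract class Family(val link: Link) {
  def variance(mu: Double): Double // variance function V(mu)
}

object Binomial extends Family(LogitLink) {
  def variance(mu: Double): Double = mu * (1.0 - mu)
}

object Reweight {
  // Factory: given a family, build a function that maps
  // (label y, linear predictor eta) to (working response z, working weight w).
  def reweightFunc(family: Family): (Double, Double) => (Double, Double) = {
    (y: Double, eta: Double) =>
      val mu = family.link.unlink(eta)
      val d = family.link.deriv(mu)
      val z = eta + (y - mu) * d
      val w = 1.0 / (family.variance(mu) * d * d)
      (z, w)
  }
}
```

With this shape, adding a new family means supplying a variance function and a link rather than deriving a dedicated reweight formula, which is the readability argument made in the comment; the extra indirection is where the efficiency cost would come from.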
[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10639#issuecomment-173545758 **[Test build #49875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49875/consoleFull)** for PR 10639 at commit [`2191d2a`](https://github.com/apache/spark/commit/2191d2a8ee1a8def5dc942ce03718826da2f5813).
[GitHub] spark pull request: [SPARK-12401][SQL] Add integration tests for p...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10596#issuecomment-173553803 @liancheng @yhuai ping
[GitHub] spark pull request: spark-6106:Support user group mapping and grou...
Github user alpivonka commented on the pull request: https://github.com/apache/spark/pull/5325#issuecomment-173561257 I would like to bring up an opportunity for reuse. Within mapred-site.xml (for our implementation) we use the properties listed below to allow access to MapReduce logs, etc. Why reinvent the wheel? Most often the users/groups for MR and Spark would be some or all of the same users/groups. My suggestion is to create a common set of ACL properties shared between MR and Spark instead of maintaining two separate lists. mapreduce.cluster.acls.enabled = true mapreduce.job.acl-view-job=mapred,hue,* mapreduce.job.acl-modify-job=mapred,hue,*
[GitHub] spark pull request: [SPARK-11327] [MESOS] Dispatcher does not resp...
Github user dragos commented on a diff in the pull request: https://github.com/apache/spark/pull/10370#discussion_r50402728 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -440,6 +446,9 @@ private[spark] class MesosClusterScheduler( .mkString(",") options ++= Seq("--py-files", formattedFiles) } +desc.schedulerProperties + .filter { case (key, _) => !replicatedOptionsBlacklist.contains(key) } + .foreach { case (key, value) => options ++= Seq("--conf", s"""$key="$value) } --- End diff -- That's a good point, `CommandInfo` is using `/bin/sh` to launch the command. :confused: Spaces should be ok, everything else won't be correctly escaped. Skimming through Spark properties I think the only ones that could pose problems are `spark.authenticate.secret` and the other passwords (SSL, etc.). Still, this needs a solution.
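One common way to make arbitrary values safe for a command line run through `/bin/sh` is POSIX single-quote escaping: wrap the argument in single quotes and rewrite each embedded single quote as `'\''`. The sketch below illustrates that technique only. It is not the fix adopted in this PR, and `ShellQuote`, `quote`, and `confArgs` are hypothetical names introduced for the example.

```scala
// Hypothetical helper, not part of Spark: POSIX sh single-quote escaping.
object ShellQuote {
  // Inside single quotes sh treats every character literally, so the only
  // character needing care is the single quote itself: close the quoted
  // string, emit an escaped quote, and reopen it.
  def quote(s: String): String = "'" + s.replace("'", "'\\''") + "'"

  // Build "--conf key=value" argument pairs with each pair quoted as one word.
  // Sorting by key only makes the output deterministic for the example.
  def confArgs(props: Map[String, String]): Seq[String] =
    props.toSeq.sortBy(_._1).flatMap { case (k, v) =>
      Seq("--conf", quote(s"$k=$v"))
    }
}
```

Quoting the whole `key=value` pair as one word means secrets containing `$`, backticks, or double quotes would reach the launched process unchanged, which is the case the comment above is worried about.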
[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10635#issuecomment-173579793 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10635#issuecomment-173579799 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49876/ Test PASSed.
[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/10639#discussion_r50390974 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.optim + +import org.apache.spark.Logging +import org.apache.spark.ml.feature.Instance +import org.apache.spark.mllib.linalg._ +import org.apache.spark.rdd.RDD + +/** + * Model fitted by [[IterativelyReweightedLeastSquares]]. + * @param coefficients model coefficients + * @param intercept model intercept + */ +private[ml] class IterativelyReweightedLeastSquaresModel( +val coefficients: DenseVector, +val intercept: Double) extends Serializable + +/** + * Implements the method of iteratively reweighted least squares (IRLS) which is used to solve + * certain optimization problems by an iterative method. In each step of the iterations, it + * involves solving a weighted lease squares (WLS) problem by [[WeightedLeastSquares]]. + * It can be used to find maximum likelihood estimates of a generalized linear model (GLM), + * find M-estimator in robust regression and some other optimization problems. 
+ * + * @param initialModel the initial guess model. + * @param reweightFunc the reweight function which is used to update offsets and weights + * at each iteration. + * @param fitIntercept whether to fit intercept. + * @param regParam L2 regularization parameter used by WLS. + * @param maxIter maximum number of iterations. + * @param tol the convergence tolerance. + */ +private[ml] class IterativelyReweightedLeastSquares( +val initialModel: WeightedLeastSquaresModel, +val reweightFunc: (Instance, WeightedLeastSquaresModel) => (Double, Double), +val fitIntercept: Boolean, +val regParam: Double, +val maxIter: Int, +val tol: Double) extends Logging with Serializable { + + def fit(instances: RDD[Instance]): IterativelyReweightedLeastSquaresModel = { + +var converged = false +var iter = 0 + +var offsetsAndWeights: RDD[(Double, Double)] = null --- End diff -- R's glm has an argument named ```offset```, but ```offsetsAndWeights``` is ```private```. I hope it won't confuse users, or should we rename it to something better?
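The loop described in the doc comment above (reweight, solve a weighted least squares problem, check convergence) can be illustrated without Spark. The sketch below is an assumption-laden toy, not the PR's implementation: it fits a one-feature logistic model with no intercept and no regularization, so the WLS step has a closed form. `IrlsSketch`, `fit`, and `score` are hypothetical names; only `maxIter`, `tol`, `converged`, and `iter` echo the diff.

```scala
// Hypothetical toy: IRLS for a one-feature logistic model without intercept.
object IrlsSketch {
  private def sigmoid(t: Double): Double = 1.0 / (1.0 + math.exp(-t))

  def fit(xs: Array[Double], ys: Array[Double], maxIter: Int, tol: Double): Double = {
    var beta = 0.0
    var converged = false
    var iter = 0
    while (iter < maxIter && !converged) {
      // Reweight step: working response z and working weight w per instance.
      // (w can underflow for extreme eta; a real implementation would guard this.)
      val zw = xs.zip(ys).map { case (x, y) =>
        val eta = beta * x
        val mu = sigmoid(eta)
        val w = mu * (1.0 - mu)
        (eta + (y - mu) / w, w)
      }
      // WLS step: for a single coefficient, beta = sum(w*x*z) / sum(w*x*x).
      val num = xs.zip(zw).map { case (x, (z, w)) => w * x * z }.sum
      val den = xs.zip(zw).map { case (x, (_, w)) => w * x * x }.sum
      val next = num / den
      converged = math.abs(next - beta) < tol
      beta = next
      iter += 1
    }
    beta
  }

  // Gradient of the log-likelihood; it should be close to zero at the fitted beta.
  def score(xs: Array[Double], ys: Array[Double], beta: Double): Double =
    xs.zip(ys).map { case (x, y) => x * (y - sigmoid(beta * x)) }.sum
}
```

For example, `IrlsSketch.fit(Array(-2.0, -1.0, 1.0, 2.0, -1.0, 1.0), Array(0.0, 0.0, 1.0, 1.0, 1.0, 0.0), 50, 1e-10)` converges in a handful of iterations on this non-separable data, and `score` evaluated at the result is numerically zero.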
[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10639#issuecomment-173554879 **[Test build #49875 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49875/consoleFull)** for PR 10639 at commit [`2191d2a`](https://github.com/apache/spark/commit/2191d2a8ee1a8def5dc942ce03718826da2f5813). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10639#issuecomment-173554998 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49875/ Test PASSed.
[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10639#issuecomment-173554996 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10635#issuecomment-173556798 **[Test build #49876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49876/consoleFull)** for PR 10635 at commit [`57a57fc`](https://github.com/apache/spark/commit/57a57fcc7cc8fc5cda05f327a970c566ae620320).
[GitHub] spark pull request: [SPARK-9835] [ML] IterativelyReweightedLeastSq...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/10639#issuecomment-173548548 @mengxr Thanks for your comments. Regarding the extra content in ```WeightedLeastSquares``` such as ```diagInvAWA```: it will be used to generate the statistical summary for IRLS or GLM. We can discuss this in follow-up work.
[GitHub] spark pull request: [SPARK-12686][SQL] Support group-by push down ...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10631#issuecomment-173552264 @rxin ping
[GitHub] spark pull request: [SPARK-12476][SQL] Implement JdbcRelation#unha...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10427#issuecomment-173552368 @yhuai ping
[GitHub] spark pull request: [SPARK-2827][GraphX] Add collectDegreeDist to ...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10521#issuecomment-173553650 @andrewor14 ping
[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10635#issuecomment-173553404 @marmbrus ping
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Github user dsdinter commented on the pull request: https://github.com/apache/spark/pull/9495#issuecomment-173553569 It seems this is an issue in OJDBC that started to happen after Oracle 11g: http://stackoverflow.com/questions/2133679/why-would-number-columns-scale-and-or-precision-differ-in-jdbc-from-oracle-10-t
[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10841#issuecomment-173536939 **[Test build #49873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49873/consoleFull)** for PR 10841 at commit [`f4100bc`](https://github.com/apache/spark/commit/f4100bc6dd165d025c92fc2853e6b3b075991791). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10723#issuecomment-173538837 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49872/ Test FAILed.
[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10723#issuecomment-173538836 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10635#issuecomment-173579529 **[Test build #49876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49876/consoleFull)** for PR 10635 at commit [`57a57fc`](https://github.com/apache/spark/commit/57a57fcc7cc8fc5cda05f327a970c566ae620320). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10788#discussion_r50369722 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") ( val initialCoefficientsWithIntercept = Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures) -if ($(fitIntercept)) { - /* - For binary logistic regression, when we initialize the coefficients as zeros, - it will converge faster if we initialize the intercept such that - it follows the distribution of the labels. - - {{{ - P(0) = 1 / (1 + \exp(b)), and - P(1) = \exp(b) / (1 + \exp(b)) - }}}, hence - {{{ - b = \log{P(1) / P(0)} = \log{count_1 / count_0} - }}} +if (optInitialModel.isDefined && optInitialModel.get.coefficients != numFeatures) { + val vec = optInitialModel.get.coefficients + logWarning( +s"Initial coefficients provided ${vec} did not match the expected size ${numFeatures}") +} + +if (optInitialModel.isDefined && optInitialModel.get.coefficients == numFeatures) { + val initialCoefficientsWithInterceptArray = initialCoefficientsWithIntercept.toArray + optInitialModel.get.coefficients.foreachActive { case (index, value) => +initialCoefficientsWithInterceptArray(index) = value + } + if ($(fitIntercept)) { +initialCoefficientsWithInterceptArray(numFeatures) == optInitialModel.get.intercept + } +} else if ($(fitIntercept)) { + /** + * For binary logistic regression, when we initialize the coefficients as zeros, + * it will converge faster if we initialize the intercept such that + * it follows the distribution of the labels. 
+ + * {{{ + * P(0) = 1 / (1 + \exp(b)), and + * P(1) = \exp(b) / (1 + \exp(b)) + * }}}, hence + * {{{ + * b = \log{P(1) / P(0)} = \log{count_1 / count_0} + * }}} */ - initialCoefficientsWithIntercept.toArray(numFeatures) = math.log( -histogram(1) / histogram(0)) + initialCoefficientsWithIntercept.toArray(numFeatures) + = math.log(histogram(1) / histogram(0)) } val states = optimizer.iterations(new CachedDiffFunction(costFun), initialCoefficientsWithIntercept.toBreeze.toDenseVector) -/* - Note that in Logistic Regression, the objective history (loss + regularization) - is log-likelihood which is invariance under feature standardization. As a result, - the objective history from optimizer is the same as the one in the original space. +/** + * Note that in Logistic Regression, the objective history (loss + regularization) --- End diff -- reverse the style change
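The intercept initialization quoted in the diff comment, `b = \log{count_1 / count_0}`, can be checked numerically. The following is a small standalone sketch; `InterceptInit`, `initialIntercept`, and `probabilityOfOne` are hypothetical names, and the label counts stand in for the `histogram(0)` and `histogram(1)` values used in the diff.

```scala
// Hypothetical standalone check of the intercept formula from the diff comment.
object InterceptInit {
  // b = log(count_1 / count_0); both counts must be positive for b to be finite.
  def initialIntercept(count0: Double, count1: Double): Double = {
    require(count0 > 0.0 && count1 > 0.0, "both label counts must be positive")
    math.log(count1 / count0)
  }

  // P(1) = exp(b) / (1 + exp(b)), the prediction when all coefficients are zero.
  def probabilityOfOne(b: Double): Double = math.exp(b) / (1.0 + math.exp(b))
}
```

With 30 negative and 10 positive labels, `initialIntercept(30.0, 10.0)` is `log(1/3)`, and `probabilityOfOne` of that value is 0.25, the empirical fraction of positive labels. Starting from zero coefficients, the model therefore already predicts the label distribution, which is why the comment expects faster convergence.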
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10788#discussion_r50369730 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -374,11 +395,11 @@ class LogisticRegression @Since("1.2.0") ( throw new SparkException(msg) } -/* - The coefficients are trained in the scaled space; we're converting them back to - the original space. - Note that the intercept in scaled space and original space is the same; - as a result, no scaling is needed. +/** + * The coefficients are trained in the scaled space; we're converting them back to --- End diff -- ditto
[GitHub] spark pull request: [SPARK-12952] EMLDAOptimizer initialize() shou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10863#issuecomment-173492042 **[Test build #49867 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49867/consoleFull)** for PR 10863 at commit [`a41f95d`](https://github.com/apache/spark/commit/a41f95d71c75fec493b722099b90628dc550f720). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/10788#discussion_r50372397 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") ( val initialCoefficientsWithIntercept = Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures) -if ($(fitIntercept)) { - /* - For binary logistic regression, when we initialize the coefficients as zeros, - it will converge faster if we initialize the intercept such that - it follows the distribution of the labels. - - {{{ - P(0) = 1 / (1 + \exp(b)), and - P(1) = \exp(b) / (1 + \exp(b)) - }}}, hence - {{{ - b = \log{P(1) / P(0)} = \log{count_1 / count_0} - }}} +if (optInitialModel.isDefined && optInitialModel.get.coefficients != numFeatures) { + val vec = optInitialModel.get.coefficients --- End diff -- It's used on L348 in the log warning.
[GitHub] spark pull request: [SPARK-12908][ML] Add warning message for Logi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10862#issuecomment-173498061 **[Test build #49869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49869/consoleFull)** for PR 10862 at commit [`e68cc38`](https://github.com/apache/spark/commit/e68cc38134ca78d5e8425aad4b1b5fd36c781ccc).
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10788#issuecomment-173506673 **[Test build #49871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49871/consoleFull)** for PR 10788 at commit [`46ae406`](https://github.com/apache/spark/commit/46ae406e7d9935ba2d75a092e98622578fb4ce15).
[GitHub] spark pull request: [SPARK-12908][ML] Add warning message for Logi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10862#issuecomment-173511498 **[Test build #49869 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49869/consoleFull)** for PR 10862 at commit [`e68cc38`](https://github.com/apache/spark/commit/e68cc38134ca78d5e8425aad4b1b5fd36c781ccc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10723#issuecomment-173525273 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49868/ Test FAILed.
[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10723#issuecomment-173525271 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10731#issuecomment-173493053 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10731#issuecomment-173493055 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49864/ Test PASSed.
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10731#issuecomment-173492642 **[Test build #49864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49864/consoleFull)** for PR 10731 at commit [`e61429f`](https://github.com/apache/spark/commit/e61429fec35c0f0983ff5e1bfeea11a1cef42690). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/10788#discussion_r50372852 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS new LogisticRegressionModel(weights, intercept, numFeatures, numOfLinearPredictor + 1) } } + + /** + * Run Logistic Regression with the configured parameters on an input RDD + * of LabeledPoint entries. + * + * If a known updater is used calls the ml implementation, to avoid + * applying a regularization penalty to the intercept, otherwise + * defaults to the mllib implementation. If more than two classes + * or feature scaling is disabled, always uses mllib implementation. + * If using ml implementation, uses ml code to generate initial weights. + */ + override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = { +run(input, generateInitialWeights(input), userSuppliedWeights = false) + } + + /** + * Run Logistic Regression with the configured parameters on an input RDD + * of LabeledPoint entries starting from the initial weights provided. + * + * If a known updater is used calls the ml implementation, to avoid + * applying a regularization penalty to the intercept, otherwise + * defaults to the mllib implementation. If more than two classes + * or feature scaling is disabled, always uses mllib implementation. + * Uses user provided weights. + */ + override def run(input: RDD[LabeledPoint], initialWeights: Vector): LogisticRegressionModel = { +run(input, initialWeights, userSuppliedWeights = true) + } + + private def run(input: RDD[LabeledPoint], initialWeights: Vector, userSuppliedWeights: Boolean): + LogisticRegressionModel = { +// ml's Logisitic regression only supports binary classifcation currently. 
+if (numOfLinearPredictor == 1) { + def runWithMlLogisitcRegression(elasticNetParam: Double) = { +// Prepare the ml LogisticRegression based on our settings +val lr = new org.apache.spark.ml.classification.LogisticRegression() +lr.setRegParam(optimizer.getRegParam()) +lr.setElasticNetParam(elasticNetParam) +lr.setStandardization(useFeatureScaling) +if (userSuppliedWeights) { + val uid = Identifiable.randomUID("logreg-static") + lr.setInitialModel(new org.apache.spark.ml.classification.LogisticRegressionModel( +uid, initialWeights, 1.0)) +} +lr.setFitIntercept(addIntercept) +lr.setMaxIter(optimizer.getNumIterations()) +lr.setTol(optimizer.getConvergenceTol()) +// Convert our input into a DataFrame +val sqlContext = new SQLContext(input.context) +import sqlContext.implicits._ +val df = input.toDF() +// Determine if we should cache the DF +val handlePersistence = input.getStorageLevel == StorageLevel.NONE --- End diff -- Good point. In a previous version of the code we passed `handlePersistence` down to the ml implementation to avoid this; I've updated the code to do the same here.
[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10841#issuecomment-173506101 **[Test build #49873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49873/consoleFull)** for PR 10841 at commit [`f4100bc`](https://github.com/apache/spark/commit/f4100bc6dd165d025c92fc2853e6b3b075991791).
[GitHub] spark pull request: [SPARK-10524][ML] Use the soft prediction to o...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8734#issuecomment-173506110 **[Test build #49874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49874/consoleFull)** for PR 8734 at commit [`a37d3d8`](https://github.com/apache/spark/commit/a37d3d8fc026a7a42405b5a16814e23c6fcfa3be).
[GitHub] spark pull request: [SPARK-12953][Examples]RDDRelation writer set ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10864#issuecomment-173506209 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10788#discussion_r50370169 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS new LogisticRegressionModel(weights, intercept, numFeatures, numOfLinearPredictor + 1) } } + + /** + * Run Logistic Regression with the configured parameters on an input RDD + * of LabeledPoint entries. + * + * If a known updater is used calls the ml implementation, to avoid + * applying a regularization penalty to the intercept, otherwise + * defaults to the mllib implementation. If more than two classes + * or feature scaling is disabled, always uses mllib implementation. + * If using ml implementation, uses ml code to generate initial weights. + */ + override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = { +run(input, generateInitialWeights(input), userSuppliedWeights = false) + } + + /** + * Run Logistic Regression with the configured parameters on an input RDD + * of LabeledPoint entries starting from the initial weights provided. + * + * If a known updater is used calls the ml implementation, to avoid + * applying a regularization penalty to the intercept, otherwise + * defaults to the mllib implementation. If more than two classes + * or feature scaling is disabled, always uses mllib implementation. + * Uses user provided weights. + */ + override def run(input: RDD[LabeledPoint], initialWeights: Vector): LogisticRegressionModel = { +run(input, initialWeights, userSuppliedWeights = true) + } + + private def run(input: RDD[LabeledPoint], initialWeights: Vector, userSuppliedWeights: Boolean): + LogisticRegressionModel = { +// ml's Logisitic regression only supports binary classifcation currently. 
+if (numOfLinearPredictor == 1) { + def runWithMlLogisitcRegression(elasticNetParam: Double) = { +// Prepare the ml LogisticRegression based on our settings +val lr = new org.apache.spark.ml.classification.LogisticRegression() +lr.setRegParam(optimizer.getRegParam()) +lr.setElasticNetParam(elasticNetParam) +lr.setStandardization(useFeatureScaling) +if (userSuppliedWeights) { + val uid = Identifiable.randomUID("logreg-static") + lr.setInitialModel(new org.apache.spark.ml.classification.LogisticRegressionModel( +uid, initialWeights, 1.0)) +} +lr.setFitIntercept(addIntercept) +lr.setMaxIter(optimizer.getNumIterations()) +lr.setTol(optimizer.getConvergenceTol()) +// Convert our input into a DataFrame +val sqlContext = new SQLContext(input.context) +import sqlContext.implicits._ +val df = input.toDF() +// Determine if we should cache the DF +val handlePersistence = input.getStorageLevel == StorageLevel.NONE --- End diff -- Will this cause double caching? Say the input RDD is already cached, so `handlePersistence` is false here and `df` is not persisted. The storage-level check on `df` in ml's LogisticRegression (LOR) code will then see `StorageLevel.NONE` and persist `df`, so the data ends up cached twice.
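The double-caching concern can be illustrated without Spark. The sketch below is a toy model under stated assumptions: `Data`, `toDF`, `innerTrainAuto`, and `innerTrainWith` are hypothetical stand-ins for an RDD/DataFrame and for the ml trainer, and only the storage-level bookkeeping is modelled. It counts how many copies end up cached when the inner trainer decides for itself versus when the caller's `handlePersistence` flag is passed down.

```scala
object DoubleCachingSketch {
  sealed trait Level
  case object NoneLevel extends Level
  case object MemoryAndDisk extends Level

  /** Hypothetical stand-in for an RDD or DataFrame; it only tracks its own storage level. */
  final class Data(val name: String) {
    var level: Level = NoneLevel
    def persist(): Unit = { level = MemoryAndDisk }
    /** Models RDD-to-DataFrame conversion: the wrapper starts out uncached. */
    def toDF: Data = new Data(name + "-df")
  }

  /** Inner trainer that decides on its own by inspecting the wrapper's storage level. */
  def innerTrainAuto(df: Data): Unit = {
    val handlePersistence = df.level == NoneLevel
    if (handlePersistence) df.persist()
  }

  /** Inner trainer that honours the caller's decision instead. */
  def innerTrainWith(df: Data, handlePersistence: Boolean): Unit = {
    if (handlePersistence) df.persist()
  }

  /** Number of cached copies after one training run. */
  def cachedCopies(inputCached: Boolean, passFlagDown: Boolean): Int = {
    val input = new Data("input")
    if (inputCached) input.persist()
    val df = input.toDF
    val handlePersistence = input.level == NoneLevel
    if (passFlagDown) {
      innerTrainWith(df, handlePersistence)
    } else {
      if (handlePersistence) df.persist()
      innerTrainAuto(df) // sees NoneLevel on df whenever the outer code skipped persist()
    }
    Seq(input, df).count(_.level != NoneLevel)
  }
}
```

In this model, `cachedCopies(inputCached = true, passFlagDown = false)` yields two cached copies, while passing the flag down keeps it at one in every case.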
[GitHub] spark pull request: [SPARK-12908][ML] Add warning message for Logi...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/10862#issuecomment-173494166 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-10524][ML] Use the soft prediction to o...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8734#issuecomment-173518885 **[Test build #49874 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49874/consoleFull)** for PR 8734 at commit [`a37d3d8`](https://github.com/apache/spark/commit/a37d3d8fc026a7a42405b5a16814e23c6fcfa3be). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10524][ML] Use the soft prediction to o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8734#issuecomment-173519027 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10524][ML] Use the soft prediction to o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8734#issuecomment-173519029 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49874/ Test PASSed.
[GitHub] spark pull request: [SPARK-12755][CORE] Stop the event logger befo...
Github user mallman commented on the pull request: https://github.com/apache/spark/pull/10700#issuecomment-173490509 Here are my current thoughts. Josh says this functionality is going to be removed in Spark 2.0. The bug this PR is designed to address manifests itself in Spark 1.5 in three ways (that I'm aware of): 1. Misleading log messages from the Master (reported above). 2. Incomplete (aka "in progress") application event logs, which can be further divided into two scenarios: 2.a. Incomplete uncompressed event log files. The log processor can recover these files. 2.b. Incomplete compressed event log files. The compression output is truncated and unreadable by normal means. The history server reports a corrupted event log. I cannot definitively tie that symptom to this bug, but it agrees with my experience. The most problematic of these is unrecoverable event logs. I've been frustrated by this before and turned off event log compression as a workaround. Since deploying a build with this patch to one of our dev clusters I haven't seen this problem again. I don't see a simple way to write a test to support this PR. Overall, I feel we should close this PR but keep a reference to it from Jira with a comment that Spark 1.5 and 1.6 users can try this patch, at their own risk, to address the described symptoms if they wish to. It's going into our own Spark 1.x builds. I'll close this PR and the associated Jira issue within the next few days unless someone objects or wishes to continue discussion. Thanks.
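The difference between scenarios 2.a and 2.b can be sketched outside Spark. The example below is an illustration under loud assumptions: it uses gzip from `java.util.zip` as a stand-in codec (it is not one of the codecs Spark uses for event logs), the event lines are made up, and all names are hypothetical. It shows that a truncated plain-text log still yields its complete lines, while reading a truncated compressed stream to the end fails.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.io.Source
import scala.util.Try

object TruncatedEventLogSketch {
  // Made-up event lines, one JSON object per line.
  val events: List[String] = (1 to 200).map(i => s"""{"Event":"Example","id":$i}""").toList
  val plain: Array[Byte] = events.mkString("", "\n", "\n").getBytes(UTF_8)

  /** Compress with gzip (stand-in codec for illustration only). */
  def gzip(bytes: Array[Byte]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    out.write(bytes)
    out.close() // close() flushes the final block and writes the trailer
    buffer.toByteArray
  }

  /** Read every line of a gzip stream; the Try fails if the stream ends early. */
  def readGzipLines(bytes: Array[Byte]): Try[List[String]] = Try {
    val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    try Source.fromInputStream(in, "UTF-8").getLines().toList
    finally in.close()
  }

  /** Recover the complete lines of a truncated plain-text log, dropping a partial last line. */
  def recoverPlainLines(bytes: Array[Byte]): List[String] = {
    val text = new String(bytes, UTF_8)
    val end = text.lastIndexOf('\n')
    if (end < 0) Nil else text.substring(0, end).split('\n').toList
  }

  def main(args: Array[String]): Unit = {
    val compressed = gzip(plain)
    // Simulate a writer that stopped before the stream was finished.
    println(readGzipLines(compressed.take(compressed.length / 2)).isSuccess)
    println(recoverPlainLines(plain.take(plain.length / 2)).length)
  }
}
```

This only models why an unfinished compressed stream is harder to salvage than an unfinished text file; it says nothing about how Spark's own event log reader or history server handles either case.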
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10788#discussion_r50370273 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS new LogisticRegressionModel(weights, intercept, numFeatures, numOfLinearPredictor + 1) } } + + /** + * Run Logistic Regression with the configured parameters on an input RDD + * of LabeledPoint entries. + * + * If a known updater is used calls the ml implementation, to avoid + * applying a regularization penalty to the intercept, otherwise + * defaults to the mllib implementation. If more than two classes + * or feature scaling is disabled, always uses mllib implementation. + * If using ml implementation, uses ml code to generate initial weights. + */ + override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = { +run(input, generateInitialWeights(input), userSuppliedWeights = false) + } + + /** + * Run Logistic Regression with the configured parameters on an input RDD + * of LabeledPoint entries starting from the initial weights provided. + * + * If a known updater is used calls the ml implementation, to avoid + * applying a regularization penalty to the intercept, otherwise + * defaults to the mllib implementation. If more than two classes + * or feature scaling is disabled, always uses mllib implementation. + * Uses user provided weights. + */ + override def run(input: RDD[LabeledPoint], initialWeights: Vector): LogisticRegressionModel = { +run(input, initialWeights, userSuppliedWeights = true) + } + + private def run(input: RDD[LabeledPoint], initialWeights: Vector, userSuppliedWeights: Boolean): + LogisticRegressionModel = { +// ml's Logisitic regression only supports binary classifcation currently. 
+if (numOfLinearPredictor == 1) { + def runWithMlLogisitcRegression(elasticNetParam: Double) = { +// Prepare the ml LogisticRegression based on our settings +val lr = new org.apache.spark.ml.classification.LogisticRegression() +lr.setRegParam(optimizer.getRegParam()) +lr.setElasticNetParam(elasticNetParam) +lr.setStandardization(useFeatureScaling) +if (userSuppliedWeights) { + val uid = Identifiable.randomUID("logreg-static") + lr.setInitialModel(new org.apache.spark.ml.classification.LogisticRegressionModel( +uid, initialWeights, 1.0)) +} +lr.setFitIntercept(addIntercept) +lr.setMaxIter(optimizer.getNumIterations()) +lr.setTol(optimizer.getConvergenceTol()) +// Convert our input into a DataFrame +val sqlContext = new SQLContext(input.context) +import sqlContext.implicits._ +val df = input.toDF() +// Determine if we should cache the DF +val handlePersistence = input.getStorageLevel == StorageLevel.NONE +if (handlePersistence) { + df.persist(StorageLevel.MEMORY_AND_DISK) +} +// Train our model +val mlLogisticRegresionModel = lr.train(df) +// unpersist if we persisted +if (handlePersistence) { + df.unpersist() +} +// convert the model +val weights = mlLogisticRegresionModel.weights match { --- End diff -- ```scala val weights = Vectors.dense(mlLogisticRegresionModel.coefficients.toArray) ```
[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10705#issuecomment-173490560 **[Test build #49861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49861/consoleFull)** for PR 10705 at commit [`12ed084`](https://github.com/apache/spark/commit/12ed0841b5d5cf171e9db9325bf9f61f3dd8046b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10723#issuecomment-173494001 **[Test build #49868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49868/consoleFull)** for PR 10723 at commit [`3e5a229`](https://github.com/apache/spark/commit/3e5a22948558e79777568b5e2f7d14f93705cf3d).
[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10723#issuecomment-173507314 **[Test build #49872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49872/consoleFull)** for PR 10723 at commit [`9922ccc`](https://github.com/apache/spark/commit/9922cccde7ecdfd5e850552b78dd5742a0a4a6a3).
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10788#issuecomment-173519318 **[Test build #49871 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49871/consoleFull)** for PR 10788 at commit [`46ae406`](https://github.com/apache/spark/commit/46ae406e7d9935ba2d75a092e98622578fb4ce15). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10788#issuecomment-173519472 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49871/ Test PASSed.
[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10841#issuecomment-173525219 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49870/ Test FAILed.
[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10723#issuecomment-173525124 **[Test build #49868 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49868/consoleFull)** for PR 10723 at commit [`3e5a229`](https://github.com/apache/spark/commit/3e5a22948558e79777568b5e2f7d14f93705cf3d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10841#issuecomment-173525217 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10841#issuecomment-173525109 **[Test build #49870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49870/consoleFull)** for PR 10841 at commit [`cdfd0be`](https://github.com/apache/spark/commit/cdfd0be8ef6d4ee3b4a6656910e1a3cb049e1320). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7997][Core]Remove Akka from Spark Core ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10854#issuecomment-173527896 **[Test build #49860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49860/consoleFull)** for PR 10854 at commit [`39f21de`](https://github.com/apache/spark/commit/39f21de507271314c1b08f9d6a9c0fc0a12396a4). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7997][Core]Remove Akka from Spark Core ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10854#issuecomment-173527952 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7997][Core]Remove Akka from Spark Core ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10854#issuecomment-173527954 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49860/ Test FAILed.
[GitHub] spark pull request: [SPARK-12904][SQL] Strength reduction for inte...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10845#issuecomment-173491009 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49863/ Test PASSed.
[GitHub] spark pull request: [SPARK-12904][SQL] Strength reduction for inte...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10845#issuecomment-173491008 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10788#discussion_r50370414 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS new LogisticRegressionModel(weights, intercept, numFeatures, numOfLinearPredictor + 1) } } + + /** + * Run Logistic Regression with the configured parameters on an input RDD + * of LabeledPoint entries. + * + * If a known updater is used calls the ml implementation, to avoid + * applying a regularization penalty to the intercept, otherwise + * defaults to the mllib implementation. If more than two classes + * or feature scaling is disabled, always uses mllib implementation. + * If using ml implementation, uses ml code to generate initial weights. + */ + override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = { +run(input, generateInitialWeights(input), userSuppliedWeights = false) + } + + /** + * Run Logistic Regression with the configured parameters on an input RDD + * of LabeledPoint entries starting from the initial weights provided. + * + * If a known updater is used calls the ml implementation, to avoid + * applying a regularization penalty to the intercept, otherwise + * defaults to the mllib implementation. If more than two classes + * or feature scaling is disabled, always uses mllib implementation. + * Uses user provided weights. + */ + override def run(input: RDD[LabeledPoint], initialWeights: Vector): LogisticRegressionModel = { +run(input, initialWeights, userSuppliedWeights = true) + } + + private def run(input: RDD[LabeledPoint], initialWeights: Vector, userSuppliedWeights: Boolean): + LogisticRegressionModel = { +// ml's Logisitic regression only supports binary classifcation currently. 
+if (numOfLinearPredictor == 1) { + def runWithMlLogisitcRegression(elasticNetParam: Double) = { +// Prepare the ml LogisticRegression based on our settings +val lr = new org.apache.spark.ml.classification.LogisticRegression() +lr.setRegParam(optimizer.getRegParam()) +lr.setElasticNetParam(elasticNetParam) +lr.setStandardization(useFeatureScaling) +if (userSuppliedWeights) { + val uid = Identifiable.randomUID("logreg-static") + lr.setInitialModel(new org.apache.spark.ml.classification.LogisticRegressionModel( +uid, initialWeights, 1.0)) +} +lr.setFitIntercept(addIntercept) +lr.setMaxIter(optimizer.getNumIterations()) +lr.setTol(optimizer.getConvergenceTol()) +// Convert our input into a DataFrame +val sqlContext = new SQLContext(input.context) +import sqlContext.implicits._ +val df = input.toDF() +// Determine if we should cache the DF +val handlePersistence = input.getStorageLevel == StorageLevel.NONE +if (handlePersistence) { + df.persist(StorageLevel.MEMORY_AND_DISK) +} +// Train our model +val mlLogisticRegresionModel = lr.train(df) +// unpersist if we persisted +if (handlePersistence) { + df.unpersist() +} +// convert the model +val weights = mlLogisticRegresionModel.weights match { + case x: DenseVector => x + case y: Vector => Vectors.dense(y.toArray) +} +createModel(weights, mlLogisticRegresionModel.intercept) + } + optimizer.getUpdater() match { --- End diff -- when `optimizer.getRegParam() == 0.0`, run the old version.
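The routing discussed in this review comment can be sketched in plain Scala without any Spark dependency. This is only an illustration under stated assumptions: `Updater`, `Backend`, `UseMl`, `UseMllib` and `chooseBackend` are hypothetical stand-ins rather than Spark types, the quoted diff is cut off before the `match` cases so the mapping from updater to `elasticNetParam` (0.0 for L2, 1.0 for L1) is assumed from the usual elastic-net convention, and the `regParam == 0.0` branch encodes the reviewer's suggestion, not merged behaviour.

```scala
// Hypothetical stand-ins for the mllib updater classes; not the real Spark types.
sealed trait Updater
case object SquaredL2Updater extends Updater
case object L1Updater extends Updater
case object OtherUpdater extends Updater

// Hypothetical result type: which implementation would be used for training.
sealed trait Backend
case class UseMl(elasticNetParam: Double) extends Backend
case object UseMllib extends Backend

object DispatchSketch {
  def chooseBackend(numOfLinearPredictor: Int, regParam: Double, updater: Updater): Backend =
    if (numOfLinearPredictor != 1) {
      // More than two classes: the ml implementation is binary-only here.
      UseMllib
    } else if (regParam == 0.0) {
      // Reviewer's suggestion: with no regularization, keep the old (mllib) path.
      UseMllib
    } else {
      updater match {
        case SquaredL2Updater => UseMl(0.0) // assumed: pure L2 penalty
        case L1Updater        => UseMl(1.0) // assumed: pure L1 penalty
        case _                => UseMllib   // unknown updater: fall back
      }
    }

  def main(args: Array[String]): Unit = {
    println(chooseBackend(1, 0.1, SquaredL2Updater))
    println(chooseBackend(1, 0.0, SquaredL2Updater))
    println(chooseBackend(3, 0.1, L1Updater))
  }
}
```

Keeping the decision in a pure function like this would make the routing testable without training a model, which is relevant to the testing concern raised later in the thread.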
[GitHub] spark pull request: [SPARK-12904][SQL] Strength reduction for inte...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10845#issuecomment-173490658 **[Test build #49863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49863/consoleFull)** for PR 10845 at commit [`7202c54`](https://github.com/apache/spark/commit/7202c546d025fc2c5cf71856c7e64fce8e85444f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10705#issuecomment-173490745 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49861/ Test FAILed.
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/10788#issuecomment-173493541 LGTM except for some styling issues and a concern about caching twice. Thanks.
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10788#discussion_r50371017 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS new LogisticRegressionModel(weights, intercept, numFeatures, numOfLinearPredictor + 1) } } + + /** + * Run Logistic Regression with the configured parameters on an input RDD + * of LabeledPoint entries. + * + * If a known updater is used calls the ml implementation, to avoid + * applying a regularization penalty to the intercept, otherwise + * defaults to the mllib implementation. If more than two classes + * or feature scaling is disabled, always uses mllib implementation. + * If using ml implementation, uses ml code to generate initial weights. + */ + override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = { +run(input, generateInitialWeights(input), userSuppliedWeights = false) + } + + /** + * Run Logistic Regression with the configured parameters on an input RDD + * of LabeledPoint entries starting from the initial weights provided. + * + * If a known updater is used calls the ml implementation, to avoid + * applying a regularization penalty to the intercept, otherwise + * defaults to the mllib implementation. If more than two classes + * or feature scaling is disabled, always uses mllib implementation. + * Uses user provided weights. + */ + override def run(input: RDD[LabeledPoint], initialWeights: Vector): LogisticRegressionModel = { +run(input, initialWeights, userSuppliedWeights = true) + } + + private def run(input: RDD[LabeledPoint], initialWeights: Vector, userSuppliedWeights: Boolean): + LogisticRegressionModel = { +// ml's Logisitic regression only supports binary classifcation currently. 
+if (numOfLinearPredictor == 1) { + def runWithMlLogisitcRegression(elasticNetParam: Double) = { +// Prepare the ml LogisticRegression based on our settings +val lr = new org.apache.spark.ml.classification.LogisticRegression() +lr.setRegParam(optimizer.getRegParam()) +lr.setElasticNetParam(elasticNetParam) +lr.setStandardization(useFeatureScaling) +if (userSuppliedWeights) { + val uid = Identifiable.randomUID("logreg-static") + lr.setInitialModel(new org.apache.spark.ml.classification.LogisticRegressionModel( +uid, initialWeights, 1.0)) +} +lr.setFitIntercept(addIntercept) +lr.setMaxIter(optimizer.getNumIterations()) +lr.setTol(optimizer.getConvergenceTol()) +// Convert our input into a DataFrame +val sqlContext = new SQLContext(input.context) +import sqlContext.implicits._ +val df = input.toDF() +// Determine if we should cache the DF +val handlePersistence = input.getStorageLevel == StorageLevel.NONE +if (handlePersistence) { + df.persist(StorageLevel.MEMORY_AND_DISK) +} +// Train our model +val mlLogisticRegresionModel = lr.train(df) +// unpersist if we persisted +if (handlePersistence) { + df.unpersist() +} +// convert the model +val weights = mlLogisticRegresionModel.weights match { + case x: DenseVector => x + case y: Vector => Vectors.dense(y.toArray) +} +createModel(weights, mlLogisticRegresionModel.intercept) + } + optimizer.getUpdater() match { --- End diff -- Okay, this will make the test harder to write. I don't care about this one for now.
[GitHub] spark pull request: [SPARK-12469][CORE][RFC/WIP] Add Consistent Ac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10841#issuecomment-173500763 **[Test build #49870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49870/consoleFull)** for PR 10841 at commit [`cdfd0be`](https://github.com/apache/spark/commit/cdfd0be8ef6d4ee3b4a6656910e1a3cb049e1320).
[GitHub] spark pull request: [SPARK-12755][CORE] Stop the event logger befo...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10700#issuecomment-173505157 Are there downsides to merging this to master, even if the related functionality is about to be removed? It passes tests, seems to improve the ordering of shutdown, and can be backported to fix an actual minor issue in previous releases. Tests would be cool, but you're correct that this one could be really hard to trigger. I see no reason to close this.
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/10788#discussion_r50372566 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") ( val initialCoefficientsWithIntercept = Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures) -if ($(fitIntercept)) { - /* - For binary logistic regression, when we initialize the coefficients as zeros, - it will converge faster if we initialize the intercept such that - it follows the distribution of the labels. - - {{{ - P(0) = 1 / (1 + \exp(b)), and - P(1) = \exp(b) / (1 + \exp(b)) - }}}, hence - {{{ - b = \log{P(1) / P(0)} = \log{count_1 / count_0} - }}} +if (optInitialModel.isDefined && optInitialModel.get.coefficients != numFeatures) { + val vec = optInitialModel.get.coefficients + logWarning( +s"Initial coefficients provided ${vec} did not match the expected size ${numFeatures}") +} + +if (optInitialModel.isDefined && optInitialModel.get.coefficients == numFeatures) { + val initialCoefficientsWithInterceptArray = initialCoefficientsWithIntercept.toArray + optInitialModel.get.coefficients.foreachActive { case (index, value) => +initialCoefficientsWithInterceptArray(index) = value + } + if ($(fitIntercept)) { +initialCoefficientsWithInterceptArray(numFeatures) == optInitialModel.get.intercept + } +} else if ($(fitIntercept)) { + /** + * For binary logistic regression, when we initialize the coefficients as zeros, + * it will converge faster if we initialize the intercept such that + * it follows the distribution of the labels. + --- End diff -- OK, looking at the rest of the comments in the file and the style guide, most of them do have the `*`, so I'll put them back in (leaving them out also breaks auto-indent, but that's an Emacs bug).
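The comment block quoted in the diff above derives the initial intercept from the label distribution: since P(1) = exp(b) / (1 + exp(b)) and P(0) = 1 / (1 + exp(b)), it follows that b = log(P(1) / P(0)) = log(count_1 / count_0). A minimal plain-Scala sketch of that arithmetic is shown below; `InterceptInitSketch`, `initialIntercept` and the sample labels are hypothetical names and data chosen for illustration and are not taken from the Spark source.

```scala
object InterceptInitSketch {
  // b = log(count_1 / count_0); both classes must be present for this to be finite.
  def initialIntercept(count0: Double, count1: Double): Double = {
    require(count0 > 0 && count1 > 0, "both classes must appear in the labels")
    math.log(count1 / count0)
  }

  // Logistic function, used only to check the result against the empirical P(1).
  def sigmoid(b: Double): Double = math.exp(b) / (1.0 + math.exp(b))

  def main(args: Array[String]): Unit = {
    // Made-up binary labels: one negative, three positives.
    val labels = Seq(0.0, 1.0, 1.0, 1.0)
    val count1 = labels.count(_ == 1.0).toDouble
    val count0 = labels.count(_ == 0.0).toDouble

    val b = initialIntercept(count0, count1) // log(3 / 1)
    // With zero coefficients the model predicts sigmoid(b) for every row,
    // which matches the fraction of positive labels (3 out of 4 here).
    println(s"b = $b, sigmoid(b) = ${sigmoid(b)}")
  }
}
```

Starting from this intercept means the all-zero coefficient vector already reproduces the class prior, which is the faster-convergence argument made in the quoted comment.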
[GitHub] spark pull request: fix error when run RDDRelation.main():"path fi...
GitHub user shijinkui opened a pull request: https://github.com/apache/spark/pull/10864 fix error when run RDDRelation.main():"path file:/Users/sjk/pair.parq… https://issues.apache.org/jira/browse/SPARK-12953 fix error when run RDDRelation.main(): "path file:/Users/sjk/pair.parquet already exists" Set DataFrameWriter's mode to SaveMode.Overwrite You can merge this pull request into a Git repository by running: $ git pull https://github.com/shijinkui/spark set_mode Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10864.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10864 commit 958a419877e36ad0d3987e83e56b6007937334e8 Author: shijinkui Date: 2016-01-21T08:56:26Z fix error when run RDDRelation.main():"path file:/Users/sjk/pair.parquet already exists" Setting DataFrameWriter's mode to `SaveMode.Overwrite`
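The kind of change this pull request describes can be illustrated with a short sketch. It assumes a Spark 1.6-era `SQLContext` API with Spark on the classpath; `OverwriteSketch`, the `Record` case class and the sample rows are hypothetical and are not copied from `RDDRelation`. Only the `mode(SaveMode.Overwrite)` call reflects what the PR text states.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

// Hypothetical row type used only for this sketch.
case class Record(key: Int, value: String)

object OverwriteSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("OverwriteSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df = sc.parallelize((1 to 10).map(i => Record(i, s"val_$i"))).toDF()

    // The default save mode is ErrorIfExists, so a second run of
    //   df.write.parquet("pair.parquet")
    // fails because the output path already exists.

    // Overwrite replaces any existing output, so the program can be re-run.
    df.write.mode(SaveMode.Overwrite).parquet("pair.parquet")

    sc.stop()
  }
}
```

Overwrite is a reasonable choice for an example program that is expected to be run repeatedly; for real data, silently replacing an existing path may not be what is wanted.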
[GitHub] spark pull request: [SPARK-12908][ML] Add warning message for Logi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10862#issuecomment-173511839 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49869/ Test PASSed.
[GitHub] spark pull request: [SPARK-12908][ML] Add warning message for Logi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10862#issuecomment-173511835 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10788#issuecomment-173519471 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12952] EMLDAOptimizer initialize() shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10863#issuecomment-173492160 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12952] EMLDAOptimizer initialize() shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10863#issuecomment-173492161 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49867/ Test PASSed.
[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10152#issuecomment-173607998 **[Test build #2431 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2431/consoleFull)** for PR 10152 at commit [`e938208`](https://github.com/apache/spark/commit/e938208d9c85515f62b41635a8445b8ab31f55f2).
[GitHub] spark pull request: improved error message for java type inference...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10865#issuecomment-173608093 (Link this to your JIRA -- see guidance here first for how to open a PR: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark )
[GitHub] spark pull request: [SPARK-12534][DOC] update documentation to lis...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10491#issuecomment-173606818 Merged to master
[GitHub] spark pull request: improved error message for java type inference...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10865#issuecomment-173608775 **[Test build #2432 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2432/consoleFull)** for PR 10865 at commit [`f11f1c7`](https://github.com/apache/spark/commit/f11f1c738771339e4031c313f759fa24f3b3).
[GitHub] spark pull request: improved error message for java type inference...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10865#issuecomment-173608886 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-12760] [DOCS] inaccurate description fo...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/10866 [SPARK-12760] [DOCS] inaccurate description for difference between local vs cluster mode in closure handling Clarify that modifying a driver local variable won't have the desired effect in cluster modes, and may or may not work as intended in local mode You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-12760 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10866.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10866 commit b62e31804685209fc0443430c9ddb32c5d5a3299 Author: Sean Owen Date: 2016-01-21T15:51:55Z Clarify that modifying a driver local variable won't have the desired effect in cluster modes, and may or may not work as intended in local mode
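The closure behaviour this documentation change describes can be illustrated with a simplified analogy that needs no Spark at all. The sketch below assumes that shipping a closure to an executor behaves roughly like a Java-serialization round trip, so a task mutates its own copy of a captured variable rather than the driver's. `Counter`, `ClosureCopySketch` and `roundTrip` are hypothetical names; this is not how Spark itself is implemented, only a model of the copy semantics.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical mutable holder standing in for a driver-side local variable.
class Counter(var value: Int) extends java.io.Serializable

object ClosureCopySketch {
  // Serialize and then deserialize an object, yielding an independent copy,
  // loosely mimicking what happens to captured state sent to an executor.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val driverCounter = new Counter(0)

    // The "task" works on a deserialized copy, not on the driver's object.
    val taskCopy = roundTrip(driverCounter)
    Seq(1, 2, 3).foreach(x => taskCopy.value += x)

    // prints "task copy: 6, driver: 0"
    println(s"task copy: ${taskCopy.value}, driver: ${driverCounter.value}")
  }
}
```

In local mode the task may run in the same JVM as the driver, and whether it happens to see the original object is an implementation detail, which is why the PR describes the result there as something that may or may not work as intended.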
[GitHub] spark pull request: [SPARK-11137][Streaming] Make StreamingContext...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10807#issuecomment-173609582 @felixcheung WDYT?
[GitHub] spark pull request: improved error message for java type inference...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10865#issuecomment-173609383 **[Test build #2432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2432/consoleFull)** for PR 10865 at commit [`f11f1c7`](https://github.com/apache/spark/commit/f11f1c738771339e4031c313f759fa24f3b3). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `throw new UnsupportedOperationException(s\"Cannot infer type for Java class $`
[GitHub] spark pull request: SPARK-11565 Replace deprecated DigestUtils.sha...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/9532#issuecomment-173609858 @gliptak are you able to follow up on this?
[GitHub] spark pull request: [SPARK-12534][DOC] update documentation to lis...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10491
[GitHub] spark pull request: improved error message for java type inference...
GitHub user andygrove opened a pull request: https://github.com/apache/spark/pull/10865 improved error message for java type inference failure You can merge this pull request into a Git repository by running: $ git pull https://github.com/codefutures/spark SPARK-12932 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10865.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10865 commit f11f1c738771339e4031c313f759fa24f3b3 Author: Andy Grove Date: 2016-01-21T15:33:22Z improved error message
[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10152#issuecomment-173621659 **[Test build #2431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2431/consoleFull)** for PR 10152 at commit [`e938208`](https://github.com/apache/spark/commit/e938208d9c85515f62b41635a8445b8ab31f55f2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12760] [DOCS] inaccurate description fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10866#issuecomment-173623032 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49877/ Test PASSed.
[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...
GitHub user mortada opened a pull request: https://github.com/apache/spark/pull/10867 [SPARK-12760] [DOCS] invalid lambda expression in python example for … …local vs cluster @srowen thanks for the PR at https://github.com/apache/spark/pull/10866! sorry it took me a while. This is related to https://github.com/apache/spark/pull/10866, basically the assignment in the lambda expression in the python example is actually invalid
```
In [1]: data = [1, 2, 3, 4, 5]
In [2]: counter = 0
In [3]: rdd = sc.parallelize(data)
In [4]: rdd.foreach(lambda x: counter += x)
  File "", line 1
    rdd.foreach(lambda x: counter += x)
    ^
SyntaxError: invalid syntax
```
You can merge this pull request into a Git repository by running: $ git pull https://github.com/mortada/spark doc_python_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10867.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10867 commit fc9f16a2ffb5846ecc03c4df584f611e6728573d Author: Mortada Mehyar Date: 2016-01-21T16:51:28Z [SPARK-12760] [DOCS] invalid lambda expression in python example for local vs cluster
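The syntax error reported above can be reproduced without Spark. The following is a minimal sketch in plain Python; the helper `compiles` and the two source strings are hypothetical names introduced here for illustration, not part of the pull request. It assumes only that an augmented assignment such as `counter += x` is a statement, which Python does not allow as a lambda body, while a named function may contain it together with a `global` declaration.

```python
def compiles(source):
    """Return True if `source` parses as valid Python, else False.

    Hypothetical helper for this illustration only.
    """
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# An augmented assignment is a statement, so it cannot be a lambda body.
lambda_version = "f = lambda x: counter += x"

# A named function may contain statements, including `global`.
def_version = (
    "def increment_counter(x):\n"
    "    global counter\n"
    "    counter += x\n"
)

lambda_ok = compiles(lambda_version)
def_ok = compiles(def_version)
print(lambda_ok, def_ok)
```

This only checks that the source parses; it says nothing about how the function behaves once Spark ships it to executors.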
[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10867#issuecomment-173643528 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49880/ Test FAILed.
[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10867#issuecomment-173643525 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10873] Support column sort and search f...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/10648#issuecomment-173630521 Jenkins, test this please
[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10867#issuecomment-173635004 Does it still execute without error on a cluster? (even if it doesn't actually increment the counter in the way someone might expect.) Certainly if it doesn't compile we need to change this, but want to make sure the result with "global" executes too.
[GitHub] spark pull request: [SPARK-10873] Support column sort and search f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10648#issuecomment-173632677 **[Test build #49878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49878/consoleFull)** for PR 10648 at commit [`ad6ce01`](https://github.com/apache/spark/commit/ad6ce01e849591d152ec04bd86109cbced291e6a).
[GitHub] spark pull request: [HOTFIX][BUILD][TEST-MAVEN]Remove duplicate de...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/10868#issuecomment-173638249 CC @JoshRosen
[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...
Github user mortada commented on the pull request: https://github.com/apache/spark/pull/10867#issuecomment-173648674 @srowen I tested the python code in cluster mode (5 ec2 workers) and this works fine
```
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block manager 172.31.10.56:35937 with 6.6 GB RAM, BlockManagerId(4, 172.31.10.56, 35937)
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block manager 172.31.10.55:59871 with 6.6 GB RAM, BlockManagerId(0, 172.31.10.55, 59871)
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block manager 172.31.10.53:39162 with 6.6 GB RAM, BlockManagerId(1, 172.31.10.53, 39162)
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block manager 172.31.10.54:59145 with 6.6 GB RAM, BlockManagerId(2, 172.31.10.54, 59145)
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block manager 172.31.10.57:35000 with 6.6 GB RAM, BlockManagerId(3, 172.31.10.57, 35000)
In [1]: data = [1, 2, 3, 4, 5]
In [2]: counter = 0
In [3]: rdd = sc.parallelize(data)
In [4]: def increment_counter(x):
   ...:     global counter
   ...:     counter += x
   ...:
In [5]: rdd.foreach(increment_counter)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.10.55:59871 (size: 3.2 KB, free: 6.6 GB)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.10.56:35937 (size: 3.2 KB, free: 6.6 GB)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.10.57:35000 (size: 3.2 KB, free: 6.6 GB)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.10.53:39162 (size: 3.2 KB, free: 6.6 GB)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.10.54:59145 (size: 3.2 KB, free: 6.6 GB)
(other output skipped)
In [6]: print("Counter value: ", counter)
Counter value: 0
```
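The session above runs without error yet leaves the driver's `counter` at 0. A rough way to picture why is the plain-Python simulation below. It is not Spark code and does not reproduce Spark's actual serialization; `run_on_simulated_worker` is a hypothetical helper, and the assumption is simply that each executor runs the function against its own copy of the global state rather than the driver's.

```python
import types

counter = 0  # the "driver" variable


def increment_counter(x):
    global counter
    counter += x


def run_on_simulated_worker(func, values, start=0):
    """Run `func` against a private globals dict, as a stand-in for an executor.

    Hypothetical helper: the function's code is rebound to a separate
    namespace, so `global counter` refers to the worker's copy only.
    """
    worker_globals = {"counter": start}
    worker_func = types.FunctionType(func.__code__, worker_globals, func.__name__)
    for value in values:
        worker_func(value)
    return worker_globals["counter"]


worker_total = run_on_simulated_worker(increment_counter, [1, 2, 3, 4, 5])
print("Worker copy:", worker_total)   # Worker copy: 15
print("Driver value:", counter)       # Driver value: 0
```

The worker's copy reaches 15 while the driver's variable is untouched, which matches the `Counter value: 0` output reported in the cluster test.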
[GitHub] spark pull request: [SPARK-12760] [DOCS] inaccurate description fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10866#issuecomment-173619459 **[Test build #49877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49877/consoleFull)** for PR 10866 at commit [`b62e318`](https://github.com/apache/spark/commit/b62e31804685209fc0443430c9ddb32c5d5a3299).
[GitHub] spark pull request: [SPARK-12760] [DOCS] inaccurate description fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10866#issuecomment-173623028 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12760] [DOCS] inaccurate description fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10866#issuecomment-173622876 **[Test build #49877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49877/consoleFull)** for PR 10866 at commit [`b62e318`](https://github.com/apache/spark/commit/b62e31804685209fc0443430c9ddb32c5d5a3299). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.