[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/3984#discussion_r22754263 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala --- @@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag]( class Supervisor extends Actor { override val supervisorStrategy = receiverSupervisorStrategy -val worker = context.actorOf(props, name) -logInfo("Started receiver worker at: " + worker.path) - -val n: AtomicInteger = new AtomicInteger(0) -val hiccups: AtomicInteger = new AtomicInteger(0) - --- End diff -- The supervisor is single-threaded; I don't think we have a scenario where we update concurrently. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/3984#discussion_r22754237 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala --- @@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag]( class Supervisor extends Actor { override val supervisorStrategy = receiverSupervisorStrategy -val worker = context.actorOf(props, name) -logInfo("Started receiver worker at: " + worker.path) - -val n: AtomicInteger = new AtomicInteger(0) -val hiccups: AtomicInteger = new AtomicInteger(0) - --- End diff -- Why did you stop using AtomicInteger?
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-69431702 [Test build #25348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25348/consoleFull) for PR 3823 at commit [`133c43e`](https://github.com/apache/spark/commit/133c43e79482d2f88392dc287aa185564c2ed557). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-69431707 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25348/
[GitHub] spark pull request: [SPARK-3541][MLLIB] New ALS implementation wit...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3720#issuecomment-69432582 [Test build #25353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25353/consoleFull) for PR 3720 at commit [`dd0d0e8`](https://github.com/apache/spark/commit/dd0d0e8ecd36b9e607306dd170d1e22437180389). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4406] [MLib] FIX: Validate k in SVD
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3945#issuecomment-69432542 Merged into master. Thanks!
[GitHub] spark pull request: SPARK-5018 [MLlib] [WIP] Make MultivariateGaus...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3923#discussion_r22755110 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/distribution/MultivariateGaussian.scala --- @@ -15,11 +15,13 @@ * limitations under the License. */ -package org.apache.spark.mllib.stat.impl +package org.apache.spark.mllib.stat.distribution import breeze.linalg.{DenseVector => DBV, DenseMatrix => DBM, diag, max, eigSym} +import org.apache.spark.mllib.linalg.{Vectors, Vector, Matrices, Matrix} import org.apache.spark.mllib.util.MLUtils +import org.apache.spark.annotation.DeveloperApi; --- End diff -- Sort the imports alphabetically.
[GitHub] spark pull request: SPARK-5018 [MLlib] [WIP] Make MultivariateGaus...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3923#discussion_r22755112 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/distribution/MultivariateGaussian.scala --- @@ -15,11 +15,13 @@ * limitations under the License. */ -package org.apache.spark.mllib.stat.impl +package org.apache.spark.mllib.stat.distribution import breeze.linalg.{DenseVector => DBV, DenseMatrix => DBM, diag, max, eigSym} +import org.apache.spark.mllib.linalg.{Vectors, Vector, Matrices, Matrix} import org.apache.spark.mllib.util.MLUtils +import org.apache.spark.annotation.DeveloperApi; /** * This class provides basic functionality for a Multivariate Gaussian (Normal) Distribution. In --- End diff -- Please add `:: DeveloperApi ::` before `This class ...`. We need it to generate the doc correctly. You can check other `@DeveloperApi` usage as examples.
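For reference, the doc style the reviewer asks for, modeled on existing `@DeveloperApi` usage in Spark (the class body is abridged here for illustration), looks roughly like:

```scala
import org.apache.spark.annotation.DeveloperApi

/**
 * :: DeveloperApi ::
 * This class provides basic functionality for a Multivariate Gaussian (Normal) Distribution.
 */
@DeveloperApi
class MultivariateGaussian {
  // ...
}
```

The leading `:: DeveloperApi ::` marker in the Scaladoc is what Spark's doc generation picks up; the `@DeveloperApi` annotation alone is not enough.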
[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3944#issuecomment-69440503 Merging into master and branch-1.2. Thanks!
[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/3987#discussion_r22753235 --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala --- @@ -53,7 +53,7 @@ import scala.language.implicitConversions * * @param functionClassName UDF class name */ -class HiveFunctionWrapper(var functionClassName: String) extends java.io.Externalizable { +case class HiveFunctionWrapper(var functionClassName: String) extends java.io.Externalizable { --- End diff -- Ah, I see. It's Externalizable.
[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/3988 [SPARK-5188][BUILD] make-distribution.sh should support curl, not only wget to get Tachyon When we use `make-distribution.sh` with the `--with-tachyon` option, Tachyon is downloaded with the `wget` command, but some systems don't have `wget` by default (Mac OS X doesn't). Other scripts like build/mvn and build/sbt support not only `wget` but also `curl`, so `make-distribution.sh` should support `curl` too. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sarutak/spark SPARK-5188 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3988.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3988 commit 83b49b5e2def5df861c21cad1c6c72be3a460e09 Author: Kousuke Saruta saru...@oss.nttdata.co.jp Date: 2015-01-10T00:51:17Z Modified make-distribution.sh so that we use curl, not only wget to get tachyon
[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3988#issuecomment-69427935 [Test build #25349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25349/consoleFull) for PR 3988 at commit [`83b49b5`](https://github.com/apache/spark/commit/83b49b5e2def5df861c21cad1c6c72be3a460e09). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-69429218 OK, I'm merging this into master since the tests are irrelevant here. Thanks.
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-69431301 [Test build #25347 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25347/consoleFull) for PR 3823 at commit [`b1ab402`](https://github.com/apache/spark/commit/b1ab402a0a835a642c99064fc0fa3d4a320b8b94). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-69431314 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25347/
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22754925 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -46,6 +46,33 @@ trait RelationProvider { /** * ::DeveloperApi:: + * Implemented by objects that produce relations for a specific kind of data source. When + * Spark SQL is given a DDL operation with + * 1. USING clause: to specify the implemented SchemaRelationProvider + * 2. User defined schema: users can define schema optionally when create table + * + * Users may specify the fully qualified class name of a given data source. When that class is + * not found Spark SQL will append the class name `DefaultSource` to the path, allowing for + * less verbose invocation. For example, 'org.apache.spark.sql.json' would resolve to the + * data source 'org.apache.spark.sql.json.DefaultSource' + * + * A new instance of this class will be instantiated each time a DDL call is made. + */ +@DeveloperApi +trait SchemaRelationProvider { + /** + * Returns a new base relation with the given parameters and user defined schema. + * Note: the parameters' keywords are case insensitive and this insensitivity is enforced + * by the Map that is passed to the function. + */ + def createRelation( + sqlContext: SQLContext, + parameters: Map[String, String], + schema: Option[StructType]): BaseRelation --- End diff -- My initial idea was to stay compatible with the old traits. Since we will have two traits, I will fix this.
[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3944#issuecomment-69435265 [Test build #25352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25352/consoleFull) for PR 3944 at commit [`b6d63d5`](https://github.com/apache/spark/commit/b6d63d5b91cc2e558ecd5b984d312aa0ee9d6f32). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `protected class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String] `
[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3944#issuecomment-69435266 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25352/
[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3941#issuecomment-69435983 [Test build #25356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25356/consoleFull) for PR 3941 at commit [`343ae27`](https://github.com/apache/spark/commit/343ae27959bcccd20b7360c9a050eb297a181e14). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/3984#discussion_r22755829 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala --- @@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag]( class Supervisor extends Actor { override val supervisorStrategy = receiverSupervisorStrategy -val worker = context.actorOf(props, name) -logInfo("Started receiver worker at: " + worker.path) - -val n: AtomicInteger = new AtomicInteger(0) -val hiccups: AtomicInteger = new AtomicInteger(0) - --- End diff -- I think it's not single-threaded. Multiple threads can access the Supervisor. Each thread can't access it at the same time, but there is still a memory-visibility problem. Or, how about marking those values as volatile?
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69436041 [Test build #25354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25354/consoleFull) for PR 3431 at commit [`7e79ce5`](https://github.com/apache/spark/commit/7e79ce5f80003fab657458cd9e79f4be85319aaa). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait SchemaRelationProvider `
[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3988#discussion_r22755950 --- Diff: build/mvn --- @@ -48,11 +48,11 @@ install_app() { # check if we already have the tarball # check if we have curl installed # download application -[ ! -f ${local_tarball} ] && [ -n `which curl 2>/dev/null` ] \ +[ ! -f ${local_tarball} ] && [ -n `type curl 2>/dev/null` ] \ --- End diff -- FWIW, the approach recommended in [this answer](http://stackoverflow.com/a/677212/877069), which I agree with, is to use `command -v`, though honestly one way or the other it doesn't seem like a big deal.
[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/3984#discussion_r22756102 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala --- @@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag]( class Supervisor extends Actor { override val supervisorStrategy = receiverSupervisorStrategy -val worker = context.actorOf(props, name) -logInfo("Started receiver worker at: " + worker.path) - -val n: AtomicInteger = new AtomicInteger(0) -val hiccups: AtomicInteger = new AtomicInteger(0) - --- End diff -- Hmm... because Supervisor is implemented as an actor, `n` and `hiccups` are maintained as the state of the actor and are only accessed via the message handler, so I don't think they can be accessed by multiple threads. Did I miss something in the code?
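The thread-confinement argument in this thread can be sketched without Akka (names here are hypothetical, not from the PR): an actor's mailbox is analogous to a single-threaded executor, so if every update to a counter is submitted as a "message", the same thread performs all reads and writes and a plain `var` suffices without `AtomicInteger`.

```scala
import java.util.concurrent.{Callable, Executors}

object MailboxSketch {
  // A single-threaded executor plays the role of an actor's mailbox:
  // every "message" (Runnable) is processed by the same thread, in order.
  def run(): (Int, Int) = {
    val mailbox = Executors.newSingleThreadExecutor()

    // Plain mutable state, no AtomicInteger: only the mailbox thread touches it.
    var n = 0
    var hiccups = 0

    (1 to 100).foreach { i =>
      mailbox.execute(new Runnable { def run(): Unit = n += 1 })
      if (i % 10 == 0) {
        mailbox.execute(new Runnable { def run(): Unit = hiccups += 1 })
      }
    }

    // Reading the state via one final "message" gives a happens-before
    // edge through Future.get, so the values are safely visible here.
    val result = mailbox.submit(new Callable[(Int, Int)] {
      def call(): (Int, Int) = (n, hiccups)
    }).get()
    mailbox.shutdown()
    result
  }

  def main(args: Array[String]): Unit = {
    val (n, hiccups) = run()
    println(s"n=$n hiccups=$hiccups")
  }
}
```

Note that sarutak's visibility concern corresponds to the final read above: it is safe only because it also goes through the mailbox; reading the vars directly from another thread would indeed have a memory-visibility problem.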
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-69440276 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25357/
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-69440274 [Test build #25357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25357/consoleFull) for PR 3951 at commit [`a34bec5`](https://github.com/apache/spark/commit/a34bec5c0fec8416168836f58b98a6fa046c3a8d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class GradientBoostedTreesModel(JavaModelWrapper):` * `class GradientBoostedTrees(object):`
[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/3987#discussion_r22753114 --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala --- @@ -53,7 +53,7 @@ import scala.language.implicitConversions * * @param functionClassName UDF class name */ -class HiveFunctionWrapper(var functionClassName: String) extends java.io.Externalizable { +case class HiveFunctionWrapper(var functionClassName: String) extends java.io.Externalizable { --- End diff -- nit: should `functionClassName` still be a `var`?
[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3987#discussion_r22753194 --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala --- @@ -53,7 +53,7 @@ import scala.language.implicitConversions * * @param functionClassName UDF class name */ -class HiveFunctionWrapper(var functionClassName: String) extends java.io.Externalizable { +case class HiveFunctionWrapper(var functionClassName: String) extends java.io.Externalizable { --- End diff -- Yeah, it's mutated below by our custom deserialization.
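The reason the field must stay a `var` can be shown with a minimal standalone sketch (hypothetical names; this is not the actual HiveFunctionWrapper): `java.io.Externalizable` deserialization invokes the no-arg constructor and then calls `readExternal`, which assigns the field after construction.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, Externalizable,
  ObjectInput, ObjectInputStream, ObjectOutput, ObjectOutputStream}

// `name` must be a var: readExternal() assigns it after the
// no-arg constructor has already run.
class WrapperSketch(var name: String) extends Externalizable {
  def this() = this(null) // required by Externalizable deserialization

  override def writeExternal(out: ObjectOutput): Unit = out.writeUTF(name)
  override def readExternal(in: ObjectInput): Unit = {
    name = in.readUTF() // mutation during deserialization
  }
}

object WrapperSketch {
  // Serialize and deserialize through in-memory streams.
  def roundTrip(w: WrapperSketch): WrapperSketch = {
    val bytes = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bytes)
    oos.writeObject(w)
    oos.close()
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    ois.readObject().asInstanceOf[WrapperSketch]
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(new WrapperSketch("org.example.MyUDF"))
    println(copy.name)
  }
}
```

With a `val`, `readExternal` could not restore the field, so the deserialized instance would keep the `null` supplied by the no-arg constructor.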
[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3557#issuecomment-69427688 I mean it's unexpected because they're different when they should be the same (in that case, the value of `SPARK_YARN_APP_NAME`).
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3823
[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3557#issuecomment-69431870 Oh, I see. But after this patch `SPARK_YARN_APP_NAME` becomes useless. It will make the behavior in client and cluster mode the same. Note that this happens when we don't set the app name in SparkConf. Otherwise it is a different issue, described in [SPARK-3678](https://issues.apache.org/jira/browse/SPARK-3678). Perhaps we should file another separate PR to solve that.
[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3557#issuecomment-69431972

> But after this patch SPARK_YARN_APP_NAME becomes useless

That's why we should not commit this patch.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69433129 @scwf I have done it and will have a PR to your branch.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69433190 https://github.com/scwf/spark/pull/22
[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3850#issuecomment-69437984 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25355/ Test FAILed.
[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3850#issuecomment-69437979 [Test build #25355 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25355/consoleFull) for PR 3850 at commit [`ae9b94a`](https://github.com/apache/spark/commit/ae9b94a3f817759ee6249af991beec7e19e52f12). * This patch **fails some tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3984#discussion_r22756612

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala ---
```diff
@@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag](
   class Supervisor extends Actor {
     override val supervisorStrategy = receiverSupervisorStrategy
-    val worker = context.actorOf(props, name)
-    logInfo("Started receiver worker at: " + worker.path)
-
-    val n: AtomicInteger = new AtomicInteger(0)
-    val hiccups: AtomicInteger = new AtomicInteger(0)
-
```
--- End diff --

Correct, volatile is not necessary. https://groups.google.com/forum/#!msg/scalaz/kFnICLFjO-4/GT_59mZLrFAJ
[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3944#issuecomment-69428905 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5006][Deploy]spark.port.maxRetries does...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3841#issuecomment-69431008 @andrewor14 Yeah it is an alternative. I will try it on Monday. Thanks.
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-69431073 [Test build #25351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25351/consoleFull) for PR 3916 at commit [`fc6a3e2`](https://github.com/apache/spark/commit/fc6a3e2597220907602f320fd2aebe43564a7461). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3541][MLLIB] New ALS implementation wit...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3720#issuecomment-69434737 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25353/ Test FAILed.
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-69435868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25350/ Test PASSed.
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-69435864 [Test build #25350 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25350/consoleFull) for PR 3916 at commit [`f26556b`](https://github.com/apache/spark/commit/f26556b498cdae3fa23ea5837d673b4f5cb98c58). * This patch **passes all tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [Minor] make-distribution.sh using build/mvn
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/3867
[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/3984#discussion_r22756477

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala ---
```diff
@@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag](
   class Supervisor extends Actor {
     override val supervisorStrategy = receiverSupervisorStrategy
-    val worker = context.actorOf(props, name)
-    logInfo("Started receiver worker at: " + worker.path)
-
-    val n: AtomicInteger = new AtomicInteger(0)
-    val hiccups: AtomicInteger = new AtomicInteger(0)
-
```
--- End diff --

Try logging the current thread name in `receive`; you can then see that multiple threads access `receive`.
[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/3984#discussion_r22756617

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala ---
```diff
@@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag](
   class Supervisor extends Actor {
     override val supervisorStrategy = receiverSupervisorStrategy
-    val worker = context.actorOf(props, name)
-    logInfo("Started receiver worker at: " + worker.path)
-
-    val n: AtomicInteger = new AtomicInteger(0)
-    val hiccups: AtomicInteger = new AtomicInteger(0)
-
```
--- End diff --

I see, Akka's actor model ensures the visibility.
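The guarantee discussed in this thread can be modeled without Akka. Below is a minimal, illustrative single-consumer mailbox sketch (not Spark or Akka code): different threads may enqueue messages, but only one thread ever mutates the counters, so plain `var`s suffice where `AtomicInteger`s would otherwise be needed, and the final `join()` publishes the result back to the caller.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Processes `msgs` on a single consumer thread: `n` accumulates positive
// payloads, `hiccups` counts zero-valued messages, and -1 is a poison pill.
// Only the consumer thread touches the two vars, which is the actor-style
// single-threaded-processing guarantee the review thread is describing.
def run(msgs: Seq[Int]): (Int, Int) = {
  val mailbox = new LinkedBlockingQueue[Int]()
  var n = 0        // plain var: mutated only by the consumer thread
  var hiccups = 0  // plain var: mutated only by the consumer thread

  val consumer = new Thread(new Runnable {
    def run(): Unit = {
      var msg = mailbox.take()
      while (msg != -1) {
        if (msg == 0) hiccups += 1 else n += msg
        msg = mailbox.take()
      }
    }
  })
  consumer.start()

  msgs.foreach(mailbox.put(_))
  mailbox.put(-1)   // stop the consumer
  consumer.join()   // join() provides the happens-before edge back to us
  (n, hiccups)
}
```

Akka additionally guarantees a happens-before edge between successive message handlers of the same actor, which is why the patch can drop `AtomicInteger` even though different dispatcher threads may invoke `receive` over time.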
[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...
GitHub user luogankun reopened a pull request: https://github.com/apache/spark/pull/3944 [SPARK-5141][SQL]CaseInsensitiveMap throws java.io.NotSerializableException CaseInsensitiveMap throws java.io.NotSerializableException. You can merge this pull request into a Git repository by running: $ git pull https://github.com/luogankun/spark SPARK-5141 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3944.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3944 commit b6d63d5b91cc2e558ecd5b984d312aa0ee9d6f32 Author: luogankun luogan...@gmail.com Date: 2015-01-08T08:19:23Z [SPARK-5141]CaseInsensitiveMap throws java.io.NotSerializableException
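The `NotSerializableException` in SPARK-5141 arises when a non-serializable map wrapper is captured in a task closure that Spark ships via Java serialization. A minimal sketch of the fix direction (class name and shape are illustrative, not Spark's actual `CaseInsensitiveMap`):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
  ObjectInputStream, ObjectOutputStream}

// A case-insensitive lookup wrapper that mixes in Serializable, so it
// survives Java serialization of closures. Keys are normalized to lower
// case once, at construction time.
class CaseInsensitiveMap(map: Map[String, String]) extends Serializable {
  private val baseMap = map.map { case (k, v) => k.toLowerCase -> v }
  def get(key: String): Option[String] = baseMap.get(key.toLowerCase)
  def apply(key: String): String = baseMap(key.toLowerCase)
}

// Round-trips an object through Java serialization, the same mechanism
// Spark uses when shipping task closures to executors.
def roundTrip[T <: Serializable](obj: T): T = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(obj)
  out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  in.readObject().asInstanceOf[T]
}
```

Without the `Serializable` mix-in, `roundTrip` would throw `java.io.NotSerializableException`, which is exactly the symptom this PR reports.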
[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3987#issuecomment-69428763 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25346/ Test PASSed.
[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3987#issuecomment-69428754 [Test build #25346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25346/consoleFull) for PR 3987 at commit [`8bca2fa`](https://github.com/apache/spark/commit/8bca2faccb53bc91cfc534f06fe8c0b25d6b4c61). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class HiveFunctionWrapper(functionClassName: String) extends java.io.Serializable ` * `case class HiveFunctionWrapper(var functionClassName: String) extends java.io.Externalizable `
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-69430581 [Test build #25350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25350/consoleFull) for PR 3916 at commit [`f26556b`](https://github.com/apache/spark/commit/f26556b498cdae3fa23ea5837d673b4f5cb98c58). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3557#issuecomment-69432493 Or we could just make SPARK_YARN_APP_NAME disappear? Since the env variable is not recommended and it causes different behavior, users can still use `spark.app.name`. Or should we make SPARK_YARN_APP_NAME and `spark.app.name` a special case in `YarnClientSchedulerBackend.scala`? What do you two think? @tgravescs @andrewor14
[GitHub] spark pull request: [SPARK-4406] [MLib] FIX: Validate k in SVD
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3945
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69433261 ok, merged!
[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3988#issuecomment-69433277 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25349/ Test PASSed.
[GitHub] spark pull request: [SPARK-4749] [mllib]: Allow initializing KMean...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3610#issuecomment-69433216 @nxwhite-str There are a few minor comments left. Do you have time to update the PR?
[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3988#issuecomment-69433274 [Test build #25349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25349/consoleFull) for PR 3988 at commit [`83b49b5`](https://github.com/apache/spark/commit/83b49b5e2def5df861c21cad1c6c72be3a460e09). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69433301 [Test build #25354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25354/consoleFull) for PR 3431 at commit [`7e79ce5`](https://github.com/apache/spark/commit/7e79ce5f80003fab657458cd9e79f4be85319aaa). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3541][MLLIB] New ALS implementation wit...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3720#issuecomment-69434732 [Test build #25353 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25353/consoleFull) for PR 3720 at commit [`dd0d0e8`](https://github.com/apache/spark/commit/dd0d0e8ecd36b9e607306dd170d1e22437180389). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)` * ` case class Movie(movieId: Int, title: String, genres: Seq[String])` * ` case class Params(` * `class ALS extends Estimator[ALSModel] with ALSParams ` * ` case class RatingBlock(srcIds: Array[Int], dstIds: Array[Int], ratings: Array[Float]) `
[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...
Github user alexliu68 commented on a diff in the pull request: https://github.com/apache/spark/pull/3941#discussion_r22755728

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
```diff
@@ -178,10 +178,23 @@ class SqlParser extends AbstractSparkSQLParser {
     joinedRelation | relationFactor

   protected lazy val relationFactor: Parser[LogicalPlan] =
-    ( ident ~ (opt(AS) ~> opt(ident)) ^^ {
-        case tableName ~ alias => UnresolvedRelation(None, tableName, alias)
+    (
+      ident ~ ("." ~> ident) ~ ("." ~> ident) ~ ("." ~> ident) ~ (opt(AS) ~> opt(ident)) ^^ {
+        case reserveName1 ~ reserveName2 ~ dbName ~ tableName ~ alias =>
+          UnresolvedRelation(IndexedSeq(tableName, dbName, reserveName2, reserveName1), alias)
       }
-    | ("(" ~> start <~ ")") ~ (AS.? ~> ident) ^^ { case s ~ a => Subquery(a, s) }
+    | ident ~ ("." ~> ident) ~ ("." ~> ident) ~ (opt(AS) ~> opt(ident)) ^^ {
+        case reserveName1 ~ dbName ~ tableName ~ alias =>
+          UnresolvedRelation(IndexedSeq(tableName, dbName, reserveName1), alias)
+      }
+    | ident ~ ("." ~> ident) ~ (opt(AS) ~> opt(ident)) ^^ {
+        case dbName ~ tableName ~ alias =>
+          UnresolvedRelation(IndexedSeq(tableName, dbName), alias)
+      }
+    | ident ~ (opt(AS) ~> opt(ident)) ^^ {
+        case tableName ~ alias => UnresolvedRelation(IndexedSeq(tableName), alias)
```
--- End diff --

I changed it to `rep1sep(ident, ".")`.
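The `rep1sep(ident, ".")` change collapses the four hand-written alternatives above into a single rule for dotted relation names. The idea can be sketched in standalone Scala (this is illustrative only, not the actual `SqlParser`, which builds on `AbstractSparkSQLParser` and parser combinators):

```scala
// Decomposes a relation reference like "catalog.db.table AS t" into its
// dot-separated name parts plus an optional alias, mirroring what
// rep1sep(ident, ".") ~ (opt(AS) ~> opt(ident)) yields. The patch stores
// the parts innermost-first (table, then db, ...), so we reverse them.
case class RelationRef(parts: Seq[String], alias: Option[String])

def parseRelation(input: String): RelationRef = {
  val tokens = input.trim.split("\\s+").toList
  val (name, alias) = tokens match {
    case n :: "AS" :: a :: Nil => (n, Some(a))
    case n :: a :: Nil         => (n, Some(a)) // the AS keyword is optional
    case n :: Nil              => (n, None)
    case _ => sys.error(s"cannot parse relation: $input")
  }
  RelationRef(name.split('.').toSeq.reverse, alias)
}
```

One rule handles any nesting depth, so adding a fourth or fifth qualifier later needs no new grammar alternatives.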
[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...
Github user alexliu68 commented on a diff in the pull request: https://github.com/apache/spark/pull/3941#discussion_r22755736

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala ---
```diff
@@ -115,43 +101,41 @@ class SimpleCatalog(val caseSensitive: Boolean) extends Catalog {
 trait OverrideCatalog extends Catalog {
   // TODO: This doesn't work when the database changes...
-  val overrides = new mutable.HashMap[(Option[String],String), LogicalPlan]()
+  val overrides = new mutable.HashMap[String, LogicalPlan]()
```
--- End diff --

Restored it to `(Option[String], String)`.
[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3944
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-69446001 I went ahead and implemented locality and checkpointing of generated RDDs. A couple of points:
- It still depends on SPARK-4014 eventually being merged, for efficiency's sake.
- I ran into classloader / class-not-found issues trying to checkpoint KafkaRDDPartition directly. The current solution is to transform them to/from tuples; ugly, but it works. If you know what the issue is there, let me know.
- I've got a use case that requires overriding the compute method on the DStream (basically, modifying offsets to a fixed delay rather than now). I'm assuming you'd prefer a user-supplied function to do the transformation rather than subclassing, but let me know.

> On Mon, Jan 5, 2015 at 7:59 PM, Tathagata Das notificati...@github.com wrote: Great! Keep me posted.
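The to/from-tuple workaround for checkpointing partitions can be sketched as follows. The field names are guesses at a typical Kafka partition descriptor, not the PR's actual class:

```scala
// Hypothetical partition descriptor. Checkpointing it directly hit
// classloader issues, so it is flattened to a plain tuple (whose classes
// always resolve via the system classloader) before checkpointing, and
// rebuilt from the tuple on recovery.
case class KafkaRDDPartition(index: Int, topic: String, partition: Int,
                             fromOffset: Long, untilOffset: Long)

type PartTuple = (Int, String, Int, Long, Long)

def toTuple(p: KafkaRDDPartition): PartTuple =
  (p.index, p.topic, p.partition, p.fromOffset, p.untilOffset)

def fromTuple(t: PartTuple): KafkaRDDPartition =
  KafkaRDDPartition(t._1, t._2, t._3, t._4, t._5)
```

The round-trip is lossless as long as every field is itself a plain serializable value, which is why it works despite being, as the comment says, ugly.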
[GitHub] spark pull request: [SPARK-1953][YARN]yarn client mode Application...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3607#issuecomment-69426979 Oh gosh it is merged finally. Thanks guys for the persistent comments. @andrewor14 @tgravescs @vanzin @sryza
[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3944#issuecomment-69432161 ok to test
[GitHub] spark pull request: [SPARK-4983]Tag EC2 instances in the same call...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/3986#issuecomment-69437259 By the way, please also update the title of this PR to match the approach you are taking, since as you noted we can't actually use the same call to launch and tag instances. You can leave the JIRA tag at the beginning as-is.
[GitHub] spark pull request: [SPARK-5168] Make SQLConf a field rather than ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3965#issuecomment-69445304 [Test build #25358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25358/consoleFull) for PR 3965 at commit [`42411e0`](https://github.com/apache/spark/commit/42411e002d729f855e33f0da61ab2bd4f0f65b24). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5168] Make SQLConf a field rather than ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3965#issuecomment-69445305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25358/ Test PASSed.
[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3557#issuecomment-69426504 @vanzin Which one do you mean? Client mode or cluster mode? @tgravescs I looked at the name shown on the RM's UI. I checked SPARK-3678 and realized that if there is no `spark.app.name` in the configuration file and no `--name` in the command args, cluster mode will use `mainClass`. But in client mode, since we usually call `SparkConf.setAppName` in application code, the RM's UI shows whatever we set there.
[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3557#issuecomment-69430819 I understand all that. I'm saying that's unexpected, in that I'd expect both modes to behave the same. So if there's anything to fix here, that's it.
[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3557#issuecomment-69430615 I am afraid it is not. In cluster mode, the name in `SparkSubmitArguments.scala` will be assigned `mainClass` if neither `spark.app.name` nor `--name` is specified, since that code does not read `SPARK_YARN_APP_NAME`. Then `SparkSubmit.scala` passes `args.name` along as both `spark.app.name` and `--name` to `org.apache.spark.deploy.yarn.Client`. `yarn.Client.scala` transforms the args into a `ClientArguments` object, in which `appName` only gets its value from `--name`. So, in this process, the app name never gets its value from the env variable `SPARK_YARN_APP_NAME`; that variable is only used in client mode.
[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3557#issuecomment-69432664 Although in general we should honor Spark properties over environment variables, the app name has been a special case and should remain so for backward compatibility. For this PR, I think the goal is to maintain behavior in the before table by making more changes in `YarnClientSchedulerBackend`. Additionally, it is not intuitive that if you set both `SPARK_YARN_APP_NAME` and `spark.app.name`, the behavior is inconsistent between client mode and cluster mode. I think the app name should be a special case for both deploy modes, but we can fix that in a separate PR.
[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3941#issuecomment-69439434 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25356/ Test PASSed.
[GitHub] spark pull request: [SPARK-4406] [MLib] FIX: Validate k in SVD
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/3945#issuecomment-69445130 @jkbradley @mengxr Thanks for the quick reviews and merge. Looking forward to contributing more.
[GitHub] spark pull request: [Minor] Remove permission for execution from s...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3983#issuecomment-69428110 BTW, I was testing some things on Windows the other day, and the 644 permissions did turn out to be an issue. Probably because I was rsyncing the files from a Linux host and rsync would then translate the permissions to not allow them to be executable on the Windows side...
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-69435466 [Test build #25351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25351/consoleFull) for PR 3916 at commit [`fc6a3e2`](https://github.com/apache/spark/commit/fc6a3e2597220907602f320fd2aebe43564a7461). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-69435468 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25351/ Test FAILed.
[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3988#discussion_r22755934 --- Diff: build/mvn --- @@ -48,11 +48,11 @@ install_app() { # check if we already have the tarball # check if we have curl installed # download application -[ ! -f "${local_tarball}" ] && [ -n "`which curl 2>/dev/null`" ] && \ +[ ! -f "${local_tarball}" ] && [ -n "`type curl 2>/dev/null`" ] && \ --- End diff -- Why are we replacing `which` with `type`? What's the difference between the two commands?
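For context on the question above, a minimal illustration of the practical difference (not from the PR): `which` is an external program that only searches `PATH`, while `type` is a shell builtin that also knows about builtins and shell functions, and is available even on minimal systems that ship no `which` binary.

```shell
#!/bin/sh
# Define a shell function; it lives in the shell, not on PATH.
myfunc() { echo "hello"; }

# `type` is a builtin, so it can see the function.
type myfunc >/dev/null 2>&1 && echo "type finds the shell function"

# `which` is an external program that only scans PATH, so it cannot.
which myfunc >/dev/null 2>&1 || echo "which does not find it"
```

This also explains why `type` is the more portable choice for an existence check in a build script like `build/mvn`.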
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-69436744 [Test build #25357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25357/consoleFull) for PR 3951 at commit [`a34bec5`](https://github.com/apache/spark/commit/a34bec5c0fec8416168836f58b98a6fa046c3a8d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3941#issuecomment-69439432 [Test build #25356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25356/consoleFull) for PR 3941 at commit [`343ae27`](https://github.com/apache/spark/commit/343ae27959bcccd20b7360c9a050eb297a181e14). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/3984#discussion_r22756597 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala --- @@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag]( class Supervisor extends Actor { override val supervisorStrategy = receiverSupervisorStrategy -val worker = context.actorOf(props, name) -logInfo("Started receiver worker at:" + worker.path) - -val n: AtomicInteger = new AtomicInteger(0) -val hiccups: AtomicInteger = new AtomicInteger(0) - --- End diff -- Hi @sarutak, I went back to Akka's documentation, http://doc.akka.io/docs/akka/snapshot/general/jmm.html (Actors and the Java Memory Model). I think it states that internal fields of the actor are visible when the next message is processed by that actor, so fields in your actor need not be volatile or equivalent. So we don't need to explicitly mark these variables as volatile?
[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3944#issuecomment-69432309 [Test build #25352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25352/consoleFull) for PR 3944 at commit [`b6d63d5`](https://github.com/apache/spark/commit/b6d63d5b91cc2e558ecd5b984d312aa0ee9d6f32). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5018 [MLlib] [WIP] Make MultivariateGaus...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3923#discussion_r22755113 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/distribution/MultivariateGaussian.scala --- @@ -30,33 +32,68 @@ import org.apache.spark.mllib.util.MLUtils * @param mu The mean vector of the distribution * @param sigma The covariance matrix of the distribution */ -private[mllib] class MultivariateGaussian( -val mu: DBV[Double], -val sigma: DBM[Double]) extends Serializable { +@DeveloperApi +class MultivariateGaussian private[mllib] ( +private[mllib] val mu: DBV[Double], --- End diff -- Instead of having `mu`/`sigma` private and adding getters, could we make them MLlib vector/matrix types and add private members of breeze types? Then we can make this constructor public and remove the getters. The overhead is little because we don't copy the data arrays.
[GitHub] spark pull request: SPARK-5018 [MLlib] [WIP] Make MultivariateGaus...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3923#issuecomment-69433464 @tgaloppo Besides inline comments, please resolve conflicts with the master branch. The patch does not merge cleanly.
[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3850#issuecomment-69434441 [Test build #25355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25355/consoleFull) for PR 3850 at commit [`ae9b94a`](https://github.com/apache/spark/commit/ae9b94a3f817759ee6249af991beec7e19e52f12). * This patch merges cleanly.
[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3850#issuecomment-69434332 test this please
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69436044 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25354/ Test PASSed.
[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3944#issuecomment-69438903 Oh, since users may pass the ```CaseInsensitiveMap``` into a scan builder relation, making it ```Serializable``` is more robust. This LGTM.
[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/3984#discussion_r22756565 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala --- @@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag]( class Supervisor extends Actor { override val supervisorStrategy = receiverSupervisorStrategy -val worker = context.actorOf(props, name) -logInfo("Started receiver worker at:" + worker.path) - -val n: AtomicInteger = new AtomicInteger(0) -val hiccups: AtomicInteger = new AtomicInteger(0) - --- End diff -- I see what you mean... yes, you're correct, since the thread running the actor can change before the updated value is written back to memory.
[GitHub] spark pull request: [SPARK-5168] Make SQLConf a field rather than ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3965#issuecomment-69442589 [Test build #25358 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25358/consoleFull) for PR 3965 at commit [`42411e0`](https://github.com/apache/spark/commit/42411e002d729f855e33f0da61ab2bd4f0f65b24). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4983]Tag EC2 instances in the same call...
Github user GenTang commented on a diff in the pull request: https://github.com/apache/spark/pull/3986#discussion_r22751730 --- Diff: ec2/spark_ec2.py --- @@ -569,15 +569,28 @@ def launch_cluster(conn, opts, cluster_name): master_nodes = master_res.instances print "Launched master in %s, regid = %s" % (zone, master_res.id) -# Give the instances descriptive names +# Give the instances descriptive names. +# The code handling exceptions corresponds to issue [SPARK-4983] for master in master_nodes: -master.add_tag( -key='Name', -value='{cn}-master-{iid}'.format(cn=cluster_name, iid=master.id)) +while True: +try: +master.add_tag( +key='Name', +value='{cn}-master-{iid}'.format(cn=cluster_name, iid=master.id)) +except: +pass --- End diff -- I think it takes some time for EC2 to return an "instance does not exist" exception; that's why I left `pass` in the exception handler. Maybe we should add a small wait time, though, to make sure we don't submit too many requests to EC2. Yes, here we just want to catch the "instance does not exist" exception. You are right that it is better to use a specific exception; I will work on this.
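The pattern the comment converges on (catch only the specific error, bound the retries, and sleep between attempts rather than spinning with a bare `except: pass`) can be sketched as follows. This is a hedged illustration, not the PR's final code: `add_name_tag` and `InstanceNotFound` are hypothetical stand-ins for the real boto call and exception class.

```python
import time

class InstanceNotFound(Exception):
    """Hypothetical stand-in for the boto 'instance does not exist' error."""
    pass

def add_name_tag(instance, _attempts={"n": 0}):
    # Simulated flaky call: the instance is "not visible" for the first
    # two attempts, then the tagging succeeds.
    _attempts["n"] += 1
    if _attempts["n"] < 3:
        raise InstanceNotFound("instance not yet visible")
    return "tagged"

def tag_with_retry(instance, max_tries=5, wait_seconds=0.1):
    # Catch only the specific exception, retry a bounded number of times,
    # and wait between attempts so we don't flood EC2 with requests.
    for attempt in range(1, max_tries + 1):
        try:
            return add_name_tag(instance)
        except InstanceNotFound:
            if attempt == max_tries:
                raise  # give up and surface the real error
            time.sleep(wait_seconds)

result = tag_with_retry("i-12345")
print(result)  # -> tagged (after two simulated failures)
```

Compared with the diff's unbounded `while True` loop, this variant cannot hang forever and does not swallow unrelated exceptions.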
[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3629#issuecomment-69422290 I see, thanks for your detailed explanations @suyanNone @liyezhang556520. If the problem is that we double count after we put the block in memory, shouldn't we also release the pending memory *after* we actually put the block (i.e. after [this line](https://github.com/apache/spark/blob/4e1f12d997426560226648d62ee17c90352613e7/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L344)), not before?
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user tomerk commented on a diff in the pull request: https://github.com/apache/spark/pull/3637#discussion_r22752063 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/DeveloperApiExample.scala --- @@ -0,0 +1,195 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.ml + +import org.apache.spark.{SparkConf, SparkContext} +import org.apache.spark.SparkContext._ +import org.apache.spark.ml.classification.{Classifier, ClassifierParams, ClassificationModel} +import org.apache.spark.ml.param.{Params, IntParam, ParamMap} +import org.apache.spark.mllib.linalg.{BLAS, Vector, Vectors, VectorUDT} +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.sql.{DataType, SchemaRDD, Row, SQLContext} + +/** + * A simple example demonstrating how to write your own learning algorithm using Estimator, + * Transformer, and other abstractions. + * This mimics [[org.apache.spark.ml.classification.LogisticRegression]]. 
+ * Run with + * {{{ + * bin/run-example ml.DeveloperApiExample + * }}} + */ +object DeveloperApiExample { + + def main(args: Array[String]) { +val conf = new SparkConf().setAppName(DeveloperApiExample) +val sc = new SparkContext(conf) +val sqlContext = new SQLContext(sc) +import sqlContext._ + +// Prepare training data. +val training = sparkContext.parallelize(Seq( + LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)), + LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0)), + LabeledPoint(0.0, Vectors.dense(2.0, 1.3, 1.0)), + LabeledPoint(1.0, Vectors.dense(0.0, 1.2, -0.5 + +// Create a LogisticRegression instance. This instance is an Estimator. +val lr = new MyLogisticRegression() +// Print out the parameters, documentation, and any default values. +println(MyLogisticRegression parameters:\n + lr.explainParams() + \n) + +// We may set parameters using setter methods. +lr.setMaxIter(10) + +// Learn a LogisticRegression model. This uses the parameters stored in lr. +val model = lr.fit(training) + +// Prepare test data. +val test = sparkContext.parallelize(Seq( + LabeledPoint(1.0, Vectors.dense(-1.0, 1.5, 1.3)), + LabeledPoint(0.0, Vectors.dense(3.0, 2.0, -0.1)), + LabeledPoint(1.0, Vectors.dense(0.0, 2.2, -1.5 + +// Make predictions on test data. +val sumPredictions: Double = model.transform(test) + .select('features, 'label, 'prediction) + .collect() + .map { case Row(features: Vector, label: Double, prediction: Double) = +prediction + }.sum +assert(sumPredictions == 0.0, + MyLogisticRegression predicted something other than 0, even though all weights are 0!) + } +} + +/** + * Example of defining a parameter trait for a user-defined type of [[Classifier]]. + * + * NOTE: This is private since it is an example. In practice, you may not want it to be private. 
+ */ +private trait MyLogisticRegressionParams extends ClassifierParams { + + /** param for max number of iterations */ + val maxIter: IntParam = new IntParam(this, maxIter, max number of iterations) + def getMaxIter: Int = get(maxIter) +} + +/** + * Example of defining a type of [[Classifier]]. + * + * NOTE: This is private since it is an example. In practice, you may not want it to be private. + */ +private class MyLogisticRegression + extends Classifier[Vector, MyLogisticRegression, MyLogisticRegressionModel] + with MyLogisticRegressionParams { + + setMaxIter(100) // Initialize + + def setMaxIter(value: Int): this.type = set(maxIter, value) + + override def fit(dataset: SchemaRDD, paramMap: ParamMap): MyLogisticRegressionModel = { +// Check schema (types). This allows early failure before running the algorithm. +
[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3987#issuecomment-69422858 [Test build #25346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25346/consoleFull) for PR 3987 at commit [`8bca2fa`](https://github.com/apache/spark/commit/8bca2faccb53bc91cfc534f06fe8c0b25d6b4c61). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user tomerk commented on a diff in the pull request: https://github.com/apache/spark/pull/3637#discussion_r22752140 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/DeveloperApiExample.scala --- @@ -0,0 +1,195 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.ml + +import org.apache.spark.{SparkConf, SparkContext} +import org.apache.spark.SparkContext._ +import org.apache.spark.ml.classification.{Classifier, ClassifierParams, ClassificationModel} +import org.apache.spark.ml.param.{Params, IntParam, ParamMap} +import org.apache.spark.mllib.linalg.{BLAS, Vector, Vectors, VectorUDT} +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.sql.{DataType, SchemaRDD, Row, SQLContext} + +/** + * A simple example demonstrating how to write your own learning algorithm using Estimator, + * Transformer, and other abstractions. + * This mimics [[org.apache.spark.ml.classification.LogisticRegression]]. 
+ * Run with + * {{{ + * bin/run-example ml.DeveloperApiExample + * }}} + */ +object DeveloperApiExample { + + def main(args: Array[String]) { +val conf = new SparkConf().setAppName("DeveloperApiExample") +val sc = new SparkContext(conf) +val sqlContext = new SQLContext(sc) +import sqlContext._ + +// Prepare training data. +val training = sparkContext.parallelize(Seq( + LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)), + LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0)), + LabeledPoint(0.0, Vectors.dense(2.0, 1.3, 1.0)), + LabeledPoint(1.0, Vectors.dense(0.0, 1.2, -0.5)))) + +// Create a LogisticRegression instance. This instance is an Estimator. +val lr = new MyLogisticRegression() +// Print out the parameters, documentation, and any default values. +println("MyLogisticRegression parameters:\n" + lr.explainParams() + "\n") + +// We may set parameters using setter methods. +lr.setMaxIter(10) + +// Learn a LogisticRegression model. This uses the parameters stored in lr. +val model = lr.fit(training) + +// Prepare test data. +val test = sparkContext.parallelize(Seq( + LabeledPoint(1.0, Vectors.dense(-1.0, 1.5, 1.3)), + LabeledPoint(0.0, Vectors.dense(3.0, 2.0, -0.1)), + LabeledPoint(1.0, Vectors.dense(0.0, 2.2, -1.5)))) + +// Make predictions on test data. +val sumPredictions: Double = model.transform(test) + .select('features, 'label, 'prediction) + .collect() + .map { case Row(features: Vector, label: Double, prediction: Double) => +prediction + }.sum +assert(sumPredictions == 0.0, + "MyLogisticRegression predicted something other than 0, even though all weights are 0!") + } +} + +/** + * Example of defining a parameter trait for a user-defined type of [[Classifier]]. + * + * NOTE: This is private since it is an example. In practice, you may not want it to be private. 
+ */ +private trait MyLogisticRegressionParams extends ClassifierParams { + + /** param for max number of iterations */ + val maxIter: IntParam = new IntParam(this, "maxIter", "max number of iterations") + def getMaxIter: Int = get(maxIter) +} + +/** + * Example of defining a type of [[Classifier]]. + * + * NOTE: This is private since it is an example. In practice, you may not want it to be private. + */ +private class MyLogisticRegression + extends Classifier[Vector, MyLogisticRegression, MyLogisticRegressionModel] + with MyLogisticRegressionParams { + + setMaxIter(100) // Initialize + + def setMaxIter(value: Int): this.type = set(maxIter, value) + + override def fit(dataset: SchemaRDD, paramMap: ParamMap): MyLogisticRegressionModel = { +// Check schema (types). This allows early failure before running the algorithm. +
[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/3987 [SPARK-5187][SQL] Fix caching of tables with HiveUDFs in the WHERE clause You can merge this pull request into a Git repository by running: $ git pull https://github.com/marmbrus/spark hiveUdfCaching Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3987.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3987 commit 8bca2faccb53bc91cfc534f06fe8c0b25d6b4c61 Author: Michael Armbrust mich...@databricks.com Date: 2015-01-09T23:54:18Z [SPARK-5187][SQL] Fix caching of tables with HiveUDFs in the WHERE clause
[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3629#issuecomment-69423665 Also, the other issue with this patch is that `unrollSafely` is not used exclusively with `tryToPut`; it is also used in `CacheManager#putInBlockManager`. If we acquire pending memory in `unrollSafely` and expect `tryToPut` to release it later, then we will never release the pending memory in the `CacheManager` case.
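The leak andrewor14 describes can be reduced to a small sketch. The names below (`PendingMemorySketch`, and its `unrollSafely`/`tryToPut` methods) are illustrative stand-ins, not Spark's actual `MemoryStore` API: if one method makes a "pending" reservation and only a different method releases it, any caller that invokes the first without the second (the `CacheManager`-style path) leaks the reservation.

```scala
// Hypothetical sketch of the accounting mismatch; names are stand-ins,
// not Spark's real MemoryStore API.
object PendingMemorySketch {
  private var pendingBytes = 0L

  // Unrolling reserves "pending" memory up front...
  def unrollSafely(bytes: Long): Unit = synchronized { pendingBytes += bytes }

  // ...but only the tryToPut path hands the reservation back.
  def tryToPut(bytes: Long): Unit = synchronized { pendingBytes -= bytes }

  def pending: Long = synchronized { pendingBytes }
}

// Balanced path (BlockManager-style): unrollSafely then tryToPut.
// Unbalanced path (CacheManager-style): unrollSafely alone, so the
// pending bytes are never released.
```

The fix is to make release the responsibility of the same code path that acquires, or to release unconditionally when unrolling finishes, rather than assuming a particular downstream caller.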
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user tomerk commented on a diff in the pull request: https://github.com/apache/spark/pull/3637#discussion_r22752339 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -80,69 +50,157 @@ class LogisticRegression extends Estimator[LogisticRegressionModel] with Logisti def setRegParam(value: Double): this.type = set(regParam, value) def setMaxIter(value: Int): this.type = set(maxIter, value) - def setLabelCol(value: String): this.type = set(labelCol, value) def setThreshold(value: Double): this.type = set(threshold, value) - def setFeaturesCol(value: String): this.type = set(featuresCol, value) - def setScoreCol(value: String): this.type = set(scoreCol, value) - def setPredictionCol(value: String): this.type = set(predictionCol, value) override def fit(dataset: SchemaRDD, paramMap: ParamMap): LogisticRegressionModel = { +// Check schema transformSchema(dataset.schema, paramMap, logging = true) -import dataset.sqlContext._ + +// Extract columns from data. If dataset is persisted, do not persist oldDataset. 
+val oldDataset = extractLabeledPoints(dataset, paramMap) val map = this.paramMap ++ paramMap -val instances = dataset.select(map(labelCol).attr, map(featuresCol).attr) - .map { case Row(label: Double, features: Vector) => -LabeledPoint(label, features) - }.persist(StorageLevel.MEMORY_AND_DISK) +val handlePersistence = dataset.getStorageLevel == StorageLevel.NONE +if (handlePersistence) { + oldDataset.persist(StorageLevel.MEMORY_AND_DISK) +} + +// Train model val lr = new LogisticRegressionWithLBFGS lr.optimizer .setRegParam(map(regParam)) .setNumIterations(map(maxIter)) -val lrm = new LogisticRegressionModel(this, map, lr.run(instances).weights) -instances.unpersist() +val oldModel = lr.run(oldDataset) +val lrm = new LogisticRegressionModel(this, map, oldModel.weights, oldModel.intercept) + +if (handlePersistence) { + oldDataset.unpersist() +} + // copy model params Params.inheritValues(map, this, lrm) lrm } - private[ml] override def transformSchema(schema: StructType, paramMap: ParamMap): StructType = { -validateAndTransformSchema(schema, paramMap, fitting = true) - } + override protected def featuresDataType: DataType = new VectorUDT } + /** * :: AlphaComponent :: + * + * Model produced by [[LogisticRegression]]. */ @AlphaComponent class LogisticRegressionModel private[ml] ( override val parent: LogisticRegression, --- End diff -- Why do models need to have a reference to the Estimator that produced them?
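The `handlePersistence` idiom in the diff above can be shown with a plain-Scala analogue. `CachedData` and `FitSketch` here are simplified stand-ins, not Spark types; the point is that `fit` only persists when the caller has not, and only unpersists what it persisted itself, so a caller's pre-existing cache survives the call.

```scala
// Simplified stand-in for a dataset with a storage level.
final class CachedData(initiallyCached: Boolean = false) {
  var cached: Boolean = initiallyCached
  def persist(): Unit = cached = true
  def unpersist(): Unit = cached = false
}

object FitSketch {
  def fit(dataset: CachedData): Unit = {
    // Equivalent of `dataset.getStorageLevel == StorageLevel.NONE`.
    val handlePersistence = !dataset.cached
    if (handlePersistence) dataset.persist()
    // ... iterative training would make multiple passes over `dataset` here ...
    if (handlePersistence) dataset.unpersist()
  }
}
```

Persisting matters because LBFGS makes many passes over the data; the conditional avoids double-caching (or evicting) data the caller already manages.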
[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...
Github user cfregly commented on the pull request: https://github.com/apache/spark/pull/3987#issuecomment-69423882 lgtm. as we just discussed, this is the same code path as SchemaRDD.cache(), so no need for additional tests.
[GitHub] spark pull request: [Spark-3490] Disable SparkUI for tests (backpo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3959#issuecomment-69424023 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25345/ Test FAILed.
[GitHub] spark pull request: [Spark-3490] Disable SparkUI for tests (backpo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3959#issuecomment-69424015 [Test build #25345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25345/consoleFull) for PR 3959 at commit [`5425314`](https://github.com/apache/spark/commit/542531483312b77ed941c277f3e05c4ef1867534). * This patch **fails some tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user tomerk commented on a diff in the pull request: https://github.com/apache/spark/pull/3637#discussion_r22752722 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala --- @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.classification + +import org.apache.spark.annotation.{AlphaComponent, DeveloperApi} +import org.apache.spark.ml.param.{HasProbabilityCol, ParamMap, Params} +import org.apache.spark.mllib.linalg.{Vector, VectorUDT} +import org.apache.spark.sql._ +import org.apache.spark.sql.catalyst.analysis.Star + +/** + * Params for probabilistic classification. 
+ */ +private[classification] trait ProbabilisticClassifierParams + extends ClassifierParams with HasProbabilityCol { + + override protected def validateAndTransformSchema( + schema: StructType, + paramMap: ParamMap, + fitting: Boolean, + featuresDataType: DataType): StructType = { +val parentSchema = super.validateAndTransformSchema(schema, paramMap, fitting, featuresDataType) +val map = this.paramMap ++ paramMap +addOutputColumn(parentSchema, map(probabilityCol), new VectorUDT) + } +} + + +/** + * :: AlphaComponent :: + * + * Single-label binary or multiclass classifier which can output class conditional probabilities. + * + * @tparam FeaturesType Type of input features. E.g., [[Vector]] + * @tparam Learner Concrete Estimator type + * @tparam M Concrete Model type + */ +@AlphaComponent +abstract class ProbabilisticClassifier[ +FeaturesType, +Learner <: ProbabilisticClassifier[FeaturesType, Learner, M], +M <: ProbabilisticClassificationModel[FeaturesType, M]] + extends Classifier[FeaturesType, Learner, M] with ProbabilisticClassifierParams { + + def setProbabilityCol(value: String): Learner = set(probabilityCol, value).asInstanceOf[Learner] +} + + +/** + * :: AlphaComponent :: + * + * Model produced by a [[ProbabilisticClassifier]]. + * Classes are indexed {0, 1, ..., numClasses - 1}. + * + * @tparam FeaturesType Type of input features. 
E.g., [[Vector]] + * @tparam M Concrete Model type + */ +@AlphaComponent +abstract class ProbabilisticClassificationModel[ +FeaturesType, +M <: ProbabilisticClassificationModel[FeaturesType, M]] + extends ClassificationModel[FeaturesType, M] with ProbabilisticClassifierParams { + + def setProbabilityCol(value: String): M = set(probabilityCol, value).asInstanceOf[M] + + /** + * Transforms dataset by reading from [[featuresCol]], and appending new columns as specified by + * parameters: + * - predicted labels as [[predictionCol]] of type [[Double]] + * - raw predictions (confidences) as [[rawPredictionCol]] of type [[Vector]] + * - probability of each class as [[probabilityCol]] of type [[Vector]]. + * + * @param dataset input dataset + * @param paramMap additional parameters, overwrite embedded params + * @return transformed dataset + */ + override def transform(dataset: SchemaRDD, paramMap: ParamMap): SchemaRDD = { +// This default implementation should be overridden as needed. +import dataset.sqlContext._ +import org.apache.spark.sql.catalyst.dsl._ + +// Check schema +transformSchema(dataset.schema, paramMap, logging = true) +val map = this.paramMap ++ paramMap + +// Prepare model +val tmpModel = if (paramMap.size != 0) { + val tmpModel = this.copy() + Params.inheritValues(paramMap, parent, tmpModel) + tmpModel +} else { + this +} + +val (numColsOutput, outputData) = + ClassificationModel.transformColumnsImpl[FeaturesType](dataset, tmpModel, map) + +// Output selected columns only. +if (map(probabilityCol) != "") { + //
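The `tmpModel` logic in the diff above follows a copy-on-override pattern: transform-time params must not mutate the shared model instance. A plain-Scala sketch of the same idea, with `SimpleModel` and its `Map`-based params as illustrative stand-ins for the real `Model`/`ParamMap` types:

```scala
// Stand-in for a fitted model whose params can be overridden per call.
final case class SimpleModel(params: Map[String, Any]) {
  // Non-empty overrides produce a private copy (the "tmpModel" in the
  // real code); an empty map takes the fast path and returns `this`.
  def withOverrides(overrides: Map[String, Any]): SimpleModel =
    if (overrides.isEmpty) this
    else copy(params = params ++ overrides)
}
```

Because the copy is made only when `paramMap` is non-empty, concurrent callers with different overrides each see their own settings while the original model stays untouched.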