[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs
Github user weiqingy commented on a diff in the pull request: https://github.com/apache/spark/pull/17342#discussion_r110511571 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala --- @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging { object SharedState { + URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory()) --- End diff -- In a [prior PR](https://github.com/apache/spark/pull/16324), FsUrlStreamHandlerFactory is set to JVM URL class directly. @gatorsmile raised a concern that `URL.setURLStreamHandlerFactory` can be called only once per JVM, and that is the motivation of this PR. Either one is OK for me; however we've got to choose one. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
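A minimal sketch of the once-per-JVM restriction discussed in the comment above. `FactoryOnce`, `NoOpFactory`, and `trySet` are hypothetical names used only for illustration; `URL.setURLStreamHandlerFactory` and `URLStreamHandlerFactory` are the JDK API. It assumes a JVM whose security settings allow installing a factory.

```scala
import java.net.{URL, URLStreamHandler, URLStreamHandlerFactory}

object FactoryOnce {
  // Hypothetical factory: returning null for every protocol lets the JVM
  // fall back to its built-in handlers.
  class NoOpFactory extends URLStreamHandlerFactory {
    override def createURLStreamHandler(protocol: String): URLStreamHandler = null
  }

  // Tries to install a factory; returns false if one is already installed.
  def trySet(): Boolean =
    try {
      URL.setURLStreamHandlerFactory(new NoOpFactory)
      true
    } catch {
      // The JDK signals a repeated call with java.lang.Error.
      case _: Error => false
    }
}
```

Whichever component calls `trySet()` first wins; every later call in the same JVM returns `false`, which is why only one of the two approaches under discussion can own the factory.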
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17574 **[Test build #75618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75618/testReport)** for PR 17574 at commit [`2a03188`](https://github.com/apache/spark/commit/2a0318882a3133cc3dbd88f824a92f83cdf2c5e7).
[GitHub] spark pull request #17574: [SPARK-20264][SQL] asm should be non-test depende...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/17574 [SPARK-20264][SQL] asm should be non-test dependency in sql/core ## What changes were proposed in this pull request? The sql/core module currently declares asm as a test scope dependency. Transitively it should actually be a normal dependency since the actual core module defines it. This occasionally confuses IntelliJ. ## How was this patch tested? N/A - This is a build change. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-20264 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17574.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17574 commit 2a0318882a3133cc3dbd88f824a92f83cdf2c5e7 Author: Reynold Xin Date: 2017-04-08T05:46:28Z [SPARK-20264][SQL] asm should be non-test dependency in sql/core
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110510848 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -225,20 +225,28 @@ case class Invoke( getFuncResult(ev.value, s"${obj.value}.$functionName($argString)") } else { val funcResult = ctx.freshName("funcResult") - s""" -Object $funcResult = null; -${getFuncResult(funcResult, s"${obj.value}.$functionName($argString)")} -if ($funcResult == null) { - ${ev.isNull} = true; -} else { + if (!returnNullable) { --- End diff -- since we have `postNullCheck`, can we always go to this branch?
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110510800 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -608,7 +616,7 @@ case class MapObjects private( $convertedArray = $arrayConstructor; """, genValue => s"$convertedArray[$loopIndex] = $genValue;", -s"new ${classOf[GenericArrayData].getName}($convertedArray);" +s"new ${classOf[GenericArrayData].getName}($convertedArray); /*###*/" --- End diff -- ?
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110510776 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -577,7 +584,7 @@ object ScalaReflection extends ScalaReflection { udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt(), Nil, dataType = ObjectType(udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt())) -Invoke(obj, "serialize", udt, inputObject :: Nil) +Invoke(obj, "serialize", udt, inputObject :: Nil, returnNullable = false) --- End diff -- same here
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110510779 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -586,7 +593,7 @@ object ScalaReflection extends ScalaReflection { udt.getClass, Nil, dataType = ObjectType(udt.getClass)) -Invoke(obj, "serialize", udt, inputObject :: Nil) +Invoke(obj, "serialize", udt, inputObject :: Nil, returnNullable = false) --- End diff -- same here
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110510765 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -356,7 +361,8 @@ object ScalaReflection extends ScalaReflection { udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt(), Nil, dataType = ObjectType(udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt())) -Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) +Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil, --- End diff -- The `deserialize` method is implemented entirely by users; can we guarantee that it does not return null?
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110510773 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -365,7 +371,8 @@ object ScalaReflection extends ScalaReflection { udt.getClass, Nil, dataType = ObjectType(udt.getClass)) -Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) +Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil, --- End diff -- same here
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17568#discussion_r110510575 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2230,6 +2230,8 @@ class Analyzer( val result = resolved transformDown { case UnresolvedMapObjects(func, inputData, cls) if inputData.resolved => inputData.dataType match { +case ArrayType(et, false) if cls.isEmpty => --- End diff -- To be safe, we should check: 1. no custom collection class is specified 2. the `function` converts an expression `e` to `AssertNotNull(e)` (this guarantees we are expecting a primitive array) 3. the `inputData` is of type array and its elements are not nullable.
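The three conditions listed in the comment above could be combined roughly as follows. This is only a sketch on stand-in types: `DataType`, `ArrayType`, `Expr`, `Ref`, `AssertNotNull`, and `canSkipConversion` here are simplified, hypothetical models defined in the block itself, not Spark's Catalyst classes, and the probe-based check of the function is an assumption about how condition 2 might be tested.

```scala
object MapObjectsCheck {
  // Stand-in data types (hypothetical, much simpler than Catalyst's).
  sealed trait DataType
  case object IntType extends DataType
  final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

  // Stand-in expressions (hypothetical).
  sealed trait Expr
  final case class Ref(name: String) extends Expr
  final case class AssertNotNull(child: Expr) extends Expr

  def canSkipConversion(
      function: Expr => Expr,
      inputType: DataType,
      customCollection: Option[Class[_]]): Boolean = {
    // Condition 2: applied to a probe expression e, the function yields exactly AssertNotNull(e).
    val probe = Ref("probe")
    val onlyAssertsNotNull = function(probe) == AssertNotNull(probe)
    inputType match {
      // Condition 3: array input whose elements are not nullable.
      // Condition 1: no custom collection class was requested.
      case ArrayType(_, false) => customCollection.isEmpty && onlyAssertsNotNull
      case _ => false
    }
  }
}
```

With these stand-ins, a function that does anything other than wrap its argument in `AssertNotNull` (for example, resolving struct fields) fails the second condition, so the conversion is kept.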
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17568#discussion_r110510454 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2230,6 +2230,8 @@ class Analyzer( val result = resolved transformDown { case UnresolvedMapObjects(func, inputData, cls) if inputData.resolved => inputData.dataType match { +case ArrayType(et, false) if cls.isEmpty => --- End diff -- Is it really safe to do so? `MapObjects` is used not only for null checking, but also to resolve structs in arrays.
[GitHub] spark issue #17573: [SPARK-20262][SQL] AssertNotNull should throw NullPointe...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17573 Thanks! Merging to master/2.1
[GitHub] spark pull request #17573: [SPARK-20262][SQL] AssertNotNull should throw Nul...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17573
[GitHub] spark pull request #17562: [SPARK-20246][SQL] should not push predicate down...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17562
[GitHub] spark issue #17562: [SPARK-20246][SQL] should not push predicate down throug...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17562 Thanks! Merging to master/2.1/2.0
[GitHub] spark issue #17573: [SPARK-20262][SQL] AssertNotNull should throw NullPointe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17573 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75617/ Test PASSed.
[GitHub] spark issue #17573: [SPARK-20262][SQL] AssertNotNull should throw NullPointe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17573 Merged build finished. Test PASSed.
[GitHub] spark issue #17573: [SPARK-20262][SQL] AssertNotNull should throw NullPointe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17573 **[Test build #75617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75617/testReport)** for PR 17573 at commit [`4c16795`](https://github.com/apache/spark/commit/4c16795fc1c06cdbb938195da2e4c80a469b47e5). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class AssertNotNull(child: Expression, walkedTypePath: Seq[String] = Nil)`
[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...
Github user witgo commented on the issue: https://github.com/apache/spark/pull/17567 LGTM. Are there any performance test reports?
[GitHub] spark issue #17546: [SPARK-20233] [SQL] Apply star-join filter heuristics to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17546 Merged build finished. Test PASSed.
[GitHub] spark issue #17546: [SPARK-20233] [SQL] Apply star-join filter heuristics to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75616/ Test PASSed.
[GitHub] spark issue #17546: [SPARK-20233] [SQL] Apply star-join filter heuristics to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17546 **[Test build #75616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75616/testReport)** for PR 17546 at commit [`830255c`](https://github.com/apache/spark/commit/830255ce0a3476f4d56e1d6ebf4fa3d77c7b619f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/17568 @cloud-fan could you please review this?
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/17569 @cloud-fan could you please review this?
[GitHub] spark pull request #17573: [SPARK-20262][SQL] AssertNotNull should throw Nul...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/17573 [SPARK-20262][SQL] AssertNotNull should throw NullPointerException ## What changes were proposed in this pull request? AssertNotNull currently throws RuntimeException. It should throw NullPointerException, which is more specific. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-20262 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17573.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17573 commit 4c16795fc1c06cdbb938195da2e4c80a469b47e5 Author: Reynold Xin Date: 2017-04-08T00:16:43Z [SPARK-20262][SQL] AssertNotNull should throw NullPointerException
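A small sketch of why the more specific exception type is a compatible change: `NullPointerException` is a subclass of `RuntimeException`, so handlers written against the old type still match. `AssertNotNullSketch`, `assertNotNull`, and the error message are hypothetical stand-ins for illustration, not Spark's `AssertNotNull` expression or its generated code.

```scala
object AssertNotNullSketch {
  // Hypothetical helper that mimics the proposed behavior on plain values.
  def assertNotNull[T <: AnyRef](value: T, path: String): T = {
    if (value == null) {
      // More specific than a bare RuntimeException.
      throw new NullPointerException(s"Null value appeared in non-nullable field: $path")
    }
    value
  }

  // Returns true when a handler written for RuntimeException still catches the failure.
  def caughtAsRuntimeException(): Boolean =
    try {
      assertNotNull[String](null, "root.name")
      false
    } catch {
      case _: RuntimeException => true
    }
}
```

Callers that want to distinguish null failures can now match on `NullPointerException` directly, while existing `RuntimeException` handlers keep working.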
[GitHub] spark issue #17573: [SPARK-20262][SQL] AssertNotNull should throw NullPointe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17573 **[Test build #75617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75617/testReport)** for PR 17573 at commit [`4c16795`](https://github.com/apache/spark/commit/4c16795fc1c06cdbb938195da2e4c80a469b47e5).
[GitHub] spark issue #17546: [SPARK-20233] [SQL] Apply star-join filter heuristics to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17546 **[Test build #75616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75616/testReport)** for PR 17546 at commit [`830255c`](https://github.com/apache/spark/commit/830255ce0a3476f4d56e1d6ebf4fa3d77c7b619f).
[GitHub] spark pull request #17546: [SPARK-20233] [SQL] Apply star-join filter heuris...
Github user ioana-delaney commented on a diff in the pull request: https://github.com/apache/spark/pull/17546#discussion_r110495345 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -736,6 +736,12 @@ object SQLConf { .checkValue(weight => weight >= 0 && weight <= 1, "The weight value must be in [0, 1].") .createWithDefault(0.7) + val JOIN_REORDER_DP_STAR_FILTER = +buildConf("spark.sql.cbo.joinReorder.dp.star.filter") + .doc("Applies star-join filter heuristics to cost based join enumeration.") + .booleanConf + .createWithDefault(false) --- End diff -- @ron8hu Thank you. We will keep the default false.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17469 It seems something went wrong with @holdnk and Jenkins. I don't think I have permission to trigger this.
[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17567 **[Test build #3646 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3646/testReport)** for PR 17567 at commit [`3828d03`](https://github.com/apache/spark/commit/3828d03caea6326659c33b37b599081d69ba8106). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17527 **[Test build #3647 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3647/testReport)** for PR 17527 at commit [`662f6ae`](https://github.com/apache/spark/commit/662f6aea586ef52ae0fdabc8a28e4e9674ad04ff). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17572: String interpolation required for error message
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17572 Can one of the admins verify this patch?
[GitHub] spark issue #17572: String interpolation required for error message
Github user vijaykramesh commented on the issue: https://github.com/apache/spark/pull/17572 I'm not sure if I should open my own jira for this issue or if that is handled by the project maintainers? Thanks!
[GitHub] spark pull request #17572: String interpolation required for error message
GitHub user vijaykramesh opened a pull request: https://github.com/apache/spark/pull/17572 String interpolation required for error message ## What changes were proposed in this pull request? This error message doesn't get properly formatted because of a missing `s`. Currently the error looks like: ``` Caused by: java.lang.IllegalArgumentException: requirement failed: indices should be one-based and in ascending order; found current=$current, previous=$previous; line="$line" ``` (note the literal `$current` instead of the interpolated value) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vijaykramesh/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17572.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17572 commit 7cd0a0defe6e3ecb4bfb249b2644298230a03ac7 Author: Vijay Ramesh Date: 2017-04-07T21:41:18Z need to do string interpolation for error message to display last line
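To illustrate the missing `s` described in the pull request above, here is a minimal sketch. The object name and the values of `current` and `previous` are made up for the example; only the difference between a plain string literal and an `s`-interpolated one is the point.

```scala
object InterpolationDemo {
  // Hypothetical values standing in for the parser state in the real error path.
  val current = 3
  val previous = 5

  // Without the `s` prefix, Scala keeps `$current` as literal text.
  val plain = "found current=$current, previous=$previous"

  // With the `s` prefix, the variables are substituted into the message.
  val interpolated = s"found current=$current, previous=$previous"
}
```

The first string reproduces the symptom quoted in the report; the second is the form the fix moves to.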
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user map222 commented on the issue: https://github.com/apache/spark/pull/17469 @HyukjinKwon Do I need to do something to start the Jenkins test?
[GitHub] spark pull request #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFil...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17570
[GitHub] spark issue #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17570 Merging in master/branch-2.1.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75614/ Test PASSed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75614/testReport)** for PR 17569 at commit [`ae5e232`](https://github.com/apache/spark/commit/ae5e232da543f6c7c5d6f6a3526bdb56c6f793b8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17546: [SPARK-20233] [SQL] Apply star-join filter heuris...
Github user ron8hu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17546#discussion_r110480494

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -736,6 +736,12 @@ object SQLConf {
           .checkValue(weight => weight >= 0 && weight <= 1, "The weight value must be in [0, 1].")
           .createWithDefault(0.7)

    +  val JOIN_REORDER_DP_STAR_FILTER =
    +    buildConf("spark.sql.cbo.joinReorder.dp.star.filter")
    +      .doc("Applies star-join filter heuristics to cost based join enumeration.")
    +      .booleanConf
    +      .createWithDefault(false)
    --- End diff --

    In Spark 2.2, we introduced a couple of new configuration parameters in optimizer area. In order to play on the safe side, we set the default value to false. I suggest that we can change the default value to true AFTER we are sure that the new optimizer feature does not cause any regression. I think the system regression/integration test suites help us make a decision.
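With the default left at `false`, anyone who wants to try the heuristic has to opt in per session. A minimal sketch of doing so is below, assuming Spark SQL is on the classpath and that the key from the diff above ships under that exact name; the object name `StarFilterFlagDemo` and the application name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object StarFilterFlagDemo {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("star-filter-flag-demo")
      .getOrCreate()

    // Key copied from the diff under review; assumed to be unchanged in the release.
    val key = "spark.sql.cbo.joinReorder.dp.star.filter"

    // Opt in for this session, overriding the conservative default of false.
    spark.conf.set(key, "true")
    println(s"$key = ${spark.conf.get(key)}")

    spark.stop()
  }
}
```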
[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...
Github user Downchuck commented on the issue: https://github.com/apache/spark/pull/13206 may be fixed in https://github.com/apache/spark/pull/16898
[GitHub] spark issue #17558: [SPARK-20247][CORE] Add jar but this jar is missing late...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17558 agreed, why would the jar be missing?
[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17567 LGTM.
[GitHub] spark issue #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17570 **[Test build #3645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3645/testReport)** for PR 17570 at commit [`1c69820`](https://github.com/apache/spark/commit/1c69820f9d905f75b5d7e90b5d0e17b690e8d8bf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17546: [SPARK-20233] [SQL] Apply star-join filter heuris...
Github user ioana-delaney commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17546#discussion_r110466604

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -736,6 +736,12 @@ object SQLConf {
           .checkValue(weight => weight >= 0 && weight <= 1, "The weight value must be in [0, 1].")
           .createWithDefault(0.7)

    +  val JOIN_REORDER_DP_STAR_FILTER =
    +    buildConf("spark.sql.cbo.joinReorder.dp.star.filter")
    +      .doc("Applies star-join filter heuristics to cost based join enumeration.")
    +      .booleanConf
    +      .createWithDefault(false)
    --- End diff --

    @gatorsmile I am also fine with changing the default. @wzhfy What do you think?
[GitHub] spark pull request #17557: [SPARK-20208][WIP][R][DOCS] Document R fpGrowth s...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17557#discussion_r110465373

    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -906,6 +910,24 @@ predicted <- predict(model, df)
     head(predicted)
     ```

    + FP-growth
    +
    +`spark.fpGrowth` executes FP-growth algorithm to mine frequent itemsets on a `SparkDataFrame`.
    +
    +* `spark.freqItemsets` method can be used to retrieve a `SparkDataFrame` with the frequent itemsets.
    +* `spark.associationRules` returns a `SparkDataFrame` with the association rules.
    +
    +
    +```{r}
    +items <- selectExpr(createDataFrame(data.frame(items = c(
    +  "s,t,u",
    --- End diff --

    something that is not coded in 3 lines ;) reading from a file if we could - if there isn't any dataset that we can license to use, can we anonymize an existing one?
[GitHub] spark pull request #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic re...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17571
[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17571 merged to master.
[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17571 right, tests don't run example anyway...
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17568 Merged build finished. Test PASSed.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17568 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75610/ Test PASSed.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17568 **[Test build #75610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75610/testReport)** for PR 17568 at commit [`6a5fa5a`](https://github.com/apache/spark/commit/6a5fa5abb8ae73eaf2866630af070e0301660149).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17527 **[Test build #3647 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3647/testReport)** for PR 17527 at commit [`662f6ae`](https://github.com/apache/spark/commit/662f6aea586ef52ae0fdabc8a28e4e9674ad04ff).
[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17567 **[Test build #3646 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3646/testReport)** for PR 17567 at commit [`3828d03`](https://github.com/apache/spark/commit/3828d03caea6326659c33b37b599081d69ba8106).
[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17567 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75609/ Test FAILed.
[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17567 Merged build finished. Test FAILed.
[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17567 **[Test build #75609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75609/testReport)** for PR 17567 at commit [`3828d03`](https://github.com/apache/spark/commit/3828d03caea6326659c33b37b599081d69ba8106).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17571 Merged build finished. Test PASSed.
[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17571 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75615/ Test PASSed.
[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17571 **[Test build #75615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75615/testReport)** for PR 17571 at commit [`f7e71ea`](https://github.com/apache/spark/commit/f7e71ea8c01d44852fde9c1a6a930e09cc95d2e6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17557: [SPARK-20208][WIP][R][DOCS] Document R fpGrowth s...
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17557#discussion_r110459968

    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -906,6 +910,24 @@ predicted <- predict(model, df)
     head(predicted)
     ```

    + FP-growth
    +
    +`spark.fpGrowth` executes FP-growth algorithm to mine frequent itemsets on a `SparkDataFrame`.
    +
    +* `spark.freqItemsets` method can be used to retrieve a `SparkDataFrame` with the frequent itemsets.
    +* `spark.associationRules` returns a `SparkDataFrame` with the association rules.
    +
    +
    +```{r}
    +items <- selectExpr(createDataFrame(data.frame(items = c(
    +  "s,t,u",
    --- End diff --

    What do you mean by "real"? Something human readable (e.g. milk, bread, butter) or some standard pattern mining dataset? If the former one then it is not a problem. If the latter one I am not aware of any dataset which would be safe enough on the license side.
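To make the "human readable" option concrete, here is a small self-contained sketch of frequent-itemset counting over a made-up milk/bread/butter basket list. It is an assumption-laden illustration, not the vignette code: the transactions and the object name `FrequentItemsetsDemo` are invented, and the counting is brute-force subset enumeration rather than FP-growth, so it is only feasible for tiny inputs.

```scala
object FrequentItemsetsDemo {
  // Hypothetical, human-readable transactions (not a licensed dataset).
  val transactions: Seq[Set[String]] = Seq(
    Set("milk", "bread", "butter"),
    Set("milk", "bread"),
    Set("bread", "butter"),
    Set("milk", "butter"),
    Set("milk", "bread", "butter")
  )

  // Returns every itemset that occurs in at least `minCount` transactions.
  // Brute force: enumerates all non-empty subsets of each transaction.
  def frequentItemsets(minCount: Int): Map[Set[String], Int] =
    transactions
      .flatMap(t => t.subsets().filter(_.nonEmpty))
      .groupBy(identity)
      .map { case (itemset, occurrences) => itemset -> occurrences.size }
      .filter { case (_, count) => count >= minCount }

  def main(args: Array[String]): Unit =
    frequentItemsets(3).foreach { case (itemset, count) =>
      println(s"${itemset.mkString(",")}: $count")
    }
}
```

With a threshold of 3, each single item and each pair qualifies, while the full three-item basket (present in only two transactions) does not.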
[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17571 **[Test build #75613 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75613/testReport)** for PR 17571 at commit [`95b5383`](https://github.com/apache/spark/commit/95b5383fae4da22aa0552e969c05b9488accb1a1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17571 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75613/ Test PASSed.
[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17571 Merged build finished. Test PASSed.
[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17571 **[Test build #75615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75615/testReport)** for PR 17571 at commit [`f7e71ea`](https://github.com/apache/spark/commit/f7e71ea8c01d44852fde9c1a6a930e09cc95d2e6).
[GitHub] spark pull request #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic re...
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17571#discussion_r110457676

    --- Diff: examples/src/main/r/ml/glm.R ---
    @@ -44,8 +44,9 @@ gaussianGLM2 <- glm(label ~ features, gaussianDF, family = "gaussian")
     summary(gaussianGLM2)

     # Fit a generalized linear model of family "binomial" with spark.glm
    -training2 <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
    -df_list2 <- randomSplit(training2, c(7,3), 2)
    +training2 <- read.df("/data/mllib/sample_multiclass_classification_data.txt", source = "libsvm")
    --- End diff --

    Thanks! copy paste error. Corrected now.
[GitHub] spark pull request #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic re...
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17571#discussion_r110457466

    --- Diff: examples/src/main/r/ml/glm.R ---
    @@ -44,8 +44,9 @@ gaussianGLM2 <- glm(label ~ features, gaussianDF, family = "gaussian")
     summary(gaussianGLM2)

     # Fit a generalized linear model of family "binomial" with spark.glm
    -training2 <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
    --- End diff --

    just a bad example.
[GitHub] spark pull request #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic re...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17571#discussion_r110457459

    --- Diff: examples/src/main/r/ml/glm.R ---
    @@ -44,8 +44,9 @@ gaussianGLM2 <- glm(label ~ features, gaussianDF, family = "gaussian")
     summary(gaussianGLM2)

     # Fit a generalized linear model of family "binomial" with spark.glm
    -training2 <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
    -df_list2 <- randomSplit(training2, c(7,3), 2)
    +training2 <- read.df("/data/mllib/sample_multiclass_classification_data.txt", source = "libsvm")
    --- End diff --

    actually, you might need to leave it as relative path, ie. not starting with `/`
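The difference the reviewer points at can be seen with plain `java.nio`, independent of Spark. This is only a rough analogy under stated assumptions: it does not reproduce how `read.df` resolves paths, the object name `PathDemo` is hypothetical, and the behaviour of a leading `/` described in the comments is for Unix-like systems.

```scala
import java.nio.file.{Path, Paths}

object PathDemo {
  // The two spellings that appear in the diff above.
  val relative: Path = Paths.get("data/mllib/sample_multiclass_classification_data.txt")
  val rooted: Path = Paths.get("/data/mllib/sample_multiclass_classification_data.txt")

  def main(args: Array[String]): Unit = {
    // A relative path is resolved against the current working directory,
    // so it follows the checkout wherever the example is run from.
    println(relative.toAbsolutePath)

    // On Unix-like systems a leading `/` anchors the path at the filesystem
    // root, where a `data/mllib` directory is unlikely to exist.
    println(rooted.isAbsolute)
  }
}
```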
[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17571 @felixcheung Just noticed that the current example for logistic regression in the programming guide did not seem to be a good one. It did not converge using IRWLS, and Quasi-Newton yielded almost zero estimates for all coefficients.
[GitHub] spark pull request #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic re...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17571#discussion_r110457201

    --- Diff: examples/src/main/r/ml/glm.R ---
    @@ -44,8 +44,9 @@ gaussianGLM2 <- glm(label ~ features, gaussianDF, family = "gaussian")
     summary(gaussianGLM2)

     # Fit a generalized linear model of family "binomial" with spark.glm
    -training2 <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
    --- End diff --

    is this an issue with `binary_classification_data` data?
[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17571 **[Test build #75613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75613/testReport)** for PR 17571 at commit [`95b5383`](https://github.com/apache/spark/commit/95b5383fae4da22aa0552e969c05b9488accb1a1).
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75614/testReport)** for PR 17569 at commit [`ae5e232`](https://github.com/apache/spark/commit/ae5e232da543f6c7c5d6f6a3526bdb56c6f793b8).
[GitHub] spark pull request #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic re...
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/17571
[SPARK-20258][Doc][SparkR] Fix SparkR logistic regression example in programming guide (did not converge)
## What changes were proposed in this pull request?
The SparkR logistic regression example in the programming guide did not converge (with IRWLS). After the fallback to Quasi-Newton, all feature coefficient estimates are essentially zero:
```
training2 <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
df_list2 <- randomSplit(training2, c(7,3), 2)
binomialDF <- df_list2[[1]]
binomialTestDF <- df_list2[[2]]
binomialGLM <- spark.glm(binomialDF, label ~ features, family = "binomial")
17/04/07 11:42:03 WARN WeightedLeastSquares: Cholesky solver failed due to singular covariance matrix. Retrying with Quasi-Newton solver.
> summary(binomialGLM)
Coefficients:
             Estimate
(Intercept)  9.0255e+00
features_0   0.e+00
features_1   0.e+00
features_2   0.e+00
features_3   0.e+00
features_4   0.e+00
features_5   0.e+00
features_6   0.e+00
features_7   0.e+00
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/actuaryzhang/spark programGuide2
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17571.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17571
commit 95b5383fae4da22aa0552e969c05b9488accb1a1
Author: actuaryzhang
Date: 2017-04-07T18:37:33Z
update logistic regression example
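The `WeightedLeastSquares` warning in the message above says the Cholesky solver failed on a singular matrix. As a rough illustration only (this is not Spark's implementation), the pure-Python sketch below builds the Gram matrix X^T X for a tiny design matrix and attempts a textbook Cholesky factorization. The helper names `gram`, `cholesky`, and `solver_succeeds`, and the toy data, are assumptions made for this example. A feature column that is zero in every row makes the Gram matrix singular, so the factorization hits a non-positive pivot and gives up, which is the kind of condition that would force a fallback to an iterative solver.

```python
import math


def gram(X):
    """Return X^T X for a row-major list-of-lists matrix X."""
    n_cols = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n_cols)]
            for i in range(n_cols)]


def cholesky(A, tol=1e-12):
    """Textbook Cholesky factorization A = L L^T.

    Raises ValueError when a pivot is not strictly positive, which is
    what happens for a singular (or indefinite) matrix.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = A[i][i] - s
                if pivot <= tol:
                    raise ValueError("matrix is singular or not positive definite")
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solver_succeeds(X):
    """Return True if the Gram matrix of X admits a Cholesky factorization."""
    try:
        cholesky(gram(X))
        return True
    except ValueError:
        return False


# Hypothetical toy design matrix: an intercept column plus two features.
X_ok = [[1.0, 0.5, 2.0], [1.0, 1.5, 0.0], [1.0, 3.0, 1.0], [1.0, 2.0, 4.0]]
# Same rows, but the last feature is zero everywhere (a degenerate column).
X_bad = [[row[0], row[1], 0.0] for row in X_ok]

print(solver_succeeds(X_ok))
print(solver_succeeds(X_bad))
```

Whether the sample data set in the example actually contains such degenerate columns is not stated in the thread; the sketch only shows one way a singular matrix can arise.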
[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17527 Merged build finished. Test FAILed.
[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17527 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75608/ Test FAILed.
[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17527 **[Test build #75608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75608/testReport)** for PR 17527 at commit [`662f6ae`](https://github.com/apache/spark/commit/662f6aea586ef52ae0fdabc8a28e4e9674ad04ff).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17566: [SPARK-19518][SQL] IGNORE NULLS in first / last in SQL
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17566 Merged build finished. Test PASSed.
[GitHub] spark issue #17566: [SPARK-19518][SQL] IGNORE NULLS in first / last in SQL
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75607/ Test PASSed.
[GitHub] spark issue #17566: [SPARK-19518][SQL] IGNORE NULLS in first / last in SQL
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17566 **[Test build #75607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75607/testReport)** for PR 17566 at commit [`6d21ad8`](https://github.com/apache/spark/commit/6d21ad81073fcec7bb623635328a604fb99303a4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17516: [SPARK-20197][SPARKR] CRAN check fail with packag...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17516
[GitHub] spark issue #17516: [SPARK-20197][SPARKR] CRAN check fail with package insta...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17516 merged to master
[GitHub] spark issue #17516: [SPARK-20197][SPARKR] CRAN check fail with package insta...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17516 Thanks. I find it rather odd, but probably by design, that the current directory is different when running `R CMD check .tgz`. I will need to look at this more.
[GitHub] spark pull request #17557: [SPARK-20208][WIP][R][DOCS] Document R fpGrowth s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17557#discussion_r110450322 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -906,6 +910,24 @@ predicted <- predict(model, df) head(predicted) ``` + FP-growth + +`spark.fpGrowth` executes FP-growth algorithm to mine frequent itemsets on a `SparkDataFrame`. + +* `spark.freqItemsets` method can be used to retrieve a `SparkDataFrame` with the frequent itemsets. +* `spark.associationRules` returns a `SparkDataFrame` with the association rules. + + +```{r} +items <- selectExpr(createDataFrame(data.frame(items = c( + "s,t,u", --- End diff -- Thanks! I'd prefer an example with real data...
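The vignette text quoted in the diff above mentions frequent itemsets and association rules. To make those two terms concrete, here is a small Python sketch that enumerates them by brute force. It deliberately does not implement FP-growth (it counts every subset of every transaction, which only works for tiny inputs) and it does not use the SparkR API. The function names, the thresholds, and all transactions other than the `"s,t,u"` row visible in the diff are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force frequent itemsets: map each itemset (sorted tuple) to its support.

    Support is the fraction of transactions containing the itemset.
    This is plain enumeration, not the FP-growth algorithm.
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(freq, min_confidence):
    """Derive rules (antecedent, consequent, confidence) from frequent itemsets."""
    rules = []
    for itemset, support in freq.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                consequent = tuple(i for i in itemset if i not in antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so the antecedent is guaranteed to be a key of `freq`.
                confidence = support / freq[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules


# Hypothetical transactions; only the first row appears in the quoted diff.
transactions = [["s", "t", "u"], ["s", "t"], ["s", "u"], ["t"]]
freq = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(freq, min_confidence=0.9)
print(freq)
print(rules)
```

With these toy inputs the only rule that clears the 0.9 confidence threshold is `u -> s`, since every transaction containing `u` also contains `s`.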
[GitHub] spark issue #17516: [SPARK-20197][SPARKR] CRAN check fail with package insta...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/17516 Got it. LGTM. Thanks for the explanation. I'm fine with merging this to master!
[GitHub] spark issue #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17570 **[Test build #3645 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3645/testReport)** for PR 17570 at commit [`1c69820`](https://github.com/apache/spark/commit/1c69820f9d905f75b5d7e90b5d0e17b690e8d8bf).
[GitHub] spark issue #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17570 LGTM pending Jenkins.
[GitHub] spark issue #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17570 Jenkins, add to whitelist.
[GitHub] spark pull request #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example fo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17553
[GitHub] spark issue #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example for Spark...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17553 merged to master, thanks!
[GitHub] spark issue #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example for Spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17553 Merged build finished. Test PASSed.
[GitHub] spark issue #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example for Spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17553 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75612/ Test PASSed.
[GitHub] spark issue #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example for Spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17553 **[Test build #75612 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75612/testReport)** for PR 17553 at commit [`ca87b38`](https://github.com/apache/spark/commit/ca87b38fae0dcae66ca09db15051b2f44a3f542f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17516: [SPARK-20197][SPARKR] CRAN check fail with package insta...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17516
There are two parts to the branch-2.1 fix.
First, the test failed because `SPARK_HOME` was not set before calling `install.spark()` when running as a package. This is not a problem in Jenkins, only when running with `R CMD check SparkR*.tgz`. The fix was to move the `install.spark` call earlier.
Second, even after that change, while testing it I found that `R CMD check` was putting `spark-warehouse` etc. in the `testthat` directory, NOT in `SPARK_HOME`, so that test would essentially be a no-op that always passes. I made the call to disable it (with `skip_on_cran`), but that had the unintended effect of also turning off that test in Jenkins, as we are testing with `--as-cran` (as explained above).
Hence the attempt in this PR to fix this properly in master. Since we could be rolling out an RC at any time, I don't want to delay the first fix (`install.spark`) only to sort out the second part, which could come a bit later.
If you feel that's safer, we could also add `skip_on_cran` to this test in master; just know that it will also turn off this test in Jenkins. Since with `R CMD check` the `spark-warehouse` and `metastore_db` directories are written not to `SPARK_HOME` but to `testthat`, this test will pass during the package test with `R CMD check`, so long as we merge this PR to move `install.spark` first.
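The second point in the message above is that a "no leftover files in `SPARK_HOME`" check cannot fail if the job writes its artifacts into a different working directory. The Python sketch below reproduces that effect with temporary directories. It is only an analogy to the SparkR test, not its code; `run_job`, `unexpected_files`, `leftovers_after_run`, and `demo` are hypothetical helpers, and the job is a stand-in that simply creates the two directory names mentioned in the thread.

```python
import os
from tempfile import TemporaryDirectory


def unexpected_files(directory, allowed):
    """Return the sorted entries of `directory` that are not in `allowed`."""
    return sorted(set(os.listdir(directory)) - set(allowed))


def run_job(working_dir):
    """Stand-in for a job that drops artifacts into its working directory."""
    os.mkdir(os.path.join(working_dir, "spark-warehouse"))
    os.mkdir(os.path.join(working_dir, "metastore_db"))


def leftovers_after_run(checked_dir, working_dir):
    """Run the job in `working_dir`, then report new entries in `checked_dir`."""
    before = os.listdir(checked_dir)
    run_job(working_dir)
    return unexpected_files(checked_dir, before)


def demo():
    # Case 1: the job runs inside the directory being checked.
    with TemporaryDirectory() as spark_home:
        same_dir = leftovers_after_run(spark_home, spark_home)
    # Case 2: the job runs elsewhere, so the check has nothing to find.
    with TemporaryDirectory() as spark_home, TemporaryDirectory() as test_dir:
        other_dir = leftovers_after_run(spark_home, test_dir)
    return same_dir, other_dir


same_dir, other_dir = demo()
print(same_dir)
print(other_dir)
```

In the second case the check reports nothing regardless of what the job does, which is why such a test carries no information when the working directory differs from the directory under inspection.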
[GitHub] spark issue #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example for Spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17553 **[Test build #75612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75612/testReport)** for PR 17553 at commit [`ca87b38`](https://github.com/apache/spark/commit/ca87b38fae0dcae66ca09db15051b2f44a3f542f).
[GitHub] spark issue #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17570 Can one of the admins verify this patch?
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Merged build finished. Test FAILed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75611/testReport)** for PR 17569 at commit [`4482e1c`](https://github.com/apache/spark/commit/4482e1c2b920e201afca1379a3686df9a4db5bc9).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75611/ Test FAILed.
[GitHub] spark pull request #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFil...
GitHub user adrian-ionescu opened a pull request: https://github.com/apache/spark/pull/17570
[SPARK-20255] Move listLeafFiles() to InMemoryFileIndex
## What changes were proposed in this pull request?
While trying to get a grip on the `FileIndex` hierarchy, I was confused by the following inconsistency: on the one hand, `PartitioningAwareFileIndex` defines `leafFiles` and `leafDirToChildrenFiles` as abstract, but on the other it fully implements `listLeafFiles`, which does all the listing of files. However, the latter is only used by `InMemoryFileIndex`.
I'm hereby proposing to move this method (and all its dependencies) to the implementation class that actually uses it, and thus unclutter the `PartitioningAwareFileIndex` interface.
## How was this patch tested?
`./build/sbt sql/test`
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/adrian-ionescu/apache-spark list-leaf-files
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17570.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17570
commit 1c69820f9d905f75b5d7e90b5d0e17b690e8d8bf
Author: Adrian Ionescu
Date: 2017-04-07T17:06:49Z
Move listLeafFiles() to InMemoryFileIndex
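The refactoring proposed above follows a common pattern: an abstract base class should declare what subclasses must provide, and a helper used by only one subclass belongs in that subclass. The Python sketch below shows the shape of the result under loose assumptions. The class names `PartitionAwareIndex` and `InMemoryIndex`, the dictionary-based directory tree, and the method names are hypothetical stand-ins, not the Spark classes or their signatures.

```python
from abc import ABC, abstractmethod


class PartitionAwareIndex(ABC):
    """Base class: declares the abstract data it needs, nothing about listing."""

    @property
    @abstractmethod
    def leaf_files(self):
        """All leaf file paths known to this index."""

    def all_files(self):
        # Shared behaviour built only on the abstract member.
        return sorted(self.leaf_files)


class InMemoryIndex(PartitionAwareIndex):
    """Concrete class: owns the listing logic because it is the only user."""

    def __init__(self, tree, roots):
        self._leaf_files = self._list_leaf_files(tree, roots)

    @property
    def leaf_files(self):
        return self._leaf_files

    @staticmethod
    def _list_leaf_files(tree, roots):
        """Walk a {directory: [children]} mapping and collect non-directory paths."""
        files = []
        stack = list(roots)
        while stack:
            path = stack.pop()
            if path in tree:
                stack.extend(tree[path])  # a directory: descend into children
            else:
                files.append(path)        # not a key in `tree`: treat as a file
        return files


# Hypothetical partitioned layout, for illustration only.
tree = {
    "/t": ["/t/a=1", "/t/a=2"],
    "/t/a=1": ["/t/a=1/f1"],
    "/t/a=2": ["/t/a=2/f2", "/t/a=2/f3"],
}
index = InMemoryIndex(tree, ["/t"])
print(index.all_files())
```

After such a move the base class exposes only the members every subclass must supply, so a reader of the interface no longer has to wonder which implementation relies on the listing helper.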
[GitHub] spark issue #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example for Spark...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17553 Issues fixed. Thanks.