[GitHub] spark pull request: [SPARK-11974][CORE]Not all the temp dirs had b...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9951#discussion_r45837347

--- Diff: core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala ---
@@ -57,7 +57,9 @@ private[spark] object ShutdownHookManager extends Logging {
   // Add a shutdown hook to delete the temp dirs when the JVM exits
   addShutdownHook(TEMP_DIR_SHUTDOWN_PRIORITY) { () =>
     logInfo("Shutdown hook called")
-    shutdownDeletePaths.foreach { dirPath =>
+    //we need to materialize the paths to delete because deleteRecursively removes items from
--- End diff --

need a space after // - otherwise this will fail the style check

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11863][SQL] Unable to resolve order by ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9961#issuecomment-159528020 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11863][SQL] Unable to resolve order by ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9961#issuecomment-159528023 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46670/ Test PASSed.
[GitHub] spark pull request: [SPARK-11863][SQL] Unable to resolve order by ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9961#issuecomment-159527924

**[Test build #46670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46670/consoleFull)** for PR 9961 at commit [`8ad6897`](https://github.com/apache/spark/commit/8ad6897b64c3bab3c249e79e54d2979bd672f2c3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11974][CORE]Not all the temp dirs had b...
Github user DoingDone9 commented on the pull request: https://github.com/apache/spark/pull/9951#issuecomment-159527936 OK, got it @rxin
[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...
Github user jerryshao closed the pull request at: https://github.com/apache/spark/pull/9597
[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-159527637 OK, I will close this, since some partition implementations rely on the position in the partition array, so this implementation may fail in some cases.
[GitHub] spark pull request: [SPARK-11905] [SQL] Support Persist/Cache and ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9889#issuecomment-159527582 @rxin Sure, I will do the Python testing first. Thanks!
[GitHub] spark pull request: [DOCUMENTATION] Fix minor doc error
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9956#discussion_r45836961

--- Diff: docs/configuration.md ---
@@ -35,7 +35,7 @@ val sc = new SparkContext(conf)
 {% endhighlight %}
 Note that we can have more than 1 thread in local mode, and in cases like Spark Streaming, we may
-actually require one to prevent any sort of starvation issues.
+actually require more than 1 thread to prevent any sort of starvation issues.
--- End diff --

cc @tdas
[GitHub] spark pull request: [DOCUMENTATION] Fix minor doc error
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9956#discussion_r45836959

--- Diff: docs/configuration.md ---
@@ -35,7 +35,7 @@ val sc = new SparkContext(conf)
 {% endhighlight %}
 Note that we can have more than 1 thread in local mode, and in cases like Spark Streaming, we may
-actually require one to prevent any sort of starvation issues.
+actually require more than 1 thread to prevent any sort of starvation issues.
--- End diff --

at least one?
[GitHub] spark pull request: [SPARK-11905] [SQL] Support Persist/Cache and ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9889#issuecomment-159527365 @gatorsmile just FYI, if you have time: the Python tests are probably much more important than the more complicated case of caching.
[GitHub] spark pull request: [SPARK-11905] [SQL] Support Persist/Cache and ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9889#issuecomment-159526861 Now I understand your concern. Thank you for the example! I added your example to the newly created test suite `CacheSuite`. I saw the failure and thus used `ignore` to disable the case. I will keep investigating the issue after the merge. I am running the test cases on my local computer and will upload the new changes tomorrow morning. Thank you for your help!
[GitHub] spark pull request: [SPARK-11905] [SQL] Support Persist/Cache and ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9889#discussion_r45836772

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -579,11 +580,50 @@ class Dataset[T] private[sql](
 */
 def takeAsList(num: Int): java.util.List[T] = java.util.Arrays.asList(take(num) : _*)
+
+ /* *** *
+ * Cache *
+ * *** */
+
+ /**
+* @since 1.6.0
+*/
--- End diff --

@marmbrus moving functions into Queryable actually breaks both scaladoc and javadoc.
[GitHub] spark pull request: [SPARK-11973] [SQL] push filter through aggreg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9959#issuecomment-159526655 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11973] [SQL] push filter through aggreg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9959#issuecomment-159526613

**[Test build #46668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46668/consoleFull)** for PR 9959 at commit [`2fb7a1c`](https://github.com/apache/spark/commit/2fb7a1cd42664c281bfc64bf584b8f762f828b4d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11973] [SQL] push filter through aggreg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9959#issuecomment-159526658 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46668/ Test FAILed.
[GitHub] spark pull request: [SPARK-11905] [SQL] Support Persist/Cache and ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/9889#discussion_r45836459

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -579,11 +580,50 @@ class Dataset[T] private[sql](
 */
 def takeAsList(num: Int): java.util.List[T] = java.util.Arrays.asList(take(num) : _*)
+
+ /* *** *
+ * Cache *
+ * *** */
+
+ /**
+* @since 1.6.0
+*/
--- End diff --

So far, we are unable to move the functions to `Queryable`. I just added the descriptions in both `DataFrame` and `Dataset`. Hopefully, it resolves your concern. Thanks!
[GitHub] spark pull request: [SPARK-11974][CORE]Not all the temp dirs had b...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9951#issuecomment-159525947 Ah ok - can you update the change to add a line of comment saying we need to materialize the paths to delete because deleteRecursively removes items from shutdownDeletePaths as we are traversing through it?
[GitHub] spark pull request: [SPARK-11974][CORE]Not all the temp dirs had b...
Github user DoingDone9 commented on the pull request: https://github.com/apache/spark/pull/9951#issuecomment-159525672

```
shutdownDeletePaths.foreach { dirPath =>
  try {
    logInfo("Deleting directory " + dirPath)
    Utils.deleteRecursively(new File(dirPath))
  } catch {
    case e: Exception => logError(s"Exception while deleting Spark temp dir: $dirPath", e)
  }
}
```

`Utils.deleteRecursively(new File(dirPath))` calls `ShutdownHookManager.removeShutdownDeleteDir(file)`:

```
def deleteRecursively(file: File) {
  ...
  ShutdownHookManager.removeShutdownDeleteDir(file)
  ...
}
```

`ShutdownHookManager.removeShutdownDeleteDir(file)` deletes elements of `shutdownDeletePaths`:

```
def removeShutdownDeleteDir(file: File) {
  val absolutePath = file.getAbsolutePath()
  shutdownDeletePaths.synchronized {
    shutdownDeletePaths.remove(absolutePath)
  }
}
```

@rxin
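The call chain above means the set is mutated while `foreach` is still walking it. A minimal sketch of the "materialize first" idea follows, assuming a plain `mutable.HashSet[String]` as a stand-in for `shutdownDeletePaths`. The object `ShutdownPathsSketch`, its `register`, `deleteAll`, and `remaining` helpers, and the use of `toArray` for the snapshot are illustrative assumptions, not the code from the pull request, and no files are touched.

```scala
import scala.collection.mutable

// Hypothetical stand-in for ShutdownHookManager; only the set handling is modeled.
object ShutdownPathsSketch {
  private val shutdownDeletePaths = mutable.HashSet[String]()

  def register(path: String): Unit = shutdownDeletePaths.synchronized {
    shutdownDeletePaths += path
    ()
  }

  // Mirrors removeShutdownDeleteDir from the thread: it mutates the shared set.
  def removeShutdownDeleteDir(path: String): Unit = shutdownDeletePaths.synchronized {
    shutdownDeletePaths.remove(path)
    ()
  }

  // Stand-in for Utils.deleteRecursively: no file I/O, only the side effect on the set.
  private def deleteRecursively(path: String): Unit = removeShutdownDeleteDir(path)

  // Copy the paths into an array first, then iterate over the copy, so removals
  // from the live set cannot disturb the traversal.
  def deleteAll(): List[String] = {
    val snapshot = shutdownDeletePaths.synchronized { shutdownDeletePaths.toArray }
    snapshot.foreach(deleteRecursively)
    snapshot.toList
  }

  def remaining: Int = shutdownDeletePaths.synchronized { shutdownDeletePaths.size }
}
```

Iterating over the snapshot visits every registered path exactly once, whatever the removals do to the underlying set. Iterating the live set while removing from it has no such guarantee.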
[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9840#issuecomment-159524808 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9840#issuecomment-159524810 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/4/ Test PASSed.
[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9840#issuecomment-159524728

**[Test build #4 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/4/consoleFull)** for PR 9840 at commit [`6c9dc1e`](https://github.com/apache/spark/commit/6c9dc1e22fb88229247cf1bb284c06715b089da3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` * For example, we build an encoder for `case class Data(a: Int, b: String)` and the real type`
  * `case class Cast(child: Expression, dataType: DataType) extends UnaryExpression `
  * `case class UpCast(child: Expression, dataType: DataType) extends UnaryExpression with Unevaluable`
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-159523428 @viirya can you bring this up to date?
[GitHub] spark pull request: [SPARK-11973] [SQL] push filter through aggreg...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9959#discussion_r45835272

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2028,4 +2028,25 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     Row(false) :: Row(true) :: Nil)
 }
+ test("push filter through aggregation with alias and literals") {
--- End diff --

+1 It'd be better to have a unit test for the optimizer ...
[GitHub] spark pull request: [SPARK-10387][ML][WIP] Add code gen for gbt
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9524#issuecomment-159522953 **[Test build #46677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46677/consoleFull)** for PR 9524 at commit [`866feab`](https://github.com/apache/spark/commit/866feabd866a99fbfa3934a819d8aed24fcf3c1e).
[GitHub] spark pull request: [SPARK-11961][DOC] Add docs of ChiSqSelector
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9965#issuecomment-159522884 **[Test build #46678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46678/consoleFull)** for PR 9965 at commit [`4abd9d2`](https://github.com/apache/spark/commit/4abd9d290bba24f75e09c516e9e06657b03bcfbd).
[GitHub] spark pull request: [SPARK-11979][Streaming] Empty TrackStateRDD c...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9958
[GitHub] spark pull request: [SPARK-11979][Streaming] Empty TrackStateRDD c...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/9958#issuecomment-159522526 @tdas merging to master and 1.6 :)
[GitHub] spark pull request: [SPARK-11973] [SQL] push filter through aggreg...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/9959#discussion_r45835007

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2028,4 +2028,25 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     Row(false) :: Row(true) :: Nil)
 }
+ test("push filter through aggregation with alias and literals") {
--- End diff --

Should we write the test in `FilterPushdownSuite`?
[GitHub] spark pull request: [SPARK-11961][DOC] Add docs of ChiSqSelector
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/9965

[SPARK-11961][DOC] Add docs of ChiSqSelector

https://issues.apache.org/jira/browse/SPARK-11961

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yinxusen/spark SPARK-11961

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9965.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9965

commit 4abd9d290bba24f75e09c516e9e06657b03bcfbd
Author: Xusen Yin
Date: 2015-11-25T07:10:04Z

    add docs of ChiSqSelector
[GitHub] spark pull request: [SPARK-11973] [SQL] push filter through aggreg...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/9959#discussion_r45834951

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -690,7 +690,15 @@ object PushPredicateThroughAggregate extends Rule[LogicalPlan] with PredicateHel
 def apply(plan: LogicalPlan): LogicalPlan = plan transform {
   case filter @ Filter(condition, aggregate @ Aggregate(groupingExpressions, aggregateExpressions, grandChild)) =>
-    val (pushDown, stayUp) = splitConjunctivePredicates(condition).partition {
+
+    // Create a map of Alias for grouping keys or literals
+    val aliasMap = AttributeMap(aggregateExpressions.collect {
+      case a: Alias if groupingExpressions.contains(a.child) || a.child.foldable =>
--- End diff --

This doesn't work if a grouping expression is inside another expression, for example `key + 1 as k`: after removing the alias, `groupingExpressions` doesn't contain `key + 1`, and we will fail to push it down. I think we don't need the `if` here; we have `conjunct.references subsetOf AttributeSet(groupingExpressions)` below to decide whether to push a condition down or not.
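The `key + 1 as k` case from this review comment can be traced with a toy model. The sketch below is not Catalyst code: `Expr`, `Attr`, `Lit`, `Add`, `Gt`, `Alias`, `aliasMap`, `replaceAlias`, and `split` are hypothetical simplifications, `foldable` is approximated as "is a literal", and aggregate functions are left out entirely, so it says nothing about whether dropping the guard is safe in general. It only shows how the guard changes which conjuncts are pushed down.

```scala
object AliasPushdownSketch {
  // Toy expression tree; a hypothetical stand-in for Catalyst expressions.
  sealed trait Expr { def references: Set[String] }
  case class Attr(name: String) extends Expr {
    def references: Set[String] = Set(name)
  }
  case class Lit(value: Int) extends Expr {
    def references: Set[String] = Set.empty
  }
  case class Add(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }
  case class Gt(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }
  // An aliased output column of the aggregate, e.g. `key + 1 as k`.
  case class Alias(child: Expr, name: String)

  // guarded = true mimics the `if` in the diff: keep an alias only when its child
  // is itself a grouping expression or a literal.
  def aliasMap(aggExprs: Seq[Alias], grouping: Seq[Expr], guarded: Boolean): Map[String, Expr] =
    aggExprs.collect {
      case a if !guarded || grouping.contains(a.child) || a.child.isInstanceOf[Lit] =>
        a.name -> a.child
    }.toMap

  // Substitute aliased attributes with the expressions they stand for.
  def replaceAlias(e: Expr, m: Map[String, Expr]): Expr = e match {
    case Attr(n)   => m.getOrElse(n, e)
    case Add(l, r) => Add(replaceAlias(l, m), replaceAlias(r, m))
    case Gt(l, r)  => Gt(replaceAlias(l, m), replaceAlias(r, m))
    case lit: Lit  => lit
  }

  // Returns (pushDown, stayUp): a conjunct is pushed down only if, after alias
  // replacement, it references nothing but grouping attributes.
  def split(conjuncts: Seq[Expr], m: Map[String, Expr], grouping: Seq[Expr]): (Seq[Expr], Seq[Expr]) = {
    val groupingAttrs = grouping.flatMap(_.references).toSet
    conjuncts.map(replaceAlias(_, m)).partition(_.references.subsetOf(groupingAttrs))
  }
}
```

With grouping on `key`, the alias `key + 1 as k`, and the filter `k > 5`, the guarded map is empty and the filter stays above the aggregate. The unguarded map rewrites the filter to `key + 1 > 5`, which references only `key` and is pushed down.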
[GitHub] spark pull request: [SPARK-11974][CORE]Not all the temp dirs had b...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9951#issuecomment-159521906 The part I don't get is: where are we deleting elements of shutdownDeletePaths?
[GitHub] spark pull request: [SPARK-11979][Streaming] Empty TrackStateRDD c...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/9958#issuecomment-159521935 @zsxwing thank you. please use your newly acquired power to merge it to master and 1.6 :)
[GitHub] spark pull request: [SPARK-11973] [SQL] push filter through aggreg...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/9959#discussion_r45834786

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -690,7 +690,15 @@ object PushPredicateThroughAggregate extends Rule[LogicalPlan] with PredicateHel
 def apply(plan: LogicalPlan): LogicalPlan = plan transform {
   case filter @ Filter(condition, aggregate @ Aggregate(groupingExpressions, aggregateExpressions, grandChild)) =>
-    val (pushDown, stayUp) = splitConjunctivePredicates(condition).partition {
+
+    // Create a map of Alias for grouping keys or literals
+    val aliasMap = AttributeMap(aggregateExpressions.collect {
+      case a: Alias if groupingExpressions.contains(a.child) || a.child.foldable =>
+        (a.toAttribute, a.child)
+    })
+    val newCond = PushPredicateThroughProject.replaceAlias(condition, aliasMap)
+
+    val (pushDown, stayUp) = splitConjunctivePredicates(newCond).partition {
       conjunct => conjunct.references subsetOf AttributeSet(groupingExpressions)
--- End diff --

Can you see if you can make AttributeSet deal with this? It is more general and more consistent with what I understand the intent of AttributeSet to be. The comment in AttributeSet is clear that it wants AttributeReferences, which Aliases are not. Is that possible?
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9921#issuecomment-159521569 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9921#issuecomment-159521418 **[Test build #46676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46676/consoleFull)** for PR 9921 at commit [`458cf67`](https://github.com/apache/spark/commit/458cf671b72bc3f0c5e76dff689b765933de5576).
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9921#issuecomment-159521575 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46665/ Test PASSed.
[GitHub] spark pull request: [SPARK-11974][CORE]Not all the temp dirs had b...
Github user DoingDone9 commented on the pull request: https://github.com/apache/spark/pull/9951#issuecomment-159521038 It cannot delete all elements of shutdownDeletePaths. As in the example above, this method cannot delete all elements of a.
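The example referred to above is not part of this message, so the following is only a sketch of the general pattern under discussion: take a snapshot of a mutable set before iterating when the loop body removes elements from that same set. `ShutdownSketch` and `deleteAndUnregister` are hypothetical names, and no real files are touched.

```scala
import scala.collection.mutable

object ShutdownSketch {
  private val paths = mutable.HashSet[String]()

  def register(path: String): Unit = paths.synchronized { paths += path }

  def remaining: Int = paths.synchronized { paths.size }

  // Stand-in for a recursive delete that also unregisters the path,
  // i.e. it mutates the very set the caller is walking over.
  private def deleteAndUnregister(path: String): Unit =
    paths.synchronized { paths -= path }

  def deleteAll(): Unit = {
    // Materialize the elements first. Iterating `paths` directly while
    // removing from it is not guaranteed to visit every element.
    val snapshot = paths.synchronized { paths.toArray }
    snapshot.foreach(deleteAndUnregister)
  }
}
```

Copying to an array costs one extra allocation, but the iteration no longer depends on the set's internal state while it is being modified.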
[GitHub] spark pull request: [SPARK-11973] [SQL] push filter through aggreg...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/9959#discussion_r45834726
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -690,7 +690,15 @@ object PushPredicateThroughAggregate extends Rule[LogicalPlan] with PredicateHel
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case filter @ Filter(condition, aggregate @ Aggregate(groupingExpressions, aggregateExpressions, grandChild)) =>
-      val (pushDown, stayUp) = splitConjunctivePredicates(condition).partition {
+
+      // Create a map of Alias for grouping keys or literals
+      val aliasMap = AttributeMap(aggregateExpressions.collect {
+        case a: Alias if groupingExpressions.contains(a.child) || a.child.foldable =>
+          (a.toAttribute, a.child)
+      })
+      val newCond = PushPredicateThroughProject.replaceAlias(condition, aliasMap)
--- End diff --
I think this is confusing to read. It is not clear why this calls a utility in PushPredicateThroughProject. Move it into a better-named place (similar to PredicateHelper).
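One way to read this suggestion is to put the alias substitution in a shared helper trait that both rules mix in. The sketch below is an assumption about what that could look like, not the code that was eventually merged: `AliasSketch`, `AliasHelper`, `PushThroughProject`, `PushThroughAggregate` and the tiny expression tree are all hypothetical.

```scala
object AliasSketch {
  // Hypothetical stand-ins for Catalyst expressions (illustration only).
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lit(value: Int) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr
  case class GreaterThan(left: Expr, right: Expr) extends Expr

  // Shared helper: replace every aliased attribute in `condition`
  // with the expression the alias stands for.
  trait AliasHelper {
    def replaceAlias(condition: Expr, aliases: Map[Attr, Expr]): Expr =
      condition match {
        case a: Attr           => aliases.getOrElse(a, a)
        case lit: Lit          => lit
        case Add(l, r)         => Add(replaceAlias(l, aliases), replaceAlias(r, aliases))
        case GreaterThan(l, r) => GreaterThan(replaceAlias(l, aliases), replaceAlias(r, aliases))
      }
  }

  // Both rules get the helper by mixing it in; neither calls into the other.
  object PushThroughProject extends AliasHelper
  object PushThroughAggregate extends AliasHelper
}
```

With this layout the aggregate rule no longer reaches into the project rule for a utility, which is the readability concern raised in the comment.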
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9921#issuecomment-159521218 **[Test build #46665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46665/consoleFull)** for PR 9921 at commit [`0d62b5e`](https://github.com/apache/spark/commit/0d62b5e880eb6c0b6578850936eb7fba5d86e4cd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9921#issuecomment-159519962 @rxin @cloud-fan Just combined all the changes you mentioned in the comments. Thank you for your inputs! : )
[GitHub] spark pull request: [SPARK-11979][Streaming] Empty TrackStateRDD c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9958#issuecomment-159516332 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11979][Streaming] Empty TrackStateRDD c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9958#issuecomment-159516339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46671/ Test PASSed.
[GitHub] spark pull request: [SPARK-11979][Streaming] Empty TrackStateRDD c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9958#issuecomment-159515784 **[Test build #46671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46671/consoleFull)** for PR 9958 at commit [`2cdec72`](https://github.com/apache/spark/commit/2cdec72e1d1b89fbf0a26c70c259e077d5ef25b7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/9921#issuecomment-159513573 LGTM except some minor style comments.
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/9921#discussion_r45833897
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -367,6 +372,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
       1 -> "a", 2 -> "bc", 3 -> "d")
   }
+  test("sample with replacement") {
+    val n = 100
+    val data = sparkContext.parallelize(1 to n, 2).toDS()
+    checkAnswer(
+      data.sample(withReplacement = true, 0.05, seed = 13),
+      Seq(5, 10, 52, 73): _*)
--- End diff --
instead of `Seq(xxx, yyy, ...): _*`, why not just `xxx, yyy, ...`?
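To illustrate the point about varargs, here is a small standalone sketch. `collectExpected` is a hypothetical stand-in for a method with a repeated parameter; it is not the real `checkAnswer` signature.

```scala
object VarargsSketch {
  // A method with a repeated (varargs) parameter.
  def collectExpected(expected: Int*): Seq[Int] = expected.toList

  // Expanding an existing Seq with `: _*` ...
  val viaSplat: Seq[Int] = collectExpected(Seq(5, 10, 52, 73): _*)

  // ... passes the same arguments as listing the values directly.
  val direct: Seq[Int] = collectExpected(5, 10, 52, 73)
}
```

The `: _*` form is only needed when the values already live in a collection; for literal values the direct form is shorter.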
[GitHub] spark pull request: [SPARK-11963][DOC] Add docs for QuantileDiscre...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9962#issuecomment-159513443 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46673/ Test PASSed.
[GitHub] spark pull request: [SPARK-11962] Added getAsOpt[T]() methods to e...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9943#issuecomment-159513430 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-11963][DOC] Add docs for QuantileDiscre...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9962#issuecomment-159513442 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11963][DOC] Add docs for QuantileDiscre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9962#issuecomment-159513343 **[Test build #46673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46673/consoleFull)** for PR 9962 at commit [`b054a54`](https://github.com/apache/spark/commit/b054a54d814b64572d036857181c7c6f5e2a53a3). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: `public class JavaQuantileDiscretizerExample`
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/9921#discussion_r45833828
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -185,17 +185,23 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
     val ds2 = Seq(1, 2).toDS().as("b")
     checkAnswer(
-      ds1.joinWith(ds2, $"a.value" === $"b.value"),
+      ds1.joinWith(ds2, $"a.value" === $"b.value", "inner"),
       (1, 1), (2, 2))
   }
-  test("joinWith, expression condition") {
-    val ds1 = Seq(ClassData("a", 1), ClassData("b", 2)).toDS()
-    val ds2 = Seq(("a", 1), ("b", 2)).toDS()
+  test("joinWith, expression condition, outer join") {
+    val nullInteger = null.asInstanceOf[Integer]
+    val nullString = null.asInstanceOf[String]
+    val ds1 = Seq(ClassNullableData("a", new Integer(1)),
--- End diff --
nit: we can just pass in `1`, and the compiler will autobox it for us.
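A small sketch of the boxing point, assuming a case class with a `java.lang.Integer` field. `NullableData` is a hypothetical stand-in; the definition of `ClassNullableData` is not shown in this thread.

```scala
object BoxingSketch {
  // Hypothetical stand-in for a case class with a nullable boxed field.
  case class NullableData(a: String, b: Integer)

  // Explicit boxing (Integer.valueOf is used here in place of the
  // constructor call from the diff).
  val explicit: NullableData = NullableData("a", Integer.valueOf(1))

  // Passing the Int literal works too: Scala's Predef.int2Integer
  // implicit conversion boxes it to java.lang.Integer.
  val implicitBox: NullableData = NullableData("a", 1)

  // The field can still hold null, which is why Integer is used at all.
  val missing: NullableData = NullableData("b", null)
}
```

Both constructions yield equal values, so the shorter literal form loses nothing.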
[GitHub] spark pull request: [SPARK-11981][SQL] Move implementations of met...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9964#issuecomment-159513085 **[Test build #46675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46675/consoleFull)** for PR 9964 at commit [`4a1ca72`](https://github.com/apache/spark/commit/4a1ca7262f4217ae126b871853181288fb85ebf6).
[GitHub] spark pull request: [SPARK-11970] [SQL] Add function SHOW to Datas...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9955#issuecomment-159511727 Yeah, that was also my original solution in the PR for EXPLAIN. As long as you do not think it is duplicate code, I like it. : )
[GitHub] spark pull request: [SPARK-11970] [SQL] Add function SHOW to Datas...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/9955
[GitHub] spark pull request: [SPARK-11974][CORE]Not all the temp dirs had b...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9951#issuecomment-159510889 What's the problem with the original approach?
[GitHub] spark pull request: [SPARK-11969] [SQL] [PYSPARK] visualization of...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/9949#issuecomment-159510739
> @zsxwing all the save and write call Java API directly, so they all work as expected.
cool. LGTM
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9921#issuecomment-159510356 @gatorsmile can you update the title to remove "show"? Just keep sample and join.
[GitHub] spark pull request: [SPARK-11979][Streaming] Empty TrackStateRDD c...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/9958#issuecomment-159510029 LGTM
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9921#discussion_r45833377
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -540,6 +558,18 @@ class Dataset[T] private[sql](
     }
   }
+  /**
+   * Using inner equi-join to join this [[Dataset]] returning a [[Tuple2]] for each pair
+   * where `condition` evaluates to true
--- End diff --
missed a period
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9921#discussion_r45833361
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -453,6 +451,22 @@ class Dataset[T] private[sql](
       c5: TypedColumn[T, U5]): Dataset[(U1, U2, U3, U4, U5)] =
     selectUntyped(c1, c2, c3, c4, c5).asInstanceOf[Dataset[(U1, U2, U3, U4, U5)]]
+
+  /**
+   * Returns a new [[Dataset]] by sampling a fraction of rows.
+   * @since 1.6.0
+   */
+  def sample(withReplacement: Boolean, fraction: Double, seed: Long) : Dataset[T] =
+    withPlan(Sample(0.0, fraction, withReplacement, seed, _))
+
+  /**
+   * Returns a new [[Dataset]] by sampling a fraction of rows, using a random seed.
--- End diff --
rows -> records
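The two `sample` variants in this diff differ only in where the seed comes from. As a rough illustration of that relationship, here is a plain-Scala sketch over an in-memory `Seq`. It is not Spark's sampler: `SampleSketch` is a hypothetical name, only sampling without replacement is modelled, and the body of the second overload is an assumption since the diff cuts off before it.

```scala
import scala.util.Random

object SampleSketch {
  // Keep each record independently with probability `fraction`.
  // A fixed seed makes the result reproducible.
  def sample[T](data: Seq[T], fraction: Double, seed: Long): Seq[T] = {
    val rng = new Random(seed)
    data.filter(_ => rng.nextDouble() < fraction)
  }

  // Convenience overload: same behaviour, but with a freshly drawn seed,
  // so repeated calls may return different records.
  def sample[T](data: Seq[T], fraction: Double): Seq[T] =
    sample(data, fraction, Random.nextLong())
}
```

Tests that compare against a fixed expected answer, like the one in `DatasetSuite`, need the seeded variant. The unseeded one is for callers who do not care which records come back.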
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9921#discussion_r4587
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -453,6 +451,22 @@ class Dataset[T] private[sql](
       c5: TypedColumn[T, U5]): Dataset[(U1, U2, U3, U4, U5)] =
     selectUntyped(c1, c2, c3, c4, c5).asInstanceOf[Dataset[(U1, U2, U3, U4, U5)]]
+
--- End diff --
remove the extra line here
[GitHub] spark pull request: [SPARK-11970] [SQL] Adding JoinType into JoinW...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9921#discussion_r45833358
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -453,6 +451,22 @@ class Dataset[T] private[sql](
       c5: TypedColumn[T, U5]): Dataset[(U1, U2, U3, U4, U5)] =
     selectUntyped(c1, c2, c3, c4, c5).asInstanceOf[Dataset[(U1, U2, U3, U4, U5)]]
+
+  /**
+   * Returns a new [[Dataset]] by sampling a fraction of rows.
--- End diff --
rows -> records
[GitHub] spark pull request: [SPARK-11973] [SQL] push filter through aggreg...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9959#issuecomment-159509327 cc @cloud-fan too
[GitHub] spark pull request: [SPARK-11970] [SQL] Add function SHOW to Datas...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9955#issuecomment-159509085 I implemented show as part of this patch: https://github.com/apache/spark/pull/9964 I just used toDF().show(). Can you close this pull request?
[GitHub] spark pull request: [SPARK-11981][SQL] Move implementations of met...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/9964 [SPARK-11981][SQL] Move implementations of methods back to DataFrame from Queryable Also added show methods to Dataset. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-11981 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9964.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9964 commit 4a1ca7262f4217ae126b871853181288fb85ebf6 Author: Reynold Xin Date: 2015-11-25T06:28:44Z [SPARK-11981][SQL] Move implementations of methods back to DataFrame from Queryable
[GitHub] spark pull request: [SPARK-10582][Yarn][Core] Fix AM failure situa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9963#issuecomment-159508637 **[Test build #46674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46674/consoleFull)** for PR 9963 at commit [`1f92d27`](https://github.com/apache/spark/commit/1f92d27f525500a907d1862b47eea156ff2aff85).
[GitHub] spark pull request: [SPARK-11878][SQL][WIP]: Eliminate distribute ...
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/9858#issuecomment-159507098 Thanks for the feedback! Let me take a look at the Exchange code
[GitHub] spark pull request: [SPARK-11963][DOC] Add docs for QuantileDiscre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9962#issuecomment-159506066 **[Test build #46673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46673/consoleFull)** for PR 9962 at commit [`b054a54`](https://github.com/apache/spark/commit/b054a54d814b64572d036857181c7c6f5e2a53a3).
[GitHub] spark pull request: [SPARK-11624][SPARK-11972][SQL]fix commands th...
Github user jameszhouyi commented on the pull request: https://github.com/apache/spark/pull/9589#issuecomment-159505641 Hi @adrian-wang, for SPARK-11972, the case passes now after applying the patch. Thanks!
[GitHub] spark pull request: [SPARK-10582][Yarn][Core] Fix AM failure situa...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/9963 [SPARK-10582][Yarn][Core] Fix AM failure situation for dynamic allocation After an AM failure, the target executor number held by the driver and by the AM will differ, which leads to unexpected behavior in dynamic allocation. So when the AM re-registers with the driver, the state in `ExecutorAllocationManager` and `CoarseGrainedSchedulerBackend` should be reset. This issue was originally addressed in #8737 and is re-opened here. Thanks a lot to @KaiXinXiaoLei for finding this issue. @andrewor14 and @vanzin, would you please help review this? Thanks a lot. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/apache-spark SPARK-10582 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9963.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9963 commit c272c7eb005bf443678f2cd89c6971a3f022edbd Author: jerryshao Date: 2015-11-24T09:08:00Z Fix AM failure situation for dynamic allocation commit 1f92d27f525500a907d1862b47eea156ff2aff85 Author: jerryshao Date: 2015-11-25T06:15:05Z Remove unnecessary code
[GitHub] spark pull request: [SPARK-11821] Propagate Kerberos keytab for al...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9859#issuecomment-159503152 **[Test build #46672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46672/consoleFull)** for PR 9859 at commit [`9365a7f`](https://github.com/apache/spark/commit/9365a7f8766b081625ed8fe9066cef10df25b198).
[GitHub] spark pull request: [SPARK-11970] [SQL] Add function SHOW to Datas...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9955#issuecomment-159502968 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11970] [SQL] Add function SHOW to Datas...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9955#issuecomment-159502969 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46662/ Test PASSed.
[GitHub] spark pull request: [SPARK-11970] [SQL] Add function SHOW to Datas...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9955#issuecomment-159502491 **[Test build #46662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46662/consoleFull)** for PR 9955 at commit [`a4428c3`](https://github.com/apache/spark/commit/a4428c3347c2fd65a14d9e2c5472b539ff3a2101). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Minor] Remove unnecessary spaces in `include_...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9960#issuecomment-159501877 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46669/ Test PASSed.
[GitHub] spark pull request: [SPARK-11963][DOC] Add docs for QuantileDiscre...
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/9962 [SPARK-11963][DOC] Add docs for QuantileDiscretizer https://issues.apache.org/jira/browse/SPARK-11963 You can merge this pull request into a Git repository by running: $ git pull https://github.com/yinxusen/spark SPARK-11963 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9962.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9962 commit b054a54d814b64572d036857181c7c6f5e2a53a3 Author: Xusen Yin Date: 2015-11-25T06:08:11Z add docs for QuantileDiscretizer
[GitHub] spark pull request: [Minor] Remove unnecessary spaces in `include_...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9960#issuecomment-159501873 Merged build finished. Test PASSed.
[GitHub] spark pull request: [Minor] Remove unnecessary spaces in `include_...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9960#issuecomment-159501540 **[Test build #46669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46669/consoleFull)** for PR 9960 at commit [`7d09381`](https://github.com/apache/spark/commit/7d093816541b03915ae862fd1f8450f05369ce8b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11821] Propagate Kerberos keytab for al...
Github user woj-i commented on the pull request: https://github.com/apache/spark/pull/9859#issuecomment-159499824 @vanzin thanks for your support. I've pushed a commit that updates the documentation and cleans up the code as you asked.
[GitHub] spark pull request: [SPARK-11863][SQL][WIP] Unable to resolve orde...
Github user dilipbiswal commented on the pull request: https://github.com/apache/spark/pull/9844#issuecomment-159499937 @cloud-fan Thanks a lot.
[GitHub] spark pull request: [SPARK-11863][SQL] Unable to resolve order by ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9961#issuecomment-159499707 **[Test build #46670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46670/consoleFull)** for PR 9961 at commit [`8ad6897`](https://github.com/apache/spark/commit/8ad6897b64c3bab3c249e79e54d2979bd672f2c3).
[GitHub] spark pull request: [SPARK-11979][Streaming] Empty TrackStateRDD c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9958#issuecomment-159499474 **[Test build #46671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46671/consoleFull)** for PR 9958 at commit [`2cdec72`](https://github.com/apache/spark/commit/2cdec72e1d1b89fbf0a26c70c259e077d5ef25b7).
[GitHub] spark pull request: [SPARK-11973] [SQL] push filter through aggreg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9959#issuecomment-159499292 **[Test build #46668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46668/consoleFull)** for PR 9959 at commit [`2fb7a1c`](https://github.com/apache/spark/commit/2fb7a1cd42664c281bfc64bf584b8f762f828b4d).
[GitHub] spark pull request: [SPARK-11979][Streaming] Empty TrackStateRDD c...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/9958#issuecomment-159499108 retest this please
[GitHub] spark pull request: [Minor] Remove unnecessary spaces in `include_...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9960#issuecomment-159498892 **[Test build #46669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46669/consoleFull)** for PR 9960 at commit [`7d09381`](https://github.com/apache/spark/commit/7d093816541b03915ae862fd1f8450f05369ce8b).
[GitHub] spark pull request: [SPARK-11863][SQL] Unable to resolve order by ...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/9961 [SPARK-11863][SQL] Unable to resolve order by if it contains mixture of aliases and real columns This is based on https://github.com/apache/spark/pull/9844, with some bug fixes and cleanup. Whoever merges this PR, please give the credit to @dilipbiswal. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark sort Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9961.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9961 commit 954f9194377b759fa8414c3a66fb7ee5c74640b7 Author: Dilip Biswal Date: 2015-11-04T05:23:11Z [SPARK-11863] Unable to resolve order by attributes if it contains mixture of aliases and real columns. commit ef4274a06a5859e0f233baaab4044a6d1b0e6418 Author: Dilip Biswal Date: 2015-11-20T08:11:46Z Fix test failure commit d319524be7fb8daedb72a23587182f8307afd812 Author: Dilip Biswal Date: 2015-11-24T02:28:30Z Implement code review comments commit 8c9609849a8cf3784258b11c6728f32807889c33 Author: Dilip Biswal Date: 2015-11-24T07:28:25Z fix test failure commit cbf14ffa49c456193e56946ae30181d7b6571139 Author: Dilip Biswal Date: 2015-11-24T07:36:48Z minor style commit 3d4515f67526a540ad1e551f005a1ae619f0dde7 Author: Wenchen Fan Date: 2015-11-25T05:39:01Z Merge remote-tracking branch 'origin/master' into sort commit 8ad6897b64c3bab3c249e79e54d2979bd672f2c3 Author: Wenchen Fan Date: 2015-11-25T05:49:04Z bug gix
[GitHub] spark pull request: [SPARK-11969] [SQL] [PYSPARK] visualization of...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/9949#issuecomment-159498690 @zsxwing all the save and write methods call the Java API directly, so they all work as expected.
[GitHub] spark pull request: [SPARK-11140][CORE] Transfer files using netwo...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9947#issuecomment-159498446 I've merged this. @vanzin can you close the pull request?
[GitHub] spark pull request: [Minor] Remove unnecessary spaces in `include_...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/9960 [Minor] Remove unnecessary spaces in `include_example.rb` You can merge this pull request into a Git repository by running: $ git pull https://github.com/yu-iskw/spark minor-remove-spaces Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9960.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9960 commit 7d093816541b03915ae862fd1f8450f05369ce8b Author: Yu ISHIKAWA Date: 2015-11-25T05:46:33Z [Minor] Remove unnecessary spaces in `include_example.rb`
[GitHub] spark pull request: [SPARK-11973] [SQL] push filter through aggreg...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/9959 [SPARK-11973] [SQL] push filter through aggregation with alias and literals Currently, a filter can't be pushed through an aggregation with aliases or literals; this patch fixes that. After this patch, the time of TPC-DS query 4 goes down from 141 seconds to 34 seconds (about a 4x improvement). cc @nongli @yhuai You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark push_filter2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9959.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9959 commit 2fb7a1cd42664c281bfc64bf584b8f762f828b4d Author: Davies Liu Date: 2015-11-25T05:41:15Z push filter through aggregation with alias and literals
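The idea of pushing a filter through an aggregation that uses aliases and literals can be sketched on a toy plan tree. This is not Catalyst's implementation: every type and function below (`Expr`, `Plan`, `substitute`, `pushDown`) is a hypothetical stand-in. The sketch assumes a well-formed aggregate whose output contains only grouping expressions, aggregate functions and literals, and it only pushes when there is at least one grouping expression, because a global aggregate still returns a row for empty input.

```scala
object PushFilterSketch {
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Gt(left: Expr, right: Expr) extends Expr
  final case class Sum(child: Expr) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr

  sealed trait Plan
  final case class Scan(table: String) extends Plan
  final case class Filter(cond: Expr, child: Plan) extends Plan
  final case class Aggregate(grouping: Seq[Expr], output: Seq[Expr], child: Plan) extends Plan

  // Replace references to output aliases with the expressions they name.
  def substitute(e: Expr, aliases: Map[String, Expr]): Expr = e match {
    case Col(n)      => aliases.getOrElse(n, e)
    case Gt(l, r)    => Gt(substitute(l, aliases), substitute(r, aliases))
    case Sum(c)      => Sum(substitute(c, aliases))
    case Alias(c, n) => Alias(substitute(c, aliases), n)
    case Lit(_)      => e
  }

  // True if the expression depends on an aggregate function.
  def containsAggregate(e: Expr): Boolean = e match {
    case Sum(_)      => true
    case Gt(l, r)    => containsAggregate(l) || containsAggregate(r)
    case Alias(c, _) => containsAggregate(c)
    case _           => false
  }

  // Move the filter below the aggregate when, after alias substitution,
  // it no longer depends on any aggregate function.
  def pushDown(plan: Plan): Plan = plan match {
    case Filter(cond, agg @ Aggregate(grouping, output, child)) if grouping.nonEmpty =>
      val aliases = output.collect { case Alias(c, n) => n -> c }.toMap
      val rewritten = substitute(cond, aliases)
      if (containsAggregate(rewritten)) plan
      else agg.copy(child = Filter(rewritten, child))
    case other => other
  }
}
```

In this model a predicate on an aliased grouping column or an aliased literal moves below the aggregate, while a predicate on an aliased `Sum` stays where it is.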
[GitHub] spark pull request: [SPARK-11140][CORE] Transfer files using netwo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9947#issuecomment-159498127 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46661/ Test PASSed.
[GitHub] spark pull request: [SPARK-11140][CORE] Transfer files using netwo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9947#issuecomment-159498126 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11140][CORE] Transfer files using netwo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9947#issuecomment-159498056 **[Test build #46661 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46661/consoleFull)** for PR 9947 at commit [`bea3fda`](https://github.com/apache/spark/commit/bea3fdaa6d801c6c8955a0896c8c27995528f247). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11979][Streaming] Empty TrackStateRDD c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9958#issuecomment-159497749 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11979][Streaming] Empty TrackStateRDD c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9958#issuecomment-159497750 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46667/ Test FAILed.
[GitHub] spark pull request: [SPARK-11859][Mesos] SparkContext accepts inva...
Github user toddwan commented on the pull request: https://github.com/apache/spark/pull/9886#issuecomment-159497221 retest this please
[GitHub] spark pull request: [SPARK-10621][SQL] Consistent naming for funct...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9948
[GitHub] spark pull request: [SPARK-9319][SPARKR] Add support for setting c...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9654#issuecomment-159495383 Will take a look soon
[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9840#issuecomment-159495265 **[Test build #4 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/4/consoleFull)** for PR 9840 at commit [`6c9dc1e`](https://github.com/apache/spark/commit/6c9dc1e22fb88229247cf1bb284c06715b089da3).
[GitHub] spark pull request: [SPARK-11979][Streaming] Empty TrackStateRDD c...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/9958 [SPARK-11979][Streaming] Empty TrackStateRDD cannot be checkpointed and recovered from checkpoint file

This solves the following exception, which is raised when an empty state RDD is checkpointed and recovered. The root cause is that an empty OpenHashMapBasedStateMap cannot be deserialized because the initialCapacity is set to zero.

```
Job aborted due to stage failure: Task 0 in stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 (TID 20, localhost): java.lang.IllegalArgumentException: requirement failed: Invalid initial capacity
	at scala.Predef$.require(Predef.scala:233)
	at org.apache.spark.streaming.util.OpenHashMapBasedStateMap.<init>(StateMap.scala:96)
	at org.apache.spark.streaming.util.OpenHashMapBasedStateMap.<init>(StateMap.scala:86)
	at org.apache.spark.streaming.util.OpenHashMapBasedStateMap.readObject(StateMap.scala:291)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
	at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:181)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:921)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:921)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
```

You can merge this pull request into a Git repository by running: $ git pull https://github.com/tdas/spark SPARK-11979 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9958.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9958 commit 2cdec72e1d1b89fbf0a26c70c259e077d5ef25b7 Author: Tathagata Das Date: 2015-11-25T05:17:48Z Fixed state map deser bug
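The failure mode named in this pull request (a custom `readObject` that rebuilds a map using the serialized entry count as its initial capacity, which is zero for an empty map) can be reproduced in a small stand-alone sketch. `Store`, `StateMap` and the `clampCapacity` flag are hypothetical names invented for illustration; this is not Spark's `OpenHashMapBasedStateMap`, and clamping the capacity to at least 1 is only one plausible remedy, not necessarily what the patch does.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable

object StateMapSketch {
  // Hypothetical backing store that, like the map in the stack trace,
  // rejects an initial capacity below 1.
  final class Store(initialCapacity: Int) {
    require(initialCapacity >= 1, "Invalid initial capacity")
    val data = new mutable.HashMap[String, Int]()
  }

  // Hypothetical state map that writes its entry count and entries,
  // then rebuilds the store from that count when it is read back.
  final class StateMap(clampCapacity: Boolean) extends Serializable {
    @transient private var store = new Store(1)

    def put(k: String, v: Int): Unit = store.data(k) = v
    def size: Int = store.data.size

    private def writeObject(out: ObjectOutputStream): Unit = {
      out.defaultWriteObject()
      out.writeInt(store.data.size)
      store.data.foreach { case (k, v) => out.writeUTF(k); out.writeInt(v) }
    }

    private def readObject(in: ObjectInputStream): Unit = {
      in.defaultReadObject()
      val n = in.readInt()
      // Using n directly fails for an empty map; clamping to 1 avoids that.
      store = new Store(if (clampCapacity) math.max(n, 1) else n)
      (0 until n).foreach { _ => store.data(in.readUTF()) = in.readInt() }
    }
  }

  // Serialize to a byte array and read the object back.
  def roundTrip(m: StateMap): StateMap = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(m)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[StateMap] finally in.close()
  }
}
```

In this sketch, round-tripping an empty `new StateMap(clampCapacity = false)` fails inside `readObject` with the same `requirement failed` error, while the clamped variant restores an empty map.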