[GitHub] spark pull request: [SPARK-15648][SQL]add TeradataDialect.scala
Github user lihongliustc commented on the pull request: https://github.com/apache/spark/pull/13359#issuecomment-222607185 @srowen Hi srowen, has this PR been delayed? What should I do with it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13392#issuecomment-222604482 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13392#issuecomment-222604484 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59636/
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13392#issuecomment-222604316 **[Test build #59636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59636/consoleFull)** for PR 13392 at commit [`b2849e8`](https://github.com/apache/spark/commit/b2849e8f514c1265f7c6199aba980e95b72aa7c2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13270#issuecomment-222603352 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13270#issuecomment-222603353 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59635/
[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13270#issuecomment-222603171 **[Test build #59635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59635/consoleFull)** for PR 13270 at commit [`336fb55`](https://github.com/apache/spark/commit/336fb55406ad19eb7cc7276cd771ebd92ed8dec1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15644] [MLlib] [SQL] Replace SQLContext...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13380#issuecomment-222602457 cc @jkbradley / @mengxr are we ok with changing the API?
[GitHub] spark pull request: [SPARK-15658][SQL] UDT serializer should decla...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13402#issuecomment-222601872 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15658][SQL] UDT serializer should decla...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13402#issuecomment-222601873 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59633/
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13392#issuecomment-222601745 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59634/
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13392#issuecomment-222601744 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15658][SQL] UDT serializer should decla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13402#issuecomment-222601649 **[Test build #59633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59633/consoleFull)** for PR 13402 at commit [`6d614dd`](https://github.com/apache/spark/commit/6d614dd3ae4d5ee97083ae99ea527a7c5eaa9f0a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13392#issuecomment-222601564 **[Test build #59634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59634/consoleFull)** for PR 13392 at commit [`4306c4f`](https://github.com/apache/spark/commit/4306c4fe0b741689bb0ff5349506707e8a7ec520). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15076][SQL] Improve ConstantFolding opt...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/12850#discussion_r65127553 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -751,6 +751,16 @@ object ConstantFolding extends Rule[LogicalPlan] { // Fold expressions that are foldable. case e if e.foldable => Literal.create(e.eval(EmptyRow), e.dataType) + + // Use associative property for integral type + case e if e.isInstanceOf[BinaryArithmetic] && e.dataType.isInstanceOf[IntegralType] +=> e match { +case Add(Add(a, b), c) if b.foldable && c.foldable => Add(a, Add(b, c)) --- End diff -- Thank you for review, @cloud-fan ! I see. That sounds great. Let me think about how to eliminate all constants then.
[GitHub] spark pull request: [SPARK-15659][SQL] Ensure FileSystem is gotten...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13405#issuecomment-222600020 **[Test build #59640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59640/consoleFull)** for PR 13405 at commit [`7e01426`](https://github.com/apache/spark/commit/7e01426f98dd8f5f68a1cb6d3fa8a5d47686ac0b).
[GitHub] spark pull request: [SPARK-15659][SQL] Ensure FileSystem is gotten...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/13405 [SPARK-15659][SQL] Ensure FileSystem is gotten from path ## What changes were proposed in this pull request? Currently `spark.sql.warehouse.dir` points to a local directory by default, which throws an exception when HADOOP_CONF_DIR is configured and the default FS is HDFS. ``` java.lang.IllegalArgumentException: Wrong FS: file:/Users/sshao/projects/apache-spark/spark-warehouse, expected: hdfs://localhost:8020 ``` So we should always get the `FileSystem` from the `Path` to avoid the wrong-FS problem. ## How was this patch tested? Local test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/apache-spark SPARK-15659 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13405.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13405 commit 7e01426f98dd8f5f68a1cb6d3fa8a5d47686ac0b Author: jerryshao Date: 2016-05-31T05:58:29Z Ensure FileSystem is gotten from path to avoid default FileSystem conflicts
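The fix the PR describes, asking the `Path` itself for its `FileSystem` rather than taking the configuration's default filesystem, can be sketched as follows. This is a hedged illustration rather than the PR's actual code: `resolveWarehousePath` is a hypothetical helper, and the snippet needs `hadoop-common` on the classpath. `Path.getFileSystem(Configuration)` and `FileSystem.makeQualified(Path)` are standard Hadoop APIs.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Problem: FileSystem.get(conf) returns the *default* filesystem, which is
// hdfs:// once HADOOP_CONF_DIR points at an HDFS cluster, so resolving a
// file:/ warehouse path against it fails with "Wrong FS".
def resolveWarehousePath(warehousePath: String, hadoopConf: Configuration): Path = {
  val path = new Path(warehousePath)
  // Ask the path for its own filesystem, so the scheme always matches.
  val fs: FileSystem = path.getFileSystem(hadoopConf)
  fs.makeQualified(path)
}
```

With this, a `file:/...` warehouse path resolves against the local filesystem even when the cluster's default FS is HDFS, which is the failure mode shown in the stack trace above.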
[GitHub] spark pull request: [SPARK-15076][SQL] Improve ConstantFolding opt...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12850#discussion_r65126741 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -751,6 +751,16 @@ object ConstantFolding extends Rule[LogicalPlan] { // Fold expressions that are foldable. case e if e.foldable => Literal.create(e.eval(EmptyRow), e.dataType) + + // Use associative property for integral type + case e if e.isInstanceOf[BinaryArithmetic] && e.dataType.isInstanceOf[IntegralType] +=> e match { +case Add(Add(a, b), c) if b.foldable && c.foldable => Add(a, Add(b, c)) --- End diff -- what about `a + 1 + b + 2`? I think we need a more general approach, like reordering the `Add` nodes to put all literals together.
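The more general reordering cloud-fan suggests can be sketched in plain Scala (a toy expression tree, not Catalyst; the names `Expr`, `Lit`, `Ref`, `flatten`, and `reassociate` are all hypothetical): flatten the whole `Add` chain, partition out the literal operands, fold them into a single constant, and rebuild. For integral types this is safe even under overflow, since two's-complement addition remains associative and commutative.

```scala
sealed trait Expr
case class Lit(v: Long) extends Expr
case class Ref(name: String) extends Expr
case class Add(l: Expr, r: Expr) extends Expr

// Collect all operands of a nested Add chain into a flat list.
def flatten(e: Expr): List[Expr] = e match {
  case Add(l, r) => flatten(l) ++ flatten(r)
  case leaf      => List(leaf)
}

// Separate literals from non-literals, fold the literals into one
// constant, and rebuild the chain with the folded constant last.
def reassociate(e: Expr): Expr = {
  val (lits, rest) = flatten(e).partition(_.isInstanceOf[Lit])
  val folded = Lit(lits.collect { case Lit(v) => v }.sum)
  if (rest.isEmpty) folded else Add(rest.reduceLeft(Add), folded)
}
```

On `a + 1 + b + 2` this yields `Add(Add(Ref("a"), Ref("b")), Lit(3))`, folding both constants in one pass instead of relying on pairwise rewrites such as `Add(Add(a, b), c)`.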
[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13370
[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13404#issuecomment-222597618 **[Test build #59639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59639/consoleFull)** for PR 13404 at commit [`d94b2f6`](https://github.com/apache/spark/commit/d94b2f66c2da1d2cd7f7638b6cde0a2b7b354149).
[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13370#issuecomment-222597634 Thanks - merging in master/2.0.
[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13370#discussion_r65126050 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -204,8 +205,8 @@ class KeyValueGroupedDataset[K, V] private[sql]( * Internal helper function for building typed aggregations that return tuples. For simplicity * and code reuse, we do this without the help of the type system and then use helper functions * that cast appropriately for the user facing interface. - * TODO: does not handle aggregations that return nonflat results, */ + // TODO: does not handle aggregations that return nonflat results. --- End diff -- cool i will remove it
[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13404#issuecomment-222597423 Merged build finished. Test FAILed.
[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13404#issuecomment-222597425 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59638/
[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13404#issuecomment-222597411 **[Test build #59638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59638/consoleFull)** for PR 13404 at commit [`94f666a`](https://github.com/apache/spark/commit/94f666a54c4865ec2d915ae1a7250506aa836faf). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13404#discussion_r65125923 --- Diff: core/src/main/java/org/apache/spark/api/java/function/package.scala --- @@ -22,4 +22,5 @@ package org.apache.spark.api.java * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's * Java programming guide for more details. */ -package object function --- End diff -- This is just removing one ending space and adding one blank line.
[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13404#issuecomment-222596540 **[Test build #59638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59638/consoleFull)** for PR 13404 at commit [`94f666a`](https://github.com/apache/spark/commit/94f666a54c4865ec2d915ae1a7250506aa836faf).
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-222596477 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-222596478 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59631/
[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterFun...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13404 [MINOR][CORE][DOCS] Fix description of FilterFunction ## What changes were proposed in this pull request? This PR fixes the wrong description of `FilterFunction`. ``` - * If the function returns true, the element is discarded in the returned Dataset. + * If the function returns true, the element is included in the returned Dataset. ``` ## How was this patch tested? You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark minor_fix_java_api Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13404.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13404 commit 94f666a54c4865ec2d915ae1a7250506aa836faf Author: Dongjoon Hyun Date: 2016-05-31T05:31:39Z [MINOR][CORE] Fix description of FilterFunction
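The corrected sentence matches ordinary filter semantics: the predicate selects the elements to keep. A plain-Scala analogue of the same contract (an illustration, not the Java `FilterFunction` interface itself):

```scala
val xs = Seq(1, 2, 3, 4, 5)
// Returning true means the element is *included* in the result,
// which is what the corrected description now says.
val kept = xs.filter(x => x % 2 == 0)
// kept is Seq(2, 4)
```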
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-222596375 **[Test build #59631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59631/consoleFull)** for PR 12836 at commit [`7b5767a`](https://github.com/apache/spark/commit/7b5767ad25aaa1f091c4b2d22d7a99cf3d8ec00b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15557][SQL] expression ((cast(99 as de...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13368#issuecomment-222596139 cc @yhuai @davies , do you still remember why we promote string to decimal(38, 18) instead of double?
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-222595900 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59630/
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-222595899 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-222595798 **[Test build #59630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59630/consoleFull)** for PR 12836 at commit [`a0425c1`](https://github.com/apache/spark/commit/a0425c17906fcd2ea1d8dd6fb33c0fd8a860d4a7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13370#issuecomment-222595668 LGTM
[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13370#discussion_r65125196

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
```
@@ -204,8 +205,8 @@ class KeyValueGroupedDataset[K, V] private[sql](
    * Internal helper function for building typed aggregations that return tuples. For simplicity
    * and code reuse, we do this without the help of the type system and then use helper functions
    * that cast appropriately for the user facing interface.
-   * TODO: does not handle aggregations that return nonflat results,
    */
+  // TODO: does not handle aggregations that return nonflat results.
```
--- End diff --

I'm pretty sure this TODO is already done.
[GitHub] spark pull request: [SPARK-15660][CORE] RDD and Dataset should sho...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13403#issuecomment-222595435 **[Test build #59637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59637/consoleFull)** for PR 13403 at commit [`3fe0cb6`](https://github.com/apache/spark/commit/3fe0cb6024ba44b1645bc74f1fbe29267571caa0).
[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13370#discussion_r65125156

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
```
@@ -132,6 +130,15 @@ class Column(protected[sql] val expr: Expression) extends Logging {
     case _ => UnresolvedAttribute.quotedString(name)
   })

+  override def toString: String = usePrettyExpression(expr).sql
+
+  override def equals(that: Any): Boolean = that match {
+    case that: Column => that.expr.equals(this.expr)
```
--- End diff --

Oh sorry, you just moved them up; it's fine to keep them the same as before.
[GitHub] spark pull request: [SPARK-15638][SQL] Audit Dataset, SparkSession...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13370#discussion_r65125096

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
```
@@ -132,6 +130,15 @@ class Column(protected[sql] val expr: Expression) extends Logging {
     case _ => UnresolvedAttribute.quotedString(name)
   })

+  override def toString: String = usePrettyExpression(expr).sql
+
+  override def equals(that: Any): Boolean = that match {
+    case that: Column => that.expr.equals(this.expr)
```
--- End diff --

How about `that.expr.semanticEquals(this.expr)`? One column equals another if they always produce the same result.
[GitHub] spark pull request: [SPARK-15660][CORE] RDD and Dataset should sho...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13403 [SPARK-15660][CORE] RDD and Dataset should show the consistent values for variance/stdev.

## What changes were proposed in this pull request?

In SPARK-11490, `variance/stdev` were redefined as the **sample** `variance/stdev` instead of the population ones. This PR addresses the only remaining legacy in RDD. This may cause breaking changes, but we had better be consistent in Spark 2.0 if possible. This PR also adds `popVariance` and `popStdev` functions.

```scala
scala> val rdd = sc.parallelize(Seq(1.0, 2.0, 3.0))
rdd: org.apache.spark.rdd.RDD[Double] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> rdd.stdev
res0: Double = 0.816496580927726

scala> rdd.toDS().describe().show()
16/05/30 22:20:12 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/05/30 22:20:12 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
+-------+-----+
|summary|value|
+-------+-----+
|  count|    3|
|   mean|  2.0|
| stddev|  1.0|
|    min|  1.0|
|    max|  3.0|
+-------+-----+
```

## How was this patch tested?

Pass the updated Jenkins tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15660

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13403.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13403

commit 3fe0cb6024ba44b1645bc74f1fbe29267571caa0 Author: Dongjoon Hyun Date: 2016-05-31T05:22:16Z [SPARK-15660][CORE] RDD and Dataset should show the consistent value for variance/stdev.
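The two definitions the PR reconciles can be seen without Spark at all. The following is a minimal standalone sketch in plain Scala (the object and function names are illustrative, not Spark API): population variance divides by `n`, sample variance by `n - 1` (Bessel's correction), which is exactly why `rdd.stdev` and `describe()`'s `stddev` disagree on `Seq(1.0, 2.0, 3.0)` above.

```scala
object VarianceSketch {
  // Population variance: divide the sum of squared deviations by n.
  def popVariance(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map(x => (x - mean) * (x - mean)).sum / xs.size
  }

  // Sample variance: divide by n - 1 (Bessel's correction).
  def sampleVariance(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map(x => (x - mean) * (x - mean)).sum / (xs.size - 1)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(1.0, 2.0, 3.0)
    println(math.sqrt(popVariance(data)))    // 0.816496... (what rdd.stdev returns before this PR)
    println(math.sqrt(sampleVariance(data))) // 1.0 (what describe()'s stddev reports)
  }
}
```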
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13392#issuecomment-22259 **[Test build #59636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59636/consoleFull)** for PR 13392 at commit [`b2849e8`](https://github.com/apache/spark/commit/b2849e8f514c1265f7c6199aba980e95b72aa7c2).
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13392#discussion_r65124178

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
```
@@ -219,4 +220,41 @@ class SQLConfSuite extends QueryTest with SharedSQLContext {
     }
   }

+  test("MAX_CASES_BRANCHES") {
+    import testImplicits._
+
+    val original = spark.conf.get(SQLConf.MAX_CASES_BRANCHES)
+    try {
+      withTable("tab1") {
+        spark
+          .range(10)
+          .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
```
--- End diff --

Sure, will do it. Thanks!
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13392#discussion_r65124192

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
```
@@ -219,4 +220,41 @@ class SQLConfSuite extends QueryTest with SharedSQLContext {
     }
   }

+  test("MAX_CASES_BRANCHES") {
+    import testImplicits._
+
+    val original = spark.conf.get(SQLConf.MAX_CASES_BRANCHES)
+    try {
+      withTable("tab1") {
+        spark
+          .range(10)
+          .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+          .write
+          .saveAsTable("tab1")
+
+        val sql_one_branch_caseWhen = "SELECT CASE WHEN a = 1 THEN 1 END FROM tab1"
+        val sql_two_branch_caseWhen = "SELECT CASE WHEN a = 1 THEN 1 ELSE 0 END FROM tab1"
+
+        spark.conf.set(SQLConf.MAX_CASES_BRANCHES.key, "0")
```
--- End diff --

Yeah, will do it.
[GitHub] spark pull request: [SPARK-15657][SQL] RowEncoder should validate ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13401#issuecomment-222593249 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15657][SQL] RowEncoder should validate ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13401#issuecomment-222593250 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59629/ Test PASSed.
[GitHub] spark pull request: [SPARK-15657][SQL] RowEncoder should validate ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13401#issuecomment-222593159 **[Test build #59629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59629/consoleFull)** for PR 13401 at commit [`b6c1a5f`](https://github.com/apache/spark/commit/b6c1a5fc6013b643ae39aad32224d08d71b63e00).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ValidateExternalType(child: Expression, expected: DataType)`
[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13270#issuecomment-222592978 **[Test build #59635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59635/consoleFull)** for PR 13270 at commit [`336fb55`](https://github.com/apache/spark/commit/336fb55406ad19eb7cc7276cd771ebd92ed8dec1).
[GitHub] spark pull request: [SPARK-15601][CORE] CircularBuffer's toString(...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13351#issuecomment-222592731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59628/ Test PASSed.
[GitHub] spark pull request: [SPARK-15601][CORE] CircularBuffer's toString(...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13351#issuecomment-222592730 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15601][CORE] CircularBuffer's toString(...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13351#issuecomment-222592645 **[Test build #59628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59628/consoleFull)** for PR 13351 at commit [`a0ae62e`](https://github.com/apache/spark/commit/a0ae62eaf7ecc19565695da68d3b42cc4aac8f09).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13392#discussion_r65123693

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
```
@@ -219,4 +220,41 @@ class SQLConfSuite extends QueryTest with SharedSQLContext {
     }
   }

+  test("MAX_CASES_BRANCHES") {
+    import testImplicits._
+
+    val original = spark.conf.get(SQLConf.MAX_CASES_BRANCHES)
+    try {
+      withTable("tab1") {
+        spark
+          .range(10)
+          .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+          .write
+          .saveAsTable("tab1")
+
+        val sql_one_branch_caseWhen = "SELECT CASE WHEN a = 1 THEN 1 END FROM tab1"
+        val sql_two_branch_caseWhen = "SELECT CASE WHEN a = 1 THEN 1 ELSE 0 END FROM tab1"
+
+        spark.conf.set(SQLConf.MAX_CASES_BRANCHES.key, "0")
```
--- End diff --

How about:
```
withTable {
  spark.range(10)
  val oneBranchCaseWhen =
  val twoBranchCaseWhen =
  ...
  withConf {
    assert(...)
  }
  withConf {
    assert(...)
  }
  withConf {
    assert(...)
  }
}
```
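Fleshing out the review suggestion above, one possible shape is the following hedged sketch. It assumes Spark's existing `SQLTestUtils` helpers `withTable` and `withSQLConf` (the comment's `withConf` presumably maps to `withSQLConf`); `isCodegened` is a hypothetical helper checking whether the query's physical plan contains a `WholeStageCodegenExec` node, and the specific assertions are illustrative of the structure rather than verified expected results.

```scala
// Sketch only: assumes SQLTestUtils.withTable/withSQLConf are in scope, and
// that a helper like this exists (hypothetical):
//   def isCodegened(df: DataFrame): Boolean =
//     df.queryExecution.executedPlan.find(_.isInstanceOf[WholeStageCodegenExec]).isDefined
test("MAX_CASES_BRANCHES") {
  withTable("tab1") {
    spark.range(10).write.saveAsTable("tab1")
    val oneBranchCaseWhen = "SELECT CASE WHEN id = 1 THEN 1 END FROM tab1"
    val twoBranchCaseWhen = "SELECT CASE WHEN id = 1 THEN 1 ELSE 0 END FROM tab1"

    // Each boundary value of the config gets its own scoped block, so the
    // original value is restored automatically instead of via try/finally.
    withSQLConf(SQLConf.MAX_CASES_BRANCHES.key -> "0") {
      assert(!isCodegened(sql(oneBranchCaseWhen)))
      assert(!isCodegened(sql(twoBranchCaseWhen)))
    }
    withSQLConf(SQLConf.MAX_CASES_BRANCHES.key -> "1") {
      assert(isCodegened(sql(oneBranchCaseWhen)))
      assert(!isCodegened(sql(twoBranchCaseWhen)))
    }
    withSQLConf(SQLConf.MAX_CASES_BRANCHES.key -> "2") {
      assert(isCodegened(sql(oneBranchCaseWhen)))
      assert(isCodegened(sql(twoBranchCaseWhen)))
    }
  }
}
```

The scoped `withSQLConf` blocks also remove the need for the `val original = spark.conf.get(...)` bookkeeping in the original diff.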
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13392#issuecomment-222592448 LGTM except some style comments
[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13270#issuecomment-222592244 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59632/ Test FAILed.
[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13270#issuecomment-222592243 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13392#discussion_r65123624

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
```
@@ -219,4 +220,41 @@ class SQLConfSuite extends QueryTest with SharedSQLContext {
     }
   }

+  test("MAX_CASES_BRANCHES") {
+    import testImplicits._
+
+    val original = spark.conf.get(SQLConf.MAX_CASES_BRANCHES)
+    try {
+      withTable("tab1") {
+        spark
+          .range(10)
+          .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
```
--- End diff --

I think we only need `a`? Or just `spark.range(10).write.saveAsTable`, then we can use `id` in the case when.
[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13270#issuecomment-222592234 **[Test build #59632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59632/consoleFull)** for PR 13270 at commit [`3830dbb`](https://github.com/apache/spark/commit/3830dbb646b0b076eb994ebaec1a14d8a8d502dd).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15658][SQL] UDT serializer should decla...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13402#issuecomment-222592066 cc @yhuai @zsxwing
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13392#issuecomment-222592083 **[Test build #59634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59634/consoleFull)** for PR 13392 at commit [`4306c4f`](https://github.com/apache/spark/commit/4306c4fe0b741689bb0ff5349506707e8a7ec520).
[GitHub] spark pull request: [SPARK-15658][SQL] UDT serializer should decla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13402#issuecomment-222592077 **[Test build #59633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59633/consoleFull)** for PR 13402 at commit [`6d614dd`](https://github.com/apache/spark/commit/6d614dd3ae4d5ee97083ae99ea527a7c5eaa9f0a).
[GitHub] spark pull request: [SPARK-15658][SQL] UDT serializer should decla...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/13402 [SPARK-15658][SQL] UDT serializer should declare its data type as udt instead of udt.sqlType

## What changes were proposed in this pull request?

When we build the serializer for a UDT object, we should declare its data type as the udt instead of udt.sqlType; otherwise, when we deserialize it again, we lose the information that it's a udt object and throw an analysis exception.

## How was this patch tested?

New test in `UserDefinedTypeSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark udt

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13402.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13402

commit 6d614dd3ae4d5ee97083ae99ea527a7c5eaa9f0a Author: Wenchen Fan Date: 2016-05-31T04:48:39Z UDT serializer should declare its data type as udt instead of udt.sqlType
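For readers unfamiliar with UDTs, the distinction the PR relies on can be sketched without Spark. Below is a self-contained toy in plain Scala; `DataType`, `UserDefinedType`, `Point`, and `PointUDT` here are simplified stand-ins for the Catalyst types of the same names, not the real API. The point is that `sqlType` describes only the physical storage, so a serializer that declares `udt.sqlType` as its data type forgets which user class the bytes came from.

```scala
// Simplified stand-ins for Catalyst types (names mimic Spark but are local).
sealed trait DataType
case object DoubleType extends DataType
final case class ArrayType(elementType: DataType) extends DataType

abstract class UserDefinedType[T] extends DataType {
  def sqlType: DataType           // physical storage type, e.g. array<double>
  def serialize(obj: T): Any
  def deserialize(datum: Any): T
}

class Point(val x: Double, val y: Double)

class PointUDT extends UserDefinedType[Point] {
  override def sqlType: DataType = ArrayType(DoubleType)
  override def serialize(p: Point): Any = Array(p.x, p.y)
  override def deserialize(datum: Any): Point = datum match {
    case arr: Array[Double] => new Point(arr(0), arr(1))
  }
}

object UdtSketch {
  def main(args: Array[String]): Unit = {
    val udt = new PointUDT
    // Declaring the serializer's data type as `udt` (a DataType subtype here,
    // as in Catalyst) rather than `udt.sqlType` keeps the round trip
    // type-aware: we still know the payload deserializes to Point.
    val restored = udt.deserialize(udt.serialize(new Point(1.0, 2.0)))
    println((restored.x, restored.y))
  }
}
```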
[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13270#issuecomment-222591642 **[Test build #59632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59632/consoleFull)** for PR 13270 at commit [`3830dbb`](https://github.com/apache/spark/commit/3830dbb646b0b076eb994ebaec1a14d8a8d502dd).
[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13270#discussion_r65123201

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
```
@@ -212,11 +212,46 @@ class SessionCatalog(
    * If no such database is specified, create it in the current database.
    */
   def createTable(tableDefinition: CatalogTable, ignoreIfExists: Boolean): Unit = {
-    val db = formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
-    val table = formatTableName(tableDefinition.identifier.table)
+    val tableId = tableDefinition.identifier
+    val db = formatDatabaseName(tableId.database.getOrElse(getCurrentDatabase))
+    val table = formatTableName(tableId.table)
     val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
     requireDbExists(db)
-    externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
+
+    if (
+      // If this is an external data source table...
+      tableDefinition.properties.contains("spark.sql.sources.provider") &&
+      newTableDefinition.tableType == CatalogTableType.EXTERNAL &&
+      // ... that is not persisted as Hive compatible format (external tables in Hive compatible
+      // format always set `locationUri` to the actual data location and should NOT be hacked as
+      // following.)
+      tableDefinition.storage.locationUri.isEmpty
+    ) {
+      // !! HACK ALERT !!
+      //
+      // Due to a restriction of Hive metastore, here we have to set `locationUri` to a temporary
+      // directory that doesn't exist yet but can definitely be successfully created, and then
+      // delete it right after creating the external data source table. This location will be
+      // persisted to Hive metastore as standard Hive table location URI, but Spark SQL doesn't
+      // really use it. Also, since we only do this workaround for external tables, deleting the
+      // directory after the fact doesn't do any harm.
+      //
+      // Please refer to https://issues.apache.org/jira/browse/SPARK-15269 for more details.
+
+      val tempPath =
+        new Path(defaultTablePath(tableId).stripSuffix(Path.SEPARATOR) + "-__PLACEHOLDER__")
+
+      try {
+        externalCatalog.createTable(
+          db,
+          newTableDefinition.withNewStorage(locationUri = Some(tempPath.toString)),
+          ignoreIfExists)
+      } finally {
+        FileSystem.get(tempPath.toUri, hadoopConf).delete(tempPath, true)
+      }
+    } else {
+      externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
+    }
```
--- End diff --

Added these changes here mostly because `HiveExternalCatalog` doesn't have access to the Hadoop configuration, which is used to instantiate the `FileSystem` instance. Added an extra constructor argument to `HiveExternalCatalog` and moved this change there.
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13392#discussion_r65122900

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
```
@@ -17,13 +17,36 @@

 package org.apache.spark.sql.internal

+import org.scalatest.BeforeAndAfterAll
+
 import org.apache.spark.sql.{QueryTest, Row, SparkSession, SQLContext}
+import org.apache.spark.sql.execution.WholeStageCodegenExec
 import org.apache.spark.sql.test.{SharedSQLContext, TestSQLContext}

-class SQLConfSuite extends QueryTest with SharedSQLContext {
+class SQLConfSuite extends QueryTest with SharedSQLContext with BeforeAndAfterAll {
+  import testImplicits._
+
   private val testKey = "test.key.0"
   private val testVal = "test.val.0"

+  override def beforeAll() {
+    super.beforeAll()
+    sql("DROP TABLE IF EXISTS testData")
+    spark
+      .range(10)
+      .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+      .write
+      .saveAsTable("testData")
```
--- End diff --

Sure, will do. Thanks!
[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...
Github user sitalkedia commented on the pull request: https://github.com/apache/spark/pull/12436#issuecomment-222590636 @kayousterhout - Sure, I will resolve the conflicts. Can you take a cursory look at the diff and let me know if the approach is reasonable?
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13392#discussion_r65122821

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -17,13 +17,36 @@ package org.apache.spark.sql.internal

+import org.scalatest.BeforeAndAfterAll
+
 import org.apache.spark.sql.{QueryTest, Row, SparkSession, SQLContext}
+import org.apache.spark.sql.execution.WholeStageCodegenExec
 import org.apache.spark.sql.test.{SharedSQLContext, TestSQLContext}

-class SQLConfSuite extends QueryTest with SharedSQLContext {
+class SQLConfSuite extends QueryTest with SharedSQLContext with BeforeAndAfterAll {
+  import testImplicits._
+
   private val testKey = "test.key.0"
   private val testVal = "test.val.0"

+  override def beforeAll() {
+    super.beforeAll()
+    sql("DROP TABLE IF EXISTS testData")
+    spark
+      .range(10)
+      .select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+      .write
+      .saveAsTable("testData")
--- End diff --

Instead of creating this table in `beforeAll`, can we create it just in the test case?
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222589679

Thanks @kayousterhout.
[GitHub] spark pull request: [SPARK-15655] [SQL] Fix Wrong Partition Column...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13400#discussion_r65122095

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -1537,6 +1537,35 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     assert(fs.exists(path), "This is an external table, so the data should not have been dropped")
   }

+  test("select partitioned table") {
+    sql(
+      s"""
+         |CREATE TABLE table_with_partition(c1 string)
+         |PARTITIONED BY (p1 string,p2 string,p3 string,p4 string,p5 string)
--- End diff --

There are multiple related test cases in `InsertIntoHiveTableSuite`. This statement has more than one bug. For example, below is a common mistake users might make:
```
hive> CREATE TABLE partitioned (id bigint, data string) PARTITIONED BY (data string, part string);
FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
```
Currently, we return a confusing error message:
```
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.);
```
I will try to submit another PR to detect these user errors and output an understandable error message.
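The pre-check proposed above could be sketched in plain Scala roughly as follows. This is a hedged illustration only: the helper name `checkPartitionColumns` and its signature are hypothetical and not part of the PR under review.

```scala
// Hypothetical sketch: reject a CREATE TABLE whose PARTITIONED BY clause
// repeats a data column, instead of surfacing an opaque metastore error.
// Hive identifiers are case-insensitive, so compare ignoring case.
def checkPartitionColumns(dataCols: Seq[String], partCols: Seq[String]): Unit = {
  val repeated = partCols.filter(p => dataCols.exists(_.equalsIgnoreCase(p)))
  if (repeated.nonEmpty) {
    throw new IllegalArgumentException(
      s"Column(s) ${repeated.mkString(", ")} repeated in partitioning columns")
  }
}
```

For `CREATE TABLE partitioned (id bigint, data string) PARTITIONED BY (data string, part string)` such a check would fail fast with a message naming `data`, mirroring Hive's SemanticException rather than the MetaException above.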
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13392#discussion_r65121914

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -937,9 +937,14 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
  */
 case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
-    case e @ CaseWhen(branches, _) if branches.size < conf.maxCaseBranchesForCodegen =>
+    case e: CaseWhen if canCodeGen(e) =>
       e.toCodegen()
--- End diff --

Sure, will do. Thanks!
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13392#discussion_r65121902

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -17,13 +17,36 @@ package org.apache.spark.sql.internal

+import org.scalatest.BeforeAndAfterAll
+
 import org.apache.spark.sql.{QueryTest, Row, SparkSession, SQLContext}
+import org.apache.spark.sql.execution.WholeStageCodegenExec
 import org.apache.spark.sql.test.{SharedSQLContext, TestSQLContext}

-class SQLConfSuite extends QueryTest with SharedSQLContext {
+class SQLConfSuite extends QueryTest with SharedSQLContext with BeforeAndAfterAll {
--- End diff --

Initially, I tried it. The behavior is controlled by the configuration `SQLConf.MAX_CASES_BRANCHES`. However, I am not sure how to change the default conf value of `SQLConf.MAX_CASES_BRANCHES` in `OptimizeCodegenSuite`. At the same time, we do not have a test case that verifies the configuration `MAX_CASES_BRANCHES`. That is why I added the test case here. Let me know if you have any ideas. Thanks!
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-222586936

**[Test build #59631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59631/consoleFull)** for PR 12836 at commit [`7b5767a`](https://github.com/apache/spark/commit/7b5767ad25aaa1f091c4b2d22d7a99cf3d8ec00b).
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120888

--- Diff: R/pkg/R/DataFrame.R ---
@@ -1268,6 +1268,82 @@ setMethod("dapplyCollect",
     ldf
   })

+#' gapply
+#'
+#' Apply a R function to each group of a DataFrame. The group is defined by an input
--- End diff --

done!
[GitHub] spark pull request: [SPARK-15655] [SQL] Fix Wrong Partition Column...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13400#discussion_r65120902

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -1537,6 +1537,35 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     assert(fs.exists(path), "This is an external table, so the data should not have been dropped")
   }

+  test("select partitioned table") {
+    sql(
+      s"""
+         |CREATE TABLE table_with_partition(c1 string)
+         |PARTITIONED BY (p1 string,p2 string,p3 string,p4 string,p5 string)
--- End diff --

I'm surprised we support this hive style syntax, cc @andrewor14
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120891

--- Diff: R/pkg/R/DataFrame.R ---
@@ -1268,6 +1268,82 @@ setMethod("dapplyCollect",
     ldf
   })

+#' gapply
+#'
+#' Apply a R function to each group of a DataFrame. The group is defined by an input
+#' grouping column.
+#'
+#' @param x A SparkDataFrame
+#' @param func A function to be applied to each group partition specified by grouping
+#'             column of the SparkDataFrame.
+#'             The output of func is a local R data.frame.
+#' @param schema The schema of the resulting SparkDataFrame after the function is applied.
+#'               It must match the output of func.
+#' @family SparkDataFrame functions
+#' @rdname gapply
+#' @name gapply
+#' @export
+#' @examples
+#'
+#' \dontrun{
+#' Computes the arithmetic mean of the second column by grouping
+#' on the first and third columns. Output the grouping values and the average.
+#'
+#' df <- createDataFrame (
+#'   sqlContext,
--- End diff --

done!
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13392#discussion_r65120800

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -17,13 +17,36 @@ package org.apache.spark.sql.internal

+import org.scalatest.BeforeAndAfterAll
+
 import org.apache.spark.sql.{QueryTest, Row, SparkSession, SQLContext}
+import org.apache.spark.sql.execution.WholeStageCodegenExec
 import org.apache.spark.sql.test.{SharedSQLContext, TestSQLContext}

-class SQLConfSuite extends QueryTest with SharedSQLContext {
+class SQLConfSuite extends QueryTest with SharedSQLContext with BeforeAndAfterAll {
--- End diff --

I'm not sure if this is the proper suite to test this, how about `OptimizeCodegenSuite`?
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120675

--- Diff: R/pkg/R/DataFrame.R ---
@@ -1268,6 +1268,82 @@ setMethod("dapplyCollect",
     ldf
   })

+#' gapply
+#'
+#' Apply a R function to each group of a DataFrame. The group is defined by an input
+#' grouping column.
+#'
+#' @param x A SparkDataFrame
+#' @param func A function to be applied to each group partition specified by grouping
+#'             column of the SparkDataFrame.
+#'             The output of func is a local R data.frame.
+#' @param schema The schema of the resulting SparkDataFrame after the function is applied.
+#'               It must match the output of func.
+#' @family SparkDataFrame functions
+#' @rdname gapply
+#' @name gapply
+#' @export
+#' @examples
+#'
+#' \dontrun{
+#' Computes the arithmetic mean of the second column by grouping
+#' on the first and third columns. Output the grouping values and the average.
+#'
+#' df <- createDataFrame (
+#'   sqlContext,
+#'   list(list(1L, 1, "1", 0.1), list(1L, 2, "1", 0.2), list(3L, 3, "3", 0.3)),
+#'   c("a", "b", "c", "d"))
+#'
+#' schema <- structType(structField("a", "integer"), structField("c", "string"),
+#'   structField("avg", "double"))
+#' df1 <- gapply(
+#'   df,
+#'   list("a", "c"),
+#'   function(x) {
+#'     y <- data.frame(x$a[1], x$c[1], mean(x$b), stringsAsFactors = FALSE)
+#'   },
+#'   schema)
+#' collect(df1)
+#'
+#' Result
+#' ------
+#' a c avg
+#' 3 3 3.0
+#' 1 1 1.5
+#'
+#' Fits linear models on iris dataset by grouping on the 'Species' column and
+#' using 'Sepal_Length' as a target variable, 'Sepal_Width', 'Petal_Length'
+#' and 'Petal_Width' as training features.
+#'
+#' df <- createDataFrame (sqlContext, iris)
--- End diff --

done!
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120665

--- Diff: R/pkg/R/deserialize.R ---
@@ -197,6 +197,31 @@ readMultipleObjects <- function(inputCon) {
   data # this is a list of named lists now
 }

+readMultipleObjectsWithKeys <- function(inputCon) {
+  # readMultipleObjectsWithKeys will read multiple continuous objects from
+  # a DataOutputStream. There is no preceding field telling the count
+  # of the objects, so the number of objects varies, we try to read
+  # all objects in a loop until the end of the stream. The rows in
--- End diff --

done!
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120671

--- Diff: R/pkg/R/DataFrame.R ---
@@ -1268,6 +1268,82 @@ setMethod("dapplyCollect",
     ldf
   })

+#' gapply
+#'
+#' Apply a R function to each group of a DataFrame. The group is defined by an input
+#' grouping column.
+#'
+#' @param x A SparkDataFrame
+#' @param func A function to be applied to each group partition specified by grouping
+#'             column of the SparkDataFrame.
+#'             The output of func is a local R data.frame.
+#' @param schema The schema of the resulting SparkDataFrame after the function is applied.
+#'               It must match the output of func.
+#' @family SparkDataFrame functions
+#' @rdname gapply
+#' @name gapply
+#' @export
+#' @examples
+#'
+#' \dontrun{
+#' Computes the arithmetic mean of the second column by grouping
+#' on the first and third columns. Output the grouping values and the average.
+#'
+#' df <- createDataFrame (
+#'   sqlContext,
+#'   list(list(1L, 1, "1", 0.1), list(1L, 2, "1", 0.2), list(3L, 3, "3", 0.3)),
+#'   c("a", "b", "c", "d"))
+#'
+#' schema <- structType(structField("a", "integer"), structField("c", "string"),
+#'   structField("avg", "double"))
+#' df1 <- gapply(
+#'   df,
+#'   list("a", "c"),
+#'   function(x) {
+#'     y <- data.frame(x$a[1], x$c[1], mean(x$b), stringsAsFactors = FALSE)
+#'   },
+#'   schema)
+#' collect(df1)
+#'
+#' Result
+#' ------
+#' a c avg
+#' 3 3 3.0
+#' 1 1 1.5
+#'
+#' Fits linear models on iris dataset by grouping on the 'Species' column and
+#' using 'Sepal_Length' as a target variable, 'Sepal_Width', 'Petal_Length'
+#' and 'Petal_Width' as training features.
+#'
+#' df <- createDataFrame (sqlContext, iris)
+#' schema <- structType(structField("(Intercept)", "double"),
+#'   structField("Sepal_Width", "double"), structField("Petal_Length", "double"),
+#'   structField("Petal_Width", "double"))
+#' df1 <- gapply(
+#'   df,
+#'   list(df$"Species"),
+#'   function(x) {
+#'     m <- suppressWarnings(lm(Sepal_Length ~
+#'       Sepal_Width + Petal_Length + Petal_Width, x))
+#'     data.frame(t(coef(m)))
+#'   }, schema)
+#' collect(df1)
+#'
+#' Result
+#' ------
+#' Model  (Intercept)  Sepal_Width  Petal_Length  Petal_Width
+#' 1      0.699883     0.3303370    0.9455356     -0.1697527
+#' 2      1.895540     0.3868576    0.9083370     -0.6792238
+#' 3      2.351890     0.6548350    0.2375602     0.2521257
+#'
+#' }
+setMethod("gapply",
+          signature(x = "SparkDataFrame"),
+          function(x, col, func, schema) {
--- End diff --

done!
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120653

--- Diff: R/pkg/R/group.R ---
@@ -142,3 +142,54 @@ createMethods <- function() {
 }

 createMethods()
+
+#' gapply
+#'
+#' Applies a R function to each group in the input GroupedData
+#'
+#' @param x a GroupedData
+#' @return a SparkDataFrame
+#' @rdname gapply
+#' @name gapply
+#' @family agg_funcs
--- End diff --

removed "agg func"
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120620

--- Diff: R/pkg/R/group.R ---
@@ -142,3 +142,54 @@ createMethods <- function() {
 }

 createMethods()
+
+#' gapply
+#'
+#' Applies a R function to each group in the input GroupedData
+#'
+#' @param x a GroupedData
+#' @return a SparkDataFrame
+#' @rdname gapply
+#' @name gapply
+#' @family agg_funcs
+#' @examples
+#' \dontrun{
+#' Computes the arithmetic mean of the second column by grouping
+#' on the first and third columns. Output the grouping values and the average.
+#'
+#' df <- createDataFrame (
+#'   sqlContext,
--- End diff --

done
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120599

--- Diff: R/pkg/inst/worker/worker.R ---
@@ -84,67 +84,78 @@ broadcastElap <- elapsedSecs()
 # as number of partitions to create.
 numPartitions <- SparkR:::readInt(inputCon)

-isDataFrame <- as.logical(SparkR:::readInt(inputCon))
+# 0 - RDD mode, 1 - dapply mode, 2 - gapply mode
+mode <- SparkR:::readInt(inputCon)

-# If isDataFrame, then read column names
-if (isDataFrame) {
+# If DataFrame - mode = 1 and mode = 2, then read column names
+if (mode > 0) {
--- End diff --

I ended up leaving mode as is. I also think that one variable is less confusing.
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120565

--- Diff: R/pkg/inst/worker/worker.R ---
@@ -84,67 +84,78 @@ broadcastElap <- elapsedSecs()
 # as number of partitions to create.
 numPartitions <- SparkR:::readInt(inputCon)

-isDataFrame <- as.logical(SparkR:::readInt(inputCon))
+# 0 - RDD mode, 1 - dapply mode, 2 - gapply mode
+mode <- SparkR:::readInt(inputCon)

-# If isDataFrame, then read column names
-if (isDataFrame) {
+# If DataFrame - mode = 1 and mode = 2, then read column names
+if (mode > 0) {
   colNames <- SparkR:::readObject(inputCon)
+  if (mode == 2) {
+    key <- SparkR:::readObject(inputCon)
+  }
 }

 isEmpty <- SparkR:::readInt(inputCon)

 if (isEmpty != 0) {
-
   if (numPartitions == -1) {
     if (deserializer == "byte") {
       # Now read as many characters as described in funcLen
-      data <- SparkR:::readDeserialize(inputCon)
+      dataList <- list(SparkR:::readDeserialize(inputCon))
     } else if (deserializer == "string") {
-      data <- as.list(readLines(inputCon))
-    } else if (deserializer == "row") {
-      data <- SparkR:::readMultipleObjects(inputCon)
+      dataList <- list(as.list(readLines(inputCon)))
+    } else if (deserializer == "row" && mode == 2) {
+      dataList <- SparkR:::readMultipleObjectsWithKeys(inputCon)
+    } else if (deserializer == "row"){
+      dataList <- list(SparkR:::readMultipleObjects(inputCon))
     }

     # Timing reading input data for execution
     inputElap <- elapsedSecs()
-
-    if (isDataFrame) {
-      if (deserializer == "row") {
-        # Transform the list of rows into a data.frame
-        # Note that the optional argument stringsAsFactors for rbind is
-        # available since R 3.2.4. So we set the global option here.
-        oldOpt <- getOption("stringsAsFactors")
-        options(stringsAsFactors = FALSE)
-        data <- do.call(rbind.data.frame, data)
-        options(stringsAsFactors = oldOpt)
-
-        names(data) <- colNames
+    for (i in 1:length(dataList)) {
--- End diff --

done! I called it `computeHelper`; I thought `compute` might be too generic for this specific use case. I can still rename it to `compute` if you think that's a better name.
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120457

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -379,6 +383,50 @@ class RelationalGroupedDataset protected[sql](
   def pivot(pivotColumn: String, values: java.util.List[Any]): RelationalGroupedDataset = {
     pivot(pivotColumn, values.asScala)
   }
+
+  /**
+   * Applies the given serialized R function `func` to each group of data. For each unique group,
+   * the function will be passed the group key and an iterator that contains all of the elements in
+   * the group. The function can return an iterator containing elements of an arbitrary type which
+   * will be returned as a new [[DataFrame]].
+   *
+   * This function does not support partial aggregation, and as a result requires shuffling all
+   * the data in the [[Dataset]]. If an application intends to perform an aggregation over each
+   * key, it is best to use the reduce function or an
+   * [[org.apache.spark.sql.expressions#Aggregator Aggregator]].
+   *
+   * Internally, the implementation will spill to disk if any given group is too large to fit into
+   * memory. However, users must take care to avoid materializing the whole iterator for a group
+   * (for example, by calling `toList`) unless they are sure that this is possible given the memory
+   * constraints of their cluster.
+   *
+   * @since 2.0.0
+   */
+  def flatMapGroupsInR(
--- End diff --

done!
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120461

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2011,6 +2011,25 @@ class Dataset[T] private[sql](
   }

   /**
+   * Returns a new [[DataFrame]] which contains the aggregated result of applying
--- End diff --

done!
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13392#discussion_r65120442

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -937,9 +937,14 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
  */
 case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
-    case e @ CaseWhen(branches, _) if branches.size < conf.maxCaseBranchesForCodegen =>
+    case e: CaseWhen if canCodeGen(e) =>
       e.toCodegen()
--- End diff --

nit: this can fit on the previous line?
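The boundary issue behind the `canCodeGen` helper can be illustrated with a standalone sketch. This is a hedged reading of the PR discussion, not the merged code: `CaseWhenLike` is a simplified stand-in for Catalyst's `CaseWhen`, and the real rule compares against `conf.maxCaseBranchesForCodegen`. The point is that the optional ELSE value should count toward the threshold alongside the WHEN branches, which a bare `branches.size < max` comparison misses at the boundary.

```scala
// Simplified stand-in for CaseWhen: WHEN branches (condition, value)
// plus an optional ELSE value. The real expression lives in Catalyst.
case class CaseWhenLike(branches: Seq[(String, String)], elseValue: Option[String])

// Sketch of the boundary check: Option.size is 0 or 1, so the ELSE
// value counts as one more branch against the threshold.
def canCodeGen(e: CaseWhenLike, maxBranches: Int): Boolean =
  e.branches.size + e.elseValue.size <= maxBranches
```

With a threshold of 1, a single WHEN branch qualifies, but adding an ELSE value pushes the count to 2 and disqualifies it.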
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120450

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
@@ -325,6 +330,80 @@ case class MapGroupsExec(
 }

 /**
+ * Groups the input rows together and calls the R function with each group and an iterator
+ * containing all elements in the group.
+ * The result of this function is flattened before being output.
+ */
+case class FlatMapGroupsInRExec(
+    func: Array[Byte],
+    packageNames: Array[Byte],
+    broadcastVars: Array[Broadcast[Object]],
+    inputSchema: StructType,
+    outputSchema: StructType,
+    keyDeserializer: Expression,
+    valueDeserializer: Expression,
+    groupingAttributes: Seq[Attribute],
+    dataAttributes: Seq[Attribute],
+    outputObjAttr: Attribute,
+    child: SparkPlan) extends UnaryExecNode with ObjectProducerExec {
+
+  override def output: Seq[Attribute] = outputObjAttr :: Nil
+  override def producedAttributes: AttributeSet = AttributeSet(outputObjAttr)
+
+  override def requiredChildDistribution: Seq[Distribution] =
+    ClusteredDistribution(groupingAttributes) :: Nil
+
+  override def requiredChildOrdering: Seq[Seq[SortOrder]] =
+    Seq(groupingAttributes.map(SortOrder(_, Ascending)))
+
+  override protected def doExecute(): RDD[InternalRow] = {
+    val isDeserializedRData =
+      if (outputSchema == SERIALIZED_R_DATA_SCHEMA) true else false
+    val serializerForR = if (!isDeserializedRData) {
+      SerializationFormats.ROW
+    } else {
+      SerializationFormats.BYTE
+    }
+    val (deserializerForR, colNames) =
+      (SerializationFormats.ROW, inputSchema.fieldNames)
+
+    child.execute().mapPartitionsInternal { iter =>
+      val grouped = GroupedIterator(iter, groupingAttributes, child.output)
+      val getKey = ObjectOperator.deserializeRowToObject(keyDeserializer, groupingAttributes)
+      val getValue = ObjectOperator.deserializeRowToObject(valueDeserializer, dataAttributes)
+      val outputObject = ObjectOperator.wrapObjectToRow(outputObjAttr.dataType)
+      val groupNames = groupingAttributes.map(_.name).toArray
+
+      val runner = new RRunner[Array[Byte]](
+        func, deserializerForR, serializerForR, packageNames, broadcastVars,
+        isDataFrame = true, colNames = colNames, key = groupNames)
+
+      val hasGroups = grouped.hasNext
--- End diff --

Did some refactoring!
[GitHub] spark pull request: [SPARK-15657][SQL] RowEncoder should validate ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13401#issuecomment-222586062

**[Test build #59629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59629/consoleFull)** for PR 13401 at commit [`b6c1a5f`](https://github.com/apache/spark/commit/b6c1a5fc6013b643ae39aad32224d08d71b63e00).
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-222586065 **[Test build #59630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59630/consoleFull)** for PR 12836 at commit [`a0425c1`](https://github.com/apache/spark/commit/a0425c17906fcd2ea1d8dd6fb33c0fd8a860d4a7).
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120403

--- Diff: R/pkg/R/DataFrame.R ---
@@ -1268,6 +1268,82 @@ setMethod("dapplyCollect",
     ldf
   })

+#' gapply
+#'
+#' Apply a R function to each group of a DataFrame. The group is defined by an input
+#' grouping column.
+#'
+#' @param x A SparkDataFrame
+#' @param func A function to be applied to each group partition specified by grouping
--- End diff --

done!
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65120399

--- Diff: R/pkg/R/DataFrame.R ---
@@ -1268,6 +1268,82 @@ setMethod("dapplyCollect",
     ldf
   })

+#' gapply
+#'
+#' Apply a R function to each group of a DataFrame. The group is defined by an input
+#' grouping column.
+#'
+#' @param x A SparkDataFrame
+#' @param func A function to be applied to each group partition specified by grouping
+#'             column of the SparkDataFrame.
+#'             The output of func is a local R data.frame.
+#' @param schema The schema of the resulting SparkDataFrame after the function is applied.
+#'               It must match the output of func.
+#' @family SparkDataFrame functions
+#' @rdname gapply
+#' @name gapply
+#' @export
+#' @examples
+#'
+#' \dontrun{
+#' Computes the arithmetic mean of the second column by grouping
+#' on the first and third columns. Output the grouping values and the average.
+#'
+#' df <- createDataFrame (
+#'   sqlContext,
+#'   list(list(1L, 1, "1", 0.1), list(1L, 2, "1", 0.2), list(3L, 3, "3", 0.3)),
+#'   c("a", "b", "c", "d"))
+#'
+#' schema <- structType(structField("a", "integer"), structField("c", "string"),
+#'   structField("avg", "double"))
+#' df1 <- gapply(
+#'   df,
+#'   list("a", "c"),
+#'   function(x) {
--- End diff --

done!
[GitHub] spark pull request: [SPARK-15657][SQL] RowEncoder should validate ...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13401#issuecomment-222585795 cc @marmbrus @yhuai @viirya
[GitHub] spark pull request: [SPARK-15657][SQL] RowEncoder should validate ...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/13401

[SPARK-15657][SQL] RowEncoder should validate the data type of input object

## What changes were proposed in this pull request?

This PR improves the error handling of `RowEncoder`. When we create a `RowEncoder` with a given schema, we should validate the data type of the input object, e.g. we should throw an exception when a field is boolean but is declared as a string column.

This PR also removes the support for using `Product` as a valid external type of struct type. This support was added in https://github.com/apache/spark/pull/9712, but is incomplete: e.g. nested products and products inside arrays both do not work. However, we never officially supported this feature, and I think it's OK to ban it.

## How was this patch tested?

New tests in `RowEncoderSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark bug

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13401.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13401

commit 7b30e7030d14c99a42f7b1e23c9953c9bfbdb536
Author: Wenchen Fan
Date: 2016-05-31T03:21:12Z

    validates input data type in RowEncoder
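The validation idea in the PR above (fail fast when an input value's runtime type does not match its declared column type) can be sketched in plain Scala. This is a simplified, Spark-free illustration of the concept, not `RowEncoder`'s actual implementation; the names `ColType`, `Field`, and `validateRow` are invented for this sketch.

```scala
// Minimal sketch of schema-vs-value type validation, assuming a toy
// two-type schema. In Spark, RowEncoder performs an analogous check
// against the full Catalyst type system.
sealed trait ColType
case object StringCol extends ColType
case object BooleanCol extends ColType

final case class Field(name: String, colType: ColType)

def validateRow(schema: Seq[Field], row: Seq[Any]): Unit = {
  require(schema.length == row.length,
    s"expected ${schema.length} values, got ${row.length}")
  schema.zip(row).foreach {
    case (Field(_, StringCol), _: String)   => // declared and actual types agree
    case (Field(_, BooleanCol), _: Boolean) => // declared and actual types agree
    case (Field(name, tpe), value) =>
      // e.g. a boolean value in a column declared as string fails here,
      // instead of producing a confusing error later in execution
      throw new IllegalArgumentException(
        s"field '$name' is declared as $tpe but the input value is a " +
          value.getClass.getSimpleName)
  }
}
```

Checking eagerly at encoding time, rather than letting a mismatched value flow into execution, is the error-handling improvement the PR description refers to.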
[GitHub] spark pull request: [SPARK-6320][SQL] Move planLater method into G...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13147#issuecomment-222585528 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59627/ Test PASSed.
[GitHub] spark pull request: [SPARK-6320][SQL] Move planLater method into G...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13147#issuecomment-222585527 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-6320][SQL] Move planLater method into G...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13147#issuecomment-222585447 **[Test build #59627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59627/consoleFull)** for PR 13147 at commit [`254381d`](https://github.com/apache/spark/commit/254381d245cabf3cbad57f7ab06eec155ae79d96). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15601][CORE] CircularBuffer's toString(...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13351#issuecomment-222583398 **[Test build #59628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59628/consoleFull)** for PR 13351 at commit [`a0ae62e`](https://github.com/apache/spark/commit/a0ae62eaf7ecc19565695da68d3b42cc4aac8f09).
[GitHub] spark pull request: [SPARK-15601][CORE] CircularBuffer's toString(...
Github user tejasapatil commented on the pull request: https://github.com/apache/spark/pull/13351#issuecomment-222583109 Jenkins, retest this please
[GitHub] spark pull request: [SPARK-6320][SQL] Move planLater method into G...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13147#issuecomment-222577825 **[Test build #59627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59627/consoleFull)** for PR 13147 at commit [`254381d`](https://github.com/apache/spark/commit/254381d245cabf3cbad57f7ab06eec155ae79d96).
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13392#issuecomment-222574008 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59626/ Test PASSed.
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13392#issuecomment-222574005 Merged build finished. Test PASSed.