[GitHub] spark pull request: [SPARK-14495][SQL] fix resolution failure of h...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12974#issuecomment-217609077 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14495][SQL] fix resolution failure of h...
GitHub user xwu0226 opened a pull request: https://github.com/apache/spark/pull/12974 [SPARK-14495][SQL] fix resolution failure of having clause with distinct aggregate function
Symptom: In the latest **branch 1.6**, when a `DISTINCT` aggregation function is used in the `HAVING` clause, the Analyzer throws an `AnalysisException` with a message like the following:
```
resolved attribute(s) gid#558,id#559 missing from date#554,id#555 in operator !Expand [List(date#554, null, 0, if ((gid#558 = 1)) id#559 else null),List(date#554, id#555, 1, null)], [date#554,id#561,gid#560,if ((gid = 1)) id else null#562];
```
Root cause: The distinct aggregate in the having condition is processed by the rule `DistinctAggregationRewriter` twice, which corrupts the resulting `EXPAND` operator. In the `ResolveAggregateFunctions` rule, when resolving `Filter(havingCondition, _: Aggregate)`, the `havingCondition` is resolved as an `Aggregate` in a nested loop of analyzer rule execution (by invoking `RuleExecutor.execute`). At this nested level of analysis, the rule `DistinctAggregationRewriter` rewrites the distinct aggregate clause into an expanded two-layer aggregation, where the `aggregateExpressions` of the final `Aggregate` contain the resolved `gid` and the aggregate expression attributes (in the above case, `gid#558, id#559`). After the nested analyzer rule execution completes, the resulting `aggregateExpressions` in the `havingCondition` are pushed down into the underlying `Aggregate` operator, and the `DistinctAggregationRewriter` rule is executed again. This time, the `projections` field of the `EXPAND` operator is populated with the `aggregateExpressions` of the `havingCondition` mentioned above. However, the attributes in the projection list of the `EXPAND` operator (`gid#558, id#559` in the above case) cannot be found in the underlying relation.
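For readers unfamiliar with the rewrite, here is a minimal pure-Python model of what an `Expand`-based two-layer aggregation computes for a single-distinct query such as `SELECT date, COUNT(DISTINCT id), COUNT(*) FROM t GROUP BY date HAVING COUNT(DISTINCT id) > 1`. It is a sketch under simplifying assumptions (toy in-memory rows, hypothetical function names), not Spark's `DistinctAggregationRewriter`: each input row is expanded into one copy per `gid` (mirroring the `List(date, null, 0, ...)` and `List(date, id, 1, ...)` projections in the error above), a first aggregate groups by `(date, id, gid)` to deduplicate, and a final aggregate groups by `date`. The HAVING filter is evaluated exactly once, on the output of a single rewrite, which is the ordering this PR aims to guarantee.

```python
from collections import defaultdict

# Toy input rows of (date, id); hypothetical data for illustration only.
rows = [("d1", 1), ("d1", 1), ("d1", 2), ("d2", 3)]


def expand(rows):
    """Emit one copy of every row per aggregation group, tagged with a gid.

    gid = 0 feeds the regular aggregate COUNT(*), so the distinct column is nulled out;
    gid = 1 feeds COUNT(DISTINCT id), so it keeps id.
    """
    for date, id_ in rows:
        yield (date, None, 0)
        yield (date, id_, 1)


def first_aggregate(expanded):
    """Group by (date, id, gid): deduplicates id for gid = 1 and
    pre-counts the rows for gid = 0."""
    partial = defaultdict(int)
    for key in expanded:
        partial[key] += 1
    return partial


def final_aggregate(partial):
    """Group by date: each remaining gid = 1 key is one distinct id;
    the gid = 0 partial counts are summed to give COUNT(*)."""
    result = defaultdict(lambda: {"count_distinct_id": 0, "count_star": 0})
    for (date, id_, gid), cnt in partial.items():
        if gid == 1 and id_ is not None:
            result[date]["count_distinct_id"] += 1
        elif gid == 0:
            result[date]["count_star"] += cnt
    return dict(result)


def having(result, threshold):
    """HAVING COUNT(DISTINCT id) > threshold, evaluated once on the final output."""
    return {d: r for d, r in result.items() if r["count_distinct_id"] > threshold}


summary = final_aggregate(first_aggregate(expand(rows)))
print(having(summary, 1))  # {'d1': {'count_distinct_id': 2, 'count_star': 3}}
```

Every reference the final step needs (`date`, `gid`, `id`) is produced by the expand step over the same input, which is the invariant that breaks when the rewrite runs a second time on already-rewritten expressions.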
Solution: This PR backports part of [#11579](https://github.com/apache/spark/pull/11579), which moves `DistinctAggregationRewriter` to the beginning of the Optimizer. This guarantees that the rewrite happens only after all aggregate functions have been resolved, and thus avoids the resolution failure. This PR also removes the SQLConf property `spark.sql.specializeSingleDistinctAggPlanning`, which the above change makes unnecessary. @cloud-fan @yhuai
How is the PR change tested
New [test cases](https://github.com/xwu0226/spark/blob/f73428f94746d6d074baf6702589545bdbd11cad/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala#L927-L988) are added to exercise `DistinctAggregationRewriter` rewrites for multi-distinct aggregations involving a having clause. A follow-up PR will be submitted to add these test cases to the master (2.0) branch.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/xwu0226/spark SPARK-14495_review Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12974.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12974
commit c51448d1173739dd592895b0902ab61d66da499d Author: xin Wu Date: 2016-05-05T14:50:37Z move DistinctAggregateRewrite rule to optimizer
commit f73428f94746d6d074baf6702589545bdbd11cad Author: xin Wu Date: 2016-05-07T02:23:30Z modify testcases and remove property spark.sql.specializeSingleDistinctAggPlanning
[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12871#issuecomment-217608493 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58055/ Test PASSed.
[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12871#issuecomment-217608492 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12377][Python][Wrong implementation for...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10356#issuecomment-217608460 @JoshRosen Sorry, it was my mistake. It works as expected, so this is not an issue.
```python
>>> from pyspark.sql import Row
>>> row = Row("f1", "f2")
>>> row(1, 2)
Row(f1=1, f2=2)
```
[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12871#issuecomment-217608468 **[Test build #58055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58055/consoleFull)** for PR 12871 at commit [`4d400ca`](https://github.com/apache/spark/commit/4d400ca5e5d0b184540f8e91188daeaa1303ff5e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [WIP][SPARK-12377][PySpark] Missing argments i...
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/12973
[GitHub] spark pull request: [WIP][SPARK-12377][PySpark] Missing argments i...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12973#issuecomment-217608440 Oh I see. The usage was as below, and it seems to be the correct behaviour:
```
>>> from pyspark.sql import Row
>>> row = Row("f1", "f2")
>>> row(1, 2)
Row(f1=1, f2=2)
```
Sorry, closing this.
[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] Reduce delay bet...
Github user lw-lin commented on the pull request: https://github.com/apache/spark/pull/12725#issuecomment-217608390 I've addressed comments and expanded tests; @zsxwing would you mind taking another look? Thanks!
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217608341 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58053/ Test PASSed.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217608339 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217608311 **[Test build #58053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58053/consoleFull)** for PR 12775 at commit [`ee34cd2`](https://github.com/apache/spark/commit/ee34cd2d67a98ef48b0453d6c0a77b88c9db12fb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Remove unnecessary things from SparkEnv
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12970#issuecomment-217608278 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58052/ Test PASSed.
[GitHub] spark pull request: Remove unnecessary things from SparkEnv
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12970#issuecomment-217608277 Merged build finished. Test PASSed.
[GitHub] spark pull request: Remove unnecessary things from SparkEnv
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12970#issuecomment-217608247 **[Test build #58052 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58052/consoleFull)** for PR 12970 at commit [`d1a6374`](https://github.com/apache/spark/commit/d1a6374e2ca7870527b73e14a9077b929124eec7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/12893#issuecomment-217608216 Thank you very much!
[GitHub] spark pull request: [WIP][SPARK-12377][PySpark] Missing argments i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12973#issuecomment-217608052 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58058/ Test FAILed.
[GitHub] spark pull request: [WIP][SPARK-12377][PySpark] Missing argments i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12973#issuecomment-217608051 Merged build finished. Test FAILed.
[GitHub] spark pull request: [WIP][SPARK-12377][PySpark] Missing argments i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12973#issuecomment-217608050 **[Test build #58058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58058/consoleFull)** for PR 12973 at commit [`378f195`](https://github.com/apache/spark/commit/378f19588e6cc12d454c5b9eb1a24ae0b169d0e9). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14814][MLlib] API: Java compatibility, ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12971#issuecomment-217607981 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58056/ Test PASSed.
[GitHub] spark pull request: [SPARK-14814][MLlib] API: Java compatibility, ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12971#issuecomment-217607980 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14814][MLlib] API: Java compatibility, ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12971#issuecomment-217607965 **[Test build #58056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58056/consoleFull)** for PR 12971 at commit [`b717479`](https://github.com/apache/spark/commit/b7174798c323a1fdd112ca21413442e1a89500ed). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12377][PySpark] Missing argments in imp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12973#issuecomment-217607854 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58057/ Test FAILed.
[GitHub] spark pull request: [SPARK-12377][PySpark] Missing argments in imp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12973#issuecomment-217607853 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12377][PySpark] Missing argments in imp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12973#issuecomment-217607848 **[Test build #58057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58057/consoleFull)** for PR 12973 at commit [`cf1fd05`](https://github.com/apache/spark/commit/cf1fd05c681de68db159968622cb379c4718f197). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12377][PySpark] Missing argments in imp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12973#issuecomment-217607328 **[Test build #58058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58058/consoleFull)** for PR 12973 at commit [`378f195`](https://github.com/apache/spark/commit/378f19588e6cc12d454c5b9eb1a24ae0b169d0e9).
[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] Reduce delay bet...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12725#issuecomment-217606678 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58054/ Test PASSed.
[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] Reduce delay bet...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12725#issuecomment-217606677 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] Reduce delay bet...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12725#issuecomment-217606648 **[Test build #58054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58054/consoleFull)** for PR 12725 at commit [`a72423b`](https://github.com/apache/spark/commit/a72423b5aab05189c56707897fc638f4c49a3c06). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12377][PySpark] Missing argments in imp...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12973#issuecomment-217606617 I submitted this PR because the author has not responded for about four months. If the author responds, or if this PR turns out to be problematic, I am happy to close it. @JoshRosen Could you please take a look?
[GitHub] spark pull request: [SPARK-12377][PySpark] Missing argments in imp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12973#issuecomment-217606626 **[Test build #58057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58057/consoleFull)** for PR 12973 at commit [`cf1fd05`](https://github.com/apache/spark/commit/cf1fd05c681de68db159968622cb379c4718f197).
[GitHub] spark pull request: [SPARK-12377][Python][Wrong implementation for...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10356#issuecomment-217606556 I submitted a PR for this in https://github.com/apache/spark/pull/10356 because, as far as I can tell, the author has not responded for about four months.
[GitHub] spark pull request: [SPARK-12377][PySpark] Missing argments in imp...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/12973 [SPARK-12377][PySpark] Missing argments in implementation for Row.__call__ in PySpark
## What changes were proposed in this pull request?
This PR corrects the implementation of `Row.__call__` so that the object acts like a class.
## How was this patch tested?
Unit tests in `pyspark/sql/types.py`.
Closes #10356
You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark pr/10356 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12973.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12973
commit bbc38ba690bed3ba777817b0dda15b51cbd031f2 Author: somideshmukh Date: 2015-12-17T09:48:54Z [SPARK-12377][Python][Wrong implementation for Row.__call__ in pyspark]
commit 2b3e9b91a6d8a887998c556a5c30e1385efe6089 Author: somideshmukh Date: 2016-01-14T12:02:04Z [SPARK-12377][Python][Wrong implementation for Row.__call__ in pyspark,Added Regression Testing]
commit 18108fa7a5494eafd802fd685204d429b1549aae Author: hyukjinkwon Date: 2016-05-07T03:15:22Z Fetch upstream
commit cf1fd05c681de68db159968622cb379c4718f197 Author: hyukjinkwon Date: 2016-05-07T04:40:25Z Object-like access of PySpark
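The behaviour discussed in this thread (a `Row` built from field names acting like a class, so that `row(1, 2)` gives `Row(f1=1, f2=2)`) can be modelled with a small tuple subclass. The sketch below is hypothetical and simplified: the class name `MiniRow`, its error messages, and the exact-length check are made up for illustration, and it is not PySpark's implementation. It only shows the idea of a field-name row whose `__call__` binds positional values to those names, plus object-like attribute access.

```python
class MiniRow(tuple):
    """Hypothetical, simplified stand-in for pyspark.sql.Row (not its real code)."""

    def __new__(cls, *values, fields=None):
        row = tuple.__new__(cls, values)
        # fields is None for a "row class" created from field names only,
        # e.g. MiniRow("f1", "f2"); otherwise it holds the names of the values.
        row.__fields__ = fields
        return row

    def __call__(self, *values):
        # A field-name row acts like a class: calling it binds each value
        # to the field name at the same position.
        if self.__fields__ is not None:
            raise TypeError("only a row created from field names can be called")
        if len(values) != len(self):
            raise ValueError("expected %d values, got %d" % (len(self), len(values)))
        return MiniRow(*values, fields=tuple(self))

    def __getattr__(self, name):
        # Object-like access such as row.f1; only reached when normal lookup fails.
        fields = self.__dict__.get("__fields__")
        if fields is not None and name in fields:
            return self[fields.index(name)]
        raise AttributeError(name)

    def __repr__(self):
        if self.__fields__ is None:
            return "MiniRow(%s)" % ", ".join(repr(v) for v in self)
        return "MiniRow(%s)" % ", ".join(
            "%s=%r" % (k, v) for k, v in zip(self.__fields__, self))


RowClass = MiniRow("f1", "f2")
print(RowClass(1, 2))  # MiniRow(f1=1, f2=2)
```

Because the result is still a tuple, it compares equal to `(1, 2)` and can be indexed positionally, while `RowClass(1, 2).f1` gives object-like access by name.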
[GitHub] spark pull request: [SPARK-14814][MLlib] API: Java compatibility, ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12971#discussion_r62411488 --- Diff: mllib/src/test/java/org/apache/spark/mllib/tree/JavaDecisionTreeSuite.java --- @@ -21,6 +21,8 @@ import java.util.HashMap; import java.util.List; +import org.apache.spark.api.java.function.Function; +import org.apache.spark.mllib.linalg.Vector; --- End diff -- I guess we need to reorder the imports (see https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports).
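The import ordering that the review comment refers to can be summarised as a small grouping function. This is a hypothetical helper written only for illustration, based on the grouping described in the Spark Code Style Guide as I understand it (`java.*`/`javax.*`, then `scala.*`, then third-party, then `org.apache.spark.*`, alphabetical within each group, one blank line between groups). It is not a tool that ships with Spark, and it does not handle `import static` statements.

```python
def order_java_imports(imports):
    """Hypothetical helper: group and sort Java import statements.

    Groups, in order: java./javax., scala., third-party, org.apache.spark.
    Statements are sorted alphabetically within a group, and groups are
    separated by one blank line. `import static` lines are not handled.
    """
    def group_of(stmt):
        name = stmt.split()[1]  # "import java.util.List;" -> "java.util.List;"
        if name.startswith(("java.", "javax.")):
            return 0
        if name.startswith("scala."):
            return 1
        if name.startswith("org.apache.spark."):
            return 3
        return 2  # any other third-party package

    groups = [[], [], [], []]
    for stmt in imports:
        groups[group_of(stmt)].append(stmt)
    return "\n\n".join("\n".join(sorted(g)) for g in groups if g)


print(order_java_imports([
    "import java.util.HashMap;",
    "import java.util.List;",
    "import org.apache.spark.api.java.function.Function;",
    "import org.apache.spark.mllib.linalg.Vector;",
]))
```

Applied to the lines from the diff above, the two `org.apache.spark` imports move into their own group, separated from the `java.util` imports by a blank line.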
[GitHub] spark pull request: [SPARK-15125][SQL] Changing CSV data source ma...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12904#issuecomment-217605838 When writing, I think ``` Row("", "null", null) ``` should produce the following CSV: 1. With the option `nullValue` set to `"null"`: ```csv ,null,null ``` 2. Without any options: ```csv ,null, ``` 3. With the option `nullValue` set to `""`: ```csv ,null, ```
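The three proposed write cases can be modelled with a small Python sketch. This is a hypothetical model of the behaviour proposed in the comment, not Spark's CSV writer. It assumes that the default `nullValue` is the empty string, that a null (`None` here) is written as `nullValue`, and that strings are written verbatim. Quoting and escaping are deliberately ignored.

```python
def write_csv_field(value, null_value=""):
    # None (SQL NULL) is written as null_value; strings are written as-is.
    # Quoting and escaping are deliberately ignored in this sketch.
    return null_value if value is None else value


def write_csv_line(row, null_value=""):
    return ",".join(write_csv_field(v, null_value) for v in row)


row = ("", "null", None)
print(write_csv_line(row, null_value="null"))  # case 1: ,null,null
print(write_csv_line(row))                     # case 2: ,null,
print(write_csv_line(row, null_value=""))      # case 3: ,null,
```

Without quoting, case 1 writes the string `"null"` and a real null identically, so reading the file back cannot tell them apart. Likewise, an empty string and a null are indistinguishable in cases 2 and 3.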
[GitHub] spark pull request: [SPARK-14814][MLlib] API: Java compatibility, ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12971#issuecomment-217605693 **[Test build #58056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58056/consoleFull)** for PR 12971 at commit [`b717479`](https://github.com/apache/spark/commit/b7174798c323a1fdd112ca21413442e1a89500ed).
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12954
[GitHub] spark pull request: [SPARK-15125][SQL] Changing CSV data source ma...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12904#issuecomment-217605160 Here is how I think the CSV data source should handle `""` (a quoted empty string), an unquoted empty field, and `nullValue`. With the option `nullValue` set to `"null"`, I think ```csv ,"","null" ``` should produce the following record: ``` Row(null, "", null) ``` Would this make sense?
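The proposed read rule depends on remembering whether each field was quoted. Python's `csv` module does not report that, so the sketch below uses a small hand-written splitter. It is a hypothetical model of the proposal, not Spark's parser (which delegates to univocity). Its rules are:

- a field equal to `nullValue`, quoted or not, becomes null;
- an unquoted empty field becomes null;
- a quoted empty field stays an empty string.

```python
def parse_csv_line(line, null_value="null", sep=",", quote='"'):
    """Split one CSV line, remembering whether each field was quoted."""
    fields, buf = [], []
    quoted = in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote and i + 1 < len(line) and line[i + 1] == quote:
                buf.append(quote)  # "" inside quotes is an escaped quote
                i += 1
            elif ch == quote:
                in_quotes = False  # closing quote
            else:
                buf.append(ch)
        elif ch == quote:
            in_quotes = quoted = True  # opening quote
        elif ch == sep:
            fields.append(("".join(buf), quoted))
            buf, quoted = [], False
        else:
            buf.append(ch)
        i += 1
    fields.append(("".join(buf), quoted))
    return [to_value(text, was_quoted, null_value) for text, was_quoted in fields]


def to_value(text, was_quoted, null_value):
    if text == null_value:
        return None  # matches nullValue, quoted or not
    if text == "" and not was_quoted:
        return None  # unquoted empty field
    return text      # includes a quoted empty string ""


print(parse_csv_line(',"","null"'))  # models Row(null, "", null)
```

Under these rules, setting `nullValue` to `""` would also turn the quoted empty string into null. That is one reason the interaction with #12921 matters.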
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217605162 LGTM, Merging this into master and 2.0 branch, thanks!
[GitHub] spark pull request: [SPARK-15198][SQL] Support for pushing down fi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12972#issuecomment-217605135 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58049/ Test PASSed.
[GitHub] spark pull request: [SPARK-15198][SQL] Support for pushing down fi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12972#issuecomment-217605134 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15198][SQL] Support for pushing down fi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12972#issuecomment-217605108 **[Test build #58049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58049/consoleFull)** for PR 12972 at commit [`f225fe5`](https://github.com/apache/spark/commit/f225fe598b21a44543226196fd0447117e7a34fb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217605013 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217605015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58047/ Test PASSed.
[GitHub] spark pull request: [SPARK-15125][SQL] Changing CSV data source ma...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12904#issuecomment-217604986 @rxin @sureshthalamati Do you mind holding off this change until #12921 is merged? That PR also handles `nullValue`, and I suspect `nullValue` could affect this behaviour.
[GitHub] spark pull request: [SPARK-15125][SQL] Changing CSV data source ma...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12904#discussion_r62411095 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -555,4 +558,37 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils { assert(numbers.count() == 8) } + + test("load data with empty quoted string fields.") { +val results = sqlContext + .read + .format("csv") + .options(Map( +"header" -> "true", +"nullValue" -> null, --- End diff -- Could I ask what happens if we don't set `nullValue`?
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217604963 **[Test build #58047 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58047/consoleFull)** for PR 12954 at commit [`e28bbb6`](https://github.com/apache/spark/commit/e28bbb6c90aef0a17caab5db8072327fcf93e59d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14642][SQL] import org.apache.spark.sql...
Github user sbcd90 commented on the pull request: https://github.com/apache/spark/pull/12458#issuecomment-217604922 Hello @zsxwing, I have resolved the conflicts. Please have a look.
[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12871#issuecomment-217604019 **[Test build #58055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58055/consoleFull)** for PR 12871 at commit [`4d400ca`](https://github.com/apache/spark/commit/4d400ca5e5d0b184540f8e91188daeaa1303ff5e).
[GitHub] spark pull request: [SPARK-14814][MLlib] API: Java compatibility, ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12971#issuecomment-217602686 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-217603358 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15143][SPARK-15144][SQL] Add CSV tests ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12921#issuecomment-217603504 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14814][MLlib] API: Java compatibility, ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12971#issuecomment-217602678 **[Test build #58048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58048/consoleFull)** for PR 12971 at commit [`b0ce8d9`](https://github.com/apache/spark/commit/b0ce8d97ae48e19622aa26ae52ff0600212c8e25). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-217603319 **[Test build #58046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58046/consoleFull)** for PR 12268 at commit [`f2234e3`](https://github.com/apache/spark/commit/f2234e3f7bac02c396a8638f69baab740bc83bb1). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class NoSuchPermanentFunctionException(db: String, func: String)` * `class NoSuchFunctionException(db: String, func: String)` * `case class GetExternalRowField(`
[GitHub] spark pull request: [SPARK-15143][SPARK-15144][SQL] Add CSV tests ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12921#issuecomment-217603463 **[Test build #58045 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58045/consoleFull)** for PR 12921 at commit [`1233bd7`](https://github.com/apache/spark/commit/1233bd7ce9b70aa984cc3c77ca11e1dc455e3e7e). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class NoSuchPermanentFunctionException(db: String, func: String)` * `class NoSuchFunctionException(db: String, func: String)` * `case class GetExternalRowField(`
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-217603359 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58046/ Test PASSed.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12719#discussion_r62410825 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -617,6 +618,77 @@ object NullPropagation extends Rule[LogicalPlan] { } /** + * Propagate foldable expressions: + * Replace all attributes with aliases of the original foldable expressions except the followings. + * 1) Command and Set(UNION/INTERSECT/EXCEPT): Do not optimize. --- End diff -- What will happen if we optimize them?
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12719#discussion_r62410849 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -617,6 +618,77 @@ object NullPropagation extends Rule[LogicalPlan] { } /** + * Propagate foldable expressions: + * Replace all attributes with aliases of the original foldable expressions except the followings. + * 1) Command and Set(UNION/INTERSECT/EXCEPT): Do not optimize. + * 2) Filter/Sort: Use the original foldable expressions without aliases. --- End diff -- instead of doing this, can't we always use alias and run `CleanupAlias` for the result?
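The doc comment under review distinguishes two substitutions: references in a projection keep their column name through an alias, while Filter/Sort expressions get the bare foldable expression. The toy Python model below sketches that distinction. It is a hypothetical illustration, not Catalyst code. `Lit`, `Attr`, `Alias`, `Eq` and the helper names are invented, attributes are matched by name rather than by expression ID, and the Command/Set exclusion is not modelled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:  # a foldable (constant) expression
    value: object


@dataclass(frozen=True)
class Attr:  # a reference to a column by name
    name: str


@dataclass(frozen=True)
class Alias:  # child AS name
    child: "Expr"
    name: str


@dataclass(frozen=True)
class Eq:  # left = right, e.g. a Filter condition
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Attr, Alias, Eq]


def foldable_aliases(project_list):
    """Collect `literal AS name` entries from a child projection."""
    return {e.name: e.child for e in project_list
            if isinstance(e, Alias) and isinstance(e.child, Lit)}


def substitute(expr, foldables):
    """Filter/Sort style: replace references with the bare literal."""
    if isinstance(expr, Attr) and expr.name in foldables:
        return foldables[expr.name]
    if isinstance(expr, Alias):
        return Alias(substitute(expr.child, foldables), expr.name)
    if isinstance(expr, Eq):
        return Eq(substitute(expr.left, foldables), substitute(expr.right, foldables))
    return expr


def substitute_in_project(expr, foldables):
    """Projection style: keep the output column name via an alias."""
    if isinstance(expr, Attr) and expr.name in foldables:
        return Alias(foldables[expr.name], expr.name)
    return substitute(expr, foldables)


child = [Alias(Lit(1), "a"), Attr("c")]  # models SELECT 1 AS a, c
f = foldable_aliases(child)
print(substitute_in_project(Attr("a"), f))
print(substitute(Eq(Attr("a"), Attr("c")), f))
```

The review's alternative would always call the aliasing variant and then strip aliases that are not at the top of an expression in a cleanup pass. That would reduce the two helpers to one; the sketch does not settle which design is preferable.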
[GitHub] spark pull request: [SPARK-15143][SPARK-15144][SQL] Add CSV tests ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12921#issuecomment-217603505 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58045/ Test PASSed.
[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10943#issuecomment-217603519 cc @rxin, looks like we missed this one...
[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] Reduce delay bet...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12725#issuecomment-217601995 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12966#issuecomment-217602308 **[Test build #58050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58050/consoleFull)** for PR 12966 at commit [`bf3a74d`](https://github.com/apache/spark/commit/bf3a74d34b21eaa6c3d1422c1135658d9be58a8a). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] Reduce delay bet...
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/12725#discussion_r62410545 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala --- @@ -27,12 +27,12 @@ import org.apache.spark.sql.execution.{QueryExecution, SparkPlan, SparkPlanner, * A variant of [[QueryExecution]] that allows the execution of the given [[LogicalPlan]] * plan incrementally. Possibly preserving state in between each execution. */ -class IncrementalExecution( +class IncrementalExecution private[sql]( sparkSession: SparkSession, logicalPlan: LogicalPlan, outputMode: OutputMode, checkpointLocation: String, -currentBatchId: Long) +val currentBatchId: Long) --- End diff -- expose this to tests
[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12966#issuecomment-217602310 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12966#issuecomment-217602311 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58050/ Test FAILed.
[GitHub] spark pull request: [SPARK-14814][MLlib] API: Java compatibility, ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12971#issuecomment-217602687 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58048/ Test FAILed.
[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] Reduce delay bet...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12725#issuecomment-217601994 **[Test build #58051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58051/consoleFull)** for PR 12725 at commit [`d4cd47a`](https://github.com/apache/spark/commit/d4cd47a07bfb395deee0461d0b43be0424110379). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] Reduce delay bet...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12725#issuecomment-217601996 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58051/ Test FAILed.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217602287 **[Test build #58053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58053/consoleFull)** for PR 12775 at commit [`ee34cd2`](https://github.com/apache/spark/commit/ee34cd2d67a98ef48b0453d6c0a77b88c9db12fb).
[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] Reduce delay bet...
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/12725#discussion_r62410548 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -122,7 +122,7 @@ class StreamExecution( * processing is done. Thus, the Nth record in this log indicated data that is currently being * processed and the N-1th entry indicates which offsets have been durably committed to the sink. */ - private val offsetLog = + private[sql] val offsetLog = --- End diff -- expose this to test suites
[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] Reduce delay bet...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12725#issuecomment-217602755 **[Test build #58054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58054/consoleFull)** for PR 12725 at commit [`a72423b`](https://github.com/apache/spark/commit/a72423b5aab05189c56707897fc638f4c49a3c06).
[GitHub] spark pull request: Remove unnecessary things from SparkEnv
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12970#issuecomment-217602286 **[Test build #58052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58052/consoleFull)** for PR 12970 at commit [`d1a6374`](https://github.com/apache/spark/commit/d1a6374e2ca7870527b73e14a9077b929124eec7).
[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12893#issuecomment-217602652 thanks, merging to master and 2.0!
[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12893
[GitHub] spark pull request: [SPARK-12639] [SQL] Mark Filters Fully Handled...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11317#issuecomment-217602140 @RussellSpitzer I saw you answered my ping before. Excuse my ping here again.
[GitHub] spark pull request: [SPARK-15087][MINOR][DOC] Follow Up: Fix the C...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12953#issuecomment-217602079 thanks, merging to master and 2.0!
[GitHub] spark pull request: [SPARK-15087][MINOR][DOC] Follow Up: Fix the C...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12953
[GitHub] spark pull request: [SPARK-14942][SQL][Streaming] Reduce delay bet...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12725#issuecomment-217601944 **[Test build #58051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58051/consoleFull)** for PR 12725 at commit [`d4cd47a`](https://github.com/apache/spark/commit/d4cd47a07bfb395deee0461d0b43be0424110379).
[GitHub] spark pull request: [SPARK-15173][SQL] DataFrameWriter.insertInto ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12949#discussion_r62410521 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -239,8 +239,13 @@ case class DataSource( } } - /** Create a resolved [[BaseRelation]] that can be used to read data from this [[DataSource]] */ - def resolveRelation(): BaseRelation = { + /** + * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this + * [[DataSource]] + * + * @param checkPathExist A flag to indicate whether to check the existence of path or not. + */ + def resolveRelation(checkPathExist: Boolean = true): BaseRelation = { --- End diff -- When we want to read it. But when creating a data source table, the path does not exist yet, so we should skip this check.
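A minimal sketch of what the `checkPathExist` flag buys, assuming a plain local-filesystem check with `java.nio.file.Files.exists`. `PathCheckSketch`, `SimpleRelation` and the exception type are hypothetical stand-ins, not Spark's `DataSource` code:

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical, simplified illustration of a resolveRelation-style method
// with an optional existence check; not Spark's implementation.
object PathCheckSketch {
  final case class SimpleRelation(path: Path)

  def resolveRelation(path: String, checkPathExist: Boolean = true): SimpleRelation = {
    val p = Paths.get(path)
    // Reading needs data that already exists; creating a table may point
    // at a location that has not been written yet, so the caller can opt out.
    if (checkPathExist && !Files.exists(p)) {
      throw new IllegalArgumentException(s"Path does not exist: $path")
    }
    SimpleRelation(p)
  }
}
```

In this sketch a read keeps the default (`true`), while a table-creation path would pass `checkPathExist = false`.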
[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12966#issuecomment-217601769 **[Test build #58050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58050/consoleFull)** for PR 12966 at commit [`bf3a74d`](https://github.com/apache/spark/commit/bf3a74d34b21eaa6c3d1422c1135658d9be58a8a).
[GitHub] spark pull request: [SPARK-15198][SQL] Support for pushing down fi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12972#issuecomment-217601777 **[Test build #58049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58049/consoleFull)** for PR 12972 at commit [`f225fe5`](https://github.com/apache/spark/commit/f225fe598b21a44543226196fd0447117e7a34fb).
[GitHub] spark pull request: [SPARK-14654][CORE] New accumulator API
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12612#discussion_r62410489 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -19,200 +19,106 @@ package org.apache.spark.sql.execution.metric import java.text.NumberFormat -import org.apache.spark.{Accumulable, AccumulableParam, Accumulators, SparkContext} +import org.apache.spark.{NewAccumulator, SparkContext} import org.apache.spark.scheduler.AccumulableInfo import org.apache.spark.util.Utils -/** - * Create a layer for specialized metric. We cannot add `@specialized` to - * `Accumulable/AccumulableParam` because it will break Java source compatibility. - * - * An implementation of SQLMetric should override `+=` and `add` to avoid boxing. - */ -private[sql] abstract class SQLMetric[R <: SQLMetricValue[T], T]( -name: String, -val param: SQLMetricParam[R, T]) extends Accumulable[R, T](param.zero, param, Some(name)) { - // Provide special identifier as metadata so we can tell that this is a `SQLMetric` later - override def toInfo(update: Option[Any], value: Option[Any]): AccumulableInfo = { -new AccumulableInfo(id, Some(name), update, value, true, countFailedValues, - Some(SQLMetrics.ACCUM_IDENTIFIER)) - } - - def reset(): Unit = { -this.value = param.zero - } -} - -/** - * Create a layer for specialized metric. We cannot add `@specialized` to - * `Accumulable/AccumulableParam` because it will break Java source compatibility. - */ -private[sql] trait SQLMetricParam[R <: SQLMetricValue[T], T] extends AccumulableParam[R, T] { - - /** - * A function that defines how we aggregate the final accumulator results among all tasks, - * and represent it in string for a SQL physical operator. - */ - val stringValue: Seq[T] => String - - def zero: R -} +class SQLMetric(val metricType: String, initValue: Long = 0L) extends NewAccumulator[Long, Long] { + // This is a workaround for SPARK-11013. 
+ // We may use -1 as initial value of the accumulator, if the accumulator is valid, we will + // update it at the end of task and the value will be at least 0. Then we can filter out the -1 + // values before calculate max, min, etc. + private[this] var _value = initValue -/** - * Create a layer for specialized metric. We cannot add `@specialized` to - * `Accumulable/AccumulableParam` because it will break Java source compatibility. - */ -private[sql] trait SQLMetricValue[T] extends Serializable { + override def copyAndReset(): SQLMetric = new SQLMetric(metricType, initValue) - def value: T - - override def toString: String = value.toString -} - -/** - * A wrapper of Long to avoid boxing and unboxing when using Accumulator - */ -private[sql] class LongSQLMetricValue(private var _value : Long) extends SQLMetricValue[Long] { - - def add(incr: Long): LongSQLMetricValue = { -_value += incr -this + override def merge(other: NewAccumulator[Long, Long]): Unit = other match { +case o: SQLMetric => _value += o.localValue +case _ => throw new UnsupportedOperationException( + s"Cannot merge ${this.getClass.getName} with ${other.getClass.getName}") } - // Although there is a boxing here, it's fine because it's only called in SQLListener - override def value: Long = _value - - // Needed for SQLListenerSuite - override def equals(other: Any): Boolean = other match { -case o: LongSQLMetricValue => value == o.value -case _ => false - } + override def isZero(): Boolean = _value == initValue - override def hashCode(): Int = _value.hashCode() -} + override def add(v: Long): Unit = _value += v -/** - * A specialized long Accumulable to avoid boxing and unboxing when using Accumulator's - * `+=` and `add`. 
- */ -private[sql] class LongSQLMetric private[metric](name: String, param: LongSQLMetricParam) - extends SQLMetric[LongSQLMetricValue, Long](name, param) { + def +=(v: Long): Unit = _value += v - override def +=(term: Long): Unit = { -localValue.add(term) - } + override def localValue: Long = _value - override def add(term: Long): Unit = { -localValue.add(term) + // Provide special identifier as metadata so we can tell that this is a `SQLMetric` later + private[spark] override def toInfo(update: Option[Any], value: Option[Any]): AccumulableInfo = { +new AccumulableInfo(id, name, update, value, true, true, Some(SQLMetrics.ACCUM_IDENTIFIER)) } -} - -private class LongSQLMetricParam(val
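The `initValue` workaround described in the diff comment can be illustrated with a simplified, hypothetical model of the new `SQLMetric`. It does not extend Spark's `NewAccumulator`, and the `summarize` helper is an assumption about how -1 entries might be filtered out before computing min and max:

```scala
// Hypothetical, simplified model of the SQLMetric idea; not Spark's class.
object MetricSketch {
  final class SimpleMetric(val metricType: String, initValue: Long = 0L) {
    private[this] var _value: Long = initValue

    def add(v: Long): Unit = _value += v
    def +=(v: Long): Unit = add(v)

    // Merging adds the other metric's current value, mirroring the diff.
    def merge(other: SimpleMetric): Unit = _value += other.value

    // "Zero" means "still at the initial value", which may be -1.
    def isZero: Boolean = _value == initValue

    def copyAndReset(): SimpleMetric = new SimpleMetric(metricType, initValue)

    def value: Long = _value
  }

  // Assumed post-processing step: per-task values still at -1 were never
  // updated, so they are dropped before computing min and max.
  def summarize(taskValues: Seq[Long]): Option[(Long, Long)] = {
    val valid = taskValues.filter(_ >= 0L)
    if (valid.isEmpty) None else Some((valid.min, valid.max))
  }
}
```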
[GitHub] spark pull request: [SPARK-15198][SQL] Support for pushing down fi...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12972#issuecomment-217601711 Let me cc @liancheng and also @tedyu, who suggested this change.
[GitHub] spark pull request: [SPARK-15198][SQL] Support for pushing down fi...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/12972 [SPARK-15198][SQL] Support for pushing down filters for boolean types in ORC data source ## What changes were proposed in this pull request? This PR adds support for pushing filters down for `BooleanType` in the ORC data source. It also removes the `OrcTableScan` class and its companion object, which are no longer used. ## How was this patch tested? Unit tests in `OrcFilterSuite` and `OrcQuerySuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-15198 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12972.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12972 commit ba78ce645b5bb6a7a6e07c2ffd8a1bc0bad6c55f Author: hyukjinkwon Date: 2016-05-07T02:51:09Z Support for filter push down for boolean types in ORC commit 7f99feb16d0ec83f85f78be2eb0b977c5f4cf9b0 Author: hyukjinkwon Date: 2016-05-07T02:57:40Z Remove OrcTableScan which is not used anymore
[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12899#discussion_r62410420 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -155,7 +155,13 @@ private[spark] abstract class Task[T]( */ def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = { if (context != null) { - context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues } + context.taskMetrics.internalAccums.filter { a => +// RESULT_SIZE accumulator is always zero at executor, we need to send it back as its +// value will be updated at driver side. +!a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE) + // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter --- End diff -- There are 2 concepts: 1. internal accumulators: like GCtime, resultSize, which are internal to DAGScheduler. 2. `countFailedValues` accumulator: `countFailedValues` is an internal flag that can only be set by us. All internal accumulators are `countFailedValues` accumulators, and SQLMetrics are also `countFailedValues` accumulators.
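The filter in the diff can be sketched with a hypothetical `Accum` stand-in. The string assigned to `ResultSize` is assumed to match `InternalAccumulator.RESULT_SIZE` and is used here for illustration only:

```scala
// Hypothetical sketch of the executor-side filter; not Spark's Task code.
object AccumUpdateSketch {
  // Stand-in for an accumulator, keeping only the fields the filter needs.
  final case class Accum(name: Option[String], value: Long, zeroValue: Long = 0L) {
    def isZero: Boolean = value == zeroValue
  }

  // Assumed to match InternalAccumulator.RESULT_SIZE; illustrative only.
  val ResultSize = "internal.metrics.resultSize"

  // Send an internal accumulator back if it changed on the executor, or if it
  // is the result-size accumulator, which stays zero on the executor and is
  // only filled in on the driver.
  def updatesToSend(internalAccums: Seq[Accum]): Seq[Accum] =
    internalAccums.filter(a => !a.isZero || a.name.contains(ResultSize))
}
```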
[GitHub] spark pull request: [SPARK-14814][MLlib] API: Java compatibility, ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12971#issuecomment-217601538 **[Test build #58048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58048/consoleFull)** for PR 12971 at commit [`b0ce8d9`](https://github.com/apache/spark/commit/b0ce8d97ae48e19622aa26ae52ff0600212c8e25).
[GitHub] spark pull request: [SPARK-14814][MLlib] API: Java compatibility, ...
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/12971 [SPARK-14814][MLlib] API: Java compatibility, docs ## What changes were proposed in this pull request? Fix a Java compatibility function in MLlib `DecisionTreeModel`. ## How was this patch tested? Existing unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hhbyyh/spark javacompatibility Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12971.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12971 commit b0ce8d97ae48e19622aa26ae52ff0600212c8e25 Author: Yuhao Yang Date: 2016-05-07T02:19:46Z java compatibility
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217600276 **[Test build #58047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58047/consoleFull)** for PR 12954 at commit [`e28bbb6`](https://github.com/apache/spark/commit/e28bbb6c90aef0a17caab5db8072327fcf93e59d).
[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12113
[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12113#issuecomment-21767 Merging this into master and 2.0 branch, thanks!
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62409983 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/BooleanSimplification.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.expressions.{And, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual, Not, Or, PredicateHelper} +import org.apache.spark.sql.catalyst.expressions.Literal._ +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.rules.Rule + +/** + * Simplifies boolean expressions: + * 1. Simplifies expressions whose answer can be determined without evaluating both sides. + * 2. Eliminates / extracts common factors. + * 3. Merge same expressions + * 4. Removes `Not` operator. + */ +object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper { --- End diff -- Import it into the analyzer as a temporary fix; we could think of a proper way later.
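A toy version of the kind of rewrites listed in the `BooleanSimplification` scaladoc, using a hypothetical five-node expression tree instead of Catalyst's `Expression` classes. It covers short-circuiting on literals, merging identical operands and removing double `Not`, but not factor extraction or pushing `Not` into comparisons:

```scala
// Hypothetical toy expression tree; not Catalyst's classes.
object BoolSimplifySketch {
  sealed trait Expr
  final case class Lit(v: Boolean) extends Expr
  final case class Attr(name: String) extends Expr
  final case class And(l: Expr, r: Expr) extends Expr
  final case class Or(l: Expr, r: Expr) extends Expr
  final case class Not(e: Expr) extends Expr

  // Bottom-up simplification: simplify children first, then the node.
  def simplify(e: Expr): Expr = e match {
    case And(l, r) =>
      (simplify(l), simplify(r)) match {
        case (Lit(false), _) | (_, Lit(false)) => Lit(false) // answer known from one side
        case (Lit(true), x)                     => x
        case (x, Lit(true))                     => x
        case (x, y) if x == y                   => x          // merge same expressions
        case (x, y)                             => And(x, y)
      }
    case Or(l, r) =>
      (simplify(l), simplify(r)) match {
        case (Lit(true), _) | (_, Lit(true)) => Lit(true)
        case (Lit(false), x)                 => x
        case (x, Lit(false))                 => x
        case (x, y) if x == y                => x
        case (x, y)                          => Or(x, y)
      }
    case Not(inner) =>
      simplify(inner) match {
        case Lit(v) => Lit(!v)
        case Not(x) => x // remove double negation
        case x      => Not(x)
      }
    case other => other
  }
}
```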
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62409870 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/BooleanSimplification.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.expressions.{And, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual, Not, Or, PredicateHelper} +import org.apache.spark.sql.catalyst.expressions.Literal._ +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.rules.Rule + +/** + * Simplifies boolean expressions: + * 1. Simplifies expressions whose answer can be determined without evaluating both sides. + * 2. Eliminates / extracts common factors. + * 3. Merge same expressions + * 4. Removes `Not` operator. + */ +object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper { --- End diff -- Ok, what do you suggest then?
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-217599175 **[Test build #58046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58046/consoleFull)** for PR 12268 at commit [`f2234e3`](https://github.com/apache/spark/commit/f2234e3f7bac02c396a8638f69baab740bc83bb1).
[GitHub] spark pull request: [SPARK-15143][SPARK-15144][SQL] Add CSV tests ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12921#issuecomment-217598946 **[Test build #58045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58045/consoleFull)** for PR 12921 at commit [`1233bd7`](https://github.com/apache/spark/commit/1233bd7ce9b70aa984cc3c77ca11e1dc455e3e7e).
[GitHub] spark pull request: [SPARK-14476][SQL][WIP] Improve the physical p...
Github user clockfly commented on the pull request: https://github.com/apache/spark/pull/12947#issuecomment-217598830 @yhuai This PR truncates long paths to 100 chars: https://github.com/apache/spark/pull/12947/files#diff-4b3d7a5ee80fb01203fcd345c073ae46R186
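A sketch of truncating a long path for display, assuming the cut is marked with a trailing `...`. `truncatePath` is a hypothetical helper, not the code in the linked PR; only the 100-character limit comes from the comment:

```scala
// Hypothetical helper; the "..." marker is an assumption.
object PathTruncateSketch {
  def truncatePath(path: String, maxLen: Int = 100): String =
    if (path.length <= maxLen) path // short paths are shown unchanged
    else path.take(maxLen) + "..."  // keep the first maxLen chars, mark the cut
}
```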
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62409566 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/BooleanSimplification.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.expressions.{And, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual, Not, Or, PredicateHelper} +import org.apache.spark.sql.catalyst.expressions.Literal._ +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.rules.Rule + +/** + * Simplifies boolean expressions: + * 1. Simplifies expressions whose answer can be determined without evaluating both sides. + * 2. Eliminates / extracts common factors. + * 3. Merge same expressions + * 4. Removes `Not` operator. + */ +object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper { --- End diff -- Moving this one into analysis is even worse ...
[GitHub] spark pull request: [SPARK-15197][WIP][Docs] Added Scaladoc for co...
Github user ntietz commented on the pull request: https://github.com/apache/spark/pull/12955#issuecomment-217598813 I created JIRA SPARK-15197. I think that I covered the expanded scope in there, but if I goofed and missed something feel free to update.
[GitHub] spark pull request: Remove unnecessary things from SparkEnv
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12970#issuecomment-217597674 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58044/ Test FAILed.
[GitHub] spark pull request: Remove unnecessary things from SparkEnv
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12970#issuecomment-217597666 **[Test build #58044 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58044/consoleFull)** for PR 12970 at commit [`9bcd09d`](https://github.com/apache/spark/commit/9bcd09d0198b943e7c4634b2b269b4d3c1c8a1c1). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.