[GitHub] spark pull request: [SPARK-3705][SQL]add case for VoidObjectInspec...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2552#issuecomment-57077090 I tested the timeout issue: https://github.com/apache/spark/pull/1689 led to this issue, but I have not found the root cause. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user epahomov commented on the pull request: https://github.com/apache/spark/pull/2394#issuecomment-57077001 Sorry for such a messy pull request; I didn't review my student's code closely enough. I will try my best next time. We'll fix everything by the middle of the week.
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57076884 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20928/consoleFull) for PR 2514 at commit [`6f3c302`](https://github.com/apache/spark/commit/6f3c30263560853c4cfb5b65b74bce3e39801e05). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class IndexedRecordToJavaConverter extends Converter[IndexedRecord, JMap[String, Any]]` * `class AvroWrapperToJavaConverter extends Converter[Any, Any] `
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57076888 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20928/
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-57076438 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20930/consoleFull) for PR 1290 at commit [`804c07a`](https://github.com/apache/spark/commit/804c07a3abd6a0e81d0f04b4a08f88df29cad357). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57076212 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20929/consoleFull) for PR 2542 at commit [`e9cd8be`](https://github.com/apache/spark/commit/e9cd8be5b69af54c1de3219cc8f2c0ad1718615a). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3711: Optimize where in clause filter qu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2561#issuecomment-57076163 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-3711: Optimize where in clause filter qu...
GitHub user saucam opened a pull request: https://github.com/apache/spark/pull/2561 SPARK-3711: Optimize where in clause filter queries The In case class is replaced by an InSet class when all the filter values are literals. InSet uses a hash set instead of a sequence, which gives a significant performance improvement: previously, matching against the sequence was a worst-case linear scan (the exists method), because the filter list was assumed to contain arbitrary expressions. The largest improvement should be visible when only a small percentage of a large dataset matches the filter list. You can merge this pull request into a Git repository by running: $ git pull https://github.com/saucam/spark branch-1.1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2561.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2561 commit bee98aadcea7cb8fa6402d72af45aef2a4de8c19 Author: Yash Datta Date: 2014-09-28T05:54:49Z SPARK-3711: Optimize where in clause filter queries
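The SPARK-3711 description contrasts a linear `exists` scan with a hash-set lookup. Below is a minimal sketch of that difference. It assumes the filter list contains only literal values and ignores SQL NULL semantics. The names `InVersusInSet`, `inLinear` and `InSetFilter` are hypothetical and are not Catalyst's actual `In`/`InSet` classes.

```scala
// Hypothetical, simplified model of the SPARK-3711 idea; not Catalyst code.
// SQL NULL handling for IN is deliberately ignored here.
object InVersusInSet {
  // Linear membership test: scans the whole list in the worst case, O(n) per row.
  // (In the general case each list entry would be an expression evaluated per row.)
  def inLinear(value: Any, list: Seq[Any]): Boolean =
    list.exists(_ == value)

  // Hash-based membership test: when every entry is a literal, the list can be
  // turned into a set once, so each row costs an expected O(1) lookup.
  final class InSetFilter(literals: Seq[Any]) {
    private val hset: Set[Any] = literals.toSet
    def matches(value: Any): Boolean = hset.contains(value)
  }
}
```

Both functions return the same answers. The difference is cost per row. Rows that do not match pay the full linear scan in `inLinear`, which is consistent with the PR's note that the gain is largest when few rows match.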
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57076000 Hi Matei, thanks a lot for your suggestions. I've updated the code with a fixed seed. Would you mind taking a look at this? Thanks a lot.
[GitHub] spark pull request: [SPARK-3543] remaining cleanup work.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2560#issuecomment-57075980 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20927/consoleFull) for PR 2560 at commit [`9eff95a`](https://github.com/apache/spark/commit/9eff95afe6051b264854b415b5d305dc9e4bf3ef). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57075989 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20928/consoleFull) for PR 2514 at commit [`6f3c302`](https://github.com/apache/spark/commit/6f3c30263560853c4cfb5b65b74bce3e39801e05). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3543] remaining cleanup work.
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/2560 [SPARK-3543] remaining cleanup work. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark TaskContext Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2560.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2560 commit 9eff95afe6051b264854b415b5d305dc9e4bf3ef Author: Reynold Xin Date: 2014-09-28T05:43:57Z [SPARK-3543] remaining cleanup work.
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2533
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2533#discussion_r18127631 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/ExecutorData.scala --- @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import akka.actor.{Address, ActorRef} + +/** + * Grouping of data that is accessed by a CourseGrainedScheduler. This class --- End diff -- Course -> Coarse. I will fix it during merge.
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2533#issuecomment-57075299 Thanks. Merging in master.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57075143 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/172/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3389] Add Converter for ease of Parquet...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2256
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57074968 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20926/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57074970 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20926/
[GitHub] spark pull request: [SPARK-3389] Add Converter for ease of Parquet...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2256#issuecomment-57074954 Alright, merged it. Thanks!
[GitHub] spark pull request: [SPARK-3325] Add a parameter to the method pri...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2216#issuecomment-57074623 I really wish that we could convert JavaDStreamLike / JavaRDDLike into abstract base classes instead of traits, since there's no particular reason why they should be implemented as traits (it's an unfortunate carry-over from an earlier Java API design prototype that we didn't wind up using and which nobody caught and removed before 1.0).
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57074476 @tianyi CREATE TABLE t1(x INT); CREATE TABLE t2(a STRUCT, k INT); SELECT a.x FROM t1 a JOIN t2 b ON a.x = b.k; But Hive can resolve this, as @liancheng said. What's the magic here with the `ON` clause?
[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] make i...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2524#issuecomment-57074480 It's probably easiest to move the accumulator update to TaskSetManager or to the part of DAGScheduler that reports the result to the user. It's right below the current update in the code:
```
if (!job.finished(rt.outputId)) {
  job.finished(rt.outputId) = true
  ...
```
That happens only once per task, so it's a good place to do the update for ResultTask. For ShuffleMapTask you can do it in the corresponding match statement as well.
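The suggestion is to apply accumulator updates only where a result is recorded for the first time. A minimal sketch of that "first completion wins" pattern follows. `OncePerTaskAccumulators`, `Job`, `TaskOutcome` and `Driver` are hypothetical stand-ins. The only thing taken from the comment is the `job.finished(rt.outputId)` check; the rest is not Spark's DAGScheduler code.

```scala
// Hypothetical, simplified model: accumulator updates are applied only on the
// first completion of each output partition. Names are illustrative only.
object OncePerTaskAccumulators {
  final class Job(numPartitions: Int) {
    // One flag per output partition, mirroring the `job.finished(...)` check.
    val finished: Array[Boolean] = Array.fill(numPartitions)(false)
  }

  // A completed task attempt: which partition it produced and its accumulator delta.
  final case class TaskOutcome(outputId: Int, accumUpdate: Long)

  final class Driver(numPartitions: Int) {
    val job = new Job(numPartitions)
    var accumulatorValue: Long = 0L

    def handleTaskCompletion(rt: TaskOutcome): Unit = {
      if (!job.finished(rt.outputId)) {
        job.finished(rt.outputId) = true
        // Inside this branch each partition contributes at most once, even if a
        // speculative or resubmitted attempt for it also reports back.
        accumulatorValue += rt.accumUpdate
      }
    }
  }
}
```

Reporting partition 0 twice still counts its update only once. The duplicate completion event is simply ignored, so no extra data structure is needed beyond the per-partition flag that already exists.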
[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] make i...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2524#issuecomment-57074398 I can simply monitor the accumulator update in TaskSetManager; I'm just not sure whether that fully resolves the problem.
[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] make i...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2524#issuecomment-57074388 The drawback of not de-duplicating in the shuffle stage is that it makes accumulator usage very tricky... it sounds like you are discouraged from using accumulators in a transformation, especially when the involved stage is shared by multiple jobs or your cluster is not that stable. As for adding the flag, it just gives users the flexibility to choose whether they would accept duplicate updates.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user tianyi commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57074276 @cloud-fan, I think it is reasonable to return "ambiguous references" in the case you mentioned, because we can't tell whether 'a' is a table alias or a column name. With my latest commits, Spark will return "ambiguous references" for your case.
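The ambiguity in cloud-fan's example arises because `a.x` has two valid readings: column `x` of the relation aliased `a`, or field `x` of a struct column named `a`. The sketch below enumerates both readings and reports ambiguity when more than one applies. It assumes that t2's column `a` is a struct with a field `x`, which is what the question implies. The object and its names (`DottedNameResolution`, `TableColumnRef`, `StructFieldRef`) are hypothetical; this is not Catalyst's `LogicalPlan.resolve`.

```scala
// Hypothetical, simplified resolver for a two-part name `qualifier.name`.
// Not Catalyst code; it only illustrates the two competing interpretations.
object DottedNameResolution {
  sealed trait Candidate
  final case class TableColumnRef(alias: String, column: String) extends Candidate
  final case class StructFieldRef(alias: String, column: String, field: String) extends Candidate

  // alias -> (column name -> struct field names; empty set for non-struct columns)
  type Relations = Map[String, Map[String, Set[String]]]

  def candidates(qualifier: String, name: String, rels: Relations): Seq[Candidate] = {
    // Reading 1: `qualifier` is a table alias and `name` is one of its columns.
    val asTableColumn: Seq[Candidate] =
      rels.get(qualifier).toList.collect {
        case cols if cols.contains(name) => TableColumnRef(qualifier, name)
      }
    // Reading 2: `qualifier` is a struct column of some relation and `name` is a field.
    val asStructField: Seq[Candidate] =
      rels.toList.collect {
        case (alias, cols) if cols.get(qualifier).exists(_.contains(name)) =>
          StructFieldRef(alias, qualifier, name)
      }
    asTableColumn ++ asStructField
  }

  def resolve(qualifier: String, name: String, rels: Relations): Either[String, Candidate] =
    candidates(qualifier, name, rels) match {
      case Seq(only) => Right(only)
      case Seq()     => Left(s"cannot resolve $qualifier.$name")
      case many      => Left(s"ambiguous references to $qualifier.$name: ${many.mkString(", ")}")
    }
}
```

With `t1 a(x)` and `t2 b(a STRUCT with field x, k)`, both readings match, so `resolve("a", "x", ...)` returns a `Left`. `b.k` has only one reading and resolves normally.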
[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] make i...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2524#issuecomment-57074233 Basically it would be great to get a really simple patch that *only* fixes SPARK-3628 and adds no new data structures in DAGScheduler.
[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] make i...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2524#issuecomment-57074221 Let's not de-duplicate in shuffle stages please. That complicates the patch a lot and I'm not sure why people would necessarily use it. Also, why did you add a duplicate flag to Accumulator? IMO we shouldn't expose this as an option. Again it adds complexity in what should just be a bug fix.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57073173 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/172/consoleFull) for PR 2542 at commit [`a018641`](https://github.com/apache/spark/commit/a018641924fd60ebf54c05990a10001cf9a65a0c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57073139 @liancheng Hmm... I don't have a Hive environment to test with... CREATE TABLE t1(x INT); CREATE TABLE t2(a STRUCT, k INT); SELECT a.x FROM t1 a JOIN t2 b; Without this PR, Spark SQL will report ambiguousReferences, but how does Hive resolve `a.x`? As "table a, column x" or as "table b, column a.x"? Or do we have to add the `ON` clause?
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57073055 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20925/
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57072994 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20926/consoleFull) for PR 2542 at commit [`a018641`](https://github.com/apache/spark/commit/a018641924fd60ebf54c05990a10001cf9a65a0c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18127069 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala --- @@ -73,31 +75,35 @@ case class GetItem(child: Expression, ordinal: Expression) extends Expression { /** * Returns the value of fields in the Struct `child`. */ -case class GetField(child: Expression, fieldName: String) extends UnaryExpression { +case class GetField(child: Expression, field: StructField, ordinal: Int) extends UnaryExpression { type EvaluatedType = Any def dataType = field.dataType override def nullable = child.nullable || field.nullable override def foldable = child.foldable - protected def structType = child.dataType match { -case s: StructType => s -case otherType => sys.error(s"GetField is not valid on fields of type $otherType") - } - - lazy val field = -structType.fields -.find(_.name == fieldName) -.getOrElse(sys.error(s"No such field $fieldName in ${child.dataType}")) - - lazy val ordinal = structType.fields.indexOf(field) - - override lazy val resolved = childrenResolved && child.dataType.isInstanceOf[StructType] --- End diff -- Currently all `GetField`s are resolved, because I resolve them in the `ResolveGetField` rule. If one can't be resolved (e.g. the field doesn't exist in the Struct), the rule throws an exception. That's why I removed the `resolved` field.
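The refactored `GetField` takes the resolved `field` and its `ordinal` as constructor arguments instead of computing them lazily from a name. Because of that, a constructed instance is resolved by definition, and evaluation reduces to a null check plus an index lookup. Below is a heavily simplified sketch of that shape. Rows are modeled as `Seq[Any]`, and `ResolvedGetFieldSketch` and its `BoundReference` are hypothetical stand-ins, not Catalyst's classes.

```scala
// Hypothetical, heavily simplified model of a GetField whose field and ordinal
// are fixed at construction (i.e. during analysis). Not Catalyst's classes.
object ResolvedGetFieldSketch {
  final case class StructField(name: String, nullable: Boolean)

  // A row (and a nested struct value) is modeled as a plain Seq of values.
  type Row = Seq[Any]

  sealed trait Expression { def eval(input: Row): Any }

  // Returns the input row's value at `ordinal`; here it may be a nested struct.
  final case class BoundReference(ordinal: Int) extends Expression {
    def eval(input: Row): Any = input(ordinal)
  }

  // No lazy lookup by name and no `resolved` override: the ordinal is already known.
  final case class GetField(child: Expression, field: StructField, ordinal: Int) extends Expression {
    def eval(input: Row): Any = {
      val baseValue = child.eval(input).asInstanceOf[Row]
      if (baseValue == null) null else baseValue(ordinal)
    }
    override def toString: String = s"$child.${field.name}"
  }
}
```

The design choice this illustrates is moving the name-to-ordinal lookup, and any error for a missing field, out of the expression and into the analyzer rule that builds it.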
[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18127040 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -366,7 +366,7 @@ class SqlParser extends StandardTokenParsers with PackratParsers { case base ~ _ ~ ordinal => GetItem(base, ordinal) } | (expression <~ ".") ~ ident ^^ { - case base ~ fieldName => GetField(base, fieldName) + case base ~ fieldName => UnresolvedGetField(base, fieldName) --- End diff -- Actually, I need this rule to always take effect, because I want to support other types of `GetField`, such as `GetFieldFromStruct`, `GetFieldFromArray`, etc. Anyway, I think `UnresolvedGetField` is not necessary for this PR; I will use the `resolved` field instead and move this into a related PR.
[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18126983 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala --- @@ -73,31 +75,35 @@ case class GetItem(child: Expression, ordinal: Expression) extends Expression { /** * Returns the value of fields in the Struct `child`. */ -case class GetField(child: Expression, fieldName: String) extends UnaryExpression { +case class GetField(child: Expression, field: StructField, ordinal: Int) extends UnaryExpression { type EvaluatedType = Any def dataType = field.dataType override def nullable = child.nullable || field.nullable override def foldable = child.foldable - protected def structType = child.dataType match { -case s: StructType => s -case otherType => sys.error(s"GetField is not valid on fields of type $otherType") - } - - lazy val field = -structType.fields -.find(_.name == fieldName) -.getOrElse(sys.error(s"No such field $fieldName in ${child.dataType}")) - - lazy val ordinal = structType.fields.indexOf(field) - - override lazy val resolved = childrenResolved && child.dataType.isInstanceOf[StructType] - override def eval(input: Row): Any = { val baseValue = child.eval(input).asInstanceOf[Row] if (baseValue == null) null else baseValue(ordinal) } - override def toString = s"$child.$fieldName" + override def toString = s"$child.${field.name}" +} + +object GetField { --- End diff -- I was going to put this logic into an Analyzer rule, but found that some tests depend on `GetField(child, fieldName)`, so I had to add this constructor for `GetField`. Since the two are so similar, I combined them. Maybe I should fix those tests instead?
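A companion object that keeps the old `GetField(child, fieldName)` call shape working, while the case class itself stores the resolved field and ordinal, might look roughly like this. It is a hypothetical sketch: the child expression is reduced to its struct type, rows are modelled as `Seq[Any]`, and the lookup logic is an assumption rather than the code in the PR.

```scala
object GetFieldCompanionSketch {
  final case class StructField(name: String, nullable: Boolean)
  final case class StructType(fields: Seq[StructField])

  // The primary constructor takes the already-resolved field and ordinal.
  final case class GetField(childType: StructType, field: StructField, ordinal: Int) {
    // Rows are modelled as Seq[Any]; a null row yields null, as in the diff.
    def eval(row: Seq[Any]): Any = if (row == null) null else row(ordinal)
  }

  object GetField {
    // Overload preserving the (child, fieldName) shape that existing tests call.
    def apply(childType: StructType, fieldName: String): GetField = {
      val ordinal = childType.fields.indexWhere(_.name == fieldName)
      require(ordinal >= 0, s"No such field $fieldName in $childType")
      GetField(childType, childType.fields(ordinal), ordinal)
    }
  }
}
```

The trade-off raised in the comment remains: the overload is convenient for tests, but it performs the lookup at construction time, outside any analyzer rule.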
[GitHub] spark pull request: SPARK-2761 refactor #maybeSpill into Spillable
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2416#issuecomment-57072352 BTW, abstract classes are favored over traits because traits with default implementations (as with many advanced Scala features) complicate a lot of things, especially Java interoperability and binary compatibility. I guess it might be OK in this case, because `Spillable` is not public.
[GitHub] spark pull request: SPARK-2761 refactor #maybeSpill into Spillable
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2416#issuecomment-57072266 Are we going to use this in multiple unrelated classes? As far as I can tell, this is only used for collections ...
[GitHub] spark pull request: SPARK-2761 refactor #maybeSpill into Spillable
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2416#issuecomment-57072024 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20924/
[GitHub] spark pull request: SPARK-2761 refactor #maybeSpill into Spillable
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2416#issuecomment-57072022 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20924/consoleFull) for PR 2416 at commit [`cf8be9a`](https://github.com/apache/spark/commit/cf8be9a59f1dbca3d0dcfbd973c3858b6fa50d50). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/2344#discussion_r18126862 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala --- @@ -220,20 +220,52 @@ trait HiveTypeCoercion { case a: BinaryArithmetic if a.right.dataType == StringType => a.makeCopy(Array(a.left, Cast(a.right, DoubleType))) + // we should cast all timestamp/date/string compare into string compare, + // even if both sides are of same type, as Hive use xxxwritable to compare. --- End diff -- The native `compareTo` of `java.sql.Date` is inherited from `java.util.Date`, which compares milliseconds since the epoch, but `DateWritable` compares days since the epoch, so that is where the gap comes from.
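The gap can be seen with two `java.sql.Date` values on the same calendar day but at different times: `compareTo` (inherited from `java.util.Date`) orders them by milliseconds, while a day-granularity comparison treats them as equal. The sketch below is a simplification that assumes UTC and floor-divides the millisecond value by the length of a day; it ignores any timezone adjustment `DateWritable` itself may apply, and `compareByDays` is a hypothetical helper, not a Spark or Hive API.

```scala
object DateCompareSketch {
  private val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Whole days since the epoch; floorDiv rounds pre-1970 instants down.
  // Assumption: the stored millis are interpreted as UTC.
  def daysSinceEpoch(d: java.sql.Date): Long =
    java.lang.Math.floorDiv(d.getTime, MillisPerDay)

  // Day-granularity ordering, in contrast to Date.compareTo's millisecond ordering.
  def compareByDays(a: java.sql.Date, b: java.sql.Date): Int =
    java.lang.Long.compare(daysSinceEpoch(a), daysSinceEpoch(b))
}
```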
[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...
Github user jameszhouyi commented on the pull request: https://github.com/apache/spark/pull/2444#issuecomment-57071944 Hi @pwendell, after this commit, spark-perf complains 'not found slaves' when running ./bin/run... Do we now have to copy slaves.template to slaves manually?
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user tianyi commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57071901 Hi @marmbrus. The current code still has some bugs to fix; I talked with @liancheng yesterday and will push an update later.
[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...
Github user ezhulenev commented on the pull request: https://github.com/apache/spark/pull/1717#issuecomment-57071591 @sjbrunst aargh, TwitterStreamSuite.scala:53 required adding the count parameter
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2509#discussion_r18126747 --- Diff: sbin/spark-daemon.sh --- @@ -142,8 +142,12 @@ case $startStop in spark_rotate_log "$log" echo starting $command, logging to $log -cd "$SPARK_PREFIX" -nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-class $command "$@" >> "$log" 2>&1 < /dev/null & +if [ $option == spark-submit ]; then + nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-submit --class $command \ --- End diff -- Hm, that's fair. I'd use `export` in `sbin/start-thriftserver.sh` to fix this issue (exported environment variables are accessible in bash subprocesses):
```bash
export SUBMIT_USAGE_FUNCTION=usage
exec "$FWDIR"/sbin/spark-daemon.sh spark-submit $CLASS 1
```
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2509#discussion_r18126738 --- Diff: sbin/spark-daemon.sh --- @@ -142,8 +142,12 @@ case $startStop in spark_rotate_log "$log" echo starting $command, logging to $log -cd "$SPARK_PREFIX" -nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-class $command "$@" >> "$log" 2>&1 < /dev/null & +if [ $option == spark-submit ]; then + nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-submit --class $command \ --- End diff -- Hm, that's fair. I'd use `export` in `sbin/start-thriftserver.sh` before `exec` to fix this issue (exported environment variables can be accessed by bash subprocesses):
```bash
export SUBMIT_USAGE_FUNCTION=usage
exec "$FWDIR"/bin/spark-submit --class $CLASS "${SUBMISSION_OPTS[@]}" spark-internal "${APPLICATION_OPTS[@]}"
```
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/2344#discussion_r18126726 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala --- @@ -220,20 +220,52 @@ trait HiveTypeCoercion { case a: BinaryArithmetic if a.right.dataType == StringType => a.makeCopy(Array(a.left, Cast(a.right, DoubleType))) + // we should cast all timestamp/date/string compare into string compare, + // even if both sides are of same type, as Hive use xxxwritable to compare. --- End diff -- I reconsidered this question and now think that, when comparing values of the same type, it is better not to convert them to strings but to write `compareTo` methods for them, since the native `compareTo` method of `java.sql.Date` seems inconsistent with `DateWritable`. I'll do a quick follow-up.
[GitHub] spark pull request: SPARK-2761 refactor #maybeSpill into Spillable
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/2416#issuecomment-57070773 @rxin Thanks for your feedback. I agree with almost all of your comments and have made the corresponding changes. However, I don't think it should be an abstract class. According to Programming in Scala, _To trait or not to trait_:

> If it might be reused in multiple, unrelated classes, make it a trait. Only traits can be mixed into different parts of the class hierarchy.

`Spillable` seems to fit that criterion.
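The criterion quoted above can be illustrated with a trait that carries a default `maybeSpill` implementation and is mixed into a class that already extends another base class, which an abstract class could not offer because a Scala class has only one superclass. Everything here (`Spillable`'s members, the threshold, `Named`, `SpillableSorter`) is hypothetical and much simpler than Spark's code; it also does not address rxin's point about Java interoperability and binary compatibility of traits with concrete members.

```scala
object SpillableTraitSketch {
  // Hypothetical trait with a default implementation (not Spark's actual API).
  trait Spillable {
    // Each mixing class decides how to write its data out.
    protected def spill(): Unit
    // Assumed fixed threshold, for illustration only.
    protected def memoryThreshold: Long = 1000L

    private var spills = 0
    def spillCount: Int = spills

    // Shared logic: spill once memory use exceeds the threshold.
    def maybeSpill(currentMemory: Long): Boolean =
      if (currentMemory > memoryThreshold) {
        spill()
        spills += 1
        true
      } else {
        false
      }
  }

  // An unrelated base class; a subclass can still mix in Spillable.
  class Named(val name: String)

  class SpillableSorter extends Named("sorter") with Spillable {
    protected def spill(): Unit = () // placeholder: a real sorter would write to disk
  }
}
```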
[GitHub] spark pull request: SPARK-2761 refactor #maybeSpill into Spillable
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2416#issuecomment-57070772 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20924/consoleFull) for PR 2416 at commit [`cf8be9a`](https://github.com/apache/spark/commit/cf8be9a59f1dbca3d0dcfbd973c3858b6fa50d50). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2533#discussion_r18126678 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -149,13 +144,15 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A // Make fake resource offers on all executors def makeOffers() { launchTasks(scheduler.resourceOffers( -executorHost.toArray.map {case (id, host) => new WorkerOffer(id, host, freeCores(id))})) +executorDataMap.map{ case(id, executorData) => + new WorkerOffer( id, executorData.executorHost, executorData.freeCores)}.toSeq)) --- End diff -- done
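Building one offer per executor from a single map of per-executor records, as the `makeOffers` change above does, can be sketched like this. `ExecutorData` and `WorkerOffer` here are hypothetical, reduced stand-ins (no `ActorRef`, no scheduler) so the snippet runs on its own.

```scala
import scala.collection.mutable

object MakeOffersSketch {
  // Hypothetical reduced versions of the scheduler-side classes.
  final class ExecutorData(val executorHost: String, var freeCores: Int, val totalCores: Int)
  final case class WorkerOffer(executorId: String, host: String, cores: Int)

  // One offer per registered executor, read from a single map instead of
  // combining several parallel maps keyed by executor ID.
  def makeOffers(executorDataMap: mutable.HashMap[String, ExecutorData]): Seq[WorkerOffer] =
    executorDataMap.map { case (id, executorData) =>
      WorkerOffer(id, executorData.executorHost, executorData.freeCores)
    }.toSeq
}
```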
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2533#discussion_r18126677 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -149,13 +144,15 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A // Make fake resource offers on all executors def makeOffers() { launchTasks(scheduler.resourceOffers( -executorHost.toArray.map {case (id, host) => new WorkerOffer(id, host, freeCores(id))})) +executorDataMap.map{ case(id, executorData) => --- End diff -- done
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2533#discussion_r18126669 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -179,25 +176,22 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A } } else { - freeCores(task.executorId) -= scheduler.CPUS_PER_TASK - executorActor(task.executorId) ! LaunchTask(new SerializableBuffer(serializedTask)) + val executorInfo = executorDataMap(task.executorId) + executorInfo.freeCores -= scheduler.CPUS_PER_TASK + executorInfo.executorActor ! LaunchTask(new SerializableBuffer(serializedTask)) } } } // Remove a disconnected slave from the cluster def removeExecutor(executorId: String, reason: String) { - if (executorActor.contains(executorId)) { -logInfo("Executor " + executorId + " disconnected, so removing it") -val numCores = totalCores(executorId) -executorActor -= executorId -executorHost -= executorId -addressToExecutorId -= executorAddress(executorId) -executorAddress -= executorId -totalCores -= executorId -freeCores -= executorId -totalCoreCount.addAndGet(-numCores) -scheduler.executorLost(executorId, SlaveLost(reason)) + executorDataMap.get(executorId) match { +case Some(executorInfo) => + val numCores = executorInfo.totalCores --- End diff -- expression inlined
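The `removeExecutor` rewrite replaces a `contains` check plus removals from several parallel maps with one `Option` match on a single map. A minimal sketch of that shape follows; `Backend`, `register`, and the reduced `ExecutorData` are hypothetical, and the logging and `scheduler.executorLost` call from the real method are omitted.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

object RemoveExecutorSketch {
  final class ExecutorData(val executorHost: String, var freeCores: Int, val totalCores: Int)

  // Hypothetical backend holding only the state this sketch needs.
  class Backend {
    val executorDataMap = mutable.HashMap.empty[String, ExecutorData]
    val totalCoreCount = new AtomicInteger(0)

    def register(executorId: String, data: ExecutorData): Unit = {
      executorDataMap(executorId) = data
      totalCoreCount.addAndGet(data.totalCores)
    }

    // Returns true if the executor was known and has been removed.
    def removeExecutor(executorId: String): Boolean =
      executorDataMap.get(executorId) match {
        case Some(executorInfo) =>
          // A single removal replaces the per-field map updates.
          executorDataMap -= executorId
          totalCoreCount.addAndGet(-executorInfo.totalCores)
          true
        case None =>
          false // unknown executor: nothing to remove
      }
  }
}
```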
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2533#discussion_r18126667 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -297,6 +291,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A } } + --- End diff -- removed
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2533#discussion_r18126665 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -85,16 +79,14 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A def receiveWithLogging = { case RegisterExecutor(executorId, hostPort, cores) => Utils.checkHostPort(hostPort, "Host port expected " + hostPort) -if (executorActor.contains(executorId)) { +if (executorDataMap.contains(executorId)) { sender ! RegisterExecutorFailed("Duplicate executor ID: " + executorId) } else { logInfo("Registered executor: " + sender + " with ID " + executorId) sender ! RegisteredExecutor - executorActor(executorId) = sender - executorHost(executorId) = Utils.parseHostPort(hostPort)._1 - totalCores(executorId) = cores - freeCores(executorId) = cores - executorAddress(executorId) = sender.path.address + executorDataMap.put(executorId, new ExecutorData(sender, sender.path.address, --- End diff -- Done
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2533#discussion_r18126663 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -126,8 +120,8 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A case StopExecutors => logInfo("Asking each executor to shut down") -for (executor <- executorActor.values) { - executor ! StopExecutor +for ((_,executorData) <- executorDataMap) { --- End diff -- Done
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2533#discussion_r18126658 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/ExecutorData.scala --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import akka.actor.{Address, ActorRef} + +private[cluster] class ExecutorData( + var executorActor: ActorRef, --- End diff -- Good point. All fields except `freeCores` changed to `val`s.
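After this change only `freeCores` stays mutable, since it is the only value that changes after an executor registers. A reduced, hypothetical version of that class (the `ActorRef` and `Address` fields are replaced by a host string so it compiles without Akka) shows the compiler enforcing the split; the `tryReserve` helper is an illustrative assumption, not part of the PR.

```scala
object ExecutorDataSketch {
  final class ExecutorData(
      val executorHost: String, // fixed at registration
      var freeCores: Int,       // updated as tasks launch and finish
      val totalCores: Int) {    // fixed at registration

    // Hypothetical helper: reserve cores for a task if enough are free.
    def tryReserve(cpusPerTask: Int): Boolean =
      if (freeCores >= cpusPerTask) {
        freeCores -= cpusPerTask
        true
      } else {
        false
      }
  }
  // Reassigning executorHost or totalCores would not compile, since both are vals.
}
```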
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2533#discussion_r18126545 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -104,13 +96,15 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A case StatusUpdate(executorId, taskId, state, data) => scheduler.statusUpdate(taskId, state, data.value) if (TaskState.isFinished(state)) { - if (executorActor.contains(executorId)) { -freeCores(executorId) += scheduler.CPUS_PER_TASK -makeOffers(executorId) - } else { -// Ignoring the update since we don't know about the executor. -val msg = "Ignored task status update (%d state %s) from unknown executor %s with ID %s" -logWarning(msg.format(taskId, state, sender, executorId)) + executorDataMap.get(executorId) match { +case Some(executorInfo) => + executorInfo.freeCores += scheduler.CPUS_PER_TASK + makeOffers(executorId) +case None => + // Ignoring the update since we don't know about the executor. + val msg = "Ignored task status update (%d state %s) " + --- End diff -- Done, and replaced `format` with an `s`-interpolated string.
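The `StatusUpdate` handling above returns the task's cores to a known executor and otherwise warns about an unknown one, now building the message with `s`-string interpolation instead of `String.format`. A hypothetical sketch of that branch follows; it returns the warning text instead of logging it, omits `makeOffers`, and drops the `sender` value from the message for brevity.

```scala
import scala.collection.mutable

object StatusUpdateSketch {
  final class ExecutorData(var freeCores: Int)

  // Returns None after freeing cores on a known executor, or Some(warning)
  // when the update comes from an executor we do not know about.
  def onTaskFinished(
      executorDataMap: mutable.Map[String, ExecutorData],
      executorId: String,
      taskId: Long,
      state: String,
      cpusPerTask: Int): Option[String] =
    executorDataMap.get(executorId) match {
      case Some(executorInfo) =>
        executorInfo.freeCores += cpusPerTask
        None
      case None =>
        // s-interpolation in place of "... (%d state %s) ...".format(taskId, state, ...)
        Some(s"Ignored task status update ($taskId state $state) " +
          s"from unknown executor with ID $executorId")
    }
}
```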
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2533#discussion_r18126510 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/ExecutorData.scala --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import akka.actor.{Address, ActorRef} + +private[cluster] class ExecutorData( + var executorActor: ActorRef, + var executorAddress: Address, + var executorHost: String , + var freeCores: Int, + var totalCores: Int +) {} --- End diff -- done
[GitHub] spark pull request: SPARK-CORE [SPARK-3651] Group common CoarseGra...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2533#discussion_r18126499 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/ExecutorData.scala --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import akka.actor.{Address, ActorRef} + +private[cluster] class ExecutorData( --- End diff -- Done
[GitHub] spark pull request: [SPARK-3705][SQL]add case for VoidObjectInspec...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2552#issuecomment-57069344 @marmbrus, it seems the tests for all SQL PRs timed out.
[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57068065 Hm, this exclusion might not work when a class is changed to an interface. Maybe also add the specific recommended exclusion here:
```
ProblemFilters.exclude[IncompatibleTemplateDefProblem]("org.apache.spark.scheduler.TaskLocation")
```
Once this passes tests, LGTM.
[GitHub] spark pull request: [SPARK-3543] Clean up Java TaskContext impleme...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2557#issuecomment-57067407 @rxin The [block that sets this message](https://github.com/apache/spark/blob/5b922bb458e863f5be0ae68167de882743f70b86/dev/run-tests-jenkins#L89) is driven by an environment variable set outside the scope of the Jenkins script in this repo, so that variable must be mis-set somehow. Perhaps it's related to the work @JoshRosen said he would be doing with the AMPLab team this weekend to fix the double-posting of test result messages to GitHub?
[GitHub] spark pull request: [SPARK-3543] Clean up Java TaskContext impleme...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2557
[GitHub] spark pull request: [SPARK-3543] Clean up Java TaskContext impleme...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2557#issuecomment-57066811 OK, merging this now.
[GitHub] spark pull request: [SPARK-3543] Clean up Java TaskContext impleme...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2557#issuecomment-57066786 @nchammas, any idea why it says this does not merge cleanly even though it does?
[GitHub] spark pull request: [SPARK-3704][SQL]ColumnValue types not match i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2551#issuecomment-57066017 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20923/
[GitHub] spark pull request: [SPARK-3705][SQL]add case for VoidObjectInspec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2552#issuecomment-57066008 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20922/
[GitHub] spark pull request: [SPARK-3704][SQL]ColumnValue types not match i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2551#issuecomment-57066016 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20923/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3705][SQL]add case for VoidObjectInspec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2552#issuecomment-57066007 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20922/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57065881 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20921/
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57065879 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20921/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3707] [SQL] Fix bug of type coercion in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2559#issuecomment-57065586 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/170/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: SPARK-3699: SQL and Hive console tasks now cle...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2547#issuecomment-57064275 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/171/consoleFull) for PR 2547 at commit [`d5e431f`](https://github.com/apache/spark/commit/d5e431f0a1b9047a5afc27cb371dbfb7014fb6e0).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3681] [SQL] [PySpark] fix serialization...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2526
[GitHub] spark pull request: [SPARK-3681] [SQL] [PySpark] fix serialization...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2526#issuecomment-57062871 Thanks! Merged to master.
[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2543#issuecomment-57062687 Thanks for working on this! A few minor comments.
[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18125297
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala ---
@@ -73,31 +75,35 @@ case class GetItem(child: Expression, ordinal: Expression) extends Expression {
 /**
  * Returns the value of fields in the Struct `child`.
  */
-case class GetField(child: Expression, fieldName: String) extends UnaryExpression {
+case class GetField(child: Expression, field: StructField, ordinal: Int) extends UnaryExpression {
   type EvaluatedType = Any
   def dataType = field.dataType
   override def nullable = child.nullable || field.nullable
   override def foldable = child.foldable
-  protected def structType = child.dataType match {
-    case s: StructType => s
-    case otherType => sys.error(s"GetField is not valid on fields of type $otherType")
-  }
-
-  lazy val field =
-    structType.fields
-      .find(_.name == fieldName)
-      .getOrElse(sys.error(s"No such field $fieldName in ${child.dataType}"))
-
-  lazy val ordinal = structType.fields.indexOf(field)
-
-  override lazy val resolved = childrenResolved && child.dataType.isInstanceOf[StructType]
--- End diff --
Here we can check to see if the field actually exists in Struct, otherwise `resolved = false`
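A minimal sketch of the check suggested in this comment, using toy stand-ins rather than Catalyst's real `Expression`, `StructType`, and `StructField` classes (every name below is a hypothetical simplification, and `GetField` here takes the child's type directly instead of a child expression): the expression only reports `resolved = true` when its child is a struct that actually contains the requested field, not merely when the child is some `StructType`.

```scala
// Toy stand-ins for Catalyst types; not Spark's real classes.
trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case class StructField(name: String, dataType: DataType, nullable: Boolean = true)
case class StructType(fields: Seq[StructField]) extends DataType

// Hypothetical, simplified GetField keyed by the child's type and a field name.
case class GetField(childType: DataType, fieldName: String) {
  // Resolved only if the child is a struct that really has this field.
  lazy val resolved: Boolean = childType match {
    case StructType(fields) => fields.exists(_.name == fieldName)
    case _                  => false
  }
}

val person = StructType(Seq(StructField("name", StringType), StructField("age", IntegerType)))
println(GetField(person, "age").resolved)       // true
println(GetField(person, "height").resolved)    // false: no such field
println(GetField(IntegerType, "age").resolved)  // false: child is not a struct
```

Leaving the expression unresolved, instead of calling `sys.error` from a lazy val, lets the analyzer decide how to report the problem.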
[GitHub] spark pull request: [SPARK-3704][SQL]ColumnValue types not match i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2551#issuecomment-57062670 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20923/consoleFull) for PR 2551 at commit [`08bcc59`](https://github.com/apache/spark/commit/08bcc5965fc17ac0e797fb501e815b71e5b2b64e).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18125293
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -118,6 +119,19 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
   }
   /**
+   * Replaces [[UnresolvedGetField]]s with concrete [[GetField]]
+   */
+  object ResolveGetField extends Rule[LogicalPlan] {
--- End diff --
Rules aren't that much overhead. I think this is good :)
[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18125291
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -366,7 +366,7 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
         case base ~ _ ~ ordinal => GetItem(base, ordinal)
       } |
       (expression <~ ".") ~ ident ^^ {
-        case base ~ fieldName => GetField(base, fieldName)
+        case base ~ fieldName => UnresolvedGetField(base, fieldName)
--- End diff --
Instead of creating a new type of `GetField`, can we just use the `resolved` field in the existing one to determine when the rule needs to take action?
[GitHub] spark pull request: [SPARK-3705][SQL]add case for VoidObjectInspec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2552#issuecomment-57062663 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20922/consoleFull) for PR 2552 at commit [`453d892`](https://github.com/apache/spark/commit/453d892242cdacebf383bc3a2d61c351ad0b8c37).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18125290
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala ---
@@ -73,31 +75,35 @@ case class GetItem(child: Expression, ordinal: Expression) extends Expression {
 /**
  * Returns the value of fields in the Struct `child`.
  */
-case class GetField(child: Expression, fieldName: String) extends UnaryExpression {
+case class GetField(child: Expression, field: StructField, ordinal: Int) extends UnaryExpression {
   type EvaluatedType = Any
   def dataType = field.dataType
   override def nullable = child.nullable || field.nullable
   override def foldable = child.foldable
-  protected def structType = child.dataType match {
-    case s: StructType => s
-    case otherType => sys.error(s"GetField is not valid on fields of type $otherType")
-  }
-
-  lazy val field =
-    structType.fields
-      .find(_.name == fieldName)
-      .getOrElse(sys.error(s"No such field $fieldName in ${child.dataType}"))
-
-  lazy val ordinal = structType.fields.indexOf(field)
-
-  override lazy val resolved = childrenResolved && child.dataType.isInstanceOf[StructType]
-
   override def eval(input: Row): Any = {
     val baseValue = child.eval(input).asInstanceOf[Row]
     if (baseValue == null) null else baseValue(ordinal)
   }
-  override def toString = s"$child.$fieldName"
+  override def toString = s"$child.${field.name}"
+}
+
+object GetField {
--- End diff --
If possible, I think it might be clearer to keep the resolver logic in the Analyzer rule.
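One way the resolver logic could live in an analyzer-style rule instead of a `GetField` companion object, sketched with toy types. `Resolver`, `caseSensitiveResolution`, `caseInsensitiveResolution`, and `resolveGetField` are hypothetical names for this illustration, not the PR's code. The rule looks the field up by name through a pluggable comparison, so case sensitivity is decided in one place, and then fixes the concrete field and its ordinal.

```scala
// Toy stand-ins; not Catalyst's real classes.
case class StructField(name: String, dataType: String)
case class StructType(fields: Seq[StructField])

// Parser output: only the field name is known.
case class UnresolvedGetField(child: StructType, fieldName: String)
// Rule output: the concrete field and its ordinal are fixed.
case class GetField(child: StructType, field: StructField, ordinal: Int)

// A resolver decides whether a declared name matches a referenced name.
type Resolver = (String, String) => Boolean
val caseSensitiveResolution: Resolver = (a, b) => a == b
val caseInsensitiveResolution: Resolver = (a, b) => a.equalsIgnoreCase(b)

// All name matching stays in the rule, which is handed the resolver
// chosen from configuration (e.g. a case-sensitivity flag).
def resolveGetField(u: UnresolvedGetField, resolver: Resolver): Either[String, GetField] =
  u.child.fields.zipWithIndex.filter { case (f, _) => resolver(f.name, u.fieldName) } match {
    case Seq((field, ordinal)) => Right(GetField(u.child, field, ordinal))
    case Seq()                 => Left(s"No such field ${u.fieldName}")
    case _                     => Left(s"Ambiguous reference to field ${u.fieldName}")
  }

val schema = StructType(Seq(StructField("Name", "string"), StructField("age", "int")))
// Left: "Name" does not match "name" when matching is case-sensitive.
println(resolveGetField(UnresolvedGetField(schema, "name"), caseSensitiveResolution))
// Right: matches field "Name" at ordinal 0 when matching ignores case.
println(resolveGetField(UnresolvedGetField(schema, "name"), caseInsensitiveResolution))
```

Under case-insensitive matching, two fields differing only in case make a reference ambiguous, which is why the sketch collects all matches instead of taking the first.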
[GitHub] spark pull request: [SPARK-3680][SQL] Fix bug caused by eager typi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2525
[GitHub] spark pull request: [SPARK-3704][SQL]ColumnValue types not match i...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2551#issuecomment-57062542 ok to test
[GitHub] spark pull request: [SPARK-3676][SQL]spark sql hive test suite fai...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2517
[GitHub] spark pull request: [SPARK-3705][SQL]add case for VoidObjectInspec...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2552#issuecomment-57062534 ok to test
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57062530 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20921/consoleFull) for PR 2542 at commit [`252dbbb`](https://github.com/apache/spark/commit/252dbbbdaab653a94aa784873ac362b4422494e1).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3676][SQL]spark sql hive test suite fai...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2517#issuecomment-57062523 Thanks! I've merged this to master.
[GitHub] spark pull request: SPARK-3699: SQL and Hive console tasks now cle...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2547#issuecomment-57062442 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/171/consoleFull) for PR 2547 at commit [`d5e431f`](https://github.com/apache/spark/commit/d5e431f0a1b9047a5afc27cb371dbfb7014fb6e0).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57062370 ok to test
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-57062345 Sorry for the delay; this week has been very busy! I'd like to merge this soon, and I have only one small question.
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/2344#discussion_r18125250
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
@@ -220,20 +220,52 @@ trait HiveTypeCoercion {
       case a: BinaryArithmetic if a.right.dataType == StringType =>
         a.makeCopy(Array(a.left, Cast(a.right, DoubleType)))
+      // we should cast all timestamp/date/string compare into string compare,
+      // even if both sides are of same type, as Hive use xxxwritable to compare.
--- End diff --
Can you explain this more? It seems more expensive to convert to strings and then compare strings than to just compare the underlying types. What do writables have to do with this?
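The ordering side of the trade-off questioned here can be checked with a small sketch that uses `java.time` rather than Hive's writables or Spark's `Cast` (it says nothing about what Hive actually does internally). Comparing formatted strings agrees with comparing the dates themselves only for a fixed-width, most-significant-field-first format such as ISO-8601 `yyyy-MM-dd` with four-digit years, and it adds a formatting step to every comparison. The `M/d/yyyy` pattern is an assumed counterexample.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Compare the dates directly.
def nativeLess(a: LocalDate, b: LocalDate): Boolean = a.isBefore(b)

// Format both sides, then compare the strings lexicographically.
def stringLess(a: LocalDate, b: LocalDate, fmt: DateTimeFormatter): Boolean =
  fmt.format(a).compareTo(fmt.format(b)) < 0

val iso = DateTimeFormatter.ISO_LOCAL_DATE         // e.g. 2014-02-01
val us  = DateTimeFormatter.ofPattern("M/d/yyyy") // e.g. 2/1/2014

val feb = LocalDate.of(2014, 2, 1)
val oct = LocalDate.of(2014, 10, 1)

println(nativeLess(feb, oct))       // true
println(stringLess(feb, oct, iso))  // true:  "2014-02-01" < "2014-10-01"
println(stringLess(feb, oct, us))   // false: "2/1/2014" sorts after "10/1/2014"
```

Even where the string order is correct, each comparison pays for two formatting calls and string allocations, which is the extra cost the comment points out.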
[GitHub] spark pull request: [SPARK-3707] [SQL] Fix bug of type coercion in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2559#issuecomment-57062261 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/170/consoleFull) for PR 2559 at commit [`199a85d`](https://github.com/apache/spark/commit/199a85d2e7ef482f3c0ac9cacc4dbeb2a21d5901).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1717#issuecomment-57060931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20920/
[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1717#issuecomment-57060929 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20920/consoleFull) for PR 1717 at commit [`4b5b09d`](https://github.com/apache/spark/commit/4b5b09d5a70a120ebd8f9f13ea3ba77611d06b10).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class BoundingBox(west: Double, south: Double, east: Double, north: Double)`
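A minimal sketch of what the four fields of `BoundingBox` describe, assuming `west`/`east` are longitudes, `south`/`north` are latitudes, and the box does not cross the antimeridian. The `contains` method and the sample coordinates are hypothetical additions for illustration and are not part of the class listed in this report.

```scala
// Same fields as the listed BoundingBox; `contains` is a hypothetical helper.
case class BoundingBox(west: Double, south: Double, east: Double, north: Double) {
  // Assumption: the box does not cross the antimeridian, so west <= east.
  require(west <= east, "west must not exceed east")
  require(south <= north, "south must not exceed north")

  // Longitude is checked against west/east, latitude against south/north.
  def contains(longitude: Double, latitude: Double): Boolean =
    longitude >= west && longitude <= east && latitude >= south && latitude <= north
}

// Rough box around the San Francisco Bay Area (illustrative numbers).
val bayArea = BoundingBox(west = -122.75, south = 36.8, east = -121.75, north = 37.8)
println(bayArea.contains(-122.42, 37.77)) // true: downtown San Francisco
println(bayArea.contains(-73.99, 40.73))  // false: New York City
```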
[GitHub] spark pull request: [SPARK-3325] Add a parameter to the method pri...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2216#issuecomment-57060863 Yeah, so I looked into it a bit more, and since `JavaDStream` extends `JavaDStreamLike`, this will break user code with custom DStreams. The issue is that under the hood those user classes have been compiled to implement an interface called `JavaDStreamLike`, and older ones won't have the forwarder method for the method newly added to the interface. In this case I think there is a straightforward workaround: just add `print(num)` to the concrete classes `JavaDStream` and `JavaPairDStream`. It will duplicate some code, both with `print` and between the two classes, but it will work.
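A toy sketch of the suggested workaround. The class names mirror the discussion but are heavily simplified, hypothetical stand-ins, not Spark's real `JavaDStreamLike`, `JavaDStream`, or `JavaPairDStream`. The trait keeps its existing members, and the new `print(num)` overload is defined only on the concrete classes. The shared `PrintHelper` object is an assumed way to limit the duplicated code; it is not part of the original proposal. The binary break itself only appears when classes are compiled separately, so a single-file example cannot reproduce it.

```scala
// Hypothetical stand-ins; not Spark's real streaming classes.

// Shared logic in a standalone object, so nothing new is added to the trait.
object PrintHelper {
  def show(elements: Seq[Any], num: Int): Unit = elements.take(num).foreach(println)
}

trait JavaDStreamLike {
  def elements: Seq[Any]
  // Existing method, left unchanged: the trait gains no new members,
  // so classes compiled against its old version are unaffected.
  def print(): Unit = PrintHelper.show(elements, 10)
}

// The new overload lives only on the concrete classes.
class JavaDStream(val elements: Seq[Any]) extends JavaDStreamLike {
  def print(num: Int): Unit = PrintHelper.show(elements, num)
}

class JavaPairDStream(val elements: Seq[Any]) extends JavaDStreamLike {
  def print(num: Int): Unit = PrintHelper.show(elements, num)
}

new JavaDStream(Seq(1, 2, 3)).print(2)               // prints 1 and 2
new JavaPairDStream(Seq(("a", 1), ("b", 2))).print() // prints (a,1) and (b,2)
```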
[GitHub] spark pull request: [SPARK-3325] Add a parameter to the method pri...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2216#issuecomment-57060715 Ah, I see. I thought this patch was modifying an abstract class rather than a trait. This is not an error in the compatibility checker; it's a legitimate break. Because of the way traits work in Scala, you cannot add a new method to a trait without breaking binary compatibility, even if the method has a default implementation. In that regard a trait behaves more like an interface. For this reason we usually try to avoid traits for public-facing things that could be implemented as abstract classes. However, it will only break if someone outside of Spark has written a class that extends this trait directly or indirectly. @JoshRosen, is this trait ever meant to be used outside of Spark? You can look at slide 10 here to see why: http://www.slideshare.net/mircodotta/managing-binary-compatibility-in-scala
[GitHub] spark pull request: [SPARK-3478] [PySpark] Profile the Python task...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2556#issuecomment-57060226 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/169/consoleFull) for PR 2556 at commit [`e68df5a`](https://github.com/apache/spark/commit/e68df5a2ada0044f76d748f4e5dd250a1928812b).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1717#issuecomment-57059419 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20920/consoleFull) for PR 1717 at commit [`4b5b09d`](https://github.com/apache/spark/commit/4b5b09d5a70a120ebd8f9f13ea3ba77611d06b10). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...
Github user sjbrunst commented on the pull request: https://github.com/apache/spark/pull/1717#issuecomment-57059353 @ezhulenev I've rolled back those changes now. Thanks!
[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...
Github user ezhulenev commented on the pull request: https://github.com/apache/spark/pull/1717#issuecomment-57059126 @sjbrunst you need to roll back your changes in TwitterAlgebirdCMD & TwitterAlgebirdHLL (remove the Nil passed for locations); after that the project will compile and it should pass all tests. I tried it locally but didn't commit to my repo.