[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...
Github user clarkfitzg commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r77763275 --- Diff: R/pkg/R/utils.R --- @@ -697,3 +697,18 @@ is_master_local <- function(master) { is_sparkR_shell <- function() { grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE) } + +# rbind a list of rows with raw (binary) columns +# +# @param inputData a list of rows, with each row a list +# @return data.frame with raw columns as lists +rbindRaws <- function(inputData){ + row1 <- inputData[[1]] + rawcolumns <- ("raw" == sapply(row1, class)) + + listmatrix <- do.call(rbind, inputData) --- End diff -- Since everything in `inputData` is a list, this goes straight to the top of the hierarchy, the same as if you called `rbind(list1, list2, ...)`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and use Pat...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14960 Yeap, I quickly fixed and re-ran :). Thanks!
[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r77763061 --- Diff: R/pkg/R/utils.R --- @@ -697,3 +697,18 @@ is_master_local <- function(master) { is_sparkR_shell <- function() { grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE) } + +# rbind a list of rows with raw (binary) columns +# +# @param inputData a list of rows, with each row a list +# @return data.frame with raw columns as lists +rbindRaws <- function(inputData){ + row1 <- inputData[[1]] + rawcolumns <- ("raw" == sapply(row1, class)) + + listmatrix <- do.call(rbind, inputData) --- End diff -- Ah I see - the types are inside the `listmatrix`. Thanks @clarkfitzg for clarifying. Let us know once you have added the test for a single column of raw as well.
[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r77762907 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala --- @@ -280,6 +280,29 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru } /** + * Extracts the [[StructField]] with the given name recursively. + * + * @throws IllegalArgumentException if the parent field's type is not StructType + */ + def getFieldRecursively(name: String): StructField = { --- End diff -- I think there's another way to solve this problem; it may be better to generate the final StructType in FileSourceStrategy. I'll try it and open another PR later.
[GitHub] spark issue #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and use Pat...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14960 seems to fail to build:
```
[INFO] Compiling 468 Scala sources and 74 Java sources to C:\projects\spark\core\target\scala-2.11\classes...
[ERROR] C:\projects\spark\core\src\main\scala\org\apache\spark\SparkContext.scala:995: type mismatch;
 found   : org.apache.spark.SparkConf
 required: org.apache.hadoop.conf.Configuration
[ERROR]     FileSystem.getLocal(conf)
```
[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r77762689 --- Diff: R/pkg/R/utils.R --- @@ -697,3 +697,18 @@ is_master_local <- function(master) { is_sparkR_shell <- function() { grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE) } + +# rbind a list of rows with raw (binary) columns +# +# @param inputData a list of rows, with each row a list +# @return data.frame with raw columns as lists +rbindRaws <- function(inputData){ + row1 <- inputData[[1]] + rawcolumns <- ("raw" == sapply(row1, class)) + + listmatrix <- do.call(rbind, inputData) --- End diff -- I think the correct class is maintained:
```
> sapply(listmatrix, class)
[1] "integer"   "integer"   "raw"       "raw"       "character" "character"
> sapply(listmatrix, typeof)
[1] "integer"   "integer"   "raw"       "raw"       "character" "character"
```
[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r77762250 --- Diff: R/pkg/R/utils.R --- @@ -697,3 +697,18 @@ is_master_local <- function(master) { is_sparkR_shell <- function() { grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE) } + +# rbind a list of rows with raw (binary) columns +# +# @param inputData a list of rows, with each row a list +# @return data.frame with raw columns as lists +rbindRaws <- function(inputData){ + row1 <- inputData[[1]] + rawcolumns <- ("raw" == sapply(row1, class)) + + listmatrix <- do.call(rbind, inputData) --- End diff -- I was looking at https://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html specifically the section `Value` which says ``` The type of a matrix result determined from the highest type of any of the inputs in the hierarchy raw < logical < integer < double < complex < character < list . ``` ---
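[Editor's note] The coercion hierarchy quoted above can be checked directly in an R session. This is a standalone illustration, not code from the PR; the variable names (`b`, `rows`) are invented for the example:

```r
# Per ?cbind, the result type of rbind/cbind is the highest input type in
# the hierarchy: raw < logical < integer < double < complex < character < list.
b <- serialize(1:10, NULL)  # a raw vector

# Atomic inputs: raw/integer promoted to character when mixed with strings.
m1 <- rbind(c(1L, 2L), c("a", "b"))
stopifnot(typeof(m1) == "character")

# List inputs (the rbindRaws case): everything is promoted to the top of
# the hierarchy, giving a "list matrix" whose cells keep their own types.
rows <- list(list(1L, b, "a"), list(2L, b, "b"))
m2 <- do.call(rbind, rows)
stopifnot(class(m2)[1] == "matrix", typeof(m2) == "list")
stopifnot(class(m2[[1, 2]]) == "raw")  # the raw column survives intact
```

This is why the list-of-lists input never silently coerces the raw columns: promotion to `list` is lossless, unlike promotion to `character`.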
[GitHub] spark issue #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and use Pat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14960 **[Test build #65026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65026/consoleFull)** for PR 14960 at commit [`41aaaf1`](https://github.com/apache/spark/commit/41aaaf127e949af7563024c1584567a177295409).
[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r77762149 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -571,6 +571,44 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext } } + test("SPARK-4502 parquet nested fields pruning") { +// Schema of "test-data/nested-array-struct.parquet": +//root +//|-- primitive: integer (nullable = true) +//|-- myComplex: array (nullable = true) +//||-- element: struct (containsNull = true) +//|||-- id: integer (nullable = true) +//|||-- repeatedMessage: array (nullable = true) +//||||-- element: struct (containsNull = true) +//|||||-- someId: integer (nullable = true) +val df = readResourceParquetFile("test-data/nested-array-struct.parquet") --- End diff -- Ah, I missed that. Sorry.
[GitHub] spark issue #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and use Pat...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14960 I re-ran the test after this commit: https://ci.appveyor.com/project/HyukjinKwon/spark/build/81-SPARK-17339-fix-r Let's wait and see :)
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77761938 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/TempViewManager.scala --- @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.catalog + +import javax.annotation.concurrent.GuardedBy + +import scala.collection.mutable + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.analysis.TempViewAlreadyExistsException +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.util.StringUtils + + +/** + * A thread-safe manager for a list of temp views, providing atomic operations to manage temp views. --- End diff -- In the description of `TempViewManager`, could we mention that the name of a temp view is always case-sensitive? The caller is responsible for handling case-related issues.
[GitHub] spark issue #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and use Pat...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14960 @sarutak Ah, I will do this here. Thanks!
[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r77761381 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala --- @@ -280,6 +280,29 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru } /** + * Extracts the [[StructField]] with the given name recursively. + * + * @throws IllegalArgumentException if the parent field's type is not StructType + */ + def getFieldRecursively(name: String): StructField = { --- End diff -- I think I understood how it works. My point is, this is a Parquet-specific problem, not related to the Catalyst module. I don't see any reason that this method should be exposed. I believe we can do this without modifying the column names (not even for temporary use).
[GitHub] spark issue #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and use Pat...
Github user sarutak commented on the issue: https://github.com/apache/spark/pull/14960 I found we can replace `FileSystem.get` in `SparkContext#hadoopFile` and `SparkContext#newAPIHadoopFile` with `FileSystem.getLocal`, like `SparkContext#hadoopRDD`; once they are replaced, we need not discuss the case of a comma-separated file list. @HyukjinKwon You can replace them in this PR or leave that for a follow-up.
[GitHub] spark issue #14623: [SPARK-17044][SQL] Make test files for window functions ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14623 **[Test build #65025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65025/consoleFull)** for PR 14623 at commit [`0a28fd6`](https://github.com/apache/spark/commit/0a28fd6d559f36a3ec68cd4c195db5ebf568e67b).
[GitHub] spark issue #14527: [SPARK-16938][SQL] `drop/dropDuplicate` should handle th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14527 **[Test build #65024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65024/consoleFull)** for PR 14527 at commit [`0970781`](https://github.com/apache/spark/commit/0970781b5d3b92fee0546d4bb9cb6a029fb9888e).
[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...
Github user clarkfitzg commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r77760776 --- Diff: R/pkg/R/utils.R --- @@ -697,3 +697,18 @@ is_master_local <- function(master) { is_sparkR_shell <- function() { grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE) } + +# rbind a list of rows with raw (binary) columns +# +# @param inputData a list of rows, with each row a list +# @return data.frame with raw columns as lists +rbindRaws <- function(inputData){ + row1 <- inputData[[1]] + rawcolumns <- ("raw" == sapply(row1, class)) + + listmatrix <- do.call(rbind, inputData) --- End diff --
```
> b = serialize(1:10, NULL)
> inputData = list(list(1L, b, 'a'), list(2L, b, 'b'))  # Mixed data types
> listmatrix <- do.call(rbind, inputData)
> listmatrix
     [,1] [,2]   [,3]
[1,] 1    Raw,62 "a"
[2,] 2    Raw,62 "b"
> class(listmatrix)
[1] "matrix"
> typeof(listmatrix)
[1] "list"
> is.character(listmatrix)
[1] FALSE
```
A little unusual: it's a list matrix. Hence the name. Which docs are you referring to? The test that's in here now does test for mixed columns, but it doesn't test for a single column of raws. I'll add that now.
[GitHub] spark issue #14991: [SPARK-17427][SQL] function SIZE should return -1 when p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14991 **[Test build #65021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65021/consoleFull)** for PR 14991 at commit [`1ccbe6b`](https://github.com/apache/spark/commit/1ccbe6bd41b1e60ea62a157771d4b3ca37f8678f).
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14426 **[Test build #65023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65023/consoleFull)** for PR 14426 at commit [`0d19e28`](https://github.com/apache/spark/commit/0d19e28a8c53c83b3ca45ef3498f5faf9894c11c).
[GitHub] spark issue #14990: [SPARK-17426][SQL] Refactor `TreeNode.toJSON` to avoid O...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14990 **[Test build #65022 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65022/consoleFull)** for PR 14990 at commit [`33983e5`](https://github.com/apache/spark/commit/33983e5771f5d00dc3d5a97adfa23003e76f94c2).
[GitHub] spark pull request #14991: [SPARK-17427][SQL] function SIZE should return -1...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/14991 [SPARK-17427][SQL] function SIZE should return -1 when parameter is null ## What changes were proposed in this pull request? `select size(null)` returns -1 in Hive. In order to be compatible, we should return `-1`. ## How was this patch tested? Unit tests in `CollectionFunctionsSuite` and `DataFrameFunctionsSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark size Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14991.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14991 commit 1ccbe6bd41b1e60ea62a157771d4b3ca37f8678f Author: Daoyuan Wang Date: 2016-09-07T04:52:58Z size(null)=-1
[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r77760397 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala --- @@ -280,6 +280,29 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru } /** + * Extracts the [[StructField]] with the given name recursively. + * + * @throws IllegalArgumentException if the parent field's type is not StructType + */ + def getFieldRecursively(name: String): StructField = { --- End diff -- The mark of nested fields is temporary data; eventually it is converted to a pruned StructType and passed to `org.apache.spark.sql.parquet.row.requested_schema`.
[GitHub] spark issue #14962: [SPARK-17402][SQL] separate the management of temp views...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14962 Found a common bug in the following ALTER TABLE commands: ``` | ALTER TABLE tableIdentifier (partitionSpec)? SET SERDE STRING (WITH SERDEPROPERTIES tablePropertyList)? #setTableSerDe | ALTER TABLE tableIdentifier (partitionSpec)? SET SERDEPROPERTIES tablePropertyList #setTableSerDe | ALTER TABLE tableIdentifier ADD (IF NOT EXISTS)? partitionSpecLocation+ #addTablePartition | ALTER VIEW tableIdentifier ADD (IF NOT EXISTS)? partitionSpec+ #addTablePartition | ALTER TABLE tableIdentifier from=partitionSpec RENAME TO to=partitionSpec #renameTablePartition | ALTER TABLE tableIdentifier DROP (IF EXISTS)? partitionSpec (',' partitionSpec)* PURGE? #dropTablePartitions | ALTER VIEW tableIdentifier DROP (IF EXISTS)? partitionSpec (',' partitionSpec)* #dropTablePartitions | ALTER TABLE tableIdentifier partitionSpec? SET locationSpec #setTableLocation | ALTER TABLE tableIdentifier RECOVER PARTITIONS #recoverPartitions ``` We need to issue an exception when the tableType is `VIEW`. This is not introduced by this PR. Should we fix it here, or create a separate PR?
[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r77760264 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -571,6 +571,44 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext } } + test("SPARK-4502 parquet nested fields pruning") { +// Schema of "test-data/nested-array-struct.parquet": +//root +//|-- primitive: integer (nullable = true) +//|-- myComplex: array (nullable = true) +//||-- element: struct (containsNull = true) +//|||-- id: integer (nullable = true) +//|||-- repeatedMessage: array (nullable = true) +//||||-- element: struct (containsNull = true) +//|||||-- someId: integer (nullable = true) +val df = readResourceParquetFile("test-data/nested-array-struct.parquet") --- End diff -- https://github.com/apache/spark/blob/master/sql/core/src/test/resources/test-data/nested-array-struct.parquet I reuse this file to test nested structs in Parquet; it is in sql/core/src/test/resources/test-data/.
[GitHub] spark issue #14990: [SPARK-17426][SQL] Refactor `TreeNode.toJSON` to avoid O...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14990 **[Test build #65020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65020/consoleFull)** for PR 14990 at commit [`284d780`](https://github.com/apache/spark/commit/284d780c446b3f7cb59f8a2f34c522d90bc43fe1).
[GitHub] spark pull request #14990: [SPARK-17426][SQL] Refactor `TreeNode.toJSON` to ...
GitHub user clockfly opened a pull request: https://github.com/apache/spark/pull/14990 [SPARK-17426][SQL] Refactor `TreeNode.toJSON` to avoid OOM when converting unknown fields to JSON ## What changes were proposed in this pull request? This PR is a follow-up of SPARK-17356. The current implementation of `TreeNode.toJSON` recursively converts all fields to JSON, even if the field is of type `Seq` or `Map`. This may trigger an out-of-memory exception in cases like: 1. The Seq or Map can be very big. Converting it to JSON may take huge amounts of memory, which may trigger an out-of-memory error. 2. Some user-space input may also be propagated to the plan. The user-space input can be of arbitrary type, and may also be self-referencing. Trying to print user-space input as JSON may trigger an out-of-memory error or a stack overflow error. For a real example, please check the Jira description of SPARK-17426. In this PR, we refactor `TreeNode.toJSON` so that we only convert a field to a JSON string if the field is of a safe type. ## How was this patch tested? Unit test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/clockfly/spark json_oom2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14990.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14990 commit 284d780c446b3f7cb59f8a2f34c522d90bc43fe1 Author: Sean Zhong Date: 2016-09-07T03:49:23Z json oom
[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14988 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65018/ Test PASSed.
[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14988 Merged build finished. Test PASSed.
[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14988 **[Test build #65018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65018/consoleFull)** for PR 14988 at commit [`d9ba28d`](https://github.com/apache/spark/commit/d9ba28d2cbd7823324f7dc02fb1072fa71d2450a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/12436 @sitalkedia I was thinking about this over the weekend and I'm not sure this is the right approach. I suspect it might be better to re-use the same task set manager for the new stage. This copying of information is confusing and I'm concerned it will be bug-prone in the future. Did you consider that approach? Also, separately from what approach is used, how do you deal with the following: suppose map task 1 loses its output (e.g., the reducer where that task is located dies). Now, suppose reduce task A gets a fetch failure for map task 1, triggering map task 1 to be re-run. Meanwhile, reduce task B is still running. Now the re-run map task 1 completes and the scheduler launches the reduce phase again. Suppose after that happens, task B fails (this is the old task B, that started before the fetch failure) because it can't get the data from map task 1, but that's because it still has the old location for map task 1. My understanding is that, with the current code, that would cause the map stage to get re-triggered again, but really, reduce task B should be re-started with the correct location for the output from map 1.
[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r77758918 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -137,6 +138,17 @@ class KMeansModel private[ml] ( @Since("1.6.0") override def write: MLWriter = new KMeansModel.KMeansModelWriter(this) + override def hashCode(): Int = { +(Array(this.getClass, uid) ++ clusterCenters) --- End diff -- @yinxusen Correct me if I'm wrong, but I believe you overrode the equals method because the params are checked for equality in the read/write tests. Just thinking ahead, we will have to do this for every model we use as an initial model. We can avoid this by adding some handling inside the read/write params test, and then checking the initial model equality for read/write inside the `checkModelData` method. I guess I'd prefer not to override some models' equals methods and not others, especially since the reasoning behind it won't be clear. What do you think?
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14116 **[Test build #65019 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65019/consoleFull)** for PR 14116 at commit [`d107721`](https://github.com/apache/spark/commit/d107721ffe1a83d7081b846db80cb4b787d79d7d).
[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14957 Also, it seems you might need to update your PR description. It seems the last commit you just pushed acts differently from your PR description. In addition, maybe you would need to fix the title of this PR to be complete (without `...`) if you'd like to keep this PR open.
[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14957 Could you please check that the related tests pass locally? It seems it affects all other data sources. Also, I am not sure of the approach here. Marking nested fields by modifying column names does not look like a good idea.
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783 Sorry for the delay @clarkfitzg - The code change looks pretty good to me. I just had one question about mixed type columns.
[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r77757859 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -571,6 +571,44 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext } } + test("SPARK-4502 parquet nested fields pruning") { +// Schema of "test-data/nested-array-struct.parquet": +//root +//|-- primitive: integer (nullable = true) +//|-- myComplex: array (nullable = true) +//||-- element: struct (containsNull = true) +//|||-- id: integer (nullable = true) +//|||-- repeatedMessage: array (nullable = true) +//||||-- element: struct (containsNull = true) +//|||||-- someId: integer (nullable = true) +val df = readResourceParquetFile("test-data/nested-array-struct.parquet") +df.createOrReplaceTempView("tmp_table") +// normal test +val query1 = "select primitive,myComplex[0].id from tmp_table" +val result1 = sql(query1) +withSQLConf(SQLConf.PARQUET_NEST_COLUMN_PRUNING.key -> "true") { + checkAnswer(sql(query1), result1) --- End diff -- Does this really test if the nested fields are pruned? I think this test will pass regardless of the newly added option.
[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r77757807 --- Diff: R/pkg/R/utils.R --- @@ -697,3 +697,18 @@ is_master_local <- function(master) { is_sparkR_shell <- function() { grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE) } + +# rbind a list of rows with raw (binary) columns +# +# @param inputData a list of rows, with each row a list +# @return data.frame with raw columns as lists +rbindRaws <- function(inputData){ + row1 <- inputData[[1]] + rawcolumns <- ("raw" == sapply(row1, class)) + + listmatrix <- do.call(rbind, inputData) --- End diff -- Do you know what happens if we have a mixed set of columns here ? i.e. say one column with "raw", one with "integer" and one with "character" -- From reading some docs it looks like everything is converted to create a `character` matrix when we use `rbind`. I think we have two choices if that's the case (a) we apply the type conversions after `rbind` (b) we only call this method when all columns are `raw`
[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r77757667 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala --- @@ -280,6 +280,29 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru } /** + * Extracts the [[StructField]] with the given name recursively. + * + * @throws IllegalArgumentException if the parent field's type is not StructType + */ + def getFieldRecursively(name: String): StructField = { --- End diff -- Isn't this a Parquet-specific problem? I wonder whether adding this method is appropriate. Also, I am not too sure if it is appropriate to mark nested fields by modifying field names with a character.
[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r77757611 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -571,6 +571,44 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext } } + test("SPARK-4502 parquet nested fields pruning") { +// Schema of "test-data/nested-array-struct.parquet": +//root +//|-- primitive: integer (nullable = true) +//|-- myComplex: array (nullable = true) +//||-- element: struct (containsNull = true) +//|||-- id: integer (nullable = true) +//|||-- repeatedMessage: array (nullable = true) +//||||-- element: struct (containsNull = true) +//|||||-- someId: integer (nullable = true) +val df = readResourceParquetFile("test-data/nested-array-struct.parquet") --- End diff -- It seems we don't have this file in this PR. So running tests will fail.
[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r77757552 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -97,7 +98,16 @@ object FileSourceStrategy extends Strategy with Logging { dataColumns .filter(requiredAttributes.contains) .filterNot(partitionColumns.contains) - val outputSchema = readDataColumns.toStructType + val outputSchema = if (fsRelation.sqlContext.conf.isParquetNestColumnPruning) { --- End diff -- It will affect all other data sources. I am pretty sure not all tests related to this will pass.
[GitHub] spark pull request #14912: [SPARK-17357][SQL] Fix current predicate pushdown
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14912#discussion_r77757275 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala --- @@ -171,6 +172,27 @@ class FilterPushdownSuite extends PlanTest { comparePlans(optimized, correctAnswer) } + test("push down filters that are combined") { +// The following predicate ('a === 2 || 'a === 3) && ('c > 10 || 'a === 2) +// will be simplified as ('a == 2) || ('c > 10 && 'a == 3). +// ('a === 2 || 'a === 3) can be pushed down. But the simplified one can't. --- End diff -- You are right. It is only triggered when adjoining Filters are there. So in above example, the predicate `(a == 2 || a==3)` will not be pushed down when there is no `.where(c > 10)`.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77756736 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -159,12 +171,13 @@ case class AlterTableRenameCommand( override def run(sparkSession: SparkSession): Seq[Row] = { val catalog = sparkSession.sessionState.catalog DDLUtils.verifyAlterTableType(catalog, oldName, isView) -// If this is a temp view, just rename the view. -// Otherwise, if this is a real table, we also need to uncache and invalidate the table. -val isTemporary = catalog.isTemporaryTable(oldName) -if (isTemporary) { - catalog.renameTable(oldName, newName) -} else { + +// If the old table name contains database part, we should rename a metastore table directly, +// otherwise, try to rename a temp view first, if that not exists, rename a metastore table. +val renameMetastoreTable = + oldName.database.isDefined || !catalog.renameTempView(oldName.table, newName) --- End diff -- Here, we also need to check if it is VIEW before trying to drop a temp view.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77756537 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala --- @@ -95,12 +95,12 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton { e = intercept[AnalysisException] { sql(s"""LOAD DATA LOCAL INPATH "$testData" INTO TABLE $viewName""") }.getMessage - assert(e.contains(s"Target table in LOAD DATA cannot be temporary: `$viewName`")) + assert(e.contains(s"Target table in LOAD DATA does not exist: `$viewName`")) --- End diff -- https://github.com/apache/spark/blob/c0ae6bc6ea38909730fad36e653d3c7ab0a84b44/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L218-L223 Before this PR, `tableExists` checks the temp table, but `getTableMetadataOption` does not check it. Thus, instead of changing the test case, we need to change the impl of `LoadDataCommand`
[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14912 The CNF exponential-expansion issue was an important concern in previous works. Actually you can find that this patch doesn't produce a real CNF for the predicate. I use `splitDisjunctivePredicates` to obtain disjunctive predicates and convert them to conjunctive form. The conversion here is not recursive. I think this should prevent exponential explosion. Of course it is a compromise and can't benefit all predicates. But I would question how often a complex predicate that needs a complete CNF conversion actually occurs.
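The bounded conversion described above can be sketched as follows. This is a toy Python model of the idea, not the Scala code in the patch; predicates are modeled as nested tuples, and the function names are inventions for this sketch:

```python
def split_disjunctive(pred):
    """Flatten top-level Or nodes: (a OR b) OR c -> [a, b, c]."""
    if isinstance(pred, tuple) and pred[0] == "or":
        return split_disjunctive(pred[1]) + split_disjunctive(pred[2])
    return [pred]

def split_conjunctive(pred):
    """Flatten top-level And nodes: (a AND b) AND c -> [a, b, c]."""
    if isinstance(pred, tuple) and pred[0] == "and":
        return split_conjunctive(pred[1]) + split_conjunctive(pred[2])
    return [pred]

def one_level_cnf(pred):
    """Distribute ORs over ANDs once, at the top level only.
    ((a AND b) OR c) -> [(a OR c), (b OR c)]
    Operands are NOT recursed into, which bounds the blow-up."""
    disjuncts = split_disjunctive(pred)
    conjunct_lists = [split_conjunctive(d) for d in disjuncts]
    result = conjunct_lists[0]
    for conjs in conjunct_lists[1:]:
        result = [("or", r, c) for r in result for c in conjs]
    return result
```

Because the distribution happens only at the top level, deeply nested AND/OR alternations are left alone: the output is a compromise between a full CNF (which can be exponentially larger) and no rewrite at all.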
[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/12436 @davies - Thanks for looking into this. Updated the PR description with details of the change. Let me know if the approach seems reasonable, I will work on rebasing the change against latest master.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77756261 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -246,33 +246,23 @@ class SessionCatalog( } /** - * Retrieve the metadata of an existing metastore table. - * If no database is specified, assume the table is in the current database. - * If the specified table is not found in the database then a [[NoSuchTableException]] is thrown. + * Retrieve the metadata of an existing metastore table/view. + * If no database is specified, assume the table/view is in the current database. + * If the specified table/view is not found in the database then a [[NoSuchTableException]] is + * thrown. */ def getTableMetadata(name: TableIdentifier): CatalogTable = { val db = formatDatabaseName(name.database.getOrElse(getCurrentDatabase)) val table = formatTableName(name.table) -val tid = TableIdentifier(table) -if (isTemporaryTable(name)) { - CatalogTable( -identifier = tid, -tableType = CatalogTableType.VIEW, -storage = CatalogStorageFormat.empty, -schema = tempTables(table).output.toStructType, -properties = Map(), -viewText = None) -} else { - requireDbExists(db) - requireTableExists(TableIdentifier(table, Some(db))) - externalCatalog.getTable(db, table) -} +requireDbExists(db) +requireTableExists(TableIdentifier(table, Some(db))) +externalCatalog.getTable(db, table) } /** - * Retrieve the metadata of an existing metastore table. + * Retrieve the metadata of an existing metastore table/view. * If no database is specified, assume the table is in the current database. - * If the specified table is not found in the database then return None if it doesn't exist. + * If the specified table/view is not found in the database then return None if it doesn't exist. */ def getTableMetadataOption(name: TableIdentifier): Option[CatalogTable] = { --- End diff -- `getTableMetadataOption` does not check the temp view, but `getTableMetadata` does check it... We might have more bugs...
[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14988 Merged build finished. Test PASSed.
[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14988 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65017/ Test PASSed.
[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14988 **[Test build #65017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65017/consoleFull)** for PR 14988 at commit [`8e537a1`](https://github.com/apache/spark/commit/8e537a161560a6d717a40d8aae44b1973dda9695). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14762: [SPARK-16962][CORE][SQL] Fix misaligned record ac...
Github user sumansomasundar commented on a diff in the pull request: https://github.com/apache/spark/pull/14762#discussion_r77755718 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java --- @@ -47,13 +47,20 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) { public static boolean arrayEquals( Object leftBase, long leftOffset, Object rightBase, long rightOffset, final long length) { int i = 0; -while (i <= length - 8) { - if (Platform.getLong(leftBase, leftOffset + i) != -Platform.getLong(rightBase, rightOffset + i)) { -return false; - } - i += 8; -} + + // This attempts to speed up the memcmp type of operation, but there is no way + // to guarantee that the offsets will be on a word boundary in order to use + // Platform.getLong --- End diff -- It can still be used only if both the leftOffset AND rightOffset start on proper word boundaries. By checking for these 2 conditions, we lose the advantage gained by this block.
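The alignment-guarded comparison being discussed can be sketched in Python (the real code is Java in `ByteArrayMethods`); this sketch models the trade-off only — word-at-a-time comparison is used only when both offsets fall on an 8-byte boundary, otherwise everything goes through the byte-wise path:

```python
def array_equals(left, left_off, right, right_off, length):
    """Compare `length` bytes of two buffers starting at the given offsets."""
    i = 0
    if left_off % 8 == 0 and right_off % 8 == 0:
        # Both sides word-aligned: safe to compare 8 bytes at a time even on
        # platforms that require aligned loads (e.g. SPARC).
        while i <= length - 8:
            if left[left_off + i:left_off + i + 8] != right[right_off + i:right_off + i + 8]:
                return False
            i += 8
    # Tail bytes -- or the whole range, if either side was unaligned.
    while i < length:
        if left[left_off + i] != right[right_off + i]:
            return False
        i += 1
    return True
```

This makes the reviewer's point concrete: requiring *both* offsets to be aligned means many call sites fall back to the slow byte-wise loop, losing much of the word-comparison speedup.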
[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/14834

| numClasses | isMultinomial | coefficientMatrix size |
| - | :-: | -: |
| 3+ | true | 3+ x numFeatures |
| 2 | true | 2 x numFeatures |
| 2 | false | 1 x numFeatures |

The current behavior is as follows:

* If it is binary classification trained with multinomial family, then we store `2 x numFeatures` coefficients in a matrix. We will predict with this matrix (i.e. we do not convert to `1 x numFeatures`).
* If it is binary classification trained with binomial family, then we store `1 x numFeatures` (i.e. these coefficients are pivoted) and we use a `DenseVector` instead of a matrix for prediction.

The coefficients are stored in an array, truly. There is always `coefficientMatrix` which is backed by that array and in some cases has only 1 row. When it is binomial family, we also have a `coefficients` vector which is backed by the same array as the matrix. We use that vector for prediction in the binomial case. Hopefully that clears it up. I don't think it's necessary to convert the case of multinomial family but binary classification to `1 x numFeatures` for prediction since it won't be a regression and users would have to explicitly specify that family (hopefully knowing the consequences of that choice). I also vote for Option 2 in the original description. We can avoid any regressions with past versions and the implementation isn't too messy.
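The storage scheme in the table above can be illustrated with a small sketch. This is plain Python for exposition, not MLlib code; the variable names and the centered-rows choice are assumptions made for the example. The relationship it shows is real, though: for two classes, softmax with rows (w0, w1) gives the same decision function as a logistic model with pivoted coefficients w1 - w0.

```python
num_features = 3

# Binomial family: a single pivoted row, used directly as a DenseVector.
binomial_flat = [0.5, -1.0, 2.0]          # 1 x numFeatures

# Multinomial family on the same binary problem: 2 x numFeatures, with the
# rows centered so they sum to zero (one common parameterization).
multinomial_flat = [-0.25, 0.5, -1.0,     # class 0 row
                    0.25, -0.5, 1.0]      # class 1 row
multinomial_matrix = [multinomial_flat[0:3], multinomial_flat[3:6]]

# Pivoting the multinomial coefficients (row 1 minus row 0) recovers the
# binomial-style single row: exp(w1.x)/exp(w0.x) = exp((w1 - w0).x).
pivoted = [r1 - r0 for r0, r1 in zip(*multinomial_matrix)]
```

In both cases one flat array backs the matrix, which is why MLlib can expose a `coefficientMatrix` always and an extra `coefficients` vector only in the binomial case without duplicating storage.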
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14931 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65016/ Test PASSed.
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14931 Merged build finished. Test PASSed.
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14931 **[Test build #65016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65016/consoleFull)** for PR 14931 at commit [`a62289e`](https://github.com/apache/spark/commit/a62289ebd47c7a91c3e8659bf13b1d940499dccb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14912 Hmm, looks like there is previous work regarding CNF, but none of it was actually merged. @gatorsmile Thanks for the context.
[GitHub] spark pull request #14988: [SPARK-17425][SQL] Override sameResult in HiveTab...
Github user watermen commented on a diff in the pull request: https://github.com/apache/spark/pull/14988#discussion_r77754923 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala --- @@ -164,4 +164,11 @@ case class HiveTableScanExec( } override def output: Seq[Attribute] = attributes + + override def sameResult(plan: SparkPlan): Boolean = plan match { --- End diff -- `left.cleanArgs == right.cleanArgs` in the default `sameResult` returns false, because `equals` in `MetastoreRelation` compares the output (`AttributeReference`s), and the `exprId`s differ. We need to erase the exprIds.
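The exprId problem above can be shown with a toy model in plain Python (the class and field names are invented for illustration; they are not Spark's real classes): two scans of the same table carry attributes with fresh exprIds, so naive equality fails until the ids are erased, e.g. by replacing them with positional indices.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attr:
    name: str
    expr_id: int

def canonicalize(attrs):
    # Erase the exprIds by replacing each with its positional index.
    return tuple(Attr(a.name, i) for i, a in enumerate(attrs))

# Two scans of the same table: same columns, different exprIds.
scan1 = (Attr("key", 30), Attr("value", 31))
scan2 = (Attr("key", 34), Attr("value", 35))

print(scan1 == scan2)                              # False: exprIds differ
print(canonicalize(scan1) == canonicalize(scan2))  # True: same result
```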
[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14988 **[Test build #65018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65018/consoleFull)** for PR 14988 at commit [`d9ba28d`](https://github.com/apache/spark/commit/d9ba28d2cbd7823324f7dc02fb1072fa71d2450a).
[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14912 @viirya Could you please wait for the CNF predicate normalization rule? @liancheng @yjshen did some related work before. See https://github.com/apache/spark/pull/10444 and https://github.com/apache/spark/pull/8200. Let us also collect input from @ioana-delaney @nsyca. They have done a lot of related work over the past 10+ years. We need a good design for CNF normalization, which can benefit the other optimizer rules.
[GitHub] spark issue #14989: [MINOR][SQL] Fixing the typo in unit test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14989 Can one of the admins verify this patch?
[GitHub] spark pull request #14989: [MINOR][SQL] Fixing the typo in unit test
GitHub user vundela opened a pull request: https://github.com/apache/spark/pull/14989 [MINOR][SQL] Fixing the typo in unit test

## What changes were proposed in this pull request?
Fixing the typo in the unit test of CodeGenerationSuite.scala

## How was this patch tested?
Ran the unit test after fixing the typo and it passes

You can merge this pull request into a Git repository by running: $ git pull https://github.com/vundela/spark typo_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14989.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14989 commit 0a96ac233dc06a985f56741019dc69a9e869596a Author: Srinivasa Reddy Vundela Date: 2016-09-07T02:50:49Z [MINOR][SQL] Fixing the typo in unit test
[GitHub] spark issue #14887: [SPARK-17321][YARN] YARN shuffle service should use good...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14887 @zhaoyunjiong, the fix you made may introduce a situation where recovery data exists in multiple directories. I'm not sure whether this will introduce recovery issues or others, since the recovery data may then be inconsistent. IMO, based on SPARK-14963, we could make Spark's shuffle service recovery a configuration:

1. If it is not enabled, Spark will not persist data into leveldb; in that case the YARN shuffle service can still serve, but it loses the ability to recover.
2. If it is enabled, the user should guarantee the recovery path is reliable, because the recovery path is also crucial for the NM to recover.
3. This configuration should also be consistent with the NM's recovery-enabled configuration.
4. If this shuffle service is running on a lower version of Hadoop where there's no NM recovery:
   * If Spark's shuffle service recovery is enabled, refer to 2.
   * If it is not enabled, refer to 1.

Just my two cents; I may be missing some parts. Basically, I think that to solve your problem (while also considering recovery) it might be better to make Spark's shuffle recovery mechanism configurable.
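The cases enumerated above can be sketched as a small decision function (illustrative only; the flag names are invented and are not real Spark or YARN settings):

```python
def shuffle_recovery_mode(spark_recovery_enabled: bool,
                          nm_recovery_enabled: bool) -> str:
    """Map the enumerated recovery cases to a behavior label."""
    if not spark_recovery_enabled:
        # Case 1 (and the second bullet of case 4): no leveldb persistence;
        # the service still serves shuffle data but cannot recover state.
        return "serve-without-recovery"
    if not nm_recovery_enabled:
        # First bullet of case 4: recovery on a Hadoop version without NM
        # recovery reduces to case 2 -- the user must supply a reliable path.
        return "recover-from-user-path"
    # Cases 2 and 3: recovery enabled and consistent with NM recovery.
    return "recover-from-nm-path"

print(shuffle_recovery_mode(False, True))   # serve-without-recovery
print(shuffle_recovery_mode(True, False))   # recover-from-user-path
print(shuffle_recovery_mode(True, True))    # recover-from-nm-path
```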
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77753115 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -189,31 +189,39 @@ case class DropTableCommand( override def run(sparkSession: SparkSession): Seq[Row] = { val catalog = sparkSession.sessionState.catalog -if (!catalog.tableExists(tableName)) { - if (!ifExists) { -val objectName = if (isView) "View" else "Table" -throw new AnalysisException(s"$objectName to drop '$tableName' does not exist") - } -} else { - // If the command DROP VIEW is to drop a table or DROP TABLE is to drop a view - // issue an exception. - catalog.getTableMetadataOption(tableName).map(_.tableType match { -case CatalogTableType.VIEW if !isView => - throw new AnalysisException( -"Cannot drop a view with DROP TABLE. Please use DROP VIEW instead") -case o if o != CatalogTableType.VIEW && isView => - throw new AnalysisException( -s"Cannot drop a table with DROP VIEW. Please use DROP TABLE instead") -case _ => - }) - try { -sparkSession.sharedState.cacheManager.uncacheQuery( - sparkSession.table(tableName.quotedString)) - } catch { -case NonFatal(e) => log.warn(e.toString, e) + +// If the table name contains database part, we should drop a metastore table directly, +// otherwise, try to drop a temp view first, if that not exist, drop metastore table. +val dropMetastoreTable = + tableName.database.isDefined || !catalog.dropTempView(tableName.table) --- End diff -- `Drop Table` is unable to drop a temp view, right? ```SQL spark.range(10).createTempView("tempView") sql("DESC tempView").show() sql("DROP TABLE tempView") sql("DESC tempView").show() ```
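The lookup order in the diff above can be sketched with a toy catalog in plain Python (method names are illustrative, not Spark's real catalog API): a database-qualified name goes straight to the metastore, while an unqualified name tries temp views first.

```python
class ToyCatalog:
    """Minimal stand-in for a session catalog."""
    def __init__(self):
        self.temp_views = {"tempView"}
        self.metastore = {("default", "t1")}

    def drop_temp_view(self, name):
        # Returns True only if a temp view with this name existed.
        if name in self.temp_views:
            self.temp_views.discard(name)
            return True
        return False

    def drop_metastore_table(self, name, database="default"):
        self.metastore.discard((database, name))

def drop_table(catalog, name, database=None):
    # Qualified names bypass temp views; unqualified names try them first.
    drop_metastore = database is not None or not catalog.drop_temp_view(name)
    if drop_metastore:
        catalog.drop_metastore_table(name, database or "default")

catalog = ToyCatalog()
drop_table(catalog, "tempView")       # unqualified: removes the temp view
drop_table(catalog, "t1", "default")  # qualified: goes to the metastore
print(catalog.temp_views, catalog.metastore)  # set() set()
```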
[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/14957#discussion_r77753006 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala --- @@ -259,8 +259,23 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru * @throws IllegalArgumentException if a field with the given name does not exist */ def apply(name: String): StructField = { -nameToField.getOrElse(name, - throw new IllegalArgumentException(s"""Field "$name" does not exist.""")) +if (name.contains('.')) { --- End diff -- @HyukjinKwon Thanks for your review. Mixing the recursive get with the default apply has this problem; I fixed it in the next patch and used ',', which is an invalid character in a Parquet schema.
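The recursive dotted-name lookup being discussed can be sketched in plain Python (illustrative only, not the actual Scala change): split on the first '.', descend into the nested schema, and raise the same "does not exist" error at any level.

```python
def get_field(schema: dict, name: str):
    """Look up a (possibly dotted) field name in a nested schema.

    schema maps field name -> type, where a type may itself be a nested dict
    representing a struct.
    """
    if "." in name:
        head, rest = name.split(".", 1)
        nested = schema.get(head)
        if not isinstance(nested, dict):
            raise KeyError(f'Field "{name}" does not exist.')
        return get_field(nested, rest)
    if name not in schema:
        raise KeyError(f'Field "{name}" does not exist.')
    return schema[name]

schema = {"a": "int", "s": {"x": "long", "y": "string"}}
print(get_field(schema, "a"))    # int
print(get_field(schema, "s.x"))  # long
```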
[GitHub] spark pull request #14987: [SPARK-17372][SQL][STREAMING] Avoid serialization...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14987
[GitHub] spark pull request #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and ...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14960#discussion_r77751910 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -992,7 +992,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli // This is a hack to enforce loading hdfs-site.xml. // See SPARK-11227 for details. -FileSystem.get(new URI(path), hadoopConfiguration) +FileSystem.get(new Path(path).toUri, hadoopConfiguration) --- End diff -- Yeah, I'm not sure what part of the URI we are using here. If it's just the scheme and authority, then I think it's fine to take them from the first path. FWIW, there is a method in Hadoop to parse comma-separated path strings, but it's private [1]. IMHO this problem existed even before this PR, so I'm fine not fixing it here if that's okay with @sarutak [1] https://hadoop.apache.org/docs/r2.7.1/api/src-html/org/apache/hadoop/mapred/FileInputFormat.html#line.467
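The "scheme and authority from the first path" idea above can be sketched in plain Python, using the standard library's `urlparse` as a stand-in for Hadoop's `Path`/`URI` handling (assumption for illustration: all paths in the comma-separated string live on the same filesystem):

```python
from urllib.parse import urlparse

def first_path_scheme_authority(paths: str):
    """Take the filesystem scheme/authority from the first path only."""
    first = paths.split(",")[0].strip()
    parsed = urlparse(first)
    return parsed.scheme, parsed.netloc

print(first_path_scheme_authority("hdfs://nn:8020/a,hdfs://nn:8020/b"))
# ('hdfs', 'nn:8020')
```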
[GitHub] spark issue #14987: [SPARK-17372][SQL][STREAMING] Avoid serialization issues...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14987 Merged build finished. Test PASSed.
[GitHub] spark issue #14987: [SPARK-17372][SQL][STREAMING] Avoid serialization issues...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14987 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65015/ Test PASSed.
[GitHub] spark issue #14987: [SPARK-17372][SQL][STREAMING] Avoid serialization issues...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14987 **[Test build #65015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65015/consoleFull)** for PR 14987 at commit [`9bcbb08`](https://github.com/apache/spark/commit/9bcbb087d2935657a30eb9bc6b52ea6fbed65edf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14847 /cc @cloud-fan @rxin @davies for reviewing this. Thanks.
[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for exceptions d...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14850 also backport it to 2.0
[GitHub] spark pull request #14988: [SPARK-17425][SQL] Override sameResult in HiveTab...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14988#discussion_r77750354 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala --- @@ -164,4 +164,11 @@ case class HiveTableScanExec( } override def output: Seq[Attribute] = attributes + + override def sameResult(plan: SparkPlan): Boolean = plan match { --- End diff -- why doesn't the default one work?
[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14988 **[Test build #65017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65017/consoleFull)** for PR 14988 at commit [`8e537a1`](https://github.com/apache/spark/commit/8e537a161560a6d717a40d8aae44b1973dda9695).
[GitHub] spark pull request #14988: [SPARK-17425][SQL] Override sameResult in HiveTab...
GitHub user watermen opened a pull request: https://github.com/apache/spark/pull/14988 [SPARK-17425][SQL] Override sameResult in HiveTableScanExec to make ReusedExchange work in text format table

## What changes were proposed in this pull request?
The PR will override the `sameResult` in `HiveTableScanExec` to make `ReusedExchange` work in text format table.

## How was this patch tested?
# SQL
```sql
SELECT * FROM src t1 JOIN src t2 ON t1.key = t2.key JOIN src t3 ON t1.key = t3.key;
```
# Before
```
== Physical Plan ==
*BroadcastHashJoin [key#30], [key#34], Inner, BuildRight
:- *BroadcastHashJoin [key#30], [key#32], Inner, BuildRight
:  :- *Filter isnotnull(key#30)
:  :  +- HiveTableScan [key#30, value#31], MetastoreRelation default, src
:  +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
:     +- *Filter isnotnull(key#32)
:        +- HiveTableScan [key#32, value#33], MetastoreRelation default, src
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
   +- *Filter isnotnull(key#34)
      +- HiveTableScan [key#34, value#35], MetastoreRelation default, src
```
# After
```
== Physical Plan ==
*BroadcastHashJoin [key#2], [key#6], Inner, BuildRight
:- *BroadcastHashJoin [key#2], [key#4], Inner, BuildRight
:  :- *Filter isnotnull(key#2)
:  :  +- HiveTableScan [key#2, value#3], MetastoreRelation default, src
:  +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
:     +- *Filter isnotnull(key#4)
:        +- HiveTableScan [key#4, value#5], MetastoreRelation default, src
+- ReusedExchange [key#6, value#7], BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
```
cc: @davies @cloud-fan

You can merge this pull request into a Git repository by running: $ git pull https://github.com/watermen/spark SPARK-17425 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14988.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14988 commit 8e537a161560a6d717a40d8aae44b1973dda9695 Author: Yadong Qi Date: 2016-09-07T01:26:46Z Override sameResult in HiveTableScanExec.
[GitHub] spark issue #14958: [SPARK-17378] [BUILD] Upgrade snappy-java to 1.1.2.6
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14958
```
Using `mvn` from path: /home/jenkins/workspace/spark-branch-1.6-lint/build/apache-maven-3.3.9/bin/mvn
Spark's published dependencies DO NOT MATCH the manifest file (dev/spark-deps).
To update the manifest file, run './dev/test-dependencies.sh --replace-manifest'.
diff --git a/dev/deps/spark-deps-hadoop-1 b/dev/pr-deps/spark-deps-hadoop-1
index dd5a6dc..a97b10c 100644
--- a/dev/deps/spark-deps-hadoop-1
+++ b/dev/pr-deps/spark-deps-hadoop-1
@@ -143,7 +143,7 @@ servlet-api-2.5.jar
 slf4j-api-1.7.10.jar
 slf4j-log4j12-1.7.10.jar
 snappy-0.2.jar
-snappy-java-1.1.2.1.jar
+snappy-java-1.1.2.6.jar
 spire-macros_2.10-0.7.4.jar
 spire_2.10-0.7.4.jar
 stax-api-1.0.1.jar
Using `mvn` from path: /home/jenkins/workspace/spark-branch-1.6-lint/build/apache-maven-3.3.9/bin/mvn
Build step 'Execute shell' marked build as failure
Finished: FAILURE
```
Can you take a look at the 1.6 build (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-1.6-lint/262/console)? It seems the 1.6 build is broken by this PR.
[GitHub] spark pull request #14912: [SPARK-17357][SQL] Fix current predicate pushdown
Github user srinathshankar commented on a diff in the pull request: https://github.com/apache/spark/pull/14912#discussion_r77748668 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala --- @@ -171,6 +172,27 @@ class FilterPushdownSuite extends PlanTest { comparePlans(optimized, correctAnswer) } + test("push down filters that are combined") { +// The following predicate ('a === 2 || 'a === 3) && ('c > 10 || 'a === 2) +// will be simplified as ('a == 2) || ('c > 10 && 'a == 3). +// ('a === 2 || 'a === 3) can be pushed down. But the simplified one can't. --- End diff -- I agree with you that we should respect the interaction between CombineFilters, PushDownPredicates, and other rules. I do think it's important that CNF conversion run before any of the push-down/reordering rules, and the simplification rules should run afterwards. My concern with rolling this into CombineFilters is that it doesn't get triggered unless there are adjoining Filter nodes. In the example you have:

val originalQuery = testRelation .select('a, 'b, ('c + 1) as 'cc) .groupBy('a)('a, count('cc) as 'c) .where('c > 10) .where(('a === 2) || ('c > 10 && 'a === 3))

I think that (a == 2 || a == 3) should get pushed down even if you don't have ".where('c > 10)", but I'm not sure that it will be, since toCNF is in CombineFilters. Could you confirm? My suggestion is that toCNF warrants a separate rule -- for example, when you're doing joins and you have select * from A inner join C on (A.a1 = C.c1) where A.a2 = 2 || (C.c2 = 10 && A.a2 = 3), you want (A.a2 = 2 || A.a2 = 3) pushed down into A.
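The CNF conversion under discussion can be sketched as a small rewrite that distributes OR over AND (a plain-Python illustration, not Spark's optimizer): applied to the example predicate `a=2 || (c>10 && a=3)`, it yields `(a=2 || c>10) && (a=2 || a=3)`, and the second conjunct references only one table, so it can be pushed down.

```python
def to_cnf(expr):
    """Convert a predicate to conjunctive normal form.

    expr is a tuple tree ('and'|'or', left, right) or a leaf string.
    """
    if not isinstance(expr, tuple):
        return expr
    op, l, r = expr[0], to_cnf(expr[1]), to_cnf(expr[2])
    if op == 'or':
        # Distribute OR over AND: X or (Y and Z) -> (X or Y) and (X or Z)
        if isinstance(l, tuple) and l[0] == 'and':
            return ('and', to_cnf(('or', l[1], r)), to_cnf(('or', l[2], r)))
        if isinstance(r, tuple) and r[0] == 'and':
            return ('and', to_cnf(('or', l, r[1])), to_cnf(('or', l, r[2])))
    return (op, l, r)

expr = ('or', 'a=2', ('and', 'c>10', 'a=3'))
print(to_cnf(expr))
# ('and', ('or', 'a=2', 'c>10'), ('or', 'a=2', 'a=3'))
```

Note that unconstrained distribution can blow up exponentially in predicate size, which is one reason a dedicated, carefully designed rule is being requested.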
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14809
[GitHub] spark issue #14809: [SPARK-17238][SQL] simplify the logic for converting dat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14809 thanks for the review, merging to master!
[GitHub] spark pull request #10225: [SPARK-12196][Core] Store/retrieve blocks from di...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/10225#discussion_r77748327 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -136,7 +136,9 @@ private[spark] class IndexShuffleBlockResolver( shuffleId: Int, mapId: Int, lengths: Array[Long], - dataTmp: File): Unit = { --- End diff -- Do we have to change the code in this function?
[GitHub] spark issue #14985: [SPARK-17396][core] Share the task support between Union...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14985 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65012/ Test FAILed.
[GitHub] spark issue #14985: [SPARK-17396][core] Share the task support between Union...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14985 Merged build finished. Test FAILed.
[GitHub] spark issue #14985: [SPARK-17396][core] Share the task support between Union...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14985 **[Test build #65012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65012/consoleFull)** for PR 14985 at commit [`89065fd`](https://github.com/apache/spark/commit/89065fd08cb2eb1e492571ce5980daa8f059a820). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/14712 @yhuai @hvanhovell @cloud-fan Sorry for the late response; I was out of the office for two days. @gatorsmile Thanks for fixing it!
[GitHub] spark pull request #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14960#discussion_r77747489 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -992,7 +992,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli // This is a hack to enforce loading hdfs-site.xml. // See SPARK-11227 for details. -FileSystem.get(new URI(path), hadoopConfiguration) +FileSystem.get(new Path(path).toUri, hadoopConfiguration) --- End diff -- cc - @sarutak WDYT? is my understanding correct?
[GitHub] spark issue #14984: [SPARK-17296][SQL] Simplify parser join processing [BACK...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14984 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65013/ Test PASSed.
[GitHub] spark issue #10970: [SPARK-13067][SQL] workaround for a weird scala reflecti...
Github user atronchi commented on the issue: https://github.com/apache/spark/pull/10970 The solution mentioned in [SPARK-17424] by @rdblue fixes this issue.
[GitHub] spark issue #14984: [SPARK-17296][SQL] Simplify parser join processing [BACK...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14984 Merged build finished. Test PASSed.
[GitHub] spark issue #14984: [SPARK-17296][SQL] Simplify parser join processing [BACK...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14984 **[Test build #65013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65013/consoleFull)** for PR 14984 at commit [`cc74334`](https://github.com/apache/spark/commit/cc743345de45b7367509cde74098de0cedfac9a9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14960#discussion_r77747323 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -992,7 +992,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli // This is a hack to enforce loading hdfs-site.xml. // See SPARK-11227 for details. -FileSystem.get(new URI(path), hadoopConfiguration) +FileSystem.get(new Path(path).toUri, hadoopConfiguration) --- End diff -- Since this is known to be hacky and ugly, maybe we can split it out into a separate issue (although I am hesitant to say this)?
[GitHub] spark pull request #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14960#discussion_r77747258 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -992,7 +992,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli // This is a hack to enforce loading hdfs-site.xml. // See SPARK-11227 for details. -FileSystem.get(new URI(path), hadoopConfiguration) +FileSystem.get(new Path(path).toUri, hadoopConfiguration) --- End diff -- Hm.. I didn't know it supports comma-separated paths. BTW, we can still use `spark.sparkContext.textFile(..)` though. I took a look and it seems okay (but it's ugly and hacky). If the first given path is okay, it seems to work fine. It looks like `FileSystem.get(..)` only uses `getScheme` and `getAuthority` (I tracked down `FileSystem.get(..)` and the related function calls). So, if the first path is correct, `getAuthority` and `getScheme` give the correct values to get a file system. For example, the path `http://localhost:8080/a/b,http://localhost:8081/c/d` parses the URI as below: ![2016-09-07 10 19 11](https://cloud.githubusercontent.com/assets/6477701/18296462/d213126c-74e4-11e6-9859-e68e2d6f58cb.png)
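For readers following along, the parsing behavior described in that comment can be reproduced with plain `java.net.URI` (a minimal, self-contained illustration; the class name is made up for this sketch):

```java
import java.net.URI;

public class CommaPathUri {
    // A comma is a legal path character, so the whole comma-separated string
    // parses as ONE URI whose scheme and authority come from the first path
    // only -- matching the screenshot referenced in the comment above.
    static String[] schemeAndAuthority(String paths) throws Exception {
        URI uri = new URI(paths);
        return new String[] { uri.getScheme(), uri.getAuthority() };
    }

    public static void main(String[] args) throws Exception {
        String[] parts =
            schemeAndAuthority("http://localhost:8080/a/b,http://localhost:8081/c/d");
        System.out.println(parts[0] + " | " + parts[1]); // prints "http | localhost:8080"
    }
}
```

This is why `FileSystem.get(..)` happens to resolve the right file system as long as the first path in the list is correct: everything after the comma ends up in the URI's path component, which `FileSystem.get(..)` does not inspect.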
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 I'm presenting something related to this on Thursday; it would be nice to tell the audience this patch made it in. Can I do anything to help this along?
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14931 **[Test build #65016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65016/consoleFull)** for PR 14931 at commit [`a62289e`](https://github.com/apache/spark/commit/a62289ebd47c7a91c3e8659bf13b1d940499dccb).
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14702 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65011/ Test PASSed.
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14702 Merged build finished. Test PASSed.
[GitHub] spark pull request #14931: [SPARK-17370] Shuffle service files not invalidat...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14931#discussion_r77746289 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala --- @@ -153,7 +153,7 @@ private[spark] class StandaloneSchedulerBackend( override def executorRemoved(fullId: String, message: String, exitStatus: Option[Int]) { val reason: ExecutorLossReason = exitStatus match { case Some(code) => ExecutorExited(code, exitCausedByApp = true, message) - case None => SlaveLost(message) + case None => SlaveLost(message, workerLost = true /* worker loss event from master */) --- End diff -- Went with propagating just `workerLost` explicitly all the way from the master, since ExecutorState is private to deploy.
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14702 **[Test build #65011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65011/consoleFull)** for PR 14702 at commit [`9afbd5e`](https://github.com/apache/spark/commit/9afbd5e2d2b08087596dc5d575935e4894b390bc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14962#discussion_r77745578 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -72,9 +72,7 @@ class SessionCatalog( this(externalCatalog, new SimpleFunctionRegistry, new SimpleCatalystConf(true)) } - /** List of temporary tables, mapping from table name to their logical plan. */ - @GuardedBy("this") - protected val tempTables = new mutable.HashMap[String, LogicalPlan] + private val tempViews = new TempViewManager --- End diff -- Since the goal of this PR is to add some view-related APIs, I think the refactoring into `TempViewManager` is not the major goal here?
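As a rough illustration of what such a manager might encapsulate, moving the synchronization out of `SessionCatalog` itself, here is a hypothetical sketch (this is not Spark's actual `TempViewManager`; a `String` stands in for `LogicalPlan`):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a self-contained registry replacing a
// @GuardedBy("this") HashMap that lived directly in the catalog.
public class TempViewManager {
    private final Map<String, String> views = new HashMap<>();

    // Registers a view; refuses to overwrite an existing one unless asked.
    public synchronized void create(String name, String plan, boolean overrideIfExists) {
        if (!overrideIfExists && views.containsKey(name)) {
            throw new IllegalStateException("Temp view already exists: " + name);
        }
        views.put(name, plan);
    }

    public synchronized Optional<String> get(String name) {
        return Optional.ofNullable(views.get(name));
    }

    // Returns true if a view with that name was actually dropped.
    public synchronized boolean drop(String name) {
        return views.remove(name) != null;
    }
}
```

The design benefit of such a wrapper is that the locking discipline lives in one place instead of being an invariant every `SessionCatalog` method must remember.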
[GitHub] spark issue #14816: [SPARK-17245] [SQL] [BRANCH-1.6] Do not rely on Hive's s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14816 Merged build finished. Test PASSed.
[GitHub] spark issue #14816: [SPARK-17245] [SQL] [BRANCH-1.6] Do not rely on Hive's s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14816 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65014/ Test PASSed.
[GitHub] spark issue #14816: [SPARK-17245] [SQL] [BRANCH-1.6] Do not rely on Hive's s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14816 **[Test build #65014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65014/consoleFull)** for PR 14816 at commit [`8b57886`](https://github.com/apache/spark/commit/8b57886c0489c759f0308a7b104f5b058204cdcd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14931: [SPARK-17370] Shuffle service files not invalidat...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14931#discussion_r77745305 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala --- @@ -153,7 +153,7 @@ private[spark] class StandaloneSchedulerBackend( override def executorRemoved(fullId: String, message: String, exitStatus: Option[Int]) { val reason: ExecutorLossReason = exitStatus match { case Some(code) => ExecutorExited(code, exitCausedByApp = true, message) - case None => SlaveLost(message) + case None => SlaveLost(message, workerLost = true /* worker loss event from master */) --- End diff -- This assumes that `exitStatus == None` implies that a worker was lost, but there are some corner cases where this isn't necessarily true (e.g. if an executor kill fails). Looking through both the 1.6.x and 2.0.x code, it appears that `ExecutorState.LOST` is used exclusively for denoting whole-worker loss, so I think that we should check that state here instead of assuming `true`. Other than that minor corner case, this patch looks good to me, so I'll merge once we fix this.
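A minimal sketch of the check this review suggests, with hypothetical names (Spark's real `ExecutorState` lives in the private `deploy` package, so this enum is only a stand-in), and `null` modeling Scala's `Option[Int]` being `None`:

```java
public class WorkerLossSketch {
    // Stand-in for org.apache.spark.deploy.ExecutorState.
    enum ExecutorState { RUNNING, KILLED, FAILED, LOST }

    // Treat a missing exit status as whole-worker loss ONLY when the
    // executor's final state is LOST, instead of assuming workerLost = true.
    static boolean workerLost(Integer exitStatus, ExecutorState finalState) {
        return exitStatus == null && finalState == ExecutorState.LOST;
    }

    public static void main(String[] args) {
        System.out.println(workerLost(null, ExecutorState.LOST));   // true: worker-loss event
        System.out.println(workerLost(null, ExecutorState.KILLED)); // false: e.g. a failed executor kill
        System.out.println(workerLost(143, ExecutorState.LOST));    // false: executor itself exited
    }
}
```

This keeps the corner case (an executor kill that fails without an exit code) from being misreported as a lost worker.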