[GitHub] spark issue #14256: [SPARK-16620][CORE] Add back the tokenization process in...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14256
Hi, @lw-lin. This seems to resolve SPARK-16613, too. Could you check that? If possible, please add SPARK-16613 to the title, too.
[GitHub] spark issue #14251: [SPARK-16602][SQL] `Nvl` function should support numeric...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14251
**[Test build #62511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62511/consoleFull)** for PR 14251 at commit [`90d6851`](https://github.com/apache/spark/commit/90d6851bead39875769011954f20a9ae2d333853).
[GitHub] spark pull request #14253: [Doc] improve python doc for rdd.histogram and da...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14253
[GitHub] spark issue #14251: [SPARK-16602][SQL] `Nvl` function should support numeric...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14251
Now `findTightestCommonTypeToString` is public, and the test case has been moved and reduced.
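As a hedged illustration of what the numeric/string support in SPARK-16602 means at the SQL level (the exact result types are whatever `findTightestCommonTypeToString` picks; `spark` is an assumed SparkSession, and this snippet is not part of the PR):

```scala
// Hypothetical sanity check: with the fix, nvl over mixed argument types
// should analyze successfully and widen to a common type instead of failing.
spark.sql("SELECT nvl(1, 2.5)").printSchema()   // widened to a common numeric type
spark.sql("SELECT nvl(1, '2')").printSchema()   // promoted all the way to string
```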
[GitHub] spark issue #14255: [MINOR] Fix Java Linter `LineLength` errors
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14255
For easy comparison, `lint-java` results are here.
- https://travis-ci.org/dongjoon-hyun/spark/jobs/145738728 (Current master: [SPARK-16303][DOCS][EXAMPLES] ...)
- https://travis-ci.org/dongjoon-hyun/spark/jobs/145738812 (This PR)
[GitHub] spark issue #14253: [Doc] improve python doc for rdd.histogram and dataframe...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14253
Merging in master/2.0. Thanks.
[GitHub] spark issue #14222: [SPARK-16391][SQL] KeyValueGroupedDataset.reduceGroups s...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14222
Ok.
[GitHub] spark issue #14253: [Doc] improve python doc for rdd.histogram and dataframe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14253
**[Test build #3188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3188/consoleFull)** for PR 14253 at commit [`6d8c9aa`](https://github.com/apache/spark/commit/6d8c9aabfc9ce105156fc8eb96b9e35777b03477).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r71278812

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl._
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types._
+
+class SimplifyCastsSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
+  }
+
+  test("non-nullable to non-nullable array cast") {
+    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
+    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
+    val optimized = Optimize.execute(plan)
+    val expected = input.select('a.as("casted")).analyze
+    comparePlans(optimized, expected)
+  }
+
+  test("non-nullable to nullable array cast") {
+    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
+    val array_intPrimitive = 'a.array(ArrayType(IntegerType, false))
+    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
+    val optimized = Optimize.execute(plan)
+    val expected = input.select('a.as("casted")).analyze
+    comparePlans(optimized, expected)
+  }
+
+  test("nullable to non-nullable array cast") {
+    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
+    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
+    val optimized = Optimize.execute(plan)
+    comparePlans(optimized, plan)
+  }
+
+  test("nullable to nullable array cast") {
+    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
+    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
+    val optimized = Optimize.execute(plan)
+    val expected = input.select('a.as("casted")).analyze
+    comparePlans(optimized, expected)
+  }
+
+  def map(keyType: DataType, valueType: DataType, nullable: Boolean): AttributeReference =
--- End diff --

This is because the [current map()](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L246) cannot pass information on `nullable`.
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13382
**[Test build #62510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62510/consoleFull)** for PR 13382 at commit [`e19ec3d`](https://github.com/apache/spark/commit/e19ec3d2b145879e7ea73fa847761cfdeb7d5c95).
[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14014
Let's also update the description.
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71277693

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2965,4 +2965,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-16602 Nvl/Coalesce") {
--- End diff --

Oh, right, I see. It's the same approach as input type checking. This test is too heavy for its exact purpose.
[GitHub] spark pull request #14155: [SPARK-16498][SQL][WIP] move hive hack for data s...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r71277607

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -146,6 +151,15 @@ case class CatalogTable(
   requireSubsetOfSchema(sortColumnNames, "sort")
   requireSubsetOfSchema(bucketColumnNames, "bucket")
+  lazy val userSpecifiedSchema: Option[StructType] = if (schema.nonEmpty) {
--- End diff --

I'm not quite sure it's safe to do so. Why do we have `CatalogColumn` in the first place?
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user dafrista commented on the issue: https://github.com/apache/spark/pull/13382
Thanks @ericl, I've added that information to the class comment.
[GitHub] spark issue #14255: [MINOR] Fix Java Linter `LineLength` errors
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14255
**[Test build #62509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62509/consoleFull)** for PR 14255 at commit [`c44a8a0`](https://github.com/apache/spark/commit/c44a8a0863f2a232370bd68999a335f328ebf8bc).
[GitHub] spark issue #14256: [SPARK-16620][CORE] Add back the tokenization process in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14256
**[Test build #62508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62508/consoleFull)** for PR 14256 at commit [`8517465`](https://github.com/apache/spark/commit/85174658b5392b5fd9773a89ee7b24a3db08c334).
[GitHub] spark issue #14255: [MINOR] Fix Java Linter `LineLength` errors
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14255
Rebased to resolve conflicts.
[GitHub] spark pull request #14200: [SPARK-16528][SQL] Fix NPE problem in HiveClientI...
Github user jacek-lewandowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14200#discussion_r71277387

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -320,7 +320,7 @@ private[hive] class HiveClientImpl(
       name = d.getName,
       description = d.getDescription,
       locationUri = d.getLocationUri,
-      properties = d.getParameters.asScala.toMap)
+      properties = Option(d.getParameters).map(_.asScala.toMap).orNull)
--- End diff --

Perhaps... however this would change the semantics, which is out of scope for this ticket.
[GitHub] spark issue #14222: [SPARK-16391][SQL] KeyValueGroupedDataset.reduceGroups s...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14222
@viirya I'm going to take over the PR and play with the API a little bit.
[GitHub] spark pull request #14256: [SPARK-16620][CORE] Add back tokenization process...
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14256

[SPARK-16620][CORE] Add back tokenization process in RDD.pipe(command: String)

## What changes were proposed in this pull request?

Currently `RDD.pipe(command: String)`:
- works only with a single command with no options specified, such as `RDD.pipe("wc")`
- does not work when the command is specified with some options, such as `RDD.pipe("wc -l")`

This is a regression from Spark 1.6. This patch adds back the tokenization process in `RDD.pipe(command: String)`.

## How was this patch tested?

Added a test which would pass in 1.6, would fail prior to this patch, and passes after this patch.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lw-lin/spark rdd-pipe

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14256.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14256

commit 85174658b5392b5fd9773a89ee7b24a3db08c334
Author: Liwei Lin
Date: 2016-07-19T05:34:46Z

    Fix pipe(command) & pipe(command, env)
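To make the regression concrete, a hedged sketch of the behaviour described above and of a whitespace tokenizer like the one being added back; `rdd` and the helper name are illustrative, not the actual patch:

```scala
import java.util.StringTokenizer
import scala.collection.mutable.ArrayBuffer

// Before this patch, the whole string was handed to the external process as a
// single token, so "wc -l" was treated as one executable name and failed.
// rdd.pipe("wc")              // single command, worked
// rdd.pipe("wc -l")           // command plus option, broken without tokenization
// rdd.pipe(Seq("wc", "-l"))   // caller-tokenized overload, always worked

// Illustrative tokenization step (assumed shape): split the command string on
// whitespace before handing it to the process builder.
def tokenize(command: String): Seq[String] = {
  val buf = new ArrayBuffer[String]
  val tok = new StringTokenizer(command)
  while (tok.hasMoreElements) buf += tok.nextToken()
  buf
}

// tokenize("wc -l") == Seq("wc", "-l")
```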
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71277158

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2965,4 +2965,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-16602 Nvl/Coalesce") {
--- End diff --

We can just test it in an expression unit test by calling replaceForTypeCoercion, can't we?
[GitHub] spark pull request #14014: [SPARK-16344][SQL] Decoding Parquet array of stru...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14014#discussion_r71277147

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala ---
@@ -442,13 +445,23 @@ private[parquet] class ParquetRowConverter(
   private val elementConverter: Converter = {
     val repeatedType = parquetSchema.getType(0)
     val elementType = catalystSchema.elementType
-    val parentName = parquetSchema.getName
-    if (isElementType(repeatedType, elementType, parentName)) {
+    // At this stage, we're not sure whether the repeated field maps to the element type or is
+    // just the syntactic repeated group of the 3-level standard LIST layout. Here we try to
+    // convert the repeated field into a Catalyst type to see whether the converted type matches
+    // the Catalyst array element type.
+    val guessedElementType = schemaConverter.convertField(repeatedType)
+
+    if (DataType.equalsIgnoreCompatibleNullability(guessedElementType, elementType)) {
+      // If the repeated field corresponds to the element type, creates a new converter using the
+      // type of the repeated field.
       newConverter(repeatedType, elementType, new ParentContainerUpdater {
         override def set(value: Any): Unit = currentArray += value
       })
     } else {
+      // If the repeated field corresponds to the syntactic group in the standard 3-level Parquet
+      // LIST layout, creates a new converter using the only child field of the repeated field.
+      assert(!repeatedType.isPrimitive && repeatedType.asGroupType().getFieldCount == 1)
       new ElementConverter(repeatedType.asGroupType().getType(0), elementType)
--- End diff --

Can we add examples here?
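For context, the two Parquet layouts the new comments distinguish look roughly like this for an `array<int>` column (schema text follows the Parquet LIST convention; field names are illustrative).

Standard 3-level layout, where the repeated field is only a syntactic group and the real element is its single child:

```
optional group f (LIST) {
  repeated group list {
    optional int32 element;
  }
}
```

Legacy 2-level layout, where the repeated field itself is the element, so the converter must use it directly instead of descending into it:

```
optional group f (LIST) {
  repeated int32 element;
}
```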
[GitHub] spark pull request #14200: [SPARK-16528][SQL] Fix NPE problem in HiveClientI...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14200#discussion_r71277056

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -320,7 +320,7 @@ private[hive] class HiveClientImpl(
       name = d.getName,
       description = d.getDescription,
       locationUri = d.getLocationUri,
-      properties = d.getParameters.asScala.toMap)
+      properties = Option(d.getParameters).map(_.asScala.toMap).orNull)
--- End diff --

Is `Map.empty` a better default value? Or should we update the `properties` field in `CatalogDatabase` to indicate that it's nullable?
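A small, hedged comparison of the two defaults being discussed, using a plain Java map to stand in for `d.getParameters` (which Hive may return as null):

```scala
import scala.collection.JavaConverters._

val params: java.util.Map[String, String] = null  // what Hive may hand back

// current patch: preserves the null for callers to handle
val asNull = Option(params).map(_.asScala.toMap).orNull

// alternative raised above: callers never see null
val asEmpty = Option(params).map(_.asScala.toMap).getOrElse(Map.empty[String, String])
```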
[GitHub] spark pull request #14245: [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java ex...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14245
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71277066

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2965,4 +2965,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-16602 Nvl/Coalesce") {
--- End diff --

Anyway, I will try to move this. Thank you for the fast review!
[GitHub] spark pull request #14222: [SPARK-16391][SQL] KeyValueGroupedDataset.reduceG...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14222#discussion_r71277037

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/expressions/ReduceAggregatorSuite.scala ---
@@ -0,0 +1,62 @@
+/* (standard Apache License 2.0 header, identical to the one quoted above) */
+
+package org.apache.spark.sql.expressions
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.expressions.ReduceAggregator
+
+class ReduceAggregatorSuite extends SparkFunSuite {
--- End diff --

Just put this in DatasetAggregatorSuite.
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71276984

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2965,4 +2965,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-16602 Nvl/Coalesce") {
--- End diff --

I thought we should have it here because `Nvl` is RuntimeReplaceable. (Or did I misunderstand again?)
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71276925

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -100,7 +100,8 @@ object TypeCoercion {
   }

   /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  private def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
+  private[catalyst] def findTightestCommonTypeToString(left: DataType, right: DataType)
--- End diff --

Oh, sure. I will make this `public`.
[GitHub] spark issue #14255: [MINOR] Fix Java Linter `LineLength` errors
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14255
**[Test build #62507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62507/consoleFull)** for PR 14255 at commit [`8cf8c78`](https://github.com/apache/spark/commit/8cf8c7882d6fe201f653e5df7cd055df87af42ff).
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71276731

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2965,4 +2965,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-16602 Nvl/Coalesce") {
--- End diff --

Maybe this should be a unit test for the analyzer rather than an end-to-end test?
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71276614

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -100,7 +100,8 @@ object TypeCoercion {
   }

   /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  private def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
+  private[catalyst] def findTightestCommonTypeToString(left: DataType, right: DataType)
--- End diff --

You can just make this public, I think.
[GitHub] spark pull request #14255: [MINOR] Fix Java Linter `LineLength` errors
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/14255

[MINOR] Fix Java Linter `LineLength` errors

## What changes were proposed in this pull request?

This PR fixes four Java linter `LineLength` errors. They are all `LineLength` errors, but we had better remove all Java linter errors before the release.

## How was this patch tested?

After passing Jenkins, run `./dev/lint-java`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark minor_java_linter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14255.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14255

commit 8cf8c7882d6fe201f653e5df7cd055df87af42ff
Author: Dongjoon Hyun
Date: 2016-07-19T05:45:15Z

    [MINOR] Fix Java Linter `LineLength` errors
[GitHub] spark pull request #14014: [SPARK-16344][SQL] Decoding Parquet array of stru...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14014#discussion_r71276489

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRecordMaterializer.scala ---
@@ -30,10 +30,11 @@ import org.apache.spark.sql.types.StructType
  * @param catalystSchema Catalyst schema of the rows to be constructed
  */
 private[parquet] class ParquetRecordMaterializer(
-    parquetSchema: MessageType, catalystSchema: StructType)
+    parquetSchema: MessageType, catalystSchema: StructType, schemaConverter: ParquetSchemaConverter)
--- End diff --

Add `schemaConverter` to the scaladoc?
[GitHub] spark pull request #14227: [SPARK-16582][SQL] Explicitly define isNull = fal...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14227#discussion_r71276500

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -377,6 +377,7 @@ abstract class UnaryExpression extends Expression {
       """)
     } else {
       ev.copy(code = s"""
+        boolean ${ev.isNull} = false;
--- End diff --

I don't quite understand this: we explicitly define `isNull = "false"` below, so how could `ev.isNull` be referenced later?
[GitHub] spark pull request #13912: [SPARK-16216][SQL] CSV data source supports custo...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13912#discussion_r71276368

--- Diff: python/pyspark/sql/readwriter.py ---
@@ -328,6 +328,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
         applies to both date type and timestamp type. By default, it is None which means
         trying to parse times and date by ``java.sql.Timestamp.valueOf()`` and
         ``java.sql.Date.valueOf()``.
+        :param timezone: defines the timezone to be used for both date type and timestamp type.
+            If a timezone is specified in the data, this will load them after
--- End diff --

I will clean up the PR description and all those soon with a better proposal. Thanks!
[GitHub] spark issue #14253: [Doc] improve python doc for rdd.histogram and dataframe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14253
**[Test build #3188 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3188/consoleFull)** for PR 14253 at commit [`6d8c9aa`](https://github.com/apache/spark/commit/6d8c9aabfc9ce105156fc8eb96b9e35777b03477).
[GitHub] spark issue #14253: [Doc] improve python doc for rdd.histogram and dataframe...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14253
Jenkins, test this please.
[GitHub] spark pull request #14247: [MINOR] Remove unused arg in als.py
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14247
[GitHub] spark issue #14245: [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example u...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14245
Thanks. Merging to master and branch 2.0.
[GitHub] spark issue #14254: [SPARK-16619] Add shuffle service metrics entry in monit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14254
Can one of the admins verify this patch?
[GitHub] spark issue #14247: [MINOR] Remove unused arg in als.py
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14247
Merging in master. Thanks.
[GitHub] spark pull request #13912: [SPARK-16216][SQL] CSV data source supports custo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13912#discussion_r71275973

--- Diff: python/pyspark/sql/readwriter.py ---
@@ -328,6 +328,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
         applies to both date type and timestamp type. By default, it is None which means
         trying to parse times and date by ``java.sql.Timestamp.valueOf()`` and
         ``java.sql.Date.valueOf()``.
+        :param timezone: defines the timezone to be used for both date type and timestamp type.
+            If a timezone is specified in the data, this will load them after
--- End diff --

Ah, I see - you want to control the zone this gets converted to.
[GitHub] spark pull request #13912: [SPARK-16216][SQL] CSV data source supports custo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13912#discussion_r71275955

--- Diff: python/pyspark/sql/readwriter.py ---
@@ -328,6 +328,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
         applies to both date type and timestamp type. By default, it is None which means
         trying to parse times and date by ``java.sql.Timestamp.valueOf()`` and
         ``java.sql.Date.valueOf()``.
+        :param timezone: defines the timezone to be used for both date type and timestamp type.
+            If a timezone is specified in the data, this will load them after
--- End diff --

Why not just have the timezone as part of the dateFormat, so users can specify it?
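As a hedged illustration of the two designs in question, using plain Java time APIs and independent of whatever option names the final PR settles on:

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

// 1) timezone carried by the pattern itself: "XXX" parses an ISO-8601 offset
//    embedded in the data, so no separate option is needed.
val withOffset = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssXXX")
withOffset.parse("2016-07-19T05:34:46+09:00")

// 2) timezone supplied separately and applied to a pattern without an offset,
//    which is what a dedicated timezone option would control.
val separateZone = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
separateZone.setTimeZone(TimeZone.getTimeZone("GMT+09:00"))
separateZone.parse("2016-07-19 05:34:46")
```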
[GitHub] spark pull request #14254: Add shuffle service metrics entry in monitoring d...
GitHub user lovexi opened a pull request: https://github.com/apache/spark/pull/14254

Add shuffle service metrics entry in monitoring docs

## What changes were proposed in this pull request?

Add a shuffle service metrics entry to the list of currently supported metrics in the monitoring docs.

## How was this patch tested?

Checked the docs for the changes.

JIRA link: https://issues.apache.org/jira/browse/SPARK-16619

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lovexi/spark yangyang-monitoring-doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14254.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14254

commit abacb11005b1fb81832a12558980814021cebae1
Author: Yangyang Liu
Date: 2016-07-19T05:47:48Z

    Add shuffle service metrics entry in docs
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13778
ping @cloud-fan Can you check if this is good for you now? It has been a while. Thanks.
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r71275696

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
(the same SimplifyCastsSuite hunk quoted in full earlier in this digest, ending at the line `def map(keyType: DataType, valueType: DataType, nullable: Boolean): AttributeReference =`)
--- End diff --

Why do we need this?
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r71275743

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1441,6 +1441,12 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
 object SimplifyCasts extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
     case Cast(e, dataType) if e.dataType == dataType => e
+    case c @ Cast(e, dataType) => (e.dataType, dataType) match {
--- End diff --

cc @yhuai @liancheng, is it always safe to do this?
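Based on the truncated hunk above and the expectations in the companion test suite, the new case presumably only drops casts that relax nullability; a hedged reconstruction (not the exact patch) might look like this:

```scala
// If the only difference between the source and target types is that the
// target allows nulls where the source does not, the cast cannot change any
// value and can be removed. The opposite direction is left untouched.
case c @ Cast(e, dataType) => (e.dataType, dataType) match {
  case (ArrayType(from, false), ArrayType(to, true)) if from == to => e
  case (MapType(fk, fv, false), MapType(tk, tv, true)) if fk == tk && fv == tv => e
  case _ => c
}
```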
[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14207

> when the data/files are changed by external system (e.g., appended by a streaming system), the stored schema can be inconsistent with the actual schema of the data.

I think this problem already exists, as we will use the cached schema instead of inferring it every time. The only difference is that after a reboot, this PR will still use the stored schema and require users to refresh the table manually.
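For reference, the manual refresh mentioned above is the existing API shown below; the table name is just an example:

```scala
// Invalidates and reloads the cached metadata (and cached data) for a table
// whose underlying files were changed by an external system.
spark.catalog.refreshTable("my_table")

// Equivalently, in SQL:
spark.sql("REFRESH TABLE my_table")
```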
[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14207
@gatorsmile Yea. I meant that since you use the stored schema for the table instead of inferring it, when the data/files are changed by an external system (e.g., appended by a streaming system), the stored schema can become inconsistent with the actual schema of the data.
[GitHub] spark issue #14065: [SPARK-14743][YARN] Add a configurable token manager for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14065
**[Test build #62506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62506/consoleFull)** for PR 14065 at commit [`b8eeb28`](https://github.com/apache/spark/commit/b8eeb28b141b678cf4ccace36564f24536758132).
[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14207
@viirya Schema inference is time-consuming, especially when the number of files is huge. Thus, we should avoid refreshing it every time. That is one of the major reasons why we have a metadata cache for data source tables.
[GitHub] spark issue #14253: [Doc] improve python doc for rdd.histogram
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14253
Can one of the admins verify this patch?
[GitHub] spark pull request #14253: [Doc] improve python doc for rdd.histogram
GitHub user mortada opened a pull request: https://github.com/apache/spark/pull/14253

[Doc] improve python doc for rdd.histogram

## What changes were proposed in this pull request?

doc change only

## How was this patch tested?

doc change only

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mortada/spark histogram_typos

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14253.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14253

commit 979c7f44690c5239f49621733de112ec623e
Author: Mortada Mehyar
Date: 2016-07-19T05:22:58Z

    [Doc] improve python doc for rdd.histogram
[GitHub] spark issue #14222: [SPARK-16391][SQL] KeyValueGroupedDataset.reduceGroups s...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14222 ping @rxin Is this change OK with you? Thanks.
[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14207 @gatorsmile When the data/files are written by an external system and Spark is just used to process them in batch, does that mean the schema can become inconsistent? Or should refresh be called every time the table is queried?
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13704 Merged build finished. Test PASSed.
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13704 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62504/ Test PASSed.
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13704 **[Test build #62504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62504/consoleFull)** for PR 13704 at commit [`cbcfd56`](https://github.com/apache/spark/commit/cbcfd561d92c02395d685c46cb09cce802b22727). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14251: [SPARK-16602][SQL] `Nvl` function should support numeric...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14251 Hi, @rxin. Could you review this `Nvl` PR again? I can solve that by only replacing `findTightestCommonTypeOfTwo` with `findTightestCommonTypeToString`.
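The kind of coercion at stake can be sketched as follows; this is illustrative only, and whether these calls are accepted depends on the final patch.

```Scala
// Illustrative only; whether these coercions are accepted depends on the final patch.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("nvl-coercion").getOrCreate()

// With numeric widening, nvl should resolve to the tightest common type of its
// arguments, e.g. int and double widening to double.
spark.sql("SELECT nvl(1, 2.5)").printSchema()
```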
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r71273378 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +394,56 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimiters + */ +@ExpressionDescription( + usage = "_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text " +"into key/value pairs using delimiters. " +"Default delimiters are ',' for pairDelim and ':' for keyValueDelim.", + extended = """ > SELECT _FUNC_('a:1,b:2,c:3',',',':');\n map("a":"1","b":"2","c":"3") """) +case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: Expression) + extends TernaryExpression with CodegenFallback{ + + def this(child: Expression, pairDelim: Expression) = { +this(child, pairDelim, Literal(":")) + } + + def this(child: Expression) = { +this(child, Literal(","), Literal(":")) + } + + override def children: Seq[Expression] = Seq(text, pairDelim, keyValueDelim) + + override def dataType: DataType = MapType(StringType, StringType, valueContainsNull = false) + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- looks like it's simpler to follow `XPathExtract` to do the type check, i.e. implement `ExpectsInputTypes` to check the type, and override `checkInputDataTypes` for the foldable check.
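A rough sketch of the suggested pattern is shown below, with simplified names and the evaluation logic elided; this is not the code that was actually merged for str_to_map.

```Scala
// Rough sketch of the suggested pattern; not the merged str_to_map code.
// ExpectsInputTypes supplies the type check, and checkInputDataTypes is
// overridden only to add the foldable check.
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, TernaryExpression}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.types._

case class StringToMapSketch(text: Expression, pairDelim: Expression, keyValueDelim: Expression)
  extends TernaryExpression with ExpectsInputTypes with CodegenFallback {

  override def children: Seq[Expression] = Seq(text, pairDelim, keyValueDelim)
  override def dataType: DataType = MapType(StringType, StringType)

  // Declared input types let the analyzer insert casts or report mismatches,
  // so no manual type checking is needed here.
  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, StringType)

  // Only the extra constraint (the delimiters must be foldable) is checked by hand.
  override def checkInputDataTypes(): TypeCheckResult = {
    val defaultCheck = super.checkInputDataTypes()
    if (defaultCheck.isFailure) {
      defaultCheck
    } else if (!pairDelim.foldable || !keyValueDelim.foldable) {
      TypeCheckResult.TypeCheckFailure("the pair and key/value delimiters must be foldable")
    } else {
      TypeCheckResult.TypeCheckSuccess
    }
  }

  // Evaluation elided in this sketch.
  override protected def nullSafeEval(t: Any, pd: Any, kvd: Any): Any =
    throw new UnsupportedOperationException("sketch only")
}
```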
[GitHub] spark issue #14251: [SPARK-16602][SQL] `Nvl` function should support numeric...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14251 **[Test build #62505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62505/consoleFull)** for PR 14251 at commit [`53ae02f`](https://github.com/apache/spark/commit/53ae02f12d6d4113aa5cddaad2c7b80d902fe95e).
[GitHub] spark pull request #14155: [SPARK-16498][SQL][WIP] move hive hack for data s...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r71273081 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -146,6 +151,15 @@ case class CatalogTable( requireSubsetOfSchema(sortColumnNames, "sort") requireSubsetOfSchema(bucketColumnNames, "bucket") + lazy val userSpecifiedSchema: Option[StructType] = if (schema.nonEmpty) { --- End diff -- Oh, is this here because `CatalogColumn` uses a string for the type? Shouldn't we just use `StructType` as the schema and remove `CatalogColumn`?
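For context, converting string-typed catalog columns into a `StructType` looks roughly like the sketch below; the `ColumnLike` type is invented for illustration and this is not the code in this PR.

```Scala
// Simplified illustration with an invented ColumnLike type; not the code in this PR.
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.{StructField, StructType}

case class ColumnLike(name: String, dataType: String, nullable: Boolean = true)

// Because the catalog stores column types as strings, building a StructType
// requires parsing each type string back into a DataType.
def toStructType(cols: Seq[ColumnLike]): StructType =
  StructType(cols.map(c =>
    StructField(c.name, CatalystSqlParser.parseDataType(c.dataType), c.nullable)))
```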
[GitHub] spark pull request #14155: [SPARK-16498][SQL][WIP] move hive hack for data s...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r71272934 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala --- @@ -303,6 +303,7 @@ object CreateDataSourceTableUtils extends Logging { matcher.matches() } + // TODO: it's only used in tests, remove it. def createDataSourceTable( --- End diff -- If it is not used, how about we just remove it and update the test?
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13382 Merged build finished. Test PASSed.
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62503/ Test PASSed.
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13382 **[Test build #62503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62503/consoleFull)** for PR 13382 at commit [`0fe4bc8`](https://github.com/apache/spark/commit/0fe4bc8a0232f9e6a4dcb6df76fc3f256b784803). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14155: [SPARK-16498][SQL][WIP] move hive hack for data s...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r71272434 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -313,18 +313,48 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { * Create a [[CreateTableUsing]] or a [[CreateTableUsingAsSelect]] logical plan. */ override def visitCreateTableUsing(ctx: CreateTableUsingContext): LogicalPlan = withOrigin(ctx) { -val (table, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader) -if (external) { --- End diff -- How about we do not change this for now (until we decide which syntax to use for CREATE TABLE)? We may support it only in the syntax that is compatible with Hive tables.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62502/ Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Merged build finished. Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62502/consoleFull)** for PR 14132 at commit [`404a322`](https://github.com/apache/spark/commit/404a322686f0603c84e542c4ca8b5353dcc0f9d7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14155: [SPARK-16498][SQL][WIP] move hive hack for data s...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r71272290 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -146,6 +151,15 @@ case class CatalogTable( requireSubsetOfSchema(sortColumnNames, "sort") requireSubsetOfSchema(bucketColumnNames, "bucket") + lazy val userSpecifiedSchema: Option[StructType] = if (schema.nonEmpty) { --- End diff -- What is this?
[GitHub] spark issue #14102: [SPARK-16434][SQL] Avoid per-record type dispatch in JSO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14102 Merged build finished. Test PASSed.
[GitHub] spark issue #14102: [SPARK-16434][SQL] Avoid per-record type dispatch in JSO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14102 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62501/ Test PASSed.
[GitHub] spark issue #14102: [SPARK-16434][SQL] Avoid per-record type dispatch in JSO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14102 **[Test build #62501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62501/consoleFull)** for PR 14102 at commit [`cfe6bed`](https://github.com/apache/spark/commit/cfe6beda1a1db64aab5d2f84a68a5ee1e2bdd905). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13704 **[Test build #62504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62504/consoleFull)** for PR 13704 at commit [`cbcfd56`](https://github.com/apache/spark/commit/cbcfd561d92c02395d685c46cb09cce802b22727).
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Merged build finished. Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62500/ Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62500/consoleFull)** for PR 14132 at commit [`5ba2ad7`](https://github.com/apache/spark/commit/5ba2ad7aa6cab364e09a2c0dae529b8270aed153). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/13382 Cool, @JoshRosen I'll leave this for you to merge.
[GitHub] spark pull request #13382: [SPARK-5581][Core] When writing sorted map output...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/13382#discussion_r71266677 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala --- @@ -27,8 +27,8 @@ import org.apache.spark.util.Utils /** * A class for writing JVM objects directly to a file on disk. This class allows data to be appended - * to an existing block and can guarantee atomicity in the case of faults as it allows the caller to - * revert partial writes. + * to an existing block. Callers can write to the same file and commit these writes. + * In case of faults, callers should atomically revert the uncommitted partial writes. --- End diff -- Perhaps elaborate a bit more, e.g. "For efficiency, this class retains the underlying file channel across multiple commits to a file. The channel is kept open until close() is called on DiskBlockObjectWriter."
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Sche...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71266605 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala --- @@ -351,6 +353,44 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog { } /** + * Refresh the inferred schema stored in the external catalog for data source tables. + */ + private def refreshInferredSchema(tableIdent: TableIdentifier): Unit = { +val table = sessionCatalog.getTableMetadataOption(tableIdent) +table.foreach { tableDesc => + if (DDLUtils.isDatasourceTable(tableDesc) && DDLUtils.isSchemaInferred(tableDesc)) { +val partitionColumns = DDLUtils.getPartitionColumnsFromTableProperties(tableDesc) +val bucketSpec = DDLUtils.getBucketSpecFromTableProperties(tableDesc) +val dataSource = + DataSource( +sparkSession, +userSpecifiedSchema = None, +partitionColumns = partitionColumns, +bucketSpec = bucketSpec, +className = tableDesc.properties(CreateDataSourceTableUtils.DATASOURCE_PROVIDER), +options = tableDesc.storage.serdeProperties) +.resolveRelation().asInstanceOf[HadoopFsRelation] + +val schemaProperties = new mutable.HashMap[String, String] +CreateDataSourceTableUtils.saveSchema( + sparkSession, dataSource.schema, dataSource.partitionSchema.fieldNames, schemaProperties) + +val tablePropertiesWithoutSchema = tableDesc.properties.filterKeys { k => + // Keep the properties that are not for schema or partition columns + k != CreateDataSourceTableUtils.DATASOURCE_SCHEMA_NUMPARTS && --- End diff -- Will change it. Thanks!
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Sche...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71266596 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -487,6 +487,10 @@ object DDLUtils { isDatasourceTable(table.properties) } + def isSchemaInferred(table: CatalogTable): Boolean = { +table.properties.get(DATASOURCE_SCHEMA_TYPE) == Option(SchemaType.INFERRED.name) --- End diff -- Thanks! @rxin @jaceklaskowski I will not change it because using `contains` breaks the Scala 2.10 build.
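For readers wondering about the `== Option(...)` form: `Option.contains` only exists since Scala 2.11, so the equality comparison keeps the code compiling on the Scala 2.10 build, as sketched below.

```Scala
// Option.contains exists only since Scala 2.11, so the equality form is used instead.
val schemaType: Option[String] = Some("INFERRED")

schemaType == Option("INFERRED")    // compiles on both Scala 2.10 and 2.11; evaluates to true
// schemaType.contains("INFERRED")  // does not compile against Scala 2.10
```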
[GitHub] spark issue #10881: [SPARK-12967][Netty] Avoid NettyRpc error message during...
Github user JerryLead commented on the issue: https://github.com/apache/spark/pull/10881 This bug still exists in the latest Spark 1.6.2. How about merging it into branch-1.6? @nishkamravi2 @zsxwing
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13382 **[Test build #62503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62503/consoleFull)** for PR 13382 at commit [`0fe4bc8`](https://github.com/apache/spark/commit/0fe4bc8a0232f9e6a4dcb6df76fc3f256b784803).
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user dafrista commented on the issue: https://github.com/apache/spark/pull/13382 Jenkins, test this please.
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user dafrista commented on the issue: https://github.com/apache/spark/pull/13382 Thanks @ericl. I pushed a commit addressing your comments.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62502/consoleFull)** for PR 14132 at commit [`404a322`](https://github.com/apache/spark/commit/404a322686f0603c84e542c4ca8b5353dcc0f9d7).
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 Right, there is a `table` API, too. Thank you, I'll add that as well. By the way, I'm still downtown and need to go home for dinner, so I'll take care of that tonight. Thank you again.
[GitHub] spark pull request #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level t...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14054#discussion_r71265164 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -158,25 +159,41 @@ object JdbcUtils extends Logging { rddSchema: StructType, nullTypes: Array[Int], batchSize: Int, - dialect: JdbcDialect): Iterator[Byte] = { + dialect: JdbcDialect, + isolationLevel: Int): Iterator[Byte] = { require(batchSize >= 1, s"Invalid value `${batchSize.toString}` for parameter " + s"`${JdbcUtils.JDBC_BATCH_INSERT_SIZE}`. The minimum value is 1.") val conn = getConnection() var committed = false -val supportsTransactions = try { - conn.getMetaData().supportsDataManipulationTransactionsOnly() || - conn.getMetaData().supportsDataDefinitionAndDataManipulationTransactions() -} catch { - case NonFatal(e) => -logWarning("Exception while detecting transaction support", e) -true + +var finalIsolationLevel = Connection.TRANSACTION_NONE +if (isolationLevel != Connection.TRANSACTION_NONE) { + try { +val metadata = conn.getMetaData +if (metadata.supportsTransactions()) { + if (metadata.supportsTransactionIsolationLevel(isolationLevel)) { +finalIsolationLevel = isolationLevel + } else { +val defaultIsolation = metadata.getDefaultTransactionIsolation +logWarning(s"Requested isolation level $isolationLevel is not supported; " + +s"falling back to isolation level $defaultIsolation") +finalIsolationLevel = defaultIsolation + } +} else { + logWarning(s"Requested isolation level $isolationLevel, but transactions are unsupported") +} + } catch { +case NonFatal(e) => logWarning("Exception while detecting transaction support", e) + } } +val supportsTransactions = finalIsolationLevel != Connection.TRANSACTION_NONE --- End diff -- Yeah, if possible, the default isolation needs to be consistent. Otherwise, it might be hard to debug if users hit the issue in the production environment. Sometimes the problem is hard to reproduce, especially for isolation-related issues.
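The fallback behaviour discussed here can be summarised as a small standalone helper; this is a sketch distilled from the diff above, not the exact merged code.

```Scala
// Sketch distilled from the diff above, not the exact merged code: request an
// isolation level and degrade gracefully based on what the JDBC driver reports.
import java.sql.Connection
import scala.util.control.NonFatal

def negotiateIsolationLevel(conn: Connection, requested: Int): Int = {
  if (requested == Connection.TRANSACTION_NONE) {
    Connection.TRANSACTION_NONE
  } else {
    try {
      val md = conn.getMetaData
      if (!md.supportsTransactions()) {
        Connection.TRANSACTION_NONE                 // driver has no transactions at all
      } else if (md.supportsTransactionIsolationLevel(requested)) {
        requested                                   // driver supports the requested level
      } else {
        md.getDefaultTransactionIsolation           // fall back to the driver's default
      }
    } catch {
      case NonFatal(_) => Connection.TRANSACTION_NONE
    }
  }
}
```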
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r71265020 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1774,6 +1775,51 @@ class Analyzer( } /** + * Substitute Hints. + * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters. + */ + object SubstituteHints extends Rule[LogicalPlan] { +def apply(plan: LogicalPlan): LogicalPlan = plan transform { + case logical: LogicalPlan => logical transformDown { +case h @ Hint(name, parameters, child) +if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) => + var resolvedChild = child + + for (param <- parameters) { +val names = param.split("\\.") +val tid = if (names.length > 1) { + TableIdentifier(names(1), Some(names(0))) +} else { + TableIdentifier(param, None) +} +try { + catalog.lookupRelation(tid) + + var stop = false + resolvedChild = resolvedChild.transformDown { +case r @ BroadcastHint(SubqueryAlias(t, _)) + if !stop && resolver(t, tid.identifier) => + stop = true + r +case r @ SubqueryAlias(t, _) if !stop && resolver(t, tid.identifier) => + stop = true + BroadcastHint(r) --- End diff -- I think we have to remove it; otherwise, the result will be wrong.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14132 For your reference, below is a simple example of how users can do it with the DataFrame API:
```Scala
sql("CREATE TABLE tab1(c1 int)")
val df = spark.read.table("tab1")
df.join(broadcast(df))
```
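For the snippet above to compile, the `broadcast` function needs to be imported (assuming a `SparkSession` named `spark` is already in scope):

```Scala
import org.apache.spark.sql.functions.broadcast
```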
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 Yep. I made that case. Thanks!
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 What I mean is: how can we currently broadcast the Hive table `tab1`? I'm writing the test case.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14132 Is it related? This is the most basic test case, right?
```SQL
CREATE TABLE tab1(c1 int)
select * from tab1, tab1
```
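For comparison, a rough sketch of how the hint from this PR would attach to that self-join follows; the hint names come from the `SubstituteHints` rule in this patch, and the exact comment-style SQL syntax is illustrative rather than final.

```Scala
// Illustrative only: hint names are taken from the SubstituteHints rule in this
// patch; the exact SQL comment syntax may differ in the final change.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").enableHiveSupport().getOrCreate()
spark.sql("CREATE TABLE tab1(c1 int)")

// Mark tab1 for broadcast in a plain SQL query, then inspect the plan.
spark.sql("SELECT /*+ MAPJOIN(tab1) */ * FROM tab1 a, tab1 b").explain()
```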
[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13990 @cloud-fan Comment addressed, tests passed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 Does this work in the `DataFrame` API, too?
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14132 Not all joins have `SubqueryAlias` operators. For example, below is a self join against Hive tables:
```
== Analyzed Logical Plan ==
c1: int, c1: int
Project [c1#7, c1#8]
+- Join Inner
   :- MetastoreRelation default, tab1
   +- MetastoreRelation default, tab1
```
Thus, the current solution does not work, right?
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/13382 This LGTM with some minor comments.
[GitHub] spark pull request #13382: [SPARK-5581][Core] When writing sorted map output...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/13382#discussion_r71262947 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala --- @@ -46,102 +46,145 @@ private[spark] class DiskBlockObjectWriter( extends OutputStream with Logging { + /** + * Guards against close calls, e.g. from a wrapping stream. + * Call manualClose to close the stream that was extended by this trait. + */ + private trait ManualCloseOutputStream extends OutputStream { +abstract override def close(): Unit = { + flush() +} + +def manualClose(): Unit = { + super.close() +} + } + /** The file channel, used for repositioning / truncating the file. */ private var channel: FileChannel = null + private var mcs: ManualCloseOutputStream = null private var bs: OutputStream = null private var fos: FileOutputStream = null private var ts: TimeTrackingOutputStream = null private var objOut: SerializationStream = null private var initialized = false + private var streamOpen = false private var hasBeenClosed = false - private var commitAndCloseHasBeenCalled = false /** * Cursors used to represent positions in the file. * - * ||--- | - * ^^ ^ - * ||finalPosition - * | reportedPosition - * initialPosition + * ||---| + * ^ ^ + * |committedPosition + * reportedPosition * - * initialPosition: Offset in the file where we start writing. Immutable. * reportedPosition: Position at the time of the last update to the write metrics. - * finalPosition: Offset where we stopped writing. Set on closeAndCommit() then never changed. + * committedPosition: Offset after last committed write. * -: Current writes to the underlying file. * x: Existing contents of the file. */ - private val initialPosition = file.length() - private var finalPosition: Long = -1 - private var reportedPosition = initialPosition + private var committedPosition = file.length() + private var reportedPosition = committedPosition /** * Keep track of number of records written and also use this to periodically * output bytes written since the latter is expensive to do for each record. */ private var numRecordsWritten = 0 + private def initialize(): Unit = { +fos = new FileOutputStream(file, true) +channel = fos.getChannel() +ts = new TimeTrackingOutputStream(writeMetrics, fos) +class ManualCloseBufferedOutputStream + extends BufferedOutputStream(ts, bufferSize) with ManualCloseOutputStream +mcs = new ManualCloseBufferedOutputStream + } + def open(): DiskBlockObjectWriter = { if (hasBeenClosed) { throw new IllegalStateException("Writer already closed. Cannot be reopened.") } -fos = new FileOutputStream(file, true) -ts = new TimeTrackingOutputStream(writeMetrics, fos) -channel = fos.getChannel() -bs = compressStream(new BufferedOutputStream(ts, bufferSize)) +if (!initialized) { + initialize() + initialized = true +} +bs = compressStream(mcs) objOut = serializerInstance.serializeStream(bs) -initialized = true +streamOpen = true this } - override def close() { + /** + * Close and cleanup all resources. + * Should call after committing or reverting partial writes. 
+ */ + private def closeResources(): Unit = { if (initialized) { - Utils.tryWithSafeFinally { -if (syncWrites) { - // Force outstanding writes to disk and track how long it takes - objOut.flush() - val start = System.nanoTime() - fos.getFD.sync() - writeMetrics.incWriteTime(System.nanoTime() - start) -} - } { -objOut.close() - } - + mcs.manualClose() channel = null + mcs = null bs = null fos = null ts = null objOut = null initialized = false + streamOpen = false hasBeenClosed = true } } - def isOpen: Boolean = objOut != null + /** + * Commits any remaining partial writes and closes resources. + */ + override def close() { +if (initialized) { + Utils.tryWithSafeFinally { +commit() + } { +closeResources() + }
[GitHub] spark pull request #13382: [SPARK-5581][Core] When writing sorted map output...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/13382#discussion_r71262912 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala --- @@ -46,102 +46,145 @@ private[spark] class DiskBlockObjectWriter( extends OutputStream with Logging { + /** + * Guards against close calls, e.g. from a wrapping stream. + * Call manualClose to close the stream that was extended by this trait. --- End diff -- Could you also update the class-level comment to note the commit-and-resume behavior?