[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21260 @felixcheung This feature was discussed and this PR was started before https://issues.apache.org/jira/browse/SPARK-24434 was even brought up. Being able to mount commonly used types of volumes seems super useful for some users, so it might make sense to accept it while https://issues.apache.org/jira/browse/SPARK-24434 is still going through design review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200828393 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -344,6 +344,36 @@ case class Join( } } +/** + * Append data to an existing DataSourceV2 table. + */ +case class AppendData( +table: LogicalPlan, --- End diff -- Then it seems the code comment above can be updated?
[GitHub] spark pull request #21664: [SPARK-24678][CORE] NoClassDefFoundError will not...
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/21664#discussion_r200828384 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1049,6 +1049,13 @@ class DAGScheduler( abortStage(stage, s"Task serialization failed: $e\n${Utils.exceptionString(e)}", Some(e)) runningStages -= stage return + + case e: NoClassDefFoundError => --- End diff -- Actually, it will cause the job to hang since the state is never updated.
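A minimal sketch of why an explicit case can be needed here: `NoClassDefFoundError` is a `LinkageError`, and Scala's `scala.util.control.NonFatal` extractor does not match linkage errors. This assumes the surrounding handler in `DAGScheduler` is `NonFatal`-based, which the diff excerpt does not show; `classify` is a hypothetical helper, not Spark code.

```scala
import scala.util.control.NonFatal

object FatalErrorSketch {
  // Hypothetical helper: reports which branch a NonFatal-based handler would take.
  def classify(t: Throwable): String = t match {
    case NonFatal(_)             => "handled"
    case _: NoClassDefFoundError => "needs explicit case"
    case _                       => "other fatal error"
  }

  def main(args: Array[String]): Unit = {
    println(classify(new RuntimeException("boom")))
    println(classify(new NoClassDefFoundError("com/example/Missing")))
  }
}
```

If the error falls through every case, nothing aborts the stage, which matches the hang described in the comment above.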
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21260 @skonto Is it better to generalize the approach to match the one in https://issues.apache.org/jira/browse/SPARK-24435? Not sure if @mccheah @foxish @erikerlandson have any last thoughts.
[GitHub] spark issue #21552: [SPARK-24544][SQL] Print actual failure cause when look ...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/21552 @maropu Maybe I should do this check, as @cloud-fan mentioned?
[GitHub] spark pull request #21552: [SPARK-24544][SQL] Print actual failure cause whe...
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/21552#discussion_r200828333 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala --- @@ -131,6 +132,8 @@ private[sql] class HiveSessionCatalog( Try(super.lookupFunction(funcName, children)) match { case Success(expr) => expr case Failure(error) => +logWarning(s"Encounter a failure during looking up function:" + + s" ${Utils.exceptionString(error)}") if (functionRegistry.functionExists(funcName)) { --- End diff -- @viirya Thanks, I will set the cause for `NoSuchFunctionException` later.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21608 **[Test build #92715 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92715/testReport)** for PR 21608 at commit [`06a275b`](https://github.com/apache/spark/commit/06a275b92646f3ccdfa8dbc29af5cfd82f518007).
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92713/
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21608 Merged build finished. Test PASSed.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21608 **[Test build #92713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92713/testReport)** for PR 21608 at commit [`f9b382d`](https://github.com/apache/spark/commit/f9b382d9bb3d9d722a6afe7b36a44d9764b0145a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/21698 @jiangxb1987 Any closure sensitive to iteration order [1] is affected by this, under this set of circumstances. If we cannot solve it in a principled manner (make shuffle repeatable, which I believe you have investigated and found to be difficult?), the next best thing, until we have a performant solution, would be to expose it to users and have them deal with it (which is what I did, for example), with hints on how to accomplish it. The proposed solution will cause cascading failures for non-trivial applications (chains of shuffles), introduce high cost, and can unfortunately cause application failures and unpredictable SLAs. [1] I gave the examples of zip* and sampling, but really any user-defined closure is affected, and we cannot special-case all of them.
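To make "sensitive to iteration order" concrete, here is a small sketch in plain Scala collections (no Spark involved). `everySecond` is a hypothetical stand-in for a user closure, such as positional sampling, whose result depends on the order in which records arrive.

```scala
object OrderSensitivitySketch {
  // Keep every second record by position: the result depends on iteration order.
  def everySecond(records: Seq[Int]): Seq[Int] =
    records.zipWithIndex.collect { case (r, i) if i % 2 == 0 => r }

  def main(args: Array[String]): Unit = {
    val firstAttempt = Seq(1, 2, 3, 4) // order seen on the first run
    val retryAttempt = Seq(2, 1, 4, 3) // same records, different order on a retry
    println(everySecond(firstAttempt)) // keeps records 1 and 3
    println(everySecond(retryAttempt)) // keeps records 2 and 4
  }
}
```

Both attempts see the same set of records, yet the closure keeps different ones, which is the kind of inconsistency a recomputed, non-repeatable shuffle could expose.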
[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21728 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92714/
[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21728 Merged build finished. Test FAILed.
[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21728 **[Test build #92714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92714/testReport)** for PR 21728 at commit [`194991b`](https://github.com/apache/spark/commit/194991b0e8f6375ede6b615813974bbcf75ef036). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21707: Update for spark 2.2.2 release
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21707#discussion_r200826322 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -160,7 +160,7 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { object PROCESS_TABLES extends QueryTest with SQLTestUtils { // Tests the latest version of every release line. - val testingVersions = Seq("2.0.2", "2.1.2", "2.2.1") + val testingVersions = Seq("2.0.2", "2.1.2", "2.2.2") --- End diff -- @tgravescs . Could you replace 2.1.2 with 2.1.3, too? cc @vanzin (2.1.3 release manager). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r200826235 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -160,11 +160,29 @@ case class SparkListenerBlockUpdated(blockUpdatedInfo: BlockUpdatedInfo) extends * Periodic updates from executors. * @param execId executor id * @param accumUpdates sequence of (taskId, stageId, stageAttemptId, accumUpdates) + * @param executorUpdates executor level metrics updates */ @DeveloperApi case class SparkListenerExecutorMetricsUpdate( execId: String, -accumUpdates: Seq[(Long, Int, Int, Seq[AccumulableInfo])]) +accumUpdates: Seq[(Long, Int, Int, Seq[AccumulableInfo])], +executorUpdates: Option[Array[Long]] = None) + extends SparkListenerEvent + +/** + * Peak metric values for the executor for the stage, written to the history log at stage + * completion. + * @param execId executor id + * @param stageId stage id + * @param stageAttemptId stage attempt + * @param executorMetrics executor level metrics, indexed by MetricGetter.values + */ +@DeveloperApi +case class SparkListenerStageExecutorMetrics( +execId: String, +stageId: Int, +stageAttemptId: Int, +executorMetrics: Array[Long]) --- End diff -- We can change back to using an ExecutorMetrics class in this case. The plan was for any new metrics to be added to the end, so that there wouldn't be any change in ordering, and executorMetrics could be changed to immutable Seq[Long], but there would still be the issue of having to reference MetricGetter to find out how the metrics are indexed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
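A rough sketch of the trade-off described above: an `Array[Long]` stays compact and tolerates metrics appended at the end, but every reader needs the index-to-name mapping. The metric names and the `lookup` helper below are hypothetical and are not taken from `MetricGetter`.

```scala
object MetricIndexSketch {
  // Hypothetical metric names in a fixed order; new metrics are only appended.
  val metricNames: Vector[String] =
    Vector("jvmUsedHeap", "jvmUsedNonHeap", "onHeapExecution")

  // Array form: compact in the event log, but meaningless without metricNames.
  def lookup(values: Array[Long], name: String): Option[Long] = {
    val i = metricNames.indexOf(name)
    if (i >= 0 && i < values.length) Some(values(i)) else None
  }

  def main(args: Array[String]): Unit = {
    val fromOlderLog = Array(10L, 20L) // written before the third metric existed
    println(lookup(fromOlderLog, "jvmUsedNonHeap"))
    println(lookup(fromOlderLog, "onHeapExecution"))
  }
}
```

Appending keeps older logs readable, but callers still go through the name table, which is the coupling to `MetricGetter` that the comment points out.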
[GitHub] spark issue #21584: [SPARK-24433][K8S][WIP] Initial R Bindings for SparkR on...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21584 IMO it's fine to support one version in the image and stick with that. The tricky thing is having maintainers keep updating and testing the newer versions in the images (we have a history of not being able to keep up). Would it be possible for the integration test to build the image by running `docker-image-tool.sh -m -t testing build `, and then run the integration test with it?
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200825129 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -240,21 +238,27 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { val cls = DataSource.lookupDataSource(source, df.sparkSession.sessionState.conf) if (classOf[DataSourceV2].isAssignableFrom(cls)) { - val ds = cls.newInstance() - ds match { + val source = cls.newInstance().asInstanceOf[DataSourceV2] + source match { case ws: WriteSupport => - val options = new DataSourceOptions((extraOptions ++ -DataSourceV2Utils.extractSessionConfigs( - ds = ds.asInstanceOf[DataSourceV2], - conf = df.sparkSession.sessionState.conf)).asJava) - // Using a timestamp and a random UUID to distinguish different writing jobs. This is good - // enough as there won't be tons of writing jobs created at the same second. - val jobId = new SimpleDateFormat("MMddHHmmss", Locale.US) -.format(new Date()) + "-" + UUID.randomUUID() - val writer = ws.createWriter(jobId, df.logicalPlan.schema, mode, options) - if (writer.isPresent) { + val options = extraOptions ++ + DataSourceV2Utils.extractSessionConfigs(source, df.sparkSession.sessionState.conf) + + val relation = DataSourceV2Relation.create(source, options.toMap) + if (mode == SaveMode.Append) { runCommand(df.sparkSession, "save") { - WriteToDataSourceV2(writer.get(), df.logicalPlan) + AppendData.byName(relation, df.logicalPlan) +} + + } else { +val writer = ws.createWriter( + UUID.randomUUID.toString, df.logicalPlan.output.toStructType, mode, --- End diff -- How would random UUIDs conflict? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200824906 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java --- @@ -38,15 +38,16 @@ * If this method fails (by throwing an exception), the action will fail and no Spark job will be * submitted. * - * @param jobId A unique string for the writing job. It's possible that there are many writing - * jobs running at the same time, and the returned {@link DataSourceWriter} can - * use this job id to distinguish itself from other jobs. + * @param writeUUID A unique string for the writing job. It's possible that there are many writing --- End diff -- This is not the ID of the Spark job that is writing. I think the UUID name is more clear about what is actually passed, a unique string that identifies the write. There's also no need to make the string more complicated than a UUID since there are no guarantees about it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
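As a small illustration of the point about passing a plain UUID, the sketch below generates write identifiers with `java.util.UUID` only. `newWriteId` is a hypothetical helper name; the diff itself calls `UUID.randomUUID.toString` inline.

```scala
import java.util.UUID

object WriteIdSketch {
  // Hypothetical helper: one random (version 4) UUID per write, no timestamp prefix.
  def newWriteId(): String = UUID.randomUUID().toString

  def main(args: Array[String]): Unit = {
    val ids = Seq.fill(1000)(newWriteId())
    println(ids.head)
    println(ids.distinct.size == ids.size)
  }
}
```

The string carries no meaning beyond identifying the write, which fits the remark that there are no guarantees about its content.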
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200824639 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2120,6 +2122,99 @@ class Analyzer( } } + /** + * Resolves columns of an output table from the data in a logical plan. This rule will: + * + * - Reorder columns when the write is by name + * - Insert safe casts when data types do not match + * - Insert aliases when column names do not match + * - Detect plans that are not compatible with the output table and throw AnalysisException + */ + object ResolveOutputRelation extends Rule[LogicalPlan] { +override def apply(plan: LogicalPlan): LogicalPlan = plan transform { + case append @ AppendData(table: NamedRelation, query, isByName) + if table.resolved && query.resolved && !append.resolved => +val projection = resolveOutputColumns(table.name, table.output, query, isByName) + +if (projection != query) { + append.copy(query = projection) +} else { + append +} +} + +def resolveOutputColumns( +tableName: String, +expected: Seq[Attribute], +query: LogicalPlan, +byName: Boolean): LogicalPlan = { + + if (expected.size < query.output.size) { +throw new AnalysisException( + s"""Cannot write to '$tableName', too many data columns: + |Table columns: ${expected.map(_.name).mkString(", ")} + |Data columns: ${query.output.map(_.name).mkString(", ")}""".stripMargin) + } + + val errors = new mutable.ArrayBuffer[String]() + val resolved: Seq[NamedExpression] = if (byName) { +expected.flatMap { outAttr => + query.resolveQuoted(outAttr.name, resolver) match { +case Some(inAttr) if inAttr.nullable && !outAttr.nullable => + errors += s"Cannot write nullable values to non-null column '${outAttr.name}'" + None + +case Some(inAttr) if !outAttr.dataType.sameType(inAttr.dataType) => + Some(upcast(inAttr, outAttr)) + +case Some(inAttr) => + Some(inAttr) // matches nullability, datatype, and name + +case _ => + errors += 
s"Cannot find data for output column '${outAttr.name}'" + None + } +} + + } else { +if (expected.size > query.output.size) { --- End diff -- That check is the other direction: not enough columns. When matching by position, we need to have the same number of columns, so we add this check (we already know that there aren't too many columns, so this checks for too few). When matching by name, we can call out specific columns that are missing, which is why we do the validation differently for the two cases.
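The two validation paths can be sketched outside of Catalyst roughly as follows. This is a simplified illustration, not the rule from the PR: `Col` and `check` are hypothetical, names are compared exactly instead of through the analyzer's resolver, and types and nullability are ignored.

```scala
object OutputColumnCheckSketch {
  final case class Col(name: String)

  // Returns error messages; an empty result means the query columns fit the table.
  def check(table: Seq[Col], query: Seq[Col], byName: Boolean): Seq[String] = {
    if (table.size < query.size) {
      // Checked first for both modes: the data has more columns than the table.
      Seq("too many data columns")
    } else if (byName) {
      // By name: call out each table column that has no matching data column.
      table.filterNot(t => query.exists(_.name == t.name))
        .map(t => s"Cannot find data for output column '${t.name}'")
    } else if (table.size > query.size) {
      // By position: counts must match, and "too many" was already ruled out.
      Seq("not enough data columns")
    } else {
      Seq.empty
    }
  }
}
```

For example, writing only column `b` into a table with columns `a` and `b` reports the missing column `a` by name, while the positional path can only say that there are not enough columns.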
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200824599 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2120,6 +2122,99 @@ class Analyzer( } } + /** + * Resolves columns of an output table from the data in a logical plan. This rule will: + * + * - Reorder columns when the write is by name + * - Insert safe casts when data types do not match + * - Insert aliases when column names do not match + * - Detect plans that are not compatible with the output table and throw AnalysisException + */ + object ResolveOutputRelation extends Rule[LogicalPlan] { +override def apply(plan: LogicalPlan): LogicalPlan = plan transform { + case append @ AppendData(table: NamedRelation, query, isByName) + if table.resolved && query.resolved && !append.resolved => +val projection = resolveOutputColumns(table.name, table.output, query, isByName) + +if (projection != query) { + append.copy(query = projection) +} else { + append +} +} + +def resolveOutputColumns( +tableName: String, +expected: Seq[Attribute], +query: LogicalPlan, +byName: Boolean): LogicalPlan = { + + if (expected.size < query.output.size) { +throw new AnalysisException( + s"""Cannot write to '$tableName', too many data columns: + |Table columns: ${expected.map(_.name).mkString(", ")} + |Data columns: ${query.output.map(_.name).mkString(", ")}""".stripMargin) + } + + val errors = new mutable.ArrayBuffer[String]() + val resolved: Seq[NamedExpression] = if (byName) { +expected.flatMap { outAttr => + query.resolveQuoted(outAttr.name, resolver) match { +case Some(inAttr) if inAttr.nullable && !outAttr.nullable => + errors += s"Cannot write nullable values to non-null column '${outAttr.name}'" --- End diff -- I would much rather have a job fail fast and give a clear error message than to fail during a write. 
I can see how adding such an assertion to the plan could be useful, so I'd consider it if someone wanted to add that feature later. Right now, though, I think this is good.
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200824602 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2120,6 +2122,99 @@ class Analyzer( } } + /** + * Resolves columns of an output table from the data in a logical plan. This rule will: + * + * - Reorder columns when the write is by name + * - Insert safe casts when data types do not match + * - Insert aliases when column names do not match + * - Detect plans that are not compatible with the output table and throw AnalysisException + */ + object ResolveOutputRelation extends Rule[LogicalPlan] { +override def apply(plan: LogicalPlan): LogicalPlan = plan transform { + case append @ AppendData(table: NamedRelation, query, isByName) + if table.resolved && query.resolved && !append.resolved => +val projection = resolveOutputColumns(table.name, table.output, query, isByName) + +if (projection != query) { + append.copy(query = projection) +} else { + append +} +} + +def resolveOutputColumns( +tableName: String, +expected: Seq[Attribute], +query: LogicalPlan, +byName: Boolean): LogicalPlan = { + + if (expected.size < query.output.size) { +throw new AnalysisException( + s"""Cannot write to '$tableName', too many data columns: + |Table columns: ${expected.map(_.name).mkString(", ")} + |Data columns: ${query.output.map(_.name).mkString(", ")}""".stripMargin) + } + + val errors = new mutable.ArrayBuffer[String]() + val resolved: Seq[NamedExpression] = if (byName) { +expected.flatMap { outAttr => + query.resolveQuoted(outAttr.name, resolver) match { +case Some(inAttr) if inAttr.nullable && !outAttr.nullable => + errors += s"Cannot write nullable values to non-null column '${outAttr.name}'" + None + +case Some(inAttr) if !outAttr.dataType.sameType(inAttr.dataType) => --- End diff -- Yes, I'll update to check nested fields. 
[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21728 Merged build finished. Test PASSed.
[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21728 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/750/
[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21728 **[Test build #92714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92714/testReport)** for PR 21728 at commit [`194991b`](https://github.com/apache/spark/commit/194991b0e8f6375ede6b615813974bbcf75ef036).
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r200824532 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2120,6 +2122,99 @@ class Analyzer( } } + /** + * Resolves columns of an output table from the data in a logical plan. This rule will: + * + * - Reorder columns when the write is by name + * - Insert safe casts when data types do not match + * - Insert aliases when column names do not match + * - Detect plans that are not compatible with the output table and throw AnalysisException + */ + object ResolveOutputRelation extends Rule[LogicalPlan] { +override def apply(plan: LogicalPlan): LogicalPlan = plan transform { + case append @ AppendData(table: NamedRelation, query, isByName) --- End diff -- Yes, I agree. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21728: [SPARK-24759] [SQL] No reordering keys for broadc...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/21728 [SPARK-24759] [SQL] No reordering keys for broadcast hash join ## What changes were proposed in this pull request? As the implementation of the broadcast hash join is independent of the input hash partitioning, reordering keys is not necessary. Thus, we solve this issue by simply removing the broadcast hash join from the reordering rule in EnsureRequirements. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark cleanER Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21728.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21728 commit 194991b0e8f6375ede6b615813974bbcf75ef036 Author: Xiao Li Date: 2018-07-07T23:06:39Z remove BroadcastHashJoinExec from reorderJoinPredicates --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
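A toy illustration of the reasoning in the description, using plain Scala collections in place of Spark's operators: when one side is turned into a local hash table, the order in which the join keys are listed does not affect the result, as long as each stream-side key stays paired with its build-side key. `hashJoin` and the row representation are hypothetical and are not `BroadcastHashJoinExec`.

```scala
object BroadcastJoinSketch {
  type Row = Map[String, Any]

  // keys pairs a stream-side column with the build-side column it must equal.
  def hashJoin(
      stream: Seq[Row],
      build: Seq[Row],
      keys: Seq[(String, String)]): Seq[(Row, Row)] = {
    // The "broadcast" side becomes an in-memory hash table keyed by its join columns.
    val table: Map[Seq[Any], Seq[Row]] =
      build.groupBy(row => keys.map { case (_, b) => row(b) })
    stream.flatMap { s =>
      val probe = keys.map { case (sk, _) => s(sk) }
      table.getOrElse(probe, Seq.empty[Row]).map(b => (s, b))
    }
  }
}
```

Because no partitioning of either input is consulted, there is nothing for a key-reordering rule to line up with in this model.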
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r200824504 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java --- @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.catalog; + +import org.apache.spark.sql.catalyst.TableIdentifier; +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException; +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException; +import org.apache.spark.sql.catalyst.expressions.Expression; +import org.apache.spark.sql.types.StructType; + +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Map; + +public interface TableCatalog { + /** + * Load table metadata by {@link TableIdentifier identifier} from the catalog. + * + * @param ident a table identifier + * @return the table's metadata + * @throws NoSuchTableException If the table doesn't exist. + */ + Table loadTable(TableIdentifier ident) throws NoSuchTableException; + + /** + * Create a table in the catalog. 
+ * + * @param ident a table identifier + * @param schema the schema of the new table, as a struct type + * @return metadata for the new table + * @throws TableAlreadyExistsException If a table already exists for the identifier + */ + default Table createTable(TableIdentifier ident, +StructType schema) throws TableAlreadyExistsException { +return createTable(ident, schema, Collections.emptyList(), Collections.emptyMap()); + } + + /** + * Create a table in the catalog. + * + * @param ident a table identifier + * @param schema the schema of the new table, as a struct type + * @param properties a string map of table properties + * @return metadata for the new table + * @throws TableAlreadyExistsException If a table already exists for the identifier + */ + default Table createTable(TableIdentifier ident, +StructType schema, +Map properties) throws TableAlreadyExistsException { +return createTable(ident, schema, Collections.emptyList(), properties); + } + + /** + * Create a table in the catalog. + * + * @param ident a table identifier + * @param schema the schema of the new table, as a struct type + * @param partitions a list of expressions to use for partitioning data in the table + * @param properties a string map of table properties + * @return metadata for the new table + * @throws TableAlreadyExistsException If a table already exists for the identifier + */ + Table createTable(TableIdentifier ident, +StructType schema, +List partitions, --- End diff -- I wouldn't say this way of passing partitioning is a new feature. It's just a generalization of the existing partitioning that allows us to pass any type of partition, whether it is bucketing or column-based. As for open discussion, this was proposed in the SPIP that was fairly widely read and commented on. That SPIP was posted to the dev list a few times, too. I do appreciate you wanting to make sure there's been a chance for the community to discuss it, but there has been plenty of opportunity to comment. 
At this point, I think it's reasonable to move forward with the implementation.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21608 **[Test build #92713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92713/testReport)** for PR 21608 at commit [`f9b382d`](https://github.com/apache/spark/commit/f9b382d9bb3d9d722a6afe7b36a44d9764b0145a).
[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r200824025 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -47,15 +48,26 @@ object CommandUtils extends Logging { } } - def calculateTotalSize(sessionState: SessionState, catalogTable: CatalogTable): BigInt = { +def calculateTotalSize(spark: SparkSession, catalogTable: CatalogTable): BigInt = { + +val sessionState = spark.sessionState +val stagingDir = sessionState.conf.getConfString("hive.exec.stagingdir", ".hive-staging") + if (catalogTable.partitionColumnNames.isEmpty) { - calculateLocationSize(sessionState, catalogTable.identifier, catalogTable.storage.locationUri) + calculateLocationSize(sessionState, catalogTable.identifier, + catalogTable.storage.locationUri) } else { // Calculate table size as a sum of the visible partitions. See SPARK-21079 val partitions = sessionState.catalog.listPartitions(catalogTable.identifier) - partitions.map { p => -calculateLocationSize(sessionState, catalogTable.identifier, p.storage.locationUri) - }.sum + val paths = partitions.map(x => new Path(x.storage.locationUri.get.getPath)) + val pathFilter = new PathFilter { +override def accept(path: Path): Boolean = { + !path.getName.startsWith(stagingDir) +} + } + val fileStatusSeq = InMemoryFileIndex.bulkListLeafFiles(paths, +sessionState.newHadoopConf(), pathFilter, spark).flatMap(x => x._2) --- End diff -- Thank you, I have made the changes. Can you review this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
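The filter in the diff keeps a path unless its last component starts with the staging directory name. A dependency-free sketch of that rule is below; it uses plain strings instead of Hadoop's `Path` and `PathFilter`, and it assumes each entry is a direct child of a partition directory with an already-known size, which is a simplification made only for illustration.

```scala
object StagingFilterSketch {
  // Same rule as the diff's filter: inspect only the last path component.
  def accept(path: String, stagingDir: String): Boolean = {
    val name = path.substring(path.lastIndexOf('/') + 1)
    !name.startsWith(stagingDir)
  }

  // Sum the sizes of (path, sizeInBytes) entries that pass the filter.
  def totalSize(entries: Seq[(String, Long)], stagingDir: String): BigInt =
    entries.collect { case (p, size) if accept(p, stagingDir) => BigInt(size) }.sum
}
```

With `stagingDir` set to `.hive-staging`, an entry such as `/t/p=1/.hive-staging_x` is skipped while `/t/p=1/part-0` is counted.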
[GitHub] spark pull request #21727: [SPARK-24757][SQL] Improving the error message fo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21727 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21727: [SPARK-24757][SQL] Improving the error message for broad...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/21727 Merging to master. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21727: [SPARK-24757][SQL] Improving the error message for broad...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21727 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92712/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21727: [SPARK-24757][SQL] Improving the error message for broad...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21727 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21727: [SPARK-24757][SQL] Improving the error message for broad...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21727 **[Test build #92712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92712/testReport)** for PR 21727 at commit [`86587ed`](https://github.com/apache/spark/commit/86587edca7c1345b8a3687877b598d8389fbd56b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21667 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21667 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92711/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21667 **[Test build #92711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92711/testReport)** for PR 21667 at commit [`7266611`](https://github.com/apache/spark/commit/7266611b243000868c81f4538dd850c394fe7c20). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20933: [SPARK-23817][SQL]Migrate ORC file format read pa...
Github user tengpeng commented on a diff in the pull request: https://github.com/apache/spark/pull/20933#discussion_r200819285 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -241,39 +240,47 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { val cls = DataSource.lookupDataSource(source, df.sparkSession.sessionState.conf) if (classOf[DataSourceV2].isAssignableFrom(cls)) { val ds = cls.newInstance() - ds match { -case ws: WriteSupport => - val options = new DataSourceOptions((extraOptions ++ -DataSourceV2Utils.extractSessionConfigs( - ds = ds.asInstanceOf[DataSourceV2], - conf = df.sparkSession.sessionState.conf)).asJava) - // Using a timestamp and a random UUID to distinguish different writing jobs. This is good - // enough as there won't be tons of writing jobs created at the same second. - val jobId = new SimpleDateFormat("MMddHHmmss", Locale.US) -.format(new Date()) + "-" + UUID.randomUUID() - val writer = ws.createWriter(jobId, df.logicalPlan.schema, mode, options) - if (writer.isPresent) { -runCommand(df.sparkSession, "save") { - WriteToDataSourceV2(writer.get(), df.logicalPlan) -} - } + val (needToFallBackFileDataSourceV2, fallBackFileFormat) = ds match { +case f: FileDataSourceV2 => + val disabledV2Readers = + df.sparkSession.sessionState.conf.disabledV2FileDataSourceWriter.split(",") + (disabledV2Readers.contains(f.shortName), f.fallBackFileFormat.getCanonicalName) +case _ => (false, source) + } -// Streaming also uses the data source V2 API. So it may be that the data source implements -// v2, but has no v2 implementation for batch writes. In that case, we fall back to saving -// as though it's a V1 source. 
-case _ => saveToV1Source() + if (ds.isInstanceOf[WriteSupport] && !needToFallBackFileDataSourceV2) { +val options = new DataSourceOptions((extraOptions ++ + DataSourceV2Utils.extractSessionConfigs( +ds = ds.asInstanceOf[DataSourceV2], +conf = df.sparkSession.sessionState.conf)).asJava) +// Using a timestamp and a random UUID to distinguish different writing jobs. This is good +// enough as there won't be tons of writing jobs created at the same second. +val jobId = new SimpleDateFormat("MMddHHmmss", Locale.US) + .format(new Date()) + "-" + UUID.randomUUID() +val writer = ds.asInstanceOf[WriteSupport] + .createWriter(jobId, df.logicalPlan.schema, mode, options) --- End diff -- I am not sure I understand this: why do we use `.createWriter` here when we do not use `.createReader` in `DataFrameReader`? It seems asymmetrical to me. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
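The fallback decision in the diff above boils down to: take the V2 write path only when the source supports writes and its short name is not in a comma-separated "disabled" list. A minimal sketch of that check, assuming a hypothetical `FileSource` case class in place of the real `FileDataSourceV2` and `WriteSupport` types; the trimming of list entries is an addition not present in the diff.

```scala
object FallbackSketch {
  // Hypothetical stand-in for a file-based V2 source; only these two facts matter here.
  final case class FileSource(shortName: String, supportsWrite: Boolean)

  // Parses a comma-separated list such as "orc,parquet" into a set of names.
  // Trimming and dropping empty entries is an assumption, not taken from the diff.
  def parseDisabled(conf: String): Set[String] =
    conf.split(",").map(_.trim).filter(_.nonEmpty).toSet

  // Use the V2 writer only if the source supports writes and is not disabled.
  def useV2Writer(source: FileSource, disabledConf: String): Boolean =
    source.supportsWrite && !parseDisabled(disabledConf).contains(source.shortName)
}
```

When `useV2Writer` returns false, the caller would fall back to the V1 save path, as the removed `case _ => saveToV1Source()` branch did.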
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21556 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92710/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21556 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21556 **[Test build #92710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92710/testReport)** for PR 21556 at commit [`7128539`](https://github.com/apache/spark/commit/712853999442a611ce7b97db8dad43039268573e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21439 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21439 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92708/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21439 **[Test build #92708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92708/testReport)** for PR 21439 at commit [`f3efb1b`](https://github.com/apache/spark/commit/f3efb1b97f9366839eacbc2611e39013f8c1fcfc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21698 Thank you for your comments @mridulm! This PR focuses on resolving the RDD.repartition() correctness issue because that operation is the most commonly used, and we can still address the RDD.zip* issue with a similar approach. At first I was worried that the changes would be large and that addressing the correctness issue for multiple operations would make the PR difficult to backport. It turns out that the PR didn't change that much code, so I may include the RDD.zip* fix in this PR too. Since you are also deeply involved in the related discussion on the correctness issue caused by non-deterministic input to a shuffle, you may agree that there is no easy way to guarantee correctness without causing a serious performance drop-off. I have to insist that correctness always takes priority over performance concerns, and that we should not expect users to always remember that they may hit a known correctness bug with certain usage patterns. As for the proposed solution, there are two ways to proceed: either insert a local sort before a shuffle repartition (that is how we fixed DataFrame.repartition()), or always retry the whole stage that contains the repartition when a FetchFailure occurs. The problem with the local-sort solution is that it can't fix all the problems for RDDs (the element type of an RDD may not be sortable, and it is hard to construct an ordering for a generic type), and it can also increase the time taken by repartition() by 3X to 5X. With the approach proposed in this PR, performance should stay the same when no FetchFailure happens, and it should work for DataFrames as well as for RDDs.
I have to admit that if you have a big query running on a huge cluster and the tasks can easily hit FetchFailure, then you may see the job take more time to finish (or even fail because it reaches the maximum consecutive stage failure limit). But again, without the patch your big query may be producing wrong results, and I have to say that is even more unacceptable :( . As for the cascading cost, you are right, it makes things worse, and I don't have good advice for that issue. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
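The correctness problem discussed in this comment can be modelled in a few lines of plain Scala. This is a simplified sketch, not Spark's implementation: `RepartitionSketch`, `roundRobin`, and `sortedRoundRobin` are hypothetical names, and round-robin assignment is assumed to start at index 0. It shows that the output of a round-robin repartition depends on input order, and that a local sort removes that dependence when the element type has an ordering.

```scala
object RepartitionSketch {
  // Round-robin assignment: the i-th record in input order goes to partition i % n.
  def roundRobin[T](records: Seq[T], numPartitions: Int): Map[Int, Seq[T]] =
    records.zipWithIndex
      .groupBy { case (_, i) => i % numPartitions }
      .map { case (p, rs) => p -> rs.map(_._1) }

  // Local sort first, so the assignment no longer depends on arrival order.
  // This only works when an Ordering exists for T, which is the limitation noted above.
  def sortedRoundRobin[T: Ordering](records: Seq[T], numPartitions: Int): Map[Int, Seq[T]] =
    roundRobin(records.sorted, numPartitions)
}
```

For example, if a first attempt sees `Seq(1, 2, 3, 4)` and a retried task sees `Seq(2, 1, 4, 3)`, then with two partitions the first attempt puts 2 and 4 in partition 1 while the retry puts them in partition 0. A downstream stage that mixes partition 1 from the first attempt with partition 0 from the retry would read 2 and 4 twice and never see 1 and 3.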
[GitHub] spark issue #21727: [SPARK-24757][SQL] Improving the error message for broad...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21727 **[Test build #92712 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92712/testReport)** for PR 21727 at commit [`86587ed`](https://github.com/apache/spark/commit/86587edca7c1345b8a3687877b598d8389fbd56b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21727: [SPARK-24757][SQL] Improving the error message for broad...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21727 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21727: [SPARK-24757][SQL] Improving the error message for broad...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21727 @hvanhovell Please, have a look at the PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21727: [SPARK-24757][SQL] Improving the error message fo...
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21727 [SPARK-24757][SQL] Improving the error message for broadcast timeouts ## What changes were proposed in this pull request? In this PR, I propose giving users a tip on how to resolve timeout expiration for broadcast joins. In particular, they can increase the timeout via **spark.sql.broadcastTimeout** or disable broadcast joins altogether by setting **spark.sql.autoBroadcastJoinThreshold** to `-1`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/MaxGekk/spark-1 broadcast-timeout-error Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21727.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21727 commit 0d0b3f34a90469ba894a207639456b4b815a90e8 Author: Maxim Gekk Date: 2018-07-07T14:46:21Z Improving the error message for broadcast timeouts I added a recommendation for increasing broadcast timeout. This sentence is added to existing error message: ``` You can increase the timeout for broadcasts via ${SQLConf.BROADCAST_TIMEOUT.key} ``` Author: Maxim Gekk Closes #2801 from MaxGekk/broadcast-error-message. commit 86587edca7c1345b8a3687877b598d8389fbd56b Author: Maxim Gekk Date: 2018-07-07T15:57:36Z Remove empty line in imports --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
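As a rough illustration of the kind of message this PR describes, the sketch below builds an error string that names both configuration keys. The two keys are the ones quoted in the PR description; `BroadcastTimeoutMessageSketch`, `message`, and the exact wording are hypothetical and are not the text that was merged.

```scala
object BroadcastTimeoutMessageSketch {
  // Spark SQL configuration keys named in the PR description.
  val timeoutKey = "spark.sql.broadcastTimeout"
  val thresholdKey = "spark.sql.autoBroadcastJoinThreshold"

  // Hypothetical message builder; the wording is illustrative only.
  def message(timeoutSecs: Long): String =
    s"Could not execute broadcast in $timeoutSecs secs. " +
      s"You can increase the timeout for broadcasts via $timeoutKey " +
      s"or disable broadcast join by setting $thresholdKey to -1"
}
```

Naming the keys in the message means a user who hits the timeout can act on it without searching the configuration documentation.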
[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21667 **[Test build #92711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92711/testReport)** for PR 21667 at commit [`7266611`](https://github.com/apache/spark/commit/7266611b243000868c81f4538dd850c394fe7c20). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21667 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21667 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/749/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21667 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21192 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92707/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21192 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21192 **[Test build #92707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92707/testReport)** for PR 21192 at commit [`5384c07`](https://github.com/apache/spark/commit/5384c073a0761dbe24ec52e9474d618535ad8f69). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21669 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21669 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92705/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21669 **[Test build #92705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92705/testReport)** for PR 21669 at commit [`13b3adc`](https://github.com/apache/spark/commit/13b3adc5ffb55fbfd6572089b1f54e8bca393494). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92706/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21657 **[Test build #92706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92706/testReport)** for PR 21657 at commit [`d5921f0`](https://github.com/apache/spark/commit/d5921f08d8efa00f64f01d005e843291568c1e80). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r200813391 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -82,6 +120,30 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: (n: String, v: Any) => FilterApi.eq( intColumn(n), Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull) + +case ParquetSchemaType(DECIMAL, INT32, decimal) if pushDownDecimal => --- End diff -- Add check method to `canMakeFilterOn` and add a test case:
```scala
val decimal = new JBigDecimal(10).setScale(scale)
assert(decimal.scale() === scale)
assertResult(Some(lt(intColumn("cdecimal1"), 1000: Integer))) {
  parquetFilters.createFilter(parquetSchema, sources.LessThan("cdecimal1", decimal))
}

val decimal1 = new JBigDecimal(10).setScale(scale + 1)
assert(decimal1.scale() === scale + 1)
assertResult(None) {
  parquetFilters.createFilter(parquetSchema, sources.LessThan("cdecimal1", decimal1))
}
```
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
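The test case above relies on a scale check: a decimal literal is pushed down to an INT32-backed column only when its scale matches the column's scale, in which case the comparison uses the unscaled integer. A minimal sketch of that rule using only `java.math.BigDecimal`; `DecimalPushDownSketch` and `unscaledForInt32` are hypothetical names, and the column scale of 2 used in the example is an assumption inferred from the expected value `1000` in the test.

```scala
import java.math.{BigDecimal => JBigDecimal}

object DecimalPushDownSketch {
  // Returns the unscaled Int to compare against, or None when the literal's scale
  // differs from the column's scale and the filter should not be pushed down.
  def unscaledForInt32(value: JBigDecimal, columnScale: Int): Option[Int] =
    if (value.scale() == columnScale) Some(value.unscaledValue().intValueExact())
    else None
}
```

With a column scale of 2, `new JBigDecimal(10).setScale(2)` is stored as the unscaled value 1000, so the filter can compare integers directly. The same literal with scale 3 yields `None`, which corresponds to the `assertResult(None)` branch of the test.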
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21556 **[Test build #92710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92710/testReport)** for PR 21556 at commit [`7128539`](https://github.com/apache/spark/commit/712853999442a611ce7b97db8dad43039268573e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21556 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/748/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21556 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21690 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92709/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21690 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21690 **[Test build #92709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92709/testReport)** for PR 21690 at commit [`d1a8c60`](https://github.com/apache/spark/commit/d1a8c605e163bc09d1329cbd90560cc5165de555). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21146: [SPARK-23654][BUILD] remove jets3t as a dependenc...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/21146#discussion_r200812101 --- Diff: dev/deps/spark-deps-hadoop-2.6 --- @@ -21,8 +21,6 @@ automaton-1.11-8.jar avro-1.7.7.jar avro-ipc-1.7.7.jar avro-mapred-1.7.7-hadoop2.jar -base64-2.3.8.jar -bcprov-jdk15on-1.58.jar --- End diff -- Sounds like it's already been removed, so the other 'remnants' should go. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21690 **[Test build #92709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92709/testReport)** for PR 21690 at commit [`d1a8c60`](https://github.com/apache/spark/commit/d1a8c605e163bc09d1329cbd90560cc5165de555). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21439 **[Test build #92708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92708/testReport)** for PR 21439 at commit [`f3efb1b`](https://github.com/apache/spark/commit/f3efb1b97f9366839eacbc2611e39013f8c1fcfc). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21192 **[Test build #92707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92707/testReport)** for PR 21192 at commit [`5384c07`](https://github.com/apache/spark/commit/5384c073a0761dbe24ec52e9474d618535ad8f69). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/747/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21657 **[Test build #92706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92706/testReport)** for PR 21657 at commit [`d5921f0`](https://github.com/apache/spark/commit/d5921f08d8efa00f64f01d005e843291568c1e80). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21657 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21669 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/746/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21669 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/746/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21669 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21669 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/746/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92704/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21657 **[Test build #92704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92704/testReport)** for PR 21657 at commit [`d5921f0`](https://github.com/apache/spark/commit/d5921f08d8efa00f64f01d005e843291568c1e80). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21669 **[Test build #92705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92705/testReport)** for PR 21669 at commit [`13b3adc`](https://github.com/apache/spark/commit/13b3adc5ffb55fbfd6572089b1f54e8bca393494).
[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...
Github user yuanboliu commented on the issue: https://github.com/apache/spark/pull/21690 @koeninger Thanks for your reply. I agree with you: there is no need to use pause repeatedly. This is my test without any pause; the app master was stuck for a long time without making any progress. ![wechatworkscreenshot_abb443bd-97db-48f9-88a2-e45a65617f80](https://user-images.githubusercontent.com/5643344/42409693-b324d45e-8210-11e8-96eb-39fc359b1b42.png) I will update my patch shortly.
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user baluchicken commented on the issue: https://github.com/apache/spark/pull/21260 Nice work. One thing I would consider is including a StorageClass name option for the PersistentVolumeClaim volume type, defaulting to an empty string. Without it, the PVC will always use the default StorageClass, which may not exist in all scenarios, in which case the pod will remain in the Pending state indefinitely.
[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/21698 I did not go over the PR itself in detail, but the proposal sounds very expensive, particularly given the cascading costs involved. Also, I am not sure why we are special-casing only coalesce/repartition here: any closure that depends on the ordering of tuples is bound to fail. For example, the RDD.zip* variants, sampling in ML, etc. will suffer from the same issue. Either we fix shuffle itself to become deterministic (which I am not sure we can do efficiently), or we could simply document this issue for coalesce and the other relevant APIs, so that users do a sort when applicable, that is, when they deem the additional cost worth bearing. Note that in a lot of cases this is not an issue, for example when reading from external data stores, checkpointed data, persisted data, etc., which are typically the reasons why coalesce gets used a lot (to minimize the number of partitions).
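The ordering dependence described in this comment can be illustrated without Spark. The following is a minimal pure-Python sketch, not Spark's implementation: `round_robin` is a hypothetical stand-in for a position-based repartition, and the two input lists model the same records arriving in a different order on a first attempt and on a retried task. Sorting first, the workaround suggested for users who accept the extra cost, makes the placement independent of arrival order.

```python
def round_robin(records, num_partitions):
    """Hypothetical stand-in for a position-based repartition:
    the i-th record seen goes to partition i % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


# Same records, two arrival orders (assumed: first attempt vs. retried task).
first_attempt = ["a", "b", "c", "d"]
retry_attempt = ["b", "a", "d", "c"]

# Without a sort, placement depends on the order in which records arrive.
unsorted_first = round_robin(first_attempt, 2)
unsorted_retry = round_robin(retry_attempt, 2)

# With a sort, both attempts see the same sequence and place records identically.
sorted_first = round_robin(sorted(first_attempt), 2)
sorted_retry = round_robin(sorted(retry_attempt), 2)

placement_differs_without_sort = unsorted_first != unsorted_retry
placement_matches_with_sort = sorted_first == sorted_retry
```

The sketch only models placement; it says nothing about the cost of the sort, which is the trade-off the comment leaves to the user.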
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/21583 @foxish The problem is the same as described here: https://github.com/moby/moby/issues/32457 which should have been solved in `Docker 17.05`. As such, this is caused by an outdated version of Docker, and I am waiting for the update to happen on the Jenkins nodes (locally this works perfectly fine with the newest version of Docker).
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r200805499 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -160,11 +160,29 @@ case class SparkListenerBlockUpdated(blockUpdatedInfo: BlockUpdatedInfo) extends * Periodic updates from executors. * @param execId executor id * @param accumUpdates sequence of (taskId, stageId, stageAttemptId, accumUpdates) + * @param executorUpdates executor level metrics updates */ @DeveloperApi case class SparkListenerExecutorMetricsUpdate( execId: String, -accumUpdates: Seq[(Long, Int, Int, Seq[AccumulableInfo])]) +accumUpdates: Seq[(Long, Int, Int, Seq[AccumulableInfo])], +executorUpdates: Option[Array[Long]] = None) + extends SparkListenerEvent + +/** + * Peak metric values for the executor for the stage, written to the history log at stage + * completion. + * @param execId executor id + * @param stageId stage id + * @param stageAttemptId stage attempt + * @param executorMetrics executor level metrics, indexed by MetricGetter.values + */ +@DeveloperApi +case class SparkListenerStageExecutorMetrics( +execId: String, +stageId: Int, +stageAttemptId: Int, +executorMetrics: Array[Long]) --- End diff -- We cannot expose an array of longs in a developer API (mutability). In addition, we cannot have users needing to reference private Spark APIs to understand its meaning, particularly when the ordering can be subject to change in subsequent versions of Spark.
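The two objections in this review comment (a mutable array, and positions whose meaning lives in a private enumeration) are language-neutral, so they can be sketched outside Scala. The following is a minimal Python illustration of one alternative shape, not a proposal taken from the PR: `ExecutorMetricsView` and the metric names are hypothetical, chosen only for the example. Values are looked up by name through a read-only mapping instead of by index into a mutable array.

```python
from types import MappingProxyType


class ExecutorMetricsView:
    """Hypothetical read-only, name-keyed view of executor metrics."""

    def __init__(self, names, values):
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        # dict(...) copies the inputs; MappingProxyType blocks later writes.
        self._metrics = MappingProxyType(dict(zip(names, values)))

    def get(self, name):
        """Look a metric up by name, so callers never rely on ordering."""
        return self._metrics[name]

    def as_mapping(self):
        """Expose all metrics without allowing mutation."""
        return self._metrics


# Hypothetical metric names and values, for illustration only.
raw_values = [1024, 2048]
view = ExecutorMetricsView(["jvm_used_heap", "jvm_used_nonheap"], raw_values)

# Mutating the caller's list afterwards does not change the view.
raw_values[0] = -1
heap = view.get("jvm_used_heap")

# Attempting to write through the exposed mapping is rejected.
try:
    view.as_mapping()["jvm_used_heap"] = 0
    write_rejected = False
except TypeError:
    write_rejected = True
```

Keying by name also means a reordering of the underlying metric list in a later release would not change what a given lookup returns.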
[GitHub] spark pull request #21653: [SPARK-13343] speculative tasks that didn't commi...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/21653#discussion_r200805005 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -723,6 +723,13 @@ private[spark] class TaskSetManager( def handleSuccessfulTask(tid: Long, result: DirectTaskResult[_]): Unit = { val info = taskInfos(tid) val index = info.index +// Check if any other attempt succeeded before this and this attempt has not been handled +if (successful(index) && killedByOtherAttempt(index)) { --- End diff -- For completeness, we will also need to 'undo' the changes in `enqueueSuccessfulTask`, that is, reverse the counters in `canFetchMoreResults`. (Orthogonal to this PR): Looking at the use of `killedByOtherAttempt`, I see that there is a bug in `executorLost` w.r.t. how it is updated; hopefully a fix for SPARK-24755 won't cause issues here.
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21657 **[Test build #92704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92704/testReport)** for PR 21657 at commit [`d5921f0`](https://github.com/apache/spark/commit/d5921f08d8efa00f64f01d005e843291568c1e80).
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/745/ Test PASSed.
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Merged build finished. Test PASSed.
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21657 retest this please
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Merged build finished. Test FAILed.
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92703/ Test FAILed.
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21657 **[Test build #92703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92703/testReport)** for PR 21657 at commit [`d5921f0`](https://github.com/apache/spark/commit/d5921f08d8efa00f64f01d005e843291568c1e80). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.