[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19717 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84453/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19717 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19717 **[Test build #84453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84453/testReport)** for PR 19717 at commit [`347ed69`](https://github.com/apache/spark/commit/347ed697e3f17546af83a8cd8a9578527cf65384). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19871: [SPARK-20728][SQL] Make OrcFileFormat configurable betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19871 **[Test build #84461 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84461/testReport)** for PR 19871 at commit [`7fac88f`](https://github.com/apache/spark/commit/7fac88f3a25122d4445f2491dcfc3ebc34749186). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154866923 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.codegen + +import scala.collection.mutable + +import org.apache.spark.sql.catalyst.expressions._ + +/** + * Defines APIs used in expression code generation. + */ +object ExpressionCodegen { + + /** + * Given an expression, returns the all necessary parameters to evaluate it, so the generated + * code of this expression can be split in a function. + * The 1st string in returned tuple is the parameter strings used to call the function. + * The 2nd string in returned tuple is the parameter strings used to declare the function. + * + * Returns `None` if it can't produce valid parameters. + * + * Params to include: + * 1. Evaluated columns referred by this, children or deferred expressions. + * 2. Rows referred by this, children or deferred expressions. + * 3. Eliminated subexpressions referred bu children expressions. + */ + def getExpressionInputParams( + ctx: CodegenContext, + expr: Expression): Option[(Seq[String], Seq[String])] = { +val (inputAttrs, inputVars) = getInputVarsForChildren(ctx, expr) +val paramsFromColumns = prepareFunctionParams(ctx, inputAttrs, inputVars) + +val subExprs = getSubExprInChildren(ctx, expr) +val subExprCodes = getSubExprCodes(ctx, subExprs) +val paramsFromSubExprs = prepareFunctionParams(ctx, subExprs, subExprCodes) --- End diff -- If I get rid of the part of extracting subexpressions as parameters, some tests will be failed. Because hash aggregation uses subexpression elimination under wholestage codegen, this PR enables splitting expression codes under wholestage codegen, so it inevitably needs to include subexpression parameters. This PR doesn't yet to support `splitExpressions` in wholestage codegen. This is only applied to split codes of deeply nested expression now. But the added API can be applied to `splitExpressions` later. That is a TODO. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19871: [SPARK-20728][SQL] Make OrcFileFormat configurable betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19871 **[Test build #84460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84460/testReport)** for PR 19871 at commit [`e3f6f75`](https://github.com/apache/spark/commit/e3f6f75084006a0ceb1c2e50c47819bc9ae8e463). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19871#discussion_r154865943 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -568,8 +574,13 @@ object DataSource extends Logging { "org.apache.spark.Logging") /** Given a provider name, look up the data source class definition. */ - def lookupDataSource(provider: String): Class[_] = { -val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) + def lookupDataSource(provider: String, conf: SQLConf): Class[_] = { +val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) match { + case name if name.equalsIgnoreCase("orc") && + conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native" => +classOf[OrcFileFormat].getCanonicalName + case name => name --- End diff -- And for here, I added the following to prevent `Multiple sources found`. Last time, I missed this way. My bad. ``` + case name if name.equalsIgnoreCase("orc") && +conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "hive" => +"org.apache.spark.sql.hive.orc.OrcFileFormat" ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19882 It has more lines, doesn't it? In any way, we need helper functions. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154865049 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.codegen + +import scala.collection.mutable + +import org.apache.spark.sql.catalyst.expressions._ + +/** + * Defines APIs used in expression code generation. + */ +object ExpressionCodegen { + + /** + * Given an expression, returns the all necessary parameters to evaluate it, so the generated + * code of this expression can be split in a function. + * The 1st string in returned tuple is the parameter strings used to call the function. + * The 2nd string in returned tuple is the parameter strings used to declare the function. + * + * Returns `None` if it can't produce valid parameters. + * + * Params to include: + * 1. Evaluated columns referred by this, children or deferred expressions. + * 2. Rows referred by this, children or deferred expressions. + * 3. Eliminated subexpressions referred bu children expressions. + */ + def getExpressionInputParams( + ctx: CodegenContext, + expr: Expression): Option[(Seq[String], Seq[String])] = { +val (inputAttrs, inputVars) = getInputVarsForChildren(ctx, expr) +val paramsFromColumns = prepareFunctionParams(ctx, inputAttrs, inputVars) + +val subExprs = getSubExprInChildren(ctx, expr) +val subExprCodes = getSubExprCodes(ctx, subExprs) +val paramsFromSubExprs = prepareFunctionParams(ctx, subExprs, subExprCodes) --- End diff -- I think supporting subexpression elimination in `splitExpressions` really worth an individual JIRA, can we do it in a new PR? This PR should focus on supporting `splitExpressions` in whole stage codegen. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19871#discussion_r154864703 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -568,8 +574,13 @@ object DataSource extends Logging { "org.apache.spark.Logging") /** Given a provider name, look up the data source class definition. */ - def lookupDataSource(provider: String): Class[_] = { -val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) + def lookupDataSource(provider: String, conf: SQLConf): Class[_] = { +val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) match { + case name if name.equalsIgnoreCase("orc") && + conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native" => +classOf[OrcFileFormat].getCanonicalName + case name => name --- End diff -- sounds good --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19889: [SPARK-22690][ML] Imputer inherit HasOutputCols
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19889 No other algs output multi-column for now --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19882 > OrcTest will provide common helper functions and def format: String. Instead of having `def format: String`, can we just add `beforeAll` and `afterAll` in the test suites to set the `ORC_IMPLEMENTATION`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154863367 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -280,7 +280,7 @@ object TypeCoercion { */ object WidenSetOperationTypes extends Rule[LogicalPlan] { -def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { +def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { case p if p.analyzed => p --- End diff -- In which cases, we should still use the `analyzed ` flag? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154863115 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -280,7 +280,7 @@ object TypeCoercion { */ object WidenSetOperationTypes extends Rule[LogicalPlan] { -def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { +def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { case p if p.analyzed => p --- End diff -- Sorry? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19889: [SPARK-22690][ML] Imputer inherit HasOutputCols
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19889 **[Test build #84459 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84459/testReport)** for PR 19889 at commit [`5881b88`](https://github.com/apache/spark/commit/5881b88cfcee0989ef5e715ce4e243b501b39ec7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154863020 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -470,7 +470,7 @@ case class DataSource( }.head } // For partitioned relation r, r.schema's column ordering can be different from the column -// ordering of data.logicalPlan (partition columns are all moved after data column). This +// ordering of data.logicalPlan (partition columns are all moved after data column). This --- End diff -- Sure. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154863033 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1073,11 +1076,13 @@ class Analyzer( * The HAVING clause could also used a grouping columns that is not presented in the SELECT. */ object ResolveMissingReferences extends Rule[LogicalPlan] { -def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators { +def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp { // Skip sort with aggregate. This will be handled in ResolveAggregateFunctions + case sa @ Sort(_, _, AnalysisBarrier(child: Aggregate)) => sa case sa @ Sort(_, _, child: Aggregate) => sa - case s @ Sort(order, _, child) if !s.resolved && child.resolved => + case s @ Sort(order, _, oriChild) if !s.resolved && oriChild.resolved => --- End diff -- Ok. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154862947 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -241,7 +241,7 @@ class PlannerSuite extends SharedSQLContext { test("collapse adjacent repartitions") { val doubleRepartitioned = testData.repartition(10).repartition(20).coalesce(5) def countRepartitions(plan: LogicalPlan): Int = plan.collect { case r: Repartition => r }.length -assert(countRepartitions(doubleRepartitioned.queryExecution.logical) === 3) +assert(countRepartitions(doubleRepartitioned.queryExecution.analyzed) === 3) --- End diff -- Please see previous discussion: https://github.com/apache/spark/pull/17770/files#r118480364 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19889: [SPARK-22690][ML] Imputer inherit HasOutputCols
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/19889 [SPARK-22690][ML] Imputer inherit HasOutputCols ## What changes were proposed in this pull request? make `Imputer` inherit `HasOutputCols` ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark using_HasOutputCols Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19889.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19889 commit 5881b88cfcee0989ef5e715ce4e243b501b39ec7 Author: Zheng RuiFeng Date: 2017-12-05T07:16:32Z create pr --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154863003 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -666,7 +667,9 @@ class Analyzer( * Generate a new logical plan for the right child with different expression IDs * for all conflicting attributes. */ -private def dedupRight (left: LogicalPlan, right: LogicalPlan): LogicalPlan = { +private def dedupRight (left: LogicalPlan, oriRight: LogicalPlan): LogicalPlan = { --- End diff -- Ok. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154862787 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.codegen + +import scala.collection.mutable + +import org.apache.spark.sql.catalyst.expressions._ + +/** + * Defines APIs used in expression code generation. + */ +object ExpressionCodegen { + + /** + * Given an expression, returns the all necessary parameters to evaluate it, so the generated + * code of this expression can be split in a function. + * The 1st string in returned tuple is the parameter strings used to call the function. + * The 2nd string in returned tuple is the parameter strings used to declare the function. + * + * Returns `None` if it can't produce valid parameters. + * + * Params to include: + * 1. Evaluated columns referred by this, children or deferred expressions. + * 2. Rows referred by this, children or deferred expressions. + * 3. Eliminated subexpressions referred bu children expressions. + */ + def getExpressionInputParams( + ctx: CodegenContext, + expr: Expression): Option[(Seq[String], Seq[String])] = { +val (inputAttrs, inputVars) = getInputVarsForChildren(ctx, expr) +val paramsFromColumns = prepareFunctionParams(ctx, inputAttrs, inputVars) + +val subExprs = getSubExprInChildren(ctx, expr) +val subExprCodes = getSubExprCodes(ctx, subExprs) +val paramsFromSubExprs = prepareFunctionParams(ctx, subExprs, subExprCodes) --- End diff -- Oh, then it is no. Firstly we don't try to eliminate subexpressions before splitting codes now. Secondly, `splitExpressions` doesn't support subexpression elimination too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19871#discussion_r154862662 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -568,8 +574,13 @@ object DataSource extends Logging { "org.apache.spark.Logging") /** Given a provider name, look up the data source class definition. */ - def lookupDataSource(provider: String): Class[_] = { -val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) + def lookupDataSource(provider: String, conf: SQLConf): Class[_] = { +val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) match { + case name if name.equalsIgnoreCase("orc") && + conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native" => +classOf[OrcFileFormat].getCanonicalName + case name => name --- End diff -- So, there is no more `The ORC data source must be used with Hive support enabled`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19873 From the PR description, I am unable to tell the changes made in this PR. We need a better description to explain what is the solution proposed in this PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on execut...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/19840 @vanzin according to @ueshin 's explanation, `PYSPARK_DRIVER_PYTHON` is only for driver, if executor follows the order of [SparkSubmitCommandBuilder.java#L304](https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L304), we may not need so many configs, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19871#discussion_r154860853 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -568,8 +574,13 @@ object DataSource extends Logging { "org.apache.spark.Logging") /** Given a provider name, look up the data source class definition. */ - def lookupDataSource(provider: String): Class[_] = { -val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) + def lookupDataSource(provider: String, conf: SQLConf): Class[_] = { +val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) match { + case name if name.equalsIgnoreCase("orc") && + conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native" => +classOf[OrcFileFormat].getCanonicalName + case name => name --- End diff -- I agree with both of you. Just for explanation: The original design completely preserves the previous behavior. Without `SQLConf.ORC_IMPLEMENTATION` option, Spark doesn't know OrcFileFormat. So, in case of non-Hive support, creating data source with "orc" will fail with unknown data source. Anyway, I'm happy to update according to your advice. :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19874: [SPARK-22675] [SQL] Refactoring PropagateTypes in TypeCo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19874 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19888: [SPARK-22686][SQL] DROP TABLE IF EXISTS should not throw...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19888 **[Test build #84458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84458/testReport)** for PR 19888 at commit [`b2b4d54`](https://github.com/apache/spark/commit/b2b4d5460ce1bf156f76b346a7d670bd843f0799). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19793: [SPARK-22574] [Mesos] [Submit] Check submission r...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/19793#discussion_r154859418 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/rest/mesos/MesosRestServer.scala --- @@ -77,10 +77,16 @@ private[mesos] class MesosSubmitRequestServlet( private def buildDriverDescription(request: CreateSubmissionRequest): MesosDriverDescription = { // Required fields, including the main class because python is not yet supported val appResource = Option(request.appResource).getOrElse { - throw new SubmitRestMissingFieldException("Application jar is missing.") + throw new SubmitRestMissingFieldException("Application jar 'appResource' is missing.") } val mainClass = Option(request.mainClass).getOrElse { - throw new SubmitRestMissingFieldException("Main class is missing.") + throw new SubmitRestMissingFieldException("Main class 'mainClass' is missing.") +} +val appArgs = Option(request.appArgs).getOrElse { + throw new SubmitRestMissingFieldException("Application arguments 'appArgs' are missing.") +} +val environmentVariables = Option(request.environmentVariables).getOrElse { + throw new SubmitRestMissingFieldException("Environment variables 'environmentVariables' are missing.") --- End diff -- so this is API changes - new required arguments for the request. are folks ok with this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154859083 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1098,7 +1103,8 @@ class Analyzer( case ae: AnalysisException => s } - case f @ Filter(cond, child) if !f.resolved && child.resolved => + case f @ Filter(cond, oriChild) if !f.resolved && oriChild.resolved => --- End diff -- Use `originalChild` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154858997 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -666,7 +667,9 @@ class Analyzer( * Generate a new logical plan for the right child with different expression IDs * for all conflicting attributes. */ -private def dedupRight (left: LogicalPlan, right: LogicalPlan): LogicalPlan = { +private def dedupRight (left: LogicalPlan, oriRight: LogicalPlan): LogicalPlan = { --- End diff -- Use `originalRight` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154859044 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1073,11 +1076,13 @@ class Analyzer( * The HAVING clause could also used a grouping columns that is not presented in the SELECT. */ object ResolveMissingReferences extends Rule[LogicalPlan] { -def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators { +def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp { // Skip sort with aggregate. This will be handled in ResolveAggregateFunctions + case sa @ Sort(_, _, AnalysisBarrier(child: Aggregate)) => sa case sa @ Sort(_, _, child: Aggregate) => sa - case s @ Sort(order, _, child) if !s.resolved && child.resolved => + case s @ Sort(order, _, oriChild) if !s.resolved && oriChild.resolved => --- End diff -- Use `originalChild` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154858146 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -470,7 +470,7 @@ case class DataSource( }.head } // For partitioned relation r, r.schema's column ordering can be different from the column -// ordering of data.logicalPlan (partition columns are all moved after data column). This +// ordering of data.logicalPlan (partition columns are all moved after data column). This --- End diff -- Get rid of changes in this file. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154857858 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -666,7 +667,9 @@ class Analyzer( * Generate a new logical plan for the right child with different expression IDs * for all conflicting attributes. */ -private def dedupRight (left: LogicalPlan, right: LogicalPlan): LogicalPlan = { +private def dedupRight (left: LogicalPlan, oriRight: LogicalPlan): LogicalPlan = { --- End diff -- What is `oriRight `? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154857587 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -280,7 +280,7 @@ object TypeCoercion { */ object WidenSetOperationTypes extends Rule[LogicalPlan] { -def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { +def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { case p if p.analyzed => p --- End diff -- Why? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19888: [SPARK-22686][SQL] DROP TABLE IF EXISTS should no...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19888#discussion_r154857148 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -206,7 +206,8 @@ case class DropTableCommand( try { sparkSession.sharedState.cacheManager.uncacheQuery(sparkSession.table(tableName)) } catch { - case _: NoSuchTableException if ifExists => + case ae: AnalysisException +if ifExists && ae.cause.nonEmpty && ae.getCause.isInstanceOf[NoSuchTableException] => --- End diff -- Yep. It exists in master, too. I'll update like the example by @cloud-fan . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19874: [SPARK-22675] [SQL] Refactoring PropagateTypes in TypeCo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19874 **[Test build #84457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84457/testReport)** for PR 19874 at commit [`bcfbd45`](https://github.com/apache/spark/commit/bcfbd45ea60a42480d165160680437e3a6757035). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19874: [SPARK-22675] [SQL] Refactoring PropagateTypes in TypeCo...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19874 BTW, I did not change all the rules in `object TypeCoercion` to propagate the types, because not all of them need to propagate the types. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19874: [SPARK-22675] [SQL] Refactoring PropagateTypes in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19874#discussion_r154855820 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala --- @@ -58,7 +58,7 @@ import org.apache.spark.sql.types._ * - FLOAT and DOUBLE cause fixed-length decimals to turn into DOUBLE */ // scalastyle:on -object DecimalPrecision extends Rule[LogicalPlan] { +object DecimalPrecision extends Rule[LogicalPlan] with TypePropagation { --- End diff -- one benefit of naming it `TypeCoercionRule` is, we can just do `object DecimalPrecision extends TypeCoercionRule` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19888: [SPARK-22686][SQL] DROP TABLE IF EXISTS should no...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19888#discussion_r154855703 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -206,7 +206,8 @@ case class DropTableCommand( try { sparkSession.sharedState.cacheManager.uncacheQuery(sparkSession.table(tableName)) } catch { - case _: NoSuchTableException if ifExists => + case ae: AnalysisException +if ifExists && ae.cause.nonEmpty && ae.getCause.isInstanceOf[NoSuchTableException] => --- End diff -- I can reproduce it in master now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19873 LGTM, also cc @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154855592 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -241,7 +241,7 @@ class PlannerSuite extends SharedSQLContext { test("collapse adjacent repartitions") { val doubleRepartitioned = testData.repartition(10).repartition(20).coalesce(5) def countRepartitions(plan: LogicalPlan): Int = plan.collect { case r: Repartition => r }.length -assert(countRepartitions(doubleRepartitioned.queryExecution.logical) === 3) +assert(countRepartitions(doubleRepartitioned.queryExecution.analyzed) === 3) --- End diff -- is it a necessary change? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19871#discussion_r15480 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -568,8 +574,13 @@ object DataSource extends Logging { "org.apache.spark.Logging") /** Given a provider name, look up the data source class definition. */ - def lookupDataSource(provider: String): Class[_] = { -val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) + def lookupDataSource(provider: String, conf: SQLConf): Class[_] = { +val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) match { + case name if name.equalsIgnoreCase("orc") && + conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native" => +classOf[OrcFileFormat].getCanonicalName + case name => name --- End diff -- +1 for ^ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19874: [SPARK-22675] [SQL] Refactoring PropagateTypes in TypeCo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19874 **[Test build #84456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84456/testReport)** for PR 19874 at commit [`3c63a6a`](https://github.com/apache/spark/commit/3c63a6a4fba638e251a7e9f7dbab5cd0f710370a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19873#discussion_r154855150 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -50,41 +50,6 @@ abstract class LogicalPlan /** Returns true if this subtree has data from a streaming data source. */ def isStreaming: Boolean = children.exists(_.isStreaming == true) - /** - * Returns a copy of this node where `rule` has been recursively applied first to all of its - * children and then itself (post-order). When `rule` does not apply to a given node, it is left - * unchanged. This function is similar to `transformUp`, but skips sub-trees that have already - * been marked as analyzed. - * - * @param rule the function use to transform this nodes children - */ - def resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan = { --- End diff -- can we also remove the `analyzed` flag in this class? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19874: [SPARK-22675] [SQL] Refactoring PropagateTypes in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19874#discussion_r154854338 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala --- @@ -39,6 +40,8 @@ class TPCDSQuerySuite extends QueryTest with SharedSQLContext with BeforeAndAfte */ protected override def afterAll(): Unit = { try { + // For debugging dump some statistics about how much time was spent in various optimizer rules + logWarning(RuleExecutor.dumpTimeSpent()) --- End diff -- Let me change all the other places. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19874: [SPARK-22675] [SQL] Refactoring PropagateTypes in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19874#discussion_r154854356 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -850,3 +834,45 @@ object TypeCoercion { } } } + +trait TypePropagation extends Logging { --- End diff -- Sure. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154854328 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.codegen + +import scala.collection.mutable + +import org.apache.spark.sql.catalyst.expressions._ + +/** + * Defines APIs used in expression code generation. + */ +object ExpressionCodegen { + + /** + * Given an expression, returns the all necessary parameters to evaluate it, so the generated + * code of this expression can be split in a function. + * The 1st string in returned tuple is the parameter strings used to call the function. + * The 2nd string in returned tuple is the parameter strings used to declare the function. + * + * Returns `None` if it can't produce valid parameters. + * + * Params to include: + * 1. Evaluated columns referred by this, children or deferred expressions. + * 2. Rows referred by this, children or deferred expressions. + * 3. Eliminated subexpressions referred bu children expressions. + */ + def getExpressionInputParams( + ctx: CodegenContext, + expr: Expression): Option[(Seq[String], Seq[String])] = { +val (inputAttrs, inputVars) = getInputVarsForChildren(ctx, expr) +val paramsFromColumns = prepareFunctionParams(ctx, inputAttrs, inputVars) + +val subExprs = getSubExprInChildren(ctx, expr) +val subExprCodes = getSubExprCodes(ctx, subExprs) +val paramsFromSubExprs = prepareFunctionParams(ctx, subExprs, subExprCodes) --- End diff -- I mean before your PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19888: [SPARK-22686][SQL] DROP TABLE IF EXISTS should no...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19888#discussion_r154854071 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -206,7 +206,8 @@ case class DropTableCommand( try { sparkSession.sharedState.cacheManager.uncacheQuery(sparkSession.table(tableName)) } catch { - case _: NoSuchTableException if ifExists => + case ae: AnalysisException +if ifExists && ae.cause.nonEmpty && ae.getCause.isInstanceOf[NoSuchTableException] => --- End diff -- @dongjoon-hyun can you verify it? thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19888: [SPARK-22686][SQL] DROP TABLE IF EXISTS should no...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19888#discussion_r154854042 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -206,7 +206,8 @@ case class DropTableCommand( try { sparkSession.sharedState.cacheManager.uncacheQuery(sparkSession.table(tableName)) } catch { - case _: NoSuchTableException if ifExists => + case ae: AnalysisException +if ifExists && ae.cause.nonEmpty && ae.getCause.isInstanceOf[NoSuchTableException] => --- End diff -- hmmm, it's fixed in master? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19874: [SPARK-22675] [SQL] Refactoring PropagateTypes in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19874#discussion_r154853933 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala --- @@ -39,6 +40,8 @@ class TPCDSQuerySuite extends QueryTest with SharedSQLContext with BeforeAndAfte */ protected override def afterAll(): Unit = { try { + // For debugging dump some statistics about how much time was spent in various optimizer rules + logWarning(RuleExecutor.dumpTimeSpent()) --- End diff -- use `logInfo`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19885 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19885 **[Test build #84455 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84455/testReport)** for PR 19885 at commit [`8bbbf04`](https://github.com/apache/spark/commit/8bbbf046a8f706add8430dc6d34c551d87c4d5a1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19885 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84455/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19888: [SPARK-22686][SQL] DROP TABLE IF EXISTS should no...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19888#discussion_r154853724 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -206,7 +206,8 @@ case class DropTableCommand( try { sparkSession.sharedState.cacheManager.uncacheQuery(sparkSession.table(tableName)) } catch { - case _: NoSuchTableException if ifExists => + case ae: AnalysisException +if ifExists && ae.cause.nonEmpty && ae.getCause.isInstanceOf[NoSuchTableException] => --- End diff -- I can see it in 2.2.1. It is a warning message. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19874: [SPARK-22675] [SQL] Refactoring PropagateTypes in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19874#discussion_r154853773 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -850,3 +834,45 @@ object TypeCoercion { } } } + +trait TypePropagation extends Logging { --- End diff -- how about ``` trait TypeCoercionRule extends Rule[LogicalPlan] { protected def coerciveType(plan: LogicalPlan): LogicalPlan def apply(plan: LogicalPlan): LogicalPlan = { val newPlan = coerciveType(plan) ... } } ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19877: [SPARK-22681]Accumulator should only be updated once for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19877 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84452/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19877: [SPARK-22681]Accumulator should only be updated once for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19877 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19877: [SPARK-22681]Accumulator should only be updated once for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19877 **[Test build #84452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84452/testReport)** for PR 19877 at commit [`756f02f`](https://github.com/apache/spark/commit/756f02f1586edd14e42e32cd119e43132e9d13ee). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19852: [SPARK-22655][PySpark] Throw exception rather tha...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19852#discussion_r154852742 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -319,7 +319,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( case e: Exception if env.isStopped => logDebug("Exception thrown after context is stopped", e) -null.asInstanceOf[OUT] // exit silently +throw new SparkException("Spark session has been stopped", e) --- End diff -- It seems like java task doesn't check `env.isStopped` when task failed, and only check it when retry failed. Does python task have retry mechanism? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19888: [SPARK-22686][SQL] DROP TABLE IF EXISTS should no...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19888#discussion_r154852342 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -206,7 +206,8 @@ case class DropTableCommand( try { sparkSession.sharedState.cacheManager.uncacheQuery(sparkSession.table(tableName)) } catch { - case _: NoSuchTableException if ifExists => + case ae: AnalysisException +if ifExists && ae.cause.nonEmpty && ae.getCause.isInstanceOf[NoSuchTableException] => --- End diff -- it's in the log, which means it's not a serious bug. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19871#discussion_r154852231 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -568,8 +574,13 @@ object DataSource extends Logging { "org.apache.spark.Logging") /** Given a provider name, look up the data source class definition. */ - def lookupDataSource(provider: String): Class[_] = { -val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) + def lookupDataSource(provider: String, conf: SQLConf): Class[_] = { +val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) match { + case name if name.equalsIgnoreCase("orc") && + conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native" => +classOf[OrcFileFormat].getCanonicalName + case name => name --- End diff -- and also add comments here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19871#discussion_r154852193 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -568,8 +574,13 @@ object DataSource extends Logging { "org.apache.spark.Logging") /** Given a provider name, look up the data source class definition. */ - def lookupDataSource(provider: String): Class[_] = { -val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) + def lookupDataSource(provider: String, conf: SQLConf): Class[_] = { +val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) match { + case name if name.equalsIgnoreCase("orc") && + conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native" => +classOf[OrcFileFormat].getCanonicalName + case name => name --- End diff -- This sounds counter-intuitive, I think we should register the new orc instead of the old one. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19888: [SPARK-22686][SQL] DROP TABLE IF EXISTS should no...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19888#discussion_r154852162 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -206,7 +206,8 @@ case class DropTableCommand( try { sparkSession.sharedState.cacheManager.uncacheQuery(sparkSession.table(tableName)) } catch { - case _: NoSuchTableException if ifExists => + case ae: AnalysisException +if ifExists && ae.cause.nonEmpty && ae.getCause.isInstanceOf[NoSuchTableException] => --- End diff -- ``` scala> spark.version res2: String = 2.3.0-SNAPSHOT scala> sql("DROP TABLE IF EXISTS t") res3: org.apache.spark.sql.DataFrame = [] scala> sql("DROP TABLE IF EXISTS t") res4: org.apache.spark.sql.DataFrame = [] ``` Unable to reproduce it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19885 **[Test build #84455 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84455/testReport)** for PR 19885 at commit [`8bbbf04`](https://github.com/apache/spark/commit/8bbbf046a8f706add8430dc6d34c551d87c4d5a1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19888: [SPARK-22686][SQL] DROP TABLE IF EXISTS should no...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19888#discussion_r154851944 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -206,7 +206,8 @@ case class DropTableCommand( try { sparkSession.sharedState.cacheManager.uncacheQuery(sparkSession.table(tableName)) } catch { - case _: NoSuchTableException if ifExists => + case ae: AnalysisException +if ifExists && ae.cause.nonEmpty && ae.getCause.isInstanceOf[NoSuchTableException] => --- End diff -- can we follow https://github.com/apache/spark/pull/19713/files#diff-bc55b5f76add105ec32ae4107075b278R57 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19885 ok to test. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154850738 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,194 @@ object EstimationUtils { } } + /** + * Returns the number of the first bin into which a column values falls for a specified + * numeric equi-height histogram. + * + * @param value a literal value of a column + * @param bins an array of bins for a given numeric equi-height histogram + * @return the number of the first bin into which a column values falls. + */ + def findFirstBinForValue(value: Double, bins: Array[HistogramBin]): Int = { +var binId = 0 +bins.foreach { bin => + if (value > bin.hi) binId += 1 +} +binId + } + + /** + * Returns the number of the last bin into which a column values falls for a specified + * numeric equi-height histogram. + * + * @param value a literal value of a column + * @param bins an array of bins for a given numeric equi-height histogram + * @return the number of the last bin into which a column values falls. + */ + def findLastBinForValue(value: Double, bins: Array[HistogramBin]): Int = { +var binId = 0 +for (i <- bins.indices) { + if (value > bins(i).hi) { +// increment binId to point to next bin +binId += 1 + } + if ((value == bins(i).hi) && (i < bins.length - 1) && (value == bins(i + 1).lo)) { +// We assume the above 3 conditions will be evaluated from left to right sequentially. +// If the above 3 conditions are evaluated out-of-order, then out-of-bound error may happen. +// At that time, we should split the third condition into another if statement. +// increment binId since the value appears in this bin and next bin +binId += 1 + } +} +binId + } + + /** + * Returns a percentage of a bin holding values for column value in the range of + * [lowerValue, higherValue] + * + * @param binId a given bin id in a specified histogram + * @param higherValue a given upper bound value of a specified column value range + * @param lowerValue a given lower bound value of a specified column value range + * @param histogram a numeric equi-height histogram + * @return the percentage of a single bin holding values in [lowerValue, higherValue]. + */ + private def getOccupation( + binId: Int, + higherValue: Double, + lowerValue: Double, + histogram: Histogram): Double = { +val curBin = histogram.bins(binId) +if (binId == 0 && curBin.hi == curBin.lo) { + // the Min of the histogram occupies the whole first bin + 1.0 +} else if (binId == 0 && curBin.hi != curBin.lo) { + if (higherValue == lowerValue) { +// set percentage to 1/NDV +1.0 / curBin.ndv.toDouble + } else { +// Use proration since the range falls inside this bin. +(higherValue - lowerValue) / (curBin.hi - curBin.lo) + } +} else { + if (curBin.hi == curBin.lo) { +// the entire bin is covered in the range +1.0 + } else if (higherValue == lowerValue) { +// set percentage to 1/NDV +1.0 / curBin.ndv.toDouble + } else { +// Use proration since the range falls inside this bin. +math.min((higherValue - lowerValue) / (curBin.hi - curBin.lo), 1.0) + } +} + } + + /** + * Returns the number of bins for column values in [lowerValue, higherValue]. + * The column value distribution is saved in an equi-height histogram. + * + * @param higherEnd a given upper bound value of a specified column value range + * @param lowerEnd a given lower bound value of a specified column value range + * @param histogram a numeric equi-height histogram + * @return the selectivity percentage for column values in [lowerValue, higherValue]. + */ + def getOccupationBins( + higherEnd: Double, + lowerEnd: Double, + histogram: Histogram): Double = { +// find bins where current min and max locate +val lowerBinId = findFirstBinForValue(lowerEnd, histogram.bins) +val higherBinId = findLastBinForValue(higherEnd, histogram.bins) +assert(lowerBinId <= higherBinId) + +// compute how much current [lowerEnd, higherEnd] range occupies the histogram in the +// number of bins +getOccupationBins(higherBinId, lowerBinId, higherEnd, lowerEnd, histogram) + } +
[GitHub] spark issue #17702: [SPARK-20408][SQL] Get the glob path in parallel to redu...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/17702 gental ping @zsxwing --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19885 Is this assumption based on the implementation of Hadoop `FileSystem`? I was thinking that wasb is an exception, for other we still keep the original code. @steveloughran would you please comment. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19713: [SPARK-22488] [SQL] Fix the view resolution issue in the...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19713 Hi, All. It seems we missed `DROP TABLE IF EXISTS` cases here. (I also forgot to check that during 2.2-backporting of this PR.) This PR swallows `NoSuchTableException` and emits `AnalysisException`. `DROP TABLE IF EXISTS` expected `NoSuchTableException` in ddl.scala. It becomes a regression on 2.2.1. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154848509 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,194 @@ object EstimationUtils { } } + /** + * Returns the number of the first bin into which a column values falls for a specified + * numeric equi-height histogram. + * + * @param value a literal value of a column + * @param bins an array of bins for a given numeric equi-height histogram + * @return the number of the first bin into which a column values falls. + */ + def findFirstBinForValue(value: Double, bins: Array[HistogramBin]): Int = { +var binId = 0 +bins.foreach { bin => + if (value > bin.hi) binId += 1 +} +binId + } + + /** + * Returns the number of the last bin into which a column values falls for a specified + * numeric equi-height histogram. + * + * @param value a literal value of a column + * @param bins an array of bins for a given numeric equi-height histogram + * @return the number of the last bin into which a column values falls. + */ + def findLastBinForValue(value: Double, bins: Array[HistogramBin]): Int = { +var binId = 0 +for (i <- bins.indices) { + if (value > bins(i).hi) { +// increment binId to point to next bin +binId += 1 + } + if ((value == bins(i).hi) && (i < bins.length - 1) && (value == bins(i + 1).lo)) { +// We assume the above 3 conditions will be evaluated from left to right sequentially. +// If the above 3 conditions are evaluated out-of-order, then out-of-bound error may happen. +// At that time, we should split the third condition into another if statement. +// increment binId since the value appears in this bin and next bin +binId += 1 + } +} +binId + } + + /** + * Returns a percentage of a bin holding values for column value in the range of + * [lowerValue, higherValue] + * + * @param binId a given bin id in a specified histogram + * @param higherValue a given upper bound value of a specified column value range + * @param lowerValue a given lower bound value of a specified column value range + * @param histogram a numeric equi-height histogram + * @return the percentage of a single bin holding values in [lowerValue, higherValue]. + */ + private def getOccupation( + binId: Int, + higherValue: Double, + lowerValue: Double, + histogram: Histogram): Double = { +val curBin = histogram.bins(binId) +if (binId == 0 && curBin.hi == curBin.lo) { + // the Min of the histogram occupies the whole first bin + 1.0 +} else if (binId == 0 && curBin.hi != curBin.lo) { + if (higherValue == lowerValue) { +// set percentage to 1/NDV +1.0 / curBin.ndv.toDouble + } else { +// Use proration since the range falls inside this bin. +(higherValue - lowerValue) / (curBin.hi - curBin.lo) --- End diff -- why do we need to specialize it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19810: [SPARK-22599][SQL] In-Memory Table Pruning withou...
Github user sadikovi commented on a diff in the pull request: https://github.com/apache/spark/pull/19810#discussion_r154848470 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -193,38 +195,68 @@ case class InMemoryTableScanExec( private val inMemoryPartitionPruningEnabled = sqlContext.conf.inMemoryPartitionPruning + private def doFilterCachedBatches( + rdd: RDD[CachedBatch], + partitionStatsSchema: Seq[AttributeReference]): RDD[CachedBatch] = { +val schemaIndex = partitionStatsSchema.zipWithIndex +rdd.mapPartitionsWithIndex { + case (partitionIndex, cachedBatches) => +if (inMemoryPartitionPruningEnabled) { + cachedBatches.filter { cachedBatch => +val partitionFilter = newPredicate( + partitionFilters.reduceOption(And).getOrElse(Literal(true)), + partitionStatsSchema) +partitionFilter.initialize(partitionIndex) +if (!partitionFilter.eval(cachedBatch.stats)) { --- End diff -- All good, no need to change. I was trying to understand the code, so my question would be referring to statistics collection overall, not changes in this PR. Link points to a condition (exists before this PR) that could potentially result in exiting iterator before exhausting all records in it, so statistics would be partially collected, which might affect any filtering that uses such statistics - though it is quite possibly handled later, or a theoretical use case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154848428 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,194 @@ object EstimationUtils { } } + /** + * Returns the number of the first bin into which a column values falls for a specified + * numeric equi-height histogram. + * + * @param value a literal value of a column + * @param bins an array of bins for a given numeric equi-height histogram + * @return the number of the first bin into which a column values falls. + */ + def findFirstBinForValue(value: Double, bins: Array[HistogramBin]): Int = { +var binId = 0 +bins.foreach { bin => + if (value > bin.hi) binId += 1 +} +binId + } + + /** + * Returns the number of the last bin into which a column values falls for a specified + * numeric equi-height histogram. + * + * @param value a literal value of a column + * @param bins an array of bins for a given numeric equi-height histogram + * @return the number of the last bin into which a column values falls. + */ + def findLastBinForValue(value: Double, bins: Array[HistogramBin]): Int = { +var binId = 0 +for (i <- bins.indices) { + if (value > bins(i).hi) { +// increment binId to point to next bin +binId += 1 + } + if ((value == bins(i).hi) && (i < bins.length - 1) && (value == bins(i + 1).lo)) { +// We assume the above 3 conditions will be evaluated from left to right sequentially. +// If the above 3 conditions are evaluated out-of-order, then out-of-bound error may happen. +// At that time, we should split the third condition into another if statement. +// increment binId since the value appears in this bin and next bin +binId += 1 + } +} +binId + } + + /** + * Returns a percentage of a bin holding values for column value in the range of + * [lowerValue, higherValue] + * + * @param binId a given bin id in a specified histogram + * @param higherValue a given upper bound value of a specified column value range + * @param lowerValue a given lower bound value of a specified column value range + * @param histogram a numeric equi-height histogram + * @return the percentage of a single bin holding values in [lowerValue, higherValue]. + */ + private def getOccupation( + binId: Int, + higherValue: Double, + lowerValue: Double, + histogram: Histogram): Double = { +val curBin = histogram.bins(binId) +if (binId == 0 && curBin.hi == curBin.lo) { + // the Min of the histogram occupies the whole first bin + 1.0 +} else if (binId == 0 && curBin.hi != curBin.lo) { + if (higherValue == lowerValue) { +// set percentage to 1/NDV +1.0 / curBin.ndv.toDouble + } else { +// Use proration since the range falls inside this bin. +(higherValue - lowerValue) / (curBin.hi - curBin.lo) --- End diff -- this is the only branch we need to specialize for `binId=0`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19871#discussion_r154848336 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -568,8 +574,13 @@ object DataSource extends Logging { "org.apache.spark.Logging") /** Given a provider name, look up the data source class definition. */ - def lookupDataSource(provider: String): Class[_] = { -val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) + def lookupDataSource(provider: String, conf: SQLConf): Class[_] = { +val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) match { + case name if name.equalsIgnoreCase("orc") && + conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native" => +classOf[OrcFileFormat].getCanonicalName + case name => name --- End diff -- @cloud-fan . To avoid that issue, new OrcFileFormat is not registered intentionally. @HyukjinKwon 's comment is correct. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19888: [SPARK-22686][SQL] DROP TABLE IF EXISTS should not throw...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19888 **[Test build #84454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84454/testReport)** for PR 19888 at commit [`2b113a2`](https://github.com/apache/spark/commit/2b113a2871dc732d74b0564017bed15e53b297ab). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154848277 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,194 @@ object EstimationUtils { } } + /** + * Returns the number of the first bin into which a column values falls for a specified + * numeric equi-height histogram. + * + * @param value a literal value of a column + * @param bins an array of bins for a given numeric equi-height histogram + * @return the number of the first bin into which a column values falls. + */ + def findFirstBinForValue(value: Double, bins: Array[HistogramBin]): Int = { +var binId = 0 +bins.foreach { bin => + if (value > bin.hi) binId += 1 +} +binId + } + + /** + * Returns the number of the last bin into which a column values falls for a specified + * numeric equi-height histogram. + * + * @param value a literal value of a column + * @param bins an array of bins for a given numeric equi-height histogram + * @return the number of the last bin into which a column values falls. + */ + def findLastBinForValue(value: Double, bins: Array[HistogramBin]): Int = { +var binId = 0 +for (i <- bins.indices) { + if (value > bins(i).hi) { +// increment binId to point to next bin +binId += 1 + } + if ((value == bins(i).hi) && (i < bins.length - 1) && (value == bins(i + 1).lo)) { +// We assume the above 3 conditions will be evaluated from left to right sequentially. +// If the above 3 conditions are evaluated out-of-order, then out-of-bound error may happen. +// At that time, we should split the third condition into another if statement. +// increment binId since the value appears in this bin and next bin +binId += 1 + } +} +binId + } + + /** + * Returns a percentage of a bin holding values for column value in the range of + * [lowerValue, higherValue] + * + * @param binId a given bin id in a specified histogram + * @param higherValue a given upper bound value of a specified column value range + * @param lowerValue a given lower bound value of a specified column value range + * @param histogram a numeric equi-height histogram + * @return the percentage of a single bin holding values in [lowerValue, higherValue]. + */ + private def getOccupation( + binId: Int, + higherValue: Double, + lowerValue: Double, + histogram: Histogram): Double = { --- End diff -- instead of accepting a `binId` and `histogram`, can't we just ask the caller side to pass a `HistogramBin`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19871#discussion_r154848218 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -568,8 +574,13 @@ object DataSource extends Logging { "org.apache.spark.Logging") /** Given a provider name, look up the data source class definition. */ - def lookupDataSource(provider: String): Class[_] = { -val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) + def lookupDataSource(provider: String, conf: SQLConf): Class[_] = { +val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) match { + case name if name.equalsIgnoreCase("orc") && + conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native" => +classOf[OrcFileFormat].getCanonicalName + case name => name --- End diff -- I was looking the exact same path. It seems not because it's not registered to `ServiceLoader` (`src/main/resources/org.apache.spark.sql.sources.DataSourceRegister`). So, short name for the newer ORC source would not find be used. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19888: [SPARK-22686][SQL] DROP TABLE IF EXISTS should no...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19888 [SPARK-22686][SQL] DROP TABLE IF EXISTS should not throw AnalysisException ## What changes were proposed in this pull request? During [SPARK-22488](https://github.com/apache/spark/pull/19713) to fix view resolution issue, there occurs a regression at `2.2.1` and `master` branch like the following. This PR fixes that. ```scala scala> spark.version res2: String = 2.2.1 scala> sql("DROP TABLE IF EXISTS t").show 17/12/04 21:01:06 WARN DropTableCommand: org.apache.spark.sql.AnalysisException: Table or view not found: t; org.apache.spark.sql.AnalysisException: Table or view not found: t; ``` ## How was this patch tested? Manual. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-22686 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19888.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19888 commit 2b113a2871dc732d74b0564017bed15e53b297ab Author: Dongjoon Hyun Date: 2017-12-05T05:01:53Z [SPARK-22686][SQL] DROP TABLE IF EXISTS should not throw AnalysisException --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154848018 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,194 @@ object EstimationUtils { } } + /** + * Returns the number of the first bin into which a column values falls for a specified + * numeric equi-height histogram. + * + * @param value a literal value of a column + * @param bins an array of bins for a given numeric equi-height histogram + * @return the number of the first bin into which a column values falls. + */ + def findFirstBinForValue(value: Double, bins: Array[HistogramBin]): Int = { +var binId = 0 +bins.foreach { bin => + if (value > bin.hi) binId += 1 +} +binId + } + + /** + * Returns the number of the last bin into which a column values falls for a specified + * numeric equi-height histogram. + * + * @param value a literal value of a column + * @param bins an array of bins for a given numeric equi-height histogram + * @return the number of the last bin into which a column values falls. + */ + def findLastBinForValue(value: Double, bins: Array[HistogramBin]): Int = { --- End diff -- Why is this method so different from `findFirstBinForValue`? It looks like we just need to reverse the iteration order, i.e. from `bins.length` to `0`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154847802 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,194 @@ object EstimationUtils { } } + /** + * Returns the number of the first bin into which a column values falls for a specified + * numeric equi-height histogram. + * + * @param value a literal value of a column + * @param bins an array of bins for a given numeric equi-height histogram + * @return the number of the first bin into which a column values falls. + */ + def findFirstBinForValue(value: Double, bins: Array[HistogramBin]): Int = { +var binId = 0 +bins.foreach { bin => + if (value > bin.hi) binId += 1 --- End diff -- this looks more like a while loop pattern, can we use while loop here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154847703 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,194 @@ object EstimationUtils { } } + /** + * Returns the number of the first bin into which a column values falls for a specified + * numeric equi-height histogram. + * + * @param value a literal value of a column + * @param bins an array of bins for a given numeric equi-height histogram + * @return the number of the first bin into which a column values falls. + */ + def findFirstBinForValue(value: Double, bins: Array[HistogramBin]): Int = { +var binId = 0 +bins.foreach { bin => + if (value > bin.hi) binId += 1 --- End diff -- shall we use binary search here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19871#discussion_r154847064 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -568,8 +574,13 @@ object DataSource extends Logging { "org.apache.spark.Logging") /** Given a provider name, look up the data source class definition. */ - def lookupDataSource(provider: String): Class[_] = { -val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) + def lookupDataSource(provider: String, conf: SQLConf): Class[_] = { +val provider1 = backwardCompatibilityMap.getOrElse(provider, provider) match { + case name if name.equalsIgnoreCase("orc") && + conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native" => +classOf[OrcFileFormat].getCanonicalName + case name => name --- End diff -- if `ORC_IMPLEMENTATION` is `hive`, we leave the provider as it was, which may be `orc`. Then we will hit `Multiple sources found` issue, aren't we? Both the old and new orc has the same short name `orc`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...
Github user merlintang commented on the issue: https://github.com/apache/spark/pull/19885 @jerryshao yes, hdfs://us...@nn1.com:8020 and hdfs://us...@nn1.com:8020 would consider as two filesystem, since the authority information should be taken into consideration. that is why need to add the authority to check two FS. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19878 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19717#discussion_r154846305 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -388,6 +388,32 @@ class SparkSubmitSuite conf.get("spark.ui.enabled") should be ("false") } + test("handles k8s cluster mode") { +val clArgs = Seq( + "--deploy-mode", "cluster", + "--master", "k8s://host:port", + "--executor-memory", "5g", + "--class", "org.SomeClass", + "--kubernetes-namespace", "foo", + "--driver-memory", "4g", + "--conf", "spark.kubernetes.driver.docker.image=bar", + "/home/thejar.jar", + "arg1") +val appArgs = new SparkSubmitArguments(clArgs) +val (childArgs, classpath, conf, mainClass) = prepareSubmitEnvironment(appArgs) + +val childArgsMap = childArgs.grouped(2).map(a => a(0) -> a(1)).toMap +childArgsMap.get("--primary-java-resource") should be (Some("file:/home/thejar.jar")) +childArgsMap.get("--main-class") should be (Some("org.SomeClass")) +childArgsMap.get("--arg") should be (Some("arg1")) +mainClass should be ("org.apache.spark.deploy.k8s.submit.Client") +classpath should have length (0) +conf.get("spark.executor.memory") should be ("5g") --- End diff -- Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19878: [SPARK-22682][SQL] HashExpression does not need to creat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19878 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19878#discussion_r154845901 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala --- @@ -730,23 +776,29 @@ case class HiveHash(children: Seq[Expression]) extends HashExpression[Int] { input: String, result: String, fields: Array[StructField]): String = { -val localResult = ctx.freshName("localResult") val childResult = ctx.freshName("childResult") -fields.zipWithIndex.map { case (field, index) => +val fieldsHash = fields.zipWithIndex.map { case (field, index) => + val computeFieldHash = nullSafeElementHash( +input, index.toString, field.nullable, field.dataType, childResult, ctx) s""" - $childResult = 0; - ${nullSafeElementHash(input, index.toString, field.nullable, field.dataType, - childResult, ctx)} - $localResult = (31 * $localResult) + $childResult; - """ -}.mkString( - s""" - int $localResult = 0; - int $childResult = 0; - """, - "", - s"$result = (31 * $result) + $localResult;" -) + |$childResult = 0; + |$computeFieldHash + |$result = (31 * $result) + $childResult; + """.stripMargin +} + +s"${ctx.JAVA_INT} $childResult = 0;\n" + ctx.splitExpressions( --- End diff -- Yea, the input here is a row that may be produced by `row.getStruct` instead of `ctx.INPUT_ROW`, so we don't need this check as the input won't be `ctx.currentVars`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19869 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19869 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19810: [SPARK-22599][SQL] In-Memory Table Pruning without Extra...
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19810 @sadikovi thanks for the review, I replied in comments --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19810: [SPARK-22599][SQL] In-Memory Table Pruning withou...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19810#discussion_r154845669 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -193,38 +195,68 @@ case class InMemoryTableScanExec( private val inMemoryPartitionPruningEnabled = sqlContext.conf.inMemoryPartitionPruning + private def doFilterCachedBatches( + rdd: RDD[CachedBatch], + partitionStatsSchema: Seq[AttributeReference]): RDD[CachedBatch] = { +val schemaIndex = partitionStatsSchema.zipWithIndex +rdd.mapPartitionsWithIndex { + case (partitionIndex, cachedBatches) => +if (inMemoryPartitionPruningEnabled) { + cachedBatches.filter { cachedBatch => +val partitionFilter = newPredicate( + partitionFilters.reduceOption(And).getOrElse(Literal(true)), + partitionStatsSchema) +partitionFilter.initialize(partitionIndex) +if (!partitionFilter.eval(cachedBatch.stats)) { --- End diff -- I might not understand your proposal well...are you trying to simplify the logic in https://github.com/apache/spark/pull/19810/files#diff-5fc188468d3066580ea9a766114b8f1dR74? it would make the code simpler but degrade pruning effect here, --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19746 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84451/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19746 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19746 **[Test build #84451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84451/testReport)** for PR 19746 at commit [`c3d1c5e`](https://github.com/apache/spark/commit/c3d1c5eb2a81124b05a9f7b066f9e8ce46efe217). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19717 **[Test build #84453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84453/testReport)** for PR 19717 at commit [`347ed69`](https://github.com/apache/spark/commit/347ed697e3f17546af83a8cd8a9578527cf65384). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19810: [SPARK-22599][SQL] In-Memory Table Pruning withou...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19810#discussion_r154844579 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -52,6 +52,68 @@ object InMemoryRelation { private[columnar] case class CachedBatch(numRows: Int, buffers: Array[Array[Byte]], stats: InternalRow) +private[columnar] class CachedBatchIterator( +rowIterator: Iterator[InternalRow], +output: Seq[Attribute], +batchSize: Int, +useCompression: Boolean, +batchStats: LongAccumulator, +singleBatchPerPartition: Boolean) extends Iterator[CachedBatch] { + + def next(): CachedBatch = { +val columnBuilders = output.map { attribute => + ColumnBuilder(attribute.dataType, batchSize, attribute.name, useCompression) +}.toArray + +var rowCount = 0 +var totalSize = 0L + +val terminateLoop = (singleBatch: Boolean, rowIter: Iterator[InternalRow], --- End diff -- to make getting partition stats easier, we construct only one CatchedBatch for each partition when enabling the functionality proposed in this PR. `singleBatch` distinguishes the scenarios which enables/disables the functionality by introducing different while loop termination conditions, making the other code reusable --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19882#discussion_r154843730 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala --- @@ -614,11 +531,63 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest { } } - test("read from multiple orc input paths") { - val path1 = Utils.createTempDir() - val path2 = Utils.createTempDir() - makeOrcFile((1 to 10).map(Tuple1.apply), path1) - makeOrcFile((1 to 10).map(Tuple1.apply), path2) - assertResult(20)(read.orc(path1.getCanonicalPath, path2.getCanonicalPath).count()) - } + test("read from multiple orc input paths") { +val path1 = Utils.createTempDir() +val path2 = Utils.createTempDir() +makeOrcFile((1 to 10).map(Tuple1.apply), path1) +makeOrcFile((1 to 10).map(Tuple1.apply), path2) +val df = spark.read.format(format).load(path1.getCanonicalPath, path2.getCanonicalPath) +assert(df.count() == 20) + } +} + +class OrcQuerySuite extends OrcQueryTest with SharedSQLContext { + import testImplicits._ + + test("LZO compression options for writing to an ORC file") { +withTempPath { file => + spark.range(0, 10).write +.option("compression", "LZO") +.format(format) +.save(file.getCanonicalPath) + + val maybeOrcFile = file.listFiles().find(_.getName.endsWith(".lzo.orc")) + assert(maybeOrcFile.isDefined) + + val orcFilePath = new Path(maybeOrcFile.get.getAbsolutePath) + val conf = OrcFile.readerOptions(new Configuration()) + assert("LZO" === OrcFile.createReader(orcFilePath, conf).getCompressionKind.name) +} + } + + test("Schema discovery on empty ORC files") { +// SPARK-8501 is fixed. +withTempPath { dir => + val path = dir.getCanonicalPath + + withTable("empty_orc") { +withTempView("empty", "single") { + spark.sql( +s"""CREATE TABLE empty_orc(key INT, value STRING) + |USING $format + |LOCATION '${dir.toURI}' + """.stripMargin) + + val emptyDF = Seq.empty[(Int, String)].toDF("key", "value").coalesce(1) + emptyDF.createOrReplaceTempView("empty") + + // This creates 1 empty ORC file with Hive ORC SerDe. We are using this trick because + // Spark SQL ORC data source always avoids write empty ORC files. --- End diff -- Thank you for review, @viirya . I'll update tomorrow. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19717 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19717 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84450/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19717 **[Test build #84450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84450/testReport)** for PR 19717 at commit [`cdca57e`](https://github.com/apache/spark/commit/cdca57ece845c86ac16d6089460f28e3ff3c298f). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org