[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r152443133 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml] ( // TODO: Check more carefully about whether this whole class will be included in a closure. /** Per-vector transform function */ - private val transformFunc: Vector => Vector = { + private lazy val transformFunc: Vector => Vector = { --- End diff -- I'm not super familiar with scala closures, but if we do ``` def transformFunc: Vector => Vector = { val sortedCatFeatureIndices = categoryMaps.keys.toArray.sorted val localVectorMap = categoryMaps val localNumFeatures = numFeatures val localHandleInvalid = getHandleInvalid val f : Vector => Vector = { (v: Vector) => ...} f } ``` Won't we still get a closure and avoid serializing the entire model? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
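The pattern MrBago sketches can be illustrated with a toy model (hypothetical names, not the actual VectorIndexer code): copying the fields of `this` into local vals inside the method means the returned function closes over those locals only, not over the enclosing object, so shipping the function to executors would not require serializing the whole model.

```scala
// Toy sketch of the "copy fields to locals before building the closure"
// pattern discussed above. ToyModel and its fields are assumptions for
// illustration, not Spark's real API.
class ToyModel(val numFeatures: Int, val categoryMaps: Map[Int, Map[Double, Int]]) {
  def transformFunc: Array[Double] => Array[Double] = {
    val localNumFeatures = numFeatures   // copied out of `this`
    val localMaps = categoryMaps         // copied out of `this` (unused in this toy transform)
    // The closure references only the local vals above, never `this`.
    val f: Array[Double] => Array[Double] =
      (v: Array[Double]) => v.take(localNumFeatures)
    f
  }
}
```

Because `transformFunc` is a `def` here, each call rebuilds the function from the current field values, which is also what makes the `lazy val` question in the next comment relevant.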
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152442892 --- Diff: pom.xml --- @@ -2648,6 +2648,13 @@ + + kubernetes + +resource-managers/kubernetes/core --- End diff -- Both YARN and Mesos lack a sub-directory called `core`; are we going to add more modules?
[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r152442655 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml] ( // TODO: Check more carefully about whether this whole class will be included in a closure. /** Per-vector transform function */ - private val transformFunc: Vector => Vector = { + private lazy val transformFunc: Vector => Vector = { --- End diff -- Can I ask about the lazy val here? What happens here if a user sets the params, calls transform, modifies one of the params, and then calls transform again? Is that a concern?
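The staleness concern raised here is easy to reproduce with a minimal class (assumed names, not Spark's classes): a `lazy val` is computed once on first use, so a param changed after the first `transform` call is silently ignored by later calls.

```scala
// Minimal illustration of the lazy-val staleness concern above.
class ToyTransformer {
  var handleInvalid: String = "error"
  private lazy val transformFunc: Int => String = {
    val captured = handleInvalid        // frozen at first evaluation of the lazy val
    (i: Int) => s"$captured:$i"
  }
  def transform(i: Int): String = transformFunc(i)
}
```

Calling `transform`, then setting `handleInvalid = "skip"`, then calling `transform` again still uses the originally captured `"error"` value.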
[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19767#discussion_r152442103 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -64,52 +64,22 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi val trueEval = trueValue.genCode(ctx) val falseEval = falseValue.genCode(ctx) -// place generated code of condition, true value and false value in separate methods if -// their code combined is large -val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length --- End diff -- BTW if it's really an issue, we can add splitting logic in non-leaf/non-unary nodes. This is much less work than before because: 1. no need to care about unary nodes 2. the splitting logic can be simpler because all children are guaranteed to generate less than 1000 LOC.

[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19746 **[Test build #84089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84089/testReport)** for PR 19746 at commit [`2b1ed0a`](https://github.com/apache/spark/commit/2b1ed0a3a85385f9b4042415889335942b65b9c9).
[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/19746 jenkins retest this please
[GitHub] spark issue #19767: [SPARK-22543][SQL] fix java 64kb compile error for deepl...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19767 @felixcheung I'd like to keep it in master only; it has a larger impact than the other related PRs.
[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19767#discussion_r152441210 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -64,52 +64,22 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi val trueEval = trueValue.genCode(ctx) val falseEval = falseValue.genCode(ctx) -// place generated code of condition, true value and false value in separate methods if -// their code combined is large -val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length --- End diff -- There is no way to guarantee it with the current string-based codegen framework, even without this PR: code of length 1000 may still compile to 64KB of bytecode in the end. `1024` is not a precise estimate at all; it seems fairly arbitrary to me, so multiplying it by 2 doesn't seem like a big issue. CASE WHEN may have an issue.
[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 @kiszk @maropu any of you wanna take this over? This patch becomes important as we now split code more aggressively.
[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19767#discussion_r152436838 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -64,52 +64,22 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi val trueEval = trueValue.genCode(ctx) val falseEval = falseValue.genCode(ctx) -// place generated code of condition, true value and false value in separate methods if -// their code combined is large -val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length --- End diff -- I see two problems with this. One: even if the two children's code doesn't exceed the threshold individually, a method under 64KB but over 8KB is still big and bad for JIT. The other: we estimate using source-code length, and I'm not sure two children of length 1000 can't still generate a 64KB method in the end.
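The heuristic being debated can be sketched in a few lines (a toy version, not Spark's actual codegen): generated source length is only a proxy for bytecode size, so any fixed threshold (1024 in the PR) is an estimate rather than a guarantee, and two children each under the threshold can still exceed it combined.

```scala
// Toy version of the length-based splitting heuristic under discussion.
// `fragments` stands in for the generated code of an expression's children.
def shouldSplit(fragments: Seq[String], threshold: Int = 1024): Boolean =
  fragments.map(_.length).sum > threshold
```

For example, two 600-character fragments each stay under 1024 individually but exceed it together, which is exactly the combined-size case viirya raises.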
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #84088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84088/testReport)** for PR 19468 at commit [`b75b413`](https://github.com/apache/spark/commit/b75b4136352d4606a41ce2b3fe1c7e31fdf71ffc).
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19468 retest this please
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19776 **[Test build #84087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84087/testReport)** for PR 19776 at commit [`7a19ac6`](https://github.com/apache/spark/commit/7a19ac63fcdae6b67ff989ca90d4a3652c7d02f3).
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152429647 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +testTranslateFilter(EqualTo(attrInt, 1), Some(sources.EqualTo("cint", 1))) +testTranslateFilter(EqualTo(1, attrInt), Some(sources.EqualTo("cint", 1))) + +testTranslateFilter(EqualNullSafe(attrStr, Literal(null)), + Some(sources.EqualNullSafe("cstr", null))) +testTranslateFilter(EqualNullSafe(Literal(null), attrStr), + Some(sources.EqualNullSafe("cstr", null))) + +testTranslateFilter(GreaterThan(attrInt, 1), Some(sources.GreaterThan("cint", 1))) +testTranslateFilter(GreaterThan(1, attrInt), Some(sources.LessThan("cint", 1))) + +testTranslateFilter(LessThan(attrInt, 1), Some(sources.LessThan("cint", 1))) +testTranslateFilter(LessThan(1, attrInt), Some(sources.GreaterThan("cint", 1))) + +testTranslateFilter(GreaterThanOrEqual(attrInt, 1), Some(sources.GreaterThanOrEqual("cint", 1))) +testTranslateFilter(GreaterThanOrEqual(1, attrInt), Some(sources.LessThanOrEqual("cint", 1))) + +testTranslateFilter(LessThanOrEqual(attrInt, 1), Some(sources.LessThanOrEqual("cint", 1))) +testTranslateFilter(LessThanOrEqual(1, attrInt), Some(sources.GreaterThanOrEqual("cint", 1))) + +testTranslateFilter(InSet(attrInt, Set(1, 2, 3)), Some(sources.In("cint", Array(1, 2, 3)))) + +testTranslateFilter(In(attrInt, Seq(1, 2, 3)), Some(sources.In("cint", Array(1, 2, 3)))) + +testTranslateFilter(IsNull(attrInt), Some(sources.IsNull("cint"))) +testTranslateFilter(IsNotNull(attrInt), Some(sources.IsNotNull("cint"))) + +// cint > 1 AND cint < 10 +testTranslateFilter(And( + GreaterThan(attrInt, 1), + LessThan(attrInt, 10)), + Some(sources.And( +sources.GreaterThan("cint", 1), +sources.LessThan("cint", 10)))) + +// cint >= 8 OR cint <= 2 +testTranslateFilter(Or( + GreaterThanOrEqual(attrInt, 8), + LessThanOrEqual(attrInt, 2)), + Some(sources.Or( +sources.GreaterThanOrEqual("cint", 8), +sources.LessThanOrEqual("cint", 2)))) + +testTranslateFilter(Not(GreaterThanOrEqual(attrInt, 8)), + Some(sources.Not(sources.GreaterThanOrEqual("cint", 8)))) + +testTranslateFilter(StartsWith(attrStr, "a"), Some(sources.StringStartsWith("cstr", "a"))) + +testTranslateFilter(EndsWith(attrStr, "a"), Some(sources.StringEndsWith("cstr", "a"))) + +testTranslateFilter(Contains(attrStr, "a"), Some(sources.StringContains("cstr", "a"))) + } + + test("translate complex expression") { +val attrInt = 'cint.int + +// ABS(cint) - 2 = 1 +testTranslateFilter(LessThanOrEqual( + // Expressions are not supported --- End diff -- good catch @_@ fixed the typo. Thanks!
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19776 LGTM
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152425778 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +testTranslateFilter(EqualTo(attrInt, 1), Some(sources.EqualTo("cint", 1))) +testTranslateFilter(EqualTo(1, attrInt), Some(sources.EqualTo("cint", 1))) + +testTranslateFilter(EqualNullSafe(attrStr, Literal(null)), + Some(sources.EqualNullSafe("cstr", null))) +testTranslateFilter(EqualNullSafe(Literal(null), attrStr), + Some(sources.EqualNullSafe("cstr", null))) + +testTranslateFilter(GreaterThan(attrInt, 1), Some(sources.GreaterThan("cint", 1))) +testTranslateFilter(GreaterThan(1, attrInt), Some(sources.LessThan("cint", 1))) + +testTranslateFilter(LessThan(attrInt, 1), Some(sources.LessThan("cint", 1))) +testTranslateFilter(LessThan(1, attrInt), Some(sources.GreaterThan("cint", 1))) + +testTranslateFilter(GreaterThanOrEqual(attrInt, 1), Some(sources.GreaterThanOrEqual("cint", 1))) +testTranslateFilter(GreaterThanOrEqual(1, attrInt), Some(sources.LessThanOrEqual("cint", 1))) + +testTranslateFilter(LessThanOrEqual(attrInt, 1), Some(sources.LessThanOrEqual("cint", 1))) +testTranslateFilter(LessThanOrEqual(1, attrInt), Some(sources.GreaterThanOrEqual("cint", 1))) + +testTranslateFilter(InSet(attrInt, Set(1, 2, 3)), Some(sources.In("cint", Array(1, 2, 3)))) + +testTranslateFilter(In(attrInt, Seq(1, 2, 3)), Some(sources.In("cint", Array(1, 2, 3)))) + +testTranslateFilter(IsNull(attrInt), Some(sources.IsNull("cint"))) +testTranslateFilter(IsNotNull(attrInt), Some(sources.IsNotNull("cint"))) + +// cint > 1 AND cint < 10 +testTranslateFilter(And( + GreaterThan(attrInt, 1), + LessThan(attrInt, 10)), + Some(sources.And( +sources.GreaterThan("cint", 1), +sources.LessThan("cint", 10)))) + +// cint >= 8 OR cint <= 2 +testTranslateFilter(Or( + GreaterThanOrEqual(attrInt, 8), + LessThanOrEqual(attrInt, 2)), + Some(sources.Or( +sources.GreaterThanOrEqual("cint", 8), +sources.LessThanOrEqual("cint", 2)))) + +testTranslateFilter(Not(GreaterThanOrEqual(attrInt, 8)), + Some(sources.Not(sources.GreaterThanOrEqual("cint", 8)))) + +testTranslateFilter(StartsWith(attrStr, "a"), Some(sources.StringStartsWith("cstr", "a"))) + +testTranslateFilter(EndsWith(attrStr, "a"), Some(sources.StringEndsWith("cstr", "a"))) + +testTranslateFilter(Contains(attrStr, "a"), Some(sources.StringContains("cstr", "a"))) + } + + test("translate complex expression") { +val attrInt = 'cint.int + +// ABS(cint) - 2 = 1 +testTranslateFilter(LessThanOrEqual( + // Expressions are not supported --- End diff -- ?
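The operator-flipping behavior these tests exercise can be sketched with a hypothetical mini-AST (not Spark's real `Expression`/`Filter` classes): when the literal is on the left of a comparison, the operator direction flips during translation to a data-source filter.

```scala
// Sketch of the translation rule behind tests like
// testTranslateFilter(GreaterThan(1, attrInt), Some(sources.LessThan("cint", 1))).
// All types here are toy stand-ins for illustration.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(v: Int) extends Expr
case class Gt(l: Expr, r: Expr) extends Expr

sealed trait SourceFilter
case class SrcGt(col: String, v: Int) extends SourceFilter
case class SrcLt(col: String, v: Int) extends SourceFilter

def translate(e: Expr): Option[SourceFilter] = e match {
  case Gt(Attr(a), Lit(v)) => Some(SrcGt(a, v)) // cint > 1  stays >
  case Gt(Lit(v), Attr(a)) => Some(SrcLt(a, v)) // 1 > cint  becomes  cint < 1
  case _                   => None              // unsupported shapes are not pushed down
}
```

Returning `None` for unsupported shapes mirrors how `translateFilter` declines to push down expressions it cannot represent.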
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 ping @yanboliang
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user jliwork commented on the issue: https://github.com/apache/spark/pull/19776 Thanks for everyone's comments! I have polished the test cases.
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152421950 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,307 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, 1)) +} --- End diff -- Thanks! I've followed your suggestion and the test suite looks cleaner now.
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152422048 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,307 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, 1)) +} +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(1, attrInt)) +} + +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(attrStr, Literal(null))) +} +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(Literal(null), attrStr)) +} + +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(attrInt, 1)) +} +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(1, attrInt)) +} + +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(attrInt, 1)) +} +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(1, attrInt)) +} + +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + 
DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.InSet(attrInt, Set(1, 2, 3))) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.In(attrInt, Seq(1, 2, 3))) +} + +assertResult(Some(sources.IsNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNull(attrInt)) +} +assertResult(Some(sources.IsNotNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNotNull(attrInt)) +} + +assertResult(Some(sources.And( + sources.GreaterThan("cint", 1), + sources.LessThan("cint", 10 { + DataSourceStrategy.translateFilter(expressions.And( +expressions.GreaterThan(attrInt, 1), +expressions.LessThan(attrInt, 10) + )) +} + +assertResult(Some(sources.Or( + sources.GreaterThanOrEqual("cint", 8), +
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152421982 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -296,9 +296,15 @@ class JDBCSuite extends SparkFunSuite // The older versions of spark have this kind of bugs in parquet data source. val df1 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2 AND NAME != 'mary')") --- End diff -- Fixed.
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152421773 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,307 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ --- End diff -- Thanks for the suggestion. Fixed.
[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19776 **[Test build #84086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84086/testReport)** for PR 19776 at commit [`a0b3d4e`](https://github.com/apache/spark/commit/a0b3d4e990cd7024b532593bca321499001fc89b).
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84082/ Test PASSed.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user foxish commented on the issue: https://github.com/apache/spark/pull/19468 retest this please. Failure looks unrelated.
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19601 Merged build finished. Test PASSed.
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19601 **[Test build #84082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84082/testReport)** for PR 19601 at commit [`9b6b890`](https://github.com/apache/spark/commit/9b6b890b0444f3a20e73691528b59ad21edb07b8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19468 Merged build finished. Test FAILed.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19468 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84084/ Test FAILed.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #84084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84084/testReport)** for PR 19468 at commit [`b75b413`](https://github.com/apache/spark/commit/b75b4136352d4606a41ce2b3fe1c7e31fdf71ffc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19767#discussion_r152417537 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -105,6 +105,36 @@ abstract class Expression extends TreeNode[Expression] { val isNull = ctx.freshName("isNull") val value = ctx.freshName("value") val ve = doGenCode(ctx, ExprCode("", isNull, value)) + + // TODO: support whole stage codegen too + if (ve.code.trim.length > 1024 && ctx.INPUT_ROW != null && ctx.currentVars == null) { +val setIsNull = if (ve.isNull != "false" && ve.isNull != "true") { + val globalIsNull = ctx.freshName("globalIsNull") + ctx.addMutableState("boolean", globalIsNull, s"$globalIsNull = false;") + val localIsNull = ve.isNull + ve.isNull = globalIsNull + s"$globalIsNull = $localIsNull;" +} else { + "" +} + +val javaType = ctx.javaType(dataType) +val newValue = ctx.freshName("value") + +val funcName = ctx.freshName(nodeName) +val funcFullName = ctx.addNewFunction(funcName, + s""" + |private $javaType $funcName(InternalRow ${ctx.INPUT_ROW}) { --- End diff -- cc @rednaxelafx @mgaido91 too
[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r152417440 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,71 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper { if (j.joinType == newJoinType) f else Filter(condition, j.copy(joinType = newJoinType)) } } + +/** + * A rule that uses propagated constraints to infer join conditions. The optimization is applicable + * only to CROSS joins. + * + * For instance, if there is a CROSS join, where the left relation has 'a = 1' and the right + * relation has 'b = 1', then the rule infers 'a = b' as a join predicate. + */ +object InferJoinConditionsFromConstraints extends Rule[LogicalPlan] with PredicateHelper { + + def apply(plan: LogicalPlan): LogicalPlan = { +if (SQLConf.get.constraintPropagationEnabled) { + inferJoinConditions(plan) +} else { + plan +} + } + + private def inferJoinConditions(plan: LogicalPlan): LogicalPlan = plan transform { +case join @ Join(left, right, Cross, conditionOpt) => + val leftConstraints = join.constraints.filter(_.references.subsetOf(left.outputSet)) + val rightConstraints = join.constraints.filter(_.references.subsetOf(right.outputSet)) + val inferredJoinPredicates = inferJoinPredicates(leftConstraints, rightConstraints) + + val newConditionOpt = conditionOpt match { +case Some(condition) => + val existingPredicates = splitConjunctivePredicates(condition) + val newPredicates = findNewPredicates(inferredJoinPredicates, existingPredicates) + if (newPredicates.nonEmpty) Some(And(newPredicates.reduce(And), condition)) else None +case None => + inferredJoinPredicates.reduceOption(And) + } + if (newConditionOpt.isDefined) Join(left, right, Inner, newConditionOpt) else join --- End diff -- Please see the rule `CheckCartesianProducts `. The example above is not a CROSS join. 
---
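The core idea of the rule discussed above — matching constant equality constraints from the two sides of a cross join to infer an equi-join predicate — can be sketched outside Catalyst as follows. This is a simplified, hypothetical model (constraints as `(attribute, constant)` pairs), not the actual Catalyst implementation:

```python
# Sketch: infer 'a = b' join predicates from per-side constant constraints.
# A constraint like 'a = 1' is modeled as the pair ("a", 1).

def infer_join_predicates(left_constraints, right_constraints):
    """Return (left_attr, right_attr) pairs whose attributes are
    constrained to equal the same constant on both sides."""
    by_const = {}
    for attr, const in left_constraints:
        by_const.setdefault(const, []).append(attr)
    predicates = []
    for right_attr, const in right_constraints:
        for left_attr in by_const.get(const, []):
            predicates.append((left_attr, right_attr))
    return predicates

# Left relation has 'a = 1', right relation has 'b = 1' => infer 'a = b'.
print(infer_join_predicates([("a", 1)], [("b", 1)]))  # [('a', 'b')]
```

With an inferred predicate in hand, the rule can then rewrite the CROSS join into an INNER join with that condition, which is what the `Join(left, right, Inner, newConditionOpt)` branch above does.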
[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r152412423 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,71 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper { if (j.joinType == newJoinType) f else Filter(condition, j.copy(joinType = newJoinType)) } } + +/** + * A rule that uses propagated constraints to infer join conditions. The optimization is applicable + * only to CROSS joins. + * + * For instance, if there is a CROSS join, where the left relation has 'a = 1' and the right + * relation has 'b = 1', then the rule infers 'a = b' as a join predicate. + */ +object InferJoinConditionsFromConstraints extends Rule[LogicalPlan] with PredicateHelper { --- End diff -- Yes. Since we decided to focus on cross joins only, we should rename it to `EliminateCrossJoin`, as you proposed. ---
[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r152415250 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,79 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper { if (j.joinType == newJoinType) f else Filter(condition, j.copy(joinType = newJoinType)) } } + +/** + * A rule that uses propagated constraints to infer join conditions. The optimization is applicable + * only to CROSS joins. For other join types, adding inferred join conditions would potentially + * shuffle children as child node's partitioning won't satisfy the JOIN node's requirements + * which otherwise could have. + * + * For instance, if there is a CROSS join, where the left relation has 'a = 1' and the right + * relation has 'b = 1', then the rule infers 'a = b' as a join predicate. + */ +object InferJoinConditionsFromConstraints extends Rule[LogicalPlan] with PredicateHelper { + + def apply(plan: LogicalPlan): LogicalPlan = { +if (SQLConf.get.constraintPropagationEnabled) { + inferJoinConditions(plan) +} else { + plan +} + } + + private def inferJoinConditions(plan: LogicalPlan): LogicalPlan = plan transform { +case join @ Join(left, right, Cross, conditionOpt) => + + val rightEqualToPredicates = join.constraints.collect { +case equalTo @ EqualTo(attr: Attribute, _) if isAttributeContainedInPlan(attr, right) => + equalTo +case equalTo @ EqualTo(_, attr: Attribute) if isAttributeContainedInPlan(attr, right) => + equalTo + } + + val inferredJoinPredicates = join.constraints.flatMap { +case EqualTo(attr: Attribute, equivalentExpr) if isAttributeContainedInPlan(attr, left) => + collectJoinPredicates(attr, equivalentExpr, right, rightEqualToPredicates) +case EqualTo(equivalentExpr, attr: Attribute) if isAttributeContainedInPlan(attr, left) => + collectJoinPredicates(attr, equivalentExpr, right, rightEqualToPredicates) +case _ => Nil + } + + val 
newConditionOpt = conditionOpt match { +case Some(condition) => + val existingPredicates = splitConjunctivePredicates(condition) + val newPredicates = findNewPredicates(inferredJoinPredicates, existingPredicates) + if (newPredicates.nonEmpty) Some(And(newPredicates.reduce(And), condition)) else None +case None => + inferredJoinPredicates.reduceOption(And) + } + if (newConditionOpt.isDefined) Join(left, right, Inner, newConditionOpt) else join + } + + private def collectJoinPredicates( + leftAttr: Attribute, + equivalentExpr: Expression, + rightPlan: LogicalPlan, + rightPlanEqualToPredicates: Set[EqualTo]): Set[EqualTo] = { + +rightPlanEqualToPredicates.collect { + case EqualTo(attr: Attribute, expr) +if expr.semanticEquals(equivalentExpr) && isAttributeContainedInPlan(attr, rightPlan) => +EqualTo(leftAttr, attr) + case EqualTo(expr, attr: Attribute) +if expr.semanticEquals(equivalentExpr) && isAttributeContainedInPlan(attr, rightPlan) => +EqualTo(leftAttr, attr) +} + } + + private def isAttributeContainedInPlan(attr: Attribute, logicalPlan: LogicalPlan): Boolean = { +attr.references.subsetOf(logicalPlan.outputSet) + } + + private def findNewPredicates( + inferredPredicates: Set[EqualTo], + existingPredicates: Seq[Expression]) : Set[EqualTo] = { --- End diff -- `existingPredicates: Seq[Expression]) : Set[EqualTo] = {` -> `existingPredicates: Seq[Expression]): Set[EqualTo] = {` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
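The `findNewPredicates` helper above filters out inferred predicates that are already present in the join condition, using semantic equality (so `a = b` and `b = a` count as the same predicate). A minimal sketch of that deduplication step, using a hypothetical order-insensitive canonical form in place of Catalyst's `semanticEquals`:

```python
# Sketch: keep only inferred equality predicates not already present in the
# join condition, treating 'a = b' and 'b = a' as the same predicate.

def canonical(pred):
    """Order-insensitive canonical form of an equality predicate (a, b)."""
    a, b = pred
    return (a, b) if a <= b else (b, a)

def find_new_predicates(inferred, existing):
    existing_canon = {canonical(p) for p in existing}
    return [p for p in inferred if canonical(p) not in existing_canon]

# ('a', 'b') already exists as ('b', 'a'), so only ('c', 'd') is new.
print(find_new_predicates([("a", "b"), ("c", "d")], [("b", "a")]))
```

Real Catalyst expressions need `semanticEquals` rather than a tuple sort, since the two sides of an `EqualTo` are arbitrary expressions, but the filtering logic is the same shape.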
[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19767#discussion_r152417333 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -105,6 +105,36 @@ abstract class Expression extends TreeNode[Expression] { val isNull = ctx.freshName("isNull") val value = ctx.freshName("value") val ve = doGenCode(ctx, ExprCode("", isNull, value)) + + // TODO: support whole stage codegen too + if (ve.code.trim.length > 1024 && ctx.INPUT_ROW != null && ctx.currentVars == null) { +val setIsNull = if (ve.isNull != "false" && ve.isNull != "true") { + val globalIsNull = ctx.freshName("globalIsNull") + ctx.addMutableState("boolean", globalIsNull, s"$globalIsNull = false;") + val localIsNull = ve.isNull + ve.isNull = globalIsNull + s"$globalIsNull = $localIsNull;" +} else { + "" +} + +val javaType = ctx.javaType(dataType) +val newValue = ctx.freshName("value") + +val funcName = ctx.freshName(nodeName) +val funcFullName = ctx.addNewFunction(funcName, + s""" + |private $javaType $funcName(InternalRow ${ctx.INPUT_ROW}) { --- End diff -- To continue the discussion in https://github.com/apache/spark/pull/19767#discussion_r151631456 I think more global variables can be eliminated by leveraging the method return value. However, in some cases we use global variables to avoid creating an object on each iteration, so we face a trade-off between GC overhead and global-variable overhead. It would be great if Java had something like a C struct and could allocate objects on the method stack... ---
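The splitting strategy in the diff above — when an expression's generated code grows past a size threshold, move it into a separate named method and emit a call site instead of inlining it — is what keeps each generated Java method under the JVM's 64KB bytecode limit. A toy sketch of that mechanism (names like `add_new_function` loosely mirror Spark's `CodegenContext` but are illustrative, not its actual API):

```python
# Sketch: split oversized generated code fragments into helper functions,
# mirroring the idea behind CodegenContext.addNewFunction in the diff above.

THRESHOLD = 1024  # the diff uses ve.code.trim.length > 1024 as its heuristic

class CodegenContext:
    def __init__(self):
        self.functions = []   # collected helper function definitions
        self._counter = 0

    def fresh_name(self, prefix):
        self._counter += 1
        return f"{prefix}_{self._counter}"

    def add_new_function(self, name, body):
        self.functions.append(f"def {name}(row):\n{body}")
        return name

def gen_code(ctx, inline_code):
    """Return the code to splice into the caller; split out big fragments."""
    if len(inline_code) > THRESHOLD:
        name = ctx.add_new_function(ctx.fresh_name("expr"), inline_code)
        return f"{name}(row)"   # call site replaces the inlined body
    return inline_code

ctx = CodegenContext()
small = gen_code(ctx, "x = row[0] + 1")   # stays inlined
big = gen_code(ctx, "y = 0\n" * 2000)     # replaced by a call: expr_1(row)
```

The trade-off cloud-fan describes shows up here too: returning the value from the helper avoids a mutable "global" (a field on the generated class), but when the computed value is an object, a field lets the generated code reuse one instance across rows instead of allocating per call.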
[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19767#discussion_r152414088 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -64,52 +64,22 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi val trueEval = trueValue.genCode(ctx) val falseEval = falseValue.genCode(ctx) -// place generated code of condition, true value and false value in separate methods if -// their code combined is large -val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length --- End diff -- I already explained it in https://github.com/apache/spark/pull/19767#issuecomment-345176286 Mostly it's OK because the threshold is just an estimate; making it two times larger is not a big deal. CASE WHEN may be a problem, and we can evaluate it in #18641 after this PR is merged. ---
[GitHub] spark issue #19767: [SPARK-22543][SQL] fix java 64kb compile error for deepl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19767 **[Test build #84085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84085/testReport)** for PR 19767 at commit [`d126977`](https://github.com/apache/spark/commit/d126977bbdd221287b0825fa78c04b1065d97ab1). ---
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19730 thanks, merging to master! ---
[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19786 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84078/ Test FAILed. ---
[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19786 Merged build finished. Test FAILed. ---
[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19786 **[Test build #84078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84078/testReport)** for PR 19786 at commit [`8cfa0b2`](https://github.com/apache/spark/commit/8cfa0b23cdf44c37452835beee88e1eb4e7f4cfc). * This patch **fails from timeout after a configured wait of `250m`**. * This patch merges cleanly. * This patch adds no public classes. ---
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19730 ---
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84081/ Test PASSed. ---
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19779 Merged build finished. Test PASSed. ---
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19779 **[Test build #84081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84081/testReport)** for PR 19779 at commit [`a2560b3`](https://github.com/apache/spark/commit/a2560b3105e7ff712758573ca157703b4397de8b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. ---
[GitHub] spark pull request #19792: [SPARK-22566][PYTHON] Better error message for `_...
Github user ebuildy commented on a diff in the pull request: https://github.com/apache/spark/pull/19792#discussion_r152397263 --- Diff: python/pyspark/sql/types.py --- @@ -1108,19 +1109,22 @@ def _has_nulltype(dt): return isinstance(dt, NullType) -def _merge_type(a, b): +def _merge_type(a, b, name=None): if isinstance(a, NullType): return b elif isinstance(b, NullType): return a elif type(a) is not type(b): # TODO: type cast (such as int -> long) -raise TypeError("Can not merge type %s and %s" % (type(a), type(b))) +if name is not None: --- End diff -- Easier to read as: ``` if name is None: raise TypeError("Can not merge type %s and %s" % (type(a), type(b))) else: raise TypeError("Can not merge type %s and %s in column %s" % (type(a), type(b), name)) ``` ---
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19730 Merged build finished. Test PASSed. ---
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19730 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84079/ Test PASSed. ---
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19730 **[Test build #84079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84079/testReport)** for PR 19730 at commit [`574691f`](https://github.com/apache/spark/commit/574691f3fa048129591fb4bafe3bdb07ebf5517e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. ---
[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19774 ---
[GitHub] spark pull request #19389: [SPARK-22165][SQL] Fixes type conflicts between d...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19389 ---
[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19774 thanks, merging to master! ---
[GitHub] spark issue #19389: [SPARK-22165][SQL] Fixes type conflicts between double, ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19389 thanks, merging to master! ---
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #84084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84084/testReport)** for PR 19468 at commit [`b75b413`](https://github.com/apache/spark/commit/b75b4136352d4606a41ce2b3fe1c7e31fdf71ffc). ---
[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152383898 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala --- @@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging { } } +test(s"$version: SPARK-17920: Insert into/overwrite external avro table") { + withTempDir { dir => +val path = dir.getAbsolutePath +val schemaPath = s"""$path${File.separator}avroschemadir""" + +new File(schemaPath).mkdir() +val avroSchema = + """{ +| "name": "test_record", +| "type": "record", +| "fields": [ { +|"name": "f0", +|"type": [ +| "null", +| { +|"precision": 38, +|"scale": 2, +|"type": "bytes", +|"logicalType": "decimal" +| } +|] +| } ] +|} + """.stripMargin +val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc""" +new java.io.PrintWriter(schemaurl) { write(avroSchema); close() } +val url = Thread.currentThread().getContextClassLoader.getResource("avroDecimal") +val srcLocation = new File(url.getFile) +val destTableName = "tab1" +val srcTableName = "tab2" + +withTable(srcTableName, destTableName) { + versionSpark.sql( +s""" + |CREATE TABLE $srcTableName + |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' + |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true') + |STORED AS + | INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' + | OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' + |LOCATION '$srcLocation' + |TBLPROPERTIES ('avro.schema.url' = '$schemaurl') + """.stripMargin + ) + val destLocation = s"""$path${File.separator}destTableLocation""" + new File(destLocation).mkdir() + + versionSpark.sql( +s""" + |CREATE TABLE $destTableName + |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' + |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true') + |STORED AS + | INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' + | OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' + |LOCATION '$destLocation' --- End diff -- Is this bug for external tables only? What about managed tables? ---
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/18906 I believe the equivalent API in Scala would only be in the following form when registering a UDF ``` spark.udf.register("func", () => { 1 }).asNonNullable() ``` Would it be preferable to just stick with a similar API for Python if we are trying to match the behavior? > So I think with the performance improvements coming into Python UDFs considering annotating results as nullable or not could make sense (although I imagine we'd need to do something different for the vectorized UDFs if they aren't already being done). Regarding performance increases with vectorized UDFs, right now the Java side is only implemented to accept nullable return types, so there wouldn't be any difference. In the future it would be possible to accept either, and that would give a little performance bump. ---
[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19370 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84077/ Test PASSed. ---
[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19370 Merged build finished. Test PASSed. ---
[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19370 **[Test build #84077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84077/testReport)** for PR 19370 at commit [`a4d516f`](https://github.com/apache/spark/commit/a4d516f2326e39e3f04af3395a66da8e11964ce8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. ---
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19468 Merged build finished. Test FAILed. ---
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #84083 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84083/testReport)** for PR 19468 at commit [`7afce3f`](https://github.com/apache/spark/commit/7afce3f1d61df2ecba3efd2019ca4d287fc0bffb). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. ---
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19468 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84083/ Test FAILed. ---
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19468 **[Test build #84083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84083/testReport)** for PR 19468 at commit [`7afce3f`](https://github.com/apache/spark/commit/7afce3f1d61df2ecba3efd2019ca4d287fc0bffb). ---
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152374715 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/constants.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s + +package object constants { --- End diff -- Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152373882 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- @@ -0,0 +1,430 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.scheduler.cluster.k8s + +import java.io.Closeable +import java.net.InetAddress +import java.util.concurrent.{ConcurrentHashMap, ExecutorService, ScheduledExecutorService, TimeUnit} +import java.util.concurrent.atomic.{AtomicInteger, AtomicLong, AtomicReference} +import javax.annotation.concurrent.GuardedBy + +import io.fabric8.kubernetes.api.model._ +import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException, Watcher} +import io.fabric8.kubernetes.client.Watcher.Action +import scala.collection.JavaConverters._ +import scala.collection.mutable +import scala.concurrent.{ExecutionContext, Future} + +import org.apache.spark.SparkException +import org.apache.spark.deploy.k8s.config._ +import org.apache.spark.deploy.k8s.constants._ +import org.apache.spark.rpc.{RpcAddress, RpcEndpointAddress, RpcEnv} +import org.apache.spark.scheduler.{ExecutorExited, SlaveLost, TaskSchedulerImpl} +import org.apache.spark.scheduler.cluster.{CoarseGrainedSchedulerBackend, SchedulerBackendUtils} +import org.apache.spark.util.Utils + +private[spark] class KubernetesClusterSchedulerBackend( +scheduler: TaskSchedulerImpl, +rpcEnv: RpcEnv, +executorPodFactory: ExecutorPodFactory, +kubernetesClient: KubernetesClient, +allocatorExecutor: ScheduledExecutorService, +requestExecutorsService: ExecutorService) + extends CoarseGrainedSchedulerBackend(scheduler, rpcEnv) { + + import KubernetesClusterSchedulerBackend._ + + private val EXECUTOR_ID_COUNTER = new AtomicLong(0L) + private val RUNNING_EXECUTOR_PODS_LOCK = new Object + @GuardedBy("RUNNING_EXECUTOR_PODS_LOCK") + private val runningExecutorsToPods = new mutable.HashMap[String, Pod] + private val executorPodsByIPs = new ConcurrentHashMap[String, Pod]() + private val podsWithKnownExitReasons = new ConcurrentHashMap[String, ExecutorExited]() + private val disconnectedPodsByExecutorIdPendingRemoval = new ConcurrentHashMap[String, Pod]() + + private val kubernetesNamespace = 
conf.get(KUBERNETES_NAMESPACE) + + private val kubernetesDriverPodName = conf +.get(KUBERNETES_DRIVER_POD_NAME) +.getOrElse(throw new SparkException("Must specify the driver pod name")) + private implicit val requestExecutorContext = ExecutionContext.fromExecutorService( +requestExecutorsService) + + private val driverPod = kubernetesClient.pods() +.inNamespace(kubernetesNamespace) +.withName(kubernetesDriverPodName) +.get() + + protected override val minRegisteredRatio = +if (conf.getOption("spark.scheduler.minRegisteredResourcesRatio").isEmpty) { + 0.8 +} else { + super.minRegisteredRatio +} + + private val executorWatchResource = new AtomicReference[Closeable] + private val totalExpectedExecutors = new AtomicInteger(0) + + private val driverUrl = RpcEndpointAddress( +conf.get("spark.driver.host"), +conf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT), +CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString + + private val initialExecutors = SchedulerBackendUtils.getInitialTargetExecutorNumber(conf) + + private val podAllocationInterval = conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY) + + private val podAllocationSize = conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE) + + private val allocatorRunnable = new Runnable { + +// Maintains a map of executor id to count of checks performed to learn the loss reason +// for an executor. +private val
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152373908 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala --- @@ -0,0 +1,439 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.scheduler.cluster.k8s + +import java.util.concurrent.{ExecutorService, ScheduledExecutorService, TimeUnit} + +import io.fabric8.kubernetes.api.model.{DoneablePod, Pod, PodBuilder, PodList} +import io.fabric8.kubernetes.client.{KubernetesClient, Watch, Watcher} +import io.fabric8.kubernetes.client.Watcher.Action +import io.fabric8.kubernetes.client.dsl.{FilterWatchListDeletable, MixedOperation, NonNamespaceOperation, PodResource} +import org.mockito.{AdditionalAnswers, ArgumentCaptor, Mock, MockitoAnnotations} +import org.mockito.Matchers.{any, eq => mockitoEq} +import org.mockito.Mockito.{doNothing, never, times, verify, when} +import org.scalatest.BeforeAndAfter +import org.scalatest.mock.MockitoSugar._ +import scala.collection.JavaConverters._ +import scala.concurrent.Future + +import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite} +import org.apache.spark.deploy.k8s.config._ +import org.apache.spark.deploy.k8s.constants._ +import org.apache.spark.rpc._ +import org.apache.spark.scheduler.{ExecutorExited, LiveListenerBus, SlaveLost, TaskSchedulerImpl} +import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RegisterExecutor, RemoveExecutor} +import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend +import org.apache.spark.util.ThreadUtils + +class KubernetesClusterSchedulerBackendSuite +extends SparkFunSuite with BeforeAndAfter { + + private val APP_ID = "test-spark-app" + private val DRIVER_POD_NAME = "spark-driver-pod" + private val NAMESPACE = "test-namespace" + private val SPARK_DRIVER_HOST = "localhost" + private val SPARK_DRIVER_PORT = 7077 + private val POD_ALLOCATION_INTERVAL = 60L + private val DRIVER_URL = RpcEndpointAddress( + SPARK_DRIVER_HOST, SPARK_DRIVER_PORT, CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString + private val FIRST_EXECUTOR_POD = new PodBuilder() +.withNewMetadata() + .withName("pod1") + .endMetadata() +.withNewSpec() + .withNodeName("node1") + .endSpec() 
+.withNewStatus() + .withHostIP("192.168.99.100") + .endStatus() +.build() + private val SECOND_EXECUTOR_POD = new PodBuilder() +.withNewMetadata() + .withName("pod2") + .endMetadata() +.withNewSpec() + .withNodeName("node2") + .endSpec() +.withNewStatus() + .withHostIP("192.168.99.101") + .endStatus() +.build() + + private type PODS = MixedOperation[Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]] + private type LABELED_PODS = FilterWatchListDeletable[ + Pod, PodList, java.lang.Boolean, Watch, Watcher[Pod]] --- End diff -- Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
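The test fixture above builds its expected pods with fabric8's nested fluent builder (`withNewMetadata()...endMetadata()`, `withNewSpec()...endSpec()`, `withNewStatus()...endStatus()`). A flattened miniature of that chaining style, as an illustrative stand-in rather than the fabric8 API (the real `PodBuilder` nests scoped sub-builders, which this sketch elides):

```scala
// Immutable result type standing in for io.fabric8's Pod.
final case class MiniPod(name: String, nodeName: String, hostIP: String)

// Flattened stand-in for the fabric8 builder chain used in the fixture:
// each with*() mutates internal state and returns `this` so calls chain.
final class MiniPodBuilder {
  private var name = ""
  private var nodeName = ""
  private var hostIP = ""

  def withName(n: String): MiniPodBuilder = { name = n; this }
  def withNodeName(n: String): MiniPodBuilder = { nodeName = n; this }
  def withHostIP(ip: String): MiniPodBuilder = { hostIP = ip; this }
  def build(): MiniPod = MiniPod(name, nodeName, hostIP)
}
```

In the suite, building concrete `Pod` values up front keeps the Mockito stubbing focused on the client's fluent lookup chain rather than on pod construction.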
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152373938 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -22,15 +22,14 @@ import java.util.concurrent._ import java.util.concurrent.atomic.AtomicInteger import java.util.regex.Pattern -import scala.collection.JavaConverters._ -import scala.collection.mutable -import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Queue} -import scala.util.control.NonFatal - import org.apache.hadoop.yarn.api.records._ import org.apache.hadoop.yarn.client.api.AMRMClient import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest import org.apache.hadoop.yarn.conf.YarnConfiguration +import scala.collection.JavaConverters._ +import scala.collection.mutable +import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Queue} +import scala.util.control.NonFatal --- End diff -- Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
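The reordering requested here follows what appears to be Spark's usual import-grouping convention: `java.*` first, then `scala.*`, then third-party libraries (such as `org.apache.hadoop.*`), then `org.apache.spark.*`, with a blank line between groups. A minimal compilable illustration, with the third-party and Spark groups shown only as comments since they are not on the classpath in a standalone sketch:

```scala
// Group 1: java.*
import java.util.concurrent.atomic.AtomicInteger

// Group 2: scala.*
import scala.collection.mutable.ArrayBuffer

// Group 3: third-party (e.g. org.apache.hadoop.yarn.api.records._) would sit
// here, followed by Group 4: org.apache.spark._ — both omitted in this sketch.

object ImportOrderSketch {
  // Trivial use of both imported groups so the ordering example compiles
  // and runs: increment an atomic counter twice and sum the observed values.
  def demo(): Int = {
    val counter = new AtomicInteger(0)
    val seen = ArrayBuffer.empty[Int]
    seen += counter.incrementAndGet()
    seen += counter.incrementAndGet()
    seen.sum
  }
}
```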
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152373927 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala --- @@ -0,0 +1,439 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.scheduler.cluster.k8s + +import java.util.concurrent.{ExecutorService, ScheduledExecutorService, TimeUnit} + +import io.fabric8.kubernetes.api.model.{DoneablePod, Pod, PodBuilder, PodList} +import io.fabric8.kubernetes.client.{KubernetesClient, Watch, Watcher} +import io.fabric8.kubernetes.client.Watcher.Action +import io.fabric8.kubernetes.client.dsl.{FilterWatchListDeletable, MixedOperation, NonNamespaceOperation, PodResource} +import org.mockito.{AdditionalAnswers, ArgumentCaptor, Mock, MockitoAnnotations} +import org.mockito.Matchers.{any, eq => mockitoEq} +import org.mockito.Mockito.{doNothing, never, times, verify, when} +import org.scalatest.BeforeAndAfter +import org.scalatest.mock.MockitoSugar._ +import scala.collection.JavaConverters._ +import scala.concurrent.Future + +import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite} +import org.apache.spark.deploy.k8s.config._ +import org.apache.spark.deploy.k8s.constants._ +import org.apache.spark.rpc._ +import org.apache.spark.scheduler.{ExecutorExited, LiveListenerBus, SlaveLost, TaskSchedulerImpl} +import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RegisterExecutor, RemoveExecutor} +import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend +import org.apache.spark.util.ThreadUtils + +class KubernetesClusterSchedulerBackendSuite +extends SparkFunSuite with BeforeAndAfter { + + private val APP_ID = "test-spark-app" + private val DRIVER_POD_NAME = "spark-driver-pod" + private val NAMESPACE = "test-namespace" + private val SPARK_DRIVER_HOST = "localhost" + private val SPARK_DRIVER_PORT = 7077 + private val POD_ALLOCATION_INTERVAL = 60L + private val DRIVER_URL = RpcEndpointAddress( + SPARK_DRIVER_HOST, SPARK_DRIVER_PORT, CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString + private val FIRST_EXECUTOR_POD = new PodBuilder() +.withNewMetadata() + .withName("pod1") + .endMetadata() +.withNewSpec() + .withNodeName("node1") + .endSpec() 
+.withNewStatus() + .withHostIP("192.168.99.100") + .endStatus() +.build() + private val SECOND_EXECUTOR_POD = new PodBuilder() +.withNewMetadata() + .withName("pod2") + .endMetadata() +.withNewSpec() + .withNodeName("node2") + .endSpec() +.withNewStatus() + .withHostIP("192.168.99.101") + .endStatus() +.build() + + private type PODS = MixedOperation[Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]] + private type LABELED_PODS = FilterWatchListDeletable[ + Pod, PodList, java.lang.Boolean, Watch, Watcher[Pod]] + private type IN_NAMESPACE_PODS = NonNamespaceOperation[ + Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]] --- End diff -- Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152373862 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- @@ -0,0 +1,430 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.scheduler.cluster.k8s + +import java.io.Closeable +import java.net.InetAddress +import java.util.concurrent.{ConcurrentHashMap, ExecutorService, ScheduledExecutorService, TimeUnit} +import java.util.concurrent.atomic.{AtomicInteger, AtomicLong, AtomicReference} +import javax.annotation.concurrent.GuardedBy + +import io.fabric8.kubernetes.api.model._ +import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException, Watcher} +import io.fabric8.kubernetes.client.Watcher.Action +import scala.collection.JavaConverters._ +import scala.collection.mutable +import scala.concurrent.{ExecutionContext, Future} + +import org.apache.spark.SparkException +import org.apache.spark.deploy.k8s.config._ +import org.apache.spark.deploy.k8s.constants._ +import org.apache.spark.rpc.{RpcAddress, RpcEndpointAddress, RpcEnv} +import org.apache.spark.scheduler.{ExecutorExited, SlaveLost, TaskSchedulerImpl} +import org.apache.spark.scheduler.cluster.{CoarseGrainedSchedulerBackend, SchedulerBackendUtils} +import org.apache.spark.util.Utils + +private[spark] class KubernetesClusterSchedulerBackend( +scheduler: TaskSchedulerImpl, +rpcEnv: RpcEnv, +executorPodFactory: ExecutorPodFactory, +kubernetesClient: KubernetesClient, +allocatorExecutor: ScheduledExecutorService, +requestExecutorsService: ExecutorService) + extends CoarseGrainedSchedulerBackend(scheduler, rpcEnv) { + + import KubernetesClusterSchedulerBackend._ + + private val EXECUTOR_ID_COUNTER = new AtomicLong(0L) + private val RUNNING_EXECUTOR_PODS_LOCK = new Object + @GuardedBy("RUNNING_EXECUTOR_PODS_LOCK") + private val runningExecutorsToPods = new mutable.HashMap[String, Pod] + private val executorPodsByIPs = new ConcurrentHashMap[String, Pod]() + private val podsWithKnownExitReasons = new ConcurrentHashMap[String, ExecutorExited]() + private val disconnectedPodsByExecutorIdPendingRemoval = new ConcurrentHashMap[String, Pod]() + + private val kubernetesNamespace = 
conf.get(KUBERNETES_NAMESPACE) + + private val kubernetesDriverPodName = conf +.get(KUBERNETES_DRIVER_POD_NAME) +.getOrElse(throw new SparkException("Must specify the driver pod name")) + private implicit val requestExecutorContext = ExecutionContext.fromExecutorService( +requestExecutorsService) + + private val driverPod = kubernetesClient.pods() +.inNamespace(kubernetesNamespace) +.withName(kubernetesDriverPodName) +.get() + + protected override val minRegisteredRatio = +if (conf.getOption("spark.scheduler.minRegisteredResourcesRatio").isEmpty) { + 0.8 +} else { + super.minRegisteredRatio +} + + private val executorWatchResource = new AtomicReference[Closeable] + private val totalExpectedExecutors = new AtomicInteger(0) + + private val driverUrl = RpcEndpointAddress( +conf.get("spark.driver.host"), +conf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT), +CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString + + private val initialExecutors = SchedulerBackendUtils.getInitialTargetExecutorNumber(conf) + + private val podAllocationInterval = conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY) + + private val podAllocationSize = conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE) + + private val allocatorRunnable = new Runnable { + +// Maintains a map of executor id to count of checks performed to learn the loss reason +// for an executor. +private val
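The `minRegisteredRatio` override in the diff uses 0.8 only when the user has not set `spark.scheduler.minRegisteredResourcesRatio`, and otherwise defers to `super.minRegisteredRatio`, which reads the same key. A standalone sketch of that decision, with a plain `Map` standing in for `SparkConf`:

```scala
object MinRegisteredRatioSketch {
  // Default to 0.8 unless the user set the ratio explicitly; the real code
  // defers to super.minRegisteredRatio, which reads this same conf key.
  def resolve(conf: Map[String, String]): Double =
    conf.get("spark.scheduler.minRegisteredResourcesRatio")
      .map(_.toDouble)
      .getOrElse(0.8)
}
```

The higher-than-usual default reflects that on Kubernetes, executor pods can take a while to schedule, so waiting for 80% of executors before starting tasks avoids badly skewed initial locality.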
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152373871 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- @@ -0,0 +1,430 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.scheduler.cluster.k8s + +import java.io.Closeable +import java.net.InetAddress +import java.util.concurrent.{ConcurrentHashMap, ExecutorService, ScheduledExecutorService, TimeUnit} +import java.util.concurrent.atomic.{AtomicInteger, AtomicLong, AtomicReference} +import javax.annotation.concurrent.GuardedBy + +import io.fabric8.kubernetes.api.model._ +import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException, Watcher} +import io.fabric8.kubernetes.client.Watcher.Action +import scala.collection.JavaConverters._ +import scala.collection.mutable +import scala.concurrent.{ExecutionContext, Future} + +import org.apache.spark.SparkException +import org.apache.spark.deploy.k8s.config._ +import org.apache.spark.deploy.k8s.constants._ +import org.apache.spark.rpc.{RpcAddress, RpcEndpointAddress, RpcEnv} +import org.apache.spark.scheduler.{ExecutorExited, SlaveLost, TaskSchedulerImpl} +import org.apache.spark.scheduler.cluster.{CoarseGrainedSchedulerBackend, SchedulerBackendUtils} +import org.apache.spark.util.Utils + +private[spark] class KubernetesClusterSchedulerBackend( +scheduler: TaskSchedulerImpl, +rpcEnv: RpcEnv, +executorPodFactory: ExecutorPodFactory, +kubernetesClient: KubernetesClient, +allocatorExecutor: ScheduledExecutorService, +requestExecutorsService: ExecutorService) + extends CoarseGrainedSchedulerBackend(scheduler, rpcEnv) { + + import KubernetesClusterSchedulerBackend._ + + private val EXECUTOR_ID_COUNTER = new AtomicLong(0L) + private val RUNNING_EXECUTOR_PODS_LOCK = new Object + @GuardedBy("RUNNING_EXECUTOR_PODS_LOCK") + private val runningExecutorsToPods = new mutable.HashMap[String, Pod] + private val executorPodsByIPs = new ConcurrentHashMap[String, Pod]() + private val podsWithKnownExitReasons = new ConcurrentHashMap[String, ExecutorExited]() + private val disconnectedPodsByExecutorIdPendingRemoval = new ConcurrentHashMap[String, Pod]() + + private val kubernetesNamespace = 
conf.get(KUBERNETES_NAMESPACE) + + private val kubernetesDriverPodName = conf +.get(KUBERNETES_DRIVER_POD_NAME) +.getOrElse(throw new SparkException("Must specify the driver pod name")) + private implicit val requestExecutorContext = ExecutionContext.fromExecutorService( +requestExecutorsService) + + private val driverPod = kubernetesClient.pods() +.inNamespace(kubernetesNamespace) +.withName(kubernetesDriverPodName) +.get() + + protected override val minRegisteredRatio = +if (conf.getOption("spark.scheduler.minRegisteredResourcesRatio").isEmpty) { + 0.8 +} else { + super.minRegisteredRatio +} + + private val executorWatchResource = new AtomicReference[Closeable] + private val totalExpectedExecutors = new AtomicInteger(0) + + private val driverUrl = RpcEndpointAddress( +conf.get("spark.driver.host"), +conf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT), +CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString + + private val initialExecutors = SchedulerBackendUtils.getInitialTargetExecutorNumber(conf) + + private val podAllocationInterval = conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY) + + private val podAllocationSize = conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE) + + private val allocatorRunnable = new Runnable { + +// Maintains a map of executor id to count of checks performed to learn the loss reason +// for an executor. +private val
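The `driverUrl` above is `RpcEndpointAddress(host, port, endpointName).toString`. As a hedged sketch — assuming `RpcEndpointAddress` renders as `spark://<endpoint>@<host>:<port>`, which matches its usual behavior but is not guaranteed here for edge cases like a null host:

```scala
object DriverUrlSketch {
  // Assumed rendering of RpcEndpointAddress.toString; the real class lives
  // in org.apache.spark.rpc and handles edge cases this sketch ignores.
  def driverUrl(host: String, port: Int, endpointName: String): String =
    s"spark://$endpointName@$host:$port"
}
```

Executor pods receive this URL so their `CoarseGrainedExecutorBackend` can register with the driver's scheduler endpoint.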
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152373840 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala --- @@ -0,0 +1,430 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.scheduler.cluster.k8s + +import java.io.Closeable +import java.net.InetAddress +import java.util.concurrent.{ConcurrentHashMap, ExecutorService, ScheduledExecutorService, TimeUnit} +import java.util.concurrent.atomic.{AtomicInteger, AtomicLong, AtomicReference} +import javax.annotation.concurrent.GuardedBy + +import io.fabric8.kubernetes.api.model._ +import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException, Watcher} +import io.fabric8.kubernetes.client.Watcher.Action +import scala.collection.JavaConverters._ +import scala.collection.mutable +import scala.concurrent.{ExecutionContext, Future} + +import org.apache.spark.SparkException +import org.apache.spark.deploy.k8s.config._ +import org.apache.spark.deploy.k8s.constants._ +import org.apache.spark.rpc.{RpcAddress, RpcEndpointAddress, RpcEnv} +import org.apache.spark.scheduler.{ExecutorExited, SlaveLost, TaskSchedulerImpl} +import org.apache.spark.scheduler.cluster.{CoarseGrainedSchedulerBackend, SchedulerBackendUtils} +import org.apache.spark.util.Utils + +private[spark] class KubernetesClusterSchedulerBackend( +scheduler: TaskSchedulerImpl, +rpcEnv: RpcEnv, +executorPodFactory: ExecutorPodFactory, +kubernetesClient: KubernetesClient, +allocatorExecutor: ScheduledExecutorService, +requestExecutorsService: ExecutorService) + extends CoarseGrainedSchedulerBackend(scheduler, rpcEnv) { + + import KubernetesClusterSchedulerBackend._ + + private val EXECUTOR_ID_COUNTER = new AtomicLong(0L) + private val RUNNING_EXECUTOR_PODS_LOCK = new Object + @GuardedBy("RUNNING_EXECUTOR_PODS_LOCK") + private val runningExecutorsToPods = new mutable.HashMap[String, Pod] + private val executorPodsByIPs = new ConcurrentHashMap[String, Pod]() + private val podsWithKnownExitReasons = new ConcurrentHashMap[String, ExecutorExited]() + private val disconnectedPodsByExecutorIdPendingRemoval = new ConcurrentHashMap[String, Pod]() + + private val kubernetesNamespace = 
conf.get(KUBERNETES_NAMESPACE) + + private val kubernetesDriverPodName = conf +.get(KUBERNETES_DRIVER_POD_NAME) +.getOrElse(throw new SparkException("Must specify the driver pod name")) + private implicit val requestExecutorContext = ExecutionContext.fromExecutorService( +requestExecutorsService) + + private val driverPod = kubernetesClient.pods() +.inNamespace(kubernetesNamespace) +.withName(kubernetesDriverPodName) +.get() + + protected override val minRegisteredRatio = +if (conf.getOption("spark.scheduler.minRegisteredResourcesRatio").isEmpty) { + 0.8 +} else { + super.minRegisteredRatio +} + + private val executorWatchResource = new AtomicReference[Closeable] + private val totalExpectedExecutors = new AtomicInteger(0) + + private val driverUrl = RpcEndpointAddress( +conf.get("spark.driver.host"), +conf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT), +CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString + + private val initialExecutors = SchedulerBackendUtils.getInitialTargetExecutorNumber(conf) + + private val podAllocationInterval = conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY) + + private val podAllocationSize = conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE) + + private val allocatorRunnable = new Runnable { + +// Maintains a map of executor id to count of checks performed to learn the loss reason +// for an executor. +private val
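The `allocatorRunnable` driven by `podAllocationInterval` and `podAllocationSize` requests executors in periodic batches rather than all at once. A sketch of the per-tick sizing logic that batching implies (the exact bookkeeping in the PR is more involved, tracking pending and disconnected pods):

```scala
object AllocationBatchSketch {
  // One allocator tick: request enough pods to close the gap between the
  // running count and the target, but never more than the batch size.
  def podsThisRound(running: Int, target: Int, batchSize: Int): Int =
    math.max(0, math.min(target - running, batchSize))
}
```

Batching keeps the Kubernetes API server from being flooded with pod-creation requests when a large application starts.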
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152373890 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala --- @@ -0,0 +1,439 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.scheduler.cluster.k8s + +import java.util.concurrent.{ExecutorService, ScheduledExecutorService, TimeUnit} + +import io.fabric8.kubernetes.api.model.{DoneablePod, Pod, PodBuilder, PodList} +import io.fabric8.kubernetes.client.{KubernetesClient, Watch, Watcher} +import io.fabric8.kubernetes.client.Watcher.Action +import io.fabric8.kubernetes.client.dsl.{FilterWatchListDeletable, MixedOperation, NonNamespaceOperation, PodResource} +import org.mockito.{AdditionalAnswers, ArgumentCaptor, Mock, MockitoAnnotations} +import org.mockito.Matchers.{any, eq => mockitoEq} +import org.mockito.Mockito.{doNothing, never, times, verify, when} +import org.scalatest.BeforeAndAfter +import org.scalatest.mock.MockitoSugar._ +import scala.collection.JavaConverters._ +import scala.concurrent.Future + +import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite} +import org.apache.spark.deploy.k8s.config._ +import org.apache.spark.deploy.k8s.constants._ +import org.apache.spark.rpc._ +import org.apache.spark.scheduler.{ExecutorExited, LiveListenerBus, SlaveLost, TaskSchedulerImpl} +import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RegisterExecutor, RemoveExecutor} +import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend +import org.apache.spark.util.ThreadUtils + +class KubernetesClusterSchedulerBackendSuite +extends SparkFunSuite with BeforeAndAfter { --- End diff -- Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152373827 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala --- @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.scheduler.cluster.k8s + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.ConfigurationUtils +import org.apache.spark.deploy.k8s.config._ +import org.apache.spark.deploy.k8s.constants._ +import org.apache.spark.util.Utils + +/** + * A factory class for configuring and creating executor pods. + */ +private[spark] trait ExecutorPodFactory { + + /** + * Configure and construct an executor pod with the given parameters. 
+ */ + def createExecutorPod( + executorId: String, + applicationId: String, + driverUrl: String, + executorEnvs: Seq[(String, String)], + driverPod: Pod, + nodeToLocalTaskCount: Map[String, Int]): Pod +} + +private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf) + extends ExecutorPodFactory { + + import ExecutorPodFactoryImpl._ + + private val executorExtraClasspath = +sparkConf.get(org.apache.spark.internal.config.EXECUTOR_CLASS_PATH) + + private val executorLabels = ConfigurationUtils.parsePrefixedKeyValuePairs( +sparkConf, +KUBERNETES_EXECUTOR_LABEL_PREFIX, +"executor label") + require( +!executorLabels.contains(SPARK_APP_ID_LABEL), +s"Custom executor labels cannot contain $SPARK_APP_ID_LABEL as it is reserved for Spark.") + require( +!executorLabels.contains(SPARK_EXECUTOR_ID_LABEL), +s"Custom executor labels cannot contain $SPARK_EXECUTOR_ID_LABEL as it is reserved for" + + s" Spark.") + require( +!executorLabels.contains(SPARK_ROLE_LABEL), +s"Custom executor labels cannot contain $SPARK_ROLE_LABEL as it is reserved for Spark.") + + private val executorAnnotations = +ConfigurationUtils.parsePrefixedKeyValuePairs( + sparkConf, + KUBERNETES_EXECUTOR_ANNOTATION_PREFIX, + "executor annotation") + private val nodeSelector = +ConfigurationUtils.parsePrefixedKeyValuePairs( + sparkConf, + KUBERNETES_NODE_SELECTOR_PREFIX, + "node selector") + + private val executorDockerImage = sparkConf.get(EXECUTOR_DOCKER_IMAGE) + private val dockerImagePullPolicy = sparkConf.get(DOCKER_IMAGE_PULL_POLICY) + private val executorPort = sparkConf.getInt("spark.executor.port", DEFAULT_STATIC_PORT) + private val blockManagerPort = sparkConf +.getInt("spark.blockmanager.port", DEFAULT_BLOCKMANAGER_PORT) + + private val executorPodNamePrefix = sparkConf.get(KUBERNETES_EXECUTOR_POD_NAME_PREFIX) + + private val executorMemoryMiB = sparkConf.get(org.apache.spark.internal.config.EXECUTOR_MEMORY) + private val executorMemoryString = sparkConf.get( 
+org.apache.spark.internal.config.EXECUTOR_MEMORY.key, +org.apache.spark.internal.config.EXECUTOR_MEMORY.defaultValueString) + + private val memoryOverheadMiB = sparkConf +.get(KUBERNETES_EXECUTOR_MEMORY_OVERHEAD) +.getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMiB).toInt, + MEMORY_OVERHEAD_MIN_MIB)) + private val executorMemoryWithOverhead = executorMemoryMiB + memoryOverheadMiB + + private val executorCores = sparkConf.getDouble("spark.executor.cores", 1) + private val executorLimitCores = sparkConf.getOption(KUBERNETES_EXECUTOR_LIMIT_CORES.key) + + override def createExecutorPod( + executorId: String, + applicationId: String, + driverUrl: String, + executorEnvs: Seq[(String, String)], + driverPod: Pod, +
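The `memoryOverheadMiB` computation above takes an explicit setting if present, and otherwise the larger of a fraction of executor memory and a fixed floor. The constant values below (10% and 384 MiB) are assumptions based on Spark's usual defaults for `MEMORY_OVERHEAD_FACTOR` and `MEMORY_OVERHEAD_MIN_MIB`; the PR's actual constants may differ.

```scala
object MemoryOverheadSketch {
  // Assumed values for the constants referenced in the diff.
  val MemoryOverheadFactor = 0.10
  val MemoryOverheadMinMiB = 384L

  // Explicit setting wins; otherwise take the larger of 10% of executor
  // memory and the fixed floor, mirroring the getOrElse in the diff.
  def overheadMiB(executorMemoryMiB: Long, explicit: Option[Long]): Long =
    explicit.getOrElse(
      math.max((MemoryOverheadFactor * executorMemoryMiB).toLong, MemoryOverheadMinMiB))
}
```

The overhead is then added to the heap size to form the pod's memory request/limit, covering off-heap usage (JVM metaspace, thread stacks, direct buffers) that would otherwise get the pod OOM-killed.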
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152373817 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala --- @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.scheduler.cluster.k8s + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.ConfigurationUtils +import org.apache.spark.deploy.k8s.config._ +import org.apache.spark.deploy.k8s.constants._ +import org.apache.spark.util.Utils + +/** + * A factory class for configuring and creating executor pods. + */ +private[spark] trait ExecutorPodFactory { + + /** + * Configure and construct an executor pod with the given parameters. 
+ */ + def createExecutorPod( + executorId: String, + applicationId: String, + driverUrl: String, + executorEnvs: Seq[(String, String)], + driverPod: Pod, + nodeToLocalTaskCount: Map[String, Int]): Pod +} + +private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf) + extends ExecutorPodFactory { + + import ExecutorPodFactoryImpl._ + + private val executorExtraClasspath = +sparkConf.get(org.apache.spark.internal.config.EXECUTOR_CLASS_PATH) + + private val executorLabels = ConfigurationUtils.parsePrefixedKeyValuePairs( +sparkConf, +KUBERNETES_EXECUTOR_LABEL_PREFIX, +"executor label") + require( +!executorLabels.contains(SPARK_APP_ID_LABEL), +s"Custom executor labels cannot contain $SPARK_APP_ID_LABEL as it is reserved for Spark.") + require( +!executorLabels.contains(SPARK_EXECUTOR_ID_LABEL), +s"Custom executor labels cannot contain $SPARK_EXECUTOR_ID_LABEL as it is reserved for" + + s" Spark.") --- End diff -- Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
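The `require(...)` checks above reject custom executor labels that collide with keys Spark reserves for its own pod selection. A standalone sketch of that validation; the literal key strings below are stand-ins for `SPARK_APP_ID_LABEL`, `SPARK_EXECUTOR_ID_LABEL`, and `SPARK_ROLE_LABEL`, whose actual values live in the k8s constants object:

```scala
object LabelValidationSketch {
  // Stand-in values for the reserved label constants; assumptions only.
  val ReservedKeys = Set("spark-app-selector", "spark-exec-id", "spark-role")

  // Pure predicate form, convenient for tests.
  def isValid(customLabels: Map[String, String]): Boolean =
    ReservedKeys.forall(k => !customLabels.contains(k))

  // Mirrors the require(...) calls in the diff: fail fast, naming the
  // offending key in the message.
  def validate(customLabels: Map[String, String]): Unit =
    ReservedKeys.foreach { k =>
      require(!customLabels.contains(k),
        s"Custom executor labels cannot contain $k as it is reserved for Spark.")
    }
}
```

Reserving these keys matters because the scheduler backend selects and counts executor pods by label; a user-supplied collision could silently corrupt that bookkeeping.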
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152373897 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala --- @@ -0,0 +1,439 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.scheduler.cluster.k8s + +import java.util.concurrent.{ExecutorService, ScheduledExecutorService, TimeUnit} + +import io.fabric8.kubernetes.api.model.{DoneablePod, Pod, PodBuilder, PodList} +import io.fabric8.kubernetes.client.{KubernetesClient, Watch, Watcher} +import io.fabric8.kubernetes.client.Watcher.Action +import io.fabric8.kubernetes.client.dsl.{FilterWatchListDeletable, MixedOperation, NonNamespaceOperation, PodResource} +import org.mockito.{AdditionalAnswers, ArgumentCaptor, Mock, MockitoAnnotations} +import org.mockito.Matchers.{any, eq => mockitoEq} +import org.mockito.Mockito.{doNothing, never, times, verify, when} +import org.scalatest.BeforeAndAfter +import org.scalatest.mock.MockitoSugar._ +import scala.collection.JavaConverters._ +import scala.concurrent.Future + +import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite} +import org.apache.spark.deploy.k8s.config._ +import org.apache.spark.deploy.k8s.constants._ +import org.apache.spark.rpc._ +import org.apache.spark.scheduler.{ExecutorExited, LiveListenerBus, SlaveLost, TaskSchedulerImpl} +import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RegisterExecutor, RemoveExecutor} +import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend +import org.apache.spark.util.ThreadUtils + +class KubernetesClusterSchedulerBackendSuite +extends SparkFunSuite with BeforeAndAfter { + + private val APP_ID = "test-spark-app" + private val DRIVER_POD_NAME = "spark-driver-pod" + private val NAMESPACE = "test-namespace" + private val SPARK_DRIVER_HOST = "localhost" + private val SPARK_DRIVER_PORT = 7077 + private val POD_ALLOCATION_INTERVAL = 60L + private val DRIVER_URL = RpcEndpointAddress( + SPARK_DRIVER_HOST, SPARK_DRIVER_PORT, CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString --- End diff -- Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r152373795 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s + +import java.io.File + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.client.{Config, ConfigBuilder, DefaultKubernetesClient, KubernetesClient} +import io.fabric8.kubernetes.client.utils.HttpClientUtils +import okhttp3.Dispatcher + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.config._ +import org.apache.spark.util.ThreadUtils + +/** + * Spark-opinionated builder for Kubernetes clients. It uses a prefix plus common suffixes to + * parse configuration keys, similar to the manner in which Spark's SecurityManager parses SSL + * options for different components. 
+ */ +private[spark] object SparkKubernetesClientFactory { + + def createKubernetesClient( + master: String, + namespace: Option[String], + kubernetesAuthConfPrefix: String, + sparkConf: SparkConf, + defaultServiceAccountToken: Option[File], + defaultServiceAccountCaCert: Option[File]): KubernetesClient = { +val oauthTokenFileConf = s"$kubernetesAuthConfPrefix.$OAUTH_TOKEN_FILE_CONF_SUFFIX" +val oauthTokenConf = s"$kubernetesAuthConfPrefix.$OAUTH_TOKEN_CONF_SUFFIX" +val oauthTokenFile = sparkConf.getOption(oauthTokenFileConf) + .map(new File(_)) + .orElse(defaultServiceAccountToken) +val oauthTokenValue = sparkConf.getOption(oauthTokenConf) +ConfigurationUtils.requireNandDefined( + oauthTokenFile, + oauthTokenValue, + s"Cannot specify OAuth token through both a file $oauthTokenFileConf and a " + +s"value $oauthTokenConf.") + +val caCertFile = sparkConf + .getOption(s"$kubernetesAuthConfPrefix.$CA_CERT_FILE_CONF_SUFFIX") + .orElse(defaultServiceAccountCaCert.map(_.getAbsolutePath)) +val clientKeyFile = sparkConf + .getOption(s"$kubernetesAuthConfPrefix.$CLIENT_KEY_FILE_CONF_SUFFIX") +val clientCertFile = sparkConf + .getOption(s"$kubernetesAuthConfPrefix.$CLIENT_CERT_FILE_CONF_SUFFIX") +val dispatcher = new Dispatcher( + ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher")) +val config = new ConfigBuilder() + .withApiVersion("v1") + .withMasterUrl(master) + .withWebsocketPingInterval(0) + .withOption(oauthTokenValue) { +(token, configBuilder) => configBuilder.withOauthToken(token) + }.withOption(oauthTokenFile) { +(file, configBuilder) => +configBuilder.withOauthToken(Files.toString(file, Charsets.UTF_8)) + }.withOption(caCertFile) { +(file, configBuilder) => configBuilder.withCaCertFile(file) + }.withOption(clientKeyFile) { +(file, configBuilder) => configBuilder.withClientKeyFile(file) + }.withOption(clientCertFile) { +(file, configBuilder) => configBuilder.withClientCertFile(file) + }.withOption(namespace) { +(ns, configBuilder) => 
configBuilder.withNamespace(ns) + }.build() +val baseHttpClient = HttpClientUtils.createHttpClient(config) +val httpClientWithCustomDispatcher = baseHttpClient.newBuilder() + .dispatcher(dispatcher) + .build() +new DefaultKubernetesClient(httpClientWithCustomDispatcher, config) + } + + private implicit class OptionConfigurableConfigBuilder(configBuilder: ConfigBuilder) { --- End diff -- Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:
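The `OptionConfigurableConfigBuilder` implicit at the end of the diff is what makes the flat `.withOption(...)` chaining above compile. A minimal sketch of the pattern, with a toy `Builder` standing in for fabric8's `ConfigBuilder` (the stand-in class and its `withSetting` method are assumptions for illustration):

```scala
// Stand-in for fabric8's ConfigBuilder: each `with*` call returns a new builder.
case class Builder(settings: Map[String, String] = Map.empty) {
  def withSetting(key: String, value: String): Builder =
    copy(settings = settings + (key -> value))
}

object BuilderOps {
  // Mirrors OptionConfigurableConfigBuilder: apply the configurator only when
  // the Option is defined, otherwise return the builder unchanged. This keeps
  // a long chain of optional settings flat instead of nesting map/getOrElse.
  implicit class OptionConfigurableBuilder(builder: Builder) {
    def withOption[T](option: Option[T])
        (configurator: (T, Builder) => Builder): Builder =
      option.map(value => configurator(value, builder)).getOrElse(builder)
  }
}
```

With the implicit in scope, absent options simply fall through:

```scala
import BuilderOps._

val built = Builder()
  .withOption(Some("token123")) { (token, b) => b.withSetting("oauthToken", token) }
  .withOption(Option.empty[String]) { (cert, b) => b.withSetting("caCert", cert) }
// Only the defined option contributes a setting.
```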
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19601 **[Test build #84082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84082/testReport)** for PR 19601 at commit [`9b6b890`](https://github.com/apache/spark/commit/9b6b890b0444f3a20e73691528b59ad21edb07b8).
[GitHub] spark pull request #19753: [SPARK-22521][ML] VectorIndexerModel support hand...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19753
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84076/ Test PASSed.
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19779 Merged build finished. Test PASSed.
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19779 **[Test build #84076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84076/testReport)** for PR 19779 at commit [`68ee79d`](https://github.com/apache/spark/commit/68ee79d6e37e36ed043eec2d376e6579ca4b9cee).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19752 @kiszk actually, checking your PR, I think the same issue addressed there would also be handled here by default. What do you think?
[GitHub] spark issue #19389: [SPARK-22165][SQL] Fixes type conflicts between double, ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19389 Merged build finished. Test PASSed.
[GitHub] spark issue #19389: [SPARK-22165][SQL] Fixes type conflicts between double, ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19389 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84074/ Test PASSed.
[GitHub] spark issue #19389: [SPARK-22165][SQL] Fixes type conflicts between double, ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19389 **[Test build #84074 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84074/testReport)** for PR 19389 at commit [`c67f646`](https://github.com/apache/spark/commit/c67f646d050fcfd2cba92f8cd5514233b61733c1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19774 Merged build finished. Test PASSed.
[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19774 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84075/ Test PASSed.
[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19774 **[Test build #84075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84075/testReport)** for PR 19774 at commit [`38ef638`](https://github.com/apache/spark/commit/38ef638a333e7ba9e3ab91204f072244f4816b89).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > But I think we still need other eyes on this too. Agreed. @rxin can you help rope anyone else in on this? It's a big PR with a bigger history, but absent some savaging by another reviewer I believe it is close to the finish line. A lot of people are hoping this can make it into Spark 2.3. It has a huge performance impact for some Spark users, as evidenced by comments on this PR and VideoAmp's own experience. So I'm hoping we can get this merged to master before the 2.3 branch is cut. Thanks!
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19779 **[Test build #84081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84081/testReport)** for PR 19779 at commit [`a2560b3`](https://github.com/apache/spark/commit/a2560b3105e7ff712758573ca157703b4397de8b).
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > Can you give an example where it would fail? We didn't change clipParquetSchema, so even when pruning happens, why would we read a superset of the file's schema and cause the exception, according to the comment? We won't add new fields, only remove existing ones from the file's schema, right? (Oddly, GitHub won't let me reply to this comment in line.) The situation we've run into is pruning a schema for a query over a partitioned Hive table backed by Parquet files where some files are missing fields specified by the table schema. This can happen, e.g., with schema evolution, where fields are added to the table over time without rewriting existing partitions. In those cases, we've found parquet-mr throws an exception if we try to read from such a file with the table-pruned schema (a superset of that file's schema). Therefore, we further clip the pruned schema against each file's schema before attempting to read.
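The clipping step described above — intersecting the table-pruned schema with what each file actually contains — can be sketched with a toy struct type. This is a simplified illustration under stated assumptions: Spark's real logic handles nested fields, ordering, and case sensitivity, and the `Field`/`Schema` types here are stand-ins, not Spark's `StructField`/`StructType`.

```scala
// Toy stand-ins for Spark's StructField/StructType.
case class Field(name: String, dataType: String)
case class Schema(fields: Seq[Field])

object SchemaClipping {
  // Keep only the pruned fields that the Parquet file actually contains, so
  // parquet-mr is never asked to read a column the file does not have.
  def clip(pruned: Schema, fileSchema: Schema): Schema = {
    val present = fileSchema.fields.map(_.name).toSet
    Schema(pruned.fields.filter(f => present.contains(f.name)))
  }
}
```

A field added by schema evolution (present in the table schema but absent from an old partition's file) is simply dropped from the read schema for that file, and the reader fills it with nulls downstream.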
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19601 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84080/ Test FAILed.
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19601 Merged build finished. Test FAILed.
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19601 **[Test build #84080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84080/testReport)** for PR 19601 at commit [`63d9d57`](https://github.com/apache/spark/commit/63d9d576799d057646e991326c38b5fdb3a9f361).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19601 **[Test build #84080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84080/testReport)** for PR 19601 at commit [`63d9d57`](https://github.com/apache/spark/commit/63d9d576799d057646e991326c38b5fdb3a9f361).
[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...
Github user vinodkc commented on a diff in the pull request: https://github.com/apache/spark/pull/19779#discussion_r152349156 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala --- @@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging { } } +test(s"$version: SPARK-17920: Insert into/overwrite external avro table") { + withTempDir { dir => +val path = dir.getAbsolutePath +val schemaPath = s"""$path${File.separator}avroschemadir""" + +new File(schemaPath).mkdir() +val avroSchema = + """{ +| "name": "test_record", +| "type": "record", +| "fields": [ { +|"name": "f0", +|"type": [ +| "null", +| { +|"precision": 38, +|"scale": 2, +|"type": "bytes", +|"logicalType": "decimal" +| } +|] +| } ] +|} + """.stripMargin +val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc""" +new java.io.PrintWriter(schemaurl) { write(avroSchema); close() } +val url = Thread.currentThread().getContextClassLoader.getResource("avroDecimal") +val srcLocation = new File(url.getFile) +val destTableName = "tab1" +val srcTableName = "tab2" + +withTable(srcTableName, destTableName) { + versionSpark.sql( +s""" + |CREATE TABLE $srcTableName + |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' + |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true') + |STORED AS + | INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' + | OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' + |LOCATION '$srcLocation' + |TBLPROPERTIES ('avro.schema.url' = '$schemaurl') + """.stripMargin + ) + val destLocation = s"""$path${File.separator}destTableLocation""" + new File(destLocation).mkdir() + + versionSpark.sql( +s""" + |CREATE TABLE $destTableName + |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' + |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true') + |STORED AS + | INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' + | OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' + |LOCATION '$destLocation' --- End diff -- Will change to 'CREATE EXTERNAL TABLE' --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/19786#discussion_r152347531 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -34,10 +34,10 @@ import org.apache.hadoop.fs.permission.FsAction import org.apache.hadoop.hdfs.DistributedFileSystem import org.apache.hadoop.hdfs.protocol.HdfsConstants import org.apache.hadoop.security.AccessControlException +import org.fusesource.leveldbjni.internal.NativeDB import org.apache.spark.{SecurityManager, SparkConf, SparkException} import org.apache.spark.deploy.SparkHadoopUtil -import org.apache.spark.deploy.history.config._ --- End diff -- This import is not used.
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19730 **[Test build #84079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84079/testReport)** for PR 19730 at commit [`574691f`](https://github.com/apache/spark/commit/574691f3fa048129591fb4bafe3bdb07ebf5517e).
[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19786 **[Test build #84078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84078/testReport)** for PR 19786 at commit [`8cfa0b2`](https://github.com/apache/spark/commit/8cfa0b23cdf44c37452835beee88e1eb4e7f4cfc).
[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/19786#discussion_r152345297 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -570,7 +575,6 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) } val logPath = fileStatus.getPath() -logInfo(s"Replaying log path: $logPath") --- End diff -- I removed this line because the `replay` function called below already logs "Replaying log path: ...", so this message was duplicated.