[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-21 Thread MrBago
Github user MrBago commented on a diff in the pull request:

https://github.com/apache/spark/pull/19588#discussion_r152443133
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml] (
   // TODO: Check more carefully about whether this whole class will be included in a closure.
 
   /** Per-vector transform function */
-  private val transformFunc: Vector => Vector = {
+  private lazy val transformFunc: Vector => Vector = {
--- End diff --

I'm not super familiar with Scala closures, but if we do
```
def transformFunc: Vector => Vector = {
  val sortedCatFeatureIndices = categoryMaps.keys.toArray.sorted
  val localVectorMap = categoryMaps
  val localNumFeatures = numFeatures
  val localHandleInvalid = getHandleInvalid
  val f: Vector => Vector = { (v: Vector) => ... }
  f
}
```
Won't we still get a closure and avoid serializing the entire model?
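
For illustration, a minimal, self-contained sketch of the capture behavior in question (toy types, not the real `VectorIndexerModel`): because the fields are copied into local vals first, the returned function captures only those locals, not `this`, so it serializes even though the enclosing class is not `Serializable`.
```
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

class TinyModel(val numFeatures: Int) { // deliberately NOT Serializable
  def transformFunc: Array[Double] => Array[Double] = {
    val localNumFeatures = numFeatures // copy the field into a local
    val f = (v: Array[Double]) => v.take(localNumFeatures)
    f // captures only localNumFeatures, holds no reference to `this`
  }
}

object ClosureCheck extends App {
  val f = new TinyModel(2).transformFunc
  // Succeeds because the model itself was never captured:
  new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f)
  println(f(Array(1.0, 2.0, 3.0)).mkString(",")) // 1.0,2.0
}
```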


---




[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152442892
  
--- Diff: pom.xml ---
@@ -2648,6 +2648,13 @@
   
 
 
+<profile>
+  <id>kubernetes</id>
+  <modules>
+    <module>resource-managers/kubernetes/core</module>
--- End diff --

Both YARN and Mesos don't have a sub-directory called `core`; are we going to add more modules?


---




[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-21 Thread MrBago
Github user MrBago commented on a diff in the pull request:

https://github.com/apache/spark/pull/19588#discussion_r152442655
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml] (
   // TODO: Check more carefully about whether this whole class will be included in a closure.
 
   /** Per-vector transform function */
-  private val transformFunc: Vector => Vector = {
+  private lazy val transformFunc: Vector => Vector = {
--- End diff --

Can I ask about the lazy val here? What happens if a user sets the params, calls transform, modifies one of the params, and then calls transform again? Is that a concern?
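
For illustration, a minimal sketch of that concern (toy code, not the real `VectorIndexerModel`): a lazy val is computed once, on first access, so a param changed between transform calls would be silently ignored.
```
class LazyParamExample {
  var handleInvalid: String = "error" // stands in for a Param

  private lazy val transformFunc: Int => Int = {
    val local = handleInvalid // frozen at first access
    v => if (local == "skip") v else v + 1
  }

  def transform(v: Int): Int = transformFunc(v)
}

object LazyParamCheck extends App {
  val m = new LazyParamExample
  println(m.transform(1)) // 2: function built with handleInvalid = "error"
  m.handleInvalid = "skip"
  println(m.transform(1)) // still 2: the cached function ignores the change
}
```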


---




[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19767#discussion_r152442103
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -64,52 +64,22 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
 val trueEval = trueValue.genCode(ctx)
 val falseEval = falseValue.genCode(ctx)
 
-// place generated code of condition, true value and false value in separate methods if
-// their code combined is large
-val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length
--- End diff --

BTW, if it's really an issue, we can add splitting logic to non-leaf/non-unary nodes. This is much less work than before because: 1. there is no need to care about unary nodes; 2. the splitting logic can be simpler, because every child is guaranteed to generate fewer than 1000 lines of code.
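
For illustration, a rough sketch of what per-child splitting in a binary node such as `If` could look like, reusing the `ctx.freshName`/`ctx.addNewFunction`/`ctx.javaType` helpers that appear in the diffs below; the method name and body are illustrative, not the actual patch, and null handling is omitted.
```
// Inside If.doGenCode: hoist the true-value branch into its own method so
// the parent method stays small even when trueEval.code is large.
val trueEval = trueValue.genCode(ctx)
val funcName = ctx.freshName("ifTrueValue")
val funcFullName = ctx.addNewFunction(funcName,
  s"""
     |private ${ctx.javaType(dataType)} $funcName(InternalRow ${ctx.INPUT_ROW}) {
     |  ${trueEval.code}
     |  return ${trueEval.value};
     |}
   """.stripMargin)
// The parent then emits a one-line call, $funcFullName(${ctx.INPUT_ROW}),
// instead of inlining trueEval.code.
```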


---




[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19746
  
**[Test build #84089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84089/testReport)** for PR 19746 at commit [`2b1ed0a`](https://github.com/apache/spark/commit/2b1ed0a3a85385f9b4042415889335942b65b9c9).


---




[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-11-21 Thread MrBago
Github user MrBago commented on the issue:

https://github.com/apache/spark/pull/19746
  
jenkins retest this please


---




[GitHub] spark issue #19767: [SPARK-22543][SQL] fix java 64kb compile error for deepl...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19767
  
@felixcheung I'd like to keep it in master only; it has a larger impact than the other related PRs.


---




[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19767#discussion_r152441210
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -64,52 +64,22 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
 val trueEval = trueValue.genCode(ctx)
 val falseEval = falseValue.genCode(ctx)
 
-// place generated code of condition, true value and false value in separate methods if
-// their code combined is large
-val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length
--- End diff --

There is no way to guarantee it with the current string-based codegen framework, even without this PR: code of length 1000 may still generate 64KB of bytecode in the end.

`1024` is not a good estimate at all; it looks kind of random to me. So multiplying it by 2 doesn't seem like a big issue. CASE WHEN may have an issue.


---




[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19518
  
@kiszk @maropu any of you wanna take this over? This patch becomes important now that we split code more aggressively.


---




[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...

2017-11-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19767#discussion_r152436838
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -64,52 +64,22 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
 val trueEval = trueValue.genCode(ctx)
 val falseEval = falseValue.genCode(ctx)
 
-// place generated code of condition, true value and false value in separate methods if
-// their code combined is large
-val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length
--- End diff --

I see two problems with this. One is that even if the two children's code doesn't exceed the threshold individually, a method under 64KB but over 8KB is still big and bad for the JIT. The other is that we estimate with code length: I'm not sure two children of length 1000 are guaranteed not to generate a 64KB method in the end.


---




[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19468
  
**[Test build #84088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84088/testReport)** for PR 19468 at commit [`b75b413`](https://github.com/apache/spark/commit/b75b4136352d4606a41ce2b3fe1c7e31fdf71ffc).


---




[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19468
  
retest this please


---




[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19776
  
**[Test build #84087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84087/testReport)** for PR 19776 at commit [`7a19ac6`](https://github.com/apache/spark/commit/7a19ac63fcdae6b67ff989ca90d4a3652c7d02f3).


---




[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...

2017-11-21 Thread jliwork
Github user jliwork commented on a diff in the pull request:

https://github.com/apache/spark/pull/19776#discussion_r152429647
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala ---
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.test.SharedSQLContext
+
+class DataSourceStrategySuite extends PlanTest with SharedSQLContext {
+
+  test("translate simple expression") {
+val attrInt = 'cint.int
+val attrStr = 'cstr.string
+
+testTranslateFilter(EqualTo(attrInt, 1), Some(sources.EqualTo("cint", 
1)))
+testTranslateFilter(EqualTo(1, attrInt), Some(sources.EqualTo("cint", 
1)))
+
+testTranslateFilter(EqualNullSafe(attrStr, Literal(null)),
+  Some(sources.EqualNullSafe("cstr", null)))
+testTranslateFilter(EqualNullSafe(Literal(null), attrStr),
+  Some(sources.EqualNullSafe("cstr", null)))
+
+testTranslateFilter(GreaterThan(attrInt, 1), 
Some(sources.GreaterThan("cint", 1)))
+testTranslateFilter(GreaterThan(1, attrInt), 
Some(sources.LessThan("cint", 1)))
+
+testTranslateFilter(LessThan(attrInt, 1), 
Some(sources.LessThan("cint", 1)))
+testTranslateFilter(LessThan(1, attrInt), 
Some(sources.GreaterThan("cint", 1)))
+
+testTranslateFilter(GreaterThanOrEqual(attrInt, 1), 
Some(sources.GreaterThanOrEqual("cint", 1)))
+testTranslateFilter(GreaterThanOrEqual(1, attrInt), 
Some(sources.LessThanOrEqual("cint", 1)))
+
+testTranslateFilter(LessThanOrEqual(attrInt, 1), 
Some(sources.LessThanOrEqual("cint", 1)))
+testTranslateFilter(LessThanOrEqual(1, attrInt), 
Some(sources.GreaterThanOrEqual("cint", 1)))
+
+testTranslateFilter(InSet(attrInt, Set(1, 2, 3)), 
Some(sources.In("cint", Array(1, 2, 3
+
+testTranslateFilter(In(attrInt, Seq(1, 2, 3)), Some(sources.In("cint", 
Array(1, 2, 3
+
+testTranslateFilter(IsNull(attrInt), Some(sources.IsNull("cint")))
+testTranslateFilter(IsNotNull(attrInt), 
Some(sources.IsNotNull("cint")))
+
+// cint > 1 AND cint < 10
+testTranslateFilter(And(
+  GreaterThan(attrInt, 1),
+  LessThan(attrInt, 10)),
+  Some(sources.And(
+sources.GreaterThan("cint", 1),
+sources.LessThan("cint", 10
+
+// cint >= 8 OR cint <= 2
+testTranslateFilter(Or(
+  GreaterThanOrEqual(attrInt, 8),
+  LessThanOrEqual(attrInt, 2)),
+  Some(sources.Or(
+sources.GreaterThanOrEqual("cint", 8),
+sources.LessThanOrEqual("cint", 2
+
+testTranslateFilter(Not(GreaterThanOrEqual(attrInt, 8)),
+  Some(sources.Not(sources.GreaterThanOrEqual("cint", 8
+
+testTranslateFilter(StartsWith(attrStr, "a"), 
Some(sources.StringStartsWith("cstr", "a")))
+
+testTranslateFilter(EndsWith(attrStr, "a"), 
Some(sources.StringEndsWith("cstr", "a")))
+
+testTranslateFilter(Contains(attrStr, "a"), 
Some(sources.StringContains("cstr", "a")))
+  }
+
+  test("translate complex expression") {
+val attrInt = 'cint.int
+
+// ABS(cint) - 2 = 1
+testTranslateFilter(LessThanOrEqual(
+  // Expressions are not supported
--- End diff --

good catch @_@ fixed the typo. Thanks!


---




[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19776
  
LGTM


---




[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19776#discussion_r152425778
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala ---
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.test.SharedSQLContext
+
+class DataSourceStrategySuite extends PlanTest with SharedSQLContext {
+
+  test("translate simple expression") {
+val attrInt = 'cint.int
+val attrStr = 'cstr.string
+
+testTranslateFilter(EqualTo(attrInt, 1), Some(sources.EqualTo("cint", 
1)))
+testTranslateFilter(EqualTo(1, attrInt), Some(sources.EqualTo("cint", 
1)))
+
+testTranslateFilter(EqualNullSafe(attrStr, Literal(null)),
+  Some(sources.EqualNullSafe("cstr", null)))
+testTranslateFilter(EqualNullSafe(Literal(null), attrStr),
+  Some(sources.EqualNullSafe("cstr", null)))
+
+testTranslateFilter(GreaterThan(attrInt, 1), 
Some(sources.GreaterThan("cint", 1)))
+testTranslateFilter(GreaterThan(1, attrInt), 
Some(sources.LessThan("cint", 1)))
+
+testTranslateFilter(LessThan(attrInt, 1), 
Some(sources.LessThan("cint", 1)))
+testTranslateFilter(LessThan(1, attrInt), 
Some(sources.GreaterThan("cint", 1)))
+
+testTranslateFilter(GreaterThanOrEqual(attrInt, 1), 
Some(sources.GreaterThanOrEqual("cint", 1)))
+testTranslateFilter(GreaterThanOrEqual(1, attrInt), 
Some(sources.LessThanOrEqual("cint", 1)))
+
+testTranslateFilter(LessThanOrEqual(attrInt, 1), 
Some(sources.LessThanOrEqual("cint", 1)))
+testTranslateFilter(LessThanOrEqual(1, attrInt), 
Some(sources.GreaterThanOrEqual("cint", 1)))
+
+testTranslateFilter(InSet(attrInt, Set(1, 2, 3)), 
Some(sources.In("cint", Array(1, 2, 3
+
+testTranslateFilter(In(attrInt, Seq(1, 2, 3)), Some(sources.In("cint", 
Array(1, 2, 3
+
+testTranslateFilter(IsNull(attrInt), Some(sources.IsNull("cint")))
+testTranslateFilter(IsNotNull(attrInt), 
Some(sources.IsNotNull("cint")))
+
+// cint > 1 AND cint < 10
+testTranslateFilter(And(
+  GreaterThan(attrInt, 1),
+  LessThan(attrInt, 10)),
+  Some(sources.And(
+sources.GreaterThan("cint", 1),
+sources.LessThan("cint", 10
+
+// cint >= 8 OR cint <= 2
+testTranslateFilter(Or(
+  GreaterThanOrEqual(attrInt, 8),
+  LessThanOrEqual(attrInt, 2)),
+  Some(sources.Or(
+sources.GreaterThanOrEqual("cint", 8),
+sources.LessThanOrEqual("cint", 2
+
+testTranslateFilter(Not(GreaterThanOrEqual(attrInt, 8)),
+  Some(sources.Not(sources.GreaterThanOrEqual("cint", 8
+
+testTranslateFilter(StartsWith(attrStr, "a"), 
Some(sources.StringStartsWith("cstr", "a")))
+
+testTranslateFilter(EndsWith(attrStr, "a"), 
Some(sources.StringEndsWith("cstr", "a")))
+
+testTranslateFilter(Contains(attrStr, "a"), 
Some(sources.StringContains("cstr", "a")))
+  }
+
+  test("translate complex expression") {
+val attrInt = 'cint.int
+
+// ABS(cint) - 2 = 1
+testTranslateFilter(LessThanOrEqual(
+  // Expressions are not supported
--- End diff --

?


---




[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-11-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/15770
  
ping @yanboliang 


---




[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...

2017-11-21 Thread jliwork
Github user jliwork commented on the issue:

https://github.com/apache/spark/pull/19776
  
Thanks for everyone's comments! I have polished the test cases. 


---




[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...

2017-11-21 Thread jliwork
Github user jliwork commented on a diff in the pull request:

https://github.com/apache/spark/pull/19776#discussion_r152421950
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala ---
@@ -0,0 +1,307 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.test.SharedSQLContext
+
+
+class DataSourceStrategySuite extends PlanTest with SharedSQLContext {
+
+  test("translate simple expression") {
+val attrInt = 'cint.int
+val attrStr = 'cstr.string
+
+assertResult(Some(sources.EqualTo("cint", 1))) {
+  DataSourceStrategy.translateFilter(
+expressions.EqualTo(attrInt, 1))
+}
--- End diff --

Thanks! I've followed your suggestion and the test suite looks cleaner now. 


---




[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...

2017-11-21 Thread jliwork
Github user jliwork commented on a diff in the pull request:

https://github.com/apache/spark/pull/19776#discussion_r152422048
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala ---
@@ -0,0 +1,307 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.test.SharedSQLContext
+
+
+class DataSourceStrategySuite extends PlanTest with SharedSQLContext {
+
+  test("translate simple expression") {
+val attrInt = 'cint.int
+val attrStr = 'cstr.string
+
+assertResult(Some(sources.EqualTo("cint", 1))) {
+  DataSourceStrategy.translateFilter(
+expressions.EqualTo(attrInt, 1))
+}
+assertResult(Some(sources.EqualTo("cint", 1))) {
+  DataSourceStrategy.translateFilter(
+expressions.EqualTo(1, attrInt))
+}
+
+assertResult(Some(sources.EqualNullSafe("cstr", null))) {
+  DataSourceStrategy.translateFilter(
+expressions.EqualNullSafe(attrStr, Literal(null)))
+}
+assertResult(Some(sources.EqualNullSafe("cstr", null))) {
+  DataSourceStrategy.translateFilter(
+expressions.EqualNullSafe(Literal(null), attrStr))
+}
+
+assertResult(Some(sources.GreaterThan("cint", 1))) {
+  DataSourceStrategy.translateFilter(
+expressions.GreaterThan(attrInt, 1))
+}
+assertResult(Some(sources.GreaterThan("cint", 1))) {
+  DataSourceStrategy.translateFilter(
+expressions.LessThan(1, attrInt))
+}
+
+assertResult(Some(sources.LessThan("cint", 1))) {
+  DataSourceStrategy.translateFilter(
+expressions.LessThan(attrInt, 1))
+}
+assertResult(Some(sources.LessThan("cint", 1))) {
+  DataSourceStrategy.translateFilter(
+expressions.GreaterThan(1, attrInt))
+}
+
+assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) {
+  DataSourceStrategy.translateFilter(
+expressions.GreaterThanOrEqual(attrInt, 1))
+}
+assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) {
+  DataSourceStrategy.translateFilter(
+expressions.LessThanOrEqual(1, attrInt))
+}
+
+assertResult(Some(sources.LessThanOrEqual("cint", 1))) {
+  DataSourceStrategy.translateFilter(
+expressions.LessThanOrEqual(attrInt, 1))
+}
+assertResult(Some(sources.LessThanOrEqual("cint", 1))) {
+  DataSourceStrategy.translateFilter(
+expressions.GreaterThanOrEqual(1, attrInt))
+}
+
+assertResult(Some(sources.In("cint", Array(1, 2, 3 {
+  DataSourceStrategy.translateFilter(
+expressions.InSet(attrInt, Set(1, 2, 3)))
+}
+
+assertResult(Some(sources.In("cint", Array(1, 2, 3 {
+  DataSourceStrategy.translateFilter(
+expressions.In(attrInt, Seq(1, 2, 3)))
+}
+
+assertResult(Some(sources.IsNull("cint"))) {
+  DataSourceStrategy.translateFilter(
+expressions.IsNull(attrInt))
+}
+assertResult(Some(sources.IsNotNull("cint"))) {
+  DataSourceStrategy.translateFilter(
+expressions.IsNotNull(attrInt))
+}
+
+assertResult(Some(sources.And(
+  sources.GreaterThan("cint", 1),
+  sources.LessThan("cint", 10 {
+  DataSourceStrategy.translateFilter(expressions.And(
+expressions.GreaterThan(attrInt, 1),
+expressions.LessThan(attrInt, 10)
+  ))
+}
+
+assertResult(Some(sources.Or(
+  sources.GreaterThanOrEqual("cint", 8),
+  

[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...

2017-11-21 Thread jliwork
Github user jliwork commented on a diff in the pull request:

https://github.com/apache/spark/pull/19776#discussion_r152421982
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ---
@@ -296,9 +296,15 @@ class JDBCSuite extends SparkFunSuite
 // The older versions of spark have this kind of bugs in parquet data 
source.
 val df1 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2 AND NAME != 
'mary')")
--- End diff --

Fixed.


---




[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...

2017-11-21 Thread jliwork
Github user jliwork commented on a diff in the pull request:

https://github.com/apache/spark/pull/19776#discussion_r152421773
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala ---
@@ -0,0 +1,307 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions
+import org.apache.spark.sql.catalyst.expressions._
--- End diff --

Thanks for the suggestion. Fixed. 


---




[GitHub] spark issue #19776: [SPARK-22548][SQL] Incorrect nested AND expression pushe...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19776
  
**[Test build #84086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84086/testReport)** for PR 19776 at commit [`a0b3d4e`](https://github.com/apache/spark/commit/a0b3d4e990cd7024b532593bca321499001fc89b).


---




[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19601
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84082/
Test PASSed.


---




[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-21 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/19468
  
retest this please

Failure looks unrelated.


---




[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19601
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19601
  
**[Test build #84082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84082/testReport)** for PR 19601 at commit [`9b6b890`](https://github.com/apache/spark/commit/9b6b890b0444f3a20e73691528b59ad21edb07b8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19468
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19468
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84084/
Test FAILed.


---




[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19468
  
**[Test build #84084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84084/testReport)** for PR 19468 at commit [`b75b413`](https://github.com/apache/spark/commit/b75b4136352d4606a41ce2b3fe1c7e31fdf71ffc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19767#discussion_r152417537
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -105,6 +105,36 @@ abstract class Expression extends TreeNode[Expression] {
   val isNull = ctx.freshName("isNull")
   val value = ctx.freshName("value")
   val ve = doGenCode(ctx, ExprCode("", isNull, value))
+
+  // TODO: support whole stage codegen too
+  if (ve.code.trim.length > 1024 && ctx.INPUT_ROW != null && 
ctx.currentVars == null) {
+val setIsNull = if (ve.isNull != "false" && ve.isNull != "true") {
+  val globalIsNull = ctx.freshName("globalIsNull")
+  ctx.addMutableState("boolean", globalIsNull, s"$globalIsNull = 
false;")
+  val localIsNull = ve.isNull
+  ve.isNull = globalIsNull
+  s"$globalIsNull = $localIsNull;"
+} else {
+  ""
+}
+
+val javaType = ctx.javaType(dataType)
+val newValue = ctx.freshName("value")
+
+val funcName = ctx.freshName(nodeName)
+val funcFullName = ctx.addNewFunction(funcName,
+  s"""
+ |private $javaType $funcName(InternalRow ${ctx.INPUT_ROW}) {
--- End diff --

cc @rednaxelafx @mgaido91 too


---




[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-11-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18692#discussion_r152417440
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
@@ -152,3 +152,71 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
   if (j.joinType == newJoinType) f else Filter(condition, 
j.copy(joinType = newJoinType))
   }
 }
+
+/**
+ * A rule that uses propagated constraints to infer join conditions. The 
optimization is applicable
+ * only to CROSS joins.
+ *
+ * For instance, if there is a CROSS join, where the left relation has 'a 
= 1' and the right
+ * relation has 'b = 1', then the rule infers 'a = b' as a join predicate.
+ */
+object InferJoinConditionsFromConstraints extends Rule[LogicalPlan] with 
PredicateHelper {
+
+  def apply(plan: LogicalPlan): LogicalPlan = {
+if (SQLConf.get.constraintPropagationEnabled) {
+  inferJoinConditions(plan)
+} else {
+  plan
+}
+  }
+
+  private def inferJoinConditions(plan: LogicalPlan): LogicalPlan = plan 
transform {
+case join @ Join(left, right, Cross, conditionOpt) =>
+  val leftConstraints = 
join.constraints.filter(_.references.subsetOf(left.outputSet))
+  val rightConstraints = 
join.constraints.filter(_.references.subsetOf(right.outputSet))
+  val inferredJoinPredicates = inferJoinPredicates(leftConstraints, 
rightConstraints)
+
+  val newConditionOpt = conditionOpt match {
+case Some(condition) =>
+  val existingPredicates = splitConjunctivePredicates(condition)
+  val newPredicates = findNewPredicates(inferredJoinPredicates, 
existingPredicates)
+  if (newPredicates.nonEmpty) Some(And(newPredicates.reduce(And), 
condition)) else None
+case None =>
+  inferredJoinPredicates.reduceOption(And)
+  }
+  if (newConditionOpt.isDefined) Join(left, right, Inner, 
newConditionOpt) else join
--- End diff --

Please see the rule `CheckCartesianProducts`. The example above is not a CROSS join.
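
For reference, a quick illustration of that distinction in Spark 2.x (assuming a `SparkSession` named `spark`; the table names are made up): an inner join with no usable condition trips `CheckCartesianProducts`, while an explicit CROSS JOIN is allowed.
```
spark.range(3).createOrReplaceTempView("l")
spark.range(3).createOrReplaceTempView("r")

spark.sql("SELECT * FROM l CROSS JOIN r").count() // fine: explicit cross join
spark.sql("SELECT * FROM l JOIN r").count()       // throws AnalysisException
// ("Detected implicit cartesian product") unless spark.sql.crossJoin.enabled=true
```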


---




[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-11-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18692#discussion_r152412423
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
@@ -152,3 +152,71 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
   if (j.joinType == newJoinType) f else Filter(condition, 
j.copy(joinType = newJoinType))
   }
 }
+
+/**
+ * A rule that uses propagated constraints to infer join conditions. The 
optimization is applicable
+ * only to CROSS joins.
+ *
+ * For instance, if there is a CROSS join, where the left relation has 'a 
= 1' and the right
+ * relation has 'b = 1', then the rule infers 'a = b' as a join predicate.
+ */
+object InferJoinConditionsFromConstraints extends Rule[LogicalPlan] with 
PredicateHelper {
--- End diff --

Yes. Since we decided to focus on cross joins only, we should rename it to `EliminateCrossJoin`, as you proposed.


---




[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-11-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18692#discussion_r152415250
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
@@ -152,3 +152,79 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
   if (j.joinType == newJoinType) f else Filter(condition, 
j.copy(joinType = newJoinType))
   }
 }
+
+/**
+ * A rule that uses propagated constraints to infer join conditions. The 
optimization is applicable
+ * only to CROSS joins. For other join types, adding inferred join 
conditions would potentially
+ * shuffle children as child node's partitioning won't satisfy the JOIN 
node's requirements
+ * which otherwise could have.
+ *
+ * For instance, if there is a CROSS join, where the left relation has 'a 
= 1' and the right
+ * relation has 'b = 1', then the rule infers 'a = b' as a join predicate.
+ */
+object InferJoinConditionsFromConstraints extends Rule[LogicalPlan] with 
PredicateHelper {
+
+  def apply(plan: LogicalPlan): LogicalPlan = {
+if (SQLConf.get.constraintPropagationEnabled) {
+  inferJoinConditions(plan)
+} else {
+  plan
+}
+  }
+
+  private def inferJoinConditions(plan: LogicalPlan): LogicalPlan = plan 
transform {
+case join @ Join(left, right, Cross, conditionOpt) =>
+
+  val rightEqualToPredicates = join.constraints.collect {
+case equalTo @ EqualTo(attr: Attribute, _) if 
isAttributeContainedInPlan(attr, right) =>
+  equalTo
+case equalTo @ EqualTo(_, attr: Attribute) if 
isAttributeContainedInPlan(attr, right) =>
+  equalTo
+  }
+
+  val inferredJoinPredicates = join.constraints.flatMap {
+case EqualTo(attr: Attribute, equivalentExpr) if 
isAttributeContainedInPlan(attr, left) =>
+  collectJoinPredicates(attr, equivalentExpr, right, 
rightEqualToPredicates)
+case EqualTo(equivalentExpr, attr: Attribute) if 
isAttributeContainedInPlan(attr, left) =>
+  collectJoinPredicates(attr, equivalentExpr, right, 
rightEqualToPredicates)
+case _ => Nil
+  }
+
+  val newConditionOpt = conditionOpt match {
+case Some(condition) =>
+  val existingPredicates = splitConjunctivePredicates(condition)
+  val newPredicates = findNewPredicates(inferredJoinPredicates, 
existingPredicates)
+  if (newPredicates.nonEmpty) Some(And(newPredicates.reduce(And), 
condition)) else None
+case None =>
+  inferredJoinPredicates.reduceOption(And)
+  }
+  if (newConditionOpt.isDefined) Join(left, right, Inner, 
newConditionOpt) else join
+  }
+
+  private def collectJoinPredicates(
+  leftAttr: Attribute,
+  equivalentExpr: Expression,
+  rightPlan: LogicalPlan,
+  rightPlanEqualToPredicates: Set[EqualTo]): Set[EqualTo] = {
+
+rightPlanEqualToPredicates.collect {
+  case EqualTo(attr: Attribute, expr)
+if expr.semanticEquals(equivalentExpr) && 
isAttributeContainedInPlan(attr, rightPlan) =>
+EqualTo(leftAttr, attr)
+  case EqualTo(expr, attr: Attribute)
+if expr.semanticEquals(equivalentExpr) && 
isAttributeContainedInPlan(attr, rightPlan) =>
+EqualTo(leftAttr, attr)
+}
+  }
+
+  private def isAttributeContainedInPlan(attr: Attribute, logicalPlan: 
LogicalPlan): Boolean = {
+attr.references.subsetOf(logicalPlan.outputSet)
+  }
+
+  private def findNewPredicates(
+  inferredPredicates: Set[EqualTo],
+  existingPredicates: Seq[Expression]) : Set[EqualTo] = {
--- End diff --

`existingPredicates: Seq[Expression]) : Set[EqualTo] = {`
-> `existingPredicates: Seq[Expression]): Set[EqualTo] = {`


---




[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19767#discussion_r152417333
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -105,6 +105,36 @@ abstract class Expression extends TreeNode[Expression] {
   val isNull = ctx.freshName("isNull")
   val value = ctx.freshName("value")
   val ve = doGenCode(ctx, ExprCode("", isNull, value))
+
+  // TODO: support whole stage codegen too
+  if (ve.code.trim.length > 1024 && ctx.INPUT_ROW != null && 
ctx.currentVars == null) {
+val setIsNull = if (ve.isNull != "false" && ve.isNull != "true") {
+  val globalIsNull = ctx.freshName("globalIsNull")
+  ctx.addMutableState("boolean", globalIsNull, s"$globalIsNull = 
false;")
+  val localIsNull = ve.isNull
+  ve.isNull = globalIsNull
+  s"$globalIsNull = $localIsNull;"
+} else {
+  ""
+}
+
+val javaType = ctx.javaType(dataType)
+val newValue = ctx.freshName("value")
+
+val funcName = ctx.freshName(nodeName)
+val funcFullName = ctx.addNewFunction(funcName,
+  s"""
+ |private $javaType $funcName(InternalRow ${ctx.INPUT_ROW}) {
--- End diff --

To continue the discussion in https://github.com/apache/spark/pull/19767#discussion_r151631456

I think more global variables can be eliminated by leveraging the method return value. However, in some cases we use global variables to avoid creating an object for each iteration, so we are facing a trade-off between GC overhead and global-variable overhead. It would be great if Java had something like a C struct and could allocate objects on the method stack...
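
For illustration, a hand-written sketch of that trade-off in plain Scala (not generated code): returning the value keeps state out of the class, while a reusable field avoids a per-call allocation for object results.
```
object ReturnVsGlobal {
  // Variant A: the result flows through the return value; no global state.
  def viaReturn(a: Long, b: Long): Long = a + b

  // Variant B: one buffer reused across calls avoids allocating per call,
  // at the cost of per-instance mutable state (the "global variable").
  private val buffer = new StringBuilder
  def viaGlobal(a: String, b: String): String = {
    buffer.setLength(0) // reuse instead of allocating a new builder
    buffer.append(a).append(b).toString
  }

  def main(args: Array[String]): Unit = {
    println(viaReturn(2, 3))     // 5
    println(viaGlobal("a", "b")) // ab
  }
}
```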


---




[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19767#discussion_r152414088
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -64,52 +64,22 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
 val trueEval = trueValue.genCode(ctx)
 val falseEval = falseValue.genCode(ctx)
 
-// place generated code of condition, true value and false value in separate methods if
-// their code combined is large
-val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length
--- End diff --

I already explained it in https://github.com/apache/spark/pull/19767#issuecomment-345176286

Mostly it's OK because the threshold is just an estimate; it's not a big deal to make it 2 times larger. CASE WHEN may be a problem, and we can evaluate it in #18641 after this PR gets merged.


---




[GitHub] spark issue #19767: [SPARK-22543][SQL] fix java 64kb compile error for deepl...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19767
  
**[Test build #84085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84085/testReport)** for PR 19767 at commit [`d126977`](https://github.com/apache/spark/commit/d126977bbdd221287b0825fa78c04b1065d97ab1).


---




[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19730
  
thanks, merging to master!


---




[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19786
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84078/
Test FAILed.


---




[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19786
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19786
  
**[Test build #84078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84078/testReport)** for PR 19786 at commit [`8cfa0b2`](https://github.com/apache/spark/commit/8cfa0b23cdf44c37452835beee88e1eb4e7f4cfc).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19730


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19779
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84081/
Test PASSed.


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19779
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19779
  
**[Test build #84081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84081/testReport)** for PR 19779 at commit [`a2560b3`](https://github.com/apache/spark/commit/a2560b3105e7ff712758573ca157703b4397de8b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19792: [SPARK-22566][PYTHON] Better error message for `_...

2017-11-21 Thread ebuildy
Github user ebuildy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19792#discussion_r152397263
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1108,19 +1109,22 @@ def _has_nulltype(dt):
 return isinstance(dt, NullType)
 
 
-def _merge_type(a, b):
+def _merge_type(a, b, name=None):
 if isinstance(a, NullType):
 return b
 elif isinstance(b, NullType):
 return a
 elif type(a) is not type(b):
 # TODO: type cast (such as int -> long)
-raise TypeError("Can not merge type %s and %s" % (type(a), 
type(b)))
+if name is not None:
--- End diff --

Easier to read as:

```
if name is None:
    raise TypeError("Can not merge type %s and %s" % (type(a), type(b)))
else:
    raise TypeError("Can not merge type %s and %s in column %s"
                    % (type(a), type(b), name))
```


---




[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19730
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19730
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84079/
Test PASSed.


---




[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19730
  
**[Test build #84079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84079/testReport)** for PR 19730 at commit [`574691f`](https://github.com/apache/spark/commit/574691f3fa048129591fb4bafe3bdb07ebf5517e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

2017-11-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19774


---




[GitHub] spark pull request #19389: [SPARK-22165][SQL] Fixes type conflicts between d...

2017-11-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19389


---




[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19774
  
thanks, merging to master!


---




[GitHub] spark issue #19389: [SPARK-22165][SQL] Fixes type conflicts between double, ...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19389
  
thanks, merging to master!


---




[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19468
  
**[Test build #84084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84084/testReport)** for PR 19468 at commit [`b75b413`](https://github.com/apache/spark/commit/b75b4136352d4606a41ce2b3fe1c7e31fdf71ffc).


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152383898
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging {
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite external avro 
table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
--- End diff --

Is this bug for external tables only? How about managed tables?


---




[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-11-21 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18906
  
I believe the equivalent API in Scala would only be in the following form when registering a UDF:
```
spark.udf.register("func", () => { 1 }).asNonNullable()
```
Would it be preferable to just stick with a similar API for Python if we are trying to match the behavior?

> So I think with the performance improvements coming into Python UDFs, considering annotating results as nullable or not could make sense (although I imagine we'd need to do something different for the vectorized UDFs if it isn't already being done).

Regarding performance increases with vectorized UDFs: right now the Java side is only implemented to accept nullable return types, so there wouldn't be any difference. In the future it would be possible to accept either, and that would give a little performance bump.


---




[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19370
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84077/
Test PASSed.


---




[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19370
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19370
  
**[Test build #84077 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84077/testReport)**
 for PR 19370 at commit 
[`a4d516f`](https://github.com/apache/spark/commit/a4d516f2326e39e3f04af3395a66da8e11964ce8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19468
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19468
  
**[Test build #84083 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84083/testReport)**
 for PR 19468 at commit 
[`7afce3f`](https://github.com/apache/spark/commit/7afce3f1d61df2ecba3efd2019ca4d287fc0bffb).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19468
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84083/
Test FAILed.


---




[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19468
  
**[Test build #84083 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84083/testReport)**
 for PR 19468 at commit 
[`7afce3f`](https://github.com/apache/spark/commit/7afce3f1d61df2ecba3efd2019ca4d287fc0bffb).


---




[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152374715
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/constants.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+package object constants {
--- End diff --

Done.


---




[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152373882
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
 ---
@@ -0,0 +1,430 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.io.Closeable
+import java.net.InetAddress
+import java.util.concurrent.{ConcurrentHashMap, ExecutorService, 
ScheduledExecutorService, TimeUnit}
+import java.util.concurrent.atomic.{AtomicInteger, AtomicLong, 
AtomicReference}
+import javax.annotation.concurrent.GuardedBy
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.{KubernetesClient, 
KubernetesClientException, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.concurrent.{ExecutionContext, Future}
+
+import org.apache.spark.SparkException
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.deploy.k8s.constants._
+import org.apache.spark.rpc.{RpcAddress, RpcEndpointAddress, RpcEnv}
+import org.apache.spark.scheduler.{ExecutorExited, SlaveLost, 
TaskSchedulerImpl}
+import org.apache.spark.scheduler.cluster.{CoarseGrainedSchedulerBackend, 
SchedulerBackendUtils}
+import org.apache.spark.util.Utils
+
+private[spark] class KubernetesClusterSchedulerBackend(
+scheduler: TaskSchedulerImpl,
+rpcEnv: RpcEnv,
+executorPodFactory: ExecutorPodFactory,
+kubernetesClient: KubernetesClient,
+allocatorExecutor: ScheduledExecutorService,
+requestExecutorsService: ExecutorService)
+  extends CoarseGrainedSchedulerBackend(scheduler, rpcEnv) {
+
+  import KubernetesClusterSchedulerBackend._
+
+  private val EXECUTOR_ID_COUNTER = new AtomicLong(0L)
+  private val RUNNING_EXECUTOR_PODS_LOCK = new Object
+  @GuardedBy("RUNNING_EXECUTOR_PODS_LOCK")
+  private val runningExecutorsToPods = new mutable.HashMap[String, Pod]
+  private val executorPodsByIPs = new ConcurrentHashMap[String, Pod]()
+  private val podsWithKnownExitReasons = new ConcurrentHashMap[String, 
ExecutorExited]()
+  private val disconnectedPodsByExecutorIdPendingRemoval = new 
ConcurrentHashMap[String, Pod]()
+
+  private val kubernetesNamespace = conf.get(KUBERNETES_NAMESPACE)
+
+  private val kubernetesDriverPodName = conf
+.get(KUBERNETES_DRIVER_POD_NAME)
+.getOrElse(throw new SparkException("Must specify the driver pod 
name"))
+  private implicit val requestExecutorContext = 
ExecutionContext.fromExecutorService(
+requestExecutorsService)
+
+  private val driverPod = kubernetesClient.pods()
+.inNamespace(kubernetesNamespace)
+.withName(kubernetesDriverPodName)
+.get()
+
+  protected override val minRegisteredRatio =
+if 
(conf.getOption("spark.scheduler.minRegisteredResourcesRatio").isEmpty) {
+  0.8
+} else {
+  super.minRegisteredRatio
+}
+
+  private val executorWatchResource = new AtomicReference[Closeable]
+  private val totalExpectedExecutors = new AtomicInteger(0)
+
+  private val driverUrl = RpcEndpointAddress(
+conf.get("spark.driver.host"),
+conf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT),
+CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
+
+  private val initialExecutors = 
SchedulerBackendUtils.getInitialTargetExecutorNumber(conf)
+
+  private val podAllocationInterval = 
conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY)
+
+  private val podAllocationSize = 
conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE)
+
+  private val allocatorRunnable = new Runnable {
+
+// Maintains a map of executor id to count of checks performed to 
learn the loss reason
+// for an executor.
+private val 

[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152373908
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala
 ---
@@ -0,0 +1,439 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.util.concurrent.{ExecutorService, ScheduledExecutorService, 
TimeUnit}
+
+import io.fabric8.kubernetes.api.model.{DoneablePod, Pod, PodBuilder, 
PodList}
+import io.fabric8.kubernetes.client.{KubernetesClient, Watch, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+import io.fabric8.kubernetes.client.dsl.{FilterWatchListDeletable, 
MixedOperation, NonNamespaceOperation, PodResource}
+import org.mockito.{AdditionalAnswers, ArgumentCaptor, Mock, 
MockitoAnnotations}
+import org.mockito.Matchers.{any, eq => mockitoEq}
+import org.mockito.Mockito.{doNothing, never, times, verify, when}
+import org.scalatest.BeforeAndAfter
+import org.scalatest.mock.MockitoSugar._
+import scala.collection.JavaConverters._
+import scala.concurrent.Future
+
+import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.deploy.k8s.constants._
+import org.apache.spark.rpc._
+import org.apache.spark.scheduler.{ExecutorExited, LiveListenerBus, 
SlaveLost, TaskSchedulerImpl}
+import 
org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RegisterExecutor,
 RemoveExecutor}
+import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
+import org.apache.spark.util.ThreadUtils
+
+class KubernetesClusterSchedulerBackendSuite
+extends SparkFunSuite with BeforeAndAfter {
+
+  private val APP_ID = "test-spark-app"
+  private val DRIVER_POD_NAME = "spark-driver-pod"
+  private val NAMESPACE = "test-namespace"
+  private val SPARK_DRIVER_HOST = "localhost"
+  private val SPARK_DRIVER_PORT = 7077
+  private val POD_ALLOCATION_INTERVAL = 60L
+  private val DRIVER_URL = RpcEndpointAddress(
+  SPARK_DRIVER_HOST, SPARK_DRIVER_PORT, 
CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
+  private val FIRST_EXECUTOR_POD = new PodBuilder()
+.withNewMetadata()
+  .withName("pod1")
+  .endMetadata()
+.withNewSpec()
+  .withNodeName("node1")
+  .endSpec()
+.withNewStatus()
+  .withHostIP("192.168.99.100")
+  .endStatus()
+.build()
+  private val SECOND_EXECUTOR_POD = new PodBuilder()
+.withNewMetadata()
+  .withName("pod2")
+  .endMetadata()
+.withNewSpec()
+  .withNodeName("node2")
+  .endSpec()
+.withNewStatus()
+  .withHostIP("192.168.99.101")
+  .endStatus()
+.build()
+
+  private type PODS = MixedOperation[Pod, PodList, DoneablePod, 
PodResource[Pod, DoneablePod]]
+  private type LABELED_PODS = FilterWatchListDeletable[
+  Pod, PodList, java.lang.Boolean, Watch, Watcher[Pod]]
--- End diff --

Done.


---




[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152373938
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ---
@@ -22,15 +22,14 @@ import java.util.concurrent._
 import java.util.concurrent.atomic.AtomicInteger
 import java.util.regex.Pattern
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
-import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Queue}
-import scala.util.control.NonFatal
-
 import org.apache.hadoop.yarn.api.records._
 import org.apache.hadoop.yarn.client.api.AMRMClient
 import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
 import org.apache.hadoop.yarn.conf.YarnConfiguration
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Queue}
+import scala.util.control.NonFatal
--- End diff --

Done.


---




[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152373927
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala
 ---
@@ -0,0 +1,439 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.util.concurrent.{ExecutorService, ScheduledExecutorService, 
TimeUnit}
+
+import io.fabric8.kubernetes.api.model.{DoneablePod, Pod, PodBuilder, 
PodList}
+import io.fabric8.kubernetes.client.{KubernetesClient, Watch, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+import io.fabric8.kubernetes.client.dsl.{FilterWatchListDeletable, 
MixedOperation, NonNamespaceOperation, PodResource}
+import org.mockito.{AdditionalAnswers, ArgumentCaptor, Mock, 
MockitoAnnotations}
+import org.mockito.Matchers.{any, eq => mockitoEq}
+import org.mockito.Mockito.{doNothing, never, times, verify, when}
+import org.scalatest.BeforeAndAfter
+import org.scalatest.mock.MockitoSugar._
+import scala.collection.JavaConverters._
+import scala.concurrent.Future
+
+import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.deploy.k8s.constants._
+import org.apache.spark.rpc._
+import org.apache.spark.scheduler.{ExecutorExited, LiveListenerBus, 
SlaveLost, TaskSchedulerImpl}
+import 
org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RegisterExecutor,
 RemoveExecutor}
+import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
+import org.apache.spark.util.ThreadUtils
+
+class KubernetesClusterSchedulerBackendSuite
+extends SparkFunSuite with BeforeAndAfter {
+
+  private val APP_ID = "test-spark-app"
+  private val DRIVER_POD_NAME = "spark-driver-pod"
+  private val NAMESPACE = "test-namespace"
+  private val SPARK_DRIVER_HOST = "localhost"
+  private val SPARK_DRIVER_PORT = 7077
+  private val POD_ALLOCATION_INTERVAL = 60L
+  private val DRIVER_URL = RpcEndpointAddress(
+  SPARK_DRIVER_HOST, SPARK_DRIVER_PORT, 
CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
+  private val FIRST_EXECUTOR_POD = new PodBuilder()
+.withNewMetadata()
+  .withName("pod1")
+  .endMetadata()
+.withNewSpec()
+  .withNodeName("node1")
+  .endSpec()
+.withNewStatus()
+  .withHostIP("192.168.99.100")
+  .endStatus()
+.build()
+  private val SECOND_EXECUTOR_POD = new PodBuilder()
+.withNewMetadata()
+  .withName("pod2")
+  .endMetadata()
+.withNewSpec()
+  .withNodeName("node2")
+  .endSpec()
+.withNewStatus()
+  .withHostIP("192.168.99.101")
+  .endStatus()
+.build()
+
+  private type PODS = MixedOperation[Pod, PodList, DoneablePod, 
PodResource[Pod, DoneablePod]]
+  private type LABELED_PODS = FilterWatchListDeletable[
+  Pod, PodList, java.lang.Boolean, Watch, Watcher[Pod]]
+  private type IN_NAMESPACE_PODS = NonNamespaceOperation[
+  Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]]
--- End diff --

Done.


---




[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152373862
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
 ---
@@ -0,0 +1,430 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.io.Closeable
+import java.net.InetAddress
+import java.util.concurrent.{ConcurrentHashMap, ExecutorService, 
ScheduledExecutorService, TimeUnit}
+import java.util.concurrent.atomic.{AtomicInteger, AtomicLong, 
AtomicReference}
+import javax.annotation.concurrent.GuardedBy
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.{KubernetesClient, 
KubernetesClientException, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.concurrent.{ExecutionContext, Future}
+
+import org.apache.spark.SparkException
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.deploy.k8s.constants._
+import org.apache.spark.rpc.{RpcAddress, RpcEndpointAddress, RpcEnv}
+import org.apache.spark.scheduler.{ExecutorExited, SlaveLost, 
TaskSchedulerImpl}
+import org.apache.spark.scheduler.cluster.{CoarseGrainedSchedulerBackend, 
SchedulerBackendUtils}
+import org.apache.spark.util.Utils
+
+private[spark] class KubernetesClusterSchedulerBackend(
+scheduler: TaskSchedulerImpl,
+rpcEnv: RpcEnv,
+executorPodFactory: ExecutorPodFactory,
+kubernetesClient: KubernetesClient,
+allocatorExecutor: ScheduledExecutorService,
+requestExecutorsService: ExecutorService)
+  extends CoarseGrainedSchedulerBackend(scheduler, rpcEnv) {
+
+  import KubernetesClusterSchedulerBackend._
+
+  private val EXECUTOR_ID_COUNTER = new AtomicLong(0L)
+  private val RUNNING_EXECUTOR_PODS_LOCK = new Object
+  @GuardedBy("RUNNING_EXECUTOR_PODS_LOCK")
+  private val runningExecutorsToPods = new mutable.HashMap[String, Pod]
+  private val executorPodsByIPs = new ConcurrentHashMap[String, Pod]()
+  private val podsWithKnownExitReasons = new ConcurrentHashMap[String, 
ExecutorExited]()
+  private val disconnectedPodsByExecutorIdPendingRemoval = new 
ConcurrentHashMap[String, Pod]()
+
+  private val kubernetesNamespace = conf.get(KUBERNETES_NAMESPACE)
+
+  private val kubernetesDriverPodName = conf
+.get(KUBERNETES_DRIVER_POD_NAME)
+.getOrElse(throw new SparkException("Must specify the driver pod 
name"))
+  private implicit val requestExecutorContext = 
ExecutionContext.fromExecutorService(
+requestExecutorsService)
+
+  private val driverPod = kubernetesClient.pods()
+.inNamespace(kubernetesNamespace)
+.withName(kubernetesDriverPodName)
+.get()
+
+  protected override val minRegisteredRatio =
+if 
(conf.getOption("spark.scheduler.minRegisteredResourcesRatio").isEmpty) {
+  0.8
+} else {
+  super.minRegisteredRatio
+}
+
+  private val executorWatchResource = new AtomicReference[Closeable]
+  private val totalExpectedExecutors = new AtomicInteger(0)
+
+  private val driverUrl = RpcEndpointAddress(
+conf.get("spark.driver.host"),
+conf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT),
+CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
+
+  private val initialExecutors = 
SchedulerBackendUtils.getInitialTargetExecutorNumber(conf)
+
+  private val podAllocationInterval = 
conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY)
+
+  private val podAllocationSize = 
conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE)
+
+  private val allocatorRunnable = new Runnable {
+
+// Maintains a map of executor id to count of checks performed to 
learn the loss reason
+// for an executor.
+private val 

[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152373871
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
 ---
@@ -0,0 +1,430 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.io.Closeable
+import java.net.InetAddress
+import java.util.concurrent.{ConcurrentHashMap, ExecutorService, 
ScheduledExecutorService, TimeUnit}
+import java.util.concurrent.atomic.{AtomicInteger, AtomicLong, 
AtomicReference}
+import javax.annotation.concurrent.GuardedBy
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.{KubernetesClient, 
KubernetesClientException, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.concurrent.{ExecutionContext, Future}
+
+import org.apache.spark.SparkException
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.deploy.k8s.constants._
+import org.apache.spark.rpc.{RpcAddress, RpcEndpointAddress, RpcEnv}
+import org.apache.spark.scheduler.{ExecutorExited, SlaveLost, 
TaskSchedulerImpl}
+import org.apache.spark.scheduler.cluster.{CoarseGrainedSchedulerBackend, 
SchedulerBackendUtils}
+import org.apache.spark.util.Utils
+
+private[spark] class KubernetesClusterSchedulerBackend(
+scheduler: TaskSchedulerImpl,
+rpcEnv: RpcEnv,
+executorPodFactory: ExecutorPodFactory,
+kubernetesClient: KubernetesClient,
+allocatorExecutor: ScheduledExecutorService,
+requestExecutorsService: ExecutorService)
+  extends CoarseGrainedSchedulerBackend(scheduler, rpcEnv) {
+
+  import KubernetesClusterSchedulerBackend._
+
+  private val EXECUTOR_ID_COUNTER = new AtomicLong(0L)
+  private val RUNNING_EXECUTOR_PODS_LOCK = new Object
+  @GuardedBy("RUNNING_EXECUTOR_PODS_LOCK")
+  private val runningExecutorsToPods = new mutable.HashMap[String, Pod]
+  private val executorPodsByIPs = new ConcurrentHashMap[String, Pod]()
+  private val podsWithKnownExitReasons = new ConcurrentHashMap[String, 
ExecutorExited]()
+  private val disconnectedPodsByExecutorIdPendingRemoval = new 
ConcurrentHashMap[String, Pod]()
+
+  private val kubernetesNamespace = conf.get(KUBERNETES_NAMESPACE)
+
+  private val kubernetesDriverPodName = conf
+.get(KUBERNETES_DRIVER_POD_NAME)
+.getOrElse(throw new SparkException("Must specify the driver pod 
name"))
+  private implicit val requestExecutorContext = 
ExecutionContext.fromExecutorService(
+requestExecutorsService)
+
+  private val driverPod = kubernetesClient.pods()
+.inNamespace(kubernetesNamespace)
+.withName(kubernetesDriverPodName)
+.get()
+
+  protected override val minRegisteredRatio =
+if 
(conf.getOption("spark.scheduler.minRegisteredResourcesRatio").isEmpty) {
+  0.8
+} else {
+  super.minRegisteredRatio
+}
+
+  private val executorWatchResource = new AtomicReference[Closeable]
+  private val totalExpectedExecutors = new AtomicInteger(0)
+
+  private val driverUrl = RpcEndpointAddress(
+conf.get("spark.driver.host"),
+conf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT),
+CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
+
+  private val initialExecutors = 
SchedulerBackendUtils.getInitialTargetExecutorNumber(conf)
+
+  private val podAllocationInterval = 
conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY)
+
+  private val podAllocationSize = 
conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE)
+
+  private val allocatorRunnable = new Runnable {
+
+// Maintains a map of executor id to count of checks performed to 
learn the loss reason
+// for an executor.
+private val 

[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152373840
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
 ---
@@ -0,0 +1,430 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.io.Closeable
+import java.net.InetAddress
+import java.util.concurrent.{ConcurrentHashMap, ExecutorService, 
ScheduledExecutorService, TimeUnit}
+import java.util.concurrent.atomic.{AtomicInteger, AtomicLong, 
AtomicReference}
+import javax.annotation.concurrent.GuardedBy
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.{KubernetesClient, 
KubernetesClientException, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.concurrent.{ExecutionContext, Future}
+
+import org.apache.spark.SparkException
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.deploy.k8s.constants._
+import org.apache.spark.rpc.{RpcAddress, RpcEndpointAddress, RpcEnv}
+import org.apache.spark.scheduler.{ExecutorExited, SlaveLost, 
TaskSchedulerImpl}
+import org.apache.spark.scheduler.cluster.{CoarseGrainedSchedulerBackend, 
SchedulerBackendUtils}
+import org.apache.spark.util.Utils
+
+private[spark] class KubernetesClusterSchedulerBackend(
+scheduler: TaskSchedulerImpl,
+rpcEnv: RpcEnv,
+executorPodFactory: ExecutorPodFactory,
+kubernetesClient: KubernetesClient,
+allocatorExecutor: ScheduledExecutorService,
+requestExecutorsService: ExecutorService)
+  extends CoarseGrainedSchedulerBackend(scheduler, rpcEnv) {
+
+  import KubernetesClusterSchedulerBackend._
+
+  private val EXECUTOR_ID_COUNTER = new AtomicLong(0L)
+  private val RUNNING_EXECUTOR_PODS_LOCK = new Object
+  @GuardedBy("RUNNING_EXECUTOR_PODS_LOCK")
+  private val runningExecutorsToPods = new mutable.HashMap[String, Pod]
+  private val executorPodsByIPs = new ConcurrentHashMap[String, Pod]()
+  private val podsWithKnownExitReasons = new ConcurrentHashMap[String, 
ExecutorExited]()
+  private val disconnectedPodsByExecutorIdPendingRemoval = new 
ConcurrentHashMap[String, Pod]()
+
+  private val kubernetesNamespace = conf.get(KUBERNETES_NAMESPACE)
+
+  private val kubernetesDriverPodName = conf
+.get(KUBERNETES_DRIVER_POD_NAME)
+.getOrElse(throw new SparkException("Must specify the driver pod 
name"))
+  private implicit val requestExecutorContext = 
ExecutionContext.fromExecutorService(
+requestExecutorsService)
+
+  private val driverPod = kubernetesClient.pods()
+.inNamespace(kubernetesNamespace)
+.withName(kubernetesDriverPodName)
+.get()
+
+  protected override val minRegisteredRatio =
+if 
(conf.getOption("spark.scheduler.minRegisteredResourcesRatio").isEmpty) {
+  0.8
+} else {
+  super.minRegisteredRatio
+}
+
+  private val executorWatchResource = new AtomicReference[Closeable]
+  private val totalExpectedExecutors = new AtomicInteger(0)
+
+  private val driverUrl = RpcEndpointAddress(
+conf.get("spark.driver.host"),
+conf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT),
+CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
+
+  private val initialExecutors = 
SchedulerBackendUtils.getInitialTargetExecutorNumber(conf)
+
+  private val podAllocationInterval = 
conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY)
+
+  private val podAllocationSize = 
conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE)
+
+  private val allocatorRunnable = new Runnable {
+
+// Maintains a map of executor id to count of checks performed to 
learn the loss reason
+// for an executor.
+private val 

[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152373890
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala
 ---
@@ -0,0 +1,439 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.util.concurrent.{ExecutorService, ScheduledExecutorService, 
TimeUnit}
+
+import io.fabric8.kubernetes.api.model.{DoneablePod, Pod, PodBuilder, 
PodList}
+import io.fabric8.kubernetes.client.{KubernetesClient, Watch, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+import io.fabric8.kubernetes.client.dsl.{FilterWatchListDeletable, 
MixedOperation, NonNamespaceOperation, PodResource}
+import org.mockito.{AdditionalAnswers, ArgumentCaptor, Mock, 
MockitoAnnotations}
+import org.mockito.Matchers.{any, eq => mockitoEq}
+import org.mockito.Mockito.{doNothing, never, times, verify, when}
+import org.scalatest.BeforeAndAfter
+import org.scalatest.mock.MockitoSugar._
+import scala.collection.JavaConverters._
+import scala.concurrent.Future
+
+import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.deploy.k8s.constants._
+import org.apache.spark.rpc._
+import org.apache.spark.scheduler.{ExecutorExited, LiveListenerBus, 
SlaveLost, TaskSchedulerImpl}
+import 
org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RegisterExecutor,
 RemoveExecutor}
+import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
+import org.apache.spark.util.ThreadUtils
+
+class KubernetesClusterSchedulerBackendSuite
+extends SparkFunSuite with BeforeAndAfter {
--- End diff --

Done.


---




[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152373827
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala
 ---
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.deploy.k8s.constants._
+import org.apache.spark.util.Utils
+
+/**
+ * A factory class for configuring and creating executor pods.
+ */
+private[spark] trait ExecutorPodFactory {
+
+  /**
+   * Configure and construct an executor pod with the given parameters.
+   */
+  def createExecutorPod(
+  executorId: String,
+  applicationId: String,
+  driverUrl: String,
+  executorEnvs: Seq[(String, String)],
+  driverPod: Pod,
+  nodeToLocalTaskCount: Map[String, Int]): Pod
+}
+
+private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
+  extends ExecutorPodFactory {
+
+  import ExecutorPodFactoryImpl._
+
+  private val executorExtraClasspath =
+sparkConf.get(org.apache.spark.internal.config.EXECUTOR_CLASS_PATH)
+
+  private val executorLabels = 
ConfigurationUtils.parsePrefixedKeyValuePairs(
+sparkConf,
+KUBERNETES_EXECUTOR_LABEL_PREFIX,
+"executor label")
+  require(
+!executorLabels.contains(SPARK_APP_ID_LABEL),
+s"Custom executor labels cannot contain $SPARK_APP_ID_LABEL as it is 
reserved for Spark.")
+  require(
+!executorLabels.contains(SPARK_EXECUTOR_ID_LABEL),
+s"Custom executor labels cannot contain $SPARK_EXECUTOR_ID_LABEL as it 
is reserved for" +
+  s" Spark.")
+  require(
+!executorLabels.contains(SPARK_ROLE_LABEL),
+s"Custom executor labels cannot contain $SPARK_ROLE_LABEL as it is 
reserved for Spark.")
+
+  private val executorAnnotations =
+ConfigurationUtils.parsePrefixedKeyValuePairs(
+  sparkConf,
+  KUBERNETES_EXECUTOR_ANNOTATION_PREFIX,
+  "executor annotation")
+  private val nodeSelector =
+ConfigurationUtils.parsePrefixedKeyValuePairs(
+  sparkConf,
+  KUBERNETES_NODE_SELECTOR_PREFIX,
+  "node selector")
+
+  private val executorDockerImage = sparkConf.get(EXECUTOR_DOCKER_IMAGE)
+  private val dockerImagePullPolicy = 
sparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val executorPort = sparkConf.getInt("spark.executor.port", 
DEFAULT_STATIC_PORT)
+  private val blockManagerPort = sparkConf
+.getInt("spark.blockmanager.port", DEFAULT_BLOCKMANAGER_PORT)
+
+  private val executorPodNamePrefix = 
sparkConf.get(KUBERNETES_EXECUTOR_POD_NAME_PREFIX)
+
+  private val executorMemoryMiB = 
sparkConf.get(org.apache.spark.internal.config.EXECUTOR_MEMORY)
+  private val executorMemoryString = sparkConf.get(
+org.apache.spark.internal.config.EXECUTOR_MEMORY.key,
+org.apache.spark.internal.config.EXECUTOR_MEMORY.defaultValueString)
+
+  private val memoryOverheadMiB = sparkConf
+.get(KUBERNETES_EXECUTOR_MEMORY_OVERHEAD)
+.getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMiB).toInt,
+  MEMORY_OVERHEAD_MIN_MIB))
+  private val executorMemoryWithOverhead = executorMemoryMiB + 
memoryOverheadMiB
+
+  private val executorCores = sparkConf.getDouble("spark.executor.cores", 
1)
+  private val executorLimitCores = 
sparkConf.getOption(KUBERNETES_EXECUTOR_LIMIT_CORES.key)
+
+  override def createExecutorPod(
+  executorId: String,
+  applicationId: String,
+  driverUrl: String,
+  executorEnvs: Seq[(String, String)],
+  driverPod: Pod,
+  

[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152373817
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala
 ---
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.deploy.k8s.constants._
+import org.apache.spark.util.Utils
+
+/**
+ * A factory class for configuring and creating executor pods.
+ */
+private[spark] trait ExecutorPodFactory {
+
+  /**
+   * Configure and construct an executor pod with the given parameters.
+   */
+  def createExecutorPod(
+  executorId: String,
+  applicationId: String,
+  driverUrl: String,
+  executorEnvs: Seq[(String, String)],
+  driverPod: Pod,
+  nodeToLocalTaskCount: Map[String, Int]): Pod
+}
+
+private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
+  extends ExecutorPodFactory {
+
+  import ExecutorPodFactoryImpl._
+
+  private val executorExtraClasspath =
+sparkConf.get(org.apache.spark.internal.config.EXECUTOR_CLASS_PATH)
+
+  private val executorLabels = 
ConfigurationUtils.parsePrefixedKeyValuePairs(
+sparkConf,
+KUBERNETES_EXECUTOR_LABEL_PREFIX,
+"executor label")
+  require(
+!executorLabels.contains(SPARK_APP_ID_LABEL),
+s"Custom executor labels cannot contain $SPARK_APP_ID_LABEL as it is 
reserved for Spark.")
+  require(
+!executorLabels.contains(SPARK_EXECUTOR_ID_LABEL),
+s"Custom executor labels cannot contain $SPARK_EXECUTOR_ID_LABEL as it 
is reserved for" +
+  s" Spark.")
--- End diff --

Done.
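
(For context, the three reserved-label checks in the diff above are the sort of
thing that can be folded into a single loop; a sketch, assuming the same
constants and `executorLabels` map are in scope:)
```
// Reject any user-supplied executor label whose key is reserved for Spark.
Seq(SPARK_APP_ID_LABEL, SPARK_EXECUTOR_ID_LABEL, SPARK_ROLE_LABEL).foreach { label =>
  require(
    !executorLabels.contains(label),
    s"Custom executor labels cannot contain $label as it is reserved for Spark.")
}
```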


---




[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152373897
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala
 ---
@@ -0,0 +1,439 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.util.concurrent.{ExecutorService, ScheduledExecutorService, 
TimeUnit}
+
+import io.fabric8.kubernetes.api.model.{DoneablePod, Pod, PodBuilder, 
PodList}
+import io.fabric8.kubernetes.client.{KubernetesClient, Watch, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+import io.fabric8.kubernetes.client.dsl.{FilterWatchListDeletable, 
MixedOperation, NonNamespaceOperation, PodResource}
+import org.mockito.{AdditionalAnswers, ArgumentCaptor, Mock, 
MockitoAnnotations}
+import org.mockito.Matchers.{any, eq => mockitoEq}
+import org.mockito.Mockito.{doNothing, never, times, verify, when}
+import org.scalatest.BeforeAndAfter
+import org.scalatest.mock.MockitoSugar._
+import scala.collection.JavaConverters._
+import scala.concurrent.Future
+
+import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.deploy.k8s.constants._
+import org.apache.spark.rpc._
+import org.apache.spark.scheduler.{ExecutorExited, LiveListenerBus, 
SlaveLost, TaskSchedulerImpl}
+import 
org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RegisterExecutor,
 RemoveExecutor}
+import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
+import org.apache.spark.util.ThreadUtils
+
+class KubernetesClusterSchedulerBackendSuite
+extends SparkFunSuite with BeforeAndAfter {
+
+  private val APP_ID = "test-spark-app"
+  private val DRIVER_POD_NAME = "spark-driver-pod"
+  private val NAMESPACE = "test-namespace"
+  private val SPARK_DRIVER_HOST = "localhost"
+  private val SPARK_DRIVER_PORT = 7077
+  private val POD_ALLOCATION_INTERVAL = 60L
+  private val DRIVER_URL = RpcEndpointAddress(
+  SPARK_DRIVER_HOST, SPARK_DRIVER_PORT, 
CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
--- End diff --

Done.


---




[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-11-21 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19468#discussion_r152373795
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import java.io.File
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import io.fabric8.kubernetes.client.{Config, ConfigBuilder, 
DefaultKubernetesClient, KubernetesClient}
+import io.fabric8.kubernetes.client.utils.HttpClientUtils
+import okhttp3.Dispatcher
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.config._
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * Spark-opinionated builder for Kubernetes clients. It uses a prefix plus 
common suffixes to
+ * parse configuration keys, similar to the manner in which Spark's 
SecurityManager parses SSL
+ * options for different components.
+ */
+private[spark] object SparkKubernetesClientFactory {
+
+  def createKubernetesClient(
+  master: String,
+  namespace: Option[String],
+  kubernetesAuthConfPrefix: String,
+  sparkConf: SparkConf,
+  defaultServiceAccountToken: Option[File],
+  defaultServiceAccountCaCert: Option[File]): KubernetesClient = {
+val oauthTokenFileConf = 
s"$kubernetesAuthConfPrefix.$OAUTH_TOKEN_FILE_CONF_SUFFIX"
+val oauthTokenConf = 
s"$kubernetesAuthConfPrefix.$OAUTH_TOKEN_CONF_SUFFIX"
+val oauthTokenFile = sparkConf.getOption(oauthTokenFileConf)
+  .map(new File(_))
+  .orElse(defaultServiceAccountToken)
+val oauthTokenValue = sparkConf.getOption(oauthTokenConf)
+ConfigurationUtils.requireNandDefined(
+  oauthTokenFile,
+  oauthTokenValue,
+  s"Cannot specify OAuth token through both a file $oauthTokenFileConf 
and a " +
+s"value $oauthTokenConf.")
+
+val caCertFile = sparkConf
+  .getOption(s"$kubernetesAuthConfPrefix.$CA_CERT_FILE_CONF_SUFFIX")
+  .orElse(defaultServiceAccountCaCert.map(_.getAbsolutePath))
+val clientKeyFile = sparkConf
+  .getOption(s"$kubernetesAuthConfPrefix.$CLIENT_KEY_FILE_CONF_SUFFIX")
+val clientCertFile = sparkConf
+  
.getOption(s"$kubernetesAuthConfPrefix.$CLIENT_CERT_FILE_CONF_SUFFIX")
+val dispatcher = new Dispatcher(
+  ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
+val config = new ConfigBuilder()
+  .withApiVersion("v1")
+  .withMasterUrl(master)
+  .withWebsocketPingInterval(0)
+  .withOption(oauthTokenValue) {
+(token, configBuilder) => configBuilder.withOauthToken(token)
+  }.withOption(oauthTokenFile) {
+(file, configBuilder) =>
+configBuilder.withOauthToken(Files.toString(file, 
Charsets.UTF_8))
+  }.withOption(caCertFile) {
+(file, configBuilder) => configBuilder.withCaCertFile(file)
+  }.withOption(clientKeyFile) {
+(file, configBuilder) => configBuilder.withClientKeyFile(file)
+  }.withOption(clientCertFile) {
+(file, configBuilder) => configBuilder.withClientCertFile(file)
+  }.withOption(namespace) {
+(ns, configBuilder) => configBuilder.withNamespace(ns)
+  }.build()
+val baseHttpClient = HttpClientUtils.createHttpClient(config)
+val httpClientWithCustomDispatcher = baseHttpClient.newBuilder()
+  .dispatcher(dispatcher)
+  .build()
+new DefaultKubernetesClient(httpClientWithCustomDispatcher, config)
+  }
+
+  private implicit class OptionConfigurableConfigBuilder(configBuilder: 
ConfigBuilder) {
--- End diff --

Done.
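
Since the diff cuts off at the implicit class declaration, here is a sketch of
what such a withOption helper typically looks like (an illustration consistent
with the call sites above, not necessarily the PR's exact body; it assumes the
fabric8 ConfigBuilder already imported in this file):
```
// Applies a configuration step only when the optional value is present,
// otherwise passes the builder through unchanged.
private implicit class OptionConfigurableConfigBuilder(val configBuilder: ConfigBuilder)
    extends AnyVal {

  def withOption[T](option: Option[T])
      (configurator: (T, ConfigBuilder) => ConfigBuilder): ConfigBuilder = {
    option.map { value =>
      configurator(value, configBuilder)
    }.getOrElse(configBuilder)
  }
}
```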


---


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19601
  
**[Test build #84082 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84082/testReport)**
 for PR 19601 at commit 
[`9b6b890`](https://github.com/apache/spark/commit/9b6b890b0444f3a20e73691528b59ad21edb07b8).


---




[GitHub] spark pull request #19753: [SPARK-22521][ML] VectorIndexerModel support hand...

2017-11-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19753


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19779
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84076/
Test PASSed.


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19779
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19779
  
**[Test build #84076 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84076/testReport)**
 for PR 19779 at commit 
[`68ee79d`](https://github.com/apache/spark/commit/68ee79d6e37e36ed043eec2d376e6579ca4b9cee).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

2017-11-21 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19752
  
@kiszk actually, checking your PR, I think the same issue addressed there 
would also be handled here by default. What do you think?


---




[GitHub] spark issue #19389: [SPARK-22165][SQL] Fixes type conflicts between double, ...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19389
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19389: [SPARK-22165][SQL] Fixes type conflicts between double, ...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19389
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84074/
Test PASSed.


---




[GitHub] spark issue #19389: [SPARK-22165][SQL] Fixes type conflicts between double, ...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19389
  
**[Test build #84074 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84074/testReport)**
 for PR 19389 at commit 
[`c67f646`](https://github.com/apache/spark/commit/c67f646d050fcfd2cba92f8cd5514233b61733c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19774
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19774
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84075/
Test PASSed.


---




[GitHub] spark issue #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN command

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19774
  
**[Test build #84075 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84075/testReport)**
 for PR 19774 at commit 
[`38ef638`](https://github.com/apache/spark/commit/38ef638a333e7ba9e3ab91204f072244f4816b89).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-21 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/16578
  
> But I think we still need other eyes on this too.

Agreed.

@rxin can you help rope anyone else in on this? It's a big PR with a bigger 
history, but absent some savaging by another reviewer I believe it is close to 
the finish line.

A lot of people are hoping this can make it into Spark 2.3. It has a huge 
performance impact for some Spark users, as evidenced by comments on this PR 
and VideoAmp's own experience. So I'm hoping we can get this merged to master 
before the 2.3 branch is cut.

Thanks!


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19779
  
**[Test build #84081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84081/testReport)** for PR 19779 at commit [`a2560b3`](https://github.com/apache/spark/commit/a2560b3105e7ff712758573ca157703b4397de8b).


---




[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-21 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/16578
  
> Can you give an example where it would fail? We didn't change `clipParquetSchema`, so I think even when pruning happens, why would we read a superset of the file's schema and cause the exception, according to the comment? We won't add new fields but only remove existing ones from the file's schema, right?

(Oddly, GitHub won't let me reply to this comment inline.)

The situation we've run into is pruning a schema for a query over a partitioned Hive table backed by Parquet files, where some files are missing fields specified by the table schema. This can happen, e.g., through schema evolution, where fields are added to the table over time without rewriting existing partitions. In those cases, we've found that parquet-mr throws an exception if we try to read such a file with the table-pruned schema (a superset of that file's schema). Therefore, we further clip the pruned schema against each file's schema before attempting to read.
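To make the clipping step concrete, here is a minimal sketch of the idea (not the PR's actual code; `clipToFileSchema` is a hypothetical name): drop from the pruned schema any field the file lacks, recursing into nested structs.

```
import org.apache.spark.sql.types.{StructField, StructType}

def clipToFileSchema(pruned: StructType, fileSchema: StructType): StructType = {
  val fileFieldNames = fileSchema.fieldNames.toSet
  StructType(pruned.fields.flatMap { f =>
    if (!fileFieldNames.contains(f.name)) {
      None // the file predates this field; don't ask parquet-mr for it
    } else {
      (f.dataType, fileSchema(f.name).dataType) match {
        // recurse so nested structs are also clipped to what the file has
        case (p: StructType, fs: StructType) =>
          Some(f.copy(dataType = clipToFileSchema(p, fs)))
        case _ => Some(f)
      }
    }
  })
}
```

With something like this, the schema handed to parquet-mr for a given file never mentions a field absent from that file's footer, so the exception can't arise.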


---




[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19601
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84080/
Test FAILed.


---




[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19601
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19601
  
**[Test build #84080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84080/testReport)** for PR 19601 at commit [`63d9d57`](https://github.com/apache/spark/commit/63d9d576799d057646e991326c38b5fdb3a9f361).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19601
  
**[Test build #84080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84080/testReport)** for PR 19601 at commit [`63d9d57`](https://github.com/apache/spark/commit/63d9d576799d057646e991326c38b5fdb3a9f361).


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152349156
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging {
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite external avro table") {
+  withTempDir { dir =>
+    val path = dir.getAbsolutePath
+    val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+    new File(schemaPath).mkdir()
+    val avroSchema =
+      """{
+        |  "name": "test_record",
+        |  "type": "record",
+        |  "fields": [ {
+        |    "name": "f0",
+        |    "type": [
+        |      "null",
+        |      {
+        |        "precision": 38,
+        |        "scale": 2,
+        |        "type": "bytes",
+        |        "logicalType": "decimal"
+        |      }
+        |    ]
+        |  } ]
+        |}
+      """.stripMargin
+    val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+    new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+    val url = Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+    val srcLocation = new File(url.getFile)
+    val destTableName = "tab1"
+    val srcTableName = "tab2"
+
+    withTable(srcTableName, destTableName) {
+      versionSpark.sql(
+        s"""
+           |CREATE TABLE $srcTableName
+           |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+           |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+           |STORED AS
+           |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+           |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+           |LOCATION '$srcLocation'
+           |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+         """.stripMargin
+      )
+      val destLocation = s"""$path${File.separator}destTableLocation"""
+      new File(destLocation).mkdir()
+
+      versionSpark.sql(
+        s"""
+           |CREATE TABLE $destTableName
+           |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+           |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+           |STORED AS
+           |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+           |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+           |LOCATION '$destLocation'
--- End diff --

Will change this to 'CREATE EXTERNAL TABLE'.
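For reference, a minimal sketch of how the revised statement could look, reusing the test's own interpolated values (the final form is of course up to the PR):

```
versionSpark.sql(
  s"""
     |CREATE EXTERNAL TABLE $destTableName
     |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
     |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
     |STORED AS
     |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
     |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
     |LOCATION '$destLocation'
   """.stripMargin)
```

Declaring the table EXTERNAL means the DROP performed by the `withTable` cleanup leaves the files under `$destLocation` intact, which matches the intent of a table defined over a pre-created directory.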


---




[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

2017-11-21 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19786#discussion_r152347531
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -34,10 +34,10 @@ import org.apache.hadoop.fs.permission.FsAction
 import org.apache.hadoop.hdfs.DistributedFileSystem
 import org.apache.hadoop.hdfs.protocol.HdfsConstants
 import org.apache.hadoop.security.AccessControlException
+import org.fusesource.leveldbjni.internal.NativeDB
 
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
 import org.apache.spark.deploy.SparkHadoopUtil
-import org.apache.spark.deploy.history.config._
--- End diff --

This import is unused, so I removed it.


---




[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19730
  
**[Test build #84079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84079/testReport)** for PR 19730 at commit [`574691f`](https://github.com/apache/spark/commit/574691f3fa048129591fb4bafe3bdb07ebf5517e).


---




[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19786
  
**[Test build #84078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84078/testReport)** for PR 19786 at commit [`8cfa0b2`](https://github.com/apache/spark/commit/8cfa0b23cdf44c37452835beee88e1eb4e7f4cfc).


---




[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

2017-11-21 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19786#discussion_r152345297
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -570,7 +575,6 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
 }
 
 val logPath = fileStatus.getPath()
-logInfo(s"Replaying log path: $logPath")
--- End diff --

I removed this line because the call to `replay` below already emits a duplicate "Replaying log path: ..." log message.


---



