[GitHub] spark issue #23268: [SPARK-26319][SQL][Test] Add appendReadColumns Unit Test...

2018-12-10 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/23268
  
@HyukjinKwon I've updated the desc.


---




[GitHub] spark issue #23268: [SPARK-26319][SQL][Test] Add appendReadColumns Unit Test...

2018-12-10 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/23268
  
@HyukjinKwon  Please re-review.


---




[GitHub] spark issue #23268: [Hive][Minor] Refactor on HiveShim and Add Unit Tests

2018-12-10 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/23268
  
I've reverted the refactor commit.

I'm wondering if I need to create an issue for a unit-test-only PR.


---




[GitHub] spark issue #23268: [Hive][Minor] Refactor on HiveShim and Add Unit Tests

2018-12-09 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/23268
  
OK, I will modify the PR in a few hours.


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240103941
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
    * This function in hive-0.13 become private, but we have to do this to work around hive bug
    */
   private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
-    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-    val result: StringBuilder = new StringBuilder(old)
-    var first: Boolean = old.isEmpty
-
-    for (col <- cols) {
-      if (first) {
-        first = false
-      } else {
-        result.append(',')
-      }
-      result.append(col)
-    }
-    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
+    val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+    val value = Option(conf.get(key, null))
+      .map(old => cols.+:(old))
--- End diff --

Just an idiomatic way to write Scala.
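In isolation, that chain looks like this (a runnable sketch with Hadoop's `Configuration` replaced by an `Option[String]`, an assumption made so the snippet is self-contained):

```
object AppendReadColumnsSketch {
  // Prepend the existing comma-separated value (if any) and re-join,
  // mirroring the Option/map/getOrElse chain in the diff above.
  def append(current: Option[String], cols: Seq[String]): String =
    current.map(old => cols.+:(old)).getOrElse(cols).mkString(",")

  def main(args: Array[String]): Unit = {
    assert(append(None, Seq("a", "b")) == "a,b")
    assert(append(Some("a,b"), Seq("c")) == "a,b,c")
  }
}
```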


---




[GitHub] spark issue #23268: [Hive][Minor] Refactor on HiveShim and Add Unit Tests

2018-12-09 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/23268
  
Never mind, I just don't like the coding style personally.


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240098806
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
    * This function in hive-0.13 become private, but we have to do this to work around hive bug
    */
   private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
-    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-    val result: StringBuilder = new StringBuilder(old)
-    var first: Boolean = old.isEmpty
-
-    for (col <- cols) {
-      if (first) {
-        first = false
-      } else {
-        result.append(',')
-      }
-      result.append(col)
-    }
-    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
+    val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+    val value = Option(conf.get(key, null))
+      .map(old => cols.+:(old))
+      .getOrElse(cols)
+      .mkString(",")
--- End diff --

The for-comprehension is just syntactic sugar, so it is not more performant.
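A small illustration of the desugaring (not from the PR):

```
// A for loop over a Seq desugars to a foreach call, so both forms
// compile to essentially the same thing.
object ForDesugaring {
  def main(args: Array[String]): Unit = {
    val cols = Seq("a", "b", "c")

    val viaFor = new StringBuilder
    for (col <- cols) viaFor.append(col)

    val viaForeach = new StringBuilder
    cols.foreach(col => viaForeach.append(col))

    assert(viaFor.toString == viaForeach.toString)
  }
}
```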


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240098620
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
    * This function in hive-0.13 become private, but we have to do this to work around hive bug
    */
   private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
-    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-    val result: StringBuilder = new StringBuilder(old)
-    var first: Boolean = old.isEmpty
-
-    for (col <- cols) {
-      if (first) {
-        first = false
-      } else {
-        result.append(',')
-      }
-      result.append(col)
-    }
-    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
+    val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+    val value = Option(conf.get(key, null))
+      .map(old => cols.+:(old))
+      .getOrElse(cols)
+      .mkString(",")
--- End diff --

As for performance:

Current Version:
```
[info] HiveShimSuite:
[info] - appendReadColumns (549 milliseconds)
```

Previous Version:
```
[info] HiveShimSuite:
[info] - appendReadColumns (949 milliseconds)
```
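A hedged sketch of what such a test could look like (the PR's actual `HiveShimSuite` may differ; `HiveShimSketchSuite` and the exact assertions here are illustrative):

```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hive.serde2.ColumnProjectionUtils

import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.hive.HiveShim

class HiveShimSketchSuite extends SparkFunSuite {
  test("appendReadColumns accumulates column names") {
    val conf = new Configuration
    // Append twice; the names should accumulate as a comma-separated list.
    HiveShim.appendReadColumns(conf, Nil, Seq("a", "b"))
    HiveShim.appendReadColumns(conf, Nil, Seq("c"))
    assert(conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR) === "a,b,c")
  }
}
```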


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/23268

[Hive][Minor] Refactor on HiveShim and Add Unit Tests

## What changes were proposed in this pull request?

Refactor on HiveShim, and add Unit Tests.

## How was this patch tested?
```
$ build/sbt
> project hive
> testOnly *HiveShimSuite
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark refactor/hiveshim

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23268


commit 987fa18a3b50e117fc839201fcc29c166732486e
Author: Darcy Shen 
Date:   2018-12-10T06:32:35Z

Add unit tests for HiveShim appendReadColumns

commit b68e7b1421f977c7256573564b12b4cc07d31f4a
Author: Darcy Shen 
Date:   2018-12-10T06:38:19Z

Refactor




---




[GitHub] spark issue #22685: [SQL][MINOR][Refactor] Refactor on sql/core

2018-10-11 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22685
  
OK. Nevermind.


---




[GitHub] spark pull request #22685: [SQL][MINOR][Refactor] Refactor on sql/core

2018-10-11 Thread sadhen
Github user sadhen closed the pull request at:

https://github.com/apache/spark/pull/22685


---




[GitHub] spark issue #22685: [SQL][MINOR][Refactor] Refactor on sql/core

2018-10-10 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22685
  
@HyukjinKwon  Sorry. I happened to find some code that is not elegant for my taste, and I had not realized that massive changes are bad for backporting. As a result, I spent some time trying to make the code better.

I have commented on the changes I consider most important. Please review; I will revert the rest.


---




[GitHub] spark pull request #22685: [SQL][MINOR][Refactor] Refactor on sql/core

2018-10-10 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22685#discussion_r224054521
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -147,7 +147,7 @@ private[sql] object SparkUserDefinedFunction {
       f: AnyRef,
       dataType: DataType,
       inputSchemas: Option[Seq[ScalaReflection.Schema]]): UserDefinedFunction = {
-    val udf = new UserDefinedFunction(f, dataType, inputSchemas.map(_.map(_.dataType)))
+    val udf = UserDefinedFunction(f, dataType, inputSchemas.map(_.map(_.dataType)))
--- End diff --

Useless `new` for a case class.
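In miniature:

```
// A case class gets a compiler-generated companion apply, so `new` is noise.
case class Wrapper(f: AnyRef, arity: Int)

object CaseClassApply {
  def main(args: Array[String]): Unit = {
    val a = new Wrapper("f", 1) // works, but redundant
    val b = Wrapper("f", 1)     // idiomatic
    assert(a == b)              // case-class structural equality
  }
}
```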


---




[GitHub] spark pull request #22685: [SQL][MINOR][Refactor] Refactor on sql/core

2018-10-10 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22685#discussion_r224054362
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -95,7 +95,7 @@ object SQLMetrics {
 
   def createMetric(sc: SparkContext, name: String): SQLMetric = {
     val acc = new SQLMetric(SUM_METRIC)
-    acc.register(sc, name = Some(name), countFailedValues = false)
+    acc.register(sc, name = Some(name))
--- End diff --

Redundant argument: `countFailedValues = false` is already the default.
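The point, with a toy `register` (not Spark's):

```
object DefaultArgs {
  def register(name: Option[String] = None, countFailedValues: Boolean = false): String =
    s"$name,$countFailedValues"

  def main(args: Array[String]): Unit = {
    // Passing countFailedValues = false explicitly is identical to omitting it.
    assert(register(Some("m"), countFailedValues = false) == register(Some("m")))
  }
}
```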


---




[GitHub] spark pull request #22685: [SQL][MINOR][Refactor] Refactor on sql/core

2018-10-10 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22685#discussion_r224053894
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
@@ -567,10 +567,10 @@ class KeyValueGroupedDataset[K, V] private[sql](
   override def toString: String = {
     val builder = new StringBuilder
     val kFields = kExprEnc.schema.map {
-      case f => s"${f.name}: ${f.dataType.simpleString(2)}"
+      f => s"${f.name}: ${f.dataType.simpleString(2)}"
--- End diff --

Should not be a partial function!
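In miniature:

```
// A bare `case f => ...` compiles to a pattern-matching anonymous function
// even though nothing is destructured; a plain lambda is more direct.
object CaseVsLambda {
  def main(args: Array[String]): Unit = {
    val fields = Seq("name", "age")
    val withCase = fields.map { case f => s"$f: string" } // needless `case`
    val plain    = fields.map(f => s"$f: string")
    assert(withCase == plain)
  }
}
```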


---




[GitHub] spark pull request #22685: [SQL][MINOR][Refactor] Refactor on sql/core

2018-10-10 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22685#discussion_r224054020
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
@@ -188,7 +188,7 @@ private[sql] object SQLUtils extends Logging {
     dataType match {
       case 's' =>
         // Read StructType for DataFrame
-        val fields = SerDe.readList(dis, jvmObjectTracker = null).asInstanceOf[Array[Object]]
+        val fields = SerDe.readList(dis, jvmObjectTracker = null)
--- End diff --

Useless `asInstanceOf`: `readList` already returns `Array[Object]`.
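In miniature (assuming, as the diff suggests, that the value already has the target static type):

```
// Casting a value to its own static type is a no-op.
object UselessCast {
  def main(args: Array[String]): Unit = {
    val xs: Array[Object] = Array[Object]("a", "b")
    val cast = xs.asInstanceOf[Array[Object]] // useless: same reference back
    assert(cast eq xs)
  }
}
```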


---




[GitHub] spark pull request #22685: [SQL][MINOR][Refactor] Refactor on sql/core

2018-10-10 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22685#discussion_r224053653
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ---
@@ -123,7 +123,7 @@ object StaticSQLConf {
   val UI_RETAINED_EXECUTIONS =
     buildStaticConf("spark.sql.ui.retainedExecutions")
       .doc("Number of executions to retain in the Spark UI.")
-      .intConf
+      .longConf
--- End diff --

This involves an implicit `int2long` cast.


---




[GitHub] spark issue #22685: [WIP][SQL][MINOR][Refactor] Refactor on sql/core

2018-10-10 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22685
  
@HyukjinKwon  OK, most of the changes are related to parens. I will consider resetting them for backporting friendliness.


---




[GitHub] spark pull request #22685: [SQL][MINOR][Refactor] Refactor sql/core

2018-10-10 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/22685

[SQL][MINOR][Refactor] Refactor sql/core

## What changes were proposed in this pull request?
Only minor changes on Scala syntax.

## How was this patch tested?
Existing Tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark minor/refactor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22685.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22685


commit 2d0debb0abd401b2fe68fe0b934a4a6939f7cb27
Author: Darcy Shen 
Date:   2018-10-10T08:58:22Z

refactor sql/core according to Intellij




---




[GitHub] spark issue #22577: [CORE][MINOR] Fix obvious error and compiling for Scala ...

2018-09-28 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22577
  
@HyukjinKwon  Yes, just confirmed.


---




[GitHub] spark issue #22577: [CORE][MINOR] Fix obvious error and compiling for Scala ...

2018-09-28 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22577
  
@cloud-fan  The bug was introduced by #19748

branch-2.2 is fine, but branch-2.3 should be fixed.


---




[GitHub] spark issue #22577: [CORE][MINOR] Fix obvious error and compiling for Scala ...

2018-09-27 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22577
  
@cloud-fan 

see:


https://github.com/scala/scala/blob/2.12.x/src/compiler/scala/tools/nsc/typechecker/RefChecks.scala


---




[GitHub] spark issue #22577: [CORE][MINOR] Fix obvious error and compiling for Scala ...

2018-09-27 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22577
  
Error Message:

```
$ dev/change-scala-version.sh 2.12

➜  spark git:(branch-2.4) ✗ sbt -Dscala.version=2.12.7
sbt (spark)> project core
sbt (core)> clean
sbt (core)> compile

[error] [warn] /Users/rendong/wdi/spark/core/src/main/scala/org/apache/spark/status/api/v1/OneApplicationResource.scala:178: Option[String] and String are unrelated: they will most likely never compare equal
[error] [warn]         app.attempts.filter(_.attemptId == attemptId).headOption
[error] [warn]
```
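The shape of the bug, and one possible fix, in isolation (illustrative types; the PR's actual change may differ):

```
// Option[String] == String compiles (with a warning) but is never true.
object AttemptLookup {
  case class Attempt(attemptId: Option[String])

  def main(args: Array[String]): Unit = {
    val attempts = Seq(Attempt(Some("1")), Attempt(None))
    val attemptId = "1"
    // Broken: compares an Option[String] against a String, always false.
    assert(attempts.filter(_.attemptId == attemptId).headOption.isEmpty)
    // Fixed: compare inside the Option.
    assert(attempts.find(_.attemptId.contains(attemptId)).nonEmpty)
  }
}
```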


---




[GitHub] spark issue #22577: [CORE][MINOR] Fix obvious error and compiling for Scala ...

2018-09-27 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22577
  
Dear Maintainers, please backport it to branch-2.4 if it is accepted. 
@cloud-fan @srowen 


---




[GitHub] spark pull request #22577: [CORE][MINOR] Fix obvious error and compiling for...

2018-09-27 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/22577

[CORE][MINOR] Fix obvious error and compiling for Scala 2.12.7

## What changes were proposed in this pull request?

Fix an obvious error.

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark minor_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22577.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22577


commit 1fa56185d69e5ba9a23d911ccaf844a967880068
Author: Darcy Shen 
Date:   2018-09-28T03:33:07Z

Fix obvious error and compiling for Scala 2.12.7




---




[GitHub] spark issue #22355: [SPARK-25358][SQL] MutableProjection supports fallback t...

2018-09-11 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22355
  
@maropu here is a tip for viewing the codegen:

> spark.sql("explain codegen select 1 + 1").show(false)

If you explain an aggregation, the NoOp check is obvious, as shown below.
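For example (the aggregation query is illustrative; any live `spark` session will do):

```
// Print the generated Java code for a simple expression.
spark.sql("explain codegen select 1 + 1").show(false)

// An aggregation makes the NoOp handling visible in the generated code.
spark.sql("explain codegen select sum(id) from range(10)").show(false)
```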


---




[GitHub] spark pull request #22355: [SPARK-25358][SQL] MutableProjection supports fal...

2018-09-07 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22355#discussion_r216116648
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.InternalRow
+
+
+/**
+ * A [[MutableProjection]] that is calculated by calling `eval` on each of the specified
+ * expressions.
+ *
+ * @param expressions a sequence of expressions that determine the value of each column of the
+ *                    output row.
+ */
+class InterpretedMutableProjection(expressions: Seq[Expression]) extends MutableProjection {
+  def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) =
+    this(expressions.map(BindReferences.bindReference(_, inputSchema)))
+
+  private[this] val buffer = new Array[Any](expressions.size)
+
+  override def initialize(partitionIndex: Int): Unit = {
+    expressions.foreach(_.foreach {
+      case n: Nondeterministic => n.initialize(partitionIndex)
+      case _ =>
+    })
+  }
+
+  private[this] val exprArray = expressions.toArray
+  private[this] var mutableRow: InternalRow = new GenericInternalRow(exprArray.length)
+  def currentValue: InternalRow = mutableRow
+
+  override def target(row: InternalRow): MutableProjection = {
+    mutableRow = row
+    this
+  }
+
+  override def apply(input: InternalRow): InternalRow = {
+    var i = 0
+    while (i < exprArray.length) {
+      // Store the result into buffer first, to make the projection atomic (needed by aggregation)
+      buffer(i) = exprArray(i).eval(input)
+      i += 1
+    }
+    i = 0
+    while (i < exprArray.length) {
+      mutableRow(i) = buffer(i)
+      i += 1
+    }
+    mutableRow
+  }
+}
--- End diff --

Should be easy to create a test case.


---




[GitHub] spark pull request #22355: [SPARK-25358][SQL] MutableProjection supports fal...

2018-09-06 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22355#discussion_r215862272
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.InternalRow
+
+
+/**
+ * A [[MutableProjection]] that is calculated by calling `eval` on each of the specified
+ * expressions.
+ *
+ * @param expressions a sequence of expressions that determine the value of each column of the
+ *                    output row.
+ */
+class InterpretedMutableProjection(expressions: Seq[Expression]) extends MutableProjection {
+  def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) =
+    this(expressions.map(BindReferences.bindReference(_, inputSchema)))
+
+  private[this] val buffer = new Array[Any](expressions.size)
+
+  override def initialize(partitionIndex: Int): Unit = {
+    expressions.foreach(_.foreach {
+      case n: Nondeterministic => n.initialize(partitionIndex)
+      case _ =>
+    })
+  }
+
+  private[this] val exprArray = expressions.toArray
+  private[this] var mutableRow: InternalRow = new GenericInternalRow(exprArray.length)
+  def currentValue: InternalRow = mutableRow
+
+  override def target(row: InternalRow): MutableProjection = {
+    mutableRow = row
+    this
+  }
+
+  override def apply(input: InternalRow): InternalRow = {
+    var i = 0
+    while (i < exprArray.length) {
+      // Store the result into buffer first, to make the projection atomic (needed by aggregation)
+      buffer(i) = exprArray(i).eval(input)
+      i += 1
+    }
+    i = 0
+    while (i < exprArray.length) {
+      mutableRow(i) = buffer(i)
+      i += 1
+    }
+    mutableRow
+  }
+}
--- End diff --

The improvement (written 3 months ago) is based on the generated Java code.


---




[GitHub] spark pull request #22355: [SPARK-25358][SQL] MutableProjection supports fal...

2018-09-06 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22355#discussion_r215859282
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.InternalRow
+
+
+/**
+ * A [[MutableProjection]] that is calculated by calling `eval` on each of the specified
+ * expressions.
+ *
+ * @param expressions a sequence of expressions that determine the value of each column of the
+ *                    output row.
+ */
+class InterpretedMutableProjection(expressions: Seq[Expression]) extends MutableProjection {
+  def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) =
+    this(expressions.map(BindReferences.bindReference(_, inputSchema)))
+
+  private[this] val buffer = new Array[Any](expressions.size)
+
+  override def initialize(partitionIndex: Int): Unit = {
+    expressions.foreach(_.foreach {
+      case n: Nondeterministic => n.initialize(partitionIndex)
+      case _ =>
+    })
+  }
+
+  private[this] val exprArray = expressions.toArray
+  private[this] var mutableRow: InternalRow = new GenericInternalRow(exprArray.length)
+  def currentValue: InternalRow = mutableRow
+
+  override def target(row: InternalRow): MutableProjection = {
+    mutableRow = row
+    this
+  }
+
+  override def apply(input: InternalRow): InternalRow = {
+    var i = 0
+    while (i < exprArray.length) {
+      // Store the result into buffer first, to make the projection atomic (needed by aggregation)
+      buffer(i) = exprArray(i).eval(input)
+      i += 1
+    }
+    i = 0
+    while (i < exprArray.length) {
+      mutableRow(i) = buffer(i)
+      i += 1
+    }
+    mutableRow
+  }
+}
--- End diff --

This is my improved version of InterpretedMutableProjection:

```
  override def apply(input: InternalRow): InternalRow = {
    var i = 0
    while (i < exprArray.length) {
      // Store the result into buffer first, to make the projection atomic (needed by aggregation)
      if (exprArray(i) != NoOp) {
        buffer(i) = exprArray(i).eval(input)
      }
      i += 1
    }
    i = 0
    while (i < exprArray.length) {
      if (exprArray(i) != NoOp) {
        mutableRow(i) = buffer(i)
      }
      i += 1
    }
    mutableRow
  }
```

The AggregationIterator uses NoOp. If you replace the codegen one with the interpreted one, you will encounter an exception from `NoOp.eval`.


---




[GitHub] spark issue #22310: [Spark-25298][Build] Improve build definition for Scala ...

2018-09-02 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22310
  
The problem of package hierarchy is fixed.


---




[GitHub] spark issue #22310: [Spark-25298][Build] Improve build definition for Scala ...

2018-09-02 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22310
  
@srowen Sorry I should have explained why I made these changes.

The following steps failed to compile:
```
$ ./dev/change-scala-version.sh 2.12
$ ./build/sbt -Dscala-2.12 -Dscala.version=2.12.6
> project repl
> compile
```

After these changes, the commands are simplified and the compile works:
```
$ ./dev/change-scala-version.sh 2.12
$ ./build/sbt -Dscala.version=2.12.6
> project repl
> compile
```



---




[GitHub] spark issue #22308: [SPARK-25304][SQL][TEST] Fix HiveSparkSubmitSuite SPARK-...

2018-09-02 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22308
  
@srowen  The 2.12 jar is compiled and packaged from `Main.scala` and `MyCoolClass.scala`, not copied from the 2.10 jar. Diff them to verify.

The steps to generate it:

```
mvn install  # install the 2.4.0-SNAPSHOT at the project root
sbt package  # package at sql/hive/src/test/resources/regression-test-SPARK-8489
```

This is the build.sbt (note the quoted version string):
```
scalaVersion := "2.12.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.4.0-SNAPSHOT"
)
```


---




[GitHub] spark issue #22308: [SPARK-25304][SQL][TEST] Fix HiveSparkSubmitSuite SPARK-...

2018-09-01 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22308
  
@cloud-fan @srowen 

This PR will make our jenkins build green.


https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/204/testReport/
 


---




[GitHub] spark issue #22308: [SPARK-25304][SQL][TEST] Fix HiveSparkSubmitSuite SPARK-...

2018-09-01 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22308
  
see https://github.com/apache/spark/pull/12924 and 
https://github.com/apache/spark/pull/11744


---




[GitHub] spark issue #22310: [Spark-25298][Build] Improve build definition for Scala ...

2018-09-01 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22310
  
@srowen  It is about the sbt convention.

see my demo project: https://github.com/sadhen/spark-25298


---




[GitHub] spark issue #22308: [SPARK-25304][SQL][TEST] Fix HiveSparkSubmitSuite SPARK-...

2018-09-01 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22308
  
Remove the deprecated 2.10 jar.

And build a 2.12 jar.


---




[GitHub] spark issue #22310: [Spark-25298][Build] Improve build definition for Scala ...

2018-09-01 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22310
  
@srowen  This is a better solution for #22280

No hard-coded 2.11 or 2.12 anymore.


---




[GitHub] spark pull request #22310: [Spark-25298][Build] Improve build definition for...

2018-09-01 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/22310

[Spark-25298][Build] Improve build definition for Scala 2.12

## What changes were proposed in this pull request?

Improve build for Scala 2.12.

## How was this patch tested?

```
./dev/change-scala-version.sh 2.12

##  For Maven
./build/mvn -Pscala-2.12 [mvn commands]
##  For SBT
sbt -Dscala.version=2.12.6
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark SPARK-25298

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22310.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22310


commit 080df1328c0665c6f1d72a9b956d0013e5811127
Author: Darcy Shen 
Date:   2018-09-01T15:18:23Z

attempt to improve build

commit 7f1b3f40e50ca60aa7efaf6f731f96d896ed871b
Author: Darcy Shen 
Date:   2018-09-01T15:25:13Z

tuning for maven

commit 7c0b9b955b34f19e337619c369bf7c468b7654e1
Author: Darcy Shen 
Date:   2018-09-01T15:46:47Z

more docs for scala 2.12




---




[GitHub] spark pull request #22308: [SPARK-25304][SQL][TEST] Fix HiveSparkSubmitSuite...

2018-09-01 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/22308

[SPARK-25304][SQL][TEST] Fix HiveSparkSubmitSuite SPARK-8489 test for Scala 
2.12

## What changes were proposed in this pull request?

remove test-2.10.jar and add test-2.12.jar.

## How was this patch tested?

```
$ sbt -Dscala-2.12
> ++ 2.12.6
> project hive
> testOnly *HiveSparkSubmitSuite -- -z "8489"
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark SPARK-8489-FOLLOWUP

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22308


commit 221fcac7e38429686a992f7cf6e8b09dc55924ac
Author: Darcy Shen 
Date:   2018-09-01T14:02:30Z

enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12, and remove the 
Scala 2.10 jar




---




[GitHub] spark pull request #22304: [SPARK-25297][Streaming][Test] Fix blocking unit ...

2018-09-01 Thread sadhen
Github user sadhen closed the pull request at:

https://github.com/apache/spark/pull/22304


---




[GitHub] spark issue #22304: [SPARK-25297][Streaming][Test] Fix blocking unit tests f...

2018-09-01 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22304
  
@srowen @cloud-fan  The merged #22292 already fixed it.


---




[GitHub] spark pull request #22304: [SPARK-25297][Streaming][Test] Fix blocking unit ...

2018-08-31 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22304#discussion_r214342975
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala ---
@@ -65,7 +65,8 @@ private[streaming] class FileBasedWriteAheadLog(
     "WriteAheadLogManager" + callerName.map(c => s" for $c").getOrElse("")
   }
   private val forkJoinPool = ThreadUtils.newForkJoinPool(threadpoolName, 20)
-  private val executionContext = ExecutionContext.fromExecutorService(forkJoinPool)
+  private val executionContext = ExecutionContext
+    .fromExecutorService(forkJoinPool, { e: Throwable => throw e })
--- End diff --

The default reporter does not throw the exception in Scala 2.11 either.

I guess there are bugs in the Future implementation in Scala 2.12, since Future.scala changed substantially in 2.12.

This is a temporary fix.

I will dig into Future.scala and hopefully propose a PR for Scala.
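For reference, a standalone sketch of the workaround (the PR itself only changes the one line shown in the diff; the object name is illustrative):

```
import java.util.concurrent.ForkJoinPool

import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

object RethrowingReporter {
  def main(args: Array[String]): Unit = {
    val pool = new ForkJoinPool(2)
    // The second argument is the failure reporter; rethrowing keeps errors
    // from being silently swallowed by the default reporter.
    val ec = ExecutionContext.fromExecutorService(pool, (e: Throwable) => throw e)
    val result = Await.result(Future(21 * 2)(ec), 10.seconds)
    assert(result == 42)
    ec.shutdown()
  }
}
```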


---




[GitHub] spark issue #22304: [SPARK-25297][Streaming][Test] Fix blocking unit tests f...

2018-08-31 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22304
  
There is a simplified project for this specific problem:

https://github.com/sadhen/blocking-future


---




[GitHub] spark pull request #22304: [SPARK-25297][Streaming][Test] Fix blocking unit ...

2018-08-31 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/22304

[SPARK-25297][Streaming][Test] Fix blocking unit tests for Scala 2.12

## What changes were proposed in this pull request?

Customize the ExecutionContext's reporter to fix blocking unit tests for Scala 2.12.

## How was this patch tested?
```
./dev/change-scala-version.sh 2.12
$ sbt -Dscala-2.12
> ++2.12.6
> project streaming
> testOnly *FileBasedWriteAheadLogWithFileCloseAfterWriteSuite
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark blocking_future

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22304.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22304


commit 60eec773c5dcc605e2491c6c77e60f3ca0d7b6a8
Author: 忍冬 
Date:   2018-08-31T12:13:56Z

fix blocking unit tests by providing customized reporter




---




[GitHub] spark issue #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive te...

2018-08-30 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22264
  
@srowen  A PR for this "bug" is proposed: 
https://github.com/scala/scala/pull/7156

Hopefully, Scala 2.12.7 will fix it.


---




[GitHub] spark issue #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive te...

2018-08-30 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22264
  
scala/bug#11123 has been added to https://github.com/scala/bug/milestone/93 .

I will spare some time to work on it.


---




[GitHub] spark issue #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive te...

2018-08-30 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22264
  
```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0-SNAPSHOT
      /_/

Using Scala version 2.12.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val rowOfDoubleWrappedArray = spark.sql("select array(cast (1.0 as double) , cast(1.0 as double))").collect.head
rowOfDoubleWrappedArray: org.apache.spark.sql.Row = [WrappedArray(1.0, 1.0)]

scala> val rowOfIntWrappedArray = spark.sql("select array(1, 1)").collect.head
rowOfIntWrappedArray: org.apache.spark.sql.Row = [WrappedArray(1, 1)]

scala> rowOfDoubleWrappedArray == rowOfIntWrappedArray
res5: Boolean = false
```

And for Scala 2.11, the result is true.




---




[GitHub] spark issue #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive te...

2018-08-30 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22264
  
Comment resolved, please review. @srowen @maropu 


---




[GitHub] spark pull request #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in ...

2018-08-30 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22264#discussion_r214069522
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala ---
@@ -290,6 +290,16 @@ object QueryTest {
     Row.fromSeq(row.toSeq.map {
       case null => null
       case d: java.math.BigDecimal => BigDecimal(d)
+      // Equality of WrappedArray differs for AnyVal and AnyRef in Scala 2.12.2+
+      case seq: Seq[_] => seq.map {
--- End diff --

@maropu 


```
scala> 1.toShort == 1.toDouble
res9: Boolean = true

scala> 1.toShort == true
<console>:12: warning: comparing values of types Short and Boolean using `==' will always yield false
       1.toShort == true
                 ^
res10: Boolean = false
```


---




[GitHub] spark issue #22280: [SPARK-25235] [Build] [SHELL] [FOLLOWUP] Fix repl compil...

2018-08-30 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22280
  
This PR together with #22264 should make the scala-2.12 jenkins job work.


https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/


---




[GitHub] spark issue #22280: [SPARK-25235] [Build] [SHELL] [FOLLOWUP] Fix repl compil...

2018-08-30 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22280
  
This PR intends to fix the mvn build error.

I have no idea about `scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoopInterpreter.scala`.

Maybe we should spare some time to eliminate the code under `scala-2.11`.


---




[GitHub] spark pull request #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in ...

2018-08-30 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22264#discussion_r214064282
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala ---
@@ -290,6 +290,16 @@ object QueryTest {
     Row.fromSeq(row.toSeq.map {
       case null => null
       case d: java.math.BigDecimal => BigDecimal(d)
+      // Equality of WrappedArray differs for AnyVal and AnyRef in Scala 2.12.2+
+      case seq: Seq[_] => seq.map {
--- End diff --

This is the full list from Row.scala
```
   *   BooleanType -> java.lang.Boolean
   *   ByteType -> java.lang.Byte
   *   ShortType -> java.lang.Short
   *   IntegerType -> java.lang.Integer
   *   LongType -> java.lang.Long
   *   FloatType -> java.lang.Float
   *   DoubleType -> java.lang.Double
   *   StringType -> String
   *   DecimalType -> java.math.BigDecimal
   *
   *   DateType -> java.sql.Date
   *   TimestampType -> java.sql.Timestamp
   *
   *   BinaryType -> byte array
   *   ArrayType -> scala.collection.Seq (use getList for java.util.List)
   *   MapType -> scala.collection.Map (use getJavaMap for java.util.Map)
   *   StructType -> org.apache.spark.sql.Row
```

Byte, Short, Integer, Long, Float, and Double are handled because they extend `java.lang.Number`.

Equality for `AnyVal`s is specially handled.

e.g. In `Short.scala`:

```
  def ==(x : scala.Byte) : scala.Boolean
  def ==(x : scala.Short) : scala.Boolean
  def ==(x : scala.Char) : scala.Boolean
  def ==(x : scala.Int) : scala.Boolean
  def ==(x : scala.Long) : scala.Boolean
  def ==(x : scala.Float) : scala.Boolean
  def ==(x : scala.Double) : scala.Boolean
```

And `scala.Char` is not a value type in a Spark `Row`.



---




[GitHub] spark issue #22280: [SPARK-25235] [Build] [SHELL] [FOLLOWUP] Fix repl compil...

2018-08-30 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22280
  
First, `scala-2.12` and `scala-2.11` are sbt conventions; `scala-2.11` won't be compiled if we are using 2.12.

Actually, only `scala-2.11/src/main/scala` is effective in the original `pom.xml`, and `scala-2.11/src/test/scala` is a non-existent dir.

Add a `scala-2.12` profile, overriding `extra.source.dir` and `extra.testsource.dir` with two **standard** dirs.

We may not care whether the dirs are empty; we care about consistent behavior with sbt.


---




[GitHub] spark issue #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive te...

2018-08-30 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22264
  
The fix works for both 2.11 and 2.12.

And I reported a bug: https://github.com/scala/bug/issues/11123


---




[GitHub] spark pull request #22280: [SPARK-25235] [Build] [SHELL] [FOLLOWUP] Fix repl...

2018-08-30 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/22280

[SPARK-25235] [Build] [SHELL] [FOLLOWUP] Fix repl compile for 2.12

## What changes were proposed in this pull request?

Error messages from 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/183/

```
[INFO] --- scala-maven-plugin:3.2.2:compile (scala-compile-first) @ spark-repl_2.12 ---
[INFO] Using zinc server for incremental compilation
[warn] Pruning sources from previous analysis, due to incompatible CompileSetup.
[info] Compiling 6 Scala sources to /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/repl/target/scala-2.12/classes...
[error] /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoopInterpreter.scala:80: overriding lazy value importableSymbolsWithRenames in class ImportHandler of type List[(this.intp.global.Symbol, this.intp.global.Name)];
[error]  lazy value importableSymbolsWithRenames needs `override' modifier
[error]   lazy val importableSymbolsWithRenames: List[(Symbol, Name)] = {
[error]            ^
[warn] /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala:53: variable addedClasspath in class ILoop is deprecated (since 2.11.0): use reset, replay or require to update class path
[warn]   if (addedClasspath != "") {
[warn]       ^
[warn] /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala:54: variable addedClasspath in class ILoop is deprecated (since 2.11.0): use reset, replay or require to update class path
[warn]     settings.classpath append addedClasspath
[warn]     ^
[warn] two warnings found
[error] one error found
[error] Compile failed at Aug 29, 2018 5:28:22 PM [0.679s]
```

Re-add the profile for `scala-2.12`. Using `-Pscala-2.12` overrides `extra.source.dir` and `extra.testsource.dir` with two non-existent directories.

## How was this patch tested?
```
dev/change-scala-version.sh 2.12

mvn -Pscala-2.12 -DskipTests compile install
```



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark SPARK_24785_FOLLOWUP

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22280.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22280


commit de4f543a98834b519a4ac0b5f638e176e1c06967
Author: 忍冬 
Date:   2018-08-30T07:21:46Z

fix repl compile for 2.12




---




[GitHub] spark pull request #22264: [SPARK-25256][SQL] Plan mismatch errors in Hive t...

2018-08-29 Thread sadhen
Github user sadhen closed the pull request at:

https://github.com/apache/spark/pull/22264


---




[GitHub] spark pull request #22264: [SPARK-25256][SQL] Plan mismatch errors in Hive t...

2018-08-29 Thread sadhen
GitHub user sadhen reopened a pull request:

https://github.com/apache/spark/pull/22264

[SPARK-25256][SQL] Plan mismatch errors in Hive tests in 2.12

## What changes were proposed in this pull request?

### For `SPARK-5775 read array from partitioned_parquet_with_key_and_complextypes`:

scala2.12
```
scala> (1 to 10).toString
res4: String = Range 1 to 10
```

scala2.11
```
scala> (1 to 10).toString
res2: String = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
```
And 

```
  def prepareAnswer(answer: Seq[Row], isSorted: Boolean): Seq[Row] = {
    val converted: Seq[Row] = answer.map(prepareRow)
    if (!isSorted) converted.sortBy(_.toString()) else converted
  }
```
sortBy `_.toString` is not a good idea.

### Other failures are caused by

```
Array(Int.box(1)).toSeq == Array(Double.box(1.0)).toSeq
```

It is false in 2.12.2+ and true in 2.11.x, 2.12.0, and 2.12.1.

## How was this patch tested?

This is a patch on a specific unit test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark SPARK25256

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22264.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22264


commit 8d009e1d4bc9582cf349474a9e61086690cf3943
Author: 忍冬 
Date:   2018-08-29T08:16:26Z

fix equality for Range and WrappedArray

commit d7f2e3763cfd03f537eaef98a34a5e7eac5d3d6d
Author: 忍冬 
Date:   2018-08-30T03:35:14Z

fix equlity for WrappedArray.ofRef




---




[GitHub] spark issue #22264: [SPARK-25256][SQL] Plan mismatch errors in Hive tests in...

2018-08-29 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22264
  
@srowen  please review, and this PR should be rebased on #22260 and then 
tested.


---




[GitHub] spark issue #22260: [SQL][MINOR] Fix compiling for scala 2.12

2018-08-29 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22260
  
@maropu @srowen please review


---




[GitHub] spark pull request #22260: [SQL][MINOR] Fix compiling for scala 2.12

2018-08-29 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22260#discussion_r213884979
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ProjectionOverSchema.scala ---
@@ -38,7 +38,7 @@ private[execution] case class ProjectionOverSchema(schema: StructType) {
       case GetArrayItem(child, arrayItemOrdinal) =>
         getProjection(child).map { projection => GetArrayItem(projection, arrayItemOrdinal) }
       case a: GetArrayStructFields =>
-        getProjection(a.child).map(p => (p, p.dataType)).map {
+        getProjection(a.child).map(p => (p, p.dataType)).collect {
--- End diff --

To use `collect`, one must dive into `SPARK-4502`.

This PR is intended for #22264.

I will preserve the error behavior.
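The semantic difference in miniature (illustrative values):

```
// map with a non-exhaustive pattern can throw MatchError at runtime;
// collect silently drops the non-matching elements instead.
object MapVsCollect {
  def main(args: Array[String]): Unit = {
    val xs: Seq[Any] = Seq(1, "two", 3)
    val doubled = xs.collect { case i: Int => i * 2 }
    assert(doubled == Seq(2, 6))
    // xs.map { case i: Int => i * 2 } // would throw MatchError on "two"
  }
}
```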


---




[GitHub] spark pull request #22260: [SQL][MINOR] Fix compiling for scala 2.12

2018-08-29 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22260#discussion_r213884559
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -19,7 +19,7 @@ package org.apache.spark.sql.hive
 
 import java.io.{BufferedWriter, File, FileWriter}
 
-import scala.tools.nsc.Properties
+import scala.util.Properties
--- End diff --

`scala.tools` is part of scala-compiler


---




[GitHub] spark pull request #22260: [SQL][MINOR] Fix compiling for scala 2.12

2018-08-29 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22260#discussion_r213884075
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -19,7 +19,7 @@ package org.apache.spark.sql.hive
 
 import java.io.{BufferedWriter, File, FileWriter}
 
-import scala.tools.nsc.Properties
+import scala.util.Properties
--- End diff --

Yes. And `scala.util.Properties` is better than `scala.tools.nsc.Properties`!
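A quick illustration (`versionString` and `versionNumberString` are standard-library members):

```
// scala.util.Properties ships in scala-library, so this import does not
// drag scala-compiler onto the classpath.
import scala.util.Properties

object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    println(Properties.versionString)       // e.g. "version 2.12.6"
    println(Properties.versionNumberString) // e.g. "2.12.6"
  }
}
```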


---




[GitHub] spark issue #22264: [WIP][SPARK-25256][SQL] Plan mismatch errors in Hive tes...

2018-08-29 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22264
  
To simplify:

```
Array(Int.box(1)).toSeq == Array(Double.box(1.0)).toSeq
```
is `false` in `2.12.x` and is `true` in `2.11.x`


---




[GitHub] spark issue #22264: [WIP][SPARK-25256][SQL] Plan mismatch errors in Hive tes...

2018-08-29 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22264
  
@srowen  Still work in progress.

There are three unit test failures in total.

The first one is fixed.

And this is the cause of the second one:

```
Welcome to Scala 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112).
Type in expressions for evaluation. Or try :help.

scala> import collection.mutable.WrappedArray
import collection.mutable.WrappedArray

scala> new WrappedArray.ofRef(Array(1, 1).map(_.asInstanceOf[AnyRef]))
res0: scala.collection.mutable.WrappedArray.ofRef[AnyRef] = WrappedArray(1, 1)

scala> new WrappedArray.ofRef(Array(1.0, 1.0).map(_.asInstanceOf[AnyRef]))
res1: scala.collection.mutable.WrappedArray.ofRef[AnyRef] = WrappedArray(1.0, 1.0)

scala> res0 == res1
res2: Boolean = true
```

```
scala2.12

scala> new WrappedArray.ofRef(Array(1, 1).map(_.asInstanceOf[AnyRef]))
res18: scala.collection.mutable.WrappedArray.ofRef[AnyRef] = WrappedArray(1, 1)

scala> new WrappedArray.ofRef(Array(1.0, 1.0).map(_.asInstanceOf[AnyRef]))
res19: scala.collection.mutable.WrappedArray.ofRef[AnyRef] = WrappedArray(1.0, 1.0)

scala> res18.getClass
res20: Class[_ <: scala.collection.mutable.WrappedArray.ofRef[AnyRef]] = class scala.collection.mutable.WrappedArray$ofRef

scala> res19.getClass
res21: Class[_ <: scala.collection.mutable.WrappedArray.ofRef[AnyRef]] = class scala.collection.mutable.WrappedArray$ofRef

scala> res18 == res19
res22: Boolean = false
```

A behavior change introduced by https://github.com/scala/scala/pull/5551 .


---




[GitHub] spark pull request #22264: [WIP][SPARK-25044][SQL] Plan mismatch errors in H...

2018-08-29 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/22264

[WIP][SPARK-25044][SQL] Plan mismatch errors in Hive tests in 2.12

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark SPARK25256

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22264.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22264


commit 8d009e1d4bc9582cf349474a9e61086690cf3943
Author: 忍冬 
Date:   2018-08-29T08:16:26Z

fix equality for Range and WrappedArray




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22260: [SQL][MINOR] Fix compiling for scala 2.12

2018-08-29 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22260#discussion_r213573236
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ProjectionOverSchema.scala ---
@@ -38,7 +38,7 @@ private[execution] case class ProjectionOverSchema(schema: StructType) {
       case GetArrayItem(child, arrayItemOrdinal) =>
         getProjection(child).map { projection => GetArrayItem(projection, arrayItemOrdinal) }
       case a: GetArrayStructFields =>
-        getProjection(a.child).map(p => (p, p.dataType)).map {
+        getProjection(a.child).map(p => (p, p.dataType)).collect {
--- End diff --

Well, there is a semantic change when using `collect`. I will provide a unit 
test to verify the change, or simply use `sys.error`.
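
For reference, a minimal sketch (hypothetical values) of that semantic difference:

``` scala
val input: Seq[Any] = Seq(1, "one")

// map with a non-exhaustive pattern only warns at compile time and throws
// scala.MatchError at runtime when it hits "one":
//   input.map { case i: Int => i * 2 }

// collect takes a PartialFunction and silently skips unmatched elements:
input.collect { case i: Int => i * 2 }  // List(2)
```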


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22260: [MINOR] Fix scala 2.12 build using collect

2018-08-28 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/22260

[MINOR] Fix scala 2.12 build using collect

## What changes were proposed in this pull request?
Introduced by #21320 
```
[error] [warn] spark/sql/core/src/main/scala/org/apache/spark/sql/execution/ProjectionOverSchema.scala:41: match may not be exhaustive.
[error] It would fail on the following inputs: (_, ArrayType(_, _)), (_, _)
[error] [warn] getProjection(a.child).map(p => (p, p.dataType)).map {
[error] [warn]
[error] [warn] spark/sql/core/src/main/scala/org/apache/spark/sql/execution/ProjectionOverSchema.scala:52: match may not be exhaustive.
[error] It would fail on the following input: (_, _)
[error] [warn] getProjection(child).map(p => (p, p.dataType)).map {
[error] [warn]
```

```
$ sbt
> ++2.12.6
> project sql
> compile
```

## How was this patch tested?
Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark fix_exhaustive_match

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22260.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22260


commit 626f26598ccfc692cca59e29a2d7861133654ef0
Author: 忍冬 
Date:   2018-08-29T03:14:11Z

Fix exhaustive match using collect




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22069: [MINOR][DOC] Fix Java example code in Column's co...

2018-08-10 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/22069

[MINOR][DOC] Fix Java example code in Column's comments

## What changes were proposed in this pull request?
Fix scaladoc in Column

## How was this patch tested?
None

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark fix_doc_minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22069.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22069


commit 8520df899a3364f2bb41d4155d2bed9e68772a07
Author: 忍冬 
Date:   2018-08-10T09:24:08Z

Fix Java example code in Column's comments




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21166: [SPARK-11334][CORE] clear idle executors in execu...

2018-05-29 Thread sadhen
Github user sadhen closed the pull request at:

https://github.com/apache/spark/pull/21166


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21166: [SPARK-11334][CORE] clear idle executors in execu...

2018-04-27 Thread sadhen
Github user sadhen closed the pull request at:

https://github.com/apache/spark/pull/21166


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21166: [SPARK-11334][CORE] clear idle executors in executorIdTo...

2018-04-27 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/21166
  
I have checked the latest master code.

``` scala
// If the executor is no longer running any scheduled tasks, mark it as idle
if (executorIdToTaskIds.contains(executorId)) {
  executorIdToTaskIds(executorId) -= taskId
  if (executorIdToTaskIds(executorId).isEmpty) {
    executorIdToTaskIds -= executorId
    allocationManager.onExecutorIdle(executorId)
  }
}
```

To remove an executor, we have to mark it idle. To mark an executor idle, we 
have to invoke `allocationManager.onExecutorIdle`. In the latest master code, 
the two places that invoke that method are `onExecutorAdded` and `onTaskEnd`. 
For a running executor, the only way to mark it idle is the code snippet above.

#19580 makes sense on the basis that the TaskEnd event may, for some reason, 
be lost.

On the same basis, the Set `executorIdToTaskIds(executorId)` may never become 
empty.

>  the issue is not valid any more in the latest code

Please tell me why, thanks! Is it that the TaskEnd event can no longer be 
lost, or some other improvement? Which PR?

We are using Spark 2.1.1, and the issue is valid.

quote from #11205 by @jerryshao 
> Verified again, looks like the 2nd bullet is not valid anymore, I cannot 
reproduce it in latest master branch, this might have already been fixed in 
SPARK-13054.

Spark 2.1.1 ships with the fixes for SPARK-13054, but we still encountered 
the issue.
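
For the record, the shape of the fix I have in mind (a hypothetical sketch, not the actual patch; `stageIdToTaskIds` is a made-up helper map):

``` scala
// Hypothetical sketch: StageCompleted is delivered reliably, so use it to
// clear task ids whose TaskEnd event was lost, then mark drained executors
// as idle so they can eventually be removed.
override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
  allocationManager.synchronized {
    val lostTaskIds = stageIdToTaskIds
      .remove(stageCompleted.stageInfo.stageId)
      .getOrElse(Set.empty[Long])
    executorIdToTaskIds.foreach { case (executorId, taskIds) =>
      taskIds --= lostTaskIds
      if (taskIds.isEmpty) allocationManager.onExecutorIdle(executorId)
    }
    executorIdToTaskIds.retain { case (_, taskIds) => taskIds.nonEmpty }
  }
}
```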



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21166: [SPARK-11334][CORE] clear idle executors in execu...

2018-04-27 Thread sadhen
GitHub user sadhen reopened a pull request:

https://github.com/apache/spark/pull/21166

[SPARK-11334][CORE] clear idle executors in executorIdToTaskIds keySet

## What changes were proposed in this pull request?

quote from #11205 

> Executors may never be idle. Currently we use the executor to tasks 
mapping relation to identify the status of executors, in maximum task failure 
situation, some TaskEnd events may never be delivered, which makes the related 
executor always be busy.

#19580 fixes the incorrect number of running tasks.

This PR solves the problem in a similar way, based on the `fact` that TaskEnd 
events may never be delivered while StageCompleted events will eventually be 
delivered.

## How was this patch tested?

Existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark SPARK-11334

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21166.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21166


commit 3f32a92352551edf79aca2c508ee575d7e3f2aa1
Author: 忍冬 
Date:   2018-04-26T10:46:32Z

clear idle executors in executorIdToTaskIds keySet




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21166: [SPARK-11334][CORE] clear idle executors in execu...

2018-04-26 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/21166

[SPARK-11334][CORE] clear idle executors in executorIdToTaskIds keySet

## What changes were proposed in this pull request?

quote from #11205 

> Executors may never be idle. Currently we use the executor to tasks 
mapping relation to identify the status of executors, in maximum task failure 
situation, some TaskEnd events may never be delivered, which makes the related 
executor always be busy.

#19580 fixes the incorrect number of running tasks.

This PR solves the problem in a similar way, based on the `fact` that TaskEnd 
events may never be delivered while StageCompleted events will eventually be 
delivered.

## How was this patch tested?

Existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark SPARK-11334

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21166.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21166


commit 3f32a92352551edf79aca2c508ee575d7e3f2aa1
Author: 忍冬 
Date:   2018-04-26T10:46:32Z

clear idle executors in executorIdToTaskIds keySet




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #11205: [SPARK-11334][Core] Handle maximum task failure situatio...

2018-04-26 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/11205
  
@jerryshao I think the 2nd bullet has not been fixed in SPARK-13054.

I use Spark 2.1.1, and I still find that finished tasks remain in 
`private val executorIdToTaskIds = new mutable.HashMap[String, mutable.HashSet[Long]]`.

But `numRunningTasks` equals 0, since:

```
if (numRunningTasks != 0) {
  logWarning("No stages are running, but numRunningTasks != 0")
  numRunningTasks = 0
}
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21059: [SPARK-23974][CORE] fix when numExecutorsTarget e...

2018-04-13 Thread sadhen
Github user sadhen closed the pull request at:

https://github.com/apache/spark/pull/21059


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21059: [SPARK-23974][CORE] fix when numExecutorsTarget equals m...

2018-04-13 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/21059
  
@jiangxb1987 

I re-investigated the logs and found that there must be bugs in the YARN 
scheduler backend, so this PR is not the right way to fix the issue.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21059: fix when numExecutorsTarget equals maxNumExecutor...

2018-04-12 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/21059

fix when numExecutorsTarget equals maxNumExecutors

## What changes were proposed in this pull request?

In dynamic allocation, there are cases where `numExecutorsTarget` has reached 
`maxNumExecutors`, but for some reason (`client.requestTotalExecutors` didn't 
work as expected, or threw an exception due to an RPC failure) the method 
`addExecutors` always returns 0 without calling `client.requestTotalExecutors`. 
When there are too many tasks to handle, `maxNeeded < numExecutorsTarget` is 
false, so `updateAndSyncNumExecutorsTarget` always runs into `addExecutors` 
with `numExecutorsTarget == maxNumExecutors`. Since `numExecutorsTarget` is 
hard to decrease, we end up handling the heavy load with only a few executors, 
without ever dynamically increasing their number.
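
The guard producing the log lines below looks roughly like this (a paraphrased sketch of `ExecutorAllocationManager.addExecutors`, not the exact source):

``` scala
// Paraphrased sketch: once the target hits the max, addExecutors bails out
// before ever calling client.requestTotalExecutors, so a target that is out
// of sync with the cluster manager is never corrected.
private def addExecutors(maxNumExecutorsNeeded: Int): Int = {
  if (numExecutorsTarget >= maxNumExecutors) {
    logDebug(s"Not adding executors because our current target total " +
      s"is already $numExecutorsTarget (limit $maxNumExecutors)")
    numExecutorsToAdd = 1
    return 0
  }
  // ... otherwise raise the target and call client.requestTotalExecutors ...
  0
}
```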

Online logs:
```
$ grep "Not adding executors because our current target total" spark-job-server.log.9 | tail
[2018-04-12 16:07:19,070] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 16:07:20,071] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 16:07:21,072] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 16:07:22,073] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 16:07:23,074] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 16:07:24,075] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 16:07:25,076] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 16:07:26,077] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 16:07:27,078] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 16:07:28,079] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)

$ grep "Not adding executors because our current target total" spark-job-server.log.9 | head
[2018-04-12 13:52:18,067] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 13:52:19,071] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 13:52:20,072] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 13:52:21,073] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 13:52:22,074] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 13:52:23,075] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 13:52:24,076] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 13:52:25,077] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 13:52:26,078] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
[2018-04-12 13:52:27,079] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)


$ grep "Not adding executors because our current target total" spark-job-server.log.9 | wc -l
```

[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2017-11-01 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/14638
  
yes


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2017-04-21 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/14638
  
Since this PR won't be accepted, my solution is:
``` sql
-- create table using spark-csv
CREATE TABLE tmp.cars
USING com.databricks.spark.csv
OPTIONS (path "/tmp/cars.csv", header "true", inferSchema "true");

-- create the table via CTAS: whether it is stored as parquet, orc or
-- something else depends on your conf
CREATE TABLE tmp.cars_transformed AS SELECT * FROM tmp.cars;

-- optional: drop the spark-csv table
DROP TABLE tmp.cars;
```
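
For anyone reading along, a rough DataFrame API equivalent (Spark 2.x; the path and table names are just the ones from above):

``` scala
// Sketch of the same workaround using the built-in CSV reader:
val cars = spark.read
  .option("header", "true")        // consumes the header line
  .option("inferSchema", "true")
  .csv("/tmp/cars.csv")

// Materialize into whatever format the session is configured to use:
cars.write.saveAsTable("tmp.cars_transformed")
```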


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2017-04-21 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/14638
  
Also a useful feature for me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org