[GitHub] spark pull request #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH T...

2017-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19939#discussion_r156001886
  
--- Diff: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
 ---
@@ -235,6 +239,61 @@ class OracleIntegrationSuite extends 
DockerJDBCIntegrationSuite with SharedSQLCo
 assert(types(1).equals("class java.sql.Timestamp"))
   }
 
+  test("Column type TIMESTAMP with SESSION_LOCAL_TIMEZONE is different 
from default") {
+val defaultJVMTimeZone = TimeZone.getDefault
+// Pick the timezone different from the current default time zone of 
JVM
+val sofiaTimeZone = TimeZone.getTimeZone("Europe/Sofia")
+val shanghaiTimeZone = TimeZone.getTimeZone("Asia/Shanghai")
+val localSessionTimeZone =
+  if (defaultJVMTimeZone == shanghaiTimeZone) sofiaTimeZone else 
shanghaiTimeZone
+
+withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> 
localSessionTimeZone.getID) {
+  val e = intercept[java.sql.SQLException] {
+val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new 
Properties)
+dfRead.collect()
+  }.getMessage
+  assert(e.contains("Unrecognized SQL type -101"))
+}
+  }
+
+  /**
+   * Change the Time Zone `timeZoneId` of JVM before executing `f`, then 
switches back to the
+   * original after `f` returns.
+   * @param timeZoneId the ID for a TimeZone, either an abbreviation such 
as "PST", a full name such
+   *   as "America/Los_Angeles", or a custom ID such as 
"GMT-8:00".
+   */
+  private def withTimeZone(timeZoneId: String)(f: => Unit): Unit = {
+val originalLocale = TimeZone.getDefault
+try {
+  // Add Locale setting
+  TimeZone.setDefault(TimeZone.getTimeZone(timeZoneId))
+  f
+} finally {
+  TimeZone.setDefault(originalLocale)
+}
+  }
+
+  test("Column TIMESTAMP with TIME ZONE(JVM timezone)") {
+def checkRow(row: Row, ts: String): Unit = {
+  assert(row.getTimestamp(1).equals(Timestamp.valueOf(ts)))
+}
+
+val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new 
Properties)
+withTimeZone("PST") {
--- End diff --

oh i see, session time zone is dynamically evaluated with JVM timezone if 
not set, so we need to guarantee that session time zone is not set here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH T...

2017-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19939#discussion_r156001560
  
--- Diff: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
 ---
@@ -235,6 +239,61 @@ class OracleIntegrationSuite extends 
DockerJDBCIntegrationSuite with SharedSQLCo
 assert(types(1).equals("class java.sql.Timestamp"))
   }
 
+  test("Column type TIMESTAMP with SESSION_LOCAL_TIMEZONE is different 
from default") {
+val defaultJVMTimeZone = TimeZone.getDefault
+// Pick the timezone different from the current default time zone of 
JVM
+val sofiaTimeZone = TimeZone.getTimeZone("Europe/Sofia")
+val shanghaiTimeZone = TimeZone.getTimeZone("Asia/Shanghai")
+val localSessionTimeZone =
+  if (defaultJVMTimeZone == shanghaiTimeZone) sofiaTimeZone else 
shanghaiTimeZone
+
+withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> 
localSessionTimeZone.getID) {
+  val e = intercept[java.sql.SQLException] {
+val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new 
Properties)
+dfRead.collect()
+  }.getMessage
+  assert(e.contains("Unrecognized SQL type -101"))
+}
+  }
+
+  /**
+   * Change the Time Zone `timeZoneId` of JVM before executing `f`, then 
switches back to the
+   * original after `f` returns.
+   * @param timeZoneId the ID for a TimeZone, either an abbreviation such 
as "PST", a full name such
+   *   as "America/Los_Angeles", or a custom ID such as 
"GMT-8:00".
+   */
+  private def withTimeZone(timeZoneId: String)(f: => Unit): Unit = {
+val originalLocale = TimeZone.getDefault
+try {
+  // Add Locale setting
+  TimeZone.setDefault(TimeZone.getTimeZone(timeZoneId))
+  f
+} finally {
+  TimeZone.setDefault(originalLocale)
+}
+  }
+
+  test("Column TIMESTAMP with TIME ZONE(JVM timezone)") {
+def checkRow(row: Row, ts: String): Unit = {
+  assert(row.getTimestamp(1).equals(Timestamp.valueOf(ts)))
+}
+
+val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new 
Properties)
+withTimeZone("PST") {
--- End diff --

how do you make sure sesson time zone is same with JVM time zone?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19937: [SPARK-22746][SQL] Avoid the generation of useless mutab...

2017-12-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19937
  
LGTM except one code style comment


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r156000717
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
 ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
 s"""
|while (findNextInnerJoinRows($leftInput, $rightInput)) {
+   |  ${leftVarDecl.mkString("\n")}
--- End diff --

Since these local variables are only needed inside the loop, I feel the 
current code is more readable. Performance is not a concern here as the 
overhead is very low or none.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r15631
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
 ---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
* the variables should be declared separately from accessing the 
columns, we can't use the
* codegen of BoundReference here.
*/
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): 
Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): 
(Seq[ExprCode], Seq[String]) = {
 ctx.INPUT_ROW = leftRow
 left.output.zipWithIndex.map { case (a, i) =>
   val value = ctx.freshName("value")
   val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-  // declare it as class member, so we can access the column before or 
in the loop.
-  ctx.addMutableState(ctx.javaType(a.dataType), value)
   if (a.nullable) {
 val isNull = ctx.freshName("isNull")
-ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
 val code =
   s"""
  |$isNull = $leftRow.isNullAt($i);
  |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : 
($valueCode);
""".stripMargin
-ExprCode(code, isNull, value)
+(ExprCode(code, isNull, value),
+  s"""
+ |boolean $isNull = false;
+ |${ctx.javaType(a.dataType)} $value = 
${ctx.defaultValue(a.dataType)};
+   """.stripMargin)
--- End diff --

+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19938
  
I have the same feeling with @viirya , expressions/operators usually won't 
mutate input from children. One concern is how to guarantee this 
programmatically, but might be OK as I don't think this would happen. We can 
also remove the temp global variables introduced by the recent commits from 
@kiszk 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19717
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84701/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19717
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19717
  
**[Test build #84701 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84701/testReport)**
 for PR 19717 at commit 
[`2e7810b`](https://github.com/apache/spark/commit/2e7810b49f00bbdb0cd5f4ac5cf99a74c748cf5a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19927: [SPARK-22737][ML] OVR transform optimization

2017-12-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19927#discussion_r155996272
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -156,54 +153,22 @@ final class OneVsRestModel private[ml] (
 // Check schema
 transformSchema(dataset.schema, logging = true)
 
-// determine the input columns: these need to be passed through
-val origCols = dataset.schema.map(f => col(f.name))
-
-// add an accumulator column to store predictions of all the models
-val accColName = "mbc$acc" + UUID.randomUUID().toString
-val initUDF = udf { () => Map[Int, Double]() }
-val newDataset = dataset.withColumn(accColName, initUDF())
-
-// persist if underlying dataset is not persistent.
-val handlePersistence = dataset.storageLevel == StorageLevel.NONE
-if (handlePersistence) {
-  newDataset.persist(StorageLevel.MEMORY_AND_DISK)
-}
-
-// update the accumulator column with the result of prediction of 
models
-val aggregatedDataset = 
models.zipWithIndex.foldLeft[DataFrame](newDataset) {
-  case (df, (model, index)) =>
-val rawPredictionCol = model.getRawPredictionCol
-val columns = origCols ++ List(col(rawPredictionCol), 
col(accColName))
-
-// add temporary column to store intermediate scores and update
-val tmpColName = "mbc$tmp" + UUID.randomUUID().toString
-val updateUDF = udf { (predictions: Map[Int, Double], prediction: 
Vector) =>
-  predictions + ((index, prediction(1)))
+val predictUDF = udf { (features: Any) =>
+  var i = 0
+  var maxIndex = Double.NaN
+  var maxPred = Double.MinValue
+  while (i < models.length) {
+val pred = models(i).predictRawAsFeaturesType(features)(1)
--- End diff --

Here serialize & broadcast models, is it safe and will it bring some 
overhead ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19911
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19911
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84702/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19911
  
**[Test build #84702 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84702/testReport)**
 for PR 19911 at commit 
[`3d32e34`](https://github.com/apache/spark/commit/3d32e34e8a05fbfc23ecc3819a649a704c09c436).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-12-10 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19082
  
Ya, I was just trying to do that tonight! Thanks for pinging me.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-12-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19082
  
@maropu could you please resolve conflicts?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19938
  
Thank you for great thought. Let me think about it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

2017-12-10 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r155989817
  
--- Diff: pom.xml ---
@@ -185,7 +185,7 @@
 2.8
 1.8
 1.0.0
-0.4.0
+0.8.0-SNAPSHOT
--- End diff --

Please don't forget that we also need to update 
`dev/deps/spark-deps-hadoop-2.x` files.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

2017-12-10 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r155983224
  
--- Diff: python/pyspark/serializers.py ---
@@ -223,27 +223,13 @@ def _create_batch(series, timezone):
 series = [series]
 series = ((s, None) if not isinstance(s, (list, tuple)) else s for s 
in series)
 
-# If a nullable integer series has been promoted to floating point 
with NaNs, need to cast
-# NOTE: this is not necessary with Arrow >= 0.7
-def cast_series(s, t):
-if type(t) == pa.TimestampType:
-# NOTE: convert to 'us' with astype here, unit ignored in 
`from_pandas` see ARROW-1680
-return _check_series_convert_timestamps_internal(s.fillna(0), 
timezone)\
-.values.astype('datetime64[us]', copy=False)
-# NOTE: can not compare None with pyarrow.DataType(), fixed with 
Arrow >= 0.7.1
-elif t is not None and t == pa.date32():
-# TODO: this converts the series to Python objects, possibly 
avoid with Arrow >= 0.8
-return s.dt.date
-elif t is None or s.dtype == t.to_pandas_dtype():
-return s
-else:
-return s.fillna(0).astype(t.to_pandas_dtype(), copy=False)
-
-# Some object types don't support masks in Arrow, see ARROW-1721
 def create_array(s, t):
-casted = cast_series(s, t)
-mask = None if casted.dtype == 'object' else s.isnull()
-return pa.Array.from_pandas(casted, mask=mask, type=t)
+mask = s.isnull()
+# Workaround for casting timestamp units with timezone, ARROW-1906
--- End diff --

Will the fix for this workaround be included in Arrow 0.8?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19841: [SPARK-22642][SQL] the createdTempDir will not be delete...

2017-12-10 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/19841
  
It would be great if someone help to merge it to master, Thanks! 
@gatorsmile @srowen @HyukjinKwon @wangyum


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155984429
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
 ---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
* the variables should be declared separately from accessing the 
columns, we can't use the
* codegen of BoundReference here.
*/
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): 
Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): 
(Seq[ExprCode], Seq[String]) = {
 ctx.INPUT_ROW = leftRow
 left.output.zipWithIndex.map { case (a, i) =>
   val value = ctx.freshName("value")
   val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-  // declare it as class member, so we can access the column before or 
in the loop.
-  ctx.addMutableState(ctx.javaType(a.dataType), value)
   if (a.nullable) {
 val isNull = ctx.freshName("isNull")
-ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
 val code =
   s"""
  |$isNull = $leftRow.isNullAt($i);
  |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : 
($valueCode);
""".stripMargin
-ExprCode(code, isNull, value)
+(ExprCode(code, isNull, value),
+  s"""
+ |boolean $isNull = false;
+ |${ctx.javaType(a.dataType)} $value = 
${ctx.defaultValue(a.dataType)};
+   """.stripMargin)
--- End diff --

Could you assign it first
```Scala
val leftVarsDecl = 
...
(ExprCode(code, isNull, value), leftVarsDecl)

```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155984439
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
 ---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
* the variables should be declared separately from accessing the 
columns, we can't use the
* codegen of BoundReference here.
*/
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): 
Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): 
(Seq[ExprCode], Seq[String]) = {
 ctx.INPUT_ROW = leftRow
 left.output.zipWithIndex.map { case (a, i) =>
   val value = ctx.freshName("value")
   val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-  // declare it as class member, so we can access the column before or 
in the loop.
-  ctx.addMutableState(ctx.javaType(a.dataType), value)
   if (a.nullable) {
 val isNull = ctx.freshName("isNull")
-ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
 val code =
   s"""
  |$isNull = $leftRow.isNullAt($i);
  |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : 
($valueCode);
""".stripMargin
-ExprCode(code, isNull, value)
+(ExprCode(code, isNull, value),
+  s"""
+ |boolean $isNull = false;
+ |${ctx.javaType(a.dataType)} $value = 
${ctx.defaultValue(a.dataType)};
+   """.stripMargin)
   } else {
-ExprCode(s"$value = $valueCode;", "false", value)
+(ExprCode(s"$value = $valueCode;", "false", value),
+  s"""${ctx.javaType(a.dataType)} $value = 
${ctx.defaultValue(a.dataType)};""")
--- End diff --

The same here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155984232
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
 ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
 s"""
|while (findNextInnerJoinRows($leftInput, $rightInput)) {
+   |  ${leftVarDecl.mkString("\n")}
--- End diff --

If no performance difference, we prefer to the simpler codes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155984096
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
 ---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
* the variables should be declared separately from accessing the 
columns, we can't use the
* codegen of BoundReference here.
*/
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): 
Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): 
(Seq[ExprCode], Seq[String]) = {
--- End diff --

We need to update the function description. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19721: [SPARK-22496][SQL]thrift server adds operation lo...

2017-12-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19721


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19721: [SPARK-22496][SQL]thrift server adds operation logs

2017-12-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19721
  
Thanks! Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19911
  
**[Test build #84702 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84702/testReport)**
 for PR 19911 at commit 
[`3d32e34`](https://github.com/apache/spark/commit/3d32e34e8a05fbfc23ecc3819a649a704c09c436).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19911
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19717
  
**[Test build #84701 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84701/testReport)**
 for PR 19717 at commit 
[`2e7810b`](https://github.com/apache/spark/commit/2e7810b49f00bbdb0cd5f4ac5cf99a74c748cf5a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155979284
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
 ---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155979118
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,42 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL. Prefix
+   * "k8s:" is appended to the resolved URL as the prefix is used by 
KubernetesClusterManager
+   * in canCreate to determine if the KubernetesClusterManager should be 
used.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
+
+// To handle master URLs, e.g., k8s://host:port.
+if (!masterWithoutK8sPrefix.contains("://")) {
--- End diff --

This is used to differentiate `https://host:port` vs. `host:port`. So 
`contains` is the right method here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155979008
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -294,6 +298,16 @@ object SparkSubmit extends CommandLineUtils with 
Logging {
   }
 }
 
+if (clusterManager == KUBERNETES) {
+  args.master = Utils.checkAndGetK8sMasterUrl(args.master)
+  // Make sure YARN is included in our build if we're trying to use it
--- End diff --

Done.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19783
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84700/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19783
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19783
  
**[Test build #84700 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84700/testReport)**
 for PR 19783 at commit 
[`be1e7ba`](https://github.com/apache/spark/commit/be1e7ba1ae485da523232e2ddffce8ac911392dd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19937: [SPARK-22746][SQL] Avoid the generation of useless mutab...

2017-12-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19937
  
@cloud-fan @viirya Could you please review this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19783
  
**[Test build #84700 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84700/testReport)**
 for PR 19783 at commit 
[`be1e7ba`](https://github.com/apache/spark/commit/be1e7ba1ae485da523232e2ddffce8ac911392dd).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-10 Thread ron8hu
Github user ron8hu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19783#discussion_r155964396
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala
 ---
@@ -114,4 +114,171 @@ object EstimationUtils {
 }
   }
 
+  /**
+   * Returns the number of the first bin into which a column values falls 
for a specified
+   * numeric equi-height histogram.
+   *
+   * @param value a literal value of a column
+   * @param bins an array of bins for a given numeric equi-height histogram
+   * @return the number of the first bin into which a column values falls.
+   */
+  def findFirstBinForValue(value: Double, bins: Array[HistogramBin]): Int 
= {
+var i = 0
+while ((i < bins.length) && (value > bins(i).hi)) {
+  i +=1
+}
+i
+  }
+
+  /**
+   * Returns the number of the last bin into which a column values falls 
for a specified
+   * numeric equi-height histogram.
+   *
+   * @param value a literal value of a column
+   * @param bins an array of bins for a given numeric equi-height histogram
+   * @return the number of the last bin into which a column values falls.
+   */
+  def findLastBinForValue(value: Double, bins: Array[HistogramBin]): Int = 
{
+var i = bins.length - 1
+while ((i >= 0) && (value < bins(i).lo)) {
+  i -=1
+}
+i
+  }
+
+  /**
+   * Returns a percentage of a bin holding values for column value in the 
range of
+   * [lowerValue, higherValue]
+   *
+   * @param binId a given bin id in a specified histogram
+   * @param higherValue a given upper bound value of a specified column 
value range
+   * @param lowerValue a given lower bound value of a specified column 
value range
+   * @param histogram a numeric equi-height histogram
+   * @return the percentage of a single bin holding values in [lowerValue, 
higherValue].
+   */
+  private def getOccupation(
+  binId: Int,
+  higherValue: Double,
+  lowerValue: Double,
+  histogram: Histogram): Double = {
+val curBin = histogram.bins(binId)
+if (curBin.hi == curBin.lo) {
+  // the entire bin is covered in the range
+  1.0
+} else if (higherValue == lowerValue) {
+  // set percentage to 1/NDV
+  1.0 / curBin.ndv.toDouble
+} else {
+  // Use proration since the range falls inside this bin.
+  math.min((higherValue - lowerValue) / (curBin.hi - curBin.lo), 1.0)
+}
+  }
+
+  /**
+   * Returns the number of bins for column values in [lowerValue, 
higherValue].
--- End diff --

OK.  Done.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-10 Thread ron8hu
Github user ron8hu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19783#discussion_r155963930
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala
 ---
@@ -359,7 +371,7 @@ class FilterEstimationSuite extends 
StatsEstimationTestBase {
   test("cbool > false") {
 validateEstimatedStats(
   Filter(GreaterThan(attrBool, Literal(false)), 
childStatsTestPlan(Seq(attrBool), 10L)),
-  Seq(attrBool -> ColumnStat(distinctCount = 1, min = Some(true), max 
= Some(true),
+  Seq(attrBool -> ColumnStat(distinctCount = 1, min = Some(false), max 
= Some(true),
--- End diff --

Agreed with wzhfy.  Today's logic is: for these 2 conditions, (column > x) 
and (column >= x), we set the min value to x.  We do not distinguish these 2 
cases.  This is because we do not know the exact next value larger than x if x 
is a continuous data type like double type.  We may do some special coding for 
discrete data types such as Boolean or integer.  But, as wzhfy said, it does 
not deserve the complexity.   


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-10 Thread ron8hu
Github user ron8hu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19783#discussion_r155963740
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
 ---
@@ -471,37 +508,47 @@ case class FilterEstimation(plan: Filter) extends 
Logging {
   percent = 1.0
 } else {
   // This is the partial overlap case:
-  // Without advanced statistics like histogram, we assume uniform 
data distribution.
-  // We just prorate the adjusted range over the initial range to 
compute filter selectivity.
-  assert(max > min)
-  percent = op match {
-case _: LessThan =>
-  if (numericLiteral == max) {
-// If the literal value is right on the boundary, we can minus 
the part of the
-// boundary value (1/ndv).
-1.0 - 1.0 / ndv
-  } else {
-(numericLiteral - min) / (max - min)
-  }
-case _: LessThanOrEqual =>
-  if (numericLiteral == min) {
-// The boundary value is the only satisfying value.
-1.0 / ndv
-  } else {
-(numericLiteral - min) / (max - min)
-  }
-case _: GreaterThan =>
-  if (numericLiteral == min) {
-1.0 - 1.0 / ndv
-  } else {
-(max - numericLiteral) / (max - min)
-  }
-case _: GreaterThanOrEqual =>
-  if (numericLiteral == max) {
-1.0 / ndv
-  } else {
-(max - numericLiteral) / (max - min)
-  }
+
+  if (colStat.histogram.isEmpty) {
--- End diff --

We cannot move the if statement to upper level.  This is because, for the 
partial overlap case, we need to update the the [min, max] range for a given 
column.  For the no-overlap and complete-overlap cases, we do  not need to do 
so.  I think the current code is modular for this reason.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-10 Thread ron8hu
Github user ron8hu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19783#discussion_r155963622
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
 ---
@@ -332,8 +332,41 @@ case class FilterEstimation(plan: Filter) extends 
Logging {
 colStatsMap.update(attr, newStats)
   }
 
-  Some(1.0 / BigDecimal(ndv))
-} else {
+  if (colStat.histogram.isEmpty) {
+// returns 1/ndv if there is no histogram
+Some(1.0 / BigDecimal(ndv))
+  } else {
+// We compute filter selectivity using Histogram information.
--- End diff --

O.  will do.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19939
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84697/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19939
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19939
  
**[Test build #84697 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84697/testReport)**
 for PR 19939 at commit 
[`69c6e3e`](https://github.com/apache/spark/commit/69c6e3ef97b6450da5250171d47cf6c7f0faa9ff).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19599
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19599
  
**[Test build #84699 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84699/testReport)**
 for PR 19599 at commit 
[`f55b22d`](https://github.com/apache/spark/commit/f55b22dacef04b3d3e6f0cb05acad3e9997c133d).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19599
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84699/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19599
  
**[Test build #84699 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84699/testReport)**
 for PR 19599 at commit 
[`f55b22d`](https://github.com/apache/spark/commit/f55b22dacef04b3d3e6f0cb05acad3e9997c133d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155956779
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,42 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL. Prefix
+   * "k8s:" is appended to the resolved URL as the prefix is used by 
KubernetesClusterManager
+   * in canCreate to determine if the KubernetesClusterManager should be 
used.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
+
+// To handle master URLs, e.g., k8s://host:port.
+if (!masterWithoutK8sPrefix.contains("://")) {
--- End diff --

startsWith instead of contains?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155957034
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
 ---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, an

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155956719
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -294,6 +298,16 @@ object SparkSubmit extends CommandLineUtils with 
Logging {
   }
 }
 
+if (clusterManager == KUBERNETES) {
+  args.master = Utils.checkAndGetK8sMasterUrl(args.master)
+  // Make sure YARN is included in our build if we're trying to use it
--- End diff --

this should say KUBERNETES


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19876: [WIP][ML][SPARK-11171][SPARK-11239] Add PMML export to S...

2017-12-10 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/19876
  
@sethah: want to publish the comments from when we were chatting in 
Singapore? :p 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155956360
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,43 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

we could release with Dockerfile as-is (since the source or lineage doesn't 
have licensing issue)
we just need to figure out the process of releasing the "convenient" docker 
images "binaries" on docker hub. there's probably a way to do but let's table 
that for now.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19754: [BUILD] update release scripts

2017-12-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19754
  
I think these are Jenkins specific config related to how the user 
credentials are stored (not) securely?

Specifically I don’t know how we are saving ASF user and password inside 
Jenkins to publish to svn before.

Not sure about the gpg one since I don’t think we merged any changes to 
scripts then. Perhaps some changes to Jenkins with regards to where the private 
key is stored.

Generally both sound like something we should review more closely since 
they are security practices.





---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19599
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84698/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19599
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19599
  
**[Test build #84698 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84698/testReport)**
 for PR 19599 at commit 
[`3f6e472`](https://github.com/apache/spark/commit/3f6e472ebff8327e20dcdb0041f525a3856244b0).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class StringParam(parent: Params, name: String, doc: String, isValid: 
String => Boolean)`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19811
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84696/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19811
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19811
  
**[Test build #84696 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84696/testReport)**
 for PR 19811 at commit 
[`013e233`](https://github.com/apache/spark/commit/013e23309a01ba3b8c9682977f848f1413afd290).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19939
  
**[Test build #84697 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84697/testReport)**
 for PR 19939 at commit 
[`69c6e3e`](https://github.com/apache/spark/commit/69c6e3ef97b6450da5250171d47cf6c7f0faa9ff).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19599
  
**[Test build #84698 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84698/testReport)**
 for PR 19599 at commit 
[`3f6e472`](https://github.com/apache/spark/commit/3f6e472ebff8327e20dcdb0041f525a3856244b0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19939
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19940
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19940
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84695/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19940
  
**[Test build #84695 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84695/testReport)**
 for PR 19940 at commit 
[`978bfd6`](https://github.com/apache/spark/commit/978bfd6a490cfef2c0f2da1190830ef6a64287af).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19938
  
They are possibly passed in split functions. But it is hard to image a case 
we will change their values in the functions. In SparkSQL, the output from a 
child operator are just used as input to evaluate new output in the parent 
operator. We don't use the output as mutable statuses.

The possible problematic case is when we create a global variable in an 
operator/expression codegen and use this global variable to carry mutable 
status (e.g., the condition meeting status in casewhen) during the evaluation 
of the op/expr. Then it is possibly we pass it and modify it in split functions.

If we don't change the values, seems to me this change just creates 
redundant local variables.

This doesn't cause any harm at all. So I feel no strong option for this.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19811
  
**[Test build #84696 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84696/testReport)**
 for PR 19811 at commit 
[`013e233`](https://github.com/apache/spark/commit/013e23309a01ba3b8c9682977f848f1413afd290).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155949301
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
 ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
 s"""
|while (findNextInnerJoinRows($leftInput, $rightInput)) {
+   |  ${leftVarDecl.mkString("\n")}
--- End diff --

Here is my benchmark result. Is there any your result?

```
$ cat Loop.java 
public class Loop {
  private static int _i;
  private static boolean _b;
 
  public static void main(String[] args){
Loop l = new Loop();
long s, e;
long c1 = 100L;
long c2 = 10L;
int N = 10;

// warmup
for (int i = 0; i < 100; i++) {
  l.inner(1, 100);
  l.outer(1, 100);
}

for (int n = 0; n < N; n++) {
  s = System.nanoTime();
  l.inner(c1, c2);
  e = System.nanoTime();
  System.out.println("inner(ms): " + (e-s) / 100);
}

for (int n = 0; n < N; n++) {
  s = System.nanoTime();
  l.outer(c1, c2);
  e = System.nanoTime();   
  System.out.println("outer(ms): " + (e-s) / 100);
}
  }
 
  public void inner(long c1, long c2) {
for (long i = 0; i < c1; i++) {
  int v1 = 1;
  boolean v2 = true;
  for (long j = 0; i < c2; i++) {
_i = v1;
_b = v2;
  }
}
  }
 
  public void outer(long c1, long c2) {
int v1 = 1;
boolean v2 = true;
for (long i = 0; i < c1; i++) {
  for (long j = 0; i < c2; i++) {
_i = v1;
_b = v2;
  }
}
  }
}
$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
$ java Loop
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19938
  
Can we guarantee 100% that any operation in `consume()` at 
[here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L255)
 does not pass global variables into split functions?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19940
  
**[Test build #84695 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84695/testReport)**
 for PR 19940 at commit 
[`978bfd6`](https://github.com/apache/spark/commit/978bfd6a490cfef2c0f2da1190830ef6a64287af).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19933: [SPARK-22744][CORE] Add a configuration to show the appl...

2017-12-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19933
  
No, I understand it. Good news is you can already do this, and the driver 
host is actually probably already in the logs. This is definitely not a config 
property, no matter what. Whatever your logic it can and should be in code. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19940: [SPARK-22750][SQL] Reuse mutable states when poss...

2017-12-10 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/19940

[SPARK-22750][SQL] Reuse mutable states when possible

## What changes were proposed in this pull request?

The PR introduces a new method `addSingleMutableState ` to `CodeGenerator` 
to allow reusing and sharing the same global variable between different 
Expressions. This helps reducing the number of global variables needed, which 
is important to limit the impact on the constant pool.

## How was this patch tested?

added UTs


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-22750

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19940.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19940


commit 978bfd6a490cfef2c0f2da1190830ef6a64287af
Author: Marco Gaido 
Date:   2017-12-10T11:23:10Z

[SPARK-22750][SQL] Reuse mutable states when possible




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16578
  
As I mentioned before, we still don't have enough eyes on this change so 
far. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19938
  
IIUC, the problem is only happened when we wrongly pass global variables 
into split functions and change the values. Will we change the result variables 
from aggregation? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19939
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84693/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19939
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19939
  
**[Test build #84693 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84693/testReport)**
 for PR 19939 at commit 
[`69c6e3e`](https://github.com/apache/spark/commit/69c6e3ef97b6450da5250171d47cf6c7f0faa9ff).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19938
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19938
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84694/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19938
  
**[Test build #84694 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84694/testReport)**
 for PR 19938 at commit 
[`ac65dd2`](https://github.com/apache/spark/commit/ac65dd21b975481f5c4feb5f0745a2c45d000728).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155944247
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
 ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
 s"""
|while (findNextInnerJoinRows($leftInput, $rightInput)) {
+   |  ${leftVarDecl.mkString("\n")}
--- End diff --

We would appreciate it if you would past the kernel code what you are 
thinking about.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155944205
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
 ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
 s"""
|while (findNextInnerJoinRows($leftInput, $rightInput)) {
+   |  ${leftVarDecl.mkString("\n")}
--- End diff --

I think that it is not easy for me to cast to a standalone benchmark since 
generated code depends on the Spark internal structures.  
I would appreciate it if you would prepare an easy benchmark for you.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155942157
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
 ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
 s"""
|while (findNextInnerJoinRows($leftInput, $rightInput)) {
+   |  ${leftVarDecl.mkString("\n")}
--- End diff --

I was thinking about an easy benchmark like what we did for casting. WDYT?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19939
  
**[Test build #84693 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84693/testReport)**
 for PR 19939 at commit 
[`69c6e3e`](https://github.com/apache/spark/commit/69c6e3ef97b6450da5250171d47cf6c7f0faa9ff).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19938
  
**[Test build #84694 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84694/testReport)**
 for PR 19938 at commit 
[`ac65dd2`](https://github.com/apache/spark/commit/ac65dd21b975481f5c4feb5f0745a2c45d000728).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19938
  
Jenkins, retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19939
  
Jenkins, retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19938
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19939
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19939
  
**[Test build #84692 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84692/testReport)**
 for PR 19939 at commit 
[`69c6e3e`](https://github.com/apache/spark/commit/69c6e3ef97b6450da5250171d47cf6c7f0faa9ff).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19938
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84691/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19938
  
**[Test build #84691 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84691/testReport)**
 for PR 19938 at commit 
[`ac65dd2`](https://github.com/apache/spark/commit/ac65dd21b975481f5c4feb5f0745a2c45d000728).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19939
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84692/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org