[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20035
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85237/
Test FAILed.


---




[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20035
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20035
  
**[Test build #85237 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85237/testReport)**
 for PR 20035 at commit 
[`f0163e7`](https://github.com/apache/spark/commit/f0163e7b68aa09fef5c1dc7f25e00170354a1ab2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19977
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85235/
Test FAILed.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19977
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19977
  
**[Test build #85235 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85235/testReport)**
 for PR 19977 at commit 
[`fc14aeb`](https://github.com/apache/spark/commit/fc14aeb4e92e67aba1750fc1bc2b0fc9afaa5fac).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19884
  
**[Test build #85244 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85244/testReport)**
 for PR 19884 at commit 
[`b0200ef`](https://github.com/apache/spark/commit/b0200efd30c6fe77ec6e57d65f3bc828be0e1802).


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158212056
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2141,22 +2141,23 @@ def pandas_udf(f=None, returnType=None, 
functionType=None):
 
>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
>>> from pyspark.sql.types import IntegerType, StringType
-   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
-   >>> @pandas_udf(StringType())
+   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # 
doctest: +SKIP
+   >>> @pandas_udf(StringType())  # doctest: +SKIP
... def to_upper(s):
... return s.str.upper()
...
-   >>> @pandas_udf("integer", PandasUDFType.SCALAR)
+   >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
... def add_one(x):
... return x + 1
...
-   >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", 
"name", "age"))
+   >>> df = spark.createDataFrame([(1, "John Doe", 21)],
+   ...("id", "name", "age"))  # doctest: 
+SKIP
>>> df.select(slen("name").alias("slen(name)"), to_upper("name"), 
add_one("age")) \\
... .show()  # doctest: +SKIP
+--+--++
|slen(name)|to_upper(name)|add_one(age)|
+--+--++
-   | 8|  JOHN DOE|  22|
+   | 8|  JOHN|  22|
--- End diff --

oops, done!


---




[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-20 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/20029
  
It seems that each time beeline connects to the Thrift server, `SessionState.start(state)` is called twice: once in `HiveSessionImpl.open`, and once in `HiveClientImpl.newSession()` for `sql("use default")`. When the beeline connection is closed, only the HiveSession is closed via `HiveSessionImpl.close()`, and the object created by `HiveClientImpl.newSession()` is left over.


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158211101
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2141,22 +2141,23 @@ def pandas_udf(f=None, returnType=None, 
functionType=None):
 
>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
>>> from pyspark.sql.types import IntegerType, StringType
-   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
-   >>> @pandas_udf(StringType())
+   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # 
doctest: +SKIP
+   >>> @pandas_udf(StringType())  # doctest: +SKIP
... def to_upper(s):
... return s.str.upper()
...
-   >>> @pandas_udf("integer", PandasUDFType.SCALAR)
+   >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
... def add_one(x):
... return x + 1
...
-   >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", 
"name", "age"))
+   >>> df = spark.createDataFrame([(1, "John Doe", 21)],
+   ...("id", "name", "age"))  # doctest: 
+SKIP
>>> df.select(slen("name").alias("slen(name)"), to_upper("name"), 
add_one("age")) \\
... .show()  # doctest: +SKIP
+--+--++
|slen(name)|to_upper(name)|add_one(age)|
+--+--++
-   | 8|  JOHN DOE|  22|
+   | 8|  JOHN|  22|
--- End diff --

nit: we should revert this too


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20043
  
**[Test build #85243 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85243/testReport)**
 for PR 20043 at commit 
[`81c9b6e`](https://github.com/apache/spark/commit/81c9b6e73ee64adcd8fc931d51f3faa98b892e0b).


---




[GitHub] spark pull request #20043: [SPARK-22856][SQL] Add wrappers for codegen outpu...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20043#discussion_r158210849
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -56,7 +56,36 @@ import org.apache.spark.util.{ParentClassLoader, Utils}
  * @param value A term for a (possibly primitive) value of the result of 
the evaluation. Not
  *  valid if `isNull` is set to `true`.
  */
-case class ExprCode(var code: String, var isNull: String, var value: 
String)
+case class ExprCode(var code: String, var isNull: ExprValue, var value: 
ExprValue)
+
+
+// An abstraction that represents the evaluation result of [[ExprCode]].
+abstract class ExprValue
+
+object ExprValue {
+  implicit def exprValueToString(exprValue: ExprValue): String = 
exprValue.toString
+}
+
+// A literal evaluation of [[ExprCode]].
+case class LiteralValue(val value: String) extends ExprValue {
+  override def toString: String = value
+}
+
+// A variable evaluation of [[ExprCode]].
+case class VariableValue(val variableName: String) extends ExprValue {
+  override def toString: String = variableName
+}
+
+// A statement evaluation of [[ExprCode]].
+case class StatementValue(val statement: String) extends ExprValue {
+  override def toString: String = statement
+}
+
+// A global variable evaluation of [[ExprCode]].
+case class GlobalValue(val value: String) extends ExprValue {
--- End diff --

It is considered a global variable for now, as it can be accessed globally and can't be passed as a parameter.
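
To make this concrete, a rough sketch of how the wrappers above might be classified (the names are made up, not from the patch, and the `ExprValue` definitions from this diff are assumed to be in scope):

```scala
// Each wrapper marks what kind of Java term the codegen string refers to.
val literal: ExprValue = LiteralValue("true")           // constant baked into the generated code
val local: ExprValue   = VariableValue("value_1")       // local variable of the generated method
val global: ExprValue  = GlobalValue("globalIsNull_0")  // class-level field, reachable from any method
// All of them still splice into code strings as plain text, via toString /
// the implicit exprValueToString conversion:
val check: String = s"boolean isNull = $global;"
```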


---




[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r158210754
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
 // |  4| val_4|  4| val_4|
 // |  5| val_5|  5| val_5|
 // ...
-// $example off:spark_hive$
 
+/*
+ * Save DataFrame to Hive Managed table as Parquet format
+ * 1. Create Hive Database / Schema with location at HDFS if you want 
to mentioned explicitly else default
+ * warehouse location will be used to store Hive table Data.
+ * Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+ * You don't have to explicitly give location for each table, every 
tables under specified schema will be located at
+ * location given while creating schema.
+ * 2. Create Hive Managed table with storage format as 'Parquet'
+ * Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+ */
+val hiveTableDF = sql("SELECT * FROM records").toDF()
+
hiveTableDF.write.mode(SaveMode.Overwrite).saveAsTable("database_name.records")
+
+/*
+ * Save DataFrame to Hive External table as compatible parquet format.
+ * 1. Create Hive External table with storage format as parquet.
+ * Ex: CREATE EXTERNAL TABLE records(key int, value string) STORED AS 
PARQUET;
+ * Since we are not explicitly providing hive database location, it 
automatically takes default warehouse location
+ * given to 'spark.sql.warehouse.dir' while creating SparkSession with 
enableHiveSupport().
+ * For example, we have given '/user/hive/warehouse/' as a Hive 
Warehouse location. It will create schema directories
+ * under '/user/hive/warehouse/' as 
'/user/hive/warehouse/database_name.db' and 
'/user/hive/warehouse/database_name'.
+ */
+
+// to make Hive parquet format compatible with spark parquet format
+spark.sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
+// Multiple parquet files could be created accordingly to volume of 
data under directory given.
+val hiveExternalTableLocation = 
s"/user/hive/warehouse/database_name.db/records"
+
hiveTableDF.write.mode(SaveMode.Overwrite).parquet(hiveExternalTableLocation)
+
+// turn on flag for Dynamic Partitioning
+spark.sqlContext.setConf("hive.exec.dynamic.partition", "true")
+spark.sqlContext.setConf("hive.exec.dynamic.partition.mode", 
"nonstrict")
+// You can create partitions in Hive table, so downstream queries run 
much faster.
+hiveTableDF.write.mode(SaveMode.Overwrite).partitionBy("key")
+  .parquet(hiveExternalTableLocation)
+/*
+If Data volume is very huge, then every partitions would have many 
small-small files which may harm
+downstream query performance due to File I/O, Bandwidth I/O, Network 
I/O, Disk I/O.
+To improve performance you can create single parquet file under each 
partition directory using 'repartition'
+on partitioned key for Hive table. When you add partition to table, 
there will be change in table DDL.
+Ex: CREATE TABLE records(value string) PARTITIONED BY(key int) STORED 
AS PARQUET;
+ */
+hiveTableDF.repartition($"key").write.mode(SaveMode.Overwrite)
+  .partitionBy("key").parquet(hiveExternalTableLocation)
+
+/*
+ You can also do coalesce to control number of files under each 
partitions, repartition does full shuffle and equal
+ data distribution to all partitions. here coalesce can reduce number 
of files to given 'Int' argument without
+ full data shuffle.
+ */
+// coalesce of 10 could create 10 parquet files under each partitions,
+// if data is huge and make sense to do partitioning.
+hiveTableDF.coalesce(10).write.mode(SaveMode.Overwrite)
--- End diff --

ditto


---




[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r158210714
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
 // |  4| val_4|  4| val_4|
 // |  5| val_5|  5| val_5|
 // ...
-// $example off:spark_hive$
 
+/*
+ * Save DataFrame to Hive Managed table as Parquet format
+ * 1. Create Hive Database / Schema with location at HDFS if you want 
to mentioned explicitly else default
+ * warehouse location will be used to store Hive table Data.
+ * Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+ * You don't have to explicitly give location for each table, every 
tables under specified schema will be located at
+ * location given while creating schema.
+ * 2. Create Hive Managed table with storage format as 'Parquet'
+ * Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+ */
+val hiveTableDF = sql("SELECT * FROM records").toDF()
+
hiveTableDF.write.mode(SaveMode.Overwrite).saveAsTable("database_name.records")
+
+/*
+ * Save DataFrame to Hive External table as compatible parquet format.
+ * 1. Create Hive External table with storage format as parquet.
+ * Ex: CREATE EXTERNAL TABLE records(key int, value string) STORED AS 
PARQUET;
+ * Since we are not explicitly providing hive database location, it 
automatically takes default warehouse location
+ * given to 'spark.sql.warehouse.dir' while creating SparkSession with 
enableHiveSupport().
+ * For example, we have given '/user/hive/warehouse/' as a Hive 
Warehouse location. It will create schema directories
+ * under '/user/hive/warehouse/' as 
'/user/hive/warehouse/database_name.db' and 
'/user/hive/warehouse/database_name'.
+ */
+
+// to make Hive parquet format compatible with spark parquet format
+spark.sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
+// Multiple parquet files could be created accordingly to volume of 
data under directory given.
+val hiveExternalTableLocation = 
s"/user/hive/warehouse/database_name.db/records"
+
hiveTableDF.write.mode(SaveMode.Overwrite).parquet(hiveExternalTableLocation)
+
+// turn on flag for Dynamic Partitioning
+spark.sqlContext.setConf("hive.exec.dynamic.partition", "true")
+spark.sqlContext.setConf("hive.exec.dynamic.partition.mode", 
"nonstrict")
+// You can create partitions in Hive table, so downstream queries run 
much faster.
+hiveTableDF.write.mode(SaveMode.Overwrite).partitionBy("key")
+  .parquet(hiveExternalTableLocation)
+/*
+If Data volume is very huge, then every partitions would have many 
small-small files which may harm
+downstream query performance due to File I/O, Bandwidth I/O, Network 
I/O, Disk I/O.
+To improve performance you can create single parquet file under each 
partition directory using 'repartition'
+on partitioned key for Hive table. When you add partition to table, 
there will be change in table DDL.
+Ex: CREATE TABLE records(value string) PARTITIONED BY(key int) STORED 
AS PARQUET;
+ */
+hiveTableDF.repartition($"key").write.mode(SaveMode.Overwrite)
--- End diff --

This is not standard usage; let's not put it in the example.


---




[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r158210666
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
 // |  4| val_4|  4| val_4|
 // |  5| val_5|  5| val_5|
 // ...
-// $example off:spark_hive$
 
+/*
+ * Save DataFrame to Hive Managed table as Parquet format
+ * 1. Create Hive Database / Schema with location at HDFS if you want 
to mentioned explicitly else default
+ * warehouse location will be used to store Hive table Data.
+ * Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+ * You don't have to explicitly give location for each table, every 
tables under specified schema will be located at
+ * location given while creating schema.
+ * 2. Create Hive Managed table with storage format as 'Parquet'
+ * Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+ */
+val hiveTableDF = sql("SELECT * FROM records").toDF()
+
hiveTableDF.write.mode(SaveMode.Overwrite).saveAsTable("database_name.records")
+
+/*
+ * Save DataFrame to Hive External table as compatible parquet format.
+ * 1. Create Hive External table with storage format as parquet.
+ * Ex: CREATE EXTERNAL TABLE records(key int, value string) STORED AS 
PARQUET;
--- End diff --

It's weird to create an external table without a location. Users may be confused about the difference between a managed table and an external table.
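
For comparison, a hedged sketch of what an explicit location could look like in the example (the table name and path are illustrative, assuming a `SparkSession` with Hive support enabled):

```scala
// An external table with an explicit LOCATION makes the managed vs. external
// distinction visible to readers of the example; the HDFS path is only an example.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS records_ext(key INT, value STRING)
  STORED AS PARQUET
  LOCATION '/user/hive/warehouse/records_ext'
""")
```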


---




[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r158210425
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
 // |  4| val_4|  4| val_4|
 // |  5| val_5|  5| val_5|
 // ...
-// $example off:spark_hive$
 
+/*
+ * Save DataFrame to Hive Managed table as Parquet format
+ * 1. Create Hive Database / Schema with location at HDFS if you want 
to mentioned explicitly else default
+ * warehouse location will be used to store Hive table Data.
+ * Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+ * You don't have to explicitly give location for each table, every 
tables under specified schema will be located at
+ * location given while creating schema.
+ * 2. Create Hive Managed table with storage format as 'Parquet'
+ * Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+ */
+val hiveTableDF = sql("SELECT * FROM records").toDF()
--- End diff --

actually, I think `spark.table("records")` is a better example.
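
The suggestion amounts to something like this (a sketch only, assuming the example's existing `spark` session and imports):

```scala
// Read the Hive table directly; no SQL string and no .toDF() needed.
val hiveTableDF = spark.table("records")
hiveTableDF.write.mode(SaveMode.Overwrite).saveAsTable("database_name.records")
```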


---




[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r158210374
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
 // |  4| val_4|  4| val_4|
 // |  5| val_5|  5| val_5|
 // ...
-// $example off:spark_hive$
 
+/*
+ * Save DataFrame to Hive Managed table as Parquet format
+ * 1. Create Hive Database / Schema with location at HDFS if you want 
to mentioned explicitly else default
+ * warehouse location will be used to store Hive table Data.
+ * Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+ * You don't have to explicitly give location for each table, every 
tables under specified schema will be located at
+ * location given while creating schema.
+ * 2. Create Hive Managed table with storage format as 'Parquet'
+ * Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+ */
+val hiveTableDF = sql("SELECT * FROM records").toDF()
--- End diff --

`.toDF` is not needed


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/19884
  
I used a workaround for timestamp casts that allows the tests to pass for 
me locally, and left a note to look into the root cause later.  Hopefully this 
should pass now and we will be good to merge.


---




[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r158210132
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
 // |  4| val_4|  4| val_4|
 // |  5| val_5|  5| val_5|
 // ...
-// $example off:spark_hive$
 
+/*
--- End diff --

+1


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20043
  
**[Test build #85241 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85241/testReport)**
 for PR 20043 at commit 
[`d120750`](https://github.com/apache/spark/commit/d120750ff61bb066e7ceb628f3356fa37af462f5).


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19884
  
**[Test build #85242 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85242/testReport)**
 for PR 19884 at commit 
[`ae84c84`](https://github.com/apache/spark/commit/ae84c8454875906e488b895e18ad78ddf6e9fbc9).


---




[GitHub] spark pull request #20043: [SPARK-22856][SQL] Add wrappers for codegen outpu...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20043#discussion_r158209659
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -56,7 +56,36 @@ import org.apache.spark.util.{ParentClassLoader, Utils}
  * @param value A term for a (possibly primitive) value of the result of 
the evaluation. Not
  *  valid if `isNull` is set to `true`.
  */
-case class ExprCode(var code: String, var isNull: String, var value: 
String)
+case class ExprCode(var code: String, var isNull: ExprValue, var value: 
ExprValue)
+
+
+// An abstraction that represents the evaluation result of [[ExprCode]].
+abstract class ExprValue
+
+object ExprValue {
+  implicit def exprValueToString(exprValue: ExprValue): String = 
exprValue.toString
+}
+
+// A literal evaluation of [[ExprCode]].
+case class LiteralValue(val value: String) extends ExprValue {
+  override def toString: String = value
+}
+
+// A variable evaluation of [[ExprCode]].
+case class VariableValue(val variableName: String) extends ExprValue {
+  override def toString: String = variableName
+}
+
+// A statement evaluation of [[ExprCode]].
+case class StatementValue(val statement: String) extends ExprValue {
+  override def toString: String = statement
+}
+
+// A global variable evaluation of [[ExprCode]].
+case class GlobalValue(val value: String) extends ExprValue {
--- End diff --

For compacted global variables, we may get something like `arr[1]`, where `arr` is a global variable. Is `arr[1]` a statement or a global variable?
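
To make the question concrete, a small sketch using the wrappers from this diff (the array name is hypothetical):

```scala
// With compacted state, the "global" slot is an element of a class-level array,
// e.g. a generated field like: private Object[] mutableStateArray = new Object[n];
// Its codegen reference could plausibly be wrapped either way:
val asGlobal: ExprValue    = GlobalValue("mutableStateArray[1]")     // reachable from anywhere in the class
val asStatement: ExprValue = StatementValue("mutableStateArray[1]")  // syntactically an expression, not a bare identifier
```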


---




[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20008
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85233/
Test FAILed.


---




[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20008
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20008
  
**[Test build #85233 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85233/testReport)**
 for PR 20008 at commit 
[`ec07bc2`](https://github.com/apache/spark/commit/ec07bc2a463b089dd5798ab9e6bf8aea1b8ccd28).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20021
  
**[Test build #85240 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85240/testReport)**
 for PR 20021 at commit 
[`900f246`](https://github.com/apache/spark/commit/900f246579f86f187b61454df6b0f76c8d5052c8).


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20043
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85234/
Test FAILed.


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20043
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20043
  
**[Test build #85234 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85234/testReport)**
 for PR 20043 at commit 
[`5ace8b8`](https://github.com/apache/spark/commit/5ace8b83b7c90cd5a6a451812ac9c1087aaa1c29).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ExprCode(var code: String, var isNull: ExprValue, var 
value: ExprValue)`
  * `case class LiteralValue(val value: String) extends ExprValue `
  * `case class VariableValue(val variableName: String) extends ExprValue `
  * `case class StatementValue(val statement: String) extends ExprValue `
  * `case class GlobalValue(val value: String) extends ExprValue `
  * `case class SubExprEliminationState(isNull: ExprValue, value: 
ExprValue)`


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158208592
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2141,22 +2141,22 @@ def pandas_udf(f=None, returnType=None, 
functionType=None):
 
>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
>>> from pyspark.sql.types import IntegerType, StringType
-   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
-   >>> @pandas_udf(StringType())
+   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # 
doctest: +SKIP
+   >>> @pandas_udf(StringType())  # doctest: +SKIP
... def to_upper(s):
... return s.str.upper()
...
-   >>> @pandas_udf("integer", PandasUDFType.SCALAR)
+   >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
... def add_one(x):
... return x + 1
...
-   >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", 
"name", "age"))
+   >>> df = spark.createDataFrame([(1, "John", 21)], ("id", "name", 
"age"))  # doctest: +SKIP
--- End diff --

The name change shouldn't have been committed; I'll change it back. I don't think we can make the doctests conditional on whether pandas/pyarrow is installed, so unless we make these required dependencies and have them installed on all the workers, we need to skip them.


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20043
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85232/
Test FAILed.


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20043
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20043
  
**[Test build #85232 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85232/testReport)**
 for PR 20043 at commit 
[`d5c986a`](https://github.com/apache/spark/commit/d5c986a1cab410c4eb64a72119346875d7607be6).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ExprCode(var code: String, var isNull: ExprValue, var 
value: ExprValue)`
  * `case class LiteralValue(val value: String) extends ExprValue `
  * `case class VariableValue(val variableName: String) extends ExprValue `
  * `case class StatementValue(val statement: String) extends ExprValue `
  * `case class GlobalValue(val value: String) extends ExprValue `
  * `case class SubExprEliminationState(isNull: ExprValue, value: 
ExprValue)`


---




[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...

2017-12-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20023
  
I think we should be careful about suddenly changing behavior.


---




[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158205151
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
 ---
@@ -243,17 +248,43 @@ object DecimalPrecision extends TypeCoercionRule {
 // Promote integers inside a binary expression with fixed-precision 
decimals to decimals,
 // and fixed-precision decimals in an expression with floats / doubles 
to doubles
 case b @ BinaryOperator(left, right) if left.dataType != 
right.dataType =>
-  (left.dataType, right.dataType) match {
-case (t: IntegralType, DecimalType.Fixed(p, s)) =>
-  b.makeCopy(Array(Cast(left, DecimalType.forType(t)), right))
-case (DecimalType.Fixed(p, s), t: IntegralType) =>
-  b.makeCopy(Array(left, Cast(right, DecimalType.forType(t
-case (t, DecimalType.Fixed(p, s)) if isFloat(t) =>
-  b.makeCopy(Array(left, Cast(right, DoubleType)))
-case (DecimalType.Fixed(p, s), t) if isFloat(t) =>
-  b.makeCopy(Array(Cast(left, DoubleType), right))
-case _ =>
-  b
-  }
+  nondecimalLiteralAndDecimal(b).lift((left, right)).getOrElse(
+nondecimalNonliteralAndDecimal(b).applyOrElse((left.dataType, 
right.dataType),
+  (_: (DataType, DataType)) => b))
  }
+
+  /**
+   * Type coercion for BinaryOperator in which one side is a non-decimal 
literal numeric, and the
+   * other side is a decimal.
+   */
+  private def nondecimalLiteralAndDecimal(
--- End diff --

Is this rule newly introduced?


---




[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158206693
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
     case DoubleType => DoubleDecimal
   }
 
+  private[sql] def forLiteral(literal: Literal): DecimalType = 
literal.value match {
+case v: Short => fromBigDecimal(BigDecimal(v))
+case v: Int => fromBigDecimal(BigDecimal(v))
+case v: Long => fromBigDecimal(BigDecimal(v))
+case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
     DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }
 
+  // scalastyle:off line.size.limit
+  /**
+   * Decimal implementation is based on Hive's one, which is itself 
inspired to SQLServer's one.
+   * In particular, when a result precision is greater than {@link 
#MAX_PRECISION}, the
+   * corresponding scale is reduced to prevent the integral part of a 
result from being truncated.
+   *
+   * For further reference, please see
+   * 
https://blogs.msdn.microsoft.com/sqlprogrammability/2006/03/29/multiplication-and-division-with-numerics/.
+   *
+   * @param precision
+   * @param scale
+   * @return
+   */
+  // scalastyle:on line.size.limit
+  private[sql] def adjustPrecisionScale(precision: Int, scale: Int): 
DecimalType = {
+// Assumptions:
+// precision >= scale
+// scale >= 0
+if (precision <= MAX_PRECISION) {
+  // Adjustment only needed when we exceed max precision
+  DecimalType(precision, scale)
+} else {
+  // Precision/scale exceed maximum precision. Result must be adjusted 
to MAX_PRECISION.
+  val intDigits = precision - scale
+  // If original scale less than MINIMUM_ADJUSTED_SCALE, use original 
scale value; otherwise
+  // preserve at least MINIMUM_ADJUSTED_SCALE fractional digits
+  val minScaleValue = Math.min(scale, MINIMUM_ADJUSTED_SCALE)
+  val adjustedScale = Math.max(MAX_PRECISION - intDigits, 
minScaleValue)
--- End diff --

Sounds like `Math.min`?
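
For reference, a quick numeric trace of the code as quoted above, assuming MAX_PRECISION = 38 and MINIMUM_ADJUSTED_SCALE = 6 (whether `max` or `min` is intended is exactly the question):

```scala
// Example: a result type of decimal(40, 20) has to be fitted into MAX_PRECISION = 38.
val precision = 40
val scale = 20
val intDigits = precision - scale                            // 20 integral digits to preserve
val minScaleValue = Math.min(scale, 6)                       // 6: keep at least 6 fractional digits
val adjustedScale = Math.max(38 - intDigits, minScaleValue)  // max(18, 6) = 18, as written in the diff
// => DecimalType(38, 18): the scale shrinks from 20 to 18 rather than truncating the integral part.
```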


---




[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158207539
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
     case DoubleType => DoubleDecimal
   }
 
+  private[sql] def forLiteral(literal: Literal): DecimalType = 
literal.value match {
+case v: Short => fromBigDecimal(BigDecimal(v))
+case v: Int => fromBigDecimal(BigDecimal(v))
+case v: Long => fromBigDecimal(BigDecimal(v))
+case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
     DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }
 
+  // scalastyle:off line.size.limit
+  /**
+   * Decimal implementation is based on Hive's one, which is itself 
inspired to SQLServer's one.
+   * In particular, when a result precision is greater than {@link 
#MAX_PRECISION}, the
+   * corresponding scale is reduced to prevent the integral part of a 
result from being truncated.
+   *
+   * For further reference, please see
+   * 
https://blogs.msdn.microsoft.com/sqlprogrammability/2006/03/29/multiplication-and-division-with-numerics/.
--- End diff --

Not sure if this blog link will stay available for a long time.


---




[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158205829
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
     case DoubleType => DoubleDecimal
   }
 
+  private[sql] def forLiteral(literal: Literal): DecimalType = 
literal.value match {
+case v: Short => fromBigDecimal(BigDecimal(v))
+case v: Int => fromBigDecimal(BigDecimal(v))
+case v: Long => fromBigDecimal(BigDecimal(v))
+case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
     DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }
 
+  // scalastyle:off line.size.limit
+  /**
+   * Decimal implementation is based on Hive's one, which is itself 
inspired to SQLServer's one.
+   * In particular, when a result precision is greater than {@link 
#MAX_PRECISION}, the
+   * corresponding scale is reduced to prevent the integral part of a 
result from being truncated.
+   *
+   * For further reference, please see
+   * 
https://blogs.msdn.microsoft.com/sqlprogrammability/2006/03/29/multiplication-and-division-with-numerics/.
+   *
+   * @param precision
+   * @param scale
+   * @return
+   */
+  // scalastyle:on line.size.limit
+  private[sql] def adjustPrecisionScale(precision: Int, scale: Int): 
DecimalType = {
+// Assumptions:
+// precision >= scale
+// scale >= 0
+if (precision <= MAX_PRECISION) {
+  // Adjustment only needed when we exceed max precision
+  DecimalType(precision, scale)
--- End diff --

Shouldn't we also prevent `scale` > `MAX_SCALE`?


---




[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158205620
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
     case DoubleType => DoubleDecimal
   }
 
+  private[sql] def forLiteral(literal: Literal): DecimalType = 
literal.value match {
--- End diff --

Is this different from `forType` applied to `Literal.dataType`?
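
A small illustration of the difference, based only on the code quoted above (a sketch, assuming the relevant types and `Literal` are in scope; the literal values are arbitrary):

```scala
// forType looks only at the data type, so every Int gets the widest Int decimal:
val byType = DecimalType.forType(IntegerType)          // DecimalType(10, 0)
// forLiteral looks at the value, so small literals get a much tighter type:
val small  = DecimalType.forLiteral(Literal(5))        // fromBigDecimal(BigDecimal(5))      => DecimalType(1, 0)
val bigger = DecimalType.forLiteral(Literal(123456L))  // fromBigDecimal(BigDecimal(123456)) => DecimalType(6, 0)
```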


---




[GitHub] spark pull request #20035: [SPARK-22848][SQL] Eliminate mutable state from S...

2017-12-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20035


---




[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158206388
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
     case DoubleType => DoubleDecimal
   }
 
+  private[sql] def forLiteral(literal: Literal): DecimalType = 
literal.value match {
+case v: Short => fromBigDecimal(BigDecimal(v))
+case v: Int => fromBigDecimal(BigDecimal(v))
+case v: Long => fromBigDecimal(BigDecimal(v))
+case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
     DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }
 
+  // scalastyle:off line.size.limit
+  /**
+   * Decimal implementation is based on Hive's one, which is itself 
inspired to SQLServer's one.
+   * In particular, when a result precision is greater than {@link 
#MAX_PRECISION}, the
+   * corresponding scale is reduced to prevent the integral part of a 
result from being truncated.
+   *
+   * For further reference, please see
+   * 
https://blogs.msdn.microsoft.com/sqlprogrammability/2006/03/29/multiplication-and-division-with-numerics/.
+   *
+   * @param precision
+   * @param scale
+   * @return
+   */
+  // scalastyle:on line.size.limit
+  private[sql] def adjustPrecisionScale(precision: Int, scale: Int): 
DecimalType = {
+// Assumptions:
+// precision >= scale
+// scale >= 0
+if (precision <= MAX_PRECISION) {
+  // Adjustment only needed when we exceed max precision
+  DecimalType(precision, scale)
+} else {
+  // Precision/scale exceed maximum precision. Result must be adjusted 
to MAX_PRECISION.
+  val intDigits = precision - scale
+  // If original scale less than MINIMUM_ADJUSTED_SCALE, use original 
scale value; otherwise
+  // preserve at least MINIMUM_ADJUSTED_SCALE fractional digits
+  val minScaleValue = Math.min(scale, MINIMUM_ADJUSTED_SCALE)
--- End diff --

Sounds like `MAXIMUM_ADJUSTED_SCALE` instead of `MINIMUM_ADJUSTED_SCALE`.


---




[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158205387
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
     case DoubleType => DoubleDecimal
   }
 
+  private[sql] def forLiteral(literal: Literal): DecimalType = 
literal.value match {
+case v: Short => fromBigDecimal(BigDecimal(v))
--- End diff --

Can't we just use `ShortDecimal`, `IntDecimal`...?


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20043
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85231/
Test FAILed.


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20043
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20043
  
**[Test build #85231 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85231/testReport)**
 for PR 20043 at commit 
[`78680cc`](https://github.com/apache/spark/commit/78680cc60c27d645c349451090bfa056380e89f6).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ExprCode(var code: String, var isNull: ExprValue, var 
value: ExprValue)`
  * `case class LiteralValue(var value: String) extends ExprValue `
  * `case class VariableValue(var variableName: String) extends ExprValue `
  * `case class StatementValue(var statement: String) extends ExprValue `
  * `case class GlobalValue(var value: String) extends ExprValue `
  * `case class SubExprEliminationState(isNull: ExprValue, value: 
ExprValue)`


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19884
  
LGTM. I'm also fine with ignoring some tests if they are hard to fix, to unblock other PRs sooner.


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158206546
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -110,3 +110,12 @@ def toJArray(gateway, jtype, arr):
 for i in range(0, len(arr)):
 jarr[i] = arr[i]
 return jarr
+
+
+def _require_minimum_pyarrow_version():
--- End diff --

No. I just checked if `ImportError` occurred or not. We should do the same 
thing for pandas later.


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20043
  
cc @kiszk @cloud-fan 


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158206309
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -110,3 +110,12 @@ def toJArray(gateway, jtype, arr):
 for i in range(0, len(arr)):
 jarr[i] = arr[i]
 return jarr
+
+
+def _require_minimum_pyarrow_version():
--- End diff --

@ueshin did we do the same thing for pandas?


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158206051
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2141,22 +2141,22 @@ def pandas_udf(f=None, returnType=None, 
functionType=None):
 
>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
>>> from pyspark.sql.types import IntegerType, StringType
-   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
-   >>> @pandas_udf(StringType())
+   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # 
doctest: +SKIP
+   >>> @pandas_udf(StringType())  # doctest: +SKIP
... def to_upper(s):
... return s.str.upper()
...
-   >>> @pandas_udf("integer", PandasUDFType.SCALAR)
+   >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
... def add_one(x):
... return x + 1
...
-   >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", 
"name", "age"))
+   >>> df = spark.createDataFrame([(1, "John", 21)], ("id", "name", 
"age"))  # doctest: +SKIP
--- End diff --

Why change `John Doe` to `John`? And are we going to re-enable these doctests later?


---




[GitHub] spark pull request #19946: [SPARK-22648] [K8S] Spark on Kubernetes - Documen...

2017-12-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r158205893
  
--- Diff: docs/building-spark.md ---
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by 
the
 to be runnable, use `./dev/make-distribution.sh` in the project root 
directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:
 
-./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
--- End diff --

Yeah, I don't think you need to block this PR on this.



---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158205751
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3356,6 +3356,7 @@ def test_schema_conversion_roundtrip(self):
 self.assertEquals(self.schema, schema_rt)
 
 
+@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not 
installed")
--- End diff --

Sorry for the delay. Yeah, we should check the minimum pyarrow version.


---




[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20035
  
Yeah, it's failing globally. I'm merging this PR, thanks!


---




[GitHub] spark issue #20042: [SPARK-22855][BUILD] Add -no-java-comments to sbt docs/s...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20042
  
**[Test build #4021 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4021/testReport)**
 for PR 20042 at commit 
[`9f3407a`](https://github.com/apache/spark/commit/9f3407a2fd75e0577c46e5e9b5ca3f4edeee2bf6).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20041: [SPARK-22042] [FOLLOW-UP] [SQL] ReorderJoinPredicates ca...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20041
  
**[Test build #85239 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85239/testReport)**
 for PR 20041 at commit 
[`8c37d46`](https://github.com/apache/spark/commit/8c37d467ac364fc6ee0c20d2d0d2736d70bf214d).


---




[GitHub] spark issue #20041: [SPARK-22042] [FOLLOW-UP] [SQL] ReorderJoinPredicates ca...

2017-12-20 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/20041
  
Jenkins retest this please


---




[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20035
  
I think the test failure is not related to this change, but to the ongoing work to upgrade pyarrow.


---




[GitHub] spark issue #20041: [SPARK-22042] [FOLLOW-UP] [SQL] ReorderJoinPredicates ca...

2017-12-20 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/20041
  
I checked the test case failure, but I don't think it's related to this PR.

```
org.apache.spark.sql.execution.datasources.parquet.ParquetQuerySuite.(It is 
not a test it is a sbt.testing.SuiteSelector)
org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed 
to eventually never returned normally. Attempted 651 times over 10.008601144 
seconds. Last failure message: There are 1 possibly leaked file streams..
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [K8S] Spark on Kubernetes - Documen...

2017-12-20 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r158202645
  
--- Diff: docs/running-on-yarn.md ---
@@ -18,7 +18,8 @@ Spark application's configuration (driver, executors, and 
the AM when running in
 
 There are two deploy modes that can be used to launch Spark applications 
on YARN. In `cluster` mode, the Spark driver runs inside an application master 
process which is managed by YARN on the cluster, and the client can go away 
after initiating the application. In `client` mode, the driver runs in the 
client process, and the application master is only used for requesting 
resources from YARN.
 
-Unlike [Spark standalone](spark-standalone.html) and 
[Mesos](running-on-mesos.html) modes, in which the master's address is 
specified in the `--master` parameter, in YARN mode the ResourceManager's 
address is picked up from the Hadoop configuration. Thus, the `--master` 
parameter is `yarn`.
+Unlike other cluster managers supported by Spark
+in which the master's address is specified in the `--master` parameter, in 
YARN mode the ResourceManager's address is picked up from the Hadoop 
configuration. Thus, the `--master` parameter is `yarn`.
--- End diff --

nit: why start a new line here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20020#discussion_r158203446
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
 ---
@@ -20,30 +20,32 @@ package org.apache.spark.sql.execution.command
 import org.apache.hadoop.conf.Configuration
 
 import org.apache.spark.SparkContext
-import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.{AnalysisBarrier, 
Command, LogicalPlan}
+import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker
+import org.apache.spark.sql.execution.datasources.FileFormatWriter
 import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
 import org.apache.spark.util.SerializableConfiguration
 
-
 /**
  * A special `RunnableCommand` which writes data out and updates metrics.
  */
-trait DataWritingCommand extends RunnableCommand {
-
+trait DataWritingCommand extends Command {
   /**
* The input query plan that produces the data to be written.
+   * IMPORTANT: the input query plan MUST be analyzed, so that we can 
carry its output columns
+   *to [[FileFormatWriter]].
*/
   def query: LogicalPlan
 
-  // We make the input `query` an inner child instead of a child in order 
to hide it from the
-  // optimizer. This is because optimizer may not preserve the output 
schema names' case, and we
-  // have to keep the original analyzed plan here so that we can pass the 
corrected schema to the
-  // writer. The schema of analyzed plan is what user expects(or 
specifies), so we should respect
-  // it when writing.
-  override protected def innerChildren: Seq[LogicalPlan] = query :: Nil
+  override def children: Seq[LogicalPlan] = query :: Nil
--- End diff --

Ah, right. We should add the barrier when passing in the query.
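
As a minimal sketch of that idea (assuming the existing catalyst 
`LogicalPlan`/`LeafNode`/`Attribute` types; the class name and the command 
constructor shown are illustrative, not Spark's actual `AnalysisBarrier` 
implementation):

```
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan}

// Simplified stand-in for an analysis barrier: it exposes the child's output
// but reports no children, so analyzer rules never descend into the already
// analyzed `query`. Only illustrative; the real class is catalyst's
// AnalysisBarrier.
case class BarrierSketch(child: LogicalPlan) extends LeafNode {
  override def output: Seq[Attribute] = child.output
}

// The writing command would then be constructed with the wrapped plan, e.g.
//   SomeWriteCommand(query = BarrierSketch(analyzedPlan), ...)
// so its `children` point at the barrier rather than the raw analyzed plan.
```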


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19981
  
**[Test build #85238 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85238/testReport)**
 for PR 19981 at commit 
[`94ee3c7`](https://github.com/apache/spark/commit/94ee3c7ddbe776783dcb880d9daa05f598a418a9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19981
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-20 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/19954
  
I'll finish reading this by Friday, thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...

2017-12-20 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/19591#discussion_r158198274
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/InProcessAppHandle.java ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.util.concurrent.ThreadFactory;
+import java.util.logging.Level;
+import java.util.logging.Logger;
+
+class InProcessAppHandle extends AbstractAppHandle {
+
+  private static final Logger LOG = 
Logger.getLogger(ChildProcAppHandle.class.getName());
+  private static final ThreadFactory THREAD_FACTORY = new 
NamedThreadFactory("spark-app-%d");
--- End diff --

How about the thread name including `builder.appName`? It might be useful if 
you are trying to monitor a bunch of threads.

(Though you'd also be launching in cluster mode in that case, so the thread 
wouldn't be doing much...)
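
For illustration, a hypothetical sketch of that naming scheme (the `appName` 
parameter and the thread-name pattern are assumptions, not the launcher's 
actual API):

```
import java.util.concurrent.ThreadFactory
import java.util.concurrent.atomic.AtomicLong

// Hypothetical factory that bakes the application name into the thread name,
// so several in-process apps are easy to tell apart in a thread dump.
class AppNamedThreadFactory(appName: String) extends ThreadFactory {
  private val counter = new AtomicLong(0)
  override def newThread(r: Runnable): Thread = {
    val t = new Thread(r, s"spark-app-$appName-${counter.incrementAndGet()}")
    t.setDaemon(true)
    t
  }
}
```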


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...

2017-12-20 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/19591#discussion_r158201917
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/package-info.java ---
@@ -49,6 +49,15 @@
  * 
  *
  * 
+ * Applications can also be launched in-process by using
+ * {@link org.apache.spark.launcher.InProcessLauncher} instead. Launching 
applications in-process
--- End diff --

The comment above about "there is only one entry point" is a bit incorrect now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...

2017-12-20 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/19591#discussion_r158201527
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java ---
@@ -260,24 +264,30 @@ private long getConnectionTimeout() {
   }
 
   private String createSecret() {
-byte[] secret = new byte[128];
-RND.nextBytes(secret);
-
-StringBuilder sb = new StringBuilder();
-for (byte b : secret) {
-  int ival = b >= 0 ? b : Byte.MAX_VALUE - b;
-  if (ival < 0x10) {
-sb.append("0");
+while (true) {
--- End diff --

Checking my understanding -- is there an extra bug fix here? Even in the old 
code, if by chance two apps had the same secret, you'd end up losing one handle?
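
To make the understanding concrete, a rough sketch of the retry idea under 
discussion (the registration map, secret length, and hex encoding are 
assumptions for illustration only, not the launcher's actual code):

```
import java.security.SecureRandom
import java.util.concurrent.ConcurrentHashMap

object SecretSketch {
  private val rnd = new SecureRandom()
  // Pending handles keyed by secret; mirrors the idea of a registration map.
  private val pending = new ConcurrentHashMap[String, AnyRef]()

  // Keep generating secrets until one is not already registered, so two
  // handles can never end up sharing (and one of them losing) the same key.
  def createSecret(handle: AnyRef): String = {
    while (true) {
      val bytes = new Array[Byte](128)
      rnd.nextBytes(bytes)
      val secret = bytes.map(b => f"${b & 0xff}%02x").mkString
      if (pending.putIfAbsent(secret, handle) == null) {
        return secret
      }
    }
    throw new IllegalStateException("unreachable")
  }
}
```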


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...

2017-12-20 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/19591#discussion_r158200240
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/InProcessAppHandle.java ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.util.concurrent.ThreadFactory;
+import java.util.logging.Level;
+import java.util.logging.Logger;
+
+class InProcessAppHandle extends AbstractAppHandle {
+
+  private static final Logger LOG = 
Logger.getLogger(ChildProcAppHandle.class.getName());
+  private static final ThreadFactory THREAD_FACTORY = new 
NamedThreadFactory("spark-app-%d");
+
+  private Thread app;
+
+  InProcessAppHandle(LauncherServer server) {
+super(server);
+  }
+
+  @Override
+  public synchronized void kill() {
+LOG.warning("kill() may leave the underlying app running in in-process 
mode.");
+disconnect();
+
+// Interrupt the thread. This is not guaranteed to kill the app, 
though.
+if (app != null) {
+  app.interrupt();
+}
+
+setState(State.KILLED);
+  }
+
+  synchronized void start(Method main, String[] args) {
+CommandBuilderUtils.checkState(app == null, "Handle already started.");
+app = THREAD_FACTORY.newThread(() -> {
+  try {
+main.invoke(null, (Object) args);
+  } catch (Throwable t) {
+LOG.log(Level.WARNING, "Application failed with exception.", t);
+setState(State.FAILED);
+  }
+
+  synchronized (InProcessAppHandle.this) {
+if (!isDisposed()) {
+  State currState = getState();
+  disconnect();
--- End diff --

Same comment here on the ordering of `getState()` and `disconnect()`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...

2017-12-20 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/19591#discussion_r158200059
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
@@ -171,10 +90,11 @@ void monitorChild() {
 }
 
 synchronized (this) {
-  if (disposed) {
+  if (isDisposed()) {
 return;
   }
 
+  State currState = getState();
--- End diff --

This was added just because `currState` is no longer accessible, right? You're 
not particularly trying to grab the state before the call to `disconnect()`? 
It might be clearer to move it after; otherwise, on first read it looks like 
it is trying to grab the state before it's modified by `disconnect()`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...

2017-12-20 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/19591#discussion_r158200163
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/InProcessAppHandle.java ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.util.concurrent.ThreadFactory;
+import java.util.logging.Level;
+import java.util.logging.Logger;
+
+class InProcessAppHandle extends AbstractAppHandle {
+
+  private static final Logger LOG = 
Logger.getLogger(ChildProcAppHandle.class.getName());
+  private static final ThreadFactory THREAD_FACTORY = new 
NamedThreadFactory("spark-app-%d");
+
+  private Thread app;
+
+  InProcessAppHandle(LauncherServer server) {
+super(server);
+  }
+
+  @Override
+  public synchronized void kill() {
+LOG.warning("kill() may leave the underlying app running in in-process 
mode.");
+disconnect();
+
+// Interrupt the thread. This is not guaranteed to kill the app, 
though.
--- End diff --

In cluster mode, shouldn't this be pretty safe? If not, wouldn't it be a 
bug in spark-submit?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...

2017-12-20 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/19591#discussion_r158201743
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java ---
@@ -210,8 +209,13 @@ int getPort() {
* Removes the client handle from the pending list (in case it's still 
there), and unrefs
* the server.
*/
-  void unregister(ChildProcAppHandle handle) {
-pending.remove(handle.getSecret());
+  void unregister(AbstractAppHandle handle) {
+for (Map.Entry e : pending.entrySet()) {
--- End diff --

You could add a `System.identityHashCode`-based "secret" to the 
`InProcessAppHandle` to keep the old version, though this is fine too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames

2017-12-20 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20019
  
Also cc @gatorsmile, who I believe would be interested in taking a look at this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20019: [SPARK-22361][SQL][TEST] Add unit test for Window...

2017-12-20 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20019#discussion_r158200860
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala 
---
@@ -0,0 +1,381 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.sql.{Date, Timestamp}
+
+import org.apache.spark.sql.expressions.Window
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.unsafe.types.CalendarInterval
+
+/**
+ * Window frame testing for DataFrame API.
+ */
+class DataFrameWindowFramesSuite extends QueryTest with SharedSQLContext {
+  import testImplicits._
+
+  test("lead/lag with empty data frame") {
+val df = Seq.empty[(Int, String)].toDF("key", "value")
+val window = Window.partitionBy($"key").orderBy($"value")
+
+checkAnswer(
+  df.select(
+lead("value", 1).over(window),
+lag("value", 1).over(window)),
+  Nil)
+  }
+
+  test("lead/lag with positive offset") {
+val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", 
"value")
+val window = Window.partitionBy($"key").orderBy($"value")
+
+checkAnswer(
+  df.select(
+$"key",
+lead("value", 1).over(window),
+lag("value", 1).over(window)),
+  Row(1, "3", null) :: Row(1, null, "1") :: Row(2, "4", null) :: 
Row(2, null, "2") :: Nil)
+  }
+
+  test("reverse lead/lag with positive offset") {
+val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", 
"value")
+val window = Window.partitionBy($"key").orderBy($"value".desc)
+
+checkAnswer(
+  df.select(
+$"key",
+lead("value", 1).over(window),
+lag("value", 1).over(window)),
+  Row(1, "1", null) :: Row(1, null, "3") :: Row(2, "2", null) :: 
Row(2, null, "4") :: Nil)
+  }
+
+  test("lead/lag with negative offset") {
+val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", 
"value")
+val window = Window.partitionBy($"key").orderBy($"value")
+
+checkAnswer(
+  df.select(
+$"key",
+lead("value", -1).over(window),
+lag("value", -1).over(window)),
+  Row(1, null, "3") :: Row(1, "1", null) :: Row(2, null, "4") :: 
Row(2, "2", null) :: Nil)
+  }
+
+  test("reverse lead/lag with negative offset") {
+val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", 
"value")
+val window = Window.partitionBy($"key").orderBy($"value".desc)
+
+checkAnswer(
+  df.select(
+$"key",
+lead("value", -1).over(window),
+lag("value", -1).over(window)),
+  Row(1, null, "1") :: Row(1, "3", null) :: Row(2, null, "2") :: 
Row(2, "4", null) :: Nil)
+  }
+
+  test("lead/lag with default value") {
+val default = "n/a"
+val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4"), (2, 
"5")).toDF("key", "value")
+val window = Window.partitionBy($"key").orderBy($"value")
+
+checkAnswer(
+  df.select(
+$"key",
+lead("value", 2, default).over(window),
+lag("value", 2, default).over(window),
+lead("value", -2, default).over(window),
+lag("value", -2, default).over(window)),
+  Row(1, default, default, default, default) :: Row(1, default, 
default, default, default) ::
+Row(2, "5", default, default, "5") :: Row(2, default, "2", "2", 
default) ::
+Row(2, default, default, default, default) :: Nil)
+  }
+
+  test("rows/range between with empty data frame") {
+val df = Seq.empty[(String, Int)].toDF("key", "value")
+val window = Window.partitionBy($"key").orderBy($"value")
+
+checkAnswer(
+  df.select(
+'key,
+first("value").over(
+  

[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19984
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20019: [SPARK-22361][SQL][TEST] Add unit test for Window...

2017-12-20 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20019#discussion_r158201087
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala 
---
@@ -0,0 +1,381 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.sql.{Date, Timestamp}
+
+import org.apache.spark.sql.expressions.Window
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.unsafe.types.CalendarInterval
+
+/**
+ * Window frame testing for DataFrame API.
+ */
+class DataFrameWindowFramesSuite extends QueryTest with SharedSQLContext {
+  import testImplicits._
+
+  test("lead/lag with empty data frame") {
+val df = Seq.empty[(Int, String)].toDF("key", "value")
+val window = Window.partitionBy($"key").orderBy($"value")
+
+checkAnswer(
+  df.select(
+lead("value", 1).over(window),
+lag("value", 1).over(window)),
+  Nil)
+  }
+
+  test("lead/lag with positive offset") {
+val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", 
"value")
+val window = Window.partitionBy($"key").orderBy($"value")
+
+checkAnswer(
+  df.select(
+$"key",
+lead("value", 1).over(window),
+lag("value", 1).over(window)),
+  Row(1, "3", null) :: Row(1, null, "1") :: Row(2, "4", null) :: 
Row(2, null, "2") :: Nil)
+  }
+
+  test("reverse lead/lag with positive offset") {
+val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", 
"value")
+val window = Window.partitionBy($"key").orderBy($"value".desc)
+
+checkAnswer(
+  df.select(
+$"key",
+lead("value", 1).over(window),
+lag("value", 1).over(window)),
+  Row(1, "1", null) :: Row(1, null, "3") :: Row(2, "2", null) :: 
Row(2, null, "4") :: Nil)
+  }
+
+  test("lead/lag with negative offset") {
+val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", 
"value")
+val window = Window.partitionBy($"key").orderBy($"value")
+
+checkAnswer(
+  df.select(
+$"key",
+lead("value", -1).over(window),
+lag("value", -1).over(window)),
+  Row(1, null, "3") :: Row(1, "1", null) :: Row(2, null, "4") :: 
Row(2, "2", null) :: Nil)
+  }
+
+  test("reverse lead/lag with negative offset") {
+val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", 
"value")
+val window = Window.partitionBy($"key").orderBy($"value".desc)
+
+checkAnswer(
+  df.select(
+$"key",
+lead("value", -1).over(window),
+lag("value", -1).over(window)),
+  Row(1, null, "1") :: Row(1, "3", null) :: Row(2, null, "2") :: 
Row(2, "4", null) :: Nil)
+  }
+
+  test("lead/lag with default value") {
+val default = "n/a"
+val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4"), (2, 
"5")).toDF("key", "value")
+val window = Window.partitionBy($"key").orderBy($"value")
+
+checkAnswer(
+  df.select(
+$"key",
+lead("value", 2, default).over(window),
+lag("value", 2, default).over(window),
+lead("value", -2, default).over(window),
+lag("value", -2, default).over(window)),
+  Row(1, default, default, default, default) :: Row(1, default, 
default, default, default) ::
+Row(2, "5", default, default, "5") :: Row(2, default, "2", "2", 
default) ::
+Row(2, default, default, default, default) :: Nil)
+  }
+
+  test("rows/range between with empty data frame") {
+val df = Seq.empty[(String, Int)].toDF("key", "value")
+val window = Window.partitionBy($"key").orderBy($"value")
+
+checkAnswer(
+  df.select(
+'key,
+first("value").over(
+  

[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19984
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85230/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19984
  
**[Test build #85230 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85230/testReport)**
 for PR 19984 at commit 
[`825d437`](https://github.com/apache/spark/commit/825d437fe1e897c2047171ce78c6bb92805dc5be).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20036: [SPARK-18016][SQL][FOLLOW-UP] Code Generation: Constant ...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20036
  
**[Test build #85236 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85236/testReport)**
 for PR 20036 at commit 
[`53661eb`](https://github.com/apache/spark/commit/53661eb72bba55376bc6112b51c25489522d309c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20035
  
**[Test build #85237 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85237/testReport)**
 for PR 20035 at commit 
[`f0163e7`](https://github.com/apache/spark/commit/f0163e7b68aa09fef5c1dc7f25e00170354a1ab2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20036: [SPARK-18016][SQL][FOLLOW-UP] Code Generation: Constant ...

2017-12-20 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20036
  
Jenkins, retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20035
  
Jenkins, retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...

2017-12-20 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/20020#discussion_r158200471
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
 ---
@@ -20,30 +20,32 @@ package org.apache.spark.sql.execution.command
 import org.apache.hadoop.conf.Configuration
 
 import org.apache.spark.SparkContext
-import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.{AnalysisBarrier, 
Command, LogicalPlan}
+import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker
+import org.apache.spark.sql.execution.datasources.FileFormatWriter
 import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
 import org.apache.spark.util.SerializableConfiguration
 
-
 /**
  * A special `RunnableCommand` which writes data out and updates metrics.
  */
-trait DataWritingCommand extends RunnableCommand {
-
+trait DataWritingCommand extends Command {
   /**
* The input query plan that produces the data to be written.
+   * IMPORTANT: the input query plan MUST be analyzed, so that we can 
carry its output columns
+   *to [[FileFormatWriter]].
*/
   def query: LogicalPlan
 
-  // We make the input `query` an inner child instead of a child in order 
to hide it from the
-  // optimizer. This is because optimizer may not preserve the output 
schema names' case, and we
-  // have to keep the original analyzed plan here so that we can pass the 
corrected schema to the
-  // writer. The schema of analyzed plan is what user expects(or 
specifies), so we should respect
-  // it when writing.
-  override protected def innerChildren: Seq[LogicalPlan] = query :: Nil
+  override def children: Seq[LogicalPlan] = query :: Nil
--- End diff --

I tried, but it causes a runtime exception. When the `AnalysisBarrier` is 
removed by the analyzer, the child of `DataWritingCommand` is no longer an 
`AnalysisBarrier`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20020#discussion_r158197728
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
 ---
@@ -20,30 +20,32 @@ package org.apache.spark.sql.execution.command
 import org.apache.hadoop.conf.Configuration
 
 import org.apache.spark.SparkContext
-import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.{AnalysisBarrier, 
Command, LogicalPlan}
+import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker
+import org.apache.spark.sql.execution.datasources.FileFormatWriter
 import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
 import org.apache.spark.util.SerializableConfiguration
 
-
 /**
  * A special `RunnableCommand` which writes data out and updates metrics.
--- End diff --

Not a `RunnableCommand` now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20020#discussion_r158198595
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
 ---
@@ -20,30 +20,32 @@ package org.apache.spark.sql.execution.command
 import org.apache.hadoop.conf.Configuration
 
 import org.apache.spark.SparkContext
-import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.{AnalysisBarrier, 
Command, LogicalPlan}
+import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker
+import org.apache.spark.sql.execution.datasources.FileFormatWriter
 import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
 import org.apache.spark.util.SerializableConfiguration
 
-
 /**
  * A special `RunnableCommand` which writes data out and updates metrics.
  */
-trait DataWritingCommand extends RunnableCommand {
-
+trait DataWritingCommand extends Command {
   /**
* The input query plan that produces the data to be written.
+   * IMPORTANT: the input query plan MUST be analyzed, so that we can 
carry its output columns
+   *to [[FileFormatWriter]].
*/
   def query: LogicalPlan
 
-  // We make the input `query` an inner child instead of a child in order 
to hide it from the
-  // optimizer. This is because optimizer may not preserve the output 
schema names' case, and we
-  // have to keep the original analyzed plan here so that we can pass the 
corrected schema to the
-  // writer. The schema of analyzed plan is what user expects(or 
specifies), so we should respect
-  // it when writing.
-  override protected def innerChildren: Seq[LogicalPlan] = query :: Nil
+  override def children: Seq[LogicalPlan] = query :: Nil
 
-  override lazy val metrics: Map[String, SQLMetric] = {
+  // Output columns of the analyzed input query plan
+  def allColumns: Seq[Attribute]
--- End diff --

`outputColumns`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20020#discussion_r158198250
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala 
---
@@ -87,6 +87,51 @@ case class ExecutedCommandExec(cmd: RunnableCommand) 
extends LeafExecNode {
   }
 }
--- End diff --

I think the `metrics` in `RunnableCommand` is mainly for use in 
`DataWritingCommand`. Now that `DataWritingCommand` is a separate class from 
`RunnableCommand`, do we still need `metrics` in `RunnableCommand`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20020#discussion_r158198088
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala 
---
@@ -87,6 +87,51 @@ case class ExecutedCommandExec(cmd: RunnableCommand) 
extends LeafExecNode {
   }
 }
 
+/**
+ * A physical operator that executes the run method of a `RunnableCommand` 
and
+ * saves the result to prevent multiple executions.
+ *
+ * @param cmd the `RunnableCommand` this operator will run.
--- End diff --

`DataWritingCommand` is not a `RunnableCommand`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20020#discussion_r158199084
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
 ---
@@ -20,30 +20,32 @@ package org.apache.spark.sql.execution.command
 import org.apache.hadoop.conf.Configuration
 
 import org.apache.spark.SparkContext
-import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.{AnalysisBarrier, 
Command, LogicalPlan}
+import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker
+import org.apache.spark.sql.execution.datasources.FileFormatWriter
 import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
 import org.apache.spark.util.SerializableConfiguration
 
-
 /**
  * A special `RunnableCommand` which writes data out and updates metrics.
  */
-trait DataWritingCommand extends RunnableCommand {
-
+trait DataWritingCommand extends Command {
   /**
* The input query plan that produces the data to be written.
+   * IMPORTANT: the input query plan MUST be analyzed, so that we can 
carry its output columns
+   *to [[FileFormatWriter]].
*/
   def query: LogicalPlan
 
-  // We make the input `query` an inner child instead of a child in order 
to hide it from the
-  // optimizer. This is because optimizer may not preserve the output 
schema names' case, and we
-  // have to keep the original analyzed plan here so that we can pass the 
corrected schema to the
-  // writer. The schema of analyzed plan is what user expects(or 
specifies), so we should respect
-  // it when writing.
-  override protected def innerChildren: Seq[LogicalPlan] = query :: Nil
+  override def children: Seq[LogicalPlan] = query :: Nil
--- End diff --

Add `AnalysisBarrier` around `query` here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20020: [SPARK-22834][SQL] Make insertion commands have real chi...

2017-12-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20020
  
> since now we let inserting commands have a child, it makes sense to wrap 
the child with AnalysisBarrier, to avoid re-analyzing them.

It makes sense to me too.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20026
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85228/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20026
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20026
  
**[Test build #85228 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85228/testReport)**
 for PR 20026 at commit 
[`6fd07a6`](https://github.com/apache/spark/commit/6fd07a66e58fdce99f035679b074b7f4eec34c6f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19981
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19981
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85227/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19981
  
**[Test build #85227 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85227/testReport)**
 for PR 19981 at commit 
[`94ee3c7`](https://github.com/apache/spark/commit/94ee3c7ddbe776783dcb880d9daa05f598a418a9).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20040: [SPARK-22852][BUILD] Exclude -Xlint:unchecked from sbt j...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20040
  
**[Test build #4020 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4020/testReport)**
 for PR 20040 at commit 
[`1827c2d`](https://github.com/apache/spark/commit/1827c2d18953d569ad903d21360c3ea5b807f76b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19977
  
**[Test build #85235 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85235/testReport)**
 for PR 19977 at commit 
[`fc14aeb`](https://github.com/apache/spark/commit/fc14aeb4e92e67aba1750fc1bc2b0fc9afaa5fac).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-20 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19977
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20043
  
**[Test build #85234 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85234/testReport)**
 for PR 20043 at commit 
[`5ace8b8`](https://github.com/apache/spark/commit/5ace8b83b7c90cd5a6a451812ac9c1087aaa1c29).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20008
  
**[Test build #85233 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85233/testReport)**
 for PR 20008 at commit 
[`ec07bc2`](https://github.com/apache/spark/commit/ec07bc2a463b089dd5798ab9e6bf8aea1b8ccd28).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...

2017-12-20 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20008
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20035
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


