[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20035 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85237/ Test FAILed.
[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20035 Merged build finished. Test FAILed.
[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20035
**[Test build #85237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85237/testReport)** for PR 20035 at commit [`f0163e7`](https://github.com/apache/spark/commit/f0163e7b68aa09fef5c1dc7f25e00170354a1ab2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19977 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85235/ Test FAILed.
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19977 Merged build finished. Test FAILed.
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19977
**[Test build #85235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85235/testReport)** for PR 19977 at commit [`fc14aeb`](https://github.com/apache/spark/commit/fc14aeb4e92e67aba1750fc1bc2b0fc9afaa5fac).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19884 **[Test build #85244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85244/testReport)** for PR 19884 at commit [`b0200ef`](https://github.com/apache/spark/commit/b0200efd30c6fe77ec6e57d65f3bc828be0e1802).
[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19884#discussion_r158212056
--- Diff: python/pyspark/sql/functions.py ---
@@ -2141,22 +2141,23 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
     >>> from pyspark.sql.types import IntegerType, StringType
-    >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
-    >>> @pandas_udf(StringType())
+    >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # doctest: +SKIP
+    >>> @pandas_udf(StringType())  # doctest: +SKIP
     ... def to_upper(s):
     ...     return s.str.upper()
     ...
-    >>> @pandas_udf("integer", PandasUDFType.SCALAR)
+    >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
     ... def add_one(x):
     ...     return x + 1
     ...
-    >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age"))
+    >>> df = spark.createDataFrame([(1, "John Doe", 21)],
+    ...                            ("id", "name", "age"))  # doctest: +SKIP
     >>> df.select(slen("name").alias("slen(name)"), to_upper("name"), add_one("age")) \\
     ...     .show()  # doctest: +SKIP
     +----------+--------------+------------+
     |slen(name)|to_upper(name)|add_one(age)|
     +----------+--------------+------------+
-    |         8|      JOHN DOE|          22|
+    |         8|          JOHN|          22|
     +----------+--------------+------------+
--- End diff --
oops, done!
[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/20029 It seems that each time we connect to the thrift server through beeline, `SessionState.start(state)` is called twice: once in `HiveSessionImpl.open`, and once in `HiveClientImpl.newSession()` for `sql("use default")`. When the beeline connection is closed, only the HiveSession is closed via `HiveSessionImpl.close()`; the object created by `HiveClientImpl.newSession()` is left over.
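A toy model of the lifecycle imbalance described above (all class and method names here are stand-ins for the actual Hive/Spark classes, used purely for illustration):

```scala
import scala.collection.mutable

// Stand-in for Hive's SessionState; tracks what was started vs. closed.
class SessionState { var closed = false }
object Sessions {
  val started = mutable.Buffer.empty[SessionState]
  def start(): SessionState = { val s = new SessionState; started += s; s }
}

def openBeelineConnection(): SessionState = {
  val hiveSession = Sessions.start() // SessionState.start in HiveSessionImpl.open
  Sessions.start()                   // SessionState.start again in HiveClientImpl.newSession() for sql("use default")
  hiveSession                        // only this handle is retained and later closed
}

def closeBeelineConnection(s: SessionState): Unit = s.closed = true // HiveSessionImpl.close()

val conn = openBeelineConnection()
closeBeelineConnection(conn)
println(Sessions.started.count(!_.closed)) // 1 -> one SessionState leaked per beeline connection
```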
[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19884#discussion_r158211101
--- Diff: python/pyspark/sql/functions.py ---
@@ -2141,22 +2141,23 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
     >>> from pyspark.sql.types import IntegerType, StringType
-    >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
-    >>> @pandas_udf(StringType())
+    >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # doctest: +SKIP
+    >>> @pandas_udf(StringType())  # doctest: +SKIP
     ... def to_upper(s):
     ...     return s.str.upper()
     ...
-    >>> @pandas_udf("integer", PandasUDFType.SCALAR)
+    >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
     ... def add_one(x):
     ...     return x + 1
     ...
-    >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age"))
+    >>> df = spark.createDataFrame([(1, "John Doe", 21)],
+    ...                            ("id", "name", "age"))  # doctest: +SKIP
     >>> df.select(slen("name").alias("slen(name)"), to_upper("name"), add_one("age")) \\
     ...     .show()  # doctest: +SKIP
     +----------+--------------+------------+
     |slen(name)|to_upper(name)|add_one(age)|
     +----------+--------------+------------+
-    |         8|      JOHN DOE|          22|
+    |         8|          JOHN|          22|
     +----------+--------------+------------+
--- End diff --
nit: we should revert this too
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20043 **[Test build #85243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85243/testReport)** for PR 20043 at commit [`81c9b6e`](https://github.com/apache/spark/commit/81c9b6e73ee64adcd8fc931d51f3faa98b892e0b).
[GitHub] spark pull request #20043: [SPARK-22856][SQL] Add wrappers for codegen outpu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20043#discussion_r158210849
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -56,7 +56,36 @@ import org.apache.spark.util.{ParentClassLoader, Utils}
  * @param value A term for a (possibly primitive) value of the result of the evaluation. Not
  *              valid if `isNull` is set to `true`.
  */
-case class ExprCode(var code: String, var isNull: String, var value: String)
+case class ExprCode(var code: String, var isNull: ExprValue, var value: ExprValue)
+
+
+// An abstraction that represents the evaluation result of [[ExprCode]].
+abstract class ExprValue
+
+object ExprValue {
+  implicit def exprValueToString(exprValue: ExprValue): String = exprValue.toString
+}
+
+// A literal evaluation of [[ExprCode]].
+case class LiteralValue(val value: String) extends ExprValue {
+  override def toString: String = value
+}
+
+// A variable evaluation of [[ExprCode]].
+case class VariableValue(val variableName: String) extends ExprValue {
+  override def toString: String = variableName
+}
+
+// A statement evaluation of [[ExprCode]].
+case class StatementValue(val statement: String) extends ExprValue {
+  override def toString: String = statement
+}
+
+// A global variable evaluation of [[ExprCode]].
+case class GlobalValue(val value: String) extends ExprValue {
--- End diff --
It is considered a global variable now, as it can be accessed globally and isn't (and can't be) parameterized.
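To make the distinction concrete, here is a minimal, self-contained sketch (the wrappers are re-declared from the diff above so it runs outside Spark; the classified values are invented examples) of how codegen outputs get tagged with where they live, while existing call sites keep treating them as plain strings:

```scala
import scala.language.implicitConversions

abstract class ExprValue
object ExprValue {
  implicit def exprValueToString(exprValue: ExprValue): String = exprValue.toString
}
case class LiteralValue(value: String) extends ExprValue { override def toString: String = value }
case class VariableValue(variableName: String) extends ExprValue { override def toString: String = variableName }
case class StatementValue(statement: String) extends ExprValue { override def toString: String = statement }
case class GlobalValue(value: String) extends ExprValue { override def toString: String = value }

// Classify the outputs of a hypothetical generated expression:
val isNull: ExprValue = LiteralValue("false")     // known at codegen time
val value: ExprValue  = VariableValue("value_0")  // a local in the generated method
val global: ExprValue = GlobalValue("arr[1]")     // a slot in the generated class (the compacted case discussed later in this digest)

// The implicit conversion keeps String-based call sites compiling unchanged:
val nullCheck: String = isNull
println(s"boolean out = $nullCheck ? false : ($value > 0); // also reads $global")
```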
[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20018#discussion_r158210754
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
     // |  4| val_4|  4| val_4|
     // |  5| val_5|  5| val_5|
     // ...
-    // $example off:spark_hive$
+    /*
+     * Save DataFrame to Hive Managed table as Parquet format
+     * 1. Create Hive Database / Schema with location at HDFS if you want to mentioned explicitly else default
+     *    warehouse location will be used to store Hive table Data.
+     *    Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+     *    You don't have to explicitly give location for each table, every tables under specified schema will be located at
+     *    location given while creating schema.
+     * 2. Create Hive Managed table with storage format as 'Parquet'
+     *    Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+     */
+    val hiveTableDF = sql("SELECT * FROM records").toDF()
+    hiveTableDF.write.mode(SaveMode.Overwrite).saveAsTable("database_name.records")
+
+    /*
+     * Save DataFrame to Hive External table as compatible parquet format.
+     * 1. Create Hive External table with storage format as parquet.
+     *    Ex: CREATE EXTERNAL TABLE records(key int, value string) STORED AS PARQUET;
+     *    Since we are not explicitly providing hive database location, it automatically takes default warehouse location
+     *    given to 'spark.sql.warehouse.dir' while creating SparkSession with enableHiveSupport().
+     *    For example, we have given '/user/hive/warehouse/' as a Hive Warehouse location. It will create schema directories
+     *    under '/user/hive/warehouse/' as '/user/hive/warehouse/database_name.db' and '/user/hive/warehouse/database_name'.
+     */
+
+    // to make Hive parquet format compatible with spark parquet format
+    spark.sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
+    // Multiple parquet files could be created accordingly to volume of data under directory given.
+    val hiveExternalTableLocation = s"/user/hive/warehouse/database_name.db/records"
+    hiveTableDF.write.mode(SaveMode.Overwrite).parquet(hiveExternalTableLocation)
+
+    // turn on flag for Dynamic Partitioning
+    spark.sqlContext.setConf("hive.exec.dynamic.partition", "true")
+    spark.sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
+    // You can create partitions in Hive table, so downstream queries run much faster.
+    hiveTableDF.write.mode(SaveMode.Overwrite).partitionBy("key")
+      .parquet(hiveExternalTableLocation)
+    /*
+     If Data volume is very huge, then every partitions would have many small-small files which may harm
+     downstream query performance due to File I/O, Bandwidth I/O, Network I/O, Disk I/O.
+     To improve performance you can create single parquet file under each partition directory using 'repartition'
+     on partitioned key for Hive table. When you add partition to table, there will be change in table DDL.
+     Ex: CREATE TABLE records(value string) PARTITIONED BY(key int) STORED AS PARQUET;
+     */
+    hiveTableDF.repartition($"key").write.mode(SaveMode.Overwrite)
+      .partitionBy("key").parquet(hiveExternalTableLocation)
+
+    /*
+     You can also do coalesce to control number of files under each partitions, repartition does full shuffle and equal
+     data distribution to all partitions. here coalesce can reduce number of files to given 'Int' argument without
+     full data shuffle.
+     */
+    // coalesce of 10 could create 10 parquet files under each partitions,
+    // if data is huge and make sense to do partitioning.
+    hiveTableDF.coalesce(10).write.mode(SaveMode.Overwrite)
--- End diff --
ditto
[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20018#discussion_r158210714
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
     // |  4| val_4|  4| val_4|
     // |  5| val_5|  5| val_5|
     // ...
-    // $example off:spark_hive$
+    /*
+     * Save DataFrame to Hive Managed table as Parquet format
+     * 1. Create Hive Database / Schema with location at HDFS if you want to mentioned explicitly else default
+     *    warehouse location will be used to store Hive table Data.
+     *    Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+     *    You don't have to explicitly give location for each table, every tables under specified schema will be located at
+     *    location given while creating schema.
+     * 2. Create Hive Managed table with storage format as 'Parquet'
+     *    Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+     */
+    val hiveTableDF = sql("SELECT * FROM records").toDF()
+    hiveTableDF.write.mode(SaveMode.Overwrite).saveAsTable("database_name.records")
+
+    /*
+     * Save DataFrame to Hive External table as compatible parquet format.
+     * 1. Create Hive External table with storage format as parquet.
+     *    Ex: CREATE EXTERNAL TABLE records(key int, value string) STORED AS PARQUET;
+     *    Since we are not explicitly providing hive database location, it automatically takes default warehouse location
+     *    given to 'spark.sql.warehouse.dir' while creating SparkSession with enableHiveSupport().
+     *    For example, we have given '/user/hive/warehouse/' as a Hive Warehouse location. It will create schema directories
+     *    under '/user/hive/warehouse/' as '/user/hive/warehouse/database_name.db' and '/user/hive/warehouse/database_name'.
+     */
+
+    // to make Hive parquet format compatible with spark parquet format
+    spark.sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
+    // Multiple parquet files could be created accordingly to volume of data under directory given.
+    val hiveExternalTableLocation = s"/user/hive/warehouse/database_name.db/records"
+    hiveTableDF.write.mode(SaveMode.Overwrite).parquet(hiveExternalTableLocation)
+
+    // turn on flag for Dynamic Partitioning
+    spark.sqlContext.setConf("hive.exec.dynamic.partition", "true")
+    spark.sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
+    // You can create partitions in Hive table, so downstream queries run much faster.
+    hiveTableDF.write.mode(SaveMode.Overwrite).partitionBy("key")
+      .parquet(hiveExternalTableLocation)
+    /*
+     If Data volume is very huge, then every partitions would have many small-small files which may harm
+     downstream query performance due to File I/O, Bandwidth I/O, Network I/O, Disk I/O.
+     To improve performance you can create single parquet file under each partition directory using 'repartition'
+     on partitioned key for Hive table. When you add partition to table, there will be change in table DDL.
+     Ex: CREATE TABLE records(value string) PARTITIONED BY(key int) STORED AS PARQUET;
+     */
+    hiveTableDF.repartition($"key").write.mode(SaveMode.Overwrite)
--- End diff --
This is not a standard usage, let's not put it in the example.
[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20018#discussion_r158210666
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
     // |  4| val_4|  4| val_4|
     // |  5| val_5|  5| val_5|
     // ...
-    // $example off:spark_hive$
+    /*
+     * Save DataFrame to Hive Managed table as Parquet format
+     * 1. Create Hive Database / Schema with location at HDFS if you want to mentioned explicitly else default
+     *    warehouse location will be used to store Hive table Data.
+     *    Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+     *    You don't have to explicitly give location for each table, every tables under specified schema will be located at
+     *    location given while creating schema.
+     * 2. Create Hive Managed table with storage format as 'Parquet'
+     *    Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+     */
+    val hiveTableDF = sql("SELECT * FROM records").toDF()
+    hiveTableDF.write.mode(SaveMode.Overwrite).saveAsTable("database_name.records")
+
+    /*
+     * Save DataFrame to Hive External table as compatible parquet format.
+     * 1. Create Hive External table with storage format as parquet.
+     *    Ex: CREATE EXTERNAL TABLE records(key int, value string) STORED AS PARQUET;
--- End diff --
It's weird to create an external table without a location. Users may be confused about the difference between a managed table and an external table.
[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20018#discussion_r158210425
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
     // |  4| val_4|  4| val_4|
     // |  5| val_5|  5| val_5|
     // ...
-    // $example off:spark_hive$
+    /*
+     * Save DataFrame to Hive Managed table as Parquet format
+     * 1. Create Hive Database / Schema with location at HDFS if you want to mentioned explicitly else default
+     *    warehouse location will be used to store Hive table Data.
+     *    Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+     *    You don't have to explicitly give location for each table, every tables under specified schema will be located at
+     *    location given while creating schema.
+     * 2. Create Hive Managed table with storage format as 'Parquet'
+     *    Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+     */
+    val hiveTableDF = sql("SELECT * FROM records").toDF()
--- End diff --
actually, I think `spark.table("records")` is a better example.
[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20018#discussion_r158210374
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
     // |  4| val_4|  4| val_4|
     // |  5| val_5|  5| val_5|
     // ...
-    // $example off:spark_hive$
+    /*
+     * Save DataFrame to Hive Managed table as Parquet format
+     * 1. Create Hive Database / Schema with location at HDFS if you want to mentioned explicitly else default
+     *    warehouse location will be used to store Hive table Data.
+     *    Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+     *    You don't have to explicitly give location for each table, every tables under specified schema will be located at
+     *    location given while creating schema.
+     * 2. Create Hive Managed table with storage format as 'Parquet'
+     *    Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+     */
+    val hiveTableDF = sql("SELECT * FROM records").toDF()
--- End diff --
`.toDF` is not needed
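Taken together, the two review comments above suggest replacing the `sql(...).toDF()` pattern; a minimal sketch of the suggested form (a reading of the review suggestion, not the merged code):

```scala
// spark.table returns a DataFrame directly: no SQL round-trip, no redundant .toDF()
val hiveTableDF = spark.table("records")
```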
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/19884 I used a workaround for timestamp casts that allows the tests to pass for me locally, and left a note to look into the root cause later. Hopefully this should pass now and we will be good to merge.
[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20018#discussion_r158210132
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
     // |  4| val_4|  4| val_4|
     // |  5| val_5|  5| val_5|
     // ...
-    // $example off:spark_hive$
+    /*
--- End diff --
+1
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20043 **[Test build #85241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85241/testReport)** for PR 20043 at commit [`d120750`](https://github.com/apache/spark/commit/d120750ff61bb066e7ceb628f3356fa37af462f5).
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19884 **[Test build #85242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85242/testReport)** for PR 19884 at commit [`ae84c84`](https://github.com/apache/spark/commit/ae84c8454875906e488b895e18ad78ddf6e9fbc9).
[GitHub] spark pull request #20043: [SPARK-22856][SQL] Add wrappers for codegen outpu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20043#discussion_r158209659
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -56,7 +56,36 @@ import org.apache.spark.util.{ParentClassLoader, Utils}
  * @param value A term for a (possibly primitive) value of the result of the evaluation. Not
  *              valid if `isNull` is set to `true`.
  */
-case class ExprCode(var code: String, var isNull: String, var value: String)
+case class ExprCode(var code: String, var isNull: ExprValue, var value: ExprValue)
+
+
+// An abstraction that represents the evaluation result of [[ExprCode]].
+abstract class ExprValue
+
+object ExprValue {
+  implicit def exprValueToString(exprValue: ExprValue): String = exprValue.toString
+}
+
+// A literal evaluation of [[ExprCode]].
+case class LiteralValue(val value: String) extends ExprValue {
+  override def toString: String = value
+}
+
+// A variable evaluation of [[ExprCode]].
+case class VariableValue(val variableName: String) extends ExprValue {
+  override def toString: String = variableName
+}
+
+// A statement evaluation of [[ExprCode]].
+case class StatementValue(val statement: String) extends ExprValue {
+  override def toString: String = statement
+}
+
+// A global variable evaluation of [[ExprCode]].
+case class GlobalValue(val value: String) extends ExprValue {
--- End diff --
For compacted global variables, we may get something like `arr[1]`, where `arr` is a global variable. Is `arr[1]` a statement or a global variable?
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20008 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85233/ Test FAILed.
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20008 Merged build finished. Test FAILed.
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20008
**[Test build #85233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85233/testReport)** for PR 20008 at commit [`ec07bc2`](https://github.com/apache/spark/commit/ec07bc2a463b089dd5798ab9e6bf8aea1b8ccd28).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20021 **[Test build #85240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85240/testReport)** for PR 20021 at commit [`900f246`](https://github.com/apache/spark/commit/900f246579f86f187b61454df6b0f76c8d5052c8).
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20043 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85234/ Test FAILed.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20043 Merged build finished. Test FAILed.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20043
**[Test build #85234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85234/testReport)** for PR 20043 at commit [`5ace8b8`](https://github.com/apache/spark/commit/5ace8b83b7c90cd5a6a451812ac9c1087aaa1c29).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ExprCode(var code: String, var isNull: ExprValue, var value: ExprValue)`
  * `case class LiteralValue(val value: String) extends ExprValue`
  * `case class VariableValue(val variableName: String) extends ExprValue`
  * `case class StatementValue(val statement: String) extends ExprValue`
  * `case class GlobalValue(val value: String) extends ExprValue`
  * `case class SubExprEliminationState(isNull: ExprValue, value: ExprValue)`
[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19884#discussion_r158208592
--- Diff: python/pyspark/sql/functions.py ---
@@ -2141,22 +2141,22 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
     >>> from pyspark.sql.types import IntegerType, StringType
-    >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
-    >>> @pandas_udf(StringType())
+    >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # doctest: +SKIP
+    >>> @pandas_udf(StringType())  # doctest: +SKIP
     ... def to_upper(s):
     ...     return s.str.upper()
     ...
-    >>> @pandas_udf("integer", PandasUDFType.SCALAR)
+    >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
     ... def add_one(x):
     ...     return x + 1
     ...
-    >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age"))
+    >>> df = spark.createDataFrame([(1, "John", 21)], ("id", "name", "age"))  # doctest: +SKIP
--- End diff --
The name change shouldn't have been committed, I'll change it back. I don't think we can make the doctests conditional on whether pandas/pyarrow is installed, so unless we make these required dependencies and have them installed on all the workers, we need to skip them.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20043 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85232/ Test FAILed.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20043 Merged build finished. Test FAILed.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20043
**[Test build #85232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85232/testReport)** for PR 20043 at commit [`d5c986a`](https://github.com/apache/spark/commit/d5c986a1cab410c4eb64a72119346875d7607be6).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ExprCode(var code: String, var isNull: ExprValue, var value: ExprValue)`
  * `case class LiteralValue(val value: String) extends ExprValue`
  * `case class VariableValue(val variableName: String) extends ExprValue`
  * `case class StatementValue(val statement: String) extends ExprValue`
  * `case class GlobalValue(val value: String) extends ExprValue`
  * `case class SubExprEliminationState(isNull: ExprValue, value: ExprValue)`
[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20023 I think we should be careful about suddenly changing behavior.
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r158205151
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala ---
@@ -243,17 +248,43 @@ object DecimalPrecision extends TypeCoercionRule {
     // Promote integers inside a binary expression with fixed-precision decimals to decimals,
     // and fixed-precision decimals in an expression with floats / doubles to doubles
     case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
-      (left.dataType, right.dataType) match {
-        case (t: IntegralType, DecimalType.Fixed(p, s)) =>
-          b.makeCopy(Array(Cast(left, DecimalType.forType(t)), right))
-        case (DecimalType.Fixed(p, s), t: IntegralType) =>
-          b.makeCopy(Array(left, Cast(right, DecimalType.forType(t))))
-        case (t, DecimalType.Fixed(p, s)) if isFloat(t) =>
-          b.makeCopy(Array(left, Cast(right, DoubleType)))
-        case (DecimalType.Fixed(p, s), t) if isFloat(t) =>
-          b.makeCopy(Array(Cast(left, DoubleType), right))
-        case _ =>
-          b
-      }
+      nondecimalLiteralAndDecimal(b).lift((left, right)).getOrElse(
+        nondecimalNonliteralAndDecimal(b).applyOrElse((left.dataType, right.dataType),
+          (_: (DataType, DataType)) => b))
   }
+
+  /**
+   * Type coercion for BinaryOperator in which one side is a non-decimal literal numeric, and the
+   * other side is a decimal.
+   */
+  private def nondecimalLiteralAndDecimal(
--- End diff --
Is this rule newly introduced?
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r158206693
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
     case DoubleType => DoubleDecimal
   }

+  private[sql] def forLiteral(literal: Literal): DecimalType = literal.value match {
+    case v: Short => fromBigDecimal(BigDecimal(v))
+    case v: Int => fromBigDecimal(BigDecimal(v))
+    case v: Long => fromBigDecimal(BigDecimal(v))
+    case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+    DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
     DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }

+  // scalastyle:off line.size.limit
+  /**
+   * Decimal implementation is based on Hive's one, which is itself inspired to SQLServer's one.
+   * In particular, when a result precision is greater than {@link #MAX_PRECISION}, the
+   * corresponding scale is reduced to prevent the integral part of a result from being truncated.
+   *
+   * For further reference, please see
+   * https://blogs.msdn.microsoft.com/sqlprogrammability/2006/03/29/multiplication-and-division-with-numerics/.
+   *
+   * @param precision
+   * @param scale
+   * @return
+   */
+  // scalastyle:on line.size.limit
+  private[sql] def adjustPrecisionScale(precision: Int, scale: Int): DecimalType = {
+    // Assumptions:
+    // precision >= scale
+    // scale >= 0
+    if (precision <= MAX_PRECISION) {
+      // Adjustment only needed when we exceed max precision
+      DecimalType(precision, scale)
+    } else {
+      // Precision/scale exceed maximum precision. Result must be adjusted to MAX_PRECISION.
+      val intDigits = precision - scale
+      // If original scale less than MINIMUM_ADJUSTED_SCALE, use original scale value; otherwise
+      // preserve at least MINIMUM_ADJUSTED_SCALE fractional digits
+      val minScaleValue = Math.min(scale, MINIMUM_ADJUSTED_SCALE)
+      val adjustedScale = Math.max(MAX_PRECISION - intDigits, minScaleValue)
--- End diff --
Sounds like `Math.min`?
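A worked example may help here, assuming MAX_PRECISION = 38 and MINIMUM_ADJUSTED_SCALE = 6 (the constants this PR works with): `max` is what keeps the fractional digits that still fit after the integral digits are preserved.

```scala
val MAX_PRECISION = 38
val MINIMUM_ADJUSTED_SCALE = 6
val (precision, scale) = (44, 20)  // e.g. the raw result type of a multiplication

val intDigits = precision - scale                                       // 24 integral digits must survive
val minScaleValue = Math.min(scale, MINIMUM_ADJUSTED_SCALE)             // 6
val adjustedScale = Math.max(MAX_PRECISION - intDigits, minScaleValue)  // max(14, 6) = 14

println((MAX_PRECISION, adjustedScale)) // (38, 14); Math.min would needlessly cut the
                                        // scale to 6 even though 14 fractional digits fit
```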
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r158207539
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
     case DoubleType => DoubleDecimal
   }

+  private[sql] def forLiteral(literal: Literal): DecimalType = literal.value match {
+    case v: Short => fromBigDecimal(BigDecimal(v))
+    case v: Int => fromBigDecimal(BigDecimal(v))
+    case v: Long => fromBigDecimal(BigDecimal(v))
+    case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+    DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
     DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }

+  // scalastyle:off line.size.limit
+  /**
+   * Decimal implementation is based on Hive's one, which is itself inspired to SQLServer's one.
+   * In particular, when a result precision is greater than {@link #MAX_PRECISION}, the
+   * corresponding scale is reduced to prevent the integral part of a result from being truncated.
+   *
+   * For further reference, please see
+   * https://blogs.msdn.microsoft.com/sqlprogrammability/2006/03/29/multiplication-and-division-with-numerics/.
--- End diff --
Not sure this blog link will stay available for long.
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r158205829
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
     case DoubleType => DoubleDecimal
   }

+  private[sql] def forLiteral(literal: Literal): DecimalType = literal.value match {
+    case v: Short => fromBigDecimal(BigDecimal(v))
+    case v: Int => fromBigDecimal(BigDecimal(v))
+    case v: Long => fromBigDecimal(BigDecimal(v))
+    case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+    DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
     DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }

+  // scalastyle:off line.size.limit
+  /**
+   * Decimal implementation is based on Hive's one, which is itself inspired to SQLServer's one.
+   * In particular, when a result precision is greater than {@link #MAX_PRECISION}, the
+   * corresponding scale is reduced to prevent the integral part of a result from being truncated.
+   *
+   * For further reference, please see
+   * https://blogs.msdn.microsoft.com/sqlprogrammability/2006/03/29/multiplication-and-division-with-numerics/.
+   *
+   * @param precision
+   * @param scale
+   * @return
+   */
+  // scalastyle:on line.size.limit
+  private[sql] def adjustPrecisionScale(precision: Int, scale: Int): DecimalType = {
+    // Assumptions:
+    // precision >= scale
+    // scale >= 0
+    if (precision <= MAX_PRECISION) {
+      // Adjustment only needed when we exceed max precision
+      DecimalType(precision, scale)
--- End diff --
Shouldn't we also prevent `scale` > `MAX_SCALE`?
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r158205620
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
     case DoubleType => DoubleDecimal
   }

+  private[sql] def forLiteral(literal: Literal): DecimalType = literal.value match {
--- End diff --
Is this different from `forType` applied to `Literal.dataType`?
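The two can differ: `forType` must cover every value of the type, while `forLiteral` sizes the decimal to the concrete value. A quick check (relying on the fact that Spark's `forType(IntegerType)` is `IntDecimal = DecimalType(10, 0)`):

```scala
// Size-to-value, as fromBigDecimal in the diff above does:
val d = BigDecimal(5)
println((Math.max(d.precision, d.scale), d.scale)) // (1, 0) -> DecimalType(1, 0)
// forType(IntegerType) instead returns DecimalType(10, 0), wide enough for any Int.
// The tighter literal type lets the precision rules reserve fewer integral digits
// for the literal operand, reducing unnecessary precision loss.
```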
[GitHub] spark pull request #20035: [SPARK-22848][SQL] Eliminate mutable state from S...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20035
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r158206388
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
     case DoubleType => DoubleDecimal
   }

+  private[sql] def forLiteral(literal: Literal): DecimalType = literal.value match {
+    case v: Short => fromBigDecimal(BigDecimal(v))
+    case v: Int => fromBigDecimal(BigDecimal(v))
+    case v: Long => fromBigDecimal(BigDecimal(v))
+    case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+    DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
     DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }

+  // scalastyle:off line.size.limit
+  /**
+   * Decimal implementation is based on Hive's one, which is itself inspired to SQLServer's one.
+   * In particular, when a result precision is greater than {@link #MAX_PRECISION}, the
+   * corresponding scale is reduced to prevent the integral part of a result from being truncated.
+   *
+   * For further reference, please see
+   * https://blogs.msdn.microsoft.com/sqlprogrammability/2006/03/29/multiplication-and-division-with-numerics/.
+   *
+   * @param precision
+   * @param scale
+   * @return
+   */
+  // scalastyle:on line.size.limit
+  private[sql] def adjustPrecisionScale(precision: Int, scale: Int): DecimalType = {
+    // Assumptions:
+    // precision >= scale
+    // scale >= 0
+    if (precision <= MAX_PRECISION) {
+      // Adjustment only needed when we exceed max precision
+      DecimalType(precision, scale)
+    } else {
+      // Precision/scale exceed maximum precision. Result must be adjusted to MAX_PRECISION.
+      val intDigits = precision - scale
+      // If original scale less than MINIMUM_ADJUSTED_SCALE, use original scale value; otherwise
+      // preserve at least MINIMUM_ADJUSTED_SCALE fractional digits
+      val minScaleValue = Math.min(scale, MINIMUM_ADJUSTED_SCALE)
--- End diff --
Sounds like `MAXIMUM_ADJUSTED_SCALE` instead of `MINIMUM_ADJUSTED_SCALE`.
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r158205387
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
     case DoubleType => DoubleDecimal
   }

+  private[sql] def forLiteral(literal: Literal): DecimalType = literal.value match {
+    case v: Short => fromBigDecimal(BigDecimal(v))
--- End diff --
Can't we just use `ShortDecimal`, `IntDecimal`...?
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20043 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85231/ Test FAILed.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20043 Merged build finished. Test FAILed.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20043
**[Test build #85231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85231/testReport)** for PR 20043 at commit [`78680cc`](https://github.com/apache/spark/commit/78680cc60c27d645c349451090bfa056380e89f6).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ExprCode(var code: String, var isNull: ExprValue, var value: ExprValue)`
  * `case class LiteralValue(var value: String) extends ExprValue`
  * `case class VariableValue(var variableName: String) extends ExprValue`
  * `case class StatementValue(var statement: String) extends ExprValue`
  * `case class GlobalValue(var value: String) extends ExprValue`
  * `case class SubExprEliminationState(isNull: ExprValue, value: ExprValue)`
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19884 LGTM. I'm also fine with ignoring some tests if they are hard to fix, to unblock other PRs sooner.
[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19884#discussion_r158206546
--- Diff: python/pyspark/sql/utils.py ---
@@ -110,3 +110,12 @@ def toJArray(gateway, jtype, arr):
     for i in range(0, len(arr)):
         jarr[i] = arr[i]
     return jarr
+
+
+def _require_minimum_pyarrow_version():
--- End diff --
No. I just checked if `ImportError` occurred or not. We should do the same thing for pandas later.
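The diff above truncates the helper's body. For readers, one plausible shape of such a check (an illustrative sketch only: the minimum version string and error messages are this sketch's assumptions, not necessarily the PR's exact code):

```python
from distutils.version import LooseVersion


def _require_minimum_pyarrow_version():
    """Raise ImportError if pyarrow is missing or older than the supported minimum."""
    minimum_pyarrow_version = "0.8.0"  # assumed minimum for this sketch
    try:
        import pyarrow
    except ImportError:
        raise ImportError("pyarrow >= %s must be installed on the calling Python process; "
                          "however, it was not found." % minimum_pyarrow_version)
    if LooseVersion(pyarrow.__version__) < LooseVersion(minimum_pyarrow_version):
        raise ImportError("pyarrow >= %s must be installed; however, "
                          "your version was %s." % (minimum_pyarrow_version, pyarrow.__version__))
```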
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20043 cc @kiszk @cloud-fan
[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19884#discussion_r158206309
--- Diff: python/pyspark/sql/utils.py ---
@@ -110,3 +110,12 @@ def toJArray(gateway, jtype, arr):
     for i in range(0, len(arr)):
         jarr[i] = arr[i]
     return jarr
+
+
+def _require_minimum_pyarrow_version():
--- End diff --
@ueshin did we do the same thing for pandas?
[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19884#discussion_r158206051
--- Diff: python/pyspark/sql/functions.py ---
@@ -2141,22 +2141,22 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
     >>> from pyspark.sql.types import IntegerType, StringType
-    >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
-    >>> @pandas_udf(StringType())
+    >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # doctest: +SKIP
+    >>> @pandas_udf(StringType())  # doctest: +SKIP
     ... def to_upper(s):
     ...     return s.str.upper()
     ...
-    >>> @pandas_udf("integer", PandasUDFType.SCALAR)
+    >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
     ... def add_one(x):
     ...     return x + 1
     ...
-    >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age"))
+    >>> df = spark.createDataFrame([(1, "John", 21)], ("id", "name", "age"))  # doctest: +SKIP
--- End diff --
Why change `John Doe` to `John`? And are we going to re-enable these doctests later?
[GitHub] spark pull request #19946: [SPARK-22648] [K8S] Spark on Kubernetes - Documen...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19946#discussion_r158205893
--- Diff: docs/building-spark.md ---
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by the
 to be runnable, use `./dev/make-distribution.sh` in the project root directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:
 
-    ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn
+    ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
--- End diff --
Yea, I don't think you need to block this PR on this.
[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19884#discussion_r158205751
--- Diff: python/pyspark/sql/tests.py ---
@@ -3356,6 +3356,7 @@ def test_schema_conversion_roundtrip(self):
         self.assertEquals(self.schema, schema_rt)
 
+    @unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed")
--- End diff --
Sorry for the delay. Yeah, we should check the minimum pyarrow version.
[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20035 yea it's failing globally, I'm merging this PR, thanks!
[GitHub] spark issue #20042: [SPARK-22855][BUILD] Add -no-java-comments to sbt docs/s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20042
**[Test build #4021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4021/testReport)** for PR 20042 at commit [`9f3407a`](https://github.com/apache/spark/commit/9f3407a2fd75e0577c46e5e9b5ca3f4edeee2bf6).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20041: [SPARK-22042] [FOLLOW-UP] [SQL] ReorderJoinPredicates ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20041 **[Test build #85239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85239/testReport)** for PR 20041 at commit [`8c37d46`](https://github.com/apache/spark/commit/8c37d467ac364fc6ee0c20d2d0d2736d70bf214d).
[GitHub] spark issue #20041: [SPARK-22042] [FOLLOW-UP] [SQL] ReorderJoinPredicates ca...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20041 Jenkins retest this please
[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20035 I think the test failure is not related to this change, but to the ongoing work to upgrade pyarrow.
[GitHub] spark issue #20041: [SPARK-22042] [FOLLOW-UP] [SQL] ReorderJoinPredicates ca...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20041 I checked the test case failure, but I don't think it's related to this PR.
```
org.apache.spark.sql.execution.datasources.parquet.ParquetQuerySuite.(It is not a test it is a sbt.testing.SuiteSelector)
org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 651 times over 10.008601144 seconds. Last failure message: There are 1 possibly leaked file streams..
```
[GitHub] spark pull request #19946: [SPARK-22648] [K8S] Spark on Kubernetes - Documen...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19946#discussion_r158202645
--- Diff: docs/running-on-yarn.md ---
@@ -18,7 +18,8 @@ Spark application's configuration (driver, executors, and the AM when running in
 
 There are two deploy modes that can be used to launch Spark applications on YARN. In `cluster` mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In `client` mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
 
-Unlike [Spark standalone](spark-standalone.html) and [Mesos](running-on-mesos.html) modes, in which the master's address is specified in the `--master` parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the `--master` parameter is `yarn`.
+Unlike other cluster managers supported by Spark
+in which the master's address is specified in the `--master` parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the `--master` parameter is `yarn`.
--- End diff --
nit: why start a new line here?
[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20020#discussion_r158203446
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala ---
@@ -20,30 +20,32 @@ package org.apache.spark.sql.execution.command
 
 import org.apache.hadoop.conf.Configuration
 
 import org.apache.spark.SparkContext
-import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.{AnalysisBarrier, Command, LogicalPlan}
+import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker
+import org.apache.spark.sql.execution.datasources.FileFormatWriter
 import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
 import org.apache.spark.util.SerializableConfiguration
 
-
 /**
  * A special `RunnableCommand` which writes data out and updates metrics.
  */
-trait DataWritingCommand extends RunnableCommand {
-
+trait DataWritingCommand extends Command {
   /**
    * The input query plan that produces the data to be written.
+   * IMPORTANT: the input query plan MUST be analyzed, so that we can carry its output columns
+   *            to [[FileFormatWriter]].
   */
  def query: LogicalPlan
 
-  // We make the input `query` an inner child instead of a child in order to hide it from the
-  // optimizer. This is because optimizer may not preserve the output schema names' case, and we
-  // have to keep the original analyzed plan here so that we can pass the corrected schema to the
-  // writer. The schema of analyzed plan is what user expects(or specifies), so we should respect
-  // it when writing.
-  override protected def innerChildren: Seq[LogicalPlan] = query :: Nil
+  override def children: Seq[LogicalPlan] = query :: Nil
--- End diff --
Ah, right. We should add the barrier when passing in the query.
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19981 **[Test build #85238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85238/testReport)** for PR 19981 at commit [`94ee3c7`](https://github.com/apache/spark/commit/94ee3c7ddbe776783dcb880d9daa05f598a418a9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19981 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19954 I'll finish reading this by Friday, thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/19591#discussion_r158198274 --- Diff: launcher/src/main/java/org/apache/spark/launcher/InProcessAppHandle.java --- @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.launcher; + +import java.io.IOException; +import java.lang.reflect.Method; +import java.util.concurrent.ThreadFactory; +import java.util.logging.Level; +import java.util.logging.Logger; + +class InProcessAppHandle extends AbstractAppHandle { + + private static final Logger LOG = Logger.getLogger(ChildProcAppHandle.class.getName()); + private static final ThreadFactory THREAD_FACTORY = new NamedThreadFactory("spark-app-%d"); --- End diff -- how about the thread name including `builder.appName`? might be useful if you are trying to monitor a bunch of threads? (though you'd also be launching in cluster mode in that case, so the thread wouldn't be doing much ...) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
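A sketch of what the suggestion might look like (`AppNamedThreadFactory` is a hypothetical name; the PR itself uses a shared `NamedThreadFactory` with a fixed pattern):

    import java.util.concurrent.ThreadFactory
    import java.util.concurrent.atomic.AtomicLong

    // Fold the application name into the thread name so that, when several
    // apps are launched in-process, each launcher thread is identifiable.
    class AppNamedThreadFactory(appName: String) extends ThreadFactory {
      private val count = new AtomicLong(0L)
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, s"spark-app-${count.incrementAndGet()}: '$appName'")
        t.setDaemon(true)
        t
      }
    }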
[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/19591#discussion_r158201917 --- Diff: launcher/src/main/java/org/apache/spark/launcher/package-info.java --- @@ -49,6 +49,15 @@ * * * + * Applications can also be launched in-process by using + * {@link org.apache.spark.launcher.InProcessLauncher} instead. Launching applications in-process --- End diff -- The comment above about "there is only one entry point" is a bit incorrect now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
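For orientation, launching in-process through the new entry point might look like the following sketch (method names are inherited from the launcher's builder API and are assumptions here, since this PR is still under review; the resource path and class are placeholders):

    import org.apache.spark.launcher.{InProcessLauncher, SparkAppHandle}

    // Unlike SparkLauncher, no child JVM is spawned; the app's main()
    // runs in a thread of the current process.
    val handle: SparkAppHandle = new InProcessLauncher()
      .setAppResource("/path/to/myapp.jar")   // placeholder path
      .setMainClass("org.example.MyApp")      // placeholder class
      .startApplication()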
[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/19591#discussion_r158201527 --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java --- @@ -260,24 +264,30 @@ private long getConnectionTimeout() { } private String createSecret() { -byte[] secret = new byte[128]; -RND.nextBytes(secret); - -StringBuilder sb = new StringBuilder(); -for (byte b : secret) { - int ival = b >= 0 ? b : Byte.MAX_VALUE - b; - if (ival < 0x10) { -sb.append("0"); +while (true) { --- End diff -- Checking my understanding -- is this an extra bug fix here? Even in the old code, if by chance two apps had the same secret, you'd end up losing one handle? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
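A minimal sketch of the retry-on-collision idea being discussed (not the actual `LauncherServer` code; the map type and `SecureRandom` are illustrative assumptions):

    import java.security.SecureRandom

    // Keep generating secrets until one is not already registered, so two
    // concurrently-launched apps can never share (and clobber) a handle entry.
    def createSecret(pending: java.util.Map[String, AnyRef]): String = {
      val rnd = new SecureRandom()
      Iterator.continually {
        val bytes = new Array[Byte](128)
        rnd.nextBytes(bytes)
        bytes.map(b => f"${b & 0xff}%02x").mkString
      }.find(secret => !pending.containsKey(secret)).get
    }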
[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/19591#discussion_r158200240 --- Diff: launcher/src/main/java/org/apache/spark/launcher/InProcessAppHandle.java --- @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.launcher; + +import java.io.IOException; +import java.lang.reflect.Method; +import java.util.concurrent.ThreadFactory; +import java.util.logging.Level; +import java.util.logging.Logger; + +class InProcessAppHandle extends AbstractAppHandle { + + private static final Logger LOG = Logger.getLogger(ChildProcAppHandle.class.getName()); + private static final ThreadFactory THREAD_FACTORY = new NamedThreadFactory("spark-app-%d"); + + private Thread app; + + InProcessAppHandle(LauncherServer server) { +super(server); + } + + @Override + public synchronized void kill() { +LOG.warning("kill() may leave the underlying app running in in-process mode."); +disconnect(); + +// Interrupt the thread. This is not guaranteed to kill the app, though. +if (app != null) { + app.interrupt(); +} + +setState(State.KILLED); + } + + synchronized void start(Method main, String[] args) { +CommandBuilderUtils.checkState(app == null, "Handle already started."); +app = THREAD_FACTORY.newThread(() -> { + try { +main.invoke(null, (Object) args); + } catch (Throwable t) { +LOG.log(Level.WARNING, "Application failed with exception.", t); +setState(State.FAILED); + } + + synchronized (InProcessAppHandle.this) { +if (!isDisposed()) { + State currState = getState(); + disconnect(); --- End diff -- same comment here on ordering of getState() & disconnect() --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/19591#discussion_r158200059 --- Diff: launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java --- @@ -171,10 +90,11 @@ void monitorChild() { } synchronized (this) { - if (disposed) { + if (isDisposed()) { return; } + State currState = getState(); --- End diff -- This was added just because `currState` is no longer accessible, right? You're not particularly trying to grab the state before the call to `disconnect()`? It might be clearer to move it after; otherwise, on first read it looks like it is trying to grab the state before it's modified by `disconnect()`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/19591#discussion_r158200163 --- Diff: launcher/src/main/java/org/apache/spark/launcher/InProcessAppHandle.java --- @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.launcher; + +import java.io.IOException; +import java.lang.reflect.Method; +import java.util.concurrent.ThreadFactory; +import java.util.logging.Level; +import java.util.logging.Logger; + +class InProcessAppHandle extends AbstractAppHandle { + + private static final Logger LOG = Logger.getLogger(ChildProcAppHandle.class.getName()); + private static final ThreadFactory THREAD_FACTORY = new NamedThreadFactory("spark-app-%d"); + + private Thread app; + + InProcessAppHandle(LauncherServer server) { +super(server); + } + + @Override + public synchronized void kill() { +LOG.warning("kill() may leave the underlying app running in in-process mode."); +disconnect(); + +// Interrupt the thread. This is not guaranteed to kill the app, though. --- End diff -- in cluster mode, shouldn't this be pretty safe? If not, wouldn't it be a bug in spark-submit? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19591: [SPARK-11035][core] Add in-process Spark app laun...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/19591#discussion_r158201743 --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java --- @@ -210,8 +209,13 @@ int getPort() { * Removes the client handle from the pending list (in case it's still there), and unrefs * the server. */ - void unregister(ChildProcAppHandle handle) { -pending.remove(handle.getSecret()); + void unregister(AbstractAppHandle handle) { +for (Map.Entry<String, AbstractAppHandle> e : pending.entrySet()) { --- End diff -- You could add a `System.identityHashCode` "secret" to the `InProcessAppHandle` to keep the old version, though this is fine too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20019 Also cc @gatorsmile, who I believe would be interested in taking a look at this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20019: [SPARK-22361][SQL][TEST] Add unit test for Window...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20019#discussion_r158200860 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.sql.{Date, Timestamp} + +import org.apache.spark.sql.expressions.Window +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.unsafe.types.CalendarInterval + +/** + * Window frame testing for DataFrame API. + */ +class DataFrameWindowFramesSuite extends QueryTest with SharedSQLContext { + import testImplicits._ + + test("lead/lag with empty data frame") { +val df = Seq.empty[(Int, String)].toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value") + +checkAnswer( + df.select( +lead("value", 1).over(window), +lag("value", 1).over(window)), + Nil) + } + + test("lead/lag with positive offset") { +val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value") + +checkAnswer( + df.select( +$"key", +lead("value", 1).over(window), +lag("value", 1).over(window)), + Row(1, "3", null) :: Row(1, null, "1") :: Row(2, "4", null) :: Row(2, null, "2") :: Nil) + } + + test("reverse lead/lag with positive offset") { +val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value".desc) + +checkAnswer( + df.select( +$"key", +lead("value", 1).over(window), +lag("value", 1).over(window)), + Row(1, "1", null) :: Row(1, null, "3") :: Row(2, "2", null) :: Row(2, null, "4") :: Nil) + } + + test("lead/lag with negative offset") { +val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value") + +checkAnswer( + df.select( +$"key", +lead("value", -1).over(window), +lag("value", -1).over(window)), + Row(1, null, "3") :: Row(1, "1", null) :: Row(2, null, "4") :: Row(2, "2", null) :: Nil) + } + + test("reverse lead/lag with negative offset") { +val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value".desc) + +checkAnswer( + df.select( +$"key", +lead("value", -1).over(window), +lag("value", -1).over(window)), + Row(1, null, "1") :: Row(1, "3", null) :: Row(2, null, "2") :: Row(2, "4", null) :: Nil) + } + + test("lead/lag with default value") { +val default = "n/a" +val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4"), (2, "5")).toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value") + +checkAnswer( + df.select( +$"key", +lead("value", 2, default).over(window), 
+lag("value", 2, default).over(window), +lead("value", -2, default).over(window), +lag("value", -2, default).over(window)), + Row(1, default, default, default, default) :: Row(1, default, default, default, default) :: +Row(2, "5", default, default, "5") :: Row(2, default, "2", "2", default) :: +Row(2, default, default, default, default) :: Nil) + } + + test("rows/range between with empty data frame") { +val df = Seq.empty[(String, Int)].toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value") + +checkAnswer( + df.select( +'key, +first("value").over( +
[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19984 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20019: [SPARK-22361][SQL][TEST] Add unit test for Window...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20019#discussion_r158201087 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.sql.{Date, Timestamp} + +import org.apache.spark.sql.expressions.Window +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.unsafe.types.CalendarInterval + +/** + * Window frame testing for DataFrame API. + */ +class DataFrameWindowFramesSuite extends QueryTest with SharedSQLContext { + import testImplicits._ + + test("lead/lag with empty data frame") { +val df = Seq.empty[(Int, String)].toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value") + +checkAnswer( + df.select( +lead("value", 1).over(window), +lag("value", 1).over(window)), + Nil) + } + + test("lead/lag with positive offset") { +val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value") + +checkAnswer( + df.select( +$"key", +lead("value", 1).over(window), +lag("value", 1).over(window)), + Row(1, "3", null) :: Row(1, null, "1") :: Row(2, "4", null) :: Row(2, null, "2") :: Nil) + } + + test("reverse lead/lag with positive offset") { +val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value".desc) + +checkAnswer( + df.select( +$"key", +lead("value", 1).over(window), +lag("value", 1).over(window)), + Row(1, "1", null) :: Row(1, null, "3") :: Row(2, "2", null) :: Row(2, null, "4") :: Nil) + } + + test("lead/lag with negative offset") { +val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value") + +checkAnswer( + df.select( +$"key", +lead("value", -1).over(window), +lag("value", -1).over(window)), + Row(1, null, "3") :: Row(1, "1", null) :: Row(2, null, "4") :: Row(2, "2", null) :: Nil) + } + + test("reverse lead/lag with negative offset") { +val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4")).toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value".desc) + +checkAnswer( + df.select( +$"key", +lead("value", -1).over(window), +lag("value", -1).over(window)), + Row(1, null, "1") :: Row(1, "3", null) :: Row(2, null, "2") :: Row(2, "4", null) :: Nil) + } + + test("lead/lag with default value") { +val default = "n/a" +val df = Seq((1, "1"), (2, "2"), (1, "3"), (2, "4"), (2, "5")).toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value") + +checkAnswer( + df.select( +$"key", +lead("value", 2, default).over(window), 
+lag("value", 2, default).over(window), +lead("value", -2, default).over(window), +lag("value", -2, default).over(window)), + Row(1, default, default, default, default) :: Row(1, default, default, default, default) :: +Row(2, "5", default, default, "5") :: Row(2, default, "2", "2", default) :: +Row(2, default, default, default, default) :: Nil) + } + + test("rows/range between with empty data frame") { +val df = Seq.empty[(String, Int)].toDF("key", "value") +val window = Window.partitionBy($"key").orderBy($"value") + +checkAnswer( + df.select( +'key, +first("value").over( +
[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19984 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85230/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19984 **[Test build #85230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85230/testReport)** for PR 19984 at commit [`825d437`](https://github.com/apache/spark/commit/825d437fe1e897c2047171ce78c6bb92805dc5be). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20036: [SPARK-18016][SQL][FOLLOW-UP] Code Generation: Constant ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20036 **[Test build #85236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85236/testReport)** for PR 20036 at commit [`53661eb`](https://github.com/apache/spark/commit/53661eb72bba55376bc6112b51c25489522d309c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20035 **[Test build #85237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85237/testReport)** for PR 20035 at commit [`f0163e7`](https://github.com/apache/spark/commit/f0163e7b68aa09fef5c1dc7f25e00170354a1ab2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20036: [SPARK-18016][SQL][FOLLOW-UP] Code Generation: Constant ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20036 Jenkins, retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20035 Jenkins, retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/20020#discussion_r158200471 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala --- @@ -20,30 +20,32 @@ package org.apache.spark.sql.execution.command import org.apache.hadoop.conf.Configuration import org.apache.spark.SparkContext -import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.{Row, SparkSession} +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.catalyst.plans.logical.{AnalysisBarrier, Command, LogicalPlan} +import org.apache.spark.sql.execution.SparkPlan import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker +import org.apache.spark.sql.execution.datasources.FileFormatWriter import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics} import org.apache.spark.util.SerializableConfiguration - /** * A special `RunnableCommand` which writes data out and updates metrics. */ -trait DataWritingCommand extends RunnableCommand { - +trait DataWritingCommand extends Command { /** * The input query plan that produces the data to be written. + * IMPORTANT: the input query plan MUST be analyzed, so that we can carry its output columns + *to [[FileFormatWriter]]. */ def query: LogicalPlan - // We make the input `query` an inner child instead of a child in order to hide it from the - // optimizer. This is because optimizer may not preserve the output schema names' case, and we - // have to keep the original analyzed plan here so that we can pass the corrected schema to the - // writer. The schema of analyzed plan is what user expects(or specifies), so we should respect - // it when writing. - override protected def innerChildren: Seq[LogicalPlan] = query :: Nil + override def children: Seq[LogicalPlan] = query :: Nil --- End diff -- I tried, but it causes a runtime exception: when the `AnalysisBarrier` is removed by the analyzer, the child of `DataWritingCommand` is no longer an `AnalysisBarrier`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
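The failure mode described here can be illustrated with a hedged sketch (`unwrap` is a hypothetical helper): a pattern match that assumes the barrier is still present stops matching once the analyzer strips it.

    import org.apache.spark.sql.catalyst.plans.logical.{AnalysisBarrier, LogicalPlan}

    // The first case holds only on the pre-analysis plan; after analysis the
    // barrier is gone, so callers must be prepared for both shapes.
    def unwrap(plan: LogicalPlan): LogicalPlan = plan match {
      case AnalysisBarrier(child) => child
      case other                  => other
    }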
[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20020#discussion_r158197728 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala --- @@ -20,30 +20,32 @@ package org.apache.spark.sql.execution.command import org.apache.hadoop.conf.Configuration import org.apache.spark.SparkContext -import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.{Row, SparkSession} +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.catalyst.plans.logical.{AnalysisBarrier, Command, LogicalPlan} +import org.apache.spark.sql.execution.SparkPlan import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker +import org.apache.spark.sql.execution.datasources.FileFormatWriter import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics} import org.apache.spark.util.SerializableConfiguration - /** * A special `RunnableCommand` which writes data out and updates metrics. --- End diff -- Not a `RunnableCommand` now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20020#discussion_r158198595 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala --- @@ -20,30 +20,32 @@ package org.apache.spark.sql.execution.command import org.apache.hadoop.conf.Configuration import org.apache.spark.SparkContext -import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.{Row, SparkSession} +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.catalyst.plans.logical.{AnalysisBarrier, Command, LogicalPlan} +import org.apache.spark.sql.execution.SparkPlan import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker +import org.apache.spark.sql.execution.datasources.FileFormatWriter import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics} import org.apache.spark.util.SerializableConfiguration - /** * A special `RunnableCommand` which writes data out and updates metrics. */ -trait DataWritingCommand extends RunnableCommand { - +trait DataWritingCommand extends Command { /** * The input query plan that produces the data to be written. + * IMPORTANT: the input query plan MUST be analyzed, so that we can carry its output columns + *to [[FileFormatWriter]]. */ def query: LogicalPlan - // We make the input `query` an inner child instead of a child in order to hide it from the - // optimizer. This is because optimizer may not preserve the output schema names' case, and we - // have to keep the original analyzed plan here so that we can pass the corrected schema to the - // writer. The schema of analyzed plan is what user expects(or specifies), so we should respect - // it when writing. - override protected def innerChildren: Seq[LogicalPlan] = query :: Nil + override def children: Seq[LogicalPlan] = query :: Nil - override lazy val metrics: Map[String, SQLMetric] = { + // Output columns of the analyzed input query plan + def allColumns: Seq[Attribute] --- End diff -- `outputColumns`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20020#discussion_r158198250 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala --- @@ -87,6 +87,51 @@ case class ExecutedCommandExec(cmd: RunnableCommand) extends LeafExecNode { } } --- End diff -- I think the `metrics` in `RunnableCommand` is mainly for use in `DataWritingCommand`. Now that `DataWritingCommand` is a separate class from `RunnableCommand`, do we still need `metrics` in `RunnableCommand`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20020#discussion_r158198088 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala --- @@ -87,6 +87,51 @@ case class ExecutedCommandExec(cmd: RunnableCommand) extends LeafExecNode { } } +/** + * A physical operator that executes the run method of a `RunnableCommand` and + * saves the result to prevent multiple executions. + * + * @param cmd the `RunnableCommand` this operator will run. --- End diff -- `DataWritingCommand` is not a `RunnableCommand`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20020#discussion_r158199084 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala --- @@ -20,30 +20,32 @@ package org.apache.spark.sql.execution.command import org.apache.hadoop.conf.Configuration import org.apache.spark.SparkContext -import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.{Row, SparkSession} +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.catalyst.plans.logical.{AnalysisBarrier, Command, LogicalPlan} +import org.apache.spark.sql.execution.SparkPlan import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker +import org.apache.spark.sql.execution.datasources.FileFormatWriter import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics} import org.apache.spark.util.SerializableConfiguration - /** * A special `RunnableCommand` which writes data out and updates metrics. */ -trait DataWritingCommand extends RunnableCommand { - +trait DataWritingCommand extends Command { /** * The input query plan that produces the data to be written. + * IMPORTANT: the input query plan MUST be analyzed, so that we can carry its output columns + *to [[FileFormatWriter]]. */ def query: LogicalPlan - // We make the input `query` an inner child instead of a child in order to hide it from the - // optimizer. This is because optimizer may not preserve the output schema names' case, and we - // have to keep the original analyzed plan here so that we can pass the corrected schema to the - // writer. The schema of analyzed plan is what user expects(or specifies), so we should respect - // it when writing. - override protected def innerChildren: Seq[LogicalPlan] = query :: Nil + override def children: Seq[LogicalPlan] = query :: Nil --- End diff -- Add `AnalysisBarrier` around `query` here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20020: [SPARK-22834][SQL] Make insertion commands have real chi...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20020 > since now we let inserting commands have a child, it makes sense to wrap the child with AnalysisBarrier, to avoid re-analyzing them. It makes sense to me too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20026 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85228/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20026 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20026 **[Test build #85228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85228/testReport)** for PR 20026 at commit [`6fd07a6`](https://github.com/apache/spark/commit/6fd07a66e58fdce99f035679b074b7f4eec34c6f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19981 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19981 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85227/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19981 **[Test build #85227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85227/testReport)** for PR 19981 at commit [`94ee3c7`](https://github.com/apache/spark/commit/94ee3c7ddbe776783dcb880d9daa05f598a418a9). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20040: [SPARK-22852][BUILD] Exclude -Xlint:unchecked from sbt j...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20040 **[Test build #4020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4020/testReport)** for PR 20040 at commit [`1827c2d`](https://github.com/apache/spark/commit/1827c2d18953d569ad903d21360c3ea5b807f76b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19977 **[Test build #85235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85235/testReport)** for PR 19977 at commit [`fc14aeb`](https://github.com/apache/spark/commit/fc14aeb4e92e67aba1750fc1bc2b0fc9afaa5fac). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19977 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20043 **[Test build #85234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85234/testReport)** for PR 20043 at commit [`5ace8b8`](https://github.com/apache/spark/commit/5ace8b83b7c90cd5a6a451812ac9c1087aaa1c29). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20008 **[Test build #85233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85233/testReport)** for PR 20008 at commit [`ec07bc2`](https://github.com/apache/spark/commit/ec07bc2a463b089dd5798ab9e6bf8aea1b8ccd28). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20008 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20035 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org