[GitHub] [spark] SparkQA commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
SparkQA commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555884834 **[Test build #114141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114141/testReport)** for PR 26604 at commit [`164507c`](https://github.com/apache/spark/commit/164507ce26ddcf20e9c970cccf7746cc78a6119d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
cloud-fan commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#discussion_r348333421 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ## @@ -372,12 +372,44 @@ class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeA assert(table.provider == Some("org.apache.spark.sql.hive.orc")) } } + + test("Database Ownership") { +val catalog = spark.sessionState.catalog +try { + val dbName = "spark_29425" + val location = getDBPath(dbName) + + sql(s"CREATE DATABASE $dbName") + + checkAnswer( +sql(s"DESCRIBE DATABASE $dbName"), +Row("Database Name", dbName) :: + Row("Description", "") :: + Row("Location", CatalogUtils.URIToString(location)) :: + Row("Owner Name", Utils.getCurrentUserName()) :: + Row("Owner Type", "USER") :: Nil) + + sql(s"ALTER DATABASE $dbName SET DBPROPERTIES ('a'='a', 'b'='b', 'c'='c')") Review comment: I'm not talking about this specific case but the general framework. What if we need to add "lastAccessTime" later? We can't keep adding new fields and break the API. Using properties is more future-proof. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
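The point about properties being more future-proof than dedicated fields can be illustrated with a small sketch. The `DatabaseInfo` class and its `owner_name` helper are hypothetical stand-ins, not Spark's `CatalogDatabase`; the `ownerName` and `lastAccessTime` keys are taken from the discussion in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class DatabaseInfo:
    """Hypothetical catalog record: fixed fields plus an open-ended property map."""
    name: str
    description: str
    properties: Dict[str, str] = field(default_factory=dict)

    def owner_name(self, default: str = "") -> str:
        # The owner lives in the property map, so adding it never changed
        # the constructor signature that existing callers depend on.
        return self.properties.get("ownerName", default)


# A later attribute such as "lastAccessTime" is just one more key.
db = DatabaseInfo("db1", "desc", {"ownerName": "alice", "lastAccessTime": "0"})
legacy = DatabaseInfo("db2", "")  # a caller that predates both keys still works
```

The trade-off is that property keys are stringly typed, so accessors like `owner_name` are where validation and defaults would have to live.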
[GitHub] [spark] cloud-fan commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
cloud-fan commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#discussion_r348332940 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ## @@ -170,6 +171,34 @@ class VersionsSuite extends SparkFunSuite with Logging { client.createDatabase(tempDB, ignoreIfExists = true) } +test(s"$version: create/get/alter database should pick right user name as owner") { + if (version != "0.12") { +val currentUser = UserGroupInformation.getCurrentUser.getUserName +val ownerName = "SPARK_29425" +val db1 = "SPARK_29425_1" +val db2 = "SPARK_29425_2" +val ownerProps = Map("ownerName" -> ownerName) + +// create database with owner +val dbWithOwner = CatalogDatabase(db1, "desc", Utils.createTempDir().toURI, ownerProps) +client.createDatabase(dbWithOwner, ignoreIfExists = true) +val getDbWithOwner = client.getDatabase(db1) +assert(getDbWithOwner.properties("ownerName") === ownerName) +// alter database without owner +client.alterDatabase(getDbWithOwner.copy(properties = Map())) +assert(client.getDatabase(getDbWithOwner.name).properties("ownerName") === currentUser) Review comment: so the owner gets reset to `currentUser`. Is it expected? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] jalpan-randeri commented on a change in pull request #26585: [WIP][SPARK-25351][SQL][Python] Handle Pandas category type when converting from Python with Arrow
jalpan-randeri commented on a change in pull request #26585: [WIP][SPARK-25351][SQL][Python] Handle Pandas category type when converting from Python with Arrow
URL: https://github.com/apache/spark/pull/26585#discussion_r348282920

## File path: python/pyspark/serializers.py ##
@@ -298,7 +298,10 @@ def create_array(s, t):
 if t is not None and pa.types.is_timestamp(t):
 s = _check_series_convert_timestamps_internal(s, self._timezone)
 try:
-array = pa.Array.from_pandas(s, mask=mask, type=t, safe=self._safecheck)
+if str(s.dtype) == 'category':
+array = pa.array(s.get_values())

Review comment: One option is to introduce an Arrow dictionary type in Spark DataFrame, so that the round trip Spark DataFrame -> pandas DataFrame -> Spark DataFrame works. However, this would make the DataFrame schema differ between Arrow mode and non-Arrow mode. For example:
```
>>> spark.conf.set("spark.sql.execution.arrow.enabled", False)
>>> df = spark.createDataFrame(pdf)
>>> df.printSchema()
root
 |-- A: string (nullable = true)
 |-- B: string (nullable = true)
---
>>> spark.conf.set("spark.sql.execution.arrow.enabled", True)
>>> df = spark.createDataFrame(pdf)
>>> df.printSchema()
root
 |-- A: string (nullable = true)
 |-- B: dictionary - values: string (nullable = true)
```
I am not sure whether this would break some use cases, or whether it is the correct way to do it. Can I get some advice or guidance on this?
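The string-versus-dictionary question above comes down to how a categorical column is represented: as integer codes that index a list of distinct values, or as the materialized values themselves. The following is a minimal pure-Python sketch of that idea. `dictionary_encode` and `decode` are hypothetical helper names, not pandas or pyarrow API, and the sketch says nothing about how either library implements it.

```python
from typing import Dict, List, Optional, Tuple


def dictionary_encode(values: List[Optional[str]]) -> Tuple[List[Optional[int]], List[str]]:
    """Split a column into integer codes plus the distinct values they index."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    codes: List[Optional[int]] = []
    for v in values:
        if v is None:
            codes.append(None)  # nulls carry no code
            continue
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        codes.append(positions[v])
    return codes, dictionary


def decode(codes: List[Optional[int]], dictionary: List[str]) -> List[Optional[str]]:
    """Materialize the codes back into plain values (the plain string-column view)."""
    return [None if c is None else dictionary[c] for c in codes]


column = ["a", "b", "a", None, "b"]
codes, dictionary = dictionary_encode(column)
round_trip = decode(codes, dictionary)
```

Decoding before handing the column over gives a plain string column in both modes, at the cost of losing the category information on the way back.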
[GitHub] [spark] AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555882472 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yaooqinn commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
yaooqinn commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
URL: https://github.com/apache/spark/pull/26080#discussion_r348331643

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ##
@@ -372,12 +372,44 @@ class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeA
 assert(table.provider == Some("org.apache.spark.sql.hive.orc"))
 }
 }
+
+ test("Database Ownership") {
+val catalog = spark.sessionState.catalog
+try {
+ val dbName = "spark_29425"
+ val location = getDBPath(dbName)
+
+ sql(s"CREATE DATABASE $dbName")
+
+ checkAnswer(
+sql(s"DESCRIBE DATABASE $dbName"),
+Row("Database Name", dbName) ::
+ Row("Description", "") ::
+ Row("Location", CatalogUtils.URIToString(location)) ::
+ Row("Owner Name", Utils.getCurrentUserName()) ::
+ Row("Owner Type", "USER") :: Nil)
+
+ sql(s"ALTER DATABASE $dbName SET DBPROPERTIES ('a'='a', 'b'='b', 'c'='c')")

Review comment: I think DCL support or ACL management for Spark SQL is a necessary feature in the long term. These fields have been stable in the Hive metastore API for years, so we may consider following that in our own catalog API. BTW, we already have tests for `Behavior After` in `VersionsSuite`.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555882479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18999/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555882472 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555882479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18999/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
SparkQA commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555882071 **[Test build #114140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114140/testReport)** for PR 26604 at commit [`74183e6`](https://github.com/apache/spark/commit/74183e6e1a1ce930f07979678d494224b6bd41ad). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
cloud-fan commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
URL: https://github.com/apache/spark/pull/26604#discussion_r348330752

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ##
@@ -56,6 +64,38 @@ trait Predicate extends Expression {
 override def dataType: DataType = BooleanType
 }
+/**
+ * The factory object for `BasePredicate`.
+ */
+object Predicate extends CodeGeneratorWithInterpretedFallback[Expression, BasePredicate] {
+
+ override protected def createCodeGeneratedObject(in: Expression): BasePredicate = {
+GeneratePredicate.generate(in)
+ }
+
+ override protected def createInterpretedObject(in: Expression): BasePredicate = {
+InterpretedPredicate(in)
+ }
+
+ def createInterpreted(e: Expression, inputSchema: Seq[Attribute]): InterpretedPredicate =

Review comment: This method does not seem to be used.
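The factory object in the diff above pairs a code-generated implementation with an interpreted one. A rough sketch of that pattern follows. `FactoryWithFallback`, `compile_predicate` and `interpret_predicate` are hypothetical names, and the rule for when the fallback is taken (any exception on the fast path) is an assumption; the diff shows only the two factory hooks.

```python
from typing import Callable, Generic, Tuple, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


class FactoryWithFallback(Generic[In, Out]):
    """Hypothetical stand-in for a codegen-with-interpreted-fallback factory."""

    def __init__(self, fast: Callable[[In], Out], slow: Callable[[In], Out]) -> None:
        self._fast = fast  # plays the role of createCodeGeneratedObject
        self._slow = slow  # plays the role of createInterpretedObject

    def create(self, expr: In) -> Out:
        try:
            return self._fast(expr)
        except Exception:
            # Assumption: any failure on the fast path triggers the fallback.
            return self._slow(expr)


def compile_predicate(expr: str) -> Tuple[str, str]:
    # Toy "code generation" that refuses expressions it cannot compile.
    if "unsupported" in expr:
        raise ValueError("cannot compile: " + expr)
    return ("compiled", expr)


def interpret_predicate(expr: str) -> Tuple[str, str]:
    return ("interpreted", expr)


predicate_factory = FactoryWithFallback(compile_predicate, interpret_predicate)
```

Callers only ever see `create`, which is why a separate public entry point for the interpreted path is worth questioning if nothing calls it.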
[GitHub] [spark] cloud-fan commented on a change in pull request #26466: [SPARK-29839][SQL] Supporting STORED AS in CREATE TABLE LIKE
cloud-fan commented on a change in pull request #26466: [SPARK-29839][SQL] Supporting STORED AS in CREATE TABLE LIKE URL: https://github.com/apache/spark/pull/26466#discussion_r348330397 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ## @@ -53,24 +53,31 @@ import org.apache.spark.sql.util.SchemaUtils * are identical to the ones defined in the source table. * * The CatalogTable attributes copied from the source table are storage(inputFormat, outputFormat, - * serde, compressed, properties), schema, provider, partitionColumnNames, bucketSpec. + * serde, compressed, properties), schema, provider, partitionColumnNames, bucketSpec by default. * - * Use "CREATE TABLE t1 LIKE t2 USING file_format" - * to specify new file format for t1 from a data source table t2. + * Use "CREATE TABLE t1 LIKE t2 USING file_format" to specify new provider for t1. + * For Hive compatibility, use "CREATE TABLE t1 LIKE t2 STORED AS hiveFormat" + * to specify new file storage format (inputFormat, outputFormat, serde) for t1. * * The syntax of using this command in SQL is: * {{{ * CREATE TABLE [IF NOT EXISTS] [db_name.]table_name - * LIKE [other_db_name.]existing_table_name [USING provider] [locationSpec] + * LIKE [other_db_name.]existing_table_name [USING provider | STORED AS hiveFormat] + * [locationSpec] [TBLPROPERTIES (property_name=property_value, ...)] * }}} */ case class CreateTableLikeCommand( targetTable: TableIdentifier, sourceTable: TableIdentifier, provider: Option[String], +hiveFormat: Option[CatalogStorageFormat], location: Option[String], +properties: Map[String, String] = Map.empty, ifNotExists: Boolean) extends RunnableCommand { + assert(!(hiveFormat.isDefined && provider.isDefined), Review comment: Can we check this at the parser side? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
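The suggestion to check the `USING` / `STORED AS` conflict at the parser side, rather than with an `assert` inside the command, could look roughly like this. All names are hypothetical, and `hive_format` is simplified to a string where the diff uses `Option[CatalogStorageFormat]`.

```python
from dataclasses import dataclass
from typing import Optional


class ParseError(ValueError):
    """Hypothetical error type raised while parsing the statement."""


@dataclass(frozen=True)
class CreateTableLike:
    target: str
    source: str
    provider: Optional[str] = None
    hive_format: Optional[str] = None


def build_create_table_like(
    target: str,
    source: str,
    provider: Optional[str] = None,
    hive_format: Optional[str] = None,
) -> CreateTableLike:
    """Reject conflicting clauses while parsing, before any command object exists."""
    if provider is not None and hive_format is not None:
        raise ParseError("USING and STORED AS cannot both be specified")
    return CreateTableLike(target, source, provider, hive_format)


ok = build_create_table_like("t1", "t2", provider="parquet")
```

Failing in the parser gives the user a syntax-level error tied to the statement, whereas an assertion in the command fires later and reads like an internal bug.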
[GitHub] [spark] cloud-fan commented on a change in pull request #26466: [SPARK-29839][SQL] Supporting STORED AS in CREATE TABLE LIKE
cloud-fan commented on a change in pull request #26466: [SPARK-29839][SQL] Supporting STORED AS in CREATE TABLE LIKE URL: https://github.com/apache/spark/pull/26466#discussion_r348330080 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ## @@ -53,24 +53,31 @@ import org.apache.spark.sql.util.SchemaUtils * are identical to the ones defined in the source table. * * The CatalogTable attributes copied from the source table are storage(inputFormat, outputFormat, - * serde, compressed, properties), schema, provider, partitionColumnNames, bucketSpec. + * serde, compressed, properties), schema, provider, partitionColumnNames, bucketSpec by default. * - * Use "CREATE TABLE t1 LIKE t2 USING file_format" - * to specify new file format for t1 from a data source table t2. + * Use "CREATE TABLE t1 LIKE t2 USING file_format" to specify new provider for t1. + * For Hive compatibility, use "CREATE TABLE t1 LIKE t2 STORED AS hiveFormat" + * to specify new file storage format (inputFormat, outputFormat, serde) for t1. * * The syntax of using this command in SQL is: * {{{ * CREATE TABLE [IF NOT EXISTS] [db_name.]table_name - * LIKE [other_db_name.]existing_table_name [USING provider] [locationSpec] + * LIKE [other_db_name.]existing_table_name [USING provider | STORED AS hiveFormat] + * [locationSpec] [TBLPROPERTIES (property_name=property_value, ...)] * }}} */ case class CreateTableLikeCommand( targetTable: TableIdentifier, sourceTable: TableIdentifier, provider: Option[String], +hiveFormat: Option[CatalogStorageFormat], location: Option[String], Review comment: `CatalogStorageFormat` has the location field, shall we remove this parameter? For data source table, we can create `CatalogStorageFormat.empty.copy(locationUri = ...)` in the parser This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [spark] cloud-fan commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
cloud-fan commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#discussion_r348327701 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ## @@ -372,12 +372,44 @@ class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeA assert(table.provider == Some("org.apache.spark.sql.hive.orc")) } } + + test("Database Ownership") { +val catalog = spark.sessionState.catalog +try { + val dbName = "spark_29425" + val location = getDBPath(dbName) + + sql(s"CREATE DATABASE $dbName") + + checkAnswer( +sql(s"DESCRIBE DATABASE $dbName"), +Row("Database Name", dbName) :: + Row("Description", "") :: + Row("Location", CatalogUtils.URIToString(location)) :: + Row("Owner Name", Utils.getCurrentUserName()) :: + Row("Owner Type", "USER") :: Nil) + + sql(s"ALTER DATABASE $dbName SET DBPROPERTIES ('a'='a', 'b'='b', 'c'='c')") Review comment: we should definitely add `ALTER DATABASE dbname SET OWNER USER userName`, but I'm not sure it's the right approach to keep adding fields to `CatalogDatabase`/`CatalogTable` in the long term. `CatalogDatabase`/`CatalogTable` are private so we can change them. But in the long term, we will have a stable catalog API (e.g. `TableCatalog`), and at that time it's not allowed to add more fields in newer versions. Can we add tests to verify the `Behavior After`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on issue #26563: [SPARK-29758][SQL][2.4] Fix truncation of requested string fields in `json_tuple`
cloud-fan commented on issue #26563: [SPARK-29758][SQL][2.4] Fix truncation of requested string fields in `json_tuple` URL: https://github.com/apache/spark/pull/26563#issuecomment-555876761 thanks, merging to 2.4! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a change in pull request #26473: [SPARK-29864][SPARK-29920][SQL] Strict parsing of day-time strings to intervals
MaxGekk commented on a change in pull request #26473: [SPARK-29864][SPARK-29920][SQL] Strict parsing of day-time strings to intervals
URL: https://github.com/apache/spark/pull/26473#discussion_r348327011

## File path: sql/core/src/test/resources/sql-tests/inputs/interval.sql ##
@@ -97,17 +97,24 @@ select interval 1 year 2 month 3 week 4 day 5 hour 6 minute 7 seconds 8 millisec
 select interval '30' year '25' month '-100' day '40' hour '80' minute '299.889987299' second;
 select interval '0 0:0:0.1' day to second;
 select interval '10-9' year to month;
+-- SPARK-29933: ThriftServerQueryTestSuite runs tests with wrong settings
+set spark.sql.dialect=Spark;

Review comment: See the comment above: `ThriftServerQueryTestSuite` runs this file in the PostgreSQL dialect, and `select interval '20 15' day to hour` fails there because the current implementation (the one enabled for the PostgreSQL dialect) has a bug: https://github.com/apache/spark/pull/26473#issuecomment-554456769
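To make the `'20 15' day to hour` case concrete, here is a small sketch of what strict parsing of a day-to-hour string might involve. The function name `parse_day_to_hour`, the accepted grammar, and the 0 to 23 hour bound are all assumptions for illustration; this is not Spark's parser.

```python
import re
from typing import Tuple

# Assumed grammar: optional sign, whole days, one space, one or two hour digits.
_DAY_TO_HOUR = re.compile(r"([+-]?)(\d+) (\d{1,2})")


def parse_day_to_hour(text: str) -> Tuple[int, int]:
    """Return (days, hours), rejecting anything outside the assumed grammar."""
    match = _DAY_TO_HOUR.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a DAY TO HOUR string: %r" % text)
    sign = -1 if match.group(1) == "-" else 1
    days, hours = int(match.group(2)), int(match.group(3))
    if hours > 23:  # assumed bound for the hour field
        raise ValueError("hour out of range: %d" % hours)
    return sign * days, sign * hours


parsed = parse_day_to_hour("20 15")
```

Using `fullmatch` is what makes the parse strict: trailing fields such as minutes are rejected instead of being silently ignored.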
[GitHub] [spark] cloud-fan closed pull request #26563: [SPARK-29758][SQL][2.4] Fix truncation of requested string fields in `json_tuple`
cloud-fan closed pull request #26563: [SPARK-29758][SQL][2.4] Fix truncation of requested string fields in `json_tuple` URL: https://github.com/apache/spark/pull/26563 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yaooqinn commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
yaooqinn commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
URL: https://github.com/apache/spark/pull/26080#discussion_r348325143

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ##
@@ -372,12 +372,44 @@ class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeA
 assert(table.provider == Some("org.apache.spark.sql.hive.orc"))
 }
 }
+
+ test("Database Ownership") {
+val catalog = spark.sessionState.catalog
+try {
+ val dbName = "spark_29425"
+ val location = getDBPath(dbName)
+
+ sql(s"CREATE DATABASE $dbName")
+
+ checkAnswer(
+sql(s"DESCRIBE DATABASE $dbName"),
+Row("Database Name", dbName) ::
+ Row("Description", "") ::
+ Row("Location", CatalogUtils.URIToString(location)) ::
+ Row("Owner Name", Utils.getCurrentUserName()) ::
+ Row("Owner Type", "USER") :: Nil)
+
+ sql(s"ALTER DATABASE $dbName SET DBPROPERTIES ('a'='a', 'b'='b', 'c'='c')")

Review comment: Behavior Before: `create db` sets no owner, and `alter db` erases the owner if one exists, rather than replacing it with our default Spark user.

Behavior After: `create db` uses the Spark user as the default owner, or the `ownerName` from dbProps if it is set; `alter db` picks the owner in this order: the specified `ownerName`, then the original database's `ownerName`, then Spark's default if the preceding ones are null or empty.

`ALTER DATABASE dbname SET DBPROPERTIES('ownerName'='userName')` is equivalent to Hive's `ALTER DATABASE dbname SET OWNER USER userName`. I suggest that in a follow-up we make the owner name and its type members of `CatalogDatabase`, and of `CatalogTable` too, and then support:
```sql
ALTER [DATABASE|SCHEMA] dbname SET OWNER [USER|ROLE] userName
ALTER TABLE tblname SET OWNER [USER|ROLE] userName
```
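The owner precedence described for `alter db` above (specified `ownerName`, then the original database's `ownerName`, then Spark's default) can be sketched as a small lookup. `resolve_owner` is a hypothetical helper written only to illustrate that ordering; it is not code from the pull request.

```python
from typing import Mapping


def resolve_owner(
    specified: Mapping[str, str],
    original: Mapping[str, str],
    default_user: str,
) -> str:
    """Pick the owner: specified props first, then the existing database, then the default."""
    for props in (specified, original):
        owner = props.get("ownerName")
        if owner:  # skips both a missing key and an empty string
            return owner
    return default_user


explicit = resolve_owner({"ownerName": "bob"}, {"ownerName": "alice"}, "spark")
inherited = resolve_owner({"a": "a"}, {"ownerName": "alice"}, "spark")
fallback = resolve_owner({}, {"ownerName": ""}, "spark")
```

Under this ordering, an `ALTER DATABASE ... SET DBPROPERTIES` call that does not mention `ownerName` leaves the existing owner in place.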
[GitHub] [spark] ayudovin commented on issue #26349: [SPARK-27558][CORE] Gracefully cleanup task when it fails with OOM exception
ayudovin commented on issue #26349: [SPARK-27558][CORE] Gracefully cleanup task when it fails with OOM exception URL: https://github.com/apache/spark/pull/26349#issuecomment-555875526 @gatorsmile, ok, no problem, I'll do it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26591: [SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions
AmplabJenkins commented on issue #26591: [SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions URL: https://github.com/apache/spark/pull/26591#issuecomment-555875275 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26591: [SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions
AmplabJenkins removed a comment on issue #26591: [SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions URL: https://github.com/apache/spark/pull/26591#issuecomment-555875281 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18998/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
maropu commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#discussion_r348324412 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ## @@ -148,7 +147,7 @@ object ExternalCatalogUtils { } val boundPredicate = -InterpretedPredicate.create(predicates.reduce(And).transform { +Predicate.createInterpretedPredicate(predicates.reduce(And).transform { Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26591: [SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions
AmplabJenkins commented on issue #26591: [SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions URL: https://github.com/apache/spark/pull/26591#issuecomment-555875281 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18998/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26591: [SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions
AmplabJenkins removed a comment on issue #26591: [SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions URL: https://github.com/apache/spark/pull/26591#issuecomment-555875275 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics
AmplabJenkins removed a comment on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics URL: https://github.com/apache/spark/pull/26596#issuecomment-555875003 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics
AmplabJenkins commented on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics URL: https://github.com/apache/spark/pull/26596#issuecomment-555875003 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics
AmplabJenkins removed a comment on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics URL: https://github.com/apache/spark/pull/26596#issuecomment-555875010 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114133/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics
SparkQA removed a comment on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics URL: https://github.com/apache/spark/pull/26596#issuecomment-555856222 **[Test build #114133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114133/testReport)** for PR 26596 at commit [`40be7c6`](https://github.com/apache/spark/commit/40be7c6815ee4aa1604670ec870b6558f5decdea).
[GitHub] [spark] AmplabJenkins commented on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics
AmplabJenkins commented on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics URL: https://github.com/apache/spark/pull/26596#issuecomment-555875010 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114133/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics
SparkQA commented on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics URL: https://github.com/apache/spark/pull/26596#issuecomment-555874711 **[Test build #114133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114133/testReport)** for PR 26596 at commit [`40be7c6`](https://github.com/apache/spark/commit/40be7c6815ee4aa1604670ec870b6558f5decdea).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #26591: [SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions
SparkQA commented on issue #26591: [SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions URL: https://github.com/apache/spark/pull/26591#issuecomment-555874787 **[Test build #114139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114139/testReport)** for PR 26591 at commit [`6d9e427`](https://github.com/apache/spark/commit/6d9e4278ed5897a283a72efb41509479d83f1052).
[GitHub] [spark] cloud-fan commented on issue #26591: [SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions
cloud-fan commented on issue #26591: [SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions URL: https://github.com/apache/spark/pull/26591#issuecomment-555873657 retest this please
[GitHub] [spark] cloud-fan commented on a change in pull request #26473: [SPARK-29864][SPARK-29920][SQL] Strict parsing of day-time strings to intervals
cloud-fan commented on a change in pull request #26473: [SPARK-29864][SPARK-29920][SQL] Strict parsing of day-time strings to intervals
URL: https://github.com/apache/spark/pull/26473#discussion_r348322840

## File path: sql/core/src/test/resources/sql-tests/inputs/interval.sql ##
@@ -97,17 +97,24 @@ select interval 1 year 2 month 3 week 4 day 5 hour 6 minute 7 seconds 8 millisec
 select interval '30' year '25' month '-100' day '40' hour '80' minute '299.889987299' second;
 select interval '0 0:0:0.1' day to second;
 select interval '10-9' year to month;
+-- SPARK-29933: ThriftServerQueryTestSuite runs tests with wrong settings
+set spark.sql.dialect=Spark;

Review comment:
Hmm, why do we set the config to its default value?
[GitHub] [spark] cloud-fan commented on a change in pull request #26569: [SPARK-29938] [SQL] Add batching support in Alter table add partition flow
cloud-fan commented on a change in pull request #26569: [SPARK-29938] [SQL] Add batching support in Alter table add partition flow
URL: https://github.com/apache/spark/pull/26569#discussion_r348320655

## File path: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala ##
@@ -472,6 +472,21 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
     }
   }

+  test("new partitions should be added to catalog after writing to catalog table") {
+    val table = "partitioned_catalog_table"
+    val numParts = 210
+    withTable(table) {
+      case class TableRow(part: Int, col1: Int)

Review comment:
It's not used. Can we remove it?
[GitHub] [spark] cloud-fan commented on a change in pull request #26569: [SPARK-29938] [SQL] Add batching support in Alter table add partition flow
cloud-fan commented on a change in pull request #26569: [SPARK-29938] [SQL] Add batching support in Alter table add partition flow
URL: https://github.com/apache/spark/pull/26569#discussion_r348320390

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ##
@@ -470,14 +470,36 @@ case class AlterTableAddPartitionCommand(
       CatalogTablePartition(normalizedSpec, table.storage.copy(
         locationUri = location.map(CatalogUtils.stringToURI)))
     }
-    catalog.createPartitions(table.identifier, parts, ignoreIfExists = ifNotExists)
+
+    // Hive metastore may not have enough memory to handle millions of partitions in single RPC.
+    // Also the request to metastore times out when adding lot of partitions in one shot.
+    // we should split them into smaller batches
+    val batchSize = 100

Review comment:
Can we make it configurable?
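The batching idea in the hunk above, together with the request to make the batch size configurable, can be sketched in isolation. This is a minimal sketch only, not the code from the pull request: `PartitionSpec`, `BatchedPartitionAdd`, and the `createBatch` callback are hypothetical stand-ins for Spark's partition type and for a call such as `catalog.createPartitions`, and `batchSize` is assumed to come from some configuration value.

```scala
// Hypothetical stand-in for a catalog partition; not Spark's CatalogTablePartition.
final case class PartitionSpec(values: Map[String, String])

object BatchedPartitionAdd {
  // Splits `parts` into chunks of at most `batchSize` and hands each chunk to
  // `createBatch`, which stands in for one metastore RPC.
  // Returns the number of calls made.
  def addInBatches(parts: Seq[PartitionSpec], batchSize: Int)(
      createBatch: Seq[PartitionSpec] => Unit): Int = {
    require(batchSize > 0, "batchSize must be positive")
    var calls = 0
    parts.grouped(batchSize).foreach { batch =>
      createBatch(batch)
      calls += 1
    }
    calls
  }
}
```

With 210 partitions (the count used in the `InsertSuite` test under review) and a batch size of 100, the callback would be invoked three times, with 100, 100, and 10 partitions.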
[GitHub] [spark] cloud-fan commented on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab
cloud-fan commented on issue #26519: [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab URL: https://github.com/apache/spark/pull/26519#issuecomment-555867087 @LucaCanali we need to update org.apache.spark.sql.execution.metric.SQLMetricsSuite.WholeStageCodegen metrics
[GitHub] [spark] AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555866745 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555866752 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114127/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26592: [SPARK-29371][SQL] Fractional representation for interval string
AmplabJenkins removed a comment on issue #26592: [SPARK-29371][SQL] Fractional representation for interval string URL: https://github.com/apache/spark/pull/26592#issuecomment-555866683 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18997/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26592: [SPARK-29371][SQL] Fractional representation for interval string
AmplabJenkins removed a comment on issue #26592: [SPARK-29371][SQL] Fractional representation for interval string URL: https://github.com/apache/spark/pull/26592#issuecomment-555866670 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26592: [SPARK-29371][SQL] Fractional representation for interval string
AmplabJenkins commented on issue #26592: [SPARK-29371][SQL] Fractional representation for interval string URL: https://github.com/apache/spark/pull/26592#issuecomment-555866683 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18997/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555866745 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26592: [SPARK-29371][SQL] Fractional representation for interval string
AmplabJenkins commented on issue #26592: [SPARK-29371][SQL] Fractional representation for interval string URL: https://github.com/apache/spark/pull/26592#issuecomment-555866670 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555866752 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114127/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
SparkQA removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555817265 **[Test build #114127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114127/testReport)** for PR 26604 at commit [`4b1e311`](https://github.com/apache/spark/commit/4b1e3119e998e68ef3f58a82fb20440e46dbd8a0).
[GitHub] [spark] SparkQA commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
SparkQA commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555866247 **[Test build #114127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114127/testReport)** for PR 26604 at commit [`4b1e311`](https://github.com/apache/spark/commit/4b1e3119e998e68ef3f58a82fb20440e46dbd8a0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` class SpecificPredicate extends $`
  * `abstract class BasePredicate `
[GitHub] [spark] SparkQA commented on issue #26592: [SPARK-29371][SQL] Fractional representation for interval string
SparkQA commented on issue #26592: [SPARK-29371][SQL] Fractional representation for interval string URL: https://github.com/apache/spark/pull/26592#issuecomment-555866227 **[Test build #114138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114138/testReport)** for PR 26592 at commit [`5604b83`](https://github.com/apache/spark/commit/5604b8393cc9b62e16c5a6d2dc447802e0da62d6).
[GitHub] [spark] cloud-fan commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
cloud-fan commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
URL: https://github.com/apache/spark/pull/26080#discussion_r348315681

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ##
@@ -372,12 +372,44 @@ class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeA
       assert(table.provider == Some("org.apache.spark.sql.hive.orc"))
     }
   }
+
+  test("Database Ownership") {
+    val catalog = spark.sessionState.catalog
+    try {
+      val dbName = "spark_29425"
+      val location = getDBPath(dbName)
+
+      sql(s"CREATE DATABASE $dbName")
+
+      checkAnswer(
+        sql(s"DESCRIBE DATABASE $dbName"),
+        Row("Database Name", dbName) ::
+          Row("Description", "") ::
+          Row("Location", CatalogUtils.URIToString(location)) ::
+          Row("Owner Name", Utils.getCurrentUserName()) ::
+          Row("Owner Type", "USER") :: Nil)
+
+      sql(s"ALTER DATABASE $dbName SET DBPROPERTIES ('a'='a', 'b'='b', 'c'='c')")

Review comment:
Just to confirm: when a user runs the command `ALTER DATABASE SET DBPROPERTIES`, will we reset the owner to the default value if the DBPROPERTIES don't contain an owner?
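The question above turns on whether `SET DBPROPERTIES` merges into the stored properties or replaces them. The thread does not say which behaviour the pull request implements, so the following is only an illustration of the two possibilities, assuming ownership is kept as ordinary property entries. The `DbProps` object and the key names `ownerName` and `ownerType` are hypothetical and are not Spark's API.

```scala
object DbProps {
  // Hypothetical keys under which ownership might be stored.
  val OwnerName = "ownerName"
  val OwnerType = "ownerType"

  // Merge semantics: keys absent from `updates` keep their stored values,
  // so an existing owner survives an update that does not mention it.
  def merge(stored: Map[String, String], updates: Map[String, String]): Map[String, String] =
    stored ++ updates

  // Replace semantics: the stored map is discarded, so the owner falls back
  // to a default unless `updates` supplies one.
  def replace(updates: Map[String, String], defaultOwner: String): Map[String, String] =
    Map(OwnerName -> defaultOwner, OwnerType -> "USER") ++ updates
}
```

Under merge semantics, an update of `('a'='a')` leaves a stored owner untouched; under replace semantics, the same update would reset the owner to the default.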
[GitHub] [spark] cloud-fan commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
cloud-fan commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
URL: https://github.com/apache/spark/pull/26604#discussion_r348313030

## File path: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala ##
@@ -88,7 +88,7 @@ class SimpleTextSource extends TextBasedFileFormat with DataSourceRegister {
       val attribute = inputAttributes.find(_.name == column).get
       expressions.GreaterThan(attribute, literal)
     }.reduceOption(expressions.And).getOrElse(Literal(true))
-    InterpretedPredicate.create(filterCondition, inputAttributes)
+    Predicate.createInterpretedPredicate(filterCondition, inputAttributes)

Review comment:
This is a testing source, so we don't care whether it's codegen or not. It should be fine to call `Predicate.create`.
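The distinction the reviewer draws, between a factory that forces the interpreted path and a general `create` that lets the library pick, can be illustrated with a toy model. This is a sketch under stated assumptions, not Spark's implementation: `RowPredicate`, `PredicateFactory`, and the `failCompile` flag are all hypothetical, and a map of column values stands in for a row.

```scala
import scala.util.control.NonFatal

// Hypothetical predicate over a row modelled as column name -> value.
trait RowPredicate { def eval(row: Map[String, Int]): Boolean }

object PredicateFactory {
  // Always builds the interpreted variant.
  def createInterpreted(column: String, bound: Int): RowPredicate =
    new RowPredicate {
      def eval(row: Map[String, Int]): Boolean = row.get(column).exists(_ > bound)
    }

  // Stands in for a code-generated variant; `failCompile` simulates a
  // compilation failure.
  def createCompiled(column: String, bound: Int, failCompile: Boolean): RowPredicate = {
    if (failCompile) throw new IllegalStateException("compilation failed")
    new RowPredicate {
      def eval(row: Map[String, Int]): Boolean = row.contains(column) && row(column) > bound
    }
  }

  // General entry point: try the compiled path, fall back to the interpreted one.
  def create(column: String, bound: Int, failCompile: Boolean = false): RowPredicate =
    try createCompiled(column, bound, failCompile)
    catch { case NonFatal(_) => createInterpreted(column, bound) }
}
```

A caller that does not care which path is taken, such as a test source, can use `create` and get the same results either way.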
[GitHub] [spark] cloud-fan commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
cloud-fan commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
URL: https://github.com/apache/spark/pull/26604#discussion_r348312432

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ##
@@ -148,7 +147,7 @@ object ExternalCatalogUtils {
     }

     val boundPredicate =
-      InterpretedPredicate.create(predicates.reduce(And).transform {
+      Predicate.createInterpretedPredicate(predicates.reduce(And).transform {

Review comment:
nit: the doubled `predicate` looks weird; how about just `Predicate.createInterpreted`?
[GitHub] [spark] AmplabJenkins removed a comment on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
AmplabJenkins removed a comment on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#issuecomment-555860587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18996/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555860570 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18994/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26599: [SPARK-29961][SQL] Implement builtin function - typeof
AmplabJenkins commented on issue #26599: [SPARK-29961][SQL] Implement builtin function - typeof URL: https://github.com/apache/spark/pull/26599#issuecomment-555860549 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18995/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555860562 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26599: [SPARK-29961][SQL] Implement builtin function - typeof
AmplabJenkins removed a comment on issue #26599: [SPARK-29961][SQL] Implement builtin function - typeof URL: https://github.com/apache/spark/pull/26599#issuecomment-555860540 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555860562 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555860570 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18994/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26599: [SPARK-29961][SQL] Implement builtin function - typeof
AmplabJenkins commented on issue #26599: [SPARK-29961][SQL] Implement builtin function - typeof URL: https://github.com/apache/spark/pull/26599#issuecomment-555860540 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26599: [SPARK-29961][SQL] Implement builtin function - typeof
AmplabJenkins removed a comment on issue #26599: [SPARK-29961][SQL] Implement builtin function - typeof URL: https://github.com/apache/spark/pull/26599#issuecomment-555860549 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18995/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
AmplabJenkins removed a comment on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#issuecomment-555860581 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
AmplabJenkins commented on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#issuecomment-555860581 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
AmplabJenkins commented on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#issuecomment-555860587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18996/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26599: [SPARK-29961][SQL] Implement builtin function - typeof
SparkQA commented on issue #26599: [SPARK-29961][SQL] Implement builtin function - typeof URL: https://github.com/apache/spark/pull/26599#issuecomment-555860216 **[Test build #114136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114136/testReport)** for PR 26599 at commit [`7acdf39`](https://github.com/apache/spark/commit/7acdf39b7538e74e4b120290d07562766438113e).
[GitHub] [spark] SparkQA commented on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
SparkQA commented on issue #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#issuecomment-555860257 **[Test build #114137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114137/testReport)** for PR 26509 at commit [`5b0923c`](https://github.com/apache/spark/commit/5b0923cb77d47b96153ac94adac02f9cfececede).
[GitHub] [spark] SparkQA commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
SparkQA commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555860221 **[Test build #114135 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114135/testReport)** for PR 26604 at commit [`0896afe`](https://github.com/apache/spark/commit/0896afe8dd31595bd88465fc3959d35f9b692908).
[GitHub] [spark] viirya commented on a change in pull request #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
viirya commented on a change in pull request #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#discussion_r348309460 ## File path: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ## @@ -129,6 +130,36 @@ class RelationalGroupedDataset protected[sql]( (inputExpr: Expression) => exprToFunc(inputExpr) } + /** + * Returns a `KeyValueGroupedDataset` where the data is grouped by the grouping expressions + * of current `RelationalGroupedDataset`. + * + * @since 3.0.0 + */ + def as[K: Encoder, T: Encoder]: KeyValueGroupedDataset[K, T] = { +val keyEncoder = encoderFor[K] +val valueEncoder = encoderFor[T] + +// Resolves grouping expressions. +val dummyPlan = Project(groupingExprs.map(alias), LocalRelation(df.logicalPlan.output)) +val analyzedPlan = SimpleAnalyzer.execute(dummyPlan).asInstanceOf[Project] Review comment: Got it.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555858682 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18993/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555858675 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #26593: [SPARK-29890][SQL] DataFrameNaFunctions.fill should handle duplicate columns
cloud-fan commented on a change in pull request #26593: [SPARK-29890][SQL] DataFrameNaFunctions.fill should handle duplicate columns URL: https://github.com/apache/spark/pull/26593#discussion_r348308559 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ## @@ -468,12 +477,21 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { s"Unsupported value type ${v.getClass.getName} ($v).") } + private def toAttributes(cols: Seq[String]): Seq[Attribute] = { +cols.map(df.col(_).named.toAttribute) Review comment: We should be more strict:
```
cols.map(resolve).map {
  case a: Attribute => a
  case _ => fail(is not a top-level column)
}
```
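The stricter resolution suggested in the comment above can be sketched without Spark. This is only an illustration under stated assumptions: `Expr`, `Attribute`, `GetField`, `resolve`, and `toAttributes` below are hypothetical stand-ins, not Catalyst's real classes or the code that was eventually merged. The sketch resolves each name first and then accepts only results that are top-level attributes.

```scala
object StrictResolveSketch {
  // Hypothetical stand-ins for Catalyst expressions; not Spark's real classes.
  sealed trait Expr
  final case class Attribute(name: String) extends Expr
  final case class GetField(parent: Attribute, field: String) extends Expr

  // Toy resolver (an assumption for this sketch): a dotted name resolves to a
  // nested field, a plain name to a top-level attribute.
  def resolve(schema: Set[String])(col: String): Expr = {
    val parts = col.split('.')
    require(schema.contains(parts.head), s"Cannot resolve column $col")
    if (parts.length == 1) Attribute(parts.head)
    else GetField(Attribute(parts.head), parts.tail.mkString("."))
  }

  // Accept only top-level attributes and reject everything else explicitly,
  // mirroring the `case a: Attribute => a; case _ => fail(...)` shape.
  def toAttributes(schema: Set[String], cols: Seq[String]): Seq[Attribute] =
    cols.map(resolve(schema)).map {
      case a: Attribute => a
      case other =>
        throw new IllegalArgumentException(s"$other is not a top-level column")
    }
}
```

With this shape, a nested reference such as `a.x` fails loudly instead of being silently coerced into an attribute.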
[GitHub] [spark] AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555858675 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555858682 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18993/ Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #26593: [SPARK-29890][SQL] DataFrameNaFunctions.fill should handle duplicate columns
cloud-fan commented on a change in pull request #26593: [SPARK-29890][SQL] DataFrameNaFunctions.fill should handle duplicate columns URL: https://github.com/apache/spark/pull/26593#discussion_r348307899 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ## @@ -468,12 +477,21 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { s"Unsupported value type ${v.getClass.getName} ($v).") } + private def toAttributes(cols: Seq[String]): Seq[Attribute] = { +cols.map(df.col(_).named.toAttribute) + } + + private def outputAttributes: Seq[Attribute] = { +df.queryExecution.analyzed.output + } + /** * Returns a new `DataFrame` that replaces null or NaN values in specified * numeric, string columns. If a specified column is not a numeric, string - * or boolean column it is ignored. + * or boolean column it is ignored. If `cols` is empty, fill() is applied to + * all the eligible columns. Review comment: Really? Looking at the code, if `cols` is empty, it seems we do nothing.
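The behavior the reviewer describes (an empty `cols` list leaving every column untouched) can be illustrated with a toy model. This is not the `DataFrameNaFunctions` implementation; `FillSketch.fill` and its row representation are assumptions made for the sketch, showing only why filtering by an empty column list is a no-op.

```scala
object FillSketch {
  // Hypothetical row model: column name -> optional value (None means null).
  type Row = Map[String, Option[Int]]

  // Replace missing values only in the columns named in `cols`.
  // When `cols` is empty no column matches, so the row is returned unchanged.
  def fill(row: Row, value: Int, cols: Seq[String]): Row =
    row.map { case (name, v) =>
      if (cols.contains(name)) name -> Some(v.getOrElse(value))
      else name -> v
    }
}
```

Under this model, documenting "applied to all the eligible columns" for empty `cols` would be wrong; the caller would have to pass every column name explicitly to get that effect.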
[GitHub] [spark] cloud-fan commented on a change in pull request #26593: [SPARK-29890][SQL] DataFrameNaFunctions.fill should handle duplicate columns
cloud-fan commented on a change in pull request #26593: [SPARK-29890][SQL] DataFrameNaFunctions.fill should handle duplicate columns URL: https://github.com/apache/spark/pull/26593#discussion_r348307105 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ## @@ -349,7 +349,7 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { * * // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "firstname" and "lastname". * df.na.replace("firstname" :: "lastname" :: Nil, Map("UNKNOWN" -> "unnamed")); - * }}} + * }}}outputAttributes Review comment: Is this a mistake?
[GitHub] [spark] AmplabJenkins removed a comment on issue #26595: [SPARK-29956][SQL] A literal number with an exponent should be parsed to Double
AmplabJenkins removed a comment on issue #26595: [SPARK-29956][SQL] A literal number with an exponent should be parsed to Double URL: https://github.com/apache/spark/pull/26595#issuecomment-555856650 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555856633 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18990/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26595: [SPARK-29956][SQL] A literal number with an exponent should be parsed to Double
AmplabJenkins removed a comment on issue #26595: [SPARK-29956][SQL] A literal number with an exponent should be parsed to Double URL: https://github.com/apache/spark/pull/26595#issuecomment-555856656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18992/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics
AmplabJenkins removed a comment on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics URL: https://github.com/apache/spark/pull/26596#issuecomment-555856617 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18991/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics
AmplabJenkins removed a comment on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics URL: https://github.com/apache/spark/pull/26596#issuecomment-555856611 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins removed a comment on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555856626 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26595: [SPARK-29956][SQL] A literal number with an exponent should be parsed to Double
AmplabJenkins commented on issue #26595: [SPARK-29956][SQL] A literal number with an exponent should be parsed to Double URL: https://github.com/apache/spark/pull/26595#issuecomment-555856650 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26595: [SPARK-29956][SQL] A literal number with an exponent should be parsed to Double
AmplabJenkins commented on issue #26595: [SPARK-29956][SQL] A literal number with an exponent should be parsed to Double URL: https://github.com/apache/spark/pull/26595#issuecomment-555856656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18992/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics
AmplabJenkins commented on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics URL: https://github.com/apache/spark/pull/26596#issuecomment-555856617 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18991/ Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
cloud-fan commented on a change in pull request #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#discussion_r348306656 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ## @@ -2221,4 +,62 @@ class DataFrameSuite extends QueryTest with SharedSparkSession { val idTuples = sampled.collect().map(row => row.getLong(0) -> row.getLong(1)) assert(idTuples.length == idTuples.toSet.size) } + + test("groupBy.keyAs") { Review comment: The name `keyAs` needs to be updated now.
[GitHub] [spark] AmplabJenkins commented on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics
AmplabJenkins commented on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics URL: https://github.com/apache/spark/pull/26596#issuecomment-555856611 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555856626 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
AmplabJenkins commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555856633 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18990/ Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset
cloud-fan commented on a change in pull request #26509: [SPARK-29427][SQL] Add API to convert RelationalGroupedDataset to KeyValueGroupedDataset URL: https://github.com/apache/spark/pull/26509#discussion_r348306336 ## File path: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ## @@ -129,6 +130,36 @@ class RelationalGroupedDataset protected[sql]( (inputExpr: Expression) => exprToFunc(inputExpr) } + /** + * Returns a `KeyValueGroupedDataset` where the data is grouped by the grouping expressions + * of current `RelationalGroupedDataset`. + * + * @since 3.0.0 + */ + def as[K: Encoder, T: Encoder]: KeyValueGroupedDataset[K, T] = { +val keyEncoder = encoderFor[K] +val valueEncoder = encoderFor[T] + +// Resolves grouping expressions. +val dummyPlan = Project(groupingExprs.map(alias), LocalRelation(df.logicalPlan.output)) +val analyzedPlan = SimpleAnalyzer.execute(dummyPlan).asInstanceOf[Project] Review comment: We can use `df.sparkSession.sessionState.analyzer` instead of a fake one.
[GitHub] [spark] SparkQA commented on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics
SparkQA commented on issue #26596: [SPARK-29959][ML][PYSPARK] Summarizer support more metrics URL: https://github.com/apache/spark/pull/26596#issuecomment-555856222 **[Test build #114133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114133/testReport)** for PR 26596 at commit [`40be7c6`](https://github.com/apache/spark/commit/40be7c6815ee4aa1604670ec870b6558f5decdea).
[GitHub] [spark] SparkQA commented on issue #26595: [SPARK-29956][SQL] A literal number with an exponent should be parsed to Double
SparkQA commented on issue #26595: [SPARK-29956][SQL] A literal number with an exponent should be parsed to Double URL: https://github.com/apache/spark/pull/26595#issuecomment-555856223 **[Test build #114134 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114134/testReport)** for PR 26595 at commit [`24d19de`](https://github.com/apache/spark/commit/24d19debab91014e0524ae7bef3db43ef583f1d3).
[GitHub] [spark] SparkQA commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
SparkQA commented on issue #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#issuecomment-555856219 **[Test build #114132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114132/testReport)** for PR 26604 at commit [`bed96f4`](https://github.com/apache/spark/commit/bed96f4b8be74f697eca0571bce24b8d074a7ee3).
[GitHub] [spark] cloud-fan commented on issue #26599: [SPARK-29961][SQL] Implement builtin function - typeof
cloud-fan commented on issue #26599: [SPARK-29961][SQL] Implement builtin function - typeof URL: https://github.com/apache/spark/pull/26599#issuecomment-555855696 We've marked UDT as an internal API, so I don't think we need to support it in the end-user function.
[GitHub] [spark] cloud-fan commented on a change in pull request #26599: [SPARK-29961][SQL] Implement builtin function - typeof
cloud-fan commented on a change in pull request #26599: [SPARK-29961][SQL] Implement builtin function - typeof URL: https://github.com/apache/spark/pull/26599#discussion_r348305688 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ## @@ -177,3 +177,22 @@ case class Version() extends LeafExpression with CodegenFallback { UTF8String.fromString(SPARK_VERSION_SHORT + " " + SPARK_REVISION) } } + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """_FUNC_(expr) - Return readable string representation for the data type of the input.""", + examples = """ + Examples: + > SELECT _FUNC_(1); + int + > SELECT _FUNC_(array(1)); + array + """, + since = "3.0.0") +// scalastyle:on line.size.limit +case class TypeOf(child: Expression) extends UnaryExpression with CodegenFallback { Review comment: This is foldable, so Spark should be able to turn it into a literal before entering codegen.
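The point about foldability can be illustrated with a small, Spark-free model. The classes below (`Expression`, `Literal`, `ColumnRef`, `TypeOf`, `fold`) are hypothetical simplifications, not Catalyst's real definitions. The assumption is that the result of `typeof` depends only on the child's static data type, so an optimizer pass can replace the call with a literal before any code generation happens.

```scala
object ConstantFoldingSketch {
  // Hypothetical mini expression tree; not Spark's Catalyst classes.
  sealed trait Expression {
    def dataTypeName: String
    def foldable: Boolean
  }
  final case class Literal(value: Any, dataTypeName: String) extends Expression {
    val foldable = true
  }
  final case class ColumnRef(name: String, dataTypeName: String) extends Expression {
    val foldable = false
  }
  // Depends only on the child's static type, never on row data, so it is
  // treated as foldable even when the child itself is a column reference.
  final case class TypeOf(child: Expression) extends Expression {
    val dataTypeName = "string"
    val foldable = true
    def eval(): String = child.dataTypeName
  }

  // Toy folding rule: replace a foldable TypeOf node with a literal.
  def fold(e: Expression): Expression = e match {
    case t: TypeOf if t.foldable => Literal(t.eval(), t.dataTypeName)
    case other => other
  }
}
```

Once the node has been replaced by a literal, whether it supports code generation no longer matters for the final plan, which appears to be the reasoning behind the comment.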
[GitHub] [spark] maropu commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
maropu commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#discussion_r348305635 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ## @@ -56,6 +64,33 @@ trait Predicate extends Expression { override def dataType: DataType = BooleanType } +/** + * The factory object for `BasePredicate`. + */ +object Predicate extends CodeGeneratorWithInterpretedFallback[Expression, BasePredicate] { + + override protected def createCodeGeneratedObject(in: Expression): BasePredicate = { +GeneratePredicate.generate(in) + } + + override protected def createInterpretedObject(in: Expression): BasePredicate = { +InterpretedPredicate(in) + } + + /** + * Returns a BasePredicate for a bound Expression. + */ + def create(expr: Expression): BasePredicate = { +create(expr) + } + Review comment: Yea. ok.
[GitHub] [spark] cloud-fan commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
cloud-fan commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#discussion_r348305367 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ## @@ -56,6 +64,33 @@ trait Predicate extends Expression { override def dataType: DataType = BooleanType } +/** + * The factory object for `BasePredicate`. + */ +object Predicate extends CodeGeneratorWithInterpretedFallback[Expression, BasePredicate] { + + override protected def createCodeGeneratedObject(in: Expression): BasePredicate = { +GeneratePredicate.generate(in) + } + + override protected def createInterpretedObject(in: Expression): BasePredicate = { +InterpretedPredicate(in) + } + + /** + * Returns a BasePredicate for a bound Expression. + */ + def create(expr: Expression): BasePredicate = { +create(expr) + } + Review comment: We can add a method `def createInterpreted...` for places that want to use interpreted predicates.
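The suggestion above could look roughly like the following self-contained sketch. It does not use Spark's classes: `RowPredicate`, `InterpretedRowPredicate`, `CompiledRowPredicate`, `RowPredicateFactory`, and the `failCompile` flag are all hypothetical stand-ins, and the "compiler" here only simulates a code-generation step. The sketch assumes the default `create` path tries the compiled variant and falls back to the interpreted one on failure, while `createInterpreted` lets a caller skip compilation on purpose.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for a predicate over a row; NOT Spark's BasePredicate.
trait RowPredicate {
  def eval(row: Map[String, Int]): Boolean
}

// Interpreted variant: no up-front compilation cost.
final class InterpretedRowPredicate(f: Map[String, Int] => Boolean) extends RowPredicate {
  def eval(row: Map[String, Int]): Boolean = f(row)
}

// Stand-in for a code-generated variant; real codegen would compile source here.
final class CompiledRowPredicate(f: Map[String, Int] => Boolean) extends RowPredicate {
  def eval(row: Map[String, Int]): Boolean = f(row)
}

object RowPredicateFactory {
  // Simulated compiler: `failCompile` is an illustration-only switch.
  private def compile(f: Map[String, Int] => Boolean, failCompile: Boolean): RowPredicate = {
    if (failCompile) throw new IllegalStateException("simulated codegen failure")
    new CompiledRowPredicate(f)
  }

  // Default path: try the compiled predicate, fall back to the interpreted one.
  def create(f: Map[String, Int] => Boolean, failCompile: Boolean = false): RowPredicate =
    try compile(f, failCompile)
    catch { case NonFatal(_) => createInterpreted(f) }

  // Explicit entry point for callers that always want the interpreted predicate.
  def createInterpreted(f: Map[String, Int] => Boolean): RowPredicate =
    new InterpretedRowPredicate(f)
}
```

A caller that evaluates the predicate only a handful of times could call `RowPredicateFactory.createInterpreted(row => row("part") > 1)` directly, so the choice between the two paths stays visible at the call site.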
[GitHub] [spark] maropu commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
maropu commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#discussion_r348305310 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -1507,7 +1507,7 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] { case Filter(condition, LocalRelation(output, data, isStreaming)) if !hasUnevaluableExpr(condition) => - val predicate = InterpretedPredicate.create(condition, output) + val predicate = Predicate.create(condition, output) Review comment: ok
[GitHub] [spark] maropu commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
maropu commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#discussion_r348305290 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala ## @@ -171,7 +171,7 @@ abstract class PartitioningAwareFileIndex( if (partitionPruningPredicates.nonEmpty) { val predicate = partitionPruningPredicates.reduce(expressions.And) - val boundPredicate = InterpretedPredicate.create(predicate.transform { + val boundPredicate = Predicate.create(predicate.transform { Review comment: ok
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26605: Updated JavaSparkPi
HyukjinKwon commented on a change in pull request #26605: Updated JavaSparkPi URL: https://github.com/apache/spark/pull/26605#discussion_r348305082 ## File path: examples/src/main/java/org/apache/spark/examples/JavaSparkPi.java ## @@ -34,6 +34,7 @@ public static void main(String[] args) throws Exception { SparkSession spark = SparkSession .builder() .appName("JavaSparkPi") + .config("spark.master", "local") Review comment: This example is supposed to run with, for example, `./bin/run-example JavaSparkPi 10`. I think it's fine without the master.
[GitHub] [spark] cloud-fan commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
cloud-fan commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#discussion_r348304894 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -1507,7 +1507,7 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] { case Filter(condition, LocalRelation(output, data, isStreaming)) if !hasUnevaluableExpr(condition) => - val predicate = InterpretedPredicate.create(condition, output) + val predicate = Predicate.create(condition, output) Review comment: This is to optimize a local relation, so perf doesn't matter too much. The change should be fine.
[GitHub] [spark] cloud-fan commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan
cloud-fan commented on a change in pull request #26604: [SPARK-29968][SQL] Remove the Predicate code from SparkPlan URL: https://github.com/apache/spark/pull/26604#discussion_r348305020 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala ## @@ -171,7 +171,7 @@ abstract class PartitioningAwareFileIndex( if (partitionPruningPredicates.nonEmpty) { val predicate = partitionPruningPredicates.reduce(expressions.And) - val boundPredicate = InterpretedPredicate.create(predicate.transform { + val boundPredicate = Predicate.create(predicate.transform { Review comment: To be safe, I think we should keep using the interpreted version, in case there are only a few partitions.
[GitHub] [spark] jobitmathew84 commented on issue #26546: [SPARK-29913][SQL] Improve Exception in postgreCastToBoolean
jobitmathew84 commented on issue #26546: [SPARK-29913][SQL] Improve Exception in postgreCastToBoolean URL: https://github.com/apache/spark/pull/26546#issuecomment-555854221 @cloud-fan or @maropu, can you please update the SPARK-29913 JIRA status as well? Thanks.