[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71474127

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -316,27 +340,25 @@ object CreateDataSourceTableUtils extends Logging {
    tableProperties.put(DATASOURCE_PROVIDER, provider)

    // Saves optional user specified schema. Serialized JSON schema string may be too long to be
--- End diff --

I think this comment is not correct anymore?

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14278: [SPARK-16632][SQL] Use Spark requested schema to guide v...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14278

**[Test build #62583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62583/consoleFull)** for PR 14278 at commit [`2ade381`](https://github.com/apache/spark/commit/2ade381403080d1390a34b44366ade05f42f6d4f).
[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14086

**[Test build #62582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62582/consoleFull)** for PR 14086 at commit [`98c81c7`](https://github.com/apache/spark/commit/98c81c7bc14f8514a96a0e63f89cd98da25d43f0).
[GitHub] spark pull request #14278: [SPARK-16632][SQL] Use Spark requested schema to ...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/14278

[SPARK-16632][SQL] Use Spark requested schema to guide vectorized Parquet reader initialization

## What changes were proposed in this pull request?

In `SpecificParquetRecordReaderBase`, which is used by the vectorized Parquet reader, we convert the Parquet requested schema into a Spark schema to guide column reader initialization. However, the Parquet requested schema is tailored from the schema of the physical file being scanned, and may have inaccurate type information due to bugs in other systems (e.g. HIVE-14294). On the other hand, we already set the real Spark requested schema into the Hadoop configuration in [`ParquetFileFormat`][1]. This PR simply reads out that schema to replace the converted one.

## How was this patch tested?

New test case added in `ParquetQuerySuite`.

[1]: https://github.com/apache/spark/blob/v2.0.0-rc5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L292-L294

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-16632-simpler-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14278.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14278

commit 2ade381403080d1390a34b44366ade05f42f6d4f
Author: Cheng Lian
Date: 2016-07-20T06:31:10Z

    Fixes SPARK-16632
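The idea of the fix can be sketched in a few lines of plain Scala. This is a hedged illustration, not the real reader code: the configuration key and the use of `String` for schemas are stand-ins for the actual Hadoop `Configuration` entry and Spark's `StructType`.

```scala
// Prefer the schema Spark itself requested (serialized into the job
// configuration by ParquetFileFormat) over a schema converted from the
// physical file, which may carry wrong type information (e.g. HIVE-14294).
// The key name here is illustrative, not the real constant.
def resolveReadSchema(conf: Map[String, String], convertedFromFile: String): String =
  conf.getOrElse("spark.row.requested.schema", convertedFromFile)
```

If the requested schema was never set, the sketch falls back to the file-derived schema, matching the pre-fix behavior.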
[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14086

The descriptions of the PR and code are updated.
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71472687

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
      }

      if (mode == SaveMode.Overwrite && tableExists) {
-       JdbcUtils.dropTable(conn, table)
-       tableExists = false
+       if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+           JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+         JdbcUtils.truncateTable(conn, table)
--- End diff --

The current exception message is "Column `seq` not found".
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71472473

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -95,17 +95,41 @@ case class CreateDataSourceTableCommand(
    }

    // Create the relation to validate the arguments before writing the metadata to the metastore.
-   DataSource(
-     sparkSession = sparkSession,
-     userSpecifiedSchema = userSpecifiedSchema,
-     className = provider,
-     bucketSpec = None,
-     options = optionsWithPath).resolveRelation(checkPathExist = false)
+   val dataSource: BaseRelation =
+     DataSource(
+       sparkSession = sparkSession,
+       userSpecifiedSchema = userSpecifiedSchema,
+       className = provider,
+       bucketSpec = None,
+       options = optionsWithPath).resolveRelation(checkPathExist = false)
+
+   val partitionColumns =
--- End diff --

Sure, will do it. Thanks!
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132

Merged build finished. Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62577/
Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132

**[Test build #62577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62577/consoleFull)** for PR 14132 at commit [`e8b7bf0`](https://github.com/apache/spark/commit/e8b7bf0f3d88986048cd586ccc13209ee1611cd7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71472087

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
      }

      if (mode == SaveMode.Overwrite && tableExists) {
-       JdbcUtils.dropTable(conn, table)
-       tableExists = false
+       if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+           JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+         JdbcUtils.truncateTable(conn, table)
--- End diff --

Nope, dropping the index does not make sense here.
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71471996

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
      }

      if (mode == SaveMode.Overwrite && tableExists) {
-       JdbcUtils.dropTable(conn, table)
-       tableExists = false
+       if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+           JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+         JdbcUtils.truncateTable(conn, table)
--- End diff --

- Drop, Create and Insert: Create and Insert could fail, but we still drop the table.
- Truncate and Insert: Insert could fail, but we always truncate the table.

I think it is OK to raise an exception here, but check whether the exception message is meaningful or not.
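The decision being reviewed here can be sketched in plain Scala. This is a hedged simplification: the option name and `isCascadingTruncateTable` mirror the diff, but the JDBC calls are replaced by descriptive strings, and the real method works against a live connection.

```scala
// Truncate only when the user opted in via the "truncate" option AND the
// dialect is known not to cascade truncation to other tables; otherwise
// fall back to the original drop-and-recreate path.
def overwriteAction(extraOptions: Map[String, String],
                    isCascadingTruncateTable: Option[Boolean]): String = {
  val useTruncate =
    extraOptions.getOrElse("truncate", "false").toBoolean &&
      isCascadingTruncateTable.contains(false)
  if (useTruncate) "truncate" // keeps schema and indexes; a later insert may still fail
  else "drop"                 // table is recreated from the DataFrame schema
}
```

Note that when the dialect's cascading behavior is unknown (`None`), the sketch conservatively drops, matching the `== Some(false)` guard in the diff.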
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71471942

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -252,6 +252,165 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
    }
  }

+ test("Create partitioned data source table with partitioning columns but no schema") {
+   import testImplicits._
+
+   withTempPath { dir =>
+     val pathToPartitionedTable = new File(dir, "partitioned")
+     val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+     df.write.format("parquet").partitionBy("num").save(pathToPartitionedTable.getCanonicalPath)
+     val tabName = "tab1"
+     withTable(tabName) {
+       spark.sql(
+         s"""
+            |CREATE TABLE $tabName
+            |USING parquet
+            |OPTIONS (
+            |  path '$pathToPartitionedTable'
+            |)
+            |PARTITIONED BY (inexistentColumns)
+          """.stripMargin)
+       val tableMetadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName))
--- End diff --

Sure, will do.
[GitHub] spark pull request #14240: [SPARK-16594] [SQL] Remove Physical Plan Differen...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14240#discussion_r71471878

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala ---
@@ -114,16 +114,15 @@ class PrunedScanSuite extends DataSourceTest with SharedSQLContext {
  testPruning("SELECT * FROM oneToTenPruned", "a", "b")
  testPruning("SELECT a, b FROM oneToTenPruned", "a", "b")
  testPruning("SELECT b, a FROM oneToTenPruned", "b", "a")
- testPruning("SELECT b, b FROM oneToTenPruned", "b")
+ testPruning("SELECT b, b FROM oneToTenPruned", "b", "b")
+ testPruning("SELECT b as alias_b, b FROM oneToTenPruned", "b")
--- End diff --

Yeah!
[GitHub] spark pull request #14240: [SPARK-16594] [SQL] Remove Physical Plan Differen...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14240#discussion_r71471414

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala ---
@@ -114,16 +114,15 @@ class PrunedScanSuite extends DataSourceTest with SharedSQLContext {
  testPruning("SELECT * FROM oneToTenPruned", "a", "b")
  testPruning("SELECT a, b FROM oneToTenPruned", "a", "b")
  testPruning("SELECT b, a FROM oneToTenPruned", "b", "a")
- testPruning("SELECT b, b FROM oneToTenPruned", "b")
+ testPruning("SELECT b, b FROM oneToTenPruned", "b", "b")
+ testPruning("SELECT b as alias_b, b FROM oneToTenPruned", "b")
--- End diff --

so `SELECT b, b FROM oneToTenPruned` will return 2 columns and `SELECT b as alias_b, b FROM oneToTenPruned` only returns one column?
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71471142

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
      }

      if (mode == SaveMode.Overwrite && tableExists) {
-       JdbcUtils.dropTable(conn, table)
-       tableExists = false
+       if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+           JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+         JdbcUtils.truncateTable(conn, table)
--- End diff --

Sure. I'll update the document and PR description more clearly. Thank you for the guidance, @rxin and @gatorsmile.
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71471136

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -95,17 +95,41 @@ case class CreateDataSourceTableCommand(
    }

    // Create the relation to validate the arguments before writing the metadata to the metastore.
-   DataSource(
-     sparkSession = sparkSession,
-     userSpecifiedSchema = userSpecifiedSchema,
-     className = provider,
-     bucketSpec = None,
-     options = optionsWithPath).resolveRelation(checkPathExist = false)
+   val dataSource: BaseRelation =
+     DataSource(
+       sparkSession = sparkSession,
+       userSpecifiedSchema = userSpecifiedSchema,
+       className = provider,
+       bucketSpec = None,
+       options = optionsWithPath).resolveRelation(checkPathExist = false)
+
+   val partitionColumns =
--- End diff --

IIUC, the logic should be: if the schema is specified, use the given partition columns; else, infer it. Maybe it's clearer to write:

```
val partitionColumns = if (userSpecifiedSchema.isEmpty) {
  if (userSpecifiedPartitionColumns.length > 0) {
    ...
  }
  dataSource match {
    ...
  }
} else {
  userSpecifiedPartitionColumns
}
```
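The restructuring suggested in this comment can be fleshed out as a self-contained sketch. This is a hedged stand-in: plain types replace Spark's `DataSource`/`BaseRelation`, and `inferPartitionColumns` is a hypothetical function representing "resolve the relation and infer from it".

```scala
// If the user gave a schema, trust the user-given partition columns;
// otherwise infer them from the data source (the review's suggested shape).
def choosePartitionColumns(
    userSpecifiedSchema: Option[String],
    userSpecifiedPartitionColumns: Seq[String],
    inferPartitionColumns: () => Seq[String]): Seq[String] = {
  if (userSpecifiedSchema.isEmpty) {
    if (userSpecifiedPartitionColumns.nonEmpty) {
      // Without a user-given schema the specified columns cannot be
      // validated; the discussion leaves open whether to warn or fail here.
    }
    inferPartitionColumns()
  } else {
    userSpecifiedPartitionColumns
  }
}
```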
[GitHub] spark issue #14098: [WIP][SPARK-16380][SQL][Example]:Update SQL examples and...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/14098

@liancheng Sorry for the late reply; I was on vacation the last few days. I have addressed most of your comments; only the .md file is not updated yet. By the way, I am trying to make the Hive example work, but I still cannot get it working. Any suggestions? I found that the PySpark SQL example is different from the corresponding Scala Hive example. Thanks! Miao
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71471006

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
      }

      if (mode == SaveMode.Overwrite && tableExists) {
-       JdbcUtils.dropTable(conn, table)
-       tableExists = false
+       if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+           JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+         JdbcUtils.truncateTable(conn, table)
--- End diff --

For my understanding, I will ask one question. Literally, we should not do whatever we do with drop, e.g., we should not drop the INDEX, right?
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71470908

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -522,31 +522,31 @@ object DDLUtils {
    table.partitionColumns.nonEmpty || table.properties.contains(DATASOURCE_SCHEMA_NUMPARTCOLS)
  }

- // A persisted data source table may not store its schema in the catalog. In this case, its schema
- // will be inferred at runtime when the table is referenced.
- def getSchemaFromTableProperties(metadata: CatalogTable): Option[StructType] = {
+ // A persisted data source table always store its schema in the catalog.
+ def getSchemaFromTableProperties(metadata: CatalogTable): StructType = {
    require(isDatasourceTable(metadata))
+   val msgSchemaCorrupted = "Could not read schema from the metastore because it is corrupted."
    val props = metadata.properties
    if (props.isDefinedAt(DATASOURCE_SCHEMA)) {
--- End diff --

Sure, let me change it. Thanks!
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71470873

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
      }

      if (mode == SaveMode.Overwrite && tableExists) {
-       JdbcUtils.dropTable(conn, table)
-       tableExists = false
+       if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+           JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+         JdbcUtils.truncateTable(conn, table)
--- End diff --

Currently, it raises exceptions if one of the column names is different. For a different column type with the same column name, it works like a `SaveMode.Append` operation. I thought about the trade-off between DROP and TRUNCATE. Let me think about the decision point.
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71470834

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
      }

      if (mode == SaveMode.Overwrite && tableExists) {
-       JdbcUtils.dropTable(conn, table)
-       tableExists = false
+       if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+           JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+         JdbcUtils.truncateTable(conn, table)
--- End diff --

I see. Then, the current implementation looks good to me. @dongjoon-hyun Could you summarize the previous discussion and the design decision we made? Document them in the PR description. Thanks!
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71470616

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -252,6 +252,165 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
    }
  }

+ test("Create partitioned data source table with partitioning columns but no schema") {
+   import testImplicits._
+
+   withTempPath { dir =>
+     val pathToPartitionedTable = new File(dir, "partitioned")
+     val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+     df.write.format("parquet").partitionBy("num").save(pathToPartitionedTable.getCanonicalPath)
+     val tabName = "tab1"
+     withTable(tabName) {
+       spark.sql(
+         s"""
+            |CREATE TABLE $tabName
+            |USING parquet
+            |OPTIONS (
+            |  path '$pathToPartitionedTable'
+            |)
+            |PARTITIONED BY (inexistentColumns)
+          """.stripMargin)
+       val tableMetadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName))
--- End diff --

we can abstract common logic into some methods, to remove duplicated code a bit.
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71470476

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
      }

      if (mode == SaveMode.Overwrite && tableExists) {
-       JdbcUtils.dropTable(conn, table)
-       tableExists = false
+       if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+           JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+         JdbcUtils.truncateTable(conn, table)
--- End diff --

First of all, it will raise exceptions if one of the column names is different. For a different column type with the same column name, it will work like a `SaveMode.Append` operation.
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71470367

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
      }

      if (mode == SaveMode.Overwrite && tableExists) {
-       JdbcUtils.dropTable(conn, table)
-       tableExists = false
+       if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+           JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+         JdbcUtils.truncateTable(conn, table)
--- End diff --

We should do whatever we do with drop.
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71470370

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -522,31 +522,31 @@ object DDLUtils {
     table.partitionColumns.nonEmpty ||
       table.properties.contains(DATASOURCE_SCHEMA_NUMPARTCOLS)
   }

-  // A persisted data source table may not store its schema in the catalog. In this case, its schema
-  // will be inferred at runtime when the table is referenced.
-  def getSchemaFromTableProperties(metadata: CatalogTable): Option[StructType] = {
+  // A persisted data source table always store its schema in the catalog.
+  def getSchemaFromTableProperties(metadata: CatalogTable): StructType = {
     require(isDatasourceTable(metadata))
+    val msgSchemaCorrupted = "Could not read schema from the metastore because it is corrupted."
     val props = metadata.properties
     if (props.isDefinedAt(DATASOURCE_SCHEMA)) {
--- End diff --

how about

    props.get(DATASOURCE_SCHEMA).map { schema =>
      // DataType.fromJson(schema).asInstanceOf[StructType]
    }.getOrElse {
      props.get(DATASOURCE_SCHEMA_NUMPARTS).map {
        ...
      }.getOrElse(throw ...)
    }
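The shape suggested above could be fleshed out roughly as below. This is a sketch, not the actual Spark implementation: `DataType.fromJson`, the property keys, and the error message come from the diff, while the part-reassembly loop and the `DATASOURCE_SCHEMA_PART_PREFIX` constant name are assumptions.

    // Sketch of the suggested refactoring (hypothetical details flagged inline).
    def getSchemaFromTableProperties(metadata: CatalogTable): StructType = {
      val props = metadata.properties
      val msgSchemaCorrupted =
        "Could not read schema from the metastore because it is corrupted."
      props.get(DATASOURCE_SCHEMA).map { schema =>
        // The whole schema fit into a single table property as one JSON string.
        DataType.fromJson(schema).asInstanceOf[StructType]
      }.getOrElse {
        props.get(DATASOURCE_SCHEMA_NUMPARTS).map { numParts =>
          // The schema was split into numbered parts (to stay under the
          // metastore's value-length limit); reassemble them in order.
          // DATASOURCE_SCHEMA_PART_PREFIX is an assumed constant name.
          val parts = (0 until numParts.toInt).map { i =>
            props.getOrElse(s"$DATASOURCE_SCHEMA_PART_PREFIX$i",
              throw new AnalysisException(msgSchemaCorrupted))
          }
          DataType.fromJson(parts.mkString).asInstanceOf[StructType]
        }.getOrElse(throw new AnalysisException(msgSchemaCorrupted))
      }
    }

The `map`/`getOrElse` chain replaces the nested `if (props.isDefinedAt(...))` checks with one expression per fallback level, which is what the review comment is proposing.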
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71469619

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     }

     if (mode == SaveMode.Overwrite && tableExists) {
-      JdbcUtils.dropTable(conn, table)
-      tableExists = false
+      if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+          JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+        JdbcUtils.truncateTable(conn, table)
--- End diff --

: ) Sure. Then, the next design question to @rxin and @srowen: should we still truncate the table if the existing table's schema does not match the schema of the new table?
[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14045 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62576/
[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14045 Merged build finished. Test PASSed.
[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14272 Yeah, I think the fix is pretty safe. After discussing with @liancheng, it seems the more general fix is to just use the requested Catalyst schema to initialize the vectorized reader.
[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14045 **[Test build #62576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62576/consoleFull)** for PR 14045 at commit [`cc35cab`](https://github.com/apache/spark/commit/cc35cabac105b3778c26afc22ac4f4ca1b295585).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14272 Discussed with @yhuai, I'm also merging this to branch-2.0. @vanzin Thanks for fixing this!
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71468781

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     }

     if (mode == SaveMode.Overwrite && tableExists) {
-      JdbcUtils.dropTable(conn, table)
-      tableExists = false
+      if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+          JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+        JdbcUtils.truncateTable(conn, table)
--- End diff --

I'd say no, because the user has explicitly specified truncate. They can turn it off themselves.
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71468419

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     }

     if (mode == SaveMode.Overwrite && tableExists) {
-      JdbcUtils.dropTable(conn, table)
-      tableExists = false
+      if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+          JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+        JdbcUtils.truncateTable(conn, table)
--- End diff --

If `truncateTable` failed due to a non-fatal exception, should we fall back to the previous way (i.e., drop and create)? This is a design decision. CC @srowen @rxin
[GitHub] spark pull request #14264: [SPARK-11976][SPARKR] Support "." character in Da...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14264#discussion_r71467294

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -201,6 +201,8 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
       attribute: Attribute): Option[(Attribute, List[String])] = {
     if (!attribute.isGenerated && resolver(attribute.name, nameParts.head)) {
       Option((attribute.withName(nameParts.head), nameParts.tail.toList))
+    } else if (!attribute.isGenerated && resolver(attribute.name, nameParts.mkString("."))) {
+      Option((attribute.withName(nameParts.mkString(".")), Nil))
--- End diff --

Hi, I'm just curious. Is it okay for other Spark modules?

> Different from resolveAsTableColumn, this assumes `name` does NOT start with a qualifier.
[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14207 **[Test build #62581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62581/consoleFull)** for PR 14207 at commit [`1ee1743`](https://github.com/apache/spark/commit/1ee1743906b41ffcc182cb8c74b4134bce8a3006).
[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14207 **[Test build #62580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62580/consoleFull)** for PR 14207 at commit [`727ecf8`](https://github.com/apache/spark/commit/727ecf87463d6fe02cd29e0bbf3f488c043b1962).
[GitHub] spark issue #14112: [SPARK-16240][ML] Model loading backward compatibility f...
Github user GayathriMurali commented on the issue: https://github.com/apache/spark/pull/14112 @jkbradley Can you please help review this?
[GitHub] spark issue #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14277 **[Test build #62579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62579/consoleFull)** for PR 14277 at commit [`c517add`](https://github.com/apache/spark/commit/c517addc2a00fca7578b4fcb1f47a7ef6f337e5c).
[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14045 ping @liancheng @yhuai @rxin Can you review this? I think we should support complex types in vectorization to extend the coverage of the performance improvements. Thanks!
[GitHub] spark issue #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14277 cc @cloud-fan
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/14277

[SPARK-16640][SQL] Add codegen for Elt function

## What changes were proposed in this pull request?

Elt function doesn't support codegen execution now. We should add the support.

## How was this patch tested?

Jenkins tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 elt-codegen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14277.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14277

commit c517addc2a00fca7578b4fcb1f47a7ef6f337e5c
Author: Liang-Chi Hsieh
Date: 2016-07-20T05:06:27Z

    Add codegen for Elt function.
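For context on what `elt` does: it returns the n-th (1-indexed) of its string arguments, and codegen only changes how the expression is evaluated, not its result. A quick illustrative sketch, assuming a local `SparkSession` named `spark`:

    // Sketch: elt semantics are unchanged by this PR; only evaluation
    // (interpreted vs. generated code) differs.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[1]").appName("elt-demo").getOrCreate()

    // elt is 1-indexed: index 2 selects the second string argument.
    spark.sql("SELECT elt(2, 'scala', 'java')").head.getString(0)  // "java"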
[GitHub] spark issue #12054: [SPARK-14262] correct app's state after master leader ch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12054 Can one of the admins verify this patch?
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62578/consoleFull)** for PR 14132 at commit [`9021975`](https://github.com/apache/spark/commit/9021975a2243153edbfa1d4f760f8fcade760513).
[GitHub] spark pull request #14272: [SPARK-16632][sql] Respect Hive schema when mergi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14272
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r71465508

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }

   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
--- End diff --

Also in the PR description, too.
[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14272 Would like to add that AFAIK byte and short are the only problematic types that we didn't handle before this PR. Other Hive-Parquet schema conversion quirks, like string (translated into `binary` without the `UTF8` annotation) and timestamp (translated into the deprecated `int96`), are already worked around in Spark.
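The quirks mentioned above can be summarized in one place. This table is illustrative only, reconstructed from the discussion in this thread rather than taken from Spark's source:

    // Illustrative summary (not actual Spark code) of how Hive serializes
    // some Catalyst types into Parquet, per the discussion above.
    val hiveParquetQuirks = Map(
      "ByteType"      -> "int32 without the INT_8 annotation",   // problematic before this PR
      "ShortType"     -> "int32 without the INT_16 annotation",  // problematic before this PR
      "StringType"    -> "binary without the UTF8 annotation",   // already worked around
      "TimestampType" -> "int96 (deprecated)"                    // already worked around
    )

Because the annotations are missing, the bytes in the file are plain `int32`, and the reader needs the requested Catalyst schema to know the values should be narrowed back to byte/short.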
[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14272 I'm merging this to master. @yhuai Do we want this in branch-2.0?
[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14272 This LGTM. Although it's a little bit hacky, since technically the fields in the requested schema passed to the Parquet record reader may have different original types (`INT_8` and `INT_16`) from the actual ones defined in the physical file, fortunately the Parquet record reader doesn't check for original types.
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r71464803

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }

   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
--- End diff --

Oh, I missed your comment here. It's too far from the bottom now. :) I'll add more `prerequisite, dependency assumptions` here now.
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14204 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62575/
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14204 **[Test build #62575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62575/consoleFull)** for PR 14204 at commit [`4d1d47f`](https://github.com/apache/spark/commit/4d1d47fd9c8c4909d182e963c33c064c5bafb3e2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14204 Merged build finished. Test FAILed.
[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14045 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62573/
[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14045 Merged build finished. Test PASSed.
[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14045 **[Test build #62573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62573/consoleFull)** for PR 14045 at commit [`545a57a`](https://github.com/apache/spark/commit/545a57a718484e61cf77653e810ed368e9381266).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r71464346

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
     }
   }

+    val broadcastHint = project match {
+      case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+        if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
+      case _ => ""
+    }
     build(
       "SELECT",
+      broadcastHint,
--- End diff --

"SELECT" occurs in the following methods, but I didn't add the hint logic there, based on the test cases:

- aggregateToSQL
- generateToSQL => this one only generates "SELECT 1"
- groupingSetToSQL
- projectToSQL
- windowToSQL

I think the test coverage is enough because it generates the above cases. If you suggest more test cases, I'd welcome them. I like robustness, both for this PR and for the future.
[GitHub] spark pull request #14276: [WIP][SPARK-16638][ML][Optimizer] fix L2 reg comp...
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/14276
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62577/consoleFull)** for PR 14132 at commit [`e8b7bf0`](https://github.com/apache/spark/commit/e8b7bf0f3d88986048cd586ccc13209ee1611cd7).
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r71463733

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
     }
   }

+    val broadcastHint = project match {
+      case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+        if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
+      case _ => ""
+    }
     build(
       "SELECT",
+      broadcastHint,
--- End diff --

It's the result of the window test query. For the window query, there were nested Projects.

    test("broadcast hint with window") {
      checkSQL(
        """
          |SELECT /*+ MAPJOIN(parquet_t1) */
          |  x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY x.key)
          |FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
        """.stripMargin,
        "broadcast_hint_window")
    }

I had the same feeling about why some "SELECT"s don't appear. After https://issues.apache.org/jira/browse/SPARK-16576, I think this kind of weirdness will be reduced.
[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62574/ Test PASSed.
[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14207 Merged build finished. Test PASSed.
[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14207 **[Test build #62574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62574/consoleFull)** for PR 14207 at commit [`e930819`](https://github.com/apache/spark/commit/e93081918b170d3fbd08d992ef251c83af9e433d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r71463337

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
     }
   }

+  /**
+   * Merge and move upward to the nearest Project.
+   * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+   * information of BroadcastHint resides its current position inside the plan. In order to
+   * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+   * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+   */
+  object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+    override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+      // Capture the broadcasted information and store it in Hint.
+      case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+        Hint("BROADCAST", Seq(table), child)
+
+      // Nearest Project is found.
+      case p @ Project(_, Hint(_, _, _)) => p
+
+      // Merge BROADCAST hints up to the nearest Project.
+      case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+        h.copy(parameters = params1 ++ params2)
+      case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+        h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+      // Bubble up BROADCAST hints to the nearest Project.
+      case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+        h.copy(child = j.copy(left = hintChild))
+      case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+        h.copy(child = j.copy(right = hintChild))
+
+      // Other UnaryNodes are bypassed.
+      case u: UnaryNode
+        if u.child.isInstanceOf[Hint] && u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
--- End diff --

Sure.
[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14174 @ooq Thanks!
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13704 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62570/ Test PASSed.
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13704 Merged build finished. Test PASSed.
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13704 **[Test build #62570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62570/consoleFull)** for PR 13704 at commit [`e4cd571`](https://github.com/apache/spark/commit/e4cd571bc07a2b8c45580d9ba60f66d5b40b7422). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14240: [SPARK-16594] [SQL] Remove Physical Plan Differences whe...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14240 @cloud-fan It is not a bug. I prefer to make them consistent. I did a few performance tests and found that it makes sense to return only one column, then apply the Filter, and then let the Project generate the two duplicate columns. This should be faster when the Filter can remove most of the rows. However, this optimization condition `projectSet.size == projects.size` only applies in this rare case: `SELECT b, b FROM oneToTenPruned`. It does not make sense to write such duplicate columns without specifying an alias; when aliases are used, we always return one column. This PR removed this condition instead of adding it into the `Data Source Table Scan`. Let me know your opinion. Thanks!
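For illustration, `projectSet` is the deduplicated attribute set built from the projection list `projects`, so the guard only fails when the list contains duplicate output columns such as `SELECT b, b`. A toy sketch of the condition (hypothetical names, not Spark code):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PruneCheck {
    // Mimics the guard: direct column pruning applies only when the
    // projection list has no duplicate output columns.
    static boolean canPruneDirectly(List<String> projects) {
        Set<String> projectSet = new HashSet<>(projects);
        return projectSet.size() == projects.size();
    }

    public static void main(String[] args) {
        System.out.println(canPruneDirectly(List.of("a", "b"))); // true
        System.out.println(canPruneDirectly(List.of("b", "b"))); // false: SELECT b, b
    }
}
```

Deduplicating into a set shrinks the collection exactly when duplicates exist, which is why the rare `SELECT b, b` case was the only one guarded by this condition.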
[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14045 **[Test build #62576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62576/consoleFull)** for PR 14045 at commit [`cc35cab`](https://github.com/apache/spark/commit/cc35cabac105b3778c26afc22ac4f4ca1b295585).
[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...
Github user ooq commented on the issue: https://github.com/apache/spark/pull/14174 hi @viirya, you can find the benchmark numbers here in this PR: https://github.com/apache/spark/pull/14266
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r71461308

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -153,15 +157,113 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu
     requireDbExists(db)
     requireDbMatches(db, tableDefinition)

-    if (
+    if (tableDefinition.provider == Some("hive") ||
+        tableDefinition.tableType == CatalogTableType.VIEW) {
+      client.createTable(tableDefinition, ignoreIfExists)
+    } else {
+      import CreateDataSourceTableUtils._
+
+      val provider = tableDefinition.provider.get
+      val userSpecifiedSchema = tableDefinition.userSpecifiedSchema
+      val partitionColumns = tableDefinition.partitionColumnNames
+
+      val tableProperties = new mutable.HashMap[String, String]
+      tableProperties.put(DATASOURCE_PROVIDER, provider)
+
+      // Saves optional user specified schema. Serialized JSON schema string may be too long to be
+      // stored into a single metastore SerDe property. In this case, we split the JSON string and
+      // store each part as a separate SerDe property.
+      userSpecifiedSchema.foreach { schema =>
+        val schemaJsonString = schema.json
+        // Split the JSON string.
+        val parts = schemaJsonString.grouped(4000).toSeq
--- End diff --

It is related to the limit of VARCHAR:

```
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR '{"type":"struct","fields":[{"name":"contributors","type":"st&' to length 4000.
```
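Per the error above, some metastore backends store each property value in a VARCHAR(4000) column, hence the 4000-character chunking. A rough sketch of the split/reassemble scheme (the property key names here are hypothetical, not Spark's actual `DATASOURCE_*` constants):

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaProps {
    static final int THRESHOLD = 4000; // metastore VARCHAR limit observed above

    // Split a long JSON schema string into numbered property chunks.
    static Map<String, String> toProperties(String schemaJson) {
        Map<String, String> props = new HashMap<>();
        int numParts = (schemaJson.length() + THRESHOLD - 1) / THRESHOLD;
        props.put("schema.numParts", Integer.toString(numParts));
        for (int i = 0; i < numParts; i++) {
            int start = i * THRESHOLD;
            int end = Math.min(start + THRESHOLD, schemaJson.length());
            props.put("schema.part." + i, schemaJson.substring(start, end));
        }
        return props;
    }

    // Reassemble the JSON; returns null if any chunk is missing (corrupted metadata).
    static String fromProperties(Map<String, String> props) {
        String numParts = props.get("schema.numParts");
        if (numParts == null) return null;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Integer.parseInt(numParts); i++) {
            String part = props.get("schema.part." + i);
            if (part == null) return null;
            sb.append(part);
        }
        return sb.toString();
    }
}
```

Each chunk stays under the VARCHAR limit, and a stored part count lets the reader detect truncated or partially deleted metadata instead of silently concatenating garbage.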
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r71461226

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -153,15 +157,113 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu
     requireDbExists(db)
     requireDbMatches(db, tableDefinition)

-    if (
+    if (tableDefinition.provider == Some("hive") ||
+        tableDefinition.tableType == CatalogTableType.VIEW) {
+      client.createTable(tableDefinition, ignoreIfExists)
+    } else {
+      import CreateDataSourceTableUtils._
+
+      val provider = tableDefinition.provider.get
+      val userSpecifiedSchema = tableDefinition.userSpecifiedSchema
+      val partitionColumns = tableDefinition.partitionColumnNames
+
+      val tableProperties = new mutable.HashMap[String, String]
+      tableProperties.put(DATASOURCE_PROVIDER, provider)
+
+      // Saves optional user specified schema. Serialized JSON schema string may be too long to be
+      // stored into a single metastore SerDe property. In this case, we split the JSON string and
+      // store each part as a separate SerDe property.
+      userSpecifiedSchema.foreach { schema =>
+        val schemaJsonString = schema.json
+        // Split the JSON string.
+        val parts = schemaJsonString.grouped(4000).toSeq
--- End diff --

Found the original PR for this config: https://github.com/apache/spark/pull/4795
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71460325

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -518,6 +510,19 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
     }
   }

+  private def describeSchema(
+      tableDesc: CatalogTable,
+      buffer: ArrayBuffer[Row]): Unit = {
+    if (DDLUtils.isDatasourceTable(tableDesc)) {
+      DDLUtils.getSchemaFromTableProperties(tableDesc) match {
--- End diff --

Sure, will do.
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71460333

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -252,6 +252,115 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     }
   }

+  test("Create data source table with partitioning columns but no schema") {
+    import testImplicits._
+
+    val tabName = "tab1"
+    withTempPath { dir =>
+      val pathToPartitionedTable = new File(dir, "partitioned")
+      val pathToNonPartitionedTable = new File(dir, "nonPartitioned")
+      val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+      df.write.format("parquet").save(pathToNonPartitionedTable.getCanonicalPath)
+      df.write.format("parquet").partitionBy("num").save(pathToPartitionedTable.getCanonicalPath)
+
+      Seq(pathToPartitionedTable, pathToNonPartitionedTable).foreach { path =>
+        withTable(tabName) {
+          spark.sql(
+            s"""
+               |CREATE TABLE $tabName
+               |USING parquet
+               |OPTIONS (
+               |  path '$path'
+               |)
+               |PARTITIONED BY (inexistentColumns)
+             """.stripMargin)
+          val catalog = spark.sessionState.catalog
+          val tableMetadata = catalog.getTableMetadata(TableIdentifier(tabName))
+
+          val tableSchema = DDLUtils.getSchemaFromTableProperties(tableMetadata)
+          assert(tableSchema.nonEmpty, "the schema of data source tables are always recorded")
+          val partCols = DDLUtils.getPartitionColumnsFromTableProperties(tableMetadata)
+
+          if (tableMetadata.storage.serdeProperties.get("path") ==
--- End diff --

Ok, no problem
[GitHub] spark issue #14259: [SPARK-16622][SQL] Fix NullPointerException when the ret...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14259 Using the added test case as an example, the generated Java code that accesses the second element of the Tuple2 `(false, null)` throws a `NullPointerException`:

```
int value = isNull1 ? -1 : (Integer) obj._2();
```

Assigning null to an `int` causes the `NullPointerException`, but `isNull1` only checks whether `obj` itself is null.
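The failure mode described here is Java auto-unboxing: the null check guards only the outer object, so a null element still reaches the primitive assignment. A standalone reproduction (a simplified stand-in with a hypothetical `extract` helper and an `Object[]` in place of `Tuple2`, not Spark's actual generated code):

```java
public class UnboxNpe {
    // Mirrors the generated pattern: the null check covers the container,
    // not the element that gets unboxed into a primitive.
    static int extract(Object[] tuple) {
        boolean isNull1 = (tuple == null);
        // For (false, null), (Integer) tuple[1] is null, and unboxing
        // null into an int throws NullPointerException.
        return isNull1 ? -1 : (Integer) tuple[1];
    }

    public static void main(String[] args) {
        try {
            extract(new Object[] { false, null }); // stand-in for Tuple2(false, null)
        } catch (NullPointerException e) {
            System.out.println("NPE from unboxing a null Integer into int");
        }
    }
}
```

The fix direction the PR title suggests is to null-check the element itself before unboxing, not just the enclosing object.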
[GitHub] spark issue #14276: [SPARK-16638][ML][Optimizer] fix L2 reg computation in l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14276 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62571/ Test FAILed.
[GitHub] spark issue #14276: [SPARK-16638][ML][Optimizer] fix L2 reg computation in l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14276 **[Test build #62571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62571/consoleFull)** for PR 14276 at commit [`9d4f7a8`](https://github.com/apache/spark/commit/9d4f7a8cf20bcd1f6ede46097406f235f3581b3b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14276: [SPARK-16638][ML][Optimizer] fix L2 reg computation in l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14276 Merged build finished. Test FAILed.
[GitHub] spark issue #14276: [SPARK-16638][ML][Optimizer] fix L2 reg computation in l...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14276 cc @srowen Thanks!
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14204 **[Test build #62575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62575/consoleFull)** for PR 14204 at commit [`4d1d47f`](https://github.com/apache/spark/commit/4d1d47fd9c8c4909d182e963c33c064c5bafb3e2).
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71458430

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -518,6 +510,19 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
     }
   }

+  private def describeSchema(
+      tableDesc: CatalogTable,
+      buffer: ArrayBuffer[Row]): Unit = {
+    if (DDLUtils.isDatasourceTable(tableDesc)) {
+      DDLUtils.getSchemaFromTableProperties(tableDesc) match {
--- End diff --

Can we make `DDLUtils.getSchemaFromTableProperties` always return a schema and throw an exception if it's corrupted? I think it's more consistent with the previous behaviour, i.e. throwing an exception if the expected schema properties don't exist.
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14204 Merged build finished. Test FAILed.
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14204 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62572/ Test FAILed.
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14204 **[Test build #62572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62572/consoleFull)** for PR 14204 at commit [`f2ab3a3`](https://github.com/apache/spark/commit/f2ab3a35fee03b178e92fd1e2a5fa3763746ff96). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14132 Yeah, just answered the JIRA. Thanks!
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r71457670

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
     }
   }

+  /**
+   * Merge and move upward to the nearest Project.
+   * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+   * information of BroadcastHint resides its current position inside the plan. In order to
+   * reconstruct broadcast hint comment, we need to pack the information of BroadcastHint into
+   * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+   */
+  object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+    override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+      // Capture the broadcasted information and store it in Hint.
+      case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+        Hint("BROADCAST", Seq(table), child)
+
+      // Nearest Project is found.
+      case p @ Project(_, Hint(_, _, _)) => p
+
+      // Merge BROADCAST hints up to the nearest Project.
+      case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+        h.copy(parameters = params1 ++ params2)
+      case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+        h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+      // Bubble up BROADCAST hints to the nearest Project.
+      case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+        h.copy(child = j.copy(left = hintChild))
+      case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+        h.copy(child = j.copy(right = hintChild))
+
+      // Other UnaryNodes are bypassed.
+      case u: UnaryNode
+        if u.child.isInstanceOf[Hint] && u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
--- End diff --

uh, yeah!
please add two more spaces before `if`
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r71457606

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
      }
    }

+    val broadcastHint = project match {
+      case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+        if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
+      case _ => ""
+    }
     build(
       "SELECT",
+      broadcastHint,
--- End diff --

Could you please do more investigation on this? The current solution does not look clean to me.
[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14207 **[Test build #62574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62574/consoleFull)** for PR 14207 at commit [`e930819`](https://github.com/apache/spark/commit/e93081918b170d3fbd08d992ef251c83af9e433d).
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71457330

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -518,6 +510,19 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
     }
   }

+  private def describeSchema(
+      tableDesc: CatalogTable,
+      buffer: ArrayBuffer[Row]): Unit = {
+    if (DDLUtils.isDatasourceTable(tableDesc)) {
+      DDLUtils.getSchemaFromTableProperties(tableDesc) match {
--- End diff --

Now, the message is changed to `"# Schema of this table is corrupted"`
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71457323

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -252,6 +252,115 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     }
   }

+  test("Create data source table with partitioning columns but no schema") {
+    import testImplicits._
+
+    val tabName = "tab1"
+    withTempPath { dir =>
+      val pathToPartitionedTable = new File(dir, "partitioned")
+      val pathToNonPartitionedTable = new File(dir, "nonPartitioned")
+      val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+      df.write.format("parquet").save(pathToNonPartitionedTable.getCanonicalPath)
+      df.write.format("parquet").partitionBy("num").save(pathToPartitionedTable.getCanonicalPath)
+
+      Seq(pathToPartitionedTable, pathToNonPartitionedTable).foreach { path =>
+        withTable(tabName) {
+          spark.sql(
+            s"""
+               |CREATE TABLE $tabName
+               |USING parquet
+               |OPTIONS (
+               |  path '$path'
+               |)
+               |PARTITIONED BY (inexistentColumns)
+             """.stripMargin)
+          val catalog = spark.sessionState.catalog
+          val tableMetadata = catalog.getTableMetadata(TableIdentifier(tabName))
+
+          val tableSchema = DDLUtils.getSchemaFromTableProperties(tableMetadata)
+          assert(tableSchema.nonEmpty, "the schema of data source tables are always recorded")
+          val partCols = DDLUtils.getPartitionColumnsFromTableProperties(tableMetadata)
+
+          if (tableMetadata.storage.serdeProperties.get("path") ==

--- End diff --

hmmm, can we separate it into 2 cases instead of doing `Seq(...).foreach`?
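A minimal sketch of the refactor suggested here: extract the shared body into a helper parameterized on the variant, then register one named case per path, so a failure report pinpoints which variant broke. All names below (`test`, `checkCreateTableWithoutSchema`, the toy registry) are illustrative stand-ins, not the actual `DDLSuite` code.

```scala
import scala.collection.mutable.ListBuffer

// Toy stand-in for a test registry (ScalaTest's `test(name) { body }`).
val results = ListBuffer.empty[String]
def test(name: String)(body: => Unit): Unit = { body; results += name }

// Shared helper parameterized on the variant, instead of Seq(...).foreach.
def checkCreateTableWithoutSchema(partitioned: Boolean): Unit = {
  val kind = if (partitioned) "partitioned" else "non-partitioned"
  // The real suite would create the table and inspect catalog metadata here.
  assert(kind.nonEmpty)
}

// Two separately named cases: a failure now names the exact variant.
test("create table without schema: partitioned path") {
  checkCreateTableWithoutSchema(partitioned = true)
}
test("create table without schema: non-partitioned path") {
  checkCreateTableWithoutSchema(partitioned = false)
}
```

The trade-off is a little duplication in exchange for test names that localize failures without reading stack traces.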
[GitHub] spark issue #14259: [SPARK-16622][SQL] Fix NullPointerException when the ret...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14259

> When the returned value is null, NullPointerException will be thrown.

Can you explain a bit more about this? Why can't a method return null?
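For context on the question above: a common way a null return surfaces as a `NullPointerException` on the JVM is auto-unboxing at the call site. The sketch below is a generic, hypothetical illustration of that mechanism, not the actual SPARK-16622 code path.

```scala
// A method declared to return a boxed java.lang.Integer may legally
// return null...
def lookup(key: String): java.lang.Integer =
  if (key == "known") 42 else null

// ...but unboxing that null into a primitive Int throws
// NullPointerException (Scala's Predef.Integer2int conversion calls
// .intValue on the null reference).
def unboxed(key: String): Int = lookup(key)

val ok = unboxed("known") // 42
val npeThrown =
  try { unboxed("missing"); false }
  catch { case _: NullPointerException => true }
```

So "a method can't return null" really means: the caller consumes the result as a primitive, and the null blows up at the unboxing point rather than where it was produced.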
[GitHub] spark pull request #14243: [SPARK-10683][SPARK-16510][SPARKR] Move SparkR in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14243
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14204 **[Test build #62572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62572/consoleFull)** for PR 14204 at commit [`f2ab3a3`](https://github.com/apache/spark/commit/f2ab3a35fee03b178e92fd1e2a5fa3763746ff96).
[GitHub] spark issue #14045: [SPARK-16362][SQL][WIP] Support ArrayType and StructType...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14045 **[Test build #62573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62573/consoleFull)** for PR 14045 at commit [`545a57a`](https://github.com/apache/spark/commit/545a57a718484e61cf77653e810ed368e9381266).
[GitHub] spark issue #14243: [SPARK-10683][SPARK-16510][SPARKR] Move SparkR include j...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14243 Thanks @sun-rui - Merging this to master and branch-2.0
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71456601

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -518,6 +510,19 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
     }
   }

+  private def describeSchema(
+      tableDesc: CatalogTable,
+      buffer: ArrayBuffer[Row]): Unit = {
+    if (DDLUtils.isDatasourceTable(tableDesc)) {
+      DDLUtils.getSchemaFromTableProperties(tableDesc) match {

--- End diff --

For all types of data source tables, we store the schema in the table properties. Thus, we should not return None unless the table properties were modified by users with the `ALTER TABLE` command. Sorry, I forgot to update the message.
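A simplified sketch of the invariant described above: the `Option` returned when reading the schema back from table properties is `Some(schema)` for any well-formed data source table, and `None` only when the properties were tampered with. The `TableDesc` type, property key, and helper names below are hypothetical stand-ins for Spark's `CatalogTable`/`StructType` machinery.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for Spark's CatalogTable, for illustration only.
case class TableDesc(properties: Map[String, String])

// Stand-in for DDLUtils.getSchemaFromTableProperties: the schema is read
// back from a table property, so it is present unless someone removed it.
def getSchemaFromTableProperties(t: TableDesc): Option[String] =
  t.properties.get("spark.sql.sources.schema")

// Mirrors the shape of the describeSchema match in the diff: Some(schema)
// for a well-formed table, None only if the properties were corrupted
// (e.g. via ALTER TABLE SET TBLPROPERTIES).
def describeSchema(t: TableDesc, buffer: ArrayBuffer[String]): Unit =
  getSchemaFromTableProperties(t) match {
    case Some(schema) => buffer += s"schema: $schema"
    case None         => buffer += "# Schema of this table is corrupted"
  }
```

This is why the `None` branch can carry a loud "corrupted" marker rather than a generic "no schema" message.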
[GitHub] spark issue #14276: [SPARK-16638][ML][Optimizer] fix L2 reg computation in l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14276 **[Test build #62571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62571/consoleFull)** for PR 14276 at commit [`9d4f7a8`](https://github.com/apache/spark/commit/9d4f7a8cf20bcd1f6ede46097406f235f3581b3b).
[GitHub] spark issue #14267: [SPARK-15705] [SQL] Change the default value of spark.sq...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14267 Thanks for notifying @rxin.