[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14678

Looks good, but I didn't look super carefully. @gatorsmile do you have time to take a more careful look at this?
[GitHub] spark issue #14680: [SPARK-17101][SQL] Provide format identifier for TextFil...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14680

Can you show the before/after comparison in the PR description?
[GitHub] spark issue #14680: [SPARK-17101][SQL] Provide format identifier for TextFil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14680

**[Test build #63904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63904/consoleFull)** for PR 14680 at commit [`133e5de`](https://github.com/apache/spark/commit/133e5deff497ce1be92497344bb7e0d4e7d57c21).
[GitHub] spark pull request #14680: [SPARK-17101][SQL] Provide format identifier for ...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/14680

[SPARK-17101][SQL] Provide format identifier for TextFileFormat

## What changes were proposed in this pull request?

Define the format identifier that is used in the Optimized Logical Plan shown by `explain` for the text file format (following the CSV and JSON formats).

```
scala> spark.read.text("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[value#0] text

== Analyzed Logical Plan ==
value: string
Relation[value#0] text

== Optimized Logical Plan ==
InMemoryRelation [value#0], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan text [value#0] Batched: false, Format: TEXT, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>

== Physical Plan ==
InMemoryTableScan [value#0]
   +- InMemoryRelation [value#0], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan text [value#0] Batched: false, Format: TEXT, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
```

## How was this patch tested?

Local build.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaceklaskowski/spark SPARK-17101

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14680.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14680

commit 133e5deff497ce1be92497344bb7e0d4e7d57c21
Author: Jacek Laskowski
Date: 2016-08-17T06:43:34Z

    [SPARK-17101][SQL] Provide format identifier for TextFileFormat
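For context, a minimal sketch of the kind of change being proposed, assuming the `Format:` label in the `FileScan` node is rendered from the `FileFormat`'s `toString` (as the CSV and JSON sources already override it); the exact string is taken from the explain output above, and the elided members are noted in the comment:

```scala
import org.apache.spark.sql.execution.datasources.TextBasedFileFormat
import org.apache.spark.sql.sources.DataSourceRegister

// Hedged sketch of the change to the existing
// org.apache.spark.sql.execution.datasources.text.TextFileFormat class
// (the real class also implements inferSchema/prepareWrite/buildReader,
// elided here).
class TextFileFormat extends TextBasedFileFormat with DataSourceRegister {

  override def shortName(): String = "text"

  // Surfaces as "Format: TEXT" in the FileScan node of the explain
  // output shown above, matching what CSV and JSON already do.
  override def toString: String = "TEXT"
}
```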
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13704

ping @liancheng
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14676

Merged build finished. Test PASSed.
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14676

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63897/
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14676

**[Test build #63897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63897/consoleFull)** for PR 14676 at commit [`092605b`](https://github.com/apache/spark/commit/092605be786adae0aa241da43e25d1f1be5de492).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14576: [SPARK-16391][SQL] Support partial aggregation for reduc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14576

**[Test build #63903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63903/consoleFull)** for PR 14576 at commit [`1b57b74`](https://github.com/apache/spark/commit/1b57b7452557ec34821159ca35c8ff5189d825c0).
[GitHub] spark issue #14679: [SPARK-17102][SQL] bypass UserDefinedGenerator for json ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14679

**[Test build #63902 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63902/consoleFull)** for PR 14679 at commit [`c04084d`](https://github.com/apache/spark/commit/c04084d29c3d049053a5929d6860006549c96573).
[GitHub] spark issue #14576: [SPARK-16391][SQL] Support partial aggregation for reduc...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14576

cc @cloud-fan can you review this? This should target branch-2.0 too.
[GitHub] spark issue #14679: [SPARK-17102][SQL] bypass UserDefinedGenerator for json ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14679

cc @yhuai
[GitHub] spark pull request #14679: [SPARK-17102][SQL] bypass UserDefinedGenerator fo...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/14679

[SPARK-17102][SQL] bypass UserDefinedGenerator for json format check

## What changes were proposed in this pull request?

We use reflection to convert `TreeNode` to a json string, and currently don't support arbitrary objects. `UserDefinedGenerator` takes a function object, so we should skip the json format test for it, or the tests can be flaky, e.g. `DataFrameSuite.simple explode`. That test always fails in branch 1.6 but passes in master, because of the different Scala versions.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark json

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14679.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14679

commit c04084d29c3d049053a5929d6860006549c96573
Author: Wenchen Fan
Date: 2016-08-17T06:32:17Z

    bypass UserDefinedGenerator for json format check
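A hedged sketch of what such a bypass can look like in the test helper, assuming the round-trip check walks the analyzed plan (import paths and the helper's exact name are approximate, not the merged patch):

```scala
import org.apache.spark.sql.catalyst.expressions.UserDefinedGenerator
import org.apache.spark.sql.catalyst.plans.logical.{Generate, LogicalPlan}

// Sketch: skip the TreeNode -> JSON -> TreeNode round-trip assertion when
// the plan embeds a UserDefinedGenerator, whose function object cannot be
// reconstructed reflectively; failures there depend on the Scala version
// rather than on a real regression.
private def checkJsonFormat(plan: LogicalPlan): Unit = {
  val hasUserDefinedGenerator = plan.collectFirst {
    case g: Generate if g.generator.isInstanceOf[UserDefinedGenerator] => g
  }.isDefined

  if (!hasUserDefinedGenerator) {
    val json = plan.toJSON
    // ... parse `json` back into a plan and compare it with `plan` ...
  }
}
```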
[GitHub] spark pull request #14576: [SPARK-16391][SQL] ReduceAggregator and partial a...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14576#discussion_r75069508

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.expressions
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.sql.Encoder
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+
+/**
+ * :: Experimental ::
+ * An aggregator that uses a single associative and commutative reduce function. This reduce
+ * function can be used to go through all input values and reduces them to a single value.
+ * If there is no input, a null value is returned.
+ *
+ * @since 2.1.0
+ */
+@Experimental
+abstract class ReduceAggregator[T] extends Aggregator[T, (Boolean, T), T] {
--- End diff --

I think you are right -- I will keep it private to begin with.
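For readers following along, a minimal sketch of how the `(Boolean, T)` buffer in the signature above can work: the flag records whether any input has been seen, so `merge` can tell an empty partition apart from a real value. This is an illustration under the quoted doc comment's contract (null on empty input), not the final merged implementation:

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Sketch: a reduce-based Aggregator whose buffer is (seenAnyInput, value).
private[sql] class ReduceAggregator[T: Encoder](func: (T, T) => T)
  extends Aggregator[T, (Boolean, T), T] {

  // No input seen yet; the T slot is only a placeholder.
  override def zero: (Boolean, T) = (false, null.asInstanceOf[T])

  override def reduce(b: (Boolean, T), a: T): (Boolean, T) =
    if (b._1) (true, func(b._2, a)) else (true, a)

  override def merge(b1: (Boolean, T), b2: (Boolean, T)): (Boolean, T) =
    if (!b1._1) b2
    else if (!b2._1) b1
    else (true, func(b1._2, b2._2))

  // Per the doc comment above: null when there was no input at all.
  override def finish(reduction: (Boolean, T)): T = reduction._2

  override def bufferEncoder: Encoder[(Boolean, T)] =
    Encoders.tuple(Encoders.scalaBoolean, implicitly[Encoder[T]])

  override def outputEncoder: Encoder[T] = implicitly[Encoder[T]]
}
```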
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14676

Build finished. Test FAILed.
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14676

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63896/
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14676

**[Test build #63896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63896/consoleFull)** for PR 14676 at commit [`fcc3caf`](https://github.com/apache/spark/commit/fcc3cafe08c9549be51b33b5ac993fbb3fa46d37).

* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14537

cc @cloud-fan @gatorsmile can you also take a look at this?
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r75067717

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -287,21 +287,21 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Logging {
         new Path(metastoreRelation.catalogTable.storage.locationUri.get),
         partitionSpec)

-    val inferredSchema = if (fileType.equals("parquet")) {
-      val inferredSchema =
-        defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
-      inferredSchema.map { inferred =>
-        ParquetFileFormat.mergeMetastoreParquetSchema(metastoreSchema, inferred)
+    val inferredSchema =
+      defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
--- End diff --

can we be more specific here? e.g. doing it only if it is parquet or orc.
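One hedged reading of that suggestion, keeping the `fileType`, `defaultSource`, and `fileCatalog` names from the surrounding diff; this is a sketch, not the patch that was eventually merged:

```scala
// Sketch: run footer-based schema inference only for the self-describing
// formats the metastore hack actually targets, instead of for every source.
val inferredSchema = fileType match {
  case "parquet" =>
    defaultSource
      .inferSchema(sparkSession, options, fileCatalog.allFiles())
      .map(ParquetFileFormat.mergeMetastoreParquetSchema(metastoreSchema, _))
  case "orc" =>
    defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
  case _ =>
    None
}
```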
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r75067639

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -56,10 +59,11 @@ private[sql] class OrcFileFormat
       sparkSession: SparkSession,
       options: Map[String, String],
       files: Seq[FileStatus]): Option[StructType] = {
-    OrcFileOperator.readSchema(
-      files.map(_.getPath.toUri.toString),
-      Some(sparkSession.sessionState.newHadoopConf())
-    )
+    val schema = Try(OrcFileOperator.readSchema(
+        files.map(_.getPath.toUri.toString),
+        Some(sparkSession.sessionState.newHadoopConf())))
+      .recover { case _: FileNotFoundException => None }
--- End diff --

why are we ignoring file not found exception here?
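For contrast, a hedged sketch of a narrower alternative that sidesteps the question by not catching anything: return `None` up front when the (empty) partitioned table contributes no files, so a genuinely missing file still fails loudly. Names follow the surrounding diff; this is illustrative only:

```scala
// Sketch: empty partitioned table => no files => nothing to infer a schema
// from, without masking real FileNotFoundException failures via recover.
if (files.isEmpty) {
  None
} else {
  OrcFileOperator.readSchema(
    files.map(_.getPath.toUri.toString),
    Some(sparkSession.sessionState.newHadoopConf()))
}
```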
[GitHub] spark pull request #14624: Fix PySpark DataFrameWriter JDBC method docstring...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14624
[GitHub] spark pull request #14677: [MINOR][DOC] Fix the descriptions for `properties...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14677
[GitHub] spark issue #14677: [MINOR][DOC] Fix the descriptions for `properties` argum...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14677

Thanks - merging in master/2.0.
[GitHub] spark pull request #14657: [SPARK-17068][SQL] Make view-usage visible during...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14657
[GitHub] spark issue #14657: [SPARK-17068][SQL] Make view-usage visible during analys...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14657

(I didn't merge this in 2.0)
[GitHub] spark issue #14657: [SPARK-17068][SQL] Make view-usage visible during analys...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14657

LGTM - merging in master/2.0.
[GitHub] spark pull request #14678: [MINOR][SQL] Add missing functions for some optio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14678#discussion_r75066526

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -322,11 +322,6 @@ object SQLConf {
       .intConf
       .createWithDefault(4000)

-  val PARTITION_DISCOVERY_ENABLED = SQLConfigBuilder("spark.sql.sources.partitionDiscovery.enabled")
--- End diff --

It seems this was removed while refactoring.
[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14678

cc @rxin, could you check whether this makes sense, please?
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75066432

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configuration)
     assert(tableDefinition.identifier.database.isDefined)
     val db = tableDefinition.identifier.database.get
     requireDbExists(db)
+    verifyTableProperties(tableDefinition)
+
+    if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) {
+      client.createTable(tableDefinition, ignoreIfExists)
+    } else {
+      val tableProperties = tableMetadataToProperties(tableDefinition)
+
+      def newSparkSQLSpecificMetastoreTable(): CatalogTable = {
+        tableDefinition.copy(
+          schema = new StructType,
+          partitionColumnNames = Nil,
+          bucketSpec = None,
+          properties = tableDefinition.properties ++ tableProperties)
+      }
+
+      def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = {
+        tableDefinition.copy(
+          storage = tableDefinition.storage.copy(
+            locationUri = Some(new Path(path).toUri.toString),
+            inputFormat = serde.inputFormat,
+            outputFormat = serde.outputFormat,
+            serde = serde.serde
+          ),
+          properties = tableDefinition.properties ++ tableProperties)
+      }
+
+      val qualifiedTableName = tableDefinition.identifier.quotedString
+      val maybeSerde = HiveSerDe.sourceToSerDe(tableDefinition.provider.get)
+      val maybePath = new CaseInsensitiveMap(tableDefinition.storage.properties).get("path")
+      val skipHiveMetadata = tableDefinition.storage.properties
+        .getOrElse("skipHiveMetadata", "false").toBoolean
+
+      val (hiveCompatibleTable, logMessage) = (maybeSerde, maybePath) match {
+        case _ if skipHiveMetadata =>
+          val message =
+            s"Persisting data source table $qualifiedTableName into Hive metastore in" +
+              "Spark SQL specific format, which is NOT compatible with Hive."
+          (None, message)
+
+        // our bucketing is un-compatible with hive(different hash function)
+        case _ if tableDefinition.bucketSpec.nonEmpty =>
+          val message =
+            s"Persisting bucketed data source table $qualifiedTableName into " +
+              "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. "
+          (None, message)
+
+        case (Some(serde), Some(path)) =>
+          val message =
+            s"Persisting data source table $qualifiedTableName with a single input path " +
+              s"into Hive metastore in Hive compatible format."
+          (Some(newHiveCompatibleMetastoreTable(serde, path)), message)
+
+        case (Some(_), None) =>
+          val message =
+            s"Data source table $qualifiedTableName is not file based. Persisting it into " +
+              s"Hive metastore in Spark SQL specific format, which is NOT compatible with Hive."
+          (None, message)
+
+        case _ =>
+          val provider = tableDefinition.provider.get
+          val message =
+            s"Couldn't find corresponding Hive SerDe for data source provider $provider. " +
+              s"Persisting data source table $qualifiedTableName into Hive metastore in " +
+              s"Spark SQL specific format, which is NOT compatible with Hive."
+          (None, message)
+      }
+
+      (hiveCompatibleTable, logMessage) match {
+        case (Some(table), message) =>
+          // We first try to save the metadata of the table in a Hive compatible way.
+          // If Hive throws an error, we fall back to save its metadata in the Spark SQL
+          // specific way.
+          try {
+            logInfo(message)
+            saveTableIntoHive(table, ignoreIfExists)
+          } catch {
+            case NonFatal(e) =>
+              val warningMessage =
+                s"Could not persist ${tableDefinition.identifier.quotedString} in a Hive " +
+                  "compatible way. Persisting it into Hive metastore in Spark SQL specific format."
+              logWarning(warningMessage, e)
+              saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), ignoreIfExists)
+          }
+
+        case (None, message) =>
+          logWarning(message)
+          saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), ignoreIfExists)
+      }
+    }
+  }
+
+  private def tableMetadataToProperties
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75066449

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configuration)
+      val qualifiedTableName = tableDefinition.identifier.quotedString
+      val maybeSerde = HiveSerDe.sourceToSerDe(tableDefinition.provider.get)
+      val maybePath = new CaseInsensitiveMap(tableDefinition.storage.properties).get("path")
--- End diff --

yup
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75066364

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -200,22 +348,73 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configuration)
    * Alter a table whose name that matches the one specified in `tableDefinition`,
    * assuming the table exists.
    *
-   * Note: As of now, this only supports altering table properties, serde properties,
-   * and num buckets!
+   * Note: As of now, this only supports altering table properties and serde properties.
    */
   override def alterTable(tableDefinition: CatalogTable): Unit = withClient {
     assert(tableDefinition.identifier.database.isDefined)
     val db = tableDefinition.identifier.database.get
     requireTableExists(db, tableDefinition.identifier.table)
-    client.alterTable(tableDefinition)
+    verifyTableProperties(tableDefinition)
+
+    if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) {
+      client.alterTable(tableDefinition)
+    } else {
+      val oldDef = client.getTable(db, tableDefinition.identifier.table)
+      // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition,
+      // to retain the spark specific format if it is.
+      // Also add table meta properties to table properties, to retain the data source table format.
+      val newDef = tableDefinition.copy(
+        schema = oldDef.schema,
+        partitionColumnNames = oldDef.partitionColumnNames,
+        bucketSpec = oldDef.bucketSpec,
+        properties = tableMetadataToProperties(tableDefinition) ++ tableDefinition.properties)
--- End diff --

The comment in this method says we only support altering table properties and storage format. Maybe we should add an assert to make this assumption clear?
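A hedged sketch of what that assertion could look like, reusing the `oldDef`/`tableDefinition` names from the diff; whether schema, partition columns, and bucketing should be compared exactly like this is left open by the thread, so treat it as illustrative:

```scala
// Sketch: make the "alterTable only changes table/serde properties" contract
// explicit by rejecting definitions that try to change anything structural.
assert(
  tableDefinition.schema == oldDef.schema &&
    tableDefinition.partitionColumnNames == oldDef.partitionColumnNames &&
    tableDefinition.bucketSpec == oldDef.bucketSpec,
  s"alterTable on ${tableDefinition.identifier.quotedString} may only change " +
    "table properties and serde properties")
```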
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75066296

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -200,22 +348,73 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configuration)
   override def getTable(db: String, table: String): CatalogTable = withClient {
-    client.getTable(db, table)
+    restoreTableMetadata(client.getTable(db, table))
   }

   override def getTableOption(db: String, table: String): Option[CatalogTable] = withClient {
-    client.getTableOption(db, table)
+    client.getTableOption(db, table).map(restoreTableMetadata)
+  }
+
+  /**
+   * Restores table metadata from the table properties if it's a datasource table. This method is
+   * kind of an opposite version of [[createTable]].
+   */
+  private def restoreTableMetadata(table: CatalogTable): CatalogTable = {
+    if (table.tableType == VIEW) {
+      table
+    } else {
+      getProviderFromTableProperties(table).map { provider =>
--- End diff --

No — for `hive` we won't go through the table-meta-to-properties branch, so if we have a provider in the table properties, it must not be hive.
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75066185

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -81,6 +86,18 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configuration)
     withClient { getTable(db, table) }
   }

+  /**
+   * If the given table properties contains datasource properties, throw an exception.
+   */
+  private def verifyTableProperties(table: CatalogTable): Unit = {
+    val datasourceKeys = table.properties.keys.filter(_.startsWith(DATASOURCE_PREFIX))
+    if (datasourceKeys.nonEmpty) {
+      throw new AnalysisException(s"Cannot persist ${table.qualifiedName} into hive metastore " +
+        s"as table property keys may not start with '$DATASOURCE_PREFIX': " +
+        datasourceKeys.mkString("[", ", ", "]"))
+    }
+  }
--- End diff --

yea, e.g. `CreateTableLike`. But `restoreTableMetadata` generates a table without the data source table properties, so it should be OK here.
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/8880

Merged build finished. Test PASSed.
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14676

**[Test build #63901 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63901/consoleFull)** for PR 14676 at commit [`4723902`](https://github.com/apache/spark/commit/47239020ac7008d8630a5c810df3db29a77795cf).
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/8880

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63894/
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/8880

**[Test build #63894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63894/consoleFull)** for PR 8880 at commit [`77122bb`](https://github.com/apache/spark/commit/77122bb3662c65ffa5596d740efab41f5dfc3a0f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14678: [MINOR][SQL] Add missing functions for some optio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14678#discussion_r75065275

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -322,11 +322,6 @@ object SQLConf {
       .intConf
       .createWithDefault(4000)

-  val PARTITION_DISCOVERY_ENABLED = SQLConfigBuilder("spark.sql.sources.partitionDiscovery.enabled")
--- End diff --

It seems we always enable this, and the option is not referenced anywhere.
[GitHub] spark issue #14670: [SPARK-15285][SQL] Generated SpecificSafeProjection.appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14670

**[Test build #63900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63900/consoleFull)** for PR 14670 at commit [`86258eb`](https://github.com/apache/spark/commit/86258eb3ca13284ad5593a105b2ddc6d0cde58e4).
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14676#discussion_r75065140

--- Diff: sql/core/src/test/resources/sql-tests/inputs/inline-table.sql ---
@@ -0,0 +1,39 @@
+
+-- single row, without table and column alias
+select * from values ("one", 1);
+
+-- single row, without column alias
+select * from values ("one", 1) as data;
+
+-- single row
+select * from values ("one", 1) as data(a, b);
--- End diff --

added
[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14660

Merged build finished. Test PASSed.
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14155

I have taken a close look at `HiveExternalCatalog`. My overall feeling is that the current version is still not very clear, and people may have a hard time understanding the code. Let me also think about it and see how to improve the code.
[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14660

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63895/
[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14660

**[Test build #63895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63895/consoleFull)** for PR 14660 at commit [`0130c39`](https://github.com/apache/spark/commit/0130c39450c7bcecfb3a14db1e581c3f3a9f6a20).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14678

**[Test build #63899 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63899/consoleFull)** for PR 14678 at commit [`a57dd5e`](https://github.com/apache/spark/commit/a57dd5ebaef470dbc311e31e35b2f457321b7a9f).
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75064724

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -81,6 +86,18 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configuration)
     withClient { getTable(db, table) }
   }

+  /**
+   * If the given table properties contains datasource properties, throw an exception.
+   */
+  private def verifyTableProperties(table: CatalogTable): Unit = {
+    val datasourceKeys = table.properties.keys.filter(_.startsWith(DATASOURCE_PREFIX))
+    if (datasourceKeys.nonEmpty) {
+      throw new AnalysisException(s"Cannot persist ${table.qualifiedName} into hive metastore " +
+        s"as table property keys may not start with '$DATASOURCE_PREFIX': " +
+        datasourceKeys.mkString("[", ", ", "]"))
+    }
+  }
--- End diff --

Just realized one thing. Is it possible that we somehow create `table` based on a `CatalogTable` generated from `restoreTableMetadata`?
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/13796#discussion_r75064596

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -1082,57 +1343,62 @@ private class LogisticCostFun(
     fitIntercept: Boolean,
     standardization: Boolean,
     bcFeaturesStd: Broadcast[Array[Double]],
-    regParamL2: Double) extends DiffFunction[BDV[Double]] {
+    regParamL2: Double,
+    multinomial: Boolean) extends DiffFunction[BDV[Double]] {

   val featuresStd = bcFeaturesStd.value

   override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = {
-    val numFeatures = featuresStd.length
     val coeffs = Vectors.fromBreeze(coefficients)
     val bcCoeffs = instances.context.broadcast(coeffs)
-    val n = coeffs.size
+    val localFeaturesStd = featuresStd
+    val numFeatures = localFeaturesStd.length
+    val numFeaturesPlusIntercept = if (fitIntercept) numFeatures + 1 else numFeatures

     val logisticAggregator = {
-      val seqOp = (c: LogisticAggregator, instance: Instance) => c.add(instance)
+      val seqOp = (c: LogisticAggregator, instance: Instance) =>
+        c.add(instance)
       val combOp = (c1: LogisticAggregator, c2: LogisticAggregator) => c1.merge(c2)

       instances.treeAggregate(
-        new LogisticAggregator(bcCoeffs, bcFeaturesStd, numFeatures, numClasses, fitIntercept)
+        new LogisticAggregator(bcCoeffs, bcFeaturesStd, numFeatures, numClasses, fitIntercept,
+          multinomial)
       )(seqOp, combOp)
     }

     val totalGradientArray = logisticAggregator.gradient.toArray
-    // regVal is the sum of coefficients squares excluding intercept for L2 regularization.
     val regVal = if (regParamL2 == 0.0) {
       0.0
     } else {
+      val K = if (multinomial) numClasses else numClasses - 1
       var sum = 0.0
-      coeffs.foreachActive { (index, value) =>
-        // If `fitIntercept` is true, the last term which is intercept doesn't
-        // contribute to the regularization.
-        if (index != numFeatures) {
+      (0 until K).foreach { k =>
+        var j = 0
+        while (j < numFeatures) {
           // The following code will compute the loss of the regularization; also
           // the gradient of the regularization, and add back to totalGradientArray.
+          val value = coeffs(k * numFeaturesPlusIntercept + j)
--- End diff --

Why are you not using `foreachActive`? We know that `coeffs` is a dense array today, but if we implement the strong rule, which can tell which columns of `coeffs` will be zeros before the optimization, we may store it as a sparse vector. As a result, `foreachActive` would be a good abstraction.
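A hedged sketch of the `foreachActive`-based alternative being suggested, reusing `coeffs`, `numFeatures`, `numFeaturesPlusIntercept`, `fitIntercept`, `totalGradientArray`, and `regParamL2` from the diff; the `standardization == false` branch is omitted to keep the shape of the idea visible:

```scala
// Sketch: iterate only the active (non-zero) entries, so the same code keeps
// working if coeffs later becomes a sparse vector (e.g. via a strong rule).
var sum = 0.0
coeffs.foreachActive { (index, value) =>
  // Intercepts sit at the end of each class's coefficient block and are
  // excluded from L2 regularization.
  val isIntercept = fitIntercept && (index % numFeaturesPlusIntercept) == numFeatures
  if (!isIntercept) {
    totalGradientArray(index) += regParamL2 * value
    sum += value * value
  }
}
val regVal = 0.5 * regParamL2 * sum
```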
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75064560

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -200,22 +348,73 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configuration)
   override def getTable(db: String, table: String): CatalogTable = withClient {
-    client.getTable(db, table)
+    restoreTableMetadata(client.getTable(db, table))
   }

   override def getTableOption(db: String, table: String): Option[CatalogTable] = withClient {
-    client.getTableOption(db, table)
+    client.getTableOption(db, table).map(restoreTableMetadata)
+  }
+
+  /**
+   * Restores table metadata from the table properties if it's a datasource table. This method is
+   * kind of an opposite version of [[createTable]].
+   */
+  private def restoreTableMetadata(table: CatalogTable): CatalogTable = {
+    if (table.tableType == VIEW) {
+      table
+    } else {
+      getProviderFromTableProperties(table).map { provider =>
--- End diff --

`provider` can be `hive`, right?
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75064362

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -200,22 +348,73 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configuration)
    * Alter a table whose name that matches the one specified in `tableDefinition`,
    * assuming the table exists.
    *
-   * Note: As of now, this only supports altering table properties, serde properties,
-   * and num buckets!
+   * Note: As of now, this only supports altering table properties and serde properties.
    */
   override def alterTable(tableDefinition: CatalogTable): Unit = withClient {
     assert(tableDefinition.identifier.database.isDefined)
     val db = tableDefinition.identifier.database.get
     requireTableExists(db, tableDefinition.identifier.table)
-    client.alterTable(tableDefinition)
+    verifyTableProperties(tableDefinition)
+
+    if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) {
+      client.alterTable(tableDefinition)
+    } else {
+      val oldDef = client.getTable(db, tableDefinition.identifier.table)
+      // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition,
+      // to retain the spark specific format if it is.
+      // Also add table meta properties to table properties, to retain the data source table format.
+      val newDef = tableDefinition.copy(
+        schema = oldDef.schema,
+        partitionColumnNames = oldDef.partitionColumnNames,
+        bucketSpec = oldDef.bucketSpec,
+        properties = tableMetadataToProperties(tableDefinition) ++ tableDefinition.properties)
--- End diff --

If we only look at this method, it is not clear whether the new `tableDefinition` changes other fields like `storage`. Also, we are using the existing `bucketSpec`. But is it possible that we have a new `bucketSpec` in `tableDefinition`?
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/13796#discussion_r75064380 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1082,57 +1343,62 @@ private class LogisticCostFun( fitIntercept: Boolean, standardization: Boolean, bcFeaturesStd: Broadcast[Array[Double]], -regParamL2: Double) extends DiffFunction[BDV[Double]] { +regParamL2: Double, +multinomial: Boolean) extends DiffFunction[BDV[Double]] { val featuresStd = bcFeaturesStd.value override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = { -val numFeatures = featuresStd.length val coeffs = Vectors.fromBreeze(coefficients) val bcCoeffs = instances.context.broadcast(coeffs) -val n = coeffs.size +val localFeaturesStd = featuresStd +val numFeatures = localFeaturesStd.length +val numFeaturesPlusIntercept = if (fitIntercept) numFeatures + 1 else numFeatures val logisticAggregator = { - val seqOp = (c: LogisticAggregator, instance: Instance) => c.add(instance) + val seqOp = (c: LogisticAggregator, instance: Instance) => +c.add(instance) val combOp = (c1: LogisticAggregator, c2: LogisticAggregator) => c1.merge(c2) instances.treeAggregate( -new LogisticAggregator(bcCoeffs, bcFeaturesStd, numFeatures, numClasses, fitIntercept) +new LogisticAggregator(bcCoeffs, bcFeaturesStd, numFeatures, numClasses, fitIntercept, + multinomial) )(seqOp, combOp) } val totalGradientArray = logisticAggregator.gradient.toArray - // regVal is the sum of coefficients squares excluding intercept for L2 regularization. val regVal = if (regParamL2 == 0.0) { 0.0 } else { + val K = if (multinomial) numClasses else numClasses - 1 --- End diff -- just make the `else` branch `1` for clarity. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
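To spell out the suggestion: in the binary case there is exactly one coefficient vector, so `numClasses - 1` (with `numClasses == 2`) equals `1`, and writing `1` states that directly. A simplified sketch of the L2 term being computed, assuming a flat coefficient array laid out as `K` contiguous blocks of `numFeatures` values plus an optional trailing intercept per block, with intercepts excluded from the penalty (the layout assumption is ours, not a quote of Spark's code):

```
// Simplified sketch of the regVal computation discussed above.
def l2RegValue(
    coefficients: Array[Double],
    numFeatures: Int,
    numClasses: Int,
    fitIntercept: Boolean,
    multinomial: Boolean,
    regParamL2: Double): Double = {
  val K = if (multinomial) numClasses else 1 // binary LoR: one vector
  val stride = if (fitIntercept) numFeatures + 1 else numFeatures
  var sum = 0.0
  for (k <- 0 until K; j <- 0 until numFeatures) { // j stops before the intercept slot
    val w = coefficients(k * stride + j)
    sum += w * w
  }
  0.5 * regParamL2 * sum
}
```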
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/13796#discussion_r75064330 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1082,57 +1343,62 @@ private class LogisticCostFun( fitIntercept: Boolean, standardization: Boolean, bcFeaturesStd: Broadcast[Array[Double]], -regParamL2: Double) extends DiffFunction[BDV[Double]] { +regParamL2: Double, +multinomial: Boolean) extends DiffFunction[BDV[Double]] { val featuresStd = bcFeaturesStd.value override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = { -val numFeatures = featuresStd.length val coeffs = Vectors.fromBreeze(coefficients) val bcCoeffs = instances.context.broadcast(coeffs) -val n = coeffs.size +val localFeaturesStd = featuresStd +val numFeatures = localFeaturesStd.length +val numFeaturesPlusIntercept = if (fitIntercept) numFeatures + 1 else numFeatures val logisticAggregator = { - val seqOp = (c: LogisticAggregator, instance: Instance) => c.add(instance) + val seqOp = (c: LogisticAggregator, instance: Instance) => +c.add(instance) val combOp = (c1: LogisticAggregator, c2: LogisticAggregator) => c1.merge(c2) instances.treeAggregate( -new LogisticAggregator(bcCoeffs, bcFeaturesStd, numFeatures, numClasses, fitIntercept) +new LogisticAggregator(bcCoeffs, bcFeaturesStd, numFeatures, numClasses, fitIntercept, + multinomial) )(seqOp, combOp) } val totalGradientArray = logisticAggregator.gradient.toArray - --- End diff -- revert this if it's not needed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/13796#discussion_r75064275 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1082,57 +1343,62 @@ private class LogisticCostFun( fitIntercept: Boolean, standardization: Boolean, bcFeaturesStd: Broadcast[Array[Double]], -regParamL2: Double) extends DiffFunction[BDV[Double]] { +regParamL2: Double, +multinomial: Boolean) extends DiffFunction[BDV[Double]] { val featuresStd = bcFeaturesStd.value override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = { -val numFeatures = featuresStd.length val coeffs = Vectors.fromBreeze(coefficients) val bcCoeffs = instances.context.broadcast(coeffs) -val n = coeffs.size +val localFeaturesStd = featuresStd --- End diff -- Where is `localFeaturesStd` being used? Why not move `val featuresStd = bcFeaturesStd.value` into the `calculate` method? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
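The `localFeaturesStd` line follows a common Spark idiom, which the question probes: referencing a class field inside an RDD closure captures `this` and serializes the whole enclosing object with each task, while copying the field into a local `val` first ships only that value. A generic sketch of the pattern (class and member names are ours, not Spark's actual code):

```
import org.apache.spark.rdd.RDD

// Generic illustration of the local-copy idiom.
class Scaler(@transient val data: RDD[Double]) extends Serializable {
  val factor: Double = 2.0

  // Captures `this`: the whole Scaler object is serialized with the task.
  def scaledBad: RDD[Double] = data.map(_ * factor)

  // Captures only a Double: the enclosing object stays on the driver.
  def scaledGood: RDD[Double] = {
    val localFactor = factor
    data.map(_ * localFactor)
  }
}
```

Moving `val featuresStd = bcFeaturesStd.value` into `calculate`, as the reviewer suggests, achieves the same effect without keeping a field around at all.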
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/13796#discussion_r75064097 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1082,57 +1343,62 @@ private class LogisticCostFun( fitIntercept: Boolean, standardization: Boolean, bcFeaturesStd: Broadcast[Array[Double]], -regParamL2: Double) extends DiffFunction[BDV[Double]] { +regParamL2: Double, +multinomial: Boolean) extends DiffFunction[BDV[Double]] { val featuresStd = bcFeaturesStd.value override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = { -val numFeatures = featuresStd.length val coeffs = Vectors.fromBreeze(coefficients) val bcCoeffs = instances.context.broadcast(coeffs) -val n = coeffs.size +val localFeaturesStd = featuresStd +val numFeatures = localFeaturesStd.length +val numFeaturesPlusIntercept = if (fitIntercept) numFeatures + 1 else numFeatures val logisticAggregator = { - val seqOp = (c: LogisticAggregator, instance: Instance) => c.add(instance) + val seqOp = (c: LogisticAggregator, instance: Instance) => +c.add(instance) --- End diff -- revert this if there is no change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14678 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75063920 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu assert(tableDefinition.identifier.database.isDefined) val db = tableDefinition.identifier.database.get requireDbExists(db) +verifyTableProperties(tableDefinition) + +if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) { + client.createTable(tableDefinition, ignoreIfExists) +} else { + val tableProperties = tableMetadataToProperties(tableDefinition) + + def newSparkSQLSpecificMetastoreTable(): CatalogTable = { +tableDefinition.copy( + schema = new StructType, + partitionColumnNames = Nil, + bucketSpec = None, + properties = tableDefinition.properties ++ tableProperties) + } + + def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = { +tableDefinition.copy( + storage = tableDefinition.storage.copy( +locationUri = Some(new Path(path).toUri.toString), +inputFormat = serde.inputFormat, +outputFormat = serde.outputFormat, +serde = serde.serde + ), + properties = tableDefinition.properties ++ tableProperties) + } + + val qualifiedTableName = tableDefinition.identifier.quotedString + val maybeSerde = HiveSerDe.sourceToSerDe(tableDefinition.provider.get) + val maybePath = new CaseInsensitiveMap(tableDefinition.storage.properties).get("path") + val skipHiveMetadata = tableDefinition.storage.properties +.getOrElse("skipHiveMetadata", "false").toBoolean + + val (hiveCompatibleTable, logMessage) = (maybeSerde, maybePath) match { +case _ if skipHiveMetadata => + val message = +s"Persisting data source table $qualifiedTableName into Hive metastore in " + + "Spark SQL specific format, which is NOT compatible with Hive." + (None, message) + +// our bucketing is incompatible with Hive (different hash function) +case _ if tableDefinition.bucketSpec.nonEmpty => + val message = +s"Persisting bucketed data source table $qualifiedTableName into " + + "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. " + (None, message) + +case (Some(serde), Some(path)) => + val message = +s"Persisting data source table $qualifiedTableName with a single input path " + + s"into Hive metastore in Hive compatible format." + (Some(newHiveCompatibleMetastoreTable(serde, path)), message) + +case (Some(_), None) => + val message = +s"Data source table $qualifiedTableName is not file based. Persisting it into " + + s"Hive metastore in Spark SQL specific format, which is NOT compatible with Hive." + (None, message) + +case _ => + val provider = tableDefinition.provider.get + val message = +s"Couldn't find corresponding Hive SerDe for data source provider $provider. " + + s"Persisting data source table $qualifiedTableName into Hive metastore in " + + s"Spark SQL specific format, which is NOT compatible with Hive." + (None, message) + } + + (hiveCompatibleTable, logMessage) match { +case (Some(table), message) => + // We first try to save the metadata of the table in a Hive compatible way. + // If Hive throws an error, we fall back to saving its metadata in the Spark SQL + // specific way. + try { +logInfo(message) +saveTableIntoHive(table, ignoreIfExists) + } catch { +case NonFatal(e) => + val warningMessage = +s"Could not persist ${tableDefinition.identifier.quotedString} in a Hive " + + "compatible way. Persisting it into Hive metastore in Spark SQL specific format." + logWarning(warningMessage, e) + saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), ignoreIfExists) + } + +case (None, message) => + logWarning(message) + saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), ignoreIfExists) + } +} + } + + private def tableMetadataToProperties
[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14678 **[Test build #63898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63898/consoleFull)** for PR 14678 at commit [`c959f3b`](https://github.com/apache/spark/commit/c959f3b4e9ed23a9cee67db64b35fdc4d0e2301d). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14678 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63898/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75063736 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu assert(tableDefinition.identifier.database.isDefined) val db = tableDefinition.identifier.database.get requireDbExists(db) +verifyTableProperties(tableDefinition) + +if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) { + client.createTable(tableDefinition, ignoreIfExists) +} else { + val tableProperties = tableMetadataToProperties(tableDefinition) + + def newSparkSQLSpecificMetastoreTable(): CatalogTable = { +tableDefinition.copy( + schema = new StructType, + partitionColumnNames = Nil, + bucketSpec = None, + properties = tableDefinition.properties ++ tableProperties) + } + + def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = { +tableDefinition.copy( + storage = tableDefinition.storage.copy( +locationUri = Some(new Path(path).toUri.toString), +inputFormat = serde.inputFormat, +outputFormat = serde.outputFormat, +serde = serde.serde + ), + properties = tableDefinition.properties ++ tableProperties) + } + + val qualifiedTableName = tableDefinition.identifier.quotedString + val maybeSerde = HiveSerDe.sourceToSerDe(tableDefinition.provider.get) + val maybePath = new CaseInsensitiveMap(tableDefinition.storage.properties).get("path") + val skipHiveMetadata = tableDefinition.storage.properties +.getOrElse("skipHiveMetadata", "false").toBoolean + + val (hiveCompatibleTable, logMessage) = (maybeSerde, maybePath) match { +case _ if skipHiveMetadata => + val message = +s"Persisting data source table $qualifiedTableName into Hive metastore in " + + "Spark SQL specific format, which is NOT compatible with Hive." + (None, message) + +// our bucketing is incompatible with Hive (different hash function) +case _ if tableDefinition.bucketSpec.nonEmpty => + val message = +s"Persisting bucketed data source table $qualifiedTableName into " + + "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. " + (None, message) + +case (Some(serde), Some(path)) => + val message = +s"Persisting data source table $qualifiedTableName with a single input path " + + s"into Hive metastore in Hive compatible format." + (Some(newHiveCompatibleMetastoreTable(serde, path)), message) + +case (Some(_), None) => + val message = +s"Data source table $qualifiedTableName is not file based. Persisting it into " + + s"Hive metastore in Spark SQL specific format, which is NOT compatible with Hive." + (None, message) + +case _ => + val provider = tableDefinition.provider.get + val message = +s"Couldn't find corresponding Hive SerDe for data source provider $provider. " + + s"Persisting data source table $qualifiedTableName into Hive metastore in " + + s"Spark SQL specific format, which is NOT compatible with Hive." + (None, message) + } + + (hiveCompatibleTable, logMessage) match { +case (Some(table), message) => + // We first try to save the metadata of the table in a Hive compatible way. + // If Hive throws an error, we fall back to saving its metadata in the Spark SQL + // specific way. + try { +logInfo(message) +saveTableIntoHive(table, ignoreIfExists) + } catch { +case NonFatal(e) => + val warningMessage = +s"Could not persist ${tableDefinition.identifier.quotedString} in a Hive " + + "compatible way. Persisting it into Hive metastore in Spark SQL specific format." + logWarning(warningMessage, e) + saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), ignoreIfExists) + } + +case (None, message) => + logWarning(message) + saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), ignoreIfExists) + } +} + } + + private def tableMetadataToProperties
[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14678 **[Test build #63898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63898/consoleFull)** for PR 14678 at commit [`c959f3b`](https://github.com/apache/spark/commit/c959f3b4e9ed23a9cee67db64b35fdc4d0e2301d). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75063676 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu assert(tableDefinition.identifier.database.isDefined) val db = tableDefinition.identifier.database.get requireDbExists(db) +verifyTableProperties(tableDefinition) + +if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) { + client.createTable(tableDefinition, ignoreIfExists) +} else { + val tableProperties = tableMetadataToProperties(tableDefinition) + + def newSparkSQLSpecificMetastoreTable(): CatalogTable = { +tableDefinition.copy( + schema = new StructType, + partitionColumnNames = Nil, + bucketSpec = None, + properties = tableDefinition.properties ++ tableProperties) + } + + def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = { +tableDefinition.copy( + storage = tableDefinition.storage.copy( +locationUri = Some(new Path(path).toUri.toString), +inputFormat = serde.inputFormat, +outputFormat = serde.outputFormat, +serde = serde.serde + ), + properties = tableDefinition.properties ++ tableProperties) + } + + val qualifiedTableName = tableDefinition.identifier.quotedString + val maybeSerde = HiveSerDe.sourceToSerDe(tableDefinition.provider.get) + val maybePath = new CaseInsensitiveMap(tableDefinition.storage.properties).get("path") + val skipHiveMetadata = tableDefinition.storage.properties +.getOrElse("skipHiveMetadata", "false").toBoolean + + val (hiveCompatibleTable, logMessage) = (maybeSerde, maybePath) match { +case _ if skipHiveMetadata => + val message = +s"Persisting data source table $qualifiedTableName into Hive metastore in " + + "Spark SQL specific format, which is NOT compatible with Hive." + (None, message) + +// our bucketing is incompatible with Hive (different hash function) +case _ if tableDefinition.bucketSpec.nonEmpty => + val message = +s"Persisting bucketed data source table $qualifiedTableName into " + + "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. " + (None, message) + +case (Some(serde), Some(path)) => + val message = +s"Persisting data source table $qualifiedTableName with a single input path " + + s"into Hive metastore in Hive compatible format." + (Some(newHiveCompatibleMetastoreTable(serde, path)), message) + +case (Some(_), None) => + val message = +s"Data source table $qualifiedTableName is not file based. Persisting it into " + + s"Hive metastore in Spark SQL specific format, which is NOT compatible with Hive." + (None, message) + +case _ => + val provider = tableDefinition.provider.get + val message = +s"Couldn't find corresponding Hive SerDe for data source provider $provider. " + + s"Persisting data source table $qualifiedTableName into Hive metastore in " + + s"Spark SQL specific format, which is NOT compatible with Hive." + (None, message) + } + + (hiveCompatibleTable, logMessage) match { +case (Some(table), message) => + // We first try to save the metadata of the table in a Hive compatible way. + // If Hive throws an error, we fall back to saving its metadata in the Spark SQL + // specific way. + try { +logInfo(message) +saveTableIntoHive(table, ignoreIfExists) + } catch { +case NonFatal(e) => + val warningMessage = +s"Could not persist ${tableDefinition.identifier.quotedString} in a Hive " + + "compatible way. Persisting it into Hive metastore in Spark SQL specific format." + logWarning(warningMessage, e) + saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), ignoreIfExists) + } + +case (None, message) => + logWarning(message) + saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), ignoreIfExists) + } +} + } + + private def tableMetadataToProperties
[GitHub] spark pull request #14678: [MINOR][SQL] Add missing functions for some optio...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14678 [MINOR][SQL] Add missing functions for some options in SQLConf and use them where applicable ## What changes were proposed in this pull request? I first thought these were intentionally hidden options, but it seems the functions are simply missing. For example, `spark.sql.parquet.mergeSchema` is documented in [sql-programming-guide.md](https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md) but its function is missing, whereas many options such as `spark.sql.join.preferSortMergeJoin` are not documented but have their own functions. So, this PR makes them consistent by adding the missing functions for some options in `SQLConf` and using them where applicable. ## How was this patch tested? Existing tests should cover this. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark sqlconf-cleanup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14678.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14678 commit 27fbbd902dbfca34dd5edd5a219dc64abf9691cf Author: hyukjinkwon Date: 2016-08-17T04:50:15Z Add missing functions for some options and use them where applicable commit c959f3b4e9ed23a9cee67db64b35fdc4d0e2301d Author: hyukjinkwon Date: 2016-08-17T04:59:22Z Fix typos --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
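For context on what "missing functions" means here: SQLConf exposes one typed accessor per option so call sites don't re-parse raw strings. A self-contained sketch of the pattern (the key names match the ones mentioned above, but the class and defaults are a toy, not Spark's SQLConf):

```
// Toy stand-in for SQLConf showing the accessor pattern the PR fills in.
class MiniSQLConf(settings: Map[String, String]) {
  private def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)

  // One small def per option; defaults here are illustrative.
  def parquetMergeSchema: Boolean =
    getBoolean("spark.sql.parquet.mergeSchema", default = false)

  def preferSortMergeJoin: Boolean =
    getBoolean("spark.sql.join.preferSortMergeJoin", default = true)
}

// Call sites then read `conf.parquetMergeSchema` instead of duplicating the
// key string and the parsing logic everywhere the option is consulted.
```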
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75063449 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu assert(tableDefinition.identifier.database.isDefined) val db = tableDefinition.identifier.database.get requireDbExists(db) +verifyTableProperties(tableDefinition) + +if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) { + client.createTable(tableDefinition, ignoreIfExists) +} else { + val tableProperties = tableMetadataToProperties(tableDefinition) + + def newSparkSQLSpecificMetastoreTable(): CatalogTable = { +tableDefinition.copy( + schema = new StructType, + partitionColumnNames = Nil, + bucketSpec = None, + properties = tableDefinition.properties ++ tableProperties) + } + + def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = { +tableDefinition.copy( + storage = tableDefinition.storage.copy( +locationUri = Some(new Path(path).toUri.toString), +inputFormat = serde.inputFormat, +outputFormat = serde.outputFormat, +serde = serde.serde + ), + properties = tableDefinition.properties ++ tableProperties) + } + + val qualifiedTableName = tableDefinition.identifier.quotedString + val maybeSerde = HiveSerDe.sourceToSerDe(tableDefinition.provider.get) + val maybePath = new CaseInsensitiveMap(tableDefinition.storage.properties).get("path") --- End diff -- I think this path will be set by the ddl command (e.g. `CreateDataSourceTableAsSelectCommand`). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75063156 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu assert(tableDefinition.identifier.database.isDefined) val db = tableDefinition.identifier.database.get requireDbExists(db) +verifyTableProperties(tableDefinition) + +if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) { + client.createTable(tableDefinition, ignoreIfExists) +} else { + val tableProperties = tableMetadataToProperties(tableDefinition) + + def newSparkSQLSpecificMetastoreTable(): CatalogTable = { +tableDefinition.copy( + schema = new StructType, + partitionColumnNames = Nil, + bucketSpec = None, + properties = tableDefinition.properties ++ tableProperties) + } + + def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = { +tableDefinition.copy( + storage = tableDefinition.storage.copy( +locationUri = Some(new Path(path).toUri.toString), +inputFormat = serde.inputFormat, +outputFormat = serde.outputFormat, +serde = serde.serde + ), + properties = tableDefinition.properties ++ tableProperties) + } + + val qualifiedTableName = tableDefinition.identifier.quotedString + val maybeSerde = HiveSerDe.sourceToSerDe(tableDefinition.provider.get) + val maybePath = new CaseInsensitiveMap(tableDefinition.storage.properties).get("path") --- End diff -- If the create table command does not specify the location, does this `maybePath` contain the default location? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
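Aside for readers: the `CaseInsensitiveMap` wrapper quoted above makes the `path` lookup insensitive to how the user spelled the option key. A minimal sketch of the idea (Spark has its own implementation; this toy version only demonstrates the behavior):

```
// Toy case-insensitive option map: keys are normalized to lower case once.
class ToyCaseInsensitiveMap(original: Map[String, String]) {
  private val normalized = original.map { case (k, v) => k.toLowerCase -> v }
  def get(key: String): Option[String] = normalized.get(key.toLowerCase)
}

val opts = new ToyCaseInsensitiveMap(Map("PATH" -> "/warehouse/t"))
assert(opts.get("path").contains("/warehouse/t")) // the key's spelling no longer matters
```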
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75063094 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu assert(tableDefinition.identifier.database.isDefined) val db = tableDefinition.identifier.database.get requireDbExists(db) +verifyTableProperties(tableDefinition) + +if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) { + client.createTable(tableDefinition, ignoreIfExists) +} else { + val tableProperties = tableMetadataToProperties(tableDefinition) --- End diff -- Let's explain what will be put into this `tableProperties`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75062850 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu assert(tableDefinition.identifier.database.isDefined) val db = tableDefinition.identifier.database.get requireDbExists(db) +verifyTableProperties(tableDefinition) + +if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) { + client.createTable(tableDefinition, ignoreIfExists) +} else { + val tableProperties = tableMetadataToProperties(tableDefinition) + + def newSparkSQLSpecificMetastoreTable(): CatalogTable = { +tableDefinition.copy( + schema = new StructType, + partitionColumnNames = Nil, + bucketSpec = None, + properties = tableDefinition.properties ++ tableProperties) + } + + def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = { --- End diff -- Add a comment here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75062846 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu assert(tableDefinition.identifier.database.isDefined) val db = tableDefinition.identifier.database.get requireDbExists(db) +verifyTableProperties(tableDefinition) + +if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) { + client.createTable(tableDefinition, ignoreIfExists) +} else { + val tableProperties = tableMetadataToProperties(tableDefinition) + + def newSparkSQLSpecificMetastoreTable(): CatalogTable = { --- End diff -- Add a comment here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14539: [SPARK-16947][SQL] Improve type coercion for inline tabl...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14539 How do we specify `null` when creating an inline table? ``` sql( """ |create temporary view src as select * from values |(201, null), |(86, "val_86") """.stripMargin) ``` Is that supported? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75062743 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu assert(tableDefinition.identifier.database.isDefined) val db = tableDefinition.identifier.database.get requireDbExists(db) +verifyTableProperties(tableDefinition) + +if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) { + client.createTable(tableDefinition, ignoreIfExists) +} else { --- End diff -- Let's add a comment to explain what we are doing here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14676#discussion_r75062469 --- Diff: sql/core/src/test/resources/sql-tests/inputs/inline-table.sql --- @@ -0,0 +1,39 @@ + +-- single row, without table and column alias +select * from values ("one", 1); + +-- single row, without column alias +select * from values ("one", 1) as data; + +-- single row +select * from values ("one", 1) as data(a, b); --- End diff -- Could you add a case for `NULL`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
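A plausible shape for such a case, hedged because whether a bare `null` resolves here is exactly what this PR's type coercion decides; the `cast` form pins the column type explicitly. Wrapped in `spark.sql` for a self-contained sketch (assumes an active SparkSession named `spark`):

```
// null coerced from the sibling row's integer value:
spark.sql("""select * from values ("one", null), ("two", 2) as data(a, b)""").show()

// null with an explicit type:
spark.sql("""select * from values ("one", cast(null as int)) as data(a, b)""").show()
```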
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14676 **[Test build #63897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63897/consoleFull)** for PR 14676 at commit [`092605b`](https://github.com/apache/spark/commit/092605be786adae0aa241da43e25d1f1be5de492). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14676 **[Test build #63896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63896/consoleFull)** for PR 14676 at commit [`fcc3caf`](https://github.com/apache/spark/commit/fcc3cafe08c9549be51b33b5ac993fbb3fa46d37). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14665: [SPARK-17084][SQL] Rename ParserUtils.assert to v...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14665 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14551: [SPARK-16961][CORE] Fixed off-by-one error that biased r...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/14551 `model.gaussiansDF.show()` displays the `mean` and `variance` of the Gaussians, which are exposed as a DataFrame. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14665: [SPARK-17084][SQL] Rename ParserUtils.assert to validate
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14665 Merging in master/2.0. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14648: [SPARK-16995][SQL] TreeNodeException when flat mapping R...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14648 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63892/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14648: [SPARK-16995][SQL] TreeNodeException when flat mapping R...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14648 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14648: [SPARK-16995][SQL] TreeNodeException when flat mapping R...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14648 **[Test build #63892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63892/consoleFull)** for PR 14648 at commit [`0008c3e`](https://github.com/apache/spark/commit/0008c3e11dfb85523f9f4606d5dec714339d5f43). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14660 **[Test build #63895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63895/consoleFull)** for PR 14660 at commit [`0130c39`](https://github.com/apache/spark/commit/0130c39450c7bcecfb3a14db1e581c3f3a9f6a20). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/8880 **[Test build #63894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63894/consoleFull)** for PR 8880 at commit [`77122bb`](https://github.com/apache/spark/commit/77122bb3662c65ffa5596d740efab41f5dfc3a0f). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14677: [MINOR][DOC] Fix the descriptions for `properties` argum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14677 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14677: [MINOR][DOC] Fix the descriptions for `properties` argum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63893/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14677: [MINOR][DOC] Fix the descriptions for `properties` argum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14677 **[Test build #63893 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63893/consoleFull)** for PR 14677 at commit [`f501861`](https://github.com/apache/spark/commit/f5018616eee50544c22432c2256c75325b537e82). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14677: [MINOR][DOC] Fix the descriptions for `properties` argum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14677 **[Test build #63893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63893/consoleFull)** for PR 14677 at commit [`f501861`](https://github.com/apache/spark/commit/f5018616eee50544c22432c2256c75325b537e82). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/14597 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14620: [SPARK-17032][SQL] Add test cases for methods in ParserU...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14620 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14620: [SPARK-17032][SQL] Add test cases for methods in ParserU...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63888/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14620: [SPARK-17032][SQL] Add test cases for methods in ParserU...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14620 **[Test build #63888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63888/consoleFull)** for PR 14620 at commit [`36e049a`](https://github.com/apache/spark/commit/36e049a8fdafb3d311029b292e7d2de5efce4c6a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14677: [MINOR][DOC] Fix the descriptions for `properties` argum...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14677 cc @srowen and @mvervuurt. I opened this just to run the verification again, as I already verified it once. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14677: [MINOR][DOC] Fix the descriptions for `properties...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14677 [MINOR][DOC] Fix the descriptions for `properties` argument in the documentation for jdbc APIs ## What changes were proposed in this pull request? This should be credited to @mvervuurt. The main purposes of this PR are - simply to include the change for the same instance in `DataFrameReader`, so the two match up. - to avoid verifying the PR twice (I already verified it once). The documentation for both should be the same because both assume the `properties` should be the same `dict`. ## How was this patch tested? Manually building Python documentation. This will produce the output as below: - `DataFrameReader` ![2016-08-17 11 12 00](https://cloud.githubusercontent.com/assets/6477701/17722764/b3f6568e-646f-11e6-8b75-4fb672f3f366.png) - `DataFrameWriter` ![2016-08-17 11 12 10](https://cloud.githubusercontent.com/assets/6477701/17722765/b58cb308-646f-11e6-841a-32f19800d139.png) Closes #14624 You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark typo-python Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14677.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14677 commit b0864ecb51a28452f6e33f4bdcd30795c7c2ec99 Author: mvervuurt Date: 2016-08-12T18:42:40Z Fix docstring of method jdbc of PySpark DataFrameWriter because a dictionary of JDBC connection arguments is used instead of a list. commit f5018616eee50544c22432c2256c75325b537e82 Author: hyukjinkwon Date: 2016-08-17T02:17:48Z Fix the descriptions for `properties` argument in the documentation for jdbc APIs --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
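For readers outside this PR: in PySpark the `properties` argument is a dict of JDBC connection arguments, and the Scala API takes a `java.util.Properties` with the same keys. A hedged sketch of the Scala call (the URL, table names, and credentials are placeholders, and `spark` is assumed to be an active SparkSession):

```
import java.util.Properties

// Placeholder connection arguments -- the same key/value pairs the fixed
// docstring describes as a dict in PySpark ({"user": ..., "password": ...}).
val connectionProperties = new Properties()
connectionProperties.put("user", "username")
connectionProperties.put("password", "secret")

val jdbcUrl = "jdbc:postgresql://localhost:5432/testdb" // placeholder URL
val df = spark.read.jdbc(jdbcUrl, "public.people", connectionProperties)
df.write.jdbc(jdbcUrl, "public.people_copy", connectionProperties)
```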
[GitHub] spark issue #14648: [SPARK-16995][SQL] TreeNodeException when flat mapping R...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14648 **[Test build #63892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63892/consoleFull)** for PR 14648 at commit [`0008c3e`](https://github.com/apache/spark/commit/0008c3e11dfb85523f9f4606d5dec714339d5f43). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14676 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14676 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63889/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14676 **[Test build #63889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63889/consoleFull)** for PR 14676 at commit [`2327b79`](https://github.com/apache/spark/commit/2327b7971a845ce01b0b18fd5ccd2b1f0bb99be0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14547: [SPARK-16718][MLlib] gbm-style treeboost [WIP]
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14547 **[Test build #3224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3224/consoleFull)** for PR 14547 at commit [`a040da5`](https://github.com/apache/spark/commit/a040da5ea64778d766720ecd6a8859893d7204f0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14676 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63887/ Test FAILed.
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14676 Merged build finished. Test FAILed.
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14676 **[Test build #63887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63887/consoleFull)** for PR 14676 at commit [`d7acae5`](https://github.com/apache/spark/commit/d7acae55034d4ff5da3e7579cf44acb7b704b4a1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class UnresolvedInlineTable(`
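For context on the feature under test: SPARK-16947 targets SQL inline tables (the `VALUES` clause). A minimal sketch of what the change is intended to allow once it passes, assuming an existing `SparkSession` named `spark` (the column names and literals are illustrative): literals of different but compatible types in one column should be coerced to a common type, and foldable expressions such as `2 + 1` should be accepted and evaluated.
```
# Inline table mixing an int literal, a foldable expression (2 + 1),
# and a decimal (2.5); the first column should be coerced to a common type.
spark.sql(
    "SELECT * FROM VALUES (1, 'a'), (2 + 1, 'b'), (2.5, 'c') AS t(num, letter)"
).show()
```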
[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/14639 Thanks, @sun-rui. Another commit resolved the downloading issue.
[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14639 This is not only about using the correct cache dir under Mac OS; in yarn-cluster mode, there should also be no downloading of Spark at all.
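To make that reasoning concrete, here is a rough, hypothetical sketch of the decision being described (Python purely for illustration; SparkR's actual implementation is in R, and the helper names and directory paths below are assumptions, not the PR's code):
```
import os
import sys

def spark_cache_dir():
    # Hypothetical helper: pick an OS-appropriate user cache directory
    # (on Mac OS, per-user caches conventionally live under ~/Library/Caches).
    if sys.platform == "darwin":
        return os.path.expanduser("~/Library/Caches/spark")
    return os.path.expanduser("~/.cache/spark")

def maybe_download_spark(deploy_mode):
    # Hypothetical entry point mirroring the comment's reasoning:
    # in yarn-cluster mode the driver already runs inside the cluster,
    # so the Spark distribution must not be downloaded at all.
    if deploy_mode == "yarn-cluster":
        return None
    target = spark_cache_dir()
    # ... download and unpack the Spark distribution into `target` ...
    return target
```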
[GitHub] spark issue #14675: [SPARK-17096][SQL][STREAMING] Improve exception string r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14675 Merged build finished. Test PASSed.