[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14676 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63974/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14676 Merged build finished. Test PASSed.
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14676 **[Test build #63974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63974/consoleFull)** for PR 14676 at commit [`2e68438`](https://github.com/apache/spark/commit/2e6843844d126e2ba466fe6b34ea59b3b67942c7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14676 LGTM except https://github.com/apache/spark/pull/14676#discussion_r75248623, waiting for feedback from @hvanhovell
[GitHub] spark issue #14697: [SPARK-17124][SQL] RelationalGroupedDataset.agg should b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14697 **[Test build #63978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63978/consoleFull)** for PR 14697 at commit [`bd64ade`](https://github.com/apache/spark/commit/bd64ade6e3a82e9da55163e96303509275c56678).
[GitHub] spark pull request #14697: [SPARK-17124][SQL] RelationalGroupedDataset.agg s...
GitHub user petermaxlee opened a pull request: https://github.com/apache/spark/pull/14697 [SPARK-17124][SQL] RelationalGroupedDataset.agg should be order preserving and allow multiple expressions per column ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? Added a test case in DataFrameAggregateSuite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/petermaxlee/spark SPARK-17124 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14697.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14697 commit bd64ade6e3a82e9da55163e96303509275c56678 Author: petermaxlee Date: 2016-08-18T06:50:24Z [SPARK-17124][SQL] RelationalGroupedDataset.agg should be order preserving and allow duplicate column names
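The PR title asks for an aggregation spec that is order preserving and allows multiple expressions per column. A minimal plain-Scala sketch (no Spark dependency) of why a `Map`-based spec cannot offer either property is shown here. The column names and the `render` helper are hypothetical and only illustrate collection behavior; this is not the change made in the PR.

```scala
// Plain Scala, no Spark required. Column names and `render` are hypothetical.
object AggSpecSketch {
  // Render (column -> function) pairs as SQL-like aggregate expressions.
  def render(spec: Iterable[(String, String)]): Seq[String] =
    spec.map { case (column, func) => s"$func($column)" }.toSeq

  // A Map keeps one value per key, so the second entry for "age" replaces the first.
  val asMap: Map[String, String] = Map("age" -> "max", "age" -> "min")

  // A sequence of pairs keeps every entry, in the order it was written.
  val asSeq: Seq[(String, String)] =
    Seq("age" -> "max", "salary" -> "avg", "age" -> "min")

  def main(args: Array[String]): Unit = {
    println(render(asMap)) // only min(age) survives
    println(render(asSeq)) // max(age), avg(salary), min(age), in that order
  }
}
```

A general `Map` also makes no promise about iteration order, which is presumably why an ordered sequence of pairs is the natural input shape for the behavior the title describes.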
[GitHub] spark issue #14683: [SPARK-16968]Add additional options in jdbc when creatin...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14683 Yes, we should just need the documentation change here. You can review `master` to see if it has all the changes you expect from the last PR.
[GitHub] spark issue #14672: [SPARK-17034][SQL] Minor code cleanup for UnresolvedOrdi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14672 LGTM, pending jenkins.
[GitHub] spark issue #14576: [SPARK-16391][SQL] Support partial aggregation for reduc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14576 **[Test build #63977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63977/consoleFull)** for PR 14576 at commit [`50ed0d8`](https://github.com/apache/spark/commit/50ed0d8b39c9305840b326aef034561be487e7c5).
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14690 > If a user queries such a table with predicates which prune that table's partitions, we would like to be able to answer that query without consulting partition metadata which are not involved in the query. When we read a partitioned Hive table, we retrieve all partition metadata from the Hive metastore and load it into driver memory. Yes, it's not so efficient and may blow up the driver. However, it only happens on the first read; after that the data is cached. If you don't load all partition metadata on the first read, how are you going to deal with the cache?
[GitHub] spark issue #14683: [SPARK-16968]Add additional options in jdbc when creatin...
Github user GraceH commented on the issue: https://github.com/apache/spark/pull/14683 Sorry about my mistake. I will re-post one.
[GitHub] spark issue #14696: [SPARK-16714][SQL] Refactor type widening for consistenc...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14696 From the previous discussion (https://github.com/apache/spark/pull/14389#issuecomment-236591342): > We also need to add some checks before applying the type widen rules, to avoid conflicting with DecimalPrecision, which defines some special rules for binary arithmetic about decimal type. Have you tried this? `push type coercion into each expression` is also in my plan, but it would be a very large change; we should have a design doc first and discuss it with some people. So I'd like to do this small refactor first. What do you think?
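To make the conflict mentioned above concrete, here is a small plain-Scala sketch of how a generic "widest common type" for two decimals differs from the result type of adding them. The `Dec` case class and both helpers are hypothetical stand-ins, not Spark's `DecimalType` or its `DecimalPrecision` rule, and the sketch assumes the commonly documented addition rule (one extra integer digit for a carry) while ignoring any maximum-precision cap.

```scala
// Hypothetical stand-in for a decimal type; not Spark's DecimalType.
final case class Dec(precision: Int, scale: Int)

object DecimalSketch {
  // Widest type that can hold a value of either input (no arithmetic involved).
  def wider(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    val intDigits = math.max(a.precision - a.scale, b.precision - b.scale)
    Dec(intDigits + scale, scale)
  }

  // Result type of a + b: one extra integer digit for a possible carry.
  // Assumption: no cap on the maximum precision is applied here.
  def addResult(a: Dec, b: Dec): Dec = {
    val w = wider(a, b)
    Dec(w.precision + 1, w.scale)
  }
}
```

For `Dec(5, 2)` and `Dec(4, 3)` the widened type is `Dec(6, 3)` while the addition result is `Dec(7, 3)`, so a widening rule that runs first and casts both operands could pre-empt the arithmetic-specific rule.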
[GitHub] spark issue #14683: [SPARK-16968]Add additional options in jdbc when creatin...
Github user GraceH commented on the issue: https://github.com/apache/spark/pull/14683 Oops. @srowen I thought the previous pull request had been closed without being merged. That is why I re-posted it here. Do you mean we just need the documentation change here?
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75254588 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu assert(tableDefinition.identifier.database.isDefined) val db = tableDefinition.identifier.database.get requireDbExists(db) +verifyTableProperties(tableDefinition) + +if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) { + client.createTable(tableDefinition, ignoreIfExists) +} else { + val tableProperties = tableMetadataToProperties(tableDefinition) + + def newSparkSQLSpecificMetastoreTable(): CatalogTable = { +tableDefinition.copy( + schema = new StructType, + partitionColumnNames = Nil, + bucketSpec = None, + properties = tableDefinition.properties ++ tableProperties) + } + + def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = { +tableDefinition.copy( + storage = tableDefinition.storage.copy( +locationUri = Some(new Path(path).toUri.toString), +inputFormat = serde.inputFormat, +outputFormat = serde.outputFormat, +serde = serde.serde + ), + properties = tableDefinition.properties ++ tableProperties) + } + + val qualifiedTableName = tableDefinition.identifier.quotedString + val maybeSerde = HiveSerDe.sourceToSerDe(tableDefinition.provider.get) + val maybePath = new CaseInsensitiveMap(tableDefinition.storage.properties).get("path") + val skipHiveMetadata = tableDefinition.storage.properties +.getOrElse("skipHiveMetadata", "false").toBoolean + + val (hiveCompatibleTable, logMessage) = (maybeSerde, maybePath) match { --- End diff -- > Then, we used the generated BaseRelation to find whether this is a hiveCompatibleTable. If this is not a HadoopFsRelation, hiveCompatibleTable will be None. 
No, previously we used `maybeSerde` and `BaseRelation` to decide, so the `HadoopFsRelation` check is already done in `maybeSerde`.
[GitHub] spark pull request #14576: [SPARK-16391][SQL] Support partial aggregation fo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14576#discussion_r75254527 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala --- @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.expressions + +import org.apache.spark.sql.Encoder +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder + +/** + * An aggregator that uses a single associative and commutative reduce function. This reduce + * function can be used to go through all input values and reduces them to a single value. + * If there is no input, a null value is returned. 
+ * + * @since 2.1.0 + */ +private[sql] class ReduceAggregator[T: Encoder](func: (T, T) => T) + extends Aggregator[T, (Boolean, T), T] { + + private val encoder = implicitly[Encoder[T]] + + override def zero: (Boolean, T) = (false, null.asInstanceOf[T]) + + override def bufferEncoder: Encoder[(Boolean, T)] = +ExpressionEncoder.tuple( + ExpressionEncoder[Boolean](), + encoder.asInstanceOf[ExpressionEncoder[T]]) + + override def outputEncoder: Encoder[T] = encoder + + override def reduce(b: (Boolean, T), a: T): (Boolean, T) = { +if (b._1) { + (true, func(b._2, a)) +} else { + (true, a) +} + } + + override def merge(b1: (Boolean, T), b2: (Boolean, T)): (Boolean, T) = { +if (!b1._1) { + b2 +} else if (!b2._1) { + b1 +} else { + (true, func(b1._2, b2._2)) +} + } + + override def finish(reduction: (Boolean, T)): T = reduction._2 --- End diff -- Yup I will add it.
[GitHub] spark pull request #14576: [SPARK-16391][SQL] Support partial aggregation fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14576#discussion_r75254319 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala --- @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.expressions + +import org.apache.spark.sql.Encoder +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder + +/** + * An aggregator that uses a single associative and commutative reduce function. This reduce + * function can be used to go through all input values and reduces them to a single value. + * If there is no input, a null value is returned. 
+ * + * @since 2.1.0 + */ +private[sql] class ReduceAggregator[T: Encoder](func: (T, T) => T) + extends Aggregator[T, (Boolean, T), T] { + + private val encoder = implicitly[Encoder[T]] + + override def zero: (Boolean, T) = (false, null.asInstanceOf[T]) + + override def bufferEncoder: Encoder[(Boolean, T)] = +ExpressionEncoder.tuple( + ExpressionEncoder[Boolean](), + encoder.asInstanceOf[ExpressionEncoder[T]]) + + override def outputEncoder: Encoder[T] = encoder + + override def reduce(b: (Boolean, T), a: T): (Boolean, T) = { +if (b._1) { + (true, func(b._2, a)) +} else { + (true, a) +} + } + + override def merge(b1: (Boolean, T), b2: (Boolean, T)): (Boolean, T) = { +if (!b1._1) { + b2 +} else if (!b2._1) { + b1 +} else { + (true, func(b1._2, b2._2)) +} + } + + override def finish(reduction: (Boolean, T)): T = reduction._2 --- End diff -- Then shall we add an assert? Or we could forget about it and go with `return null for empty relation without grouping key`, which is what the current code does.
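The buffer logic in the quoted diff, and one possible shape of the assert raised in this comment, can be sketched in plain Scala without Spark. `ReduceBuffer` is a hypothetical name, and using `require` in `finish` is only one assumed option; the thread does not say which form the author eventually added.

```scala
// Plain-Scala sketch of the (Boolean, T) buffer logic; no Spark dependency.
// `ReduceBuffer` is a hypothetical name, not part of the patch.
final class ReduceBuffer[T](func: (T, T) => T) {
  // The flag records whether any input was seen; the value is a placeholder until then.
  def zero: (Boolean, T) = (false, null.asInstanceOf[T])

  // Fold one input into a partial result, ignoring the placeholder on first use.
  def reduce(b: (Boolean, T), a: T): (Boolean, T) =
    if (b._1) (true, func(b._2, a)) else (true, a)

  // Combine two partial results; an empty side contributes nothing.
  def merge(b1: (Boolean, T), b2: (Boolean, T)): (Boolean, T) =
    if (!b1._1) b2
    else if (!b2._1) b1
    else (true, func(b1._2, b2._2))

  // Assumed variant: fail on empty input instead of returning the placeholder.
  def finish(reduction: (Boolean, T)): T = {
    require(reduction._1, "ReduceBuffer requires at least one input value")
    reduction._2
  }
}
```

With `new ReduceBuffer[Int](_ + _)`, folding `Seq(1, 2)` and `Seq(3)` separately and merging the two partial buffers finishes at 6, while calling `finish` on `zero` throws rather than silently returning the placeholder.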
[GitHub] spark pull request #14576: [SPARK-16391][SQL] Support partial aggregation fo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14576#discussion_r75253981 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala --- @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.expressions + +import org.apache.spark.sql.Encoder +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder + +/** + * An aggregator that uses a single associative and commutative reduce function. This reduce + * function can be used to go through all input values and reduces them to a single value. + * If there is no input, a null value is returned. 
+ * + * @since 2.1.0 + */ +private[sql] class ReduceAggregator[T: Encoder](func: (T, T) => T) + extends Aggregator[T, (Boolean, T), T] { + + private val encoder = implicitly[Encoder[T]] + + override def zero: (Boolean, T) = (false, null.asInstanceOf[T]) + + override def bufferEncoder: Encoder[(Boolean, T)] = +ExpressionEncoder.tuple( + ExpressionEncoder[Boolean](), + encoder.asInstanceOf[ExpressionEncoder[T]]) + + override def outputEncoder: Encoder[T] = encoder + + override def reduce(b: (Boolean, T), a: T): (Boolean, T) = { +if (b._1) { + (true, func(b._2, a)) +} else { + (true, a) +} + } + + override def merge(b1: (Boolean, T), b2: (Boolean, T)): (Boolean, T) = { +if (!b1._1) { + b2 +} else if (!b2._1) { + b1 +} else { + (true, func(b1._2, b2._2)) +} + } + + override def finish(reduction: (Boolean, T)): T = reduction._2 --- End diff -- It's possible for us to support that in the future, but we can worry about it when we want to make this public?
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75253703 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu assert(tableDefinition.identifier.database.isDefined) val db = tableDefinition.identifier.database.get requireDbExists(db) +verifyTableProperties(tableDefinition) + +if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) { + client.createTable(tableDefinition, ignoreIfExists) +} else { + val tableProperties = tableMetadataToProperties(tableDefinition) + + def newSparkSQLSpecificMetastoreTable(): CatalogTable = { +tableDefinition.copy( + schema = new StructType, + partitionColumnNames = Nil, + bucketSpec = None, + properties = tableDefinition.properties ++ tableProperties) + } + + def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = { +tableDefinition.copy( + storage = tableDefinition.storage.copy( +locationUri = Some(new Path(path).toUri.toString), +inputFormat = serde.inputFormat, +outputFormat = serde.outputFormat, +serde = serde.serde + ), + properties = tableDefinition.properties ++ tableProperties) + } + + val qualifiedTableName = tableDefinition.identifier.quotedString + val maybeSerde = HiveSerDe.sourceToSerDe(tableDefinition.provider.get) + val maybePath = new CaseInsensitiveMap(tableDefinition.storage.properties).get("path") + val skipHiveMetadata = tableDefinition.storage.properties +.getOrElse("skipHiveMetadata", "false").toBoolean + + val (hiveCompatibleTable, logMessage) = (maybeSerde, maybePath) match { --- End diff -- Previously, we create a DataSource and resolve it by calling `dataSource.resolveRelation`. (FYI, the `resolveRelation` consumes user-specified `options`.) Then, we used the generated `BaseRelation` to find whether this is a `hiveCompatibleTable`. 
If this is not a `HadoopFsRelation`, `hiveCompatibleTable` will be None. Now, the decision is based on whether the user-specified options have a `path` property or not. This is not always true. Right?
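As a rough illustration of the decision being questioned here, the sketch below matches on a pair of `Option`s together with a skip flag. Everything in it is hypothetical: the `Decision` types, the `decide` helper, and the branch order are assumptions for illustration and are not taken from the patch, whose `match` body is not shown in the quoted diff.

```scala
// Hypothetical sketch; names and branches are illustrative, not taken from the patch.
object CompatSketch {
  sealed trait Decision
  case object SparkSpecific extends Decision
  final case class HiveCompatible(serde: String, path: String) extends Decision

  def decide(
      maybeSerde: Option[String],
      maybePath: Option[String],
      skipHiveMetadata: Boolean): Decision =
    (maybeSerde, maybePath) match {
      // An explicit opt-out wins regardless of what else is known.
      case _ if skipHiveMetadata => SparkSpecific
      // Both a known serde and a path are needed for a Hive-readable table.
      case (Some(serde), Some(path)) => HiveCompatible(serde, path)
      // Anything else falls back to the Spark-only representation.
      case _ => SparkSpecific
    }
}
```

In this shape, a table with a known serde but no `path` option is never treated as Hive compatible, which is the kind of case the question about `path` is probing.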
[GitHub] spark issue #14618: [SPARK-17030] [SQL] Remove/Cleanup HiveMetastoreCatalog....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14618 Merged build finished. Test PASSed.
[GitHub] spark issue #14618: [SPARK-17030] [SQL] Remove/Cleanup HiveMetastoreCatalog....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14618 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63972/ Test PASSed.
[GitHub] spark issue #14618: [SPARK-17030] [SQL] Remove/Cleanup HiveMetastoreCatalog....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14618 **[Test build #63972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63972/consoleFull)** for PR 14618 at commit [`fc2dbd9`](https://github.com/apache/spark/commit/fc2dbd972acd4020ca848f1a6d727a511aa70a8f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14696: [SPARK-16714][SQL] Refactor type widening for consistenc...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/14696 @cloud-fan this actually broke decimal precision. I'm starting to think that it would be better to push type coercion into each expression, and then the arithmetic can create special cases for decimal types before calling the functions provided here. It would be a much larger change though.
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/13796#discussion_r75252926 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala --- @@ -0,0 +1,611 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.classification + +import scala.collection.mutable + +import breeze.linalg.{DenseVector => BDV} +import breeze.optimize.{CachedDiffFunction, LBFGS => BreezeLBFGS, OWLQN => BreezeOWLQN} +import org.apache.hadoop.fs.Path + +import org.apache.spark.SparkException +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.internal.Logging +import org.apache.spark.ml.feature.Instance +import org.apache.spark.ml.linalg._ +import org.apache.spark.ml.param._ +import org.apache.spark.ml.param.shared._ +import org.apache.spark.ml.util._ +import org.apache.spark.mllib.linalg.VectorImplicits._ +import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{Dataset, Row} +import org.apache.spark.sql.functions.{col, lit} +import org.apache.spark.sql.types.DoubleType +import org.apache.spark.storage.StorageLevel + +/** + * Params for multinomial logistic (softmax) regression. + */ +private[classification] trait MultinomialLogisticRegressionParams + extends ProbabilisticClassifierParams with HasRegParam with HasElasticNetParam with HasMaxIter +with HasFitIntercept with HasTol with HasStandardization with HasWeightCol { + + /** + * Set thresholds in multiclass (or binary) classification to adjust the probability of + * predicting each class. Array must have length equal to the number of classes, with values >= 0. + * The class with largest value p/t is predicted, where p is the original probability of that + * class and t is the class' threshold. + * + * @group setParam + */ + def setThresholds(value: Array[Double]): this.type = { +set(thresholds, value) + } + + /** + * Get thresholds for binary or multiclass classification. + * + * @group getParam + */ + override def getThresholds: Array[Double] = { +$(thresholds) + } +} + +/** + * :: Experimental :: + * Multinomial Logistic (softmax) regression. 
+ */ +@Since("2.1.0") +@Experimental +class MultinomialLogisticRegression @Since("2.1.0") ( +@Since("2.1.0") override val uid: String) + extends ProbabilisticClassifier[Vector, +MultinomialLogisticRegression, MultinomialLogisticRegressionModel] +with MultinomialLogisticRegressionParams with DefaultParamsWritable with Logging { + + @Since("2.1.0") + def this() = this(Identifiable.randomUID("mlogreg")) + + /** + * Set the regularization parameter. + * Default is 0.0. + * + * @group setParam + */ + @Since("2.1.0") + def setRegParam(value: Double): this.type = set(regParam, value) + setDefault(regParam -> 0.0) + + /** + * Set the ElasticNet mixing parameter. + * For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty. + * For 0 < alpha < 1, the penalty is a combination of L1 and L2. + * Default is 0.0 which is an L2 penalty. + * + * @group setParam + */ + @Since("2.1.0") + def setElasticNetParam(value: Double): this.type = set(elasticNetParam, value) + setDefault(elasticNetParam -> 0.0) + + /** + * Set the maximum number of iterations. + * Default is 100. + * + * @group setParam + */ + @Since("2.1.0") + def setMaxIter(value: Int): this.type = set(maxIter, value) + setDefault(maxIter -> 100) + + /** + * Set the convergence tolerance of iterations. + * Smaller value will lead to higher accuracy with the cost of more iterations. + * Default is
[GitHub] spark issue #14672: [SPARK-17034][SQL] Minor code cleanup for UnresolvedOrdi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14672 **[Test build #63976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63976/consoleFull)** for PR 14672 at commit [`2eb02c1`](https://github.com/apache/spark/commit/2eb02c178fefe906e01ab4a98283de2c8a0fcc36).
[GitHub] spark pull request #14672: [SPARK-17034][SQL] Minor code cleanup for Unresol...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14672#discussion_r75252322 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala --- @@ -27,22 +27,21 @@ import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin /** * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression. */ -class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan] { - private def isIntegerLiteral(sorter: Expression) = IntegerIndex.unapply(sorter).nonEmpty +class SubstituteUnresolvedOrdinals(conf: CatalystConf) extends Rule[LogicalPlan] { + private def isIntLiteral(sorter: Expression) = IntegerIndex.unapply(sorter).nonEmpty --- End diff -- Good idea. I removed IntegerIndex.
[GitHub] spark issue #14696: [SPARK-16714][SQL] Refactor type widening for consistenc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14696 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63973/ Test FAILed.
[GitHub] spark issue #14696: [SPARK-16714][SQL] Refactor type widening for consistenc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14696 Merged build finished. Test FAILed.
[GitHub] spark issue #14696: [SPARK-16714][SQL] Refactor type widening for consistenc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14696 **[Test build #63973 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63973/consoleFull)** for PR 14696 at commit [`9df4551`](https://github.com/apache/spark/commit/9df455107cf89b590ca0bcac807ea8671ccab344). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14155 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63970/ Test PASSed.
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14155 Merged build finished. Test PASSed.
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14676 **[Test build #63975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63975/consoleFull)** for PR 14676 at commit [`fb9de34`](https://github.com/apache/spark/commit/fb9de341aa5c43907ab4a51a9187434f13defcd3).
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14155 **[Test build #63970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63970/consoleFull)** for PR 14155 at commit [`96d57b6`](https://github.com/apache/spark/commit/96d57b665ac65750eb5c6f9757e5827ea9c14ca4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14155 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63969/ Test PASSed.
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14155 Merged build finished. Test PASSed.
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14676#discussion_r75251858 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{Cast, InterpretedProjection, Unevaluable} +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types.{StructField, StructType} + +/** + * An analyzer rule that replaces [[UnresolvedInlineTable]] with [[LocalRelation]]. + */ +object ResolveInlineTables extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case table: UnresolvedInlineTable if table.expressionsResolved => + validateInputDimension(table) + validateInputEvaluable(table) + convert(table) + } + + /** + * Validates that all inline table data are foldable expressions. + * + * This is package visible for unit testing. 
+ */ + private[analysis] def validateInputEvaluable(table: UnresolvedInlineTable): Unit = { +table.rows.foreach { row => + row.foreach { e => +if (!e.resolved || e.isInstanceOf[Unevaluable]) { --- End diff -- I added a comment.
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14155 **[Test build #63969 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63969/consoleFull)** for PR 14155 at commit [`502fd63`](https://github.com/apache/spark/commit/502fd6350edc55537ea99a374abc1aad130aceb1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14695: [SPARK-17117][SQL] 1 / NULL should not fail analysis
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14695 Merged build finished. Test PASSed.
[GitHub] spark issue #14695: [SPARK-17117][SQL] 1 / NULL should not fail analysis
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14695 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63967/ Test PASSed.
[GitHub] spark issue #13873: [SPARK-16167][SQL] RowEncoder should preserve array/map ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13873 Merged build finished. Test PASSed.
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75251612 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala --- @@ -97,16 +92,17 @@ case class CreateDataSourceTableCommand( } } -CreateDataSourceTableUtils.createDataSourceTable( - sparkSession = sparkSession, - tableIdent = tableIdent, +val table = CatalogTable( + identifier = tableIdent, + tableType = if (isExternal) CatalogTableType.EXTERNAL else CatalogTableType.MANAGED, + storage = CatalogStorageFormat.empty.copy(properties = optionsWithPath), schema = dataSource.schema, - partitionColumns = partitionColumns, - bucketSpec = bucketSpec, - provider = provider, - options = optionsWithPath, --- End diff -- nvm, it sounds like the `write` API is just called by CTAS and the save API of DataFrameWriter. It is OK. Let me read it again and check if we might have an issue for `options` in the CREATE Data Source Table command.
[GitHub] spark issue #13873: [SPARK-16167][SQL] RowEncoder should preserve array/map ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13873 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63968/ Test PASSed.
[GitHub] spark issue #14673: [SPARK-15083] [Web UI] History Server can OOM due to unl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14673 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63971/ Test FAILed.
[GitHub] spark issue #13873: [SPARK-16167][SQL] RowEncoder should preserve array/map ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13873 **[Test build #63968 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63968/consoleFull)** for PR 13873 at commit [`f30d7d3`](https://github.com/apache/spark/commit/f30d7d32b084d5fa95e36be037899011e99b51a5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14695: [SPARK-17117][SQL] 1 / NULL should not fail analysis
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14695 **[Test build #63967 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63967/consoleFull)** for PR 14695 at commit [`a946269`](https://github.com/apache/spark/commit/a946269811540d6cdb2237c62f095f847b461cee). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14673: [SPARK-15083] [Web UI] History Server can OOM due to unl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14673 Merged build finished. Test FAILed.
[GitHub] spark issue #14673: [SPARK-15083] [Web UI] History Server can OOM due to unl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14673 **[Test build #63971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63971/consoleFull)** for PR 14673 at commit [`014db4c`](https://github.com/apache/spark/commit/014db4c88b6b8a56a8a7a8197c27ac0b6e02f1a9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14597: [WIP][SPARK-17017][MLLIB] add a chiSquare Selector based...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14597 Can one of the admins verify this patch?
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75251290 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala --- @@ -97,16 +92,17 @@ case class CreateDataSourceTableCommand( } } -CreateDataSourceTableUtils.createDataSourceTable( - sparkSession = sparkSession, - tableIdent = tableIdent, +val table = CatalogTable( + identifier = tableIdent, + tableType = if (isExternal) CatalogTableType.EXTERNAL else CatalogTableType.MANAGED, + storage = CatalogStorageFormat.empty.copy(properties = optionsWithPath), schema = dataSource.schema, - partitionColumns = partitionColumns, - bucketSpec = bucketSpec, - provider = provider, - options = optionsWithPath, --- End diff -- @cloud-fan How about the write path?
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75251168 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala --- @@ -97,16 +92,17 @@ case class CreateDataSourceTableCommand( } } -CreateDataSourceTableUtils.createDataSourceTable( - sparkSession = sparkSession, - tableIdent = tableIdent, +val table = CatalogTable( + identifier = tableIdent, + tableType = if (isExternal) CatalogTableType.EXTERNAL else CatalogTableType.MANAGED, + storage = CatalogStorageFormat.empty.copy(properties = optionsWithPath), schema = dataSource.schema, - partitionColumns = partitionColumns, - bucketSpec = bucketSpec, - provider = provider, - options = optionsWithPath, --- End diff -- That line just puts the `options` in the storage properties. It works for `path`, but the external data source connectors might [pass some parameters into `createRelation`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L443). I think `options` is a critical parameter-passing channel for the external data source connectors.
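The concern in the comment above can be pictured with a small, Spark-free round trip. This is only a sketch under stated assumptions: `StorageFormat`, `TableMeta`, `createTable`, and `optionsForRelation` are hypothetical stand-ins invented for illustration, not Spark's `CatalogStorageFormat`, `CatalogTable`, or `DataSource` APIs. It shows the property a connector would rely on: every user-supplied option, not only `path`, should come back out of the stored properties.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark classes.
final case class StorageFormat(properties: Map[String, String])
final case class TableMeta(name: String, storage: StorageFormat)

object OptionsRoundTrip {
  // At table creation, keep all user options and add the path alongside them.
  def createTable(name: String, options: Map[String, String], path: String): TableMeta =
    TableMeta(name, StorageFormat(options + ("path" -> path)))

  // At read time, hand back every stored property so that a connector's
  // relation-building code can see connector-specific options too.
  def optionsForRelation(table: TableMeta): Map[String, String] =
    table.storage.properties
}
```

If the stored properties were filtered down to `path` alone, a connector-specific key such as the made-up `fetchSize` below would be lost between creation and read, which is the failure mode the comment asks about.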
[GitHub] spark issue #14597: [SPARK-17017][MLLIB] add a chiSquare Selector based on F...
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @srowen, I have added a parameter to control the feature selection type. The usage is like this: **var selector = new ChiSqSelector() var model = selector.fit(df) // by default, the selector selects numTopFeatures (50) var newModel = selector.selectKBest(10), or var newModel = selector.selectPercentile(5), or,,** You can fit the DataFrame once and generate the model multiple times. The indices are sorted internally in the model, as we discussed. This update does not include passing the p-value to the model, because for KBest and Percentile selection the fit function uses ChiSqTestResult.statistic to generate the model, while for Fpr it uses ChiSqTestResult.pValue. So it may be better to pass ChiSqTestResult to the model and expose it to the caller. I think it is better to submit another PR for the "pass value to model and expose to the caller" problem, because much code will change for it, including which data should be passed to the model, how to save the model, and how to test the model.
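As a rough illustration of the three selection modes discussed in this comment, here is a minimal, Spark-free sketch. It assumes the chi-square test has already produced one statistic and one p-value per feature; `FeatureTest`, `ChiSqSelection`, and the method signatures are hypothetical names chosen for this example and are not the PR's actual API.

```scala
// Hypothetical, Spark-free sketch; names are illustrative, not the PR's API.
// One chi-square test result per feature column.
final case class FeatureTest(index: Int, statistic: Double, pValue: Double)

object ChiSqSelection {
  // KBest: keep the k features with the largest statistic,
  // returning their indices in ascending order.
  def selectKBest(results: Seq[FeatureTest], k: Int): Seq[Int] =
    results.sortBy(r => -r.statistic).take(k).map(_.index).sorted

  // Percentile: keep the top `percentile` percent of features by statistic.
  def selectPercentile(results: Seq[FeatureTest], percentile: Double): Seq[Int] = {
    val k = (results.size * percentile / 100.0).toInt
    selectKBest(results, k)
  }

  // FPR: keep every feature whose p-value is below the threshold alpha.
  def selectFpr(results: Seq[FeatureTest], alpha: Double): Seq[Int] =
    results.filter(_.pValue < alpha).map(_.index).sorted
}
```

The first two modes rank by the test statistic while the third filters on the p-value, which is why all three can reuse a single set of test results computed during one fit.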
[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...
GitHub user mpjlu reopened a pull request: https://github.com/apache/spark/pull/14597

[SPARK-17017][MLLIB] add a chiSquare Selector based on False Positive Rate (FPR) test

## What changes were proposed in this pull request?

Univariate feature selection works by selecting the best features based on univariate statistical tests. False Positive Rate (FPR) is a popular univariate statistical test for feature selection. We add a chiSquare Selector based on False Positive Rate (FPR) test in this PR, like it is implemented in scikit-learn. http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection

## How was this patch tested?

Add Scala ut

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mpjlu/spark fprChiSquare

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14597.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14597

commit 2adebe8de3881509e510fc518c562d1141ccd0ef
Author: Peng, Meng
Date: 2016-08-10T05:40:18Z
add a chiSquare Selector based on False Positive Rate (FPR) test

commit 04053ca207ef4aa955eddc3e65d09a4e03db6292
Author: Peng, Meng
Date: 2016-08-11T07:10:43Z
Merge remote-tracking branch 'origin/master' into fprChiSquare

commit 7623563884355a04867ce5271baa286f65180e62
Author: Peng, Meng
Date: 2016-08-16T13:36:11Z
Configure the ChiSqSelector to reuse ChiSqTestResult by numTopFeatures, Percentile, and Fpr selector

commit 3d6aecb8441503c9c3d62a2d8a3d48824b9d6637
Author: Peng, Meng
Date: 2016-08-17T02:34:59Z
Config the ChiSqSelector to reuse the ChiSqTestResult by KBest, Percentile and FPR selector

commit 026ac85dfa190707891b694f40e737f22f9b4bd5
Author: Peng, Meng
Date: 2016-08-17T02:43:45Z
Merge branch 'master' into fprChiSquare2

commit 5305709c9d4029186318b99fa9c7c483897aa653
Author: Peng, Meng
Date: 2016-08-17T09:59:16Z
add Since annotation
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/13796#discussion_r75250973 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala --- @@ -0,0 +1,611 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.classification + +import scala.collection.mutable + +import breeze.linalg.{DenseVector => BDV} +import breeze.optimize.{CachedDiffFunction, LBFGS => BreezeLBFGS, OWLQN => BreezeOWLQN} +import org.apache.hadoop.fs.Path + +import org.apache.spark.SparkException +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.internal.Logging +import org.apache.spark.ml.feature.Instance +import org.apache.spark.ml.linalg._ +import org.apache.spark.ml.param._ +import org.apache.spark.ml.param.shared._ +import org.apache.spark.ml.util._ +import org.apache.spark.mllib.linalg.VectorImplicits._ +import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{Dataset, Row} +import org.apache.spark.sql.functions.{col, lit} +import org.apache.spark.sql.types.DoubleType +import org.apache.spark.storage.StorageLevel + +/** + * Params for multinomial logistic (softmax) regression. + */ +private[classification] trait MultinomialLogisticRegressionParams + extends ProbabilisticClassifierParams with HasRegParam with HasElasticNetParam with HasMaxIter +with HasFitIntercept with HasTol with HasStandardization with HasWeightCol { + + /** + * Set thresholds in multiclass (or binary) classification to adjust the probability of + * predicting each class. Array must have length equal to the number of classes, with values >= 0. + * The class with largest value p/t is predicted, where p is the original probability of that + * class and t is the class' threshold. + * + * @group setParam + */ + def setThresholds(value: Array[Double]): this.type = { +set(thresholds, value) + } + + /** + * Get thresholds for binary or multiclass classification. + * + * @group getParam + */ + override def getThresholds: Array[Double] = { +$(thresholds) + } +} + +/** + * :: Experimental :: + * Multinomial Logistic (softmax) regression. 
+ */ +@Since("2.1.0") +@Experimental +class MultinomialLogisticRegression @Since("2.1.0") ( +@Since("2.1.0") override val uid: String) + extends ProbabilisticClassifier[Vector, +MultinomialLogisticRegression, MultinomialLogisticRegressionModel] +with MultinomialLogisticRegressionParams with DefaultParamsWritable with Logging { + + @Since("2.1.0") + def this() = this(Identifiable.randomUID("mlogreg")) + + /** + * Set the regularization parameter. + * Default is 0.0. + * + * @group setParam + */ + @Since("2.1.0") + def setRegParam(value: Double): this.type = set(regParam, value) + setDefault(regParam -> 0.0) + + /** + * Set the ElasticNet mixing parameter. + * For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty. + * For 0 < alpha < 1, the penalty is a combination of L1 and L2. + * Default is 0.0 which is an L2 penalty. + * + * @group setParam + */ + @Since("2.1.0") + def setElasticNetParam(value: Double): this.type = set(elasticNetParam, value) + setDefault(elasticNetParam -> 0.0) + + /** + * Set the maximum number of iterations. + * Default is 100. + * + * @group setParam + */ + @Since("2.1.0") + def setMaxIter(value: Int): this.type = set(maxIter, value) + setDefault(maxIter -> 100) + + /** + * Set the convergence tolerance of iterations. + * Smaller value will lead to higher accuracy with the cost of more iterations. + * Default is
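The doc comment on `setThresholds` in the diff above describes the prediction rule as "the class with largest value p/t is predicted". A minimal sketch of that rule in plain Scala follows. The object and method names are hypothetical, and treating a zero threshold as an infinite score is an assumption made here for illustration, not something the quoted diff specifies.

```scala
// Hypothetical illustration of the p/t rule; not code from the patch.
object ThresholdPrediction {

  /** Returns the index of the class that maximizes probability / threshold. */
  def predict(probabilities: Array[Double], thresholds: Array[Double]): Int = {
    require(probabilities.length == thresholds.length,
      "thresholds must have one entry per class")
    require(thresholds.forall(_ >= 0.0), "thresholds must be >= 0")
    val scaled = probabilities.zip(thresholds).map { case (p, t) =>
      // Assumption: a zero threshold means the class always wins.
      if (t == 0.0) Double.PositiveInfinity else p / t
    }
    scaled.indexOf(scaled.max)
  }
}
```

With equal thresholds this reduces to picking the most probable class; lowering one class's threshold makes that class easier to predict.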
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75250996

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -97,16 +92,17 @@ case class CreateDataSourceTableCommand(
       }
     }
-    CreateDataSourceTableUtils.createDataSourceTable(
-      sparkSession = sparkSession,
-      tableIdent = tableIdent,
+    val table = CatalogTable(
+      identifier = tableIdent,
+      tableType = if (isExternal) CatalogTableType.EXTERNAL else CatalogTableType.MANAGED,
+      storage = CatalogStorageFormat.empty.copy(properties = optionsWithPath),
       schema = dataSource.schema,
-      partitionColumns = partitionColumns,
-      bucketSpec = bucketSpec,
-      provider = provider,
-      options = optionsWithPath,
--- End diff --

I put the options in `CatalogStorageFormat.properties`. When the table is read back, we use `storage.properties` as the data source options for creating the relation; see https://github.com/apache/spark/pull/14155/files#diff-d99813bd5bbc18277e4090475e4944cfR214
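To make the round trip described in this comment concrete, here is a small sketch using simplified stand-ins for the catalog classes. `StorageFormat`, `TableMeta`, `toTableMeta`, and `dataSourceOptions` are hypothetical names invented for this illustration; they only mimic the idea of storing the options in the storage properties on write and reading them back when the relation is created.

```scala
// Simplified, hypothetical stand-ins; these are not Spark's CatalogTable classes.
object OptionsRoundTrip {
  final case class StorageFormat(properties: Map[String, String])
  final case class TableMeta(identifier: String, storage: StorageFormat)

  /** On create: keep all data source options (including the path) in storage properties. */
  def toTableMeta(identifier: String, optionsWithPath: Map[String, String]): TableMeta =
    TableMeta(identifier, StorageFormat(properties = optionsWithPath))

  /** On read: hand the stored properties back as the data source options. */
  def dataSourceOptions(table: TableMeta): Map[String, String] =
    table.storage.properties
}
```

In this model any extra option a data source expects survives the round trip unchanged, as long as nothing else writes to the same properties map.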
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75250883

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -97,16 +92,17 @@ case class CreateDataSourceTableCommand(
       }
     }
-    CreateDataSourceTableUtils.createDataSourceTable(
-      sparkSession = sparkSession,
-      tableIdent = tableIdent,
+    val table = CatalogTable(
+      identifier = tableIdent,
+      tableType = if (isExternal) CatalogTableType.EXTERNAL else CatalogTableType.MANAGED,
+      storage = CatalogStorageFormat.empty.copy(properties = optionsWithPath),
       schema = dataSource.schema,
-      partitionColumns = partitionColumns,
-      bucketSpec = bucketSpec,
-      provider = provider,
-      options = optionsWithPath,
--- End diff --

Is it different from what we do at line https://github.com/apache/spark/pull/14155/files/96d57b665ac65750eb5c6f9757e5827ea9c14ca4#diff-945e51801b84b92da242fcb42f83f5f5R98?
[GitHub] spark issue #14648: [SPARK-16995][SQL] TreeNodeException when flat mapping R...
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14648

Thanks for the review.
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75250702

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -97,16 +92,17 @@ case class CreateDataSourceTableCommand(
       }
     }
-    CreateDataSourceTableUtils.createDataSourceTable(
-      sparkSession = sparkSession,
-      tableIdent = tableIdent,
+    val table = CatalogTable(
+      identifier = tableIdent,
+      tableType = if (isExternal) CatalogTableType.EXTERNAL else CatalogTableType.MANAGED,
+      storage = CatalogStorageFormat.empty.copy(properties = optionsWithPath),
       schema = dataSource.schema,
-      partitionColumns = partitionColumns,
-      bucketSpec = bucketSpec,
-      provider = provider,
-      options = optionsWithPath,
--- End diff --

It sounds like we are not preserving the previous behavior. `options` might be consumed by external data source implementors: it is not only used to specify `path`, but also serves as a channel for passing extra parameters to the data source.

I checked the existing implementation of `createDataSourceTable`. We [pass the original `options` into the constructor of `DataSource`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala#L336). Then, [the `write` API passes the `options` to `createRelation`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L443).

How about adding it as an independent field in `CatalogTable`?
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/13796#discussion_r75250554 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala --- @@ -0,0 +1,611 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.classification + +import scala.collection.mutable + +import breeze.linalg.{DenseVector => BDV} +import breeze.optimize.{CachedDiffFunction, LBFGS => BreezeLBFGS, OWLQN => BreezeOWLQN} +import org.apache.hadoop.fs.Path + +import org.apache.spark.SparkException +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.internal.Logging +import org.apache.spark.ml.feature.Instance +import org.apache.spark.ml.linalg._ +import org.apache.spark.ml.param._ +import org.apache.spark.ml.param.shared._ +import org.apache.spark.ml.util._ +import org.apache.spark.mllib.linalg.VectorImplicits._ +import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{Dataset, Row} +import org.apache.spark.sql.functions.{col, lit} +import org.apache.spark.sql.types.DoubleType +import org.apache.spark.storage.StorageLevel + +/** + * Params for multinomial logistic (softmax) regression. + */ +private[classification] trait MultinomialLogisticRegressionParams + extends ProbabilisticClassifierParams with HasRegParam with HasElasticNetParam with HasMaxIter +with HasFitIntercept with HasTol with HasStandardization with HasWeightCol { + + /** + * Set thresholds in multiclass (or binary) classification to adjust the probability of + * predicting each class. Array must have length equal to the number of classes, with values >= 0. + * The class with largest value p/t is predicted, where p is the original probability of that + * class and t is the class' threshold. + * + * @group setParam + */ + def setThresholds(value: Array[Double]): this.type = { +set(thresholds, value) + } + + /** + * Get thresholds for binary or multiclass classification. + * + * @group getParam + */ + override def getThresholds: Array[Double] = { +$(thresholds) + } +} + +/** + * :: Experimental :: + * Multinomial Logistic (softmax) regression. 
+ */ +@Since("2.1.0") +@Experimental +class MultinomialLogisticRegression @Since("2.1.0") ( +@Since("2.1.0") override val uid: String) + extends ProbabilisticClassifier[Vector, +MultinomialLogisticRegression, MultinomialLogisticRegressionModel] +with MultinomialLogisticRegressionParams with DefaultParamsWritable with Logging { + + @Since("2.1.0") + def this() = this(Identifiable.randomUID("mlogreg")) + + /** + * Set the regularization parameter. + * Default is 0.0. + * + * @group setParam + */ + @Since("2.1.0") + def setRegParam(value: Double): this.type = set(regParam, value) + setDefault(regParam -> 0.0) + + /** + * Set the ElasticNet mixing parameter. + * For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty. + * For 0 < alpha < 1, the penalty is a combination of L1 and L2. + * Default is 0.0 which is an L2 penalty. + * + * @group setParam + */ + @Since("2.1.0") + def setElasticNetParam(value: Double): this.type = set(elasticNetParam, value) + setDefault(elasticNetParam -> 0.0) + + /** + * Set the maximum number of iterations. + * Default is 100. + * + * @group setParam + */ + @Since("2.1.0") + def setMaxIter(value: Int): this.type = set(maxIter, value) + setDefault(maxIter -> 100) + + /** + * Set the convergence tolerance of iterations. + * Smaller value will lead to higher accuracy with the cost of more iterations. + * Default is
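The `setRegParam` and `setElasticNetParam` doc comments above describe how `alpha` mixes the L1 and L2 penalties. As a worked illustration, the sketch below computes the penalty term under the usual elastic-net convention, `regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)`. That exact scaling is an assumption here (the truncated diff does not show the objective), and `ElasticNetPenalty.penalty` is a hypothetical helper, not part of the patch.

```scala
// Hypothetical helper; the scaling convention is an assumption, see the lead-in.
object ElasticNetPenalty {

  /** Elastic-net penalty for a flat weight vector. */
  def penalty(weights: Array[Double], regParam: Double, alpha: Double): Double = {
    require(alpha >= 0.0 && alpha <= 1.0, s"alpha must be in [0, 1], got $alpha")
    val l1 = weights.map(w => math.abs(w)).sum   // ||w||_1
    val l2Squared = weights.map(w => w * w).sum  // ||w||_2^2
    regParam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2Squared)
  }
}
```

Setting `alpha = 0` leaves only the L2 term and `alpha = 1` leaves only the L1 term, which matches the behaviour the doc comment describes.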
[GitHub] spark pull request #14576: [SPARK-16391][SQL] Support partial aggregation fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14576#discussion_r75250470 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala --- @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.expressions + +import org.apache.spark.sql.Encoder +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder + +/** + * An aggregator that uses a single associative and commutative reduce function. This reduce + * function can be used to go through all input values and reduces them to a single value. + * If there is no input, a null value is returned. 
+ * + * @since 2.1.0 + */ +private[sql] class ReduceAggregator[T: Encoder](func: (T, T) => T) + extends Aggregator[T, (Boolean, T), T] { + + private val encoder = implicitly[Encoder[T]] + + override def zero: (Boolean, T) = (false, null.asInstanceOf[T]) + + override def bufferEncoder: Encoder[(Boolean, T)] = +ExpressionEncoder.tuple( + ExpressionEncoder[Boolean](), + encoder.asInstanceOf[ExpressionEncoder[T]]) + + override def outputEncoder: Encoder[T] = encoder + + override def reduce(b: (Boolean, T), a: T): (Boolean, T) = { +if (b._1) { + (true, func(b._2, a)) +} else { + (true, a) +} + } + + override def merge(b1: (Boolean, T), b2: (Boolean, T)): (Boolean, T) = { +if (!b1._1) { + b2 +} else if (!b2._1) { + b1 +} else { + (true, func(b1._2, b2._2)) +} + } + + override def finish(reduction: (Boolean, T)): T = reduction._2 --- End diff -- I think it makes sense to support `reduce group` without a grouping key, so this case may come up in the future. Besides, it is not a lot of work; we just need to decide the expected behaviour.
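The buffer logic in the diff above can be tried without Spark. The sketch below re-implements the `(Boolean, T)` buffer in plain Scala, where the flag records whether any input has been seen, and adds a hypothetical `finishOption` that makes the empty-input case explicit. `ReduceBuffer`, `reduceAll`, and `finishOption` are names invented for this illustration; the encoder handling of the real class is left out.

```scala
// Plain-Scala illustration of the (seen, value) buffer; not the Spark class itself.
object ReduceBuffer {
  type Buffer[T] = (Boolean, T)

  /** Empty buffer: nothing seen yet, value slot unused. */
  def zero[T]: Buffer[T] = (false, null.asInstanceOf[T])

  /** Folds one input value into the buffer. */
  def reduce[T](func: (T, T) => T)(b: Buffer[T], a: T): Buffer[T] =
    if (b._1) (true, func(b._2, a)) else (true, a)

  /** Combines two partial buffers, ignoring whichever side is still empty. */
  def merge[T](func: (T, T) => T)(b1: Buffer[T], b2: Buffer[T]): Buffer[T] =
    if (!b1._1) b2
    else if (!b2._1) b1
    else (true, func(b1._2, b2._2))

  /** Hypothetical variant of `finish` that reports "no input" as None. */
  def finishOption[T](b: Buffer[T]): Option[T] =
    if (b._1) Some(b._2) else None

  /** Convenience: reduce one partition's values starting from the empty buffer. */
  def reduceAll[T](values: Seq[T])(func: (T, T) => T): Buffer[T] =
    values.foldLeft(zero[T])((b, a) => reduce(func)(b, a))
}
```

Merging the buffers of two partitions gives the same result as reducing all values in one pass, provided `func` is associative and commutative as the doc comment requires. What `finish` should return for an entirely empty input is the open question raised in the comment.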
[GitHub] spark pull request #14648: [SPARK-16995][SQL] TreeNodeException when flat ma...
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14648
[GitHub] spark issue #14648: [SPARK-16995][SQL] TreeNodeException when flat mapping R...
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14648

Thanks, merging to master and 2.0!
[GitHub] spark pull request #14672: [SPARK-17034][SQL] Minor code cleanup for Unresol...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14672#discussion_r75250012

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala ---
@@ -27,22 +27,21 @@ import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
 /**
  * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
  */
-class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan] {
-  private def isIntegerLiteral(sorter: Expression) = IntegerIndex.unapply(sorter).nonEmpty
+class SubstituteUnresolvedOrdinals(conf: CatalystConf) extends Rule[LogicalPlan] {
+  private def isIntLiteral(sorter: Expression) = IntegerIndex.unapply(sorter).nonEmpty
--- End diff --

As we are cleaning up the code, shall we also remove `IntegerIndex`? It became unnecessary after we made `-1` a literal instead of `UnaryMinus`.
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14676#discussion_r75249808 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{Cast, InterpretedProjection, Unevaluable} +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types.{StructField, StructType} + +/** + * An analyzer rule that replaces [[UnresolvedInlineTable]] with [[LocalRelation]]. + */ +object ResolveInlineTables extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case table: UnresolvedInlineTable if table.expressionsResolved => + validateInputDimension(table) + validateInputEvaluable(table) + convert(table) + } + + /** + * Validates that all inline table data are foldable expressions. + * + * This is package visible for unit testing. 
+ */ + private[analysis] def validateInputEvaluable(table: UnresolvedInlineTable): Unit = { +table.rows.foreach { row => + row.foreach { e => +if (!e.resolved || e.isInstanceOf[Unevaluable]) { --- End diff -- This looks tricky to me, as ideally `foldable` matches the semantics better. We are actually making an assumption here: in the case of an inline table, evaluable always means foldable, because `UnresolvedInlineTable` is a leaf node and so can't resolve an `UnresolvedAttribute` to an `AttributeReference`. We should either document this or not support `rand`. cc @hvanhovell
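The distinction between "evaluable" and "foldable" that this comment turns on can be illustrated with a toy expression model. The sketch below is a simplified stand-in for Catalyst: all class names (`Lit`, `RandExpr`, `AttrRef`, `Plus`, and so on) and the `UnevaluableMarker` trait are hypothetical, and only the shape of the quoted check (`!e.resolved || e.isInstanceOf[Unevaluable]`, applied to the top-level expression) is taken from the diff.

```scala
// Toy model for illustration only; these are not Catalyst's expression classes.
object InlineTableChecks {
  sealed trait Expr {
    def resolved: Boolean
    def foldable: Boolean // can be reduced to a constant ahead of execution
  }

  /** Marker playing the role of `Unevaluable` in the quoted diff. */
  trait UnevaluableMarker

  final case class Lit(value: Int) extends Expr {
    val resolved: Boolean = true
    val foldable: Boolean = true
  }
  final case class RandExpr(seed: Long) extends Expr {
    val resolved: Boolean = true
    val foldable: Boolean = false // nondeterministic, so not a constant
  }
  final case class UnresolvedAttr(name: String) extends Expr with UnevaluableMarker {
    val resolved: Boolean = false
    val foldable: Boolean = false
  }
  final case class AttrRef(name: String) extends Expr with UnevaluableMarker {
    val resolved: Boolean = true
    val foldable: Boolean = false
  }
  final case class CountAgg(child: Expr) extends Expr with UnevaluableMarker {
    def resolved: Boolean = child.resolved
    val foldable: Boolean = false
  }
  final case class Plus(left: Expr, right: Expr) extends Expr {
    def resolved: Boolean = left.resolved && right.resolved
    def foldable: Boolean = left.foldable && right.foldable
  }

  /** Same shape as the check in the diff: only the top-level node is inspected. */
  def passesEvaluableCheck(e: Expr): Boolean =
    e.resolved && !e.isInstanceOf[UnevaluableMarker]
}
```

In this model `RandExpr(1L)` and `Plus(AttrRef("A"), Lit(1))` both pass the evaluable check while not being foldable, which is the gap between the two notions that the review thread is discussing.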
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14676#discussion_r75249521 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTablesSuite.scala --- @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.scalatest.BeforeAndAfter + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.expressions.{Literal, Rand} +import org.apache.spark.sql.catalyst.expressions.aggregate.Count +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.types.LongType + +/** + * Unit tests for [[ResolveInlineTables]]. Note that there are also test cases defined in + * end-to-end tests (in sql/core module) for verifying the correct error messages are shown + * in negative cases. 
+ */ +class ResolveInlineTablesSuite extends PlanTest with BeforeAndAfter { + + private def lit(v: Any): Literal = Literal(v) + + test("validate inputs are foldable") { +ResolveInlineTables.validateInputEvaluable( + UnresolvedInlineTable(Seq("c1", "c2"), Seq(Seq(lit(1))))) + +// nondeterministic (rand) should be fine +ResolveInlineTables.validateInputEvaluable( + UnresolvedInlineTable(Seq("c1", "c2"), Seq(Seq(Rand(1))))) + +// aggregate should not work +intercept[AnalysisException] { + ResolveInlineTables.validateInputEvaluable( +UnresolvedInlineTable(Seq("c1", "c2"), Seq(Seq(Count(lit(1)))))) +} + +// unresolved attribute should not work +intercept[AnalysisException] { + ResolveInlineTables.validateInputEvaluable( +UnresolvedInlineTable(Seq("c1", "c2"), Seq(Seq(UnresolvedAttribute("A"))))) --- End diff -- But how would a user construct an AttributeReference?
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14676#discussion_r75249457 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{Cast, InterpretedProjection, Unevaluable} +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types.{StructField, StructType} + +/** + * An analyzer rule that replaces [[UnresolvedInlineTable]] with [[LocalRelation]]. + */ +object ResolveInlineTables extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case table: UnresolvedInlineTable if table.expressionsResolved => + validateInputDimension(table) + validateInputEvaluable(table) + convert(table) + } + + /** + * Validates that all inline table data are foldable expressions. + * + * This is package visible for unit testing. 
+ */ + private[analysis] def validateInputEvaluable(table: UnresolvedInlineTable): Unit = { +table.rows.foreach { row => + row.foreach { e => +if (!e.resolved || e.isInstanceOf[Unevaluable]) { + e.failAnalysis(s"cannot evaluate expression ${e.sql} in inline table definition") +} + } +} + } + + /** + * Validates the input data dimension: + * 1. All rows have the same cardinality. + * 2. The number of column aliases defined is consistent with the number of columns in data. + * + * This is package visible for unit testing. + */ + private[analysis] def validateInputDimension(table: UnresolvedInlineTable): Unit = { +if (table.rows.nonEmpty) { + val numCols = table.rows.head.size + table.rows.zipWithIndex.foreach { case (row, ri) => +if (row.size != numCols) { + table.failAnalysis(s"expected $numCols columns but found ${row.size} columns in row $ri") +} + } + + if (table.names.size != numCols) { +table.failAnalysis(s"expected ${table.names.size} columns but found $numCols in first row") + } +} + } + + /** + * Convert a valid (with right shape and foldable inputs) [[UnresolvedInlineTable]] + * into a [[LocalRelation]]. + * + * This function attempts to coerce inputs into consistent types. + * + * This is package visible for unit testing. + */ + private[analysis] def convert(table: UnresolvedInlineTable): LocalRelation = { +val numCols = table.rows.head.size + +// For each column, traverse all the values and find a common data type. +val targetTypes = table.rows.transpose.zip(table.names).map { case (column, name) => + val inputTypes = column.map(_.dataType) + TypeCoercion.findWiderTypeWithoutStringPromotion(inputTypes).getOrElse { --- End diff -- I don't have a strong preference, cc @hvanhovell
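The two steps in the diff above, checking the table shape and then finding one common type per column via `transpose`, can be sketched in plain Scala. The tiny type lattice (`IntT`, `LongT`, `DoubleT`) and the "widest rank wins" rule below are assumptions made for this illustration; they are not the rules of `TypeCoercion.findWiderTypeWithoutStringPromotion`. Only the error messages are copied from the quoted code.

```scala
// Stand-alone illustration; the type lattice and widening rule are hypothetical.
object InlineTableShape {
  sealed abstract class NumType(val rank: Int)
  case object IntT extends NumType(0)
  case object LongT extends NumType(1)
  case object DoubleT extends NumType(2)

  /** Mirrors the two dimension checks, returning the message instead of throwing. */
  def validateDimension(names: Seq[String], rows: Seq[Seq[NumType]]): Either[String, Unit] =
    if (rows.isEmpty) Right(())
    else {
      val numCols = rows.head.size
      rows.zipWithIndex.collectFirst {
        case (row, ri) if row.size != numCols =>
          s"expected $numCols columns but found ${row.size} columns in row $ri"
      } match {
        case Some(msg) => Left(msg)
        case None if names.size != numCols =>
          Left(s"expected ${names.size} columns but found $numCols in first row")
        case None => Right(())
      }
    }

  /** For each column, pick the widest type seen; assumes the shape was validated first. */
  def columnTypes(rows: Seq[Seq[NumType]]): Seq[NumType] =
    rows.transpose.map(column => column.maxBy(_.rank))
}
```

Validating the shape before calling `transpose` matters, because `transpose` throws on rows of unequal length.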
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14676#discussion_r75249365 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTablesSuite.scala --- @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.scalatest.BeforeAndAfter + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.expressions.{Literal, Rand} +import org.apache.spark.sql.catalyst.expressions.aggregate.Count +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.types.LongType + +/** + * Unit tests for [[ResolveInlineTables]]. Note that there are also test cases defined in + * end-to-end tests (in sql/core module) for verifying the correct error messages are shown + * in negative cases. 
+ */ +class ResolveInlineTablesSuite extends PlanTest with BeforeAndAfter { + + private def lit(v: Any): Literal = Literal(v) + + test("validate inputs are foldable") { +ResolveInlineTables.validateInputEvaluable( + UnresolvedInlineTable(Seq("c1", "c2"), Seq(Seq(lit(1))))) + +// nondeterministic (rand) should be fine +ResolveInlineTables.validateInputEvaluable( + UnresolvedInlineTable(Seq("c1", "c2"), Seq(Seq(Rand(1))))) + +// aggregate should not work +intercept[AnalysisException] { + ResolveInlineTables.validateInputEvaluable( +UnresolvedInlineTable(Seq("c1", "c2"), Seq(Seq(Count(lit(1)))))) +} + +// unresolved attribute should not work +intercept[AnalysisException] { + ResolveInlineTables.validateInputEvaluable( +UnresolvedInlineTable(Seq("c1", "c2"), Seq(Seq(UnresolvedAttribute("A"))))) --- End diff -- The `Add` will be resolved and evaluable, but not foldable.
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14676#discussion_r75249341

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTablesSuite.scala ---
    @@ -0,0 +1,89 @@
    +    // unresolved attribute should not work
    +    intercept[AnalysisException] {
    +      ResolveInlineTables.validateInputEvaluable(
    +        UnresolvedInlineTable(Seq("c1", "c2"), Seq(Seq(UnresolvedAttribute("A")))))
    --- End diff --

    how about `UnresolvedInlineTable(Seq("c1", "c2"), Seq(Seq(AttributeReference("A") + 1)))`?
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14155 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63966/ Test PASSed.
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14155 Merged build finished. Test PASSed.
[GitHub] spark issue #13796: [SPARK-7159][ML] Add multiclass logistic regression to S...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/13796 I went through the PR again, and it's in very good shape. Only a couple of minor issues need to be addressed. Thank you @sethah for the great work. This will be a big feature in Spark 2.1.
[GitHub] spark issue #14693: [SPARK-17113][Shuffle] Job failure due to Executor OOM i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14693 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63964/ Test PASSed.
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14155

**[Test build #63966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63966/consoleFull)** for PR 14155 at commit [`263d7c3`](https://github.com/apache/spark/commit/263d7c38d60266db96d65032e53690a57f111a4f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #14693: [SPARK-17113][Shuffle] Job failure due to Executor OOM i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14693 Merged build finished. Test PASSed.
[GitHub] spark issue #14693: [SPARK-17113][Shuffle] Job failure due to Executor OOM i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14693

**[Test build #63964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63964/consoleFull)** for PR 14693 at commit [`8581659`](https://github.com/apache/spark/commit/85816590f141d1785b2786610d29523ce249c59f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13796#discussion_r75249124

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala ---
    @@ -0,0 +1,611 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.classification
    +
    +import scala.collection.mutable
    +
    +import breeze.linalg.{DenseVector => BDV}
    +import breeze.optimize.{CachedDiffFunction, LBFGS => BreezeLBFGS, OWLQN => BreezeOWLQN}
    +import org.apache.hadoop.fs.Path
    +
    +import org.apache.spark.SparkException
    +import org.apache.spark.annotation.{Experimental, Since}
    +import org.apache.spark.internal.Logging
    +import org.apache.spark.ml.feature.Instance
    +import org.apache.spark.ml.linalg._
    +import org.apache.spark.ml.param._
    +import org.apache.spark.ml.param.shared._
    +import org.apache.spark.ml.util._
    +import org.apache.spark.mllib.linalg.VectorImplicits._
    +import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.sql.{Dataset, Row}
    +import org.apache.spark.sql.functions.{col, lit}
    +import org.apache.spark.sql.types.DoubleType
    +import org.apache.spark.storage.StorageLevel
    +
    +/**
    + * Params for multinomial logistic (softmax) regression.
    + */
    +private[classification] trait MultinomialLogisticRegressionParams
    +  extends ProbabilisticClassifierParams with HasRegParam with HasElasticNetParam with HasMaxIter
    +    with HasFitIntercept with HasTol with HasStandardization with HasWeightCol {
    +
    +  /**
    +   * Set thresholds in multiclass (or binary) classification to adjust the probability of
    +   * predicting each class. Array must have length equal to the number of classes, with values >= 0.
    +   * The class with largest value p/t is predicted, where p is the original probability of that
    +   * class and t is the class' threshold.
    +   *
    +   * @group setParam
    +   */
    +  def setThresholds(value: Array[Double]): this.type = {
    +    set(thresholds, value)
    +  }
    +
    +  /**
    +   * Get thresholds for binary or multiclass classification.
    +   *
    +   * @group getParam
    +   */
    +  override def getThresholds: Array[Double] = {
    +    $(thresholds)
    +  }
    +}
    +
    +/**
    + * :: Experimental ::
    + * Multinomial Logistic (softmax) regression.
    + */
    +@Since("2.1.0")
    +@Experimental
    +class MultinomialLogisticRegression @Since("2.1.0") (
    +    @Since("2.1.0") override val uid: String)
    +  extends ProbabilisticClassifier[Vector,
    +    MultinomialLogisticRegression, MultinomialLogisticRegressionModel]
    +    with MultinomialLogisticRegressionParams with DefaultParamsWritable with Logging {
    +
    +  @Since("2.1.0")
    +  def this() = this(Identifiable.randomUID("mlogreg"))
    +
    +  /**
    +   * Set the regularization parameter.
    +   * Default is 0.0.
    +   *
    +   * @group setParam
    +   */
    +  @Since("2.1.0")
    +  def setRegParam(value: Double): this.type = set(regParam, value)
    +  setDefault(regParam -> 0.0)
    +
    +  /**
    +   * Set the ElasticNet mixing parameter.
    +   * For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.
    +   * For 0 < alpha < 1, the penalty is a combination of L1 and L2.
    +   * Default is 0.0 which is an L2 penalty.
    +   *
    +   * @group setParam
    +   */
    +  @Since("2.1.0")
    +  def setElasticNetParam(value: Double): this.type = set(elasticNetParam, value)
    +  setDefault(elasticNetParam -> 0.0)
    +
    +  /**
    +   * Set the maximum number of iterations.
    +   * Default is 100.
    +   *
    +   * @group setParam
    +   */
    +  @Since("2.1.0")
    +  def setMaxIter(value: Int): this.type = set(maxIter, value)
    +  setDefault(maxIter -> 100)
    +
    +  /**
    +   * Set the convergence tolerance of iterations.
    +   * Smaller value will lead to higher accuracy with the cost of more iterations.
    +   * Default is
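Two of the doc comments in the quoted diff describe small computations: the `p/t` rule for thresholds and the elastic-net mixing parameter. The following is a rough, self-contained sketch of both, not the code under review. `ThresholdSketch`, `predict`, and `elasticNetPenalty` are hypothetical names; the sketch assumes strictly positive thresholds (the doc comment allows zero, which is not handled here), and the penalty uses the conventional elastic-net form `regParam * (alpha * L1 + (1 - alpha) / 2 * L2^2)`, whereas the diff itself only says "a combination of L1 and L2".

```scala
// Illustrative sketch only: names and formulas here are assumptions, not Spark code.
object ThresholdSketch {

  /** Index of the class with the largest p/t ratio. */
  def predict(probabilities: Array[Double], thresholds: Array[Double]): Int = {
    require(probabilities.length == thresholds.length,
      "need exactly one threshold per class")
    require(thresholds.forall(_ > 0.0),
      "this sketch assumes strictly positive thresholds")
    val scaled = probabilities.zip(thresholds).map { case (p, t) => p / t }
    scaled.indexOf(scaled.max)
  }

  /** Conventional elastic-net penalty: alpha = 0 is pure L2, alpha = 1 is pure L1. */
  def elasticNetPenalty(weights: Array[Double], regParam: Double, alpha: Double): Double = {
    require(alpha >= 0.0 && alpha <= 1.0, "alpha must lie in [0, 1]")
    val l1 = weights.map(w => math.abs(w)).sum
    val l2Squared = weights.map(w => w * w).sum
    regParam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2Squared)
  }

  def main(args: Array[String]): Unit = {
    val probabilities = Array(0.5, 0.3, 0.2)
    // Equal thresholds: the most probable class (index 0) is predicted.
    println(predict(probabilities, Array(1.0, 1.0, 1.0)))
    // Lower thresholds on classes 1 and 2 scale the ratios to 0.5, 0.6, 0.4,
    // so index 1 is predicted instead.
    println(predict(probabilities, Array(1.0, 0.5, 0.5)))
  }
}
```

Lowering a class's threshold makes that class easier to predict, which is why the second call in `main` picks index 1 even though class 0 has the highest raw probability.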
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13796#discussion_r75249109

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala ---
    @@ -0,0 +1,611 @@
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13796#discussion_r75249096

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala ---
    @@ -0,0 +1,611 @@
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13796#discussion_r75249042

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala ---
    @@ -0,0 +1,611 @@
[GitHub] spark issue #14687: [SPARK-17107][SQL] Remove redundant pushdown rule for Un...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14687 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63965/ Test PASSed.
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13796#discussion_r75248950

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala ---
    @@ -0,0 +1,611 @@
[GitHub] spark issue #14687: [SPARK-17107][SQL] Remove redundant pushdown rule for Un...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14687 Merged build finished. Test PASSed.
[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14676 **[Test build #63974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63974/consoleFull)** for PR 14676 at commit [`2e68438`](https://github.com/apache/spark/commit/2e6843844d126e2ba466fe6b34ea59b3b67942c7).
[GitHub] spark issue #14687: [SPARK-17107][SQL] Remove redundant pushdown rule for Un...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14687 **[Test build #63965 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63965/consoleFull)** for PR 14687 at commit [`f840ccb`](https://github.com/apache/spark/commit/f840ccbb43a34aa0c7469027e969eb45b3ae7d33). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user petermaxlee commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14676#discussion_r75248833

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala ---
    @@ -0,0 +1,105 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.analysis
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.{Cast, InterpretedProjection, Unevaluable}
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.Rule
    +import org.apache.spark.sql.types.{StructField, StructType}
    +
    +/**
    + * An analyzer rule that replaces [[UnresolvedInlineTable]] with [[LocalRelation]].
    + */
    +object ResolveInlineTables extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +    case table: UnresolvedInlineTable if table.expressionsResolved =>
    +      validateInputDimension(table)
    +      validateInputEvaluable(table)
    +      convert(table)
    +  }
    +
    +  /**
    +   * Validates that all inline table data are foldable expressions.
    +   *
    +   * This is package visible for unit testing.
    +   */
    +  private[analysis] def validateInputEvaluable(table: UnresolvedInlineTable): Unit = {
    +    table.rows.foreach { row =>
    +      row.foreach { e =>
    +        if (!e.resolved || e.isInstanceOf[Unevaluable]) {
    +          e.failAnalysis(s"cannot evaluate expression ${e.sql} in inline table definition")
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Validates the input data dimension:
    +   * 1. All rows have the same cardinality.
    +   * 2. The number of column aliases defined is consistent with the number of columns in data.
    +   *
    +   * This is package visible for unit testing.
    +   */
    +  private[analysis] def validateInputDimension(table: UnresolvedInlineTable): Unit = {
    +    if (table.rows.nonEmpty) {
    +      val numCols = table.rows.head.size
    +      table.rows.zipWithIndex.foreach { case (row, ri) =>
    +        if (row.size != numCols) {
    +          table.failAnalysis(s"expected $numCols columns but found ${row.size} columns in row $ri")
    +        }
    +      }
    +
    +      if (table.names.size != numCols) {
    +        table.failAnalysis(s"expected ${table.names.size} columns but found $numCols in first row")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Convert a valid (with right shape and foldable inputs) [[UnresolvedInlineTable]]
    +   * into a [[LocalRelation]].
    +   *
    +   * This function attempts to coerce inputs into consistent types.
    +   *
    +   * This is package visible for unit testing.
    +   */
    +  private[analysis] def convert(table: UnresolvedInlineTable): LocalRelation = {
    +    val numCols = table.rows.head.size
    +
    +    // For each column, traverse all the values and find a common data type.
    +    val targetTypes = table.rows.transpose.zip(table.names).map { case (column, name) =>
    +      val inputTypes = column.map(_.dataType)
    +      TypeCoercion.findWiderTypeWithoutStringPromotion(inputTypes).getOrElse {
    --- End diff --

    Postgres doesn't allow it. We can choose to be consistent with union though.
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13796#discussion_r75248788

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala ---
    @@ -0,0 +1,611 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.classification
    +
    +import scala.collection.mutable
    +
    +import breeze.linalg.{DenseVector => BDV}
    +import breeze.optimize.{CachedDiffFunction, LBFGS => BreezeLBFGS, OWLQN => BreezeOWLQN}
    +import org.apache.hadoop.fs.Path
    +
    +import org.apache.spark.SparkException
    +import org.apache.spark.annotation.{Experimental, Since}
    +import org.apache.spark.internal.Logging
    +import org.apache.spark.ml.feature.Instance
    +import org.apache.spark.ml.linalg._
    +import org.apache.spark.ml.param._
    +import org.apache.spark.ml.param.shared._
    +import org.apache.spark.ml.util._
    +import org.apache.spark.mllib.linalg.VectorImplicits._
    +import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.sql.{Dataset, Row}
    +import org.apache.spark.sql.functions.{col, lit}
    +import org.apache.spark.sql.types.DoubleType
    +import org.apache.spark.storage.StorageLevel
    +
    +/**
    + * Params for multinomial logistic (softmax) regression.
    + */
    +private[classification] trait MultinomialLogisticRegressionParams
    +  extends ProbabilisticClassifierParams with HasRegParam with HasElasticNetParam with HasMaxIter
    +    with HasFitIntercept with HasTol with HasStandardization with HasWeightCol {
    +
    +  /**
    +   * Set thresholds in multiclass (or binary) classification to adjust the probability of
    +   * predicting each class. Array must have length equal to the number of classes, with values >= 0.
    +   * The class with largest value p/t is predicted, where p is the original probability of that
    +   * class and t is the class' threshold.
    +   *
    +   * @group setParam
    +   */
    +  def setThresholds(value: Array[Double]): this.type = {
    +    set(thresholds, value)
    +  }
    +
    +  /**
    +   * Get thresholds for binary or multiclass classification.
    +   *
    +   * @group getParam
    +   */
    +  override def getThresholds: Array[Double] = {
    +    $(thresholds)
    +  }
    +}
    +
    +/**
    + * :: Experimental ::
    + * Multinomial Logistic (softmax) regression.
    + */
    +@Since("2.1.0")
    +@Experimental
    +class MultinomialLogisticRegression @Since("2.1.0") (
    +    @Since("2.1.0") override val uid: String)
    +  extends ProbabilisticClassifier[Vector,
    +    MultinomialLogisticRegression, MultinomialLogisticRegressionModel]
    +    with MultinomialLogisticRegressionParams with DefaultParamsWritable with Logging {
    +
    +  @Since("2.1.0")
    +  def this() = this(Identifiable.randomUID("mlogreg"))
    +
    +  /**
    +   * Set the regularization parameter.
    +   * Default is 0.0.
    +   *
    +   * @group setParam
    +   */
    +  @Since("2.1.0")
    +  def setRegParam(value: Double): this.type = set(regParam, value)
    +  setDefault(regParam -> 0.0)
    +
    +  /**
    +   * Set the ElasticNet mixing parameter.
    +   * For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.
    +   * For 0 < alpha < 1, the penalty is a combination of L1 and L2.
    +   * Default is 0.0 which is an L2 penalty.
    +   *
    +   * @group setParam
    +   */
    +  @Since("2.1.0")
    +  def setElasticNetParam(value: Double): this.type = set(elasticNetParam, value)
    +  setDefault(elasticNetParam -> 0.0)
    +
    +  /**
    +   * Set the maximum number of iterations.
    +   * Default is 100.
    +   *
    +   * @group setParam
    +   */
    +  @Since("2.1.0")
    +  def setMaxIter(value: Int): this.type = set(maxIter, value)
    +  setDefault(maxIter -> 100)
    +
    +  /**
    +   * Set the convergence tolerance of iterations.
    +   * Smaller value will lead to higher accuracy with the cost of more iterations.
    +   * Default is
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/13796#discussion_r75248762 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala ---
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/13796#discussion_r75248754 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala ---
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14676#discussion_r75248623

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala ---
    @@ -0,0 +1,105 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.analysis
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.{Cast, InterpretedProjection, Unevaluable}
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.Rule
    +import org.apache.spark.sql.types.{StructField, StructType}
    +
    +/**
    + * An analyzer rule that replaces [[UnresolvedInlineTable]] with [[LocalRelation]].
    + */
    +object ResolveInlineTables extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +    case table: UnresolvedInlineTable if table.expressionsResolved =>
    +      validateInputDimension(table)
    +      validateInputEvaluable(table)
    +      convert(table)
    +  }
    +
    +  /**
    +   * Validates that all inline table data are foldable expressions.
    +   *
    +   * This is package visible for unit testing.
    +   */
    +  private[analysis] def validateInputEvaluable(table: UnresolvedInlineTable): Unit = {
    +    table.rows.foreach { row =>
    +      row.foreach { e =>
    +        if (!e.resolved || e.isInstanceOf[Unevaluable]) {
    +          e.failAnalysis(s"cannot evaluate expression ${e.sql} in inline table definition")
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Validates the input data dimension:
    +   * 1. All rows have the same cardinality.
    +   * 2. The number of column aliases defined is consistent with the number of columns in data.
    +   *
    +   * This is package visible for unit testing.
    +   */
    +  private[analysis] def validateInputDimension(table: UnresolvedInlineTable): Unit = {
    +    if (table.rows.nonEmpty) {
    +      val numCols = table.rows.head.size
    +      table.rows.zipWithIndex.foreach { case (row, ri) =>
    +        if (row.size != numCols) {
    +          table.failAnalysis(s"expected $numCols columns but found ${row.size} columns in row $ri")
    +        }
    +      }
    +
    +      if (table.names.size != numCols) {
    +        table.failAnalysis(s"expected ${table.names.size} columns but found $numCols in first row")
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Convert a valid (with right shape and foldable inputs) [[UnresolvedInlineTable]]
    +   * into a [[LocalRelation]].
    +   *
    +   * This function attempts to coerce inputs into consistent types.
    +   *
    +   * This is package visible for unit testing.
    +   */
    +  private[analysis] def convert(table: UnresolvedInlineTable): LocalRelation = {
    +    val numCols = table.rows.head.size
    +
    +    // For each column, traverse all the values and find a common data type.
    +    val targetTypes = table.rows.transpose.zip(table.names).map { case (column, name) =>
    +      val inputTypes = column.map(_.dataType)
    +      TypeCoercion.findWiderTypeWithoutStringPromotion(inputTypes).getOrElse {
    --- End diff --

    Can you check with other databases? Should we do string promotion for inline table? FYI expressions in `Union` can promote to string.
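The trade-off raised in this comment can be made concrete with a small, self-contained sketch. It is a toy model, not Spark's `TypeCoercion` code: `ToyType`, `WideningSketch`, `widerOf`, and `commonType` are hypothetical names, and the widening rules (Int < Long < Double, with optional promotion to string) are simplifying assumptions chosen only to show how the two policies differ for a single inline-table column.

```scala
// Toy model only: these types and rules are illustrative assumptions,
// not Spark's actual TypeCoercion implementation.
sealed trait ToyType
case object IntT extends ToyType
case object LongT extends ToyType
case object DoubleT extends ToyType
case object StringT extends ToyType

object WideningSketch {
  // Numeric types ordered from narrowest to widest (assumed ordering).
  private val numericOrder: Seq[ToyType] = Seq(IntT, LongT, DoubleT)

  /** Wider of two types, or None when no common type exists under the chosen policy. */
  def widerOf(a: ToyType, b: ToyType, promoteToString: Boolean): Option[ToyType] =
    (a, b) match {
      case _ if a == b => Some(a)
      case _ if numericOrder.contains(a) && numericOrder.contains(b) =>
        // Both numeric: pick whichever sits later in the ordering.
        Some(numericOrder(math.max(numericOrder.indexOf(a), numericOrder.indexOf(b))))
      case _ if promoteToString && (a == StringT || b == StringT) =>
        // Union-like policy: a string on either side widens the pair to string.
        Some(StringT)
      case _ => None
    }

  /** Folds the value types of one column into a single common type, if any. */
  def commonType(column: Seq[ToyType], promoteToString: Boolean): Option[ToyType] =
    column.headOption.flatMap { first =>
      column.tail.foldLeft(Option(first)) { (acc, t) =>
        acc.flatMap(widerOf(_, t, promoteToString))
      }
    }
}
```

Under these assumptions, a column holding an integer and a string literal has no common type when `promoteToString` is false, so an inline table with such a column would be rejected during analysis; with it set to true the column widens to string, which corresponds to the `Union`-like behaviour mentioned in the comment.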
[GitHub] spark pull request #14389: [SPARK-16714][SQL] Refactor type widening for con...
Github user petermaxlee closed the pull request at: https://github.com/apache/spark/pull/14389
[GitHub] spark issue #14389: [SPARK-16714][SQL] Refactor type widening for consistenc...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/14389 Sorry that it has taken this long. I have submitted a work in progress pull request at https://github.com/apache/spark/pull/14696 Going to close this one and continue the work there, since it is a fairly different pull request.
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user petermaxlee commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14676#discussion_r75248513

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala ---
    @@ -0,0 +1,105 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.analysis
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.{Cast, InterpretedProjection, Unevaluable}
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.Rule
    +import org.apache.spark.sql.types.{StructField, StructType}
    +
    +/**
    + * An analyzer rule that replaces [[UnresolvedInlineTable]] with [[LocalRelation]].
    + */
    +object ResolveInlineTables extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +    case table: UnresolvedInlineTable if table.expressionsResolved =>
    +      validateInputDimension(table)
    +      validateInputEvaluable(table)
    +      convert(table)
    +  }
    +
    +  /**
    +   * Validates that all inline table data are foldable expressions.
    +   *
    +   * This is package visible for unit testing.
    +   */
    +  private[analysis] def validateInputEvaluable(table: UnresolvedInlineTable): Unit = {
    +    table.rows.foreach { row =>
    +      row.foreach { e =>
    +        if (!e.resolved || e.isInstanceOf[Unevaluable]) {
    +          e.failAnalysis(s"cannot evaluate expression ${e.sql} in inline table definition")
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Validates the input data dimension:
    +   * 1. All rows have the same cardinality.
    +   * 2. The number of column aliases defined is consistent with the number of columns in data.
    +   *
    +   * This is package visible for unit testing.
    +   */
    +  private[analysis] def validateInputDimension(table: UnresolvedInlineTable): Unit = {
    +    if (table.rows.nonEmpty) {
    +      val numCols = table.rows.head.size
    +      table.rows.zipWithIndex.foreach { case (row, ri) =>
    --- End diff --

    That's a good idea. Let me do that.
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14676#discussion_r75248490

    --- Diff: sql/core/src/test/resources/sql-tests/inputs/inline-table.sql ---
    @@ -0,0 +1,48 @@
    +
    +-- single row, without table and column alias
    +select * from values ("one", 1);
    +
    +-- single row, without column alias
    +select * from values ("one", 1) as data;
    +
    +-- single row
    +select * from values ("one", 1) as data(a, b);
    +
    +-- single column multiple rows
    +select * from values 1, 2, 3 as data(a);
    +
    +-- two rows
    --- End diff --

    nit: 3 rows
[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/13796#discussion_r75248492 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultinomialLogisticRegression.scala ---
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14676#discussion_r75248427 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{Cast, InterpretedProjection, Unevaluable} +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types.{StructField, StructType} + +/** + * An analyzer rule that replaces [[UnresolvedInlineTable]] with [[LocalRelation]]. + */ +object ResolveInlineTables extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case table: UnresolvedInlineTable if table.expressionsResolved => + validateInputDimension(table) + validateInputEvaluable(table) + convert(table) + } + + /** + * Validates that all inline table data are foldable expressions. + * + * This is package visible for unit testing. 
+ */ + private[analysis] def validateInputEvaluable(table: UnresolvedInlineTable): Unit = { +table.rows.foreach { row => + row.foreach { e => +if (!e.resolved || e.isInstanceOf[Unevaluable]) { + e.failAnalysis(s"cannot evaluate expression ${e.sql} in inline table definition") +} + } +} + } + + /** + * Validates the input data dimension: + * 1. All rows have the same cardinality. + * 2. The number of column aliases defined is consistent with the number of columns in data. + * + * This is package visible for unit testing. + */ + private[analysis] def validateInputDimension(table: UnresolvedInlineTable): Unit = { +if (table.rows.nonEmpty) { + val numCols = table.rows.head.size + table.rows.zipWithIndex.foreach { case (row, ri) => +if (row.size != numCols) { + table.failAnalysis(s"expected $numCols columns but found ${row.size} columns in row $ri") +} + } + + if (table.names.size != numCols) { +table.failAnalysis(s"expected ${table.names.size} columns but found $numCols in first row") + } +} + } + + /** + * Convert a valid (with right shape and foldable inputs) [[UnresolvedInlineTable]] + * into a [[LocalRelation]]. + * + * This function attempts to coerce inputs into consistent types. + * + * This is package visible for unit testing. + */ + private[analysis] def convert(table: UnresolvedInlineTable): LocalRelation = { +val numCols = table.rows.head.size + +// For each column, traverse all the values and find a common data type. +val targetTypes = table.rows.transpose.zip(table.names).map { case (column, name) => + val inputTypes = column.map(_.dataType) + TypeCoercion.findWiderTypeWithoutStringPromotion(inputTypes).getOrElse { +table.failAnalysis(s"incompatible types found in column $name for inline table") + } +} +assert(targetTypes.size == table.names.size) --- End diff -- asserts are not meant to be user facing. They are meant to be defensive against programming errors (i.e. bugs in Spark). 
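The distinction drawn in the comment above (user-facing validation versus a defensive `assert`) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the pull request: `InlineTableError`, `AssertVsUserError`, and `columnCount` are hypothetical names, and the table is modelled as plain Scala sequences rather than Catalyst expressions.

```scala
// Hypothetical stand-in for an analysis error; not Spark's AnalysisException.
final class InlineTableError(message: String) extends Exception(message)

object AssertVsUserError {
  // User-facing check: malformed input produces a descriptive error.
  def validateDimension(names: Seq[String], rows: Seq[Seq[Any]]): Unit =
    rows.zipWithIndex.foreach { case (row, ri) =>
      if (row.size != names.size) {
        throw new InlineTableError(
          s"expected ${names.size} columns but found ${row.size} columns in row $ri")
      }
    }

  // Internal step: validation has already run, so a mismatch here
  // could only come from a programming error in this code.
  def columnCount(names: Seq[String], rows: Seq[Seq[Any]]): Int = {
    validateDimension(names, rows)
    val columns = rows.transpose
    assert(rows.isEmpty || columns.size == names.size, "bug: shape changed after validation")
    columns.size
  }
}
```

In this sketch a user can only ever trigger `InlineTableError`; the `assert` is unreachable for any input that passed validation, which is why one reviewer calls it duplicated and the other calls it defensive.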
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14676#discussion_r75248346 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{Cast, InterpretedProjection, Unevaluable} +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types.{StructField, StructType} + +/** + * An analyzer rule that replaces [[UnresolvedInlineTable]] with [[LocalRelation]]. + */ +object ResolveInlineTables extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case table: UnresolvedInlineTable if table.expressionsResolved => + validateInputDimension(table) + validateInputEvaluable(table) + convert(table) + } + + /** + * Validates that all inline table data are foldable expressions. + * + * This is package visible for unit testing. 
+ */ + private[analysis] def validateInputEvaluable(table: UnresolvedInlineTable): Unit = { --- End diff -- This was suggested by @hvanhovell. I think private functions are still meant to be private. This is only package visible for the purpose of testing. That is to say, I don't expect developers to be calling this function either.
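For readers unfamiliar with qualified access modifiers, here is a small sketch of the "visible only for testing" idea. It is an assumption-laden toy: an enclosing `object analysis` stands in for the real `analysis` package so that the snippet runs as a single file, and `Rules`, `RulesSuite`, and `validateNonEmpty` are hypothetical names that do not appear in the pull request.

```scala
// The enclosing object plays the role of the package in this sketch.
object analysis {
  object Rules {
    // Public entry point that callers are expected to use.
    def resolve(values: Seq[Int]): Int = {
      validateNonEmpty(values)
      values.sum
    }

    // Visible anywhere inside `analysis`, hidden from everything outside it.
    private[analysis] def validateNonEmpty(values: Seq[Int]): Unit =
      require(values.nonEmpty, "inline table must have at least one row")
  }

  // A test living in the same scope can call the helper directly.
  object RulesSuite {
    def rejectsEmptyInput: Boolean =
      scala.util.Try(Rules.validateNonEmpty(Nil)).isFailure
  }
}
```

Code outside `analysis` can call `analysis.Rules.resolve` but gets a compile error on `analysis.Rules.validateNonEmpty`, which matches the intent stated in the comment: the helper stays effectively private while remaining testable.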
[GitHub] spark issue #14696: [SPARK-16714][SQL] Refactor type widening for consistenc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14696 **[Test build #63973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63973/consoleFull)** for PR 14696 at commit [`9df4551`](https://github.com/apache/spark/commit/9df455107cf89b590ca0bcac807ea8671ccab344).
[GitHub] spark issue #14656: [SPARK-17069] Expose spark.range() as table-valued funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14656 Merged build finished. Test PASSed.
[GitHub] spark issue #14656: [SPARK-17069] Expose spark.range() as table-valued funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63963/ Test PASSed.
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14676#discussion_r75248351 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{Cast, InterpretedProjection, Unevaluable} +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types.{StructField, StructType} + +/** + * An analyzer rule that replaces [[UnresolvedInlineTable]] with [[LocalRelation]]. + */ +object ResolveInlineTables extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case table: UnresolvedInlineTable if table.expressionsResolved => + validateInputDimension(table) + validateInputEvaluable(table) + convert(table) + } + + /** + * Validates that all inline table data are foldable expressions. + * + * This is package visible for unit testing. 
+ */ + private[analysis] def validateInputEvaluable(table: UnresolvedInlineTable): Unit = { +table.rows.foreach { row => + row.foreach { e => +if (!e.resolved || e.isInstanceOf[Unevaluable]) { --- End diff -- If we check `foldable`, `rand()` wouldn't work.
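The point about `rand()` can be illustrated with a toy expression model. This is only a sketch under stated assumptions: the classes below (`Lit`, `RandCall`, `Unbound`, `Plus`) are hypothetical and are not Catalyst's expression classes, and the sketch assumes, as the comment implies, that a nondeterministic call such as `rand()` can be evaluated but is not foldable.

```scala
// Toy model: every expression reports whether it can be evaluated at all
// and whether it can be folded to a constant ahead of time.
sealed trait Expr {
  def evaluable: Boolean
  def foldable: Boolean
}

final case class Lit(value: Double) extends Expr {
  val evaluable = true
  val foldable = true
}

// Stand-in for rand(): it produces a value, but a different one each time.
case object RandCall extends Expr {
  val evaluable = true
  val foldable = false
}

// Stand-in for an expression that cannot be evaluated directly.
case object Unbound extends Expr {
  val evaluable = false
  val foldable = false
}

final case class Plus(left: Expr, right: Expr) extends Expr {
  val evaluable = left.evaluable && right.evaluable
  val foldable = left.foldable && right.foldable
}

object InlineTableCheck {
  // The looser check: reject only what cannot be evaluated.
  def passesEvaluableCheck(e: Expr): Boolean = e.evaluable
  // The stricter alternative raised in review.
  def passesFoldableCheck(e: Expr): Boolean = e.foldable
}
```

Under these assumptions, `Plus(Lit(1.0), RandCall)` passes the evaluable check but fails the foldable one, so a foldable-only rule would reject an inline table cell like `1 + rand()`.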
[GitHub] spark issue #14656: [SPARK-17069] Expose spark.range() as table-valued funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14656 **[Test build #63963 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63963/consoleFull)** for PR 14656 at commit [`7ebd563`](https://github.com/apache/spark/commit/7ebd563fca73ae4a4e05970709f334a4d09b5ff1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14676#discussion_r75248289 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{Cast, InterpretedProjection, Unevaluable} +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types.{StructField, StructType} + +/** + * An analyzer rule that replaces [[UnresolvedInlineTable]] with [[LocalRelation]]. + */ +object ResolveInlineTables extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case table: UnresolvedInlineTable if table.expressionsResolved => + validateInputDimension(table) + validateInputEvaluable(table) + convert(table) + } + + /** + * Validates that all inline table data are foldable expressions. + * + * This is package visible for unit testing. 
+ */ + private[analysis] def validateInputEvaluable(table: UnresolvedInlineTable): Unit = { +table.rows.foreach { row => + row.foreach { e => +if (!e.resolved || e.isInstanceOf[Unevaluable]) { + e.failAnalysis(s"cannot evaluate expression ${e.sql} in inline table definition") +} + } +} + } + + /** + * Validates the input data dimension: + * 1. All rows have the same cardinality. + * 2. The number of column aliases defined is consistent with the number of columns in data. + * + * This is package visible for unit testing. + */ + private[analysis] def validateInputDimension(table: UnresolvedInlineTable): Unit = { +if (table.rows.nonEmpty) { + val numCols = table.rows.head.size + table.rows.zipWithIndex.foreach { case (row, ri) => +if (row.size != numCols) { + table.failAnalysis(s"expected $numCols columns but found ${row.size} columns in row $ri") +} + } + + if (table.names.size != numCols) { +table.failAnalysis(s"expected ${table.names.size} columns but found $numCols in first row") + } +} + } + + /** + * Convert a valid (with right shape and foldable inputs) [[UnresolvedInlineTable]] + * into a [[LocalRelation]]. + * + * This function attempts to coerce inputs into consistent types. + * + * This is package visible for unit testing. + */ + private[analysis] def convert(table: UnresolvedInlineTable): LocalRelation = { +val numCols = table.rows.head.size + +// For each column, traverse all the values and find a common data type. 
+val targetTypes = table.rows.transpose.zip(table.names).map { case (column, name) => + val inputTypes = column.map(_.dataType) + TypeCoercion.findWiderTypeWithoutStringPromotion(inputTypes).getOrElse { +table.failAnalysis(s"incompatible types found in column $name for inline table") + } +} +assert(targetTypes.size == table.names.size) --- End diff -- It's duplicated: `validateInputDimension` already guarantees that `table.names.size` is equal to the number of columns.
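The per-column step quoted in this diff (transpose the rows, then find one common type per column) can be sketched without Spark as follows. This is a simplified illustration, not the behaviour of `TypeCoercion.findWiderTypeWithoutStringPromotion`: the `DType` hierarchy, the numeric ranking, and the `Widening` object are all assumptions made for the example, and errors are returned as `Left` values instead of going through `failAnalysis`.

```scala
// Toy type lattice: numeric types widen Int -> Long -> Double,
// and strings never mix with numbers (no string promotion).
sealed trait DType
case object IntT extends DType
case object LongT extends DType
case object DoubleT extends DType
case object StringT extends DType

object Widening {
  private val numericRank: Map[DType, Int] = Map(IntT -> 0, LongT -> 1, DoubleT -> 2)

  // The wider of two types, or None when they are incompatible.
  def wider(a: DType, b: DType): Option[DType] =
    if (a == b) Some(a)
    else (numericRank.get(a), numericRank.get(b)) match {
      case (Some(x), Some(y)) => Some(if (x >= y) a else b)
      case _ => None
    }

  // Fold a whole column down to a single common type.
  def commonType(column: Seq[DType]): Option[DType] =
    column.headOption.flatMap { head =>
      column.tail.foldLeft(Option(head)) { (acc, t) => acc.flatMap(wider(_, t)) }
    }

  // One target type per column, or the first error message encountered.
  def columnTypes(names: Seq[String], rows: Seq[Seq[DType]]): Either[String, Seq[DType]] = {
    val perColumn = rows.transpose.zip(names).map { case (column, name) =>
      commonType(column).toRight(s"incompatible types found in column $name for inline table")
    }
    perColumn.collectFirst { case Left(msg) => msg } match {
      case Some(msg) => Left(msg)
      case None => Right(perColumn.collect { case Right(t) => t })
    }
  }
}
```

Because the result is built by zipping the transposed rows with `names`, it has exactly one entry per name once the dimensions have been validated, which is the guarantee the comment above refers to.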