[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13090#issuecomment-218958319 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58539/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13090#issuecomment-218958318 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13090#issuecomment-218958212 **[Test build #58539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58539/consoleFull)** for PR 13090 at commit [`ddb9ce6`](https://github.com/apache/spark/commit/ddb9ce6692d9b70b7656196191d3572696ee0a12). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13084#discussion_r63136060 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -222,6 +222,33 @@ trait Unevaluable extends Expression { /** + * An expression that gets replaced at runtime (currently by the optimizer) into a different + * expression for evaluation. This is mainly used to provide compatibility with other databases. + * For example, we use this to support "nvl" by replacing it with "coalesce". + */ +trait RuntimeReplaceable extends Unevaluable { --- End diff -- Yea.
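To make the `RuntimeReplaceable` doc comment concrete, here is a minimal sketch of the idea on a hypothetical mini expression tree. `Expr`, `Lit`, `Coalesce`, `Nvl`, `replaceRuntimeReplaceable` and `eval` are illustrative names only, not Catalyst's classes or rules: `Nvl` is never evaluated directly, and an optimizer-style pass rewrites it into `Coalesce` before evaluation.

```scala
// Hypothetical mini expression tree (not Spark Catalyst classes).
sealed trait Expr
case class Lit(value: Option[Int]) extends Expr
case class Coalesce(children: Seq[Expr]) extends Expr

// An alias that exists only to be rewritten, in the spirit of RuntimeReplaceable.
case class Nvl(left: Expr, right: Expr) extends Expr {
  def replacement: Expr = Coalesce(Seq(left, right))
}

// Optimizer-style pass: recursively replace every Nvl with its replacement.
def replaceRuntimeReplaceable(e: Expr): Expr = e match {
  case n: Nvl       => replaceRuntimeReplaceable(n.replacement)
  case Coalesce(cs) => Coalesce(cs.map(replaceRuntimeReplaceable))
  case other        => other
}

// Evaluation only understands the replacement forms; Nvl is "unevaluable".
def eval(e: Expr): Option[Int] = e match {
  case Lit(v)       => v
  case Coalesce(cs) => cs.iterator.map(eval).collectFirst { case Some(v) => v }
  case _: Nvl       => throw new UnsupportedOperationException("Nvl must be replaced first")
}

val expr = Nvl(Lit(None), Lit(Some(7)))
val rewritten = replaceRuntimeReplaceable(expr)
```

Keeping the alias unevaluable means only one evaluation path (the `Coalesce` one) has to be implemented and tested, which is the compatibility benefit the doc comment describes.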
[GitHub] spark pull request: [spark-15212][SQL]CSV file reader when read fi...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12987#issuecomment-218958088 @WeichenXu123 I tried that with `ignoreLeadingWhiteSpace` and `ignoreTrailingWhiteSpace` and it seems to work fine. I am cautious about saying this because I am not a committer, but personally I would suggest closing this.
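In Spark's CSV reader, the two options mentioned above are passed as reader options, for example `.option("ignoreLeadingWhiteSpace", "true")`. The sketch below does not use Spark; it is a hypothetical helper (`parseHeader`) that only emulates the effect of the two flags on a simple header line, assuming no quoting or escaping, to show why blanks around column names disappear when both are enabled.

```scala
// Hypothetical emulation of the two CSV options on an unquoted header line.
// Not Spark's parser: quoting, escaping and custom delimiters are ignored.
def parseHeader(
    line: String,
    ignoreLeadingWhiteSpace: Boolean,
    ignoreTrailingWhiteSpace: Boolean): Seq[String] =
  line.split(",", -1).toSeq.map { field =>
    // Strip leading whitespace only when the first flag is set.
    val noLeading = if (ignoreLeadingWhiteSpace) field.replaceAll("^\\s+", "") else field
    // Strip trailing whitespace only when the second flag is set.
    if (ignoreTrailingWhiteSpace) noLeading.replaceAll("\\s+$", "") else noLeading
  }
```

For example, the header `" a , b "` yields the column names `a` and `b` with both flags on, and keeps the surrounding blanks with both flags off.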
[GitHub] spark pull request: [SPARK-15253] [SQL] Support old table schema c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13073#issuecomment-218957944 **[Test build #58545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58545/consoleFull)** for PR 13073 at commit [`f82f7c6`](https://github.com/apache/spark/commit/f82f7c609d3ba69580c4531f348dd6d87f804c2c).
[GitHub] spark pull request: [SPARK-15310][SQL] Rename HiveTypeCoercion -> ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13091#issuecomment-218956881 **[Test build #58544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58544/consoleFull)** for PR 13091 at commit [`06aa4ae`](https://github.com/apache/spark/commit/06aa4aefb1fe47f552275edf0417e5644820).
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-218956629 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58538/ Test PASSed.
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13048#discussion_r63135350 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.orc + +import org.apache.hadoop.conf.Configuration + +/** + * Options for the ORC data source. + */ +private[orc] class OrcOptions( +@transient private val parameters: Map[String, String], +@transient private val conf: Configuration) + extends Serializable { + + import OrcOptions._ + + /** + * Compression codec to use. By default use the value specified in Hadoop configuration. + * Acceptable values are defined in [[shortOrcCompressionCodecNames]]. + */ + val compressionCodec: String = { +val default = conf.get(ORC_COMPRESSION, "SNAPPY") + +// Because the ORC configuration value in `default` is not guaranteed to be the same +// with keys in `shortOrcCompressionCodecNames` in Spark, this value should not be +// used as the key for `shortOrcCompressionCodecNames` but just a return value. +parameters.get("compression") match { --- End diff -- Thank you so much. 
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-218956628 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-218956508 **[Test build #58538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58538/consoleFull)** for PR 12268 at commit [`cbb1674`](https://github.com/apache/spark/commit/cbb1674ecb4a82bfdb3fed97cdd14adbdd14ffb6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13048#discussion_r63135275 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.orc + +import org.apache.hadoop.conf.Configuration + +/** + * Options for the ORC data source. + */ +private[orc] class OrcOptions( +@transient private val parameters: Map[String, String], +@transient private val conf: Configuration) + extends Serializable { + + import OrcOptions._ + + /** + * Compression codec to use. By default use the value specified in Hadoop configuration. + * Acceptable values are defined in [[shortOrcCompressionCodecNames]]. + */ + val compressionCodec: String = { +val default = conf.get(ORC_COMPRESSION, "SNAPPY") + +// Because the ORC configuration value in `default` is not guaranteed to be the same +// with keys in `shortOrcCompressionCodecNames` in Spark, this value should not be +// used as the key for `shortOrcCompressionCodecNames` but just a return value. 
+parameters.get("compression") match { --- End diff -- yea, I think maybe it's best to remove the Hadoop config dependency, just depend on parameters, and default to snappy.
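A minimal sketch of the suggestion above: resolve the codec from the data source parameters only, with no Hadoop `Configuration` lookup, and fall back to snappy. This is not the merged `OrcOptions` code; the contents of `shortOrcCompressionCodecNames` below are an assumption for illustration, and `compressionCodec` is written as a plain function instead of a class member.

```scala
import java.util.Locale

// Assumed short-name -> ORC codec mapping, for illustration only.
val shortOrcCompressionCodecNames: Map[String, String] = Map(
  "none" -> "NONE",
  "uncompressed" -> "NONE",
  "snappy" -> "SNAPPY",
  "zlib" -> "ZLIB",
  "lzo" -> "LZO")

// Resolve the codec from the "compression" parameter only; default to snappy.
def compressionCodec(parameters: Map[String, String]): String = {
  val codecName = parameters.getOrElse("compression", "snappy").toLowerCase(Locale.ROOT)
  shortOrcCompressionCodecNames.getOrElse(
    codecName,
    throw new IllegalArgumentException(
      s"Codec [$codecName] is not available. Known codecs are " +
        shortOrcCompressionCodecNames.keys.mkString(", ") + "."))
}
```

Because the default is a key of the map rather than a raw Hadoop value, every code path goes through the same lookup, which sidesteps the mismatch the in-diff comment warns about.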
[GitHub] spark pull request: [SPARK-15305][ML][DOC]:spark.ml document Bisec...
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/13083#issuecomment-218955997 cc @zhengruifeng @yanboliang
[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11724
[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11724#issuecomment-218955594 LGTM
[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11724#issuecomment-218955604 Merging this into master and 2.0, thanks!
[GitHub] spark pull request: [SPARK-15236][SQL][SPARK SHELL] Add spark-defa...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/13088#discussion_r63134583 --- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala --- @@ -88,7 +88,8 @@ object Main extends Logging { } val builder = SparkSession.builder.config(conf) -if (SparkSession.hiveClassesArePresent) { +if (conf.getBoolean("spark.user.hive.catalog", true) --- End diff -- Right now, at the repl/Main.scala level, the way to create a SparkSession with the Hive catalog is to check `SparkSession.hiveClassesArePresent`, which checks for the `HiveSharedState` and `HiveSessionState` classes. If these classes are built, the REPL will always start a SparkSession that uses the Hive catalog. For the REPL to use InMemoryCatalog, we need to go down the else branch of this code:

```scala
if (conf.getBoolean("spark.use.hive.catalog", true) && SparkSession.hiveClassesArePresent) {
  sparkSession = builder.enableHiveSupport().getOrCreate()
  logInfo("Created Spark session with Hive support")
} else {
  sparkSession = builder.getOrCreate()
  logInfo("Created Spark session")
}
```

I think the default value for `CATALOG_IMPLEMENTATION` is `in-memory`. But maybe I can put the key `spark.sql.catalogImplementation` in spark-defaults.conf? Thanks!
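The alternative raised at the end of the comment (driving the branch from `spark.sql.catalogImplementation` instead of a new flag) can be sketched as a small decision function. `useHiveCatalog` is a hypothetical helper, the configuration is modeled as a plain `Map[String, String]`, and treating "no explicit setting" as "prefer Hive when its classes are present" is an assumption about REPL behavior, not something confirmed in the thread.

```scala
// Hypothetical helper modeling the REPL's if/else decision.
// Assumption: with no explicit setting, the REPL prefers Hive when its classes exist.
def useHiveCatalog(conf: Map[String, String], hiveClassesArePresent: Boolean): Boolean = {
  val requested = conf.getOrElse("spark.sql.catalogImplementation", "hive")
  requested.equalsIgnoreCase("hive") && hiveClassesArePresent
}

// A spark-defaults.conf line "spark.sql.catalogImplementation in-memory"
// would surface as this entry and force the else branch (InMemoryCatalog).
val fromDefaultsFile = Map("spark.sql.catalogImplementation" -> "in-memory")
```

With this shape, users opt out of Hive through an existing configuration key rather than a REPL-specific flag.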
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13048#discussion_r63134568 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.orc + +import org.apache.hadoop.conf.Configuration + +/** + * Options for the ORC data source. + */ +private[orc] class OrcOptions( +@transient private val parameters: Map[String, String], +@transient private val conf: Configuration) + extends Serializable { + + import OrcOptions._ + + /** + * Compression codec to use. By default use the value specified in Hadoop configuration. + * Acceptable values are defined in [[shortOrcCompressionCodecNames]]. + */ + val compressionCodec: String = { +val default = conf.get(ORC_COMPRESSION, "SNAPPY") + +// Because the ORC configuration value in `default` is not guaranteed to be the same +// with keys in `shortOrcCompressionCodecNames` in Spark, this value should not be +// used as the key for `shortOrcCompressionCodecNames` but just a return value. +parameters.get("compression") match { --- End diff -- Hm.. sorry I think I got confused. 
So do you mean to leave this part as it is and make the default compression value the string `SNAPPY` instead of reading the default from the Hadoop configuration?
[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13084
[GitHub] spark pull request: [SPARK-14541] [SQL] SQL function: IFNULL, NULL...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12373
[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13084#issuecomment-218953966 Merging to master and branch 2.0.
[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13084#issuecomment-218953875 LGTM
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13048#discussion_r63134045 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.orc + +import org.apache.hadoop.conf.Configuration + +/** + * Options for the ORC data source. + */ +private[orc] class OrcOptions( +@transient private val parameters: Map[String, String], +@transient private val conf: Configuration) + extends Serializable { + + import OrcOptions._ + + /** + * Compression codec to use. By default use the value specified in Hadoop configuration. + * Acceptable values are defined in [[shortOrcCompressionCodecNames]]. + */ + val compressionCodec: String = { +val default = conf.get(ORC_COMPRESSION, "SNAPPY") --- End diff -- @rxin Thank you for letting me know. Then I will update this to read `compression` from the parameters.
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13048#discussion_r63133864 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.orc + +import org.apache.hadoop.conf.Configuration + +/** + * Options for the ORC data source. + */ +private[orc] class OrcOptions( +@transient private val parameters: Map[String, String], +@transient private val conf: Configuration) + extends Serializable { + + import OrcOptions._ + + /** + * Compression codec to use. By default use the value specified in Hadoop configuration. + * Acceptable values are defined in [[shortOrcCompressionCodecNames]]. + */ + val compressionCodec: String = { +val default = conf.get(ORC_COMPRESSION, "SNAPPY") --- End diff -- I think over time it'd be better to switch entirely over to Spark's own configuration and propagate values to the Hadoop Configuration, not the other way around.
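The direction rxin describes (Spark-side options as the source of truth, copied one way into the Hadoop side) can be sketched as follows. This is a hypothetical helper, `propagateToHadoopConf`, and it models the Hadoop `Configuration` as a mutable `Map[String, String]` so the sketch stays self-contained; using `orc.compress` as the Hadoop-side key is an assumption for illustration.

```scala
import java.util.Locale
import scala.collection.mutable

// Hypothetical sketch: Spark-side options are authoritative and are written
// into a Hadoop-style key/value store; nothing is ever read back from it.
// "orc.compress" as the Hadoop-side key is an assumption for illustration.
def propagateToHadoopConf(
    sparkOptions: Map[String, String],
    hadoopConf: mutable.Map[String, String]): Unit = {
  sparkOptions.get("compression").foreach { codec =>
    // Overwrite whatever the Hadoop side held before.
    hadoopConf("orc.compress") = codec.toUpperCase(Locale.ROOT)
  }
}
```

A stale Hadoop-side value is overwritten when the Spark option is set and left alone otherwise, so the Spark option always wins.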
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13048#discussion_r63133871 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.orc + +import org.apache.hadoop.conf.Configuration + +/** + * Options for the ORC data source. + */ +private[orc] class OrcOptions( +@transient private val parameters: Map[String, String], +@transient private val conf: Configuration) + extends Serializable { + + import OrcOptions._ + + /** + * Compression codec to use. By default use the value specified in Hadoop configuration. + * Acceptable values are defined in [[shortOrcCompressionCodecNames]]. + */ + val compressionCodec: String = { +val default = conf.get(ORC_COMPRESSION, "SNAPPY") --- End diff -- Also I'd just call it compression, rather than orc.compression?
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13048#discussion_r63133772 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.orc + +import org.apache.hadoop.conf.Configuration + +/** + * Options for the ORC data source. + */ +private[orc] class OrcOptions( +@transient private val parameters: Map[String, String], +@transient private val conf: Configuration) + extends Serializable { + + import OrcOptions._ + + /** + * Compression codec to use. By default use the value specified in Hadoop configuration. + * Acceptable values are defined in [[shortOrcCompressionCodecNames]]. + */ + val compressionCodec: String = { +val default = conf.get(ORC_COMPRESSION, "SNAPPY") --- End diff -- Ah... Isn't it still possible to set the Hadoop configuration via `spark.sessionState.newHadoopConf()`, though? I thought it was safe to use this.
[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...
Github user lresende closed the pull request at: https://github.com/apache/spark/pull/13092
[GitHub] spark pull request: [SPARK-15311] [SQL] Disallow DML on Non-tempor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13093#issuecomment-218952559 **[Test build #58543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58543/consoleFull)** for PR 13093 at commit [`5c3b116`](https://github.com/apache/spark/commit/5c3b11621ead9402602b6eaf7531990c34d3ed31).
[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13084#discussion_r63133598

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -222,6 +222,33 @@ trait Unevaluable extends Expression {
 /**
+ * An expression that gets replaced at runtime (currently by the optimizer) into a different
+ * expression for evaluation. This is mainly used to provide compatibility with other databases.
+ * For example, we use this to support "nvl" by replacing it with "coalesce".
+ */
+trait RuntimeReplaceable extends Unevaluable {
--- End diff --

This is very cool!
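The `RuntimeReplaceable` doc comment describes an expression that is never evaluated itself but is swapped for an equivalent one (for example `nvl` rewritten to `coalesce`) before evaluation. Below is a minimal, dependency-free toy model of that idea. `Expr`, `Lit`, `Coalesce`, `Nvl`, `Replaceable`, and `replace` are hypothetical names, not Catalyst's API, and `replace` only stands in for an optimizer rule.

```scala
// Toy model (not Catalyst): an expression that exists only to be rewritten before evaluation.
object RuntimeReplaceableSketch {
  sealed trait Expr
  final case class Lit(value: Option[Int]) extends Expr // None models SQL NULL
  final case class Coalesce(children: Seq[Expr]) extends Expr

  // Hypothetical marker: expressions of this kind are never evaluated directly.
  sealed trait Replaceable extends Expr { def replacement: Expr }
  final case class Nvl(left: Expr, right: Expr) extends Replaceable {
    def replacement: Expr = Coalesce(Seq(left, right))
  }

  // Stand-in for an optimizer rule: recursively swap every Replaceable for its replacement.
  def replace(e: Expr): Expr = e match {
    case r: Replaceable => replace(r.replacement)
    case Coalesce(cs)   => Coalesce(cs.map(replace))
    case lit: Lit       => lit
  }

  // Evaluation only understands the "real" expressions.
  def eval(e: Expr): Option[Int] = e match {
    case Lit(v)         => v
    case Coalesce(cs)   => cs.iterator.map(eval).collectFirst { case Some(v) => v }
    case _: Replaceable => throw new UnsupportedOperationException("replace before evaluating")
  }
}
```

Keeping the replaceable node unevaluable means a missed rewrite fails loudly instead of silently needing its own evaluation code.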
[GitHub] spark pull request: [SPARK-15236][SQL][SPARK SHELL] Add spark-defa...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13088#discussion_r63133610

--- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala ---
@@ -88,7 +88,8 @@ object Main extends Logging {
     }
     val builder = SparkSession.builder.config(conf)
-    if (SparkSession.hiveClassesArePresent) {
+    if (conf.getBoolean("spark.user.hive.catalog", true)
--- End diff --

Why do we need this? Isn't `enableHiveSupport` the same as setting the config `spark.sql.catalogImplementation`?
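rxin's question suggests that the existing `spark.sql.catalogImplementation` setting can already express the choice the new `spark.user.hive.catalog` flag adds. Below is a dependency-free sketch of a REPL deciding from that single key. `CatalogChoiceSketch` and `chooseCatalog` are hypothetical, a plain `Map` stands in for `SparkConf`, and the `"hive"`/`"in-memory"` values with a default of `"hive"` are assumptions made for illustration.

```scala
// Sketch only: a Map stands in for SparkConf, and chooseCatalog is a hypothetical helper.
object CatalogChoiceSketch {
  val CatalogImplementationKey = "spark.sql.catalogImplementation"

  /** Value the REPL would use for CatalogImplementationKey. */
  def chooseCatalog(conf: Map[String, String], hiveClassesArePresent: Boolean): String = {
    // Assume "hive" is requested unless the user explicitly configured something else.
    val requested = conf.getOrElse(CatalogImplementationKey, "hive").toLowerCase
    // Hive can only be honored when its classes are on the classpath.
    if (requested == "hive" && hiveClassesArePresent) "hive" else "in-memory"
  }
}
```

With this shape, a user who wants the in-memory catalog sets one existing key instead of learning a second flag.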
[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13092#issuecomment-218952186 I think it's better to do this once we have a 2.0 Maven artifact published. Otherwise MiMa is going to have duplicates, and as we make API tweaks during the QA period we will have to fight with conflicts. Can you close this?
[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13092#issuecomment-218952073 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58541/ Test FAILed.
[GitHub] spark pull request: [SPARK-15311] [SQL] Disallow DML on Non-tempor...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/13093

[SPARK-15311] [SQL] Disallow DML on Non-temporary Tables when Using In-Memory Catalog

What changes were proposed in this pull request?

Currently, when using the In-Memory Catalog, we allow DDL operations on non-temporary tables, but the corresponding DML operations are not supported. This PR issues appropriate exceptions in this case.

Another option is to disallow DDL operations on non-temporary tables altogether. @rxin @andrewor14 @yhuai @cloud-fan @liancheng Let me know if you want to do it that way. If so, we will have to disable a lot of existing test cases.

How was this patch tested?

Added test cases in DDLSuite.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark selectAfterCreate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13093.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #13093

commit 5c3b11621ead9402602b6eaf7531990c34d3ed31
Author: gatorsmile
Date: 2016-05-13T04:47:59Z

issue exceptions for non-temporary tables when the catalog is using InMemoryCatalog
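The behavior this PR describes can be sketched as a guard that runs before DML. With the in-memory catalog, statements against non-temporary tables fail fast with a clear error. `InMemoryDmlCheckSketch`, `CatalogImpl`, `TableRef`, `checkDml`, and the use of `UnsupportedOperationException` are all hypothetical. The PR's actual exception type and call site are not shown in this thread.

```scala
// Hypothetical guard; names and the exception type are illustrative, not Spark's.
object InMemoryDmlCheckSketch {
  sealed trait CatalogImpl
  case object InMemory extends CatalogImpl
  case object Hive extends CatalogImpl

  final case class TableRef(name: String, isTemporary: Boolean)

  /** DDL may have created the table, but reading or writing it is rejected up front. */
  def checkDml(catalog: CatalogImpl, table: TableRef): Unit =
    if (catalog == InMemory && !table.isTemporary) {
      throw new UnsupportedOperationException(
        s"DML on non-temporary table '${table.name}' is not supported when using InMemoryCatalog")
    }
}
```

The alternative mentioned in the PR (rejecting the DDL instead) would move the same check to table creation time.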
[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13092#issuecomment-218952070 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13092#issuecomment-218952061

**[Test build #58541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58541/consoleFull)** for PR 13092 at commit [`92dddf5`](https://github.com/apache/spark/commit/92dddf53c04e07f9f78d1bc2b9c1c3cda4c86eff).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15310][SQL] Rename HiveTypeCoercion -> ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13091#issuecomment-218952009 LGTM
[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11724#issuecomment-218951716 cc @davies can you review this?
[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...
Github user lresende commented on the pull request: https://github.com/apache/spark/pull/13092#issuecomment-218951567 @srowen Please review.
[GitHub] spark pull request: [SPARK-15310][SQL] Rename HiveTypeCoercion -> ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13091#issuecomment-218951119 **[Test build #58542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58542/consoleFull)** for PR 13091 at commit [`7424af4`](https://github.com/apache/spark/commit/7424af4bcfbad5a9490321b08425739dfaa2ca67).
[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13092#issuecomment-218951117 **[Test build #58541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58541/consoleFull)** for PR 13092 at commit [`92dddf5`](https://github.com/apache/spark/commit/92dddf53c04e07f9f78d1bc2b9c1c3cda4c86eff).
[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...
GitHub user lresende opened a pull request: https://github.com/apache/spark/pull/13092

[SPARK-15309] Bump master to version 2.1.0-SNAPSHOT

## What changes were proposed in this pull request?

Update the pom artifact version to 2.1.0-SNAPSHOT to avoid any conflicts with the 2.0.0-SNAPSHOT branch.

## How was this patch tested?

Regular build.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lresende/spark SPARK-15309

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13092.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #13092

commit 92dddf53c04e07f9f78d1bc2b9c1c3cda4c86eff
Author: Luciano Resende
Date: 2016-05-13T04:40:16Z

[SPARK-15309] Bump master to version 2.1.0-SNAPSHOT
[GitHub] spark pull request: [TRIVIAL][Doc] SparkSession class doc example ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13086#issuecomment-218950905 Maybe we should add `()` to the `builder` function instead? Then this code would work in Java too.
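In Scala, a method declared with an empty parameter list (`def builder(): Builder`) can be called as `builder()`. From Java, that is the only way to write the call, so a doc example written that way is valid in both languages. If the method were declared without parentheses, `builder()` would not compile in Scala. Below is a toy sketch. `SessionSketch` and its `Builder` are hypothetical stand-ins, not `SparkSession`'s actual code.

```scala
// Toy stand-in for a session builder; not SparkSession's implementation.
object SessionSketch {
  final class Builder {
    private var options = Map.empty[String, String]

    def config(key: String, value: String): Builder = {
      options += (key -> value)
      this
    }

    def appName(name: String): Builder = config("spark.app.name", name)

    def build(): Map[String, String] = options
  }

  // Empty parameter list: `SessionSketch.builder()` compiles in Scala and mirrors the Java call.
  def builder(): Builder = new Builder
}
```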
[GitHub] spark pull request: [SPARK-15310][SQL] Rename HiveTypeCoercion -> ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13091#issuecomment-218950766 cc @yhuai for review; talked with @marmbrus offline already.
[GitHub] spark pull request: [SPARK-15310][SQL] Rename HiveTypeCoercion -> ...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/13091

[SPARK-15310][SQL] Rename HiveTypeCoercion -> TypeCoercion

## What changes were proposed in this pull request?

We originally designed the type coercion rules to match Hive, but over time we have diverged. It does not make sense to call it HiveTypeCoercion anymore. This patch renames it TypeCoercion.

## How was this patch tested?

Updated unit tests to reflect the rename.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-15310

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13091.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #13091

commit 7424af4bcfbad5a9490321b08425739dfaa2ca67
Author: Reynold Xin
Date: 2016-05-13T04:42:29Z

[SPARK-15310][SQL] Rename HiveTypeCoercion -> TypeCoercion
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-218950690 **[Test build #58540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58540/consoleFull)** for PR 12719 at commit [`2d56bc2`](https://github.com/apache/spark/commit/2d56bc27b1091ce37103f7427332841c2b996003).
[GitHub] spark pull request: [MINOR] Fix Typos
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13078#issuecomment-218950522 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58537/ Test PASSed.
[GitHub] spark pull request: [MINOR] Fix Typos
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13078#issuecomment-218950521 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-218950477 Rebase to resolve conflicts.
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13048#discussion_r63132718

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import org.apache.hadoop.conf.Configuration
+
+/**
+ * Options for the ORC data source.
+ */
+private[orc] class OrcOptions(
+    @transient private val parameters: Map[String, String],
+    @transient private val conf: Configuration)
+  extends Serializable {
+
+  import OrcOptions._
+
+  /**
+   * Compression codec to use. By default use the value specified in Hadoop configuration.
+   * Acceptable values are defined in [[shortOrcCompressionCodecNames]].
+   */
+  val compressionCodec: String = {
+    val default = conf.get(ORC_COMPRESSION, "SNAPPY")
--- End diff --

Hmm, why are we reading from the Hadoop configuration? Shouldn't we just read from `parameters`?
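One way to read rxin's suggestion is that an explicit data source option should take precedence, with the Hadoop configuration (and then `SNAPPY`) used only as fallbacks. Below is a dependency-free sketch of that precedence. A plain `Map` stands in for Hadoop's `Configuration`. The option name `compression`, the key `orc.compress`, and the codec table are assumptions made for illustration, not the PR's code.

```scala
// Hypothetical sketch of option precedence for an ORC codec; a Map stands in for Configuration.
object OrcCodecSketch {
  // Assumed short-name table, in the spirit of `shortOrcCompressionCodecNames`.
  val shortOrcCompressionCodecNames: Map[String, String] = Map(
    "none" -> "NONE",
    "uncompressed" -> "NONE",
    "snappy" -> "SNAPPY",
    "zlib" -> "ZLIB",
    "lzo" -> "LZO")

  val OrcCompressionKey = "orc.compress" // assumed Hadoop/ORC configuration key

  /** Data source option wins; otherwise fall back to the Hadoop-style conf, then SNAPPY. */
  def compressionCodec(parameters: Map[String, String], conf: Map[String, String]): String = {
    val default = conf.getOrElse(OrcCompressionKey, "SNAPPY")
    val name = parameters.getOrElse("compression", default).toLowerCase
    shortOrcCompressionCodecNames.getOrElse(name,
      throw new IllegalArgumentException(
        s"Codec [$name] is not available. Known codecs are " +
          shortOrcCompressionCodecNames.keys.mkString(", ") + "."))
  }
}
```

Keeping the Hadoop value only as the default, as HyukjinKwon's reply suggests, means both ways of configuring the codec keep working while an explicit option always wins.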
[GitHub] spark pull request: [MINOR] Fix Typos
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13078#issuecomment-218950439

**[Test build #58537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58537/consoleFull)** for PR 13078 at commit [`9b2a5aa`](https://github.com/apache/spark/commit/9b2a5aa9f88cbb35a4843298a7c295297b4ce378).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15306][SQL] Move object expressions int...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13085
[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13084#issuecomment-218949992 cc @hvanhovell wanna review this?
[GitHub] spark pull request: [SPARK-15306][SQL] Move object expressions int...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13085#issuecomment-218950017 Merging in master/2.0.
[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13090#issuecomment-218949812 **[Test build #58539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58539/consoleFull)** for PR 13090 at commit [`ddb9ce6`](https://github.com/apache/spark/commit/ddb9ce6692d9b70b7656196191d3572696ee0a12).
[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/13090

[SPARK-15308][SQL] RowEncoder should preserve nested column name.

## What changes were proposed in this pull request?

The following code generates the wrong schema:

```
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{IntegerType, StructType}

val schema = new StructType().add(
  "struct",
  new StructType()
    .add("i", IntegerType, nullable = false)
    .add(
      "s",
      new StructType().add("int", IntegerType, nullable = false),
      nullable = false),
  nullable = false)

// Each row holds a single "struct" column whose values match the nested schema.
val ds = sqlContext.range(10).map(l => Row(Row(l.toInt, Row(l.toInt))))(RowEncoder(schema))
ds.printSchema()
```

This should print as follows:

```
root
 |-- struct: struct (nullable = false)
 |    |-- i: integer (nullable = false)
 |    |-- s: struct (nullable = false)
 |    |    |-- int: integer (nullable = false)
```

but the result is:

```
 |-- struct: struct (nullable = false)
 |    |-- col1: integer (nullable = false)
 |    |-- col2: struct (nullable = false)
 |    |    |-- col1: integer (nullable = false)
```

This PR fixes `RowEncoder` to preserve nested column names.

## How was this patch tested?

Existing tests, plus a new test that checks that `RowEncoder` preserves nested column names.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-15308

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13090.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #13090

commit d4ae79775f2ba0ebd0dd65b52076930244c2be96
Author: Takuya UESHIN
Date: 2016-05-13T04:02:41Z

Add a test to check if RowEncoder preserves nested column name.

commit ddb9ce6692d9b70b7656196191d3572696ee0a12
Author: Takuya UESHIN
Date: 2016-05-13T04:08:00Z

Fix RowEncoder to preserve nested column name.
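The incorrect output in this report looks like what happens when nested structs are rebuilt from their field values alone, so that each nested field receives a positional `colN` name while the top-level names still come from the schema. Below is a toy, dependency-free model of the two behaviors. `NestedNamesSketch`, `encoderSchema`, and `keepNames` are hypothetical. They only mimic the printed schemas and are not `RowEncoder`'s code.

```scala
// Toy schema model (not Spark's types) showing how positional rebuilding loses nested names.
object NestedNamesSketch {
  sealed trait DataType
  case object IntType extends DataType
  final case class Field(name: String, dataType: DataType)
  final case class Struct(fields: Seq[Field]) extends DataType

  // Rebuild a nested type; without keepNames, fields get positional names col1, col2, ...
  private def nested(t: DataType, keepNames: Boolean): DataType = t match {
    case Struct(fields) =>
      Struct(fields.zipWithIndex.map { case (f, i) =>
        Field(if (keepNames) f.name else s"col${i + 1}", nested(f.dataType, keepNames))
      })
    case IntType => IntType
  }

  /** Top-level names always come from the schema; keepNames controls the nested levels. */
  def encoderSchema(schema: Struct, keepNames: Boolean): Struct =
    Struct(schema.fields.map(f => Field(f.name, nested(f.dataType, keepNames))))
}
```

With `keepNames = false` the toy reproduces the `col1`/`col2` shape from the report. With `keepNames = true` it round-trips the input schema.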
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13048#issuecomment-218949638 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13048#issuecomment-218949640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58536/ Test PASSed.
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13048#issuecomment-218949541

**[Test build #58536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58536/consoleFull)** for PR 13048 at commit [`6fd5f0d`](https://github.com/apache/spark/commit/6fd5f0d03b967d4b91f925859ef9ee7a01b98dfd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14615][ML] Use the new ML Vector and Ma...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12627#issuecomment-218949085 @viirya Can you try to merge my code with yours, and see if the python tests pass? Thanks.
[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-218949073 Hi @liancheng and @cloud-fan. This PR is similar to your commit `[SPARK-13473][SQL] Don't push predicate through project with nondeterministic field(s)`: it prevents pushing a predicate through a project that contains UDF expressions. Could you review this PR?
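The rule change described here can be sketched with a toy plan. A `Filter` above a `Project` is pushed below it, with aliases substituted, only when every projected expression is a plain, deterministic column expression. `PushDownSketch`, `Udf`, `Rand`, `pushDown`, and the other names are hypothetical, not Catalyst's `PushDownPredicate`. Treating UDFs the same way as nondeterministic expressions is the assumption being illustrated.

```scala
// Toy logical plan (not Catalyst) for predicate pushdown through a projection.
object PushDownSketch {
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Gt(left: Expr, right: Int) extends Expr
  final case class Udf(fn: String, child: Expr) extends Expr // opaque user function
  final case class Rand(seed: Long) extends Expr             // nondeterministic

  final case class Alias(child: Expr, name: String)

  sealed trait Plan
  final case class Relation(name: String) extends Plan
  final case class Project(list: Seq[Alias], child: Plan) extends Plan
  final case class Filter(cond: Expr, child: Plan) extends Plan

  // Only expressions that are cheap and deterministic are safe to duplicate below the Project.
  private def isSafe(e: Expr): Boolean = e match {
    case Col(_)              => true
    case Gt(l, _)            => isSafe(l)
    case Udf(_, _) | Rand(_) => false
  }

  // Replace references to aliases with the expressions they name.
  private def substitute(e: Expr, aliases: Map[String, Expr]): Expr = e match {
    case Col(n)    => aliases.getOrElse(n, e)
    case Gt(l, r)  => Gt(substitute(l, aliases), r)
    case Udf(f, c) => Udf(f, substitute(c, aliases))
    case r: Rand   => r
  }

  def pushDown(plan: Plan): Plan = plan match {
    case Filter(cond, Project(list, grandChild)) if list.forall(a => isSafe(a.child)) =>
      val aliases = list.map(a => a.name -> a.child).toMap
      Project(list, Filter(substitute(cond, aliases), grandChild))
    case other => other
  }
}
```

Substituting a UDF alias into the pushed-down filter would make the UDF run in both the filter and the projection, which is one plausible reason for blocking it.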
[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-218948867 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58535/ Test PASSed.
[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-218948865 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-218948774 **[Test build #58535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58535/consoleFull)** for PR 13087 at commit [`ab5bc4b`](https://github.com/apache/spark/commit/ab5bc4b77ef2fc40684f680cde296f7d91e29aa2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-218948489 **[Test build #58538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58538/consoleFull)** for PR 12268 at commit [`cbb1674`](https://github.com/apache/spark/commit/cbb1674ecb4a82bfdb3fed97cdd14adbdd14ffb6).
[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-218948335 The reported error scenario is as follows:

```scala
scala> val df = sc.parallelize(Seq(("a", "b"), ("a1", "b1"))).toDF("old","old1")
scala> val udfFunc = udf((s: String) => {println(s"running udf($s)"); s })
scala> val newDF = df.withColumn("new", udfFunc(df("old")))
scala> val filteredOnNewColumnDF = newDF.filter("new <> 'a1'")
scala> filteredOnNewColumnDF.show
running udf(a)
running udf(a)
running udf(a1)
+---+----+---+
|old|old1|new|
+---+----+---+
|  a|   b|  a|
+---+----+---+
```

Note that `udfFunc` runs twice for the row `a`. With this PR, it runs once per row:

```scala
scala> filteredOnNewColumnDF.show
running udf(a1)
running udf(a)
+---+----+---+
|old|old1|new|
+---+----+---+
|  a|   b|  a|
+---+----+---+
```
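To see where the duplicate `running udf(a)` comes from, here is a plain-Python model of the two plans. It treats one partition as a lazily pulled row iterator, which is roughly how Spark streams rows through operators within a stage. The function names are hypothetical and this is not Spark code; it only reproduces the call pattern.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[str, str]          # (old, old1)
OutRow = Tuple[str, str, str]  # (old, old1, new)


def make_logging_udf(log: List[str]) -> Callable[[str], str]:
    """Identity UDF that records each call, like `udfFunc` in the report."""
    def udf(s: str) -> str:
        log.append(f"running udf({s})")
        return s
    return udf


def project_then_filter(rows: Iterable[Row], udf: Callable[[str], str]) -> List[OutRow]:
    """Plan as written: compute `new` once per row, then filter on it."""
    projected: Iterator[OutRow] = ((old, old1, udf(old)) for old, old1 in rows)
    return [r for r in projected if r[2] != "a1"]


def filter_pushed_below_project(rows: Iterable[Row], udf: Callable[[str], str]) -> List[OutRow]:
    """Plan after the pushdown: the filter evaluates the inlined `udf(old)`, and
    every row that survives calls the UDF a second time in the projection."""
    kept: Iterator[Row] = ((old, old1) for old, old1 in rows if udf(old) != "a1")
    return [(old, old1, udf(old)) for old, old1 in kept]
```

Running `filter_pushed_below_project` on the rows `("a", "b")` and `("a1", "b1")` logs `running udf(a)`, `running udf(a)`, `running udf(a1)`, the same pattern as the reported output, while `project_then_filter` calls the UDF exactly once per row.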
[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...
Github user zzcclp commented on the pull request: https://github.com/apache/spark/pull/12060#issuecomment-218947787 @markhamstra, thanks for your explanation.
[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11724#issuecomment-218946776 @rxin Would you mind taking another quick look?
[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12855#issuecomment-218946730 ping @marmbrus
[GitHub] spark pull request: [SPARK-15198][SQL] Support for pushing down fi...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12972#issuecomment-218946661 Please excuse my ping, @liancheng
[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/12060#issuecomment-218945792 @zzcclp Not likely. This PR shouldn't produce any different results, but rather produces the same results faster. We're typically very conservative with patch-level releases, so the optimization work for this PR will almost certainly only appear in the Spark 2.x series. That's not too far off.
[GitHub] spark pull request: [SPARK-15143][SPARK-15144][SQL] Add CSV tests ...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12921#issuecomment-218945091 Please excuse my ping @rxin @falaki
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13048#issuecomment-218943216 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58534/ Test PASSed.
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13048#issuecomment-218943213 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13048#issuecomment-218943100 **[Test build #58534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58534/consoleFull)** for PR 13048 at commit [`0f86ce6`](https://github.com/apache/spark/commit/0f86ce60d69ddc97ced64faa3e354338bb3bbf35). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [MINOR] Fix Typos
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13078#issuecomment-218942530 **[Test build #58537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58537/consoleFull)** for PR 13078 at commit [`9b2a5aa`](https://github.com/apache/spark/commit/9b2a5aa9f88cbb35a4843298a7c295297b4ce378).
[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...
Github user zzcclp commented on the pull request: https://github.com/apache/spark/pull/12060#issuecomment-218942375 Good PR. Is it planned to be merged into branch-1.6?
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-218942100 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-218942102 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58533/ Test PASSed.
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-218942004 **[Test build #58533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58533/consoleFull)** for PR 12836 at commit [`da7bb2b`](https://github.com/apache/spark/commit/da7bb2be9206ce452429072432cf565f35c8763f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [MINOR] Fix Typos
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13078#discussion_r63128336 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -64,8 +64,9 @@ private[ann] trait Layer extends Serializable { * @return the layer model */ def createModel(initialWeights: BDV[Double]): LayerModel + /** - * Returns the instance of the layer with random generated weights + * Returns the instance of the layer with random generated weights. --- End diff -- This method already has a `@return` text.
[GitHub] spark pull request: [MINOR] Fix Typos
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13078#discussion_r63128009 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/BreezeUtil.scala --- @@ -55,7 +55,7 @@ private[ann] object BreezeUtil { * @param y y */ def dgemv(alpha: Double, a: BDM[Double], x: BDV[Double], beta: Double, y: BDV[Double]): Unit = { -require(a.cols == x.length, "A & b Dimension mismatch!") +require(a.cols == x.length, "A & x Dimension mismatch!") --- End diff -- Right. I will rename the matrix arguments to uppercase and add the missing check.
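Assuming `dgemv` follows the usual BLAS contract y := alpha * A * x + beta * y, the shape checks under discussion can be sketched in plain Python. This is an illustration, not the Breeze or `BreezeUtil` code: `ValueError` stands in for Scala's `require`, and treating the row check against `y` as "the missing check" is an assumption.

```python
from typing import List

Matrix = List[List[float]]  # A stored as a list of rows
Vector = List[float]


def dgemv(alpha: float, A: Matrix, x: Vector, beta: float, y: Vector) -> None:
    """In-place y := alpha * A @ x + beta * y, validating both dimensions first."""
    n_rows = len(A)
    n_cols = len(A[0]) if A else 0
    # A's column count must match x (the message that wrongly said "b").
    if n_cols != len(x):
        raise ValueError("A & x Dimension mismatch!")
    # A's row count must match y (assumed to be the check that was missing).
    if n_rows != len(y):
        raise ValueError("A & y Dimension mismatch!")
    for i in range(n_rows):
        acc = sum(A[i][j] * x[j] for j in range(n_cols))
        y[i] = alpha * acc + beta * y[i]
```

Checking both shapes up front means a wrong `y` fails with a clear message instead of an index error partway through the update.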
[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13048#issuecomment-218941040 **[Test build #58536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58536/consoleFull)** for PR 13048 at commit [`6fd5f0d`](https://github.com/apache/spark/commit/6fd5f0d03b967d4b91f925859ef9ee7a01b98dfd).
[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-218940532 **[Test build #58535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58535/consoleFull)** for PR 13087 at commit [`ab5bc4b`](https://github.com/apache/spark/commit/ab5bc4b77ef2fc40684f680cde296f7d91e29aa2).
[GitHub] spark pull request: Branch 2.0
Github user zhengruifeng commented on the pull request: https://github.com/apache/spark/pull/13089#issuecomment-218940318 @ahnqirage please close it
[GitHub] spark pull request: [SPARK-15282][SQL] UDF funtion is not always d...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-218939725 There are several cases that assume UDFs are deterministic, so changing that would be a big change for users. I'll revert the change to ScalaUDF and update this PR so that the optimizer does not duplicate the UDF expression.
[GitHub] spark pull request: [SPARK-14906][ML] Move VectorUDT and MatrixUDT...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12870#issuecomment-218939778 @mengxr Currently I have moved only Vector/Matrix and their UDTs to `pyspark.ml`, and made both the `pyspark.ml` and `pyspark.mllib` code use the moved `ml.linalg` Vector/Matrix. Otherwise, `pyspark.mllib` is untouched. Do you mean that we want to keep the `pyspark.mllib` code using `pyspark.mllib.linalg` and have the `pyspark.ml` code use the new `pyspark.ml.linalg`?
[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12788#discussion_r63126914 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.ml + +// scalastyle:off println + +import org.apache.spark.{SparkConf, SparkContext} +// $example on$ +import org.apache.spark.ml.clustering.{GaussianMixture, GaussianMixtureModel} --- End diff -- `GaussianMixtureModel` is not used
[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12788#discussion_r63126939 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.ml + +// scalastyle:off println + +import org.apache.spark.{SparkConf, SparkContext} --- End diff -- `{SparkConf, SparkContext}` are not used
[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12788#discussion_r63126885 --- Diff: examples/src/main/python/ml/gaussian_mixture_example.py --- @@ -0,0 +1,55 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import print_function + +# $example on$ +from pyspark.ml.clustering import GaussianMixture, GaussianMixtureModel --- End diff -- `GaussianMixtureModel` is not used
[GitHub] spark pull request: [SPARK-14615][ML] Use the new ML Vector and Ma...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12627#issuecomment-218938821 #12870 moved PySpark Vector/Matrix to the `ml` package. It matches PySpark Vector/Matrix to the old Scala Vector/Matrix code. Since it doesn't touch the Scala ML/MLlib code (which still uses the old Vector/Matrix), it passes the tests.
[GitHub] spark pull request: [SPARK-14615][ML] Use the new ML Vector and Ma...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12627#issuecomment-218938505 @dbtsai It looks like you only let the ML code use the new Vector and Matrix. However, we can't have ML using the new Vector/Matrix while MLlib uses the old Vector/Matrix: the matching between PySpark and Scala will fail.
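One way to picture the mismatch described above: in PySpark, a Python UDT names a single JVM UDT class through its `scalaUDT()` classmethod, so one Python vector type cannot line up with both the new `ml.linalg` and the old `mllib.linalg` Scala types at the same time. The toy model below uses hypothetical names and plain strings instead of real PySpark classes; it only illustrates the one-to-one pairing, not PySpark's actual serialization path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPythonUDT:
    """Stand-in for a PySpark UDT that names exactly one JVM UDT class (cf. scalaUDT())."""
    python_udt: str
    scala_udt: str


def udt_matches(py_udt: ToyPythonUDT, jvm_column_udt: str) -> bool:
    """In this toy model, values pair up only if the JVM column uses the declared UDT."""
    return py_udt.scala_udt == jvm_column_udt


# Hypothetical mixed state: the Python vector type points at the new ml.linalg UDT,
# while Scala MLlib code still produces columns typed with the old mllib.linalg UDT.
NEW_PY_VECTOR = ToyPythonUDT("pyspark.ml.linalg.VectorUDT", "org.apache.spark.ml.linalg.VectorUDT")
SCALA_ML_COLUMN = "org.apache.spark.ml.linalg.VectorUDT"
SCALA_MLLIB_COLUMN = "org.apache.spark.mllib.linalg.VectorUDT"
```

Here `udt_matches(NEW_PY_VECTOR, SCALA_ML_COLUMN)` holds, but the same Python type fails against `SCALA_MLLIB_COLUMN`, which is the split the comment warns about.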
[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12788#discussion_r63125817 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.ml + +// scalastyle:off println + +import org.apache.spark.{SparkConf, SparkContext} +// $example on$ +import org.apache.spark.ml.clustering.{GaussianMixture, GaussianMixtureModel} +import org.apache.spark.sql.SparkSession +// $example off$ + +/** + * An example demonstrating Gaussian Mixture Model (GMM). 
 + * Run with + * {{{ + * bin/run-example ml.GaussianMixtureExample + * }}} + */ +object GaussianMixtureExample { + def main(args: Array[String]): Unit = { +// Creates a SparkSession +val spark = SparkSession.builder.appName(s"${this.getClass.getSimpleName}").getOrCreate() + +// $example on$ +// Load data +val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt") + +// Trains Gaussian Mixture Model +val gmm = new GaussianMixture() + .setK(2) + .setFeaturesCol("features") --- End diff -- Keep the args in line with the Scala one.
[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63126132

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaGaussianMixtureExample.java ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+// $example on$
+import org.apache.spark.ml.clustering.GaussianMixture;
+import org.apache.spark.ml.clustering.GaussianMixtureModel;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+// $example off$
+import org.apache.spark.sql.SparkSession;
+
+
+/**
+ * An example demonstrating a Gaussian Mixture Model.
+ * Run with
+ *
+ * bin/run-example ml.JavaGaussianMixtureExample
+ *
+ */
+public class JavaGaussianMixtureExample {
+
+  public static void main(String[] args) {
+
+    // Parses the arguments
+    SparkSession spark = SparkSession
+      .builder()
+      .appName("JavaGaussianMixtureExample")
+      .getOrCreate();
+
+    // $example on$
+    // Load data
+    Dataset<Row> dataset = spark.read().format("libsvm").load("data/mllib/sample_kmeans_data.txt");
+
+    // Trains a GaussianMixture model
+    GaussianMixture gmm = new GaussianMixture()
+      .setK(2);
+    GaussianMixtureModel model = gmm.fit(dataset);
+
+    // Output the parameters of the mixture model
+    for (int j = 0; j < model.getK(); j++) {
--- End diff --

Nit: `j` -> `i` to keep in line with the Scala example.
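The loop under review walks over the `model.getK()` fitted components, and each component's weight, mean and covariance together define the mixture density p(x) = sum_i w_i N(x | mu_i, Sigma_i). The stdlib Python sketch below illustrates that relationship in one dimension only; the helper names (`normal_pdf`, `mixture_pdf`) and the parameter values are hypothetical, and this is not Spark's implementation.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of the 1-D Gaussian N(mean, variance) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def mixture_pdf(x, weights, means, variances):
    """Mixture density p(x) = sum_i weights[i] * N(x | means[i], variances[i])."""
    return sum(weights[i] * normal_pdf(x, means[i], variances[i]) for i in range(len(weights)))


# Hypothetical k = 2 mixture; the loop index is `i`, as the review suggests.
weights, means, variances = [0.5, 0.5], [0.0, 9.0], [1.0, 1.0]
for i in range(len(weights)):
    print("Gaussian %d: weight=%.2f mean=%.2f variance=%.2f" % (i, weights[i], means[i], variances[i]))
print("p(4.5) = %.6f" % mixture_pdf(4.5, weights, means, variances))
```

In the multivariate case Spark reports a mean vector and covariance matrix per component, but the weighted-sum structure is the same.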
[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13084#issuecomment-218937650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58532/ Test PASSed.
[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13084#issuecomment-218937648 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13084#issuecomment-218937533 **[Test build #58532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58532/consoleFull)** for PR 13084 at commit [`3248fb5`](https://github.com/apache/spark/commit/3248fb5f10e1ae44328450587044c29eeef21d62).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63125992

--- Diff: examples/src/main/python/ml/gaussian_mixture_example.py ---
@@ -0,0 +1,55 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import print_function
+
+# $example on$
+from pyspark.ml.clustering import GaussianMixture, GaussianMixtureModel
+# $example off$
+from pyspark.sql import SparkSession
+
+"""
+A simple example demonstrating a Gaussian Mixture Model (GMM).
+Run with:
+ bin/spark-submit examples/src/main/python/ml/gaussian_mixture_example.py
+"""
+
+if __name__ == "__main__":
+    spark = SparkSession\
+        .builder\
+        .appName("PythonGuassianMixtureExample")\
+        .getOrCreate()
+
+    # $example on$
+    # load data
--- End diff --

Loads
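All three examples load `data/mllib/sample_kmeans_data.txt` through Spark's `libsvm` data source, which yields a `label` column and a `features` vector column. As a rough illustration of what each input line holds, the stdlib Python sketch below parses one line in the usual LIBSVM text layout (`label index:value ...`, with 1-based indices) into a label and a sparse `{0-based index: value}` map. The function name and the sample line are hypothetical; this is not Spark's reader.

```python
def parse_libsvm_line(line):
    """Split one LIBSVM-format line into (label, {0-based index: value}).

    Assumes the common layout "label index1:value1 index2:value2 ..." with
    1-based feature indices; omitted indices are implicit zeros.
    """
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index) - 1] = float(value)  # shift 1-based -> 0-based
    return label, features


# Hypothetical input line, not taken from the sample file.
label, features = parse_libsvm_line("1 1:0.5 3:2.0")
print(label, features)  # 1.0 {0: 0.5, 2: 2.0}
```

Storing only the listed indices mirrors why the format suits sparse data; a dense vector would be rebuilt from the map and a known feature count.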
[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63125902

--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+// scalastyle:off println
+
+import org.apache.spark.{SparkConf, SparkContext}
+// $example on$
+import org.apache.spark.ml.clustering.{GaussianMixture, GaussianMixtureModel}
+import org.apache.spark.sql.SparkSession
+// $example off$
+
+/**
+ * An example demonstrating Gaussian Mixture Model (GMM).
+ * Run with
+ * {{{
+ * bin/run-example ml.GaussianMixtureExample
+ * }}}
+ */
+object GaussianMixtureExample {
+  def main(args: Array[String]): Unit = {
+    // Creates a SparkSession
+    val spark = SparkSession.builder.appName(s"${this.getClass.getSimpleName}").getOrCreate()
+
+    // $example on$
+    // Load data
+    val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
+
+    // Trains Gaussian Mixture Model
+    val gmm = new GaussianMixture()
+      .setK(2)
+      .setFeaturesCol("features")
+      .setPredictionCol("prediction")
+      .setTol(0.0001)
+      .setMaxIter(10)
+      .setSeed(10)
+    val model = gmm.fit(dataset)
+
+    // output parameters of max-likelihood model
--- End diff --

`max-likelihood model` -> `mixture model`
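The Scala example passes `setTol(0.0001)`, `setMaxIter(10)` and `setSeed(10)` before calling `fit`. Spark fits Gaussian mixtures with expectation-maximization (EM); roughly, `maxIter` caps the number of EM passes and `tol` is the convergence threshold on the change in log-likelihood. The stdlib Python sketch below is a simplified 1-D EM loop under those assumptions: it starts from caller-supplied means instead of a seeded random initialization, has no underflow safeguards, and its function and parameter names are hypothetical rather than Spark's API.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of the 1-D Gaussian N(mean, variance) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def fit_gmm_1d(data, init_means, max_iter=10, tol=1e-4, min_variance=1e-6):
    """Fit a 1-D Gaussian mixture with EM (simplified sketch, not Spark's code).

    Stops after max_iter passes, or earlier once the log-likelihood changes by
    less than tol between passes (roughly the roles of setMaxIter / setTol).
    """
    k = len(init_means)
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [1.0] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each point,
        # plus the log-likelihood under the current parameters.
        resp, ll = [], 0.0
        for x in data:
            dens = [weights[i] * normal_pdf(x, means[i], variances[i]) for i in range(k)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for i in range(k):
            n_i = sum(r[i] for r in resp)
            weights[i] = n_i / len(data)
            if n_i == 0.0:
                continue  # no support: weight drops to 0, mean/variance kept
            means[i] = sum(r[i] * x for r, x in zip(resp, data)) / n_i
            var = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, data)) / n_i
            variances[i] = max(var, min_variance)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances


# Hypothetical toy data with two well-separated groups.
data = [0.0, 0.1, 0.2, 9.0, 9.1, 9.2]
weights, means, variances = fit_gmm_1d(data, init_means=[0.0, 9.0])
print(weights, means, variances)
```

The fitted weights, means and variances are the 1-D analogue of the "mixture model" parameters the example prints.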
[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63125785

--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+// scalastyle:off println
+
+import org.apache.spark.{SparkConf, SparkContext}
+// $example on$
+import org.apache.spark.ml.clustering.{GaussianMixture, GaussianMixtureModel}
+import org.apache.spark.sql.SparkSession
+// $example off$
+
+/**
+ * An example demonstrating Gaussian Mixture Model (GMM).
+ * Run with
+ * {{{
+ * bin/run-example ml.GaussianMixtureExample
+ * }}}
+ */
+object GaussianMixtureExample {
+  def main(args: Array[String]): Unit = {
+    // Creates a SparkSession
+    val spark = SparkSession.builder.appName(s"${this.getClass.getSimpleName}").getOrCreate()
+
+    // $example on$
+    // Load data
--- End diff --

Loads
[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63125752

--- Diff: examples/src/main/python/ml/gaussian_mixture_example.py ---
@@ -0,0 +1,55 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import print_function
+
+# $example on$
+from pyspark.ml.clustering import GaussianMixture, GaussianMixtureModel
+# $example off$
+from pyspark.sql import SparkSession
+
+"""
+A simple example demonstrating a Gaussian Mixture Model (GMM).
+Run with:
+ bin/spark-submit examples/src/main/python/ml/gaussian_mixture_example.py
+"""
+
+if __name__ == "__main__":
+    spark = SparkSession\
+        .builder\
+        .appName("PythonGuassianMixtureExample")\
+        .getOrCreate()
+
+    # $example on$
+    # load data
+    dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
+
+    gmm = GaussianMixture().setK(2).setSeed(10).setFeaturesCol("features")
+    model = gmm.fit(dataset)
+
+    print("Gaussians: ")
+    model.gaussiansDF.show()
+
+    transformed = model.transform(dataset).select("prediction")
--- End diff --

Keep the output in line with the Scala example.
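The Python example selects the `prediction` column produced by `model.transform(dataset)`. For a Gaussian mixture, that label is normally the index of the component with the highest posterior probability (Spark's `GaussianMixtureModel` also reports those posteriors in a `probability` column by default). The stdlib Python sketch below shows the rule in one dimension with hypothetical fitted parameters, computing posteriors in log space so that points far from every component do not underflow to a 0/0 division. The helper names are made up for illustration.

```python
import math


def log_normal_pdf(x, mean, variance):
    """Log-density of the 1-D Gaussian N(mean, variance) at x."""
    return -0.5 * math.log(2.0 * math.pi * variance) - (x - mean) ** 2 / (2.0 * variance)


def posterior(x, weights, means, variances):
    """P(component i | x) for each i, via a max-shifted log-sum-exp.

    Assumes every weight is strictly positive (math.log(0) would fail).
    """
    logs = [math.log(w) + log_normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    scaled = [math.exp(l - top) for l in logs]
    total = sum(scaled)
    return [s / total for s in scaled]


def predict(x, weights, means, variances):
    """Index of the most probable component, i.e. the 'prediction' label."""
    probs = posterior(x, weights, means, variances)
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical fitted parameters for k = 2.
weights, means, variances = [0.5, 0.5], [0.1, 9.1], [0.01, 0.01]
for x in [0.05, 9.2]:
    print(x, predict(x, weights, means, variances), posterior(x, weights, means, variances))
```

Subtracting the largest log term before exponentiating keeps the normalizing sum at least 1, which is why the division is always safe here.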