[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12871#issuecomment-217789379 **[Test build #58123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58123/consoleFull)** for PR 12871 at commit [`b4f4926`](https://github.com/apache/spark/commit/b4f49263e8a1ba1019daf828f1a3d18ae251ca54). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12871#issuecomment-217788838 Thanks!
[GitHub] spark pull request: [SPARK-15216] [SQL] Add a new Dataset API expl...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12992#issuecomment-217788397 @gatorsmile we already have such an API, see: https://github.com/apache/spark/blob/8dc3987d095ae01ad80c89b8f052f231e0807990/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala#L102
[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12871#issuecomment-217788305 Hive is not case-preserving, and neither is `HiveExternalCatalog`. When we save table `myTable` to Hive, a dir named `mytable` is created, instead of `myTable` as I expected.
[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12871#issuecomment-217787634 The PR looks good. Let's resolve the conflicts and get it in.
[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12871#issuecomment-217787572 btw, what's the reason for having https://github.com/apache/spark/pull/12871/commits/aefade3924b52ab05f26d9a8af4f63555e243b24? (Where do we turn the string to its lower case form?)
[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12871#discussion_r62454941
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala ---
@@ -73,6 +79,8 @@ class InMemoryCatalog extends ExternalCatalog {
     }
   }
+  private val fs = FileSystem.get(new Configuration)
--- End diff --
I think it is better to use `sparkContext.hadoopConfiguration` (we can access sparkContext in SharedState). We can use `new Configuration` as the default value of the hadoop conf to avoid changing those tests.
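The "default value" idea in the comment above can be sketched as follows. This is only an illustration under stated assumptions: `Conf` is a stand-in for `org.apache.hadoop.conf.Configuration` so that the snippet compiles without Hadoop on the classpath, and `CatalogSketch` is a hypothetical class, not the actual `InMemoryCatalog`.

```scala
// Stand-in for org.apache.hadoop.conf.Configuration (assumption: the real
// class is not used here so the sketch stays self-contained).
final class Conf(val settings: Map[String, String] = Map.empty)

// Hypothetical catalog: the configuration is injected by the caller and
// defaults to an empty one, so existing call sites need no change.
class CatalogSketch(hadoopConf: Conf = new Conf) {
  // "fs.defaultFS" is the Hadoop key for the default file system URI.
  def defaultFs: String = hadoopConf.settings.getOrElse("fs.defaultFS", "file:///")
}

object CatalogSketchDemo {
  // Tests keep constructing the catalog without arguments.
  val fromTests = new CatalogSketch()
  // A session would pass its own configuration instead.
  val fromSession = new CatalogSketch(new Conf(Map("fs.defaultFS" -> "hdfs://nn:8020")))
}
```

With a default argument the production path can hand over the session's configuration while test code stays untouched, which is the trade-off the comment is after.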
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12719#discussion_r62454855
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -617,6 +618,46 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 /**
+ * Propagate foldable expressions:
+ * Replace all attributes with aliases of the original foldable expressions except Union queries.
+ * Aliases and ordinal expressions are the main target to be transformed after propagation. Other
+ * optimizations will take advantage of the propagated foldable expressions.
+ *
+ * {{{
+ *   SELECT 1.0 x, 'abc' y, Now() z ORDER BY x, y, 3
+ *   ==> SELECT 1.0 x, 'abc' y, Now() z ORDER BY 1.0, 'abc', Now()
+ * }}}
+ */
+object FoldablePropagation extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = {
+    val isFoldableStatement = plan.find {
+      case _: Union => true
+      case _: Command => true
+      case _ => false
+    }.isEmpty
+
+    val foldableExprSet = ExpressionSet(plan.flatMap {
+      case Project(projectList, _) => projectList.collect {
+        case a: Alias if a.resolved && a.child.foldable => a
+      }
+      case _ => Nil
+    })
+
+    if (!isFoldableStatement || foldableExprSet.isEmpty) {
--- End diff --
So we won't do this optimization if there exists an un-foldable plan (`Union` or `Command`) anywhere in the plan tree? I think we can make it more fine-grained, e.g. we can still apply this optimization to the children of `Union`. A possible approach is: find one foldable project, collect its foldable expressions, then transformUp the plan tree and replace attributes until we reach `Union` or `Command`.
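The finer-grained approach suggested above can be illustrated on a toy plan tree. This is a minimal sketch, not Catalyst code: `Lit`, `Attr`, `Leaf`, `Project`, `Sort`, `Union` and `propagate` below are simplified, hypothetical stand-ins, and only literals count as foldable.

```scala
object FoldableSketch {
  sealed trait Expr
  case class Lit(value: Any) extends Expr      // foldable in this sketch
  case class Attr(name: String) extends Expr   // reference to a named column

  sealed trait Plan
  case object Leaf extends Plan
  case class Project(aliases: Seq[(String, Expr)], child: Plan) extends Plan
  case class Sort(order: Seq[Expr], child: Plan) extends Plan
  case class Union(children: Seq[Plan]) extends Plan

  // Replace an attribute by its foldable definition when one is known.
  def substitute(e: Expr, known: Map[String, Expr]): Expr = e match {
    case Attr(n) => known.getOrElse(n, e)
    case other   => other
  }

  // Bottom-up rewrite. Returns the new plan together with the foldable
  // aliases visible at its output; a Union resets that map, so nothing is
  // propagated across it while its children are still optimized.
  def propagate(plan: Plan): (Plan, Map[String, Expr]) = plan match {
    case Leaf => (Leaf, Map.empty)
    case Project(aliases, child) =>
      val (newChild, known) = propagate(child)
      val rewritten = aliases.map { case (n, e) => n -> substitute(e, known) }
      val foldable: Map[String, Expr] =
        rewritten.collect { case (n, l: Lit) => n -> l }.toMap
      (Project(rewritten, newChild), foldable)
    case Sort(order, child) =>
      val (newChild, known) = propagate(child)
      (Sort(order.map(substitute(_, known)), newChild), known)
    case Union(children) =>
      (Union(children.map(c => propagate(c)._1)), Map.empty)
  }
}
```

In this toy model, `Sort(Seq(Attr("x")), Project(Seq("x" -> Lit(1.0)), Leaf))` has its sort key replaced by `Lit(1.0)`, and the same rewrite still happens inside each child of a `Union`, while a `Sort` placed above the `Union` is left alone.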
[GitHub] spark pull request: [SPARK-15199] [SQL] Disallow Dropping Build-in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12975#discussion_r62454620
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ---
@@ -157,6 +157,9 @@ case class DropFunction(
         throw new AnalysisException(s"Specifying a database in DROP TEMPORARY FUNCTION " +
           s"is not allowed: '${databaseName.get}'")
       }
+      if (FunctionRegistry.builtin.functionExists(functionName)) {
--- End diff --
No problem. Just let me know if there is anything I can help with. : )
[GitHub] spark pull request: [SPARK-15173][SQL] DataFrameWriter.insertInto ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12949#discussion_r62454544
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -239,8 +239,13 @@ case class DataSource(
     }
   }
-  /** Create a resolved [[BaseRelation]] that can be used to read data from this [[DataSource]] */
-  def resolveRelation(): BaseRelation = {
+  /**
+   * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
+   * [[DataSource]]
+   *
+   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
+   */
+  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
--- End diff --
Will anything break if we do not have this flag?
[GitHub] spark pull request: [SPARK-15199] [SQL] Disallow Dropping Build-in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12975#discussion_r62454308
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ---
@@ -157,6 +157,9 @@ case class DropFunction(
         throw new AnalysisException(s"Specifying a database in DROP TEMPORARY FUNCTION " +
           s"is not allowed: '${databaseName.get}'")
       }
+      if (FunctionRegistry.builtin.functionExists(functionName)) {
--- End diff --
Thanks for the investigation! I guess we won't support this feature in 2.0, but need @yhuai to confirm.
[GitHub] spark pull request: [SPARK-14495][SQL][1.6] fix resolution failure...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/12974#discussion_r62453423
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala ---
@@ -123,15 +119,7 @@ case class DistinctAggregationRewriter(conf: CatalystConf) extends Rule[LogicalP
       .filter(_.isDistinct)
       .groupBy(_.aggregateFunction.children.toSet)
-    val shouldRewrite = if (conf.specializeSingleDistinctAggPlanning) {
--- End diff --
This flag exists for benchmarking the performance of single distinct aggregation with `DistinctAggregationRewriter`. The default value is false, which means `DistinctAggregationRewriter` will not be used for a single distinct case. I see 2.0 has removed this flag, so I guess the decision has been made. If it is still needed for 1.6, I can add it back, which will involve more changes in the Optimizer to take the CatalystConf. Please let me know. Thanks!
[GitHub] spark pull request: [MINOR][PySpark] update _shared_params_code_ge...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12996#issuecomment-217782285 Merged build finished. Test PASSed.
[GitHub] spark pull request: [MINOR][PySpark] update _shared_params_code_ge...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12996#issuecomment-217782287 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58122/ Test PASSed.
[GitHub] spark pull request: [MINOR][PySpark] update _shared_params_code_ge...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12996#issuecomment-217782241 **[Test build #58122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58122/consoleFull)** for PR 12996 at commit [`52591d8`](https://github.com/apache/spark/commit/52591d88eeb7a3e928641ef26c52aa3edf58cce1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [MINOR][PySpark] update _shared_params_code_ge...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12996#issuecomment-217781272 **[Test build #58122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58122/consoleFull)** for PR 12996 at commit [`52591d8`](https://github.com/apache/spark/commit/52591d88eeb7a3e928641ef26c52aa3edf58cce1).
[GitHub] spark pull request: [MINOR][PySpark] update _shared_params_code_ge...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/12996

[MINOR][PySpark] update _shared_params_code_gen.py

## What changes were proposed in this pull request?
1, add arg-checkings for `tol` and `stepSize` to keep in line with `SharedParamsCodeGen.scala`
2, fix one typo

## How was this patch tested?
local build

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhengruifeng/spark py_args_checking
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12996.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12996

commit 52591d88eeb7a3e928641ef26c52aa3edf58cce1
Author: Zheng RuiFeng
Date: 2016-05-09T05:31:54Z
add two arg-checkings and fix one typo
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-217780844 **[Test build #58120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58120/consoleFull)** for PR 12719 at commit [`f3132bc`](https://github.com/apache/spark/commit/f3132bc54d15fb61f06b476a93c13ee74f44753b).
[GitHub] spark pull request: [SPARK-13902][SCHEDULER] Make DAGScheduler.get...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12655#issuecomment-217780852 **[Test build #58121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58121/consoleFull)** for PR 12655 at commit [`55d6b6d`](https://github.com/apache/spark/commit/55d6b6db26aba0054c0da87d66404a68385ad3ff).
[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/12968#issuecomment-217780806 Made a pass.
[GitHub] spark pull request: [SPARK-13902][SCHEDULER] Make DAGScheduler.get...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12655#issuecomment-217780501 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58119/ Test FAILed.
[GitHub] spark pull request: [SPARK-13902][SCHEDULER] Make DAGScheduler.get...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12655#issuecomment-217780498 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/12968#discussion_r62452057
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala ---
@@ -98,6 +98,7 @@ class StopWordsRemoverSuite
       .setInputCol("raw")
       .setOutputCol("filtered")
       .setStopWords(stopWords)
+      .setLocale("tr")
--- End diff --
Maybe add something more specific to test that the locale setter is working.
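One way to make such a test specific is to pick input whose lower-cased form differs between locales. The sketch below uses only `java.util.Locale` and `String.toLowerCase`; `LocaleSketch` and `lower` are hypothetical helper names, not part of `StopWordsRemover`.

```scala
import java.util.Locale

object LocaleSketch {
  // Lower-case a word under the locale named by a language tag such as "tr".
  def lower(word: String, localeTag: String): String =
    word.toLowerCase(Locale.forLanguageTag(localeTag))
}
```

Under the Turkish locale an upper-case `I` lower-cases to the dotless `ı` (U+0131), so `LocaleSketch.lower("TITLE", "tr")` gives `"tıtle"` while `LocaleSketch.lower("TITLE", "en")` gives `"title"`. A stop word containing `I` would therefore match or not depending on whether the setter took effect.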
[GitHub] spark pull request: [SPARK-13902][SCHEDULER] Make DAGScheduler.get...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12655#issuecomment-217780496 **[Test build #58119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58119/consoleFull)** for PR 12655 at commit [`b4e2eb1`](https://github.com/apache/spark/commit/b4e2eb1557dedc34b9a57b371e11ade2693ff38a). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13902][SCHEDULER] Make DAGScheduler.get...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12655#issuecomment-217780312 **[Test build #58119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58119/consoleFull)** for PR 12655 at commit [`b4e2eb1`](https://github.com/apache/spark/commit/b4e2eb1557dedc34b9a57b371e11ade2693ff38a).
[GitHub] spark pull request: [SPARK-15199] [SQL] Disallow Dropping Build-in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12975#discussion_r62451961
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ---
@@ -157,6 +157,9 @@ case class DropFunction(
         throw new AnalysisException(s"Specifying a database in DROP TEMPORARY FUNCTION " +
           s"is not allowed: '${databaseName.get}'")
       }
+      if (FunctionRegistry.builtin.functionExists(functionName)) {
--- End diff --
Overwriting a built-in function is not supported by the current implementation of `FunctionRegistry`. Should we support it in this release?
[GitHub] spark pull request: [SPARK-15199] [SQL] Disallow Dropping Build-in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12975#discussion_r62451866
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ---
@@ -157,6 +157,9 @@ case class DropFunction(
         throw new AnalysisException(s"Specifying a database in DROP TEMPORARY FUNCTION " +
           s"is not allowed: '${databaseName.get}'")
       }
+      if (FunctionRegistry.builtin.functionExists(functionName)) {
--- End diff --
Yeah, you can overwrite it by creating a temporary function, but you are unable to drop it. For example,
```
hive> drop function lower;
FAILED: SemanticException [Error 10301]: Cannot drop native function lower
hive> CREATE TEMPORARY FUNCTION lower AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper';
OK
Time taken: 0.005 seconds
hive> select lower('a');
OK
A
Time taken: 0.068 seconds, Fetched: 1 row(s)
hive> drop temporary function lower;
OK
Time taken: 0.01 seconds
hive> select lower('a');
OK
a
Time taken: 0.057 seconds, Fetched: 1 row(s)
hive> drop function lower;
FAILED: SemanticException [Error 10301]: Cannot drop native function lower
```
In this example, I overwrite the built-in function `lower` by creating a temporary function of the same name backed by the implementation of `upper`. As a result, when I call `lower`, the logic that actually runs is `upper`. That means we can overwrite built-in functions. After dropping the temporary function, the built-in function becomes active again.
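The Hive behaviour shown in the session above can be modelled as a temporary layer that shadows an immutable built-in layer. The sketch below is a toy model under that assumption; `RegistrySketch`, `Registry` and their methods are hypothetical names and do not mirror Spark's `FunctionRegistry` or Hive's internals.

```scala
object RegistrySketch {
  type Fn = String => String

  final class Registry(builtin: Map[String, Fn]) {
    // Temporary functions live in their own layer and are consulted first.
    private var temporary = Map.empty[String, Fn]

    def createTemporary(name: String, fn: Fn): Unit =
      temporary += (name -> fn)

    // Only the temporary layer is touched; returns whether an entry existed.
    def dropTemporary(name: String): Boolean = {
      val existed = temporary.contains(name)
      temporary -= name
      existed
    }

    // Dropping a non-temporary function is refused for built-in names.
    // Persistent user functions are not modelled, so the other branch is a no-op.
    def drop(name: String): Either[String, Unit] =
      if (builtin.contains(name)) Left(s"Cannot drop native function $name")
      else Right(())

    // A temporary function shadows a built-in one of the same name.
    def lookup(name: String): Option[Fn] =
      temporary.get(name).orElse(builtin.get(name))
  }
}
```

Because the built-in map is never mutated, dropping the temporary entry is enough for the original `lower` to become visible again, which matches the last two queries in the session.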
[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/12968#discussion_r62451491
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
+    "locale for doing a case sensitive comparison")
+
+  /** @group setParam */
+  def setLocale(value: String): this.type = set(locale, value)
+
+  /** @group getParam */
+  def getLocale: String = $(locale)
+
+  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
+    caseSensitive -> false, locale -> "en")
--- End diff --
Compared with "en", it is perhaps better to use `Locale.getDefault`.
[GitHub] spark pull request: [SPARK-15209] Fix display of job descriptions ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12995#issuecomment-217779134 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58118/ Test PASSed.
[GitHub] spark pull request: [SPARK-15209] Fix display of job descriptions ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12995#issuecomment-217779133 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/12968#discussion_r62451393
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
+    "locale for doing a case sensitive comparison")
+
+  /** @group setParam */
+  def setLocale(value: String): this.type = set(locale, value)
--- End diff --
Add a parameter check here or in transformSchema, to help detect errors before the pipeline executes.
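A parameter check of the kind requested above might look like the following. It is a sketch under assumptions: the rule "accept only language tags of locales installed in the JVM" is one possible policy, not what the PR implements, and `LocaleCheckSketch`, `isKnownLocale` and `requireKnownLocale` are hypothetical names.

```scala
import java.util.Locale

object LocaleCheckSketch {
  // Language tags of the locales the running JVM knows about; the root
  // locale (tag "und") is excluded because it names no concrete language.
  private val known: Set[String] =
    Locale.getAvailableLocales.map(_.toLanguageTag).toSet - "und"

  def isKnownLocale(tag: String): Boolean = known.contains(tag)

  // Fails fast with IllegalArgumentException, before any pipeline stage runs.
  def requireKnownLocale(tag: String): String = {
    require(isKnownLocale(tag), s"Unsupported locale: $tag")
    tag
  }
}
```

A setter could call `requireKnownLocale(value)` before storing the value, so that a misspelled tag is reported when the stage is configured rather than when the pipeline executes.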
[GitHub] spark pull request: [SPARK-15209] Fix display of job descriptions ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12995#issuecomment-217779047 **[Test build #58118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58118/consoleFull)** for PR 12995 at commit [`ea9fd47`](https://github.com/apache/spark/commit/ea9fd475d29989b977bca42fd87bb7545e3ffbb6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/12968#discussion_r62451347

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
--- End diff --

Shall we list what're the available options, or provide some reference here?
[GitHub] spark pull request: [SPARK-15160][SQL] support data source table i...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12935#issuecomment-21835 I'll update it after https://github.com/apache/spark/pull/12949
[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/12993#issuecomment-21713

Agree. We need to be careful when deciding the design. This PR just restores our previous behavior in `HiveContext`. Case sensitivity is complicated and platform/vendor-specific. The summary below is based on my search and might not be 100% correct.
- For unquoted identifiers, SQL2003 and DB2 are not case sensitive. Oracle and SQL Server are configurable, but are not case sensitive by default.
- For quoted/delimited identifiers, most traditional RDBMSs are case sensitive. Hive is special. Starting from Hive 0.13, Hive supports quoted identifiers in column names: https://issues.apache.org/jira/browse/HIVE-6013 However, this does not apply to table/database names in Hive.
[GitHub] spark pull request: [SPARK-15202][SPARKR] add dapplyCollect() meth...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/12989#issuecomment-21755 cc @shivaram, @felixcheung , @NarineK
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/12719#discussion_r62450948

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -90,6 +90,8 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, conf: CatalystConf)
       CombineUnions,
       // Constant folding and strength reduction
       NullPropagation,
+      FoldablePropagation,
+      CleanupAliases,
--- End diff --

Oh, I see. I will update like that.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12719#discussion_r62450837

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -90,6 +90,8 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, conf: CatalystConf)
       CombineUnions,
       // Constant folding and strength reduction
       NullPropagation,
+      FoldablePropagation,
+      CleanupAliases,
--- End diff --

I think a simpler way is just calling `CleanupAliases.execute(result)` at end of `FoldablePropagation`, instead of putting it into optimizer.
[GitHub] spark pull request: [SPARK-14495][SQL][1.6] fix resolution failure...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12974#discussion_r62450742

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala ---
@@ -123,15 +119,7 @@ case class DistinctAggregationRewriter(conf: CatalystConf) extends Rule[LogicalP
       .filter(_.isDistinct)
       .groupBy(_.aggregateFunction.children.toSet)
-    val shouldRewrite = if (conf.specializeSingleDistinctAggPlanning) {
--- End diff --

Is this flag still useful in 1.6? cc @davies @yhuai
[GitHub] spark pull request: [SPARK-15199] [SQL] Disallow Dropping Build-in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12975#discussion_r62450713

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ---
@@ -157,6 +157,9 @@ case class DropFunction(
         throw new AnalysisException(s"Specifying a database in DROP TEMPORARY FUNCTION " +
           s"is not allowed: '${databaseName.get}'")
       }
+      if (FunctionRegistry.builtin.functionExists(functionName)) {
--- End diff --

Does Hive allow users to overwrite a built-in function?
[GitHub] spark pull request: [SPARK-15184] [SQL] Fix Silent Removal of An E...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12959
[GitHub] spark pull request: [SPARK-15184] [SQL] Fix Silent Removal of An E...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12959#issuecomment-217776816 thanks, merging to master and 2.0!
[GitHub] spark pull request: [SPARK-14098][SQL] Generate Java code that get...
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/11956#issuecomment-217775281 @davies As you suggested, I moved most of the implementation from ```WholeStageCodegenExec``` to ```InMemoryTableScanExec```. Now only about 30 lines are changed in ```WholeStageCodegenExec.scala```. Since it is hard to pass an instance of ```InMemoryTableScanExec``` to ```InputAdapter```, I introduced a new ```object InMemoryTableScanExec```. Could you please review this PR?
[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12993#issuecomment-217773874

I think we need to discuss it more:
1. Should we allow the case sensitivity to be configurable? It is sometimes out of our control, as with the Hive catalog, which is always case insensitive.
2. Besides case sensitivity, should we also include the concept of case-preserving for the external catalog?
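The two concepts raised in the comment above (case sensitivity of lookups and case preservation of stored names) can be illustrated with a toy sketch. `ToyCatalog` is a hypothetical class written for this note and is not Spark's external catalog; it only assumes that names are normalized by lowercasing when lookups are case insensitive.

```scala
import java.util.Locale
import scala.collection.mutable

// Hypothetical toy catalog, not Spark's ExternalCatalog.
final class ToyCatalog(caseSensitive: Boolean) {
  // Lookup key -> table name exactly as the user wrote it (case-preserving).
  private val tables = mutable.Map.empty[String, String]

  // Case-insensitive catalogs compare names after lowercasing them.
  private def key(name: String): String =
    if (caseSensitive) name else name.toLowerCase(Locale.ROOT)

  def createTable(name: String): Unit = {
    require(!tables.contains(key(name)), s"Table already exists: $name")
    tables(key(name)) = name
  }

  // Returns the stored, case-preserved name if the table resolves.
  def lookup(name: String): Option[String] = tables.get(key(name))
}
```

With `caseSensitive = false`, a table created as `MyTable` resolves from `mytable` but is still reported as `MyTable`. A catalog that is not case-preserving would store the lowercased form instead.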
[GitHub] spark pull request: [SPARK-15199] [SQL] Disallow Dropping Build-in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12975#discussion_r62449958

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ---
@@ -157,6 +157,9 @@ case class DropFunction(
         throw new AnalysisException(s"Specifying a database in DROP TEMPORARY FUNCTION " +
           s"is not allowed: '${databaseName.get}'")
       }
+      if (FunctionRegistry.builtin.functionExists(functionName)) {
--- End diff --

I also think what you said is valid. If so, I think we also need to provide users a way to recover the built-in function, if they dropped it for any purpose. This PR is just to make our behavior consistent with Hive and the mainstream RDBMS. Normally, we do not allow users to drop the built-in functions. I am fine if we allow users to drop the built-in functions. Thanks!
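One possible way to reconcile "users may overwrite a built-in function" with "built-in functions can be recovered" is a two-layer registry, sketched below. This is an illustration only and not what the PR implements; `ToyFunctionRegistry`, `registerTemp`, and `dropTemp` are hypothetical names, and functions are simplified to `Int => Int`.

```scala
import scala.collection.mutable

// Hypothetical two-layer registry; names and behavior are illustrative only.
final class ToyFunctionRegistry(builtin: Map[String, Int => Int]) {
  private val temporary = mutable.Map.empty[String, Int => Int]

  def registerTemp(name: String, f: Int => Int): Unit = {
    temporary(name) = f
  }

  // Temporary functions shadow built-ins that have the same name.
  def lookup(name: String): Option[Int => Int] =
    temporary.get(name).orElse(builtin.get(name))

  // Drops only a temporary function; a bare built-in cannot be dropped.
  def dropTemp(name: String): Unit = {
    if (temporary.remove(name).isEmpty) {
      val reason = if (builtin.contains(name)) "is built-in" else "does not exist"
      throw new IllegalArgumentException(s"Cannot drop function '$name': it $reason")
    }
  }
}
```

In this design, dropping an overriding function makes the built-in visible again, so nothing needs to be "recovered"; dropping a name that only exists as a built-in is rejected.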
[GitHub] spark pull request: [SPARK-15185] [SQL] InMemoryCatalog: Silent Re...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12960
[GitHub] spark pull request: [SPARK-15185] [SQL] InMemoryCatalog: Silent Re...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12960#issuecomment-217774838 thanks, merging to master and 2.0!
[GitHub] spark pull request: [SPARK-15199] [SQL] Disallow Dropping Build-in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12975#discussion_r62449710

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ---
@@ -157,6 +157,9 @@ case class DropFunction(
         throw new AnalysisException(s"Specifying a database in DROP TEMPORARY FUNCTION " +
           s"is not allowed: '${databaseName.get}'")
       }
+      if (FunctionRegistry.builtin.functionExists(functionName)) {
--- End diff --

What if users overwrite a built-in function and then drop it?
[GitHub] spark pull request: [SPARK-15080][CORE] Break copyAndReset into co...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12936#discussion_r62449624

--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
@@ -291,11 +291,20 @@ private[spark] object TaskMetrics extends Logging {
 private[spark] class BlockStatusesAccumulator
   extends AccumulatorV2[(BlockId, BlockStatus), Seq[(BlockId, BlockStatus)]] {
-  private[this] var _seq = ArrayBuffer.empty[(BlockId, BlockStatus)]
+  private var _seq = ArrayBuffer.empty[(BlockId, BlockStatus)]
   override def isZero(): Boolean = _seq.isEmpty
-  override def copyAndReset(): BlockStatusesAccumulator = new BlockStatusesAccumulator
+  override def copy(): BlockStatusesAccumulator = {
--- End diff --

The `copyAndReset` version is much cheaper than calling `copy` and then `reset`. How about we just add a new `reset` method? cc @rxin
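The cost difference mentioned in the comment above can be seen in a small stand-in class. `ToySeqAccumulator` is hypothetical and deliberately does not extend Spark's `AccumulatorV2`; it only assumes a buffer-backed accumulator similar in shape to the one in the diff.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a buffer-backed accumulator; not Spark's AccumulatorV2.
final class ToySeqAccumulator {
  // Plain `private` (not `private[this]`) so copy() can reach the buffer
  // of another instance of the same class.
  private val buf = ArrayBuffer.empty[Int]

  def add(v: Int): Unit = { buf += v }
  def value: Seq[Int] = buf.toList
  def isZero: Boolean = buf.isEmpty

  // Copies every element: cost grows with the current size.
  def copy(): ToySeqAccumulator = {
    val acc = new ToySeqAccumulator
    acc.buf ++= buf
    acc
  }

  // Clears this instance in place.
  def reset(): Unit = buf.clear()

  // Fresh empty instance: no elements are copied at all.
  def copyAndReset(): ToySeqAccumulator = new ToySeqAccumulator
}
```

Both `copy()` followed by `reset()` and `copyAndReset()` yield an empty accumulator and leave the original untouched, but the first route copies all elements only to discard them, which is the overhead the comment points at.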
[GitHub] spark pull request: [SPARK-14098][SQL] Generate Java code that get...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11956#issuecomment-217773268 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58117/ Test PASSed.
[GitHub] spark pull request: [SPARK-14098][SQL] Generate Java code that get...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11956#issuecomment-217773265 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14098][SQL] Generate Java code that get...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11956#issuecomment-217773178 **[Test build #58117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58117/consoleFull)** for PR 11956 at commit [`0bc23f8`](https://github.com/apache/spark/commit/0bc23f8934366b7a3eaa899d38fbbd571f5b051a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15209] Fix display of job descriptions ...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/12995

[SPARK-15209] Fix display of job descriptions with single quotes in web UI timeline

## What changes were proposed in this pull request?

This patch fixes an escaping bug in the Web UI's event timeline that caused Javascript errors when displaying timeline entries whose descriptions include single quotes.

The original bug can be reproduced by running

```scala
sc.setJobDescription("double quote: \" ")
sc.parallelize(1 to 10).count()
sc.setJobDescription("single quote: ' ")
sc.parallelize(1 to 10).count()
```

and then browsing to the driver UI. Previously, this resulted in an "Uncaught SyntaxError" because the single quote from the description was not escaped and ended up closing a Javascript string literal too early.

The fix implemented here is to change the relevant Javascript to define its string literals using double-quotes. Our escaping logic already properly escapes double quotes in the description, so this is safe to do.

## How was this patch tested?

Tested manually in `spark-shell` using the above example.

/cc @sarutak for review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-15209

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12995.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12995

commit ea9fd475d29989b977bca42fd87bb7545e3ffbb6
Author: Josh Rosen
Date: 2016-05-09T03:28:04Z

    Fix SPARK-15209
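The failure mode described in the pull request above can be reproduced outside the web UI with a small sketch. `TimelineQuoting` and its `escapeHtml` helper are hypothetical stand-ins for Spark's actual escaping logic; the only assumption taken from the description is that double quotes are escaped while single quotes are not.

```scala
// Hypothetical stand-in for the UI's escaping: handles double quotes
// (and &, <, >) but leaves single quotes untouched.
object TimelineQuoting {
  def escapeHtml(s: String): String =
    s.replace("&", "&amp;")
      .replace("<", "&lt;")
      .replace(">", "&gt;")
      .replace("\"", "&quot;")

  // Old style: description embedded in a single-quoted JavaScript literal.
  def singleQuoted(desc: String): String = "'" + escapeHtml(desc) + "'"

  // New style: description embedded in a double-quoted JavaScript literal.
  def doubleQuoted(desc: String): String = "\"" + escapeHtml(desc) + "\""
}
```

For the description `single quote: ' `, `singleQuoted` yields a literal containing three single quotes, so the JavaScript string ends early. `doubleQuoted` always yields exactly two double-quote characters, because any double quote inside the description has already been turned into `&quot;`.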
[GitHub] spark pull request: [SPARK-15209] Fix display of job descriptions ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12995#issuecomment-217769812 **[Test build #58118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58118/consoleFull)** for PR 12995 at commit [`ea9fd47`](https://github.com/apache/spark/commit/ea9fd475d29989b977bca42fd87bb7545e3ffbb6).
[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/12993#issuecomment-217770263 cc @cloud-fan @rxin @yhuai @andrewor14
[GitHub] spark pull request: [SPARK-15209] Fix display of job descriptions ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12995#issuecomment-217769825 /cc @andrewor14 as well.
[GitHub] spark pull request: [SPARK-15187] [SQL] Disallow Dropping Default ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/12962#issuecomment-217770405 @cloud-fan https://github.com/apache/spark/pull/12993 resolves the issue you mentioned above. I will also change this PR to resolve the database name issues by calling `formatDatabaseName`. Thanks!
[GitHub] spark pull request: [SPARK-13232][YARN] Fix executor node label
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/11129#issuecomment-217768853

> now, maximally devious would be to catch the exception and downgrade

Maybe we could do this on the Spark side; it is a little complicated but doable. Yes, it is hard to test label-related things on the Spark side, but at least we could manually verify it locally.
[GitHub] spark pull request: [SPARK-15180][SQL] Support subexpression elimi...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12956#issuecomment-217768690 cc @davies This is ready for review. Please take a look at this. Thanks.
[GitHub] spark pull request: [SPARK-15211][SQL] Select features column from...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12986#issuecomment-217768128 ping @liancheng @yhuai Please take a look at this. Thanks.
[GitHub] spark pull request: [SPARK-13064] Make sure attemptId not none for...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/12075#issuecomment-217768123

@srowen, currently we assume the attempt id is `None` when a Spark application is running in YARN client mode. This assumption is used not only in the REST API, but also in the history server and the YARN extension services.
1. Changing only this place will break consistency with the other parts, as I mentioned before.
2. Even if we address all the parts related to the attempt id, we may still break backward compatibility, especially for the event log file name.

So IMHO we should not change the behavior of the attempt id unless we have a sufficient reason.
[GitHub] spark pull request: [SPARK-14098][SQL] Generate Java code that get...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11956#issuecomment-217766571 **[Test build #58117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58117/consoleFull)** for PR 11956 at commit [`0bc23f8`](https://github.com/apache/spark/commit/0bc23f8934366b7a3eaa899d38fbbd571f5b051a).
[GitHub] spark pull request: [SPARK-15125][SQL] Changing CSV data source ma...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12904#issuecomment-217765823 +1 for @sureshthalamati. #12921 handles the inconsistent behaviour, which is why I think we should hold off on this until that PR is merged.
[GitHub] spark pull request: [SPARK-15125][SQL] Changing CSV data source ma...
Github user sureshthalamati commented on the pull request: https://github.com/apache/spark/pull/12904#issuecomment-217765505

I am not sure what the history was behind returning an empty String for a null value. In my opinion it should be null by default. The current behavior is also inconsistent: for numerics it will return null, and for strings it will return an empty string by default.

Example: see the year (int) and comment (String) columns in the following data.

    year,make,model,comment,price
    2017,Tesla,Mode 3,looks nice.,35000.99
    ,Chevy,Bolt,,29000.00
    2015,Porsche,"",,

    scala> val df= sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load("/tmp/test1.csv")
    df: org.apache.spark.sql.DataFrame = [year: int, make: string ... 3 more fields]

    scala> df.show
    +----+-------+------+-----------+--------+
    |year|   make| model|    comment|   price|
    +----+-------+------+-----------+--------+
    |2017|  Tesla|Mode 3|looks nice.|35000.99|
    |null|  Chevy|  Bolt|           | 29000.0|
    |2015|Porsche|  null|           |    null|
    +----+-------+------+-----------+--------+

I can update this PR to change the nullValue default if needed.
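The inconsistency described in the comment above can be modeled in a few lines. This is a simplified model of the description only, not Spark's CSV reader; `ToyCsvCast`, `castInt`, and `castString` are hypothetical names, and the `nullValue` parameter merely mirrors the option name mentioned in the comment.

```scala
// Hypothetical model of the field conversion being discussed; not Spark code.
object ToyCsvCast {
  // Numeric column: an empty field becomes null (modeled as None).
  def castInt(field: String): Option[Int] =
    if (field.isEmpty) None else Some(field.toInt)

  // String column: null only when the field equals the configured nullValue;
  // with no nullValue configured, an empty field stays an empty string.
  def castString(field: String, nullValue: Option[String]): Option[String] =
    if (nullValue.contains(field)) None else Some(field)
}
```

Under this model, an empty `year` field is read as null while an empty `comment` field is read as `""`, unless `nullValue` is set to the empty string, in which case both columns agree.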
[GitHub] spark pull request: [SPARK-14963][Yarn] Using recoveryPath if NM r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12994#issuecomment-217764236 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14963][Yarn] Using recoveryPath if NM r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12994#issuecomment-217764237 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58116/ Test PASSed.
[GitHub] spark pull request: [SPARK-14963][Yarn] Using recoveryPath if NM r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12994#issuecomment-217764184 **[Test build #58116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58116/consoleFull)** for PR 12994 at commit [`08557bf`](https://github.com/apache/spark/commit/08557bf4e8c47d8114af8188fc90822cc622942b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14963][Yarn] Using recoveryPath if NM r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12994#issuecomment-217763069 **[Test build #58116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58116/consoleFull)** for PR 12994 at commit [`08557bf`](https://github.com/apache/spark/commit/08557bf4e8c47d8114af8188fc90822cc622942b).
[GitHub] spark pull request: [SPARK-14963][Yarn] Using recoveryPath if NM r...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/12994

[SPARK-14963][Yarn] Using recoveryPath if NM recovery is enabled

## What changes were proposed in this pull request?

From Hadoop 2.5+, the YARN NodeManager (NM) supports NM recovery, which uses a recovery path for auxiliary services such as spark_shuffle and mapreduce_shuffle. This change uses that path instead of the NM local dir if NM recovery is enabled.

## How was this patch tested?

Unit test + local test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-14963

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12994.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12994

commit 08557bf4e8c47d8114af8188fc90822cc622942b
Author: jerryshao
Date: 2016-05-09T02:10:29Z

    Using recoveryPath if NM recovery is enabled
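The selection rule in this description (prefer the recovery path when NM recovery is enabled, otherwise fall back to an NM local dir) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the shuffle service's actual code: `choose_state_file`, its parameters, and the `state.db` file name are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional, Sequence


def choose_state_file(
    file_name: str,
    local_dirs: Sequence[str],
    recovery_enabled: bool,
    recovery_path: Optional[str],
) -> PurePosixPath:
    """Pick where an auxiliary service would keep its recovery state."""
    # Prefer the NM recovery path when recovery is enabled and a path is set.
    if recovery_enabled and recovery_path:
        return PurePosixPath(recovery_path) / file_name
    # Otherwise fall back to the first configured NM local dir.
    if not local_dirs:
        raise ValueError("no NM local dir configured")
    return PurePosixPath(local_dirs[0]) / file_name


with_recovery = choose_state_file("state.db", ["/data/nm-local"], True, "/var/nm-recovery")
without_recovery = choose_state_file("state.db", ["/data/nm-local"], False, "/var/nm-recovery")
```

Falling back to the local dir keeps the sketch usable on clusters where recovery is not configured.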
[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12993#issuecomment-217762674 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12993#issuecomment-217762675 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58114/ Test PASSed.
[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12993#issuecomment-217762595 **[Test build #58114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58114/consoleFull)** for PR 12993 at commit [`d7d96c3`](https://github.com/apache/spark/commit/d7d96c34fde79d7078b27733f553deda6bb39fd4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15125][SQL] Changing CSV data source ma...
Github user sureshthalamati commented on a diff in the pull request: https://github.com/apache/spark/pull/12904#discussion_r62444274

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
@@ -555,4 +558,37 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
     assert(numbers.count() == 8)
   }
+
+  test("load data with empty quoted string fields.") {
+    val results = sqlContext
+      .read
+      .format("csv")
+      .options(Map(
+        "header" -> "true",
+        "nullValue" -> null,
--- End diff --

If nullValue is not set, it returns an empty string for null values by default. I set it explicitly to make sure my fix works. Before my fix, it was returning null for the empty quoted string and an empty string for null values by default.
[GitHub] spark pull request: [SPARK-14098][SQL] Generate Java code that get...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11956#issuecomment-217760415 **[Test build #58115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58115/consoleFull)** for PR 11956 at commit [`dedca14`](https://github.com/apache/spark/commit/dedca14126696e8b496c12a103368a0d2f1472c1).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14098][SQL] Generate Java code that get...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11956#issuecomment-217760423 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58115/ Test FAILed.
[GitHub] spark pull request: [SPARK-14098][SQL] Generate Java code that get...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11956#issuecomment-217760422 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14098][SQL] Generate Java code that get...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11956#issuecomment-217760139 **[Test build #58115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58115/consoleFull)** for PR 11956 at commit [`dedca14`](https://github.com/apache/spark/commit/dedca14126696e8b496c12a103368a0d2f1472c1).
[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12993#issuecomment-217757584 **[Test build #58114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58114/consoleFull)** for PR 12993 at commit [`d7d96c3`](https://github.com/apache/spark/commit/d7d96c34fde79d7078b27733f553deda6bb39fd4).
[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12855#issuecomment-217757354 Hi @marmbrus, could you please take a look?
[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/12993

[SPARK-15217] [SQL] Always Case Insensitive in HiveSessionState

What changes were proposed in this pull request?

In a `HiveSessionState`, which is a given `SparkSession` backed by Hive, the analysis should not be case sensitive because the underlying Hive Metastore is case insensitive. For example,

```SQL
CREATE TABLE tab1 (C1 int);
SELECT C1 FROM tab1
```

In the current implementation, we get the following error because the column name is always stored in lower case.

```
cannot resolve '`C1`' given input columns: [c1]; line 1 pos 7
org.apache.spark.sql.AnalysisException: cannot resolve '`C1`' given input columns: [c1]; line 1 pos 7
```

This PR always uses case-insensitive analysis in `HiveSessionState`, regardless of whether users set `spark.sql.caseSensitive` to true or false.

How was this patch tested?

Added the related test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark caseSensitive

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12993.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12993

commit d7d96c34fde79d7078b27733f553deda6bb39fd4
Author: gatorsmile
Date: 2016-05-09T00:35:43Z

    case insensitive in Hive
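The failure mode in this description comes down to how a column reference is matched against stored column names. The sketch below models that in plain Python; it is an assumption-laden illustration, not Spark's analyzer. `resolve_column` and its `case_sensitive` flag are hypothetical names, and the error text only imitates the message quoted above.

```python
def resolve_column(name, input_columns, case_sensitive):
    """Return the stored column name matching a reference, or raise LookupError."""
    if case_sensitive:
        matches = [c for c in input_columns if c == name]
    else:
        # Compare lower-cased forms so that C1 and c1 are treated as the same column.
        matches = [c for c in input_columns if c.lower() == name.lower()]
    if not matches:
        columns = ", ".join(input_columns)
        raise LookupError(f"cannot resolve '`{name}`' given input columns: [{columns}]")
    if len(matches) > 1:
        raise LookupError(f"reference '{name}' is ambiguous, could be: {', '.join(matches)}")
    return matches[0]


# The metastore stored the column as lower-case c1, but the query refers to C1.
stored_columns = ["c1"]
resolved = resolve_column("C1", stored_columns, case_sensitive=False)
```

Calling `resolve_column("C1", stored_columns, case_sensitive=True)` raises instead, which corresponds to the error quoted in the description.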
[GitHub] spark pull request: [SPARK-13382][DOCS][PYSPARK] Update pyspark te...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/11278#issuecomment-217755544 ping @JoshRosen ?
[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/12889#issuecomment-217755535 Updated the classification models that do the mixing in, based on the current inheritance on the Scala side. I can follow up with more regression changes if no one takes over updating regression.
[GitHub] spark pull request: [SPARK-15130][PySpark][ML][DOCS] pyspark expos...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/12914#issuecomment-217755456 Any more thoughts on whether this is something we want (cc @davies)? This one only does shared params, so I'd like to follow it up for the non-shared params as well. I think having the default values in the API docs is pretty useful (we include them in the Scaladoc for a reason).
[GitHub] spark pull request: [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspar...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/12919#discussion_r62441488

--- Diff: python/pyspark/ml/regression.py ---
@@ -743,6 +743,18 @@ def treeWeights(self):
         """Return the weights for each tree"""
         return list(self._call_java("javaTreeWeights"))
--- End diff --

I've switched it to be `DecisionTreeRegressionModel` in the base.
[GitHub] spark pull request: [spark-15212][SQL]CSV file reader when read fi...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12987#issuecomment-217755229 I think this option should be associated with `ignoreLeadingWhiteSpace` and `ignoreTrailingWhiteSpace` options.
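One way to read this suggestion is that header names would be trimmed only on the sides the two whitespace options ask for, rather than unconditionally. A minimal plain-Python sketch of that idea follows; `clean_header` and its keyword arguments are hypothetical stand-ins for the two options, not the data source's API.

```python
def clean_header(first_row, ignore_leading=False, ignore_trailing=False):
    """Trim header names only on the sides requested by the whitespace flags."""

    def clean(name):
        if ignore_leading:
            name = name.lstrip()
        if ignore_trailing:
            name = name.rstrip()
        return name

    return [clean(name) for name in first_row]


raw_header = [" year", "make ", " model "]
both = clean_header(raw_header, ignore_leading=True, ignore_trailing=True)
leading_only = clean_header(raw_header, ignore_leading=True)
untouched = clean_header(raw_header)
```

With both flags off, the header is returned unchanged, so existing behaviour would be preserved for users who have not set either option.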
[GitHub] spark pull request: [spark-15212][SQL]CSV file reader when read fi...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12987#issuecomment-217754950 Also, the JIRA ID in the title, `spark-15212`, would be better as `SPARK-15212` (see https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
[GitHub] spark pull request: [spark-15212][SQL]CSV file reader when read fi...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12987#discussion_r62441323

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala ---
@@ -61,7 +61,7 @@ class DefaultSource extends FileFormat with DataSourceRegister {
     val firstRow = new LineCsvReader(csvOptions).parseLine(firstLine)
     val header = if (csvOptions.headerFlag) {
-      firstRow
+      firstRow.map{_.trim}
--- End diff --

(Style nit)

```scala
firstRow.map(_.trim)
```
[GitHub] spark pull request: [SPARK-15187] [SQL] Disallow Dropping Default ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12962#discussion_r62441163

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -118,6 +118,9 @@ class SessionCatalog(
   }
   def dropDatabase(db: String, ignoreIfNotExists: Boolean, cascade: Boolean): Unit = {
+    if (db == "default") {
--- End diff --

Let me submit a PR to fix it. :)
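The hunk above is cut off after the `if`, so the body of the check is not visible here. As a rough illustration of the guard the PR title describes (refusing to drop the default database), here is a plain-Python sketch. `ToyCatalog`, its methods, and the error message are hypothetical, the `cascade` flag is omitted for brevity, and the name comparison is exact, as in the visible diff line.

```python
class ToyCatalog:
    """A toy in-memory catalog that refuses to drop the default database."""

    def __init__(self):
        self._databases = {"default"}

    def create_database(self, db):
        self._databases.add(db)

    def database_exists(self, db):
        return db in self._databases

    def drop_database(self, db, ignore_if_not_exists=False):
        # Guard first: the default database must never be dropped.
        if db == "default":
            raise ValueError("cannot drop the default database")
        if db not in self._databases:
            if ignore_if_not_exists:
                return
            raise KeyError(f"database '{db}' does not exist")
        self._databases.remove(db)


catalog = ToyCatalog()
catalog.create_database("sales")
catalog.drop_database("sales")
```

Placing the guard before the existence check means the request is rejected regardless of the `ignore_if_not_exists` setting.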
[GitHub] spark pull request: [SPARK-15076][SQL] Improve ConstantFolding opt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12850#issuecomment-217751647 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15076][SQL] Improve ConstantFolding opt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12850#issuecomment-217751648 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58113/ Test PASSed.
[GitHub] spark pull request: [SPARK-15076][SQL] Improve ConstantFolding opt...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12850#issuecomment-217751614 **[Test build #58113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58113/consoleFull)** for PR 12850 at commit [`3802255`](https://github.com/apache/spark/commit/3802255328481261949976913dfd2df6248a9bb8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15216] [SQL] Add a new Dataset API expl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12992#issuecomment-217750373 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15216] [SQL] Add a new Dataset API expl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12992#issuecomment-217750374 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58111/ Test PASSed.
[GitHub] spark pull request: [SPARK-15216] [SQL] Add a new Dataset API expl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12992#issuecomment-217750329 **[Test build #58111 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58111/consoleFull)** for PR 12992 at commit [`f716b10`](https://github.com/apache/spark/commit/f716b106ede5f090c12e261d64ac5fb4ae0af8a4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15207][BUILD] Use Travis CI for Java/Sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12980#issuecomment-217749784 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15207][BUILD] Use Travis CI for Java/Sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12980#issuecomment-217749785 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58110/ Test PASSed.
[GitHub] spark pull request: [SPARK-15058][MLLIB][TEST] Enable Java Decisio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12840#issuecomment-217749775 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15058][MLLIB][TEST] Enable Java Decisio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12840#issuecomment-217749756 **[Test build #58112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58112/consoleFull)** for PR 12840 at commit [`37325a7`](https://github.com/apache/spark/commit/37325a7237618689533b07ed60dbf605e9dd00b6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15058][MLLIB][TEST] Enable Java Decisio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12840#issuecomment-217749777 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58112/ Test PASSed.
[GitHub] spark pull request: [SPARK-15207][BUILD] Use Travis CI for Java/Sc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12980#issuecomment-217749731 **[Test build #58110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58110/consoleFull)** for PR 12980 at commit [`690ca52`](https://github.com/apache/spark/commit/690ca526701f7ea26c2136eaca03dbaa756f15c8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.