[GitHub] spark pull request #17724: [SPARK-18127] Add hooks and extension points to S...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17724#discussion_r112804293 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala --- @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import scala.collection.mutable + +import org.apache.spark.annotation.{DeveloperApi, Experimental, InterfaceStability} +import org.apache.spark.sql.catalyst.parser.ParserInterface +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.rules.Rule + +/** + * :: Experimental :: + * Holder for injection points to the [[SparkSession]]. We make NO guarantee about the stability + * regarding binary compatibility and source compatibility of methods here. + * + * This current provides the following extension points: + * - Analyzer Rules. + * - Check Analysis Rules + * - Optimizer Rules. + * - Planning Strategies. + * - Customized Parser. + * - (External) Catalog listeners. 
+ * + * The extensions can be used by calling withExtension on the [[SparkSession.Builder]], for + * example: + * {{{ + * SparkSession.builder() + * .master("...") + * .conf("...", true) + * .withExtensions { extensions => + * extensions.injectAnalyzerRule { session => --- End diff -- `injectAnalyzerRule` -> `buildResolutionRules`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17725 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76054/ Test PASSed.
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17725 Merged build finished. Test PASSed.
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17725 **[Test build #76054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76054/testReport)** for PR 17725 at commit [`0603686`](https://github.com/apache/spark/commit/06036867d96350a51e180565782ffee1515fbea4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17688 LGTM too. I just quickly checked for similar instances but could not find any; I checked both the R one and the Scala one.
[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17680 gentle ping @liancheng and @davies.
[GitHub] spark pull request #17724: [SPARK-18127] Add hooks and extension points to S...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17724#discussion_r112803968 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -848,6 +851,17 @@ object SparkSession { } /** + * Inject extensions into the [[SparkSession]]. This allows a user to add Analyzer rules, + * Optimizer rules, Planning Strategies or a customized parser. + * + * @since 2.3.0 --- End diff -- In the JIRA, the target version is 2.2. Do we still plan to backport it to 2.2.0?
[GitHub] spark issue #17717: [SPARK-20430][SQL] Initialise RangeExec parameters in a ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17717 LGTM pending Jenkins.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17712 **[Test build #76057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76057/testReport)** for PR 17712 at commit [`dd182e4`](https://github.com/apache/spark/commit/dd182e4f1981305852041debed23324c5f689a47).
[GitHub] spark issue #17717: [SPARK-20430][SQL] Initialise RangeExec parameters in a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17717 **[Test build #76056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76056/testReport)** for PR 17717 at commit [`9b5bdc7`](https://github.com/apache/spark/commit/9b5bdc7199e0e5e3f9b3bf7cbaa79b698e5fe3f0).
[GitHub] spark pull request #17717: [SPARK-20430][SQL] Initialise RangeExec parameter...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17717#discussion_r112803448 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1732,4 +1732,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { .filter($"x1".isNotNull || !$"y".isin("a!")) .count } + + test("SPARK-20430 Initialize Range parameters in a deriver side") { --- End diff -- yea, will do
[GitHub] spark pull request #17717: [SPARK-20430][SQL] Initialise RangeExec parameter...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17717#discussion_r112803232 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1732,4 +1732,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { .filter($"x1".isNotNull || !$"y".isin("a!")) .count } + + test("SPARK-20430 Initialize Range parameters in a deriver side") { --- End diff -- driver
[GitHub] spark pull request #17717: [SPARK-20430][SQL] Initialise RangeExec parameter...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17717#discussion_r112803234 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1732,4 +1732,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { .filter($"x1".isNotNull || !$"y".isin("a!")) .count } + + test("SPARK-20430 Initialize Range parameters in a deriver side") { --- End diff -- also move this into dataframe range suite?
[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112803151 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -45,14 +45,33 @@ import org.apache.spark.sql.types.DataType case class UserDefinedFunction protected[sql] ( f: AnyRef, dataType: DataType, -inputTypes: Option[Seq[DataType]]) { +inputTypes: Option[Seq[DataType]], +name: Option[String]) { + + // Optionally used for printing an UDF name in EXPLAIN + def withName(name: String): UserDefinedFunction = { +UserDefinedFunction(f, dataType, inputTypes, Option(name)) + } /** * Returns an expression that invokes the UDF, using the given arguments. * * @since 1.3.0 */ def apply(exprs: Column*): Column = { -Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil))) +Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil), name)) + } +} + +object UserDefinedFunction { --- End diff -- for now, I'll revert it...
[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112803097 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -45,14 +45,33 @@ import org.apache.spark.sql.types.DataType case class UserDefinedFunction protected[sql] ( f: AnyRef, dataType: DataType, -inputTypes: Option[Seq[DataType]]) { +inputTypes: Option[Seq[DataType]], +name: Option[String]) { + + // Optionally used for printing an UDF name in EXPLAIN + def withName(name: String): UserDefinedFunction = { +UserDefinedFunction(f, dataType, inputTypes, Option(name)) + } /** * Returns an expression that invokes the UDF, using the given arguments. * * @since 1.3.0 */ def apply(exprs: Column*): Column = { -Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil))) +Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil), name)) + } +} + +object UserDefinedFunction { --- End diff -- ah ok - that sucks. that means this will break compatibility ...
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17725 **[Test build #76055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76055/testReport)** for PR 17725 at commit [`14b0d72`](https://github.com/apache/spark/commit/14b0d72d5fffb69694a2442ade6399161f99545c).
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user markgrover commented on the issue: https://github.com/apache/spark/pull/17725 Few decisions that I made here: * I considered whether just the `sun.java.command` property should be checked and redacted, but that seemed too specific and likely a bandaid for the current problem, not a long-term solution, so I decided against it. * Redaction for the `SparkListenerEnvironmentUpdate` event was solely being done on `Spark Properties`, while `sun.java.command` is part of `System Properties`. I considered doing redaction for `System Properties` in addition to `Spark Properties` (that would have gone somewhere around [here](https://github.com/apache/spark/pull/17725/files#diff-e4a5a68c15eed95d038acfed84b0b66aL258)) but decided against it because that would have meant even more hardcoding, and I didn't see why these 2 special kinds of properties are special enough to be redacted but not the rest. So I decided to redact information from all kinds of properties. * One way to redact the property value would have been to redact the minimum possible set from the value while keeping the rest of the value intact. For example, if the following were the unredacted case: `"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password --conf spark.other.property=2"` One option for the redacted output could have been: `"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=*(redacted) --conf spark.other.property=2"` However, such a redaction is very hard to maintain. For example, we would have had to take the current regex (which is `(?i)secret|password` by default) and add matchers to it, like `"("+SECRET_REDACTION_DEFAULT+"[^ ]*=)[^ ]*"`, which would allow us to squeeze out and replace just the matched portion. But this all seemed very fragile, and even worse when the user supplies a non-default regex, so I decided it was easiest to simply replace the entire value, even though only a small part of it contained `secret` or `password`. * One thing I didn't explicitly check was the performance implications of this change. Previously we were comparing only keys with the regex; now, if the key doesn't match, we also match the value against the regex, so in the worst case we do twice as many regex matches as before. Also, before we were doing regex matching only on `Spark Properties`; now we do it on all properties - `Spark Properties`, `System Properties`, `JVM Properties` and `Classpath Properties`. I don't think this should have a big performance impact so I didn't invest time in it; mentioning it here in the interest of full disclosure. Thanks in advance for reviewing.
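The whole-value replacement strategy described in the comment above can be sketched in plain Scala. This is a standalone illustration with hypothetical names (`RedactionSketch`, `redact`), not Spark's actual redaction utility:

```scala
import scala.util.matching.Regex

object RedactionSketch {
  // Placeholder substituted for any sensitive value, mirroring the
  // "replace the entire value" decision described above.
  val Replacement = "*********(redacted)"

  // If either the key or the value matches the sensitive pattern,
  // redact the whole value; otherwise pass the pair through unchanged.
  def redact(pattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] =
    kvs.map { case (k, v) =>
      if (pattern.findFirstIn(k).isDefined || pattern.findFirstIn(v).isDefined) {
        (k, Replacement)
      } else {
        (k, v)
      }
    }
}
```

Matching on both the key and the value is exactly what doubles the worst-case number of regex matches mentioned in the performance note above.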
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17725 **[Test build #76054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76054/testReport)** for PR 17725 at commit [`0603686`](https://github.com/apache/spark/commit/06036867d96350a51e180565782ffee1515fbea4).
[GitHub] spark pull request #17693: [SPARK-16548][SQL] Inconsistent error handling in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17693#discussion_r112801838 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -149,7 +149,8 @@ case class GetJsonObject(json: Expression, path: Expression) if (parsed.isDefined) { try { -Utils.tryWithResource(jsonFactory.createParser(jsonStr.getBytes)) { parser => +Utils.tryWithResource(jsonFactory.createParser(new InputStreamReader( --- End diff -- please add a comment saying that this is to avoid a bug in encoding detection, and that we explicitly specify the encoding (UTF-8) here.
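The idea behind the change reviewed above — hand the parser a character `Reader` with an explicit charset instead of raw bytes, so no byte-level encoding auto-detection can kick in — can be sketched with just the JDK (no Jackson dependency; names here are illustrative, not the PR's code):

```scala
import java.io.{ByteArrayInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

object ExplicitCharsetSketch {
  // Decode bytes through an InputStreamReader with an explicit UTF-8 charset.
  // A consumer reading from this Reader sees characters, never raw bytes,
  // so it has no opportunity to guess (and mis-guess) the encoding.
  def decodeUtf8(bytes: Array[Byte]): String = {
    val reader = new InputStreamReader(
      new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)
    val sb = new StringBuilder
    var c = reader.read()
    while (c != -1) {
      sb.append(c.toChar)
      c = reader.read()
    }
    reader.close()
    sb.toString
  }
}
```

The same shape applies to the diff above: `jsonFactory.createParser(reader)` consumes characters already decoded as UTF-8, whereas `createParser(bytes)` would try to detect the encoding itself.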
[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17693 LGTM
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17712 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76053/ Test FAILed.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17712 **[Test build #76053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76053/testReport)** for PR 17712 at commit [`8800c3b`](https://github.com/apache/spark/commit/8800c3b15048bad5926e2b2ed280b042fa5c9d47). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17712 Merged build finished. Test FAILed.
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17725 Merged build finished. Test FAILed.
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17725 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76051/ Test FAILed.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17712 **[Test build #76053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76053/testReport)** for PR 17712 at commit [`8800c3b`](https://github.com/apache/spark/commit/8800c3b15048bad5926e2b2ed280b042fa5c9d47).
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17725 **[Test build #76051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76051/testReport)** for PR 17725 at commit [`2f5148a`](https://github.com/apache/spark/commit/2f5148a2e37d8d36006fb297b28a9e8c21a0026b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112801078 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -45,14 +45,33 @@ import org.apache.spark.sql.types.DataType case class UserDefinedFunction protected[sql] ( f: AnyRef, dataType: DataType, -inputTypes: Option[Seq[DataType]]) { +inputTypes: Option[Seq[DataType]], +name: Option[String]) { + + // Optionally used for printing an UDF name in EXPLAIN + def withName(name: String): UserDefinedFunction = { +UserDefinedFunction(f, dataType, inputTypes, Option(name)) + } /** * Returns an expression that invokes the UDF, using the given arguments. * * @since 1.3.0 */ def apply(exprs: Column*): Column = { -Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil))) +Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil), name)) + } +} + +object UserDefinedFunction { --- End diff -- oh, it seems we couldn't add `unapply` there because: ``` [error] /Users/maropu/IdeaProjects/spark/spark-master/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala:45: method unapply is defined twice [error] conflicting symbols both originated in file '/Users/maropu/IdeaProjects/spark/spark-master/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala' [error] case class UserDefinedFunction protected[sql] ( [error] ^ ```
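The "method unapply is defined twice" error quoted above is inherent to case classes: the compiler already synthesizes an `unapply` in the companion object, so a hand-written one with the same signature collides. A standalone sketch with illustrative names (`Udf`, not the real `UserDefinedFunction`):

```scala
// The compiler generates `Udf.unapply(u: Udf): Option[(AnyRef, Option[String])]`
// in the companion automatically, along with `apply` and `copy`.
case class Udf(f: AnyRef, name: Option[String]) {
  // A `withName`-style method can lean on the generated `copy` instead of
  // constructing the instance by hand.
  def withName(n: String): Udf = copy(name = Option(n))
}

object Udf {
  // def unapply(u: Udf): Option[(AnyRef, Option[String])] = Some((u.f, u.name))
  // Uncommenting the definition above reproduces the compile error quoted in
  // the comment: "method unapply is defined twice".
}
```

Pattern matching on `Udf` works out of the box via the synthesized `unapply`, which is why adding a field to a public case class (as discussed above) is a source-compatibility concern for callers who pattern match on the old shape.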
[GitHub] spark issue #17672: [SPARK-20371][R] Add wrappers for collect_list and colle...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17672 Will probably be cleaner
[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112800829 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -45,14 +45,33 @@ import org.apache.spark.sql.types.DataType case class UserDefinedFunction protected[sql] ( f: AnyRef, dataType: DataType, -inputTypes: Option[Seq[DataType]]) { +inputTypes: Option[Seq[DataType]], +name: Option[String]) { + + // Optionally used for printing an UDF name in EXPLAIN + def withName(name: String): UserDefinedFunction = { +UserDefinedFunction(f, dataType, inputTypes, Option(name)) + } /** * Returns an expression that invokes the UDF, using the given arguments. * * @since 1.3.0 */ def apply(exprs: Column*): Column = { -Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil))) +Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil), name)) + } +} + +object UserDefinedFunction { --- End diff -- ok. Is it okay to update the MiMa file?
[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112800640

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -45,14 +45,33 @@ import org.apache.spark.sql.types.DataType
 case class UserDefinedFunction protected[sql] (
     f: AnyRef,
     dataType: DataType,
-    inputTypes: Option[Seq[DataType]]) {
+    inputTypes: Option[Seq[DataType]],
+    name: Option[String]) {
+
+  // Optionally used for printing an UDF name in EXPLAIN
+  def withName(name: String): UserDefinedFunction = {
+    UserDefinedFunction(f, dataType, inputTypes, Option(name))
+  }

   /**
    * Returns an expression that invokes the UDF, using the given arguments.
    *
    * @since 1.3.0
    */
   def apply(exprs: Column*): Column = {
-    Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil)))
+    Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil), name))
   }
+}
+
+object UserDefinedFunction {
--- End diff --

also need an unapply function
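The point about `unapply` generalizes: in Scala, pattern matching on `Foo(...)` is driven by an extractor method on the companion. A minimal standalone sketch (a hypothetical `Udf` class, not Spark's actual `UserDefinedFunction`) of why a hand-written companion needs one:

```scala
// Hypothetical sketch: for a plain (non-case) class with a hand-written
// companion, `unapply` must be supplied explicitly or pattern matching
// on `Udf(...)` will not compile.
class Udf(val f: AnyRef, val name: Option[String])

object Udf {
  def apply(f: AnyRef, name: Option[String]): Udf = new Udf(f, name)

  // The extractor that makes `case Udf(f, name) => ...` work.
  def unapply(udf: Udf): Option[(AnyRef, Option[String])] =
    Some((udf.f, udf.name))
}

val u = Udf((x: Int) => x + 1, Some("inc"))
val label = u match {
  case Udf(_, Some(n)) => n
  case Udf(_, None)    => "anonymous"
}
println(label) // prints "inc"
```

(For case classes the compiler normally synthesizes `apply`/`unapply`, but keeping explicit ones in the companion makes the binary surface visible to tools like MiMa, which is what the thread is worried about.)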
[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17724 Merged build finished. Test PASSed.
[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17724 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76049/ Test PASSed.
[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17724 **[Test build #76049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76049/testReport)** for PR 17724 at commit [`105962a`](https://github.com/apache/spark/commit/105962a4e22e7eb7a668bc0793e7a22965a7a041).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17712 Merged build finished. Test FAILed.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17712 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76052/ Test FAILed.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17712 **[Test build #76052 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76052/testReport)** for PR 17712 at commit [`96bc89d`](https://github.com/apache/spark/commit/96bc89d49456bcd00d863950dc0da1271153d186).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17712 **[Test build #76052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76052/testReport)** for PR 17712 at commit [`96bc89d`](https://github.com/apache/spark/commit/96bc89d49456bcd00d863950dc0da1271153d186).
[GitHub] spark issue #17672: [SPARK-20371][R] Add wrappers for collect_list and colle...
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/17672 Yeah, I have this feeling that it could be deliberate, but I cannot figure out what is the purpose. Removing `@exports` should be enough, shouldn't it? I thought about cleaning this up, but I wonder if it is better to wait for [SPARK-16693](https://issues.apache.org/jira/browse/SPARK-16693).
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17725 **[Test build #76051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76051/testReport)** for PR 17725 at commit [`2f5148a`](https://github.com/apache/spark/commit/2f5148a2e37d8d36006fb297b28a9e8c21a0026b).
[GitHub] spark pull request #17725: [SPARK-20435][CORE] More thorough redaction of se...
GitHub user markgrover opened a pull request: https://github.com/apache/spark/pull/17725 [SPARK-20435][CORE] More thorough redaction of sensitive information

This change does a more thorough redaction of sensitive information from logs and the UI, and adds unit tests that ensure no regressions leak sensitive information into the logs. Previously, the redaction logic only checked whether the key matched the secret regex pattern and, if so, redacted its value. That worked for most cases. However, in the above case, the key (sun.java.command) doesn't tell much, so the value needs to be searched as well. This PR expands the check to cover values too.

## How was this patch tested?

New unit tests added that ensure that no sensitive information is present in the event logs or the yarn logs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markgrover/spark spark-20435

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17725.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17725

commit 2f5148a2e37d8d36006fb297b28a9e8c21a0026b
Author: Mark Grover
Date: 2017-04-22T00:24:30Z
[SPARK-20435][CORE] More thorough redaction of sensitive information from logs/UI, more unit tests
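The key-plus-value approach the PR describes can be sketched in plain Scala. Everything below (the regexes, the `redact` helper, the redaction placeholder string) is a hypothetical illustration of the idea, not Spark's actual implementation:

```scala
// Sketch: redact a property when EITHER its key looks secret, OR its value
// contains an embedded key=value pair whose key looks secret (the
// sun.java.command case). Names and patterns here are illustrative only.
import scala.util.matching.Regex

val secretKeyPattern: Regex = "(?i)secret|password|token|credential".r
// Matches embedded tokens like "spark.hadoop.fs.s3a.secret.key=abc123"
// inside a larger value; group(1) is everything up to and including "=".
val secretValuePattern: Regex =
  """(?i)(\S*(?:secret|password|token|credential)\S*=)\S+""".r

def redact(kv: (String, String)): (String, String) = kv match {
  case (k, _) if secretKeyPattern.findFirstIn(k).isDefined =>
    (k, "*********(redacted)")                       // key itself is secret
  case (k, v) =>
    // Key is innocuous: scrub any secret-looking assignments in the value.
    (k, secretValuePattern.replaceAllIn(v, m => m.group(1) + "*********(redacted)"))
}

val props = Seq(
  "spark.ssl.keyPassword" -> "hunter2",
  "sun.java.command" -> "org.Main --conf my.secret.key=abc123 --verbose"
)
props.map(redact).foreach(println)
```

The second property shows why key-only matching is insufficient: `sun.java.command` matches no secret pattern, yet its value carries the secret.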
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17723 cc @vanzin @jerryshao @skonto BTW @vanzin, I decided to parameterize `HadoopFSCredentialProvider` with a new `HadoopAccessManager` object, for which YARN provides a custom `YARNHadoopAccessManager`. I did this instead of conditioning on `SparkHadoopUtil.get.isYarnMode`, since I prefer functional parameterization over global values.
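The design choice described here, injecting the environment-specific behavior as a parameter instead of branching on a global mode flag, can be sketched with plain traits. Class and method names below are hypothetical stand-ins, not the PR's actual API:

```scala
// Sketch of functional parameterization: the credential provider depends
// only on an abstraction, so core code carries no isYarnMode check and
// tests can inject a stub. All names here are illustrative.
trait HadoopAccessManager {
  def tokenRenewer: String
}

class DefaultHadoopAccessManager extends HadoopAccessManager {
  def tokenRenewer: String = "default-renewer"
}

// The YARN module supplies its own implementation...
class YarnHadoopAccessManager extends HadoopAccessManager {
  def tokenRenewer: String = "yarn-am-renewer"
}

// ...and the shared provider never needs to know which one it got.
class HadoopFSCredentialProvider(accessManager: HadoopAccessManager) {
  def obtainTokens(): String = s"tokens renewable by ${accessManager.tokenRenewer}"
}

println(new HadoopFSCredentialProvider(new YarnHadoopAccessManager).obtainTokens())
```

Contrast with the rejected alternative: a global `if (isYarnMode) ... else ...` inside the provider, which couples core code to YARN and is awkward to fake in tests.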
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17712 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76050/ Test FAILed.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17712 **[Test build #76050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76050/testReport)** for PR 17712 at commit [`5d797a9`](https://github.com/apache/spark/commit/5d797a97fa936a4534ea381f6aa3cfd5545310ce).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17712 Merged build finished. Test FAILed.
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17723 Merged build finished. Test PASSed.
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17723 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76047/ Test PASSed.
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17723 **[Test build #76047 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76047/testReport)** for PR 17723 at commit [`d6d21d1`](https://github.com/apache/spark/commit/d6d21d165a451ce7a285baa98387cbf341fb4739).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17672: [SPARK-20371][R] Add wrappers for collect_list and colle...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17672 Not really, it's just inconsistent handling. Some comment changes may be deliberate, though.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17712 **[Test build #76050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76050/testReport)** for PR 17712 at commit [`5d797a9`](https://github.com/apache/spark/commit/5d797a97fa936a4534ea381f6aa3cfd5545310ce).
[GitHub] spark issue #17672: [SPARK-20371][R] Add wrappers for collect_list and colle...
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/17672 BTW @felixcheung - is there any deeper reason behind the current state of `generics.R`? I mean:
- Inconsistent usage of standard and `roxygen` comments.
- Marking functions which are not to be exported with `@export`.
- Slightly mixed up order (both in groups and between groups).
- Some minor inconsistencies (like marking `asc` as `@rdname columnfunctions`).
[GitHub] spark issue #17719: [SPARK-20431][SQL] Specify a schema by using a DDL-forma...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17719 cc: @gatorsmile
[GitHub] spark issue #17717: [SPARK-20430][SQL] Initialise RangeExec parameters in a ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17717 cc: @gatorsmile
[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112795922

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -47,12 +47,20 @@ case class UserDefinedFunction protected[sql] (
     dataType: DataType,
     inputTypes: Option[Seq[DataType]]) {

+  // Optionally used for printing UDF names in EXPLAIN
+  private var nameOption: Option[String] = None
--- End diff --

okay, I'll recheck
[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17724 LGTM
[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17724 **[Test build #76049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76049/testReport)** for PR 17724 at commit [`105962a`](https://github.com/apache/spark/commit/105962a4e22e7eb7a668bc0793e7a22965a7a041).
[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17648 I was saying rather than implementing them, just rewrite them into an aggregate on the conditions and compare them against the value.
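The rewrite rxin suggests can be illustrated over plain collections: `EVERY(cond)` becomes an aggregate count of the condition compared against the row count, and `ANY(cond)` becomes that same count compared against one. (A real SQL rewrite would also have to handle NULL semantics, which this sketch deliberately ignores.)

```scala
// Sketch only: EVERY/ANY expressed as ordinary aggregates on the condition,
// compared against a value, rather than as purpose-built aggregate functions.
def every[A](rows: Seq[A])(cond: A => Boolean): Boolean =
  rows.count(cond) == rows.size   // EVERY(cond): count(cond) = count(*)

def any[A](rows: Seq[A])(cond: A => Boolean): Boolean =
  rows.count(cond) >= 1           // ANY(cond): count(cond) >= 1

val xs = Seq(1, 2, 3, 4)
println(every(xs)(_ > 0)) // prints "true"
println(any(xs)(_ > 3))   // prints "true"
```

In SQL terms the analogous rewrite would turn `EVERY(p)` into something like `count(CASE WHEN p THEN 1 END) = count(*)`, letting the existing aggregate machinery do the work.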
[GitHub] spark pull request #17723: [SPARK-20434] Move kerberos delegation token code...
Github user mgummelt commented on a diff in the pull request: https://github.com/apache/spark/pull/17723#discussion_r112788867

--- Diff: core/pom.xml ---
@@ -357,6 +357,34 @@
       <groupId>org.apache.commons</groupId>
       <artifactId>commons-crypto</artifactId>
     </dependency>
+    <dependency>
+      <groupId>${hive.group}</groupId>
+      <artifactId>hive-exec</artifactId>
--- End diff --

I still don't know how to place these in the `test` scope, which is where they belong. See my comment here: https://github.com/apache/spark/pull/17665/files#r112337820
[GitHub] spark issue #17713: [SPARK-20417][SQL] Move subquery error handling to check...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17713 Merged build finished. Test PASSed.
[GitHub] spark issue #17713: [SPARK-20417][SQL] Move subquery error handling to check...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17713 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76043/ Test PASSed.
[GitHub] spark issue #17713: [SPARK-20417][SQL] Move subquery error handling to check...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17713 **[Test build #76043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76043/testReport)** for PR 17713 at commit [`39e8cf7`](https://github.com/apache/spark/commit/39e8cf752f5bd3325edbb93e69ee09b92026242f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17724 Merged build finished. Test FAILed.
[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17724 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76048/ Test FAILed.
[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17724 **[Test build #76048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76048/testReport)** for PR 17724 at commit [`c83b4ee`](https://github.com/apache/spark/commit/c83b4ee67b9cf4506cc8ce1ce449055463b1bda9).
* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `.doc(\"Name of the class used to configure Spark Session extensions. The class should \" +`
  * `class SparkSessionExtensions `
[GitHub] spark pull request #17697: [SPARK-20414][MLLIB] avoid creating only 16 reduc...
Github user yangyangyyy commented on a diff in the pull request: https://github.com/apache/spark/pull/17697#discussion_r112783447

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala ---
@@ -39,8 +39,8 @@ class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) extends Se
  * @param ord the implicit ordering for T
  * @return an RDD that contains the top k values for each key
  */
-  def topByKey(num: Int)(implicit ord: Ordering[V]): RDD[(K, Array[V])] = {
-    self.aggregateByKey(new BoundedPriorityQueue[V](num)(ord))(
+  def topByKey(num: Int, bucketsCount: Int = 200)(implicit ord: Ordering[V]): RDD[(K, Array[V])] = {
--- End diff --

@HyukjinKwon yes, updated that way
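For readers following the diff: `topByKey` keeps the `num` largest values per key; the `bucketsCount` parameter under discussion only controls how many reduce partitions the real RDD implementation uses. A plain-collections sketch of the semantics (not the RDD code, which uses `aggregateByKey` with a `BoundedPriorityQueue`):

```scala
// Sketch of what topByKey computes, over an in-memory Seq instead of an RDD.
def topByKey[K, V](pairs: Seq[(K, V)], num: Int)(implicit ord: Ordering[V]): Map[K, Seq[V]] =
  pairs.groupBy(_._1).map { case (k, kvs) =>
    // Per key: sort values descending and keep the top `num`.
    k -> kvs.map(_._2).sorted(ord.reverse).take(num)
  }

val scores = Seq("a" -> 3, "a" -> 1, "a" -> 5, "b" -> 2)
println(topByKey(scores, 2)) // per key, the two largest values, descending
```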
[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17724 **[Test build #76048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76048/testReport)** for PR 17724 at commit [`c83b4ee`](https://github.com/apache/spark/commit/c83b4ee67b9cf4506cc8ce1ce449055463b1bda9).
[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/17724 cc @hvanhovell
[GitHub] spark pull request #17724: [SPARK-18127] Add hooks and extension points to S...
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/17724

[SPARK-18127] Add hooks and extension points to Spark

## What changes were proposed in this pull request?

This patch adds support for customizing the Spark session by injecting user-defined custom extensions. This allows a user to add custom analyzer rules/checks, optimizer rules, planning strategies, or even a customized parser.

## How was this patch tested?

Unit tests in SparkSessionExtensionSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark session-extensions

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17724.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17724

commit c83b4ee67b9cf4506cc8ce1ce449055463b1bda9 (Sameer Agarwal, 2017-04-13T21:58:53Z): Add SparkSessionExtensions
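The injection-point pattern this PR proposes is a mutable holder that accumulates user-supplied builder functions, which the session later applies when it is constructed. A toy Python sketch of that pattern is below; all names here are illustrative of the design, not Spark's actual API:

```python
class SessionExtensions:
    """Mutable holder for user-injected rule builders (illustrative, not Spark's API)."""
    def __init__(self):
        self._rule_builders = []

    def inject_rule(self, builder):
        # builder: session -> rule, where a rule is a function plan -> plan
        self._rule_builders.append(builder)

    def build_rules(self, session):
        return [build(session) for build in self._rule_builders]


class SessionBuilder:
    """Collects extensions, then produces an 'analyzer' that applies every injected rule."""
    def __init__(self):
        self.extensions = SessionExtensions()

    def with_extensions(self, f):
        f(self.extensions)  # let the caller register hooks against the holder
        return self

    def get_or_create(self):
        rules = self.extensions.build_rules(session=None)  # no real session in this sketch
        def analyze(plan):
            for rule in rules:
                plan = rule(plan)
            return plan
        return analyze


# Usage: inject a rule that upper-cases every node name in a toy "plan".
analyze = (SessionBuilder()
           .with_extensions(lambda ext: ext.inject_rule(
               lambda session: (lambda plan: [node.upper() for node in plan])))
           .get_or_create())
print(analyze(["project", "filter", "scan"]))
```

The key property, mirrored from the PR, is that extensions are registered on the builder before the session exists, and each builder gets handed the session (here a placeholder) when the rules are finally materialized.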
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17723 **[Test build #76047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76047/testReport)** for PR 17723 at commit [`d6d21d1`](https://github.com/apache/spark/commit/d6d21d165a451ce7a285baa98387cbf341fb4739).
[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...
Github user ewasserman commented on the issue: https://github.com/apache/spark/pull/17693 Replaced the use of toString on org.apache.spark.unsafe.types.UTF8String with running the byte array through a java.io.Reader. This still fixes the bug and is also more efficient on the JSON parser side, so it is a net performance win as well.
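The same idea, feeding the parser a reader over the raw bytes instead of first materializing a full decoded string, can be sketched in Python (illustrative only; the actual change operates on UTF8String with a Java Reader on the JVM):

```python
import io
import json

raw = '{"k": "v", "n": 1}'.encode("utf-8")  # bytes, as they would sit in a UTF8String

# Eager approach: decode the whole byte array into a string, then parse it.
eager = json.loads(raw.decode("utf-8"))

# Streaming approach: wrap the bytes in a reader and let the parser pull
# characters as it needs them, skipping the intermediate full-string copy.
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8") as reader:
    streamed = json.load(reader)

print(eager == streamed)  # both parses agree on the same value
```

Both paths produce identical parse results; the streaming path simply avoids allocating the complete decoded string up front, which is where the performance win comes from on large records.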
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17723 **[Test build #76046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76046/testReport)** for PR 17723 at commit [`a546aab`](https://github.com/apache/spark/commit/a546aab923520ccec7683c3b320a5b92dedc3f1e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17723 Merged build finished. Test FAILed.
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17723 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76046/ Test FAILed.
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17723 **[Test build #76046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76046/testReport)** for PR 17723 at commit [`a546aab`](https://github.com/apache/spark/commit/a546aab923520ccec7683c3b320a5b92dedc3f1e).
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17723 Merged build finished. Test FAILed.
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17723 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76045/ Test FAILed.
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17723 **[Test build #76045 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76045/testReport)** for PR 17723 at commit [`e15f1ab`](https://github.com/apache/spark/commit/e15f1abcd708d32d863523135ba9fe8690ba2d9c). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17723 **[Test build #76045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76045/testReport)** for PR 17723 at commit [`e15f1ab`](https://github.com/apache/spark/commit/e15f1abcd708d32d863523135ba9fe8690ba2d9c).
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17723 Merged build finished. Test FAILed.
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17723 **[Test build #76044 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76044/testReport)** for PR 17723 at commit [`ad4e33b`](https://github.com/apache/spark/commit/ad4e33b9f379538ddcbdb9468f4bb39cafc46057). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17723 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76044/ Test FAILed.
[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17723 **[Test build #76044 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76044/testReport)** for PR 17723 at commit [`ad4e33b`](https://github.com/apache/spark/commit/ad4e33b9f379538ddcbdb9468f4bb39cafc46057).
[GitHub] spark pull request #17723: [SPARK-20434] Move kerberos delegation token code...
GitHub user mgummelt opened a pull request: https://github.com/apache/spark/pull/17723

[SPARK-20434] Move kerberos delegation token code from yarn to core

## What changes were proposed in this pull request?

Move kerberos delegation token code from yarn to core, so that other schedulers (such as Mesos) may use it.

## How was this patch tested?

Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mesosphere/spark SPARK-20434-refactor-kerberos

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17723.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17723

commit ce63a9b6399176b8fa2c59c1004d796ef77c3d71 (Dr. Stefan Schimanski, 2016-02-10T17:09:46Z): [Mesosphere SPARK-126] Move YarnSparkHadoopUtil token helpers into the generic SparkHadoopUtil class
commit 75d849a494519a5af97bf22df7676b336746ac92 (Dr. Stefan Schimanski, 2016-02-10T17:11:20Z): [Mesosphere SPARK-126] Add Mesos Kerberos support
commit 35002f2bd2e906bf1c6e6800f1f346e962edca75 (Michael Gummelt, 2017-04-17T22:31:25Z): Par down kerberos support
commit 13981c8fe7934a8cee53be4cfd59fb14c8d9b07c (Michael Gummelt, 2017-04-17T22:57:51Z): cleanup
commit af4a3e4f53509ee1bee714d0846518d2696e0800 (Michael Gummelt, 2017-04-17T23:14:05Z): style
commit 5cc66dc91e7684c582b08a84b4901541dd60e38b (Michael Gummelt, 2017-04-18T00:27:28Z): Add MesosSecurityManager
commit a47c9c04f61dce38f64e291c66793742239761b7 (Michael Gummelt, 2017-04-18T00:43:18Z): info logs
commit c8ec0496ca1c12e5eb43c530f08cb033a7c862fa (Michael Gummelt, 2017-04-18T20:24:11Z): style
commit 954eeffda336bbbf6d5a588a38c95f092ecf1679 (Michael Gummelt, 2017-04-18T21:34:14Z): Re-add org.apache.spark.deploy.yarn.security.ServiceCredentialProvider for backwards compatibility
commit 2d769287edd2ac6867e9696798c116fdf9165411 (Michael Gummelt, 2017-04-18T21:43:56Z): move YARNHadoopFSCredentialProviderSuite
commit d8a968d66c577cc702d00e980c968a57c3f12565 (Michael Gummelt, 2017-04-19T17:35:03Z): Move hive test deps to the core module
commit b8093c863ce9af3eadc3fd2b371e1bafe4cf4a47 (Michael Gummelt, 2017-04-19T22:10:25Z): remove test scope
commit 25d508823d238d905b102196962f39900b5c526a (Michael Gummelt, 2017-04-19T22:50:10Z): remove test scope
commit 4c387ebcb584732d0d67e83c0b9d5f4cfd1db247 (Michael Gummelt, 2017-04-20T22:15:51Z): Removed MesosSecurityManager, added RPC call, removed META-INF ServiceCredentialProvider from core
commit e32afeeac95883138751c060a3ebfaf309e3d22f (Michael Gummelt, 2017-04-20T22:17:37Z): add InterfaceStability annotation to ServiceCredentialProvider
commit be69f5a639caad0abadafcae471e71847fc9f935 (Michael Gummelt, 2017-04-21T01:00:52Z): Add HadoopAccessManager
commit 55616da9f0fd15f1594233b5fe43b04ef1c901c8 (Michael Gummelt, 2017-04-21T19:28:43Z): Remove mesos code
commit 240df317dd42584349a3c4a0bf6f7d78a4fbe0e6 (Michael Gummelt, 2017-04-21T19:38:07Z): re-add mistakenly removed files
commit 810c6b26e3830e0f4e08e66df2d6a6f50cc65c7b (Michael Gummelt, 2017-04-21T20:14:16Z): test ConfigurableCredentialManager.obtainUserTokens
commit ad4e33b9f379538ddcbdb9468f4bb39cafc46057 (Michael Gummelt, 2017-04-21T21:03:41Z): add tests
[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/17648 @rxin I'm not sure where you're going with your proposal. These are aggregate functions, not scalar functions.
[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...
Github user vundela commented on the issue: https://github.com/apache/spark/pull/17688 @holdenk Thanks for the review. Can you please point me to the line number where the list of types is missing? Is this for fillna or another API?
[GitHub] spark issue #17683: [SPARK-20386][Spark Core]modify the log info if the bloc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17683 **[Test build #3672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3672/testReport)** for PR 17683 at commit [`664dfb8`](https://github.com/apache/spark/commit/664dfb8848c38826886430700bfa926116ad28bf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17688 We should also update the list of types a few lines up while we are fixing this. Thanks a lot for catching this, @vundela.
[GitHub] spark issue #17694: [SPARK-12717][PYSPARK] Resolving race condition with pys...
Github user vundela commented on the issue: https://github.com/apache/spark/pull/17694 Filed a PR to fix the issue in the Spark 1.6 branch.
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112764788

--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala ---

```
@@ -147,6 +153,17 @@ class KinesisSequenceRangeIterator(
   private var lastSeqNumber: String = null
   private var internalIterator: Iterator[Record] = null

+  // variable for kinesis wait time interval between next retry
+  private val kinesisWaitTimeMs = JavaUtils.timeStringAsMs(
+    Try {sparkConf.get("spark.streaming.kinesis.retry.waitTime")}
```

--- End diff --

This complexity isn't necessary. You can achieve the same effect by using an alternate form of [`SparkConf.get()`](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkConf@get(key:String,defaultValue:String):String):

```scala
private val kinesisWaitTimeMs = JavaUtils.timeStringAsMs(
  sparkConf.get("spark.streaming.kinesis.retry.waitTime", MIN_RETRY_WAIT_TIME_MS))
```
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112764350

--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala ---

```
@@ -112,7 +116,8 @@ class KinesisBackedBlockRDD[T: ClassTag](
     val credentials = kinesisCreds.provider.getCredentials
     partition.seqNumberRanges.ranges.iterator.flatMap { range =>
       new KinesisSequenceRangeIterator(credentials, endpointUrl, regionName,
-        range, retryTimeoutMs).map(messageHandler)
+        range, retryTimeoutMs, sparkConf
+      ).map(messageHandler)
```

--- End diff --

*nit:* Move this to the end of the previous line
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112765111

--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala ---

```
@@ -17,21 +17,24 @@
 package org.apache.spark.streaming.kinesis

-import scala.collection.JavaConverters._
-import scala.reflect.ClassTag
-import scala.util.control.NonFatal
-
-import com.amazonaws.auth.{AWSCredentials, DefaultAWSCredentialsProviderChain}
+import com.amazonaws.auth.AWSCredentials
 import com.amazonaws.services.kinesis.AmazonKinesisClient
 import com.amazonaws.services.kinesis.clientlibrary.types.UserRecord
 import com.amazonaws.services.kinesis.model._
-
 import org.apache.spark._
 import org.apache.spark.internal.Logging
+import org.apache.spark.network.util.JavaUtils
 import org.apache.spark.rdd.{BlockRDD, BlockRDDPartition}
 import org.apache.spark.storage.BlockId
 import org.apache.spark.util.NextIterator
+import scala.collection.JavaConverters._
```

--- End diff --

Why change the ordering of this import group? I don't think this is consistent with the scalastyle for this project.
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112765206

--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala ---

```
@@ -17,21 +17,24 @@
 package org.apache.spark.streaming.kinesis

-import scala.collection.JavaConverters._
-import scala.reflect.ClassTag
-import scala.util.control.NonFatal
-
-import com.amazonaws.auth.{AWSCredentials, DefaultAWSCredentialsProviderChain}
+import com.amazonaws.auth.AWSCredentials
 import com.amazonaws.services.kinesis.AmazonKinesisClient
 import com.amazonaws.services.kinesis.clientlibrary.types.UserRecord
 import com.amazonaws.services.kinesis.model._
-
```

--- End diff --

I think this newline should be kept to be consistent with the project's scalastyle. Have you been running style checks when testing this change?
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112766374

--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala ---

```
@@ -147,6 +153,17 @@ class KinesisSequenceRangeIterator(
   private var lastSeqNumber: String = null
   private var internalIterator: Iterator[Record] = null

+  // variable for kinesis wait time interval between next retry
+  private val kinesisWaitTimeMs = JavaUtils.timeStringAsMs(
+    Try {sparkConf.get("spark.streaming.kinesis.retry.waitTime")}
```

--- End diff --

It may also be useful to declare these keys as public constants in a sensible location such as the [companion object to `KinesisInputDStream`](https://github.com/apache/spark/blob/master/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala#L84), e.g.:

```scala
object KinesisInputDStream {
  ...
  /**
   * Relevant doc
   */
  val RETRY_WAIT_TIME_KEY = "spark.streaming.kinesis.retry.waitTime"

  /**
   * Relevant doc
   */
  val RETRY_MAX_ATTEMPTS_KEY = "spark.streaming.kinesis.retry.maxAttempts"
  ...
```

This will make things a little less brittle for users who want to dynamically fill in SparkConf values in their apps. You would also be able to use these constants in unit tests here.
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112766633

--- Diff: external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDDSuite.scala ---

```
@@ -101,6 +101,37 @@ abstract class KinesisBackedBlockRDDTests(aggregateTestData: Boolean)
   }
 }

+  testIfEnabled("Basic reading from Kinesis with modified configurations") {
+    // Add Kinesis retry configurations
+    sc.conf.set("spark.streaming.kinesis.retry.waitTime", "1000ms")
+    sc.conf.set("spark.streaming.kinesis.retry.maxAttempts", "5")
+
+    // Verify all data using multiple ranges in a single RDD partition
+    val receivedData1 = new KinesisBackedBlockRDD[Array[Byte]](sc, testUtils.regionName,
+      testUtils.endpointUrl, fakeBlockIds(1),
+      Array(SequenceNumberRanges(allRanges.toArray)),
+      sparkConf = sc.getConf).map { bytes => new String(bytes).toInt }.collect()
+    assert(receivedData1.toSet === testData.toSet)
+
+    // Verify all data using one range in each of the multiple RDD partitions
+    val receivedData2 = new KinesisBackedBlockRDD[Array[Byte]](sc, testUtils.regionName,
+      testUtils.endpointUrl, fakeBlockIds(allRanges.size),
+      allRanges.map { range => SequenceNumberRanges(Array(range)) }.toArray,
+      sparkConf = sc.getConf).map { bytes => new String(bytes).toInt }.collect()
+    assert(receivedData2.toSet === testData.toSet)
+
+    // Verify ordering within each partition
+    val receivedData3 = new KinesisBackedBlockRDD[Array[Byte]](sc, testUtils.regionName,
+      testUtils.endpointUrl, fakeBlockIds(allRanges.size),
+      allRanges.map { range => SequenceNumberRanges(Array(range)) }.toArray,
+      sparkConf = sc.getConf
+    ).map { bytes => new String(bytes).toInt }.collectPartitions()
```

--- End diff --

*nit:* move this to the end of the previous line
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112764808

--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala ---

```
@@ -147,6 +153,17 @@ class KinesisSequenceRangeIterator(
   private var lastSeqNumber: String = null
   private var internalIterator: Iterator[Record] = null

+  // variable for kinesis wait time interval between next retry
+  private val kinesisWaitTimeMs = JavaUtils.timeStringAsMs(
+    Try {sparkConf.get("spark.streaming.kinesis.retry.waitTime")}
+      .getOrElse(MIN_RETRY_WAIT_TIME_MS)
+  )
+
+  // variable for kinesis max retry attempts
+  private val kinesisMaxRetries =
+    Try {sparkConf.get("spark.streaming.kinesis.retry.maxAttempts")}
```

--- End diff --

See above
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112764344

--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala ---

@@ -83,7 +86,8 @@ class KinesisBackedBlockRDD[T: ClassTag](
   @transient private val isBlockIdValid: Array[Boolean] = Array.empty,
   val retryTimeoutMs: Int = 1,
   val messageHandler: Record => T = KinesisInputDStream.defaultMessageHandler _,
-  val kinesisCreds: SparkAWSCredentials = DefaultCredentials
+  val kinesisCreds: SparkAWSCredentials = DefaultCredentials,
+  val sparkConf: SparkConf = new SparkConf()

--- End diff --

Why does this need to be provided as a constructor parameter? You'll want to use the global ```SparkConf``` for the context via ```sc.getConf```. To avoid bringing ```sc``` into the serialized closure for the ```compute()``` method and raising an exception, you can alias it as a private field in this class:

```scala
private val sparkConf: SparkConf = sc.getConf
```
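[Editor's note] A minimal sketch of the aliasing pattern the reviewer describes (the class name is hypothetical; only the standard `RDD` auxiliary constructor is assumed). `SparkContext` is not serializable, so any closure that references `sc` directly would fail when Spark ships it to executors; copying the serializable `SparkConf` into a private field at construction time sidesteps this:

```scala
import scala.reflect.ClassTag

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch: an RDD that needs configuration values on executors.
abstract class ConfAwareRDD[T: ClassTag](sc: SparkContext)
  extends RDD[T](sc, Nil) {

  // Alias the conf as a private field. Code inside compute() can reference
  // `sparkConf` without dragging the non-serializable `sc` into the
  // serialized task closure.
  private val sparkConf: SparkConf = sc.getConf

  protected def waitTime: String =
    sparkConf.get("spark.streaming.kinesis.retry.waitTime", "100ms")
}
```

This is the same reason Spark internals routinely mark `SparkContext` fields `@transient` or copy the pieces they need into local vals before building a closure.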
[GitHub] spark issue #17713: [SPARK-20417][SQL] Move subquery error handling to check...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17713 **[Test build #76043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76043/testReport)** for PR 17713 at commit [`39e8cf7`](https://github.com/apache/spark/commit/39e8cf752f5bd3325edbb93e69ee09b92026242f).
[GitHub] spark issue #17672: [SPARK-20371][R] Add wrappers for collect_list and colle...
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/17672 Thanks @felixcheung
[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17688 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76042/
[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17688 Merged build finished. Test PASSed.