[GitHub] spark issue #14049: [SPARK-16369][MLlib] tallSkinnyQR of RowMatrix should aw...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14049 **[Test build #61932 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61932/consoleFull)** for PR 14049 at commit [`6705a38`](https://github.com/apache/spark/commit/6705a3861483ded60a1659b9045c111f06e1e0e5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14096: [SPARK-16425][R] `describe()` should consider numeric/st...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14096 Yeah we should just call it with empty columns (instead of all the columns) and let the Scala side do the appropriate thing.
[GitHub] spark issue #14093: SPARK-16420: Ensure compression streams are closed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14093 Merged build finished. Test PASSed.
[GitHub] spark issue #14093: SPARK-16420: Ensure compression streams are closed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14093 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61926/ Test PASSed.
[GitHub] spark issue #14093: SPARK-16420: Ensure compression streams are closed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14093 **[Test build #61926 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61926/consoleFull)** for PR 14093 at commit [`601f934`](https://github.com/apache/spark/commit/601f934372922b3b68424d3ef5a3cc81fd0f4500). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14096: [SPARK-16425][R] `describe()` should consider numeric/st...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14096 FYI, here is the result of the Scala example.

```scala
scala> val df = spark.read.json("examples/src/main/resources/people.json")

scala> df.withColumn("boolean", lit(true)).show()
+----+-------+-------+
| age|   name|boolean|
+----+-------+-------+
|null|Michael|   true|
|  30|   Andy|   true|
|  19| Justin|   true|
+----+-------+-------+

scala> df.withColumn("boolean", lit(true)).describe().show()
+-------+------------------+
|summary|               age|
+-------+------------------+
|  count|                 2|
|   mean|              24.5|
| stddev|7.7781745930520225|
|    min|                19|
|    max|                30|
+-------+------------------+
```
[GitHub] spark issue #14095: [SPARK-16429][SQL] Include `StringType` columns in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14095 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61929/ Test FAILed.
[GitHub] spark issue #14095: [SPARK-16429][SQL] Include `StringType` columns in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14095 Merged build finished. Test FAILed.
[GitHub] spark issue #14095: [SPARK-16429][SQL] Include `StringType` columns in Scala...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14095 **[Test build #61929 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61929/consoleFull)** for PR 14095 at commit [`df2edd7`](https://github.com/apache/spark/commit/df2edd730216e659dbcebdcbda61dd67fbcf8d55). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14096: [SPARK-16425][R] `describe()` should consider numeric/st...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14096 I mean `colList <- as.list(c(columns(x)))`. We should not do this.
[GitHub] spark pull request #13984: [SPARK-16310][SPARKR] R na.string-like default fo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13984
[GitHub] spark issue #14096: [SPARK-16425][R] `describe()` should consider numeric/st...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14096 Oh, I see your point. The difference comes from the all-column retrieval in SparkR. We can make this consistent with Scala/Python by removing that retrieval. That would be simpler!
[GitHub] spark issue #14082: [SPARK-16381][SQL][SparkR] Update SQL examples and progr...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14082 I'll take a look at this today. Also cc @felixcheung
[GitHub] spark issue #14096: [SPARK-16425][R] `describe()` should consider numeric/st...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14096 Here: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1922
[GitHub] spark issue #14096: [SPARK-16425][R] `describe()` should consider numeric/st...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14096 Currently, Scala/Python already do column-type checking for this.
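As a concrete illustration of that check, here is a minimal sketch of selecting only describable columns on the Scala side. The helper name is hypothetical and Spark's internals differ; SPARK-16429 is what adds string columns to the set.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{NumericType, StringType}

// Keep only the column types describe()/summary() can aggregate:
// numeric columns, plus string columns once SPARK-16429 lands.
def describableColumns(df: DataFrame): Seq[String] =
  df.schema.fields.collect {
    case f if f.dataType.isInstanceOf[NumericType] || f.dataType == StringType => f.name
  }.toSeq
```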
[GitHub] spark issue #13984: [SPARK-16310][SPARKR] R na.string-like default for csv s...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/13984 LGTM. Merging this to master and branch-2.0
[GitHub] spark pull request #13984: [SPARK-16310][SPARKR] R na.string-like default fo...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/13984#discussion_r69997940

```diff
--- Diff: R/pkg/R/SQLContext.R ---
@@ -744,6 +747,9 @@ read.df.default <- function(path = NULL, source = NULL, schema = NULL, ...) {
   if (is.null(source)) {
     source <- getDefaultSqlSource()
   }
+  if (source == "csv" && is.null(options[["nullValue"]])) {
```

--- End diff --

Yeah, this is the more conservative option - I guess that's fine for now and we can revisit this if required.
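Rendered as a hedged Scala sketch for readers skimming the thread (the helper name is hypothetical; the R diff defaults `nullValue` to `"NA"` for the `csv` source only, and only when the caller has not supplied one):

```scala
// Hypothetical mirror of the read.df.default change: an R-style na.strings
// default for csv that never overrides a user-supplied option.
def withCsvNullDefault(source: String, options: Map[String, String]): Map[String, String] =
  if (source == "csv" && !options.contains("nullValue")) {
    options + ("nullValue" -> "NA")
  } else {
    options
  }
```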
[GitHub] spark issue #14096: [SPARK-16425][R] `describe()` should consider numeric/st...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14096 This failure happens only in SparkR because SparkR blindly tries every column.
[GitHub] spark issue #14096: [SPARK-16425][R] `describe()` should consider numeric/st...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14096 **[Test build #61933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61933/consoleFull)** for PR 14096 at commit [`f0bd1d6`](https://github.com/apache/spark/commit/f0bd1d63f5aa4b1ad812a083563409308fab3d42).
[GitHub] spark issue #14066: [MINOR] [BUILD] Download Maven 3.3.9 instead of 3.3.3 be...
Github user lresende commented on the issue: https://github.com/apache/spark/pull/14066 The issue here is that releases keep getting archived as new releases come out. For old releases (or by default) we could use https://archive.apache.org/dist/maven/maven-3/, which is always available but puts a little more load on the "Apache Infrastructure". If you think we should move to using the archive, I could provide a patch...
[GitHub] spark issue #14096: [SPARK-16425][R] `describe()` should consider numeric/st...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14096 I'm not sure this is something we should be fixing just in the R frontend. What happens when we run the query from Scala/Python? If we get the same error, should we be fixing it in Scala?
[GitHub] spark pull request #14071: [SPARK-16397][SQL] make CatalogTable more general...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14071#discussion_r69997365

```diff
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -403,17 +400,18 @@ object CreateDataSourceTableUtils extends Logging {
       assert(partitionColumns.isEmpty)
       assert(relation.partitionSchema.isEmpty)

+      var storage = CatalogStorageFormat(
+        locationUri = None,
```

--- End diff --

Any reason why this `locationUri` is set to `None`? It sounds like the original value is `Some(relation.location.paths.map(_.toUri.toString).head)`.
[GitHub] spark issue #14096: [SPARK-16425][R] `describe()` should consider numeric/st...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14096 Hi, @shivaram. Could you review this PR?
[GitHub] spark pull request #14096: [SPARK-16425][R] `describe()` should consider num...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/14096

[SPARK-16425][R] `describe()` should consider numeric/string-type columns

## What changes were proposed in this pull request?

This PR prevents errors when `summary(df)` is called on a `SparkDataFrame` with columns that are neither numeric nor string. This failure happens only in `SparkR`.

**Before**
```r
> df <- createDataFrame(faithful)
> df <- withColumn(df, "boolean", df$waiting==79)
> summary(df)
16/07/07 14:15:16 ERROR RBackendHandler: describe on 34 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: cannot resolve 'avg(`boolean`)' due to data type mismatch: function average requires numeric types, not BooleanType;
```

**After**
```r
> df <- createDataFrame(faithful)
> df <- withColumn(df, "boolean", df$waiting==79)
> summary(df)
SparkDataFrame[summary:string, eruptions:string, waiting:string]
```

## How was this patch tested?

Pass the Jenkins tests with an updated test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-16425

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14096.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14096

commit f0bd1d63f5aa4b1ad812a083563409308fab3d42 Author: Dongjoon Hyun Date: 2016-07-07T21:57:59Z [SPARK-16425][R] `describe()` should consider numeric/string-type columns
[GitHub] spark issue #14094: [SPARK-16430][SQL][STREAMING] Add option maxFilesPerTrig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14094 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61927/ Test FAILed.
[GitHub] spark issue #14094: [SPARK-16430][SQL][STREAMING] Add option maxFilesPerTrig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14094 Merged build finished. Test FAILed.
[GitHub] spark issue #14094: [SPARK-16430][SQL][STREAMING] Add option maxFilesPerTrig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14094 **[Test build #61927 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61927/consoleFull)** for PR 14094 at commit [`ddd9426`](https://github.com/apache/spark/commit/ddd9426281e743af205f2a3f56be3535cd584b2d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14022: [SPARK-16272][core] Allow config values to reference con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14022 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61924/ Test PASSed.
[GitHub] spark issue #14022: [SPARK-16272][core] Allow config values to reference con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14022 Merged build finished. Test PASSed.
[GitHub] spark issue #14022: [SPARK-16272][core] Allow config values to reference con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14022 **[Test build #61924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61924/consoleFull)** for PR 14022 at commit [`392bddc`](https://github.com/apache/spark/commit/392bddc57eaefb09c73902ea041f05705d9498aa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14079 Merged build finished. Test FAILed.
[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14079 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61931/ Test FAILed.
[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14079 **[Test build #61931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61931/consoleFull)** for PR 14079 at commit [`cf58374`](https://github.com/apache/spark/commit/cf5837410818dae093ef15617cb42336a14408db). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14049: [SPARK-16369][MLlib] tallSkinnyQR of RowMatrix should aw...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14049 **[Test build #61932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61932/consoleFull)** for PR 14049 at commit [`6705a38`](https://github.com/apache/spark/commit/6705a3861483ded60a1659b9045c111f06e1e0e5).
[GitHub] spark pull request #14065: [SPARK-14743][YARN][WIP] Add a configurable token...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r69992687

```diff
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/token/ServiceTokenProvider.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn.token
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+import org.apache.hadoop.security.token.Token
+
+import org.apache.spark.SparkConf
+
+/**
+ * An interface to provide tokens for service, any service wants to communicate with Spark
+ * through token way needs to implement this interface and register into
+ * [[ConfigurableTokenManager]] through configurations.
+ */
+trait ServiceTokenProvider {
+
+  /**
+   * Name of the ServiceTokenProvider, should be unique. Using this to distinguish different
+   * service.
+   */
+  def serviceName: String
+
+  /**
+   * Used to indicate whether a token is required.
+   */
+  def isTokenRequired(conf: Configuration): Boolean = {
+    UserGroupInformation.isSecurityEnabled
+  }
+
+  /**
+   * Obtain tokens from this service, tokens will be added into Credentials and return as array.
+   */
+  def obtainTokensFromService(
```

--- End diff --

If you follow Tom's suggestion and turn this into a generic "obtainCredentials" method, then you could potentially merge it with `getTimeFromNowToRenewal` too. e.g. the provider is responsible for adding the tokens to the `Credentials` object, and it returns when it should be called again to renew those tokens (or obtain new credentials). One less method in the interface!
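A hedged sketch of the merged interface being suggested here: the provider itself adds tokens to the `Credentials` object and reports when it should be invoked again. All names and signatures below are illustrative, not the PR's actual API.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials
import org.apache.spark.SparkConf

// Illustrative single-method provider: one call both obtains credentials
// and returns the next renewal time in epoch millis, or None when the
// credentials never need renewal.
trait ServiceCredentialProvider {
  def serviceName: String
  def credentialsRequired(hadoopConf: Configuration): Boolean
  def obtainCredentials(
      sparkConf: SparkConf,
      hadoopConf: Configuration,
      creds: Credentials): Option[Long]
}
```

Folding the renewal calculation into `obtainCredentials` keeps service-specific logic (such as HDFS's interval lookup) inside the provider that needs it.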
[GitHub] spark issue #14092: [SPARK-16419][SQL] EnsureRequirements adds extra Sort to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14092 **[Test build #3169 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3169/consoleFull)** for PR 14092 at commit [`b4b02bf`](https://github.com/apache/spark/commit/b4b02bf3879daf9a4532b61a019ea33b0f3ff835). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14079 **[Test build #61931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61931/consoleFull)** for PR 14079 at commit [`cf58374`](https://github.com/apache/spark/commit/cf5837410818dae093ef15617cb42336a14408db).
[GitHub] spark issue #14095: [SPARK-16429][SQL] Include `StringType` columns in Scala...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14095 **[Test build #61930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61930/consoleFull)** for PR 14095 at commit [`b6673cb`](https://github.com/apache/spark/commit/b6673cb9ba1e9b5095ceaee8343aac08cc9aea5c).
[GitHub] spark issue #14095: [SPARK-16429][SQL] Include `StringType` columns in Scala...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14095 Thank you for the fast review, @rxin. I updated it.
[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user squito commented on the issue: https://github.com/apache/spark/pull/14079 I took another look at having BlacklistTracker just be an option, rather than having a NoopBlacklist. After some other cleanup, I decided it made more sense to go back to the option, but it's in one commit so it's easy to go either way: https://github.com/apache/spark/pull/14079/commits/a34e9aeb695958c749d306595d1adebe0207fdf9
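For readers weighing the two designs mentioned above, a toy sketch of the trade-off; the names are illustrative, not the PR's actual blacklist API. With an `Option` the call sites make the feature's absence explicit, while a no-op implementation keeps them unconditional.

```scala
// Illustrative only - not the PR's BlacklistTracker interface.
trait Blacklist {
  def isNodeBlacklisted(node: String): Boolean
}

// Option-based wiring: "feature off" is visible at every call site.
class OptionScheduler(blacklist: Option[Blacklist]) {
  def nodeUsable(node: String): Boolean =
    !blacklist.exists(_.isNodeBlacklisted(node))
}

// Noop-based wiring: call sites stay unconditional.
object NoopBlacklist extends Blacklist {
  def isNodeBlacklisted(node: String): Boolean = false
}
class NoopScheduler(blacklist: Blacklist = NoopBlacklist) {
  def nodeUsable(node: String): Boolean =
    !blacklist.isNodeBlacklisted(node)
}
```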
[GitHub] spark issue #14092: [SPARK-16419][SQL] EnsureRequirements adds extra Sort to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14092 **[Test build #3168 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3168/consoleFull)** for PR 14092 at commit [`b4b02bf`](https://github.com/apache/spark/commit/b4b02bf3879daf9a4532b61a019ea33b0f3ff835). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14095: [SPARK-16429][SQL] Include `StringType` columns i...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14095#discussion_r69991546

```diff
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -228,6 +228,15 @@ class Dataset[T] private[sql](
     }
   }

+  private[sql] def aggregatableColumns: Seq[Expression] = {
```

--- End diff --

That would be better.
[GitHub] spark pull request #14065: [SPARK-14743][YARN][WIP] Add a configurable token...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r69991485 (the quote repeats the `ServiceTokenProvider.scala` hunk above and continues as follows)

```diff
+  def obtainTokensFromService(
+      sparkConf: SparkConf,
+      serviceConf: Configuration,
+      creds: Credentials): Array[Token[_]]
+}
+
+/**
+ * An interface for service in which token can be renewable, any [[ServiceTokenProvider]] in which
+ * token can be renewable should also implement this interface, Spark's internal time-based
+ * token renewal mechanism will invoke the methods to update the tokens periodically.
+ */
+trait ServiceTokenRenewable {
+
+  /**
+   * Get the token renewal interval from this service. This renewal interval will be used in
+   * periodical token renewal mechanism.
+   */
+  def getTokenRenewalInterval(sparkConf: SparkConf, serviceConf: Configuration): Long
+
+  /**
+   * Get the time length from now to next renewal.
+   */
+  def getTimeFromNowToRenewal(
```

--- End diff --

You only really need this method in the interface, right? The token provider should know what info it needs to calculate this value. It might not even need `getTokenRenewalInterval` for that (HDFS does, but then that logic should live inside the HDFS provider). At that point, you could just merge both interfaces and have this method return an `Option` (None = no renewal necessary), or some magic value (e.g. `-1`) to indicate no renewal.
[GitHub] spark pull request #14065: [SPARK-14743][YARN][WIP] Add a configurable token...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r69990872 (commenting on the `serviceConf: Configuration` parameter of `obtainTokensFromService` in the `ServiceTokenProvider.scala` hunk quoted above)

Note the name here is misleading. It won't be the service's conf, but really a `YarnConfiguration`. Note how both Hive and HBase providers have to load their own configuration to be able to see service-specific settings.
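For context, the pattern vanzin refers to looks roughly like the following: the provider receives the Hadoop/YARN configuration and layers its own service's config files on top of it. This is a hedged sketch, assuming the `hbase-common` dependency; the helper name is hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration

// HBaseConfiguration.create loads hbase-default.xml and hbase-site.xml on
// top of the passed-in conf, making service-specific settings visible.
def hbaseServiceConf(yarnConf: Configuration): Configuration =
  HBaseConfiguration.create(yarnConf)
```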
[GitHub] spark pull request #14065: [SPARK-14743][YARN][WIP] Add a configurable token...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r69990265

```diff
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/token/ConfigurableTokenManager.scala ---
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn.token
+
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.security.Credentials
+import org.apache.hadoop.security.token.Token
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * A [[ConfigurableTokenManager]] to manage all the token providers register in this class. Also
+ * it provides other modules the functionality to obtain tokens, get token renewal interval and
+ * calculate the time length till next renewal.
+ *
+ * By default ConfigurableTokenManager has 3 built-in token providers, HDFSTokenProvider,
+ * HiveTokenProvider and HBaseTokenProvider, and this 3 token providers can also be controlled
+ * by configuration spark.yarn.security.tokens.{service}.enabled, if it is set to false, this
+ * provider will not be loaded.
+ *
+ * For other token providers which need to be loaded in should:
+ * 1. Implement [[ServiceTokenProvider]] or [[ServiceTokenRenewable]] if token renewal is
+ * required for this service.
+ * 2. set spark.yarn.security.tokens.{service}.enabled to true
+ * 3. Specify the class name through spark.yarn.security.tokens.{service}.class
+ *
+ */
+final class ConfigurableTokenManager private[yarn] (sparkConf: SparkConf) extends Logging {
+  private val tokenProviderEnabledConfig = "spark\\.yarn\\.security\\.tokens\\.(.+)\\.enabled".r
+  private val tokenProviderClsConfig = "spark.yarn.security.tokens.%s.class"
+
+  // Maintain all the registered token providers
+  private val tokenProviders = mutable.HashMap[String, ServiceTokenProvider]()
+
+  private val defaultTokenProviders = Map(
+    "hdfs" -> "org.apache.spark.deploy.yarn.token.HDFSTokenProvider",
+    "hive" -> "org.apache.spark.deploy.yarn.token.HiveTokenProvider",
+    "hbase" -> "org.apache.spark.deploy.yarn.token.HBaseTokenProvider"
+  )
+
+  // AMDelegationTokenRenewer, this will only be create and started in the AM
+  private var _delegationTokenRenewer: AMDelegationTokenRenewer = null
+
+  // ExecutorDelegationTokenUpdater, this will only be created and started in the driver and
+  // executor side.
+  private var _delegationTokenUpdater: ExecutorDelegationTokenUpdater = null
+
+  def initialize(): Unit = {
+    // Copy SparkConf and add default enabled token provider configurations to SparkConf.
+    val clonedConf = sparkConf.clone
+    defaultTokenProviders.keys.foreach { key =>
+      clonedConf.setIfMissing(s"spark.yarn.security.tokens.$key.enabled", "true")
+    }
+
+    // Instantialize all the service token providers according to the configurations.
+    clonedConf.getAll.filter { case (key, value) =>
+      if (tokenProviderEnabledConfig.findPrefixOf(key).isDefined) {
+        value.toBoolean
+      } else {
+        false
+      }
+    }.map { case (key, _) =>
+      val tokenProviderEnabledConfig(service) = key
+      val cls = sparkConf.getOption(tokenProviderClsConfig.format(service))
+        .orElse(defaultTokenProviders.get(service))
+      (service, cls)
+    }.foreach { case (service, cls) =>
+      if (cls.isDefined) {
+        try {
+          val tokenProvider =
+            Utils.classForName(cls.get).newInstance().asInstanceOf[ServiceTokenProvider]
+          tokenProviders += (service -> tokenProvider)
+        } catch {
+          case NonFatal(e) =>
+            logWarning(s"Fail to instantiate class ${cls.get}", e)
+        }
```
[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14080#discussion_r69989978

```diff
--- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
@@ -93,18 +113,34 @@ protected void handleMessage(
         client.getClientId(),
         NettyUtils.getRemoteAddress(client.getChannel()));
       callback.onSuccess(new StreamHandle(streamId, msg.blockIds.length).toByteBuffer());
+      transferBlockRate.mark(totalBlockSize / 1024 / 1024);
+      responseDelayContext.stop();
     } else if (msgObj instanceof RegisterExecutor) {
+      final Timer.Context responseDelayContext = timeDelayForRegisterExecutorRequest.time();
       RegisterExecutor msg = (RegisterExecutor) msgObj;
       checkAuth(client, msg.appId);
       blockManager.registerExecutor(msg.appId, msg.execId, msg.executorInfo);
       callback.onSuccess(ByteBuffer.wrap(new byte[0]));
+      responseDelayContext.stop();
     } else {
       throw new UnsupportedOperationException("Unexpected message: " + msgObj);
     }
   }

+  public MetricSet getAllMetrics() {
+    return metrics;
+  }
+
+  public long getRegisteredExecutorsSize() {
+    return blockManager.getRegisteredExecutorsSize();
+  }
+
+  public long getTotalShuffleRequests() {
+    return timeDelayForOpenBlockRequest.getCount() + timeDelayForOpenBlockRequest.getCount();
```

--- End diff --

Btw I don't think you need this metric, the client can easily derive it from the other metrics.
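To illustrate the point with the Codahale metrics classes the handler already uses: each `Timer` carries its own count, so an aggregate is derivable by the consumer rather than needing a dedicated getter. Sketch below with illustrative names; note the quoted getter also sums the open-block timer twice, where the register-executor timer was presumably intended.

```scala
import com.codahale.metrics.MetricRegistry

// Each Timer tracks its own invocation count; a combined total can be
// computed on demand instead of being exposed as a separate metric.
val registry = new MetricRegistry
val openBlockLatency = registry.timer("openBlockRequestLatencyMillis")
val registerExecutorLatency = registry.timer("registerExecutorRequestLatencyMillis")

def totalShuffleRequests: Long =
  openBlockLatency.getCount + registerExecutorLatency.getCount
```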
[GitHub] spark pull request #14065: [SPARK-14743][YARN][WIP] Add a configurable token...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r69989897 (the quoted hunk is identical to the `ConfigurableTokenManager.scala` diff shown in the previous comment)
[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/14080 Thanks for adding these metrics. Could you also add some unit tests to sanity check these metrics are recorded as expected, e.g. as in https://github.com/apache/spark/pull/13934/files
[GitHub] spark pull request #13984: [SPARK-16310][SPARKR] R na.string-like default fo...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13984#discussion_r69989660

```diff
--- Diff: R/pkg/R/SQLContext.R ---
@@ -744,6 +747,9 @@ read.df.default <- function(path = NULL, source = NULL, schema = NULL, ...) {
   if (is.null(source)) {
     source <- getDefaultSqlSource()
   }
+  if (source == "csv" && is.null(options[["nullValue"]])) {
```

--- End diff --

Possibly. I wonder if we should be conservative - since the data source API is extensible - perhaps a new source's `nullValue` could cause an unexpected behavior change?
[GitHub] spark pull request #14065: [SPARK-14743][YARN][WIP] Add a configurable token...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r69989547 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/token/ConfigurableTokenManager.scala --- @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn.token + +import scala.collection.mutable +import scala.util.control.NonFatal + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.security.Credentials +import org.apache.hadoop.security.token.Token + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.internal.Logging +import org.apache.spark.util.Utils + +/** + * A [[ConfigurableTokenManager]] to manage all the token providers register in this class. Also + * it provides other modules the functionality to obtain tokens, get token renewal interval and + * calculate the time length till next renewal. + * + * By default ConfigurableTokenManager has 3 built-in token providers, HDFSTokenProvider, + * HiveTokenProvider and HBaseTokenProvider, and this 3 token providers can also be controlled + * by configuration spark.yarn.security.tokens.{service}.enabled, if it is set to false, this + * provider will not be loaded. + * + * For other token providers which need to be loaded in should: + * 1. Implement [[ServiceTokenProvider]] or [[ServiceTokenRenewable]] if token renewal is + * required for this service. + * 2. set spark.yarn.security.tokens.{service}.enabled to true + * 3. Specify the class name through spark.yarn.security.tokens.{service}.class + * + */ +final class ConfigurableTokenManager private[yarn] (sparkConf: SparkConf) extends Logging { + private val tokenProviderEnabledConfig = "spark\\.yarn\\.security\\.tokens\\.(.+)\\.enabled".r + private val tokenProviderClsConfig = "spark.yarn.security.tokens.%s.class" + + // Maintain all the registered token providers + private val tokenProviders = mutable.HashMap[String, ServiceTokenProvider]() + + private val defaultTokenProviders = Map( +"hdfs" -> "org.apache.spark.deploy.yarn.token.HDFSTokenProvider", +"hive" -> "org.apache.spark.deploy.yarn.token.HiveTokenProvider", +"hbase" -> "org.apache.spark.deploy.yarn.token.HBaseTokenProvider" + ) + + // AMDelegationTokenRenewer, this will only be create and started in the AM + private var _delegationTokenRenewer: AMDelegationTokenRenewer = null + + // ExecutorDelegationTokenUpdater, this will only be created and started in the driver and + // executor side. + private var _delegationTokenUpdater: ExecutorDelegationTokenUpdater = null + + def initialize(): Unit = { --- End diff -- A lot of this method would go away by using `java.util.ServiceLoader`. 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14080#discussion_r69989502
--- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -143,4 +179,26 @@ private void checkAuth(TransportClient client, String appId) { } } + /** + * A simple class to wrap all shuffle service wrapper metrics + */ + private class ShuffleMetrics implements MetricSet { +private final Map<String, Metric> allMetrics; +private final Timer timeDelayForOpenBlockRequest = new Timer(); +private final Timer timeDelayForRegisterExecutorRequest = new Timer(); +private final Meter transferBlockRate = new Meter(); --- End diff --
Can you add comments describing the metrics and their units (e.g. bytes, milliseconds)? Also consider renaming them for clarity; I think `openBlockRequestLatencyMillis`, `registerExecutorRequestLatencyMillis`, and `blockTransferRateBytes` would be clearer to the reader.
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14065: [SPARK-14743][YARN][WIP] Add a configurable token...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r69989212 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/token/ConfigurableTokenManager.scala --- @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn.token + +import scala.collection.mutable +import scala.util.control.NonFatal + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.security.Credentials +import org.apache.hadoop.security.token.Token + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.internal.Logging +import org.apache.spark.util.Utils + +/** + * A [[ConfigurableTokenManager]] to manage all the token providers register in this class. Also + * it provides other modules the functionality to obtain tokens, get token renewal interval and + * calculate the time length till next renewal. + * + * By default ConfigurableTokenManager has 3 built-in token providers, HDFSTokenProvider, + * HiveTokenProvider and HBaseTokenProvider, and this 3 token providers can also be controlled + * by configuration spark.yarn.security.tokens.{service}.enabled, if it is set to false, this + * provider will not be loaded. + * + * For other token providers which need to be loaded in should: + * 1. Implement [[ServiceTokenProvider]] or [[ServiceTokenRenewable]] if token renewal is + * required for this service. + * 2. set spark.yarn.security.tokens.{service}.enabled to true + * 3. Specify the class name through spark.yarn.security.tokens.{service}.class + * + */ +final class ConfigurableTokenManager private[yarn] (sparkConf: SparkConf) extends Logging { + private val tokenProviderEnabledConfig = "spark\\.yarn\\.security\\.tokens\\.(.+)\\.enabled".r + private val tokenProviderClsConfig = "spark.yarn.security.tokens.%s.class" + + // Maintain all the registered token providers + private val tokenProviders = mutable.HashMap[String, ServiceTokenProvider]() + + private val defaultTokenProviders = Map( --- End diff -- I'd rather use `java.util.ServiceLoader` for this. You'll need something like that at some point anyway, to support other token providers. Doing that now has the extra benefit of using the same code for built-in and third party providers. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
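For reference, a minimal sketch of the `java.util.ServiceLoader` pattern vanzin describes; the `ServiceTokenProvider` trait and its `serviceName` method are stand-ins here, not Spark's actual interfaces.

```scala
import java.util.ServiceLoader
import scala.collection.JavaConverters._

// Stand-in for the provider interface. Implementations are listed in a
// META-INF/services/<fully.qualified.TraitName> file on the classpath.
trait ServiceTokenProvider {
  def serviceName: String
}

object TokenProviderLoader {
  // Built-in and third-party providers are discovered through the same
  // mechanism, which is the benefit being pointed out in the review.
  def loadProviders(): Map[String, ServiceTokenProvider] =
    ServiceLoader.load(classOf[ServiceTokenProvider]).asScala
      .map(provider => provider.serviceName -> provider)
      .toMap
}
```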
[GitHub] spark pull request #13993: [SPARK-16144][SPARKR] update R API doc for mllib
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13993#discussion_r69989200 --- Diff: R/pkg/R/mllib.R --- @@ -53,26 +53,27 @@ setClass("AFTSurvivalRegressionModel", representation(jobj = "jobj")) #' @note KMeansModel since 2.0.0 setClass("KMeansModel", representation(jobj = "jobj")) -#' Saves the machine learning model to the input path +#' Saves the MLlib model to the input path #' -#' Saves the machine learning model to the input path. For more information, see the specific -#' machine learning model below. +#' Saves the MLlib model to the input path. For more information, see the specific +#' MLlib model below. #' @rdname write.ml #' @name write.ml #' @export -#' @seealso \link{spark.glm}, \link{spark.kmeans}, \link{spark.naiveBayes}, \link{spark.survreg} +#' @seealso \link{spark.glm}, \link{glm} +#' @seealso \link{spark.kmeans}, \link{spark.naiveBayes}, \link{spark.survreg} #' @seealso \link{read.ml} NULL -#' Predicted values based on a machine learning model +#' Makes predictions from a MLlib model #' -#' Predicted values based on a machine learning model. For more information, see the specific -#' machine learning model below. +#' Makes predictions from a MLlib model. For more information, see the specific --- End diff -- Similarly, here, the plural form is the convention. Please see eg. https://github.com/apache/spark/pull/13993/files#diff-7ede1519b4a56647801b51af33c2dd18R81 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13993: [SPARK-16144][SPARKR] update R API doc for mllib
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13993#discussion_r69989003 --- Diff: R/pkg/R/mllib.R --- @@ -53,26 +53,27 @@ setClass("AFTSurvivalRegressionModel", representation(jobj = "jobj")) #' @note KMeansModel since 2.0.0 setClass("KMeansModel", representation(jobj = "jobj")) -#' Saves the machine learning model to the input path +#' Saves the MLlib model to the input path --- End diff -- I think the convention that has been suggested is that we have the page title being the same first sentence of the description? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14065: [SPARK-14743][YARN][WIP] Add a configurable token...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r69988867 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala --- @@ -96,237 +87,19 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil { if (credentials != null) credentials.getSecretKey(new Text(key)) else null } - /** - * Get the list of namenodes the user may access. - */ - def getNameNodesToAccess(sparkConf: SparkConf): Set[Path] = { -sparkConf.get(NAMENODES_TO_ACCESS) - .map(new Path(_)) - .toSet - } - - def getTokenRenewer(conf: Configuration): String = { -val delegTokenRenewer = Master.getMasterPrincipal(conf) -logDebug("delegation token renewer is: " + delegTokenRenewer) -if (delegTokenRenewer == null || delegTokenRenewer.length() == 0) { - val errorMessage = "Can't get Master Kerberos principal for use as renewer" - logError(errorMessage) - throw new SparkException(errorMessage) -} -delegTokenRenewer - } - - /** - * Obtains tokens for the namenodes passed in and adds them to the credentials. - */ - def obtainTokensForNamenodes( -paths: Set[Path], -conf: Configuration, -creds: Credentials, -renewer: Option[String] = None - ): Unit = { -if (UserGroupInformation.isSecurityEnabled()) { - val delegTokenRenewer = renewer.getOrElse(getTokenRenewer(conf)) - paths.foreach { dst => -val dstFs = dst.getFileSystem(conf) -logInfo("getting token for namenode: " + dst) -dstFs.addDelegationTokens(delegTokenRenewer, creds) - } -} - } - - /** - * Obtains token for the Hive metastore and adds them to the credentials. - */ - def obtainTokenForHiveMetastore( - sparkConf: SparkConf, - conf: Configuration, - credentials: Credentials) { -if (shouldGetTokens(sparkConf, "hive") && UserGroupInformation.isSecurityEnabled) { - YarnSparkHadoopUtil.get.obtainTokenForHiveMetastore(conf).foreach { -credentials.addToken(new Text("hive.server2.delegation.token"), _) - } -} - } - - /** - * Obtain a security token for HBase. - */ - def obtainTokenForHBase( - sparkConf: SparkConf, - conf: Configuration, - credentials: Credentials): Unit = { -if (shouldGetTokens(sparkConf, "hbase") && UserGroupInformation.isSecurityEnabled) { - YarnSparkHadoopUtil.get.obtainTokenForHBase(conf).foreach { token => -credentials.addToken(token.getService, token) -logInfo("Added HBase security token to credentials.") - } -} - } - - /** - * Return whether delegation tokens should be retrieved for the given service when security is - * enabled. By default, tokens are retrieved, but that behavior can be changed by setting - * a service-specific configuration. - */ - private def shouldGetTokens(conf: SparkConf, service: String): Boolean = { -conf.getBoolean(s"spark.yarn.security.tokens.${service}.enabled", true) - } - private[spark] override def startExecutorDelegationTokenRenewer(sparkConf: SparkConf): Unit = { -tokenRenewer = Some(new ExecutorDelegationTokenUpdater(sparkConf, conf)) -tokenRenewer.get.updateCredentialsIfRequired() +configurableTokenManager(sparkConf).delegationTokenUpdater(conf) --- End diff -- I find this syntax a little confusing. You're calling `configurableTokenManager(sparkConf)` in a bunch of different places. To me that looks like either: - each call is creating a new token manager - there's some cache of token managers somewhere keyed by the spark configuration passed here Neither sounds good to me. And the actual implementation is actually neither: there's a single token manager singleton that is instantiated in the first call to `configurableTokenManager`. 
Why doesn't `Client` instantiate a token manager in its constructor instead? Another option is to have an explicit method in `ConfigurableTokenManager` to initialize the singleton, although I'm not a fan of singletons in general. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
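A minimal sketch of the first alternative raised above, with stub types standing in for Spark's classes: the `Client` constructs and owns its token manager outright, so no global accessor is involved.

```scala
// Stubs only -- not Spark's real classes.
class SparkConf
class ConfigurableTokenManager(conf: SparkConf) {
  def initialize(): Unit = ()
}

class Client(sparkConf: SparkConf) {
  // The manager's lifetime is tied to the Client that needs it, and callers
  // can no longer appear to create a second instance through an accessor.
  private val tokenManager = new ConfigurableTokenManager(sparkConf)
  tokenManager.initialize()
}
```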
[GitHub] spark pull request #14095: [SPARK-16429][SQL] Include `StringType` columns i...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14095#discussion_r69988758
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -228,6 +228,15 @@ class Dataset[T] private[sql]( } } + private[sql] def aggregatableColumns: Seq[Expression] = { --- End diff --
`private` rather than `private[sql]`?
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
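For context, a small self-contained illustration of the difference (with stand-in class names, not Spark's): `private[sql]` opens access to everything under the `sql` package, while plain `private` restricts access to the defining class.

```scala
package org.apache.spark.sql {
  class Example {
    private[sql] def packageVisible: Int = 1 // callable anywhere under ...sql
    private def classOnly: Int = 2           // callable only inside Example
  }

  class Neighbor {
    val ok = new Example().packageVisible // compiles
    // new Example().classOnly            // would not compile
  }
}
```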
[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14080#discussion_r69988593 --- Diff: core/src/main/scala/org/apache/spark/deploy/ExternalShuffleServiceSource.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy + +import javax.annotation.concurrent.ThreadSafe + +import com.codahale.metrics.{Gauge, MetricRegistry} + +import org.apache.spark.metrics.source.Source +import org.apache.spark.network.shuffle.ExternalShuffleBlockHandler + +/** + * Provides metrics source for external shuffle service + */ +@ThreadSafe +private class ExternalShuffleServiceSource +(blockHandler: ExternalShuffleBlockHandler) extends Source { + override val metricRegistry = new MetricRegistry() + override val sourceName = "shuffleService" + + metricRegistry.registerAll(blockHandler.getAllMetrics) + + metricRegistry.register(MetricRegistry.name("registeredExecutorsSize"), --- End diff -- Rather than creating these metrics externally here, consider putting them inside the `metricSet`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
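A hedged sketch of what moving the gauge inside the `MetricSet` could look like with Codahale's API; the metric names follow the renaming suggested earlier in this review, and `registeredExecutorsSize` is passed in as a plain function for illustration.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}
import com.codahale.metrics.{Gauge, Metric, MetricSet, Timer}

class ShuffleMetrics(registeredExecutorsSize: () => Int) extends MetricSet {
  private val openBlockRequestLatencyMillis = new Timer()

  override def getMetrics: JMap[String, Metric] = {
    val metrics = new JHashMap[String, Metric]()
    metrics.put("openBlockRequestLatencyMillis", openBlockRequestLatencyMillis)
    // The gauge lives in the set too, so one registerAll(...) call on the
    // registry publishes everything; no external registration is needed.
    metrics.put("registeredExecutorsSize", new Gauge[Int] {
      override def getValue: Int = registeredExecutorsSize()
    })
    metrics
  }
}
```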
[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14080#discussion_r69988464 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -93,18 +113,34 @@ protected void handleMessage( client.getClientId(), NettyUtils.getRemoteAddress(client.getChannel())); callback.onSuccess(new StreamHandle(streamId, msg.blockIds.length).toByteBuffer()); + transferBlockRate.mark(totalBlockSize / 1024 / 1024); + responseDelayContext.stop(); } else if (msgObj instanceof RegisterExecutor) { + final Timer.Context responseDelayContext = timeDelayForRegisterExecutorRequest.time(); RegisterExecutor msg = (RegisterExecutor) msgObj; checkAuth(client, msg.appId); blockManager.registerExecutor(msg.appId, msg.execId, msg.executorInfo); callback.onSuccess(ByteBuffer.wrap(new byte[0])); + responseDelayContext.stop(); --- End diff -- Consider putting all `stop` calls in a finally block. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
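The try/finally shape being requested, as a small generic helper (a sketch, not the code that was merged):

```scala
import com.codahale.metrics.Timer

// Stops the timer context even when the wrapped handler code throws, so
// failed requests still count toward the latency metric.
def timed[T](timer: Timer)(body: => T): T = {
  val context = timer.time()
  try {
    body
  } finally {
    context.stop()
  }
}
```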
[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14080#discussion_r69988337 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -64,6 +75,10 @@ public ExternalShuffleBlockHandler(TransportConf conf, File registeredExecutorFi public ExternalShuffleBlockHandler( OneForOneStreamManager streamManager, ExternalShuffleBlockResolver blockManager) { +this.metrics = new ShuffleMetrics(); +this.timeDelayForOpenBlockRequest = metrics.timeDelayForOpenBlockRequest; +this.timeDelayForRegisterExecutorRequest = metrics.timeDelayForRegisterExecutorRequest; --- End diff -- It's a little confusing how this metric is duplicated as a class member. Would it work to just reference it through `metrics`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14065: [SPARK-14743][YARN][WIP] Add a configurable token...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r69988196 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -390,8 +390,9 @@ private[spark] class Client( // Upload Spark and the application JAR to the remote file system if necessary, // and add them as local resources to the application master. val fs = destDir.getFileSystem(hadoopConf) -val nns = YarnSparkHadoopUtil.get.getNameNodesToAccess(sparkConf) + destDir -YarnSparkHadoopUtil.get.obtainTokensForNamenodes(nns, hadoopConf, credentials) +hdfsTokenProvider(sparkConf).setNameNodesToAccess(sparkConf, Set(destDir)) --- End diff -- +1; it would be better if all interactions with token providers were done through the common interface; it seems like these HDFS-specific calls could easily be moved to the HDFS token provider. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13526: [SPARK-15780][SQL] Support mapValues on KeyValueG...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/13526#discussion_r69986596 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -312,6 +312,17 @@ class DatasetSuite extends QueryTest with SharedSQLContext { "a", "30", "b", "3", "c", "1") } + test("groupBy function, mapValues, flatMap") { +val ds = Seq(("a", 10), ("a", 20), ("b", 1), ("b", 2), ("c", 1)).toDS() --- End diff -- Just `.toDS`? (no brackets) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13526: [SPARK-15780][SQL] Support mapValues on KeyValueG...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/13526#discussion_r69986532 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -65,6 +65,46 @@ class KeyValueGroupedDataset[K, V] private[sql]( groupingAttributes) /** + * Returns a new [[KeyValueGroupedDataset]] where the given function has been applied to the + * data. The grouping key is unchanged by this. + * + * {{{ + * // Create values grouped by key from a Dataset[(K, V)] + * ds.groupByKey(_._1).mapValues(_._2) // Scala + * }}} + * @since 2.0.0 + */ + def mapValues[W: Encoder](func: V => W): KeyValueGroupedDataset[K, W] = { +val withNewData = AppendColumns(func, dataAttributes, logicalPlan) +val projected = Project(withNewData.newColumns ++ groupingAttributes, withNewData) +val executed = sparkSession.sessionState.executePlan(projected) + +new KeyValueGroupedDataset( + encoderFor[K], + encoderFor[W], + executed, + withNewData.newColumns, + groupingAttributes) + } + + /** + * Returns a new [[KeyValueGroupedDataset]] where the given function has been applied to the + * data. The grouping key is unchanged by this. + * + * {{{ + * // Create Integer values grouped by String key from a Dataset> + * Dataset> ds = ...; + * KeyValueGroupedDataset grouped = + * ds.groupByKey(t -> t._1, Encoders.STRING()).mapValues(t -> t._2, Encoders.INT()); // Java 8 + * }}} + * @since 2.0.0 + */ + def mapValues[W](func: MapFunction[V, W], encoder: Encoder[W]): KeyValueGroupedDataset[K, W] = { +implicit val uEnc = encoder +mapValues{ (v: V) => func.call(v) } --- End diff -- A space before `{`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13526: [SPARK-15780][SQL] Support mapValues on KeyValueG...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/13526#discussion_r69986479 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -65,6 +65,46 @@ class KeyValueGroupedDataset[K, V] private[sql]( groupingAttributes) /** + * Returns a new [[KeyValueGroupedDataset]] where the given function has been applied to the + * data. The grouping key is unchanged by this. + * + * {{{ + * // Create values grouped by key from a Dataset[(K, V)] + * ds.groupByKey(_._1).mapValues(_._2) // Scala + * }}} + * @since 2.0.0 + */ + def mapValues[W: Encoder](func: V => W): KeyValueGroupedDataset[K, W] = { +val withNewData = AppendColumns(func, dataAttributes, logicalPlan) +val projected = Project(withNewData.newColumns ++ groupingAttributes, withNewData) +val executed = sparkSession.sessionState.executePlan(projected) + +new KeyValueGroupedDataset( + encoderFor[K], + encoderFor[W], + executed, + withNewData.newColumns, + groupingAttributes) + } + + /** + * Returns a new [[KeyValueGroupedDataset]] where the given function has been applied to the + * data. The grouping key is unchanged by this. + * + * {{{ + * // Create Integer values grouped by String key from a Dataset> + * Dataset> ds = ...; + * KeyValueGroupedDataset grouped = + * ds.groupByKey(t -> t._1, Encoders.STRING()).mapValues(t -> t._2, Encoders.INT()); // Java 8 + * }}} + * @since 2.0.0 --- End diff -- A new line before `@since`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13526: [SPARK-15780][SQL] Support mapValues on KeyValueG...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/13526#discussion_r69986420 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -65,6 +65,46 @@ class KeyValueGroupedDataset[K, V] private[sql]( groupingAttributes) /** + * Returns a new [[KeyValueGroupedDataset]] where the given function has been applied to the + * data. The grouping key is unchanged by this. + * + * {{{ + * // Create values grouped by key from a Dataset[(K, V)] + * ds.groupByKey(_._1).mapValues(_._2) // Scala + * }}} + * @since 2.0.0 + */ + def mapValues[W: Encoder](func: V => W): KeyValueGroupedDataset[K, W] = { +val withNewData = AppendColumns(func, dataAttributes, logicalPlan) +val projected = Project(withNewData.newColumns ++ groupingAttributes, withNewData) +val executed = sparkSession.sessionState.executePlan(projected) + +new KeyValueGroupedDataset( + encoderFor[K], + encoderFor[W], + executed, + withNewData.newColumns, + groupingAttributes) + } + + /** + * Returns a new [[KeyValueGroupedDataset]] where the given function has been applied to the --- End diff -- ...with the given function `func` applied to the data? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13526: [SPARK-15780][SQL] Support mapValues on KeyValueG...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/13526#discussion_r69986245
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -65,6 +65,46 @@ class KeyValueGroupedDataset[K, V] private[sql]( groupingAttributes) /** + * Returns a new [[KeyValueGroupedDataset]] where the given function has been applied to the + * data. The grouping key is unchanged by this. + * + * {{{ + * // Create values grouped by key from a Dataset[(K, V)] + * ds.groupByKey(_._1).mapValues(_._2) // Scala + * }}} + * @since 2.0.0 + */ + def mapValues[W: Encoder](func: V => W): KeyValueGroupedDataset[K, W] = { --- End diff --
...while here it is `W: Encoder`, with a space only after the `:`. Why the inconsistency?
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
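For readers unfamiliar with the syntax under discussion: both spellings denote a context bound and desugar to the same implicit parameter, so the comment is purely about consistent spacing. A self-contained illustration with `Ordering` standing in for `Encoder`:

```scala
// `A: Ordering` (or, equivalently, `A : Ordering`) is shorthand for an
// extra implicit parameter list:
def maxOf[A: Ordering](xs: Seq[A]): A = xs.max

// ...which the compiler rewrites to roughly this:
def maxOfDesugared[A](xs: Seq[A])(implicit ord: Ordering[A]): A = xs.max
```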
[GitHub] spark issue #14094: [SPARK-16430][SQL][STREAMING] Add option maxFilesPerTrig...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/14094 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14094: [SPARK-16430][SQL][STREAMING] Add option maxFiles...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14094#discussion_r69986165 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -45,6 +47,7 @@ class FileStreamSource( private val qualifiedBasePath = fs.makeQualified(new Path(path)) // can contains glob patterns private val metadataLog = new HDFSMetadataLog[Seq[String]](sparkSession, metadataPath) private var maxBatchId = metadataLog.getLatest().map(_._1).getOrElse(-1L) + private val maxFilesPerBatch = getMaxFilesPerBatch() --- End diff -- Maybe some scaladoc here about what this parameter does / its purpose. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13526: [SPARK-15780][SQL] Support mapValues on KeyValueG...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/13526#discussion_r69986179 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -175,6 +175,17 @@ object AppendColumns { encoderFor[U].namedExpressions, child) } + + def apply[T : Encoder, U : Encoder]( --- End diff -- Here you use `T : Encoder`, i.e. with spaces before and after `:` while... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14094: [SPARK-16430][SQL][STREAMING] Add option maxFiles...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/14094#discussion_r69985831 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -26,6 +27,7 @@ import org.apache.spark.internal.Logging import org.apache.spark.sql.{DataFrame, Dataset, SparkSession} import org.apache.spark.sql.execution.datasources.{CaseInsensitiveMap, DataSource, ListingFileCatalog, LogicalRelation} import org.apache.spark.sql.types.StructType +import org.apache.spark.util.Utils --- End diff -- Where is this used? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13980: [SPARK-16198] [MLlib] [ML] Change access level of...
Github user husseinhazimeh closed the pull request at: https://github.com/apache/spark/pull/13980 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14095: [SPARK-16429][SQL] Include `StringType` columns in Scala...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14095 **[Test build #61929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61929/consoleFull)** for PR 14095 at commit [`df2edd7`](https://github.com/apache/spark/commit/df2edd730216e659dbcebdcbda61dd67fbcf8d55). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r69985379 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -331,6 +331,24 @@ class FileStreamSourceSuite extends FileStreamSourceTest { } } + test("read from textfile") { +withTempDirs { case (src, tmp) => + val textStream = spark.readStream.textFile(src.getCanonicalPath) + val filtered = textStream.filter($"value" contains "keep") + + testStream(filtered)( +AddTextFileData("drop1\nkeep2\nkeep3", src, tmp), +CheckAnswer("keep2", "keep3"), +StopStream, +AddTextFileData("drop4\nkeep5\nkeep6", src, tmp), +StartStream(), --- End diff -- Just wondering why `()` are here while not for `StopStream`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
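A plausible answer, sketched with stand-in definitions (Spark's actual `StreamTest` actions may differ): `StopStream` would be a case object, while `StartStream` would be a case class whose parameters all have defaults, so it needs `()` to be instantiated.

```scala
sealed trait StreamAction
case object StopStream extends StreamAction
case class StartStream(triggerIntervalMs: Long = 0L) extends StreamAction

val actions: Seq[StreamAction] = Seq(
  StopStream,    // an object: no parentheses to write
  StartStream()  // a class with default arguments: () instantiates it
)
```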
[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r69985100 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo @Experimental def text(path: String): DataFrame = format("text").load(path) + /** + * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset + * contains a single string column named "value". + * + * If the directory structure of the text files contains partitioning information, those are + * ignored in the resulting Dataset. To include partitioning information as columns, use `text`. + * + * Each line in the text files is a new element in the resulting Dataset. For example: + * {{{ + * // Scala: + * spark.read.textFile("/path/to/spark/README.md") + * + * // Java: + * spark.read().textFile("/path/to/spark/README.md") + * }}} + * + * @param path input path + * @since 2.0.0 + */ + def textFile(path: String): Dataset[String] = { +if (userSpecifiedSchema.nonEmpty) { + throw new AnalysisException("User specified schema not supported with `textFile`") +} + text(path).select("value").as[String](sparkSession.implicits.newStringEncoder) --- End diff -- I'm surprised that `sparkSession.implicits.newStringEncoder` is required here? Why is `sparkSession.implicits._` not imported here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
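What the import-based alternative would look like, sketched as a free-standing helper against the batch reader for brevity (not the method in the diff):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

def textFile(sparkSession: SparkSession, path: String): Dataset[String] = {
  // With the session's implicits in scope, .as[String] resolves its Encoder
  // implicitly instead of naming newStringEncoder explicitly.
  import sparkSession.implicits._
  sparkSession.read.text(path).select("value").as[String]
}
```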
[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r69985212 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo @Experimental def text(path: String): DataFrame = format("text").load(path) + /** + * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset + * contains a single string column named "value". + * + * If the directory structure of the text files contains partitioning information, those are + * ignored in the resulting Dataset. To include partitioning information as columns, use `text`. + * + * Each line in the text files is a new element in the resulting Dataset. For example: + * {{{ + * // Scala: + * spark.read.textFile("/path/to/spark/README.md") + * + * // Java: + * spark.read().textFile("/path/to/spark/README.md") --- End diff -- s/read/readStream? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r69985195 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo @Experimental def text(path: String): DataFrame = format("text").load(path) + /** + * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset + * contains a single string column named "value". + * + * If the directory structure of the text files contains partitioning information, those are + * ignored in the resulting Dataset. To include partitioning information as columns, use `text`. + * + * Each line in the text files is a new element in the resulting Dataset. For example: + * {{{ + * // Scala: + * spark.read.textFile("/path/to/spark/README.md") --- End diff -- s/read/readStream? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r69984805 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo @Experimental def text(path: String): DataFrame = format("text").load(path) + /** + * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset + * contains a single string column named "value". + * + * If the directory structure of the text files contains partitioning information, those are + * ignored in the resulting Dataset. To include partitioning information as columns, use `text`. + * + * Each line in the text files is a new element in the resulting Dataset. For example: + * {{{ + * // Scala: + * spark.read.textFile("/path/to/spark/README.md") + * + * // Java: + * spark.read().textFile("/path/to/spark/README.md") + * }}} + * + * @param path input path + * @since 2.0.0 + */ + def textFile(path: String): Dataset[String] = { +if (userSpecifiedSchema.nonEmpty) { + throw new AnalysisException("User specified schema not supported with `textFile`") --- End diff -- user-specified --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r69984678 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo @Experimental def text(path: String): DataFrame = format("text").load(path) + /** + * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset + * contains a single string column named "value". + * + * If the directory structure of the text files contains partitioning information, those are + * ignored in the resulting Dataset. To include partitioning information as columns, use `text`. + * + * Each line in the text files is a new element in the resulting Dataset. For example: --- End diff -- s/element/record? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14095: [SPARK-16429][SQL] Include `StringType` columns i...
GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/14095

    [SPARK-16429][SQL] Include `StringType` columns in Scala/Python `describe()`

    ## What changes were proposed in this pull request?

    Currently, Spark's `describe` supports `StringType` columns when they are listed explicitly. However, Scala/Python `describe()` without arguments returns a dataset for only the numeric columns, while SparkR returns all columns. This PR includes `StringType` columns in Scala/Python `describe()`, i.e. `describe` without arguments.

    **Before**

    * Scala
    ```scala
    scala> spark.read.json("examples/src/main/resources/people.json").describe("age", "name").show()
    +-------+------------------+-------+
    |summary|               age|   name|
    +-------+------------------+-------+
    |  count|                 2|      3|
    |   mean|              24.5|   null|
    | stddev|7.7781745930520225|   null|
    |    min|                19|   Andy|
    |    max|                30|Michael|
    +-------+------------------+-------+

    scala> spark.read.json("examples/src/main/resources/people.json").describe().show()
    +-------+------------------+
    |summary|               age|
    +-------+------------------+
    |  count|                 2|
    |   mean|              24.5|
    | stddev|7.7781745930520225|
    |    min|                19|
    |    max|                30|
    +-------+------------------+
    ```

    * Python
    ```
    >>> spark.read.json("examples/src/main/resources/people.json").describe().show()
    +-------+------------------+
    |summary|               age|
    +-------+------------------+
    |  count|                 2|
    |   mean|              24.5|
    | stddev|7.7781745930520225|
    |    min|                19|
    |    max|                30|
    +-------+------------------+
    ```

    * R
    ```r
    > collect(describe(read.json("examples/src/main/resources/people.json")))
      summary                age    name
    1   count                  2       3
    2    mean               24.5    <NA>
    3  stddev 7.7781745930520225    <NA>
    4     min                 19    Andy
    5     max                 30 Michael
    ```

    **After**

    * Scala
    ```scala
    scala> spark.read.json("examples/src/main/resources/people.json").describe().show()
    +-------+------------------+-------+
    |summary|               age|   name|
    +-------+------------------+-------+
    |  count|                 2|      3|
    |   mean|              24.5|   null|
    | stddev|7.7781745930520225|   null|
    |    min|                19|   Andy|
    |    max|                30|Michael|
    +-------+------------------+-------+
    ```

    * Python
    ```
    >>> spark.read.json("examples/src/main/resources/people.json").describe().show()
    +-------+------------------+-------+
    |summary|               age|   name|
    +-------+------------------+-------+
    |  count|                 2|      3|
    |   mean|              24.5|   null|
    | stddev|7.7781745930520225|   null|
    |    min|                19|   Andy|
    |    max|                30|Michael|
    +-------+------------------+-------+
    ```

    * R

    SparkR is the same.

    ## How was this patch tested?

    Pass the Jenkins with an updated testcase.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-16429

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14095.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14095

commit df2edd730216e659dbcebdcbda61dd67fbcf8d55
Author: Dongjoon Hyun
Date: 2016-07-07T20:45:26Z

    [SPARK-16429][SQL] Include `StringType` columns in Scala/Python `describe()`

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14083: [SPARK-16406][SQL] Improve performance of Logical...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14083#discussion_r69984539
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -165,111 +169,99 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { def resolveQuoted( name: String, resolver: Resolver): Option[NamedExpression] = { -resolve(UnresolvedAttribute.parseAttributeName(name), output, resolver) + outputAttributeResolver.resolve(UnresolvedAttribute.parseAttributeName(name), resolver) } /** - * Resolve the given `name` string against the given attribute, returning either 0 or 1 match. - * - * This assumes `name` has multiple parts, where the 1st part is a qualifier - * (i.e. table name, alias, or subquery alias). - * See the comment above `candidates` variable in resolve() for semantics the returned data. + * Refreshes (or invalidates) any metadata/data cached in the plan recursively. */ - private def resolveAsTableColumn( - nameParts: Seq[String], - resolver: Resolver, - attribute: Attribute): Option[(Attribute, List[String])] = { -assert(nameParts.length > 1) -if (attribute.qualifier.exists(resolver(_, nameParts.head))) { - // At least one qualifier matches. See if remaining parts match. - val remainingParts = nameParts.tail - resolveAsColumn(remainingParts, resolver, attribute) -} else { - None -} + def refresh(): Unit = children.foreach(_.refresh()) +} + +/** + * Helper class for (LogicalPlan) attribute resolution. This class indexes attributes by their + * case-in-sensitive name, and checks potential candidates using the given Resolver. Both qualified --- End diff --
The `resolve` method takes a `Resolver` as its parameter. This allows us to use either case-sensitive or case-insensitive attribute resolution depending on the `Resolver` passed. The names of both classes are confusing and I might rename the `AttributeResolver` class to `AttributeIndex` or something like that... The `AttributeResolver` creates two indexes based on the lower-case (qualified) attribute name; we do an initial lookup based on the lower-case name, and then use the `Resolver` for the actual attribute selection. This allows us to do fast(er) and correct lookups.
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
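A minimal sketch of the two-step lookup described above, with stand-in types: candidates are pre-indexed by lower-cased name, and the passed-in `Resolver` makes the final, possibly case-sensitive, decision.

```scala
object AttributeIndexSketch {
  type Resolver = (String, String) => Boolean
  case class Attribute(name: String)

  class AttributeIndex(attributes: Seq[Attribute]) {
    // Built once: a cheap first-pass index keyed by lower-cased name.
    private val byLowerCaseName: Map[String, Seq[Attribute]] =
      attributes.groupBy(_.name.toLowerCase)

    // The Resolver confirms each candidate, so case-sensitive and
    // case-insensitive resolution share the same fast lookup path.
    def resolve(name: String, resolver: Resolver): Seq[Attribute] =
      byLowerCaseName.getOrElse(name.toLowerCase, Nil)
        .filter(candidate => resolver(candidate.name, name))
  }

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)
}
```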
[GitHub] spark issue #14094: [SPARK-16430][SQL][STREAMING] Add option maxFilesPerTrig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14094 **[Test build #61928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61928/consoleFull)** for PR 14094 at commit [`c591007`](https://github.com/apache/spark/commit/c591007452f2fe3b08f99db64a94d88384a9b101). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14087#discussion_r69984584 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo @Experimental def text(path: String): DataFrame = format("text").load(path) + /** + * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset --- End diff -- a text file? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14083 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61923/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14083 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14083 **[Test build #61923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61923/consoleFull)** for PR 14083 at commit [`c75ae8d`](https://github.com/apache/spark/commit/c75ae8d892ec46a18342235c39c7002402740b7d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14094: [SPARK-16430][SQL][STREAMING] Add option maxFilesPerTrig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14094 **[Test build #61927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61927/consoleFull)** for PR 14094 at commit [`ddd9426`](https://github.com/apache/spark/commit/ddd9426281e743af205f2a3f56be3535cd584b2d).
[GitHub] spark issue #14094: [SPARK-16430][SQL][STREAMING] Add option maxFilesPerTrig...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/14094 @marmbrus @zsxwing
[GitHub] spark pull request #14094: [SPARK-16430][SQL][STREAMING] Add option maxFiles...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/14094

[SPARK-16430][SQL][STREAMING] Add option maxFilesPerTrigger

## What changes were proposed in this pull request?

An option that caps how many files the file stream source reads per trigger enables rate limiting. It has the additional convenience that a static set of files can be used like a stream for testing, as it allows those files to be considered one at a time. This PR adds the option `maxFilesPerTrigger`.

## How was this patch tested?

New unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-16430

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14094.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14094

commit ddd9426281e743af205f2a3f56be3535cd584b2d
Author: Tathagata Das
Date: 2016-07-07T20:45:38Z

    Add option maxFilesPerTrigger
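For a sense of how such an option would be exercised, here is a hedged usage sketch. The option name `maxFilesPerTrigger` comes from the PR, but the format and path below are illustrative, and the option goes through the standard `DataStreamReader.option` call:

```scala
// Sketch: rate-limit a file stream to at most one new file per trigger.
// The stream format and input path here are illustrative only.
val lines = spark.readStream
  .format("text")
  .option("maxFilesPerTrigger", "1")
  .load("/data/incoming")
```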
[GitHub] spark pull request #14083: [SPARK-16406][SQL] Improve performance of Logical...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14083#discussion_r69982767

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -165,111 +169,99 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   def resolveQuoted(
       name: String,
       resolver: Resolver): Option[NamedExpression] = {
-    resolve(UnresolvedAttribute.parseAttributeName(name), output, resolver)
+    outputAttributeResolver.resolve(UnresolvedAttribute.parseAttributeName(name), resolver)
   }

   /**
-   * Resolve the given `name` string against the given attribute, returning either 0 or 1 match.
-   *
-   * This assumes `name` has multiple parts, where the 1st part is a qualifier
-   * (i.e. table name, alias, or subquery alias).
-   * See the comment above `candidates` variable in resolve() for semantics the returned data.
+   * Refreshes (or invalidates) any metadata/data cached in the plan recursively.
    */
-  private def resolveAsTableColumn(
-      nameParts: Seq[String],
-      resolver: Resolver,
-      attribute: Attribute): Option[(Attribute, List[String])] = {
-    assert(nameParts.length > 1)
-    if (attribute.qualifier.exists(resolver(_, nameParts.head))) {
-      // At least one qualifier matches. See if remaining parts match.
-      val remainingParts = nameParts.tail
-      resolveAsColumn(remainingParts, resolver, attribute)
-    } else {
-      None
-    }
+  def refresh(): Unit = children.foreach(_.refresh())
+}
+
+/**
+ * Helper class for (LogicalPlan) attribute resolution. This class indexes attributes by their
+ * case-in-sensitive name, and checks potential candidates using the given Resolver. Both qualified
--- End diff --

case-insensitive? When you say "the given Resolver", what do you mean by "Resolver"? Can we link to the type?
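A note for readers following the thread: in Catalyst, `Resolver` is a plain function type used to compare attribute names. The following is a paraphrased sketch of the relevant definitions; see the `org.apache.spark.sql.catalyst.analysis` package object for the authoritative versions:

```scala
// Paraphrased from the Catalyst analysis package object:
// a Resolver decides whether a requested name matches an attribute name.
type Resolver = (String, String) => Boolean

val caseInsensitiveResolution: Resolver = (a, b) => a.equalsIgnoreCase(b)
val caseSensitiveResolution: Resolver = (a, b) => a == b
```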
[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14080#discussion_r69981791

--- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
@@ -143,4 +179,26 @@ private void checkAuth(TransportClient client, String appId) {
     }
   }

+  /**
+   * A simple class to wrap all shuffle service wrapper metrics
+   */
+  private class ShuffleMetrics implements MetricSet {
+    private final Map<String, Metric> allMetrics;
+    private final Timer timeDelayForOpenBlockRequest = new Timer();
+    private final Timer timeDelayForRegisterExecutorRequest = new Timer();
+    private final Meter transferBlockRate = new Meter();
+
+    private ShuffleMetrics() {
+      allMetrics = new HashMap<>();
--- End diff --

Will it work with Java 7? I think Spark 2.0 will keep support for that version.
[GitHub] spark issue #14093: SPARK-16420: Ensure compression streams are closed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14093 **[Test build #61926 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61926/consoleFull)** for PR 14093 at commit [`601f934`](https://github.com/apache/spark/commit/601f934372922b3b68424d3ef5a3cc81fd0f4500).
[GitHub] spark issue #14088: [SPARK-16414] [YARN] Fix bugs for "Can not get user conf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14088 **[Test build #61925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61925/consoleFull)** for PR 14088 at commit [`55e66b2`](https://github.com/apache/spark/commit/55e66b21cdcd68861db0f1045186048c54b13153). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14088: [SPARK-16414] [YARN] Fix bugs for "Can not get user conf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14088 Merged build finished. Test FAILed.
[GitHub] spark issue #14088: [SPARK-16414] [YARN] Fix bugs for "Can not get user conf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14088 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61925/ Test FAILed.
[GitHub] spark issue #14088: [SPARK-16414] [YARN] Fix bugs for "Can not get user conf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14088 **[Test build #61925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61925/consoleFull)** for PR 14088 at commit [`55e66b2`](https://github.com/apache/spark/commit/55e66b21cdcd68861db0f1045186048c54b13153).
[GitHub] spark issue #14088: [SPARK-16414] [YARN] Fix bugs for "Can not get user conf...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14088 ok to test. shouldn't be hard to add a unit test.
[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer to rem...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13876 Thank you for the review and merging, @cloud-fan and @rxin.
[GitHub] spark issue #14092: [SPARK-16419][SQL] EnsureRequirements adds extra Sort to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14092 **[Test build #3169 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3169/consoleFull)** for PR 14092 at commit [`b4b02bf`](https://github.com/apache/spark/commit/b4b02bf3879daf9a4532b61a019ea33b0f3ff835).
[GitHub] spark issue #14092: [SPARK-16419][SQL] EnsureRequirements adds extra Sort to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14092 **[Test build #3168 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3168/consoleFull)** for PR 14092 at commit [`b4b02bf`](https://github.com/apache/spark/commit/b4b02bf3879daf9a4532b61a019ea33b0f3ff835).
[GitHub] spark issue #14022: [SPARK-16272][core] Allow config values to reference con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14022 **[Test build #61924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61924/consoleFull)** for PR 14022 at commit [`392bddc`](https://github.com/apache/spark/commit/392bddc57eaefb09c73902ea041f05705d9498aa).
[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14004 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61920/ Test PASSed.