[GitHub] spark issue #17715: [SPARK-20047][ML] Constrained Logistic Regression
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/17715 Many use-cases set the bounds as a constant instead of setting each dimension individually. Maybe we can add the following APIs. ```scala def setLowerBoundsOnIntercepts(bound: Double) ... ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
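The convenience being proposed — accept one scalar and broadcast it across every dimension — can be sketched in Python. The function name and shape arguments here are hypothetical stand-ins, not the actual Spark ML API:

```python
def set_lower_bounds_on_coefficients(bound, num_features, num_classes=1):
    """Expand a single scalar bound into a full bounds matrix.

    Hypothetical helper mirroring the proposed convenience setter:
    every coefficient receives the same lower bound, so the caller
    no longer builds the (num_classes x num_features) matrix by hand.
    """
    return [[bound] * num_features for _ in range(num_classes)]

# All coefficients bounded below by 0.0 across 2 classes and 3 features:
bounds = set_lower_bounds_on_coefficients(0.0, num_features=3, num_classes=2)
```

The scalar form is strictly sugar over the per-dimension setter: the expanded matrix can be fed to the existing matrix-valued API unchanged.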
[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17342 Merged build finished. Test PASSed.
[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17342 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76128/
[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17342 **[Test build #76128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76128/testReport)** for PR 17342 at commit [`fb1ee81`](https://github.com/apache/spark/commit/fb1ee811e12f05c5d31880e6d88f306148612c18). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17748: [SPARK-19812] YARN shuffle service fails to reloc...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17748#discussion_r113115850 --- Diff: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java --- @@ -363,25 +362,29 @@ protected File initRecoveryDb(String dbFileName) { // make sure to move all DBs to the recovery path from the old NM local dirs. // If another DB was initialized first just make sure all the DBs are in the same // location. - File newLoc = new File(_recoveryPath.toUri().getPath(), dbFileName); - if (!newLoc.equals(f)) { + Path newLoc = new Path(_recoveryPath, dbName); + Path copyFrom = new Path(f.toURI()); + if (!newLoc.equals(copyFrom)) { + logger.info("Moving " + copyFrom + " to: " + newLoc); try { - Files.move(f.toPath(), newLoc.toPath()); + // The move here needs to handle moving non-empty directories across NFS mounts + FileSystem fs = FileSystem.getLocal(_conf); + fs.rename(copyFrom, newLoc); --- End diff -- How much more expensive is this for non-NFS cases?
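The cost question can be framed against the generic pattern for this kind of move: attempt the cheap atomic rename first, and pay for a copy-then-delete only when the rename fails across mount boundaries. A standalone sketch in Python (not the YarnShuffleService code):

```python
import errno
import os
import shutil
import tempfile

def move_dir(src, dst):
    """Move a (possibly non-empty) directory, handling cross-mount moves.

    os.rename is a cheap atomic syscall within one filesystem; across
    mounts (e.g. onto an NFS-backed recovery path) it fails with EXDEV,
    and only then do we fall back to the more expensive copy-then-delete.
    The common same-filesystem case therefore costs no more than before.
    """
    try:
        os.rename(src, dst)
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        shutil.copytree(src, dst)
        shutil.rmtree(src)

# Same-filesystem demonstration: the fast rename path is taken.
base = tempfile.mkdtemp()
src = os.path.join(base, "recovery_old")
dst = os.path.join(base, "recovery_new")
os.makedirs(src)
open(os.path.join(src, "registeredExecutors.ldb"), "w").close()
move_dir(src, dst)
moved = os.path.exists(os.path.join(dst, "registeredExecutors.ldb"))
```

Whether Hadoop's `FileSystem.rename` on a `LocalFileSystem` takes an equally cheap path in the non-NFS case is exactly the question being asked in the review.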
[GitHub] spark issue #17756: [SPARK-20455][DOCS] Fix Broken Docker IT Docs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17756 Can one of the admins verify this patch?
[GitHub] spark pull request #17756: [SPARK-20455][DOCS] Fix Broken Docker IT Docs
GitHub user original-brownbear opened a pull request: https://github.com/apache/spark/pull/17756 [SPARK-20455][DOCS] Fix Broken Docker IT Docs ## What changes were proposed in this pull request? Just added the Maven `test` goal. ## How was this patch tested? No test needed, just a trivial documentation fix. You can merge this pull request into a Git repository by running: $ git pull https://github.com/original-brownbear/spark SPARK-20455 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17756.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17756 commit 01fdcea9c8a62e6800fcc73a7137672fbf77e2cd Author: Armin Braun Date: 2017-04-25T06:10:15Z [SPARK-20455][DOCS] Fix Broken Docker IT Docs
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17737 The point here is to fix the Python documentation and match it where there are mismatches in `bitwiseOR`, `bitwiseAND`, `bitwiseXOR`, `contains`, `asc` and `desc` among `functions.py`, `column.py`, `functions.scala` and `Column.scala`. I hope other extra changes do not hold off this PR.
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r113112450 --- Diff: python/pyspark/sql/column.py --- @@ -527,7 +583,7 @@ def _test(): .appName("sql.column tests")\ .getOrCreate() sc = spark.sparkContext -globs['sc'] = sc +globs['spark'] = spark globs['df'] = sc.parallelize([(2, 'Alice'), (5, 'Bob')]) \ --- End diff -- Maybe we could. I think this is not related to the Python documentation fix, BTW.
[GitHub] spark issue #17755: [SPARK-20239][CORE][2.1-backport] Improve HistoryServer'...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17755 CC @vanzin , this backport can be merged to branch 2.0 cleanly.
[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17680 @liancheng and @davies, if you are not sure of this way, I could simply avoid pushing down the filters in this case for now. Please let me know.
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user map222 commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r113110805 --- Diff: python/pyspark/sql/column.py --- @@ -527,7 +583,7 @@ def _test(): .appName("sql.column tests")\ .getOrCreate() sc = spark.sparkContext -globs['sc'] = sc +globs['spark'] = spark globs['df'] = sc.parallelize([(2, 'Alice'), (5, 'Bob')]) \ --- End diff -- Do you want to update the `globs['df']` definition to `spark.createDataFrame`?
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17737 Thank you for your review and approval @felixcheung, @zero323 and @map222.
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17737 **[Test build #76129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76129/testReport)** for PR 17737 at commit [`eaeb456`](https://github.com/apache/spark/commit/eaeb4564562272ae021fa1a7a8a083ccc56e5c33).
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r113109064 --- Diff: python/pyspark/sql/column.py --- @@ -288,8 +324,16 @@ def __iter__(self): >>> df.filter(df.name.endswith('ice$')).collect() [] """ +_contains_doc = """ +Contains the other element. Returns a boolean :class:`Column` based on a string match. + +:param other: string in line + +>>> df.filter(df.name.contains('o')).collect() +[Row(age=5, name=u'Bob')] +""" --- End diff -- Sure.
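The pattern in this diff — keeping the documentation in a module-level string and attaching it to a generated method — can be sketched in a few lines. This is a simplified stand-in, not the actual `column.py` machinery:

```python
_contains_doc = """Contains the other element.

Returns True when `other` occurs in the value (a simplified stand-in
for the boolean-Column-returning version in pyspark)."""

def _bin_op(name, doc):
    """Build a method that delegates to the named dunder and carries doc.

    Because the operator methods are generated, their docstrings must be
    supplied explicitly; module-level strings like _contains_doc keep
    them editable in one place.
    """
    def _(self, other):
        return getattr(self.value, name)(other)
    _.__doc__ = doc
    return _

class Cell:
    def __init__(self, value):
        self.value = value
    contains = _bin_op("__contains__", _contains_doc)
```

With this structure, `help(Cell.contains)` shows the shared docstring, and doctest examples embedded in it are picked up by the module's `_test()` harness.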
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r113109049 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala --- @@ -1008,7 +1009,7 @@ class Column(val expr: Expression) extends Logging { def cast(to: String): Column = cast(CatalystSqlParser.parseDataType(to)) /** - * Returns an ordering used in sorting. + * Returns a sort expression based on the descending order of the column. --- End diff -- Yea, that sounds good in a way, but the downside of adding examples is having to maintain them and keep them up to date. Let's leave them out here, as this PR targets fixing the Python documentation.
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r113108974 --- Diff: python/pyspark/sql/column.py --- @@ -251,15 +285,16 @@ def __iter__(self): # string methods _rlike_doc = """ -Return a Boolean :class:`Column` based on a regex match. +SQL RLIKE expression (LIKE with Regex). Returns a boolean :class:`Column` based on a regex --- End diff -- Let's leave it so that it indicates the regular expression is in SQL syntax. I would like to keep them identical in most cases to reduce the overhead when someone needs to sweep the documentation. It looks like there are a few places that need clarification. If this is something that has to be done, let's do it in another PR.
[GitHub] spark issue #17755: [SPARK-20239][CORE][2.1-backport] Improve HistoryServer'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17755 Merged build finished. Test PASSed.
[GitHub] spark issue #17755: [SPARK-20239][CORE][2.1-backport] Improve HistoryServer'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17755 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76127/
[GitHub] spark issue #17755: [SPARK-20239][CORE][2.1-backport] Improve HistoryServer'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17755 **[Test build #76127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76127/testReport)** for PR 17755 at commit [`2fc1525`](https://github.com/apache/spark/commit/2fc1525c4e0f55a684bc894403694fcfac8f878e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17751: [SPARK-20451] Filter out nested mapType datatypes...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17751
[GitHub] spark issue #17751: [SPARK-20451] Filter out nested mapType datatypes from s...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17751 thanks, merging to master/2.2/2.1/2.0!
[GitHub] spark pull request #17753: [SPARK-20453] Bump master branch version to 2.3.0...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17753
[GitHub] spark issue #17753: [SPARK-20453] Bump master branch version to 2.3.0-SNAPSH...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17753 Merging in master.
[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 It is. But there is no problem for a normal string literal. It causes a problem only if the string literal is used as a regex pattern string.
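The distinction can be reproduced with plain Python strings: the string-literal parser consumes one level of backslashes before the regex engine ever sees the pattern, which is invisible for ordinary text but changes a pattern's meaning if it happens one time too many. A standalone sketch, not Spark's SQL parser:

```python
import re

# What the user wrote (e.g. in SQL source): '\\d+'. After the literal
# parser unescapes it once, the regex engine sees the pattern \d+.
pattern_after_one_unescape = "\\d+"   # the three characters \ d +
assert re.fullmatch(pattern_after_one_unescape, "123")

# If an extra unescaping pass strips the backslash, the engine instead
# receives d+ -- a pattern for literal runs of the letter d, which
# matches something entirely different.
over_unescaped = "d+"
assert re.fullmatch(over_unescaped, "ddd")
assert re.fullmatch(over_unescaped, "123") is None
```

For a normal string value both forms print the same characters, so nothing is lost; only when the result is handed to a regex engine does the difference in unescaping depth become observable.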
[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/17467 @brkyvz - Added new changes that add: - A case class `KinesisReadConfigurations` that gathers all the kinesis read configs in a single place - A test class that passes the kinesis configs in `SparkConf`, which are then used to create the kinesis configs object in `KinesisInputDStream` and passed down to `KinesisBackedBlockRDD` - Docs improvement I also played with `PrivateMethodTester` but wasn't able to access the private function `KinesisSequenceRangeIterator#retryOrTimeout`, probably because of the generics used in the function. I used an alternative to fetch the RDDs directly and check the configs passed in there. I would still like to learn how to get the `retryOrTimeout` working, just out of interest. Adding the error below: ```
// KinesisSequenceRangeIterator # retryOrTimeout
val retryOrTimeoutMethod = PrivateMethod[Object]('retryOrTimeout) // <<<- Issue
val partitions = kinesisRDD.partitions.map {
  _.asInstanceOf[KinesisBackedBlockRDDPartition]
}.toSeq
seqNumRanges1.ranges.map { range =>
  val seqRangeIter = new KinesisSequenceRangeIterator(DefaultCredentials.provider.getCredentials,
    dummyEndpointUrl, dummyRegionName, range, kinesisRDD.kinesisReadConfigs)
  seqRangeIter.invokePrivate(retryOrTimeoutMethod("Passing custom message"))
}

- Kinesis read with custom configurations *** FAILED ***
  java.lang.IllegalArgumentException: Can't find a private method named: retryOrTimeout
  at org.scalatest.PrivateMethodTester$Invoker.invokePrivate(PrivateMethodTester.scala:247)
  at org.apache.spark.streaming.kinesis.KinesisStreamTests$$anonfun$7$$anonfun$apply$mcV$sp$13.apply(KinesisStreamSuite.scala:286)
  at org.apache.spark.streaming.kinesis.KinesisStreamTests$$anonfun$7$$anonfun$apply$mcV$sp$13.apply(KinesisStreamSuite.scala:281)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at org.apache.spark.streaming.kinesis.KinesisStreamTests$$anonfun$7.apply$mcV$sp(KinesisStreamSuite.scala:281)
  at org.apache.spark.streaming.kinesis.KinesisStreamTests$$anonfun$7.apply(KinesisStreamSuite.scala:237)
```
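For comparison, reflective access to a "private" method in Python runs into name mangling rather than compiler-generated names, and the workaround is similar in spirit: go through the decorated name the runtime actually stores. This is an illustrative analogy only, not a fix for the ScalaTest `PrivateMethod` lookup above:

```python
class SequenceRangeIterator:
    """Hypothetical stand-in for a class with a private retry helper."""

    def __retry_or_timeout(self, message):
        # Stored under the mangled name _SequenceRangeIterator__retry_or_timeout.
        return "retried: " + message

it = SequenceRangeIterator()

# Plain attribute lookup fails, much as the PrivateMethod lookup did:
try:
    it.__retry_or_timeout("msg")
    found_directly = True
except AttributeError:
    found_directly = False

# Lookup through the mangled name reaches the private method:
result = it._SequenceRangeIterator__retry_or_timeout("custom message")
```

In the ScalaTest case the analogous trick would be finding the name the compiler actually emitted for the private method, which generics and specialization can obscure.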
[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17342 **[Test build #76128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76128/testReport)** for PR 17342 at commit [`fb1ee81`](https://github.com/apache/spark/commit/fb1ee811e12f05c5d31880e6d88f306148612c18).
[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs
Github user weiqingy commented on a diff in the pull request: https://github.com/apache/spark/pull/17342#discussion_r113103389 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2606,4 +2607,19 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { case ae: AnalysisException => assert(ae.plan == null && ae.getMessage == ae.getSimpleMessage) } } + + test("SPARK-12868: Allow adding jars from hdfs ") { +val jarFromHdfs = "hdfs://doesnotmatter/test.jar" +val jarFromInvalidFs = "fffs://doesnotmatter/test.jar" + +// if 'hdfs' is not supported, MalformedURLException will be thrown +new URL(jarFromHdfs) +var exceptionThrown: Boolean = false --- End diff -- Thanks. PR has been updated.
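The shape of that test — accept a URL whose scheme has a registered handler, reject an unknown scheme — can be sketched with Python's standard library. This is a standalone sketch, not the Spark test; the scheme set is illustrative:

```python
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"file", "hdfs", "http", "https"}  # illustrative set

def check_jar_url(url):
    """Return url if its scheme has a handler; raise ValueError otherwise.

    Plays the role of `new URL(...)` in the test above, which throws
    MalformedURLException when no protocol handler is registered.
    """
    scheme = urlparse(url).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError("unsupported scheme: " + scheme)
    return url

check_jar_url("hdfs://doesnotmatter/test.jar")      # accepted

try:
    check_jar_url("fffs://doesnotmatter/test.jar")  # rejected
    exception_thrown = False
except ValueError:
    exception_thrown = True
```

As in the Scala test, the URLs need not resolve to a real file: only scheme registration is being exercised.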
[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 @holdenk The link you pasted is for the case of using a Scala closure to create the udf, while `registerJava` uses Java reflection to create the udf. This is what I use in `registerJava`: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L528 It returns Unit. Maybe it is possible to create `registerScala` to return a Scala udf, but it seems it is not possible for a Java udf.
[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @viirya I fixed the test as you asked, so please take a look when you get a chance. I'm having a little bit of trouble with my local spark build for some reason, but I'll do that other benchmark when it's resolved.
[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17736 Isn't the regex parsed as a string literal?
[GitHub] spark issue #17714: [SPARK-20428][Core]REST interface about 'v1/submissions/...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/17714 Help with code review. Thank you.
[GitHub] spark issue #17698: [SPARK-20403][SQL][Documentation]Modify the instructions...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/17698 @srowen the test has not started, could you help trigger it?
[GitHub] spark issue #17755: [SPARK-20239][CORE][2.1-backport] Improve HistoryServer'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17755 **[Test build #76127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76127/testReport)** for PR 17755 at commit [`2fc1525`](https://github.com/apache/spark/commit/2fc1525c4e0f55a684bc894403694fcfac8f878e).
[GitHub] spark pull request #17755: [SPARK-20239][CORE][2.1-backport] Improve History...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/17755 [SPARK-20239][CORE][2.1-backport] Improve HistoryServer's ACL mechanism Current SHS (Spark History Server) has two different ACLs: * ACL of the base URL, controlled by "spark.acls.enabled" or "spark.ui.acls.enabled". With this enabled, only users configured with "spark.admin.acls" (or group) or "spark.ui.view.acls" (or group), or the user who started SHS, can list all the applications; otherwise none can be listed. This also affects the REST APIs which list the summary of all apps and of one app. * Per-application ACL, controlled by "spark.history.ui.acls.enabled". With this enabled, only the history admin user and the user/group who ran this app can access the details of this app. With these two ACLs, we may encounter several unexpected behaviors: 1. If the base URL's ACL (`spark.acls.enable`) is enabled but user "A" has no view permission, user "A" cannot see the app list but can still access details of its own app. 2. If the base URL's ACL (`spark.acls.enable`) is disabled, then user "A" can download any application's event log, even one not run by user "A". 3. Changes to the Live UI's ACL affect the History UI's ACL, which shares the same conf file. The unexpected behaviors arise mainly because we have two different ACLs; ideally we should have only one to manage all. So to improve SHS's ACL mechanism, this PR proposes to: 1. Disable "spark.acls.enable" and only use "spark.history.ui.acls.enable" for the history server. 2. Check permission for the event-log download REST API. With this PR: 1. An admin user can see/download the list of all applications, as well as application details. 2. A normal user can see the list of all applications, but can only download and check the details of applications accessible to him. New UTs are added, also verified in a real cluster. CC tgravescs vanzin please help to review; this PR changes the semantics you did previously. Thanks a lot.
You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-20239-2.1-backport

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17755.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17755

commit 2fc1525c4e0f55a684bc894403694fcfac8f878e
Author: jerryshao
Date: 2017-04-25T01:18:59Z
[SPARK-20239][CORE] Improve HistoryServer's ACL mechanism

Author: jerryshao

Closes #17582 from jerryshao/SPARK-20239.

Change-Id: I65d5d0c5e5a76f08abbe2b7dd43a2e08d295f6b6
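The single-switch model described in the PR can be sketched in isolation. This is a hedged, standalone sketch of the intended rule, not the actual HistoryServer code; `AppInfo`, `canViewDetails`, and the admin-set parameter are illustrative stand-ins for the real classes and the history admin ACL machinery.

```scala
// With the base-URL ACL disabled for the SHS, a single switch
// ("spark.history.ui.acls.enable") guards per-application details:
// everyone may list applications, but only admins and the app's owner
// may open details or download event logs.
case class AppInfo(owner: String)

def canViewDetails(
    historyAclsEnabled: Boolean, // spark.history.ui.acls.enable
    admins: Set[String],         // history admin users (illustrative)
    user: String,
    app: AppInfo): Boolean = {
  !historyAclsEnabled || admins.contains(user) || user == app.owner
}

println(canViewDetails(historyAclsEnabled = true, Set("admin"), "admin", AppInfo("alice"))) // true
println(canViewDetails(historyAclsEnabled = true, Set("admin"), "bob", AppInfo("alice")))   // false: neither admin nor owner
println(canViewDetails(historyAclsEnabled = false, Set("admin"), "bob", AppInfo("alice")))  // true: ACLs disabled
```

Listing stays open to any user in both cases; only the details/download path goes through a check of this shape.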
[GitHub] spark issue #17753: [SPARK-20453] Bump master branch version to 2.3.0-SNAPSH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17753 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76122/ Test PASSed.
[GitHub] spark issue #17753: [SPARK-20453] Bump master branch version to 2.3.0-SNAPSH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17753 Merged build finished. Test PASSed.
[GitHub] spark issue #17753: [SPARK-20453] Bump master branch version to 2.3.0-SNAPSH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17753 **[Test build #76122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76122/testReport)** for PR 17753 at commit [`983f746`](https://github.com/apache/spark/commit/983f74659a310a970280ae3696ee40e244cf67a0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17750: [SPARK-4899][MESOS] Support for checkpointing on Coarse ...
Github user lhoss commented on the issue: https://github.com/apache/spark/pull/17750 It would be great to have this soon in 2.2.x (maybe even backported to 2.1.x); there are already many accepted reviews in https://github.com/metamx/spark/pull/26.
[GitHub] spark issue #17751: [SPARK-20451] Filter out nested mapType datatypes from s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17751 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76123/ Test PASSed.
[GitHub] spark issue #17751: [SPARK-20451] Filter out nested mapType datatypes from s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17751 Merged build finished. Test PASSed.
[GitHub] spark issue #17751: [SPARK-20451] Filter out nested mapType datatypes from s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17751 **[Test build #76123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76123/testReport)** for PR 17751 at commit [`b9dbb9c`](https://github.com/apache/spark/commit/b9dbb9c9515b2b53cc03e59935cca740a2a56f44).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17582 OK, let me try it, thanks.
[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76121/ Test PASSed.
[GitHub] spark issue #17698: [SPARK-20403][SQL][Documentation]Modify the instructions...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/17698 Can Jenkins test this?
[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17640 Merged build finished. Test PASSed.
[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17640 **[Test build #76121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76121/testReport)** for PR 17640 at commit [`331b781`](https://github.com/apache/spark/commit/331b781f4d396e4dcf981d3b50ba63e770ec9880).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17582 It would be good, but maybe the 2.1 backport will merge cleanly to 2.0.
[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17582 What about branch 2.0? Do we also need to backport to it, @vanzin?
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15009#discussion_r113092246

--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkAppHandle.java ---
@@ -95,7 +95,8 @@ public boolean isFinal() {
   void kill();

   /**
-   * Disconnects the handle from the application, without stopping it. After this method is called,
+   * Disconnects the handle from the application. If using {@link SparkLauncher#autoShutdown()}

--- End diff --

Sorry, I thought it was a Scala one. I just checked that it passes the doc build locally with the suggestion above.
[GitHub] spark pull request #17725: [SPARK-20435][CORE] More thorough redaction of se...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17725#discussion_r113091533

--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -404,6 +407,37 @@ class SparkSubmitSuite
     runSparkSubmit(args)
   }

+  test("launch simple application with spark-submit with redaction") {
+    val testDir = Utils.createTempDir()
+    testDir.deleteOnExit()
+    val testDirPath = new Path(testDir.getAbsolutePath())
+    val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+    val fileSystem = Utils.getHadoopFileSystem("/",
+      SparkHadoopUtil.get.newConfiguration(new SparkConf()))
+    try {
+      val args = Seq(
+        "--class", SimpleApplicationTest.getClass.getName.stripSuffix("$"),
+        "--name", "testApp",
+        "--master", "local",
+        "--conf", "spark.ui.enabled=false",
+        "--conf", "spark.master.rest.enabled=false",
+        "--conf", "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password",
+        "--conf", "spark.eventLog.enabled=true",
+        "--conf", "spark.eventLog.testing=true",
+        "--conf", s"spark.eventLog.dir=${testDirPath.toUri.toString}",
+        "--conf", "spark.hadoop.fs.defaultFS=unsupported://example.com",
+        unusedJar.toString)
+      runSparkSubmit(args)
+      val listStatuses = fileSystem.listStatus(testDirPath)

--- End diff --

s/listStatuses/something else. Use list, statuses, statusList, but "listStatuses" doesn't parse for me.
[GitHub] spark pull request #17725: [SPARK-20435][CORE] More thorough redaction of se...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17725#discussion_r113091328

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2606,8 +2606,22 @@ private[spark] object Utils extends Logging {
   }

   private def redact(redactionPattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] = {
+    // If the sensitive information regex matches with either the key or the value, redact the value.
+    // While the original intent was to only redact the value if the key matched with the regex,
+    // we've found that especially in verbose mode, the value of the property may contain sensitive
+    // information like so:
+    // "sun.java.command":"org.apache.spark.deploy.SparkSubmit ... \
+    //   --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ...
+    //
+    // And, in such cases, simply searching for the sensitive information regex in the key name is
+    // not sufficient. The values themselves have to be searched as well and redacted if matched.
+    // This does mean we may be accounting more false positives - for example, if the value of an
+    // arbitrary property contained the term 'password', we may redact the value from the UI and
+    // logs. In order to work around it, user would have to make the spark.redaction.regex property
+    // more specific.
    kvs.map { kv =>

--- End diff --

Since you're looking at values now... `.map { case (key, value) =>`
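The key-or-value matching described in that comment block, combined with the `case (key, value)` destructuring vanzin suggests, can be sketched outside of Spark. This is a minimal standalone sketch, not the code under review; the replacement text and the sample regex are illustrative assumptions.

```scala
// Sketch of key-or-value redaction: redact the value whenever the
// sensitive-info regex matches either the key or the value itself.
import scala.util.matching.Regex

val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"

def redact(redactionPattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] = {
  kvs.map { case (key, value) => // destructure instead of working on the raw tuple
    if (redactionPattern.findFirstIn(key).isDefined ||
        redactionPattern.findFirstIn(value).isDefined) {
      (key, REDACTION_REPLACEMENT_TEXT)
    } else {
      (key, value)
    }
  }
}

val props = Seq(
  // key matches the regex
  "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD" -> "secret_password",
  // key is innocuous, but the value leaks a password (the verbose-mode case)
  "sun.java.command" -> "org.apache.spark.deploy.SparkSubmit --conf spark.ssl.keyPassword=x",
  // neither side matches, so the pair passes through untouched
  "spark.app.name" -> "testApp")

redact("(?i)secret|password".r, props).foreach(println)
```

The false-positive trade-off the comment mentions is visible here: any value that merely contains "password" is redacted, and a user would need a more specific regex to avoid that.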
[GitHub] spark pull request #17725: [SPARK-20435][CORE] More thorough redaction of se...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17725#discussion_r113091235

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -252,11 +252,17 @@ private[spark] class EventLoggingListener(
   private[spark] def redactEvent(
       event: SparkListenerEnvironmentUpdate): SparkListenerEnvironmentUpdate = {
-    // "Spark Properties" entry will always exist because the map is always populated with it.
-    val redactedProps = Utils.redact(sparkConf, event.environmentDetails("Spark Properties"))
-    val redactedEnvironmentDetails = event.environmentDetails +
-      ("Spark Properties" -> redactedProps)
-    SparkListenerEnvironmentUpdate(redactedEnvironmentDetails)
+    // environmentDetails maps a string descriptor to a set of properties
+    // Similar to:
+    // "JVM Information" -> jvmInformation,
+    // "Spark Properties" -> sparkProperties,
+    // ...
+    // where jvmInformation, sparkProperties, etc. are sequence of tuples.
+    // We go through the various of properties and redact sensitive information from them.
+    val redactedProps = event.environmentDetails.map{

--- End diff --

`.map { case (name, props) =>`
[GitHub] spark pull request #17725: [SPARK-20435][CORE] More thorough redaction of se...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17725#discussion_r113091382

--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -404,6 +407,37 @@ class SparkSubmitSuite
     runSparkSubmit(args)
   }

+  test("launch simple application with spark-submit with redaction") {
+    val testDir = Utils.createTempDir()
+    testDir.deleteOnExit()
+    val testDirPath = new Path(testDir.getAbsolutePath())
+    val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+    val fileSystem = Utils.getHadoopFileSystem("/",
+      SparkHadoopUtil.get.newConfiguration(new SparkConf()))
+    try {
+      val args = Seq(
+        "--class", SimpleApplicationTest.getClass.getName.stripSuffix("$"),
+        "--name", "testApp",
+        "--master", "local",
+        "--conf", "spark.ui.enabled=false",
+        "--conf", "spark.master.rest.enabled=false",
+        "--conf", "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password",
+        "--conf", "spark.eventLog.enabled=true",
+        "--conf", "spark.eventLog.testing=true",
+        "--conf", s"spark.eventLog.dir=${testDirPath.toUri.toString}",
+        "--conf", "spark.hadoop.fs.defaultFS=unsupported://example.com",
+        unusedJar.toString)
+      runSparkSubmit(args)
+      val listStatuses = fileSystem.listStatus(testDirPath)
+      val logData = EventLoggingListener.openEventLog(listStatuses.last.getPath, fileSystem)
+      Source.fromInputStream(logData).getLines().foreach {

--- End diff --

`.foreach { line =>`
[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17754 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76126/ Test PASSed.
[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17754 Merged build finished. Test PASSed.
[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17754 **[Test build #76126 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76126/testReport)** for PR 17754 at commit [`771e490`](https://github.com/apache/spark/commit/771e490fd46c277479b4a06cfa6bb166d1f62856).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17582: [SPARK-20239][Core] Improve HistoryServer's ACL m...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17582
[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17582 No luck with 2.1, please file a separate PR.
[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17582 LGTM. Merging to master / 2.2, will try 2.1 and 2.0 too.
[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17222 @zjffdu - if you look at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L129 though, it returns the `UserDefinedFunction` (currently the Python one returns Unit, but it would be more useful if it returned a `UserDefinedFunction`). I think to make it easier for people to take advantage of Java UDFs, we would want them to be able to use it programmatically in the Dataframe DSL, not just in SQL string expressions. What do you think @gatorsmile & @zjffdu ?
[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17342#discussion_r113088010

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2606,4 +2607,19 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
       case ae: AnalysisException => assert(ae.plan == null && ae.getMessage == ae.getSimpleMessage)
     }
   }

+  test("SPARK-12868: Allow adding jars from hdfs ") {
+    val jarFromHdfs = "hdfs://doesnotmatter/test.jar"
+    val jarFromInvalidFs = "fffs://doesnotmatter/test.jar"
+
+    // if 'hdfs' is not supported, MalformedURLException will be thrown
+    new URL(jarFromHdfs)
+    var exceptionThrown: Boolean = false

--- End diff --

Replace this whole block with:
```
intercept[MalformedURLException] {
  new URL(jarFromInvalidFs)
}
```
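vanzin's `intercept` suggestion relies on ScalaTest; the same assertion can be expressed standalone with `scala.util.Try`. This is a sketch under that substitution, reusing the URL strings from the diff.

```scala
// An unknown URL scheme must fail to parse. In the suite this would be
// `intercept[MalformedURLException] { new URL(jarFromInvalidFs) }`;
// scala.util.Try stands in here so the snippet runs without ScalaTest.
import java.net.{MalformedURLException, URL}
import scala.util.{Failure, Try}

val jarFromInvalidFs = "fffs://doesnotmatter/test.jar"

Try(new URL(jarFromInvalidFs)) match {
  case Failure(_: MalformedURLException) =>
    println("fffs:// rejected as expected")
  case other =>
    sys.error(s"expected MalformedURLException, got $other")
}
```

Note that in a plain JVM, `new URL("hdfs://...")` would throw the same exception; the test's first line only passes once a URL stream handler that understands HDFS has been installed, which is what this PR is about.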
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/17540 The issue with the current `withNewExecutionId` is that it doesn't support nested QueryExecutions. I'm wondering if you can really fix this issue without introducing a regression, e.g., tracking the nested QueryExecutions and displaying them properly in the UI.
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/17540#discussion_r113087928

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
@@ -161,50 +161,51 @@ object FileFormatWriter extends Logging {
       }
     }

-    SQLExecution.withNewExecutionId(sparkSession, queryExecution) {
-      // This call shouldn't be put into the `try` block below because it only initializes and
-      // prepares the job, any exception thrown from here shouldn't cause abortJob() to be called.
-      committer.setupJob(job)
-
-      try {
-        val rdd = if (orderingMatched) {
-          queryExecution.toRdd
-        } else {
-          SortExec(
-            requiredOrdering.map(SortOrder(_, Ascending)),
-            global = false,
-            child = queryExecution.executedPlan).execute()
-        }
-        val ret = new Array[WriteTaskResult](rdd.partitions.length)
-        sparkSession.sparkContext.runJob(
-          rdd,
-          (taskContext: TaskContext, iter: Iterator[InternalRow]) => {
-            executeTask(
-              description = description,
-              sparkStageId = taskContext.stageId(),
-              sparkPartitionId = taskContext.partitionId(),
-              sparkAttemptNumber = taskContext.attemptNumber(),
-              committer,
-              iterator = iter)
-          },
-          0 until rdd.partitions.length,
-          (index, res: WriteTaskResult) => {
-            committer.onTaskCommit(res.commitMsg)
-            ret(index) = res
-          })
-
-        val commitMsgs = ret.map(_.commitMsg)
-        val updatedPartitions = ret.flatMap(_.updatedPartitions)
-          .distinct.map(PartitioningUtils.parsePathFragment)
-
-        committer.commitJob(job, commitMsgs)
-        logInfo(s"Job ${job.getJobID} committed.")
-        refreshFunction(updatedPartitions)
-      } catch { case cause: Throwable =>
-        logError(s"Aborting job ${job.getJobID}.", cause)
-        committer.abortJob(job)
-        throw new SparkException("Job aborted.", cause)
+    // During tests, make sure there is an execution ID.
+    SQLExecution.checkSQLExecutionId(sparkSession)

--- End diff --

To make SQL metrics work, we should always wrap the correct QueryExecution with `SparkListenerSQLExecutionStart`.
[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17658#discussion_r113087506

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/ApplicationEventListener.scala ---
    @@ -57,4 +58,10 @@ private[spark] class ApplicationEventListener extends SparkListener {
           adminAclsGroups = allProperties.get("spark.admin.acls.groups")
         }
       }
    +
    +  override def onOtherEvent(event:SparkListenerEvent):Unit = event match {
    --- End diff --
    
    nit: space after `:`
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15009#discussion_r113086978

    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/launcher/YarnCommandBuilderUtils.scala ---
    @@ -17,10 +17,11 @@
     
     package org.apache.spark.launcher
     
    -import scala.collection.JavaConverters._
    -import scala.collection.mutable.ListBuffer
     import scala.util.Properties
     
    +import org.apache.spark.SparkConf
    +
    --- End diff --
    
    nit: one too many empty lines.
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15009#discussion_r113086177

    --- Diff: core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java ---
    @@ -183,6 +183,28 @@ public void testChildProcLauncher() throws Exception {
         assertEquals(0, app.waitFor());
       }
     
    +  @Test
    +  public void testThreadLauncher() throws Exception {
    +    // This test is failed on Windows due to the failure of initiating executors
    +    // by the path length limitation. See SPARK-18718.
    +    assumeTrue(!Utils.isWindows());
    +
    +    launcher
    +      .setMaster("local")
    +      .setAppResource(SparkLauncher.NO_RESOURCE)
    +      .setConf(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS,
    +        "-Dfoo=bar -Dtest.appender=childproc")
    +      .setConf(SparkLauncher.DRIVER_EXTRA_CLASSPATH, System.getProperty("java.class.path"))
    +      .setMainClass(SparkLauncherTestApp.class.getName())
    +      .launchAsThread(true)
    +      .addAppArgs("proc");
    +    final Process app = launcher.launch();
    --- End diff --
    
    What is this testing? `launch()` will always launch a child process. Which indicates two problems:
    
    - this test is not testing anything that hasn't been tested before.
    - `SparkLauncher` should probably be throwing an error if you use `.launchAsThread(true)` and then call `launch()`.
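The fail-fast guard vanzin suggests could look like the sketch below. This is a hypothetical illustration, not Spark code: `launchAsThread` is a setter proposed in this PR and is not part of the released `SparkLauncher` API, and the toy `Launcher` class here only mimics the builder shape.

```java
// Hypothetical sketch of the suggested guard: if the caller opts into
// thread-based launching, a later call to launch() (which always forks a
// child process) should fail fast instead of silently ignoring the flag.
public class LauncherGuardSketch {

  static class Launcher {
    private boolean launchAsThread = false;

    Launcher launchAsThread(boolean value) {
      this.launchAsThread = value;
      return this;
    }

    /** Always forks a child process, so it is incompatible with launchAsThread(true). */
    String launch() {
      if (launchAsThread) {
        throw new IllegalStateException(
            "launch() starts a child process; use startApplication() when launchAsThread(true) is set");
      }
      return "child-process";
    }
  }

  public static void main(String[] args) {
    // Normal path: no flag, launch() proceeds.
    System.out.println(new Launcher().launch());

    // Misuse path: flag set, launch() fails fast.
    try {
      new Launcher().launchAsThread(true).launch();
      System.out.println("no error");
    } catch (IllegalStateException e) {
      System.out.println("rejected");
    }
  }
}
```

The point of throwing rather than logging is that a test like the quoted `testThreadLauncher` would then fail loudly instead of silently exercising the child-process path.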
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15009#discussion_r113086519

    --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
    @@ -488,11 +549,24 @@ public Process launch() throws IOException {
        * In all cases, the logger name will start with "org.apache.spark.launcher.app", to fit more
        * easily into the configuration of commonly-used logging systems.
        *
    +   * If the application is launched as a thread, the log redirection methods are not supported,
    +   * and the parent process's output and log configuration will be used.
    +   *
        * @since 1.6.0
        * @param listeners Listeners to add to the handle before the app is launched.
        * @return A handle for the launched application.
        */
       public SparkAppHandle startApplication(SparkAppHandle.Listener... listeners) throws IOException {
    +    if (launchAsThread) {
    +      checkArgument(builder.childEnv.isEmpty(),
    +        "Environment variables are not supported while launching as Thread");
    --- End diff --
    
    s/Environment variables/Custom environment variables
    s/as Thread/in a thread.
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15009#discussion_r113085844

    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -725,9 +722,15 @@ object SparkSubmit extends CommandLineUtils {
           printWarning("Subclasses of scala.App may not work correctly. Use a main() method instead.")
         }
     
    -    val mainMethod = mainClass.getMethod("main", new Array[String](0).getClass)
    -    if (!Modifier.isStatic(mainMethod.getModifiers)) {
    -      throw new IllegalStateException("The main method in the given main class must be static")
    +    val sparkAppMainMethod = mainClass.getMethods().find(_.getName == "sparkMain")
    +    val childSparkConf = sysProps.filter{ p => p._1.startsWith("spark.") }.toMap
    --- End diff --
    
    nit: again, please address the feedback that is given. I've lost count of how many times I've pointed out that there's a missing space between `filter` and `{`.
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15009#discussion_r113086823

    --- Diff: launcher/src/main/java/org/apache/spark/launcher/package-info.java ---
    @@ -21,12 +21,14 @@
      *
      * This library allows applications to launch Spark programmatically. There's only one entry
      * point to the library - the {@link org.apache.spark.launcher.SparkLauncher} class.
    + * Under YARN manager cluster mode, it supports launching in Application in thread or
    --- End diff --
    
    Delete this sentence. It's actually not correct on top of being a little confusing. You can launch any application in a child thread. What YARN cluster mode currently gives you is that it's safe to launch multiple applications as child threads in the same process.
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15009#discussion_r113087052

    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -201,6 +192,71 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         finalState should be (SparkAppHandle.State.FAILED)
       }
     
    +  test("monitor app running in thread using launcher library") {
    +    var handle : SparkAppHandle = null
    +    try {
    +      handle = launchSparkAppWithConf(true, false, "cluster")
    +      handle.stop()
    +
    +      eventually(timeout(30 seconds), interval(100 millis)) {
    +        handle.getState() should be (SparkAppHandle.State.KILLED)
    +      }
    +    } finally {
    +      handle.kill()
    +    }
    +  }
    +
    +  test("monitor app using launcher library for proc with auto shutdown") {
    +    var handle : SparkAppHandle = null
    +    try {
    +      handle = launchSparkAppWithConf(false, true, "cluster")
    +      handle.disconnect()
    +      val applicationId = ConverterUtils.toApplicationId(handle.getAppId)
    +      val yarnClient: YarnClient = getYarnClient
    +      eventually(timeout(30 seconds), interval(100 millis)) {
    +        handle.getState() should be (SparkAppHandle.State.LOST)
    +        var status = yarnClient.getApplicationReport(applicationId).getFinalApplicationStatus()
    +        status should be (FinalApplicationStatus.KILLED)
    +      }
    +    } finally {
    +      handle.kill()
    +    }
    +  }
    +
    +  test("monitor app using launcher library for thread with auto shutdown") {
    +    var handle : SparkAppHandle = null
    --- End diff --
    
    `val handle = launchSparkAppWithConf(true, true, "cluster")`
    
    Otherwise your `finally` block can throw an NPE. Also happens elsewhere.
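The NPE hazard vanzin flags can be shown in a few lines. This is a minimal sketch assuming a failing acquisition function; the `Handle` class and `acquireFailing` name are illustrative stand-ins for `SparkAppHandle` and `launchSparkAppWithConf`, not Spark's actual code.

```java
// Sketch of the try/finally NPE hazard: if the variable starts as null and
// acquisition throws inside the try, the finally block runs with a null
// reference. Calling handle.kill() there unconditionally would throw an NPE
// that masks the original failure. Acquiring the handle before entering the
// try (the reviewer's `val handle = ...` suggestion) removes this window.
public class FinallyNpeSketch {

  static class Handle {
    void kill() { /* cleanup */ }
  }

  // Fails during acquisition, as a launch helper might.
  static Handle acquireFailing() {
    throw new RuntimeException("launch failed");
  }

  static String buggyPattern() {
    Handle handle = null;
    try {
      handle = acquireFailing();
      return "ok";
    } catch (RuntimeException e) {
      return "caught: " + e.getMessage();
    } finally {
      if (handle == null) {
        // This is exactly the state the review warns about.
        System.out.println("finally saw null handle");
      } else {
        handle.kill();
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(buggyPattern());
  }
}
```

With the suggested `val handle = ...` form, acquisition failure happens before the try is entered, so the finally block never observes a null handle.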
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15009#discussion_r113086469

    --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
    @@ -107,6 +119,34 @@ public static void setConfig(String name, String value) {
         launcherConfig.put(name, value);
       }
     
    +
    +
    +  /**
    +   * Specifies that Spark Application be stopped if current process goes away.
    +   * It tries stop/kill Spark Application if launching process goes away.
    +   *
    +   * @since 2.2.0
    +   * @param autoShutdown Flag for shutdown Spark Application if launcher process goes away.
    +   * @return This launcher.
    +   */
    +  public SparkLauncher autoShutdown(boolean autoShutdown) {
    +    this.autoShutdown = autoShutdown;
    +    return this;
    +  }
    +
    +  /**
    +   * Specifies that Spark Submit be launched as a daemon thread. Please note
    +   * this feature is currently supported only for YARN cluster deployment mode.
    +   *
    +   * @since 2.2.0
    +   * @param launchAsThread Flag for launching app as a thread.
    --- End diff --
    
    "Whether to launch the Spark application in a new thread in the same process."
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15009#discussion_r113086407

    --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
    @@ -107,6 +119,34 @@ public static void setConfig(String name, String value) {
         launcherConfig.put(name, value);
       }
     
    +
    +
    +  /**
    +   * Specifies that Spark Application be stopped if current process goes away.
    +   * It tries stop/kill Spark Application if launching process goes away.
    +   *
    +   * @since 2.2.0
    +   * @param autoShutdown Flag for shutdown Spark Application if launcher process goes away.
    --- End diff --
    
    "Whether to shut down the Spark application if the launcher process goes away."
[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17754 **[Test build #76126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76126/testReport)** for PR 17754 at commit [`771e490`](https://github.com/apache/spark/commit/771e490fd46c277479b4a06cfa6bb166d1f62856).
[GitHub] spark issue #17752: [SPARK-20452][SS][Kafka]Fix a potential ConcurrentModifi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17752 Merged build finished. Test PASSed.
[GitHub] spark issue #17752: [SPARK-20452][SS][Kafka]Fix a potential ConcurrentModifi...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17752

    **[Test build #76124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76124/testReport)** for PR 17752 at commit [`b59573f`](https://github.com/apache/spark/commit/b59573f5ae827e7cb14757297d6bf092bd7f21aa).
    
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark issue #17752: [SPARK-20452][SS][Kafka]Fix a potential ConcurrentModifi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17752 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76124/ Test PASSed.
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15009#discussion_r113085569

    --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkAppHandle.java ---
    @@ -95,7 +95,8 @@ public boolean isFinal() {
       void kill();
     
       /**
    -   * Disconnects the handle from the application, without stopping it. After this method is called,
    +   * Disconnects the handle from the application. If using {@link SparkLauncher#autoShutdown()}
    --- End diff --
    
    That's because the method is `autoShutdown(boolean)` and not `autoShutdown()`.
[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...
Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17222#discussion_r113085517

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
    @@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
             case 21 => register(name, udf.asInstanceOf[UDF20[_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
             case 22 => register(name, udf.asInstanceOf[UDF21[_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
             case 23 => register(name, udf.asInstanceOf[UDF22[_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
    -        case n => logError(s"UDF class with ${n} type arguments is not supported ")
    +        case n =>
    +          throw new IOException(s"UDF class with ${n} type arguments is not supported.")
           }
         } catch {
           case e @ (_: InstantiationException | _: IllegalArgumentException) =>
    -        logError(s"Can not instantiate class ${className}, please make sure it has public non argument constructor")
    +        throw new IOException(s"Can not instantiate class ${className}, please make sure it has public non argument constructor")
         }
       }
     } catch {
    -  case e: ClassNotFoundException => logError(s"Can not load class ${className}, please make sure it is on the classpath")
    +  case e: ClassNotFoundException => throw new IOException(s"Can not load class ${className}, please make sure it is on the classpath")
     }
     
     /**
    + * Register a Java UDAF class using reflection, for use from pyspark
    + *
    + * @param name UDAF name
    + * @param className fully qualified class name of UDAF
    --- End diff --
    
    @since is needed for private function ?
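The behavioral change in the quoted diff, replacing `logError` with a thrown `IOException` so reflection failures surface to the caller, can be illustrated in isolation. This is a minimal sketch, not Spark's code: the class name and message wording loosely follow the diff, and `loadUdfClass` is a hypothetical helper.

```java
import java.io.IOException;

// Sketch of the error-surfacing change discussed above: instead of logging
// and silently continuing when a UDF class cannot be loaded via reflection,
// wrap the ClassNotFoundException in an IOException so the caller (e.g. the
// PySpark side of registerJavaFunction) actually sees the failure.
public class RegisterByReflectionSketch {

  static Class<?> loadUdfClass(String className) throws IOException {
    try {
      return Class.forName(className);
    } catch (ClassNotFoundException e) {
      throw new IOException(
          "Can not load class " + className + ", please make sure it is on the classpath", e);
    }
  }

  public static void main(String[] args) throws IOException {
    // A class that exists loads normally.
    System.out.println(loadUdfClass("java.lang.String").getName());

    // A missing class now raises instead of only logging.
    try {
      loadUdfClass("com.example.NoSuchUdf");
    } catch (IOException e) {
      System.out.println("surfaced: " + e.getMessage());
    }
  }
}
```

Passing the original exception as the cause preserves the reflection stack trace while still giving callers a single checked exception type to handle.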
[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17582 OK, thanks @tgravescs .
[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17754

    **[Test build #76125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76125/testReport)** for PR 17754 at commit [`dbff961`](https://github.com/apache/spark/commit/dbff96111fd00c2127afe2a46515efc163aa36b8).
    
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17754 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76125/ Test FAILed.
[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17754 Merged build finished. Test FAILed.
[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17754 **[Test build #76125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76125/testReport)** for PR 17754 at commit [`dbff961`](https://github.com/apache/spark/commit/dbff96111fd00c2127afe2a46515efc163aa36b8).
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17540#discussion_r113084795

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
    @@ -161,50 +161,51 @@ object FileFormatWriter extends Logging {
           }
         }
     
    -    SQLExecution.withNewExecutionId(sparkSession, queryExecution) {
    -      // This call shouldn't be put into the `try` block below because it only initializes and
    -      // prepares the job, any exception thrown from here shouldn't cause abortJob() to be called.
    -      committer.setupJob(job)
    -
    -      try {
    -        val rdd = if (orderingMatched) {
    -          queryExecution.toRdd
    -        } else {
    -          SortExec(
    -            requiredOrdering.map(SortOrder(_, Ascending)),
    -            global = false,
    -            child = queryExecution.executedPlan).execute()
    -        }
    -        val ret = new Array[WriteTaskResult](rdd.partitions.length)
    -        sparkSession.sparkContext.runJob(
    -          rdd,
    -          (taskContext: TaskContext, iter: Iterator[InternalRow]) => {
    -            executeTask(
    -              description = description,
    -              sparkStageId = taskContext.stageId(),
    -              sparkPartitionId = taskContext.partitionId(),
    -              sparkAttemptNumber = taskContext.attemptNumber(),
    -              committer,
    -              iterator = iter)
    -          },
    -          0 until rdd.partitions.length,
    -          (index, res: WriteTaskResult) => {
    -            committer.onTaskCommit(res.commitMsg)
    -            ret(index) = res
    -          })
    -
    -        val commitMsgs = ret.map(_.commitMsg)
    -        val updatedPartitions = ret.flatMap(_.updatedPartitions)
    -          .distinct.map(PartitioningUtils.parsePathFragment)
    -
    -        committer.commitJob(job, commitMsgs)
    -        logInfo(s"Job ${job.getJobID} committed.")
    -        refreshFunction(updatedPartitions)
    -      } catch { case cause: Throwable =>
    -        logError(s"Aborting job ${job.getJobID}.", cause)
    -        committer.abortJob(job)
    -        throw new SparkException("Job aborted.", cause)
    +    // During tests, make sure there is an execution ID.
    +    SQLExecution.checkSQLExecutionId(sparkSession)
    --- End diff --
    
    The major issue is this change.
For all queries using FileFormatWriter, we won't get any metrics because of https://github.com/apache/spark/blob/7536e2849df6d63587fbf16b4ecb5db06fed7125/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L139 . It creates a new QueryExecution and we don't track it.
[GitHub] spark pull request #17754: [FollowUp][SPARK-18901][ML]: Require in LR Logist...
GitHub user wangmiao1981 opened a pull request:

    https://github.com/apache/spark/pull/17754

    [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggregator is redundant

    ## What changes were proposed in this pull request?
    
    This is a follow-up PR of #17478.
    
    ## How was this patch tested?
    
    Existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangmiao1981/spark followup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17754.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17754

----
commit dbff96111fd00c2127afe2a46515efc163aa36b8
Author: wangmiao1981
Date: 2017-04-25T00:11:08Z

    remove extra require check
[GitHub] spark issue #17625: [SPARK-9103][WIP] Add Memory Tracking UI and track Netty...
Github user jsoltren commented on the issue: https://github.com/apache/spark/pull/17625 This PR was closed, so, I'll create a new one focusing on just the back end pieces. I'll create a fresh JIRA for more general memory tracking improvements to the UI where we can hash out more of the details. The UI has changed quite a lot since the original PR!
[GitHub] spark issue #17752: [SPARK-20452][SS][Kafka]Fix a potential ConcurrentModifi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17752 **[Test build #76124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76124/testReport)** for PR 17752 at commit [`b59573f`](https://github.com/apache/spark/commit/b59573f5ae827e7cb14757297d6bf092bd7f21aa).
[GitHub] spark issue #17478: [SPARK-18901][ML]:Require in LR LogisticAggregator is re...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/17478 @yanboliang I will do it. Thanks!
[GitHub] spark issue #17751: [SPARK-20451] Filter out nested mapType datatypes from s...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17751 LGTM pending Jenkins
[GitHub] spark issue #17666: [SPARK-20311][SQL] Support aliases for table value funct...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17666 @hvanhovell ping
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17711 @hvanhovell ping
[GitHub] spark issue #17751: [SPARK-20451] Filter out nested mapType datatypes from s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17751 **[Test build #76123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76123/testReport)** for PR 17751 at commit [`b9dbb9c`](https://github.com/apache/spark/commit/b9dbb9c9515b2b53cc03e59935cca740a2a56f44).
[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17222 Will review this PR more carefully in the next few days.
[GitHub] spark pull request #17751: [SPARK-20451] Filter out nested mapType datatypes...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/17751#discussion_r113082123

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1726,15 +1726,23 @@ class Dataset[T] private[sql](
     // It is possible that the underlying dataframe doesn't guarantee the ordering of rows in its
     // constituent partitions each time a split is materialized which could result in
     // overlapping splits. To prevent this, we explicitly sort each input partition to make the
-    // ordering deterministic.
-    // MapType cannot be sorted.
-    val sorted = Sort(logicalPlan.output.filterNot(_.dataType.isInstanceOf[MapType])
-      .map(SortOrder(_, Ascending)), global = false, logicalPlan)
+    // ordering deterministic. Note that MapTypes cannot be sorted and are explicitly pruned out
+    // from the sort order.
+    val sortOrder = logicalPlan.output
+      .filterNot(_.dataType.existsRecursively(dt => dt.isInstanceOf[MapType]))
+      .map(SortOrder(_, Ascending))
+    val plan = if (sortOrder.nonEmpty) {
+      Sort(sortOrder, global = false, logicalPlan)
+    } else {
+      // SPARK-12662: If sort order is empty, we materialize the dataset to guarantee determinism
--- End diff --

We actually discussed materialization in https://issues.apache.org/jira/browse/SPARK-12662, so that ticket should provide direct context.
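The `existsRecursively` pruning discussed in the diff above can be illustrated with a small self-contained sketch. Note this is not Spark's actual `DataType` hierarchy, just a miniature stand-in ADT showing why a recursive check (rather than a top-level `isInstanceOf[MapType]`) is needed to catch maps nested inside structs:

```scala
// Miniature stand-in for Spark's DataType ADT (hypothetical, simplified).
sealed trait DataType {
  // Default: the predicate is only tested against this node.
  def existsRecursively(p: DataType => Boolean): Boolean = p(this)
}
case object IntType extends DataType
case object StringType extends DataType
case class MapType(key: DataType, value: DataType) extends DataType {
  override def existsRecursively(p: DataType => Boolean): Boolean =
    p(this) || key.existsRecursively(p) || value.existsRecursively(p)
}
case class StructType(fields: Seq[DataType]) extends DataType {
  override def existsRecursively(p: DataType => Boolean): Boolean =
    p(this) || fields.exists(_.existsRecursively(p))
}

// Columns whose type contains a MapType anywhere (even nested inside a
// struct) are excluded from the deterministic sort order.
def sortableColumns(schema: Seq[(String, DataType)]): Seq[String] =
  schema.collect {
    case (name, dt) if !dt.existsRecursively(_.isInstanceOf[MapType]) => name
  }
```

A top-level check would have admitted the struct-of-map column and the sort would still fail, which is exactly the bug SPARK-20451 fixes.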
[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r113082057

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
         case 21 => register(name, udf.asInstanceOf[UDF20[_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
         case 22 => register(name, udf.asInstanceOf[UDF21[_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
         case 23 => register(name, udf.asInstanceOf[UDF22[_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
-        case n => logError(s"UDF class with ${n} type arguments is not supported ")
+        case n =>
+          throw new IOException(s"UDF class with ${n} type arguments is not supported.")
       }
     } catch {
       case e @ (_: InstantiationException | _: IllegalArgumentException) =>
-        logError(s"Can not instantiate class ${className}, please make sure it has public non argument constructor")
+        throw new IOException(s"Can not instantiate class ${className}, please make sure it has public non argument constructor")
--- End diff --

Please throw an `AnalysisException`
[GitHub] spark pull request #17751: [SPARK-20451] Filter out nested mapType datatypes...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/17751#discussion_r113081974

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1726,15 +1726,23 @@ class Dataset[T] private[sql](
     // It is possible that the underlying dataframe doesn't guarantee the ordering of rows in its
     // constituent partitions each time a split is materialized which could result in
     // overlapping splits. To prevent this, we explicitly sort each input partition to make the
-    // ordering deterministic.
-    // MapType cannot be sorted.
-    val sorted = Sort(logicalPlan.output.filterNot(_.dataType.isInstanceOf[MapType])
-      .map(SortOrder(_, Ascending)), global = false, logicalPlan)
+    // ordering deterministic. Note that MapTypes cannot be sorted and are explicitly pruned out
+    // from the sort order.
+    val sortOrder = logicalPlan.output
+      .filterNot(_.dataType.existsRecursively(dt => dt.isInstanceOf[MapType]))
--- End diff --

nice, thanks!
[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r113082001

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
         case 21 => register(name, udf.asInstanceOf[UDF20[_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
         case 22 => register(name, udf.asInstanceOf[UDF21[_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
         case 23 => register(name, udf.asInstanceOf[UDF22[_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
-        case n => logError(s"UDF class with ${n} type arguments is not supported ")
+        case n =>
+          throw new IOException(s"UDF class with ${n} type arguments is not supported.")
       }
     } catch {
       case e @ (_: InstantiationException | _: IllegalArgumentException) =>
-        logError(s"Can not instantiate class ${className}, please make sure it has public non argument constructor")
+        throw new IOException(s"Can not instantiate class ${className}, please make sure it has public non argument constructor")
       }
     }
   } catch {
-    case e: ClassNotFoundException => logError(s"Can not load class ${className}, please make sure it is on the classpath")
+    case e: ClassNotFoundException => throw new IOException(s"Can not load class ${className}, please make sure it is on the classpath")
   }
 }

 /**
+  * Register a Java UDAF class using reflection, for use from pyspark
+  *
+  * @param name      UDAF name
+  * @param className fully qualified class name of UDAF
--- End diff --

Missing @since.
[GitHub] spark pull request #17752: [SPARK-20452][SS][Kafka]Fix a potential Concurren...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/17752#discussion_r113081225

--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceRDD.scala ---
@@ -125,16 +125,15 @@ private[kafka010] class KafkaSourceRDD(
       context: TaskContext): Iterator[ConsumerRecord[Array[Byte], Array[Byte]]] = {
     val sourcePartition = thePart.asInstanceOf[KafkaSourceRDDPartition]
     val topic = sourcePartition.offsetRange.topic
-    if (!reuseKafkaConsumer) {
-      // if we can't reuse CachedKafkaConsumers, let's reset the groupId to something unique
-      // to each task (i.e., append the task's unique partition id), because we will have
-      // multiple tasks (e.g., in the case of union) reading from the same topic partitions
-      val old = executorKafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-      val id = TaskContext.getPartitionId()
-      executorKafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, old + "-" + id)
-    }
     val kafkaPartition = sourcePartition.offsetRange.partition
-    val consumer = CachedKafkaConsumer.getOrCreate(topic, kafkaPartition, executorKafkaParams)
+    val consumer =
+      if (!reuseKafkaConsumer) {
+        // If we can't reuse CachedKafkaConsumers, create a new CachedKafkaConsumer. As here we
+        // use `assign`, we don't need to worry about "group.id" conflicts.
+        new CachedKafkaConsumer(new TopicPartition(topic, kafkaPartition), executorKafkaParams)
--- End diff --

Would be more consistent with `getOrCreate` if you just add a `create` method to CachedKafkaConsumer
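The reviewer's suggestion, pairing `getOrCreate` with a plain `create` on the companion object so both construction paths go through one API, can be sketched as follows. This is a hypothetical stand-in (no Kafka client, no real consumer logic), not the actual `CachedKafkaConsumer`:

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration only.
case class TopicPartition(topic: String, partition: Int)

class CachedKafkaConsumer(val topicPartition: TopicPartition)

object CachedKafkaConsumer {
  private val cache = mutable.Map.empty[TopicPartition, CachedKafkaConsumer]

  // Reuses a cached consumer for the partition when one exists.
  def getOrCreate(topic: String, partition: Int): CachedKafkaConsumer = synchronized {
    val tp = TopicPartition(topic, partition)
    cache.getOrElseUpdate(tp, new CachedKafkaConsumer(tp))
  }

  // Always builds a fresh, uncached consumer -- the `!reuseKafkaConsumer`
  // path in the diff, where `assign` makes "group.id" conflicts a non-issue.
  def create(topic: String, partition: Int): CachedKafkaConsumer =
    new CachedKafkaConsumer(TopicPartition(topic, partition))
}
```

The point of the suggestion is symmetry: callers never write `new CachedKafkaConsumer(...)` directly, so the caching policy stays encapsulated in the companion object.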
[GitHub] spark pull request #17752: [SPARK-20452][SS][Kafka]Fix a potential Concurren...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/17752#discussion_r113080828

--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala ---
@@ -95,8 +95,10 @@ private[kafka010] class KafkaOffsetReader(
    * Closes the connection to Kafka, and cleans up state.
    */
   def close(): Unit = {
-    consumer.close()
-    kafkaReaderThread.shutdownNow()
+    runUninterruptibly {
--- End diff --

nvm. i understand that `runUninterruptibly` ensures that.
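The guarantee being discussed, that the consumer's cleanup runs to completion even if the calling thread has been interrupted, can be approximated with a small helper. This is a simplified sketch of the idea only; Spark's real mechanism (`UninterruptibleThread`) also defers interrupts that arrive *while* the block is running, which this stand-in does not:

```scala
// Simplified sketch: clear any pending interrupt before running `body`, so
// interrupt-sensitive cleanup (e.g. consumer.close()) is not aborted mid-way,
// then restore the interrupt flag afterwards.
def runUninterruptibly[T](body: => T): T = {
  val wasInterrupted = Thread.interrupted() // reads AND clears the flag
  try body
  finally if (wasInterrupted) Thread.currentThread().interrupt()
}
```

Restoring the flag in `finally` matters: callers further up the stack still get to observe that an interrupt was requested, it is merely postponed past the cleanup.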