[GitHub] spark pull request #21055: [SPARK-23693][SQL] Functions to generate UUIDs
GitHub user tashoyan opened a pull request: https://github.com/apache/spark/pull/21055 [SPARK-23693][SQL] Functions to generate UUIDs ## What changes were proposed in this pull request? The following functions are implemented and available in the `functions` object of the SQL API: - time_based_uuid() - random_based_uuid() UUIDs are generated with the help of [java-uuid-generator](https://github.com/cowtowncoder/java-uuid-generator). This PR replaces a custom random-based UUID generator that was previously used in some parts of the code. In addition, it provides a new function for time-based UUIDs. For backward compatibility, the new `random_based_uuid()` function produces the same UUID values for retries on the same data set. Thus, the new function is consistent with the legacy `uuid()` function. ## How was this patch tested? Unit tests on the new functions, as well as on the SQL expressions implementing these functions. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tashoyan/spark SPARK-23693 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21055.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21055 commit 37fb2f5730fdf52987873367e057fce48810a6c3 Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-10T20:32:22Z [SPARK-23693][SQL] Implement time_based_uuid() function commit df5124aa1edc047fe2977bcf8eb2e1f6a1e842a1 Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-11T05:57:22Z Follow updates in the contract commit 71481e55f4d5ca0547280c958bef625d2e9373b1 Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-11T20:50:28Z [SPARK-23693][SQL] Implement random_based_uuid() function commit 9e45a87a90d81197e677790b7b94c8526d29f635 Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-11T21:08:54Z [SPARK-23693][SQL] Refactor: Extract common functions for UUID SQL expressions to an
abstract superclass commit 931a737be7c0ee8c0de1da8fcb546e8d87056efc Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-11T21:14:59Z [SPARK-23693][SQL] Annotate UUID expressions with @ExpressionDescription commit 1ff072cd0506bd28c0b303d14f4fd7f6c30a69fb Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-11T22:18:41Z [SPARK-23693][SQL] Fix random-based UUID: must give same values for retries on the same data frame commit 19c91c5ade5330c4127112dc902da3996e42fffd Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-11T22:21:43Z [SPARK-23693][SQL] Switch to new implementation of random-based UUIDs commit 117538be5c87017917c9e9c1ea25432142393925 Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-11T22:25:08Z [SPARK-23693][SQL] Fix code style violation commit ee850d9244ba4bcc8e8a21fd669a98082a9be08e Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-11T22:28:06Z [SPARK-23693][SQL] Cleanup code commit 1f3b700e90c512afb31f28eec2d07462c633cbd7 Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-11T22:33:37Z [SPARK-23693][SQL] For UUID functions, document behavior on retries commit bcb086f9940da2b26a34ae73553fc6924d587ec2 Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-11T22:40:02Z [SPARK-23693][SQL] Remove unused code related to old UUID implementation commit 878c0070cbbdce8ef6c2690605dab0362eb8027e Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-12T08:21:19Z Merge remote-tracking branch 'upstream/master' into SPARK-23693 commit 542dbdefb4ff99bbb1c7caedab2f23a6914de0f2 Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-12T11:14:44Z [SPARK-23693][SQL] Switch to new implementation of random-based UUIDs commit d9c560c26cb256693f6e8879f875382d41959a3e Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-12T16:09:33Z [SPARK-23693][SQL] Javadoc conventions commit 77e65d8cfa07a5eca7890ffc12bbed917312a35f Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-04-12T16:11:29Z Merge remote-tracking branch 'upstream/master' into SPARK-23693 --- - To 
unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
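The retry guarantee described in the PR above — `random_based_uuid()` yielding the same values when the same data set is re-processed — can be modeled by drawing the UUID bits from a PRNG initialized with a fixed seed. The sketch below is not the PR's implementation (which uses java-uuid-generator); it only illustrates, with plain `java.util.Random`, how a seeded generator makes version-4 ("random-based") UUIDs reproducible:

```java
import java.util.Random;
import java.util.UUID;

/**
 * Sketch (not the PR's actual code): version-4 UUIDs built from a seeded PRNG.
 * Re-running with the same seed reproduces the same sequence of UUIDs, which is
 * the retry behavior the PR describes for random_based_uuid().
 */
final class SeededUuid {
    static UUID next(Random rng) {
        long msb = rng.nextLong();
        long lsb = rng.nextLong();
        // Set the version nibble (4) and the IETF variant bits required by RFC 4122.
        msb = (msb & ~0xF000L) | 0x4000L;
        lsb = (lsb & ~0xC000000000000000L) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }
}
```

Two generators seeded identically produce identical UUID streams, so a retried task emits the same values for the same rows.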
[GitHub] spark pull request #20578: [SPARK-23318][ML] FP-growth: WARN FPGrowth: Input...
Github user tashoyan commented on a diff in the pull request: https://github.com/apache/spark/pull/20578#discussion_r167442376 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -158,18 +159,30 @@ class FPGrowth @Since("2.2.0") ( } private def genericFit[T: ClassTag](dataset: Dataset[_]): FPGrowthModel = { +val handlePersistence = dataset.storageLevel == StorageLevel.NONE + val data = dataset.select($(itemsCol)) -val items = data.where(col($(itemsCol)).isNotNull).rdd.map(r => r.getSeq[T](0).toArray) +val items = data.where(col($(itemsCol)).isNotNull).rdd.map(r => r.getSeq[Any](0).toArray) --- End diff -- An interesting curiosity for me: why does the FPGrowth contract require an `Array` of items rather than a `Seq`? First, it is strange for the contract to require a specific implementation rather than an interface. Second, this leads to redundant `toArray` and back `toSeq` transformations. A `Seq` would be more convenient, as the `Row` class has a `getSeq` method but no `getArray`.
[GitHub] spark pull request #20578: [SPARK-23318][ML] FP-growth: WARN FPGrowth: Input...
Github user tashoyan commented on a diff in the pull request: https://github.com/apache/spark/pull/20578#discussion_r167441224 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -158,18 +159,30 @@ class FPGrowth @Since("2.2.0") ( } private def genericFit[T: ClassTag](dataset: Dataset[_]): FPGrowthModel = { +val handlePersistence = dataset.storageLevel == StorageLevel.NONE + val data = dataset.select($(itemsCol)) -val items = data.where(col($(itemsCol)).isNotNull).rdd.map(r => r.getSeq[T](0).toArray) +val items = data.where(col($(itemsCol)).isNotNull).rdd.map(r => r.getSeq[Any](0).toArray) --- End diff -- It is not only for caching. The same ArrayStoreException occurs if one tries to execute collect() on the items RDD. There is no exception when using a concrete type like String instead of T, which probably explains how it worked before: people invoked dataset.cache() in their own code, where the type parameter of the Dataset is known.
[GitHub] spark pull request #20578: [SPARK-23318][ML] FP-growth: WARN FPGrowth: Input...
Github user tashoyan commented on a diff in the pull request: https://github.com/apache/spark/pull/20578#discussion_r167439260 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -158,18 +159,30 @@ class FPGrowth @Since("2.2.0") ( } private def genericFit[T: ClassTag](dataset: Dataset[_]): FPGrowthModel = { +val handlePersistence = dataset.storageLevel == StorageLevel.NONE + val data = dataset.select($(itemsCol)) -val items = data.where(col($(itemsCol)).isNotNull).rdd.map(r => r.getSeq[T](0).toArray) +val items = data.where(col($(itemsCol)).isNotNull).rdd.map(r => r.getSeq[Any](0).toArray) --- End diff -- Yes, it is necessary. Otherwise, cache() on the items RDD leads to an `ArrayStoreException`. It seems that, due to type erasure, instances of `Array[Nothing]` are created, but `toArray` attempts to add instances of `java.lang.Object`.
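The `ArrayStoreException` discussed in this thread comes from the JVM's runtime check on array stores: arrays carry a runtime element type, so when erasure causes an array of the wrong element type to be created, every store into it is checked and fails. A minimal Java analogue of that runtime check (illustrative only, no Spark involved):

```java
/**
 * Java analogue of the Scala issue above: arrays are covariant and carry a
 * runtime element type, so storing an incompatible value throws
 * ArrayStoreException at the store site -- erased generics perform no such check.
 */
final class ArrayStoreDemo {
    static boolean triggers() {
        Object[] arr = new String[2];  // static type Object[], runtime type String[]
        arr[0] = "ok";                 // fine: a String fits in a String[]
        try {
            arr[1] = new Object();     // runtime check fails: Object is not a String
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }
}
```

In the Scala code, using `getSeq[Any]` makes the created array's runtime element type `Object`, so the store check always passes — which is what the diff above achieves.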
[GitHub] spark pull request #20578: [SPARK-23318][ML] FP-growth: WARN FPGrowth: Input...
GitHub user tashoyan opened a pull request: https://github.com/apache/spark/pull/20578 [SPARK-23318][ML] FP-growth: WARN FPGrowth: Input data is not cached ## What changes were proposed in this pull request? Cache the RDD of items in ml.FPGrowth before passing it to mllib.FPGrowth. Cache only when the user did not cache the input dataset of transactions. This fixes the warning about uncached data emerging from mllib.FPGrowth. ## How was this patch tested? Manually: 1. Run ml.FPGrowthExample - warning is there 2. Apply the fix 3. Run ml.FPGrowthExample again - no warning anymore You can merge this pull request into a Git repository by running: $ git pull https://github.com/tashoyan/spark SPARK-23318 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20578.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20578 commit d17d3fbee84fcb0072d3030f3118ca18ce783e0c Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-02-10T21:16:51Z [SPARK-23318][ML] Workaround for 'ArrayStoreException: [Ljava.lang.Object' when trying to cache the RDD of items. commit e0eb8519bf09db12f5d5bc426eaf17d6488e05c1 Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-02-11T15:21:39Z [SPARK-23318][ML] Cache the RDD of items if the user did not cache the input dataset of transactions. This should eliminate the warning about uncached data in mllib.FPGrowth. commit 374a49c2bf447f3ddfed655f6eda9c8cd5f45285 Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-02-11T15:23:58Z Merge remote-tracking branch 'upstream/master' into SPARK-23318
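The caching policy this PR describes — persist the input only when the caller has not already persisted it, so the iterative algorithm does not recompute the input, and the caller's choice is respected — can be sketched independently of Spark. The class names below (`Data`, `CachingFit`) and the unpersist-after-fit step are hypothetical stand-ins for illustration, not Spark API:

```java
import java.util.function.Function;

/** Minimal model (not Spark code) of the "cache only if the caller did not" guard. */
final class CachingFit {
    enum StorageLevel { NONE, MEMORY_AND_DISK }

    /** Stand-in for a persistable dataset/RDD. */
    static final class Data {
        StorageLevel storageLevel = StorageLevel.NONE;
        void persist() { storageLevel = StorageLevel.MEMORY_AND_DISK; }
        void unpersist() { storageLevel = StorageLevel.NONE; }
    }

    static <M> M fit(Data items, Function<Data, M> algorithm) {
        // Persist only when the caller has not persisted the input already.
        boolean handlePersistence = items.storageLevel == StorageLevel.NONE;
        if (handlePersistence) items.persist();
        try {
            return algorithm.apply(items); // iterative algorithm benefits from caching
        } finally {
            if (handlePersistence) items.unpersist(); // restore the caller's state
        }
    }
}
```

If the caller persisted the input, `fit` leaves it persisted; otherwise it caches for the duration of the fit and cleans up afterwards.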
[GitHub] spark issue #20349: [Minor][DOC] Fix the path to the examples jar
Github user tashoyan commented on the issue: https://github.com/apache/spark/pull/20349 @jerryshao Not found yet
[GitHub] spark pull request #20349: Fix the path to the examples jar
GitHub user tashoyan opened a pull request: https://github.com/apache/spark/pull/20349 Fix the path to the examples jar ## What changes were proposed in this pull request? The examples jar file is now in the ./examples/jars directory of the Spark distribution. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tashoyan/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20349.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20349 commit 20d502fd2a271fcec1614a909c3e89934e81582e Author: Arseniy Tashoyan <tashoyan@...> Date: 2018-01-22T08:25:17Z Fix the path to the examples jar
[GitHub] spark pull request #19711: [SPARK-22471][SQL] SQLListener consumes much memo...
Github user tashoyan closed the pull request at: https://github.com/apache/spark/pull/19711
[GitHub] spark issue #19711: [SPARK-22471][SQL] SQLListener consumes much memory caus...
Github user tashoyan commented on the issue: https://github.com/apache/spark/pull/19711 Retest this please.
[GitHub] spark issue #19711: [SPARK-22471][SQL] SQLListener consumes much memory caus...
Github user tashoyan commented on the issue: https://github.com/apache/spark/pull/19711 Corrupted build node?
[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...
Github user tashoyan closed the pull request at: https://github.com/apache/spark/pull/19700
[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...
Github user tashoyan commented on a diff in the pull request: https://github.com/apache/spark/pull/19700#discussion_r150100750 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala --- @@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging { private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000) + private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000) --- End diff -- Done for branch-2.2
[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...
Github user tashoyan commented on the issue: https://github.com/apache/spark/pull/19700 Done for branch-2.2: #19711
[GitHub] spark pull request #19711: [SPARK-22471][SQL] SQLListener consumes much memo...
GitHub user tashoyan opened a pull request: https://github.com/apache/spark/pull/19711 [SPARK-22471][SQL] SQLListener consumes much memory causing OutOfMemoryError ## What changes were proposed in this pull request? This PR addresses the issue [SPARK-22471](https://issues.apache.org/jira/browse/SPARK-22471). The modified version of `SQLListener` respects the setting `spark.ui.retainedStages` and keeps the number of tracked stages within the specified limit. The hash map `_stageIdToStageMetrics` does not outgrow the limit, hence overall memory consumption no longer grows over time. This is a 2.2-compatible fix; it may be incompatible with 2.3 due to #19681. ## How was this patch tested? A new unit test covers this fix - see `SQLListenerMemorySuite.scala`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tashoyan/spark SPARK-22471-branch-2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19711.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19711 commit 08b7c82be3effe094e40618fe992d3c50c3e2d98 Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-11-08T15:41:36Z Add reproducer for the issue SPARK-22471 commit 2502a7e9846e359d793c485db1d3abef8a2c1e12 Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-11-08T15:41:54Z Add fix for the issue SPARK-22471 commit 2a13530db9ec611b6ee55fc9d79bd8aac5c01862 Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-11-08T20:39:02Z Remove debug print and irrelevant checks. Add a reference to the issue. commit 98f7b23fb52ffd11ae92716c871e5aa06ea61428 Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-11-08T20:47:44Z Remove debug print and irrelevant checks. Add a reference to the issue. 
commit 80755ece91703b3b6436f88e14eb11251ae6678f Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-11-08T21:21:42Z Collect memory-related tests on SQLListener in the same suite
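The key idea of this fix — keep `_stageIdToStageMetrics` bounded by `spark.ui.retainedStages` so the listener's memory stays constant no matter how many stages complete — can be sketched with a size-capped, insertion-ordered map. This is an illustrative model, not the actual `SQLListener` code; the `String` metrics payload is a placeholder:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch (not Spark's actual code): a stage-metrics map that evicts the oldest
 * entries once it exceeds the retained-stages limit, so memory use stays
 * bounded regardless of how many stages run over the application's lifetime.
 */
final class BoundedStageMetrics {
    private final int retainedStages;
    private final Map<Integer, String> stageIdToMetrics;

    BoundedStageMetrics(int retainedStages) {
        this.retainedStages = retainedStages;
        // Insertion-ordered map; removeEldestEntry caps the size at the limit.
        this.stageIdToMetrics = new LinkedHashMap<Integer, String>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > BoundedStageMetrics.this.retainedStages;
            }
        };
    }

    void onStageSubmitted(int stageId, String metrics) {
        stageIdToMetrics.put(stageId, metrics);
    }

    int trackedStages() { return stageIdToMetrics.size(); }
}
```

With the default limit of 1000, a long-running application tracks at most 1000 stages instead of accumulating metrics indefinitely.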
[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...
Github user tashoyan commented on the issue: https://github.com/apache/spark/pull/19700 Well, it would be good to have this quick fix in a 2.2-compatible bugfix release, without waiting for 2.3.0.
[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...
Github user tashoyan commented on a diff in the pull request: https://github.com/apache/spark/pull/19700#discussion_r150052324 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala --- @@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging { private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000) + private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000) --- End diff -- @dongjoon-hyun, it is already documented in the same file, configuration.md: ``` How many stages the Spark UI and status APIs remember before garbage collecting. This is a target maximum, and fewer elements may be retained in some circumstances. ``` I did not introduce a new parameter; I just used an existing one. Regarding renaming it to `spark.sql.ui.retainedStages`, I believe that should be done in a separate pull request - if it should be done at all. This parameter is also used in other parts of the Spark code, not only in SQL.
[GitHub] spark issue #19700: [SPARK-22471][SQL] SQLListener consumes much memory caus...
Github user tashoyan commented on the issue: https://github.com/apache/spark/pull/19700 @vanzin would you like to review?
[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...
GitHub user tashoyan opened a pull request: https://github.com/apache/spark/pull/19700 [SPARK-22471][SQL] SQLListener consumes much memory causing OutOfMemoryError ## What changes were proposed in this pull request? This PR addresses the issue [SPARK-22471](https://issues.apache.org/jira/browse/SPARK-22471). The modified version of `SQLListener` respects the setting `spark.ui.retainedStages` and keeps the number of tracked stages within the specified limit. The hash map `_stageIdToStageMetrics` does not outgrow the limit, hence overall memory consumption no longer grows over time. ## How was this patch tested? A new unit test covers this fix - see `SQLListenerMemorySuite.scala`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tashoyan/spark SPARK-22471 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19700.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19700 commit 0388f6ce50d568a0493e7959ec005ee5afc20bd0 Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-11-08T15:41:36Z Add reproducer for the issue SPARK-22471 commit 42e80272cf0926f0fd978e6b7617685987d8fc93 Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-11-08T15:41:54Z Add fix for the issue SPARK-22471 commit 2f793ad1f001bc58dd09fa4eaec6ae423445f86f Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-11-08T20:39:02Z Remove debug print and irrelevant checks. Add a reference to the issue. commit 4780d95b7d58df741eb8d5756c8109fc7dbfb457 Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-11-08T20:47:44Z Remove debug print and irrelevant checks. Add a reference to the issue. 
commit 79c83a715d4a36ad00ff3888e8e2953fcc163d17 Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-11-08T21:21:42Z Collect memory-related tests on SQLListener in the same suite
[GitHub] spark pull request #18885: [SPARK-21668][CORE] Ability to run driver program...
Github user tashoyan closed the pull request at: https://github.com/apache/spark/pull/18885 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark issue #18885: [SPARK-21668][CORE] Ability to run driver programs withi...
Github user tashoyan commented on the issue: https://github.com/apache/spark/pull/18885 So the working configuration is: * set `spark.driver.host` to the IP address of the host machine * set `spark.driver.bindAddress` to the IP address of the container I tried this configuration with Spark 2.1.1 and 2.2.0. Works fine! @vanzin thank you for pointing me to the right issue. I think we can close this PR and mark [SPARK-21668](https://issues.apache.org/jira/browse/SPARK-21668) as a duplicate of [SPARK-4563](https://issues.apache.org/jira/browse/SPARK-4563). The working Docker image is here: [docker-spark-submit](https://github.com/tashoyan/docker-spark-submit).
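The working configuration above can be written out as a `spark-submit` invocation. The master URL, IP addresses, class name, and jar name below are illustrative placeholders, not values from the thread:

```shell
# Run inside the container. Placeholders:
#   spark.driver.host        - the Docker host's IP, reachable by executors
#   spark.driver.bindAddress - the container's own IP, which the driver binds to
spark-submit \
  --master spark://spark-master:7077 \
  --conf spark.driver.host=192.168.1.10 \
  --conf spark.driver.bindAddress=172.17.0.2 \
  --class com.example.MyApp \
  my-app.jar
```

Executors then connect back to the advertised host address, while the forwarded container ports deliver the traffic to the driver.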
[GitHub] spark issue #18885: [SPARK-21668][CORE] Ability to run driver programs withi...
Github user tashoyan commented on the issue: https://github.com/apache/spark/pull/18885 @vanzin @srowen @jerryshao would you review, please?
[GitHub] spark pull request #18885: [SPARK-21668][CORE] Ability to run driver program...
GitHub user tashoyan opened a pull request: https://github.com/apache/spark/pull/18885 [SPARK-21668][CORE] Ability to run driver programs within a container ## What changes were proposed in this pull request? When running inside a container, the driver program advertises a driver host set to the container's IP address. This IP address is visible only on the machine where the container runs, so Spark executors running on other machines are not able to communicate with the driver program. With this change, the driver program may use the standard SPARK_PUBLIC_DNS variable to expose the driver host to executors. Just declare SPARK_PUBLIC_DNS= in spark-env.sh within the container. Thanks to exposed ports, all requests from executors are forwarded to the driver program within the container. ## How was this patch tested? I have tested this modification manually. I have a Spark cluster on 3 machines, and I run my Spark application in a Docker container; the host machine belongs to the same network as my Spark cluster. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tashoyan/spark SPARK-21668 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18885.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18885 commit 6b16afb3dbfed3c745820bc3e727b4c9a13017f7 Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-08-08T11:58:45Z Driver program should advertise the hostname specified in SPARK_PUBLIC_DNS if specified commit 12c0b901ea109f7389d1c83afdc16817c2cb0cfd Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-08-08T12:08:28Z Worker keeps driver on the host provided in the driverUrl. It may differ from the original spark.driver.host value if the driver specified SPARK_PUBLIC_DNS. 
commit bd7399c1552768d54ab7c8cfc1dfeb27667c7f95 Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-08-08T12:11:27Z When starting executor, take SPARK_PUBLIC_DNS into account for the driver url commit e16d334eb58efe2375f4c85d77739ca3bacccecd Author: Arseniy Tashoyan <tasho...@gmail.com> Date: 2017-08-08T12:49:54Z Honor checkstyle