[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/22615 i haven't had a chance to do any of the jenkins stuff... after being sidetracked by the conversation to move the configs to the spark repo, plus planning for our big event that starts tomorrow, plus zomgmeetings all day today, work won't be able to start until early next week.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22688 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97207/ Test FAILed.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22688 Merged build finished. Test FAILed.
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22688 **[Test build #97207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97207/testReport)** for PR 22688 at commit [`9377bc3`](https://github.com/apache/spark/commit/9377bc35050408512c28f47ca0535b66c4dfcaf8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class SchemaReadAttemptException(m: String) extends RuntimeException(m)`
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20761 **[Test build #97213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97213/testReport)** for PR 20761 at commit [`55c7be9`](https://github.com/apache/spark/commit/55c7be938a6180f0091d6d57db94521d9f8c0abb).
[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/22689 Hi @srowen. Yes, event logs are available for running apps, but with the extension ".inprogress". We can open the web UI from the history server for both running and finished applications. This works for all compression codecs (lz4, snappy, lzf), as well as for uncompressed event logs.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20761 Merged build finished. Test FAILed.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20761 **[Test build #97212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97212/testReport)** for PR 20761 at commit [`7e7a55a`](https://github.com/apache/spark/commit/7e7a55a745ce41b77a5b5811cf72ae1721d713fa). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20761 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97212/ Test FAILed.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20761 **[Test build #97212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97212/testReport)** for PR 20761 at commit [`7e7a55a`](https://github.com/apache/spark/commit/7e7a55a745ce41b77a5b5811cf72ae1721d713fa).
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22504 **[Test build #97211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97211/testReport)** for PR 22504 at commit [`2bbf137`](https://github.com/apache/spark/commit/2bbf1370ab72a0183cc3ccdc04851b2b6dadf5dc).
[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22689 Should the Event Log be available for running apps? Or if it's not going to work, disable it where it can't be shown, but I suppose that could be difficult. This just silently sends you back to the jobs page?
[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22657 Merged build finished. Test PASSed.
[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22657 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97202/ Test PASSed.
[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22657 **[Test build #97202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97202/testReport)** for PR 22657 at commit [`778629d`](https://github.com/apache/spark/commit/778629dbca572d8a8099ad2893f78d16d30920e8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user szyszy commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224195358 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -288,9 +296,14 @@ private[yarn] class YarnAllocator( s"executorsStarting: ${numExecutorsStarting.get}") if (missing > 0) { - logInfo(s"Will request $missing executor container(s), each with " + -s"${resource.getVirtualCores} core(s) and " + -s"${resource.getMemory} MB memory (including $memoryOverhead MB of overhead)") + var requestContainerMessage = s"Will request $missing executor container(s), each with " + --- End diff -- Fair enough, added a condition to check whether it's enabled.
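For reference, a hedged sketch of what such a guarded log message could look like; the boolean parameter stands in for the `ResourceRequestHelper.isYarnResourceTypesAvailable()` check used elsewhere in this PR, and all other names are illustrative only, not the actual patch:

```scala
object RequestMessageSketch {
  // Sketch: build the "Will request ..." log message and append the custom
  // resource portion only when YARN resource types are available/enabled.
  def buildRequestMessage(
      missing: Int,
      cores: Int,
      memoryMb: Int,
      overheadMb: Int,
      customResources: Map[String, String],
      resourceTypesAvailable: Boolean): String = {
    var msg = s"Will request $missing executor container(s), each with " +
      s"$cores core(s) and $memoryMb MB memory (including $overheadMb MB of overhead)"
    if (resourceTypesAvailable && customResources.nonEmpty) {
      msg += s" with custom resources: $customResources"
    }
    msg
  }
}
```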
[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22689 Can one of the admins verify this patch?
[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/22689 cc @vanzin @srowen . Kindly review.
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user szyszy commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224194497 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala --- @@ -199,6 +200,37 @@ class ClientSuite extends SparkFunSuite with Matchers { appContext.getMaxAppAttempts should be (42) } + test("resource request for invalid resource") { --- End diff -- Okay, I agree with that. Then I think just removal is fine here.
[GitHub] spark pull request #22689: [SPARK-25697][CORE]When zstd compression enabled ...
GitHub user shahidki31 opened a pull request: https://github.com/apache/spark/pull/22689 [SPARK-25697][CORE] When zstd compression enabled, InProgress application is throwing Error in the history webui… ## What changes were proposed in this pull request? When we enable event log compression with the compression codec 'zstd', we are unable to open the web UI of a running application from the history server page. The reason is that the replay listener is unable to read from the zstd-compressed event log because the zstd frame is not finished yet. This causes a truncation error while reading the event log. So, when we try to open the web UI from the history server page, it throws a "truncated error", and we are never able to open a running application in the web UI when zstd compression is enabled. In this PR, when the IO exception happens and it is a running application, we log the error "Failed to read Spark event log: eventLogDirAppName.inprogress" instead of throwing the exception. ## How was this patch tested? Test steps: 1) spark.eventLog.compress = true 2) spark.io.compression.codec = zstd 3) restart the history server 4) launch bin/spark-shell 5) run some queries 6) open the history server page 7) click on the application Before fix: ![screenshot from 2018-10-10 23-52-12](https://user-images.githubusercontent.com/23054875/46757387-9b4fa580-cce7-11e8-96ad-8938400483ed.png) ![screenshot from 2018-10-10 23-52-28](https://user-images.githubusercontent.com/23054875/46757393-a0145980-cce7-11e8-8cb0-44b583dde648.png) After fix: ![screenshot from 2018-10-10 23-43-49](https://user-images.githubusercontent.com/23054875/46756971-6858e200-cce6-11e8-946c-0bffebb2cfba.png) ![screenshot from 2018-10-10 23-44-05](https://user-images.githubusercontent.com/23054875/46756981-6d1d9600-cce6-11e8-95ea-ff8339a2fdfd.png) You can merge this pull request into a Git repository by running: $ git pull https://github.com/shahidki31/spark SPARK-25697 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22689.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22689 commit f6e971f37d39642966689201868420da2441ffa0 Author: Shahid Date: 2018-10-10T17:50:48Z When zstd compression enabled in progress application is throwing Error in UI
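As a rough illustration of the approach described above (not the actual patch), treating a read failure on an in-progress log as non-fatal could look like the following sketch; the object and method names are invented for the example:

```scala
import java.io.{IOException, InputStream}
import scala.io.Source

object InProgressReplaySketch {
  // Sketch only: when the event log is still being written (".inprogress" suffix),
  // a truncated read, e.g. from an unfinished zstd frame, is expected. Log the
  // failure and keep the events parsed so far instead of failing the whole replay.
  def replayEventLog(in: InputStream, sourceName: String)(processLine: String => Unit): Unit = {
    val inProgress = sourceName.endsWith(".inprogress")
    try {
      Source.fromInputStream(in).getLines().foreach(processLine)
    } catch {
      case e: IOException if inProgress =>
        System.err.println(s"Failed to read Spark event log: $sourceName (${e.getMessage})")
    }
  }
}
```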
[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...
Github user abellina commented on a diff in the pull request: https://github.com/apache/spark/pull/21688#discussion_r224194007 --- Diff: core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala --- @@ -102,4 +103,124 @@ private[v1] class StagesResource extends BaseAppResource { withUI(_.store.taskList(stageId, stageAttemptId, offset, length, sortBy)) } + // This api needs to stay formatted exactly as it is below, since, it is being used by the + // datatables for the stages page. + @GET + @Path("{stageId: \\d+}/{stageAttemptId: \\d+}/taskTable") + def taskTable( +@PathParam("stageId") stageId: Int, +@PathParam("stageAttemptId") stageAttemptId: Int, +@QueryParam("details") @DefaultValue("true") details: Boolean, +@Context uriInfo: UriInfo): + HashMap[String, Object] = { +withUI { ui => + val uriQueryParameters = uriInfo.getQueryParameters(true) + val totalRecords = uriQueryParameters.getFirst("numTasks") + var isSearch = false + var searchValue: String = null + var filteredRecords = totalRecords + var _tasksToShow: Seq[TaskData] = null + if (uriQueryParameters.getFirst("search[value]") != null && +uriQueryParameters.getFirst("search[value]").length > 0) { +_tasksToShow = doPagination(uriQueryParameters, stageId, stageAttemptId, true, + totalRecords.toInt) +isSearch = true +searchValue = uriQueryParameters.getFirst("search[value]") + } else { +_tasksToShow = doPagination(uriQueryParameters, stageId, stageAttemptId, false, + totalRecords.toInt) + } + val ret = new HashMap[String, Object]() + if (_tasksToShow.nonEmpty) { +// Performs server-side search based on input from user +if (isSearch) { + val filteredTaskList = filterTaskList(_tasksToShow, searchValue) + filteredRecords = filteredTaskList.length.toString + if (filteredTaskList.length > 0) { +val pageStartIndex = uriQueryParameters.getFirst("start").toInt +val pageLength = uriQueryParameters.getFirst("length").toInt +ret.put("aaData", filteredTaskList.slice(pageStartIndex, pageStartIndex + pageLength)) + } else { +ret.put("aaData", filteredTaskList) + } +} else { + ret.put("aaData", _tasksToShow) +} + } else { +ret.put("aaData", _tasksToShow) + } + ret.put("recordsTotal", totalRecords) + ret.put("recordsFiltered", filteredRecords) + ret +} + } + + // Performs pagination on the server side + def doPagination(queryParameters: MultivaluedMap[String, String], stageId: Int, +stageAttemptId: Int, isSearch: Boolean, totalRecords: Int): Seq[TaskData] = { +val queryParams = queryParameters.keySet() +var columnToSort = 0 +if (queryParams.contains("order[0][column]")) { + columnToSort = queryParameters.getFirst("order[0][column]").toInt +} +var columnNameToSort = queryParameters.getFirst("columns[" + columnToSort + "][name]") +if (columnNameToSort.equalsIgnoreCase("Logs")) { + columnNameToSort = "Index" + columnToSort = 0 +} +val isAscendingStr = queryParameters.getFirst("order[0][dir]") +var pageStartIndex = 0 +var pageLength = totalRecords +if (!isSearch) { + pageStartIndex = queryParameters.getFirst("start").toInt + pageLength = queryParameters.getFirst("length").toInt +} +return withUI(_.store.taskList(stageId, stageAttemptId, pageStartIndex, pageLength, + indexName(columnNameToSort), isAscendingStr.equalsIgnoreCase("asc"))) + } + + // Filters task list based on search parameter + def filterTaskList( +taskDataList: Seq[TaskData], +searchValue: String): Seq[TaskData] = { +val defaultOptionString: String = "d" +// The task metrics dummy object below has been added to avoid throwing exception in cases +// when task metrics for a 
particular task do not exist as of yet +val dummyTaskMetrics: TaskMetrics = new TaskMetrics(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + new InputMetrics(0, 0), new OutputMetrics(0, 0), + new ShuffleReadMetrics(0, 0, 0, 0, 0, 0, 0), new ShuffleWriteMetrics(0, 0, 0)) +val searchValueLowerCase = searchValue.toLowerCase(Locale.ROOT) +val containsValue = (taskDataParams: Any) => taskDataParams.toString.toLowerCase( + Locale.ROOT).contains(searchValueLowerCase) +val filteredTaskDataSequence: Seq[TaskData] = taskDataList.filter(f => + (containsValue(f.taskId) || containsValue(f.index)
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user szyszy commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224193837 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala --- @@ -87,6 +88,20 @@ class YarnAllocatorSuite extends SparkFunSuite with Matchers with BeforeAndAfter def createAllocator( maxExecutors: Int = 5, rmClient: AMRMClient[ContainerRequest] = rmClient): YarnAllocator = { +createAllocatorInternal(maxExecutors, rmClient, Map()) + } + + def createAllocatorWithAdditionalConfigs( --- End diff -- Fair enough, fixed. Did similarly with the `createResource` method.
[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22674 **[Test build #97203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97203/testReport)** for PR 22674 at commit [`3ffa536`](https://github.com/apache/spark/commit/3ffa536f3c29f6655843a4d45c215393f51e23c9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user szyszy commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224190144 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala --- @@ -433,4 +465,36 @@ class ClientSuite extends SparkFunSuite with Matchers { classpath(env) } + private def testResourceRequest(expectedResources: Seq[(String, Long)], --- End diff -- Good point, fixed!
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22504 **[Test build #97210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97210/testReport)** for PR 22504 at commit [`7fb94fd`](https://github.com/apache/spark/commit/7fb94fd5273f19d91ec74584d2efa6b33ce371c4).
[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22684
[GitHub] spark pull request #22504: [SPARK-25118][Submit] Persist Driver Logs in Clie...
Github user ankuriitg commented on a diff in the pull request: https://github.com/apache/spark/pull/22504#discussion_r224189470 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -806,6 +806,22 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) } // Clean the blacklist from the expired entries. clearBlacklist(CLEAN_INTERVAL_S) + +// Delete driver logs from the configured spark dfs dir that exceed the configured max age +try { + val hdfsDir = conf.get("spark.driver.log.dfsDir") + val appDirs = fs.listLocatedStatus(new Path(hdfsDir)) + while (appDirs.hasNext()) { +val appDirStatus = appDirs.next() +if (appDirStatus.getModificationTime() < maxTime) { + logInfo(s"Deleting expired driver log for: ${appDirStatus.getPath().getName()}") + deleteLog(appDirStatus.getPath()) --- End diff -- Added the new configurations, with a fallback option to the existing ones.
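The fallback pattern mentioned here can be illustrated roughly as follows; the driver-log-specific key name is hypothetical, while `spark.history.fs.cleaner.maxAge` is the existing history cleaner setting and `spark.driver.log.dfsDir` is the directory referenced in the diff above:

```scala
import org.apache.spark.SparkConf

object DriverLogConfSketch {
  // Prefer a driver-log-specific max age and fall back to the existing history
  // cleaner setting; default to 7 days if neither is configured.
  def driverLogMaxAge(conf: SparkConf): String =
    conf.getOption("spark.history.fs.driverlog.cleaner.maxAge") // hypothetical new key
      .orElse(conf.getOption("spark.history.fs.cleaner.maxAge")) // existing setting
      .getOrElse("7d")
}
```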
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224189326 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala --- @@ -199,6 +200,37 @@ class ClientSuite extends SparkFunSuite with Matchers { appContext.getMaxAppAttempts should be (42) } + test("resource request for invalid resource") { --- End diff -- You already sort of do that in the tests that call `initializeResourceTypes`, don't you? Otherwise that's a YARN test, it's not interesting to test that functionality in Spark because if it's broken, Spark cannot fix it.
[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22674 Merged build finished. Test FAILed.
[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22674 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97203/ Test FAILed.
[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22684 Merged into master. Thanks.
[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22684 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3857/ Test PASSed.
[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22684 Merged build finished. Test PASSed.
[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22684 **[Test build #97209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97209/testReport)** for PR 22684 at commit [`9d9ed2f`](https://github.com/apache/spark/commit/9d9ed2f8295e2d61ff880860abf0e8551afce04b).
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user szyszy commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224187532 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala --- @@ -199,6 +200,37 @@ class ClientSuite extends SparkFunSuite with Matchers { appContext.getMaxAppAttempts should be (42) } + test("resource request for invalid resource") { --- End diff -- Yes, this is very YARN-specific, unlike the other test cases that verify the resources in the app context. Removed this test case. Can you think of any other way to test the scenario where a resource is specified in sparkConf that is not known to YARN?
[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22684#discussion_r224186347 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala --- @@ -138,39 +138,75 @@ private[sql] object OrcFilters { dataTypeMap: Map[String, DataType], expression: Filter, builder: Builder): Option[Builder] = { +createBuilder(dataTypeMap, expression, builder, canPartialPushDownConjuncts = true) + } + + /** + * @param dataTypeMap a map from the attribute name to its data type. + * @param expression the input filter predicates. + * @param builder the input SearchArgument.Builder. + * @param canPartialPushDownConjuncts whether a subset of conjuncts of predicates can be pushed + *down safely. Pushing ONLY one side of AND down is safe to + *do at the top level or none of its ancestors is NOT and OR. + * @return the builder so far. + */ + private def createBuilder( + dataTypeMap: Map[String, DataType], + expression: Filter, + builder: Builder, + canPartialPushDownConjuncts: Boolean): Option[Builder] = { def getType(attribute: String): PredicateLeaf.Type = getPredicateLeafType(dataTypeMap(attribute)) import org.apache.spark.sql.sources._ expression match { case And(left, right) => -// At here, it is not safe to just convert one side if we do not understand the -// other side. Here is an example used to explain the reason. +// At here, it is not safe to just convert one side and remove the other side +// if we do not understand what the parent filters are. +// +// Here is an example used to explain the reason. // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to // convert b in ('1'). If we only convert a = 2, we will end up with a filter // NOT(a = 2), which will generate wrong results. -// Pushing one side of AND down is only safe to do at the top level. -// You can see ParquetRelation's initializeLocalJobFunc method as an example. -for { - _ <- buildSearchArgument(dataTypeMap, left, newBuilder) - _ <- buildSearchArgument(dataTypeMap, right, newBuilder) - lhs <- buildSearchArgument(dataTypeMap, left, builder.startAnd()) - rhs <- buildSearchArgument(dataTypeMap, right, lhs) -} yield rhs.end() +// +// Pushing one side of AND down is only safe to do at the top level or in the child +// AND before hitting NOT or OR conditions, and in this case, the unsupported predicate +// can be safely removed. 
+val leftBuilderOption = createBuilder(dataTypeMap, left, + newBuilder, canPartialPushDownConjuncts) +val rightBuilderOption = + createBuilder(dataTypeMap, right, newBuilder, canPartialPushDownConjuncts) +(leftBuilderOption, rightBuilderOption) match { + case (Some(_), Some(_)) => +for { + lhs <- createBuilder(dataTypeMap, left, +builder.startAnd(), canPartialPushDownConjuncts) + rhs <- createBuilder(dataTypeMap, right, lhs, canPartialPushDownConjuncts) +} yield rhs.end() + + case (Some(_), None) if canPartialPushDownConjuncts => +createBuilder(dataTypeMap, left, builder, canPartialPushDownConjuncts) + + case (None, Some(_)) if canPartialPushDownConjuncts => +createBuilder(dataTypeMap, right, builder, canPartialPushDownConjuncts) + + case _ => None +} case Or(left, right) => for { - _ <- buildSearchArgument(dataTypeMap, left, newBuilder) - _ <- buildSearchArgument(dataTypeMap, right, newBuilder) - lhs <- buildSearchArgument(dataTypeMap, left, builder.startOr()) - rhs <- buildSearchArgument(dataTypeMap, right, lhs) + _ <- createBuilder(dataTypeMap, left, newBuilder, canPartialPushDownConjuncts = false) + _ <- createBuilder(dataTypeMap, right, newBuilder, canPartialPushDownConjuncts = false) + lhs <- createBuilder(dataTypeMap, left, +builder.startOr(), canPartialPushDownConjuncts = false) + rhs <- createBuilder(dataTypeMap, right, lhs, canPartialPushDownConjuncts = false) } yield rhs.end() case Not(child) => for { - _ <- buildSearchArgument(dataTypeMap, child, newBuilder) - negate <-
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user szyszy commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224185444 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala --- @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import java.lang.{Integer => JInteger, Long => JLong} +import java.lang.reflect.InvocationTargetException + +import scala.collection.mutable +import scala.util.Try + +import org.apache.hadoop.yarn.api.records.Resource + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.yarn.config.{AM_CORES, AM_MEMORY, DRIVER_CORES, EXECUTOR_CORES, YARN_AM_RESOURCE_TYPES_PREFIX, YARN_DRIVER_RESOURCE_TYPES_PREFIX, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX} +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY} +import org.apache.spark.util.Utils + +/** + * This helper class uses some of Hadoop 3 methods from the YARN API, + * so we need to use reflection to avoid compile error when building against Hadoop 2.x + */ +private object ResourceRequestHelper extends Logging { + private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r + private val RESOURCE_INFO_CLASS = "org.apache.hadoop.yarn.api.records.ResourceInformation" + private val ERROR_PREFIX: String = "Error:" + + /** + * Validates sparkConf and throws a SparkException if any of standard resources (memory or cores) + * is defined with the property spark.yarn.x.resource.y + * @param sparkConf + */ + def validateResources(sparkConf: SparkConf): Unit = { +val resourceDefinitions = Seq[(String, String)]( + (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"), + (AM_CORES.key, YARN_AM_RESOURCE_TYPES_PREFIX + "cores"), + (DRIVER_MEMORY.key, YARN_DRIVER_RESOURCE_TYPES_PREFIX + "memory"), + (DRIVER_CORES.key, YARN_DRIVER_RESOURCE_TYPES_PREFIX + "cores"), + (EXECUTOR_MEMORY.key, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "memory"), + (EXECUTOR_CORES.key, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "cores")) +val errorMessage = new mutable.StringBuilder() + +resourceDefinitions.foreach { case (sparkName, resourceRequest) => + if (sparkConf.contains(resourceRequest)) { +errorMessage.append(s"$ERROR_PREFIX Do not use $resourceRequest, " + +s"please use $sparkName instead!\n") + } +} + +if (errorMessage.nonEmpty) { + throw new SparkException(errorMessage.toString()) +} + } + + /** + * Sets resource amount with the corresponding unit to the passed resource object. 
+ * @param resources resource values to set + * @param resource resource object to update + */ + def setResourceRequests( + resources: Map[String, String], + resource: Resource): Unit = { +require(resource != null, "Resource parameter should not be null!") +require(!resources.contains("memory"), --- End diff -- That's true, removed these require checks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22684 LGTM. Just some styling feedback. Thanks.
[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22684#discussion_r224179579 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala --- @@ -90,32 +107,51 @@ private[orc] object OrcFilters extends Logging { expression match { case And(left, right) => -// At here, it is not safe to just convert one side if we do not understand the -// other side. Here is an example used to explain the reason. +// At here, it is not safe to just convert one side and remove the other side +// if we do not understand what the parent filters are. +// +// Here is an example used to explain the reason. // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to // convert b in ('1'). If we only convert a = 2, we will end up with a filter // NOT(a = 2), which will generate wrong results. -// Pushing one side of AND down is only safe to do at the top level. -// You can see ParquetRelation's initializeLocalJobFunc method as an example. -for { - _ <- buildSearchArgument(dataTypeMap, left, newBuilder) - _ <- buildSearchArgument(dataTypeMap, right, newBuilder) - lhs <- buildSearchArgument(dataTypeMap, left, builder.startAnd()) - rhs <- buildSearchArgument(dataTypeMap, right, lhs) -} yield rhs.end() +// +// Pushing one side of AND down is only safe to do at the top level or in the child +// AND before hitting NOT or OR conditions, and in this case, the unsupported predicate +// can be safely removed. +val leftBuilderOption = createBuilder(dataTypeMap, left, + newBuilder, canPartialPushDownConjuncts) +val rightBuilderOption = + createBuilder(dataTypeMap, right, newBuilder, canPartialPushDownConjuncts) +(leftBuilderOption, rightBuilderOption) match { + case (Some(_), Some(_)) => +for { + lhs <- createBuilder(dataTypeMap, left, +builder.startAnd(), canPartialPushDownConjuncts) + rhs <- createBuilder(dataTypeMap, right, lhs, canPartialPushDownConjuncts) +} yield rhs.end() + + case (Some(_), None) if canPartialPushDownConjuncts => +createBuilder(dataTypeMap, left, builder, canPartialPushDownConjuncts) + + case (None, Some(_)) if canPartialPushDownConjuncts => +createBuilder(dataTypeMap, right, builder, canPartialPushDownConjuncts) + + case _ => None +} case Or(left, right) => for { - _ <- buildSearchArgument(dataTypeMap, left, newBuilder) - _ <- buildSearchArgument(dataTypeMap, right, newBuilder) - lhs <- buildSearchArgument(dataTypeMap, left, builder.startOr()) - rhs <- buildSearchArgument(dataTypeMap, right, lhs) + _ <- createBuilder(dataTypeMap, left, newBuilder, canPartialPushDownConjuncts = false) + _ <- createBuilder(dataTypeMap, right, newBuilder, canPartialPushDownConjuncts = false) + lhs <- createBuilder(dataTypeMap, left, +builder.startOr(), canPartialPushDownConjuncts = false) + rhs <- createBuilder(dataTypeMap, right, lhs, canPartialPushDownConjuncts = false) } yield rhs.end() case Not(child) => for { - _ <- buildSearchArgument(dataTypeMap, child, newBuilder) - negate <- buildSearchArgument(dataTypeMap, child, builder.startNot()) + _ <- createBuilder(dataTypeMap, child, newBuilder, canPartialPushDownConjuncts = false) + negate <- createBuilder(dataTypeMap, child, builder.startNot(), false) --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22684#discussion_r224179447 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala --- @@ -90,32 +107,51 @@ private[orc] object OrcFilters extends Logging { expression match { case And(left, right) => -// At here, it is not safe to just convert one side if we do not understand the -// other side. Here is an example used to explain the reason. +// At here, it is not safe to just convert one side and remove the other side +// if we do not understand what the parent filters are. +// +// Here is an example used to explain the reason. // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to // convert b in ('1'). If we only convert a = 2, we will end up with a filter // NOT(a = 2), which will generate wrong results. -// Pushing one side of AND down is only safe to do at the top level. -// You can see ParquetRelation's initializeLocalJobFunc method as an example. -for { - _ <- buildSearchArgument(dataTypeMap, left, newBuilder) - _ <- buildSearchArgument(dataTypeMap, right, newBuilder) - lhs <- buildSearchArgument(dataTypeMap, left, builder.startAnd()) - rhs <- buildSearchArgument(dataTypeMap, right, lhs) -} yield rhs.end() +// +// Pushing one side of AND down is only safe to do at the top level or in the child +// AND before hitting NOT or OR conditions, and in this case, the unsupported predicate +// can be safely removed. +val leftBuilderOption = createBuilder(dataTypeMap, left, + newBuilder, canPartialPushDownConjuncts) +val rightBuilderOption = + createBuilder(dataTypeMap, right, newBuilder, canPartialPushDownConjuncts) +(leftBuilderOption, rightBuilderOption) match { --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user szyszy commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224179095 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala --- @@ -0,0 +1,255 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import org.apache.hadoop.yarn.api.records.Resource +import org.apache.hadoop.yarn.util.Records +import org.scalatest.{BeforeAndAfterAll, Matchers} + +import org.apache.spark.{SparkConf, SparkException, SparkFunSuite} +import org.apache.spark.deploy.yarn.ResourceRequestTestHelper.ResourceInformation +import org.apache.spark.deploy.yarn.config.{AM_CORES, AM_MEMORY, DRIVER_CORES, EXECUTOR_CORES, YARN_AM_RESOURCE_TYPES_PREFIX, YARN_DRIVER_RESOURCE_TYPES_PREFIX, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX} +import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY} + +class ResourceRequestHelperSuite extends SparkFunSuite with Matchers with BeforeAndAfterAll { + + private val CUSTOM_RES_1 = "custom-resource-type-1" + private val CUSTOM_RES_2 = "custom-resource-type-2" + private val MEMORY = "memory" + private val CORES = "cores" + private val NEW_CONFIG_EXECUTOR_MEMORY = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_EXECUTOR_CORES = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + CORES + private val NEW_CONFIG_AM_MEMORY = YARN_AM_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_AM_CORES = YARN_AM_RESOURCE_TYPES_PREFIX + CORES + private val NEW_CONFIG_DRIVER_MEMORY = YARN_DRIVER_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_DRIVER_CORES = YARN_DRIVER_RESOURCE_TYPES_PREFIX + CORES + + override def beforeAll(): Unit = { +super.beforeAll() + } + + private def getExpectedUnmatchedErrorMessage(name: String, value: String): String = { +s"Resource request for '$name' ('$value') does not match pattern ([0-9]+)([A-Za-z]*)." 
+ } + + test("resource type value does not match pattern") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1)) + +val resourceTypes = Map(CUSTOM_RES_1 -> "**@#") + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes, createAResource) +} +thrown.getMessage should equal (getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "**@#")) + } + + test("resource type just unit defined") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List()) + +val resourceTypes = Map(CUSTOM_RES_1 -> "m") + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes, createAResource) +} +thrown.getMessage should equal (getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "m")) + } + + test("resource type with null value should not be allowed") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List()) + +val resourceTypes = Map(CUSTOM_RES_1 -> "123") + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes, null) +} +thrown.getMessage should equal ("requirement failed: Resource parameter should not be null!") + } + + test("resource type with valid value and invalid unit") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1)) + +val resourceTypes = Map(CUSTOM_RES_1 -> "123ppp") +val resource = createAResource + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes,
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user szyszy commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224178817 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala --- @@ -0,0 +1,255 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import org.apache.hadoop.yarn.api.records.Resource +import org.apache.hadoop.yarn.util.Records +import org.scalatest.{BeforeAndAfterAll, Matchers} + +import org.apache.spark.{SparkConf, SparkException, SparkFunSuite} +import org.apache.spark.deploy.yarn.ResourceRequestTestHelper.ResourceInformation +import org.apache.spark.deploy.yarn.config.{AM_CORES, AM_MEMORY, DRIVER_CORES, EXECUTOR_CORES, YARN_AM_RESOURCE_TYPES_PREFIX, YARN_DRIVER_RESOURCE_TYPES_PREFIX, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX} +import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY} + +class ResourceRequestHelperSuite extends SparkFunSuite with Matchers with BeforeAndAfterAll { + + private val CUSTOM_RES_1 = "custom-resource-type-1" + private val CUSTOM_RES_2 = "custom-resource-type-2" + private val MEMORY = "memory" + private val CORES = "cores" + private val NEW_CONFIG_EXECUTOR_MEMORY = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_EXECUTOR_CORES = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + CORES + private val NEW_CONFIG_AM_MEMORY = YARN_AM_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_AM_CORES = YARN_AM_RESOURCE_TYPES_PREFIX + CORES + private val NEW_CONFIG_DRIVER_MEMORY = YARN_DRIVER_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_DRIVER_CORES = YARN_DRIVER_RESOURCE_TYPES_PREFIX + CORES + + override def beforeAll(): Unit = { +super.beforeAll() + } + + private def getExpectedUnmatchedErrorMessage(name: String, value: String): String = { +s"Resource request for '$name' ('$value') does not match pattern ([0-9]+)([A-Za-z]*)." 
+ } + + test("resource type value does not match pattern") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1)) + +val resourceTypes = Map(CUSTOM_RES_1 -> "**@#") + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes, createAResource) +} +thrown.getMessage should equal (getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "**@#")) + } + + test("resource type just unit defined") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List()) + +val resourceTypes = Map(CUSTOM_RES_1 -> "m") + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes, createAResource) +} +thrown.getMessage should equal (getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "m")) + } + + test("resource type with null value should not be allowed") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List()) + +val resourceTypes = Map(CUSTOM_RES_1 -> "123") + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes, null) +} +thrown.getMessage should equal ("requirement failed: Resource parameter should not be null!") + } + + test("resource type with valid value and invalid unit") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1)) + +val resourceTypes = Map(CUSTOM_RES_1 -> "123ppp") +val resource = createAResource + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes,
[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21688#discussion_r224178567 --- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js --- @@ -0,0 +1,872 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +var blockUICount = 0; + +$(document).ajaxStop(function () { +if (blockUICount == 0) { +$.unblockUI(); +blockUICount++; +} +}); + +$(document).ajaxStart(function () { +if (blockUICount == 0) { +$.blockUI({message: 'Loading Stage Page...'}); +} +}); + +$.extend( $.fn.dataTable.ext.type.order, { +"duration-pre": ConvertDurationString, + +"duration-asc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? -1 : ((a > b) ? 1 : 0)); +}, + +"duration-desc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? 1 : ((a > b) ? -1 : 0)); +} +} ); + +// This function will only parse the URL under certain format +// e.g. (history) https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4=1 +// e.g. 
(proxy) https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4=1 +function stageEndPoint(appId) { +var urlRegex = /https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm; +var urlArray = urlRegex.exec(document.baseURI); +var ind = urlArray.indexOf("proxy"); +var queryString = document.baseURI.split('?'); +var words = document.baseURI.split('/'); +var stageId = queryString[1].split("&").filter(word => word.includes("id="))[0].split("=")[1]; +if (ind > 0) { +var appId = urlArray[2]; +var indexOfProxy = words.indexOf("proxy"); +var newBaseURI = words.slice(0, indexOfProxy + 2).join('/'); +return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + stageId; +} +ind = urlArray.indexOf("history"); +if (ind > 0) { +var appId = urlArray[2]; +var appAttemptId = urlArray[ind + 2]; +var indexOfHistory = words.indexOf("history"); +var newBaseURI = words.slice(0, indexOfHistory).join('/'); +if (isNaN(appAttemptId) || appAttemptId == "0") { +return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + stageId; +} else { +return newBaseURI + "/api/v1/applications/" + appId + "/" + appAttemptId + "/stages/" + stageId; +} +} +return location.origin + "/api/v1/applications/" + appId + "/stages/" + stageId; +} + +function getColumnNameForTaskMetricSummary(columnKey) { +switch(columnKey) { +case "executorRunTime": +return "Duration"; +break; + +case "jvmGcTime": +return "GC Time"; +break; + +case "gettingResultTime": +return "Getting Result Time"; +break; + +case "inputMetrics": +return "Input Size / Records"; +break; + +case "outputMetrics": +return "Output Size / Records"; +break; + +case "peakExecutionMemory": +return "Peak Execution Memory"; +break; + +case "resultSerializationTime": +return "Result Serialization Time"; +break; + +case "schedulerDelay": +return "Scheduler Delay"; +break; + +case "diskBytesSpilled": +return "Shuffle spill (disk)"; +break; + +case "memoryBytesSpilled": +return "Shuffle spill (memory)"; +break; + +
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user szyszy commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224178252 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala --- @@ -0,0 +1,255 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import org.apache.hadoop.yarn.api.records.Resource +import org.apache.hadoop.yarn.util.Records +import org.scalatest.{BeforeAndAfterAll, Matchers} + +import org.apache.spark.{SparkConf, SparkException, SparkFunSuite} +import org.apache.spark.deploy.yarn.ResourceRequestTestHelper.ResourceInformation +import org.apache.spark.deploy.yarn.config.{AM_CORES, AM_MEMORY, DRIVER_CORES, EXECUTOR_CORES, YARN_AM_RESOURCE_TYPES_PREFIX, YARN_DRIVER_RESOURCE_TYPES_PREFIX, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX} +import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY} + +class ResourceRequestHelperSuite extends SparkFunSuite with Matchers with BeforeAndAfterAll { + + private val CUSTOM_RES_1 = "custom-resource-type-1" + private val CUSTOM_RES_2 = "custom-resource-type-2" + private val MEMORY = "memory" + private val CORES = "cores" + private val NEW_CONFIG_EXECUTOR_MEMORY = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_EXECUTOR_CORES = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + CORES + private val NEW_CONFIG_AM_MEMORY = YARN_AM_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_AM_CORES = YARN_AM_RESOURCE_TYPES_PREFIX + CORES + private val NEW_CONFIG_DRIVER_MEMORY = YARN_DRIVER_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_DRIVER_CORES = YARN_DRIVER_RESOURCE_TYPES_PREFIX + CORES + + override def beforeAll(): Unit = { +super.beforeAll() + } + + private def getExpectedUnmatchedErrorMessage(name: String, value: String): String = { +s"Resource request for '$name' ('$value') does not match pattern ([0-9]+)([A-Za-z]*)." 
+ } + + test("resource type value does not match pattern") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1)) + +val resourceTypes = Map(CUSTOM_RES_1 -> "**@#") + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes, createAResource) +} +thrown.getMessage should equal (getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "**@#")) + } + + test("resource type just unit defined") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List()) + +val resourceTypes = Map(CUSTOM_RES_1 -> "m") + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes, createAResource) +} +thrown.getMessage should equal (getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "m")) + } + + test("resource type with null value should not be allowed") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List()) + +val resourceTypes = Map(CUSTOM_RES_1 -> "123") + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes, null) +} +thrown.getMessage should equal ("requirement failed: Resource parameter should not be null!") + } + + test("resource type with valid value and invalid unit") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1)) + +val resourceTypes = Map(CUSTOM_RES_1 -> "123ppp") +val resource = createAResource + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes,
[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22684#discussion_r224178237 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala --- @@ -138,39 +138,75 @@ private[sql] object OrcFilters { dataTypeMap: Map[String, DataType], expression: Filter, builder: Builder): Option[Builder] = { +createBuilder(dataTypeMap, expression, builder, canPartialPushDownConjuncts = true) + } + + /** + * @param dataTypeMap a map from the attribute name to its data type. + * @param expression the input filter predicates. + * @param builder the input SearchArgument.Builder. + * @param canPartialPushDownConjuncts whether a subset of conjuncts of predicates can be pushed + *down safely. Pushing ONLY one side of AND down is safe to + *do at the top level or none of its ancestors is NOT and OR. + * @return the builder so far. + */ + private def createBuilder( + dataTypeMap: Map[String, DataType], + expression: Filter, + builder: Builder, + canPartialPushDownConjuncts: Boolean): Option[Builder] = { def getType(attribute: String): PredicateLeaf.Type = getPredicateLeafType(dataTypeMap(attribute)) import org.apache.spark.sql.sources._ expression match { case And(left, right) => -// At here, it is not safe to just convert one side if we do not understand the -// other side. Here is an example used to explain the reason. +// At here, it is not safe to just convert one side and remove the other side +// if we do not understand what the parent filters are. +// +// Here is an example used to explain the reason. // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to // convert b in ('1'). If we only convert a = 2, we will end up with a filter // NOT(a = 2), which will generate wrong results. -// Pushing one side of AND down is only safe to do at the top level. -// You can see ParquetRelation's initializeLocalJobFunc method as an example. -for { - _ <- buildSearchArgument(dataTypeMap, left, newBuilder) - _ <- buildSearchArgument(dataTypeMap, right, newBuilder) - lhs <- buildSearchArgument(dataTypeMap, left, builder.startAnd()) - rhs <- buildSearchArgument(dataTypeMap, right, lhs) -} yield rhs.end() +// +// Pushing one side of AND down is only safe to do at the top level or in the child +// AND before hitting NOT or OR conditions, and in this case, the unsupported predicate +// can be safely removed. 
+val leftBuilderOption = createBuilder(dataTypeMap, left, + newBuilder, canPartialPushDownConjuncts) +val rightBuilderOption = + createBuilder(dataTypeMap, right, newBuilder, canPartialPushDownConjuncts) +(leftBuilderOption, rightBuilderOption) match { + case (Some(_), Some(_)) => +for { + lhs <- createBuilder(dataTypeMap, left, +builder.startAnd(), canPartialPushDownConjuncts) + rhs <- createBuilder(dataTypeMap, right, lhs, canPartialPushDownConjuncts) +} yield rhs.end() + + case (Some(_), None) if canPartialPushDownConjuncts => +createBuilder(dataTypeMap, left, builder, canPartialPushDownConjuncts) + + case (None, Some(_)) if canPartialPushDownConjuncts => +createBuilder(dataTypeMap, right, builder, canPartialPushDownConjuncts) + + case _ => None +} case Or(left, right) => for { - _ <- buildSearchArgument(dataTypeMap, left, newBuilder) - _ <- buildSearchArgument(dataTypeMap, right, newBuilder) - lhs <- buildSearchArgument(dataTypeMap, left, builder.startOr()) - rhs <- buildSearchArgument(dataTypeMap, right, lhs) + _ <- createBuilder(dataTypeMap, left, newBuilder, canPartialPushDownConjuncts = false) + _ <- createBuilder(dataTypeMap, right, newBuilder, canPartialPushDownConjuncts = false) + lhs <- createBuilder(dataTypeMap, left, +builder.startOr(), canPartialPushDownConjuncts = false) + rhs <- createBuilder(dataTypeMap, right, lhs, canPartialPushDownConjuncts = false) } yield rhs.end() case Not(child) => for { - _ <- buildSearchArgument(dataTypeMap, child, newBuilder) - negate <-
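The rule under review is easier to see stripped of the ORC SearchArgument.Builder plumbing: an unsupported conjunct may only be dropped while no enclosing NOT or OR can re-interpret what remains. A simplified, hypothetical model of that rule (not the actual OrcFilters code):

```scala
// Toy predicate tree standing in for Spark's data source Filters.
sealed trait Pred
case class Leaf(repr: String, supported: Boolean) extends Pred
case class AndP(left: Pred, right: Pred) extends Pred
case class OrP(left: Pred, right: Pred) extends Pred
case class NotP(child: Pred) extends Pred

object PushDownSketch {
  // Returns the pushed-down predicate, or None if nothing can be pushed safely.
  def convert(p: Pred, canPartialPushDownConjuncts: Boolean): Option[String] = p match {
    case Leaf(repr, supported) =>
      if (supported) Some(repr) else None

    case AndP(l, r) =>
      (convert(l, canPartialPushDownConjuncts), convert(r, canPartialPushDownConjuncts)) match {
        case (Some(a), Some(b)) => Some(s"($a AND $b)")
        // Dropping one side is only safe while no ancestor is a NOT or an OR.
        case (Some(a), None) if canPartialPushDownConjuncts => Some(a)
        case (None, Some(b)) if canPartialPushDownConjuncts => Some(b)
        case _ => None
      }

    case OrP(l, r) =>
      for {
        a <- convert(l, canPartialPushDownConjuncts = false)
        b <- convert(r, canPartialPushDownConjuncts = false)
      } yield s"($a OR $b)"

    case NotP(c) =>
      convert(c, canPartialPushDownConjuncts = false).map(s => s"(NOT $s)")
  }
}

// NOT(a = 2 AND b IN ('1')) with `b IN ('1')` unsupported correctly yields None
// instead of the wrong filter NOT(a = 2).
```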
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user szyszy commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r224177875 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala --- @@ -0,0 +1,255 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import org.apache.hadoop.yarn.api.records.Resource +import org.apache.hadoop.yarn.util.Records +import org.scalatest.{BeforeAndAfterAll, Matchers} + +import org.apache.spark.{SparkConf, SparkException, SparkFunSuite} +import org.apache.spark.deploy.yarn.ResourceRequestTestHelper.ResourceInformation +import org.apache.spark.deploy.yarn.config.{AM_CORES, AM_MEMORY, DRIVER_CORES, EXECUTOR_CORES, YARN_AM_RESOURCE_TYPES_PREFIX, YARN_DRIVER_RESOURCE_TYPES_PREFIX, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX} +import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY} + +class ResourceRequestHelperSuite extends SparkFunSuite with Matchers with BeforeAndAfterAll { + + private val CUSTOM_RES_1 = "custom-resource-type-1" + private val CUSTOM_RES_2 = "custom-resource-type-2" + private val MEMORY = "memory" + private val CORES = "cores" + private val NEW_CONFIG_EXECUTOR_MEMORY = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_EXECUTOR_CORES = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + CORES + private val NEW_CONFIG_AM_MEMORY = YARN_AM_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_AM_CORES = YARN_AM_RESOURCE_TYPES_PREFIX + CORES + private val NEW_CONFIG_DRIVER_MEMORY = YARN_DRIVER_RESOURCE_TYPES_PREFIX + MEMORY + private val NEW_CONFIG_DRIVER_CORES = YARN_DRIVER_RESOURCE_TYPES_PREFIX + CORES + + override def beforeAll(): Unit = { +super.beforeAll() + } + + private def getExpectedUnmatchedErrorMessage(name: String, value: String): String = { +s"Resource request for '$name' ('$value') does not match pattern ([0-9]+)([A-Za-z]*)." + } + + test("resource type value does not match pattern") { +assume(ResourceRequestHelper.isYarnResourceTypesAvailable()) +ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1)) + +val resourceTypes = Map(CUSTOM_RES_1 -> "**@#") + +val thrown = intercept[IllegalArgumentException] { + ResourceRequestHelper.setResourceRequests(resourceTypes, createAResource) +} +thrown.getMessage should equal (getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "**@#")) + } + + test("resource type just unit defined") { --- End diff -- Good idea. Extracted the common parts of all test methods in this class. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
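For context, "extracting the common parts" of those tests typically means a helper along these lines — a hedged sketch, not necessarily the code that ended up in the PR:

```scala
// Hypothetical helper: every test above defines resource types, calls
// setResourceRequests, and asserts on the IllegalArgumentException message.
private def verifySetResourceRequestsException(
    definedResourceTypes: List[String],
    requestedResources: Map[String, String],
    expectedMessage: String): Unit = {
  assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
  ResourceRequestTestHelper.initializeResourceTypes(definedResourceTypes)
  val thrown = intercept[IllegalArgumentException] {
    ResourceRequestHelper.setResourceRequests(requestedResources, createAResource)
  }
  thrown.getMessage should include (expectedMessage)
}

// test("resource type value does not match pattern") {
//   verifySetResourceRequestsException(List(CUSTOM_RES_1), Map(CUSTOM_RES_1 -> "**@#"),
//     getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "**@#"))
// }
```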
[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21688#discussion_r224176616 --- Diff: core/src/main/resources/org/apache/spark/ui/static/stagespage-template.html --- @@ -0,0 +1,124 @@ + +
[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22657 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22657 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97201/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22657 **[Test build #97201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97201/testReport)** for PR 22657 at commit [`5f8bf79`](https://github.com/apache/spark/commit/5f8bf7956665f19d758bf4fb3b91927c77bd1691). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22684#discussion_r224174206 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala --- @@ -138,39 +138,75 @@ private[sql] object OrcFilters { dataTypeMap: Map[String, DataType], expression: Filter, builder: Builder): Option[Builder] = { +createBuilder(dataTypeMap, expression, builder, canPartialPushDownConjuncts = true) + } + + /** + * @param dataTypeMap a map from the attribute name to its data type. + * @param expression the input filter predicates. + * @param builder the input SearchArgument.Builder. + * @param canPartialPushDownConjuncts whether a subset of conjuncts of predicates can be pushed + *down safely. Pushing ONLY one side of AND down is safe to + *do at the top level or none of its ancestors is NOT and OR. + * @return the builder so far. + */ + private def createBuilder( + dataTypeMap: Map[String, DataType], + expression: Filter, + builder: Builder, + canPartialPushDownConjuncts: Boolean): Option[Builder] = { def getType(attribute: String): PredicateLeaf.Type = getPredicateLeafType(dataTypeMap(attribute)) import org.apache.spark.sql.sources._ expression match { case And(left, right) => -// At here, it is not safe to just convert one side if we do not understand the -// other side. Here is an example used to explain the reason. +// At here, it is not safe to just convert one side and remove the other side +// if we do not understand what the parent filters are. +// +// Here is an example used to explain the reason. // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to // convert b in ('1'). If we only convert a = 2, we will end up with a filter // NOT(a = 2), which will generate wrong results. -// Pushing one side of AND down is only safe to do at the top level. -// You can see ParquetRelation's initializeLocalJobFunc method as an example. -for { - _ <- buildSearchArgument(dataTypeMap, left, newBuilder) - _ <- buildSearchArgument(dataTypeMap, right, newBuilder) - lhs <- buildSearchArgument(dataTypeMap, left, builder.startAnd()) - rhs <- buildSearchArgument(dataTypeMap, right, lhs) -} yield rhs.end() +// +// Pushing one side of AND down is only safe to do at the top level or in the child +// AND before hitting NOT or OR conditions, and in this case, the unsupported predicate +// can be safely removed. +val leftBuilderOption = createBuilder(dataTypeMap, left, + newBuilder, canPartialPushDownConjuncts) +val rightBuilderOption = + createBuilder(dataTypeMap, right, newBuilder, canPartialPushDownConjuncts) --- End diff -- Can you make the format the same as `leftBuilderOption`? Also, add another empty line before `(leftBuilderOption, rightBuilderOption)`. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22678: [SPARK-25685][BUILD] Allow running tests in Jenki...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22678#discussion_r224172678 --- Diff: dev/run-tests-jenkins.py --- @@ -176,7 +177,8 @@ def main(): build_display_name = os.environ["BUILD_DISPLAY_NAME"] build_url = os.environ["BUILD_URL"] -commit_url = "https://github.com/apache/spark/commit/" + ghprb_actual_commit +project_url = os.getenv("SPARK_PROJECT_URL", "https://github.com/apache/spark") +commit_url = project_url + "/commit/" + ghprb_actual_commit --- End diff -- +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22288 **[Test build #97208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97208/testReport)** for PR 22288 at commit [`c361693`](https://github.com/apache/spark/commit/c361693a3d08a1bea1d2919f0a8e970c03959cc8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22288 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3856/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22288 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22615 Fine with me if the jenkins stuff is sorted out. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22615 I tried a release build that causes `--pip` and `--r` to be set, and the result looked OK. Both pyspark and R packages built and seemed normal. The source build worked too and comes before binary builds, so I don't think it can be affected. I will go ahead and merge this, I think. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22688 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22688 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3855/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...
Github user dhruve commented on a diff in the pull request: https://github.com/apache/spark/pull/22288#discussion_r224167756 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -415,9 +419,61 @@ private[spark] class TaskSchedulerImpl( launchedAnyTask |= launchedTaskAtCurrentMaxLocality } while (launchedTaskAtCurrentMaxLocality) } + if (!launchedAnyTask) { - taskSet.abortIfCompletelyBlacklisted(hostToExecutors) + taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors) match { +case taskIndex: Some[Int] => // Returns the taskIndex which was unschedulable + + // If the taskSet is unschedulable we try to find an existing idle blacklisted + // executor. If we cannot find one, we abort immediately. Else we kill the idle + // executor and kick off an abortTimer which if it doesn't schedule a task within the + // the timeout will abort the taskSet if we were unable to schedule any task from the + // taskSet. + // Note 1: We keep track of schedulability on a per taskSet basis rather than on a per + // task basis. + // Note 2: The taskSet can still be aborted when there are more than one idle + // blacklisted executors and dynamic allocation is on. This can happen when a killed + // idle executor isn't replaced in time by ExecutorAllocationManager as it relies on + // pending tasks and doesn't kill executors on idle timeouts, resulting in the abort + // timer to expire and abort the taskSet. + executorIdToRunningTaskIds.find(x => !isExecutorBusy(x._1)) match { +case Some (x) => + val executorId = x._1 + if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) { +blacklistTrackerOpt.foreach(blt => blt.killBlacklistedIdleExecutor(executorId)) + +unschedulableTaskSetToExpiryTime(taskSet) = clock.getTimeMillis() +val timeout = conf.get(config.UNSCHEDULABLE_TASKSET_TIMEOUT) * 1000 +logInfo(s"Waiting for $timeout ms for completely " + + s"blacklisted task to be schedulable again before aborting $taskSet.") +abortTimer.schedule(new TimerTask() { + override def run() { +if (unschedulableTaskSetToExpiryTime.contains(taskSet) && + (unschedulableTaskSetToExpiryTime(taskSet) + timeout) +<= clock.getTimeMillis() --- End diff -- it doesn't fit within the 100 char limit --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
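Aside from the line-length nit, the interesting part of the quoted diff is the abort-timer pattern: record when a task set becomes completely blacklisted, then abort it only if it is still unschedulable once the timeout elapses. A standalone, hypothetical sketch of that pattern (not the TaskSchedulerImpl code):

```scala
import java.util.{Timer, TimerTask}
import scala.collection.mutable

// Hypothetical sketch of "abort only if still unschedulable after a timeout".
class UnschedulableAbortSketch(timeoutMs: Long) {
  private val abortTimer = new Timer("abort-timer", true)
  // taskSetId -> time at which it was found completely blacklisted
  private val expiryTime = mutable.Map[String, Long]()

  def markUnschedulable(taskSetId: String)(abort: => Unit): Unit = synchronized {
    if (!expiryTime.contains(taskSetId)) {
      expiryTime(taskSetId) = System.currentTimeMillis()
      abortTimer.schedule(new TimerTask {
        override def run(): Unit = UnschedulableAbortSketch.this.synchronized {
          // Abort only if no task from this task set was scheduled in the meantime.
          if (expiryTime.get(taskSetId).exists(_ + timeoutMs <= System.currentTimeMillis())) {
            expiryTime.remove(taskSetId)
            abort
          }
        }
      }, timeoutMs)
    }
  }

  // Called as soon as any task from the task set launches successfully.
  def markSchedulable(taskSetId: String): Unit = synchronized {
    expiryTime.remove(taskSetId)
  }
}
```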
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22688 **[Test build #97207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97207/testReport)** for PR 22688 at commit [`9377bc3`](https://github.com/apache/spark/commit/9377bc35050408512c28f47ca0535b66c4dfcaf8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22686: [SPARK-25700][SQL] Partially revert append mode s...
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/22686 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21688#discussion_r224166110 --- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js --- @@ -0,0 +1,872 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +var blockUICount = 0; + +$(document).ajaxStop(function () { +if (blockUICount == 0) { +$.unblockUI(); +blockUICount++; +} +}); + +$(document).ajaxStart(function () { +if (blockUICount == 0) { +$.blockUI({message: 'Loading Stage Page...'}); +} +}); + +$.extend( $.fn.dataTable.ext.type.order, { +"duration-pre": ConvertDurationString, + +"duration-asc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? -1 : ((a > b) ? 1 : 0)); +}, + +"duration-desc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? 1 : ((a > b) ? -1 : 0)); +} +} ); + +// This function will only parse the URL under certain format +// e.g. (history) https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4=1 +// e.g. 
(proxy) https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4=1 +function stageEndPoint(appId) { +var urlRegex = /https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm; +var urlArray = urlRegex.exec(document.baseURI); +var ind = urlArray.indexOf("proxy"); +var queryString = document.baseURI.split('?'); +var words = document.baseURI.split('/'); +var stageId = queryString[1].split("&").filter(word => word.includes("id="))[0].split("=")[1]; +if (ind > 0) { +var appId = urlArray[2]; +var indexOfProxy = words.indexOf("proxy"); +var newBaseURI = words.slice(0, indexOfProxy + 2).join('/'); +return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + stageId; +} +ind = urlArray.indexOf("history"); +if (ind > 0) { +var appId = urlArray[2]; +var appAttemptId = urlArray[ind + 2]; +var indexOfHistory = words.indexOf("history"); +var newBaseURI = words.slice(0, indexOfHistory).join('/'); +if (isNaN(appAttemptId) || appAttemptId == "0") { +return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + stageId; +} else { +return newBaseURI + "/api/v1/applications/" + appId + "/" + appAttemptId + "/stages/" + stageId; +} +} +return location.origin + "/api/v1/applications/" + appId + "/stages/" + stageId; +} + +function getColumnNameForTaskMetricSummary(columnKey) { +switch(columnKey) { +case "executorRunTime": +return "Duration"; +break; + +case "jvmGcTime": +return "GC Time"; +break; + +case "gettingResultTime": +return "Getting Result Time"; +break; + +case "inputMetrics": +return "Input Size / Records"; +break; + +case "outputMetrics": +return "Output Size / Records"; +break; + +case "peakExecutionMemory": +return "Peak Execution Memory"; +break; + +case "resultSerializationTime": +return "Result Serialization Time"; +break; + +case "schedulerDelay": +return "Scheduler Delay"; +break; + +case "diskBytesSpilled": +return "Shuffle spill (disk)"; +break; + +case "memoryBytesSpilled": +return "Shuffle spill (memory)"; +break; + +
[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22684 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97199/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22684 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22684 **[Test build #97199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97199/testReport)** for PR 22684 at commit [`c2eb87a`](https://github.com/apache/spark/commit/c2eb87ac8310a7045eb55ba7851a38ebf0498300). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21688#discussion_r224164134 --- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js --- @@ -0,0 +1,872 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +var blockUICount = 0; + +$(document).ajaxStop(function () { +if (blockUICount == 0) { +$.unblockUI(); +blockUICount++; +} +}); + +$(document).ajaxStart(function () { +if (blockUICount == 0) { +$.blockUI({message: 'Loading Stage Page...'}); +} +}); + +$.extend( $.fn.dataTable.ext.type.order, { +"duration-pre": ConvertDurationString, + +"duration-asc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? -1 : ((a > b) ? 1 : 0)); +}, + +"duration-desc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? 1 : ((a > b) ? -1 : 0)); +} +} ); + +// This function will only parse the URL under certain format +// e.g. (history) https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4=1 +// e.g. (proxy) https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4=1 +function stageEndPoint(appId) { +var urlRegex = /https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm; +var urlArray = urlRegex.exec(document.baseURI); +var ind = urlArray.indexOf("proxy"); +var queryString = document.baseURI.split('?'); +var words = document.baseURI.split('/'); +var stageId = queryString[1].split("&").filter(word => word.includes("id="))[0].split("=")[1]; +if (ind > 0) { +var appId = urlArray[2]; +var indexOfProxy = words.indexOf("proxy"); +var newBaseURI = words.slice(0, indexOfProxy + 2).join('/'); +return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + stageId; +} +ind = urlArray.indexOf("history"); +if (ind > 0) { +var appId = urlArray[2]; +var appAttemptId = urlArray[ind + 2]; +var indexOfHistory = words.indexOf("history"); --- End diff -- similar here can't we get rid of one of the indexes and just use words to look up appId and appAttemptId? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21688#discussion_r224163278 --- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js --- @@ -0,0 +1,872 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +var blockUICount = 0; + +$(document).ajaxStop(function () { +if (blockUICount == 0) { +$.unblockUI(); +blockUICount++; +} +}); + +$(document).ajaxStart(function () { +if (blockUICount == 0) { +$.blockUI({message: 'Loading Stage Page...'}); +} +}); + +$.extend( $.fn.dataTable.ext.type.order, { +"duration-pre": ConvertDurationString, + +"duration-asc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? -1 : ((a > b) ? 1 : 0)); +}, + +"duration-desc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? 1 : ((a > b) ? -1 : 0)); +} +} ); + +// This function will only parse the URL under certain format +// e.g. (history) https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4=1 +// e.g. (proxy) https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4=1 +function stageEndPoint(appId) { +var urlRegex = /https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm; +var urlArray = urlRegex.exec(document.baseURI); +var ind = urlArray.indexOf("proxy"); +var queryString = document.baseURI.split('?'); +var words = document.baseURI.split('/'); +var stageId = queryString[1].split("&").filter(word => word.includes("id="))[0].split("=")[1]; +if (ind > 0) { +var appId = urlArray[2]; +var indexOfProxy = words.indexOf("proxy"); --- End diff -- we have index of proxy in 2 places can we get rid of one --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22593 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22593 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97205/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21688#discussion_r224162695 --- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js --- @@ -0,0 +1,872 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +var blockUICount = 0; + +$(document).ajaxStop(function () { +if (blockUICount == 0) { +$.unblockUI(); +blockUICount++; +} +}); + +$(document).ajaxStart(function () { +if (blockUICount == 0) { +$.blockUI({message: 'Loading Stage Page...'}); +} +}); + +$.extend( $.fn.dataTable.ext.type.order, { +"duration-pre": ConvertDurationString, + +"duration-asc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? -1 : ((a > b) ? 1 : 0)); +}, + +"duration-desc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? 1 : ((a > b) ? -1 : 0)); +} +} ); + +// This function will only parse the URL under certain format +// e.g. (history) https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4=1 +// e.g. (proxy) https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4=1 +function stageEndPoint(appId) { +var urlRegex = /https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm; +var urlArray = urlRegex.exec(document.baseURI); +var ind = urlArray.indexOf("proxy"); +var queryString = document.baseURI.split('?'); +var words = document.baseURI.split('/'); +var stageId = queryString[1].split("&").filter(word => word.includes("id="))[0].split("=")[1]; --- End diff -- all of these should be val not var --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22593 **[Test build #97205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97205/testReport)** for PR 22593 at commit [`0620ede`](https://github.com/apache/spark/commit/0620ede3e5aaebdbb954ee55ff3fbad6e1cc8fe2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21688#discussion_r224162463 --- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js --- @@ -0,0 +1,872 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +var blockUICount = 0; + +$(document).ajaxStop(function () { +if (blockUICount == 0) { +$.unblockUI(); +blockUICount++; +} +}); + +$(document).ajaxStart(function () { +if (blockUICount == 0) { +$.blockUI({message: 'Loading Stage Page...'}); +} +}); + +$.extend( $.fn.dataTable.ext.type.order, { +"duration-pre": ConvertDurationString, + +"duration-asc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? -1 : ((a > b) ? 1 : 0)); +}, + +"duration-desc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? 1 : ((a > b) ? -1 : 0)); +} +} ); + +// This function will only parse the URL under certain format +// e.g. (history) https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4=1 +// e.g. (proxy) https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4=1 +function stageEndPoint(appId) { +var urlRegex = /https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm; +var urlArray = urlRegex.exec(document.baseURI); +var ind = urlArray.indexOf("proxy"); --- End diff -- use val and name it proxyInd or something, use different variable for history --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21688#discussion_r224162371 --- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js --- @@ -0,0 +1,872 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +var blockUICount = 0; + +$(document).ajaxStop(function () { +if (blockUICount == 0) { +$.unblockUI(); +blockUICount++; +} +}); + +$(document).ajaxStart(function () { +if (blockUICount == 0) { +$.blockUI({message: 'Loading Stage Page...'}); +} +}); + +$.extend( $.fn.dataTable.ext.type.order, { +"duration-pre": ConvertDurationString, + +"duration-asc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? -1 : ((a > b) ? 1 : 0)); +}, + +"duration-desc": function ( a, b ) { +a = ConvertDurationString( a ); +b = ConvertDurationString( b ); +return ((a < b) ? 1 : ((a > b) ? -1 : 0)); +} +} ); + +// This function will only parse the URL under certain format +// e.g. (history) https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4=1 +// e.g. (proxy) https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4=1 +function stageEndPoint(appId) { +var urlRegex = /https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm; +var urlArray = urlRegex.exec(document.baseURI); --- End diff -- should be a val don't see it changed --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224156599 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesExecutorSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for bootstraping the container with ConfigMaps + * containing Hadoop config files mounted as volumes and an ENV variable + * pointed to the mounted file directory. + */ +private[spark] class HadoopConfExecutorFeatureStep( +kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + + override def configurePod(pod: SparkPod): SparkPod = { +val sparkConf = kubernetesConf.sparkConf +val hadoopConfDirCMapName = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME) +require(hadoopConfDirCMapName.isDefined, + "Ensure that the env `HADOOP_CONF_DIR` is defined either in the client or " + +" using pre-existing ConfigMaps") +logInfo("HADOOP_CONF_DIR defined") +HadoopBootstrapUtil.bootstrapHadoopConfDir(None, None, hadoopConfDirCMapName, pod) + } + + override def getAdditionalPodSystemProperties(): Map[String, String] = Map.empty --- End diff -- No need to address it here but it feels like these methods should have default implementations, given that lots of classes just don't do anything with them. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
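The refactoring hinted at above — default, no-op implementations on the feature step trait so that steps like this one only override configurePod — could look roughly like this (a hedged sketch; the actual KubernetesFeatureConfigStep trait and its signatures may differ):

```scala
import io.fabric8.kubernetes.api.model.HasMetadata

import org.apache.spark.deploy.k8s.SparkPod

// Hedged sketch: give the rarely-customised hooks empty defaults so that a
// step such as HadoopConfExecutorFeatureStep only has to implement configurePod.
private[spark] trait FeatureConfigStepWithDefaults {
  def configurePod(pod: SparkPod): SparkPod

  def getAdditionalPodSystemProperties(): Map[String, String] = Map.empty

  def getAdditionalKubernetesResources(): Seq[HasMetadata] = Seq.empty
}
```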
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224162093 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import io.fabric8.kubernetes.api.model.SecretBuilder +import org.apache.commons.codec.binary.Base64 + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager + + /** --- End diff -- indentation is wrong. Really, when I asked you to go through all your code, I meant *all* your code, not just the code I commented on. Please do that. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224156872 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import scala.collection.JavaConverters._ + +import com.google.common.base.Charsets +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata} + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. + */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { --- End diff -- Ping. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224161045 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.JavaConverters._ + +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod +import org.apache.spark.internal.Logging + +private[spark] object HadoopBootstrapUtil extends Logging { + + /** --- End diff -- indentation is wrong --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224161661 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.JavaConverters._ + +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod +import org.apache.spark.internal.Logging + +private[spark] object HadoopBootstrapUtil extends Logging { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param fileLocation Optional Location of the krb5 file +* @param newKrb5ConfName Optional location of the ConfigMap for Krb5 +* @param existingKrb5ConfName Optional name of ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( + dtSecretName: String, + dtSecretItemKey: String, + userName: String, + fileLocation: Option[String], + newKrb5ConfName: Option[String], + existingKrb5ConfName: Option[String], + pod: SparkPod): SparkPod = { + +val preConfigMapVolume = existingKrb5ConfName.map { kconf => + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() + .withName(kconf) + .endConfigMap() +.build() +} + +val createConfigMapVolume = for { + fLocation <- fileLocation + krb5ConfName <- newKrb5ConfName +} yield { + val krb5File = new File(fLocation) + val fileStringPath = krb5File.toPath.getFileName.toString + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() +.withName(krb5ConfName) +.withItems(new KeyToPathBuilder() + .withKey(fileStringPath) + .withPath(fileStringPath) + .build()) +.endConfigMap() +.build() +} + +// Breaking up Volume creation for clarity +val configMapVolume = preConfigMapVolume.orElse(createConfigMapVolume) +if (configMapVolume.isEmpty) { + logInfo("You have not specified a krb5.conf file locally or via a ConfigMap. 
" + + "Make sure that you have the krb5.conf locally on the Driver and Executor images") +} + +val kerberizedPodWithDTSecret = new PodBuilder(pod.pod) + .editOrNewSpec() +.addNewVolume() + .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) + .withNewSecret() +.withSecretName(dtSecretName) +.endSecret() + .endVolume() +.endSpec() + .build() + +// Optionally add the krb5.conf ConfigMap +val kerberizedPod = configMapVolume.map { cmVolume => + new PodBuilder(kerberizedPodWithDTSecret) +.editSpec() + .addNewVolumeLike(cmVolume) +.endVolume() + .endSpec() +.build() +}.getOrElse(kerberizedPodWithDTSecret) + +val kerberizedContainerWithMounts = new ContainerBuilder(pod.container) + .addNewVolumeMount() +.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) +.withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR) +.endVolumeMount() + .addNewEnv() +
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224160540 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, SparkPod} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils._ +import org.apache.spark.internal.Logging + + /** + * Runs the necessary Hadoop-based logic based on Kerberos configs and the presence of the + * HADOOP_CONF_DIR. This runs various bootstrap methods defined in HadoopBootstrapUtil. + */ +private[spark] class KerberosConfDriverFeatureStep( +kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + + require(kubernetesConf.hadoopConfSpec.isDefined, + "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap") + private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get + private val conf = kubernetesConf.sparkConf + private val principal = conf.get(org.apache.spark.internal.config.PRINCIPAL) + private val keytab = conf.get(org.apache.spark.internal.config.KEYTAB) + private val existingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val existingSecretItemKey = conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val krb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE) + private val krb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP) + private val kubeTokenManager = kubernetesConf.tokenManager(conf, +SparkHadoopUtil.get.newConfiguration(conf)) + private val isKerberosEnabled = +(hadoopConfDirSpec.hadoopConfDir.isDefined && kubeTokenManager.isSecurityEnabled) || + (hadoopConfDirSpec.hadoopConfigMapName.isDefined && +(krb5File.isDefined || krb5CMap.isDefined)) + require(keytab.isEmpty || isKerberosEnabled, +"You must enable Kerberos support if you are specifying a Kerberos Keytab") + + require(existingSecretName.isEmpty || isKerberosEnabled, +"You must enable Kerberos support if you are specifying a Kerberos Secret") + + KubernetesUtils.requireNandDefined( +krb5File, +krb5CMap, +"Do not specify both a Krb5 local file and the ConfigMap as the creation " + + "of an additional ConfigMap, when one is already specified, is extraneous") + + KubernetesUtils.requireBothOrNeitherDefined( +keytab, +principal, +"If a Kerberos 
principal is specified you must also specify a Kerberos keytab", +"If a Kerberos keytab is specified you must also specify a Kerberos principal") + + KubernetesUtils.requireBothOrNeitherDefined( +existingSecretName, +existingSecretItemKey, +"If a secret data item-key where the data of the Kerberos Delegation Token is specified" + + " you must also specify the name of the secret", +"If a secret storing a Kerberos Delegation Token is specified you must also" + + " specify the item-key where the data is stored") + + private val hadoopConfigurationFiles = hadoopConfDirSpec.hadoopConfDir.map { hConfDir => +HadoopBootstrapUtil.getHadoopConfFiles(hConfDir) + } + private val newHadoopConfigMapName = +if (hadoopConfDirSpec.hadoopConfigMapName.isEmpty) { + Some(kubernetesConf.hadoopConfigMapName) +} else { + None +} + + // Either use pre-existing secret or login to create new Secret with DT stored
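The requireNandDefined / requireBothOrNeitherDefined calls above are small option-validation helpers; a hedged sketch of what they do (the real KubernetesUtils signatures may differ):

```scala
// Hedged sketch of the option-validation helpers used in the quoted step.
object OptionRequirementsSketch {

  // At most one of the two options may be defined ("not both").
  def requireNandDefined(opt1: Option[_], opt2: Option[_], errMessage: String): Unit = {
    opt1.foreach { _ => require(opt2.isEmpty, errMessage) }
  }

  // Either both options are defined or neither is.
  def requireBothOrNeitherDefined(
      opt1: Option[_],
      opt2: Option[_],
      errMessageWhenSecondIsMissing: String,
      errMessageWhenFirstIsMissing: String): Unit = {
    opt1.foreach { _ => require(opt2.isDefined, errMessageWhenSecondIsMissing) }
    opt2.foreach { _ => require(opt1.isDefined, errMessageWhenFirstIsMissing) }
  }
}

// e.g. requireBothOrNeitherDefined(keytab, principal,
//   "If a Kerberos keytab is specified you must also specify a Kerberos principal",
//   "If a Kerberos principal is specified you must also specify a Kerberos keytab")
```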
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224160471 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala ---
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224160090 --- Diff: docs/security.md --- @@ -722,7 +722,83 @@ with encryption, at least. The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported will be created. +## Secure Interaction with Kubernetes + +When talking to Hadoop-based services behind Kerberos, it was noted that Spark needs to obtain delegation tokens +so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are +shared by the Driver and its Executors. As such, there are three ways of submitting a Kerberos job: + +In all cases you must define the environment variable: `HADOOP_CONF_DIR` or +`spark.kubernetes.hadoop.configMapName` as well as either +`spark.kubernetes.kerberos.krb5.path` or `spark.kubernetes.kerberos.krb5.configMapName`. --- End diff -- This needs a small update after the recent changes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
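To make the requirement described in the quoted docs concrete, here is a hedged illustration of setting those properties from a Scala application. The ConfigMap names are placeholders, not values from this PR, and `HADOOP_CONF_DIR` can alternatively be supplied as an environment variable as the docs state.

```scala
import org.apache.spark.SparkConf

// Placeholder ConfigMap names ("hadoop-conf-map", "krb5-conf-map"); only the config
// keys themselves come from the quoted documentation.
val conf = new SparkConf()
  .set("spark.kubernetes.hadoop.configMapName", "hadoop-conf-map")
  .set("spark.kubernetes.kerberos.krb5.configMapName", "krb5-conf-map")
```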
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224161587 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala --- @@ -0,0 +1,283 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadooputils + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.JavaConverters._ + +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.SparkPod +import org.apache.spark.internal.Logging + +private[spark] object HadoopBootstrapUtil extends Logging { + + /** +* Mounting the DT secret for both the Driver and the executors +* +* @param dtSecretName Name of the secret that stores the Delegation Token +* @param dtSecretItemKey Name of the Item Key storing the Delegation Token +* @param userName Name of the SparkUser to set SPARK_USER +* @param fileLocation Optional Location of the krb5 file +* @param newKrb5ConfName Optional location of the ConfigMap for Krb5 +* @param existingKrb5ConfName Optional name of ConfigMap for Krb5 +* @param pod Input pod to be appended to +* @return a modified SparkPod +*/ + def bootstrapKerberosPod( + dtSecretName: String, + dtSecretItemKey: String, + userName: String, + fileLocation: Option[String], + newKrb5ConfName: Option[String], + existingKrb5ConfName: Option[String], + pod: SparkPod): SparkPod = { + +val preConfigMapVolume = existingKrb5ConfName.map { kconf => + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() + .withName(kconf) + .endConfigMap() +.build() +} + +val createConfigMapVolume = for { + fLocation <- fileLocation + krb5ConfName <- newKrb5ConfName +} yield { + val krb5File = new File(fLocation) + val fileStringPath = krb5File.toPath.getFileName.toString + new VolumeBuilder() +.withName(KRB_FILE_VOLUME) +.withNewConfigMap() +.withName(krb5ConfName) +.withItems(new KeyToPathBuilder() + .withKey(fileStringPath) + .withPath(fileStringPath) + .build()) +.endConfigMap() +.build() +} + +// Breaking up Volume creation for clarity +val configMapVolume = preConfigMapVolume.orElse(createConfigMapVolume) +if (configMapVolume.isEmpty) { + logInfo("You have not specified a krb5.conf file locally or via a ConfigMap. 
" + + "Make sure that you have the krb5.conf locally on the Driver and Executor images") +} + +val kerberizedPodWithDTSecret = new PodBuilder(pod.pod) + .editOrNewSpec() +.addNewVolume() + .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) + .withNewSecret() +.withSecretName(dtSecretName) +.endSecret() + .endVolume() +.endSpec() + .build() + +// Optionally add the krb5.conf ConfigMap +val kerberizedPod = configMapVolume.map { cmVolume => + new PodBuilder(kerberizedPodWithDTSecret) +.editSpec() + .addNewVolumeLike(cmVolume) +.endVolume() + .endSpec() +.build() +}.getOrElse(kerberizedPodWithDTSecret) + +val kerberizedContainerWithMounts = new ContainerBuilder(pod.container) + .addNewVolumeMount() +.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME) +.withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR) +.endVolumeMount() + .addNewEnv() +
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224156790 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopSparkUserExecutorFeatureStep.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features + +import io.fabric8.kubernetes.api.model.HasMetadata + +import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod} +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.KubernetesExecutorSpecificConf +import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil +import org.apache.spark.internal.Logging + + /** + * This step is responsible for setting ENV_SPARK_USER when HADOOP_FILES are detected + * however, this step would not be run if Kerberos is enabled, as Kerberos sets SPARK_USER + */ +private[spark] class HadoopSparkUserExecutorFeatureStep( +kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]) +extends KubernetesFeatureConfigStep with Logging { + + override def configurePod(pod: SparkPod): SparkPod = { +val sparkUserName = kubernetesConf.sparkConf.get(KERBEROS_SPARK_USER_NAME) + HadoopBootstrapUtil.bootstrapSparkUserPod(sparkUserName, pod) --- End diff -- indentation --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
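Presumably the comment refers to the over-indented call inside `configurePod`; the intended shape is likely along these lines (a re-indented fragment of the class in the quoted diff, behavior unchanged):

```scala
// Fragment of HadoopSparkUserExecutorFeatureStep, re-indented per the review comment.
override def configurePod(pod: SparkPod): SparkPod = {
  val sparkUserName = kubernetesConf.sparkConf.get(KERBEROS_SPARK_USER_NAME)
  HadoopBootstrapUtil.bootstrapSparkUserPod(sparkUserName, pod)
}
```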
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r224160567 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala ---
[GitHub] spark issue #22686: [SPARK-25700][SQL] Partially revert append mode support ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22686 Closing this per https://github.com/apache/spark/pull/22686 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22595: [SPARK-25577][Web UI] Add an on-off switch to display th...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22595 I read the original PR https://github.com/apache/spark/pull/14617, and it is weird that the two columns are hidden from the very beginning. Also, the screenshot in the PR description doesn't match the code changes. Ping @jerryshao --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22688 +1 to deal with this as a non-blocker. I understand data source v2 is under heavy development and unstable, but I strongly think we should backport this .. it breaks a basic operation .. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22688#discussion_r224153758 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala --- @@ -190,12 +190,13 @@ class DataSourceV2Suite extends QueryTest with SharedSQLContext { test("simple writable data source") { // TODO: java implementation. +val writeOnlySource = classOf[SimpleWriteOnlyDataSource] --- End diff -- can we create a new test case? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
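If a separate test is preferred, a hedged sketch of what it could look like is below. `SimpleWriteOnlyDataSource` comes from the quoted diff; `withTempPath`, `spark`, and the symbol-based column syntax are assumed to be the suite's existing helpers, and the exact test name and assertions are placeholders.

```scala
test("SPARK-25700: write-only source in append mode should not require ReadSupport") {
  withTempPath { file =>
    val cls = classOf[SimpleWriteOnlyDataSource]
    // Appending through a source that only implements the write path should succeed
    // without Spark trying to instantiate a ReadSupport for it.
    spark.range(10).select('id as 'i, -'id as 'j)
      .write.format(cls.getName)
      .option("path", file.getCanonicalPath)
      .mode("append")
      .save()
  }
}
```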
[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22688 ah good point. I think the original design of the append operator assumes the table already exists, so a schema should be provided. If we treat a file path as a table, then append should fail in your case because the path does not exist, and we should use CTAS instead. cc @rdblue for confirmation. That said, the change here LGTM. We should only get the relation for append mode. Furthermore, I think in the future we can't simply proxy the old `SaveMode` write APIs to the new write APIs, as the behavior can differ: e.g. currently we can write data to a non-existing path in append mode for file sources, but the append operator cannot. I'm not sure this should block 2.4. The data source v2 API is unstable, so breaking changes are allowed, and we won't treat data source v2 bugs as blockers. We should merge this PR to 2.4, but it's not strong enough to fail an RC. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
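As a concrete illustration of the behavioral difference described above, the existing `SaveMode`-based API lets a file source append to a path that does not exist yet, whereas a logical Append against a v2 table would require the table to already exist. A minimal example of the current behavior, assuming a `spark` session is in scope and using a placeholder path:

```scala
import org.apache.spark.sql.SaveMode

// The directory is created on first write; a v2 Append operator that assumes the
// table already exists would reject this same call.
spark.range(5).write.mode(SaveMode.Append).parquet("/tmp/append-target")
```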
[GitHub] spark pull request #22657: [SPARK-25670][TEST] Reduce number of tested timez...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22657#discussion_r224151216 --- Diff: core/src/test/scala/org/apache/spark/SparkFunSuite.scala --- @@ -106,4 +107,14 @@ abstract class SparkFunSuite } } + lazy val outstandingTimezones = Seq( --- End diff -- Is this lazy so that it's not evaluated by each test suite? I see, although `TimeZone` appears to cache these anyway, so it won't really matter either way. I'm neutral on it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
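For context, a hedged sketch of what such a shared, lazily evaluated timezone list might look like in `SparkFunSuite`; the concrete zone IDs here are placeholders rather than the ones in the patch:

```scala
import java.util.TimeZone

// Evaluated at most once per suite instance that actually uses it, so suites that
// never touch timezone-dependent helpers pay nothing.
lazy val outstandingTimezones: Seq[TimeZone] = Seq(
    "UTC",
    "PST",
    "CET",
    "Africa/Dakar",
    "America/Los_Angeles",
    "Asia/Hong_Kong",
    "Europe/Amsterdam")
  .map(TimeZone.getTimeZone)
```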
[GitHub] spark pull request #22623: [SPARK-25636][CORE] spark-submit cuts off the fai...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22623 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22623: [SPARK-25636][CORE] spark-submit cuts off the failure re...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22623 Merging to master / 2.4. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22593 Although it's the Scala API, it's callable from Java just as well; there's no Java-specific API here. So, yeah, it actually makes sense to have both javadoc and scaladoc for this. And I think there are probably many markdown-like constructs that work in scaladoc but not in javadoc. While you're welcome to fix all of them, that may be way too much. OK, I'd say go ahead, but at least fix all the instances of bullet lists in the .sql package. It's easy enough to search for something like `\*\s+-` as a regex to spot instances. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
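To make the scaladoc/javadoc mismatch concrete, here is a hedged example. Scaladoc renders the `-` lines as a bullet list, while javadoc treats them as flat text unless explicit HTML list tags are used; the method and option names are placeholders for illustration only.

```scala
/**
 * Scaladoc renders the lines below as a bullet list; javadoc does not:
 *  - `append`: add new rows to the existing data.
 *  - `complete`: rewrite all rows on every trigger.
 *
 * A javadoc-friendly alternative uses HTML list tags instead:
 * <ul>
 *   <li>append: add new rows to the existing data.</li>
 *   <li>complete: rewrite all rows on every trigger.</li>
 * </ul>
 */
def outputMode(mode: String): Unit = ()  // stub method, for illustration only
```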
[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22547 **[Test build #97206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97206/testReport)** for PR 22547 at commit [`a35d98c`](https://github.com/apache/spark/commit/a35d98cc676c9b3c3b39a45d4b03b43dfe9d0767). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22547 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3854/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22594: [SPARK-25674][SQL] If the records are incremented...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22594#discussion_r224146203 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala --- @@ -70,6 +70,8 @@ class FileScanRDD( private val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles private val ignoreMissingFiles = sparkSession.sessionState.conf.ignoreMissingFiles + // only for test + private val inputMetricsTest = sparkSession.sessionState.conf.contains("spark.inputmetrics.test") --- End diff -- Ah, you know, we have `spark.testing` for this purpose too. I'm actually on the fence about whether it's worth the extra complexity for the test. It's a simple change, and the overall effect is covered by other tests of input metrics. Hm, what do you think: just drop this? Is it necessary for the test to work correctly? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
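For reference, a hedged sketch of how the existing `spark.testing` switch is typically checked; whether the patch ends up using this mechanism, or any test-only gate at all, is a separate question.

```scala
// `spark.testing` is set as a JVM system property by Spark's own test harness, so
// test-only behavior can be gated on it instead of introducing a new ad-hoc config key.
val isTesting: Boolean = sys.props.contains("spark.testing")

if (isTesting) {
  // e.g. update input metrics on every record so a test can observe incremental values
}
```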