[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-10 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/22615
  
I haven't had a chance to do any of the Jenkins work yet. Between being 
sidetracked by the conversation about moving the configs to the Spark repo, 
planning for our big event that starts tomorrow, and meetings all day today, 
I won't be able to start until early next week.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97207/
Test FAILed.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22688
  
**[Test build #97207 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97207/testReport)**
 for PR 22688 at commit 
[`9377bc3`](https://github.com/apache/spark/commit/9377bc35050408512c28f47ca0535b66c4dfcaf8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SchemaReadAttemptException(m: String) extends RuntimeException(m)`


---




[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20761
  
**[Test build #97213 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97213/testReport)**
 for PR 20761 at commit 
[`55c7be9`](https://github.com/apache/spark/commit/55c7be938a6180f0091d6d57db94521d9f8c0abb).


---




[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...

2018-10-10 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22689
  
Hi @srowen. Yes, event logs are available for running apps, but with the 
".inprogress" extension. 
We can open the web UI from the history server for both running and finished 
applications. 
This works for all compression codecs (lz4, snappy, lzf) as well as for 
uncompressed event logs.
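
For reference, a minimal sketch of the configuration being described (these are 
standard Spark settings; the codec value can be lz4, lzf, snappy, or zstd):

    import org.apache.spark.SparkConf

    // Event log compression setup discussed above; "zstd" is the codec that
    // triggers the issue in SPARK-25697, the other codecs behave the same way.
    val conf = new SparkConf()
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.compress", "true")
      .set("spark.io.compression.codec", "zstd")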


---




[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20761
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20761
  
**[Test build #97212 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97212/testReport)**
 for PR 20761 at commit 
[`7e7a55a`](https://github.com/apache/spark/commit/7e7a55a745ce41b77a5b5811cf72ae1721d713fa).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20761
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97212/
Test FAILed.


---




[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20761
  
**[Test build #97212 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97212/testReport)**
 for PR 20761 at commit 
[`7e7a55a`](https://github.com/apache/spark/commit/7e7a55a745ce41b77a5b5811cf72ae1721d713fa).


---




[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22504
  
**[Test build #97211 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97211/testReport)**
 for PR 22504 at commit 
[`2bbf137`](https://github.com/apache/spark/commit/2bbf1370ab72a0183cc3ccdc04851b2b6dadf5dc).


---




[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...

2018-10-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22689
  
Should the event log be available for running apps? Or, if it's not going to 
work, disable it where it can't be shown, though I suppose that could be 
difficult. Does this just silently send you back to the jobs page?


---




[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22657
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22657
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97202/
Test PASSed.


---




[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22657
  
**[Test build #97202 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97202/testReport)**
 for PR 22657 at commit 
[`778629d`](https://github.com/apache/spark/commit/778629dbca572d8a8099ad2893f78d16d30920e8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread szyszy
Github user szyszy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224195358
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ---
@@ -288,9 +296,14 @@ private[yarn] class YarnAllocator(
   s"executorsStarting: ${numExecutorsStarting.get}")
 
 if (missing > 0) {
-  logInfo(s"Will request $missing executor container(s), each with " +
-s"${resource.getVirtualCores} core(s) and " +
-s"${resource.getMemory} MB memory (including $memoryOverhead MB of 
overhead)")
+  var requestContainerMessage = s"Will request $missing executor 
container(s), each with " +
--- End diff --

Fair enough, added a condition to check whether it's enabled.
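
A small, self-contained sketch of the conditional message construction being 
described (the values and the custom-resource check are placeholders, not the 
PR's exact code):

    val missing = 3
    val cores = 4
    val memoryMb = 4096
    val overheadMb = 384
    val customResources = Map("yarn.io/gpu" -> "2") // empty when resource types are disabled

    var requestContainerMessage = s"Will request $missing executor container(s), each with " +
      s"$cores core(s) and $memoryMb MB memory (including $overheadMb MB of overhead)"
    if (customResources.nonEmpty) {
      requestContainerMessage += s" and custom resources: $customResources"
    }
    println(requestContainerMessage)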


---




[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22689
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22689
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...

2018-10-10 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22689
  
cc @vanzin @srowen. Kindly review.


---




[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22689
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread szyszy
Github user szyszy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224194497
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
 ---
@@ -199,6 +200,37 @@ class ClientSuite extends SparkFunSuite with Matchers {
 appContext.getMaxAppAttempts should be (42)
   }
 
+  test("resource request for invalid resource") {
--- End diff --

Okay, I agree with that. Then I think just removal is fine here.


---




[GitHub] spark pull request #22689: [SPARK-25697][CORE]When zstd compression enabled ...

2018-10-10 Thread shahidki31
GitHub user shahidki31 opened a pull request:

https://github.com/apache/spark/pull/22689

[SPARK-25697][CORE] When zstd compression is enabled, an in-progress application 
throws an error in the history web UI…

## What changes were proposed in this pull request?
When we enable event log compression with the compression codec 'zstd', we are 
unable to open the web UI of a running application from the history server 
page. 
The reason is that the replay listener cannot read from the zstd-compressed 
event log because the zstd frame is not finished yet. This causes a truncation 
error while reading the event log. 

So, when we try to open the web UI from the history server page, it throws a 
"truncated" error, and we are never able to open a running application in the 
web UI when zstd compression is enabled.

In this PR, when the IO exception happens and the application is still running, 
we log the error 
"Failed to read Spark event log: eventLogDirAppName.inprogress" instead of 
throwing the exception.
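
As a minimal sketch of that behaviour (illustrative only, not the actual 
FsHistoryProvider/ReplayListenerBus code; names are placeholders):

    import java.io.{IOException, InputStream}
    import scala.io.Source

    // Replay events line by line; for an in-progress log, an IOException caused by
    // an unfinished zstd frame is logged and replay stops, instead of failing the UI.
    def replayEvents(in: InputStream, logPath: String, isInProgress: Boolean)
        (handle: String => Unit): Unit = {
      try {
        Source.fromInputStream(in).getLines().foreach(handle)
      } catch {
        case e: IOException if isInProgress =>
          println(s"Failed to read Spark event log: $logPath (${e.getMessage})")
        case e: IOException =>
          throw e
      }
    }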

## How was this patch tested?
Test steps:
1) spark.eventLog.compress = true
2) spark.io.compression.codec = zstd
3) Restart the history server
4) Launch bin/spark-shell
5) Run some queries
6) Open the history server page
7) Click on the application
Before fix:
![screenshot from 2018-10-10 
23-52-12](https://user-images.githubusercontent.com/23054875/46757387-9b4fa580-cce7-11e8-96ad-8938400483ed.png)

![screenshot from 2018-10-10 
23-52-28](https://user-images.githubusercontent.com/23054875/46757393-a0145980-cce7-11e8-8cb0-44b583dde648.png)

After fix:

![screenshot from 2018-10-10 
23-43-49](https://user-images.githubusercontent.com/23054875/46756971-6858e200-cce6-11e8-946c-0bffebb2cfba.png)

![screenshot from 2018-10-10 
23-44-05](https://user-images.githubusercontent.com/23054875/46756981-6d1d9600-cce6-11e8-95ea-ff8339a2fdfd.png)

   



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shahidki31/spark SPARK-25697

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22689.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22689


commit f6e971f37d39642966689201868420da2441ffa0
Author: Shahid 
Date:   2018-10-10T17:50:48Z

When zstd compression enabled in progress application is throwing Error in 
UI




---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread abellina
Github user abellina commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224194007
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala ---
@@ -102,4 +103,124 @@ private[v1] class StagesResource extends 
BaseAppResource {
 withUI(_.store.taskList(stageId, stageAttemptId, offset, length, 
sortBy))
   }
 
+  // This api needs to stay formatted exactly as it is below, since, it is 
being used by the
+  // datatables for the stages page.
+  @GET
+  @Path("{stageId: \\d+}/{stageAttemptId: \\d+}/taskTable")
+  def taskTable(
+@PathParam("stageId") stageId: Int,
+@PathParam("stageAttemptId") stageAttemptId: Int,
+@QueryParam("details") @DefaultValue("true") details: Boolean,
+@Context uriInfo: UriInfo):
+  HashMap[String, Object] = {
+withUI { ui =>
+  val uriQueryParameters = uriInfo.getQueryParameters(true)
+  val totalRecords = uriQueryParameters.getFirst("numTasks")
+  var isSearch = false
+  var searchValue: String = null
+  var filteredRecords = totalRecords
+  var _tasksToShow: Seq[TaskData] = null
+  if (uriQueryParameters.getFirst("search[value]") != null &&
+uriQueryParameters.getFirst("search[value]").length > 0) {
+_tasksToShow = doPagination(uriQueryParameters, stageId, 
stageAttemptId, true,
+  totalRecords.toInt)
+isSearch = true
+searchValue = uriQueryParameters.getFirst("search[value]")
+  } else {
+_tasksToShow = doPagination(uriQueryParameters, stageId, 
stageAttemptId, false,
+  totalRecords.toInt)
+  }
+  val ret = new HashMap[String, Object]()
+  if (_tasksToShow.nonEmpty) {
+// Performs server-side search based on input from user
+if (isSearch) {
+  val filteredTaskList = filterTaskList(_tasksToShow, searchValue)
+  filteredRecords = filteredTaskList.length.toString
+  if (filteredTaskList.length > 0) {
+val pageStartIndex = uriQueryParameters.getFirst("start").toInt
+val pageLength = uriQueryParameters.getFirst("length").toInt
+ret.put("aaData", filteredTaskList.slice(pageStartIndex, 
pageStartIndex + pageLength))
+  } else {
+ret.put("aaData", filteredTaskList)
+  }
+} else {
+  ret.put("aaData", _tasksToShow)
+}
+  } else {
+ret.put("aaData", _tasksToShow)
+  }
+  ret.put("recordsTotal", totalRecords)
+  ret.put("recordsFiltered", filteredRecords)
+  ret
+}
+  }
+
+  // Performs pagination on the server side
+  def doPagination(queryParameters: MultivaluedMap[String, String], 
stageId: Int,
+stageAttemptId: Int, isSearch: Boolean, totalRecords: Int): 
Seq[TaskData] = {
+val queryParams = queryParameters.keySet()
+var columnToSort = 0
+if (queryParams.contains("order[0][column]")) {
+  columnToSort = queryParameters.getFirst("order[0][column]").toInt
+}
+var columnNameToSort = queryParameters.getFirst("columns[" + 
columnToSort + "][name]")
+if (columnNameToSort.equalsIgnoreCase("Logs")) {
+  columnNameToSort = "Index"
+  columnToSort = 0
+}
+val isAscendingStr = queryParameters.getFirst("order[0][dir]")
+var pageStartIndex = 0
+var pageLength = totalRecords
+if (!isSearch) {
+  pageStartIndex = queryParameters.getFirst("start").toInt
+  pageLength = queryParameters.getFirst("length").toInt
+}
+return withUI(_.store.taskList(stageId, stageAttemptId, 
pageStartIndex, pageLength,
+  indexName(columnNameToSort), isAscendingStr.equalsIgnoreCase("asc")))
+  }
+
+  // Filters task list based on search parameter
+  def filterTaskList(
+taskDataList: Seq[TaskData],
+searchValue: String): Seq[TaskData] = {
+val defaultOptionString: String = "d"
+// The task metrics dummy object below has been added to avoid 
throwing exception in cases
+// when task metrics for a particular task do not exist as of yet
+val dummyTaskMetrics: TaskMetrics = new TaskMetrics(0, 0, 0, 0, 0, 0, 
0, 0, 0, 0,
+  new InputMetrics(0, 0), new OutputMetrics(0, 0),
+  new ShuffleReadMetrics(0, 0, 0, 0, 0, 0, 0), new 
ShuffleWriteMetrics(0, 0, 0))
+val searchValueLowerCase = searchValue.toLowerCase(Locale.ROOT)
+val containsValue = (taskDataParams: Any) => 
taskDataParams.toString.toLowerCase(
+  Locale.ROOT).contains(searchValueLowerCase)
+val filteredTaskDataSequence: Seq[TaskData] = taskDataList.filter(f =>
+  (containsValue(f.taskId) || containsValue(f.index) 

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread szyszy
Github user szyszy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224193837
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala
 ---
@@ -87,6 +88,20 @@ class YarnAllocatorSuite extends SparkFunSuite with 
Matchers with BeforeAndAfter
   def createAllocator(
   maxExecutors: Int = 5,
   rmClient: AMRMClient[ContainerRequest] = rmClient): YarnAllocator = {
+createAllocatorInternal(maxExecutors, rmClient, Map())
+  }
+
+  def createAllocatorWithAdditionalConfigs(
--- End diff --

Fair enough, fixed.
I did the same with the `createResource` method.
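
A self-contained illustration of that consolidation pattern (not the suite's 
actual code; the names below are placeholders):

    // One factory method with a defaulted Map parameter instead of a separate
    // "createAllocatorWithAdditionalConfigs" variant.
    case class Allocator(maxExecutors: Int, configs: Map[String, String])

    def createAllocator(
        maxExecutors: Int = 5,
        additionalConfigs: Map[String, String] = Map.empty): Allocator =
      Allocator(maxExecutors, additionalConfigs)

    val default = createAllocator()
    val withGpus = createAllocator(additionalConfigs = Map("yarn.io/gpu" -> "2"))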


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22674
  
**[Test build #97203 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97203/testReport)**
 for PR 22674 at commit 
[`3ffa536`](https://github.com/apache/spark/commit/3ffa536f3c29f6655843a4d45c215393f51e23c9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread szyszy
Github user szyszy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224190144
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
 ---
@@ -433,4 +465,36 @@ class ClientSuite extends SparkFunSuite with Matchers {
 classpath(env)
   }
 
+  private def testResourceRequest(expectedResources: Seq[(String, Long)],
--- End diff --

Good point, fixed!


---




[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22504
  
**[Test build #97210 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97210/testReport)**
 for PR 22504 at commit 
[`7fb94fd`](https://github.com/apache/spark/commit/7fb94fd5273f19d91ec74584d2efa6b33ce371c4).


---




[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22684


---




[GitHub] spark pull request #22504: [SPARK-25118][Submit] Persist Driver Logs in Clie...

2018-10-10 Thread ankuriitg
Github user ankuriitg commented on a diff in the pull request:

https://github.com/apache/spark/pull/22504#discussion_r224189470
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -806,6 +806,22 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
 }
 // Clean the blacklist from the expired entries.
 clearBlacklist(CLEAN_INTERVAL_S)
+
+// Delete driver logs from the configured spark dfs dir that exceed 
the configured max age
+try {
+  val hdfsDir = conf.get("spark.driver.log.dfsDir")
+  val appDirs = fs.listLocatedStatus(new Path(hdfsDir))
+  while (appDirs.hasNext()) {
+val appDirStatus = appDirs.next()
+if (appDirStatus.getModificationTime() < maxTime) {
+  logInfo(s"Deleting expired driver log for: 
${appDirStatus.getPath().getName()}")
+  deleteLog(appDirStatus.getPath())
--- End diff --

Added the new configurations, with a fallback to the existing ones.
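
A rough, self-contained sketch of the fallback idea (the real change uses 
Spark's internal ConfigBuilder; the keys and helper below are illustrative 
only):

    // Prefer the new driver-log key and fall back to the existing history-server key.
    def getWithFallback(conf: Map[String, String], newKey: String,
        legacyKey: String, default: String): String =
      conf.get(newKey).orElse(conf.get(legacyKey)).getOrElse(default)

    val conf = Map("spark.history.fs.cleaner.maxAge" -> "7d")
    val driverLogMaxAge = getWithFallback(conf,
      "spark.history.fs.driverlog.cleaner.maxAge",
      "spark.history.fs.cleaner.maxAge", "7d")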


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224189326
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
 ---
@@ -199,6 +200,37 @@ class ClientSuite extends SparkFunSuite with Matchers {
 appContext.getMaxAppAttempts should be (42)
   }
 
+  test("resource request for invalid resource") {
--- End diff --

You already sort of do that in the tests that call 
`initializeResourceTypes`, don't you?

Otherwise that's a YARN test; it's not interesting to test that functionality 
in Spark, because if it's broken, Spark cannot fix it.


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22674
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22674
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97203/
Test FAILed.


---




[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/22684
  
Merged into master. Thanks.


---




[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22684
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3857/
Test PASSed.


---




[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22684
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22684
  
**[Test build #97209 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97209/testReport)**
 for PR 22684 at commit 
[`9d9ed2f`](https://github.com/apache/spark/commit/9d9ed2f8295e2d61ff880860abf0e8551afce04b).


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread szyszy
Github user szyszy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224187532
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
 ---
@@ -199,6 +200,37 @@ class ClientSuite extends SparkFunSuite with Matchers {
 appContext.getMaxAppAttempts should be (42)
   }
 
+  test("resource request for invalid resource") {
--- End diff --

Yes, this is very YARN-specific, unlike the other test cases that verify the 
resources in the app context.
Removed this test case.
Can you think of any other way to test the scenario where a resource specified 
in sparkConf is not known to YARN?


---




[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22684#discussion_r224186347
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
 ---
@@ -138,39 +138,75 @@ private[sql] object OrcFilters {
   dataTypeMap: Map[String, DataType],
   expression: Filter,
   builder: Builder): Option[Builder] = {
+createBuilder(dataTypeMap, expression, builder, 
canPartialPushDownConjuncts = true)
+  }
+
+  /**
+   * @param dataTypeMap a map from the attribute name to its data type.
+   * @param expression the input filter predicates.
+   * @param builder the input SearchArgument.Builder.
+   * @param canPartialPushDownConjuncts whether a subset of conjuncts of 
predicates can be pushed
+   *down safely. Pushing ONLY one side 
of AND down is safe to
+   *do at the top level or none of its 
ancestors is NOT and OR.
+   * @return the builder so far.
+   */
+  private def createBuilder(
+  dataTypeMap: Map[String, DataType],
+  expression: Filter,
+  builder: Builder,
+  canPartialPushDownConjuncts: Boolean): Option[Builder] = {
 def getType(attribute: String): PredicateLeaf.Type =
   getPredicateLeafType(dataTypeMap(attribute))
 
 import org.apache.spark.sql.sources._
 
 expression match {
   case And(left, right) =>
-// At here, it is not safe to just convert one side if we do not 
understand the
-// other side. Here is an example used to explain the reason.
+// At here, it is not safe to just convert one side and remove the 
other side
+// if we do not understand what the parent filters are.
+//
+// Here is an example used to explain the reason.
 // Let's say we have NOT(a = 2 AND b in ('1')) and we do not 
understand how to
 // convert b in ('1'). If we only convert a = 2, we will end up 
with a filter
 // NOT(a = 2), which will generate wrong results.
-// Pushing one side of AND down is only safe to do at the top 
level.
-// You can see ParquetRelation's initializeLocalJobFunc method as 
an example.
-for {
-  _ <- buildSearchArgument(dataTypeMap, left, newBuilder)
-  _ <- buildSearchArgument(dataTypeMap, right, newBuilder)
-  lhs <- buildSearchArgument(dataTypeMap, left, builder.startAnd())
-  rhs <- buildSearchArgument(dataTypeMap, right, lhs)
-} yield rhs.end()
+//
+// Pushing one side of AND down is only safe to do at the top 
level or in the child
+// AND before hitting NOT or OR conditions, and in this case, the 
unsupported predicate
+// can be safely removed.
+val leftBuilderOption = createBuilder(dataTypeMap, left,
+  newBuilder, canPartialPushDownConjuncts)
+val rightBuilderOption =
+  createBuilder(dataTypeMap, right, newBuilder, 
canPartialPushDownConjuncts)
+(leftBuilderOption, rightBuilderOption) match {
+  case (Some(_), Some(_)) =>
+for {
+  lhs <- createBuilder(dataTypeMap, left,
+builder.startAnd(), canPartialPushDownConjuncts)
+  rhs <- createBuilder(dataTypeMap, right, lhs, 
canPartialPushDownConjuncts)
+} yield rhs.end()
+
+  case (Some(_), None) if canPartialPushDownConjuncts =>
+createBuilder(dataTypeMap, left, builder, 
canPartialPushDownConjuncts)
+
+  case (None, Some(_)) if canPartialPushDownConjuncts =>
+createBuilder(dataTypeMap, right, builder, 
canPartialPushDownConjuncts)
+
+  case _ => None
+}
 
   case Or(left, right) =>
 for {
-  _ <- buildSearchArgument(dataTypeMap, left, newBuilder)
-  _ <- buildSearchArgument(dataTypeMap, right, newBuilder)
-  lhs <- buildSearchArgument(dataTypeMap, left, builder.startOr())
-  rhs <- buildSearchArgument(dataTypeMap, right, lhs)
+  _ <- createBuilder(dataTypeMap, left, newBuilder, 
canPartialPushDownConjuncts = false)
+  _ <- createBuilder(dataTypeMap, right, newBuilder, 
canPartialPushDownConjuncts = false)
+  lhs <- createBuilder(dataTypeMap, left,
+builder.startOr(), canPartialPushDownConjuncts = false)
+  rhs <- createBuilder(dataTypeMap, right, lhs, 
canPartialPushDownConjuncts = false)
 } yield rhs.end()
 
   case Not(child) =>
 for {
-  _ <- buildSearchArgument(dataTypeMap, child, newBuilder)
-  negate <- 

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread szyszy
Github user szyszy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224185444
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ---
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.lang.{Integer => JInteger, Long => JLong}
+import java.lang.reflect.InvocationTargetException
+
+import scala.collection.mutable
+import scala.util.Try
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config.{AM_CORES, AM_MEMORY, 
DRIVER_CORES, EXECUTOR_CORES, YARN_AM_RESOURCE_TYPES_PREFIX, 
YARN_DRIVER_RESOURCE_TYPES_PREFIX, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX}
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY}
+import org.apache.spark.util.Utils
+
+/**
+ * This helper class uses some of Hadoop 3 methods from the YARN API,
+ * so we need to use reflection to avoid compile error when building 
against Hadoop 2.x
+ */
+private object ResourceRequestHelper extends Logging {
+  private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
+  private val RESOURCE_INFO_CLASS = 
"org.apache.hadoop.yarn.api.records.ResourceInformation"
+  private val ERROR_PREFIX: String = "Error:"
+
+  /**
+   * Validates sparkConf and throws a SparkException if any of standard 
resources (memory or cores)
+   * is defined with the property spark.yarn.x.resource.y
+   * @param sparkConf
+   */
+  def validateResources(sparkConf: SparkConf): Unit = {
+val resourceDefinitions = Seq[(String, String)](
+  (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"),
+  (AM_CORES.key, YARN_AM_RESOURCE_TYPES_PREFIX + "cores"),
+  (DRIVER_MEMORY.key, YARN_DRIVER_RESOURCE_TYPES_PREFIX + "memory"),
+  (DRIVER_CORES.key, YARN_DRIVER_RESOURCE_TYPES_PREFIX + "cores"),
+  (EXECUTOR_MEMORY.key, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + 
"memory"),
+  (EXECUTOR_CORES.key, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "cores"))
+val errorMessage = new mutable.StringBuilder()
+
+resourceDefinitions.foreach { case (sparkName, resourceRequest) =>
+  if (sparkConf.contains(resourceRequest)) {
+errorMessage.append(s"$ERROR_PREFIX Do not use $resourceRequest, " 
+
+s"please use $sparkName instead!\n")
+  }
+}
+
+if (errorMessage.nonEmpty) {
+  throw new SparkException(errorMessage.toString())
+}
+  }
+
+  /**
+   * Sets resource amount with the corresponding unit to the passed 
resource object.
+   * @param resources resource values to set
+   * @param resource resource object to update
+   */
+  def setResourceRequests(
+  resources: Map[String, String],
+  resource: Resource): Unit = {
+require(resource != null, "Resource parameter should not be null!")
+require(!resources.contains("memory"),
--- End diff --

That's true, removed these require checks.
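
For illustration, here is a configuration that the validateResources check 
quoted above would reject (assuming the YARN resource-type prefixes resolve to 
the spark.yarn.*.resource.* keys; this is a sketch, not code from the PR):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.yarn.executor.resource.memory", "2g") // rejected: use spark.executor.memory
      .set("spark.executor.memory", "2g")               // the supported setting
    // ResourceRequestHelper.validateResources(conf) would throw a SparkException here.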


---




[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/22684
  
LGTM. Just some styling feedback. Thanks.
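
As a worked illustration of the example discussed in the quoted OrcFilters 
diffs in this thread (why one side of an AND can be pushed down at the top 
level but not underneath a NOT), consider a row with a = 2 and b = '2':

    // NOT(a = 2 AND b IN ('1')) should keep this row, because the inner AND is false.
    val (a, b) = (2, "2")
    val keep = !(a == 2 && Set("1").contains(b))   // true: the row must be returned
    // If "b IN ('1')" cannot be converted and we push down only NOT(a = 2),
    // the pushed filter drops the row even though the full predicate keeps it.
    val passesPushedFilter = !(a == 2)             // false: row would be (incorrectly) dropped
    assert(keep && !passesPushedFilter)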


---




[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22684#discussion_r224179579
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
@@ -90,32 +107,51 @@ private[orc] object OrcFilters extends Logging {
 
 expression match {
   case And(left, right) =>
-// At here, it is not safe to just convert one side if we do not 
understand the
-// other side. Here is an example used to explain the reason.
+// At here, it is not safe to just convert one side and remove the 
other side
+// if we do not understand what the parent filters are.
+//
+// Here is an example used to explain the reason.
 // Let's say we have NOT(a = 2 AND b in ('1')) and we do not 
understand how to
 // convert b in ('1'). If we only convert a = 2, we will end up 
with a filter
 // NOT(a = 2), which will generate wrong results.
-// Pushing one side of AND down is only safe to do at the top 
level.
-// You can see ParquetRelation's initializeLocalJobFunc method as 
an example.
-for {
-  _ <- buildSearchArgument(dataTypeMap, left, newBuilder)
-  _ <- buildSearchArgument(dataTypeMap, right, newBuilder)
-  lhs <- buildSearchArgument(dataTypeMap, left, builder.startAnd())
-  rhs <- buildSearchArgument(dataTypeMap, right, lhs)
-} yield rhs.end()
+//
+// Pushing one side of AND down is only safe to do at the top 
level or in the child
+// AND before hitting NOT or OR conditions, and in this case, the 
unsupported predicate
+// can be safely removed.
+val leftBuilderOption = createBuilder(dataTypeMap, left,
+  newBuilder, canPartialPushDownConjuncts)
+val rightBuilderOption =
+  createBuilder(dataTypeMap, right, newBuilder, 
canPartialPushDownConjuncts)
+(leftBuilderOption, rightBuilderOption) match {
+  case (Some(_), Some(_)) =>
+for {
+  lhs <- createBuilder(dataTypeMap, left,
+builder.startAnd(), canPartialPushDownConjuncts)
+  rhs <- createBuilder(dataTypeMap, right, lhs, 
canPartialPushDownConjuncts)
+} yield rhs.end()
+
+  case (Some(_), None) if canPartialPushDownConjuncts =>
+createBuilder(dataTypeMap, left, builder, 
canPartialPushDownConjuncts)
+
+  case (None, Some(_)) if canPartialPushDownConjuncts =>
+createBuilder(dataTypeMap, right, builder, 
canPartialPushDownConjuncts)
+
+  case _ => None
+}
 
   case Or(left, right) =>
 for {
-  _ <- buildSearchArgument(dataTypeMap, left, newBuilder)
-  _ <- buildSearchArgument(dataTypeMap, right, newBuilder)
-  lhs <- buildSearchArgument(dataTypeMap, left, builder.startOr())
-  rhs <- buildSearchArgument(dataTypeMap, right, lhs)
+  _ <- createBuilder(dataTypeMap, left, newBuilder, 
canPartialPushDownConjuncts = false)
+  _ <- createBuilder(dataTypeMap, right, newBuilder, 
canPartialPushDownConjuncts = false)
+  lhs <- createBuilder(dataTypeMap, left,
+builder.startOr(), canPartialPushDownConjuncts = false)
+  rhs <- createBuilder(dataTypeMap, right, lhs, 
canPartialPushDownConjuncts = false)
 } yield rhs.end()
 
   case Not(child) =>
 for {
-  _ <- buildSearchArgument(dataTypeMap, child, newBuilder)
-  negate <- buildSearchArgument(dataTypeMap, child, 
builder.startNot())
+  _ <- createBuilder(dataTypeMap, child, newBuilder, 
canPartialPushDownConjuncts = false)
+  negate <- createBuilder(dataTypeMap, child, builder.startNot(), 
false)
--- End diff --

ditto


---




[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22684#discussion_r224179447
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
@@ -90,32 +107,51 @@ private[orc] object OrcFilters extends Logging {
 
 expression match {
   case And(left, right) =>
-// At here, it is not safe to just convert one side if we do not 
understand the
-// other side. Here is an example used to explain the reason.
+// At here, it is not safe to just convert one side and remove the 
other side
+// if we do not understand what the parent filters are.
+//
+// Here is an example used to explain the reason.
 // Let's say we have NOT(a = 2 AND b in ('1')) and we do not 
understand how to
 // convert b in ('1'). If we only convert a = 2, we will end up 
with a filter
 // NOT(a = 2), which will generate wrong results.
-// Pushing one side of AND down is only safe to do at the top 
level.
-// You can see ParquetRelation's initializeLocalJobFunc method as 
an example.
-for {
-  _ <- buildSearchArgument(dataTypeMap, left, newBuilder)
-  _ <- buildSearchArgument(dataTypeMap, right, newBuilder)
-  lhs <- buildSearchArgument(dataTypeMap, left, builder.startAnd())
-  rhs <- buildSearchArgument(dataTypeMap, right, lhs)
-} yield rhs.end()
+//
+// Pushing one side of AND down is only safe to do at the top 
level or in the child
+// AND before hitting NOT or OR conditions, and in this case, the 
unsupported predicate
+// can be safely removed.
+val leftBuilderOption = createBuilder(dataTypeMap, left,
+  newBuilder, canPartialPushDownConjuncts)
+val rightBuilderOption =
+  createBuilder(dataTypeMap, right, newBuilder, 
canPartialPushDownConjuncts)
+(leftBuilderOption, rightBuilderOption) match {
--- End diff --

ditto


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread szyszy
Github user szyszy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224179095
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala
 ---
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import org.apache.hadoop.yarn.api.records.Resource
+import org.apache.hadoop.yarn.util.Records
+import org.scalatest.{BeforeAndAfterAll, Matchers}
+
+import org.apache.spark.{SparkConf, SparkException, SparkFunSuite}
+import 
org.apache.spark.deploy.yarn.ResourceRequestTestHelper.ResourceInformation
+import org.apache.spark.deploy.yarn.config.{AM_CORES, AM_MEMORY, 
DRIVER_CORES, EXECUTOR_CORES, YARN_AM_RESOURCE_TYPES_PREFIX, 
YARN_DRIVER_RESOURCE_TYPES_PREFIX, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX}
+import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY}
+
+class ResourceRequestHelperSuite extends SparkFunSuite with Matchers with 
BeforeAndAfterAll {
+
+  private val CUSTOM_RES_1 = "custom-resource-type-1"
+  private val CUSTOM_RES_2 = "custom-resource-type-2"
+  private val MEMORY = "memory"
+  private val CORES = "cores"
+  private val NEW_CONFIG_EXECUTOR_MEMORY = 
YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + MEMORY
+  private val NEW_CONFIG_EXECUTOR_CORES = 
YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + CORES
+  private val NEW_CONFIG_AM_MEMORY = YARN_AM_RESOURCE_TYPES_PREFIX + MEMORY
+  private val NEW_CONFIG_AM_CORES = YARN_AM_RESOURCE_TYPES_PREFIX + CORES
+  private val NEW_CONFIG_DRIVER_MEMORY = YARN_DRIVER_RESOURCE_TYPES_PREFIX 
+ MEMORY
+  private val NEW_CONFIG_DRIVER_CORES = YARN_DRIVER_RESOURCE_TYPES_PREFIX 
+ CORES
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+  }
+
+  private def getExpectedUnmatchedErrorMessage(name: String, value: 
String): String = {
+s"Resource request for '$name' ('$value') does not match pattern 
([0-9]+)([A-Za-z]*)."
+  }
+
+  test("resource type value does not match pattern") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1))
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "**@#")
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, 
createAResource)
+}
+thrown.getMessage should equal 
(getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "**@#"))
+  }
+
+  test("resource type just unit defined") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List())
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "m")
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, 
createAResource)
+}
+thrown.getMessage should equal 
(getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "m"))
+  }
+
+  test("resource type with null value should not be allowed") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List())
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "123")
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, null)
+}
+thrown.getMessage should equal ("requirement failed: Resource 
parameter should not be null!")
+  }
+
+  test("resource type with valid value and invalid unit") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1))
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "123ppp")
+val resource = createAResource
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, 

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread szyszy
Github user szyszy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224178817
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala
 ---
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import org.apache.hadoop.yarn.api.records.Resource
+import org.apache.hadoop.yarn.util.Records
+import org.scalatest.{BeforeAndAfterAll, Matchers}
+
+import org.apache.spark.{SparkConf, SparkException, SparkFunSuite}
+import 
org.apache.spark.deploy.yarn.ResourceRequestTestHelper.ResourceInformation
+import org.apache.spark.deploy.yarn.config.{AM_CORES, AM_MEMORY, 
DRIVER_CORES, EXECUTOR_CORES, YARN_AM_RESOURCE_TYPES_PREFIX, 
YARN_DRIVER_RESOURCE_TYPES_PREFIX, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX}
+import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY}
+
+class ResourceRequestHelperSuite extends SparkFunSuite with Matchers with 
BeforeAndAfterAll {
+
+  private val CUSTOM_RES_1 = "custom-resource-type-1"
+  private val CUSTOM_RES_2 = "custom-resource-type-2"
+  private val MEMORY = "memory"
+  private val CORES = "cores"
+  private val NEW_CONFIG_EXECUTOR_MEMORY = 
YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + MEMORY
+  private val NEW_CONFIG_EXECUTOR_CORES = 
YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + CORES
+  private val NEW_CONFIG_AM_MEMORY = YARN_AM_RESOURCE_TYPES_PREFIX + MEMORY
+  private val NEW_CONFIG_AM_CORES = YARN_AM_RESOURCE_TYPES_PREFIX + CORES
+  private val NEW_CONFIG_DRIVER_MEMORY = YARN_DRIVER_RESOURCE_TYPES_PREFIX 
+ MEMORY
+  private val NEW_CONFIG_DRIVER_CORES = YARN_DRIVER_RESOURCE_TYPES_PREFIX 
+ CORES
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+  }
+
+  private def getExpectedUnmatchedErrorMessage(name: String, value: 
String): String = {
+s"Resource request for '$name' ('$value') does not match pattern 
([0-9]+)([A-Za-z]*)."
+  }
+
+  test("resource type value does not match pattern") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1))
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "**@#")
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, 
createAResource)
+}
+thrown.getMessage should equal 
(getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "**@#"))
+  }
+
+  test("resource type just unit defined") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List())
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "m")
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, 
createAResource)
+}
+thrown.getMessage should equal 
(getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "m"))
+  }
+
+  test("resource type with null value should not be allowed") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List())
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "123")
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, null)
+}
+thrown.getMessage should equal ("requirement failed: Resource 
parameter should not be null!")
+  }
+
+  test("resource type with valid value and invalid unit") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1))
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "123ppp")
+val resource = createAResource
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, 

[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224178567
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
+var ind = urlArray.indexOf("proxy");
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
+if (ind > 0) {
+var appId = urlArray[2];
+var indexOfProxy = words.indexOf("proxy");
+var newBaseURI = words.slice(0, indexOfProxy + 2).join('/');
+return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + 
stageId;
+}
+ind = urlArray.indexOf("history");
+if (ind > 0) {
+var appId = urlArray[2];
+var appAttemptId = urlArray[ind + 2];
+var indexOfHistory = words.indexOf("history");
+var newBaseURI = words.slice(0, indexOfHistory).join('/');
+if (isNaN(appAttemptId) || appAttemptId == "0") {
+return newBaseURI + "/api/v1/applications/" + appId + 
"/stages/" + stageId;
+} else {
+return newBaseURI + "/api/v1/applications/" + appId + "/" + 
appAttemptId + "/stages/" + stageId;
+}
+}
+return location.origin + "/api/v1/applications/" + appId + "/stages/" 
+ stageId;
+}
+
+function getColumnNameForTaskMetricSummary(columnKey) {
+switch(columnKey) {
+case "executorRunTime":
+return "Duration";
+break;
+
+case "jvmGcTime":
+return "GC Time";
+break;
+
+case "gettingResultTime":
+return "Getting Result Time";
+break;
+
+case "inputMetrics":
+return "Input Size / Records";
+break;
+
+case "outputMetrics":
+return "Output Size / Records";
+break;
+
+case "peakExecutionMemory":
+return "Peak Execution Memory";
+break;
+
+case "resultSerializationTime":
+return "Result Serialization Time";
+break;
+
+case "schedulerDelay":
+return "Scheduler Delay";
+break;
+
+case "diskBytesSpilled":
+return "Shuffle spill (disk)";
+break;
+
+case "memoryBytesSpilled":
+return "Shuffle spill (memory)";
+break;
+
+

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread szyszy
Github user szyszy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224178252
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala
 ---
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import org.apache.hadoop.yarn.api.records.Resource
+import org.apache.hadoop.yarn.util.Records
+import org.scalatest.{BeforeAndAfterAll, Matchers}
+
+import org.apache.spark.{SparkConf, SparkException, SparkFunSuite}
+import 
org.apache.spark.deploy.yarn.ResourceRequestTestHelper.ResourceInformation
+import org.apache.spark.deploy.yarn.config.{AM_CORES, AM_MEMORY, 
DRIVER_CORES, EXECUTOR_CORES, YARN_AM_RESOURCE_TYPES_PREFIX, 
YARN_DRIVER_RESOURCE_TYPES_PREFIX, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX}
+import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY}
+
+class ResourceRequestHelperSuite extends SparkFunSuite with Matchers with 
BeforeAndAfterAll {
+
+  private val CUSTOM_RES_1 = "custom-resource-type-1"
+  private val CUSTOM_RES_2 = "custom-resource-type-2"
+  private val MEMORY = "memory"
+  private val CORES = "cores"
+  private val NEW_CONFIG_EXECUTOR_MEMORY = 
YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + MEMORY
+  private val NEW_CONFIG_EXECUTOR_CORES = 
YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + CORES
+  private val NEW_CONFIG_AM_MEMORY = YARN_AM_RESOURCE_TYPES_PREFIX + MEMORY
+  private val NEW_CONFIG_AM_CORES = YARN_AM_RESOURCE_TYPES_PREFIX + CORES
+  private val NEW_CONFIG_DRIVER_MEMORY = YARN_DRIVER_RESOURCE_TYPES_PREFIX 
+ MEMORY
+  private val NEW_CONFIG_DRIVER_CORES = YARN_DRIVER_RESOURCE_TYPES_PREFIX 
+ CORES
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+  }
+
+  private def getExpectedUnmatchedErrorMessage(name: String, value: 
String): String = {
+s"Resource request for '$name' ('$value') does not match pattern 
([0-9]+)([A-Za-z]*)."
+  }
+
+  test("resource type value does not match pattern") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1))
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "**@#")
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, 
createAResource)
+}
+thrown.getMessage should equal 
(getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "**@#"))
+  }
+
+  test("resource type just unit defined") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List())
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "m")
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, 
createAResource)
+}
+thrown.getMessage should equal 
(getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "m"))
+  }
+
+  test("resource type with null value should not be allowed") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List())
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "123")
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, null)
+}
+thrown.getMessage should equal ("requirement failed: Resource 
parameter should not be null!")
+  }
+
+  test("resource type with valid value and invalid unit") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1))
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "123ppp")
+val resource = createAResource
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, 

[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22684#discussion_r224178237
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
 ---
@@ -138,39 +138,75 @@ private[sql] object OrcFilters {
   dataTypeMap: Map[String, DataType],
   expression: Filter,
   builder: Builder): Option[Builder] = {
+createBuilder(dataTypeMap, expression, builder, 
canPartialPushDownConjuncts = true)
+  }
+
+  /**
+   * @param dataTypeMap a map from the attribute name to its data type.
+   * @param expression the input filter predicates.
+   * @param builder the input SearchArgument.Builder.
+   * @param canPartialPushDownConjuncts whether a subset of conjuncts of 
predicates can be pushed
+   *down safely. Pushing ONLY one side 
of AND down is safe to
+   *do at the top level or when none of 
its ancestors is a NOT or an OR.
+   * @return the builder so far.
+   */
+  private def createBuilder(
+  dataTypeMap: Map[String, DataType],
+  expression: Filter,
+  builder: Builder,
+  canPartialPushDownConjuncts: Boolean): Option[Builder] = {
 def getType(attribute: String): PredicateLeaf.Type =
   getPredicateLeafType(dataTypeMap(attribute))
 
 import org.apache.spark.sql.sources._
 
 expression match {
   case And(left, right) =>
-// At here, it is not safe to just convert one side if we do not 
understand the
-// other side. Here is an example used to explain the reason.
+// At here, it is not safe to just convert one side and remove the 
other side
+// if we do not understand what the parent filters are.
+//
+// Here is an example used to explain the reason.
 // Let's say we have NOT(a = 2 AND b in ('1')) and we do not 
understand how to
 // convert b in ('1'). If we only convert a = 2, we will end up 
with a filter
 // NOT(a = 2), which will generate wrong results.
-// Pushing one side of AND down is only safe to do at the top 
level.
-// You can see ParquetRelation's initializeLocalJobFunc method as 
an example.
-for {
-  _ <- buildSearchArgument(dataTypeMap, left, newBuilder)
-  _ <- buildSearchArgument(dataTypeMap, right, newBuilder)
-  lhs <- buildSearchArgument(dataTypeMap, left, builder.startAnd())
-  rhs <- buildSearchArgument(dataTypeMap, right, lhs)
-} yield rhs.end()
+//
+// Pushing one side of AND down is only safe to do at the top 
level or in the child
+// AND before hitting NOT or OR conditions, and in this case, the 
unsupported predicate
+// can be safely removed.
+val leftBuilderOption = createBuilder(dataTypeMap, left,
+  newBuilder, canPartialPushDownConjuncts)
+val rightBuilderOption =
+  createBuilder(dataTypeMap, right, newBuilder, 
canPartialPushDownConjuncts)
+(leftBuilderOption, rightBuilderOption) match {
+  case (Some(_), Some(_)) =>
+for {
+  lhs <- createBuilder(dataTypeMap, left,
+builder.startAnd(), canPartialPushDownConjuncts)
+  rhs <- createBuilder(dataTypeMap, right, lhs, 
canPartialPushDownConjuncts)
+} yield rhs.end()
+
+  case (Some(_), None) if canPartialPushDownConjuncts =>
+createBuilder(dataTypeMap, left, builder, 
canPartialPushDownConjuncts)
+
+  case (None, Some(_)) if canPartialPushDownConjuncts =>
+createBuilder(dataTypeMap, right, builder, 
canPartialPushDownConjuncts)
+
+  case _ => None
+}
 
   case Or(left, right) =>
 for {
-  _ <- buildSearchArgument(dataTypeMap, left, newBuilder)
-  _ <- buildSearchArgument(dataTypeMap, right, newBuilder)
-  lhs <- buildSearchArgument(dataTypeMap, left, builder.startOr())
-  rhs <- buildSearchArgument(dataTypeMap, right, lhs)
+  _ <- createBuilder(dataTypeMap, left, newBuilder, 
canPartialPushDownConjuncts = false)
+  _ <- createBuilder(dataTypeMap, right, newBuilder, 
canPartialPushDownConjuncts = false)
+  lhs <- createBuilder(dataTypeMap, left,
+builder.startOr(), canPartialPushDownConjuncts = false)
+  rhs <- createBuilder(dataTypeMap, right, lhs, 
canPartialPushDownConjuncts = false)
 } yield rhs.end()
 
   case Not(child) =>
 for {
-  _ <- buildSearchArgument(dataTypeMap, child, newBuilder)
-  negate <- 

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread szyszy
Github user szyszy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224177875
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala
 ---
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import org.apache.hadoop.yarn.api.records.Resource
+import org.apache.hadoop.yarn.util.Records
+import org.scalatest.{BeforeAndAfterAll, Matchers}
+
+import org.apache.spark.{SparkConf, SparkException, SparkFunSuite}
+import 
org.apache.spark.deploy.yarn.ResourceRequestTestHelper.ResourceInformation
+import org.apache.spark.deploy.yarn.config.{AM_CORES, AM_MEMORY, 
DRIVER_CORES, EXECUTOR_CORES, YARN_AM_RESOURCE_TYPES_PREFIX, 
YARN_DRIVER_RESOURCE_TYPES_PREFIX, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX}
+import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY}
+
+class ResourceRequestHelperSuite extends SparkFunSuite with Matchers with 
BeforeAndAfterAll {
+
+  private val CUSTOM_RES_1 = "custom-resource-type-1"
+  private val CUSTOM_RES_2 = "custom-resource-type-2"
+  private val MEMORY = "memory"
+  private val CORES = "cores"
+  private val NEW_CONFIG_EXECUTOR_MEMORY = 
YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + MEMORY
+  private val NEW_CONFIG_EXECUTOR_CORES = 
YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + CORES
+  private val NEW_CONFIG_AM_MEMORY = YARN_AM_RESOURCE_TYPES_PREFIX + MEMORY
+  private val NEW_CONFIG_AM_CORES = YARN_AM_RESOURCE_TYPES_PREFIX + CORES
+  private val NEW_CONFIG_DRIVER_MEMORY = YARN_DRIVER_RESOURCE_TYPES_PREFIX 
+ MEMORY
+  private val NEW_CONFIG_DRIVER_CORES = YARN_DRIVER_RESOURCE_TYPES_PREFIX 
+ CORES
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+  }
+
+  private def getExpectedUnmatchedErrorMessage(name: String, value: 
String): String = {
+s"Resource request for '$name' ('$value') does not match pattern 
([0-9]+)([A-Za-z]*)."
+  }
+
+  test("resource type value does not match pattern") {
+assume(ResourceRequestHelper.isYarnResourceTypesAvailable())
+ResourceRequestTestHelper.initializeResourceTypes(List(CUSTOM_RES_1))
+
+val resourceTypes = Map(CUSTOM_RES_1 -> "**@#")
+
+val thrown = intercept[IllegalArgumentException] {
+  ResourceRequestHelper.setResourceRequests(resourceTypes, 
createAResource)
+}
+thrown.getMessage should equal 
(getExpectedUnmatchedErrorMessage(CUSTOM_RES_1, "**@#"))
+  }
+
+  test("resource type just unit defined") {
--- End diff --

Good idea. Extracted the common parts of all test methods in this class.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
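
The tests quoted above all hinge on the ([0-9]+)([A-Za-z]*) pattern from the expected error message. A minimal Scala sketch of that check (a hypothetical helper, not the PR's ResourceRequestHelper) splits a value such as "2g" or "123" into an amount and a unit, and rejects values like "m" or "**@#":

    object ResourceValueParser {
      private val pattern = "([0-9]+)([A-Za-z]*)".r

      // "2g" -> (2, "g"), "123" -> (123, ""); "m" and "**@#" fall through to the error.
      def parse(name: String, value: String): (Long, String) = value match {
        case pattern(amount, unit) => (amount.toLong, unit)
        case _ => throw new IllegalArgumentException(
          s"Resource request for '$name' ('$value') does not match pattern ([0-9]+)([A-Za-z]*).")
      }
    }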



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224176616
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/stagespage-template.html ---
@@ -0,0 +1,124 @@
+
+

[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22657
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22657
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97201/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22657
  
**[Test build #97201 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97201/testReport)**
 for PR 22657 at commit 
[`5f8bf79`](https://github.com/apache/spark/commit/5f8bf7956665f19d758bf4fb3b91927c77bd1691).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22684#discussion_r224174206
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
 ---
@@ -138,39 +138,75 @@ private[sql] object OrcFilters {
   dataTypeMap: Map[String, DataType],
   expression: Filter,
   builder: Builder): Option[Builder] = {
+createBuilder(dataTypeMap, expression, builder, 
canPartialPushDownConjuncts = true)
+  }
+
+  /**
+   * @param dataTypeMap a map from the attribute name to its data type.
+   * @param expression the input filter predicates.
+   * @param builder the input SearchArgument.Builder.
+   * @param canPartialPushDownConjuncts whether a subset of conjuncts of 
predicates can be pushed
+   *down safely. Pushing ONLY one side 
of AND down is safe to
+   *do at the top level or when none of 
its ancestors is a NOT or an OR.
+   * @return the builder so far.
+   */
+  private def createBuilder(
+  dataTypeMap: Map[String, DataType],
+  expression: Filter,
+  builder: Builder,
+  canPartialPushDownConjuncts: Boolean): Option[Builder] = {
 def getType(attribute: String): PredicateLeaf.Type =
   getPredicateLeafType(dataTypeMap(attribute))
 
 import org.apache.spark.sql.sources._
 
 expression match {
   case And(left, right) =>
-// At here, it is not safe to just convert one side if we do not 
understand the
-// other side. Here is an example used to explain the reason.
+// At here, it is not safe to just convert one side and remove the 
other side
+// if we do not understand what the parent filters are.
+//
+// Here is an example used to explain the reason.
 // Let's say we have NOT(a = 2 AND b in ('1')) and we do not 
understand how to
 // convert b in ('1'). If we only convert a = 2, we will end up 
with a filter
 // NOT(a = 2), which will generate wrong results.
-// Pushing one side of AND down is only safe to do at the top 
level.
-// You can see ParquetRelation's initializeLocalJobFunc method as 
an example.
-for {
-  _ <- buildSearchArgument(dataTypeMap, left, newBuilder)
-  _ <- buildSearchArgument(dataTypeMap, right, newBuilder)
-  lhs <- buildSearchArgument(dataTypeMap, left, builder.startAnd())
-  rhs <- buildSearchArgument(dataTypeMap, right, lhs)
-} yield rhs.end()
+//
+// Pushing one side of AND down is only safe to do at the top 
level or in the child
+// AND before hitting NOT or OR conditions, and in this case, the 
unsupported predicate
+// can be safely removed.
+val leftBuilderOption = createBuilder(dataTypeMap, left,
+  newBuilder, canPartialPushDownConjuncts)
+val rightBuilderOption =
+  createBuilder(dataTypeMap, right, newBuilder, 
canPartialPushDownConjuncts)
--- End diff --

Can you make the format the same as `leftBuilderOption`? Also, add another 
empty line before `(leftBuilderOption, rightBuilderOption)`. Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
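
The quoted comment argues that dropping an unsupported conjunct is only safe while no ancestor of the AND is a NOT or an OR. A small self-contained Scala sketch (illustration only, not Spark's Filter classes) makes the failure mode concrete for the row (a = 2, b = '2'):

    sealed trait Pred
    case class Eq(col: String, v: Any) extends Pred
    case class In(col: String, vs: Seq[Any]) extends Pred
    case class And(l: Pred, r: Pred) extends Pred
    case class Not(p: Pred) extends Pred

    def eval(p: Pred, row: Map[String, Any]): Boolean = p match {
      case Eq(c, v)  => row(c) == v
      case In(c, vs) => vs.contains(row(c))
      case And(l, r) => eval(l, row) && eval(r, row)
      case Not(q)    => !eval(q, row)
    }

    val row = Map[String, Any]("a" -> 2, "b" -> "2")
    val full    = Not(And(Eq("a", 2), In("b", Seq("1"))))  // NOT(a = 2 AND b IN ('1'))
    val dropped = Not(Eq("a", 2))                          // conjunct dropped under NOT

    eval(full, row)     // true:  the row must be returned
    eval(dropped, row)  // false: the partially pushed filter would wrongly skip it

At a plain top-level AND the same drop only lets extra rows through, which is harmless because the remaining predicate is still evaluated by Spark; under NOT or OR it removes rows that should survive, which is exactly why the canPartialPushDownConjuncts flag is threaded through the recursion.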



[GitHub] spark pull request #22678: [SPARK-25685][BUILD] Allow running tests in Jenki...

2018-10-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22678#discussion_r224172678
  
--- Diff: dev/run-tests-jenkins.py ---
@@ -176,7 +177,8 @@ def main():
 build_display_name = os.environ["BUILD_DISPLAY_NAME"]
 build_url = os.environ["BUILD_URL"]
 
-commit_url = "https://github.com/apache/spark/commit/; + 
ghprb_actual_commit
+project_url = os.getenv("SPARK_PROJECT_URL", 
"https://github.com/apache/spark;)
+commit_url = project_url + "/commit/" + ghprb_actual_commit
--- End diff --

+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
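
The change being approved above simply makes the hard-coded GitHub URL overridable through an environment variable. The same pattern in Scala, for comparison (sketch only; the commit hash below is a placeholder):

    val projectUrl = sys.env.getOrElse("SPARK_PROJECT_URL", "https://github.com/apache/spark")
    val commitUrl  = projectUrl + "/commit/" + "9377bc3"  // placeholder commit hash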



[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22288
  
**[Test build #97208 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97208/testReport)**
 for PR 22288 at commit 
[`c361693`](https://github.com/apache/spark/commit/c361693a3d08a1bea1d2919f0a8e970c03959cc8).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22288
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3856/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22288
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-10 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22615
  
Fine with me if the jenkins stuff is sorted out.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22615
  
I tried a release build that causes `--pip` and `--r` to be set, and the 
result looked OK. Both pyspark and R packages built and seemed normal. The 
source build worked too and comes before binary builds, so I don't think it can 
be affected. I will go ahead and merge this, I think.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3855/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-10 Thread dhruve
Github user dhruve commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r224167756
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -415,9 +419,61 @@ private[spark] class TaskSchedulerImpl(
 launchedAnyTask |= launchedTaskAtCurrentMaxLocality
   } while (launchedTaskAtCurrentMaxLocality)
 }
+
 if (!launchedAnyTask) {
-  taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+  taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors) match 
{
+case taskIndex: Some[Int] => // Returns the taskIndex which 
was unschedulable
+
+  // If the taskSet is unschedulable we try to find an 
existing idle blacklisted
+  // executor. If we cannot find one, we abort immediately. 
Else we kill the idle
+  // executor and kick off an abortTimer which, if it doesn't 
schedule a task within the
+  // timeout, will abort the taskSet if we were unable to 
schedule any task from the
+  // taskSet.
+  // Note 1: We keep track of schedulability on a per taskSet 
basis rather than on a per
+  // task basis.
+  // Note 2: The taskSet can still be aborted when there is 
more than one idle
+  // blacklisted executor and dynamic allocation is on. This 
can happen when a killed
+  // idle executor isn't replaced in time by 
ExecutorAllocationManager as it relies on
+  // pending tasks and doesn't kill executors on idle 
timeouts, causing the abort
+  // timer to expire and abort the taskSet.
+  executorIdToRunningTaskIds.find(x => !isExecutorBusy(x._1)) 
match {
+case Some (x) =>
+  val executorId = x._1
+  if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) 
{
+blacklistTrackerOpt.foreach(blt => 
blt.killBlacklistedIdleExecutor(executorId))
+
+unschedulableTaskSetToExpiryTime(taskSet) = 
clock.getTimeMillis()
+val timeout = 
conf.get(config.UNSCHEDULABLE_TASKSET_TIMEOUT) * 1000
+logInfo(s"Waiting for $timeout ms for completely "
+  + s"blacklisted task to be schedulable again before 
aborting $taskSet.")
+abortTimer.schedule(new TimerTask() {
+  override def run() {
+if 
(unschedulableTaskSetToExpiryTime.contains(taskSet) &&
+  (unschedulableTaskSetToExpiryTime(taskSet) + 
timeout)
+<= clock.getTimeMillis()
--- End diff --

it doesn't fit within the 100 char limit


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
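
The mechanism described in the quoted comments boils down to: remember when a task set first became completely blacklisted, kill one idle blacklisted executor, and abort the task set only if it is still unschedulable when the timer fires. A much-simplified Scala sketch (hypothetical names, not the PR's TaskSchedulerImpl code):

    import java.util.{Timer, TimerTask}
    import scala.collection.mutable

    class UnschedulableAbortSketch(timeoutMs: Long, abort: String => Unit) {
      private val abortTimer = new Timer(true)
      // taskSetId -> time at which it became completely unschedulable
      private val unschedulableSince = mutable.Map.empty[String, Long]

      def onCompletelyBlacklisted(taskSetId: String): Unit = synchronized {
        if (!unschedulableSince.contains(taskSetId)) {
          unschedulableSince(taskSetId) = System.currentTimeMillis()
          abortTimer.schedule(new TimerTask {
            override def run(): Unit = UnschedulableAbortSketch.this.synchronized {
              // Abort only if nothing was scheduled from this task set in the meantime.
              unschedulableSince.get(taskSetId)
                .filter(_ + timeoutMs <= System.currentTimeMillis())
                .foreach(_ => abort(taskSetId))
            }
          }, timeoutMs)
        }
      }

      // Called whenever any task from the set is launched again.
      def onTaskLaunched(taskSetId: String): Unit = synchronized {
        unschedulableSince.remove(taskSetId)
      }
    }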



[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22688
  
**[Test build #97207 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97207/testReport)**
 for PR 22688 at commit 
[`9377bc3`](https://github.com/apache/spark/commit/9377bc35050408512c28f47ca0535b66c4dfcaf8).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22686: [SPARK-25700][SQL] Partially revert append mode s...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/22686


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224166110
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
+var ind = urlArray.indexOf("proxy");
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
+if (ind > 0) {
+var appId = urlArray[2];
+var indexOfProxy = words.indexOf("proxy");
+var newBaseURI = words.slice(0, indexOfProxy + 2).join('/');
+return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + 
stageId;
+}
+ind = urlArray.indexOf("history");
+if (ind > 0) {
+var appId = urlArray[2];
+var appAttemptId = urlArray[ind + 2];
+var indexOfHistory = words.indexOf("history");
+var newBaseURI = words.slice(0, indexOfHistory).join('/');
+if (isNaN(appAttemptId) || appAttemptId == "0") {
+return newBaseURI + "/api/v1/applications/" + appId + 
"/stages/" + stageId;
+} else {
+return newBaseURI + "/api/v1/applications/" + appId + "/" + 
appAttemptId + "/stages/" + stageId;
+}
+}
+return location.origin + "/api/v1/applications/" + appId + "/stages/" 
+ stageId;
+}
+
+function getColumnNameForTaskMetricSummary(columnKey) {
+switch(columnKey) {
+case "executorRunTime":
+return "Duration";
+break;
+
+case "jvmGcTime":
+return "GC Time";
+break;
+
+case "gettingResultTime":
+return "Getting Result Time";
+break;
+
+case "inputMetrics":
+return "Input Size / Records";
+break;
+
+case "outputMetrics":
+return "Output Size / Records";
+break;
+
+case "peakExecutionMemory":
+return "Peak Execution Memory";
+break;
+
+case "resultSerializationTime":
+return "Result Serialization Time";
+break;
+
+case "schedulerDelay":
+return "Scheduler Delay";
+break;
+
+case "diskBytesSpilled":
+return "Shuffle spill (disk)";
+break;
+
+case "memoryBytesSpilled":
+return "Shuffle spill (memory)";
+break;
+
+

[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22684
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97199/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22684
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22684
  
**[Test build #97199 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97199/testReport)**
 for PR 22684 at commit 
[`c2eb87a`](https://github.com/apache/spark/commit/c2eb87ac8310a7045eb55ba7851a38ebf0498300).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224164134
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
+var ind = urlArray.indexOf("proxy");
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
+if (ind > 0) {
+var appId = urlArray[2];
+var indexOfProxy = words.indexOf("proxy");
+var newBaseURI = words.slice(0, indexOfProxy + 2).join('/');
+return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + 
stageId;
+}
+ind = urlArray.indexOf("history");
+if (ind > 0) {
+var appId = urlArray[2];
+var appAttemptId = urlArray[ind + 2];
+var indexOfHistory = words.indexOf("history");
--- End diff --

Similar here: can't we get rid of one of the indexes and just use `words` to 
look up appId and appAttemptId?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
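
The suggestion above, sketched in Scala rather than the page's JavaScript (illustration only, and only for the two URL shapes shown in the comment): once the path has been split on '/', both the app id and the attempt id sit at fixed offsets from the "history" or "proxy" segment, so the extra regex capture groups are not needed.

    def stageRestEndpoint(baseUri: String): String = {
      val parts = baseUri.split("\\?", 2)
      val words = parts(0).split("/")
      val query = if (parts.length > 1) parts(1) else ""
      val stageId = query.split("&").find(_.startsWith("id=")).map(_.drop(3)).getOrElse("")

      val proxyIdx = words.indexOf("proxy")
      if (proxyIdx > 0) {
        // proxy URLs keep /proxy/<appId> in the REST base
        val appId = words(proxyIdx + 1)
        words.take(proxyIdx + 2).mkString("/") + s"/api/v1/applications/$appId/stages/$stageId"
      } else {
        // history URLs: /history/<appId>/<attemptId>/...
        val historyIdx = words.indexOf("history")
        val appId = words(historyIdx + 1)
        val attemptId = words(historyIdx + 2)
        val base = words.take(historyIdx).mkString("/")
        if (!attemptId.forall(_.isDigit) || attemptId == "0") {
          s"$base/api/v1/applications/$appId/stages/$stageId"
        } else {
          s"$base/api/v1/applications/$appId/$attemptId/stages/$stageId"
        }
      }
    }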



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224163278
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
+var ind = urlArray.indexOf("proxy");
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
+if (ind > 0) {
+var appId = urlArray[2];
+var indexOfProxy = words.indexOf("proxy");
--- End diff --

We look up the index of "proxy" in two places; can we get rid of one?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22593
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22593
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97205/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224162695
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
+var ind = urlArray.indexOf("proxy");
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
--- End diff --

all of these should be val not var


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22593
  
**[Test build #97205 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97205/testReport)**
 for PR 22593 at commit 
[`0620ede`](https://github.com/apache/spark/commit/0620ede3e5aaebdbb954ee55ff3fbad6e1cc8fe2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224162463
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
+var ind = urlArray.indexOf("proxy");
--- End diff --

Use val and name it proxyInd or something, and use a different variable for 
history.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224162371
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
--- End diff --

Should be a val; I don't see it changed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r224156599
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala
 ---
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.HasMetadata
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, 
KubernetesExecutorSpecificConf, SparkPod}
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil
+import org.apache.spark.internal.Logging
+
+ /**
+  * This step is responsible for bootstrapping the container with ConfigMaps
+  * containing Hadoop config files mounted as volumes and an ENV variable
+  * pointed to the mounted file directory.
+  */
+private[spark] class HadoopConfExecutorFeatureStep(
+kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf])
+extends KubernetesFeatureConfigStep with Logging {
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+val sparkConf = kubernetesConf.sparkConf
+val hadoopConfDirCMapName = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME)
+require(hadoopConfDirCMapName.isDefined,
+  "Ensure that the env `HADOOP_CONF_DIR` is defined either in the 
client or " +
+" using pre-existing ConfigMaps")
+logInfo("HADOOP_CONF_DIR defined")
+HadoopBootstrapUtil.bootstrapHadoopConfDir(None, None, 
hadoopConfDirCMapName, pod)
+  }
+
+  override def getAdditionalPodSystemProperties(): Map[String, String] = 
Map.empty
--- End diff --

No need to address it here but it feels like these methods should have 
default implementations, given that lots of classes just don't do anything with 
them.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
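
A tiny Scala sketch of what that suggestion would look like (placeholder type parameter, not the actual KubernetesFeatureConfigStep trait): the no-op method gets a default body, so a step like the one above only has to implement configurePod.

    trait FeatureConfigStepSketch[Pod] {
      // Every step has to say how it mutates the pod.
      def configurePod(pod: Pod): Pod

      // Default: most steps contribute no extra system properties,
      // so they simply inherit this no-op implementation.
      def getAdditionalPodSystemProperties(): Map[String, String] = Map.empty
    }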



[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r224162093
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala
 ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import io.fabric8.kubernetes.api.model.SecretBuilder
+import org.apache.commons.codec.binary.Base64
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.Constants._
+import 
org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+
+ /**
--- End diff --

indentation is wrong.

Really, when I asked you to go through all your code, I meant *all* your 
code, not just the code I commented on. Please do that.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r224156872
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala
 ---
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata}
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, 
SparkPod}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadooputils._
+import org.apache.spark.internal.Logging
+
+ /**
+  * Runs the necessary Hadoop-based logic based on Kerberos configs and 
the presence of the
+  * HADOOP_CONF_DIR. This runs various bootstrap methods defined in 
HadoopBootstrapUtil.
+  */
+private[spark] class KerberosConfDriverFeatureStep(
+kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+extends KubernetesFeatureConfigStep with Logging {
--- End diff --

Ping.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r224161045
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala
 ---
@@ -0,0 +1,283 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkPod
+import org.apache.spark.internal.Logging
+
+private[spark] object HadoopBootstrapUtil extends Logging {
+
+   /**
--- End diff --

indentation is wrong


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r224161661
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala
 ---
@@ -0,0 +1,283 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkPod
+import org.apache.spark.internal.Logging
+
+private[spark] object HadoopBootstrapUtil extends Logging {
+
+   /**
+* Mounting the DT secret for both the Driver and the executors
+*
+* @param dtSecretName Name of the secret that stores the Delegation 
Token
+* @param dtSecretItemKey Name of the Item Key storing the Delegation 
Token
+* @param userName Name of the SparkUser to set SPARK_USER
+* @param fileLocation Optional Location of the krb5 file
+* @param newKrb5ConfName Optional location of the ConfigMap for Krb5
+* @param existingKrb5ConfName Optional name of ConfigMap for Krb5
+* @param pod Input pod to be appended to
+* @return a modified SparkPod
+*/
+  def bootstrapKerberosPod(
+  dtSecretName: String,
+  dtSecretItemKey: String,
+  userName: String,
+  fileLocation: Option[String],
+  newKrb5ConfName: Option[String],
+  existingKrb5ConfName: Option[String],
+  pod: SparkPod): SparkPod = {
+
+val preConfigMapVolume = existingKrb5ConfName.map { kconf =>
+  new VolumeBuilder()
+.withName(KRB_FILE_VOLUME)
+.withNewConfigMap()
+  .withName(kconf)
+  .endConfigMap()
+.build()
+}
+
+val createConfigMapVolume = for {
+  fLocation <- fileLocation
+  krb5ConfName <- newKrb5ConfName
+} yield {
+  val krb5File = new File(fLocation)
+  val fileStringPath = krb5File.toPath.getFileName.toString
+  new VolumeBuilder()
+.withName(KRB_FILE_VOLUME)
+.withNewConfigMap()
+.withName(krb5ConfName)
+.withItems(new KeyToPathBuilder()
+  .withKey(fileStringPath)
+  .withPath(fileStringPath)
+  .build())
+.endConfigMap()
+.build()
+}
+
+// Breaking up Volume creation for clarity
+val configMapVolume = preConfigMapVolume.orElse(createConfigMapVolume)
+if (configMapVolume.isEmpty) {
+   logInfo("You have not specified a krb5.conf file locally or via a 
ConfigMap. " +
+ "Make sure that you have the krb5.conf locally on the Driver and 
Executor images")
+}
+
+val kerberizedPodWithDTSecret = new PodBuilder(pod.pod)
+  .editOrNewSpec()
+.addNewVolume()
+  .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+  .withNewSecret()
+.withSecretName(dtSecretName)
+.endSecret()
+  .endVolume()
+.endSpec()
+  .build()
+
+// Optionally add the krb5.conf ConfigMap
+val kerberizedPod = configMapVolume.map { cmVolume =>
+  new PodBuilder(kerberizedPodWithDTSecret)
+.editSpec()
+  .addNewVolumeLike(cmVolume)
+.endVolume()
+  .endSpec()
+.build()
+}.getOrElse(kerberizedPodWithDTSecret)
+
+val kerberizedContainerWithMounts = new ContainerBuilder(pod.container)
+  .addNewVolumeMount()
+.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+.withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR)
+.endVolumeMount()
+  .addNewEnv()
+

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r224160540
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala
 ---
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.HasMetadata
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, 
SparkPod}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadooputils._
+import org.apache.spark.internal.Logging
+
+ /**
+  * Runs the necessary Hadoop-based logic based on Kerberos configs and 
the presence of the
+  * HADOOP_CONF_DIR. This runs various bootstrap methods defined in 
HadoopBootstrapUtil.
+  */
+private[spark] class KerberosConfDriverFeatureStep(
+kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+extends KubernetesFeatureConfigStep with Logging {
+
+  require(kubernetesConf.hadoopConfSpec.isDefined,
+ "Ensure that HADOOP_CONF_DIR is defined either via env or a 
pre-defined ConfigMap")
+  private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get
+  private val conf = kubernetesConf.sparkConf
+  private val principal = 
conf.get(org.apache.spark.internal.config.PRINCIPAL)
+  private val keytab = conf.get(org.apache.spark.internal.config.KEYTAB)
+  private val existingSecretName = 
conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME)
+  private val existingSecretItemKey = 
conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY)
+  private val krb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE)
+  private val krb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP)
+  private val kubeTokenManager = kubernetesConf.tokenManager(conf,
+SparkHadoopUtil.get.newConfiguration(conf))
+  private val isKerberosEnabled =
+(hadoopConfDirSpec.hadoopConfDir.isDefined && 
kubeTokenManager.isSecurityEnabled) ||
+  (hadoopConfDirSpec.hadoopConfigMapName.isDefined &&
+(krb5File.isDefined || krb5CMap.isDefined))
+  require(keytab.isEmpty || isKerberosEnabled,
+"You must enable Kerberos support if you are specifying a Kerberos 
Keytab")
+
+  require(existingSecretName.isEmpty || isKerberosEnabled,
+"You must enable Kerberos support if you are specifying a Kerberos 
Secret")
+
+  KubernetesUtils.requireNandDefined(
+krb5File,
+krb5CMap,
+"Do not specify both a Krb5 local file and the ConfigMap as the 
creation " +
+   "of an additional ConfigMap, when one is already specified, is 
extraneous")
+
+  KubernetesUtils.requireBothOrNeitherDefined(
+keytab,
+principal,
+"If a Kerberos principal is specified you must also specify a Kerberos 
keytab",
+"If a Kerberos keytab is specified you must also specify a Kerberos 
principal")
+
+  KubernetesUtils.requireBothOrNeitherDefined(
+existingSecretName,
+existingSecretItemKey,
+"If a secret data item-key where the data of the Kerberos Delegation 
Token is specified" +
+  " you must also specify the name of the secret",
+"If a secret storing a Kerberos Delegation Token is specified you must 
also" +
+  " specify the item-key where the data is stored")
+
+  private val hadoopConfigurationFiles = 
hadoopConfDirSpec.hadoopConfDir.map { hConfDir =>
+HadoopBootstrapUtil.getHadoopConfFiles(hConfDir)
+  }
+  private val newHadoopConfigMapName =
+if (hadoopConfDirSpec.hadoopConfigMapName.isEmpty) {
+  Some(kubernetesConf.hadoopConfigMapName)
+} else {
+  None
+}
+
+  // Either use pre-existing secret or login to create new Secret with DT 
stored 

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r224160471
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala
 ---
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.HasMetadata
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, 
SparkPod}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadooputils._
+import org.apache.spark.internal.Logging
+
+ /**
+  * Runs the necessary Hadoop-based logic based on Kerberos configs and 
the presence of the
+  * HADOOP_CONF_DIR. This runs various bootstrap methods defined in 
HadoopBootstrapUtil.
+  */
+private[spark] class KerberosConfDriverFeatureStep(
+kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+extends KubernetesFeatureConfigStep with Logging {
+
+  require(kubernetesConf.hadoopConfSpec.isDefined,
+ "Ensure that HADOOP_CONF_DIR is defined either via env or a 
pre-defined ConfigMap")
+  private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get
+  private val conf = kubernetesConf.sparkConf
+  private val principal = 
conf.get(org.apache.spark.internal.config.PRINCIPAL)
+  private val keytab = conf.get(org.apache.spark.internal.config.KEYTAB)
+  private val existingSecretName = 
conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME)
+  private val existingSecretItemKey = 
conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY)
+  private val krb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE)
+  private val krb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP)
+  private val kubeTokenManager = kubernetesConf.tokenManager(conf,
+SparkHadoopUtil.get.newConfiguration(conf))
+  private val isKerberosEnabled =
+(hadoopConfDirSpec.hadoopConfDir.isDefined && 
kubeTokenManager.isSecurityEnabled) ||
+  (hadoopConfDirSpec.hadoopConfigMapName.isDefined &&
+(krb5File.isDefined || krb5CMap.isDefined))
+  require(keytab.isEmpty || isKerberosEnabled,
+"You must enable Kerberos support if you are specifying a Kerberos 
Keytab")
+
+  require(existingSecretName.isEmpty || isKerberosEnabled,
+"You must enable Kerberos support if you are specifying a Kerberos 
Secret")
+
+  KubernetesUtils.requireNandDefined(
+krb5File,
+krb5CMap,
+"Do not specify both a Krb5 local file and the ConfigMap as the 
creation " +
+   "of an additional ConfigMap, when one is already specified, is 
extraneous")
+
+  KubernetesUtils.requireBothOrNeitherDefined(
+keytab,
+principal,
+"If a Kerberos principal is specified you must also specify a Kerberos 
keytab",
+"If a Kerberos keytab is specified you must also specify a Kerberos 
principal")
+
+  KubernetesUtils.requireBothOrNeitherDefined(
+existingSecretName,
+existingSecretItemKey,
+"If a secret data item-key where the data of the Kerberos Delegation 
Token is specified" +
+  " you must also specify the name of the secret",
+"If a secret storing a Kerberos Delegation Token is specified you must 
also" +
+  " specify the item-key where the data is stored")
+
+  private val hadoopConfigurationFiles = 
hadoopConfDirSpec.hadoopConfDir.map { hConfDir =>
+HadoopBootstrapUtil.getHadoopConfFiles(hConfDir)
+  }
+  private val newHadoopConfigMapName =
+if (hadoopConfDirSpec.hadoopConfigMapName.isEmpty) {
+  Some(kubernetesConf.hadoopConfigMapName)
+} else {
+  None
+}
+
+  // Either use pre-existing secret or login to create new Secret with DT 
stored 

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r224160090
  
--- Diff: docs/security.md ---
@@ -722,7 +722,83 @@ with encryption, at least.
 The Kerberos login will be periodically renewed using the provided 
credentials, and new delegation
 tokens for supported will be created.
 
+## Secure Interaction with Kubernetes
+
+When talking to Hadoop-based services behind Kerberos, it was noted that 
Spark needs to obtain delegation tokens
+so that non-local processes can authenticate. These delegation tokens in 
Kubernetes are stored in Secrets that are 
+shared by the Driver and its Executors. As such, there are three ways of 
submitting a Kerberos job: 
+
+In all cases you must define the environment variable: `HADOOP_CONF_DIR` 
or 
+`spark.kubernetes.hadoop.configMapName` as well as either
+`spark.kubernetes.kerberos.krb5.path` or 
`spark.kubernetes.kerberos.krb5.configMapName`.
--- End diff --

This needs a small update after the recent changes.
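
For illustration only, a minimal sketch in Scala of one way a user might wire
the settings quoted above (hedged: it assumes the configuration keys keep
their names after the doc update; the ConfigMap name and krb5.conf path are
placeholder values):

    import org.apache.spark.SparkConf

    // One possible combination: an existing Hadoop-conf ConfigMap plus a local krb5.conf.
    // Setting HADOOP_CONF_DIR in the submission environment is the alternative to the
    // first key, and spark.kubernetes.kerberos.krb5.configMapName to the second.
    val conf = new SparkConf()
      .set("spark.kubernetes.hadoop.configMapName", "hadoop-conf-map")  // placeholder name
      .set("spark.kubernetes.kerberos.krb5.path", "/etc/krb5.conf")     // placeholder path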


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r224161587
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala
 ---
@@ -0,0 +1,283 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkPod
+import org.apache.spark.internal.Logging
+
+private[spark] object HadoopBootstrapUtil extends Logging {
+
+   /**
+* Mounting the DT secret for both the Driver and the executors
+*
+* @param dtSecretName Name of the secret that stores the Delegation 
Token
+* @param dtSecretItemKey Name of the Item Key storing the Delegation 
Token
+* @param userName Name of the SparkUser to set SPARK_USER
+* @param fileLocation Optional Location of the krb5 file
+* @param newKrb5ConfName Optional location of the ConfigMap for Krb5
+* @param existingKrb5ConfName Optional name of ConfigMap for Krb5
+* @param pod Input pod to be appended to
+* @return a modified SparkPod
+*/
+  def bootstrapKerberosPod(
+  dtSecretName: String,
+  dtSecretItemKey: String,
+  userName: String,
+  fileLocation: Option[String],
+  newKrb5ConfName: Option[String],
+  existingKrb5ConfName: Option[String],
+  pod: SparkPod): SparkPod = {
+
+val preConfigMapVolume = existingKrb5ConfName.map { kconf =>
+  new VolumeBuilder()
+.withName(KRB_FILE_VOLUME)
+.withNewConfigMap()
+  .withName(kconf)
+  .endConfigMap()
+.build()
+}
+
+val createConfigMapVolume = for {
+  fLocation <- fileLocation
+  krb5ConfName <- newKrb5ConfName
+} yield {
+  val krb5File = new File(fLocation)
+  val fileStringPath = krb5File.toPath.getFileName.toString
+  new VolumeBuilder()
+.withName(KRB_FILE_VOLUME)
+.withNewConfigMap()
+.withName(krb5ConfName)
+.withItems(new KeyToPathBuilder()
+  .withKey(fileStringPath)
+  .withPath(fileStringPath)
+  .build())
+.endConfigMap()
+.build()
+}
+
+// Breaking up Volume creation for clarity
+val configMapVolume = preConfigMapVolume.orElse(createConfigMapVolume)
+if (configMapVolume.isEmpty) {
+   logInfo("You have not specified a krb5.conf file locally or via a 
ConfigMap. " +
+ "Make sure that you have the krb5.conf locally on the Driver and 
Executor images")
+}
+
+val kerberizedPodWithDTSecret = new PodBuilder(pod.pod)
+  .editOrNewSpec()
+.addNewVolume()
+  .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+  .withNewSecret()
+.withSecretName(dtSecretName)
+.endSecret()
+  .endVolume()
+.endSpec()
+  .build()
+
+// Optionally add the krb5.conf ConfigMap
+val kerberizedPod = configMapVolume.map { cmVolume =>
+  new PodBuilder(kerberizedPodWithDTSecret)
+.editSpec()
+  .addNewVolumeLike(cmVolume)
+.endVolume()
+  .endSpec()
+.build()
+}.getOrElse(kerberizedPodWithDTSecret)
+
+val kerberizedContainerWithMounts = new ContainerBuilder(pod.container)
+  .addNewVolumeMount()
+.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+.withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR)
+.endVolumeMount()
+  .addNewEnv()
+

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r224156790
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopSparkUserExecutorFeatureStep.scala
 ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.HasMetadata
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesExecutorSpecificConf
+import org.apache.spark.deploy.k8s.features.hadooputils.HadoopBootstrapUtil
+import org.apache.spark.internal.Logging
+
+ /**
+  * This step is responsible for setting ENV_SPARK_USER when HADOOP_FILES 
are detected
+  * however, this step would not be run if Kerberos is enabled, as 
Kerberos sets SPARK_USER
+  */
+private[spark] class HadoopSparkUserExecutorFeatureStep(
+kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf])
+extends KubernetesFeatureConfigStep with Logging {
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+val sparkUserName = 
kubernetesConf.sparkConf.get(KERBEROS_SPARK_USER_NAME)
+  HadoopBootstrapUtil.bootstrapSparkUserPod(sparkUserName, pod)
--- End diff --

indentation


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r224160567
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala
 ---
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.HasMetadata
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, 
SparkPod}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadooputils._
+import org.apache.spark.internal.Logging
+
+ /**
+  * Runs the necessary Hadoop-based logic based on Kerberos configs and 
the presence of the
+  * HADOOP_CONF_DIR. This runs various bootstrap methods defined in 
HadoopBootstrapUtil.
+  */
+private[spark] class KerberosConfDriverFeatureStep(
+kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+extends KubernetesFeatureConfigStep with Logging {
+
+  require(kubernetesConf.hadoopConfSpec.isDefined,
+ "Ensure that HADOOP_CONF_DIR is defined either via env or a 
pre-defined ConfigMap")
+  private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get
+  private val conf = kubernetesConf.sparkConf
+  private val principal = 
conf.get(org.apache.spark.internal.config.PRINCIPAL)
+  private val keytab = conf.get(org.apache.spark.internal.config.KEYTAB)
+  private val existingSecretName = 
conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME)
+  private val existingSecretItemKey = 
conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY)
+  private val krb5File = conf.get(KUBERNETES_KERBEROS_KRB5_FILE)
+  private val krb5CMap = conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP)
+  private val kubeTokenManager = kubernetesConf.tokenManager(conf,
+SparkHadoopUtil.get.newConfiguration(conf))
+  private val isKerberosEnabled =
+(hadoopConfDirSpec.hadoopConfDir.isDefined && 
kubeTokenManager.isSecurityEnabled) ||
+  (hadoopConfDirSpec.hadoopConfigMapName.isDefined &&
+(krb5File.isDefined || krb5CMap.isDefined))
+  require(keytab.isEmpty || isKerberosEnabled,
+"You must enable Kerberos support if you are specifying a Kerberos 
Keytab")
+
+  require(existingSecretName.isEmpty || isKerberosEnabled,
+"You must enable Kerberos support if you are specifying a Kerberos 
Secret")
+
+  KubernetesUtils.requireNandDefined(
+krb5File,
+krb5CMap,
+"Do not specify both a Krb5 local file and the ConfigMap as the 
creation " +
+   "of an additional ConfigMap, when one is already specified, is 
extraneous")
+
+  KubernetesUtils.requireBothOrNeitherDefined(
+keytab,
+principal,
+"If a Kerberos principal is specified you must also specify a Kerberos 
keytab",
+"If a Kerberos keytab is specified you must also specify a Kerberos 
principal")
+
+  KubernetesUtils.requireBothOrNeitherDefined(
+existingSecretName,
+existingSecretItemKey,
+"If a secret data item-key where the data of the Kerberos Delegation 
Token is specified" +
+  " you must also specify the name of the secret",
+"If a secret storing a Kerberos Delegation Token is specified you must 
also" +
+  " specify the item-key where the data is stored")
+
+  private val hadoopConfigurationFiles = 
hadoopConfDirSpec.hadoopConfDir.map { hConfDir =>
+HadoopBootstrapUtil.getHadoopConfFiles(hConfDir)
+  }
+  private val newHadoopConfigMapName =
+if (hadoopConfDirSpec.hadoopConfigMapName.isEmpty) {
+  Some(kubernetesConf.hadoopConfigMapName)
+} else {
+  None
+}
+
+  // Either use pre-existing secret or login to create new Secret with DT 
stored 

[GitHub] spark issue #22686: [SPARK-25700][SQL] Partially revert append mode support ...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22686
  
Closing this per https://github.com/apache/spark/pull/22686


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22595: [SPARK-25577][Web UI] Add an on-off switch to display th...

2018-10-10 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22595
  
I read the original PR https://github.com/apache/spark/pull/14617, and it is
weird that the two columns are hidden from the very beginning.
Also, the screenshot in the PR description doesn't match the code changes.

Ping @jerryshao 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22688
  
+1 to deal with this as a non-blocker. I understand data source v2 is under
heavy development and unstable, but I strongly think we should backport this:
it breaks a basic operation.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22688#discussion_r224153758
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala 
---
@@ -190,12 +190,13 @@ class DataSourceV2Suite extends QueryTest with 
SharedSQLContext {
 
   test("simple writable data source") {
 // TODO: java implementation.
+val writeOnlySource = classOf[SimpleWriteOnlyDataSource]
--- End diff --

 can we create a new test case?
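
For illustration, a rough sketch of what a separate test case could look like
(hedged: the test name, the `withTempPath` helper from SQLTestUtils, the i/j
schema mirrored from the existing "simple writable data source" test, and the
write options are assumptions, not the final change):

    test("SPARK-25700: append to a write-only source should not create a reader") {
      // Assumes SimpleWriteOnlyDataSource fails if its read path is instantiated.
      val writeOnlySource = classOf[SimpleWriteOnlyDataSource]
      withTempPath { file =>
        val path = file.getCanonicalPath
        spark.range(10).selectExpr("id AS i", "-id AS j")
          .write.format(writeOnlySource.getName)
          .option("path", path)
          .mode("append")
          .save()
      }
    }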


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22688
  
Ah, good point. I think the original design of the append operator assumes the
table already exists, so a schema should be provided. If we treat a file path
as a table, then append should fail in your case because the path does not
exist, and we should use CTAS instead. cc @rdblue for confirmation.

That said, the change here LGTM. We should only get the relation in append
mode.

Furthermore, I think in the future we can't simply proxy the old `SaveMode`
write APIs to the new write APIs, as the behavior can be different. E.g.,
currently we can write data to a non-existing path with append mode for file
sources, but the append operator cannot.

I'm not sure this should block 2.4. The data source v2 API is unstable, so
breaking changes are allowed, and we won't treat data source v2 bugs as
blockers. We should merge this PR to 2.4, but it's not a strong enough reason
to fail an RC.
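
A minimal sketch of the behavioural difference being described, using the
standard DataFrameWriter API (the path is a placeholder, and the comments
restate the expectation above rather than a tested outcome):

    // File sources today: appending to a path that does not exist yet simply creates it.
    spark.range(5).write.mode("append").parquet("/tmp/append-target-that-does-not-exist")

    // Under the append operator discussed above, the same append against a v2 source
    // would be expected to fail when the target table (path) does not already exist;
    // creating it would instead be a CTAS-style operation.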



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22657: [SPARK-25670][TEST] Reduce number of tested timez...

2018-10-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22657#discussion_r224151216
  
--- Diff: core/src/test/scala/org/apache/spark/SparkFunSuite.scala ---
@@ -106,4 +107,14 @@ abstract class SparkFunSuite
 }
   }
 
+  lazy val outstandingTimezones = Seq(
--- End diff --

Is this lazy so that it's not evaluated by each test suite? I get it, although
it seems `TimeZone` caches these anyway. It won't really matter either way;
I'm neutral.
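
For context, a hedged sketch of the kind of list being discussed (the timezone
IDs here are placeholders rather than the ones in the patch; `TimeZone.getTimeZone`
is the standard java.util API):

    import java.util.TimeZone

    // lazy so the list is only built if a suite actually uses it; TimeZone.getTimeZone
    // also caches instances internally, which is why laziness barely matters here.
    lazy val outstandingTimezones: Seq[TimeZone] =
      Seq("UTC", "PST", "CET", "Africa/Dakar", "America/Los_Angeles", "Asia/Hong_Kong")
        .map(TimeZone.getTimeZone)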


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22623: [SPARK-25636][CORE] spark-submit cuts off the fai...

2018-10-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22623


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22623: [SPARK-25636][CORE] spark-submit cuts off the failure re...

2018-10-10 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22623
  
Merging to master / 2.4.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...

2018-10-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22593
  
Although it's the Scala API, it's callable from Java just as well; there's no
Java-specific API here. So, yeah, it actually makes sense to have both javadoc
and scaladoc for this. And I think there are probably many markdown-like
things that work in scaladoc but not in javadoc. While ideally you're welcome
to fix all of it, that may be way too much. OK, I'd say go ahead, but at least
fix all the instances of bullet lists in the .sql package. It's easy enough to
search for something like `\*\s+-` as a regex to spot instances.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22547
  
**[Test build #97206 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97206/testReport)**
 for PR 22547 at commit 
[`a35d98c`](https://github.com/apache/spark/commit/a35d98cc676c9b3c3b39a45d4b03b43dfe9d0767).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22547
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22547
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3854/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22594: [SPARK-25674][SQL] If the records are incremented...

2018-10-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22594#discussion_r224146203
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala
 ---
@@ -70,6 +70,8 @@ class FileScanRDD(
 
   private val ignoreCorruptFiles = 
sparkSession.sessionState.conf.ignoreCorruptFiles
   private val ignoreMissingFiles = 
sparkSession.sessionState.conf.ignoreMissingFiles
+  // only for test
+  private val inputMetricsTest = 
sparkSession.sessionState.conf.contains("spark.inputmetrics.test")
--- End diff --

Ah, you know, we have `spark.testing` for this purpose too.

I'm actually on the fence here about whether it's worth the extra complexity
for the test. It's a simple change, and the overall effect is tested by other
tests of input metrics.

Hm, what do you think: just drop this? Is it necessary for the test to work
correctly?
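
For reference, a hedged sketch of what using the existing flag instead might
look like (assuming `org.apache.spark.util.Utils.isTesting`, which checks the
`spark.testing` system property, is acceptable at this call site):

    import org.apache.spark.util.Utils

    // Instead of reading a custom "spark.inputmetrics.test" key from the session conf:
    private val inputMetricsTest = Utils.isTesting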


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


