[GitHub] spark issue #15286: [SPARK-17710][HOTFIX] Fix ClassCircularityError in ReplS...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/15286 Thank you very much. @tgravescs @JoshRosen --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/15246 Hi, @srowen Yes, you are right. I am searching the code base to see if there are more cases we can fix.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @tgravescs [SPARK-17714](https://issues.apache.org/jira/browse/SPARK-17714) has been created for further investigation.
[GitHub] spark issue #15286: [SPARK-17710][HOTFIX] Fix ClassCircularityError in ReplS...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/15286 @JoshRosen Yes. The title has been changed. @tgravescs Yes. Now I am creating a separate JIRA to investigate this more.
[GitHub] spark issue #15286: [SPARK-16757][HOTFIX] Fix ClassCircularityError in ReplS...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/15286 Yes. The title has been changed. Thanks. @tgravescs
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Thanks, @tgravescs. Yes, I have created PR [15286](https://github.com/apache/spark/pull/15286).
[GitHub] spark pull request #15286: [SPARK-16757][Follow UP] Fix ClassCircularityErro...
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/15286

[SPARK-16757][Follow UP] Fix ClassCircularityError in ReplSuite tests in Maven build: use 'Class.forName' instead of 'Utils.classForName'

## What changes were proposed in this pull request?
Fix the ClassCircularityError in ReplSuite tests when Spark is built with Maven.

## How was this patch tested?
(1) Build:
```
build/mvn -DskipTests -Phadoop-2.3 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl -Pmesos clean package
```
Then test:
```
build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.repl.ReplSuite test
```
ReplSuite tests passed.

(2) Manual tests against some Spark applications in Yarn client mode and Yarn cluster mode, checking that the Spark caller contexts are written into HDFS hdfs-audit.log and the Yarn RM audit log successfully.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark SPARK-16757

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15286.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15286

commit 59bfa231600decfd10b29741182107b4b2c52adc
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-09-28T22:04:22Z
[SPARK-16757][Follow UP] Fix ClassCircularityError in ReplSuite tests in Maven build: use 'Class.forName' instead of 'Utils.classForName'
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 @tgravescs @srowen Thanks. Using `Class.forName`, which uses `this.getClass().getClassLoader()` by default, makes all the tests pass (both sbt and Maven). However, there must be some reason we prefer `Utils.classForName` instead. Do you have any suggestions?
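The classloader distinction is the crux here: `Class.forName(name)` resolves against the classloader that defined the calling class, while Spark's `Utils.classForName` prefers the thread context classloader, and in the REPL those two loaders can differ. A minimal standalone sketch of the two lookup styles (the class and method names below are illustrative, not Spark's actual code):

```java
public class LoaderDemo {
    // Resolves against the classloader that defined LoaderDemo; this is what a
    // plain Class.forName(name) call made from this class does.
    static Class<?> viaDefiningLoader(String name) throws ClassNotFoundException {
        return Class.forName(name);
    }

    // Resolves against the current thread's context classloader, roughly the
    // strategy Utils.classForName prefers; in the REPL this loader may be a
    // custom one, which is where the two lookups can diverge.
    static Class<?> viaContextLoader(String name) throws ClassNotFoundException {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        return Class.forName(name, true, ctx);
    }

    public static void main(String[] args) throws Exception {
        // For a JDK class both paths succeed; the difference only shows up when
        // the context classloader is a custom one (e.g. the REPL's).
        System.out.println(viaDefiningLoader("java.util.ArrayList").getName());
        System.out.println(viaContextLoader("java.util.ArrayList").getName());
    }
}
```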
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @tgravescs @srowen Here is an intermediate update: if we use `Class.forName` instead of `Utils.classForName`, the Maven build and all of the tests pass.
```
def setCurrentContext(): Boolean = {
  var succeed = false
  try {
    // scalastyle:off classforname
    val callerContext = Class.forName("org.apache.hadoop.ipc.CallerContext")
    val Builder = Class.forName("org.apache.hadoop.ipc.CallerContext$Builder")
    // scalastyle:on classforname
    val builderInst = Builder.getConstructor(classOf[String]).newInstance(context)
    val hdfsContext = Builder.getMethod("build").invoke(builderInst)
    callerContext.getMethod("setCurrent", callerContext).invoke(null, hdfsContext)
    succeed = true
  } catch {
    case NonFatal(e) => logInfo("Fail to set Spark caller context", e)
  }
  succeed
}
```
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 @tgravescs @srowen Sorry for the failure. I am looking into it.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @tgravescs Should we also commit this PR to Branch-2? Thanks.
[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/15246 Hi, @srowen Thanks a lot for the comments. Yes, setting the working dir would work. However, the working dir varies from machine to machine, which would make this tricky to maintain and troubleshoot in the future. IDE configuration settings are not managed or version-controlled, so I think it is better to make the test case independent of the IDE settings.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Thanks a lot for the review. @tgravescs @cnauroth @steveloughran @srowen
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r80579059

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2421,6 +2421,69 @@ private[spark] object Utils extends Logging {
 }

 /**
+ * An utility class used to set up Spark caller contexts to HDFS and Yarn. The `context` will be
+ * constructed by parameters passed in.
+ * When Spark applications run on Yarn and HDFS, its caller contexts will be written into Yarn RM
+ * audit log and hdfs-audit.log. That can help users to better diagnose and understand how
+ * specific applications impacting parts of the Hadoop system and potential problems they may be
+ * creating (e.g. overloading NN). As HDFS mentioned in HDFS-9184, for a given HDFS operation, it's
+ * very helpful to track which upper level job issues it.
+ *
+ * @param from who sets up the caller context (TASK, CLIENT, APPMASTER)
+ *
+ * The parameters below are optional:
+ * @param appId id of the app this task belongs to
+ * @param appAttemptId attempt id of the app this task belongs to
+ * @param jobId id of the job this task belongs to
+ * @param stageId id of the stage this task belongs to
+ * @param stageAttemptId attempt id of the stage this task belongs to
+ * @param taskId task id
+ * @param taskAttemptNumber task attempt id
+ * @since 2.0.1
+ */
+private[spark] class CallerContext(
+    from: String,
+    appId: Option[String] = None,
+    appAttemptId: Option[String] = None,
+    jobId: Option[Int] = None,
+    stageId: Option[Int] = None,
+    stageAttemptId: Option[Int] = None,
+    taskId: Option[Long] = None,
+    taskAttemptNumber: Option[Int] = None) extends Logging {
+
+  val AppId = if (appId.isDefined) s"_${appId.get}" else ""

--- End diff --

Done.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @tgravescs Thanks a lot for the comments. I have updated the PR to rename local vals and remove the `@since` in `Utils.scala`.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r80579032

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2421,6 +2421,69 @@ private[spark] object Utils extends Logging {
 }

 /**
+ * An utility class used to set up Spark caller contexts to HDFS and Yarn. The `context` will be
+ * constructed by parameters passed in.
+ * When Spark applications run on Yarn and HDFS, its caller contexts will be written into Yarn RM
+ * audit log and hdfs-audit.log. That can help users to better diagnose and understand how
+ * specific applications impacting parts of the Hadoop system and potential problems they may be
+ * creating (e.g. overloading NN). As HDFS mentioned in HDFS-9184, for a given HDFS operation, it's
+ * very helpful to track which upper level job issues it.
+ *
+ * @param from who sets up the caller context (TASK, CLIENT, APPMASTER)
+ *
+ * The parameters below are optional:
+ * @param appId id of the app this task belongs to
+ * @param appAttemptId attempt id of the app this task belongs to
+ * @param jobId id of the job this task belongs to
+ * @param stageId id of the stage this task belongs to
+ * @param stageAttemptId attempt id of the stage this task belongs to
+ * @param taskId task id
+ * @param taskAttemptNumber task attempt id
+ * @since 2.0.1

--- End diff --

Done.
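As an aside on the `AppId` line discussed in this review, the `isDefined`/`get` pattern can be written without an explicit `get` by mapping over the optional value. A hypothetical Java `Optional` sketch of the same idea (the PR's actual code is Scala; `ContextParts` and `part` are made-up names):

```java
import java.util.Optional;

public class ContextParts {
    // Equivalent of the Scala `if (appId.isDefined) s"_${appId.get}" else ""`
    // pattern, written with map/orElse so there is no explicit .get call.
    static String part(Optional<String> value) {
        return value.map(v -> "_" + v).orElse("");
    }

    public static void main(String[] args) {
        System.out.println(part(Optional.of("application_123")));  // _application_123
        System.out.println(part(Optional.empty()));                // empty string
    }
}
```

The same shape in Scala would be `appId.map("_" + _).getOrElse("")`.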
[GitHub] spark pull request #15246: [MINOR][SQL] Use resource path for test_script.sh
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/15246

[MINOR][SQL] Use resource path for test_script.sh

## What changes were proposed in this pull request?
This PR modifies the test case `test("script")` to use the resource path for `test_script.sh`, making the test case portable (it also passes when run from IntelliJ).

## How was this patch tested?
Passed the test case. Before, running `test("script")` in IntelliJ failed with:
```
Caused by: org.apache.spark.SparkException: Subprocess exited with status 127. Error: bash: src/test/resources/test_script.sh: No such file or directory
```
After: the test passed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark hivetest

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15246.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15246

commit d799eea4ca3e3ad0fc71fe49985e6bc51f197158
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-09-26T21:15:06Z
[MINOR][SQL] Use resource path for test_script.sh
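The resource-path approach described above can be sketched as follows: resolve the script through the classloader rather than through a path relative to the working directory, so the test no longer depends on where the IDE or build tool launches the JVM. This is an illustrative standalone example, not the PR's actual test code; `ResourcePathDemo` and `resolve` are made-up names:

```java
import java.net.URL;

public class ResourcePathDemo {
    // Look the file up on the classpath (e.g. under src/test/resources once it
    // is copied to the test classpath) instead of hard-coding a working-
    // directory-relative path like "src/test/resources/test_script.sh".
    static String resolve(String resourceName) {
        URL url = ResourcePathDemo.class.getClassLoader().getResource(resourceName);
        if (url == null) {
            throw new IllegalArgumentException("resource not found: " + resourceName);
        }
        return url.getFile();
    }

    public static void main(String[] args) {
        try {
            // A resource that is not on the classpath fails fast with a clear error.
            resolve("no/such/test_script.sh");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```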
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r80323261

--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -54,7 +54,10 @@ private[spark] abstract class Task[T](
     val partitionId: Int,
     // The default value is only used in tests.
     val metrics: TaskMetrics = TaskMetrics.registered,
-    @transient var localProperties: Properties = new Properties) extends Serializable {
+    @transient var localProperties: Properties = new Properties,
+    val jobId: Option[Int] = None,

--- End diff --

OK. Thanks, @tgravescs.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r80161383

--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -54,7 +54,10 @@ private[spark] abstract class Task[T](
     val partitionId: Int,
     // The default value is only used in tests.
     val metrics: TaskMetrics = TaskMetrics.registered,
-    @transient var localProperties: Properties = new Properties) extends Serializable {
+    @transient var localProperties: Properties = new Properties,
+    val jobId: Option[Int] = None,

--- End diff --

Hi, @tgravescs I want to confirm with you whether I can just change and fix up everywhere that calls/extends `Task`. I can do this, but it may change many test classes/cases.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @tgravescs Thank you very much. Yes. I have updated the PR to make the string of the caller context shorter.
[GitHub] spark pull request #15175: [BACKPORT 2.0][MINOR][BUILD] Fix CheckStyle Error
Github user Sherry302 closed the pull request at: https://github.com/apache/spark/pull/15175
[GitHub] spark issue #15175: [BACKPORT 2.0][MINOR][BUILD] Fix CheckStyle Error
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/15175 Thank you, @srowen.
[GitHub] spark issue #15175: [BACKPORT 2.0][MINOR][BUILD] Fix CheckStyle Error
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/15175 @gatorsmile Title has been updated.
[GitHub] spark issue #15170: [MINOR][BUILD] Fix CheckStyle Error
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/15170 @lresende @rxin Thanks for the review. I have created a [PR](https://github.com/apache/spark/pull/15175) against branch-2.0.
[GitHub] spark pull request #15175: [MINOR][BUILD] Fix CheckStyle Error
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/15175

[MINOR][BUILD] Fix CheckStyle Error

## What changes were proposed in this pull request?
This PR is to fix the code style errors.

## How was this patch tested?
Manual. Before:
```
./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[153] (sizes) LineLength: Line is longer than 100 characters (found 107).
[ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[196] (sizes) LineLength: Line is longer than 100 characters (found 108).
[ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[239] (sizes) LineLength: Line is longer than 100 characters (found 115).
[ERROR] src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:[119] (sizes) LineLength: Line is longer than 100 characters (found 107).
[ERROR] src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:[129] (sizes) LineLength: Line is longer than 100 characters (found 104).
[ERROR] src/main/java/org/apache/spark/network/util/LevelDBProvider.java:[124,11] (modifier) ModifierOrder: 'static' modifier out of order with the JLS suggestions.
[ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[184] (regexp) RegexpSingleline: No trailing whitespace allowed.
[ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[304] (regexp) RegexpSingleline: No trailing whitespace allowed.
```
After:
```
./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks passed.
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark javastylefix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15175.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15175

commit ecefe36645432313e1dc9ca734b38383ce0d8e52
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-09-21T05:28:13Z
[MINOR][BUILD] Fix CheckStyle Error
[GitHub] spark issue #15170: [MINOR][BUILD] Fix CheckStyle Error
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/15170 Retest this please.
[GitHub] spark issue #15170: [MINOR][BUILD] Fix CheckStyle Error
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/15170 There are 0 failures (±0) and 1 skipped (±0) on the Test Result page. I'll re-trigger the tests.
[GitHub] spark pull request #15170: [MINOR][BUILD] Fix CheckStyle Error
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/15170

[MINOR][BUILD] Fix CheckStyle Error

## What changes were proposed in this pull request?
This PR is to fix the code style errors before the 2.0.1 release.

## How was this patch tested?
Manual. Before:
```
./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[153] (sizes) LineLength: Line is longer than 100 characters (found 107).
[ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[196] (sizes) LineLength: Line is longer than 100 characters (found 108).
[ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[239] (sizes) LineLength: Line is longer than 100 characters (found 115).
[ERROR] src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:[119] (sizes) LineLength: Line is longer than 100 characters (found 107).
[ERROR] src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:[129] (sizes) LineLength: Line is longer than 100 characters (found 104).
[ERROR] src/main/java/org/apache/spark/network/util/LevelDBProvider.java:[124,11] (modifier) ModifierOrder: 'static' modifier out of order with the JLS suggestions.
[ERROR] src/main/java/org/apache/spark/network/util/TransportConf.java:[26] (regexp) RegexpSingleline: No trailing whitespace allowed.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[33] (sizes) LineLength: Line is longer than 100 characters (found 110).
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[38] (sizes) LineLength: Line is longer than 100 characters (found 110).
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[43] (sizes) LineLength: Line is longer than 100 characters (found 106).
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[48] (sizes) LineLength: Line is longer than 100 characters (found 110).
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java:[0] (misc) NewlineAtEndOfFile: File does not end with a newline.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java:[67] (sizes) LineLength: Line is longer than 100 characters (found 106).
[ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[200] (regexp) RegexpSingleline: No trailing whitespace allowed.
[ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[309] (regexp) RegexpSingleline: No trailing whitespace allowed.
[ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[332] (regexp) RegexpSingleline: No trailing whitespace allowed.
[ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[348] (regexp) RegexpSingleline: No trailing whitespace allowed.
```
After:
```
./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks passed.
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark fixjavastyle

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15170.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15170

commit 91995aa12685a92d033342ccc8981ea5a6968dcb
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-09-20T21:47:28Z
[MINOR][BUILD] Fix CheckStyle Error
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @tgravescs Thank you so much for the review. I have updated the PR based on your every comment. The only question left is this one (in `Task`): "are these params all optional just to make it easier for different task types?" I have replied to this. Could you check it again and give your opinion? To make the caller context more readable, at commit [10dbc6f](https://github.com/apache/spark/commit/10dbc6f26ac7d224803b721f32a9a0b4306e1f47), I added back the static strings `AttemptId` (for stage, task and app) which had been deleted at commit [748e7a9](https://github.com/apache/spark/commit/748e7a9b6f6fe928df9e49f8e020d02126123be8). Yes, this PR will set up the caller context for both HDFS and YARN. At the very beginning, to make the review easier, I created two different JIRAs to set up caller contexts for HDFS (SPARK-16757) and YARN (SPARK-16758), although the code is the same. I have updated the JIRAs, the title of this PR, and the description of this PR. In the "How was this patch tested" section of the PR's description, you can see what is shown in the HDFS hdfs-audit.log and the Yarn RM audit log. When invoking the Hadoop CallerContext API in the Yarn Client, the caller context (`SPARK_CLIENT` with AppId only) will be written to both the HDFS audit log and the Yarn RM audit log. In hdfs-audit.log: ``` 2016-09-20 11:54:24,116 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_CLIENT_AppId_application_1474394339641_0005 ``` In the Yarn RM log: ``` 2016-09-20 11:59:24,050 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=wyang IP=127.0.0.1 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1474394339641_0006 CALLERCONTEXT=SPARK_CLIENT_AppId_application_1474394339641_0006 ``` Also, I have tested this with multiple tasks running in the same executor. 
Take `application_1474394339641_0006` as an example. The command line used to run the test is below: ``` ./bin/spark-submit --verbose --executor-cores 3 --num-executors 1 --master yarn --deploy-mode client --class org.apache.spark.examples.SparkKMeans examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar hdfs://localhost:9000/lr_big.txt 2 5 ``` On the Spark History Application page, you can see there are two executors (one is the driver); in the executor, there are 46 tasks: https://cloud.githubusercontent.com/assets/8546874/18686920/a2617e70-7f32-11e6-947e-dfe83c4185e3.png In the HDFS audit log, there are 46 task records: ``` 2016-09-20 11:59:33,868 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1474394339641_0006/container_1474394339641_0006_01_01/spark-warehouse dst=null perm=wyang:supergroup:rwxr-xr-x proto=rpc callerContext=SPARK_APPLICATION_MASTER_AppId_application_1474394339641_0006_AttemptId_1 2016-09-20 11:59:37,214 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_1_AttemptNum_0 2016-09-20 11:59:37,215 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_2_AttemptNum_0 2016-09-20 11:59:37,215 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_0_AttemptNum_0 2016-09-20 11:59:42,391 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null 
proto=rpc callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_3_AttemptNum_0 2016-09-20 11:59:42,432 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_4_AttemptNum_0 2016-09-20 11:59:42,445 INFO FSNamesystem.audit: allowed=true ugi=wyang
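The `SPARK_TASK_AppId_..._JobId_..._StageId_..._TaskId_..._AttemptNum_...` strings in the audit records above are assembled from optional per-task identifiers. A minimal, self-contained Java sketch of that assembly idea (the method and field names here are illustrative assumptions, not the PR's exact API, and only a subset of the fields is shown):

```java
import java.util.Optional;

public class CallerContextSketch {
    // Render one "_Prefix_value" segment, or nothing when the value is absent,
    // mirroring the `if (x.isDefined) s"_Prefix_${x.get}" else ""` pattern in the PR.
    static String field(String prefix, Optional<?> value) {
        return value.map(v -> "_" + prefix + "_" + v).orElse("");
    }

    // Concatenate the segments after a fixed "SPARK_<origin>" header.
    static String build(String origin, Optional<String> appId, Optional<Integer> jobId,
                        Optional<Integer> stageId, Optional<Long> taskId) {
        return "SPARK_" + origin
            + field("AppId", appId)
            + field("JobId", jobId)
            + field("StageId", stageId)
            + field("TaskId", taskId);
    }

    public static void main(String[] args) {
        System.out.println(build("TASK",
            Optional.of("application_1474394339641_0006"),
            Optional.of(0), Optional.of(0), Optional.of(3L)));
        // -> SPARK_TASK_AppId_application_1474394339641_0006_JobId_0_StageId_0_TaskId_3
    }
}
```

Absent fields simply contribute nothing, which is why a Yarn Client context carries only the AppId while a Task context carries the full chain.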
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r79695565 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2420,6 +2420,44 @@ private[spark] object Utils extends Logging { } } +private[spark] class CallerContext( + appName: Option[String] = None, + appID: Option[String] = None, + appAttemptID: Option[String] = None, + jobID: Option[Int] = None, + stageID: Option[Int] = None, + stageAttemptId: Option[Int] = None, + taskId: Option[Long] = None, + taskAttemptNumber: Option[Int] = None) extends Logging { + + val AppName = if (appName.isDefined) s"_AppName_${appName.get}" else "" --- End diff -- Yes. Done.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r79695449 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2420,6 +2420,44 @@ private[spark] object Utils extends Logging { } } +private[spark] class CallerContext( + appName: Option[String] = None, + appID: Option[String] = None, + appAttemptID: Option[String] = None, + jobID: Option[Int] = None, + stageID: Option[Int] = None, + stageAttemptId: Option[Int] = None, + taskId: Option[Long] = None, + taskAttemptNumber: Option[Int] = None) extends Logging { + + val AppName = if (appName.isDefined) s"_AppName_${appName.get}" else "" --- End diff -- I have updated the PR to remove appName and replace it with something that differentiates the context from the ApplicationMaster vs the Yarn Client vs a Task. But for AppID, I think it is better to keep it, since hdfs-audit.log has no other info about the application. For example, the record below was produced when a Task did a read/write operation to HDFS; apart from `callerContext`, there is no other info about the application: ``` 2016-09-14 22:29:06,526 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_2_0 ```
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r79693044 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala --- @@ -42,7 +42,10 @@ import org.apache.spark.rdd.RDD * input RDD's partitions). * @param localProperties copy of thread-local properties set by the user on the driver side. * @param metrics a [[TaskMetrics]] that is created at driver side and sent to executor side. - */ + * @param jobId id of the job this task belongs to --- End diff -- Done.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r79692960 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -196,8 +198,13 @@ private[spark] class ApplicationMaster( // Set this internal configuration if it is running on cluster mode, this // configuration will be checked in SparkContext to avoid misuse of yarn cluster mode. System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString()) + +attemptID = Option(appAttemptId.getAttemptId.toString) } + new CallerContext(Option(System.getProperty("spark.app.name")), --- End diff -- Done.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r79693000 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2420,6 +2420,44 @@ private[spark] object Utils extends Logging { } } +private[spark] class CallerContext( + appName: Option[String] = None, + appID: Option[String] = None, + appAttemptID: Option[String] = None, + jobID: Option[Int] = None, + stageID: Option[Int] = None, + stageAttemptId: Option[Int] = None, + taskId: Option[Long] = None, + taskAttemptNumber: Option[Int] = None) extends Logging { + + val AppName = if (appName.isDefined) s"_AppName_${appName.get}" else "" + val AppID = if (appID.isDefined) s"_AppID_${appID.get}" else "" + val AppAttemptID = if (appAttemptID.isDefined) s"_${appAttemptID.get}" else "" + val JobID = if (jobID.isDefined) s"_JobID_${jobID.get}" else "" + val StageID = if (stageID.isDefined) s"_StageID_${stageID.get}" else "" + val StageAttemptId = if (stageAttemptId.isDefined) s"_${stageAttemptId.get}" else "" + val TaskId = if (taskId.isDefined) s"_TaskId_${taskId.get}" else "" + val TaskAttemptNumber = if (taskAttemptNumber.isDefined) s"_${taskAttemptNumber.get}" else "" + + val context = "SPARK" + AppName + AppID + AppAttemptID + + JobID + StageID + StageAttemptId + TaskId + TaskAttemptNumber + + def set(): Boolean = { --- End diff -- Done.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r79692475 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2420,6 +2420,44 @@ private[spark] object Utils extends Logging { } } +private[spark] class CallerContext( + appName: Option[String] = None, + appID: Option[String] = None, + appAttemptID: Option[String] = None, + jobID: Option[Int] = None, + stageID: Option[Int] = None, + stageAttemptId: Option[Int] = None, + taskId: Option[Long] = None, + taskAttemptNumber: Option[Int] = None) extends Logging { + + val AppName = if (appName.isDefined) s"_AppName_${appName.get}" else "" + val AppID = if (appID.isDefined) s"_AppID_${appID.get}" else "" + val AppAttemptID = if (appAttemptID.isDefined) s"_${appAttemptID.get}" else "" + val JobID = if (jobID.isDefined) s"_JobID_${jobID.get}" else "" + val StageID = if (stageID.isDefined) s"_StageID_${stageID.get}" else "" + val StageAttemptId = if (stageAttemptId.isDefined) s"_${stageAttemptId.get}" else "" + val TaskId = if (taskId.isDefined) s"_TaskId_${taskId.get}" else "" + val TaskAttemptNumber = if (taskAttemptNumber.isDefined) s"_${taskAttemptNumber.get}" else "" + + val context = "SPARK" + AppName + AppID + AppAttemptID + + JobID + StageID + StageAttemptId + TaskId + TaskAttemptNumber + + def set(): Boolean = { +var succeed = false +try { + val callerContext = Utils.classForName("org.apache.hadoop.ipc.CallerContext") --- End diff -- Yes. I have updated the PR.
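The diff above loads `org.apache.hadoop.ipc.CallerContext` reflectively because the class only exists from Hadoop 2.8.0 onward. A minimal, self-contained Java sketch of the same builder-by-reflection shape, using `java.lang.StringBuilder` as a stand-in for `CallerContext$Builder` so it runs without Hadoop on the classpath (the helper name is an assumption for illustration):

```java
import java.lang.reflect.Method;

public class ReflectiveBuilderSketch {
    // Load a builder-style class by name, construct it with a String argument,
    // and invoke a no-arg method on the instance -- the same shape as
    // Builder.getConstructor(classOf[String]).newInstance(context) followed by
    // Builder.getMethod("build").invoke(builderInst) in the PR.
    static Object buildViaReflection(String className, String ctorArg, String methodName)
            throws Exception {
        Class<?> builder = Class.forName(className);
        Object instance = builder.getConstructor(String.class).newInstance(ctorArg);
        Method m = builder.getMethod(methodName);
        return m.invoke(instance);
    }

    public static void main(String[] args) throws Exception {
        // StringBuilder stands in for org.apache.hadoop.ipc.CallerContext$Builder here.
        Object result = buildViaReflection("java.lang.StringBuilder", "ctx", "toString");
        System.out.println(result); // prints "ctx"
    }
}
```

Because every lookup goes through `Class.forName`, the calling code never links against the Hadoop 2.8+ class directly and can degrade gracefully on older versions.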
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r79692046 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -54,7 +54,10 @@ private[spark] abstract class Task[T]( val partitionId: Int, // The default value is only used in tests. val metrics: TaskMetrics = TaskMetrics.registered, -@transient var localProperties: Properties = new Properties) extends Serializable { +@transient var localProperties: Properties = new Properties, +val jobId: Option[Int] = None, --- End diff -- Making these params all optional avoids breaking existing code that uses this API. An alternative is to mark the current API as deprecated and add a new overloaded function with the new parameters. I am going to go that way. Any suggestions?
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @tgravescs Could you please review this again? I have updated the PR.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Retest this please.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 There is "0 failures (±0)" on the Test Result page. All tests passed. I'll re-trigger the tests.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r78898595 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2418,6 +2418,21 @@ private[spark] object Utils extends Logging { sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten } } + + def setCallerContext(context: String): Boolean = { +var succeed = false +try { + val Builder = Utils.classForName("org.apache.hadoop.ipc.CallerContext$Builder") + val builderInst = Builder.getConstructor(classOf[String]).newInstance(context) + val ret = Builder.getMethod("build").invoke(builderInst) --- End diff -- Yes. hdfsContext is more readable.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r78898257 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2418,6 +2418,21 @@ private[spark] object Utils extends Logging { sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten } } + + def setCallerContext(context: String): Boolean = { +var succeed = false +try { + val Builder = Utils.classForName("org.apache.hadoop.ipc.CallerContext$Builder") + val builderInst = Builder.getConstructor(classOf[String]).newInstance(context) + val ret = Builder.getMethod("build").invoke(builderInst) + val callerContext = Utils.classForName("org.apache.hadoop.ipc.CallerContext") --- End diff -- If `val callerContext = Utils.classForName("org.apache.hadoop.ipc.CallerContext")` is moved out of the `try` block, Spark will throw an exception when it runs on Hadoop versions before 2.8.0. Also, moving that line to the top of the `try` block does not make any difference, since `Utils.classForName("org.apache.hadoop.ipc.CallerContext$Builder")` also needs to check whether `org.apache.hadoop.ipc.CallerContext` exists. I am not sure I got your point; could you please give more info about it? Thanks a lot.
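The reason for keeping the class lookup inside the `try` block can be shown with a small, self-contained Java sketch (the class names here are placeholders, not the PR's code): on a classpath where the class is missing, the lookup throws, the failure is swallowed, and the caller gets `false` instead of a crash.

```java
public class OptionalClassSketch {
    // Returns true only when the named class is present on the classpath;
    // any lookup failure is caught and swallowed, mirroring how the PR
    // degrades gracefully on Hadoop versions before 2.8.0.
    static boolean trySetContext(String className) {
        try {
            Class.forName(className);
            // ... reflective calls against the class would go here ...
            return true;
        } catch (Throwable t) { // ClassNotFoundException and friends
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(trySetContext("java.lang.String"));                // true
        System.out.println(trySetContext("org.example.NoSuchCallerContext")); // false
    }
}
```

Had the lookup been outside the `try`, the `ClassNotFoundException` would propagate to the caller on older Hadoop versions, which is exactly the failure mode the comment above describes.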
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r78897068 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -79,6 +82,13 @@ private[spark] abstract class Task[T]( metrics) TaskContext.setTaskContext(context) taskThread = Thread.currentThread() + +val callerContext = + s"Spark_AppId_${appId.getOrElse("")}_AppAttemptId_${appAttemptId.getOrElse("None")}" + + s"_JobId_${jobId.getOrElse("0")}_StageID_${stageId}_stageAttemptId_${stageAttemptId}" + +s"_taskID_${taskAttemptId}_attemptNumber_${attemptNumber}" +Utils.setCallerContext(callerContext) --- End diff -- Yes. Good catch!
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r78896928 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -79,6 +82,13 @@ private[spark] abstract class Task[T]( metrics) TaskContext.setTaskContext(context) taskThread = Thread.currentThread() + +val callerContext = + s"Spark_AppId_${appId.getOrElse("")}_AppAttemptId_${appAttemptId.getOrElse("None")}" + + s"_JobId_${jobId.getOrElse("0")}_StageID_${stageId}_stageAttemptId_${stageAttemptId}" + +s"_taskID_${taskAttemptId}_attemptNumber_${attemptNumber}" --- End diff -- I have updated the PR to make the string shorter.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r78896972 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -79,6 +82,13 @@ private[spark] abstract class Task[T]( metrics) TaskContext.setTaskContext(context) taskThread = Thread.currentThread() + +val callerContext = + s"Spark_AppId_${appId.getOrElse("")}_AppAttemptId_${appAttemptId.getOrElse("None")}" + --- End diff -- Yes.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r78896863 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -184,6 +184,9 @@ private[spark] class ApplicationMaster( try { val appAttemptId = client.getAttemptId() + var context = s"Spark_AppName_${System.getProperty("spark.app.name")}" + --- End diff -- A CallerContext class has been added.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r78896816 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2418,6 +2418,21 @@ private[spark] object Utils extends Logging { sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten } } + + def setCallerContext(context: String): Boolean = { +var succeed = false +try { + val Builder = Utils.classForName("org.apache.hadoop.ipc.CallerContext$Builder") + val builderInst = Builder.getConstructor(classOf[String]).newInstance(context) + val ret = Builder.getMethod("build").invoke(builderInst) + val callerContext = Utils.classForName("org.apache.hadoop.ipc.CallerContext") + callerContext.getMethod("setCurrent", callerContext).invoke(null, ret) + succeed = true +} catch { + case NonFatal(e) => logDebug(s"$e", e) --- End diff -- I have updated this to "case NonFatal(e) => logInfo("Fail to set Spark caller context", e)"
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r78896707 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -54,7 +54,10 @@ private[spark] abstract class Task[T]( val partitionId: Int, // The default value is only used in tests. val metrics: TaskMetrics = TaskMetrics.registered, -@transient var localProperties: Properties = new Properties) extends Serializable { +@transient var localProperties: Properties = new Properties, +val jobId: Option[Int] = None, --- End diff -- Done.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r78896692 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala --- @@ -51,8 +51,12 @@ private[spark] class ShuffleMapTask( partition: Partition, @transient private var locs: Seq[TaskLocation], metrics: TaskMetrics, -localProperties: Properties) - extends Task[MapStatus](stageId, stageAttemptId, partition.index, metrics, localProperties) +localProperties: Properties, --- End diff -- Done.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r78896681 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala --- @@ -51,8 +51,12 @@ private[spark] class ResultTask[T, U]( locs: Seq[TaskLocation], val outputId: Int, localProperties: Properties, -metrics: TaskMetrics) - extends Task[U](stageId, stageAttemptId, partition.index, metrics, localProperties) +metrics: TaskMetrics, +jobId: Option[Int] = None, --- End diff -- Done.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @tgravescs Thank you very much for the review. I have updated the PR based on your every comment, including adding a CallerContext class, updating the Javadoc, and making the caller context string shorter. I ran manual tests against some Spark applications in Yarn client mode and Yarn cluster mode, and Spark caller contexts were written into HDFS `hdfs-audit.log` successfully. The following is a screenshot of the audit log (SparkKMeans in yarn client mode): https://cloud.githubusercontent.com/assets/8546874/18539563/1eb16748-7acd-11e6-840a-0e8bfabf5954.png This is the caller context which was written into `hdfs-audit.log` by the `Yarn Client`: ``` 2016-09-14 22:28:59,341 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppName_SparkKMeans_AppID_application_1473908768790_0007 ``` The caller context above is `SPARK_AppName_***_AppID_***`. These are the caller contexts which were written into `hdfs-audit.log` by a `Task`: ``` 2016-09-14 22:29:06,525 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_1_0 2016-09-14 22:29:06,526 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_0_0 2016-09-14 22:29:06,526 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_2_0 ``` The caller context above is `SPARK_AppID_***_JobID_***_StageID_***_(StageAttemptID)_TaskId_***_(TaskAttemptNumber)`. 
The static strings `jobAttemptID`, `stageAttemptID`, and `attemptNumber` of tasks have been deleted. (For `jobAttemptID`, please refer to the following records produced by SparkKMeans run in Yarn cluster mode.) The records below were written into `hdfs-audit.log` when SparkKMeans ran in Yarn cluster mode: ``` 2016-09-14 22:25:30,100 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1473908768790_0006/container_1473908768790_0006_01_01/spark-warehouse dst=null perm=wyang:supergroup:rwxr-xr-x proto=rpc callerContext=SPARK_AppName_org.apache.spark.examples.SparkKMeans_AppID_application_1473908768790_0006_1 2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_0_0 2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_2_0 2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_1_0 ```
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 @tgravescs Sure. Thanks.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @tgravescs Could you please review this PR? Thank you very much.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 @steveloughran Thank you very much. I have updated the PR based on your comments. Also, I have added a unit test.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 @srowen Thanks all the same.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @srowen Could you please review this PR again?
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @srowen Could you please review this PR? Thanks.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Retest this please.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 The only test failure is 'basic functionality', but it passed locally. I'll re-trigger the tests.
[GitHub] spark issue #14769: [MINOR][SQL] Remove implemented functions from comments ...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14769 Yes. Are they ok now? @rxin
[GitHub] spark issue #14768: [MINOR][BUILD] Fix Java CheckStyle Error
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14768 @srowen Thanks for the review.
[GitHub] spark issue #14768: [MINOR][BUILD] Fix Java CheckStyle Error
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14768

For this piece of code: ![image](https://cloud.githubusercontent.com/assets/8546874/17901844/c4dae970-6919-11e6-8361-a73321a19f86.png) I think lines in the `if` and `else` blocks should be at the same indentation, as logically they are at the same level. The line `((UnsafeInMemorySorter.SortedIterator)upstream).getCurrentPageNumber())` is the continuation of a logical block, so I indented it 8 spaces instead of 2 to make the code more readable. I think it's much better than this: ![image](https://cloud.githubusercontent.com/assets/8546874/17902709/30db2074-691d-11e6-9644-c8f03a9819cf.png) I referred to [Oracle's Java code conventions](http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-136091.html#248), but there is no exactly matching case. @srowen Could you please give some directions? Thanks.
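For readers without the screenshots, the continuation-indent question above can be illustrated with a sketch. The class, interface, and method names here are simplified stand-ins, not Spark's actual `UnsafeExternalSorter` code.

```java
// Illustrative sketch of the 8-space continuation indent discussed above.
// Names are simplified stand-ins for the real Spark code.
public class IndentSketch {
    interface Iter { long currentPage(); }

    static long pageOf(Object upstream, boolean isSorted) {
        long page;
        if (isSorted) {
            // The wrapped continuation line is indented 8 spaces, so it reads
            // as subordinate to the assignment it continues:
            page =
                    ((Iter) upstream).currentPage();
        } else {
            page = -1L;
        }
        return page;
    }

    public static void main(String[] args) {
        Iter it = () -> 42L;
        System.out.println(pageOf(it, true));
    }
}
```

With a 2-space continuation the cast expression would line up with the `if` body's other statements, obscuring that it belongs to the assignment on the previous line.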
[GitHub] spark pull request #14768: [MINOR][BUILD] Fix Java CheckStyle Error
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14768#discussion_r75908543

--- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
```
@@ -61,7 +61,8 @@ public static void main(String[] args) throws Exception {
       .load();

     // Split the lines into words
-    Dataset<String> words = lines.as(Encoders.STRING()).flatMap(new FlatMapFunction<String, String>() {
+    Dataset<String> words = lines.as(Encoders.STRING())
+        .flatMap(new FlatMapFunction<String, String>() {
       @Override
```
--- End diff --

Thanks.
[GitHub] spark issue #14768: [MINOR][BUILD] Fix Java CheckStyle Error
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14768 Thanks, @jerryshao. I have updated the PR.
[GitHub] spark pull request #14769: [MINOR][SQL] Remove implemented functions from co...
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/14769

[MINOR][SQL] Remove implemented functions from comments of 'HiveSessionCatalog.scala'

## What changes were proposed in this pull request?
This PR removes implemented functions from comments of `HiveSessionCatalog.scala`: `java_method`, `posexplode`, `str_to_map`.

## How was this patch tested?
Manual.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Sherry302/spark cleanComment

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14769.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14769

commit 8f3e25fe3fb88ba51c8c01013786041f58e80427
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-08-23T05:43:36Z
[MINOR][SQL] Remove implemented functions from comments of 'HiveSessionCatalog.scala'
[GitHub] spark pull request #14768: [MINOR][BUILD] Fix Java CheckStyle Error
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/14768

[MINOR][BUILD] Fix Java CheckStyle Error

## What changes were proposed in this pull request?
As Spark 2.0.1 will be released soon (mentioned in the Spark dev mailing list), besides the critical bugs, it's better to fix the code style errors before the release.

Before:
```
./dev/lint-java
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[525] (sizes) LineLength: Line is longer than 100 characters (found 119).
[ERROR] src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103).
```

After:
```
./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks passed.
```

## How was this patch tested?
Manual.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Sherry302/spark fixjavastyle

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14768.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14768

commit a36989105086f60417f21341d8573b4d3c6bc7eb
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-08-23T04:42:04Z
[MINOR][BUILD] Fix Java CheckStyle Error
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Thanks a lot for adding me as "contributor" in Hadoop :) @steveloughran @cnauroth
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659

Hi, @steveloughran Thanks a lot for the comments. In the audit log, if users set some configuration in spark-defaults.conf like `spark.eventLog.dir hdfs://localhost:9000/spark-history`, there will be a record like the one below in the audit log:
```
2016-08-21 23:47:50,834 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=setPermission src=/spark-history/application_1471835208589_0013.lz4.inprogress dst=null perm=wyang:supergroup:rwxrwx--- proto=rpc
```
We can see the application id `application_1471835208589_0013` above. Apart from that case, there is no Spark application information like the application name and application id (or, in Yarn, appID+attemptID) in the audit log. So I think it is better to include the application name/id in the caller context, and I have updated the PR to include that information. In the commit [5ab2a41](https://github.com/apache/spark/pull/14659/commits/5ab2a41b93bfd73baf3798ba66fc7554b10b78e6), the application ID and attempt ID (only in Yarn cluster mode) are included in the value of the caller context when the Yarn `Client` (if applications run in Yarn client mode) or `ApplicationMaster` (if applications run in Yarn cluster mode) does operations in HDFS.
So in the audit log, you can see `callerContext=Spark_AppName_**_AppId_**_AttemptId_**`:

_Applications in Yarn cluster mode_
```
2016-08-21 22:55:44,568 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt/_spark_metadata dst=null perm=null proto=rpc callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:44,573 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:44,583 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:44,589 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:46,163 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1471835208589_0010/container_1471835208589_0010_01_01/spark-warehouse dst=null perm=wyang:supergroup:rwxr-xr-x proto=rpc callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
```

_Applications in Yarn client mode_
```
2016-08-21 22:59:20,775 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt/_spark_metadata dst=null perm=null proto=rpc callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
2016-08-21 22:59:20,778 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
2016-08-21 22:59:20,785 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
2016-08-21 22:59:20,791 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
```

In the commit [1512775](https://github.com/apache/spark/pull/14659/commits/1512775a3faddb9de9299662a6f3bfec3f6fe205), the application ID, name, and attempt ID (only in Yarn cluster mode) are included in the value of the caller context when `Tasks` do operations in HDFS. So in the audit log, you can see `callerContext=Spark_appName_**_appID_**_appAttemptID_**_JobId_**_StageID_**_stageAttemptId_**_taskID_**_attemptNumber_**`:

_Applications in Yarn cluster mode_
```
2016-08-21 22:55:50,977 INFO FSNamesystem.audit: allowed=true ugi=wyang
```
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @cnauroth Thank you very much for the review and suggestion. I have removed the spaces in the value of the caller context, and prepended "Spark" instead (refer to the commit [3b9a17e](https://github.com/apache/spark/pull/14659/commits/3b9a17e6dc9ef60a4c40f8aab2d0409c32b864e1)).
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @steveloughran Thank you very much for the comments. I have created a Hadoop JIRA [HADOOP-13527](https://issues.apache.org/jira/browse/HADOOP-13527) and attached the patch; could you please review it? I am unable to assign the JIRA to myself; could you please add me in the "contributor" role in Hadoop? Thanks again.
[GitHub] spark issue #14577: [SPARK-16986][WEB UI] Make 'Started' time, 'Completed' t...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14577 Hi, @srowen Thanks a lot for the comments. Sorry for the late reply. You are right. I will check how other pages format the date.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659

Hi, @srowen. Thank you so much for the review. Sorry for the test failure and late update. The failures were because `jobID` was None or there was no `spark.app.name` in the SparkConf. I have updated the PR to set default values for `jobID` and `spark.app.name`. When a real application runs on Spark, it will always have a `jobID` and `spark.app.name`.

What's the use case for this? When users run Spark applications on Yarn on HDFS, Spark's caller contexts will be written into hdfs-audit.log. The Spark caller contexts are `JobID_stageID_stageAttemptId_taskID_attemptNumber` and the application's name. The caller context can help users to better diagnose and understand how specific applications impact parts of the Hadoop system and the potential problems they may be creating (e.g. overloading the NN). As mentioned in HDFS-9184, for a given HDFS operation, it's very helpful to track which upper-level job issued it.
[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/14659

[SPARK-16757] Set up Spark caller context to HDFS

## What changes were proposed in this pull request?
1. Pass `jobId` to Task.
2. Invoke Hadoop APIs. A new function `setCallerContext` is added in `Utils`. The `setCallerContext` function invokes APIs of `org.apache.hadoop.ipc.CallerContext` to set up Spark caller contexts, which will be written into `hdfs-audit.log`. For applications in Yarn client mode, `org.apache.hadoop.ipc.CallerContext` is called in `Task` and the Yarn `Client`. For applications in Yarn cluster mode, `org.apache.hadoop.ipc.CallerContext` is called in `Task` and `ApplicationMaster`. The Spark caller contexts written into `hdfs-audit.log` are the application's name `{spark.app.name}` and `JobID_stageID_stageAttemptId_taskID_attemptNumber`.

## How was this patch tested?
Manual tests against some Spark applications in Yarn client mode and Yarn cluster mode, checking whether Spark caller contexts are written into HDFS `hdfs-audit.log` successfully. For example, run SparkKMeans in Yarn client mode:
```
./bin/spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkKMeans examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar hdfs://localhost:9000/lr_big.txt 2 5
```
Before: there will be no Spark caller context in records of `hdfs-audit.log`. After: Spark caller contexts will be in records of `hdfs-audit.log`.
(_Note: the Spark caller context below is from the Hadoop caller context API being invoked in the Yarn Client_)
```
2016-07-21 13:52:30,802 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SparkKMeans running on Spark
```
(_Note: the Spark caller context below is from the Hadoop caller context API being invoked in a Task_)
```
2016-07-21 13:52:35,584 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=JobId_0_StageID_0_stageAttemptId_0_taskID_0_attemptNumber_0
```
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Sherry302/spark callercontextSubmit

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14659.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14659

commit ec6833d32ef14950b2d81790bc908992f6288815
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-08-16T04:11:41Z
[SPARK-16757] Set up Spark caller context to HDFS
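A helper of the kind the PR describes would typically call `org.apache.hadoop.ipc.CallerContext` through reflection, since that class only exists on Hadoop 2.8+ (HDFS-9184) and Spark cannot assume it at compile time. The sketch below is illustrative: the helper name mirrors the PR's description, but the body is an assumption, not the PR's actual code.

```java
import java.lang.reflect.Method;

// Sketch of a reflection-based setCallerContext helper, in the spirit of the
// Utils.setCallerContext described above. Reflection avoids a hard compile-time
// dependency on Hadoop 2.8+, where org.apache.hadoop.ipc.CallerContext appeared.
// Illustrative only, not the PR's actual code.
public class SetCallerContextSketch {
    // Returns true if the caller context was set, false if the API is unavailable.
    static boolean setCallerContext(String context) {
        try {
            Class<?> callerContext = Class.forName("org.apache.hadoop.ipc.CallerContext");
            Class<?> builder = Class.forName("org.apache.hadoop.ipc.CallerContext$Builder");
            Object builderInst = builder.getConstructor(String.class).newInstance(context);
            Object ctx = builder.getMethod("build").invoke(builderInst);
            Method setCurrent = callerContext.getMethod("setCurrent", callerContext);
            setCurrent.invoke(null, ctx);
            return true;
        } catch (Throwable t) {
            // Hadoop < 2.8, or no Hadoop on the classpath: the API is absent.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(setCallerContext("SPARK_AppName_SparkKMeans_AppID_application_1473908768790_0007"));
    }
}
```

Once set, the context is propagated with each RPC from that thread, which is how the `callerContext=` field ends up in the NameNode's `hdfs-audit.log` records shown above.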
[GitHub] spark issue #14556: [SPARK-16966][Core] Make App Name to the valid name inst...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14556 @srowen Thanks for the new PR and the review.
[GitHub] spark issue #14577: [SPARK-16986][WEB UI] Make 'Started' time, 'Completed' t...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14577 Hi, @rxin. Thanks for the quick feedback. This PR removes the time inconsistency between web pages. Right now the times on the history page are inconsistent with the times on other pages like the Spark job pages, which confuses users.
[GitHub] spark pull request #14577: [SPARK-16986][WEB UI] Make 'Started' time, 'Compl...
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/14577

[SPARK-16986][WEB UI] Make 'Started' time, 'Completed' time and 'Last Updated' time in history server UI to the user local time

## What changes were proposed in this pull request?
In historypage.js, format 'Started' time, 'Completed' time and 'Last Updated' time to the user's local time.

## How was this patch tested?
Tested manually.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Sherry302/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14577.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14577

commit 4218f529c3bd31e6a3bd56852ab607a81b41db35
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-08-10T05:53:13Z
[SPARK-16986][WEB UI] Make 'Started' time, 'Completed' time and 'Last Updated' time in history server UI to the user local time
[GitHub] spark pull request #14556: [SPARK-16966][Core] Make App Name to the valid na...
Github user Sherry302 closed the pull request at: https://github.com/apache/spark/pull/14556
[GitHub] spark pull request #14556: [SPARK-16966][Core] Make App Name to the valid na...
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/14556

[SPARK-16966][Core] Make App Name to the valid name instead of a randomUUID when 'spark.app.name' exists

## What changes were proposed in this pull request?
In SparkSession, before setting "spark.app.name" to "java.util.UUID.randomUUID().toString", sparkConf.contains("spark.app.name") should be checked instead of options.contains("spark.app.name").

## How was this patch tested?
Manual. E.g.:
```
./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 --num-executors 1 --master yarn --deploy-mode client --class org.apache.spark.examples.SparkKMeans examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar
```
The application "org.apache.spark.examples.SparkKMeans" above did not invoke `.appName()`. Before this commit, in the history server UI the App Name was a randomUUID, 70c06dc5-1b99-4b4a-a826-ea27497e977b. Now, with this commit, the App Name is the valid name "myApplicationTest".

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Sherry302/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14556.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14556

commit a21937be7de24a353a3e8c9bbe7471b31a1f4719
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-08-09T06:42:39Z
[SPARK-16966][Core] Make App Name to the valid name instead of a randomUUID when 'spark.app.name' exists
[GitHub] spark pull request #14532: SPARK-16945: Fix Java Lint errors
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/14532

SPARK-16945: Fix Java Lint errors

## What changes were proposed in this pull request?
This PR fixes the following minor Java linter errors:
```
[ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10] (modifier) RedundantModifier: Redundant 'final' modifier.
```

## How was this patch tested?
Manual test:
```
dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks passed.
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Sherry302/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14532.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14532

commit 736cee23f3e795ca122009f67c344f4fe7c7fbc6
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-08-08T05:12:36Z
SPARK-16945: Fix Java Lint errors
[GitHub] spark pull request #14312: [SPARK-15857]Add caller context in Spark: invoke ...
Github user Sherry302 closed the pull request at: https://github.com/apache/spark/pull/14312
[GitHub] spark pull request #14312: [SPARK-15857]Add caller context in Spark: invoke ...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14312#discussion_r72536143

--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -66,6 +66,9 @@ private[spark] class Client(
   import Client._
   import YarnSparkHadoopUtil._
+  val context: String = s"${sparkConf.get("spark.app.name")} running on Spark"
+  Utils.setCallerContext(context)

--- End diff --

If Spark applications run in YARN cluster mode, a record with that caller context will appear in the HDFS audit log:
2016-07-21 14:32:33,404 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/spark-history dst=null perm=null proto=rpc callerContext=org.apache.spark.examples.SparkKMeans running on Spark
[GitHub] spark issue #14312: [SPARK-15857]Add caller context in Spark: invoke YARN/HD...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14312 Thanks for the feedback, Jerry. I will update the patch.
[GitHub] spark pull request #14312: [SPARK-15857]Add caller context in Spark: invoke ...
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/14312 [SPARK-15857]Add caller context in Spark: invoke YARN/HDFS API to set…

## What changes were proposed in this pull request?
1. Pass 'jobId' to Task.
2. Add a new function 'setCallerContext' in Utils. 'setCallerContext' calls the APIs of 'org.apache.hadoop.ipc.CallerContext' to set up Spark caller contexts, which will be written into the HDFS hdfs-audit.log and the YARN resource manager log.
3. 'setCallerContext' is called in the YARN Client, ApplicationMaster, and Task classes. The Spark caller context written into the HDFS log will be "JobID_stageID_stageAttemptId_taskID_attemptNumber on Spark", and the Spark caller context written into the YARN log will be "{spark.app.name} running on Spark".

## How was this patch tested?
Manual tests against some Spark applications in YARN client mode and cluster mode, checking whether Spark caller contexts were written into the HDFS hdfs-audit.log and the YARN resource manager log successfully. For example, running SparkKMeans on Spark:

In the YARN resource manager log, there will be a record with the Spark caller context:
...
2016-07-21 13:36:26,318 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=wyang IP=127.0.0.1 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1469125587135_0004 CALLERCONTEXT=SparkKMeans running on Spark
...
In the HDFS hdfs-audit.log, there will be records with Spark caller contexts:
...
2016-07-21 13:38:30,799 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt/_spark_metadata dst=null perm=null proto=rpc callerContext=SparkKMeans running on Spark
...
2016-07-21 13:39:35,584 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=JobId_0_StageID_0_stageAttemptId_0_taskID_1_attemptNumber_0 on Spark
...
If the Hadoop version on which Spark runs does not have the CallerContext APIs, no Spark caller context information will appear in those logs.

… up caller context

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Sherry302/spark master
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14312.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14312

commit 38c4f58dbf30d541260ee1b0381993a9bec393f8
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-07-22T01:21:03Z
[SPARK-15857]Add caller context in Spark: invoke YARN/HDFS API to set up caller context
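The graceful degradation mentioned above can be sketched with reflection. This is an illustrative sketch, not the actual Utils.setCallerContext source; it assumes Hadoop's `CallerContext.Builder(String)` constructor, its `build()` method, and the static `CallerContext.setCurrent(CallerContext)` method, and simply reports failure when those classes are absent from the classpath:

```scala
import scala.util.Try

// Attempt to set the Hadoop caller context via reflection, so the call
// compiles and runs even against Hadoop versions (or classpaths) that
// do not ship org.apache.hadoop.ipc.CallerContext. Returns true only if
// the context was actually set.
def setCallerContext(context: String): Boolean = {
  Try {
    val callerCtxClass = Class.forName("org.apache.hadoop.ipc.CallerContext")
    val builderClass   = Class.forName("org.apache.hadoop.ipc.CallerContext$Builder")
    val builder   = builderClass.getConstructor(classOf[String]).newInstance(context)
    val callerCtx = builderClass.getMethod("build").invoke(builder)
    // setCurrent is static, hence the null receiver.
    callerCtxClass.getMethod("setCurrent", callerCtxClass).invoke(null, callerCtx)
  }.isSuccess
}

// On a classpath without the Hadoop classes this returns false instead of
// throwing, matching the "no information in those logs" behaviour above.
setCallerContext("SparkKMeans running on Spark")
```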
[GitHub] spark issue #14163: [SPARK-15923][YARN] Spark Application rest api returns '...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14163 Updated the doc based on the feedback.
[GitHub] spark pull request #14163: [SPARK-15923][YARN] Spark Application rest api re...
Github user Sherry302 commented on a diff in the pull request: https://github.com/apache/spark/pull/14163#discussion_r71425048

--- Diff: docs/monitoring.md ---
@@ -224,10 +224,12 @@ both running applications, and in the history server. The endpoints are mounted for the history server, they would typically be accessible at `http://:18080/api/v1`, and for a running application, at `http://localhost:4040/api/v1`.
-In the API, an application is referenced by its application ID, `[app-id]`.
-When running on YARN, each application may have multiple attempts; each identified by their `[attempt-id]`.
-In the API listed below, `[app-id]` will actually be `[base-app-id]/[attempt-id]`,
-where `[base-app-id]` is the YARN application ID.
+In the API, an application is referenced by its application ID, `[app-id]`.
+Spark on YARN supports multiple application attempts in cluster mode but not in client mode.

--- End diff --

Thanks for the feedback.
[GitHub] spark pull request #14163: [SPARK-15923][YARN] Spark Application rest api re...
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/14163 [SPARK-15923][YARN] Spark Application rest api returns 'no such app: …

## What changes were proposed in this pull request?
Update monitoring.md.

…'

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Sherry302/spark master
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14163.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14163

commit aa2129e1480cd863c42872c82e08cb8eef2d992b
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-07-12T22:13:21Z
[SPARK-15923][YARN] Spark Application rest api returns 'no such app: '
[GitHub] spark pull request #14024: [SPARK-15923][YARN] Spark Application rest api re...
Github user Sherry302 closed the pull request at: https://github.com/apache/spark/pull/14024
[GitHub] spark pull request #14024: [SPARK-15923][YARN] Spark Application rest api re...
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/14024 [SPARK-15923][YARN] Spark Application rest api returns 'no such app: …

## What changes were proposed in this pull request?
1. Updated the monitoring.md doc.
2. In YarnSchedulerBackend.scala: make applications running in YARN cluster mode have attemptID "1" by default.

## How was this patch tested?
Manual tests passed.

…'

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Sherry302/spark master
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14024.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14024

commit a15dee1aee3afa53a455c4b0aba5e3388a0129d3
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-07-02T01:45:38Z
[SPARK-15923][YARN] Spark Application rest api returns 'no such app: '
[GitHub] spark issue #13448: [SPARK-15707][SQL] Make Code Neat - Use map instead of i...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/13448 Merged to master/2.0
[GitHub] spark pull request #13448: [SPARK-15707][SQL] Make Code Neat - Use map inste...
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/13448 [SPARK-15707][SQL] Make Code Neat - Use map instead of if check.

## What changes were proposed in this pull request?
In the forType function of object RandomDataGenerator, the following code:
if (maybeSqlTypeGenerator.isDefined) {
  Some(generator)
} else {
  None
}
will be replaced with maybeSqlTypeGenerator.map.

## How was this patch tested?
All of the current unit tests passed.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Sherry302/spark master
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13448.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13448

commit 010110ebe18b2de291f03c03ebaa9183ed7b3987
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date: 2016-06-01T18:12:59Z
[SPARK-15707][SQL] Make Code Neat - Use map instead of if check.
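The refactoring described in the PR above can be sketched with a stand-in Option value; `buildGenerator` and the sample string are hypothetical, not the actual RandomDataGenerator.forType internals:

```scala
// Illustrative stand-in for the generator construction step.
def buildGenerator(g: String): String = s"generator($g)"

val maybeSqlTypeGenerator: Option[String] = Some("sqlTypeGenerator")

// Before: explicit isDefined check producing Some/None by hand.
val before: Option[String] =
  if (maybeSqlTypeGenerator.isDefined) Some(buildGenerator(maybeSqlTypeGenerator.get))
  else None

// After: Option.map expresses the same conditional transformation
// without the branch, and without the unsafe .get.
val after: Option[String] = maybeSqlTypeGenerator.map(buildGenerator)
```

Both expressions yield the same Option, which is why the change is purely a readability cleanup covered by the existing unit tests.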