[GitHub] spark issue #15286: [SPARK-17710][HOTFIX] Fix ClassCircularityError in ReplS...

2016-09-28 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/15286
  
Thank you very much. @tgravescs @JoshRosen 





[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-09-28 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/15246
  
Hi, @srowen Yes, you are right. I am searching the code base to see if there
are more occurrences we can fix.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...

2016-09-28 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @tgravescs 
[SPARK-17714](https://issues.apache.org/jira/browse/SPARK-17714) has been 
created for further investigation.





[GitHub] spark issue #15286: [SPARK-17710][HOTFIX] Fix ClassCircularityError in ReplS...

2016-09-28 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/15286
  
@JoshRosen Yes. The title has been changed.
@tgravescs Yes. Now I am creating a separate jira to investigate this more.





[GitHub] spark issue #15286: [SPARK-16757][HOTFIX] Fix ClassCircularityError in ReplS...

2016-09-28 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/15286
  
Yes. The title has been changed. Thanks. @tgravescs 





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...

2016-09-28 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Thanks, @tgravescs. Yes, I have created PR 
[15286](https://github.com/apache/spark/pull/15286). 





[GitHub] spark pull request #15286: [SPARK-16757][Follow UP] Fix ClassCircularityErro...

2016-09-28 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/15286

[SPARK-16757][Follow UP] Fix ClassCircularityError in ReplSuite tests in 
Maven build: use 'Class.forName' instead of 'Utils.classForName'

## What changes were proposed in this pull request?
Fix the ClassCircularityError in ReplSuite tests when Spark is built with Maven.

## How was this patch tested?
(1)
```
build/mvn -DskipTests -Phadoop-2.3 -Pyarn -Phive -Phive-thriftserver 
-Pkinesis-asl -Pmesos clean package
```
Then test:
```
build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.repl.ReplSuite test
```
ReplSuite tests passed

(2)
Manual tests against some Spark applications in Yarn client mode and Yarn 
cluster mode, checking that the Spark caller contexts are written into HDFS 
hdfs-audit.log and the Yarn RM audit log successfully.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark SPARK-16757

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15286.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15286


commit 59bfa231600decfd10b29741182107b4b2c52adc
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-09-28T22:04:22Z

[SPARK-16757][Follow UP] Fix ClassCircularityError in ReplSuite tests in 
Maven build: use 'Class.forName' instead of 'Utils.classForName'







[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...

2016-09-28 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
@tgravescs @srowen Thanks. Using `Class.forName`, which uses 
`this.getClass().getClassLoader()` by default, makes all the tests pass (both 
sbt and Maven). However, there must be some reason we prefer 
`Utils.classForName` instead. Do you have any suggestions?
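
For readers following along, here is a minimal sketch of the difference between the two lookups; `classForNameSketch` below is a paraphrase of what `Utils.classForName` does, not the exact implementation:
```scala
object ClassLoaderSketch {
  // Paraphrase of Spark's Utils.classForName: prefer the thread context
  // classloader when one is set (in ReplSuite this is the REPL's loader,
  // which is where the ClassCircularityError surfaced), falling back to
  // this class's own loader.
  def classForNameSketch(name: String): Class[_] = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    Class.forName(name, /* initialize = */ true, loader)
  }

  def main(args: Array[String]): Unit = {
    // Plain Class.forName resolves against the calling class's own loader,
    // which is what the hotfix switches ReplSuite to.
    println(Class.forName("java.lang.String"))
    println(classForNameSketch("java.lang.String"))
  }
}
```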





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...

2016-09-28 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @tgravescs @srowen 
To give an intermediate update: if we use `Class.forName` instead of 
`Utils.classForName`, the Maven build and all of the tests pass. 

```
  def setCurrentContext(): Boolean = {
    var succeed = false
    try {
      // scalastyle:off classforname
      val callerContext = Class.forName("org.apache.hadoop.ipc.CallerContext")
      val Builder = Class.forName("org.apache.hadoop.ipc.CallerContext$Builder")
      // scalastyle:on classforname
      val builderInst = Builder.getConstructor(classOf[String]).newInstance(context)
      val hdfsContext = Builder.getMethod("build").invoke(builderInst)
      callerContext.getMethod("setCurrent", callerContext).invoke(null, hdfsContext)
      succeed = true
    } catch {
      case NonFatal(e) => logInfo("Fail to set Spark caller context", e)
    }
    succeed
  }
```
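
For context, a hypothetical call site for this method, based on the `CallerContext(from, appId, appAttemptId, ...)` signature shown in the review diffs below (names and values are illustrative only, not taken from the PR):
```scala
// Set the HDFS/Yarn caller context from the application master. All the
// Hadoop calls go through reflection internally, so setCurrentContext()
// simply reports whether Hadoop's CallerContext API was available and set.
val succeeded = new CallerContext(
  "APPMASTER",
  appId = Some("application_1474394339641_0006"),
  appAttemptId = Some("1")).setCurrentContext()
```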





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...

2016-09-28 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
@tgravescs @srowen Sorry for the failure. I am looking into it.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...

2016-09-27 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @tgravescs Should we also commit this PR to Branch-2? Thanks.





[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-09-27 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/15246
  
Hi, @srowen Thanks a lot for the comments. Yes, setting the working dir can 
work. However, the working dir varies from machine to machine, which would be a 
little tricky to maintain and troubleshoot in the future, and IDE configuration 
settings are not managed or version-controlled now. So I think it is better to 
make the test case independent of the IDE settings.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...

2016-09-27 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Thanks a lot for the review. @tgravescs @cnauroth @steveloughran @srowen 





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...

2016-09-26 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r80579059
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2421,6 +2421,69 @@ private[spark] object Utils extends Logging {
 }
 
 /**
+ * A utility class used to set up Spark caller contexts to HDFS and Yarn. The `context` will be
+ * constructed from the parameters passed in.
+ * When Spark applications run on Yarn and HDFS, their caller contexts will be written into the
+ * Yarn RM audit log and hdfs-audit.log. That can help users better diagnose and understand how
+ * specific applications are impacting parts of the Hadoop system and what potential problems
+ * they may be creating (e.g. overloading the NN). As mentioned in HDFS-9184, for a given HDFS
+ * operation, it is very helpful to track which upper-level job issued it.
+ *
+ * @param from who sets up the caller context (TASK, CLIENT, APPMASTER)
+ *
+ * The parameters below are optional:
+ * @param appId id of the app this task belongs to
+ * @param appAttemptId attempt id of the app this task belongs to
+ * @param jobId id of the job this task belongs to
+ * @param stageId id of the stage this task belongs to
+ * @param stageAttemptId attempt id of the stage this task belongs to
+ * @param taskId task id
+ * @param taskAttemptNumber task attempt id
+ * @since 2.0.1
+ */
+private[spark] class CallerContext(
+   from: String,
+   appId: Option[String] = None,
+   appAttemptId: Option[String] = None,
+   jobId: Option[Int] = None,
+   stageId: Option[Int] = None,
+   stageAttemptId: Option[Int] = None,
+   taskId: Option[Long] = None,
+   taskAttemptNumber: Option[Int] = None) extends Logging {
+
+   val AppId = if (appId.isDefined) s"_${appId.get}" else ""
--- End diff --

Done.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...

2016-09-26 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @tgravescs Thanks a lot for the comments. I have updated the PR to 
rename local vals and remove the `@since` in `Utils.scala`.





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...

2016-09-26 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r80579032
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2421,6 +2421,69 @@ private[spark] object Utils extends Logging {
 }
 
 /**
+ * A utility class used to set up Spark caller contexts to HDFS and Yarn. The `context` will be
+ * constructed from the parameters passed in.
+ * When Spark applications run on Yarn and HDFS, their caller contexts will be written into the
+ * Yarn RM audit log and hdfs-audit.log. That can help users better diagnose and understand how
+ * specific applications are impacting parts of the Hadoop system and what potential problems
+ * they may be creating (e.g. overloading the NN). As mentioned in HDFS-9184, for a given HDFS
+ * operation, it is very helpful to track which upper-level job issued it.
+ *
+ * @param from who sets up the caller context (TASK, CLIENT, APPMASTER)
+ *
+ * The parameters below are optional:
+ * @param appId id of the app this task belongs to
+ * @param appAttemptId attempt id of the app this task belongs to
+ * @param jobId id of the job this task belongs to
+ * @param stageId id of the stage this task belongs to
+ * @param stageAttemptId attempt id of the stage this task belongs to
+ * @param taskId task id
+ * @param taskAttemptNumber task attempt id
+ * @since 2.0.1
--- End diff --

Done.





[GitHub] spark pull request #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-09-26 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/15246

[MINOR][SQL] Use resource path for test_script.sh

## What changes were proposed in this pull request?
This PR modified the test case `test("script")` to use the resource path for 
`test_script.sh`, making the test case portable (it now passes even in IntelliJ).


## How was this patch tested?
Passed the test case.
Before:
Run `test("script")` in IntelliJ:
```
Caused by: org.apache.spark.SparkException: Subprocess exited with status 
127. Error: bash: src/test/resources/test_script.sh: No such file or directory 
```
After:
Test passed.
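
For illustration, a minimal sketch of a resource-based lookup (the exact helper used in the PR may differ):
```scala
// Resolve test_script.sh from the test classpath rather than from a path
// relative to the working directory, so the lookup succeeds no matter where
// the test is launched from (sbt, Maven, or IntelliJ).
val scriptPath = Thread.currentThread().getContextClassLoader
  .getResource("test_script.sh").getFile
```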


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark hivetest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15246


commit d799eea4ca3e3ad0fc71fe49985e6bc51f197158
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-09-26T21:15:06Z

[MINOR][SQL] Use resource path for test_script.sh







[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...

2016-09-23 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r80323261
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -54,7 +54,10 @@ private[spark] abstract class Task[T](
 val partitionId: Int,
 // The default value is only used in tests.
 val metrics: TaskMetrics = TaskMetrics.registered,
-@transient var localProperties: Properties = new Properties) extends Serializable {
+@transient var localProperties: Properties = new Properties,
+val jobId: Option[Int] = None,
--- End diff --

OK. Thanks, @tgravescs .





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...

2016-09-22 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r80161383
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -54,7 +54,10 @@ private[spark] abstract class Task[T](
 val partitionId: Int,
 // The default value is only used in tests.
 val metrics: TaskMetrics = TaskMetrics.registered,
-@transient var localProperties: Properties = new Properties) extends Serializable {
+@transient var localProperties: Properties = new Properties,
+val jobId: Option[Int] = None,
--- End diff --

Hi, @tgravescs I want to confirm with you whether I can just change and fix 
up everywhere that calls/extends Task. I can do this, but it may change many 
test classes/cases.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...

2016-09-22 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @tgravescs Thank you very much. Yes. I have updated the PR to make the 
string of the caller context shorter. 





[GitHub] spark pull request #15175: [BACKPORT 2.0][MINOR][BUILD] Fix CheckStyle Error

2016-09-21 Thread Sherry302
Github user Sherry302 closed the pull request at:

https://github.com/apache/spark/pull/15175





[GitHub] spark issue #15175: [BACKPORT 2.0][MINOR][BUILD] Fix CheckStyle Error

2016-09-21 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/15175
  
Thank you, @srowen.





[GitHub] spark issue #15175: [BACKPORT 2.0][MINOR][BUILD] Fix CheckStyle Error

2016-09-21 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/15175
  
@gatorsmile Title has been updated.





[GitHub] spark issue #15170: [MINOR][BUILD] Fix CheckStyle Error

2016-09-20 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/15170
  
@lresende @rxin Thanks for the review. I have created a 
[PR](https://github.com/apache/spark/pull/15175) against branch-2.0. 





[GitHub] spark pull request #15175: [MINOR][BUILD] Fix CheckStyle Error

2016-09-20 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/15175

[MINOR][BUILD] Fix CheckStyle Error

## What changes were proposed in this pull request?
This PR is to fix the code style errors.

## How was this patch tested?
Manual.

Before:
```
./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[153] (sizes) LineLength: Line is longer than 100 characters (found 107).
[ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[196] (sizes) LineLength: Line is longer than 100 characters (found 108).
[ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[239] (sizes) LineLength: Line is longer than 100 characters (found 115).
[ERROR] src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:[119] (sizes) LineLength: Line is longer than 100 characters (found 107).
[ERROR] src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:[129] (sizes) LineLength: Line is longer than 100 characters (found 104).
[ERROR] src/main/java/org/apache/spark/network/util/LevelDBProvider.java:[124,11] (modifier) ModifierOrder: 'static' modifier out of order with the JLS suggestions.
[ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[184] (regexp) RegexpSingleline: No trailing whitespace allowed.
[ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[304] (regexp) RegexpSingleline: No trailing whitespace allowed.
```
After:
```
./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks passed.
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark javastylefix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15175


commit ecefe36645432313e1dc9ca734b38383ce0d8e52
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-09-21T05:28:13Z

[MINOR][BUILD] Fix CheckStyle Error







[GitHub] spark issue #15170: [MINOR][BUILD] Fix CheckStyle Error

2016-09-20 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/15170
  
Retest this please.





[GitHub] spark issue #15170: [MINOR][BUILD] Fix CheckStyle Error

2016-09-20 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/15170
  
There are 0 failures (±0), 1 skipped (±0) on the Test Result page. I'll 
re-trigger the tests.





[GitHub] spark pull request #15170: [MINOR][BUILD] Fix CheckStyle Error

2016-09-20 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/15170

[MINOR][BUILD] Fix CheckStyle Error

## What changes were proposed in this pull request?
This PR is to fix the code style errors before the 2.0.1 release.

## How was this patch tested?
Manual.

Before:
```
./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[153] (sizes) LineLength: Line is longer than 100 characters (found 107).
[ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[196] (sizes) LineLength: Line is longer than 100 characters (found 108).
[ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[239] (sizes) LineLength: Line is longer than 100 characters (found 115).
[ERROR] src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:[119] (sizes) LineLength: Line is longer than 100 characters (found 107).
[ERROR] src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:[129] (sizes) LineLength: Line is longer than 100 characters (found 104).
[ERROR] src/main/java/org/apache/spark/network/util/LevelDBProvider.java:[124,11] (modifier) ModifierOrder: 'static' modifier out of order with the JLS suggestions.
[ERROR] src/main/java/org/apache/spark/network/util/TransportConf.java:[26] (regexp) RegexpSingleline: No trailing whitespace allowed.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[33] (sizes) LineLength: Line is longer than 100 characters (found 110).
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[38] (sizes) LineLength: Line is longer than 100 characters (found 110).
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[43] (sizes) LineLength: Line is longer than 100 characters (found 106).
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[48] (sizes) LineLength: Line is longer than 100 characters (found 110).
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java:[0] (misc) NewlineAtEndOfFile: File does not end with a newline.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java:[67] (sizes) LineLength: Line is longer than 100 characters (found 106).
[ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[200] (regexp) RegexpSingleline: No trailing whitespace allowed.
[ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[309] (regexp) RegexpSingleline: No trailing whitespace allowed.
[ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[332] (regexp) RegexpSingleline: No trailing whitespace allowed.
[ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[348] (regexp) RegexpSingleline: No trailing whitespace allowed.
```
After:
```
./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks passed.
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark fixjavastyle

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15170


commit 91995aa12685a92d033342ccc8981ea5a6968dcb
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-09-20T21:47:28Z

[MINOR][BUILD] Fix CheckStyle Error







[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...

2016-09-20 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @tgravescs Thank you so much for the review. I have updated the PR 
based on all of your comments.

The only question left is this one (in `Task`): "are these params all 
optional just to make it easier for different task types?" I have replied to it. 
Could you check it again and give your opinion?

To make the caller context more readable, at commit 
[10dbc6f](https://github.com/apache/spark/commit/10dbc6f26ac7d224803b721f32a9a0b4306e1f47)
 I added the static strings `AttemptId` back (for stage, task and app) which 
had been deleted at commit 
[748e7a9](https://github.com/apache/spark/commit/748e7a9b6f6fe928df9e49f8e020d02126123be8).

Yes, this PR will set up the caller context for both HDFS and YARN. At the very 
beginning, to make the review easier, I created two different JIRAs to set up 
caller contexts for HDFS (SPARK-16757) and YARN (SPARK-16758), although the code 
is the same. I have updated the JIRAs, the title of this PR, and the 
description of this PR. In the "How was this patch tested" section of the PR 
description, you can see what shows up in the HDFS hdfs-audit.log and the Yarn RM 
audit log. 

When invoking the Hadoop CallerContext API in the Yarn Client, the caller context 
(`SPARK_CLIENT` with the AppId only) is written to both the HDFS audit 
log and the Yarn RM audit log. 
In hdfs-audit.log:
```
2016-09-20 11:54:24,116 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_CLIENT_AppId_application_1474394339641_0005
```
In the Yarn RM log:
```
2016-09-20 11:59:24,050 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=wyang  IP=127.0.0.1  OPERATION=Submit Application Request  TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1474394339641_0006  CALLERCONTEXT=SPARK_CLIENT_AppId_application_1474394339641_0006
```
Also, I have tested this with multiple tasks running in the same executor. 
Take `application_1474394339641_0006` as an example.

My command line to run the tests is below:
```
./bin/spark-submit --verbose --executor-cores 3 --num-executors 1 --master yarn --deploy-mode client --class org.apache.spark.examples.SparkKMeans examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar hdfs://localhost:9000/lr_big.txt 2 5
```
On the Spark History application page, you can see there are two executors (one 
is the driver), and in the executor there are 46 tasks:
https://cloud.githubusercontent.com/assets/8546874/18686920/a2617e70-7f32-11e6-947e-dfe83c4185e3.png
In the HDFS audit log, there are 46 task records:
```
2016-09-20 11:59:33,868 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=mkdirs  src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1474394339641_0006/container_1474394339641_0006_01_01/spark-warehouse  dst=null  perm=wyang:supergroup:rwxr-xr-x  proto=rpc  callerContext=SPARK_APPLICATION_MASTER_AppId_application_1474394339641_0006_AttemptId_1
2016-09-20 11:59:37,214 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_1_AttemptNum_0
2016-09-20 11:59:37,215 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_2_AttemptNum_0
2016-09-20 11:59:37,215 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_0_AttemptNum_0
2016-09-20 11:59:42,391 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_3_AttemptNum_0
2016-09-20 11:59:42,432 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_4_AttemptNum_0
2016-09-20 11:59:42,445 INFO FSNamesystem.audit: allowed=true  ugi=wyang
```

[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...

2016-09-20 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r79695565
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2420,6 +2420,44 @@ private[spark] object Utils extends Logging {
   }
 }
 
+private[spark] class CallerContext(
+   appName: Option[String] = None,
+   appID: Option[String] = None,
+   appAttemptID: Option[String] = None,
+   jobID: Option[Int] = None,
+   stageID: Option[Int] = None,
+   stageAttemptId: Option[Int] = None,
+   taskId: Option[Long] = None,
+   taskAttemptNumber: Option[Int] = None) extends Logging {
+
+   val AppName = if (appName.isDefined) s"_AppName_${appName.get}" else ""
--- End diff --

Yes. Done. 





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...

2016-09-20 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r79695449
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2420,6 +2420,44 @@ private[spark] object Utils extends Logging {
   }
 }
 
+private[spark] class CallerContext(
+   appName: Option[String] = None,
+   appID: Option[String] = None,
+   appAttemptID: Option[String] = None,
+   jobID: Option[Int] = None,
+   stageID: Option[Int] = None,
+   stageAttemptId: Option[Int] = None,
+   taskId: Option[Long] = None,
+   taskAttemptNumber: Option[Int] = None) extends Logging {
+
+   val AppName = if (appName.isDefined) s"_AppName_${appName.get}" else ""
--- End diff --

I have updated the PR to remove appName and replace it with something 
to differentiate the context from ApplicationMaster vs Yarn Client vs Task. But 
for AppID, I think it is better to keep it, since in hdfs-audit.log there is no 
other info about the application. For example, the record below was produced when 
a Task did a read/write operation to HDFS; apart from `callerContext`, there is 
no other info about the application:
```
2016-09-14 22:29:06,526 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_2_0
```






[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...

2016-09-20 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r79693044
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala ---
@@ -42,7 +42,10 @@ import org.apache.spark.rdd.RDD
  * input RDD's partitions).
  * @param localProperties copy of thread-local properties set by the user on the driver side.
  * @param metrics a [[TaskMetrics]] that is created at driver side and sent to executor side.
- */
+ * @param jobId id of the job this task belongs to
--- End diff --

Done.





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...

2016-09-20 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r79692960
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -196,8 +198,13 @@ private[spark] class ApplicationMaster(
 // Set this internal configuration if it is running on cluster mode, this
 // configuration will be checked in SparkContext to avoid misuse of yarn cluster mode.
 System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())
+
+attemptID = Option(appAttemptId.getAttemptId.toString)
   }
 
+  new CallerContext(Option(System.getProperty("spark.app.name")),
--- End diff --

Done.





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...

2016-09-20 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r79693000
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2420,6 +2420,44 @@ private[spark] object Utils extends Logging {
   }
 }
 
+private[spark] class CallerContext(
+   appName: Option[String] = None,
+   appID: Option[String] = None,
+   appAttemptID: Option[String] = None,
+   jobID: Option[Int] = None,
+   stageID: Option[Int] = None,
+   stageAttemptId: Option[Int] = None,
+   taskId: Option[Long] = None,
+   taskAttemptNumber: Option[Int] = None) extends Logging {
+
+   val AppName = if (appName.isDefined) s"_AppName_${appName.get}" else ""
+   val AppID = if (appID.isDefined) s"_AppID_${appID.get}" else ""
+   val AppAttemptID = if (appAttemptID.isDefined) s"_${appAttemptID.get}" else ""
+   val JobID = if (jobID.isDefined) s"_JobID_${jobID.get}" else ""
+   val StageID = if (stageID.isDefined) s"_StageID_${stageID.get}" else ""
+   val StageAttemptId = if (stageAttemptId.isDefined) s"_${stageAttemptId.get}" else ""
+   val TaskId = if (taskId.isDefined) s"_TaskId_${taskId.get}" else ""
+   val TaskAttemptNumber = if (taskAttemptNumber.isDefined) s"_${taskAttemptNumber.get}" else ""
+
+   val context = "SPARK" + AppName + AppID + AppAttemptID +
+ JobID + StageID + StageAttemptId + TaskId + TaskAttemptNumber
+
+  def set(): Boolean = {
--- End diff --

Done.





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...

2016-09-20 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r79692475
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2420,6 +2420,44 @@ private[spark] object Utils extends Logging {
   }
 }
 
+private[spark] class CallerContext(
+   appName: Option[String] = None,
+   appID: Option[String] = None,
+   appAttemptID: Option[String] = None,
+   jobID: Option[Int] = None,
+   stageID: Option[Int] = None,
+   stageAttemptId: Option[Int] = None,
+   taskId: Option[Long] = None,
+   taskAttemptNumber: Option[Int] = None) extends Logging {
+
+   val AppName = if (appName.isDefined) s"_AppName_${appName.get}" else ""
+   val AppID = if (appID.isDefined) s"_AppID_${appID.get}" else ""
+   val AppAttemptID = if (appAttemptID.isDefined) s"_${appAttemptID.get}" else ""
+   val JobID = if (jobID.isDefined) s"_JobID_${jobID.get}" else ""
+   val StageID = if (stageID.isDefined) s"_StageID_${stageID.get}" else ""
+   val StageAttemptId = if (stageAttemptId.isDefined) s"_${stageAttemptId.get}" else ""
+   val TaskId = if (taskId.isDefined) s"_TaskId_${taskId.get}" else ""
+   val TaskAttemptNumber = if (taskAttemptNumber.isDefined) s"_${taskAttemptNumber.get}" else ""
+
+   val context = "SPARK" + AppName + AppID + AppAttemptID +
+ JobID + StageID + StageAttemptId + TaskId + TaskAttemptNumber
+
+  def set(): Boolean = {
+var succeed = false
+try {
+  val callerContext = Utils.classForName("org.apache.hadoop.ipc.CallerContext")
--- End diff --

Yes. I have updated the PR.





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS...

2016-09-20 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r79692046
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -54,7 +54,10 @@ private[spark] abstract class Task[T](
 val partitionId: Int,
 // The default value is only used in tests.
 val metrics: TaskMetrics = TaskMetrics.registered,
-@transient var localProperties: Properties = new Properties) extends Serializable {
+@transient var localProperties: Properties = new Properties,
+val jobId: Option[Int] = None,
--- End diff --

Making these params all optional avoids breaking the current code that uses 
this API. An alternative way is to mark the current API as deprecated and add a 
new overloaded function with the new parameters. I am going to go this way. Any 
suggestions?
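
As a sketch of why optional defaults keep existing call sites compiling (simplified stand-in types, not the real `Task` class):
```scala
object TaskParamsSketch {
  // Simplified stand-in for Task: the new trailing parameters default to
  // None, so pre-existing call sites keep compiling unchanged.
  class TaskSketch(
      val stageId: Int,
      val partitionId: Int,
      val jobId: Option[Int] = None,      // new, optional
      val appId: Option[String] = None)   // new, optional

  def main(args: Array[String]): Unit = {
    val before = new TaskSketch(0, 0)                 // old call site, unchanged
    val after = new TaskSketch(0, 0, jobId = Some(1)) // new call site opts in
    println(s"${before.jobId} ${after.jobId}")
  }
}
```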





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @tgravescs Could you please review this again? I have updated the PR.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Retest this please.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
There is "0 failures (±0)" in the Test Result page. All tests passed. I'll 
re-trigger again.





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r78898595
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2418,6 +2418,21 @@ private[spark] object Utils extends Logging {
   sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
 }
   }
+
+  def setCallerContext(context: String): Boolean = {
+var succeed = false
+try {
+  val Builder = Utils.classForName("org.apache.hadoop.ipc.CallerContext$Builder")
+  val builderInst = Builder.getConstructor(classOf[String]).newInstance(context)
+  val ret = Builder.getMethod("build").invoke(builderInst)
--- End diff --

Yes, `hdfsContext` is more readable.





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r78898257
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2418,6 +2418,21 @@ private[spark] object Utils extends Logging {
   sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
 }
   }
+
+  def setCallerContext(context: String): Boolean = {
+var succeed = false
+try {
+  val Builder = Utils.classForName("org.apache.hadoop.ipc.CallerContext$Builder")
+  val builderInst = Builder.getConstructor(classOf[String]).newInstance(context)
+  val ret = Builder.getMethod("build").invoke(builderInst)
+  val callerContext = Utils.classForName("org.apache.hadoop.ipc.CallerContext")
--- End diff --

If we move `val callerContext = 
Utils.classForName("org.apache.hadoop.ipc.CallerContext")` out of the `try` block, 
Spark will throw an exception when it runs on Hadoop versions before 2.8.0. Also, 
moving that line to the top of the `try` block does not make any difference, since 
`Utils.classForName("org.apache.hadoop.ipc.CallerContext$Builder")` also needs 
to check whether `org.apache.hadoop.ipc.CallerContext` exists. I am not sure I 
got your point; could you please give more info about it? Thanks a lot.





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r78897068
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -79,6 +82,13 @@ private[spark] abstract class Task[T](
   metrics)
 TaskContext.setTaskContext(context)
 taskThread = Thread.currentThread()
+
+val callerContext =
+  s"Spark_AppId_${appId.getOrElse("")}_AppAttemptId_${appAttemptId.getOrElse("None")}" +
+  s"_JobId_${jobId.getOrElse("0")}_StageID_${stageId}_stageAttemptId_${stageAttemptId}" +
+  s"_taskID_${taskAttemptId}_attemptNumber_${attemptNumber}"
+Utils.setCallerContext(callerContext)
--- End diff --

Yes. Good catch!





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r78896928
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -79,6 +82,13 @@ private[spark] abstract class Task[T](
   metrics)
 TaskContext.setTaskContext(context)
 taskThread = Thread.currentThread()
+
+val callerContext =
+  s"Spark_AppId_${appId.getOrElse("")}_AppAttemptId_${appAttemptId.getOrElse("None")}" +
+  s"_JobId_${jobId.getOrElse("0")}_StageID_${stageId}_stageAttemptId_${stageAttemptId}" +
+  s"_taskID_${taskAttemptId}_attemptNumber_${attemptNumber}"
--- End diff --

I have updated the PR to make the string shorter.





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r78896972
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -79,6 +82,13 @@ private[spark] abstract class Task[T](
   metrics)
 TaskContext.setTaskContext(context)
 taskThread = Thread.currentThread()
+
+    val callerContext =
+      s"Spark_AppId_${appId.getOrElse("")}_AppAttemptId_${appAttemptId.getOrElse("None")}" +
--- End diff --

Yes. 





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r78896863
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -184,6 +184,9 @@ private[spark] class ApplicationMaster(
 try {
   val appAttemptId = client.getAttemptId()
 
+      var context = s"Spark_AppName_${System.getProperty("spark.app.name")}" +
--- End diff --

A CallerContext class has been added.
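
The merged class is in the PR itself; as a rough sketch of the shape such a 
helper can take (the names and the exact format below are illustrative 
assumptions, not the merged code):

```
// Illustrative sketch only. The helper assembles the context string once,
// from whichever fields the calling component happens to know.
case class SparkCallerContext(
    from: String,                         // e.g. "CLIENT", "APPMASTER", "TASK"
    appId: Option[String] = None,
    appAttemptId: Option[String] = None) {

  // The string that ends up as callerContext= in hdfs-audit.log.
  val context: String =
    s"SPARK_$from" +
      appId.map("_AppId_" + _).getOrElse("") +
      appAttemptId.map("_AttemptId_" + _).getOrElse("")
}

// Prints SPARK_CLIENT_AppId_application_1473908768790_0007 (hypothetical id).
println(SparkCallerContext("CLIENT", Some("application_1473908768790_0007")).context)
```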





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r78896816
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2418,6 +2418,21 @@ private[spark] object Utils extends Logging {
   sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
 }
   }
+
+  def setCallerContext(context: String): Boolean = {
+    var succeed = false
+    try {
+      val Builder = Utils.classForName("org.apache.hadoop.ipc.CallerContext$Builder")
+      val builderInst = Builder.getConstructor(classOf[String]).newInstance(context)
+      val ret = Builder.getMethod("build").invoke(builderInst)
+      val callerContext = Utils.classForName("org.apache.hadoop.ipc.CallerContext")
+      callerContext.getMethod("setCurrent", callerContext).invoke(null, ret)
+      succeed = true
+    } catch {
+      case NonFatal(e) => logDebug(s"$e", e)
--- End diff --

I have updated this to "case NonFatal(e) => logInfo("Fail to set Spark 
caller context", e)"





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r78896707
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -54,7 +54,10 @@ private[spark] abstract class Task[T](
 val partitionId: Int,
 // The default value is only used in tests.
 val metrics: TaskMetrics = TaskMetrics.registered,
-@transient var localProperties: Properties = new Properties) extends Serializable {
+@transient var localProperties: Properties = new Properties,
+val jobId: Option[Int] = None,
--- End diff --

Done.





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r78896692
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
@@ -51,8 +51,12 @@ private[spark] class ShuffleMapTask(
 partition: Partition,
 @transient private var locs: Seq[TaskLocation],
 metrics: TaskMetrics,
-localProperties: Properties)
-  extends Task[MapStatus](stageId, stageAttemptId, partition.index, metrics, localProperties)
+localProperties: Properties,
--- End diff --

Done.





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14659#discussion_r78896681
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala ---
@@ -51,8 +51,12 @@ private[spark] class ResultTask[T, U](
 locs: Seq[TaskLocation],
 val outputId: Int,
 localProperties: Properties,
-metrics: TaskMetrics)
-  extends Task[U](stageId, stageAttemptId, partition.index, metrics, localProperties)
+metrics: TaskMetrics,
+jobId: Option[Int] = None,
--- End diff --

Done.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-15 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @tgravescs Thank you very much for the review. I have updated the PR 
based on each of your comments, including adding a CallerContext class, 
updating the Java doc, and making the caller context string shorter. I ran 
manual tests against some Spark applications in Yarn client mode and Yarn 
cluster mode, and the Spark caller contexts were written into the HDFS 
`hdfs-audit.log` successfully.

The following is the screenshot of the audit log (SparkKMeans in yarn 
client mode):

https://cloud.githubusercontent.com/assets/8546874/18539563/1eb16748-7acd-11e6-840a-0e8bfabf5954.png

This is the caller context which was written into `hdfs-audit.log` by `Yarn 
Client`:
```
2016-09-14 22:28:59,341 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=getfileinfo  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_AppName_SparkKMeans_AppID_application_1473908768790_0007
```
The callerContext above is `SPARK_AppName_***_AppID_***`

These are the caller contexts which were written into `hdfs-audit.log` by 
`Task`:
```
2016-09-14 22:29:06,525 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_1_0
2016-09-14 22:29:06,526 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_0_0
2016-09-14 22:29:06,526 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_2_0
```
The callerContext above is 
`SPARK_AppID_***_JobID_***_StageID_***_(StageAttemptID)_TaskId_***_(TaskAttemptNumber)`.
 The static strings `jobAttemptID`, `stageAttemptID`, and `attemptNumber` of 
tasks have been deleted. (For `jobAttemptID`, please refer to the following 
records produced by SparkKMeans run in Yarn cluster mode.)

The records below were written into `hdfs-audit.log` when SparkKMeans ran 
in Yarn cluster mode:

```
2016-09-14 22:25:30,100 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=mkdirs  src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1473908768790_0006/container_1473908768790_0006_01_01/spark-warehouse  dst=null  perm=wyang:supergroup:rwxr-xr-x  proto=rpc  callerContext=SPARK_AppName_org.apache.spark.examples.SparkKMeans_AppID_application_1473908768790_0006_1
2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_0_0
2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_2_0
2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_1_0
```





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-13 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
@tgravescs Sure. Thanks.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-12 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @tgravescs Could you please review this PR? Thank you very much.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-31 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
@steveloughran Thank you very much. I have updated the PR based on your 
comments. Also, I have added a unit test.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-31 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
@srowen Thanks all the same.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-30 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @srowen Could you please review this PR again? 





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-25 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @srowen Could you please review this PR? Thanks.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-24 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Retest this please.





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-24 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
The only failure is 'basic functionality', but it passes locally. I'll 
re-trigger the tests.





[GitHub] spark issue #14769: [MINOR][SQL] Remove implemented functions from comments ...

2016-08-23 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14769
  
Yes. Are they ok now? @rxin 





[GitHub] spark issue #14768: [MINOR][BUILD] Fix Java CheckStyle Error

2016-08-23 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14768
  
@srowen Thanks for the review. 





[GitHub] spark issue #14768: [MINOR][BUILD] Fix Java CheckStyle Error

2016-08-23 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14768
  
For this piece of code:

![image](https://cloud.githubusercontent.com/assets/8546874/17901844/c4dae970-6919-11e6-8361-a73321a19f86.png)
I think lines in the `if` and `else` blocks should be at the same 
indentation, as logically they are at the same level. The line 
`((UnsafeInMemorySorter.SortedIterator)upstream).getCurrentPageNumber())` is a 
logical block, and I indent 8 spaces instead of 2 spaces to make the code 
more readable. I think it's much better than this:

![image](https://cloud.githubusercontent.com/assets/8546874/17902709/30db2074-691d-11e6-9644-c8f03a9819cf.png)

I referred to [Oracle's Java code 
conventions](http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-136091.html#248),
 but there is no exactly matching case there. @srowen Could you please give 
some directions? Thanks.





[GitHub] spark pull request #14768: [MINOR][BUILD] Fix Java CheckStyle Error

2016-08-23 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14768#discussion_r75908543
  
--- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
@@ -61,7 +61,8 @@ public static void main(String[] args) throws Exception {
   .load();
 
 // Split the lines into words
-    Dataset<String> words = lines.as(Encoders.STRING()).flatMap(new FlatMapFunction<String, String>() {
+    Dataset<String> words = lines.as(Encoders.STRING())
+      .flatMap(new FlatMapFunction<String, String>() {
   @Override
--- End diff --

Thanks.





[GitHub] spark issue #14768: [MINOR][BUILD] Fix Java CheckStyle Error

2016-08-23 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14768
  
Thanks, @jerryshao. I have updated the PR.





[GitHub] spark pull request #14769: [MINOR][SQL] Remove implemented functions from co...

2016-08-22 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/14769

[MINOR][SQL] Remove implemented functions from comments of 'HiveSessi…

## What changes were proposed in this pull request?
This PR removes implemented functions from comments of 
`HiveSessionCatalog.scala`: `java_method`, `posexplode`, `str_to_map`.

## How was this patch tested?
Manual.

…onCatalog.scala'

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark cleanComment

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14769.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14769


commit 8f3e25fe3fb88ba51c8c01013786041f58e80427
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-08-23T05:43:36Z

[MINOR][SQL] Remove implemented functions from comments of 
'HiveSessionCatalog.scala'







[GitHub] spark pull request #14768: [MINOR][BUILD] Fix Java CheckStyle Error

2016-08-22 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/14768

[MINOR][BUILD] Fix Java CheckStyle Error

## What changes were proposed in this pull request?
As Spark 2.0.1 will be released soon (mentioned on the Spark dev mailing 
list), besides the critical bug fixes, it's better to also fix the code 
style errors before the release.

Before:  
```
./dev/lint-java
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[525] (sizes) LineLength: Line is longer than 100 characters (found 119).
[ERROR] src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103).
```
After:
```
./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks passed.
```
## How was this patch tested?
Manual.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark fixjavastyle

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14768.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14768


commit a36989105086f60417f21341d8573b4d3c6bc7eb
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-08-23T04:42:04Z

[MINOR][BUILD] Fix Java CheckStyle Error







[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-22 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Thanks a lot for adding me as “contributor” in Hadoop :) @steveloughran 
@cnauroth





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-22 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @steveloughran Thanks a lot for the comments.

In the audit log, if users set some configuration in spark-defaults.conf, 
like `spark.eventLog.dir hdfs://localhost:9000/spark-history`, there will be 
a record like the one below in the audit log:
```
2016-08-21 23:47:50,834 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=setPermission  src=/spark-history/application_1471835208589_0013.lz4.inprogress  dst=null  perm=wyang:supergroup:rwxrwx---  proto=rpc
```
We can see the application id `application_1471835208589_0013` above. Apart 
from that case, there is no Spark application information, like the 
application name and application id (or, on YARN, appID + attemptID), in the 
audit log. So I think it is better to include the application name/id in the 
caller context. I have updated the PR to include that information.

In the commit 
[5ab2a41](https://github.com/apache/spark/pull/14659/commits/5ab2a41b93bfd73baf3798ba66fc7554b10b78e6),
 the application ID and attempt ID (only in yarn cluster mode) are included 
in the value of the caller context when the Yarn `Client` (for applications 
in Yarn client mode) or the `ApplicationMaster` (for applications in Yarn 
cluster mode) performs operations in HDFS. So in the audit log, you can see 
`callerContext=Spark_appName_**_appId_**_attemptID_**`:
_Applications in yarn cluster mode_
```
2016-08-21 22:55:44,568 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=getfileinfo  src=/lr_big.txt/_spark_metadata  dst=null  perm=null  proto=rpc  callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:44,573 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=getfileinfo  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:44,583 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=listStatus  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:44,589 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:46,163 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=mkdirs  src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1471835208589_0010/container_1471835208589_0010_01_01/spark-warehouse  dst=null  perm=wyang:supergroup:rwxr-xr-x  proto=rpc  callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
```
_Applications in yarn client mode_
```
2016-08-21 22:59:20,775 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=getfileinfo  src=/lr_big.txt/_spark_metadata  dst=null  perm=null  proto=rpc  callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
2016-08-21 22:59:20,778 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=getfileinfo  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
2016-08-21 22:59:20,785 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=listStatus  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
2016-08-21 22:59:20,791 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
```
In the commit 
[1512775](https://github.com/apache/spark/pull/14659/commits/1512775a3faddb9de9299662a6f3bfec3f6fe205),
 the application ID, name, and attempt ID (only in yarn cluster mode) are 
included in the value of the caller context when `Tasks` perform operations 
in HDFS. So in the audit log, you can see 
`callerContext=Spark_appName_**_appID_**_appAttemptID_**_JobId_**_StageID_**_stageAttemptId_**_taskID_**_attemptNumber_**`:
_Applications in Yarn cluster mode_
```
2016-08-21 22:55:50,977 INFO FSNamesystem.audit: allowed=true  ugi=wyang
```

[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-22 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @cnauroth Thank you very much for the review and suggestion. I have 
removed the spaces in the value of the caller context, and prepended "Spark" 
instead (refer to the commit 
[3b9a17e](https://github.com/apache/spark/pull/14659/commits/3b9a17e6dc9ef60a4c40f8aab2d0409c32b864e1)).
 





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-20 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @steveloughran Thank you very much for the comments. I have created a 
Hadoop JIRA, [HADOOP-13527](https://issues.apache.org/jira/browse/HADOOP-13527), 
and attached the patch; could you please review it? I am unable to assign the 
JIRA to myself, so could you please add me to the "contributor" role in 
Hadoop? Thanks again.





[GitHub] spark issue #14577: [SPARK-16986][WEB UI] Make 'Started' time, 'Completed' t...

2016-08-19 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14577
  
Hi, @srowen Thanks a lot for the comments. Sorry for the late reply. You 
are right. I will check how other pages format the date. 





[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-19 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
Hi, @srowen. Thank you so much for the review. Sorry for the test failure 
and the late update. The failures happened because ‘jobID’ was None or there 
was no ‘spark.app.name’ in sparkConf. I have updated the PR to set default 
values for ‘jobID’ and ‘spark.app.name’. When a real application runs on 
Spark, it will always have a ‘jobID’ and a ‘spark.app.name’.

What's the use case for this?
When users run Spark applications on Yarn on HDFS, Spark’s caller contexts 
will be written into hdfs-audit.log. The Spark caller contexts are 
JobID_stageID_stageAttemptId_taskID_attemptNumber and the applications’ 
name.

The caller context can help users better diagnose and understand how 
specific applications impact parts of the Hadoop system and what potential 
problems they may be creating (e.g. overloading the NN). As HDFS-9184 
mentions, for a given HDFS operation, it's very helpful to track which 
upper-level job issued it.





[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-15 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/14659

[SPARK-16757] Set up Spark caller context to HDFS

## What changes were proposed in this pull request?

1. Pass `jobId` to Task.
2. Invoke Hadoop APIs. 

A new function `setCallerContext` is added in `Utils`. The `setCallerContext` 
function invokes APIs of `org.apache.hadoop.ipc.CallerContext` to set up 
Spark caller contexts, which will be written into `hdfs-audit.log`.

For applications in Yarn client mode, `org.apache.hadoop.ipc.CallerContext` 
is called in `Task` and the Yarn `Client`. For applications in Yarn cluster 
mode, `org.apache.hadoop.ipc.CallerContext` is called in `Task` and the 
`ApplicationMaster`.

The Spark caller contexts written into `hdfs-audit.log` are the application's 
name `{spark.app.name}` and `JobID_stageID_stageAttemptId_taskID_attemptNumber`.
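
As a sketch of those call sites (simplified; the real Spark signatures 
differ, and `setCallerContext` is stubbed here so the snippet stands alone):

```
// Stub for the reflective Utils helper described above.
def setCallerContext(context: String): Boolean = true

// In the Yarn Client / ApplicationMaster: one app-level context per run.
setCallerContext("SparkKMeans running on Spark")  // {spark.app.name} + suffix

// In Task.run(): one task-level context per task attempt (ids hypothetical).
setCallerContext("JobId_0_StageID_0_stageAttemptId_0_taskID_0_attemptNumber_0")
```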

## How was this patch tested?
Manual Tests against some Spark applications in Yarn client mode and Yarn 
cluster mode. Need to check if spark caller contexts are written into HDFS 
hdfs-audit.log successfully.

For example, run SparkKmeans in Yarn client mode: 
`./bin/spark-submit  --master yarn --deploy-mode client --class 
org.apache.spark.examples.SparkKMeans 
examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar 
hdfs://localhost:9000/lr_big.txt 2 5`

Before:
There will be no Spark caller context in records of `hdfs-audit.log`.

After:
Spark caller contexts will be in records of `hdfs-audit.log`.
(_Note: the Spark caller context below appears because the Hadoop caller 
context API was invoked in the Yarn Client_)
`2016-07-21 13:52:30,802 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=getfileinfo  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=SparkKMeans running on Spark`
(_Note: the Spark caller context below appears because the Hadoop caller 
context API was invoked in Task_)
`2016-07-21 13:52:35,584 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=JobId_0_StageID_0_stageAttemptId_0_taskID_0_attemptNumber_0`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark callercontextSubmit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14659


commit ec6833d32ef14950b2d81790bc908992f6288815
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-08-16T04:11:41Z

[SPARK-16757] Set up Spark caller context to HDFS







[GitHub] spark issue #14556: [SPARK-16966][Core] Make App Name to the valid name inst...

2016-08-13 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14556
  
@srowen Thanks for the new PR and the review.





[GitHub] spark issue #14577: [SPARK-16986][WEB UI] Make 'Started' time, 'Completed' t...

2016-08-10 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14577
  
Hi, @rxin. Thanks for the quick feedback. This PR is to remove the time 
inconsistency between webpages. Right now, the times on the history page are 
inconsistent with the times on other pages, like the Spark job pages, which 
confuses users.






[GitHub] spark pull request #14577: [SPARK-16986][WEB UI] Make 'Started' time, 'Compl...

2016-08-09 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/14577

[SPARK-16986][WEB UI] Make 'Started' time, 'Completed' time and 'Last…

## What changes were proposed in this pull request?
In historypage.js, format the 'Started' time, 'Completed' time, and 'Last 
Updated' time to the user's local time.


## How was this patch tested?
Tested manually.

… Updated' time in history server UI to the user local time

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14577.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14577


commit 4218f529c3bd31e6a3bd56852ab607a81b41db35
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-08-10T05:53:13Z

[SPARK-16986][WEB UI] Make 'Started' time, 'Completed' time and 'Last 
Updated' time in history server UI to the user local time







[GitHub] spark pull request #14556: [SPARK-16966][Core] Make App Name to the valid na...

2016-08-09 Thread Sherry302
Github user Sherry302 closed the pull request at:

https://github.com/apache/spark/pull/14556





[GitHub] spark pull request #14556: [SPARK-16966][Core] Make App Name to the valid na...

2016-08-09 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/14556

[SPARK-16966][Core] Make App Name to the valid name instead of a rand…

## What changes were proposed in this pull request?
In SparkSession, before setting "spark.app.name" to 
`java.util.UUID.randomUUID().toString`, `sparkConf.contains("spark.app.name")` 
should be checked instead of `options.contains("spark.app.name")`.
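A minimal sketch of the corrected check (not the exact merged code; the 
builder plumbing is elided):

```
import java.util.UUID
import org.apache.spark.SparkConf

// In spark-submit, sparkConf already carries a name set via --name.
val sparkConf = new SparkConf()
val options = Map.empty[String, String]  // builder options from .appName(), etc.

// Apply an explicit .appName() first, then fall back to a random UUID only
// when SparkConf itself has no app name -- so --name is respected.
options.get("spark.app.name").foreach(sparkConf.setAppName)
if (!sparkConf.contains("spark.app.name")) {
  sparkConf.setAppName(UUID.randomUUID().toString)
}
```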

## How was this patch tested?
Manual.
E.g.:
```
./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 --num-executors 1 --master yarn --deploy-mode client --class org.apache.spark.examples.SparkKMeans examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar
```
The application "org.apache.spark.examples.SparkKMeans" above did not invoke 
".appName()".

Before this commit, in the history server UI, the App Name was a randomUUID, 
70c06dc5-1b99-4b4a-a826-ea27497e977b. Now, with this commit, the App Name is 
the valid name "myApplicationTest".

…omUUID when 'spark.app.name' exists

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14556.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14556


commit a21937be7de24a353a3e8c9bbe7471b31a1f4719
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-08-09T06:42:39Z

[SPARK-16966][Core] Make App Name to the valid name instead of a randomUUID 
when 'spark.app.name' exists







[GitHub] spark pull request #14532: SPARK-16945: Fix Java Lint errors

2016-08-07 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/14532

SPARK-16945: Fix Java Lint errors

## What changes were proposed in this pull request?
This PR fixes the following minor Java linter errors:
```
[ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10] (modifier) RedundantModifier: Redundant 'final' modifier.
```

## How was this patch tested?
Manual test:
```
dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks passed.
```





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14532.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14532


commit 736cee23f3e795ca122009f67c344f4fe7c7fbc6
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-08-08T05:12:36Z

SPARK-16945: Fix Java Lint errors







[GitHub] spark pull request #14312: [SPARK-15857]Add caller context in Spark: invoke ...

2016-07-28 Thread Sherry302
Github user Sherry302 closed the pull request at:

https://github.com/apache/spark/pull/14312





[GitHub] spark pull request #14312: [SPARK-15857]Add caller context in Spark: invoke ...

2016-07-27 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14312#discussion_r72536143
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -66,6 +66,9 @@ private[spark] class Client(
   import Client._
   import YarnSparkHadoopUtil._
 
+  val context: String = s"${sparkConf.get("spark.app.name")} running on Spark"
+  Utils.setCallerContext(context)
--- End diff --

If Spark applications run in Yarn cluster mode, a record with the caller 
context below will be in the HDFS log:
```
2016-07-21 14:32:33,404 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=getfileinfo  src=/spark-history  dst=null  perm=null  proto=rpc  callerContext=org.apache.spark.examples.SparkKMeans running on Spark
```





[GitHub] spark issue #14312: [SPARK-15857]Add caller context in Spark: invoke YARN/HD...

2016-07-27 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14312
  
Thanks for the feedback, Jerry. I am going to update the patch.





[GitHub] spark pull request #14312: [SPARK-15857]Add caller context in Spark: invoke ...

2016-07-21 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/14312

[SPARK-15857]Add caller context in Spark: invoke YARN/HDFS API to set…

## What changes were proposed in this pull request?
1. Pass 'jobId' to Task.
2. Add a new function 'setCallerContext' in Utils. The 'setCallerContext' 
function will call APIs of 'org.apache.hadoop.ipc.CallerContext' to set up 
Spark caller contexts, which will be written into the HDFS hdfs-audit.log or 
the Yarn resource manager log.
3. The 'setCallerContext' function will be called in the Yarn Client, 
ApplicationMaster, and Task classes.

The Spark caller context written into the HDFS log will be 
"JobID_stageID_stageAttemptId_taskID_attemptNumber on Spark", and the Spark 
caller context written into the Yarn log will be "{spark.app.name} running on 
Spark".

## How was this patch tested?
Manual tests against some Spark applications in Yarn client mode and 
cluster mode, checking whether the Spark caller contexts were written into 
the HDFS hdfs-audit.log and the Yarn resource manager log successfully.

For example, run SparkKmeans on Spark:
In Yarn resource manager log, there will be a record with the spark caller 
context.
...
2016-07-21 13:36:26,318 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=wyang  IP=127.0.0.1  OPERATION=Submit Application Request  TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1469125587135_0004  CALLERCONTEXT=SparkKMeans running on Spark
...

In HDFS hdfs-audit.log, there will be records with spark caller contexts.
...
2016-07-21 13:38:30,799 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=getfileinfo  src=/lr_big.txt/_spark_metadata  dst=null  perm=null  proto=rpc  callerContext=SparkKMeans running on Spark
...
2016-07-21 13:39:35,584 INFO FSNamesystem.audit: allowed=true  ugi=wyang (auth:SIMPLE)  ip=/127.0.0.1  cmd=open  src=/lr_big.txt  dst=null  perm=null  proto=rpc  callerContext=JobId_0_StageID_0_stageAttemptId_0_taskID_1_attemptNumber_0 on Spark
...

If the Hadoop version on which Spark runs does not have the CallerContext 
APIs, there will be no Spark caller context information in those logs.

… up caller context

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14312.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14312


commit 38c4f58dbf30d541260ee1b0381993a9bec393f8
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-07-22T01:21:03Z

[SPARK-15857]Add caller context in Spark: invoke YARN/HDFS API to set up 
caller context







[GitHub] spark issue #14163: [SPARK-15923][YARN] Spark Application rest api returns '...

2016-07-19 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14163
  
Updated the doc based on the feedback.





[GitHub] spark pull request #14163: [SPARK-15923][YARN] Spark Application rest api re...

2016-07-19 Thread Sherry302
Github user Sherry302 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14163#discussion_r71425048
  
--- Diff: docs/monitoring.md ---
@@ -224,10 +224,12 @@ both running applications, and in the history server. 
 The endpoints are mounted
 for the history server, they would typically be accessible at 
`http://:18080/api/v1`, and
 for a running application, at `http://localhost:4040/api/v1`.
 
-In the API, an application is referenced by its application ID, `[app-id]`.
-When running on YARN, each application may have multiple attempts; each 
identified by their `[attempt-id]`.
-In the API listed below, `[app-id]` will actually be 
`[base-app-id]/[attempt-id]`,
-where `[base-app-id]` is the YARN application ID.
+In the API, an application is referenced by its application ID, 
`[app-id]`.
+Spark on YARN supports multiple application attempts in cluster mode but 
not in client mode.
--- End diff --

Thanks for the feedback. 





[GitHub] spark pull request #14163: [SPARK-15923][YARN] Spark Application rest api re...

2016-07-12 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/14163

[SPARK-15923][YARN] Spark Application rest api returns 'no such app: …

## What changes were proposed in this pull request?
Update monitoring.md.

…'

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14163.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14163


commit aa2129e1480cd863c42872c82e08cb8eef2d992b
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-07-12T22:13:21Z

[SPARK-15923][YARN] Spark Application rest api returns 'no such app: 
'







[GitHub] spark pull request #14024: [SPARK-15923][YARN] Spark Application rest api re...

2016-07-04 Thread Sherry302
Github user Sherry302 closed the pull request at:

https://github.com/apache/spark/pull/14024





[GitHub] spark pull request #14024: [SPARK-15923][YARN] Spark Application rest api re...

2016-07-01 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/14024

[SPARK-15923][YARN] Spark Application rest api returns 'no such app: …

## What changes were proposed in this pull request?
1. Updated the monitoring.md doc.
2. Updated YarnSchedulerBackend.scala so that applications running in YARN cluster mode get the attempt ID "1" by default (a sketch follows below).
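
A minimal, illustrative sketch of the idea in item 2, assuming the attempt ID arrives as an Option[String]; the names here are made up for illustration, not the actual YarnSchedulerBackend members:

object AttemptIdDefault {
  // In cluster mode a missing attempt ID falls back to "1";
  // in client mode it stays absent.
  def effectiveAttemptId(reported: Option[String], isClusterMode: Boolean): Option[String] =
    reported.orElse(if (isClusterMode) Some("1") else None)

  def main(args: Array[String]): Unit = {
    println(effectiveAttemptId(None, isClusterMode = true))      // Some(1)
    println(effectiveAttemptId(None, isClusterMode = false))     // None
    println(effectiveAttemptId(Some("2"), isClusterMode = true)) // Some(2)
  }
}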

## How was this patch tested?
Manual tests passed.

…'

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14024.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14024


commit a15dee1aee3afa53a455c4b0aba5e3388a0129d3
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-07-02T01:45:38Z

[SPARK-15923][YARN] Spark Application rest api returns 'no such app: <appId>'







[GitHub] spark issue #13448: [SPARK-15707][SQL] Make Code Neat - Use map instead of i...

2016-06-04 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/13448
  
Merged to master/2.0





[GitHub] spark pull request #13448: [SPARK-15707][SQL] Make Code Neat - Use map inste...

2016-06-01 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/13448

[SPARK-15707][SQL] Make Code Neat - Use map instead of if check.

## What changes were proposed in this pull request?
In the `forType` function of object `RandomDataGenerator`, the following code:

if (maybeSqlTypeGenerator.isDefined) {
  // ... build `generator` from maybeSqlTypeGenerator.get ...
  Some(generator)
} else {
  None
}

will be changed to use `maybeSqlTypeGenerator.map` instead (see the sketch below).
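
A self-contained sketch of the before/after; the generator type and the wrapping body are placeholders, not the actual RandomDataGenerator internals:

object OptionMapRefactor {
  def main(args: Array[String]): Unit = {
    val maybeSqlTypeGenerator: Option[() => Int] = Some(() => 42)

    // Before: explicit isDefined/get around the wrapping logic.
    val before: Option[() => String] =
      if (maybeSqlTypeGenerator.isDefined) {
        val gen = maybeSqlTypeGenerator.get
        Some(() => gen().toString)
      } else {
        None
      }

    // After: Option.map expresses the same transformation in one step.
    val after: Option[() => String] =
      maybeSqlTypeGenerator.map(gen => () => gen().toString)

    println(before.map(_.apply())) // Some(42)
    println(after.map(_.apply()))  // Some(42)
  }
}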

## How was this patch tested?
All of the current unit tests passed.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13448.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13448


commit 010110ebe18b2de291f03c03ebaa9183ed7b3987
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-06-01T18:12:59Z

[SPARK-15707][SQL] Make Code Neat - Use map instead of if check.



