[GitHub] spark issue #22926: [SPARK-25917][Spark UI] memoryMetrics should be Json ign...

2018-11-20 Thread jianjianjiao
Github user jianjianjiao commented on the issue:

https://github.com/apache/spark/pull/22926
  
Thanks @vanzin. I was using 2.3, and with your comment I found a check-in from 
about a month ago that already handles this case. I will close this PR; sorry 
for the misreport. I will keep in mind to test trunk before reporting next 
time.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22926: [SPARK-25917][Spark UI] memoryMetrics should be J...

2018-11-20 Thread jianjianjiao
Github user jianjianjiao closed the pull request at:

https://github.com/apache/spark/pull/22926


---




[GitHub] spark issue #22926: [SPARK-25917][Spark UI] memoryMetrics should be Json ign...

2018-11-06 Thread jianjianjiao
Github user jianjianjiao commented on the issue:

https://github.com/apache/spark/pull/22926
  
@AmplabJenkins  Could you please find someone to review this? I believe 
this is a bug in Spark UI. Thanks.


---




[GitHub] spark issue #22926: [SPARK-25917][Spark UI] memoryMetrics should be Json ign...

2018-11-05 Thread jianjianjiao
Github user jianjianjiao commented on the issue:

https://github.com/apache/spark/pull/22926
  
@mccheah @smurakozi @vanzin Could you please take a look at this PR? Thanks.



---




[GitHub] spark pull request #22926: [SPARK-25917][Spark UI] memoryMetrics should be J...

2018-11-01 Thread jianjianjiao
GitHub user jianjianjiao opened a pull request:

https://github.com/apache/spark/pull/22926

[SPARK-25917][Spark UI] memoryMetrics should be Json ignored when being none

## What changes were proposed in this pull request?

The Spark UI's executors page loads forever when memoryMetrics is None. The fix 
is to JSON-ignore memoryMetrics when it is None.

## How was this patch tested?

Before fix: (loads forever)

![image](https://user-images.githubusercontent.com/1785565/47875681-64dfe480-ddd4-11e8-8d15-5ed1457bc24f.png)

After fix:


![image](https://user-images.githubusercontent.com/1785565/47875691-6b6e5c00-ddd4-11e8-9895-db8dd9730ee1.png)


That is because of this code in executorspage.js, line 268:

    exec.memoryMetrics = exec.hasOwnProperty('memoryMetrics') ?
        exec.memoryMetrics : memoryMetrics;
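The failure mode can be illustrated with a small, language-agnostic sketch (a 
Python stand-in, not Spark's actual serializer or the JS above; the helper 
names are hypothetical, only the memoryMetrics field matches the report). When 
the server emits `"memoryMetrics": null`, the key exists, so the client-side 
hasOwnProperty fallback never fires; dropping None/null fields at 
serialization time lets the fallback apply:

```python
import json

def serialize_executor(executor):
    """Serialize an executor summary, dropping keys whose value is None --
    the "JSON-ignore when None" behavior the fix proposes."""
    return json.dumps({k: v for k, v in executor.items() if v is not None})

def merge_metrics(exec_json, default_metrics):
    """Client-side merge mirroring executorspage.js line 268: fall back to
    the default only when the key is genuinely absent from the payload."""
    summary = json.loads(exec_json)
    if "memoryMetrics" not in summary:
        summary["memoryMetrics"] = default_metrics
    return summary

# With None dropped at serialization time, the client fallback applies:
fixed = merge_metrics(serialize_executor({"id": "1", "memoryMetrics": None}),
                      {"usedOnHeap": 0})
```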


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jianjianjiao/spark 
users/rotang/FixExecutorsPageLoadingForever

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22926.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22926


commit 5826424c931fbba81cc246c3b1afe3f64626e051
Author: Rong Tang 
Date:   2018-11-01T19:37:45Z

mmemoryMetrics should not json ignored when being none




---




[GitHub] spark issue #22520: [SPARK-25509][Core]Windows doesn't support POSIX permiss...

2018-09-26 Thread jianjianjiao
Github user jianjianjiao commented on the issue:

https://github.com/apache/spark/pull/22520
  
@srowen That makes sense; I will be more patient next time. ^_^


---




[GitHub] spark issue #22520: [SPARK-25509][Core]Windows doesn't support POSIX permiss...

2018-09-25 Thread jianjianjiao
Github user jianjianjiao commented on the issue:

https://github.com/apache/spark/pull/22520
  
@srowen @vanzin The tests passed. What should I do now to get this approved 
and merged?


---




[GitHub] spark issue #22520: [SPARK-25509][Core]Windows doesn't support POSIX permiss...

2018-09-25 Thread jianjianjiao
Github user jianjianjiao commented on the issue:

https://github.com/apache/spark/pull/22520
  
@srowen Thanks for the confirmation. I have sent out a new iteration. Could 
you please authorize testing on it?


---




[GitHub] spark issue #22520: [SPARK-25509][Core]Windows doesn't support POSIX permiss...

2018-09-25 Thread jianjianjiao
Github user jianjianjiao commented on the issue:

https://github.com/apache/spark/pull/22520
  
@srowen Thanks for reviewing this PR and for your comments.
1. I have fixed the coding style, thanks.
2. These are the only two places that use PosixFilePermissions for file 
operations. In fact, the Windows approach (first create the directories, then 
chmod 700) may be OK for both Windows and other OSes.
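The create-then-chmod idea can be sketched as follows (an illustrative Python 
stand-in, not the actual Scala change in the PR; the function and directory 
names are hypothetical):

```python
import os
import stat
import tempfile

def make_private_dir(path):
    """Create a directory first, then restrict it to the owner (chmod 700).
    This avoids passing POSIX permissions at creation time, which is what
    fails on Windows; os.chmod degrades gracefully there instead."""
    os.makedirs(path, exist_ok=True)
    os.chmod(path, stat.S_IRWXU)  # 0o700: owner read/write/execute only
    return path

private = make_private_dir(os.path.join(tempfile.mkdtemp(), "shs-cache"))
```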


---




[GitHub] spark issue #22444: [SPARK-25409][Core]Speed up Spark History loading via in...

2018-09-21 Thread jianjianjiao
Github user jianjianjiao commented on the issue:

https://github.com/apache/spark/pull/22444
  
@squito Yes, you are correct. I was trying to get applications that run during 
a scan picked up more quickly. It turns out SPARK-6951 has done a great job of 
achieving this.




---




[GitHub] spark issue #22444: [SPARK-25409][Core]Speed up Spark History loading via in...

2018-09-21 Thread jianjianjiao
Github user jianjianjiao commented on the issue:

https://github.com/apache/spark/pull/22444
  
@vanzin Thanks very much for your suggestions. Loading event logs became much 
faster: from more than 2.5 hours down to 19 minutes for 17K event logs, some 
of them larger than 10 GB.

1. To enable SHS V2 to cache things on disk: we are running on Windows, where 
there is a small "posix.permissions not supported on Windows" issue, so I 
created a new PR at https://github.com/apache/spark/pull/22520 ; could you 
please take a look? This change doesn't speed up loading very much, but it 
improves other parts.

2. I tried 2.4, and also tried applying SPARK-6951 to 2.3; this is the 
critical part for improving the speed.

I will close this PR, as it is no longer needed. Thanks again.



---




[GitHub] spark pull request #22444: [SPARK-25409][Core]Speed up Spark History loading...

2018-09-21 Thread jianjianjiao
Github user jianjianjiao closed the pull request at:

https://github.com/apache/spark/pull/22444


---




[GitHub] spark pull request #22520: [SPARK-25509][Core]Windows doesn't support POSIX ...

2018-09-21 Thread jianjianjiao
GitHub user jianjianjiao opened a pull request:

https://github.com/apache/spark/pull/22520

[SPARK-25509][Core]Windows doesn't support POSIX permissions

## What changes were proposed in this pull request?

SHS V2 cannot be enabled on Windows, because Windows doesn't support POSIX 
permissions.

## How was this patch tested?

Without this fix, this test case fails on Windows:
org.apache.spark.deploy.history.HistoryServerDiskManagerSuite
test("leasing space")


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jianjianjiao/spark FixWindowsPermssionsIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22520.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22520


commit fe74feeef42fc6fb6fb5f5e869e23b349f3a1697
Author: Rong Tang 
Date:   2018-09-21T17:07:44Z

Windows doesn't support Posix permissions




---




[GitHub] spark pull request #22444: [SPARK-25409][Core]Speed up Spark History loading...

2018-09-17 Thread jianjianjiao
Github user jianjianjiao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22444#discussion_r218292773
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -465,20 +475,31 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
 }
   } catch {
 case _: NoSuchElementException =>
-  // If the file is currently not being tracked by the SHS, 
add an entry for it and try
-  // to parse it. This will allow the cleaner code to detect 
the file as stale later on
-  // if it was not possible to parse it.
-  listing.write(LogInfo(entry.getPath().toString(), 
newLastScanTime, None, None,
-entry.getLen()))
--- End diff --

Hi, @squito  thanks for looking into this PR.

When the Spark history server starts, it scans the event-log folder using 
multiple threads, and it does not start the next scan before the first one 
finishes. That is the problem: in our cluster there are about 20K event-log 
files (often bigger than 1 GB), including about 1K .inprogress files, and the 
first scan takes about two and a half hours. During those 2.5 hours, if a user 
submits a Spark application and it finishes, the user cannot find it in the 
Spark history UI and has to wait for the next scan.

That is why I added a limit on how much to scan each time, e.g. 3K. No matter 
how many log files are in the event-logs folder, the first scan handles only 
the first 3K before the next scan starts. Suppose that during the first scan, 
5 new applications appear and another 10 already-scanned applications are 
updated; the second scan will then handle those 15 applications plus another 
2985 files (from 3001 to 5985) in the event folder.

checkForLogs scans the event-log folders and handles only files that were 
updated or not yet handled.
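The capped, incremental scan described above can be modeled with a short 
sketch (a simplified stand-in for checkForLogs, not the actual 
FsHistoryProvider code; `scan_limit` and the `processed` map are hypothetical 
names):

```python
def incremental_scan(log_files, processed, scan_limit=3000):
    """One scan pass over (path, mtime) pairs: handle files that are new or
    updated since the last pass, capped at scan_limit, so a huge backlog
    cannot starve recently finished applications for hours."""
    batch = []
    for path, mtime in log_files:
        if len(batch) >= scan_limit:
            break
        if processed.get(path, -1) < mtime:
            batch.append((path, mtime))
    for path, mtime in batch:
        processed[path] = mtime  # remember what this pass handled
    return batch

processed = {}
files = [("log%d" % i, 1) for i in range(5)]
first = incremental_scan(files, processed, scan_limit=2)  # picks log0, log1
```

Successive passes then work through the remaining backlog, while any file 
whose mtime advances between passes is picked up again.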



---




[GitHub] spark issue #22444: [SPARK-25409][Core]Speed up Spark History loading via in...

2018-09-17 Thread jianjianjiao
Github user jianjianjiao commented on the issue:

https://github.com/apache/spark/pull/22444
  
Adding @vanzin, @steveloughran, and @squito, who made changes to the related 
code.


---




[GitHub] spark pull request #22444: implement incremental loading and add a flag to l...

2018-09-17 Thread jianjianjiao
GitHub user jianjianjiao opened a pull request:

https://github.com/apache/spark/pull/22444

implement incremental loading and add a flag to load incomplete or not

## What changes were proposed in this pull request?

1.  Instead of loading all event logs on every pass, load only a certain 
number of them. If there are tens of thousands of event logs, loading all of 
them takes a long time.
2.  When Spark runs on YARN, job information can be obtained from the YARN 
application master, so there is no need to load incomplete applications; add a 
flag to skip loading them.
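Point 2 can be sketched like this (an illustrative stand-in, not the PR's 
Scala code; the `load_incomplete` flag name is hypothetical, while 
`.inprogress` is the suffix Spark uses for unfinished event logs):

```python
def select_logs(paths, load_incomplete=False):
    """Filter event-log paths: Spark writes unfinished application logs
    with an '.inprogress' suffix; skip them unless the (hypothetical)
    load_incomplete flag is set."""
    return [p for p in paths
            if load_incomplete or not p.endswith(".inprogress")]

logs = ["app-1", "app-2.inprogress", "app-3"]
```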

## How was this patch tested?
This was tested manually in our production cluster.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jianjianjiao/spark speedUpSparkHistoryLoading

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22444.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22444


commit 1190ffcb109025bd62c909059b0cf16e6a748de9
Author: Rong Tang 
Date:   2018-09-17T22:00:23Z

implement incremental loading and add a flag to load incomplete or not




---
