[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/22609
  
I will close the PR and document in our tests to use spark.ui.retainedTasks 
= 1 (or some very low value) : which will, in effect, be equivalent to this PR 
- and loose information anyway; but will be specific to the application which 
choose to do it (and which does not care about the api).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/22609
  


> I tried that a lot while writing this code, but the disk store is just 
too slow

That is unfortunate ... I was actually hoping that would be a good 
compromise if expanded api needs to be preserved.

>  if UI is disabled, any task related metrics were reported as zero:

I was incorrect in this regard - they were not zero, but 
JobProgressListener had stage/job counters which maintained this.
This was limited to job and stage level though - keeping the actual 
overhead fairly minimal and honoring 
spark.ui.retainedStages/spark.ui.retainedJobs.


Given this, current PR would be a functionality regression - since we end 
up loosing information we would otherwise expose.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22609
  
Task events still might need to be processed - even for live applications 
with the UI disabled. See the API that has been mentioned before 
(`SparkStatusTracker`) for why. Processing task events is different from 
keeping information about individual tasks, though.

I agree it's a regression - but, at the same time, if you set the 
"retainedTasks" limit to a low value it should at least help a bit.

> Any implications of making it disk backed ?

I tried that a lot while writing this code, but the disk store is just too 
slow for when you have bursts of lots of events. Even with a lot of 
improvements I made to this code, I still got dropped events with the disk 
store, or high memory usage if I tried to do fancy things like using a separate 
writer thread to unblock the listener bus...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/22609
  
@vanzin any reason why liveStore is hardcoded to be in-memory ? Any 
implications of making it disk backed ? That might be another option to unblock 
- requires a config change to existing applications; but way better than OOM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/22609
  
@vanzin , @gengliangwang To clarify earlier behavior - if UI is disabled, 
any task related metrics were reported as zero: granularity of update was at 
the stage level.
UI is typically disabled when application does not want additional 
overheads associated with it (users leverage history server to see the 
application state) : particularly in multi-tenant settings (like thrift server) 
or very large applications (processing PB's of data).
If UI is enabled (default) or if application is not `live` (history 
server), task events would be processed.

Unfortunately, application + config, which used to work before 2.3 are now 
throwing OOM (typically, the long running tests are run for 7+ days to find any 
kerberos related issues - here a LR stress test failed within 28 hours) - and 
large part of the heap was occupied by TaskDataWrapper; making this a 
regression.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22609
  
@gengliangwang is right; there are still things that need to be done when 
the UI is disabled; job, stage and executor info are all still exposed even if 
the UI is disabled, and the task even callbacks update those.

It might be as easy as not adding the task to `liveTasks` in `onTaskStart`, 
but haven't really looked in detail.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22609
  
Hi @mridulm ,

it seems that the changes will affect the metrics in `SparkStatusTracker`, 
e.g.  number of active tasks in `getExecutorInfos` or `getStageInfo` .

Maybe we should maintain number of active/total/complete/failed tasks as 
well.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22609
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22609
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96846/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22609
  
**[Test build #96846 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96846/testReport)**
 for PR 22609 at commit 
[`c2f6b4a`](https://github.com/apache/spark/commit/c2f6b4a50ef3693292f600a2d4d7743ea870b96e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/22609
  
This is a regression introduced in 2.3; unfortunately we did not notice it
until now.


On Tue, Oct 2, 2018 at 4:56 AM Shahid  wrote:

> Hi @mridulm  , Can't we limit the task
> information by setting 'spark.ui.retainedTasks' lesser, to avoid OOM?
> correct me if I am wrong.
>
> —
> You are receiving this because you were mentioned.
>
>
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22609
  
Hi @mridulm , Can't we limit the task information by setting 
'spark.ui.retainedTasks' lesser, to avoid OOM? correct me if I am wrong.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22609
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22609
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3619/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22609
  
**[Test build #96846 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96846/testReport)**
 for PR 22609 at commit 
[`c2f6b4a`](https://github.com/apache/spark/commit/c2f6b4a50ef3693292f600a2d4d7743ea870b96e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/22609
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22609: [SPARK-25594] [Core] Avoid maintaining task information ...

2018-10-02 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/22609
  
+CC @vanzin, @tgravescs 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org