[jira] [Comment Edited] (SPARK-18085) Better History Server scalability for many / large applications

Dmitry Buzolin (JIRA) Tue, 06 Dec 2016 08:00:03 -0800

    [ https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725880#comment-15725880 ]


Dmitry Buzolin edited comment on SPARK-18085 at 12/6/16 3:58 PM:
-----------------------------------------------------------------

Spark log size depends directly on a few things:

- the underlying schema-less data format in use: JSON
- the current logging implementation, where the log size depends directly 
on the number of tasks

Since the Spark History Server (SHS) keeps this data in memory, I don't see how 
these issues are orthogonal to the memory issues in SHS; in my opinion they are 
causing them. JSON is great as a data interchange or configuration format, and 
it's good for small payloads, but using it for logging? This is the first time 
I have seen it. I understand you may not change this, but it is worth keeping 
in mind and thinking about.
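
To make the two points above concrete, here is a rough sketch (not taken from 
Spark itself). It assumes a simplified, hypothetical per-task record; the field 
names, the timestamps, and the fixed binary layout are illustrative assumptions, 
not Spark's actual event schema. It compares the size of a line-per-event JSON 
log, where every line repeats all field names, with a fixed-schema binary 
encoding of the same fields, and shows that both grow with the number of tasks.

```python
import json
import struct


def task_event(task_id: int) -> dict:
    # Hypothetical, simplified per-task record (NOT Spark's real event schema).
    return {
        "Event": "TaskEnd",
        "Stage ID": 0,
        "Task ID": task_id,
        "Launch Time": 1481040000000 + task_id,  # arbitrary example timestamps (ms)
        "Finish Time": 1481040000500 + task_id,
        "Failed": False,
    }


def json_log_size(num_tasks: int) -> int:
    # One JSON object per line; each line carries every field name again.
    return sum(
        len(json.dumps(task_event(i)).encode("utf-8")) + 1  # +1 for the newline
        for i in range(num_tasks)
    )


# Assumed fixed layout for the same fields, with names implied by the schema:
# stage id (int32), task id (int64), launch (int64), finish (int64), failed (bool).
RECORD = struct.Struct("<iqqq?")


def binary_log_size(num_tasks: int) -> int:
    # Every record has the same size, so the total is a simple product.
    return RECORD.size * num_tasks


if __name__ == "__main__":
    for n in (1_000, 10_000):
        print(n, json_log_size(n), binary_log_size(n))
```

In this sketch both encodings grow with the task count, but the JSON lines are 
several times larger per task because the field names are stored with every 
record. Real event logs carry many more fields per task, so the exact ratio 
will differ.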

Thank you.


was (Author: dbuzolin):
Spark log size depends directly on a few things:

- the underlying schema-less data format in use: JSON
- the current logging implementation, where the log size depends directly 
on the number of tasks

Since the Spark History Server (SHS) keeps this data in memory, I don't see how 
these issues are orthogonal to the memory issues in SHS; in my opinion they are 
causing them. JSON is great as a data interchange or configuration format, and 
it's good for small payloads, but using it for logging? Honestly, this is the 
first time I have seen it in my 20 years in IT. I understand you may not change 
this, but it is worth keeping in mind and thinking about.

Thank you.

> Better History Server scalability for many / large applications
> ---------------------------------------------------------------
>
>                 Key: SPARK-18085
>                 URL: https://issues.apache.org/jira/browse/SPARK-18085
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Spark Core, Web UI
>    Affects Versions: 2.0.0
>            Reporter: Marcelo Vanzin
>         Attachments: spark_hs_next_gen.pdf
>
>
> It's a known fact that the History Server currently has some annoying issues 
> when serving lots of applications, and when serving large applications.
> I'm filing this umbrella to track work related to addressing those issues. 
> I'll be attaching a document shortly describing the issues and suggesting a 
> path to solving them.



