[ https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725880#comment-15725880 ]
Dmitry Buzolin edited comment on SPARK-18085 at 12/6/16 3:58 PM:
-----------------------------------------------------------------

Spark log size depends directly on a few things:
- the underlying schema-less data format in use (JSON)
- the current logging implementation, in which log size depends directly on the number of tasks

Since the SHS keeps this data in memory, I don't see how these issues are orthogonal to the memory issues in the SHS; in my opinion, they are causing them. JSON is great as a data interchange or configuration format and is good for small payloads, but using it for logging? This is the first time I have seen it. I understand you may not change this, but it is worth keeping in mind and thinking about. Thank you.

was (Author: dbuzolin):
Spark log size depends directly on a few things:
- the underlying schema-less data format in use (JSON)
- the current logging implementation, in which log size depends directly on the number of tasks

Since the SHS keeps this data in memory, I don't see how these issues are orthogonal to the memory issues in the SHS; in my opinion, they are causing them. JSON is great as a data interchange or configuration format and is good for small payloads, but using it for logging? Honestly, this is the first time I have seen it in my last 20 years in IT. I understand you may not change this, but it is worth keeping in mind and thinking about. Thank you.

> Better History Server scalability for many / large applications
> ---------------------------------------------------------------
>
>                 Key: SPARK-18085
>                 URL: https://issues.apache.org/jira/browse/SPARK-18085
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Spark Core, Web UI
>    Affects Versions: 2.0.0
>            Reporter: Marcelo Vanzin
>         Attachments: spark_hs_next_gen.pdf
>
>
> It's a known fact that the History Server currently has some annoying issues
> when serving lots of applications, and when serving large applications.
> I'm filing this umbrella to track work related to addressing those issues.
> I'll be attaching a document shortly describing the issues and suggesting a
> path to how to solve them.
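
To illustrate the comment's point that log size grows with the number of tasks when every task is logged as a schema-less JSON record, here is a minimal sketch. The event shape and field names below are hypothetical and are not Spark's actual event log schema; the helper names are made up for this example. It only shows that one JSON line per task gives a log that scales roughly linearly with task count, and that repeated key names take up a sizable share of each line.

```python
import json


def make_task_event(task_id):
    # Hypothetical per-task event; NOT Spark's real event log schema.
    return {
        "Event": "TaskEnd",
        "Task ID": task_id,
        "Executor ID": "1",
        "Duration": 1234,
        "Bytes Read": 4096,
    }


def log_size_bytes(num_tasks):
    # One JSON object per line, one line per task (+1 for the newline).
    return sum(
        len(json.dumps(make_task_event(i)).encode("utf-8")) + 1
        for i in range(num_tasks)
    )


def key_overhead_fraction(event):
    # Share of a serialized line spent on quoted key names, which a
    # schema-less format repeats in every single record.
    line = json.dumps(event)
    key_bytes = sum(len(json.dumps(key)) for key in event)
    return key_bytes / len(line)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, "tasks ->", log_size_bytes(n), "bytes")
    print("key overhead:", round(key_overhead_fraction(make_task_event(0)), 2))
```

Under these assumptions, doubling the task count roughly doubles the log size, so anything that holds the parsed log in memory sees the same growth.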