[ https://issues.apache.org/jira/browse/SPARK-22805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312499#comment-16312499 ]

Imran Rashid commented on SPARK-22805:
--------------------------------------

I'm leaning slightly against this, though I could go either way.

For 2.3+, the gains are pretty small, and it means an old history server can't 
read new logs (I know we don't guarantee that anyway, but we might as well keep 
it if we can).

For < 2.3, there would be notable improvements in log sizes, but I don't like 
the compatibility story.  I don't think there are any explicit guarantees, but 
it seems pretty annoying to have a 2.2.1 SHS unable to read logs from Spark 2.2.2.

Sorry [~lebedev], I appreciate the work you've put into this anyhow.

> Use aliases for StorageLevel in event logs
> ------------------------------------------
>
>                 Key: SPARK-22805
>                 URL: https://issues.apache.org/jira/browse/SPARK-22805
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.2, 2.2.1
>            Reporter: Sergei Lebedev
>            Priority: Minor
>
> Fact 1: {{StorageLevel}} has a private constructor, therefore the list of 
> predefined levels is not extendable by users.
> Fact 2: The format of event logs uses a redundant representation for storage 
> levels 
> {code}
> >>> len('{"Use Disk": true, "Use Memory": false, "Deserialized": true, "Replication": 1}')
> 79
> >>> len('DISK_ONLY')
> 9
> {code}
> Fact 3: This leads to excessive log sizes for workloads with lots of 
> partitions, because every partition carries a storage level field that is 
> 60-70 bytes larger than it needs to be.
> Suggested quick win: use the names of the predefined levels to identify them 
> in the event log.
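A minimal sketch of the suggested quick win, in the same Python used for the size measurements above. The helper and table names here are illustrative, not Spark's actual {{JsonProtocol}} code: serialize the alias when the flags match a predefined level, and fall back to the verbose field-by-field form otherwise so custom levels still round-trip.

```python
import json

# Flags for a few of Spark's predefined levels, keyed by alias:
# (useDisk, useMemory, deserialized, replication). Table is illustrative
# and incomplete; the real list lives in StorageLevel.fromString.
PREDEFINED = {
    "DISK_ONLY":       (True,  False, False, 1),
    "MEMORY_ONLY":     (False, True,  True,  1),
    "MEMORY_AND_DISK": (True,  True,  True,  1),
}
FLAGS_TO_ALIAS = {flags: name for name, flags in PREDEFINED.items()}

def storage_level_to_json(use_disk, use_memory, deserialized, replication):
    """Emit the short alias for a predefined level, falling back to the
    verbose representation for levels with no alias (e.g. replication > 1)."""
    alias = FLAGS_TO_ALIAS.get((use_disk, use_memory, deserialized, replication))
    if alias is not None:
        return json.dumps(alias)  # e.g. '"DISK_ONLY"', 11 bytes
    return json.dumps({
        "Use Disk": use_disk,
        "Use Memory": use_memory,
        "Deserialized": deserialized,
        "Replication": replication,
    })

short = storage_level_to_json(True, False, False, 1)   # matches DISK_ONLY
verbose = storage_level_to_json(True, False, False, 2) # custom replication: no alias
print(len(short), len(verbose))
```

The fallback path is what makes the change backward-compatible on the write side: only levels that have a name get the compact form, so a reader that understands aliases can still parse everything, while anything with nonstandard flags keeps today's layout.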



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
