[ 
https://issues.apache.org/jira/browse/SPARK-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12196:
------------------------------------

    Assignee: Apache Spark

> Store blocks in storage devices with hierarchy way
> --------------------------------------------------
>
>                 Key: SPARK-12196
>                 URL: https://issues.apache.org/jira/browse/SPARK-12196
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: yucai
>            Assignee: Apache Spark
>
> Problem:
>     Nowadays, users often have both SSDs and HDDs.
>     SSDs offer great performance, but their capacity is low. HDDs offer good 
> capacity, but their performance is 2x-3x lower than that of SSDs.
>     How can we get the best of both?
> Solution:
>     Our idea is to build a hierarchy store: use SSDs as a cache and HDDs as 
> backup storage. 
>     When Spark core allocates blocks for an RDD (either shuffle or RDD cache), 
> it gets blocks from SSDs first, and when the SSDs' usable space falls below 
> some threshold, it gets blocks from HDDs instead.
>     In our implementation, we actually go further. We support building a 
> hierarchy store with any number of levels across all storage media (NVM, SSD, 
> HDD, etc.).
> Performance:
>     1. In the best case, our solution performs the same as all SSDs.
>         In the worst case, such as when all data is spilled to HDDs, there is 
> no performance regression.
>     2. Compared with all HDDs, the hierarchy store improves performance by 
> more than 1.86x (it could be higher; the CPU became the bottleneck in our 
> test environment).
>     3. Compared with Tachyon, our hierarchy store is still 1.3x faster, 
> because we support both RDD cache and shuffle and need no extra 
> inter-process communication.
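
The tier-selection rule in the Solution section can be sketched as follows.
This is only an illustration under stated assumptions, not the code attached
to this issue: the names Tier, min_free_bytes, disk_free_bytes and pick_tier
are hypothetical and are not Spark APIs, and the per-tier free-space
threshold is an assumed interpretation of "some threshold". Tiers are listed
fastest first, and the last tier is treated as the backup store.

```python
import shutil
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tier:
    """One level of the hierarchy (hypothetical type, not a Spark class)."""
    name: str            # e.g. "nvm", "ssd", "hdd"
    directory: str       # local directory backed by this medium
    min_free_bytes: int  # skip this tier once its free space drops below this


def disk_free_bytes(directory: str) -> int:
    """Free bytes on the file system that holds `directory`."""
    return shutil.disk_usage(directory).free


def pick_tier(tiers: List[Tier],
              free_bytes: Callable[[str], int] = disk_free_bytes) -> Tier:
    """Return the first (fastest) tier whose free space meets its threshold.

    The last tier is the backup store: it is used whenever every faster
    tier has fallen below its threshold.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    for tier in tiers[:-1]:
        if free_bytes(tier.directory) >= tier.min_free_bytes:
            return tier
    return tiers[-1]


# Demo with injected free-space figures, so no real mount points are needed.
GiB = 1024 ** 3
demo_tiers = [
    Tier("nvm", "/mnt/nvm", 10 * GiB),
    Tier("ssd", "/mnt/ssd", 20 * GiB),
    Tier("hdd", "/mnt/hdd", 0),
]
demo_free = {"/mnt/nvm": 5 * GiB, "/mnt/ssd": 50 * GiB, "/mnt/hdd": 500 * GiB}
print(pick_tier(demo_tiers, demo_free.__getitem__).name)  # prints "ssd"
```

In the demo, the NVM tier is below its assumed 10 GiB threshold, so the block
goes to the SSD tier; if the SSD tier were also below its threshold, the HDD
tier would be chosen. Passing the free-space lookup as a parameter is a
choice made for this sketch so that the rule can be exercised without real
devices.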



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

