[ https://issues.apache.org/jira/browse/SPARK-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
yucai updated SPARK-12196:
--------------------------
Description:

*Motivation*

Our customers want to use SSDs to speed up machine learning and SQL workloads, but SSDs are quite expensive and their capacity is still smaller than HDDs'.

*Proposal*

Our solution is to build tiered storage: use SSDs as cache and HDDs as backup. When Spark core allocates blocks (either for shuffle or RDD cache), it stores blocks on the SSDs first; when the SSDs' free space falls below a threshold, it starts using the HDDs.

*Performance Evaluation*

1. In the best case, our solution performs the same as all-SSD.
2. In the worst case, when all data spills to HDDs, there is no performance regression.
3. Compared with all-HDD, tiered storage is about 1.86x faster (it could be higher; the CPU reaches a bottleneck in our test environment).

*Test Environment*

1. 4 IVB boxes (40 cores, 192GB memory, 10Gb NIC, 11 HDDs / 11 SSDs / PCIe SSD).
2. Test case: NWeight (graph analysis), which computes associations between two vertices that are n-hop away (e.g., friend-to-friend or video-to-video relationships for recommendation). Data size: 22GB; vertices: 41 million; edges: 1.4 billion.

*Usage*

1. Enable tiered storage in spark-defaults.conf:
{code}
spark.diskStore.allocator tiered
{code}
2. Configure the storage hierarchy; for YARN users, see the example below:
{code}
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/mnt/DP_disk1/yucai/yarn/local,/mnt/DP_disk2/yucai/yarn/local,
         /mnt/DP_disk3/yucai/yarn/local,/mnt/DP_disk4/yucai/yarn/local,
         /mnt/DP_disk5/yucai/yarn/local,/mnt/DP_disk6/yucai/yarn/local
  </value>
</property>
<property>
  <name>yarn.nodemanager.spark-dirs-tiers</name>
  <value>001111</value>
</property>
{code}
This means DP_disk1-2 are in tier 1 and DP_disk3-6 make up tier 2.

*More tiers*

Our implementation supports building any number of tiers across various storage media (NVMe, SSD, HDD, etc.). For example:
{code}
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/mnt/DP_disk1/yucai/yarn/local,/mnt/DP_disk2/yucai/yarn/local,
         /mnt/DP_disk3/yucai/yarn/local,/mnt/DP_disk4/yucai/yarn/local,
         /mnt/DP_disk5/yucai/yarn/local,/mnt/DP_disk6/yucai/yarn/local
  </value>
</property>
<property>
  <name>yarn.nodemanager.spark-dirs-tiers</name>
  <value>001122</value>
</property>
{code}
This means DP_disk1-2 are in tier 1, DP_disk3-4 are in tier 2, and DP_disk5-6 make up tier 3 (see the sketch below).
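To make the mapping concrete, here is a minimal Scala sketch of how the yarn.nodemanager.spark-dirs-tiers digit string and the free-space fallback described above could work. All identifiers (TieredAllocatorSketch, parseTiers, pickDir, thresholdBytes) are illustrative assumptions, not the actual patch:
{code}
import java.io.File

object TieredAllocatorSketch {
  // Group the configured local dirs by their tier digit: "001122" maps
  // dirs 1-2 to tier 0, dirs 3-4 to tier 1, dirs 5-6 to tier 2.
  def parseTiers(dirs: Seq[String], tierDigits: String): Seq[Seq[String]] =
    dirs.zip(tierDigits.map(_.asDigit))
      .groupBy(_._2)          // tier digit -> (dir, digit) pairs
      .toSeq.sortBy(_._1)     // fastest tier first
      .map { case (_, pairs) => pairs.map(_._1) }

  // Walk the tiers from fastest to slowest: pick the first dir whose free
  // space is above the threshold; the last tier is used unconditionally.
  def pickDir(tiers: Seq[Seq[String]], thresholdBytes: Long): String =
    tiers.init
      .flatMap(_.find(d => new File(d).getUsableSpace > thresholdBytes))
      .headOption
      .getOrElse(tiers.last.head)
}
{code}
For the "001122" example above, parseTiers groups DP_disk1-2, DP_disk3-4, and DP_disk5-6 into three tiers, and pickDir tries them in that order, so blocks land on the fastest device that still has enough free space.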
was:

*Motivation*

Nowadays, customers have both SSDs (SATA SSD / PCIe SSD) and HDDs. SSDs have great performance but small capacity; HDDs have good capacity but are much slower than SSDs (2-3x slower than SATA SSD, 20x slower than PCIe SSD). How can we get the best of both?

*Proposal*

One solution is to build a hierarchy store: use SSDs as cache and HDDs as backup storage. When Spark core allocates blocks (either for shuffle or RDD cache), it gets blocks from the SSDs first, and when the SSDs' usable space falls below a threshold, it gets blocks from the HDDs.

In our implementation, we actually go further: we support building a hierarchy store with any number of levels across various storage media (MEM, NVM, SSD, HDD, etc.).

*Performance*

1. In the best case, our solution performs the same as all-SSD.
2. In the worst case, when all data spills to HDDs, there is no performance regression.
3. Compared with all-HDD, the hierarchy store is more than *_1.86x_* faster (it could be higher; the CPU reaches a bottleneck in our test environment).
4. Compared with Tachyon, our hierarchy store is still *_1.3x_* faster, because we support both RDD cache and shuffle, with no extra inter-process communication.

*Test Environment*

1. 4 IVB boxes (40 cores, 192GB memory, 10Gb NIC, 11 HDDs / 11 SATA SSDs / PCIe SSD).
2. Real customer case: NWeight (graph analysis), which computes associations between two vertices that are n-hop away (e.g., friend-to-friend or video-to-video relationships for recommendation).
3. Data size: 22GB; vertices: 41 million; edges: 1.4 billion.

*Usage*

1. Set the priority and threshold for each layer in spark.storage.hierarchyStore:
{code}
spark.storage.hierarchyStore='nvm 40GB,ssd 20GB'
{code}
This builds a 3-layer hierarchy store: the 1st layer is "nvm", the 2nd is "ssd", and all the rest form the last layer.
2. Configure each layer's location: just put a keyword like "nvm" or "ssd", as specified in step 1, into the local dirs, such as spark.local.dir or yarn.nodemanager.local-dirs:
{code}
spark.local.dir=/mnt/nvm1,/mnt/ssd1,/mnt/ssd2,/mnt/ssd3,/mnt/disk1,/mnt/disk2,/mnt/disk3,/mnt/disk4,/mnt/others
{code}
After that, restart your Spark application; it will allocate blocks from nvm first. When nvm's usable space is less than 40GB, it starts to allocate from ssd; when ssd's usable space is less than 20GB, it starts to allocate from the last layer (a sketch of this matching follows below).
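For illustration, here is a minimal Scala sketch of how this older keyword-based scheme could match the layer spec against the local dirs. HierarchyStoreSketch, parseLayers, and layerDirs are hypothetical names under assumed parsing rules, not the actual implementation:
{code}
object HierarchyStoreSketch {
  // Parse a spec like "nvm 40GB,ssd 20GB" into (keyword, thresholdBytes) pairs.
  def parseLayers(spec: String): Seq[(String, Long)] =
    spec.split(",").toSeq.map { entry =>
      val Array(keyword, size) = entry.trim.split("\\s+")
      (keyword, size.stripSuffix("GB").toLong * 1024L * 1024 * 1024)
    }

  // Assign each local dir to the first layer whose keyword appears in its
  // path; dirs matching no keyword form the implicit last layer.
  def layerDirs(dirs: Seq[String], layers: Seq[(String, Long)]): Seq[Seq[String]] = {
    val named = layers.map { case (keyword, _) => dirs.filter(_.contains(keyword)) }
    val rest  = dirs.filterNot(d => layers.exists { case (keyword, _) => d.contains(keyword) })
    named :+ rest
  }
}
{code}
With the spark.local.dir example above, layerDirs would yield the nvm dir, then the three ssd dirs, then everything else as the last layer; allocation would then walk those layers with the 40GB and 20GB thresholds as described.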
> Store/retrieve blocks in different speed storage devices by hierarchy way
> -------------------------------------------------------------------------
>
>                 Key: SPARK-12196
>                 URL: https://issues.apache.org/jira/browse/SPARK-12196
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: yucai
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)