[ https://issues.apache.org/jira/browse/HDDS-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang reassigned HDDS-4395:
-------------------------------------

    Assignee: Wei-Chiu Chuang

> Ozone Data Generator for Fast Scale Test
> ----------------------------------------
>
>                 Key: HDDS-4395
>                 URL: https://issues.apache.org/jira/browse/HDDS-4395
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>          Components: Tools
>    Affects Versions: 1.0.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>
> I've been working on this fun project and would like to share it with the
> community.
>  
> h1. Synopsis
> We want to prove that Ozone runs well at scale, both in terms of the number
> of keys (billions of keys) and on dense DataNodes, where each DN has hundreds
> of TB or even PB-scale capacity.
> h1. Challenge: Data generation
> The challenge is to generate a huge data set fast so that we can benchmark
> the system quickly. No existing tool is capable of this at such a scale.
>  
> h1. Proposal
> The major bottleneck is OM’s key insertion performance. In addition, Ozone 
> uses a single pipeline to write data, unless multi-raft is enabled.
>  
> Instead of using Ozone's client API to generate data, we should write
> directly to the RocksDB instances of OM, SCM and the DNs. RocksDB can support
> [up to a million key|https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks]
> bulk-load operations.
>  
> Similarly, we can skip the normal Ozone client write path and populate the
> container DB and block files directly.
>  
> (more details in the design doc)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)