[jira] [Created] (HDDS-4395) Ozone Data Generator for Fast Scale Test

Wei-Chiu Chuang (Jira) Mon, 26 Oct 2020 09:38:22 -0700

Wei-Chiu Chuang created HDDS-4395:
-------------------------------------

             Summary: Ozone Data Generator for Fast Scale Test
                 Key: HDDS-4395
                 URL: https://issues.apache.org/jira/browse/HDDS-4395
             Project: Hadoop Distributed Data Store
          Issue Type: New Feature
          Components: Tools
    Affects Versions: 1.0.0
            Reporter: Wei-Chiu Chuang



I've been working on this fun project and would like to share it with the 
community.

 
h1. Synopsis

We want to prove that Ozone runs well at scale, both in the number of keys 
(billions) and on dense DataNodes, where each DN has hundreds of TB or even 
PB-scale capacity.
h1. Challenge: Data generation

The challenge is to generate a huge data set fast so that we can benchmark the 
system quickly. No existing tool can generate data at this scale.

 
h1. Proposal

The major bottleneck is OM’s key insertion performance. In addition, Ozone uses 
a single pipeline to write data, unless multi-raft is enabled.

 

Instead of using Ozone's client API to generate data, we should write directly 
to the RocksDB instances of OM, SCM and the DNs. RocksDB can support 
[up to a million key|https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks] 
bulk load operations.
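
A minimal sketch of the bulk-load idea, not the actual tool: keys are generated in ascending order and handed to a store in large batches rather than one client call at a time. The key layout, the placeholder value, and the names {{generate_key_records}}, {{InMemorySink}} and {{bulk_load}} are all hypothetical; the in-memory sink only stands in for a real RocksDB writer.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[bytes, bytes]


def generate_key_records(volume: str, bucket: str, count: int,
                         start: int = 0) -> Iterator[Record]:
    """Yield (key, value) pairs in ascending key order.

    Zero-padding the counter keeps lexicographic order equal to numeric
    order, so the stream is already sorted for a bulk loader.
    The key layout and the placeholder value are assumptions.
    """
    for i in range(start, start + count):
        key = f"/{volume}/{bucket}/key-{i:010d}".encode("utf-8")
        value = f"placeholder-metadata-{i}".encode("utf-8")
        yield key, value


def batched(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Group a record stream into lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


class InMemorySink:
    """Hypothetical stand-in for a key-value store that accepts batches."""

    def __init__(self) -> None:
        self.store: Dict[bytes, bytes] = {}
        self.batches_written = 0

    def write_batch(self, batch: List[Record]) -> None:
        self.store.update(batch)
        self.batches_written += 1


def bulk_load(sink: InMemorySink, records: Iterable[Record],
              batch_size: int = 1000) -> int:
    """Write all records to the sink in batches; return the record count."""
    total = 0
    for batch in batched(records, batch_size):
        sink.write_batch(batch)
        total += len(batch)
    return total
```

For example, {{bulk_load(InMemorySink(), generate_key_records("vol1", "bucket1", 2500))}} writes the 2500 records in three batches. Emitting keys pre-sorted is a deliberate choice in this sketch, since sorted input is what bulk loaders generally handle best.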

 

Similarly, we can skip the normal Ozone client write path and populate the 
container DB and block files directly.
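
As a rough illustration of populating block files without the client write path, the sketch below writes zero-filled files of a requested size into a per-container directory and returns a small metadata map. The directory layout ({{<root>/<containerId>/chunks/<blockId>.block}}), the function names, and the returned map are simplifying assumptions, not the on-disk format a DataNode actually expects.

```python
import tempfile
from pathlib import Path
from typing import Dict


def write_block_file(root: Path, container_id: int, block_id: int,
                     size_bytes: int, chunk_size: int = 1 << 20) -> Path:
    """Create one zero-filled block file of exactly size_bytes bytes.

    Data is written in chunk_size pieces so large blocks do not need
    to be held in memory. The path layout is hypothetical.
    """
    if size_bytes < 0 or chunk_size <= 0:
        raise ValueError("size_bytes must be >= 0 and chunk_size > 0")
    chunk_dir = root / str(container_id) / "chunks"
    chunk_dir.mkdir(parents=True, exist_ok=True)
    path = chunk_dir / f"{block_id}.block"
    buf = bytes(min(chunk_size, size_bytes))  # zero-filled buffer
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(remaining, len(buf))
            f.write(buf[:n])
            remaining -= n
    return path


def populate_container(root: Path, container_id: int, num_blocks: int,
                       block_size: int) -> Dict[int, int]:
    """Write num_blocks block files and return {block_id: size_bytes}.

    The returned map stands in for the per-container metadata that a
    real generator would store in the container DB.
    """
    metadata: Dict[int, int] = {}
    for block_id in range(num_blocks):
        write_block_file(root, container_id, block_id, block_size)
        metadata[block_id] = block_size
    return metadata


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        meta = populate_container(Path(tmp), container_id=1,
                                  num_blocks=4, block_size=4096)
        print(f"wrote {len(meta)} blocks, {sum(meta.values())} bytes")
```

A real generator would also have to produce the matching OM and SCM records so that the files on disk are reachable through the namespace; that part is left out here.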

 

(More details are in the design doc.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
