[ https://issues.apache.org/jira/browse/HDDS-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang reassigned HDDS-4395:
-------------------------------------

    Assignee: Wei-Chiu Chuang

> Ozone Data Generator for Fast Scale Test
> ----------------------------------------
>
>                 Key: HDDS-4395
>                 URL: https://issues.apache.org/jira/browse/HDDS-4395
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>          Components: Tools
>    Affects Versions: 1.0.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>
> I've been working on this fun project and would like to share it with the
> community.
>
> h1. Synopsis
> We want to prove that Ozone runs well at scale, both in the number of keys
> (billions of keys) and on dense DataNodes where each DN has hundreds of
> TB or even PB-scale capacity.
>
> h1. Challenge: Data generation
> The challenge is to generate a huge data set fast enough that we can
> benchmark the system quickly. No existing tool can do this at that scale.
>
> h1. Proposal:
> The major bottleneck is OM's key insertion performance. In addition, Ozone
> uses a single pipeline to write data unless multi-raft is enabled.
>
> Instead of using Ozone's client API to generate data, we should write
> directly to the RocksDB instances of OM, SCM and DN. RocksDB can support
> [up to a million key|https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks]
> bulk load operations.
>
> Similarly, we can skip the normal Ozone client write path and populate the
> container DB and block files directly.
>
> (more details in the design doc)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
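
The proposal to write keys straight into RocksDB can be illustrated with a small planning step. The sketch below is not taken from the issue or its design doc. It assumes that the generator would use several parallel writers, each owning a contiguous range of key indices, and that key names are zero-padded so that generation order matches lexicographic order (RocksDB bulk loading through pre-built SST files requires keys in sorted order). The class `KeyRangePlanner`, the `/volume/bucket/key-N` naming scheme, and the padding width are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for planning a bulk key load; not part of Ozone. */
public final class KeyRangePlanner {

  private KeyRangePlanner() {}

  /**
   * Builds a key name whose numeric suffix is zero-padded to 12 digits,
   * so that lexicographic order equals numeric order of the index.
   * The "/volume/bucket/key" layout is an assumption for illustration.
   */
  public static String keyName(String volume, String bucket, long index) {
    return String.format(Locale.ROOT, "/%s/%s/key-%012d", volume, bucket, index);
  }

  /**
   * Splits [0, totalKeys) into one contiguous half-open range per writer.
   * Each element is a two-element array {start, end}. Range sizes differ
   * by at most one key.
   */
  public static List<long[]> split(long totalKeys, int writers) {
    if (totalKeys < 0 || writers <= 0) {
      throw new IllegalArgumentException("totalKeys must be >= 0 and writers > 0");
    }
    List<long[]> ranges = new ArrayList<>();
    long base = totalKeys / writers;
    long extra = totalKeys % writers;
    long start = 0;
    for (int i = 0; i < writers; i++) {
      // The first 'extra' writers take one additional key each.
      long size = base + (i < extra ? 1 : 0);
      ranges.add(new long[] {start, start + size});
      start += size;
    }
    return ranges;
  }

  public static void main(String[] args) {
    // Print the first and last key each of three writers would generate.
    for (long[] r : split(10, 3)) {
      System.out.println(keyName("vol1", "bucket1", r[0])
          + " .. " + keyName("vol1", "bucket1", r[1] - 1));
    }
  }
}
```

Because every writer produces keys in ascending order within a range that does not overlap any other range, each writer's output could in principle be loaded independently. Whether the actual tool partitions work this way is not stated in the issue.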