[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825801#comment-16825801 ]
Jitendra Nath Pandey commented on HDDS-1452:
--------------------------------------------

There are two distinct problems to solve here:

1) Ability to write smaller chunks. Each chunk is a separate file, so very small chunks bloat the number of files, while large chunks make the IO bursty and prevent effective pipelining in the IO path. This hurts performance for large files when compared to HDFS. We therefore need the ability to stream smaller chunks without creating lots of small files.

2) For small files, we end up with small files on the datanode irrespective of the chunk size, which again bloats the number of individual files. It is therefore desirable to pack multiple blocks into a single file. This leads to some additional considerations:
* Even if multiple blocks share the same file and we write small chunks, blocks need to be contiguously allocated so that we get a decent scan speed.
* When deleting blocks, the compaction logic for the container will need to rewrite a lot more data and metadata.

That said, these problems are not orthogonal: a solution for 1 can be a stepping stone for 2, given we know the direction.

> All chunk writes should happen to a single file for a block in datanode
> -----------------------------------------------------------------------
>
>                 Key: HDDS-1452
>                 URL: https://issues.apache.org/jira/browse/HDDS-1452
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 0.5.0
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>            Priority: Major
>             Fix For: 0.5.0
>
>
> Currently, all chunks of a block are written to individual chunk files on the
> datanode. The idea here is to write all chunks of a block to a single file on
> the datanode.
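The single-file-per-block approach described above (problem 1) can be sketched roughly as follows. This is a minimal illustration, not Ozone's actual implementation: `BlockFile`, `write_chunk`, and `read_chunk` are hypothetical names. Chunks are appended at contiguous offsets within one block file, and a per-chunk (offset, length) index keeps each chunk individually addressable, so small chunks no longer translate into small files.

```python
import os
import tempfile

class BlockFile:
    """One file per block; each chunk is appended at the current end offset.
    (Illustrative sketch only; names do not correspond to Ozone classes.)"""

    def __init__(self, path):
        self.path = path
        self.chunks = []   # per-chunk (offset, length) index
        self._end = 0      # next write offset

    def write_chunk(self, data: bytes) -> int:
        """Append a chunk to the block file; return its chunk index."""
        with open(self.path, "ab") as f:
            f.write(data)
        self.chunks.append((self._end, len(data)))
        self._end += len(data)
        return len(self.chunks) - 1

    def read_chunk(self, index: int) -> bytes:
        """Read one chunk back using its recorded offset and length."""
        offset, length = self.chunks[index]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)

path = os.path.join(tempfile.mkdtemp(), "block_1.data")
blk = BlockFile(path)
blk.write_chunk(b"chunk-0")
blk.write_chunk(b"chunk-1")
assert blk.read_chunk(1) == b"chunk-1"  # chunks stay individually readable
```

Because chunks of one block land at contiguous offsets in a single file, sequential scans stay fast; the same offset-index idea would have to be extended (and made persistent) if multiple blocks share one container file, which is where the compaction concern in point 2 comes in.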
--
This message was sent by Atlassian JIRA (v7.6.3#76005)