[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825801#comment-16825801 ]
Jitendra Nath Pandey commented on HDDS-1452:
--------------------------------------------

There are two distinct problems to solve here:

1) Ability to write smaller chunks. Each chunk is a separate file, so very small chunks bloat the number of files, while large chunks make the IO bursty and prevent effective pipelining in the IO path. This hurts performance for large files when compared to HDFS. We therefore need the ability to stream smaller chunks without creating lots of small files.

2) For small files, we end up with small files on the datanode irrespective of the chunk size, which again bloats the number of individual files. It is therefore desirable to pack multiple blocks into a single file. This leads to some additional considerations:
* Even if multiple blocks share the same file and we write small chunks, blocks need to be contiguously allocated so that we get a decent scan speed.
* When deleting blocks, the compaction logic for the container will need to rewrite a lot more data and metadata.

That said, these problems are not orthogonal: a solution for 1 can be a stepping stone for 2, given we know the direction.

> All chunk writes should happen to a single file for a block in datanode
> -----------------------------------------------------------------------
>
>                 Key: HDDS-1452
>                 URL: https://issues.apache.org/jira/browse/HDDS-1452
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 0.5.0
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>            Priority: Major
>             Fix For: 0.5.0
>
>
> Currently, all chunks of a block are written to individual chunk files on the
> datanode. The idea here is to write all chunks of a block to a single file on
> the datanode.
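The single-file-per-block approach described above (problem 1) can be sketched roughly as follows. This is a minimal illustration, not Ozone's actual implementation: `BlockFile`, `write_chunk`, and `read_chunk` are hypothetical names. Chunks are appended at contiguous offsets within one block file, and a per-chunk (offset, length) index keeps each chunk individually addressable, so small chunks no longer translate into small files.

```python
import os
import tempfile

class BlockFile:
    """One file per block; each chunk is appended at the current end offset.
    (Illustrative sketch only; names do not correspond to Ozone classes.)"""

    def __init__(self, path):
        self.path = path
        self.chunks = []   # per-chunk (offset, length) index
        self._end = 0      # next write offset

    def write_chunk(self, data: bytes) -> int:
        """Append a chunk to the block file; return its chunk index."""
        with open(self.path, "ab") as f:
            f.write(data)
        self.chunks.append((self._end, len(data)))
        self._end += len(data)
        return len(self.chunks) - 1

    def read_chunk(self, index: int) -> bytes:
        """Read one chunk back using its recorded offset and length."""
        offset, length = self.chunks[index]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)

path = os.path.join(tempfile.mkdtemp(), "block_1.data")
blk = BlockFile(path)
blk.write_chunk(b"chunk-0")
blk.write_chunk(b"chunk-1")
assert blk.read_chunk(1) == b"chunk-1"  # chunks stay individually readable
```

Because chunks of one block land at contiguous offsets in a single file, sequential scans stay fast; the same offset-index idea would have to be extended (and made persistent) if multiple blocks share one container file, which is where the compaction concern in point 2 comes in.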
--
This message was sent by Atlassian JIRA (v7.6.3#76005)