[jira] [Commented] (HDDS-1452) All chunk writes should happen to a single file for a block in datanode

2019-04-26 Thread Anu Engineer (JIRA)


[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827159#comment-16827159 ]

Anu Engineer commented on HDDS-1452:


I agree it is not orthogonal. I was thinking we can skip step one completely if 
we do the second one, since the code changes are in exactly the same place. 
Most object stores and file systems use extent-based allocation and writes, and 
Ozone would benefit from moving to some kind of extent-based system. In fact, 
it would be best if we can allocate extents on SSD, keep the data in those 
extents for 24 hours, and move it to spinning disks later. This is similar to 
what ZFS does, and you get SSD caching automatically. When you then write to a 
spinning disk, all writes are sequential, which increases the write speed.
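For illustration, a minimal Java sketch of the staged, extent-style layout 
described above, assuming a two-tier directory layout: new extents land on the 
SSD tier, and a periodic task migrates extents older than 24 hours to the 
spinning-disk tier in whole, sequential moves. The ExtentStager class, the 
paths, and the age threshold are hypothetical, not Ozone code.

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.time.Duration;
import java.time.Instant;

// Sketch only: stage new extents on SSD, age them out to HDD later.
public class ExtentStager {
  private final Path ssdDir;   // fast staging tier
  private final Path hddDir;   // long-term spinning-disk tier
  private final Duration maxAge = Duration.ofHours(24);  // illustrative window

  public ExtentStager(Path ssdDir, Path hddDir) {
    this.ssdDir = ssdDir;
    this.hddDir = hddDir;
  }

  // New extents are always written to the SSD tier first, which also
  // gives reads of recent data an SSD cache for free.
  public void writeExtent(String name, byte[] data) throws IOException {
    Files.write(ssdDir.resolve(name), data,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE);
  }

  // Periodically migrate extents older than maxAge to the HDD tier.
  // Moving whole extents keeps the HDD writes large and sequential.
  public void migrateAgedExtents() throws IOException {
    Instant cutoff = Instant.now().minus(maxAge);
    try (DirectoryStream<Path> extents = Files.newDirectoryStream(ssdDir)) {
      for (Path extent : extents) {
        if (Files.getLastModifiedTime(extent).toInstant().isBefore(cutoff)) {
          Files.move(extent, hddDir.resolve(extent.getFileName()),
              StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
  }
}
{code}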

> All chunk writes should happen to a single file for a block in datanode
> ---
>
> Key: HDDS-1452
> URL: https://issues.apache.org/jira/browse/HDDS-1452
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Currently, each chunk of a block is written to its own chunk file in the 
> datanode. The idea here is to write all the chunks of a block to a single 
> file in the datanode.






[jira] [Commented] (HDDS-1452) All chunk writes should happen to a single file for a block in datanode

2019-04-25 Thread Jitendra Nath Pandey (JIRA)


[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825801#comment-16825801 ]

Jitendra Nath Pandey commented on HDDS-1452:


There are two problems to solve here:
 1) Ability to write smaller chunks. Each chunk is currently a separate file, 
so really small chunks bloat the number of files, while large chunks make the 
IO bursty and we don't get effective pipelining in the IO path. This hurts 
performance for large files when compared to HDFS. So we need the ability to 
stream smaller chunks without creating lots of small files.
 2) For small files, we end up with small files in the datanode irrespective 
of the chunk size. This also bloats the number of individual files. Therefore, 
it is desirable to pack multiple blocks into a single file. This leads to some 
additional considerations:
 * Even if multiple blocks share the same file and we write small chunks, we 
need blocks to be contiguously allocated so that we get a decent scan speed.
 * When deleting blocks, the compaction logic for the container will need to 
rewrite a lot more data and metadata.

That said, these problems are not orthogonal. A solution for 1 can be a 
stepping stone toward 2, given that we know the direction.
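
As a concrete illustration of that stepping stone for problem 1, a minimal 
Java sketch that writes every chunk of a block into one per-block file at 
offset chunkIndex * chunkSize, so small chunks can be streamed without 
creating one file per chunk. The class name, the fixed-chunk-size assumption, 
and the API shape are hypothetical, not the actual Ozone datanode code.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch only: one file per block, chunks written at computed offsets.
public class SingleFileChunkWriter implements AutoCloseable {
  private final FileChannel channel;
  private final long chunkSize;

  public SingleFileChunkWriter(Path blockFile, long chunkSize)
      throws IOException {
    this.channel = FileChannel.open(blockFile,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    this.chunkSize = chunkSize;
  }

  // Each chunk lands at chunkIndex * chunkSize inside the single block
  // file, so no per-chunk files are created and writes to consecutive
  // chunks stay sequential on disk. Assumes the buffer starts at
  // position 0; the loop resumes correctly after partial writes.
  public void writeChunk(long chunkIndex, ByteBuffer data) throws IOException {
    long base = chunkIndex * chunkSize;
    while (data.hasRemaining()) {
      channel.write(data, base + data.position());
    }
  }

  @Override
  public void close() throws IOException {
    channel.force(true);  // persist data and metadata before closing
    channel.close();
  }
}
{code}

With this shape, a client could stream a block as many small chunk writes 
while the datanode still ends up with exactly one file per block.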

 

 



