[ https://issues.apache.org/jira/browse/HDFS-8998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732434#comment-14732434 ]

Yong Zhang commented on HDFS-8998:
----------------------------------

Hi [~andrew.wang], thanks for your comments, and sorry for the late reply.
I read the Ozone design document and looked at the existing code. Ozone is designed as a 
cloud file system with multi-tenancy, so its goal is different, but it also stores 
some metadata in LevelDB to reduce memory usage.
{quote}
The design you proposed sounds like it needs compaction to be coordinated by 
the NN, rather than offloading to the DNs. Level/RocksDB I think would also 
better handle concurrent writes without the concept of "locked" and "unlocked" 
blocks.
{quote}
Maybe "small file zone" is not exactly an HDFS directory, but let's just call it a 'small 
file zone'. When a client creates a file under the small file zone, the file's data is 
simply appended to an existing block. While a block is being written it is locked, 
and this lock info is kept in the NN so that other clients do not use this block until 
the write finishes. 
Compaction only happens as a block rewrite, because one block can belong to more 
than one file and file deletion only deletes the INode. A block will be 
rewritten when more than a threshold of its data should be removed, and this is controlled by the 
NN; in other words, the delete operation is offline.
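
To make the compaction trigger concrete, here is a rough sketch of the NN-side decision; the class and field names (SmallFileBlockInfo, deletedBytes, etc.) are just for illustration, not actual code from the design:
{code:java}
/**
 * Sketch only: the NN tracks how many bytes in a shared block belong to
 * already-deleted INodes, and schedules an offline rewrite of the block
 * once the ratio of dead data crosses a configurable threshold.
 */
class SmallFileBlockInfo {
  long blockSize;      // total bytes currently stored in the block
  long deletedBytes;   // bytes that belong to deleted files

  boolean needsCompaction(double threshold) {
    // Rewrite the block when too much of it is dead data, e.g. threshold = 0.5.
    return blockSize > 0 && (double) deletedBytes / blockSize >= threshold;
  }
}
{code}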

{quote}
Also, could you comment on the usecase where you see the issues with # of files 
affecting DNs before NNs? IIUC this design does not address NN memory 
consumption, which is the issue we see first in practice.
{quote}
Yes, most of the work is on the DN side, because we already have a jira for keeping the meta in 
LevelDB. 

{quote}
Goal # of files, expected size of a "small" file
Any bad behavior if a large file is accidentally written to the small file zone?
{quote}
I have also faced some problems with this: a user may copy a file from local to HDFS, or 
stream writes to HDFS, and it is hard to identify the file size up front. So, as I mentioned 
before, all data written in the zone is appended to an existing block.

{quote}
Support for rename into / out of small file zone?
{quote}
Yes, but rename is only a metadata change, and we will add an xattr to identify a small 
file that has moved out of the small file zone.
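
As a rough illustration (the xattr name below is hypothetical, not something defined in HDFS), marking a file on rename-out could use the existing FileSystem xattr API:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameOutOfZoneSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path src = new Path("/smallFileZone/a.txt");
    Path dst = new Path("/normal/a.txt");

    fs.rename(src, dst);                       // metadata-only rename
    // Tag the file so readers know its data still lives in a shared block;
    // "user.smallfile.movedOut" is only an example name.
    fs.setXAttr(dst, "user.smallfile.movedOut", new byte[] {1});
  }
}
{code}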

{quote}
Is there a way to convert a bunch of small files into a compacted file, like 
with HAR?
{quote}
Files will be bunched together at the block level.
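
To show what block-level bunching would mean for the metadata, here is a hypothetical sketch (names like FileExtent are my own, not from the design document): each small file points into a shared block with an offset and length instead of owning a whole block.
{code:java}
// Sketch only: several small files share one block, each holding an
// (offset, length) extent inside it.
class FileExtent {
  long blockId;   // the shared block that stores this file's bytes
  long offset;    // start position of this file's data inside the block
  long length;    // number of bytes belonging to this file
}
{code}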

{quote}
How common is it for a user to know apriori that a bunch of small files will be 
written, and is okay putting them in a zone? A lot of the time I see this 
happening by accident, either a poorly written app or misconfiguration.
{quote}
When the file write finishes, we close the output stream and the block append 
finishes. I will try to run some tests on the existing append feature, and update the 
Reliability chapter of the document.  
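
For reference, this is already how the existing append path looks from the client side; a minimal usage sketch with the standard FileSystem API (the path is just an example):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Append to an existing file; the new data goes onto the last block.
    try (FSDataOutputStream out = fs.append(new Path("/smallFileZone/a.txt"))) {
      out.write("more data".getBytes("UTF-8"));
    } // close() completes the append on the existing block
  }
}
{code}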

 I will post an updated design soon.

> Small files storage supported inside HDFS
> -----------------------------------------
>
>                 Key: HDFS-8998
>                 URL: https://issues.apache.org/jira/browse/HDFS-8998
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Yong Zhang
>            Assignee: Yong Zhang
>         Attachments: HDFS-8998.design.001.pdf
>
>
> HDFS has problems storing small files, as described in this blog post 
> (http://blog.cloudera.com/blog/2009/02/the-small-files-problem).
> The blog also describes some ways to store small files in HDFS, but they are 
> not good general solutions; HAR files and Sequence Files seem better suited to read-only 
> files.
> Currently each HDFS block belongs to only one HDFS file, so if there are too many small 
> files, many small blocks end up on the DataNodes, which puts a heavy load on 
> them.
> This jira will show how to online-merge small blocks into big ones, how to 
> delete small files, and so on.
> Currently we have many open jiras for improving HDFS scalability on the NameNode, 
> such as HDFS-7836, HDFS-8286 and so on. 
> So small file meta (INode and BlocksMap) will also stay in the NameNode.
> The design document will be uploaded soon. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
