[
https://issues.apache.org/jira/browse/HBASE-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636673#action_12636673
]
stack commented on HBASE-911:
-----------------------------
I took a look. Blocks are not all 64MB in size. The last block in a file is the
size of the file's tail.
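A minimal sketch of the point above, assuming the default 64MB block size: a file only occupies full blocks for its full 64MB spans, and the final block is just the size of whatever is left over (the helper below is illustrative, not Hadoop code):

```python
# Illustrative helper: on-disk size of each HDFS block of a file.
# Full blocks are block_size bytes; the last block is only the tail.
def block_sizes(file_len, block_size=64 * 1024 * 1024):
    sizes = []
    remaining = file_len
    while remaining > block_size:
        sizes.append(block_size)
        remaining -= block_size
    sizes.append(remaining)  # tail block: whatever is left over
    return sizes

MB = 1024 * 1024
print(block_sizes(150 * MB))  # two full 64MB blocks plus a 22MB tail
print(block_sizes(98))        # a 98-byte file uses a single 98-byte block
```

So a 98-byte file should cost roughly 98 bytes per replica, not 64MB, which the experiment below confirms.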
I set up a clean hdfs on four nodes. I took the size of the dfs directory:
{code}
[branch-0.18]$ for i in `cat conf/slaves`; do ssh $i "du -sb /bfd/hadoop-stack/dfs"; done
37527 /bfd/hadoop-stack/dfs
20795 /bfd/hadoop-stack/dfs
20795 /bfd/hadoop-stack/dfs
20794 /bfd/hadoop-stack/dfs
{code}
Next I uploaded a file of 98 bytes up into hdfs:
{code}
[branch-0.18]$ ls -la /tmp/xxxx.txt
-rw-r--r-- 1 stack powerset 98 Sep 26 23:54 /tmp/xxxx.txt
[EMAIL PROTECTED] branch-0.18]$ ./bin/hadoop fs -put /tmp/xxxx.txt /
{code}
Then I did a new listing:
{code}
[branch-0.18]$ for i in `cat conf/slaves`; do ssh $i "du -sb /bfd/hadoop-stack/dfs"; done
37840 /bfd/hadoop-stack/dfs
20904 /bfd/hadoop-stack/dfs
20904 /bfd/hadoop-stack/dfs
20794 /bfd/hadoop-stack/dfs
{code}
Sizes changed in three locations, one per replica.
Listing the dfs data directory on one of the replicas, I see a block of size 98
bytes and some accompanying metadata:
{code}
[branch-0.18]$ ls -la /bfd/hadoop-stack/dfs/data/current/
total 20
drwxr-sr-x 2 stack powerset 4096 Oct 3 16:40 .
drwxr-sr-x 5 stack powerset 4096 Oct 3 16:39 ..
-rw-r--r-- 1 stack powerset 158 Oct 3 16:39 VERSION
-rw-r--r-- 1 stack powerset 98 Oct 3 16:40 blk_-343955609951300745
-rw-r--r-- 1 stack powerset 11 Oct 3 16:40 blk_-343955609951300745_1001.meta
-rw-r--r-- 1 stack powerset 0 Oct 3 16:39 dncp_block_verification.log.curr
{code}
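The arithmetic in the listings above checks out. On the two replicas that only stored the new block, du grew by 109 bytes: the 98-byte block plus its 11-byte .meta file. The 11 bytes are consistent with the checksum file layout (an assumption on my part: a 7-byte header plus one 4-byte CRC per 512-byte chunk of block data):

```python
import math

# du totals for one of the replicas, before and after the 98-byte upload
before, after = 20795, 20904
block_len = 98

delta = after - before  # 109 bytes of growth

# Assumed .meta layout: 7-byte header + 4-byte CRC per 512-byte chunk.
bytes_per_checksum = 512
chunks = math.ceil(block_len / bytes_per_checksum)  # 98 bytes -> 1 chunk
meta_len = 7 + 4 * chunks  # 11, matching the blk_..._1001.meta size above

print(delta, block_len + meta_len)  # both 109
```

So the on-disk cost of a small file is its actual length plus a few bytes of checksum metadata, not a full block per file.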
> Minimize filesystem footprint
> -----------------------------
>
> Key: HBASE-911
> URL: https://issues.apache.org/jira/browse/HBASE-911
> Project: Hadoop HBase
> Issue Type: Improvement
> Reporter: stack
>
> This issue is about looking into how much filesystem space hbase uses.
> Daniel Ploeg suggests that hbase is profligate in its use of space in hdfs.
> Given that block sizes by default are 64MB, and that every time hbase writes
> a store file it is accompanied by an index file and a very small metadata
> file, that's 3*64MB even if the file is empty (TODO: Prove this). The
> situation is aggravated by the fact that hbase flushes whatever is in
> memory every 30 minutes to minimize loss in the absence of appends; this
> latter action makes for lots of small files.
> The solution to the above is to implement append, so the optional flush is
> no longer necessary, and a file format that aggregates info, index, and data
> all in one file. Short-term, we should set the block size on the
> info/metadata file down to 4k or some such small size and look into doing
> likewise for the mapfile index.
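The short-term suggestion in the description could be tried from the command line. A hedged sketch, assuming the 0.18-era property name dfs.block.size and illustrative paths; the value should stay a multiple of io.bytes.per.checksum (512 by default):

```shell
# Upload a small metadata file with a 4k block size instead of the 64MB
# default. dfs.block.size and the paths here are assumptions, not taken
# from the issue.
./bin/hadoop fs -D dfs.block.size=4096 -put /tmp/info.txt /hbase/info.txt
```

Given the comment above showing that blocks only occupy their actual length on disk, the per-file block size mainly matters for namenode block accounting, not raw datanode space.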
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.