[
https://issues.apache.org/jira/browse/HADOOP-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519181
]
Konstantin Shvachko commented on HADOOP-1656:
---------------------------------------------
I think we should be careful introducing new persistent fields. It is really
simple to add a field, but it may be hard to remove or support it in the future.
Why do we need the block size to be persistent? What is the use case?
Right now we create files with the default block size; at least that is what
map-reduce does. Block size is used by map-reduce to calculate splits. I don't
know whether this patch will break the semantics of generating splits.
On the other hand, if we store the block size per file, what are the semantics
of that field? Currently we have the flexibility to create blocks of different
sizes within a file; do we lose that flexibility from now on? And if we don't,
why do we need to store it?
This looks like one of those simple changes that can lead to big consequences.
Currently, if there is more than one block in the file, the block size is
returned correctly. The problem is with one-block files only.
I'd propose one of these 3 variants in this case:
# keep it as it is: return the size of the first block;
# return the default block size;
# return -1 as the block size from the name-node, and let DFSClient return its
default size further up to the application.
Most probably that will be the size the file was created with.
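The third variant could be sketched as follows. This is a plain-Java mock, not
actual Hadoop code; the class, the sentinel constant, and the 64 MB client
default are all illustrative assumptions:

```java
// Sketch of variant 3: the name-node returns a sentinel (-1) for
// one-block files, and DFSClient substitutes its configured default.
// All names here are hypothetical, not real Hadoop APIs.
public class BlockSizeSketch {
    static final long UNSET = -1L;                 // sentinel from the name-node
    static final long CLIENT_DEFAULT = 64L << 20;  // assumed 64 MB client default

    // What the name-node would report as the file's block size.
    static long nameNodeBlockSize(long[] blockLengths) {
        // With more than one block, the first block's length is the
        // block size the file was actually written with.
        if (blockLengths.length > 1) {
            return blockLengths[0];
        }
        return UNSET; // one-block file: the size is ambiguous
    }

    // The client resolves the sentinel to its own default.
    static long clientBlockSize(long[] blockLengths) {
        long reported = nameNodeBlockSize(blockLengths);
        return reported == UNSET ? CLIENT_DEFAULT : reported;
    }
}
```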
On a side note, we need to deprecate getBlockSize() both in DFSClient and
ClientProtocol, because it is never used. The correct way is to call
getFileInfo().getBlockSize().
> HDFS does not record the blocksize for a file
> ---------------------------------------------
>
> Key: HADOOP-1656
> URL: https://issues.apache.org/jira/browse/HADOOP-1656
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.13.0
> Reporter: Sameer Paranjpye
> Assignee: dhruba borthakur
> Fix For: 0.15.0
>
> Attachments: blockSize2.patch
>
>
> The blocksize that a file is created with is not recorded by the Namenode. It
> is used only by the client when it writes the file. Invoking 'getBlockSize'
> merely returns the size of the first block. The Namenode should record the
> blocksize.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.