[ 
https://issues.apache.org/jira/browse/HADOOP-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-11188:
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.7.0
           Status: Resolved  (was: Patch Available)

> hadoop-azure: automatically expand page blobs when they become full
> -------------------------------------------------------------------
>
>                 Key: HADOOP-11188
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11188
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Eric Hanson
>            Assignee: Eric Hanson
>             Fix For: 2.7.0
>
>         Attachments: hadoop-11188.01.patch
>
>
> Right now, page blobs are initialized to a fixed size (fs.azure.page.blob.size)
> and cannot be expanded. This task is to make them automatically expand when
> they become nearly full.
> Design: if a write occurs that does not have enough room left in the file to
> complete, then flush all preceding operations, extend the file, and complete
> the write. Access to PageBlobOutputStream will be synchronized (exclusive
> access) so there won't be race conditions.
> The file will be extended by fs.azure.page.blob.extension.size bytes, which
> must be a multiple of 512. The internal default for
> fs.azure.page.blob.extension.size will be 128 * 1024 * 1024. The minimum
> extension size will be 4 * 1024 * 1024, which is the maximum write size, so
> the new write will finish.
> Extension will stop when the file size reaches 1TB. The final extension may 
> be less than fs.azure.page.blob.extension.size if the remainder (1TB - 
> current_file_size) is smaller than fs.azure.page.blob.extension.size.
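> A minimal, illustrative sketch of that extension logic follows. The class,
> field, and method names here are hypothetical, not the actual
> PageBlobOutputStream implementation; it only shows the flush-extend-write
> sequence and the 512-byte and 1TB constraints described above.
> {code:java}
> // Hypothetical sketch of the auto-extension logic; not the real
> // PageBlobOutputStream. Constants mirror the values proposed in this issue.
> public class PageBlobExtensionSketch {
> 
>   private static final long PAGE_SIZE = 512;
>   private static final long MAX_BLOB_SIZE = 1024L * 1024 * 1024 * 1024; // 1TB cap
>   private static final long MIN_EXTENSION = 4L * 1024 * 1024;           // maximum write size
>   private static final long DEFAULT_EXTENSION = 128L * 1024 * 1024;     // proposed default
> 
>   private long blobSize;     // current provisioned size of the page blob
>   private long writeOffset;  // position of the next write
> 
>   public PageBlobExtensionSketch(long initialSize) {
>     this.blobSize = initialSize;
>   }
> 
>   // Synchronized so extension and write have exclusive access to stream state.
>   public synchronized void write(byte[] data) {
>     long required = writeOffset + data.length;
>     if (required > blobSize) {
>       flushPrecedingOperations();
>       // Extend by the configured amount, but never past the 1TB page blob
>       // limit; the final extension may be smaller than the configured size.
>       long extension = Math.min(Math.max(DEFAULT_EXTENSION, MIN_EXTENSION),
>                                 MAX_BLOB_SIZE - blobSize);
>       // Page blob sizes must be a multiple of 512 bytes.
>       extension = (extension / PAGE_SIZE) * PAGE_SIZE;
>       if (extension <= 0 || blobSize + extension < required) {
>         throw new IllegalStateException("page blob cannot grow past 1TB");
>       }
>       resizeBlob(blobSize + extension);
>       blobSize += extension;
>     }
>     // ... buffer the data and complete the write as before ...
>     writeOffset = required;
>   }
> 
>   private void flushPrecedingOperations() { /* flush all pending writes first */ }
> 
>   private void resizeBlob(long newSize) { /* issue the Azure page blob resize */ }
> }
> {code}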
> An alternative to this is to make the default size 1TB, which is much simpler
> to implement (a one-line change). Or, even simpler, don't change it at all,
> because the current behavior is adequate for HBase.
> Rationale for this file size extension feature:
> 1) be able to download files to local disk easily with CloudXplorer and
> similar tools. Downloading a 1TB page blob is not practical if you don't have
> 1TB of local disk space, since it expands to the full file size, filled with
> zeros where there is no valid data.
> 2) don't make customers uncomfortable when they see large 1TB files. They 
> often ask if they have to pay for it, even though they only pay for the space 
> actually used in the page blob.
> I think rationale 2 is a relatively minor issue, because 98% of HBase
> customers will never notice. They will just use it and not look at what kind
> of files are used for the logs. They don't pay for the unused space, so it is
> not a problem for them. We can document this. Also, if they use hadoop fs
> -ls, they will see the actual size of the files, since I put in a fix for that.
> Rationale 1 is a minor issue because you cannot interpret the data on your 
> local file system anyway due to the data format. So really, the only reason 
> to copy data locally in its binary format would be if you are moving it 
> around or archiving it. Copying a 1TB page blob from one location in the 
> cloud to another is pretty fast with smart copy utilities that don't actually 
> move the 0-filled parts of the file.
> Nevertheless, this is a convenience feature for users. They won't have to 
> worry about setting fs.azure.page.blob.size under normal circumstances and 
> can make the files grow as big as they want.
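> For users who do want to tune these values, here is a hypothetical example of
> overriding the two settings through the Hadoop Configuration API; the values
> shown are illustrative, not the shipped defaults.
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> 
> public class PageBlobSettingsExample {
>   public static void main(String[] args) {
>     Configuration conf = new Configuration();
>     // Initial page blob size; illustrative value, not the default.
>     conf.setLong("fs.azure.page.blob.size", 64L * 1024 * 1024 * 1024);
>     // Extension step proposed in this issue; must be a multiple of 512
>     // and at least 4 * 1024 * 1024.
>     conf.setLong("fs.azure.page.blob.extension.size", 128L * 1024 * 1024);
>     System.out.println(conf.getLong("fs.azure.page.blob.extension.size", 0));
>   }
> }
> {code}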
> If we make the change to extend the file size on the fly, that introduces new
> possible error or failure modes for HBase. We should include retry logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
