[ 
https://issues.apache.org/jira/browse/HDFS-13021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341999#comment-16341999
 ] 

LiXin Ge commented on HDFS-13021:
---------------------------------

Personally, I find snapshots a hassle to deal with, not only in distributed 
file systems like HDFS but also in local file systems. Even so, we still have 
to deal with them, or at least explain their behaviour clearly :)
FYI, some personal opinions:


First of all, the [Hadoop documentation 
|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html]
 defines HDFS snapshots as *read-only, point-in-time* copies of the file 
system. If the status of a file/directory in a snapshot changes along with the 
original file/directory, that breaks the *read-only* and *point-in-time* 
semantics to some extent.


Secondly, it may confuse admin users. For example, someone sets a directory's 
storage policy to {{ALL_SSD}} so that it is stored on SSD for better 
performance, and then takes a snapshot. Afterwards the policy is changed back 
to {{HOT}} for some reason. Months later the admin has forgotten that the 
{{ALL_SSD}} policy ever existed (or it is simply a different admin) and wants 
to find out why the SSD is full of data. They look over all the directories, 
including the snapshots, only to find the {{HOT}} policy, which does not use 
SSD, and there is no clue or document reminding them to try running the mover. 
That looks like a bug.
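
To make the scenario above concrete, here is a minimal sketch of how an admin 
might compare the policy reported for a live path with the one reported for the 
same file under a snapshot. The helper names {{snapshot_path}} and 
{{compare_policies}} are made up for illustration; only 
{{hdfs storagepolicies -getStoragePolicy}} is taken from the reproduction in 
this issue, and running {{compare_policies}} assumes a configured HDFS client.

```shell
# Hypothetical helper: build the path of a file inside a named snapshot.
# $1 = snapshottable directory, $2 = snapshot name, $3 = path relative to $1
snapshot_path() {
  dir="${1%/}"   # drop one trailing slash from the directory, if present
  rel="${3#/}"   # drop one leading slash from the relative path, if present
  printf '%s/.snapshot/%s/%s\n' "$dir" "$2" "$rel"
}

# Hypothetical helper: print the storage policy reported for the live file
# and for the same file as seen through the snapshot, one after the other.
# Requires the hdfs command on PATH; nothing is executed until it is called.
compare_policies() {
  live="${1%/}/${3#/}"
  snap="$(snapshot_path "$1" "$2" "$3")"
  hdfs storagepolicies -getStoragePolicy -path "$live"
  hdfs storagepolicies -getStoragePolicy -path "$snap"
}
```

For example, {{compare_policies /storagePolicy s3_PROVIDED file_PROVIDED}} 
would query both paths used in the reproduction. With the behaviour reported in 
this issue, the snapshot path would be expected to show the live directory's 
current policy rather than the one in effect when the snapshot was taken.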

> Incorrect storage policy of snapshot file was returned by getStoragePolicy 
> command
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-13021
>                 URL: https://issues.apache.org/jira/browse/HDFS-13021
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, snapshots
>    Affects Versions: 3.1.0
>            Reporter: LiXin Ge
>            Assignee: LiXin Ge
>            Priority: Major
>         Attachments: HDFS-13021.001.patch
>
>
> Snapshots are supposed to be immutable and read-only, so the status of a file 
> under a snapshot path shouldn't follow changes to the original file.
> The storage policy reported for snapshot files currently behaves like a bug.
> -----------
> Reproduction: operations on the snapshottable dir {{/storagePolicy}}
> *before taking the snapshot:*
> {code:java}
>  [bin]# hdfs storagepolicies -setStoragePolicy -path /storagePolicy -policy PROVIDED
>  Set storage policy PROVIDED on /storagePolicy
>  [bin]# hadoop fs -put /home/file /storagePolicy/file_PROVIDED
>  [bin]# hdfs storagepolicies -getStoragePolicy -path /storagePolicy/file_PROVIDED
>  The storage policy of /storagePolicy/file_PROVIDED:
>  BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> *take a snapshot and check:*
> {code:java}
> [bin]# hdfs dfs -createSnapshot /storagePolicy s3_PROVIDED
> Created snapshot /storagePolicy/.snapshot/s3_PROVIDED
> [bin]# hdfs storagepolicies -getStoragePolicy -path /storagePolicy/.snapshot/s3_PROVIDED/file_PROVIDED
> The storage policy of /storagePolicy/.snapshot/s3_PROVIDED/file_PROVIDED:
> BlockStoragePolicy{PROVIDED:1, storageTypes=[PROVIDED, DISK], creationFallbacks=[PROVIDED, DISK], replicationFallbacks=[PROVIDED, DISK]}
> {code}
> *change the StoragePolicy and check again:*
> {code:java}
> [bin]# hdfs storagepolicies -setStoragePolicy -path /storagePolicy -policy HOT
> Set storage policy HOT on /storagePolicy
> [bin]# hdfs storagepolicies -getStoragePolicy -path /storagePolicy/.snapshot/s3_PROVIDED/file_PROVIDED
> The storage policy of /storagePolicy/.snapshot/s3_PROVIDED/file_PROVIDED:
> BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}   ---- It shouldn't be HOT
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
