[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278844#comment-16278844
 ] 

Uma Maheswara Rao G edited comment on HDFS-10285 at 12/5/17 4:52 PM:
---------------------------------------------------------------------

[~chris.douglas] do you have an opinion on how to move forward here? 
Others' thoughts are appreciated as well.
# One option is to keep SPS running in the Namenode, as it is done now. This 
avoids the operational cost and maintenance of an additional process (compared 
to running it outside). Throttling would limit the additional load on the 
Namenode. SPS is an extension of the HSM feature that improves usability, and 
since HSM is a Namenode feature, it makes sense to keep SPS in the Namenode too.
# The other option is to run SPS outside as an independent process, to avoid 
any burden on the Namenode due to SPS. Like the Balancer/DiskBalancer, SPS also 
moves blocks, so it makes sense to run it as a separate process. On the other 
hand, this could increase RPC calls to the Namenode for fetching file metadata 
during processing and for other coordination. It also adds the cost of 
maintaining an extra process in deployments.
Many other points are discussed in the comments above.
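
To make the throttling idea in the first option concrete, here is a minimal sketch, assuming a simple fixed-window cap on how many block moves get scheduled. MoveThrottle, its parameters, and the injected clock are hypothetical names used only for illustration; they are not existing HDFS classes and not necessarily how the patch throttles.

```java
import java.util.function.LongSupplier;

/** Hypothetical fixed-window throttle; not an HDFS class. */
final class MoveThrottle {
    private final int maxMovesPerWindow; // assumed cap on moves per window
    private final long windowMillis;     // assumed window length
    private final LongSupplier clock;    // injected so the sketch is testable
    private long windowStart;
    private int usedInWindow;

    MoveThrottle(int maxMovesPerWindow, long windowMillis, LongSupplier clock) {
        this.maxMovesPerWindow = maxMovesPerWindow;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Returns true if one more block move may be scheduled right now. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            // A new window has started: forget the moves counted so far.
            windowStart = now;
            usedInWindow = 0;
        }
        if (usedInWindow < maxMovesPerWindow) {
            usedInWindow++;
            return true;
        }
        return false; // caller should retry later instead of loading the Namenode
    }
}
```

A worker inside the Namenode would call tryAcquire() before scheduling each move and back off when it returns false. An external SPS process would not need this to protect the Namenode heap, but it would still need some limit on the RPC calls it makes.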



> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-SPS-TestReport-20170708.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. 
> These policies can be set on a directory or file to specify the user's 
> preference for where the physical blocks are stored. When the user sets the 
> storage policy before writing data, the blocks are placed according to that 
> policy. 
> If the user sets the storage policy after the file has been written and 
> completed, the blocks will already have been written with the default storage 
> policy (that is, DISK). The user then has to run the ‘Mover tool’ explicitly, 
> specifying all such file names as a list. In some distributed system 
> scenarios (e.g. HBase) it would be difficult to collect all the files and run 
> the tool, as different nodes can write files separately and the files can 
> have different paths.
> Another scenario is a rename: when the user moves a file from a directory 
> with one effective storage policy (inherited from the parent directory) to a 
> directory with a different effective policy, the inherited storage policy is 
> not copied from the source. The file takes on the storage policy of the 
> destination's parent instead. The rename is just a metadata change in the 
> Namenode, so the physical blocks still remain placed according to the source 
> storage policy.
> So, tracking all such file names, which depend on business logic, across 
> distributed nodes (e.g. region servers) and running the Mover tool on them 
> could be difficult for admins. 
> The proposal here is to provide an API in the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should track 
> such calls and turn them into movement commands for the DNs. 
> Will post the detailed design document soon. 
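
The proposal in the description (an API call that records a request, plus a daemon thread that turns requests into movement commands) can be sketched roughly as follows. This is a toy model under stated assumptions, not the design from the attached documents: SatisfierSketch, requestSatisfy, the two in-memory maps, and the string form of the commands are all hypothetical stand-ins for Namenode metadata and datanode commands.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy model only: none of these names are HDFS classes. */
final class SatisfierSketch {
    // Paths whose blocks should be checked against their storage policy.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    // path -> storage type the effective policy asks for (e.g. "SSD").
    private final Map<String, String> wanted;
    // path -> storage type the blocks currently sit on (e.g. "DISK").
    private final Map<String, String> actual;
    // Movement commands that would be sent to datanodes.
    private final List<String> commands = new ArrayList<>();

    SatisfierSketch(Map<String, String> wanted, Map<String, String> actual) {
        this.wanted = wanted;
        this.actual = actual;
    }

    /** Hypothetical client-facing call: only records the request and returns. */
    void requestSatisfy(String path) {
        pending.add(path);
    }

    /** Handles one path; emits a command only when its blocks are misplaced. */
    private synchronized void handle(String path) {
        String want = wanted.get(path);
        String have = actual.get(path);
        if (want != null && have != null && !want.equals(have)) {
            commands.add("MOVE " + path + " " + have + "->" + want);
        }
    }

    /** Processes everything queued so far; returns the number of paths handled. */
    int drainPending() {
        int handled = 0;
        String path;
        while ((path = pending.poll()) != null) {
            handle(path);
            handled++;
        }
        return handled;
    }

    /** Starts the background tracker as a daemon thread blocking on the queue. */
    Thread startDaemon() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    handle(pending.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly on shutdown
            }
        }, "sps-sketch");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Returns a copy of the commands generated so far. */
    synchronized List<String> commands() {
        return new ArrayList<>(commands);
    }
}
```

The point of the split is that the caller (for example a region server after a rename) only names the path and returns immediately; comparing the effective policy with the actual block placement and scheduling the moves happens asynchronously, so nobody has to collect file lists for the Mover tool.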



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
