[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283102#comment-16283102
 ] 

Uma Maheswara Rao G edited comment on HDFS-10285 at 12/8/17 8:44 AM:
---------------------------------------------------------------------

{quote}
Here's a rhetorical question: If managing multiple services is hard, why not 
bundle oozie, spark, storm, sqoop, kafka, ranger, knox, hive server, etc in the 
same process? Or ZK so HA is easier to deploy/manage?
{quote}
A few of my thoughts on this question: each of those projects was built for its own 
purpose, with its own spec, not just to help HDFS or any other single project. 
And none of them needs to access another project's internal data structures. 
SPS, by contrast, exists only to serve HDFS and works directly on its internal 
data structures. Even if we forcibly separated it out, we would still have to 
expose 'for SPS only' RPC APIs. This prompts me to turn the question around: 
would it make sense to run the ReplicationMonitor as a separate process? Would 
it be fine to run the EDEK work as a separate one? Is it OK to run other 
background threads (like the decommissioning task) as separate processes 
coordinated via RPC, so that the NameSystem class becomes very lightweight? I 
think the value vs. cost trade-off should decide whether to separate a service 
out or keep it in the single process. 

Coming to the ZK part: since ZK was not built only for HDFS, I don't think the 
same reasoning applies there. It is a general-purpose coordination system. 
Technically we can't keep the failover monitoring service inside the NN, 
because the very concern is that the NN may die and need failover, so an 
external process has to do the monitoring. Anyway, I think this whole 
discussion is about services inside a single project, not across projects, 
IMHO.
Here SPS only provides the missing piece of HSM, namely end-to-end policy 
satisfaction. So, IMV, it may not be worth it for users to manage an additional 
process just to get that missing functionality for one particular feature.
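
For context, the end-to-end flow this feature enables for users is roughly the 
following (a sketch against the branch API; exact method/CLI names may differ 
at merge time):
{code:java}
// Sketch of the intended user-facing flow on this branch; exact names may
// differ by merge time.
DistributedFileSystem dfs =
    (DistributedFileSystem) FileSystem.get(conf);
dfs.setStoragePolicy(new Path("/data/archive"), "COLD"); // existing HSM step
dfs.satisfyStoragePolicy(new Path("/data/archive"));     // new SPS call: the NN
                                                          // schedules the block
                                                          // moves, no Mover run
{code}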

{quote}
Today, I looked at the code more closely. It can hold the lock (read lock, but 
still) way too long. Notably, but not limited to, you can’t hold the lock while 
doing block placement.
{quote}

Appreciate your review, Daryn. I think this should be easy to address, and we 
will make sure to fix the lock-holding issue before the merge; does that make 
sense?
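
To be concrete about the direction of the fix, something along these lines (a 
minimal sketch with hypothetical helper names, not the actual SPS code): gather 
the block/storage info under the read lock, release it, and only then compute 
the targets.
{code:java}
// Minimal sketch of the intended fix; helper names are hypothetical.
List<BlockStorageSnapshot> snapshots;
namesystem.readLock();
try {
  // Only cheap, in-memory copying of block/replica/storage info here.
  snapshots = collectBlockAndStorageInfo(fileId);
} finally {
  namesystem.readUnlock();
}
// Target selection (block placement) runs with no namesystem lock held.
List<BlockMovePlan> plan = chooseTargetStorages(snapshots);
scheduleBlockMoves(plan);
{code}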

{quote}
I should start sending bills to everyone who makes this fraudulent claim. . 
FSDirectory#addToInodeMap imposes a nontrivial performance penalty even when 
SPS is not enabled. We had to hack out the similar EZ check because it had a 
noticeable performance impact esp. on startup. However now that we support EZ, 
I need to revisit optimizing it.
{quote}
Thanks for the review! Nice find. Fundamentally, if SPS is disabled we don't 
even need to load anything into the queues, since nothing will process them. 
Adding an 'is SPS enabled' check lets us skip the enqueue calls entirely in the 
disabled case, so the only remaining cost is one extra boolean check when SPS 
is disabled. With this change the impact should be negligible, IIUC. We will 
take this comment. Thanks.
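
Roughly what we have in mind, as a sketch only (the flag and queue names below 
are placeholders, not the final patch):
{code:java}
// Sketch of the proposed guard in FSDirectory#addToInodeMap; the flag and
// queue names are placeholders, not the final patch.
void addToInodeMap(INode inode) {
  inodeMap.put(inode);
  if (!spsEnabled) {
    // When SPS is disabled, the only added cost is this boolean check.
    return;
  }
  if (inode.isFile() && hasSatisfyStoragePolicyXAttr(inode)) {
    spsPathQueue.add(inode.getId()); // consumed later by the SPS thread
  }
}
{code}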

{quote}
I’m curious why it isn’t just part of the standard replication monitoring. If 
the DN is told to replicate to itself, it just does the storage movement.
{quote}
That's a good question. The overall approach is very similar to the 
ReplicationMonitor (RM). The RM builds up its own queues of under-replicated 
blocks, and the under-replication scan/check happens at the block level, which 
makes sense there. In SPS, however, the policy changes at the file level, so 
all blocks of that file need movement, and the policy check has to be done in 
coordination with where the replicas are currently stored. So here we track 
the queues at the file level and scan/check all blocks of a file together at 
once. We also wanted to provide an on-the-fly reconfigure feature, and we 
deliberately avoided interfering with the replication logic: replication 
should be given higher priority than SPS work. While scheduling blocks we 
respect the xmits counts, which are shared between RM and SPS to control DN 
load, and when sending tasks to a DN, assignment priority goes to 
replication/EC blocks first and then to SPS blocks. So, as part of the impact 
analysis, we concluded that keeping SPS in its own thread, rather than running 
it in the same loop as the RM, would be cleaner and safer.
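
To make the xmits/priority point concrete, here is a purely illustrative 
sketch (all names are hypothetical, not the actual BlockManager/SPS code):
{code:java}
// Purely illustrative; names are hypothetical, not the actual scheduling code.
int remaining = maxTransfersPerDatanode - dn.getXmitsInProgress();
// Replication/EC reconstruction work is handed out first...
for (ReplicationTask task : pendingReplicationTasks(dn)) {
  if (remaining <= 0) break;
  assignToDatanode(dn, task);
  remaining--;
}
// ...and SPS block moves only consume whatever xmits budget is left.
for (BlockMoveTask move : pendingSpsBlockMoves(dn)) {
  if (remaining <= 0) break;
  assignToDatanode(dn, move);
  remaining--;
}
{code}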


> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-SPS-TestReport-20170708.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. 
> These policies can be set on a directory/file to specify the user's 
> preference for where the physical blocks should be stored. When the user 
> sets the storage policy before writing data, the blocks can take advantage 
> of the policy preference and are placed on the preferred storage accordingly. 
> If the user sets the storage policy after the file is written and closed, 
> the blocks will already have been written with the default storage policy 
> (i.e. DISK). The user then has to run the 'Mover' tool explicitly, passing 
> all such file names as a list. In some distributed scenarios (e.g. HBase) it 
> would be difficult to collect all the files and run the tool, since 
> different nodes write files independently and the files can have different paths.
> Another scenario: when a user renames a file from a directory with one 
> storage policy (inherited from the parent directory) into a directory with a 
> different policy, the inherited storage policy is not copied from the 
> source; the file instead takes the destination parent's storage policy. The 
> rename is just a metadata change in the Namenode, so the physical blocks 
> still remain placed according to the source storage policy.
> So, tracking all such application-specific file names across distributed 
> nodes (e.g. region servers) and then running the Mover tool can be difficult 
> for admins. The proposal here is to provide an API in the Namenode itself to 
> trigger storage policy satisfaction. A daemon thread inside the Namenode 
> would track such calls and translate them into block movement commands sent 
> to the DNs. 
> Will post the detailed design document soon. 


