[jira] [Commented] (HADOOP-17028) ViewFS should initialize target filesystems lazily
[ https://issues.apache.org/jira/browse/HADOOP-17028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384553#comment-17384553 ]

Konstantin Shvachko commented on HADOOP-17028:
----------------------------------------------

Committed PR #3218 to branch-2.10. Thank you [~abhishekd]

> ViewFS should initialize target filesystems lazily
> --------------------------------------------------
>
>                 Key: HADOOP-17028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17028
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: client-mounts, fs, viewfs
>    Affects Versions: 3.2.1
>            Reporter: Uma Maheswara Rao G
>            Assignee: Abhishek Das
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Currently ViewFS initializes all configured target filesystems during
> viewfs#init itself.
> Some target filesystem initializations involve creating heavy objects and
> proxy connections. For example, DistributedFileSystem#initialize creates a
> DFSClient object, which creates proxy connections to the NN, etc.
> For example, suppose ViewFS is configured with 10 target filesystems with
> hdfs URIs and 2 targets with s3a.
> If a client works only with an s3a target, ViewFS still initializes all
> targets, irrespective of which ones the client is interested in. That means
> the client performs 10 DFS initializations and 2 s3a initializations. The
> DFS initializations are unnecessary here. So it would be a good idea to
> initialize a target fs only when the first usage call comes to that
> particular target fs scheme.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17028) ViewFS should initialize target filesystems lazily
[ https://issues.apache.org/jira/browse/HADOOP-17028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380242#comment-17380242 ]

Konstantin Shvachko commented on HADOOP-17028:
----------------------------------------------

I committed this to trunk and branches 3.1, 3.2, 3.3. Thanks [~abhishekd] for working on it.
Will need a separate patch for branch-2.10. Too many conflicts.
[jira] [Commented] (HADOOP-17028) ViewFS should initialize target filesystems lazily
[ https://issues.apache.org/jira/browse/HADOOP-17028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370546#comment-17370546 ]

Steve Loughran commented on HADOOP-17028:
-----------------------------------------

bq. Steve, this looks like incompatible change as it replaced the parameter to a different class...

In a world of @Functional params, it is compatible. But I suspect that you are right about link-time matching; the JVM probably isn't going to blindly treat them as equivalent.

bq. One good thing about it that you can remove the deprecated classes as they have never been used.

Not used in hadoop-*. My fear was that they had been picked up/used in Google GCS. I've been building that connector and I don't see it in use. Given that, I'll target it for removal in 3.3.2.

bq. You don't seem to care about older versions, but many people do.

I really do. I am the one currently staring at the ABFS code and a branch-2 build. At the same time, being more isolated (and, let's be ruthless: lower risk), the object storage code has been able to evolve faster than other bits of the codebase.

FWIW the main trouble spot in backporting cloud storage changes, especially in hadoop-azure, is actually Mockito versions. Something which works on Mockito 2 is still likely to compile on Mockito 1.x but then fail with some "impossible" stack trace, and I'm left trying to distinguish between "Mockito lib version issues, fix test or cut", "different codepath breaks Mockito test" and "test has actually found a regression". It's a key reason why I don't like Mockito-based testing.
[jira] [Commented] (HADOOP-17028) ViewFS should initialize target filesystems lazily
[ https://issues.apache.org/jira/browse/HADOOP-17028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370329#comment-17370329 ]

Konstantin Shvachko commented on HADOOP-17028:
----------------------------------------------

Steve, this looks like an incompatible change, as it replaced the parameter with a different class:
{code}
public static CompletableFuture eval(
-      FunctionsRaisingIOE.CallableRaisingIOE callable) {
+      CallableRaisingIOE callable) {
    CompletableFuture result = new CompletableFuture<>();
{code}
Also, the introduction of dead code should have been avoided. Looking at the history, you introduced {{org.apache.hadoop.fs.impl.FunctionsRaisingIOE.FunctionRaisingIOE}} as a part of HADOOP-15183. But it wasn't used anywhere. Then HADOOP-17450 deprecated it.
One good thing about it is that you can remove the deprecated classes, as they have never been used.
Supporting dead code is just a waste of energy; it also makes backporting really hard. You don't seem to care about older versions, but many people do.
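The link-time concern in the comment above can be illustrated with a small, self-contained sketch. It does not use Hadoop classes: {{OldCallable}}, {{NewCallable}}, {{evalOld}} and {{evalNew}} are hypothetical stand-ins for the old nested interface, the relocated interface, and the two variants of {{eval}}. The point it tries to show is that a lambda compiles against either signature, while the compiled method signature still names one specific parameter class, which is what the JVM matches on when linking already-compiled callers.

```java
import java.io.IOException;
import java.lang.reflect.Method;

final class DescriptorSketch {
  /** Hypothetical stand-in for the old nested interface. */
  @FunctionalInterface
  interface OldCallable<T> {
    T apply() throws IOException;
  }

  /** Hypothetical stand-in for the relocated interface: same shape, different type. */
  @FunctionalInterface
  interface NewCallable<T> {
    T apply() throws IOException;
  }

  static <T> T evalOld(OldCallable<T> callable) throws IOException {
    return callable.apply();
  }

  static <T> T evalNew(NewCallable<T> callable) throws IOException {
    return callable.apply();
  }

  /** Returns the simple name of the first parameter type of the named method. */
  static String parameterTypeOf(String methodName) {
    for (Method m : DescriptorSketch.class.getDeclaredMethods()) {
      if (m.getName().equals(methodName)) {
        return m.getParameterTypes()[0].getSimpleName();
      }
    }
    throw new IllegalArgumentException("no such method: " + methodName);
  }

  public static void main(String[] args) throws IOException {
    // Source level: the same lambda text is accepted by both signatures.
    System.out.println(evalOld(() -> "ok") + " " + evalNew(() -> "ok"));
    // Link level: each compiled signature names a different parameter class.
    System.out.println(parameterTypeOf("evalOld") + " vs " + parameterTypeOf("evalNew"));
  }
}
```

Under these assumptions, callers that are recompiled and pass lambdas are unaffected, whereas a caller compiled against the old signature would look for a method taking the old class and fail to link if only the new variant exists.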
[jira] [Commented] (HADOOP-17028) ViewFS should initialize target filesystems lazily
[ https://issues.apache.org/jira/browse/HADOOP-17028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358592#comment-17358592 ]

Steve Loughran commented on HADOOP-17028:
-----------------------------------------

bq. It looks like Steve Loughran deprecated this class later after his comment. Don't know what prompted the refactoring, but {{org.apache.hadoop.fs.impl.FunctionsRaisingIOE.FunctionRaisingIOE}} should be now {{org.apache.hadoop.util.functional.FunctionRaisingIOE}}

I moved it because it turns out that wrapping/unwrapping IOEs is critical to using this in applications using the FS API, and since the relevant methods/interfaces were not public, the only way to do that is to make them public.

Accordingly I
# replicated the functional interfaces in the public/unstable package {{org.apache.hadoop.util.functional}}, where I'm trying to make it possible to use IOE-raising stuff (including RemoteIterator) in apps.
# tagged the old ones as Deprecated, so new code will use the new ones.

bq. Moving interfaces, which are public by default, from one package to another is considered an incompatible change, especially since the previous variant had been released

Two points to note
# I had tagged the original package, fs.impl, as not public back then.
# I left the old interface alone, on the basis that if filesystems outside the hadoop codebase (gcs?) were using them, all would be good.

{code}
@InterfaceAudience.LimitedPrivate("Filesystems")
@InterfaceStability.Unstable
{code}

Therefore, I do not consider this to be an incompatible change, since
# it wasn't public outside filesystems
# it hasn't been removed, just deprecated.

bq. I prefer to avoid using FunctionRaisingIOE in this patch if possible

Use the o.a.h.fs.impl one for compatibility across Hadoop 3.3+.
For older releases, no, it's not there. Not sure what to do there.
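The "replicate, then deprecate" approach described in the comment above might look roughly like the following self-contained sketch. The nested interfaces {{LegacyFunctionRaisingIOE}} and {{FunctionRaisingIOE}}, and the helpers {{invoke}} and {{adapt}}, are hypothetical stand-ins declared here for illustration; they are not the Hadoop types. The sketch keeps the old interface in place with a {{@Deprecated}} tag, has new code accept only the replacement, and bridges holders of the old type with a method reference.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class DeprecationSketch {
  /** Hypothetical old interface: left in place, but marked deprecated. */
  @Deprecated
  @FunctionalInterface
  interface LegacyFunctionRaisingIOE<T, R> {
    R apply(T t) throws IOException;
  }

  /** Hypothetical replacement with the same shape, intended for new code. */
  @FunctionalInterface
  interface FunctionRaisingIOE<T, R> {
    R apply(T t) throws IOException;
  }

  /** New code accepts only the replacement type. */
  static <T, R> R invoke(FunctionRaisingIOE<T, R> fn, T arg) throws IOException {
    return fn.apply(arg);
  }

  /** Bridges a caller that still holds the deprecated type. */
  static <T, R> FunctionRaisingIOE<T, R> adapt(LegacyFunctionRaisingIOE<T, R> legacy) {
    return legacy::apply;
  }

  /** Runs a legacy function through the new entry point and returns its result. */
  static int demo() {
    LegacyFunctionRaisingIOE<String, Integer> legacy = String::length;
    try {
      return invoke(adapt(legacy), "viewfs");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

This only addresses releases where both interfaces exist; as the comment notes, it offers nothing for older branches where neither is present.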
[jira] [Commented] (HADOOP-17028) ViewFS should initialize target filesystems lazily
[ https://issues.apache.org/jira/browse/HADOOP-17028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357581#comment-17357581 ]

Konstantin Shvachko commented on HADOOP-17028:
----------------------------------------------

I'll re-post my comment from the linked PR here for visibility.

It looks like [~ste...@apache.org] deprecated this class later, after his comment. I don't know what prompted the refactoring, but {{org.apache.hadoop.fs.impl.FunctionsRaisingIOE.FunctionRaisingIOE}} should now be {{org.apache.hadoop.util.functional.FunctionRaisingIOE}}.
Moving interfaces, which are public by default, from one package to another is considered an incompatible change, especially since the previous variant had been released. Besides, it has been committed only to branch-3.3, contributing to further divergence of the supported branches and making backporting yet harder.
So [~abhishekd], I prefer to avoid using {{FunctionRaisingIOE}} in this patch if possible. This will simplify the backport and avoid using unstable APIs.
[jira] [Commented] (HADOOP-17028) ViewFS should initialize target filesystems lazily
[ https://issues.apache.org/jira/browse/HADOOP-17028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356063#comment-17356063 ]

Konstantin Shvachko commented on HADOOP-17028:
----------------------------------------------

Left minor comments on the PR. The approach looks reasonable to me.
The important thing is to get a Jenkins build. It looks like it could not run. Maybe your fork got stale. You should probably rebase on current trunk.
[jira] [Commented] (HADOOP-17028) ViewFS should initialize target filesystems lazily
[ https://issues.apache.org/jira/browse/HADOOP-17028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188186#comment-17188186 ]

Uma Maheswara Rao G commented on HADOOP-17028:
----------------------------------------------

Thank you [~abhishekd] for the patch. I have this on my list; I will provide my feedback soon. Thank you.
[jira] [Commented] (HADOOP-17028) ViewFS should initialize target filesystems lazily
[ https://issues.apache.org/jira/browse/HADOOP-17028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186870#comment-17186870 ]

Abhishek Das commented on HADOOP-17028:
---------------------------------------

PR for this: [https://github.com/apache/hadoop/pull/2260]

[~umamaheswararao] can you please review the change when you get a chance? Thanks in advance.
[jira] [Commented] (HADOOP-17028) ViewFS should initialize target filesystems lazily
[ https://issues.apache.org/jira/browse/HADOOP-17028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127459#comment-17127459 ]

Abhishek Das commented on HADOOP-17028:
---------------------------------------

The filesystems for both the non-leaf nodes (InternalDirOfViewFs) and the leaf nodes (ChRootedFileSystem) get constructed during the viewfs initialize phase. During this, the fs object gets created through FsGetter.

Here is my rough idea about the implementation.
 * Make sure FsGetter.getNewInstance() doesn't initialize the FileSystem object when asked for one.
 * For InternalDirOfViewFs, initialize is called in the constructor, but this call invokes the base initialize method.
 * For ChRootedFileSystem, if the constructor is invoked through ChRootedFileSystem(final URI uri, Configuration conf), it gets the fs object from FileSystem.get(uri, conf), which will initialize the fs object; but this constructor is not used except in tests, so we are good. The other constructor gets the fs object as an argument, so we have to make sure the caller won't initialize the fs object before invoking this constructor.
 * When a FileSystem API gets invoked for ChRootedFileSystem, before calling the actual implementation of the underlying fs object through FilterFileSystem, it can check whether the fs object has been initialized, so that the fs object gets initialized only once. We can tap into ChRootedFileSystem.fullPath(path) to check the initialization (through a class-level variable).

[~umamaheswararao] let me know your thoughts about the approach. I can start working on this.
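The last bullet of the proposal above (check an "initialized" flag on the path every call goes through) can be sketched in isolation as follows. This is only an illustration of the idea, not the patch itself: {{LazyTarget}} and {{LazyInitSketch}} are hypothetical classes, the {{Runnable}} stands in for an expensive {{initialize}} call, and the mount layout (10 hdfs targets, 2 s3a targets) is taken from the issue description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a mount target whose initialization is expensive. */
final class LazyTarget {
  private final String uri;
  private final Runnable initializer; // simulates the costly initialize step
  private boolean initialized;        // class-level guard, as in the proposal

  LazyTarget(String uri, Runnable initializer) {
    this.uri = uri;
    this.initializer = initializer;
  }

  /** Runs the initializer at most once, on first use. */
  private synchronized void ensureInitialized() {
    if (!initialized) {
      initializer.run();
      initialized = true;
    }
  }

  /** Every path-based call funnels through here, so this is where the check sits. */
  String fullPath(String path) {
    ensureInitialized();
    return uri + path;
  }
}

final class LazyInitSketch {
  /** Builds 10 hdfs and 2 s3a targets, uses one s3a target, returns what got initialized. */
  static List<String> initializedAfterS3aOnlyUse() {
    List<String> initializedUris = new ArrayList<>();
    List<LazyTarget> targets = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      String uri = "hdfs://nn" + i;
      targets.add(new LazyTarget(uri, () -> initializedUris.add(uri)));
    }
    for (int i = 0; i < 2; i++) {
      String uri = "s3a://bucket" + i;
      targets.add(new LazyTarget(uri, () -> initializedUris.add(uri)));
    }
    // The client touches only the first s3a target; two calls show once-only init.
    LazyTarget s3a = targets.get(10);
    s3a.fullPath("/data/a");
    s3a.fullPath("/data/b");
    return initializedUris;
  }

  public static void main(String[] args) {
    System.out.println(initializedAfterS3aOnlyUse());
  }
}
```

In this sketch the guard is a synchronized method for simplicity; whether the real change needs that, or a different form of thread safety, is a design question for the patch.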
[jira] [Commented] (HADOOP-17028) ViewFS should initialize target filesystems lazily
[ https://issues.apache.org/jira/browse/HADOOP-17028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099863#comment-17099863 ]

Steve Loughran commented on HADOOP-17028:
-----------------------------------------

This would be good. All the object stores can be slow.

FWIW, if you set fs.s3a.bucket.probe = 0, we skip that check for a bucket, but S3Guard still kicks off its conversation with DynamoDB.