On Sun, 13 Dec 2020 at 21:08, Konstantin Shvachko <shv.had...@gmail.com> wrote:

> Hi Steve,
>
> I am not sure I fully understand what is broken here. It is not an
> incompatible change, right?
>

The issue is that the FileSystem/FileContext APIs are something we have to maintain ~forever, so every API change needs to be

- something we are happy with being there for the life of the class
- defined strictly enough that people implementing other filesystems can re-implement without having to reverse-engineer HDFS and then conclude "that is what they meant to do". That's with a bit of
- and with an AbstractFileSystemContractTest for implementors.

That's it: define, specify, add a contract test rather than just something for HDFS.

> Could you please explain what you think the process is.
> Would be best if you could share a link to a document describing it.
> I would be glad to follow up with tests and documentation that are needed.
>

ideally, hadoop-common-project/hadoop-common/src/site/markdown/filesystem/extending.md

Pulling up something from hdfs is different from saying "hey, new rename API!", but it's still time to actually define what it does, so that not only can other people like me reimplement it, but it is defined well enough that we can all see when there's a regression.

Equally important: is there a way to test that it works?

We've been using hasPathCapabilities() to probe for an FS having a given feature down a path; the idea is to let code check upfront for a feature before having to call it and catch an exception. We can add that for an API even if it has shipped already. For example, here is a PR to do exactly that for Syncable:

https://github.com/apache/hadoop/pull/2102

> As you can see I proposed multiple solutions to the problem in the jira.
> Seemed nobody was objecting, so I chose one and explained why.
> I believe we call it lazy consensus.
>

I'm happy with lazy consensus, but can you involve more people? In particular, it was filed in an HDFS JIRA so it didn't surface in hadoop-common.
If you'd done a HADOOP- JIRA "pull up msync into FileSystem API" or even just a note to hadoop-common saying "we need to do this", that would have been enough to start a discussion. As it is I only noticed after some rebasing, with a "hang on a minute. here's a new method whose behaviour doesn't seem to have been defined other than 'whatever hdfs does'". Which, if you've ever tried to work out what rename() does, you'll recognise as danger.

Anyway, to finish off, have a look at the extending.md doc and just add a new method definition in filesystem.md saying what it is meant to do.

Now: what about viewfs? Maybe: for all mounted filesystems which declare their support, call msync()? Or just "call it and swallow all exceptions"?

> Stay safe,
>

yeah :)

> --Konstantin
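
To make the first viewfs option concrete, here is a rough sketch of "probe each mounted filesystem, then call msync() only where support is declared". It deliberately does not use the Hadoop classes: MountedFs, SimpleFs, MsyncFanOut and the capability string are all hypothetical stand-ins invented for illustration, and this is not a proposal for how viewfs should actually implement it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical fan-out helper; models the "probe, then call" idea only.
final class MsyncFanOut {
    // Hypothetical capability key, not a real Hadoop constant.
    static final String CAPABILITY_MSYNC = "example.capability.msync";

    // Calls msync() on every mount that declares support; skips the rest.
    // Returns the names of the filesystems that were synced.
    static List<String> msyncWhereSupported(List<MountedFs> mounts) throws IOException {
        List<String> synced = new ArrayList<>();
        for (MountedFs fs : mounts) {
            if (fs.hasCapability(CAPABILITY_MSYNC)) {
                fs.msync();
                synced.add(fs.name());
            }
        }
        return synced;
    }

    public static void main(String[] args) throws IOException {
        List<MountedFs> mounts = Arrays.asList(
            new SimpleFs("hdfs-a", true),
            new SimpleFs("other", false),
            new SimpleFs("hdfs-b", true));
        System.out.println(msyncWhereSupported(mounts));
    }
}

// Hypothetical stand-in for a mounted filesystem.
interface MountedFs {
    String name();
    // True only if this filesystem declares support for the capability.
    boolean hasCapability(String capability);
    // Stand-in for msync(); unsupported filesystems throw.
    void msync() throws IOException;
}

// Minimal in-memory implementation that counts msync() calls.
final class SimpleFs implements MountedFs {
    private final String name;
    private final boolean supportsMsync;
    int msyncCalls = 0;

    SimpleFs(String name, boolean supportsMsync) {
        this.name = name;
        this.supportsMsync = supportsMsync;
    }

    public String name() {
        return name;
    }

    public boolean hasCapability(String capability) {
        return supportsMsync && MsyncFanOut.CAPABILITY_MSYNC.equals(capability);
    }

    public void msync() throws IOException {
        if (!supportsMsync) {
            throw new UnsupportedOperationException(name + " does not support msync");
        }
        msyncCalls++;
    }
}
```

The point of probing first is that a filesystem without support is never called, so nothing has to be caught and discarded; the "swallow all exceptions" variant would instead hide real failures from the filesystems that do support it.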