Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

Steve Loughran Tue, 15 Dec 2020 11:16:06 -0800

On Sun, 13 Dec 2020 at 21:08, Konstantin Shvachko <shv.had...@gmail.com>
wrote:


> Hi Steve,
>
> I am not sure I fully understand what is broken here. It is not an
> incompatible change, right?
>

The issue is that the FileSystem/FileContext APIs are something we have to
maintain ~forever, so every API change needs to be

- something we are happy with being there for the life of the class
- defined strictly enough that people implementing other filesystems can
re-implement without having to reverse-engineer HDFS and then conclude
"that is what they meant to do". That's with a bit of
- and with an AbstractFileSystemContractTest for implementors.

That's it: define, specify, add a contract test rather than just something
for HDFS.



> Could you please explain what you think the process is.
> Would be best if you could share a link to a document describing it.
> I would be glad to follow up with tests and documentation that are needed.
>
>
Ideally,
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/extending.md

Pulling up something from HDFS is different from saying "hey, new rename
API!", but it's still time to actually define what it does, well enough
that other people like me can reimplement it and that we can all see when
there's a regression.

Equally important: is there a way to test that it works?

We've been using hasPathCapability() to probe for an FS having a given
feature under a path; the idea is to let code check upfront for a feature
rather than having to call it and catch an exception.

We can add that for an API even if it has shipped already. For example,
here is a PR to do exactly that for Syncable
https://github.com/apache/hadoop/pull/2102
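
A minimal sketch of that probe-before-call pattern, applied to msync().
Everything here is a stand-in so the example runs on its own: the
ProbeableFs interface, the FakeFs class, and the capability string
"fs.capability.msync" are hypothetical names for illustration, not taken
from the Hadoop source. In Hadoop itself the probe takes a Path and a
capability string.

```java
import java.io.IOException;

public class CapabilityProbeSketch {

    /** Hypothetical capability name; an assumption, not a real Hadoop constant. */
    static final String CAP_MSYNC = "fs.capability.msync";

    /** Minimal stand-in for the parts of a filesystem this sketch needs. */
    interface ProbeableFs {
        boolean hasPathCapability(String path, String capability) throws IOException;
        void msync() throws IOException;
    }

    /**
     * Calls msync() only if the filesystem declares support under the path.
     * Returns true if msync() was invoked, false if it was skipped.
     */
    static boolean msyncIfSupported(ProbeableFs fs, String path) throws IOException {
        if (!fs.hasPathCapability(path, CAP_MSYNC)) {
            return false; // feature absent: skip without relying on an exception
        }
        fs.msync();
        return true;
    }

    /** Toy filesystem used only to exercise the sketch. */
    static final class FakeFs implements ProbeableFs {
        private final boolean supported;
        int msyncCalls = 0;

        FakeFs(boolean supported) {
            this.supported = supported;
        }

        @Override
        public boolean hasPathCapability(String path, String capability) {
            return supported && CAP_MSYNC.equals(capability);
        }

        @Override
        public void msync() {
            if (!supported) {
                throw new UnsupportedOperationException("msync");
            }
            msyncCalls++;
        }
    }

    public static void main(String[] args) throws IOException {
        FakeFs hdfsLike = new FakeFs(true);
        FakeFs other = new FakeFs(false);
        System.out.println(msyncIfSupported(hdfsLike, "/data"));
        System.out.println(msyncIfSupported(other, "/data"));
    }
}
```

The caller never sees UnsupportedOperationException from a store that
lacks the feature, which is the point of probing first.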



> As you can see I proposed multiple solutions to the problem in the jira.
> Seemed nobody was objecting, so I chose one and explained why.
> I believe we call it lazy consensus.
>

I'm happy with lazy consensus, but can you involve more people? In
particular, it was filed in an HDFS JIRA, so it didn't surface in
hadoop-common.

If you'd done a HADOOP- JIRA "pull up msync into FileSystem API" or even
just a note to hadoop-common saying "we need to do this" that would have
been enough to start a discussion.

As it is, I only noticed after some rebasing, with a "hang on a minute,
here's a new method whose behaviour doesn't seem to have been defined other
than 'whatever hdfs does'". Which, if you've ever tried to work out what
rename() does, you'll recognise as danger.

Anyway, to finish off, have a look at the extending.md doc and just add a
new method definition in the filesystem.md saying what it is meant to do.

Now: what about viewfs? Maybe: for all mounted filesystems which declare
their support, call msync()? Or just "call it and swallow all exceptions"?
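
To make the two viewfs options concrete, here is a rough sketch of both.
It is not how viewfs is implemented: the MountedFs interface, its
supportsMsync() probe, and the FakeMount class are hypothetical stand-ins
so the example is self-contained.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ViewFsMsyncSketch {

    /** Hypothetical view of one mounted filesystem; not a Hadoop type. */
    interface MountedFs {
        String name();
        boolean supportsMsync();
        void msync() throws IOException;
    }

    /** Option 1: call msync() only on mounts that declare support; failures propagate. */
    static List<String> msyncDeclared(List<MountedFs> mounts) throws IOException {
        List<String> synced = new ArrayList<>();
        for (MountedFs fs : mounts) {
            if (fs.supportsMsync()) {
                fs.msync();
                synced.add(fs.name());
            }
        }
        return synced;
    }

    /** Option 2: call msync() on every mount and swallow whatever is thrown. */
    static List<String> msyncBestEffort(List<MountedFs> mounts) {
        List<String> synced = new ArrayList<>();
        for (MountedFs fs : mounts) {
            try {
                fs.msync();
                synced.add(fs.name());
            } catch (IOException | UnsupportedOperationException e) {
                // Swallowed: "unsupported" and "genuinely failed" look the same here.
            }
        }
        return synced;
    }

    /** Toy mount used only to exercise the sketch. */
    static final class FakeMount implements MountedFs {
        private final String name;
        private final boolean supported;

        FakeMount(String name, boolean supported) {
            this.name = name;
            this.supported = supported;
        }

        @Override public String name() { return name; }
        @Override public boolean supportsMsync() { return supported; }

        @Override
        public void msync() {
            if (!supported) {
                throw new UnsupportedOperationException("msync on " + name);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<MountedFs> mounts = Arrays.asList(
            new FakeMount("hdfs", true),
            new FakeMount("objectstore", false));
        System.out.println(msyncDeclared(mounts));
        System.out.println(msyncBestEffort(mounts));
    }
}
```

Both options sync the same mounts in this toy case. The difference is in
failure handling: option 1 surfaces a real msync() failure on a supporting
mount, while option 2 hides it alongside the "not supported" cases.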


> Stay safe,
>


yeah :)

> --Konstantin
>
