As Chris mentioned earlier, it would be wise to do this in pieces that can
be reviewed properly.
Bringing large refactorings in all at once, as Garret mentioned, is not
likely to just get a +1.

We do have a feature branch process with its own criteria, and we could
define specific criteria for such a critical piece of code going through
such changes.

The question of funding strikes me as odd on this list - though I
understand the motivation and relevance to you, for sure.
Feel free to look me up on LI and if there are contacts or anything that I
can help with let me know.

In the meantime, I would think this thread and list should discuss
solutions to problems and our ability to get there with reasonable effort
and practices that we are familiar with.

On Wed, Nov 30, 2022 at 12:46 PM Garret Wilson <gar...@globalmentor.com>
wrote:

> On 11/29/2022 8:16 AM, Gautham Banasandra wrote:
> > …
> > However, I don't see anyone stopping you from working on removing
> > winutils. I encourage you to put across a PR and I would be glad to
> > review the same.
>
> That's not how it works. This is an intense undertaking. If I spend six
> months with no income, just rewriting all the native `FileSystem`
> implementations, and you simply gave thumbs up to the PR, then yay,
> Apache would integrate my changes into the codebase? I hardly think so.
> There has to be official buy-in across the group and authorization to
> make such extensive changes. It's naive to say, "oh, just go rewrite it,
> and I'll review it and then it will be done".
>
> However I am interested in who funds your work. Do you work on Apache
> for free in your extra time? Or does some corporation pay you? If the
> latter, I'll be happy to submit my resume to them, so that they can fund
> me as well and I can start the work immediately. But as I've mentioned a
> couple of times already, financially I cannot justify sitting here
> rewriting Hadoop file systems without any income. If you find a creative
> way for it to be financially viable for me, I would love to do it.
>
> > One question I have is - how will you validate that your changes work
> > fine and don't regress the existing functionality, given that we don't
> > yet have a CI for Hadoop on Windows?
>
> It's tempting to start to give you a detailed answer here, because it's
> a legitimate question. The more general answer is that we would discuss
> and form a plan with the group; you'll likely find that 1) the existing
> code doesn't even have sufficient tests, and 2) the existing API isn't
> even sufficiently documented. But your question was formulated in a way
> completely different from how I conceptualize the issue. What I would be
> writing would be a completely native Java implementation of
> `FileSystem`. The tests accordingly should be written agnostic to the
> platform. If the tests run on Linux, they will run on Windows; if not,
> we need to file a bug against the JDK. I'm not even thinking in terms of
> a "CI for Hadoop for Windows". I just want to build the Java project,
> whether I'm running on Mac or Linux or Windows or whatever. (That was
> the point of my wanting to get rid of Winutils to begin with.)
>
> I also know that pragmatically whatever I do with the `FileSystem`
> implementation, something will initially break—not because of anything I
> did incorrectly, but because the Hadoop API is inadequate and people
> have therefore made a thousand brittle assumptions in their use of the
> API. Things will break already with or without my `FileSystem`
> implementation; that's why Hadoop is still using
> `DeprecatedRawLocalFileStatus`: someone made a new version but had to
> switch it off because something broke (HADOOP-9652
> <https://issues.apache.org/jira/browse/HADOOP-9652> according to the
> comments).
>
> In summary, yes, if I ever get buy-in and funding to rewrite
> `FileSystem` for native Java, we need to have a discussion with the
> wider group to form a plan for improving the documentation and for
> testing. But whatever we discuss or plan, things will eventually
> break because Hadoop doesn't have a well-documented API and doesn't
> cleanly separate the interface from the implementation. If I were to
> work on it, I would improve that situation so that things would be
> better documented and less brittle.
>
> In the meantime my Bare Naked Local FileSystem
> <https://github.com/globalmentor/hadoop-bare-naked-local-fs> is meeting
> my needs pragmatically, and I'm leaving this mailing list—not to be
> antisocial, but because the unrelated (mostly automated) chatter is
> distracting to my other work.
>
> Have a wonderful holiday season, and feel free to reach out directly.
>
> Best,
>
> Garret
>
