I would like to start a discussion about making Ozone a separate Apache
project.
### HISTORY [1]
* Apache Hadoop Ozone development started on a feature branch of the
Hadoop repository (HDFS-7240)
* In October 2017 a discussion was started about merging it into
the Hadoop main branch
* After a long discussion, it was merged into Hadoop trunk in March 2018
* During the discussion of the merge, it was suggested multiple times
to create a separate project for Ozone. But at that time:
1). Ozone was tightly integrated with Hadoop/HDFS
2). There was an active plan to use the block layer of Ozone (HDDS, or
HDSL at that time) as the block layer of HDFS
3). The community of Ozone was a subset of the HDFS community
* The first beta release of Ozone has just been published. This seems
to be a good time, before the first GA, to make a decision about its future.
### WHAT HAS BEEN CHANGED
Over the last few years Ozone has become more and more independent on
both the community and the code side. The separation has been suggested
again and again (for example, by Owen [2] and Vinod [3]).
From the COMMUNITY point of view:
* Fortunately, more and more new contributors are helping Ozone.
Originally the Ozone community was a subset of the HDFS project. But now
an increasingly large part of the community works on Ozone only.
* It seems to be easier to _build_ the community as a separate project.
* A new, younger project might have different practices
(communication, committer criteria, development style) compared to an
old, mature project
* It's easier to communicate (and improve) these standards in a
separate project with clean boundaries
* A separate project/brand can help to increase the adoption rate and
attract more individual contributors (AFAIK this was seen in Submarine
after a similar move)
* The contribution process can be communicated more easily, and we can
make first-time contributions easier
From the CODE point of view, Ozone has become more and more independent:
* Ozone has a different release cycle
* The code is already separated from the Hadoop code base
(apache/hadoop-ozone.git)
* It has a separate CI (GitHub Actions)
* Ozone uses a different (stricter) coding style (zero tolerance for
unit test / checkstyle errors)
* The code itself has become more and more independent from Hadoop at
the Maven level. Originally it was compiled together with the in-tree
latest Hadoop snapshot. Now it depends on released Hadoop artifacts
(RPC, Configuration...)
* It is starting to use multiple versions of Hadoop (on the client side)
* The volume of resolved issues is already very high on the Ozone side
(in the last 2-3 months Ozone had slightly more resolved issues than
HDFS/YARN/MAPREDUCE/COMMON all together)
Summary: Before the first Ozone GA release, it seems to be a good time
to discuss the long-term future of Ozone. Managing it as a separate
top-level project (TLP) seems to have more benefits.
Please let me know what your opinion is...
Thanks a lot,
Marton
[1]: For more details, see:
https://github.com/apache/hadoop-ozone/blob/master/HISTORY.md
[2]: https://lists.apache.org/thread.html/0d0253f6e5fa4f609bd9b917df8e1e4d8848e2b7fdb3099b730095e6%40%3Cprivate.hadoop.apache.org%3E
[3]: https://lists.apache.org/thread.html/8be74421ea495a62e159f2b15d74627c63ea1f67a2464fa02c85d4aa%40%3Chdfs-dev.hadoop.apache.org%3E