I would like to start a discussion about making Ozone a separate Apache project.



### HISTORY [1]

* Apache Hadoop Ozone development started on a feature branch of the Hadoop repository (HDFS-7240)

* In October 2017, a discussion was started about merging it into the Hadoop main branch

* After a long discussion, it was merged into Hadoop trunk in March 2018

* During the merge discussion, it was suggested multiple times that a separate project be created for Ozone. But at that time:
    1). Ozone was tightly integrated with Hadoop/HDFS
    2). There was an active plan to use the block layer of Ozone (HDDS, or HDSL at that time) as the block layer of HDFS
    3). The Ozone community was a subset of the HDFS community

* The first beta release of Ozone has just been published. Now, before the first GA release, seems to be a good time to make a decision about the future.



### WHAT HAS CHANGED

Over the last few years, Ozone has become more and more independent on both the community side and the code side. The separation has been suggested again and again (for example by Owen [2] and Vinod [3]).



From the COMMUNITY point of view:


* Fortunately, more and more new contributors are helping Ozone. Originally the Ozone community was a subset of the HDFS community, but now a growing part of the community is involved in Ozone only.

* It seems to be easier to _build_ the community as a separate project.

* A new, younger project might have different practices (communication, committer criteria, development style) compared to an old, mature project

* It's easier to communicate (and improve) these standards in a separate project with clean boundaries

* A separate project/brand can help increase the adoption rate and attract more individual contributors (AFAIK this was seen with Submarine after a similar move)

* The contribution process can be communicated more easily, and we can make first-time contributions easier



From the CODE point of view, Ozone has become more and more independent:


* Ozone has a different release cycle

* The code is already separated from the Hadoop code base (apache/hadoop-ozone.git)

* It has a separate CI (GitHub Actions)

* Ozone uses a different (stricter) coding style (zero tolerance for unit test / checkstyle errors)

* The code itself has become more and more independent of Hadoop at the Maven level. Originally it was compiled together with the latest in-tree Hadoop snapshot. Now it depends on released Hadoop artifacts (RPC, Configuration...)

* It has started to use multiple versions of Hadoop (on the client side)

* The volume of resolved issues is already very high on the Ozone side (in the last 2-3 months, Ozone had slightly more resolved issues than HDFS/YARN/MAPREDUCE/COMMON combined)


Summary: Before the first Ozone GA release, it seems to be a good time to discuss the long-term future of Ozone. Managing it as a separate top-level project (TLP) seems to have more benefits.


Please let me know what your opinion is...

Thanks a lot,
Marton





[1]: For more details, see: https://github.com/apache/hadoop-ozone/blob/master/HISTORY.md

[2]: https://lists.apache.org/thread.html/0d0253f6e5fa4f609bd9b917df8e1e4d8848e2b7fdb3099b730095e6%40%3Cprivate.hadoop.apache.org%3E

[3]: https://lists.apache.org/thread.html/8be74421ea495a62e159f2b15d74627c63ea1f67a2464fa02c85d4aa%40%3Chdfs-dev.hadoop.apache.org%3E
