Sean (and everyone else),

You mentioned that you want to create separate Maven modules to upgrade Hive and
HBase. The Flume build is already very large. In addition, upgrading to Hive 3
looks like it will require Hadoop 3, while Hive 2 runs with Hadoop 2. This means
both dependencies would need to be in the parent pom. I find this problematic
for the following reasons:
1. Flume contains a ton of dependencies and even more transitive dependencies
   that are not declared. This makes creating new releases really hard given
   how many dependencies have to be checked and upgraded.
2. As more modules are added the build is just going to get slower.
3. Some modules have dependencies on things that are no longer supported.
   Again, that makes creating a full Flume release hard.

I would suggest that unless security fixes require it we hold off on creating 
upgrades in 1.10.0 for HBase and Hive beyond what you have already done. 
Instead, we should create new repositories for the parts of Flume we want to 
separate and maintain independently. The HBase and Hive upgrades would end up
going there.

I believe this will speed up development since builds will no longer take so
long. It also means that PRs will go against the target repo, which should
simplify things. Jira would remain the same as it is today. The component would 
be used to identify the target repo.

I would suggest that what should remain in the main Flume build would be
primarily configuration, core, node, sdk, and some of configfilters. I would
expect we would have separate repos for hbase, hdfs, hive, kafka,
embedded-agent, tools, and legacy to start.

Thoughts?

Ralph
