[ https://issues.apache.org/jira/browse/HUDI-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-3674:
------------------------------
    Fix Version/s: 0.13.0
                       (was: 0.12.0)

> Remove unnecessary HBase-related dependencies from bundles if there is any
> --------------------------------------------------------------------------
>
>                 Key: HUDI-3674
>                 URL: https://issues.apache.org/jira/browse/HUDI-3674
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.13.0
>
>
> [https://github.com/apache/hudi/pull/5004/files] A follow-up of HUDI-1180.
>
> vinothchandar 6 days ago Member
> is the absolute minimal set of artifacts needed
>
> alexeykudinkin 6 days ago Contributor
> Need not to take as part of this PR, but i actually want to suggest one step
> further:
> Since we're mostly reliant on HFile and the classes it's dependent on, can we
> try to filter out packages that won't break it?
> My hunch is that we can greatly reduce 16Mb overhead number by just cleaning
> up all the stuff that is bolted onto HBase.
> 👍 1
>
> codope 4 days ago Member
> That's a good idea. In fact, i've tried out but it's a very manual
> time-consuming process to verify. I gave up after a few failures. And keep
> future upgrades in mind. But, i would be very happy to reduce the bundle size
> in any way we can and we should take another stab at this idea in future.
>
> yihua 4 days ago Author Member
> Yeah, that's good to have. The problem as @codope pointed out is that such a
> process is time-consuming. For now, what I can say is that the newly added
> artifacts are necessary, since I started with the old pom, incrementally
> added new artifacts as I saw NoClassDef exception until every test can pass.
> One thing we may try later is to add and trim hudi-hbase-shaded by excluding
> transitives and only depend on hudi-hbase-shaded here.
>
> alexeykudinkin 3 days ago Contributor
> Yeah, it's tedious manual process for sure, but i think we can do it pretty
> fast: we just look at the packages imported by HFile, then look at files that
> are imported by HFile, and so on. Then after that we can run the tests if we
> collected it properly or not.
> The hypothesis is that this set should be reasonably bounded (why wouldn't
> it?) so this iteration should be pretty fast.
> Can you please create a task and link it here to follow-up?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
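
The import-walking idea in the last quoted comment (start at HFile, follow what it imports, then what those import, and so on) amounts to a reachability search over a class dependency graph. Below is a minimal sketch of that search, not the procedure anyone on the issue actually ran. The function name `reachable_classes`, the `TOY_DEPS` map, and every class name in it are hypothetical placeholders. A real map would have to be extracted from the HBase jars with a dependency analysis tool, and the test suite would still be needed to catch classes loaded reflectively.

```python
from collections import deque


def reachable_classes(root, deps):
    """Return the set of classes reachable from `root` by following `deps`.

    `deps` maps a class name to the class names it imports directly.
    Classes missing from the map are treated as having no imports.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dep in deps.get(current, ()):
            if dep not in seen:  # also guards against import cycles
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical toy graph: these names and edges are made up for illustration
# and do NOT describe the real HBase import structure.
TOY_DEPS = {
    "pkg.hfile.HFile": ["pkg.hfile.BlockCodec", "pkg.util.ByteUtil"],
    "pkg.hfile.BlockCodec": ["pkg.util.ByteUtil", "pkg.hfile.HFile"],  # cycle
    "pkg.util.ByteUtil": [],
    "pkg.server.MasterService": ["pkg.util.ByteUtil"],
    "pkg.server.RpcHandler": ["pkg.server.MasterService"],
}

needed = reachable_classes("pkg.hfile.HFile", TOY_DEPS)
removable = set(TOY_DEPS) - needed

print(sorted(needed))
print(sorted(removable))
```

In this toy graph the classes outside the reachable set are the candidates for exclusion from the bundle. The hypothesis in the thread is that the reachable set stays reasonably bounded, which would keep this iteration short.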