[ 
https://issues.apache.org/jira/browse/HUDI-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yue Zhang updated HUDI-3674:
----------------------------
    Fix Version/s: 0.14.0
                       (was: 0.13.1)

> Remove unnecessary HBase-related dependencies from bundles if there is any
> --------------------------------------------------------------------------
>
>                 Key: HUDI-3674
>                 URL: https://issues.apache.org/jira/browse/HUDI-3674
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: dependencies
>            Reporter: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.14.0
>
>
> [https://github.com/apache/hudi/pull/5004/files] A follow-up of HUDI-1180. 
> vinothchandar 6 days ago Member
> is the absolute minimal set of artifacts needed
>  
>  alexeykudinkin 6 days ago Contributor
> This need not be done as part of this PR, but I actually want to suggest 
> going one step further:
> Since we mostly rely on HFile and the classes it depends on, can we try to 
> filter out the packages whose removal won't break it?
> My hunch is that we can greatly reduce the 16 MB overhead just by cleaning 
> up all the stuff that is bolted onto HBase.
>  
>  codope 4 days ago Member
> That's a good idea. In fact, I've tried it, but verifying the result is a 
> very manual, time-consuming process, and I gave up after a few failures. We 
> also need to keep future upgrades in mind. Still, I would be very happy to 
> reduce the bundle size in any way we can, and we should take another stab at 
> this idea in the future.
>  
>  yihua 4 days ago Author Member
> Yeah, that's good to have. The problem, as @codope pointed out, is that such 
> a process is time-consuming. For now, what I can say is that the newly added 
> artifacts are necessary: I started with the old pom and incrementally added 
> new artifacts whenever I saw a NoClassDefFoundError, until every test passed.
> One thing we may try later is to add and trim hudi-hbase-shaded by excluding 
> transitive dependencies, and depend only on hudi-hbase-shaded here.
>  
>  alexeykudinkin 3 days ago Contributor
> Yeah, it's a tedious manual process for sure, but I think we can do it 
> pretty fast: we just look at the packages imported by HFile, then at what 
> those imported files import in turn, and so on. After that we can run the 
> tests to check whether we collected the set properly.
> The hypothesis is that this set should be reasonably bounded (why wouldn't 
> it be?), so this iteration should be pretty fast.
> Can you please create a task and link it here to follow up?
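
The iteration described in the last comment (start from HFile, follow its imports, then their imports, and so on) amounts to a reachability walk over a class-dependency graph. The following is only a minimal sketch of that idea, not the procedure actually used in the PR. It assumes a hypothetical `deps` map from each class name to the classes it imports; the `org.example` names are made up for illustration and do not reflect HBase's real dependency graph.

```python
from collections import deque


def reachable_classes(deps, root):
    """Return every class reachable from `root` by following imports (BFS)."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        # Classes missing from the map are treated as having no imports.
        for dep in deps.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def packages_of(classes):
    """Collapse fully qualified class names to their package names."""
    return {name.rsplit(".", 1)[0] for name in classes if "." in name}


# Hypothetical import graph (illustrative names only).
deps = {
    "org.example.io.HFile": ["org.example.io.HFileBlock", "org.example.util.Bytes"],
    "org.example.io.HFileBlock": ["org.example.util.Bytes"],
    "org.example.util.Bytes": [],
    "org.example.server.RegionServer": ["org.example.io.HFile"],
}

needed = reachable_classes(deps, "org.example.io.HFile")
keep = packages_of(needed)
```

In this toy graph `org.example.server.RegionServer` depends on HFile but is not needed by it, so its package would be a candidate for exclusion. A walk like this only sees static imports; classes loaded reflectively would still have to be caught by running the tests, as the comment suggests.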



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
