Hi Team,
I was there when we decided to move the Hive development from the Iceberg
repo to the Hive repo. Apart from the previously mentioned reasons there
was another big blocker which prevented us from moving forward with the
development in the Iceberg repo - the lack of Hive releases. We needed
several changes in the Hive repo to continue to improve upon the Iceberg
integration, and the Hive main branch was not in a shape to allow for a
release.
AFAIK this has changed: I have seen several Hive 4.0 beta releases,
and Hive 4.0.0 is finally available for the users. If I understand Denys
correctly, there are also plans to continue releasing new versions.

I think it is important to have a few engine implementations closely
coupled with the Iceberg core. Currently we have Spark (main engine), Flink
(other streaming engine) and Hive (other SQL engine) in the Iceberg repo.
This allows us to immediately notice issues which could cause problems with
integrations.

Since the previous blockers are removed, my preferred solution would be to
keep Hive integration (and update it to the current version) in the Iceberg
repo to provide cross-engine compatibility testing. This is a viable
option only if Hive's Java compatibility doesn't block the progress of
Iceberg core development.

Also, it would be nice to have other reviewers on Flink/Hive PRs, as I have
limited bandwidth and cannot keep up with the incoming changes :(

Thanks,
Peter

Denys Kuzmenko <dkuzme...@apache.org> wrote (on Thu, Jul 18, 2024,
11:29):

> Hi All,
>
> Let me chime in here and add some Hive perspective on that.
>
> The only reason Iceberg support has moved into Hive is that we didn't get
> enough support from the existing community. Our PRs got stuck pending
> review, and even if we got some +1s, those were not binding. Don't get me
> wrong, it is what it is; I am not blaming anyone, just trying to shed some
> light on things from our side.
>
> To progress, the only option for us was to move the bits into Hive, hoping we
> could contribute them back later: Hive 4 integration, MV support, advanced
> column statistics, vectorization, caching, etc.
> Since we don't have Hive representatives in the Iceberg community, it
> became painful and frustrating.
> We are eager to contribute, but the lack of support stopped us from even
> considering that.
>
> Our original goal was (and still is) to move the iceberg-catalog and handler
> modules from Hive back to Iceberg, but we would need some help here. We are
> actively working on the JDK 17 migration, hoping to keep at least the
> existing hive-connector in the Iceberg repo.
>
> If the hive-iceberg modules are dropped, Hive 3 won't have any upgrade path,
> and based on stats it remains the most actively used version.
>
> Kind regards,
> Denys
>
