Hey team!

Thanks, Stamatis, for initiating this thread. I hope we can get further this
time than we did last time.

> 1. Are there objections to creating a new Git repo under the apache/hive
> namespace?
>
> 2. What name would you prefer?


I can answer both at the same time. I prefer maintaining infra code in the
hive repo, especially as long as it is no more than a few files.
This applies to what you were referring to as hive-ci
<https://github.com/apache/hive-ci>. As I mentioned on HIVE-29591
<https://issues.apache.org/jira/browse/HIVE-29591>, the hive-ci code is
basically nothing more than a Dockerfile, considering that hive-dev-box
<https://github.com/apache/hive-dev-box> originally covered far more than we
actually need. I'm ready to provide a vanilla precommit image for this
purpose.

Regarding hive-infra <https://github.com/apache/hive-infra> and hive-datasets
<https://github.com/apache/hive-datasets>, I don't have a strong opinion.

I think *hive-infra* is also better kept in the hive repository. The only
thing we might want to take care of is not triggering a full pre-commit
each time infra code is pushed to the repo, because it won't test anything
(since infra code is not deployed to the GCP project in the PR scope).
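
To make the idea concrete, here is a minimal sketch of such a check in Python.
It assumes infra code would live under a hypothetical `infra/` directory and
that CI can obtain the list of paths changed by a push. Neither the directory
name nor the function name comes from the Hive build; this only illustrates the
decision logic, not how it would be wired into our CI.

```python
from pathlib import PurePosixPath

# Hypothetical location of infra code inside the hive repo (an assumption).
INFRA_PREFIXES = ("infra",)


def needs_full_precommit(changed_paths, infra_prefixes=INFRA_PREFIXES):
    """Return True unless every changed path lives under an infra prefix."""
    paths = [PurePosixPath(p) for p in changed_paths]
    if not paths:
        # No information about the change: be conservative and run everything.
        return True
    for path in paths:
        # Compare the top-level directory only, so "infrastructure/" does not
        # accidentally match "infra".
        if not path.parts or path.parts[0] not in infra_prefixes:
            return True
    return False
```

With this sketch, a push touching only `infra/gcp/main.tf` would skip the full
run, while a push touching both `infra/` and `ql/` would still trigger it.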

Regarding *hive-datasets*: I agree that huge raw data or dumps cannot be
part of the Hive repository, so a separate apache/hive-datasets would
suffice. We just need to mention it in our Docker README, and it's done :)
https://github.com/apache/hive/blob/master/packaging/src/docker/README.md


Regards,
Laszlo Bodor



On Mon, 4 May 2026 at 09:32, Stamatis Zampetakis <[email protected]> wrote:

> Hey team,
>
> Given the recent activity under HIVE-29590 [1], I would like to revive
> this discussion about creating a dedicated Git repository for
> ci/test/dataset related stuff. Our lack of reactivity on this topic makes
> our whole test/ci infrastructure depend on personal/user specific
> repositories. This is not aligned with the ASF way and makes us depend
> too much on individual users/contributors leading to a single point of
> failure.
>
> The lack of dedicated repo blocked various useful contributions in the
> past (e.g., [2]) that became stale and eventually were closed without
> action.
>
> Summing up I have two questions:
> 1. Are there objections to creating a new Git repo under the apache/hive
> namespace?
> 2. What name would you prefer?
> * https://github.com/apache/hive-datasets
> * https://github.com/apache/hive-ci
> * https://github.com/apache/hive-infra
>
> At the moment, the main thing we want to put there is everything
> under HIVE-29590, HIVE-26830, and HIVE-28339.
>
> Best,
> Stamatis
>
> [1] https://issues.apache.org/jira/browse/HIVE-29590
> [2] https://lists.apache.org/thread/4qb3z3yx9ovnxbsr4b02ohz6twlkrlx9
>
> On 2025/10/24 12:22:12 Stamatis Zampetakis wrote:
> > Thanks for starting the discussion Thomas!
> >
> > In fact, I would go one step further and instead of storing the
> > dumps/dockerfiles in personal git repositories such as [1] to create
> > an apache git repo for that purpose:
> > https://github.com/apache/hive-datasets
> > I know that git is not the perfect place to store large files but I
> > feel that moving from a personal managed repo to a community managed
> > repo is something worth doing.
> > Subsequently, having also a corresponding namespace in Docker Hub
> > makes sense to me.
> >
> > Best,
> > Stamatis
> >
> > [1] https://github.com/zabetak/hive-postgres-metastore
> >
> > On Fri, Oct 24, 2025 at 12:10 PM Thomas Rebele <[email protected]> wrote:
> > >
> > > Hi Hive community,
> > >
> > > I'm working on creating a docker image for a TPC-DS 30TB metastore
> > > with histogram statistics [HIVE-26830](https://issues.apache.org/jira/browse/HIVE-26830).
> > >
> > > The previous TPC-DS metastore docker images have been published at
> > > https://hub.docker.com/r/zabetak/postgres-tpcds-metastore. Stamatis
> > > suggested creating a repo under https://hub.docker.com/u/apache, maybe
> > > called "hive-dataset".
> > >
> > > What do you think about this approach?
> > >
> > > Best regards,
> > > Thomas Rebele
> > >
> >
>
