Hey team! Thanks, Stamatis, for initiating this thread. I hope we can go further this time than last time.
> 1. Are there objections in creating a new Git repo under the apache/hive
> namespace?
> 2. What name would you prefer?

I can answer both at the same time. I prefer maintaining infra code in the hive repo, especially as long as it is no more than a few files.

This applies to what you were referring to as hive-ci <https://github.com/apache/hive-ci>. As I mentioned on HIVE-29591 <https://issues.apache.org/jira/browse/HIVE-29591>, the hive ci code is basically nothing more than a Dockerfile, considering that hive-dev-box <https://github.com/apache/hive-dev-box> originally covered way more than we actually need. I'm ready to provide a vanilla precommit image for this purpose.

Regarding hive-infra <https://github.com/apache/hive-infra> and hive-datasets <https://github.com/apache/hive-datasets>, I don't have a strong opinion.

I think *hive-infra* is also better kept in the hive repository. The only thing we might want to take care of is not triggering a full pre-commit each time infra code is pushed to the repo, because it won't test anything (since infra code is not deployed to the GCP project in the PR scope).

Regarding *hive-datasets*: I agree that huge raw data or dumps cannot be part of the Hive repository, so a separate apache/hive-datasets would suffice. We just need to mention it in our Docker README, and it's done :)
https://github.com/apache/hive/blob/master/packaging/src/docker/README.md

Regards,
Laszlo Bodor

On Mon, 4 May 2026 at 09:32, Stamatis Zampetakis <[email protected]> wrote:

> Hey team,
>
> Given the recent activity under HIVE-29590 [1], I would like to revive
> this discussion about creating a dedicated Git repository for
> ci/test/dataset related stuff. Our lack of reactivity on this topic makes
> our whole test/ci infrastructure depend on personal/user specific
> repositories. This is not aligned with the ASF way and makes us depend
> too much on individual users/contributors leading to a single point of
> failure.
>
> The lack of dedicated repo blocked various useful contributions in the
> past (e.g., [2]) that became stale and eventually were closed without
> action.
>
> Summing up I have two questions:
> 1. Are there objections in creating a new Git repo under the apache/hive
> namespace?
> 2. What name would you prefer?
> * https://github.com/apache/hive-datasets
> * https://github.com/apache/hive-ci
> * https://github.com/apache/hive-infra
>
> At the moment the main things that we want to put there are everything
> under HIVE-29590, HIVE-26830, and HIVE-28339.
>
> Best,
> Stamatis
>
> [1] https://issues.apache.org/jira/browse/HIVE-29590
> [2] https://lists.apache.org/thread/4qb3z3yx9ovnxbsr4b02ohz6twlkrlx9
>
> On 2025/10/24 12:22:12 Stamatis Zampetakis wrote:
> > Thanks for starting the discussion Thomas!
> >
> > In fact, I would go one step further and instead of storing the
> > dumps/dockerfiles in personal git repositories such as [1] to create
> > an apache git repo for that purpose:
> > https://github.com/apache/hive-datasets
> > I know that git is not the perfect place to store large files but I
> > feel that moving from a personal managed repo to a community managed
> > repo is something worth doing.
> > Subsequently, having also a corresponding namespace in Docker Hub
> > makes sense to me.
> >
> > Best,
> > Stamatis
> >
> > [1] https://github.com/zabetak/hive-postgres-metastore
> >
> > On Fri, Oct 24, 2025 at 12:10 PM Thomas Rebele <[email protected]> wrote:
> > >
> > > Hi Hive community,
> > >
> > > I'm working on creating a docker image for a TPC-DS 30TB metastore
> > > with histogram statistics
> > > [HIVE-26830](https://issues.apache.org/jira/browse/HIVE-26830).
> > >
> > > The previous TPC-DS metastore docker images have been published at
> > > https://hub.docker.com/r/zabetak/postgres-tpcds-metastore. Stamatis
> > > suggested to create a repo under https://hub.docker.com/u/apache,
> > > maybe called "hive-dataset".
> > >
> > > What do you think about this approach?
> > >
> > > Best regards,
> > > Thomas Rebele
