Hey team,

Given the recent activity under HIVE-29590 [1], I would like to revive this discussion about creating a dedicated Git repository for CI/test/dataset related material. Because we have not acted on this topic, our whole test/CI infrastructure depends on personal, user-specific repositories. This is not aligned with the ASF way and makes us depend too much on individual contributors, which creates a single point of failure.
The lack of a dedicated repo has blocked various useful contributions in the past (e.g., [2]) that became stale and were eventually closed without action.

Summing up, I have two questions:
1. Are there objections to creating a new Git repo under the apache/hive namespace?
2. What name would you prefer?
 * https://github.com/apache/hive-datasets
 * https://github.com/apache/hive-ci
 * https://github.com/apache/hive-infra

At the moment, the main things that we want to put there are everything under HIVE-29590, HIVE-26830, and HIVE-28339.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-29590
[2] https://lists.apache.org/thread/4qb3z3yx9ovnxbsr4b02ohz6twlkrlx9

On 2025/10/24 12:22:12 Stamatis Zampetakis wrote:
> Thanks for starting the discussion Thomas!
>
> In fact, I would go one step further and instead of storing the
> dumps/dockerfiles in personal git repositories such as [1] to create
> an apache git repo for that purpose:
> https://github.com/apache/hive-datasets
> I know that git is not the perfect place to store large files but I
> feel that moving from a personal managed repo to a community managed
> repo is something worth doing.
> Subsequently, having also a corresponding namespace in Docker Hub
> makes sense to me.
>
> Best,
> Stamatis
>
> [1] https://github.com/zabetak/hive-postgres-metastore
>
> On Fri, Oct 24, 2025 at 12:10 PM Thomas Rebele <[email protected]>
> wrote:
> >
> > Hi Hive community,
> >
> > I'm working on creating a docker image for a TPC-DS 30TB metastore with
> > histogram statistics
> > [HIVE-26830](https://issues.apache.org/jira/browse/HIVE-26830).
> >
> > The previous TPC-DS metastore docker images have been published at
> > https://hub.docker.com/r/zabetak/postgres-tpcds-metastore. Stamatis
> > suggested to create a repo under https://hub.docker.com/u/apache, maybe
> > called "hive-dataset".
> >
> > What do you think about this approach?
> >
> > Best regards,
> > Thomas Rebele
> >
>
