Hey team,

Given the recent activity under HIVE-29590 [1], I would like to revive this 
discussion about creating a dedicated Git repository for ci/test/dataset 
related stuff. Our lack of reactivity on this topic makes our whole test/ci 
infrastructure depend on personal/user specific repositories. This is not 
aligned with the ASF way and makes us depend too much on individual 
users/contributors, creating a single point of failure.

The lack of a dedicated repo blocked various useful contributions in the past 
(e.g., [2]) that became stale and were eventually closed without action.

Summing up, I have two questions:
1. Are there objections to creating a new Git repo under the apache/hive 
namespace?
2. What name would you prefer?
* https://github.com/apache/hive-datasets
* https://github.com/apache/hive-ci
* https://github.com/apache/hive-infra

At the moment, the main things that we want to put there are everything under 
HIVE-29590, HIVE-26830, and HIVE-28339.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-29590
[2] https://lists.apache.org/thread/4qb3z3yx9ovnxbsr4b02ohz6twlkrlx9

On 2025/10/24 12:22:12 Stamatis Zampetakis wrote:
> Thanks for starting the discussion Thomas!
> 
> In fact, I would go one step further and instead of storing the
> dumps/dockerfiles in personal git repositories such as [1] to create
> an apache git repo for that purpose:
> https://github.com/apache/hive-datasets
> I know that git is not the perfect place to store large files but I
> feel that moving from a personal managed repo to a community managed
> repo is something worth doing.
> Subsequently, having also a corresponding namespace in Docker Hub
> makes sense to me.
> 
> Best,
> Stamatis
> 
> [1] https://github.com/zabetak/hive-postgres-metastore
> 
> On Fri, Oct 24, 2025 at 12:10 PM Thomas Rebele <[email protected]> 
> wrote:
> >
> > Hi Hive community,
> >
> > I'm working on creating a docker image for a TPC-DS 30TB metastore with 
> > histogram statistics 
> > [HIVE-26830](https://issues.apache.org/jira/browse/HIVE-26830).
> >
> > The previous TPC-DS metastore docker images have been published at 
> > https://hub.docker.com/r/zabetak/postgres-tpcds-metastore. Stamatis 
> > suggested to create a repo under https://hub.docker.com/u/apache, maybe 
> > called "hive-dataset".
> >
> > What do you think about this approach?
> >
> > Best regards,
> > Thomas Rebele
> >
> 
