Hi Butao,

Allowing anonymous access to CI/Jenkins etc. is considered a security vulnerability, and it was disabled on purpose [1]. I guess the best we can do at this stage to help new contributors is to improve our contribution guidelines [2] or a similar page.
The CI limitations are due to the underlying resources. One of the main reasons I am proposing this event is that I want people to understand how CI works and where these limitations come from. Running all Hive tests sequentially on a single machine takes more than 24 hours. GitHub Actions also has resource usage limits for Apache projects [3]. If someone wants to explore that option they can definitely do it, but it will not be trivial, especially if we want to keep both in place.

Best,
Stamatis

[1] https://lists.apache.org/thread/c697bb63xxsd3g7zws4n31z59gtbgt4t
[2] https://hive.apache.org/docs/latest/howtocontribute_27362107/
[3] https://infra.apache.org/github-actions-policy.html

On Thu, Jul 10, 2025 at 9:37 AM Butao Zhang <zhangbu...@apache.org> wrote:

> Hi Stamatis,
>
> Thank you for initiating this event. I know you’ve put a lot of effort
> into maintaining the CI infrastructure, and I really appreciate that.
> I might not be able to attend this event, but I’d like to share some
> thoughts, especially since I’ve noticed some pain points in the CI while
> preparing for the 4.1 release. Here are some points for discussion:
>
> 1) Issue with requiring login to view CI job details
> Many first-time contributors don’t know how to log in to the CI system to
> check error messages [1]. Could we make the CI interface anonymously
> accessible to ensure a better development experience for new users?
>
> 2) Limited CI concurrency and slow rescheduling after cancellation
> Many users frequently modify and submit code. When a new commit is pushed,
> the CI is canceled and then rescheduled, but the rescheduling process seems
> to take a long time. As a result, many users (especially new contributors)
> often close and reopen their PRs to retrigger the CI. Some even create a
> new PR altogether.
> In short, the rescheduling delay is too long, leaving
> users waiting for extended periods without seeing CI progress, which
> significantly impacts the development experience.
> Additionally, when many PRs are submitted concurrently, the scheduling and
> execution time of the CI seem to increase dramatically. If a user makes a
> code change and wants to check the CI results, they might have to wait half
> a day or even a full day. This long waiting period is frustrating for
> developers, as they may have other work commitments and can’t afford to
> wait indefinitely. In some cases, the PR might even be forgotten.
>
> 3) Could we consider using GitHub Actions resources for CI?
> I understand that the CI concurrency limit might be due to limited
> resources, and the provider (Cloudera) needs to impose restrictions. So,
> could we explore using GitHub Actions as an additional resource? I’ve
> noticed that Apache Spark uses GitHub Actions [2], and they’re even
> considering it for release version [3]. With GitHub Actions, Spark’s CI
> workflow seems to run much faster. Could we evaluate using GitHub Actions
> as a supplementary resource to alleviate the current CI resource
> constraints?
> Just to clarify: I haven’t deeply researched GitHub Actions yet, but since
> many Apache projects are adopting it, I think it’s worth considering as a
> potential part of our future CI infrastructure.
>
> [1] https://github.com/apache/hive/pull/5547#issuecomment-2480098937
> [2] https://github.com/apache/spark/pull/32092
> [3] https://issues.apache.org/jira/browse/SPARK-52176
>
> Thanks,
> Butao Zhang
>
> ---- Replied Message ----
> From Stamatis Zampetakis <zabe...@gmail.com>
> Date 7/9/2025 18:28
> To dev <dev@hive.apache.org>
> Subject [EVENT] Apache Hive CI Introduction & QA
>
> Hi everyone,
>
> The Hive CI and precommit infrastructure is a very important part of our
> daily life as Hive contributors and has great impact on productivity
> and overall contributor experience.
> I think it would be very useful for everyone contributing to Hive to
> get a better understanding of how the CI works and what lies
> underneath.
>
> For this purpose, I would like to propose a virtual event on July 23,
> 2025 at 17:00 CEST [1] in an attempt to facilitate contributions and
> troubleshooting around this area. I know that the time is not
> convenient for everyone globally (and it is impossible to find one
> slot that works for all) but we could possibly shift the date if that
> could help in getting greater attendance.
>
> The format that I had in mind is a small introductory presentation
> followed by casual and informal QA. I created a google doc [2] to
> gather questions that people may have in this area. Feel free to
> append your questions there so that we have a tentative agenda of what
> to cover during the event.
>
> I can lead the presentation/discussion during the event and I would be
> more than happy to co-present with anyone else willing to help. There are
> people with probably better understanding than myself in this area so
> it would be great to have them onboard.
>
> How do people feel about the idea? Is there interest in attending such an
> event?
>
> Best,
> Stamatis
>
> [1] https://www.timeanddate.com/worldclock/fixedtime.html?msg=Apache+Hive+CI+Introduction+%26+QA&iso=20250723T17&p1=195&ah=1&am=30
> [2] https://docs.google.com/document/d/1P5H5N2QUSIwM83Yz00lzItcAQgMWQ5qpjOhoSAs9Tbw/edit?usp=sharing