Hi Butao,

Allowing anonymous access to CI/Jenkins etc. is considered a security
vulnerability, and it was disabled on purpose [1]. I guess the best we can
do at this stage to help new contributors is to improve our contribution
guidelines [2] or a similar page.

The CI limitations are due to the underlying resources. One of the main
reasons I am proposing this event is that I want people to
understand how CI works and where these limitations come from.

Running all Hive tests sequentially on a single machine requires more than
24 hours. GitHub Actions also has limitations regarding resource usage
for Apache projects [3]. If someone wants to explore that option, they can
definitely do it, but it will not be trivial, especially if we want to keep
both in place.

Best,
Stamatis

[1] https://lists.apache.org/thread/c697bb63xxsd3g7zws4n31z59gtbgt4t
[2] https://hive.apache.org/docs/latest/howtocontribute_27362107/
[3] https://infra.apache.org/github-actions-policy.html

On Thu, Jul 10, 2025 at 9:37 AM Butao Zhang <zhangbu...@apache.org> wrote:

> Hi Stamatis,
>
> Thank you for initiating this event. I know you’ve put a lot of effort
> into maintaining the CI infrastructure, and I really appreciate that.
> I might not be able to attend this event, but I’d like to share some
> thoughts, especially since I’ve noticed some pain points in the CI while
> preparing for the 4.1 release. Here are some points for discussion:
>
> 1) Issue with requiring login to view CI job details
> Many first-time contributors don’t know how to log in to the CI system to
> check error messages [1]. Could we make the CI interface anonymously
> accessible to ensure a better development experience for new users?
>
> 2) Limited CI concurrency and slow rescheduling after cancellation
> Many users frequently modify and submit code. When a new commit is pushed,
> the CI is canceled and then rescheduled, but the rescheduling process seems
> to take a long time. As a result, many users (especially new contributors)
> often close and reopen their PRs to retrigger the CI. Some even create a
> new PR altogether. In short, the rescheduling delay is too long, leaving
> users waiting for extended periods without seeing CI progress, which
> significantly impacts the development experience.
> Additionally, when many PRs are submitted concurrently, the scheduling and
> execution times of the CI seem to increase dramatically. If a user makes a
> code change and wants to check the CI results, they might have to wait half
> a day or even a full day. This long waiting period is frustrating for
> developers, as they may have other work commitments and can’t afford to
> wait indefinitely. In some cases, the PR might even be forgotten.
>
> 3) Could we consider using GitHub Actions resources for CI?
> I understand that the CI concurrency limit might be due to limited
> resources, and the provider (Cloudera) needs to impose restrictions. So,
> could we explore using GitHub Actions as an additional resource? I’ve
> noticed that Apache Spark uses GitHub Actions [2], and they’re even
> considering it for releases [3]. With GitHub Actions, Spark’s CI
> workflow seems to run much faster. Could we evaluate using GitHub Actions
> as a supplementary resource to alleviate the current CI resource
> constraints?
> Just to clarify, I haven’t researched GitHub Actions in depth yet, but since
> many Apache projects are adopting it, I think it’s worth considering as a
> potential part of our future CI infrastructure.
>
> [1] https://github.com/apache/hive/pull/5547#issuecomment-2480098937
> [2] https://github.com/apache/spark/pull/32092
> [3] https://issues.apache.org/jira/browse/SPARK-52176
>
> Thanks,
> Butao Zhang
> ---- Replied Message ----
> From Stamatis Zampetakis <zabe...@gmail.com>
> Date 7/9/2025 18:28
> To dev <dev@hive.apache.org>
> Subject [EVENT] Apache Hive CI Introduction & QA
>
> Hi everyone,
>
> The Hive CI and precommit infrastructure is a very important part of our
> daily life as Hive contributors and has a great impact on productivity
> and overall contributor experience.
> I think it would be very useful for everyone contributing to Hive to
> get a better understanding of how the CI works and what lies
> underneath.
>
> For this purpose, I would like to propose a virtual event on July 23,
> 2025 at 17:00 CEST [1] in an attempt to facilitate contributions and
> troubleshooting around this area. I know that the time is not
> convenient for everyone globally (and it is impossible to find one
> slot that works for all) but we could possibly shift the date if that
> could help in getting greater attendance.
>
> The format that I had in mind is a short introductory presentation
> followed by a casual, informal Q&A. I created a Google doc [2] to
> gather questions that people may have in this area. Feel free to
> append your questions there so that we have a tentative agenda of what
> to cover during the event.
>
> I can lead the presentation/discussion during the event, and I would be
> more than happy to co-present with anyone else willing to help. There are
> people with probably a better understanding than myself in this area, so
> it would be great to have them on board.
>
> How do people feel about the idea? Is there interest in attending such an 
> event?
>
> Best,
> Stamatis
>
> [1] https://www.timeanddate.com/worldclock/fixedtime.html?msg=Apache+Hive+CI+Introduction+%26+QA&iso=20250723T17&p1=195&ah=1&am=30
> [2] https://docs.google.com/document/d/1P5H5N2QUSIwM83Yz00lzItcAQgMWQ5qpjOhoSAs9Tbw/edit?usp=sharing
>
>
