The "malicious release manager" is an interesting attack, and one that the ASF "we trust the community" model doesn't defend against. The risk is that someone generates a set of malicious artifacts (perhaps only the ones published to Maven) while the source code itself is safe.
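
As a rough illustration of that risk check, here is a minimal Python sketch that compares a locally built JAR against a published one, entry by entry, using SHA-256 digests. It is an assumption-laden stand-in, not the auditor tool linked in this message: it compares raw bytes rather than bytecode, so it will also flag harmless differences such as debug information, and the `entry_digests` and `diff_jars` names and the `META-INF/` skip rule are made up for the example.

```python
import hashlib
import zipfile


def entry_digests(jar_path, skip_prefixes=("META-INF/",)):
    """Map each file entry in a JAR to the SHA-256 of its uncompressed bytes.

    Entries under skip_prefixes (an assumption: generated metadata lives in
    META-INF/) are ignored, as are directory entries.
    """
    digests = {}
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.is_dir() or info.filename.startswith(skip_prefixes):
                continue
            digests[info.filename] = hashlib.sha256(jar.read(info)).hexdigest()
    return digests


def diff_jars(local_jar, published_jar):
    """Report entries that are missing, extra, or byte-different."""
    local = entry_digests(local_jar)
    published = entry_digests(published_jar)
    shared = local.keys() & published.keys()
    return {
        "only_in_local": sorted(local.keys() - published.keys()),
        "only_in_published": sorted(published.keys() - local.keys()),
        "changed": sorted(name for name in shared if local[name] != published[name]),
    }
```

Calling `diff_jars("my-local-build.jar", "downloaded-rc.jar")` (both paths hypothetical) returns three sorted lists; anything under `only_in_published` or `changed` is where to start looking, bearing in mind that non-reproducible builds will produce false positives with a byte-level comparison like this.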

To help defend against this, here's some code which does a bytecode-level diff between JARs, ignoring debug info, generated metadata, etc. Enjoy:

https://github.com/steveloughran/auditor

This wouldn't defend against someone adding a malicious dependency to the artifacts they publish on Maven, so really the tool should audit that too. But you can at least check out a Spark branch, build the binaries, and then audit the RC's artifacts against them to look for tangible variations.

On Wed, 22 Apr 2026 at 23:53, Tian Gao via dev <[email protected]> wrote:

> So are you suggesting that we don't enforce this 1-week buffer for all
> Apache projects? I agree that a legitimate Apache project release is
> well-vetted and generally safe, but there could be situations where a
> release is maliciously executed by stealing identities of people who have
> access to make releases - that's where many supply chain attacks occur.
> Moreover, it would be more difficult to enforce this (whether for LLM or
> for human) to treat Apache projects differently. Also I think a 7-day delay
> to accept an Apache project release is not a big deal for us.
>
> Regarding the Spark-related projects, we don't need to enforce the policy
> for them.
>
> I think for supply chain attacks, we are defending ourselves not only
> against package developers, but more importantly, we are defending
> ourselves against potential loopholes in the release process. We must
> assume that there could be something wrong during the release process of
> any project.
>
> Tian
>
> On Wed, Apr 22, 2026 at 3:32 PM Dongjoon Hyun <[email protected]> wrote:
>
>> To be clear, this discussion should be applied to Apache Spark main
>> repository only.
>>
>> https://github.com/apache/spark
>>
>> It's because subprojects need to consume Apache Spark releases ASAP. For
>> example, Apache Spark K8s Operator will upgrade its dependency on the same
>> day of Apache Spark release because we trust our release process (including
>> vote).
>>
>> In addition, probably, we may want to extend our exceptions to include
>> all ASF project releases (Apache Hadoop, Avro, Parquet, ORC, Kafka, ...)
>> which have established community vote process.
>>
>> Dongjoon.
>>
>> On 2026/04/22 22:21:41 Dongjoon Hyun wrote:
>> > Thank you for the suggestion.
>> >
>> > +1 for the general predefined (1-week) grace-period policy sounds good
>> > to me.
>> >
>> > For the exception cases, I believe we can let the PMC members make the
>> > final decision on merge timing like the PMC members decides the `Blocker`
>> > level priority of JIRA issues already.
>> >
>> > If we have a voted policy, it would be great if we can add the policy
>> > to AGENTS.md explicitly to apply the policy from the PR steps.
>> >
>> > Best,
>> > Dongjoon.
>> >
>> > On 2026/04/22 20:47:24 Steve Loughran wrote:
>> > > 7 days is long enough to catch most (all?) malicious attacks.
>> > >
>> > > Regarding developers, there's a strong case to be made for only doing
>> > > builds and especially tests in isolated containers, even though
>> > > artifacts will leak across shared containers through a shared maven
>> > > repo. It still limits the damage malicious binaries can do.
>> > >
>> > > On Tue, 21 Apr 2026 at 23:58, Jungtaek Lim <[email protected]>
>> > > wrote:
>> > >
>> > > > +1
>> > > >
>> > > > We tend to consider that merging to master branch gives some time
>> > > > to bake before releasing. But we (Spark devs) are people who build
>> > > > Spark and run some tests against the master branch almost day to
>> > > > day. For us, there is literally no time for these library upgrades
>> > > > to be baked - we are exposed to any kind of potential CVE from
>> > > > these library upgrades.
>> > > >
>> > > > It's arguable whether we should stay up to date with the recent
>> > > > release version for dependencies, but that'd probably be uneasy to
>> > > > make consensus; there is a clear trade-off. The current proposal
>> > > > sounds to me as a good compromise - IMHO delaying by 2 weeks (14
>> > > > days) seems reasonable, but strict 1 week (7 days) is better than
>> > > > nothing if anyone is concerned 2 weeks is too long.
>> > > >
>> > > > On Tue, Apr 21, 2026 at 9:45 PM Szehon Ho <[email protected]>
>> > > > wrote:
>> > > >
>> > > >> +1 make sense to me as well. We should of course be fast for
>> > > >> security upgrades, but make sense to avoid such eager upgrades for
>> > > >> the rest of the hundreds of Spark dependencies, due to the
>> > > >> increased supply chain attack risks in the ecosystem.
>> > > >>
>> > > >> Thanks
>> > > >> Szehon
>> > > >>
>> > > >> On Tue, Apr 21, 2026 at 3:32 AM Wenchen Fan <[email protected]>
>> > > >> wrote:
>> > > >>
>> > > >>> Thanks for starting this discussion! I did a data analysis a
>> > > >>> while ago but didn't have time to act on it. The analysis shows:
>> > > >>>
>> > > >>> *58* maven dep upgrades in the last 3 months.
>> > > >>> *46%* (27/58) within 7 days of release
>> > > >>> ≤7d : 27 / 58 (47%)
>> > > >>> 8d–30d : 12 / 58 (21%)
>> > > >>> >30d : 19 / 58 (32%)
>> > > >>>
>> > > >>> You can find the raw data in the attached file. This does look a
>> > > >>> bit aggressive. I build Spark locally everyday, and I believe I'm
>> > > >>> not the only one. Having a couple of weeks as the buffer time is
>> > > >>> a good idea to protect developers like me from potential supply
>> > > >>> chain attacks.
>> > > >>>
>> > > >>> On Tue, Apr 21, 2026 at 6:24 AM Hyukjin Kwon <[email protected]>
>> > > >>> wrote:
>> > > >>>
>> > > >>>> SGTM I think it's good practice to give a couple of weeks before
>> > > >>>> the upgrade
>> > > >>>>
>> > > >>>> On Tue, 21 Apr 2026 at 07:13, Tian Gao via dev
>> > > >>>> <[email protected]> wrote:
>> > > >>>>
>> > > >>>>> Hi, I want to start a discussion about our dependency upgrade
>> > > >>>>> policy for active development.
>> > > >>>>>
>> > > >>>>> Our current dependency upgrade (mostly for Java, but Python
>> > > >>>>> should be included too) is a bit spontaneous. People find that
>> > > >>>>> a dependency has a new version available and we just do the
>> > > >>>>> upgrade.
>> > > >>>>>
>> > > >>>>> This raises concerns about potential supply chain attacks. We
>> > > >>>>> already established a few sets of rules (including pinning the
>> > > >>>>> github action versions) to avoid the supply chain attack, but
>> > > >>>>> manually upgrading the dependency version too eagerly could
>> > > >>>>> also be risky.
>> > > >>>>>
>> > > >>>>> It normally takes time for a bad release to be recognized, so I
>> > > >>>>> think we should set a buffer time before upgrading to the
>> > > >>>>> latest version. For example, we can wait a week or two after
>> > > >>>>> the latest release before we set our development dependency to
>> > > >>>>> it. This could reduce the possibility of being impacted by
>> > > >>>>> malicious releases, or just give them enough time to fix their
>> > > >>>>> own severe bugs.
>> > > >>>>>
>> > > >>>>> The cost for this policy is very low - it barely impacts us if
>> > > >>>>> we can’t use the “latest” version of dependencies.
>> > > >>>>>
>> > > >>>>> Of course, there should be exceptions when dependency upgrades
>> > > >>>>> include security fixes for known vulnerabilities; we should
>> > > >>>>> upgrade as fast as possible.
>> > > >>>>>
>> > > >>>>> Tian
>> > > >>>>>
>> > > >>>>
>> > > >>>
>> > > >>> ---------------------------------------------------------------------
>> > > >>> To unsubscribe e-mail: [email protected]
