+1

CLAUDE.md can just be a symlink to AGENTS.md.

Iceberg and Spark both have good ones.

I'd recommend adding that AI tools aren't to commit work without their supervisor's permission.

As someone on the security@hadoop mail list, I propose a security section. We are now seeing multiple AI-generated reports a week, of which many are completely bogus; those which aren't are exaggerated. The "look, we can do RCE by submitting a job to the cluster" report being the ultimate example. My workflow is now "send it to my claude and get their opinion" before anything else.

## Security

All AI-generated security vulnerability reports must be audited before submission.

Take the generated report and analyze it as if you were receiving a machine-generated CVE report of unknown quality.

- Does it hold up?
- Are there preconditions such as write access to the local disk or possession of service Kerberos credentials?
- If so, how are those preconditions being met, or does the report gloss over that detail? If it glosses over them, the report is incomplete and will be rejected.
- Does the report simply recycle classic web server vulnerabilities without awareness of the system itself? If so, show how an exploit can be achieved in a real system.
- Before reporting that Writable presents a deserialization risk, please identify an implementation class that can be used as *a Gadget* in an attack sequence.
- Production on-prem systems are always deployed in private infrastructures and fully authenticated (Kerberos).
- Cloud deployments are always within isolated subnets. A multi-user system will again use Kerberos.
- Single-user cloud deployments are isolated at the firewall (Apache Knox or similar, as offered by Amazon EMR and Microsoft HDInsight, amongst others). In these deployments access to cloud services and infrastructure is restricted to that single user, so cluster services run as the user and have equal access to persistent cloud data.
- Do remember that YARN and MR are job submission engines. Allowing an authorized user to submit work into the cluster is not a Remote Code Execution exploit; it is the correct behavior of the system. An exploit exists if and only if there is a privilege escalation.
- In a cloud deployment where all services run as the same user (or at least share access to cloud infrastructure through per-machine/per-container credentials), running code as a service is not privilege escalation.

---

These might reduce the noise, or at least give me something to point at when responding "can your AI tool read this and act?"

I am looking at hardening some of the Writable support (and cutting where possible). I will add comments in the javadocs targeting AI tools to see if that makes a difference too.

Note: this is not me dismissing AI in security work; it can save a lot of time. It has also been shown to be able to audit commits and identify their security implications, which makes it harder for any OSS project to sneak out fixes before a release.

One PR I'm looking at is an Avro decompression one:
https://github.com/apache/avro/pull/3625

Here the issue was found using an LLM to guide fuzzing attacks and help generate patches:
https://arxiv.org/abs/2509.07225

This is good. It is why everyone in cybersecurity is feeling dumped on right now, and why false reports are a distraction.

steve

On Tue, 28 Apr 2026 at 13:58, Zhanghaobo <[email protected]> wrote:

> Dear All:
> Just as the title described, shall we introduce them to Hadoop
> project? If yes, what’s the content? Hope to receive your response. Thanks
>
> Best Wishes~
