ppkarwasz commented on PR #427: URL: https://github.com/apache/commons-codec/pull/427#issuecomment-4141870534
> I'm not sure which Commons component the above should belong to. I think you mean it to belong in Codec, but I can't tell what's supposed to be an interface vs. an implementation. Would this PR be reimplemented in terms of the above? Or would this PR provide the implementation for the above?
>
> The name TreeBuilder is confusing to me without Javadoc. It's not building a tree, it's building a byte array. Do you mean it processes a directory tree? I can't tell.

I am not sure which component this belongs to either. To add more context: I am trying to create SLSA Provenance attestations for Java builds. For such attestations to have some value, they need to record some invariants of the build toolchain. When you build on your local machine, the most important build data is what you usually add to the vote e-mail: the Maven and JDK versions.

Maven and the JDK are already unpacked on your build machine, so it's not possible to compute a classical hash of their distribution archives. It is, however, possible to compute a “gitTree” hash, which is also among the [digests allowed in SLSA](https://github.com/in-toto/attestation/blob/main/spec/v1/digest_set.md#fields). That's why I am looking to introduce some support for `gitBlob` and `gitTree` in Commons Codec. It is probably the best choice among the three main libraries that provide digest helpers in the Java ecosystem: `plexus-digest` (tiny and rarely updated), Commons Codec and Guava.

I am trying to introduce support for “gitTree” in two steps:

### Step 1

Initially, I would just need to compute `gitTree` on a file system. This PR tries to introduce that with minimal API changes.

### Step 2

Once we compute the `gitTree` SHA-1 or SHA-256 hash of an **unpacked** Maven distribution, we would probably like to compare it with the hash of the **packed** Maven tarball. This is where we should offer users a more extensive API to compute the “gitTree” of a virtual tree of files (like a TAR archive).
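A `gitBlob` or `gitTree` value is a Git object id. Git computes it by digesting a header, `<type> <size>\0`, followed by the object's content. The minimal sketch below shows this with `java.security.MessageDigest`. The `GitObjectIdExample` class and its `objectId`/`hex` methods are hypothetical illustrations, not an existing Commons Codec API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical example class; not part of Commons Codec. */
public final class GitObjectIdExample {

    /** Digest of the Git object encoding: "<type> <size>\0" followed by the raw content. */
    public static byte[] objectId(MessageDigest digest, String type, byte[] content) {
        digest.reset();
        digest.update((type + " " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
        return digest.digest(content);
    }

    /** Lowercase hexadecimal rendering of a digest. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        // prints e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the id of an empty file)
        System.out.println(hex(objectId(sha1, "blob", new byte[0])));
        // prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, like `echo 'hello world' | git hash-object --stdin`
        System.out.println(hex(objectId(sha1, "blob", "hello world\n".getBytes(StandardCharsets.UTF_8))));
        // prints 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (an empty tree has an empty body)
        System.out.println(hex(objectId(sha1, "tree", new byte[0])));
        // The same encoding digested with SHA-256 gives ids in Git's SHA-256 object format.
        System.out.println(hex(objectId(MessageDigest.getInstance("SHA-256"), "blob", new byte[0])));
    }
}
```

Because of the header, a `gitBlob` digest differs from a plain digest of the same file. This is one reason dedicated helpers are useful here.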
Devising the best API is complex, so I would leave it aside for now. I would still take it into account when deciding where to put `gitBlob` and `gitTree`, so we don't need to deprecate methods later.

**TL;DR** What would you say about refactoring this PR to create some helper methods in a new `GitIdentifiers` class?

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.security.MessageDigest;

// Proposed API outline: signatures only, method bodies omitted.
public final class GitIdentifiers {
    public static byte[] blobId(MessageDigest digest, byte[] content);
    public static byte[] blobId(MessageDigest digest, InputStream input) throws IOException;
    public static byte[] blobId(MessageDigest digest, Path path) throws IOException;
    public static byte[] treeId(MessageDigest digest, Path path) throws IOException;
}
```

Later on, we could extend that class to allow computing a `treeId` for other types of tree data.
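The proposed signatures could be filled in roughly as sketched below. This is a hypothetical implementation, not Commons Codec code, and it rests on these assumptions:

- File names are encoded in UTF-8.
- `Files.isExecutable` decides between modes `100755` and `100644`. This may report every file as executable on some platforms, such as Windows.
- Symbolic links are stored as blobs of their target path (mode `120000`).
- Empty directories are skipped, because Git cannot record them.
- Java 9 or later is available, for `InputStream.readAllBytes` and `Arrays.compareUnsigned`.

Tree entries are sorted the way Git sorts them: by name bytes, with directory names compared as if they ended in `/`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the proposed helpers; not part of Commons Codec. */
public final class GitIdentifiers {

    /** A tree entry: "<mode> <name>\0" followed by the raw child id, plus Git's sort key. */
    private static final class Entry {
        final byte[] prefix;
        final byte[] sortKey;
        final byte[] id;

        Entry(String mode, String name, boolean isTree, byte[] id) {
            this.prefix = (mode + " " + name + "\0").getBytes(StandardCharsets.UTF_8);
            // Git sorts entries by name bytes, treating directory names as if they ended in '/'.
            this.sortKey = (isTree ? name + "/" : name).getBytes(StandardCharsets.UTF_8);
            this.id = id;
        }
    }

    /** Every Git object starts with the header "<type> <size>\0". */
    private static byte[] header(String type, long size) {
        return (type + " " + size + "\0").getBytes(StandardCharsets.US_ASCII);
    }

    public static byte[] blobId(MessageDigest digest, byte[] content) {
        digest.reset();
        digest.update(header("blob", content.length));
        return digest.digest(content);
    }

    public static byte[] blobId(MessageDigest digest, InputStream input) throws IOException {
        // The header needs the content length up front, so the stream is fully buffered.
        return blobId(digest, input.readAllBytes());
    }

    public static byte[] blobId(MessageDigest digest, Path path) throws IOException {
        if (Files.isSymbolicLink(path)) {
            // Git stores a symbolic link as a blob containing its target path.
            return blobId(digest, Files.readSymbolicLink(path).toString().getBytes(StandardCharsets.UTF_8));
        }
        digest.reset();
        digest.update(header("blob", Files.size(path)));
        try (InputStream in = Files.newInputStream(path)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        return digest.digest();
    }

    public static byte[] treeId(MessageDigest digest, Path path) throws IOException {
        return hashTree(digest, entries(digest, path));
    }

    /** Collects the sorted entries of a directory, recursing into subdirectories. */
    private static List<Entry> entries(MessageDigest digest, Path directory) throws IOException {
        List<Entry> entries = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(directory)) {
            for (Path child : children) {
                String name = child.getFileName().toString();
                if (Files.isSymbolicLink(child)) {
                    entries.add(new Entry("120000", name, false, blobId(digest, child)));
                } else if (Files.isDirectory(child)) {
                    List<Entry> subEntries = entries(digest, child);
                    if (!subEntries.isEmpty()) { // Git cannot record empty directories
                        entries.add(new Entry("40000", name, true, hashTree(digest, subEntries)));
                    }
                } else {
                    String mode = Files.isExecutable(child) ? "100755" : "100644";
                    entries.add(new Entry(mode, name, false, blobId(digest, child)));
                }
            }
        }
        entries.sort((a, b) -> Arrays.compareUnsigned(a.sortKey, b.sortKey));
        return entries;
    }

    /** Serializes the entries as a Git tree object body and digests it with the "tree" header. */
    private static byte[] hashTree(MessageDigest digest, List<Entry> entries) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Entry entry : entries) {
            body.write(entry.prefix, 0, entry.prefix.length);
            body.write(entry.id, 0, entry.id.length);
        }
        digest.reset();
        digest.update(header("tree", body.size()));
        return digest.digest(body.toByteArray());
    }
}
```

Passing a SHA-256 `MessageDigest` should yield ids in Git's SHA-256 object format, since only the width of the embedded child ids changes. The `InputStream` overload is the one that sits awkwardly with the header: it must buffer the whole stream, whereas a variant that also receives the content length could stream.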
