ppkarwasz commented on PR #427:
URL: https://github.com/apache/commons-codec/pull/427#issuecomment-4141870534

   > I'm not sure what Commons component the above should belong. I think you
   > mean it to belong in Codec but I can't tell what's supposed to be an interface
   > vs. implementation. Would this PR be reimplemented in terms of the above? Or
   > would this PR provide the implementation for the above?
   >
   > The name TreeBuilder is confusing to me without Javadoc. It's not building
   > a tree, it's building a byte array. Do you mean it processes a directory tree?
   > I can't tell.
   
   I am not sure which component this belongs to either.
   
   To add more context: I am trying to create SLSA Provenance attestations for 
Java builds. For such attestations to have some value, they need to record some 
invariants of the build toolchain. When you build on your local machine, the 
most important build data is what you usually add to the vote e-mail: the Maven 
and JDK version.
   
   Maven and the JDK are already unpacked on your build machine, so it is not
   possible to compute a conventional hash of their distribution archives. It is
   possible, however, to compute a `gitTree` hash of the unpacked directory, which
   is also among the [digests allowed in SLSA](https://github.com/in-toto/attestation/blob/main/spec/v1/digest_set.md#fields).
   
   That's why I am looking to introduce some support for `gitBlob` and
   `gitTree` in Commons Codec. It is probably the best choice, because three main
   libraries provide digest helpers in the Java ecosystem: `plexus-digest` (tiny
   and rarely updated), Commons Codec and Guava.
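
For context, a `gitBlob` identifier is the digest of the header `blob <length>\0` followed by the file content. The sketch below illustrates that computation with a plain `MessageDigest`; the class name `GitBlobSketch` and its helpers are hypothetical and are not part of Commons Codec or of this PR.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not an existing Commons Codec class.
public final class GitBlobSketch {

    /** Digests "blob <length>\0" followed by the content, as Git does for blob objects. */
    public static byte[] blobId(MessageDigest digest, byte[] content) {
        digest.reset();
        digest.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
        return digest.digest(content);
    }

    /** Lower-case hex encoding, written out to keep the sketch dependency-free. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] id = blobId(sha1, "hello world\n".getBytes(StandardCharsets.UTF_8));
        // Same value as `echo "hello world" | git hash-object --stdin`
        System.out.println(hex(id)); // 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    }
}
```

Passing a `MessageDigest` obtained with `"SHA-256"` instead would give the identifier used by SHA-256 Git repositories, since only the digest algorithm changes.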
   
   I am trying to introduce support for “gitTree” in two steps:
   
   ### Step 1
   
   Initially I would need to just compute `gitTree` on a file system. This PR 
tries to introduce that with the minimal API changes.
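
To make the file-system case concrete, here is a rough sketch of a `gitTree` computation over a directory. It is not the code in this PR. It assumes ASCII file names and follows symbolic links instead of recording them as mode `120000`. It also hashes empty directories and any `.git` directory like ordinary entries, which Git itself would not track. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical illustration only; simplified compared to what Git does.
public final class GitTreeSketch {

    /** Digests "<type> <length>\0" followed by the object body. */
    static byte[] objectId(MessageDigest digest, String type, byte[] body) {
        digest.reset();
        digest.update((type + " " + body.length + "\0").getBytes(StandardCharsets.US_ASCII));
        return digest.digest(body);
    }

    /** Git orders tree entries as if directory names ended with '/'. */
    static String sortKey(Path p) {
        String name = p.getFileName().toString();
        return Files.isDirectory(p) ? name + "/" : name;
    }

    public static byte[] treeId(MessageDigest digest, Path dir) throws IOException {
        List<Path> children = new ArrayList<>();
        try (Stream<Path> stream = Files.list(dir)) {
            stream.forEach(children::add);
        }
        // String order equals byte order only for ASCII names (assumption of this sketch).
        children.sort(Comparator.comparing(GitTreeSketch::sortKey));

        // The tree body is buffered, so the digest can be reused by nested calls.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Path child : children) {
            boolean isDir = Files.isDirectory(child);
            String mode = isDir ? "40000" : Files.isExecutable(child) ? "100755" : "100644";
            byte[] id = isDir
                    ? treeId(digest, child)
                    : objectId(digest, "blob", Files.readAllBytes(child));
            // Each entry is "<mode> <name>\0" followed by the raw (not hex) object id.
            body.write((mode + " " + child.getFileName() + "\0").getBytes(StandardCharsets.UTF_8));
            body.write(id);
        }
        return objectId(digest, "tree", body.toByteArray());
    }

    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("git-tree-sketch");
        Files.write(dir.resolve("hello.txt"), "hello world\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(hex(treeId(MessageDigest.getInstance("SHA-1"), dir)));
    }
}
```

Buffering each tree body before hashing keeps the sketch short, at the cost of reading every file fully into memory. A streaming variant would be preferable for large distributions.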
   
   ### Step 2
   
   Once we compute the `gitTree` SHA-1 or SHA-256 hash of an **unpacked** Maven
   distribution, we would probably like to compare it with the **packed** Maven
   tarball. This is where we should offer users a more extensive API to compute
   the `gitTree` of a virtual tree of files (like a TAR archive).
   
   Devising the best API is complex, so I would leave it for now, but I would
   keep it in mind when deciding where to put `gitBlob` and `gitTree`, so that
   we don't need to deprecate methods later.
   
   **TL;DR** What would you say about refactoring this PR to create some helper 
methods in a new `GitIdentifiers` class?
   
   ```java
   public final class GitIdentifiers {
   
       public static byte[] blobId(MessageDigest digest, byte[] content);
   
       public static byte[] blobId(MessageDigest digest, InputStream input) throws IOException;
   
       public static byte[] blobId(MessageDigest digest, Path path) throws IOException;
   
       public static byte[] treeId(MessageDigest digest, Path path) throws IOException;
   }
   ```
   
   Later on, we could extend that class to allow computing a `treeId` for other 
types of tree data.

