On Fri, 5 Nov 2021 11:31:34 GMT, Andrew Leonard <aleon...@openjdk.org> wrote:

>> This PR enables reproducible Jars, Jmods and openjdk image zip files 
>> (eg.src.zip).
>> It provides support for SOURCE_DATE_EPOCH for Jar, Jmod and underlying 
>> ZipOutputStream's.
>> It fixes the following keys issues relating to reproducibility:
>> - Jar and ZipOutputStream are not SOURCE_DATE_EPOCH aware
>>   - Jar and ZipOutputStream now detect SOURCE_DATE_EPOCH environment setting
>> - Jar and Jmod file content ordering was non-determinsitic
>>   - Fixes to Jar and Jmod main's to ensure sorted classes content ordering
>> - openjdk image zip file generation used "zip" which is non-determinsitic
>>   - New openjdk build tool "GenerateZip" which produces the final 
>> determinsitic zip files as part of the build and also detects 
>> SOURCE_DATE_EPOCH
>> 
>> Signed-off-by: Andrew Leonard <anleo...@redhat.com>
>
>> I think this is going to require discussion, a PR may be too premature. Is 
>> your goal to get the JDK build and run-time images creates with jlink to use 
>> the SOURCE_DATE_EPOCH? Are you looking for project builds that use the 
>> jar/jmod/etc. tools to use it? Or are you looking to have every library and 
>> application that uses the java.util.zip APIs to read it? If it includes the 
>> latter, and the patch in the PR suggests you are, then it will require 
>> analysis to work through the API spec changes.
>> 
>> One suggestion is to look at JDK-8231640 and PR 5372 [1]. The original 
>> proposal was to use SOURCE_DATE_EPOCH. After a lengthy discussion we 
>> converged on the system property java.properties.date rather than the env 
>> variable. I suspect that much of that discussion applies here.
>> 
>> -Alan
>> 
>> [1] https://github.com/[/pull/5372](https://github.com/openjdk/jdk/pull/5372)
> 
> @AlanBateman Hi Alan, thank you for the comments. So yes, totally agree this 
> has a way to go, I thought creating a PR would kick things off...! So just to 
> clarify the initial objectives which you raise above. My current aim is 
> actually purely an OpenJDK JDK image perspective, building on the work 
> @magicus has been doing with SOURCE_DATE_EPOCH in the build, hence my natural 
> approach to this PR. So from that perspective, it's purely a "build" change. 
> However, the openjdk build obviously uses openjdk Jar, Jmod tooling to build 
> itself, hence the direction I took in this PR.
> 
> Magnus's PR #5372 is useful thank you, I had not discovered that change, as 
> you say the direction there is similar to mine, and I understand the 
> potential pitfalls of adding doPrivileged's etc. So that resolved to a new 
> system property java.properties.date, which makes sense.
> 
> So with that in mind, and given the objective of JDK image reproducibility, 
> this would imply presumably an approach of a new cmd line option to Jar and 
> Jmod tools (eg.--source-date <date>), since these tools are cmds?
> 
> One of the key changes in this PR is to ZipOutputStream where it sets the 
> ZipEntry.xdostime if not set, to the current time. Now I am thinking given 
> the "objective", if we fix all upstream tooling to ensure they set ZipEntry 
> times, using "--source-date" (or whatever mechanism), then we can then avoid 
> changing ZipOutputStream I believe. JDK tools like jar, jmod generate/update 
> extra files like Manifests, and in fact I think the current jmod 
> implementation writes all its content using the current time.
> 
> So something along these lines:
>     jar -cf myjar.jar --source-date "<date>" * 
>     jmod create --source-date "<date>" ... my.jmod
> The jar, jmod,.. tooling then ensure all ZipEntry times are set to 
> "source-date".
> 
> I chose "source-date" name based on being synonymous with SOURCE_DATE_EPOCH, 
> implying a similar "use case".
> 
> So taking this approach for JDK tooling should achieve the reproducible JDK 
> image objective, without a java API behavior change I am hoping.
> Thoughts?
> 
> Regards
> Andrew

> @andrew-m-leonard Just to be clear: changing arguments to the command line 
> utilities is still an API change and will need a CSR.
> 
> My gut feeling is that using a system property that affect all users of 
> ZipStream is preferable. That way a user could run like `java 
> -Djava.properties.date=$(SOURCE_DATE_EPOCH) 
> -Djava.zipfile.date=$(SOURCE_DATE_EPOCH) ...` (or whatever) and get 
> reproducible behavior in all their code, for places that could otherwise be 
> hard to fix.

Thanks @magicus 
Yes, I understand a CSR is required, I was trying to state there would not be a 
java.util.zip API behavior change necessarily.

Just to state again, and although I need to check this, I think if all JDK 
tooling ensures ZipEntry's had date set to whatever <source-date> prior to 
calling ZipOutputStream.putNextEntry(e), then I think no change to 
ZipOutputStream would be needed (I think).
However, you do elude to the benefit to "others" of changing ZipOutputStream to 
use (say -Djava.zipfile.date), in that it makes it easier for consumers to make 
reproducible builds rather than tracking down all the code that calls it in 
their project.

GenerateZip as you say is slightly separate, and bespoke to the openjdk build. 
I did start by trying to create my own "zip", but that is actually quite 
involved due to how openjdk build makes zip files, and how it leverages zip 
functionality for things like --includes & --excludes. It is far easier 
actually to just construct the final deterministic zip file using this class at 
the end. But I agree, if someone could write a determinisitic "zip", that would 
be great please!
I will move GenerateZip to a separate PR, and make it take "timestamp" just 
like CreateSymbols already does.

Thanks again
Andrew

-------------

PR: https://git.openjdk.java.net/jdk/pull/6268

Reply via email to