Hi,

It would seem useful in the long term to move the inclusion of the date to be part of the comment and it would be up to the caller of store() to include a date (as a string) where ever appropriate. And drop the writing hardcoded date; but its not clear how to get there given compatibility concerns.


One more option would be to add a separate method on a Properties instance to set or omit the date written by store().

It would not have a global affect on all Properties.store calls but would be specific to an instance. It would still require callers to have access to the instance but not have to modify the immediate caller of store().

Without a better understanding of the uses that affect reproducible builds, this might not be able to address them all.

There is likely no good (clean) solution since the reproducible build goal implies a global modification of behavior and all the alternatives that are specific to Properties have only a local effect.

$.02, Roger



On 8/27/21 10:16 AM, Jaikiran Pai wrote:

On 26/08/21 5:06 pm, Alan Bateman wrote:
On 25/08/2021 15:57, Jaikiran Pai wrote:
:

Introducing an overloaded "store" which takes the timestamp or implicitly using the SOURCE_DATE_EPOCH environment variable would mean that the Properties.store(...) APIs will continue to always write out a Date representation into the outputstream. That effectively means that there will continue to be no way to disable writing out a (date) comment. More specifically, even if a user doesn't explicitly specify a comment, we would end up writing a default (date) comment. Do we want that? Or while we are doing these changes, should we introduce this ability of disabling writing out these date comments by default? That, IMO, can only be done by introducing new APIs instead of trying to slightly change the spec and the implementation of the current "store" APIs. After all, if any callers do want a date (and a reproducible one at that), they could always do so by passing that as a value for the "comment" parameter of these (new) "storeXXX" APIs.

Properties save/store has always emitted a comment line with the current date/time, it goes back to JDK 1.0. It's unfortunate that newer store methods didn't re-examine this, also unfortunate that it continued with 8859-1. In the discussion on jdk-dev about reproducibility then I think we concluded that we don't want to change the behavior of existing methods, hence the discussion on introducing a new method.

An new overload of store would give the maximum flexibility to new code but it would require programs used in builds to use it consistently. It's possible that libraries or tools that are using Properties::store have no idea that they will ultimately be used in a system that is trying to produce reproducible builds. So I have some sympathy to the argument that there should a way to omit the date or emit it as the Unix epoch time or a fixed value. This would mean changing the spec to allow for some implementation means to do this, and maybe an implNote that documents that it reads SOURCE_DATE_EPOCH. I think Roger has misgivings about this.

So let's list the options and the pros/cons and see if we can converge on an approach.

Keeping aside the discussion about whether to introduce a new API or change the spec of the existing API, just for a moment, I think the main question is whether the current date comment that gets added by the store(...) is being used by anything out there, other than maybe for visual aspects while reading the properties file. From the discussions I have seen so far in the threads on openjdk and other mailing lists, I don't think anyone is using that comment for anything. So if we are indeed willing to change the spec of the store(...) API would it be too big/unacceptable a change to disable writing this date comment, at least in certain specific cases? With my limited usage of this API, my guess is that it's higly unlikely that it will break anything - after all it's a comment that was being added. The only potential breakages perhaps would be some scripts? But even those, I don't know how it would break them since anyway the date comment wasn't a constant value. Having said that, I looked at the patch that you pointed out[1] that got integrated into the JDK shipped in Ubuntu, for introducing reproducibility. I'm a bit surprised that the patched implementation decided to write out a formatted reproducible date instead of patching it to just not write the date. After all that IMO would have been much simpler and it would anyway have changed (removed) the exact same line in the code that was patched. So maybe I'm underestimating the usage of this date comment.

So now coming to the API part of it. The potential ways to tackle this, that I could think of are as follows:

1. No new APIs to be added. Instead, the existing "store(...)" APIs specification will be changed to allow for reproducibility of the content that gets stored. The possible specification changes that we could attempt are:

    1a. The store(...) APIs by default will continue to write out a date comment, which represents the current date. In the case where SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will use that value to construct a (UTC) Date and write out the date comment.

        Pros:
        ----
        - Existing callers of these APIs won't have to change their code to use this new semantics.         - Tools/environments where reproducibility is required, can configure their environment (and they probably already would have) to set the SOURCE_DATE_EPOCH value.

        Cons:
        ----
        - Behaviour of the same API will change dynamically based on the current environment of the process. However, this change in behaviour will not have functional impact on the generated output, since it's just a comment that changes.         - There is no way to disable writing out comments to the output. i.e. even if user passes a null "comments" value to the APIs, there will always be the date comment in the output.
        - Requires spec change

    OR

    1b. The store(...) APIs by default will continue to write out a date comment, which represents the current date. In the case where SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will _not_ write out any date comment. This is unlike 1a where we use the value of the environment variable to compute the Date. This alternate approach of not using the value but instead considering the presence of the environment variable being set, to not write out any date comment will still provide reproducibility and also will get rid of the unnecessary date comment.

        Pros:
        ----
        - Existing callers of these APIs won't have to change their code to use this new semantics.         - Tools/environments where reproducibility is required, can configure their environment (and they probably already would have) to set the SOURCE_DATE_EPOCH value.         - The date comment will no longer end up in the output in environments where SOURCE_DATE_EPOCH is set (to any value).

        Cons:
        ----
        - Just like 1a, the API output is determined by the current environment of the process.
        - Requires spec change.
        - The only way to generate a output without any comments would require setting the SOURCE_DATE_EPOCH even when reproducibility isn't a necessity.         - When SOURCE_DATE_EPOCH is set, it will potentially (even if low probability) will breaks existing tools, code(?) or scripts that might be looking for some/any date in the comments of the output.

    OR

    1c. The store(...) APIs which both currently take a "comments" parameter will write out the date comment only if the "comments" parameter is null. If the "comments" value is non-null, then, only that user passed comment will be written out to the output.

        Pros:
        ----
        - There will now be a way for the callers to decide what exact comment appears in the output. It will either be their explicit comment or a default date comment.         - For reproducibility, the callers can pass any reproducible comment of their choice and it no longer will be the responsibility of the store(...) API to decide what values for the comments would provide reproducibility of the output.

        (Note that the store(...) APIs will still take it upon themselves to write out the property keys in a reproducible order, that's now been decided in this discussion).

        Cons:
        ----
        - This "may" require source code changes in callers' code. If callers are already passing a comment, they will no longer see the additional default date comment being added. However, if callers are not passing any comment, then they will continue to see the default date comment being added. In essence, callers who aren't passing any comment to the APIs, MUST change their code, if they want to stop seeing the date comment.         - There is no way to disable writing out the comment to the output. Every single store(...) output will have a comment, either user supplied or the default date comment.
        - Requires spec change.
        - It will potentially (even if low probability) will breaks existing tools, code(?) or scripts that might be looking for some/any date in the comments of the output.

    OR

    1d. Since we are thinking of changing the spec, I will include this option as well. The store(...) APIs which both current take the "comments" parameter will write out the caller passed comment only if the comment is non-null. No default comment will be written out. If the caller passed comment is null, then only the property key and values will be written out and the output will have no comments.

        Pros:
        ----
        - No code changes needed in callers' code.
        - Gets rid of the (IMO, unnecessary) date comment from the output.         - Callers now have full control over what comment gets written out.         - There's now a way to disable writing out any comments to the output. If callers provide no comment, then only the property key values will be written out.         - These APIs will now feel much more natural without any "if this, else that" or any "magic/default" comments appearing in the output.         - Reproducibility of the comments is now decided by the callers and it no longer will be the responsibility of these store(...) APIs to come up with reproducible comments.

        (Note that the store(...) APIs will still take it upon themselves to write out the property keys in a reproducible order, that's now been decided in this discussion).

        Cons:
        ----
        - Requires spec change.
        - It will potentially (even if low probability) will breaks existing tools, code(?) or scripts that might be looking for some/any date in the comments of the output.

2. Introduce a new (and only one) overloaded API as follows:

    public void store(Writer writer, String comments, java.time.Instant dateComment) throws IOException;

    This new API will write out an optional date comment, if the java.time.Instant parameter value is not null. In such a case where the Instant value is non-null, a java.time.ZonedDateTime will be computed out of the Instant value, for the "UTC" ZoneId and the string representation will be written out. In the case where the Instant parameter value is null, no date comment will be added in the output.     If the user passes the "comments" value, then that will be written out as the comments in the output.

    Pros:
    ----
    - No spec change needed
    - Callers now have control over what date gets written out as a comment     - Callers now have control to completely disable writing out any comments, by passing null for both "comments" and the Instant parameters     - Reproducibility of the date comment will not be a concern of this new API and instead is controlled by the callers.

    Cons:
    ----
    - Requires code changes in callers' code. Potentially in much more places than any of other proposed options, since this is a completely new API.     - The new API still allows the Date comment to be written out, which IMO adds no value. Furthermore, this now introduces 2 ways to add a comment - one by passing value to the "comments" parameter and the other by passing a value to the "java.time.Instant" parameter.     - Introduces/continues with, avoidable, Instant parsing and date formatting logic/complexity into the new API.

Let me know if I might have missed any other potential option.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=914278;filename=reproducible-properties-timestamp.patch;msg=5

-Jaikiran



Reply via email to