Hi,
It would seem useful in the long term to move the inclusion of the date
to be part of the comment
and it would be up to the caller of store() to include a date (as a
string) where ever appropriate.
And drop the writing hardcoded date; but its not clear how to get there
given compatibility concerns.
One more option would be to add a separate method on a Properties
instance to set or omit the date written by store().
It would not have a global affect on all Properties.store calls but
would be specific to an instance.
It would still require callers to have access to the instance but not
have to modify the immediate caller of store().
Without a better understanding of the uses that affect reproducible
builds, this might not be able to address them all.
There is likely no good (clean) solution since the reproducible build
goal implies a global modification of behavior and all the alternatives
that are specific to Properties have only a local effect.
$.02, Roger
On 8/27/21 10:16 AM, Jaikiran Pai wrote:
On 26/08/21 5:06 pm, Alan Bateman wrote:
On 25/08/2021 15:57, Jaikiran Pai wrote:
:
Introducing an overloaded "store" which takes the timestamp or
implicitly using the SOURCE_DATE_EPOCH environment variable would
mean that the Properties.store(...) APIs will continue to always
write out a Date representation into the outputstream. That
effectively means that there will continue to be no way to disable
writing out a (date) comment. More specifically, even if a user
doesn't explicitly specify a comment, we would end up writing a
default (date) comment. Do we want that? Or while we are doing these
changes, should we introduce this ability of disabling writing out
these date comments by default? That, IMO, can only be done by
introducing new APIs instead of trying to slightly change the spec
and the implementation of the current "store" APIs. After all, if
any callers do want a date (and a reproducible one at that), they
could always do so by passing that as a value for the "comment"
parameter of these (new) "storeXXX" APIs.
Properties save/store has always emitted a comment line with the
current date/time, it goes back to JDK 1.0. It's unfortunate that
newer store methods didn't re-examine this, also unfortunate that it
continued with 8859-1. In the discussion on jdk-dev about
reproducibility then I think we concluded that we don't want to
change the behavior of existing methods, hence the discussion on
introducing a new method.
An new overload of store would give the maximum flexibility to new
code but it would require programs used in builds to use it
consistently. It's possible that libraries or tools that are using
Properties::store have no idea that they will ultimately be used in a
system that is trying to produce reproducible builds. So I have some
sympathy to the argument that there should a way to omit the date or
emit it as the Unix epoch time or a fixed value. This would mean
changing the spec to allow for some implementation means to do this,
and maybe an implNote that documents that it reads SOURCE_DATE_EPOCH.
I think Roger has misgivings about this.
So let's list the options and the pros/cons and see if we can
converge on an approach.
Keeping aside the discussion about whether to introduce a new API or
change the spec of the existing API, just for a moment, I think the
main question is whether the current date comment that gets added by
the store(...) is being used by anything out there, other than maybe
for visual aspects while reading the properties file. From the
discussions I have seen so far in the threads on openjdk and other
mailing lists, I don't think anyone is using that comment for
anything. So if we are indeed willing to change the spec of the
store(...) API would it be too big/unacceptable a change to disable
writing this date comment, at least in certain specific cases? With my
limited usage of this API, my guess is that it's higly unlikely that
it will break anything - after all it's a comment that was being
added. The only potential breakages perhaps would be some scripts? But
even those, I don't know how it would break them since anyway the date
comment wasn't a constant value. Having said that, I looked at the
patch that you pointed out[1] that got integrated into the JDK shipped
in Ubuntu, for introducing reproducibility. I'm a bit surprised that
the patched implementation decided to write out a formatted
reproducible date instead of patching it to just not write the date.
After all that IMO would have been much simpler and it would anyway
have changed (removed) the exact same line in the code that was
patched. So maybe I'm underestimating the usage of this date comment.
So now coming to the API part of it. The potential ways to tackle
this, that I could think of are as follows:
1. No new APIs to be added. Instead, the existing "store(...)" APIs
specification will be changed to allow for reproducibility of the
content that gets stored. The possible specification changes that we
could attempt are:
1a. The store(...) APIs by default will continue to write out a
date comment, which represents the current date. In the case where
SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs
will use that value to construct a (UTC) Date and write out the date
comment.
Pros:
----
- Existing callers of these APIs won't have to change their
code to use this new semantics.
- Tools/environments where reproducibility is required, can
configure their environment (and they probably already would have) to
set the SOURCE_DATE_EPOCH value.
Cons:
----
- Behaviour of the same API will change dynamically based on
the current environment of the process. However, this change in
behaviour will not have functional impact on the generated output,
since it's just a comment that changes.
- There is no way to disable writing out comments to the
output. i.e. even if user passes a null "comments" value to the APIs,
there will always be the date comment in the output.
- Requires spec change
OR
1b. The store(...) APIs by default will continue to write out a
date comment, which represents the current date. In the case where
SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs
will _not_ write out any date comment. This is unlike 1a where we use
the value of the environment variable to compute the Date. This
alternate approach of not using the value but instead considering the
presence of the environment variable being set, to not write out any
date comment will still provide reproducibility and also will get rid
of the unnecessary date comment.
Pros:
----
- Existing callers of these APIs won't have to change their
code to use this new semantics.
- Tools/environments where reproducibility is required, can
configure their environment (and they probably already would have) to
set the SOURCE_DATE_EPOCH value.
- The date comment will no longer end up in the output in
environments where SOURCE_DATE_EPOCH is set (to any value).
Cons:
----
- Just like 1a, the API output is determined by the current
environment of the process.
- Requires spec change.
- The only way to generate a output without any comments would
require setting the SOURCE_DATE_EPOCH even when reproducibility isn't
a necessity.
- When SOURCE_DATE_EPOCH is set, it will potentially (even if
low probability) will breaks existing tools, code(?) or scripts that
might be looking for some/any date in the comments of the output.
OR
1c. The store(...) APIs which both currently take a "comments"
parameter will write out the date comment only if the "comments"
parameter is null. If the "comments" value is non-null, then, only
that user passed comment will be written out to the output.
Pros:
----
- There will now be a way for the callers to decide what exact
comment appears in the output. It will either be their explicit
comment or a default date comment.
- For reproducibility, the callers can pass any reproducible
comment of their choice and it no longer will be the responsibility of
the store(...) API to decide what values for the comments would
provide reproducibility of the output.
(Note that the store(...) APIs will still take it upon
themselves to write out the property keys in a reproducible order,
that's now been decided in this discussion).
Cons:
----
- This "may" require source code changes in callers' code. If
callers are already passing a comment, they will no longer see the
additional default date comment being added. However, if callers are
not passing any comment, then they will continue to see the default
date comment being added. In essence, callers who aren't passing any
comment to the APIs, MUST change their code, if they want to stop
seeing the date comment.
- There is no way to disable writing out the comment to the
output. Every single store(...) output will have a comment, either
user supplied or the default date comment.
- Requires spec change.
- It will potentially (even if low probability) will breaks
existing tools, code(?) or scripts that might be looking for some/any
date in the comments of the output.
OR
1d. Since we are thinking of changing the spec, I will include
this option as well. The store(...) APIs which both current take the
"comments" parameter will write out the caller passed comment only if
the comment is non-null. No default comment will be written out. If
the caller passed comment is null, then only the property key and
values will be written out and the output will have no comments.
Pros:
----
- No code changes needed in callers' code.
- Gets rid of the (IMO, unnecessary) date comment from the
output.
- Callers now have full control over what comment gets written
out.
- There's now a way to disable writing out any comments to the
output. If callers provide no comment, then only the property key
values will be written out.
- These APIs will now feel much more natural without any "if
this, else that" or any "magic/default" comments appearing in the output.
- Reproducibility of the comments is now decided by the
callers and it no longer will be the responsibility of these
store(...) APIs to come up with reproducible comments.
(Note that the store(...) APIs will still take it upon
themselves to write out the property keys in a reproducible order,
that's now been decided in this discussion).
Cons:
----
- Requires spec change.
- It will potentially (even if low probability) will breaks
existing tools, code(?) or scripts that might be looking for some/any
date in the comments of the output.
2. Introduce a new (and only one) overloaded API as follows:
public void store(Writer writer, String comments,
java.time.Instant dateComment) throws IOException;
This new API will write out an optional date comment, if the
java.time.Instant parameter value is not null. In such a case where
the Instant value is non-null, a java.time.ZonedDateTime will be
computed out of the Instant value, for the "UTC" ZoneId and the string
representation will be written out. In the case where the Instant
parameter value is null, no date comment will be added in the output.
If the user passes the "comments" value, then that will be written
out as the comments in the output.
Pros:
----
- No spec change needed
- Callers now have control over what date gets written out as a
comment
- Callers now have control to completely disable writing out any
comments, by passing null for both "comments" and the Instant parameters
- Reproducibility of the date comment will not be a concern of
this new API and instead is controlled by the callers.
Cons:
----
- Requires code changes in callers' code. Potentially in much more
places than any of other proposed options, since this is a completely
new API.
- The new API still allows the Date comment to be written out,
which IMO adds no value. Furthermore, this now introduces 2 ways to
add a comment - one by passing value to the "comments" parameter and
the other by passing a value to the "java.time.Instant" parameter.
- Introduces/continues with, avoidable, Instant parsing and date
formatting logic/complexity into the new API.
Let me know if I might have missed any other potential option.
[1]
https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=914278;filename=reproducible-properties-timestamp.patch;msg=5
-Jaikiran