Hi Philipp,
Manifest.java:
- Line 258: creating a new array for two characters on each call isn't
as efficient as:
out.write('\r');
out.write('\n').
The new test that need internal access can gain that access by adding:
@modules java.base/java.util.jar:+open
That instructs testng to add the correct command line switches.
Then you can remove --illegal-access=warn and the tests will work.
In the test ValueUtf8Coding, just a mention of a method to create a
string with repeats.
"-".repeat(80);
On 12/17/2018 01:42 AM, Philipp Kunz wrote:
Hi Roger,
Thank you very much for your review and the feedback. Please find my
comments below and a new patch attached.
Philipp
On Wed, 2018-12-12 at 10:52 -0500, Roger Riggs wrote:
Hi Phillip,
Sorry, got busy...
Can you rebase the patch to the current repo, it did not apply cleanly.
I know you are focused on removing the deprecation, but a few
localized improvements
would be good.
In Attributes.java : 346-337, it uses StringBuffer, please change it
to StringBuilder.
Unless thread safety is an issue, StringBuilder is recommended, it is
slightly more efficient since it does no synchronization.
I did deliberately not touch the StringBuffer in the previous patch
but fully agree now I know it has a chance to be accepted. Would you
accept to replace StringBuffer with plain string concatenation after
http://openjdk.java.net/jeps/280 which was not in place at the time
those StringBuffers were introduced?
Using "+" concat is fine.
- And the StringBuilder should be sized when it is created, to avoid
needing to resize it later.
Using a single StringBuilder all the entries, using setLength(0),
would save allocating
for each entry.
Jep 280 would also avoid having to size the buffers far as I understand.
- check the indentation @line 308-20 and 311.
The indentation was weird, 5 instead of 4 spaces on some lines and I
re-indented only the lines I touched anyway in the previous patch.
Now, some lines appear changed only due to the indentation. After the
StringBuffer removal only two of them are left in the current patch
and certainly don't add significantly many unrelated changed lines now
any more.
ok
In Manifest.java:
- write72 method !String.isEmpty() is preferred over the .length() > 0.
ok
- if the line is empty, should it write the LINE_BREAK_BYTES?
A blank line in the manifest may be seen as significant.
Before the patch, a line break was always added to the end of the
StringBuffer after passing to make72Safe and before writing it. Now
with the previous patch, write72 added it. Altogether makes no
difference. But after reconsidering your point, I found a clearer
approach, I hope, than passing an empty string for having a line break
requested to be output which now also seems to me having been not the
most obvious way and more like a kind of a hack before. In the course
of that change I also renamed write72 to println and println72 and
also added a test for it. Hope you also like it better that way.
- Line 257: println() always makes me think of the system specific
line separator.
I'd name it writeln() and write72ln. I think, write is clearer
that it is bytes being written with
no charset implications.
- in the write method: Change StringBuffer to StringBuilder
- The javadoc links to MANIFEST_VERSION and SIGNATURE_VERSION should
use "#".
* {@link Attributes.Name#MANIFEST_VERSION} or
* {@link Attributes.Name#SIGNATURE_VERSION} must be set in
removed it again because it applies more to bug 8196371 or 6910466
ok
Thanks, Roger
Thanks, Roger
On 12/04/2018 03:34 AM, Philipp Kunz wrote:
Hi Roger,
I'm afraid the previous patch missed the tests, should be included
this time.
The intention of the patch is to solve only bug 8066619 about
deprecation. I sincerely hope the changes are neutral.
The new ValueUtf8Coding test heavily coincides/overlaps with 6202130
which is why I mentioned it. I'm however not satisfied that that
test alone also completely solves 6202130 because 6202130 has or
might have implications with breaking characters encoded in UTF-8
with more than one bytes across a line break onto a continuation
line which is not part of the current patch proposed for 8066619. At
some point I proposed ValueUtf8Coding with only removing the
comments from the implementation in
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/056166.html
but I have changed my mind since then and think now 6202130 should
also change the implementation not to break lines inside of
multi-byte characters which is not part of the current patch and is
probably easier after the current patch if necessary at all. Both
6202130 and the current patch for 8066619 here touch the UTF-8
coding of manifests and which ever is solved first should add a
corresponding test because no such test exists yet I believe. Worth
to mention are test/jdk/tools/launcher/DiacriticTest.java and
test/jdk/tools/launcher/UnicodeTest.java both of which test the JVM
launch and have a somewhat different purpose. I haven't found any
other test for the specifically changed lines of code apart from
probably many tests that use manifests indirectly in some form.
Regards,
Philipp
On Mon, 2018-12-03 at 16:43 -0500, Roger Riggs wrote:
Hi Phillip,
The amount detail obscures the general purpose.
And there appears to be more than 1.
The Jira issue IDs mentioned are 8066619 and 6202130.
Is this functionally neutral and only fixes the deprecations?
There is a mention that a test is needed for multi-byte chars, but a test
is not included. Is there an existing test for that?
Its probably best to identify the main functional improvement (multi-byte)
and fix the deprecation as a side effect.
Thanks for digging through the issues and the explanations;
it will take a bit of study to unravel and understand everything in this
changeset.
Regards, Roger
On 12/01/2018 06:49 AM, Philipp Kunz wrote:
Find the proposed patch attached. Some comments and explanations,
here: There is a quite interesting implementation in Manifest and
Attributes worth quite some explanation. The way it used to work
before was: 1. put manifest header name, colon and space into a
StringBuffer -> the buffer now contains a string of characters
each high-byte of which is zero as explained later why this is
important. the high-bytes are zero because the set of allowed
characters is very limited to ":", " ", "a" - "z", "A" - "Z", "0"
- "9", "_", and "-" according to Attributes.Name#hash(String) so
far with only the name and the separator and yet without the
values. 2. if the value is not null, encode it in UTF-8 into a
byte array and instantiate a String with it using deprecated
String#String(byte[],int,int,int) resulting in a String with the
same length as the byte array before holding one byte in each
character's low-byte. This makes a difference for characters
encoded with more than one byte in UTF-8. The new String is
potentially longer than the original value. 3. if the value is not
null, append value to buffer. The one UTF-8 encoded byte per
character from the appended string is preserved also in the buffer
along with the previous buffer contents. 3alt. if the value is
null, add "null" to the buffer. See
java.lang.AbstractStringBuilder#append(String). Neither of the
characters of "null" has a non-zero high-byte encoded as UTF-16
chars. 4. make72Safe inserts line breaks with continuation spaces.
Note that the buffer here contains only one byte per character
because all high- bytes are still zero so that line.length() and
line.insert(index, ...) effectively operate with byte offsets and
not characters. 5. buffer.toString() 6.
DataOutputStream#writeBytes(String). First of all read the JavaDoc
comment for it, which explains it all: Writes out the string to
the underlying output stream as a sequence of bytes. Each
character in the string is written out, in sequence, by
discarding its high eight bits. If no exception is thrown, the
counter <code>written</code> is incremented by the length of
<code>s</code> This restores the earlier UTF-8 encoding correctly.
The topic has been discussed and mentioned already in
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/052946.ht
ml https://bugs.openjdk.java.net/browse/JDK-6202130
String(byte[],int,int,int) works "well" or "well enough" only
together with DataOutputStream#writeBytes(String). When removing
String(byte[],int,int,int) from Manifest and Attributes because
deprecated, it makes no sense to keep using
DataOutputStream#writeBytes(String) either. For the same reason
as String#String(byte[],int,int,int) has been deprecated, I
suggest to also deprecate java.io.DataOutput#writeBytes(String) as
a separate issue. This might relate to
https://bugs.openjdk.java.net/browse/JDK-6400767 but that one came
to a different conclusion some ten years ago. I preferred to stick
with the DataOutputStream even though not strictly necessary any
more. It is and has been in the API of Attributes (unfortunately
not private) and should better not be removed by changing the
parameter type. Same for Manifest#make72Safe(StringBuffer) which I
deprecated rather than having removed. Someone could have extended
a class from Manifest and use such a method and when changing the
signature it could no longer even compile in a far-fetched case.
LINE_BREAK, CONTINUATION_SPACE, LINE_BREAK_BYTES, and
LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES should prevent having to
invoke getBytes(UTF_8) over and over again on "\r\n" and "\r\n "
with the idea to slightly improve performance this way. I figured
it does not need JavaDoc comments but would be happy to add them
if desired. I removed "XXX Need to handle UTF8 values." from
Manifest#read after adding a test for it in ValueUtf8Coding. This
change and test also relate to bug 6202130 but does not solve that
one completely. ValueUtf8Coding demonstrates that Manifest can
read UTF-8 encoded values which is a necessary test case to cover
for this patch here. ValueUtf8Coding is the same test as already
submitted and suggested earlier. See
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/threa
d.html#55848 Indentation in Attributes#write(DataOutputStream) was
five spaces on most lines. I fixed indentation only on the lines
changed anyway. I replaced String#String(byte[],int,int,String)
with
String#String(byte[],int,int,java.nio.charset.StandardCharsets.UTF_8)
which as a difference does not declare to throw a
java.io.UnsupportedEncodingException. That also replaced "UTF8" as
a charset name which I would consider not optimal regarding
sun.nio.cs.UTF_8#UTF_8() and sun.nio.cs.UTF_8#historicalName(). In
my opinion there is still some duplicated or at least very similar
code in Manifest#write, Attributes#writeMain, and Attributes#write
but I preferred to change less rather than more and not to further
refactor and re-combine it. In EmptyKeysAndValues and
NullKeysAndValues tests I tried to demonstrate that the changed
implementation does not change behaviour also in edge cases. I
would have expected not having to test all these cases but then I
realized it was possible to test and is therefore possible in a
real use case as well however far-fetched. At least the if (value
!= null) { lines (three times) most obviously demand to test the
null value cases. I'm looking curiously forward to any kind of
feedback or opinion. Philipp