Re: RFR: JDK-8066619: String(byte[],int,int,int) in String has been deprecated in Manifest and Attributes

Roger Riggs Wed, 12 Dec 2018 07:56:25 -0800

Hi Philipp,

Sorry, got busy...


Can you rebase the patch to the current repo? It did not apply cleanly.

I know you are focused on removing the deprecation, but a few localized improvements
would be good.

In Attributes.java : 346-337, it uses StringBuffer; please change it to StringBuilder.

  Unless thread safety is an issue, StringBuilder is recommended, it is
  slightly more efficient since it does no synchronization.

- And the StringBuilder should be sized when it is created, to avoid needing to resize it later.
  Using a single StringBuilder for all the entries, reset with setLength(0), would save allocating
  a new one for each entry.

 - check the indentation @line 308-20 and 311.
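
A minimal sketch of the StringBuilder suggestion above, not taken from the patch. The class name, the render helper, and the initial capacity of 72 are assumptions made for illustration only; the point is one pre-sized builder that is cleared with setLength(0) for each entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BuilderReuseSketch {

    // Hypothetical helper: renders each entry as "name: value\r\n",
    // reusing one pre-sized StringBuilder instead of allocating one per entry.
    static String render(Map<String, String> entries) {
        StringBuilder all = new StringBuilder();
        // 72 is an assumed starting capacity (roughly one manifest line).
        StringBuilder line = new StringBuilder(72);
        for (Map.Entry<String, String> e : entries.entrySet()) {
            line.setLength(0); // clears the contents but keeps the allocated capacity
            line.append(e.getKey()).append(": ").append(e.getValue()).append("\r\n");
            all.append(line);
        }
        return all.toString();
    }

    public static void main(String[] args) {
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("Manifest-Version", "1.0");
        entries.put("Created-By", "sketch");
        System.out.print(render(entries));
    }
}
```

Since the builder never escapes the method, the unsynchronized StringBuilder is sufficient here.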


In Manifest.java:
 - write72 method: !String.isEmpty() is preferred over .length() > 0.
 - if the line is empty, should it write the LINE_BREAK_BYTES?
   A blank line in the manifest may be seen as significant.

 - in the write method: Change StringBuffer to StringBuilder

- The javadoc links to MANIFEST_VERSION and SIGNATURE_VERSION should use "#".

     * {@link Attributes.Name#MANIFEST_VERSION} or
     * {@link Attributes.Name#SIGNATURE_VERSION} must be set in

Thanks, Roger


On 12/04/2018 03:34 AM, Philipp Kunz wrote:

Hi Roger,
I'm afraid the previous patch missed the tests; they should be included this time.
The intention of the patch is to solve only bug 8066619 about deprecation. I sincerely hope the changes are neutral.
The new ValueUtf8Coding test heavily coincides/overlaps with 6202130, which is why I mentioned it. However, I'm not satisfied that that test alone completely solves 6202130, because 6202130 has or might have implications for breaking characters encoded in UTF-8 with more than one byte across a line break onto a continuation line, which is not part of the current patch proposed for 8066619. At some point I proposed ValueUtf8Coding with only removing the comments from the implementation in http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/056166.html but I have changed my mind since then and now think 6202130 should also change the implementation not to break lines inside of multi-byte characters, which is not part of the current patch and is probably easier after the current patch, if necessary at all.
Both 6202130 and the current patch for 8066619 touch the UTF-8 coding of manifests, and whichever is solved first should add a corresponding test because no such test exists yet, I believe. Worth mentioning are test/jdk/tools/launcher/DiacriticTest.java and test/jdk/tools/launcher/UnicodeTest.java, both of which test the JVM launch and have a somewhat different purpose. I haven't found any other test for the specifically changed lines of code, apart from probably many tests that use manifests indirectly in some form.
Regards,
Philipp
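
To illustrate the concern about breaking a line inside a multi-byte character, here is a small sketch. It is not code from the patch or from Manifest; the class and the splitsCharacter helper are hypothetical. It only shows that cutting UTF-8 bytes at an arbitrary byte offset can land inside a character, and that the first part then no longer decodes cleanly on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SplitMultiByteSketch {

    // Hypothetical helper: true if cutting the UTF-8 bytes at 'limit' would land
    // inside a multi-byte character, i.e. the byte at 'limit' is a
    // continuation byte of the form 10xxxxxx.
    static boolean splitsCharacter(byte[] utf8, int limit) {
        return limit > 0 && limit < utf8.length && (utf8[limit] & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        // U+00E4 takes two bytes in UTF-8 (0xC3 0xA4).
        byte[] bytes = "Key: \u00E4".getBytes(StandardCharsets.UTF_8);
        int limit = bytes.length - 1; // cut between the two bytes of U+00E4
        System.out.println(splitsCharacter(bytes, limit));

        // Decoding the first part alone replaces the incomplete sequence with U+FFFD.
        String head = new String(Arrays.copyOf(bytes, limit), StandardCharsets.UTF_8);
        System.out.println(head.endsWith("\uFFFD"));
    }
}
```

A writer that wanted to avoid such breaks could move the cut position backwards until the check returns false; whether that is the right fix for 6202130 is outside this sketch.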


On Mon, 2018-12-03 at 16:43 -0500, Roger Riggs wrote:
Hi Philipp,

The amount of detail obscures the general purpose.
And there appears to be more than one.
The Jira issue IDs mentioned are 8066619 and 6202130.

Is this functionally neutral and only fixes the deprecations?

There is a mention that a test is needed for multi-byte chars, but a test
is not included.  Is there an existing test for that?

It's probably best to identify the main functional improvement (multi-byte)
and fix the deprecation as a side effect.

Thanks for digging through the issues and the explanations;
it will take a bit of study to unravel and understand everything in this
changeset.

Regards, Roger
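
For reference, a test for multi-byte characters could look roughly like the sketch below. This is not the ValueUtf8Coding test from the patch; the class name, the roundTrip helper, and the key "Some-Key" are made up for illustration. It writes a manifest with a non-ASCII value and checks that reading it back yields the same value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class Utf8ValueRoundTripSketch {

    // Hypothetical helper: writes a manifest with one main attribute
    // and returns the value read back from the written bytes.
    static String roundTrip(String key, String value) {
        try {
            Manifest mf = new Manifest();
            // A version attribute must be set, otherwise the main section is not written.
            mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            mf.getMainAttributes().putValue(key, value);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            mf.write(out);

            Manifest read = new Manifest(new ByteArrayInputStream(out.toByteArray()));
            return read.getMainAttributes().getValue(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String value = "Z\u00FCrich"; // contains a two-byte UTF-8 character
        System.out.println(value.equals(roundTrip("Some-Key", value)));
    }
}
```

The value is kept short on purpose so that no continuation line is involved; the line-break case is a separate question.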


On 12/01/2018 06:49 AM, Philipp Kunz wrote:
Find the proposed patch attached. Some comments and explanations, here:

There is a quite interesting implementation in Manifest and Attributes worth quite some explanation. The way it used to work before was:

1. put manifest header name, colon and space into a StringBuffer
   -> the buffer now contains a string of characters each high-byte of which is zero as explained later why this is important. The high-bytes are zero because the set of allowed characters is very limited to ":", " ", "a" - "z", "A" - "Z", "0" - "9", "_", and "-" according to Attributes.Name#hash(String) so far with only the name and the separator and yet without the values.
2. if the value is not null, encode it in UTF-8 into a byte array and instantiate a String with it using deprecated String#String(byte[],int,int,int) resulting in a String with the same length as the byte array before holding one byte in each character's low-byte. This makes a difference for characters encoded with more than one byte in UTF-8. The new String is potentially longer than the original value.
3. if the value is not null, append value to buffer. The one UTF-8 encoded byte per character from the appended string is preserved also in the buffer along with the previous buffer contents.
3alt. if the value is null, add "null" to the buffer. See java.lang.AbstractStringBuilder#append(String). Neither of the characters of "null" has a non-zero high-byte encoded as UTF-16 chars.
4. make72Safe inserts line breaks with continuation spaces. Note that the buffer here contains only one byte per character because all high-bytes are still zero so that line.length() and line.insert(index, ...) effectively operate with byte offsets and not characters.
5. buffer.toString()
6. DataOutputStream#writeBytes(String). First of all read the JavaDoc comment for it, which explains it all:
     Writes out the string to the underlying output stream as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits. If no exception is thrown, the counter <code>written</code> is incremented by the length of <code>s</code>
   This restores the earlier UTF-8 encoding correctly.

The topic has been discussed and mentioned already in
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/052946.html
https://bugs.openjdk.java.net/browse/JDK-6202130

String(byte[],int,int,int) works "well" or "well enough" only together with DataOutputStream#writeBytes(String). When removing String(byte[],int,int,int) from Manifest and Attributes because deprecated, it makes no sense to keep using DataOutputStream#writeBytes(String) either.

For the same reason as String#String(byte[],int,int,int) has been deprecated, I suggest to also deprecate java.io.DataOutput#writeBytes(String) as a separate issue. This might relate to https://bugs.openjdk.java.net/browse/JDK-6400767 but that one came to a different conclusion some ten years ago.

I preferred to stick with the DataOutputStream even though not strictly necessary any more. It is and has been in the API of Attributes (unfortunately not private) and should better not be removed by changing the parameter type. Same for Manifest#make72Safe(StringBuffer) which I deprecated rather than having removed. Someone could have extended a class from Manifest and use such a method and when changing the signature it could no longer even compile in a far-fetched case.

LINE_BREAK, CONTINUATION_SPACE, LINE_BREAK_BYTES, and LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES should prevent having to invoke getBytes(UTF_8) over and over again on "\r\n" and "\r\n " with the idea to slightly improve performance this way. I figured it does not need JavaDoc comments but would be happy to add them if desired.

I removed "XXX Need to handle UTF8 values." from Manifest#read after adding a test for it in ValueUtf8Coding. This change and test also relate to bug 6202130 but does not solve that one completely. ValueUtf8Coding demonstrates that Manifest can read UTF-8 encoded values which is a necessary test case to cover for this patch here. ValueUtf8Coding is the same test as already submitted and suggested earlier. See http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/thread.html#55848

Indentation in Attributes#write(DataOutputStream) was five spaces on most lines. I fixed indentation only on the lines changed anyway.

I replaced String#String(byte[],int,int,String) with String#String(byte[],int,int,java.nio.charset.StandardCharsets.UTF_8) which as a difference does not declare to throw a java.io.UnsupportedEncodingException. That also replaced "UTF8" as a charset name which I would consider not optimal regarding sun.nio.cs.UTF_8#UTF_8() and sun.nio.cs.UTF_8#historicalName().

In my opinion there is still some duplicated or at least very similar code in Manifest#write, Attributes#writeMain, and Attributes#write but I preferred to change less rather than more and not to further refactor and re-combine it.

In EmptyKeysAndValues and NullKeysAndValues tests I tried to demonstrate that the changed implementation does not change behaviour also in edge cases. I would have expected not having to test all these cases but then I realized it was possible to test and is therefore possible in a real use case as well however far-fetched. At least the if (value != null) { lines (three times) most obviously demand to test the null value cases.

I'm looking curiously forward to any kind of feedback or opinion.

Philipp
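
The interplay between String(byte[],int,int,int) and DataOutputStream#writeBytes(String) described in this message can be reproduced in isolation. The following is only a sketch under that reading, not the code in Manifest or Attributes; the class and method names are invented. It compares the old round trip with writing the UTF-8 bytes directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HiByteRoundTripSketch {

    // Old approach: one char per UTF-8 byte (high byte zero), then
    // writeBytes discards the high eight bits of each char again.
    @SuppressWarnings("deprecation")
    static byte[] viaDeprecatedConstructor(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        String onePerByte = new String(utf8, 0, 0, utf8.length); // hibyte = 0
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeBytes(onePerByte);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Alternative without the deprecated constructor: write the encoded bytes as they are.
    static byte[] viaDirectWrite(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.write(utf8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        String value = "\u00E4"; // two bytes in UTF-8: 0xC3 0xA4
        System.out.println(Arrays.equals(
                viaDeprecatedConstructor(value), viaDirectWrite(value)));
    }
}
```

Both paths are expected to emit the same bytes, which is why the constructor only works "well enough" when paired with writeBytes.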
