Re: RFR: JDK-8066619: String(byte[],int,int,int) in String has been deprecated in Manifest and Attributes

Roger Riggs Mon, 17 Dec 2018 09:14:03 -0800

Hi Philipp,

Manifest.java:

- Line 258: creating a new array for two characters on each call isn't as efficient as:

    out.write('\r');
    out.write('\n');

The new tests that need internal access can gain that access by adding:
   @modules java.base/java.util.jar:+open

That instructs jtreg to add the correct command line switches.
Then you can remove --illegal-access=warn and the tests will work.

In the test ValueUtf8Coding, just a pointer to a method that creates a string with repeats:

     "-".repeat(80);

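For comparison, a small sketch of building such a separator by hand versus with String.repeat, which is available from JDK 11 on. The class and method names are illustrative only and not from the patch.

```java
public class SeparatorExample {
    // Builds a run of dashes by hand, as one would before JDK 11.
    static String dashesWithLoop(int count) {
        StringBuilder sb = new StringBuilder(count); // sized up front, no resizing
        for (int i = 0; i < count; i++) {
            sb.append('-');
        }
        return sb.toString();
    }

    // Same result with String.repeat (JDK 11 and later).
    static String dashesWithRepeat(int count) {
        return "-".repeat(count);
    }

    public static void main(String[] args) {
        System.out.println(dashesWithLoop(80).equals(dashesWithRepeat(80))); // true
    }
}
```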

On 12/17/2018 01:42 AM, Philipp Kunz wrote:

Hi Roger,
Thank you very much for your review and the feedback. Please find my comments below and a new patch attached.
Philipp


On Wed, 2018-12-12 at 10:52 -0500, Roger Riggs wrote:
Hi Philipp,

Sorry, got busy...

Can you rebase the patch to the current repo, it did not apply cleanly.
I know you are focused on removing the deprecation, but a few localized improvements
would be good.
In Attributes.java: 346-337, it uses StringBuffer; please change it to StringBuilder.
  Unless thread safety is an issue, StringBuilder is recommended, it is
  slightly more efficient since it does no synchronization.
I deliberately did not touch the StringBuffer in the previous patch, but I fully agree now that I know it has a chance to be accepted. Would you accept replacing StringBuffer with plain string concatenation after http://openjdk.java.net/jeps/280, which was not in place at the time those StringBuffers were introduced?

Using "+" concat is fine.

- And the StringBuilder should be sized when it is created, to avoid needing to resize it later. Using a single StringBuilder for all the entries, reset with setLength(0), would save allocating one
  for each entry.
JEP 280 would also avoid having to size the buffers, as far as I understand.
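A minimal sketch contrasting the two options discussed here, assuming a plain name-to-value map rather than the real Attributes class: one StringBuilder sized once and reset with setLength(0) for every entry, versus plain "+" concatenation, which javac compiles to an invokedynamic call (JEP 280) when targeting JDK 9 or later. The class name and the initial capacity of 72 are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HeaderLineBuilding {
    // One StringBuilder, sized once and reset with setLength(0) for each entry,
    // instead of a new (synchronized) StringBuffer per entry.
    static List<String> withReusedBuilder(Map<String, String> attributes) {
        List<String> lines = new ArrayList<>();
        StringBuilder sb = new StringBuilder(72); // illustrative initial capacity
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.setLength(0);
            sb.append(e.getKey()).append(": ").append(e.getValue());
            lines.add(sb.toString());
        }
        return lines;
    }

    // Plain "+" concatenation; compiled for JDK 9 or later this becomes an
    // invokedynamic call (JEP 280), so there is no buffer to size by hand.
    static List<String> withConcatenation(Map<String, String> attributes) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            lines.add(e.getKey() + ": " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("Manifest-Version", "1.0");
        attributes.put("Created-By", "example");
        System.out.println(withReusedBuilder(attributes));
        System.out.println(withConcatenation(attributes));
    }
}
```

Both variants produce the same strings, including "null" for a null value, so the choice is mostly about readability once JEP 280 is available.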
 - check the indentation @line 308-20 and 311.
The indentation was odd, 5 instead of 4 spaces on some lines, and in the previous patch I re-indented only the lines I touched anyway. Now some lines appear changed only due to the indentation. After the StringBuffer removal only two of them are left in the current patch, and they certainly no longer add significantly many unrelated changed lines.

ok

In Manifest.java:
 - write72 method: !String.isEmpty() is preferred over .length() > 0.
ok
 - if the line is empty, should it write the LINE_BREAK_BYTES?
   A blank line in the manifest may be seen as significant.
Before the patch, a line break was always added to the end of the StringBuffer after passing it to make72Safe and before writing it. With the previous patch, write72 added it instead. Altogether that makes no difference. But after reconsidering your point, I found what I hope is a clearer approach than passing an empty string to request a line break, which now also seems to me not the most obvious way and more like a hack. In the course of that change I also renamed write72 to println and println72, and added a test for it. I hope you like it better that way.

- Line 257: println() always makes me think of the system-specific line separator. I'd name them writeln() and write72ln(). I think "write" makes it clearer that it is bytes being written, with

     no charset implications.
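A sketch of what writeln() and write72ln() might look like, assuming they write to a plain OutputStream. Both names come from the suggestion above, but the bodies are hypothetical and not taken from the patch. The line break is written as two single bytes, as suggested for line 258, so no array is allocated per call; an empty line still gets its line break. Like the old make72Safe, the folding counts bytes only, so it can split a multi-byte UTF-8 character.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ManifestLineWriter {
    // Hypothetical writeln(): a manifest line break is always CR LF, written as
    // two single bytes so that no array is allocated per call.
    static void writeln(OutputStream out) throws IOException {
        out.write('\r');
        out.write('\n');
    }

    // Hypothetical write72ln(): writes the UTF-8 bytes of 'line' followed by a
    // line break, folding it so that no physical line exceeds 72 bytes.
    // Continuation lines start with one space and so carry at most 71 bytes.
    // Like the old make72Safe, this counts bytes only and may therefore split
    // a multi-byte character across two lines.
    static void write72ln(OutputStream out, String line) throws IOException {
        byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
        int pos = 0;
        int limit = 72; // the first physical line may hold 72 bytes
        while (bytes.length - pos > limit) {
            out.write(bytes, pos, limit);
            writeln(out);
            out.write(' '); // continuation line marker
            pos += limit;
            limit = 71; // 1 space + 71 bytes = 72
        }
        out.write(bytes, pos, bytes.length - pos);
        writeln(out); // for an empty line only the line break is written
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write72ln(out, "Name: " + "x".repeat(100));
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```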


 - in the write method: Change StringBuffer to StringBuilder

- The javadoc links to MANIFEST_VERSION and SIGNATURE_VERSION should use "#":

     * {@link Attributes.Name#MANIFEST_VERSION} or
     * {@link Attributes.Name#SIGNATURE_VERSION} must be set in

Removed it again because it applies more to bug 8196371 or 6910466.

ok

Thanks, Roger

Thanks, Roger


On 12/04/2018 03:34 AM, Philipp Kunz wrote:
Hi Roger,
I'm afraid the previous patch missed the tests; they should be included this time.
The intention of the patch is to solve only bug 8066619 about the deprecation. I sincerely hope the changes are neutral.
The new ValueUtf8Coding test heavily overlaps with 6202130, which is why I mentioned it. However, I'm not satisfied that this test alone completely solves 6202130, because 6202130 has, or might have, implications for characters encoded in UTF-8 with more than one byte being broken across a line break onto a continuation line, which is not part of the current patch proposed for 8066619. At some point I proposed ValueUtf8Coding together with only removing the comments from the implementation in http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/056166.html but I have changed my mind since then. I now think 6202130 should also change the implementation not to break lines inside multi-byte characters; that is not part of the current patch and is probably easier after it, if necessary at all.

Both 6202130 and the current patch for 8066619 touch the UTF-8 coding of manifests, and whichever is solved first should add a corresponding test, because I believe no such test exists yet. Worth mentioning are test/jdk/tools/launcher/DiacriticTest.java and test/jdk/tools/launcher/UnicodeTest.java, both of which test the JVM launch and have a somewhat different purpose. I haven't found any other test for the specifically changed lines of code, apart from probably many tests that use manifests indirectly in some form.
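One way an implementation could avoid breaking lines inside multi-byte characters, as considered for 6202130, is to back off from the byte limit while the byte that would start the next line is a UTF-8 continuation byte (bit pattern 10xxxxxx). A minimal sketch; the class and method names are hypothetical and this is not part of the proposed patch.

```java
import java.nio.charset.StandardCharsets;

public class Utf8SafeBreak {
    // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // Returns the end index (exclusive) for a physical line that starts at
    // 'start' and may hold at most 'maxLen' bytes, backing off so that the
    // next line does not begin in the middle of a multi-byte character.
    static int safeBreak(byte[] utf8, int start, int maxLen) {
        int end = start + maxLen;
        if (end >= utf8.length) {
            return utf8.length; // the rest fits on this line
        }
        int candidate = end;
        while (candidate > start && isContinuationByte(utf8[candidate])) {
            candidate--;
        }
        // A valid UTF-8 sequence is at most 4 bytes long, so this only falls
        // back to a plain byte split for malformed input.
        return candidate > start ? candidate : end;
    }

    public static void main(String[] args) {
        byte[] bytes = ("a".repeat(71) + "\u00e9").getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);            // 73
        System.out.println(safeBreak(bytes, 0, 72)); // 71, not 72
    }
}
```

The trade-off is that a physical line may end up a few bytes shorter than 72, which the manifest format allows.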
Regards,
Philipp


On Mon, 2018-12-03 at 16:43 -0500, Roger Riggs wrote:
Hi Philipp,

The amount of detail obscures the general purpose.
And there appears to be more than one.
The Jira issue IDs mentioned are 8066619 and 6202130.

Is this functionally neutral and only fixes the deprecations?

There is a mention that a test is needed for multi-byte chars, but a test
is not included.  Is there an existing test for that?
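A sketch of the kind of round-trip check being asked about, using only the public java.util.jar API: write a manifest whose value is long enough to be folded onto a continuation line, then read it back. This is illustrative and is not the ValueUtf8Coding test from the patch; the header name Test-Values and the sample value are arbitrary choices. Manifest-Version is set because without it (or Signature-Version) no main attributes are written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestUtf8RoundTrip {
    // Writes 'value' as a main attribute, serializes the manifest, parses it
    // again and returns the value that was read back.
    static String roundTrip(String headerName, String value) throws IOException {
        Manifest original = new Manifest();
        Attributes main = original.getMainAttributes();
        // Without Manifest-Version (or Signature-Version) no main attributes are written.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(new Attributes.Name(headerName), value);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        original.write(out);

        Manifest parsed = new Manifest(new ByteArrayInputStream(out.toByteArray()));
        return parsed.getMainAttributes().getValue(headerName);
    }

    public static void main(String[] args) throws IOException {
        // "Test-Values: " is 13 bytes and the value is 100 bytes of two-byte
        // characters, so the line is folded, possibly inside a character.
        String value = "\u00e9".repeat(50);
        System.out.println(value.equals(roundTrip("Test-Values", value)));
    }
}
```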

It's probably best to identify the main functional improvement (multi-byte)
and fix the deprecation as a side effect.

Thanks for digging through the issues and the explanations;
it will take a bit of study to unravel and understand everything in this
changeset.

Regards, Roger


On 12/01/2018 06:49 AM, Philipp Kunz wrote:
Find the proposed patch attached. Some comments and explanations follow.

There is a quite interesting implementation in Manifest and Attributes that deserves some explanation. The way it used to work was:

1. Put the manifest header name, colon and space into a StringBuffer. The buffer now contains a string of characters whose high bytes are all zero; why this is important is explained below. The high bytes are zero because, according to Attributes.Name#hash(String), the set of allowed characters is limited to ":", " ", "a" - "z", "A" - "Z", "0" - "9", "_", and "-", at least so far, with only the name and the separator and not yet the value.
2. If the value is not null, encode it in UTF-8 into a byte array and instantiate a String with it using the deprecated String#String(byte[],int,int,int). The resulting String has the same length as the byte array, holding one byte in each character's low byte. This makes a difference for characters encoded with more than one byte in UTF-8: the new String is potentially longer than the original value.
3. If the value is not null, append it to the buffer. The one UTF-8 encoded byte per character from the appended string is preserved in the buffer along with the previous buffer contents.
3alt. If the value is null, "null" is added to the buffer; see java.lang.AbstractStringBuilder#append(String). None of the characters of "null" has a non-zero high byte when encoded as UTF-16 chars.
4. make72Safe inserts line breaks with continuation spaces. Note that the buffer here contains only one byte per character, because all high bytes are still zero, so line.length() and line.insert(index, ...) effectively operate on byte offsets and not on characters.
5. buffer.toString()
6. DataOutputStream#writeBytes(String). First of all, read its JavaDoc comment, which explains it all: Writes out the string to the underlying output stream as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits.
If no exception is thrown, the counter <code>written</code> is incremented by the length of <code>s</code>. This restores the earlier UTF-8 encoding correctly.

The topic has already been discussed in http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/052946.html and https://bugs.openjdk.java.net/browse/JDK-6202130. String(byte[],int,int,int) works "well" or "well enough" only together with DataOutputStream#writeBytes(String). When removing String(byte[],int,int,int) from Manifest and Attributes because it is deprecated, it makes no sense to keep using DataOutputStream#writeBytes(String) either. For the same reason that String#String(byte[],int,int,int) has been deprecated, I suggest also deprecating java.io.DataOutput#writeBytes(String) as a separate issue. This might relate to https://bugs.openjdk.java.net/browse/JDK-6400767, but that one came to a different conclusion some ten years ago.

I preferred to stick with the DataOutputStream even though it is not strictly necessary any more. It is, and has been, in the API of Attributes (unfortunately not private) and should better not be removed by changing the parameter type. The same goes for Manifest#make72Safe(StringBuffer), which I deprecated rather than removed: someone could have extended Manifest and used such a method, and in a far-fetched case their code could no longer even compile if the signature changed.

LINE_BREAK, CONTINUATION_SPACE, LINE_BREAK_BYTES, and LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES are meant to avoid invoking getBytes(UTF_8) over and over again on "\r\n" and "\r\n ", with the idea of slightly improving performance. I figured they do not need JavaDoc comments but would be happy to add them if desired.

I removed "XXX Need to handle UTF8 values." from Manifest#read after adding a test for it in ValueUtf8Coding. This change and test also relate to bug 6202130 but do not solve that one completely.
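The legacy write path described in steps 1 to 6 can be reproduced in a few lines to check the claim that it ends up emitting plain UTF-8. A minimal sketch that omits the make72Safe folding and the null case; the class and method names are illustrative. It compares the bytes produced via the deprecated String(byte[],int,int,int) constructor and DataOutputStream#writeBytes with a direct getBytes(UTF_8) of the same line, and shows that the intermediate String has one char per UTF-8 byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LegacyManifestEncoding {
    // Steps 2 and 3: one UTF-8 byte per char, high byte zero (deprecated constructor).
    @SuppressWarnings("deprecation")
    static String toByteChars(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, 0, 0, utf8.length); // hibyte = 0, offset = 0
    }

    // Steps 1 to 6 without folding: build the line in a StringBuffer and let
    // DataOutputStream#writeBytes discard the (zero) high byte of every char.
    static byte[] legacyLine(String name, String value) throws IOException {
        StringBuffer buffer = new StringBuffer(name);
        buffer.append(": ");
        buffer.append(toByteChars(value));
        buffer.append("\r\n");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeBytes(buffer.toString());
        out.flush();
        return bytes.toByteArray();
    }

    // The direct equivalent: encode the whole line as UTF-8 once.
    static byte[] utf8Line(String name, String value) {
        return (name + ": " + value + "\r\n").getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String value = "Gr\u00fc\u00dfe \u20ac";
        System.out.println(value.length());              // 7
        System.out.println(toByteChars(value).length()); // 11, one char per UTF-8 byte
        System.out.println(Arrays.equals(
                legacyLine("Greeting", value), utf8Line("Greeting", value))); // true
    }
}
```

The length difference (7 versus 11) is exactly why make72Safe, operating on the intermediate buffer, ended up counting bytes rather than characters.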
ValueUtf8Coding demonstrates that Manifest can read UTF-8 encoded values, which is a necessary test case to cover for this patch. ValueUtf8Coding is the same test as already submitted and suggested earlier, see http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/thread.html#55848

Indentation in Attributes#write(DataOutputStream) was five spaces on most lines. I fixed the indentation only on the lines changed anyway.

I replaced String#String(byte[],int,int,String) with String#String(byte[],int,int,Charset), passing java.nio.charset.StandardCharsets.UTF_8, which, as a difference, does not declare java.io.UnsupportedEncodingException. That also replaced "UTF8" as a charset name, which I would consider not optimal with regard to sun.nio.cs.UTF_8#UTF_8() and sun.nio.cs.UTF_8#historicalName().

In my opinion there is still some duplicated, or at least very similar, code in Manifest#write, Attributes#writeMain, and Attributes#write, but I preferred to change less rather than more and not to refactor and re-combine it further.

In the EmptyKeysAndValues and NullKeysAndValues tests I tried to demonstrate that the changed implementation does not change behaviour in edge cases either. I would have expected not to have to test all these cases, but then I realized that whatever is possible to set up in a test is also possible in a real use case, however far-fetched. At least the three if (value != null) { lines most obviously demand testing the null value cases.

I'm looking forward to any kind of feedback or opinion.

Philipp

