Re: 6202130: Need to handle UTF-8 values and break up lines longer than 72 bytes

2020-03-10 Thread Philipp Kunz
Hi Naoto,
Thank you for your reply. I fully agree to your reluctance or
inhibition to change the regex part just along with the jar manifest.
On the other hand, i'm quite confident that all invocations of
Grapheme.nextBoundary with my previous patch start matching at the
previous boundary and therefore I believe the cited requirement is met
which also the existing tests confirm. Obviously, it is difficult to
prove the correctness or absence of a mistake. In fact I was kind of
hoping someone would get inspired who knows the area better than I do.
Maybe relates to JDK-8216332 or JDK-8046101 to some extent, but I guess
not exactly.
Shall we file a new bug about a performance issue in regex grapheme
matchers, the performance of which potentially depends more than
linearly and necessarily on the total input sequence size
in Grapheme.nextBoundary?Shall we file another bug about an issue of
copy and paste in GraphemeTest?I am not entitled to create new bugs
myself but if there were I might contribute a patch later.
Regards,Philipp

On Tue, 2020-03-10 at 10:45 -0700, naoto.s...@oracle.com wrote:
> Hi Philipp,
> The reason that the current implementation of Grapheme boundary check
> does it is that, some rules require information before the boundary,
> i.e. the rule GB12 in Unicode Segmentation [1] reads:
> "do not break between regional indicator (RI) symbols if there is an
> odd number of RI characters *before* the break point."
> Of course this itself can be improved, but I am not so keen on
> modifying Grapheme/Pattern code at this time, so using BreakIterator
> (with Locale.US) is more preferred solution to me.
> Naoto
> [1] https://unicode.org/reports/tr29/
> 
> 
> On 3/10/20 12:11 AM, Philipp Kunz wrote:
> > Hi Naoto and everyone,
> > When I tried the regular expression approach you proposed which
> > really looks delightful, ValueUtf8Coding test ran into a timeout,
> > which is why I changed Pattern and Grapheme classes to start the
> > loop in nextBoundary to start at the offset off rather than at the
> > beginning of the character sequence src rendering isBoundary
> > redundant. The changes to Pattern and Grapheme relate only by means
> > of performance and could otherwise be applied seperately as well. I
> > admit I'm not too confident and familiar with the regex parts of
> > the openjdk and would welcome not only thorough review as usual but
> > also any other feedback or thought shared about the matter.
> > Then I started to wonder about an issue of copy paste between
> > Grapheme and GraphemeTest already out of date and also about some
> > comments in RegExTest and if no more matches shouldn't also be
> > asserted, which also could possibly be changed in a different
> > change but still relate to the actual changes to certain extent. I
> > could not find a way to access UCD_FILES which has no or is in the
> > default package.
> > What about passing the name and the value parameters of
> > Manifest.printLine72 to it as separate parameters? Because names
> > cannot exceed 70 bytes and the allowed characters are limited to
> > ones that are represented with one byte each in UTF-8 the name
> > can/could always be printed without a line break. I don't know if
> > Manifest.println72 can be removed or has to be retained and
> > deprecated for compatibility.
> > See attached patch.
> > Looking forward to receiving any kind of feedback and
> > regards,Philipp
> > 
> > 
> > 
> > 
> > On Tue, 2020-02-11 at 14:20 -0800, naoto.s...@oracle.com wrote:
> > > Hi,
> > > Can the code in Manifest.java be simplified using regex? E.g.,
> > > var gc =
> > > Pattern.compile("\\X");gc.matcher(line).results()  .map(Match
> > > Result::group)  .forEach( // print till 72 bytes in UTF-
> > > 8  );
> > > Just a thought. If BreakIterator is preferred, it should take
> > > Locale.USas an argument to the factory method, so that it would
> > > produce the sameresult no matter what the default locale is.
> > > Naoto
> > > On 2/10/20 1:22 PM, Lance Andersen wrote:
> > > > Hi all,
> > > > Here is a webrev for the patch that Philipp is proposing which
> > > > will make it easier to review:
> > > > http://cr.openjdk.java.net/~lancea/6202130/webrev.00
> > > >    > > > > On Dec 26, 2019, at 11:50 AM, Philipp Kunz <
> > > > > philipp.k...@paratix.ch
> > > > >  
> > > > > > wrote:
> > > > > 
> > > > > Hi,The specification says, a line break in a manifest can
> > > > > occur beforeorafter a Unicode character encoded in UTF-8.
> > > > > > ...>  value: SPACE *otherchar newline
> > > > > *continuation>  continuation:  SPACE
> > > > > *othercharnewline>...>  otherchar: any UTF-8
> > > > > character except NUL, CRand LFThe current implementation
> > > > > breaks manifest lines at 72 bytes regardlessofhow the bytes
> > > > > around the break are part of a sequence of bytesencoding
> > > > > acharacter. Code points may

Re: RFR: 8240725: Some functions might not work with CJK character

2020-03-10 Thread Yasumasa Suenaga

Thanks Sato-san!

Yasumasa


On 2020/03/11 12:27, naoto.s...@oracle.com wrote:

Looks good to me. Thanks for the fix!

Naoto

On 3/10/20 5:39 PM, Yasumasa Suenaga wrote:

Hi Sato-san,

Thanks for your comment.
I uploaded new webrev:

   http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.03/

On 2020/03/11 2:08, naoto.s...@oracle.com wrote:

Hi Suenaga-san,

Thank you for the update. I think your changes to the return values of 
MultiByteToWideChar look good.

Then I noticed (should have noticed in the first place, sorry) that the error handling in canonicalize_md.c is 
incorrect (unrelated to your change). All error cases "goto finish" which would end up "free(NULL)" 
in some cases, e.g., line 340 (in the new file) should not end up calling "free(wresult)" and 
"free(wpath)"

java_md.c

- "convert_to_unicode()" has different indentation than others (2 spaces which 
should be 4).


I fixed it.



- Should "wpath" be set to NULL before returning EINVAL at line 524? I don't know the 
contract of "create_unc_path", but the caller would receive a garbage pointer as a return 
value.


create_unc_path() is static function, and it would be called by just JLI_Open().
If create_unc_path() would be failed (it means `err` is set excepting 
ERROR_SUCCESS), wpath would call free() when it is not NULL.

Thus we should set related values as below:

   A: wpath = NULL, err != ERROR_SUCCESS
   B: wpath != NULL (not free'ed), err != ERROR_SUCCESS

I implemented A in this webrev because it is simple.


Thanks,

Yasumasa



Nit: please add "noreg-hard" tag to the JIRA issue.

Naoto


On 3/9/20 7:52 PM, Yasumasa Suenaga wrote:

I forgot to free buffer if MultiByteToWideChar() failed in zip_util.c .
So I updated webrev:

   http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.02/


Yasumasa


On 2020/03/10 9:12, Yasumasa Suenaga wrote:

Hi Sato-san,

Thanks for your comment!

I updated webrev. Could you review again?

   http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.01/


Yasumasa


On 2020/03/10 2:24, naoto.s...@oracle.com wrote:

Hi Suenaga-san,

I think the return value from the second MultiByteToWideChar invocation should 
be examined (non-zero), and if it was not successful, appropriate action should 
be taken.

Naoto

On 3/9/20 3:50 AM, Yasumasa Suenaga wrote:

Hi all,

Please review this change:

   JBS: https://bugs.openjdk.java.net/browse/JDK-8240725
   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.00/

We found the issue that HotSpot does not start when it is deployed on the path 
which contains CJK character(s), and it has been fixed in JDK-8240197.

On the review of JDK-8240197 [1], we concern similar issue might occur in other 
place, and I found potentially problem in below:

- ZFILE_Open() @ zip_util.c
- JDK_Canonicalize() @ canonicalize_md.c (for Windows)
- create_unc_path() @ java_md.c (for Windows)
- Platform::MultibyteStringToWideString() @ WindowsPlatform.cpp

This change passed tests on submit repo 
(mach5-one-ysuenaga-JDK-8240725-20200309-0811-9304139).


Thanks,

Yasumasa


[1] 
https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-March/038397.html


Re: RFR: 8240725: Some functions might not work with CJK character

2020-03-10 Thread naoto . sato

Looks good to me. Thanks for the fix!

Naoto

On 3/10/20 5:39 PM, Yasumasa Suenaga wrote:

Hi Sato-san,

Thanks for your comment.
I uploaded new webrev:

   http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.03/

On 2020/03/11 2:08, naoto.s...@oracle.com wrote:

Hi Suenaga-san,

Thank you for the update. I think your changes to the return values of 
MultiByteToWideChar look good.


Then I noticed (should have noticed in the first place, sorry) that 
the error handling in canonicalize_md.c is incorrect (unrelated to 
your change). All error cases "goto finish" which would end up 
"free(NULL)" in some cases, e.g., line 340 (in the new file) should 
not end up calling "free(wresult)" and "free(wpath)"


java_md.c

- "convert_to_unicode()" has different indentation than others (2 
spaces which should be 4).


I fixed it.


- Should "wpath" be set to NULL before returning EINVAL at line 524? I 
don't know the contract of "create_unc_path", but the caller would 
receive a garbage pointer as a return value.


create_unc_path() is static function, and it would be called by just 
JLI_Open().
If create_unc_path() would be failed (it means `err` is set excepting 
ERROR_SUCCESS), wpath would call free() when it is not NULL.


Thus we should set related values as below:

   A: wpath = NULL, err != ERROR_SUCCESS
   B: wpath != NULL (not free'ed), err != ERROR_SUCCESS

I implemented A in this webrev because it is simple.


Thanks,

Yasumasa



Nit: please add "noreg-hard" tag to the JIRA issue.

Naoto


On 3/9/20 7:52 PM, Yasumasa Suenaga wrote:

I forgot to free buffer if MultiByteToWideChar() failed in zip_util.c .
So I updated webrev:

   http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.02/


Yasumasa


On 2020/03/10 9:12, Yasumasa Suenaga wrote:

Hi Sato-san,

Thanks for your comment!

I updated webrev. Could you review again?

   http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.01/


Yasumasa


On 2020/03/10 2:24, naoto.s...@oracle.com wrote:

Hi Suenaga-san,

I think the return value from the second MultiByteToWideChar 
invocation should be examined (non-zero), and if it was not 
successful, appropriate action should be taken.


Naoto

On 3/9/20 3:50 AM, Yasumasa Suenaga wrote:

Hi all,

Please review this change:

   JBS: https://bugs.openjdk.java.net/browse/JDK-8240725
   webrev: 
http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.00/


We found the issue that HotSpot does not start when it is deployed 
on the path which contains CJK character(s), and it has been fixed 
in JDK-8240197.


On the review of JDK-8240197 [1], we concern similar issue might 
occur in other place, and I found potentially problem in below:


- ZFILE_Open() @ zip_util.c
- JDK_Canonicalize() @ canonicalize_md.c (for Windows)
- create_unc_path() @ java_md.c (for Windows)
- Platform::MultibyteStringToWideString() @ WindowsPlatform.cpp

This change passed tests on submit repo 
(mach5-one-ysuenaga-JDK-8240725-20200309-0811-9304139).



Thanks,

Yasumasa


[1] 
https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-March/038397.html 



RE: RFR(S): 8240734: ModuleHashes attribute not reproducible between builds

2020-03-10 Thread Yangfei (Felix)
> -Original Message-
> From: Alan Bateman [mailto:alan.bate...@oracle.com]
> Sent: Tuesday, March 10, 2020 3:53 PM
> To: Yangfei (Felix) ; core-libs-dev@openjdk.java.net
> Subject: Re: RFR(S): 8240734: ModuleHashes attribute not reproducible
> between builds
> 
> On 10/03/2020 02:54, Yangfei (Felix) wrote:
> > Hi,
> >
> >We found module-info.class in java.base.jmod is not always consistent
> across different builds.
> >The ModuleHashes attribute in this module-info.class is not reproducible
> between builds.
> >Patch fixes the issue by using TreeMap instead of HashMap when
> computing ModuleHashes.
> >
> >Bug: https://bugs.openjdk.java.net/browse/JDK-8240734
> >Webrev: http://cr.openjdk.java.net/~fyang/8240734/webrev.00/
> >
> This looks okay (and to Bernd question, this is used at link time). I think 
> we need
> to extend the test coverage for reproducible builds, the existing
> JLinkReproducibleTest.java would ideally have caught the difference due to the
> random ordering of hashes.

Thanks for reviewing this.  Pushed.  

We see two other issues which affect building reproducible jdk are resolved in 
12: 
  1. 8214230: Classes generated by SystemModulesPlugin.java are not 
reproducible. 
  2. 8211057: Gensrc step CompileProperties generates unstable 
CompilerProperties output. 

I am inclined to backport these three to 11u so that we can get consistent 11u 
builds for verification.
What do you think? 

Best regards,
Felix


Re: RFR: 8240725: Some functions might not work with CJK character

2020-03-10 Thread Yasumasa Suenaga

Hi Sato-san,

Thanks for your comment.
I uploaded new webrev:

  http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.03/

On 2020/03/11 2:08, naoto.s...@oracle.com wrote:

Hi Suenaga-san,

Thank you for the update. I think your changes to the return values of 
MultiByteToWideChar look good.

Then I noticed (should have noticed in the first place, sorry) that the error handling in canonicalize_md.c is 
incorrect (unrelated to your change). All error cases "goto finish" which would end up "free(NULL)" 
in some cases, e.g., line 340 (in the new file) should not end up calling "free(wresult)" and 
"free(wpath)"

java_md.c

- "convert_to_unicode()" has different indentation than others (2 spaces which 
should be 4).


I fixed it.



- Should "wpath" be set to NULL before returning EINVAL at line 524? I don't know the 
contract of "create_unc_path", but the caller would receive a garbage pointer as a return 
value.


create_unc_path() is static function, and it would be called by just JLI_Open().
If create_unc_path() would be failed (it means `err` is set excepting 
ERROR_SUCCESS), wpath would call free() when it is not NULL.

Thus we should set related values as below:

  A: wpath = NULL, err != ERROR_SUCCESS
  B: wpath != NULL (not free'ed), err != ERROR_SUCCESS

I implemented A in this webrev because it is simple.


Thanks,

Yasumasa



Nit: please add "noreg-hard" tag to the JIRA issue.

Naoto


On 3/9/20 7:52 PM, Yasumasa Suenaga wrote:

I forgot to free buffer if MultiByteToWideChar() failed in zip_util.c .
So I updated webrev:

   http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.02/


Yasumasa


On 2020/03/10 9:12, Yasumasa Suenaga wrote:

Hi Sato-san,

Thanks for your comment!

I updated webrev. Could you review again?

   http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.01/


Yasumasa


On 2020/03/10 2:24, naoto.s...@oracle.com wrote:

Hi Suenaga-san,

I think the return value from the second MultiByteToWideChar invocation should 
be examined (non-zero), and if it was not successful, appropriate action should 
be taken.

Naoto

On 3/9/20 3:50 AM, Yasumasa Suenaga wrote:

Hi all,

Please review this change:

   JBS: https://bugs.openjdk.java.net/browse/JDK-8240725
   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.00/

We found the issue that HotSpot does not start when it is deployed on the path 
which contains CJK character(s), and it has been fixed in JDK-8240197.

On the review of JDK-8240197 [1], we concern similar issue might occur in other 
place, and I found potentially problem in below:

- ZFILE_Open() @ zip_util.c
- JDK_Canonicalize() @ canonicalize_md.c (for Windows)
- create_unc_path() @ java_md.c (for Windows)
- Platform::MultibyteStringToWideString() @ WindowsPlatform.cpp

This change passed tests on submit repo 
(mach5-one-ysuenaga-JDK-8240725-20200309-0811-9304139).


Thanks,

Yasumasa


[1] 
https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-March/038397.html


Re: RFR: 8240725: Some functions might not work with CJK character

2020-03-10 Thread naoto . sato

Ah, thanks Martin. Never mind about the comment wrt free(NULL) then.

Naoto

On 3/10/20 11:30 AM, Martin Buchholz wrote:



On Tue, Mar 10, 2020 at 10:08 AM > wrote:



Then I noticed (should have noticed in the first place, sorry) that the
error handling in canonicalize_md.c is incorrect (unrelated to your
change). All error cases "goto finish" which would end up "free(NULL)"
in some cases, e.g.


FYI  free(NULL) is a no-op
https://pubs.opengroup.org/onlinepubs/9699919799/functions/free.html 



Re: Review Request: JDK-8240242: improve the javadoc for Lookup::dropLookupModes w.r.t. dropping UNCONDITIONAL

2020-03-10 Thread Chris Hegarty
This looks good to me Mandy.

Thanks,
-Chris.

> On 10 Mar 2020, at 18:27, Mandy Chung  wrote:
> 
> Hi Chris,
> 
> Below is the revised patch per your suggestion.  I made some minor fixes in 
> the method name shown in the "Access modes" section as well.
> 
> Mandy
> 
> diff --git a/src/java.base/share/classes/java/lang/invoke/MethodHandles.java 
> b/src/java.base/share/classes/java/lang/invoke/MethodHandles.java
> --- a/src/java.base/share/classes/java/lang/invoke/MethodHandles.java
> +++ b/src/java.base/share/classes/java/lang/invoke/MethodHandles.java
> @@ -740,11 +740,11 @@
>   * The table below shows the access modes of a {@code Lookup} produced by
>   * any of the following factory or transformation methods:
>   * 
> - * {@link #lookup() MethodHandles.lookup()}
> - * {@link #publicLookup() MethodHandles.publicLookup()}
> - * {@link #privateLookupIn(Class, Lookup) 
> MethodHandles.privateLookupIn}
> - * {@link Lookup#in}
> - * {@link Lookup#dropLookupMode(int)}
> + * {@link #lookup() MethodHandles::lookup}
> + * {@link #publicLookup() MethodHandles::publicLookup}
> + * {@link #privateLookupIn(Class, Lookup) 
> MethodHandles::privateLookupIn}
> + * {@link Lookup#in Lookup::in}
> + * {@link Lookup#dropLookupMode(int) Lookup::dropLookupMode}
>   * 
>   *
>   * 
> @@ -1524,14 +1524,22 @@
>   * Creates a lookup on the same lookup class which this lookup object
>   * finds members, but with a lookup mode that has lost the given 
> lookup mode.
>   * The lookup mode to drop is one of {@link #PUBLIC PUBLIC}, {@link 
> #MODULE
> - * MODULE}, {@link #PACKAGE PACKAGE}, {@link #PROTECTED PROTECTED} 
> or {@link #PRIVATE PRIVATE}.
> - * {@link #PROTECTED PROTECTED} is always
> - * dropped and so the resulting lookup mode will never have this 
> access capability.
> - * When dropping {@code PACKAGE} then the resulting lookup will not 
> have {@code PACKAGE}
> - * or {@code PRIVATE} access. When dropping {@code MODULE} then the 
> resulting lookup will
> - * not have {@code MODULE}, {@code PACKAGE}, or {@code PRIVATE} 
> access. If {@code PUBLIC}
> - * is dropped then the resulting lookup has no access. If {@code 
> UNCONDITIONAL}
> - * is dropped then the resulting lookup has no access.
> + * MODULE}, {@link #PACKAGE PACKAGE}, {@link #PROTECTED PROTECTED},
> + * {@link #PRIVATE PRIVATE}, or {@link #UNCONDITIONAL UNCONDITIONAL}.
> + *
> + *  If this lookup is a {@linkplain MethodHandles#publicLookup() 
> public lookup},
> + * this lookup has {@code UNCONDITIONAL} mode set and it has no 
> other mode set.
> + * When dropping {@code UNCONDITIONAL} on a public lookup then the 
> resulting
> + * lookup has no access.
> + *
> + *  If this lookup is not a public lookup, then the following 
> applies
> + * regardless of its {@linkplain #lookupModes() lookup modes}.
> + * {@link #PROTECTED PROTECTED} is always dropped and so the 
> resulting lookup
> + * mode will never have this access capability. When dropping {@code 
> PACKAGE}
> + * then the resulting lookup will not have {@code PACKAGE} or {@code 
> PRIVATE}
> + * access. When dropping {@code MODULE} then the resulting lookup 
> will not
> + * have {@code MODULE}, {@code PACKAGE}, or {@code PRIVATE} access.
> + * When dropping {@code PUBLIC} then the resulting lookup has no 
> access.
>   *
>   * @apiNote
>   * A lookup with {@code PACKAGE} but not {@code PRIVATE} mode can 
> safely



Re: RFR: 8240725: Some functions might not work with CJK character

2020-03-10 Thread Martin Buchholz
On Tue, Mar 10, 2020 at 10:08 AM  wrote:

>
> Then I noticed (should have noticed in the first place, sorry) that the
> error handling in canonicalize_md.c is incorrect (unrelated to your
> change). All error cases "goto finish" which would end up "free(NULL)"
> in some cases, e.g.
>

FYI  free(NULL) is a no-op
https://pubs.opengroup.org/onlinepubs/9699919799/functions/free.html


Review Request: JDK-8240242: improve the javadoc for Lookup::dropLookupModes w.r.t. dropping UNCONDITIONAL

2020-03-10 Thread Mandy Chung

Hi Chris,

Below is the revised patch per your suggestion.  I made some minor fixes 
in the method name shown in the "Access modes" section as well.


Mandy

diff --git 
a/src/java.base/share/classes/java/lang/invoke/MethodHandles.java 
b/src/java.base/share/classes/java/lang/invoke/MethodHandles.java

--- a/src/java.base/share/classes/java/lang/invoke/MethodHandles.java
+++ b/src/java.base/share/classes/java/lang/invoke/MethodHandles.java
@@ -740,11 +740,11 @@
  * The table below shows the access modes of a {@code Lookup} 
produced by

  * any of the following factory or transformation methods:
  * 
- * {@link #lookup() MethodHandles.lookup()}
- * {@link #publicLookup() MethodHandles.publicLookup()}
- * {@link #privateLookupIn(Class, Lookup) 
MethodHandles.privateLookupIn}

- * {@link Lookup#in}
- * {@link Lookup#dropLookupMode(int)}
+ * {@link #lookup() MethodHandles::lookup}
+ * {@link #publicLookup() MethodHandles::publicLookup}
+ * {@link #privateLookupIn(Class, Lookup) 
MethodHandles::privateLookupIn}

+ * {@link Lookup#in Lookup::in}
+ * {@link Lookup#dropLookupMode(int) Lookup::dropLookupMode}
  * 
  *
  * 
@@ -1524,14 +1524,22 @@
  * Creates a lookup on the same lookup class which this lookup 
object
  * finds members, but with a lookup mode that has lost the 
given lookup mode.
  * The lookup mode to drop is one of {@link #PUBLIC PUBLIC}, 
{@link #MODULE
- * MODULE}, {@link #PACKAGE PACKAGE}, {@link #PROTECTED 
PROTECTED} or {@link #PRIVATE PRIVATE}.

- * {@link #PROTECTED PROTECTED} is always
- * dropped and so the resulting lookup mode will never have 
this access capability.
- * When dropping {@code PACKAGE} then the resulting lookup will 
not have {@code PACKAGE}
- * or {@code PRIVATE} access. When dropping {@code MODULE} then 
the resulting lookup will
- * not have {@code MODULE}, {@code PACKAGE}, or {@code PRIVATE} 
access. If {@code PUBLIC}
- * is dropped then the resulting lookup has no access. If 
{@code UNCONDITIONAL}

- * is dropped then the resulting lookup has no access.
+ * MODULE}, {@link #PACKAGE PACKAGE}, {@link #PROTECTED PROTECTED},
+ * {@link #PRIVATE PRIVATE}, or {@link #UNCONDITIONAL 
UNCONDITIONAL}.

+ *
+ *  If this lookup is a {@linkplain 
MethodHandles#publicLookup() public lookup},
+ * this lookup has {@code UNCONDITIONAL} mode set and it has no 
other mode set.
+ * When dropping {@code UNCONDITIONAL} on a public lookup then 
the resulting

+ * lookup has no access.
+ *
+ *  If this lookup is not a public lookup, then the 
following applies

+ * regardless of its {@linkplain #lookupModes() lookup modes}.
+ * {@link #PROTECTED PROTECTED} is always dropped and so the 
resulting lookup
+ * mode will never have this access capability. When dropping 
{@code PACKAGE}
+ * then the resulting lookup will not have {@code PACKAGE} or 
{@code PRIVATE}
+ * access. When dropping {@code MODULE} then the resulting 
lookup will not

+ * have {@code MODULE}, {@code PACKAGE}, or {@code PRIVATE} access.
+ * When dropping {@code PUBLIC} then the resulting lookup has 
no access.

  *
  * @apiNote
  * A lookup with {@code PACKAGE} but not {@code PRIVATE} mode 
can safely


Re: 6202130: Need to handle UTF-8 values and break up lines longer than 72 bytes

2020-03-10 Thread naoto . sato

Hi Philipp,

The reason that the current implementation of Grapheme boundary check 
does it is that, some rules require information before the boundary, 
i.e. the rule GB12 in Unicode Segmentation [1] reads:


"do not break between regional indicator (RI) symbols if there is an odd 
number of RI characters *before* the break point."


Of course this itself can be improved, but I am not so keen on modifying 
Grapheme/Pattern code at this time, so using BreakIterator (with 
Locale.US) is more preferred solution to me.


Naoto

[1] https://unicode.org/reports/tr29/


On 3/10/20 12:11 AM, Philipp Kunz wrote:

Hi Naoto and everyone,

When I tried the regular expression approach you proposed which really 
looks delightful, ValueUtf8Coding test ran into a timeout, which is why 
I changed Pattern and Grapheme classes to start the loop in nextBoundary 
to start at the offset off rather than at the beginning of the character 
sequence src rendering isBoundary redundant. The changes to Pattern and 
Grapheme relate only by means of performance and could otherwise be 
applied seperately as well. I admit I'm not too confident and familiar 
with the regex parts of the openjdk and would welcome not only thorough 
review as usual but also any other feedback or thought shared about the 
matter.


Then I started to wonder about an issue of copy paste between Grapheme 
and GraphemeTest already out of date and also about some comments in 
RegExTest and if no more matches shouldn't also be asserted, which also 
could possibly be changed in a different change but still relate to the 
actual changes to certain extent. I could not find a way to access 
UCD_FILES which has no or is in the default package.


What about passing the name and the value parameters of 
Manifest.printLine72 to it as separate parameters? Because names cannot 
exceed 70 bytes and the allowed characters are limited to ones that are 
represented with one byte each in UTF-8 the name can/could always be 
printed without a line break. I don't know if Manifest.println72 can be 
removed or has to be retained and deprecated for compatibility.


See attached patch.

Looking forward to receiving any kind of feedback and regards,
Philipp





On Tue, 2020-02-11 at 14:20 -0800, naoto.s...@oracle.com wrote:

Hi,

Can the code in Manifest.java be simplified using regex? E.g.,

var gc = Pattern.compile("\\X");
gc.matcher(line).results()
  .map(MatchResult::group)
  .forEach(
// print till 72 bytes in UTF-8
  );

Just a thought. If BreakIterator is preferred, it should take Locale.US
as an argument to the factory method, so that it would produce the same
result no matter what the default locale is.

Naoto

On 2/10/20 1:22 PM, Lance Andersen wrote:

Hi all,

Here is a webrev for the patch that Philipp is proposing which will make it 
easier to review:
http://cr.openjdk.java.net/~lancea/6202130/webrev.00
  <
http://cr.openjdk.java.net/~lancea/6202130/webrev.00
>


On Dec 26, 2019, at 11:50 AM, Philipp Kunz <
philipp.k...@paratix.ch
 
> wrote:

Hi,
The specification says, a line break in a manifest can occur beforeor
after a Unicode character encoded in UTF-8.

...>  value: SPACE *otherchar newline

*continuation>  continuation:  SPACE *otherchar
newline>...>  otherchar: any UTF-8 character except NUL, CR
and LF
The current implementation breaks manifest lines at 72 bytes regardless
ofhow the bytes around the break are part of a sequence of bytes
encoding acharacter. Code points may use up to four bytes when encoded
in UTF-8.Manifests with line breaks inside of sequences of bytes
encoding Unicodecharacters in UTF-8 with more than one bytes not only
are invalid UTF-8but also look ugly in text editors.For example, a
manifest could look like this:
import java.util.jar.Manifest;import java.util.jar.Attributes;import
static java.util.jar.Attributes.Name;
public class CharacterBrokenDemo1 {public static void main(String[]
args) throws Exception{Manifest mf = new
Manifest();Attributes attrs =
mf.getMainAttributes();attrs.put(Name.MANIFEST_VERSION,
"1.0");attrs.put(new Name("Some-Key"),  "Some
languages have decorated characters, " +   "for
example: español"); // or
"espa\u00D1ol"mf.write(System.out);}}
Above code produces a result as follows with some unexpected question
markswhere the encoding is invalid:

Manifest-Version: 1.0>Some-Key: Some languages have decorated

characters, for example: espa?> ?ol
This is of course an example written with actual question marks to get
a validtext for this message. The trick here is that "Some-Key" to
"example :espa"amounts to exactly one byte less encoded in UTF-8 than
would fit on one linewith the 72 byte limit so that the subsequent
character encoded with two bytesgets broken inside of the sequence of
two bytes for this character across acontinuation line break.
When 

Re: RFR(S): 8240734: ModuleHashes attribute not reproducible between builds

2020-03-10 Thread Alan Bateman

On 10/03/2020 15:10, Martin Buchholz wrote:
Aside: if all you want to do is make a use of HashMap more 
reproducible, the standard reaction is s/HashMap/LinkedHashMap/g
Yes, if the insertion order is reproducible. We've had several issues 
with trying to get to reproducible builds, I suspect this one was missed 
because the tests aren't using --keep-packaged-modules so there isn't 
JMOD files in the run-time images.


-Alan


Re: RFR: 8240725: Some functions might not work with CJK character

2020-03-10 Thread naoto . sato

Hi Suenaga-san,

Thank you for the update. I think your changes to the return values of 
MultiByteToWideChar look good.


Then I noticed (should have noticed in the first place, sorry) that the 
error handling in canonicalize_md.c is incorrect (unrelated to your 
change). All error cases "goto finish" which would end up "free(NULL)" 
in some cases, e.g., line 340 (in the new file) should not end up 
calling "free(wresult)" and "free(wpath)"


java_md.c

- "convert_to_unicode()" has different indentation than others (2 spaces 
which should be 4).


- Should "wpath" be set to NULL before returning EINVAL at line 524? I 
don't know the contract of "create_unc_path", but the caller would 
receive a garbage pointer as a return value.


Nit: please add "noreg-hard" tag to the JIRA issue.

Naoto


On 3/9/20 7:52 PM, Yasumasa Suenaga wrote:

I forgot to free buffer if MultiByteToWideChar() failed in zip_util.c .
So I updated webrev:

   http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.02/


Yasumasa


On 2020/03/10 9:12, Yasumasa Suenaga wrote:

Hi Sato-san,

Thanks for your comment!

I updated webrev. Could you review again?

   http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.01/


Yasumasa


On 2020/03/10 2:24, naoto.s...@oracle.com wrote:

Hi Suenaga-san,

I think the return value from the second MultiByteToWideChar 
invocation should be examined (non-zero), and if it was not 
successful, appropriate action should be taken.


Naoto

On 3/9/20 3:50 AM, Yasumasa Suenaga wrote:

Hi all,

Please review this change:

   JBS: https://bugs.openjdk.java.net/browse/JDK-8240725
   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240725/webrev.00/

We found the issue that HotSpot does not start when it is deployed 
on the path which contains CJK character(s), and it has been fixed 
in JDK-8240197.


On the review of JDK-8240197 [1], we concern similar issue might 
occur in other place, and I found potentially problem in below:


- ZFILE_Open() @ zip_util.c
- JDK_Canonicalize() @ canonicalize_md.c (for Windows)
- create_unc_path() @ java_md.c (for Windows)
- Platform::MultibyteStringToWideString() @ WindowsPlatform.cpp

This change passed tests on submit repo 
(mach5-one-ysuenaga-JDK-8240725-20200309-0811-9304139).



Thanks,

Yasumasa


[1] 
https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-March/038397.html 



Re: RFR(S): 8240734: ModuleHashes attribute not reproducible between builds

2020-03-10 Thread Martin Buchholz
Aside: if all you want to do is make a use of HashMap more reproducible,
the standard reaction is s/HashMap/LinkedHashMap/g

On Tue, Mar 10, 2020 at 12:53 AM Alan Bateman 
wrote:

> On 10/03/2020 02:54, Yangfei (Felix) wrote:
> > Hi,
> >
> >We found module-info.class in java.base.jmod is not always consistent
> across different builds.
> >The ModuleHashes attribute in this module-info.class is not
> reproducible between builds.
> >Patch fixes the issue by using TreeMap instead of HashMap when
> computing ModuleHashes.
> >
> >Bug: https://bugs.openjdk.java.net/browse/JDK-8240734
> >Webrev: http://cr.openjdk.java.net/~fyang/8240734/webrev.00/
> >
> This looks okay (and to Bernd question, this is used at link time). I
> think we need to extend the test coverage for reproducible builds, the
> existing JLinkReproducibleTest.java would ideally have caught the
> difference due to the random ordering of hashes.
>
> -Alan.
>
>


Re: JDK14 spec query : MethodHandles:dropLookupMode(int)

2020-03-10 Thread Chris Hegarty
Mandy,

> On 9 Mar 2020, at 18:55, Mandy Chung  wrote:
> 
> ...
> 
> Here is the spec clarification I am thinking of that may explain why the 
> focus is not whether MODULE bit is set or not.
> 
> @@ -1524,14 +1524,20 @@
>   * Creates a lookup on the same lookup class which this lookup object
>   * finds members, but with a lookup mode that has lost the given 
> lookup mode.
>   * The lookup mode to drop is one of {@link #PUBLIC PUBLIC}, {@link 
> #MODULE
> - * MODULE}, {@link #PACKAGE PACKAGE}, {@link #PROTECTED PROTECTED} 
> or {@link #PRIVATE PRIVATE}.
> - * {@link #PROTECTED PROTECTED} is always
> - * dropped and so the resulting lookup mode will never have this 
> access capability.
> - * When dropping {@code PACKAGE} then the resulting lookup will not 
> have {@code PACKAGE}
> - * or {@code PRIVATE} access. When dropping {@code MODULE} then the 
> resulting lookup will
> - * not have {@code MODULE}, {@code PACKAGE}, or {@code PRIVATE} 
> access. If {@code PUBLIC}
> - * is dropped then the resulting lookup has no access. If {@code 
> UNCONDITIONAL}
> - * is dropped then the resulting lookup has no access.
> + * MODULE}, {@link #PACKAGE PACKAGE}, {@link #PROTECTED PROTECTED},
> + * {@link #PRIVATE PRIVATE}, or {@code UNCONDITIONAL}.

All six lookup modes are enumerated. Good.

> + * If this lookup has at least {@code PUBLIC} mode then
> + * {@link #PROTECTED PROTECTED} is always dropped and so the 
> resulting lookup
> + * mode will never have this access capability.  When dropping 
> {@code PACKAGE}
> + * then the resulting lookup will not have {@code PACKAGE} or {@code 
> PRIVATE} access.
> + * When dropping {@code MODULE} then the resulting lookup will not 
> have
> + * {@code MODULE}, {@code PACKAGE}, or {@code PRIVATE} access.
> + * When dropping {@code PUBLIC} then the result lookup has no access.

I’m afraid this is still a little confusing to me ( but that could be just me! 
).

> + *  If this lookup has {@code UNCONDITIONAL} mode, this lookup is 
> a
> + * {@linkplain MethodHandles#publicLookup() public lookup} and it 
> has no
> + * other mode set.  When dropping {@code UNCONDITIONAL} on a public 
> lookup
> + * then the resulting lookup has has no access.

The public lookup scenario is very clear.

Just an idea; now that you have this new UNCONDITIONAL paragraph, it could be 
simpler to reorder things a little to build upon it. For example:


  /**
* Creates a lookup on the same lookup class which this lookup object
* finds members, but with a lookup mode that has lost the given lookup mode.
* The lookup mode to drop is one of {@link #PUBLIC PUBLIC}, {@link #MODULE
* MODULE}, {@link #PACKAGE PACKAGE}, {@link #PROTECTED PROTECTED},
* {@link #PRIVATE PRIVATE}, or {@code UNCONDITIONAL}.
*
*  If this lookup has {@code UNCONDITIONAL} mode, this lookup is a
* {@linkplain MethodHandles#publicLookup() public lookup} and it has no
* other mode set. When dropping {@code UNCONDITIONAL} on a public lookup
* then the resulting lookup has has no access.
*
*  If this lookup is not a public lookup, then the following
* applies regardless of the actual lookup modes held.
* The {@link #PROTECTED PROTECTED} mode is always dropped and so the 
resulting lookup
* mode will never have this access capability.  When dropping {@code 
PACKAGE}
* then the resulting lookup will not have {@code PACKAGE} or {@code 
PRIVATE} access.
* When dropping {@code MODULE} then the resulting lookup will not have
* {@code MODULE}, {@code PACKAGE}, or {@code PRIVATE} access.
* When dropping {@code PUBLIC} then the result lookup has no access.
* ...


-Chris.




Re: RFR(S): 8240734: ModuleHashes attribute not reproducible between builds

2020-03-10 Thread Alan Bateman

On 10/03/2020 02:54, Yangfei (Felix) wrote:

Hi,

   We found module-info.class in java.base.jmod is not always consistent across 
different builds.
   The ModuleHashes attribute in this module-info.class is not reproducible 
between builds.
   Patch fixes the issue by using TreeMap instead of HashMap when computing 
ModuleHashes.

   Bug: https://bugs.openjdk.java.net/browse/JDK-8240734
   Webrev: http://cr.openjdk.java.net/~fyang/8240734/webrev.00/

This looks okay (and to Bernd question, this is used at link time). I 
think we need to extend the test coverage for reproducible builds, the 
existing JLinkReproducibleTest.java would ideally have caught the 
difference due to the random ordering of hashes.


-Alan.



Re: 6202130: Need to handle UTF-8 values and break up lines longer than 72 bytes

2020-03-10 Thread Philipp Kunz
Hi Naoto and everyone,
When I tried the regular expression approach you proposed which really
looks delightful, ValueUtf8Coding test ran into a timeout, which is why
I changed Pattern and Grapheme classes to start the loop in
nextBoundary to start at the offset off rather than at the beginning of
the character sequence src rendering isBoundary redundant. The changes
to Pattern and Grapheme relate only by means of performance and could
otherwise be applied seperately as well. I admit I'm not too confident
and familiar with the regex parts of the openjdk and would welcome not
only thorough review as usual but also any other feedback or
thought shared about the matter.
Then I started to wonder about an issue of copy paste between Grapheme
and GraphemeTest already out of date and also about some comments in
RegExTest and if no more matches shouldn't also be asserted, which also
could possibly be changed in a different change but still relate to the
actual changes to certain extent. I could not find a way to access
UCD_FILES which has no or is in the default package.
What about passing the name and the value parameters of
Manifest.printLine72 to it as separate parameters? Because names cannot
exceed 70 bytes and the allowed characters are limited to ones that are
represented with one byte each in UTF-8 the name can/could always be
printed without a line break. I don't know if Manifest.println72 can be
removed or has to be retained and deprecated for compatibility.
See attached patch.
Looking forward to receiving any kind of feedback and regards,Philipp




On Tue, 2020-02-11 at 14:20 -0800, naoto.s...@oracle.com wrote:
> Hi,
> Can the code in Manifest.java be simplified using regex? E.g.,
> var gc =
> Pattern.compile("\\X");gc.matcher(line).results() .map(MatchResul
> t::group) .forEach(   // print till 72 bytes in UTF-8 );
> Just a thought. If BreakIterator is preferred, it should take
> Locale.US as an argument to the factory method, so that it would
> produce the same result no matter what the default locale is.
> Naoto
> On 2/10/20 1:22 PM, Lance Andersen wrote:
> > Hi all,
> > Here is a webrev for the patch that Philipp is proposing which will
> > make it easier to review:  
> > http://cr.openjdk.java.net/~lancea/6202130/webrev.00 <
> > http://cr.openjdk.java.net/~lancea/6202130/webrev.00>
> > > On Dec 26, 2019, at 11:50 AM, Philipp Kunz <
> > > philipp.k...@paratix.ch> wrote:
> > > Hi,The specification says, a line break in a manifest can occur
> > > beforeorafter a Unicode character encoded in UTF-8.
> > > >...>  value: SPACE *otherchar newline
> > > *continuation>  continuation:  SPACE
> > > *othercharnewline>...>  otherchar: any UTF-8
> > > character except NUL, CRand LFThe current implementation breaks
> > > manifest lines at 72 bytes regardlessofhow the bytes around the
> > > break are part of a sequence of bytesencoding acharacter. Code
> > > points may use up to four bytes when encodedin UTF-8.Manifests
> > > with line breaks inside of sequences of bytesencoding
> > > Unicodecharacters in UTF-8 with more than one bytes not onlyare
> > > invalid UTF-8but also look ugly in text editors.For example,
> > > amanifest could look like this:import
> > > java.util.jar.Manifest;import
> > > java.util.jar.Attributes;importstatic
> > > java.util.jar.Attributes.Name;public class CharacterBrokenDemo1
> > > {public static void main(String[]args) throws
> > > Exception{Manifest mf = newManifest();Attributes
> > > attrs
> > > =mf.getMainAttributes();attrs.put(Name.MANIFEST_VERSION,"
> > > 1.0");attrs.put(new Name("Some-
> > > Key"),  "Somelanguages have decorated characters,
> > > " +   "forexample: español"); //
> > > or"espa\u00D1ol"mf.write(System.out);}}Above code
> > > produces a result as follows with some unexpected
> > > questionmarkswhere the encoding is invalid:
> > > >Manifest-Version: 1.0>Some-Key: Some languages have
> > > > decorated
> > > characters, for example: espa?> ?olThis is of course an
> > > example written with actual question marks to geta validtext for
> > > this message. The trick here is that "Some-Key" to"example
> > > :espa"amounts to exactly one byte less encoded in UTF-8 thanwould
> > > fit on one linewith the 72 byte limit so that the
> > > subsequentcharacter encoded with two bytesgets broken inside of
> > > the sequence oftwo bytes for this character across acontinuation
> > > line break.When decoding the resulting bytes from UTF-8 as one
> > > whole string, thetwoquestion marks will not fit together again
> > > even if the line breakwith thecontinuation space is removed.
> > > However, Manifest::read removesthe continuationline breaks ("\r\n
> > > ") before decoding the manifestheader value from UTF-8 andhence
> > > can reproduce the original value.Characters encoded in UTF-8 can
> > > not only span up to four bytes for onecodepoint each, the