On Tue, Jun 13, 2017 at 7:03 AM, Claes Redestad
<claes.redes...@oracle.com <mailto:claes.redes...@oracle.com>> wrote:
<adding build-dev; see
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html
<http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html>
for missing context>
Updated webrevs:
http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
<http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
http://cr.openjdk.java.net/~redestad/8181147/top.04/
<http://cr.openjdk.java.net/%7Eredestad/8181147/top.04/>
In this version there are various cleanups based on Martin's
comments, improved test coverage and some small changes to
ensure the updated test builds and runs on all our supported
platforms.
If someone with access to any of the platforms not supported
by Oracle could take this for a spin and ensure linking the newly
added test against libjava works as expected (or suggest how
to make it work if it doesn't) then that'd be much appreciated.
Thanks!
/Claes
On 06/13/2017 02:01 AM, Martin Buchholz wrote:
I'm hoping Xueming will review as well.
+ char asciiCheck;
+ for (asciiCheck = 0, p = str; *p != '\0'; p++) {
+ asciiCheck |= *p;
+ }
+ len = (p - str);
Probably conversion from ptrdiff_t to int needs a cast.
Someday we might need to worry about input string longer than
2^31, but that's a pre-existing problem.
The signed-ness of char being implementation-defined makes me
nervous.
Maybe do all the bit operations using explicit unsigned char,
then the final test becomes simply
asciiCheck <= 0x80
On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad
<claes.redes...@oracle.com <mailto:claes.redes...@oracle.com>
<mailto:claes.redes...@oracle.com
<mailto:claes.redes...@oracle.com>>> wrote:
Hi Martin,
thanks for reviewing!
On 2017-06-12 22:14, Martin Buchholz wrote:
+/* Optimized for char set UTF-8 */
"charset" is a (poor misnomer!) jargon term, not two
words.
I got that from the existing use of "char set" in this
file, but
will fix it in all places.
---
+ for (b = str[len]; b != '\0'; len++, b =
str[len]) {
+ if (isAscii && b & 0x80) {
+ isAscii = JNI_FALSE;
+ }
+ }
I would write this more like
const signed char *p;
int isAscii;
for (isAscii = 0, p = (const signed char *) str; *p !=
'\0';
p++) isAscii &= (*p >= 0);
Did you mean for isAscii to be initialized to 1 (true) and
then be
cleared to 0 (false) when *p >= 0 is false?
Then length is (p - str)
How about something like this to hoist the second
comparison from
the loop:
int len;
char asciiCheck;
for (asciiCheck = 0, p = str; *p != '\0'; p++) {
asciiCheck |= *p;
}
len = (p - str);
if (asciiCheck & 0x80) {
// ascii fast-path
return newSizedString8859_1(env, str, len);
}
...
---
+ jbyteArray hab = NULL;
I'm having a hard time decoding the name "hab"
Not sure, but my guess is "heap allocated byteArray".
---
The code below is not UTF-8 specific. Can it be
refactored?
+ hab = (*env)->NewByteArray(env, len);
+ if (hab != 0) {
+ jclass strClazz = JNU_ClassString(env);
+ CHECK_NULL_RETURN(strClazz, 0);
+ (*env)->SetByteArrayRegion(env, hab, 0, len,
(jbyte
*)str);
+ result = (*env)->NewObject(env, strClazz,
+ String_init_ID, hab,
jnuEncoding);
+ (*env)->DeleteLocalRef(env, hab);
+ return result;
+ }
Yes, probably. The excessive copy-paste here is due to a
crash
issue I ran into when
Charset.isSupported was called without proper
initialization. It
has since been
resolved, but seems I forgot to revisit this part.
I also realized I haven't added a test for this method.
I'll look
into doing the
refactoring and adding some sanity testing tomorrow.
---
We probably want to use unicode escapes in out java
sources to
keep all source files strictly ASCII.
You mean in the test? Sure.
Refreshed the jdk webrev:
http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
<http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>
<http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/
<http://cr.openjdk.java.net/%7Eredestad/8181147/jdk.05/>> -
more to
come.
Thanks!
/Claes