Re: Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])
On 16/07/2012 19:42, Xueming Shen wrote: Hi Alan, I gave the continue -else if a try, it appears the server vm obviously likes the continue approach (it is consistent with what we experienced in the past when we did similar approach for ascii, in which we separate/singled the ascii path out). So I guess we probably want to keep the continue here. Here are the data. dbcs2_new is to replace the continue with else if and the dbcs2 is the continue. http://cr.openjdk.java.net/~sherman/7183053/dbcs2_new http://cr.openjdk.java.net/~sherman/7183053/dbcs2 -Sherman Thanks for checking, I had expected it would ultimately get compiled to the same code but it seems not. In that case the changes in the webrev look fine to me. -Alan
Re: Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])
Hi Alan, I gave the continue -else if a try, it appears the server vm obviously likes the continue approach (it is consistent with what we experienced in the past when we did similar approach for ascii, in which we separate/singled the ascii path out). So I guess we probably want to keep the continue here. Here are the data. dbcs2_new is to replace the continue with else if and the dbcs2 is the continue. http://cr.openjdk.java.net/~sherman/7183053/dbcs2_new http://cr.openjdk.java.net/~sherman/7183053/dbcs2 -Sherman On 07/13/2012 10:09 AM, Xueming Shen wrote: In JDK7, the decoder and encoder implementation of most of our single-byte charsets and UTF-8 charset are optimized to implement the internal interfce sun.nio.cs.ArrayDecoder/ Encoder to provide a fastpath for String.getBytes(...) and new String(byte[]...) operations. I have an old blog regarding this optimization at https://blogs.oracle.com/xuemingshen/entry/faster_new_string_bytes_cs This rfe, as the followup for above changes, is to implement ArrayDe/Encoder for most of the sun.nio.cs.ext.DoubleByte based double-byte charsets. Here is the webrev http://cr.openjdk.java.net/~sherman/7183053/webrev I've taken a pass over this and it's great to see DoubleByte.Decoder/Encoder implementing sun.nio.cs.ArrayDecoder/Encoder. The results looks good too, a small number of regressions (Big5 at len=32 for example) but this is a micro benchmark and I'm sure there are fluctuations. I don't see anything obviously wrong with the EBCDIC changes I'd need a history book to remember how the shifts between DBCS and SBCS. I think our tests our good for this area so I'm happy. One minor nit is the continue in both encode methods, I think it would be cleaner to use else if (bb ... instead. The continue might make the vm happy, but this is the code I did last Oct, so I might be wrong. Will give a couple run later with else
Re: Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])
Am 13.07.2012 23:59, schrieb Dalibor Topic: Tim is going through then issues in Bugzilla and closing the already fixed ones, and migrating the rest over to bug.sun.com in preparation for the the JIRA migration, afaik. So if I understand Joe correct, the patches form bugs.openjdk.java.net have been copied to the appropriate bug.sun.com bugs, and later will be again publicly visible after migration to JIRA. So bugs.openjdk.java.net will be no more used as an interim solution to file patches. Please correct me if I'm wrong. I'm in a kind of vacation from OpenJDK development. One reason is the missing appreciation from official side. All attempts to at least have an account on http://cr.openjdk.java.net/, for easier publishing webrevs, have ended in *no answer* regardless my activity: See [Who Sent It?] on: http://markmail.org/search/+list:net.java.openjdk.core-libs-dev If this patch is eventually headed for JDK 7 Updates, I could grant you an Author role there, if you wish. Thanks, that would be great. Please use the email of above From: field, not my old GMX address. -Ulf
Re: Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])
On 11/07/2012 00:11, Xueming Shen wrote: Hi, In JDK7, the decoder and encoder implementation of most of our single-byte charsets and UTF-8 charset are optimized to implement the internal interfce sun.nio.cs.ArrayDecoder/ Encoder to provide a fastpath for String.getBytes(...) and new String(byte[]...) operations. I have an old blog regarding this optimization at https://blogs.oracle.com/xuemingshen/entry/faster_new_string_bytes_cs This rfe, as the followup for above changes, is to implement ArrayDe/Encoder for most of the sun.nio.cs.ext.DoubleByte based double-byte charsets. Here is the webrev http://cr.openjdk.java.net/~sherman/7183053/webrev I've taken a pass over this and it's great to see DoubleByte.Decoder/Encoder implementing sun.nio.cs.ArrayDecoder/Encoder. The results looks good too, a small number of regressions (Big5 at len=32 for example) but this is a micro benchmark and I'm sure there are fluctuations. I don't see anything obviously wrong with the EBCDIC changes I'd need a history book to remember how the shifts between DBCS and SBCS. I think our tests our good for this area so I'm happy. One minor nit is the continue in both encode methods, I think it would be cleaner to use else if (bb ... instead. I see in TestStringCoding.java that you've commented out the test that goes over the buffer limit - would I be correct to say that this isn't an issue and this happens with DB charsets today? Ulf - you've got several patches to the double byte charsets and I wonder if you have cycles to try Sherman's patch with jdk8 to see if there is any more to be gained? -Alan.
Re: Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])
On 07/13/2012 05:19 AM, Alan Bateman wrote: On 11/07/2012 00:11, Xueming Shen wrote: Hi, In JDK7, the decoder and encoder implementation of most of our single-byte charsets and UTF-8 charset are optimized to implement the internal interfce sun.nio.cs.ArrayDecoder/ Encoder to provide a fastpath for String.getBytes(...) and new String(byte[]...) operations. I have an old blog regarding this optimization at https://blogs.oracle.com/xuemingshen/entry/faster_new_string_bytes_cs This rfe, as the followup for above changes, is to implement ArrayDe/Encoder for most of the sun.nio.cs.ext.DoubleByte based double-byte charsets. Here is the webrev http://cr.openjdk.java.net/~sherman/7183053/webrev I've taken a pass over this and it's great to see DoubleByte.Decoder/Encoder implementing sun.nio.cs.ArrayDecoder/Encoder. The results looks good too, a small number of regressions (Big5 at len=32 for example) but this is a micro benchmark and I'm sure there are fluctuations. I don't see anything obviously wrong with the EBCDIC changes I'd need a history book to remember how the shifts between DBCS and SBCS. I think our tests our good for this area so I'm happy. One minor nit is the continue in both encode methods, I think it would be cleaner to use else if (bb ... instead. The continue might make the vm happy, but this is the code I did last Oct, so I might be wrong. Will give a couple run later with else I see in TestStringCoding.java that you've commented out the test that goes over the buffer limit - would I be correct to say that this isn't an issue and this happens with DB charsets today? This is also true for utf-8 I did last year, but utf-8 is excluded at the beginning of the test. For SB, it takes the advantage that the output char[] should always be the same as the length of the input bytes, so this can be checked at the very beginning together. For mb, to check both sp and dp slow down the de/encoding (vm obviously does not like too many ifs). Given this is an internal interface used exclusively by StringCoding, in which it has already calculated the max buf to feed in, I think this is something that can be optimized. -Sherman Ulf - you've got several patches to the double byte charsets and I wonder if you have cycles to try Sherman's patch with jdk8 to see if there is any more to be gained? -Alan.
Re: Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])
Am 13.07.2012 14:19, schrieb Alan Bateman: Ulf - you've got several patches to the double byte charsets and I wonder if you have cycles to try Sherman's patch with jdk8 to see if there is any more to be gained? First, most of those patches have been closed from official side, don't really understand why: https://bugs.openjdk.java.net/buglist.cgi?query_format=advancedshort_desc_type=allwordssubstrshort_desc=long_desc_type=substringlong_desc=bug_file_loc_type=allwordssubstrbug_file_loc=status_whiteboard_type=allwordssubstrstatus_whiteboard=emailassigned_to1=1emailtype1=substringemail1=emailassigned_to2=1emailreporter2=1emailqa_contact2=1emailcc2=1emailtype2=substringemail2=ulf.zibisbugidtype=includebug_id=chfieldfrom=chfieldto=Nowchfieldvalue=cmdtype=doitorder=Importancefield0-0-0=nooptype0-0-0=noopvalue0-0-0= Yes, I have many work prepared for performance tuning upon charsets, but I do not have JDK-8 environment installed at the moment. I'm in a kind of vacation from OpenJDK development. One reason is the missing appreciation from official side. All attempts to at least have an account on http://cr.openjdk.java.net/, for easier publishing webrevs, have ended in *no answer* regardless my activity: See [Who Sent It?] on: http://markmail.org/search/+list:net.java.openjdk.core-libs-dev -Ulf
Re: Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])
Am 13.07.2012 14:19, schrieb Alan Bateman: Ulf - you've got several patches to the double byte charsets and I wonder if you have cycles to try Sherman's patch with jdk8 to see if there is any more to be gained? Additionally, I'm wondering, that EUC_TW is not part of the patch. I've worked on that for some months last year, gaining an improvement of ~30 % but have no final result though. -Ulf
Re: Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])
On 13/07/2012 20:22, Ulf Zibis wrote: First, most of those patches have been closed from official side, don't really understand why: https://bugs.openjdk.java.net/buglist.cgi?query_format=advancedshort_desc_type=allwordssubstrshort_desc=long_desc_type=substringlong_desc=bug_file_loc_type=allwordssubstrbug_file_loc=status_whiteboard_type=allwordssubstrstatus_whiteboard=emailassigned_to1=1emailtype1=substringemail1=emailassigned_to2=1emailreporter2=1emailqa_contact2=1emailcc2=1emailtype2=substringemail2=ulf.zibisbugidtype=includebug_id=chfieldfrom=chfieldto=Nowchfieldvalue=cmdtype=doitorder=Importancefield0-0-0=nooptype0-0-0=noopvalue0-0-0= Sorry, I don't know, looks like Tim has been doing some cleanup. From a brief scan then at least some of these were fixed or closed a long time ago. You do need author role to push webrevs to cr.openjdk.java.net, I had thought were had this already but it seems not. -Alan.
Re: Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])
On 7/13/12 9:22 PM, Ulf Zibis wrote: Am 13.07.2012 14:19, schrieb Alan Bateman: Ulf - you've got several patches to the double byte charsets and I wonder if you have cycles to try Sherman's patch with jdk8 to see if there is any more to be gained? First, most of those patches have been closed from official side, don't really understand why: https://bugs.openjdk.java.net/buglist.cgi?query_format=advancedshort_desc_type=allwordssubstrshort_desc=long_desc_type=substringlong_desc=bug_file_loc_type=allwordssubstrbug_file_loc=status_whiteboard_type=allwordssubstrstatus_whiteboard=emailassigned_to1=1emailtype1=substringemail1=emailassigned_to2=1emailreporter2=1emailqa_contact2=1emailcc2=1emailtype2=substringemail2=ulf.zibisbugidtype=includebug_id=chfieldfrom=chfieldto=Nowchfieldvalue=cmdtype=doitorder=Importancefield0-0-0=nooptype0-0-0=noopvalue0-0-0= Tim is going through then issues in Bugzilla and closing the already fixed ones, and migrating the rest over to bug.sun.com in preparation for the the JIRA migration, afaik. I'm in a kind of vacation from OpenJDK development. One reason is the missing appreciation from official side. All attempts to at least have an account on http://cr.openjdk.java.net/, for easier publishing webrevs, have ended in *no answer* regardless my activity: See [Who Sent It?] on: http://markmail.org/search/+list:net.java.openjdk.core-libs-dev If this patch is eventually headed for JDK 7 Updates, I could grant you an Author role there, if you wish. cheers, dalibor topic -- Oracle http://www.oracle.com Dalibor Topic | Principal Product Manager Phone: +494089091214 tel:+494089091214 | Mobile: +491737185961 tel:+491737185961 Oracle Java Platform Group ORACLE Deutschland B.V. Co. KG | Nagelsweg 55 | 20097 Hamburg ORACLE Deutschland B.V. Co. KG Hauptverwaltung: Riesstr. 25, D-80992 München Registergericht: Amtsgericht München, HRA 95603 Geschäftsführer: Jürgen Kunz Komplementärin: ORACLE Deutschland Verwaltung B.V. Hertogswetering 163/167, 3543 AS Utrecht, Niederlande Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697 Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher Green Oracle http://www.oracle.com/commitment Oracle is committed to developing practices and products that help protect the environment
Re: Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])
Hello, On 7/13/2012 2:59 PM, Dalibor Topic wrote: On 7/13/12 9:22 PM, Ulf Zibis wrote: Am 13.07.2012 14:19, schrieb Alan Bateman: Ulf - you've got several patches to the double byte charsets and I wonder if you have cycles to try Sherman's patch with jdk8 to see if there is any more to be gained? First, most of those patches have been closed from official side, don't really understand why: https://bugs.openjdk.java.net/buglist.cgi?query_format=advancedshort_desc_type=allwordssubstrshort_desc=long_desc_type=substringlong_desc=bug_file_loc_type=allwordssubstrbug_file_loc=status_whiteboard_type=allwordssubstrstatus_whiteboard=emailassigned_to1=1emailtype1=substringemail1=emailassigned_to2=1emailreporter2=1emailqa_contact2=1emailcc2=1emailtype2=substringemail2=ulf.zibisbugidtype=includebug_id=chfieldfrom=chfieldto=Nowchfieldvalue=cmdtype=doitorder=Importancefield0-0-0=nooptype0-0-0=noopvalue0-0-0= Tim is going through then issues in Bugzilla and closing the already fixed ones, and migrating the rest over to bug.sun.com in preparation for the the JIRA migration, afaik. That is correct. I've asked Tim to go over the Bugzilla issues as part of the effort to move the JDK to JIRA: https://blogs.oracle.com/darcy/entry/moving_monarchs_dragons Bugzilla issues will be closed out if appropriate and patches, etc. moved from Bugzilla into the legacy bugtraq system so that they are then moved forward as a consequence of the bugtraq to JIRA migration. HTH, -Joe
Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])
Hi, In JDK7, the decoder and encoder implementation of most of our single-byte charsets and UTF-8 charset are optimized to implement the internal interfce sun.nio.cs.ArrayDecoder/ Encoder to provide a fastpath for String.getBytes(...) and new String(byte[]...) operations. I have an old blog regarding this optimization at https://blogs.oracle.com/xuemingshen/entry/faster_new_string_bytes_cs This rfe, as the followup for above changes, is to implement ArrayDe/Encoder for most of the sun.nio.cs.ext.DoubleByte based double-byte charsets. Here is the webrev http://cr.openjdk.java.net/~sherman/7183053/webrev The results of the non -scientific benchmark StrCodingBenchmarkDB running on client and server vm on my linux machine are included in docs_c (client) and docs_s(server) below. http://cr.openjdk.java.net/~sherman/7183053/dbcs_c http://cr.openjdk.java.net/~sherman/7183053/dbcs_s Numbers are the time spent on the decoding/encoding operations, the smaller the better. Thanks, -Sherman