Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980
Vladimir Strigun wrote:

Mikhail,

It was a pretty old build. Now I'm gathering info for the current DRLVM (antlr, eclipse, xalan are still not included). I've executed every benchmark 10 times and the result is the geometric mean of the last 5 executions.

Machine: P4, 3 GHz, 1 GB RAM
Build1 = current Harmony build, svn = r468353 (Oct 27 2006), Windows/ia32/msvc 1310, release build
Build2 = Build1 + Harmony-1980
RI: jdk1.5.0_06

Arguments for DRLVM: -Xem:server -Xms700m -Xmx700m
Arguments for Sun: -XX:+AggressiveHeap -XX:+UseBiasedLocking -XX:+UseParallelGC -XX:ParallelGCThreads=4 -Xss64k -Xms700m -Xmx700m

Results for small input:

            Build1        Build2        RI
bloat       1014,371      1024,618      968,976
chart       1427,912      1186,959      956,125
fop         243,426       244,317       171,701
hsqldb      330,856       324,493       549,55
jython      1092,869      1102,331      568,088
lusearch    1999,63       1971,813      1830,707
luindex     421,703       225,073       594,78
pmd         27,332        26,981        53,319
Average     482,5168816   434,5997662   481,3767025

Here we can see that DRLVM is a little bit faster, but the Dacapo recommendations say that the small workload is for testing and recommend "either reporting default or large in any performance analysis".

Default input:

            Build1        Build2        RI
bloat       17155,441     17131,63      13718,637
chart       13342,101     10924,038     9755,926
fop         2621,146      2584,326      2353,304
hsqldb      3153,212      3101,691      5737,304
jython      16240,515     15632,52      8299,957
lusearch    16280,762     16255,764     13518,751
luindex     12420,638     10730,491     15782,563
pmd         11027,172     11136,656     9689,841
Average     9538,259502   9063,946046   8638,4136

So, for default input we are 5-10% slower. I'll provide the results for large input as soon as the performance run completes.

I know that I'm going to be an annoying broken record here, filling up people's mailboxes, but I'll say it again - that's mighty impressive. I'll take within 20% of Sun at this point in our project's life any day of the week.

(Of course, world-class performance - as measured by SPECjbb - is currently held by IBM's J9 on Woodcrest, so that's probably the stretch target ;)

geir

Thanks,
Vladimir.
On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote: Vladimir, +1 more question: between TM integration and HARMONY-1942 incorrect behaviour of BBP could significantly slow down the execution. Did you do your measurements with Harmony-1942 applied? On 10/27/06, Vladimir Strigun <[EMAIL PROTECTED]> wrote: > > Mikhail, > > Not yet. As I mentioned in the thread I'm still working on Dacapo. > I'll let you know if I find any improvements for JIT. > > Thanks, > Vladimir. > > On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote: > > Vladimir, > > I see you removed some arraycopy operations in your patch as not > effective. > > I'm Ok with your solution but what to know if JIT could solve the > problem > > generating more effective code? Do you have any suggestions for JIT > here? > > > > On 10/27/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote: > > > > > > 10%-15%? That's amazing. How fast are we (DRLVM) compared to Sun 1.5 > > > using decapo? > > > > > > geir > > > > > > > > > Vladimir Strigun wrote: > > > > The optimization covers the following issues: > > > > - java.nio.charset.CharsetDecoder and > java.nio.charset.CharsetEncoder > > > > Streaming decoding/encoding was removed. Analysis of API hotspots > for > > > > Dacapo shows that CharsetDecoder is frequently used almost in all > > > > benchmark, especially in chart. We already discussed advantages of > > > > streaming decoding but the fix shows significant performance > > > > improvement on average for all Dacapo benchmarks. For instance, > boost > > > > for chart benchmark is about 16%. Paulex, you recently worked in > > > > nio_char module and if I correctly remember you introduce streaming > > > > operations, so could you please review the changes and let me know? > > > > Since streaming operation was removed, tests have been slightly > > > > modified as well (previous version of tests fails on RI). > > > > - java.io.BufferedReader > > > > readLine() method was slightly modified. 
Additional check whether > some > > > > characters available in cached buffer was added prior to main cycle. > > > > - java.io.InputStreamReader > > > > Cached char buffer was removed, read() , read(char[], int, int) > > > > methods were rewritten. Current implementation of read(char[], int, > > > > int) uses several invocation of System.arraycopy. Proposed solution > > > > wraps char[] arguments within char buffer and therefore doesn't use > > > > arraycopy. Decoding operation is also produced inside the method, so > > > > fillBuf() has been removed > > > > > > > > Thoughts? Comments? > > > > > > > > Thanks, > > > > Vladimir. > > > > > > > > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote: > > > >> [classlib][performance] performance improvement for luni and > nio_char > > > >>
Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980
Mikhail,

It was a pretty old build. Now I'm gathering info for the current DRLVM (antlr, eclipse, xalan are still not included). I've executed every benchmark 10 times and the result is the geometric mean of the last 5 executions.

Machine: P4, 3 GHz, 1 GB RAM
Build1 = current Harmony build, svn = r468353 (Oct 27 2006), Windows/ia32/msvc 1310, release build
Build2 = Build1 + Harmony-1980
RI: jdk1.5.0_06

Arguments for DRLVM: -Xem:server -Xms700m -Xmx700m
Arguments for Sun: -XX:+AggressiveHeap -XX:+UseBiasedLocking -XX:+UseParallelGC -XX:ParallelGCThreads=4 -Xss64k -Xms700m -Xmx700m

Results for small input:

            Build1        Build2        RI
bloat       1014,371      1024,618      968,976
chart       1427,912      1186,959      956,125
fop         243,426       244,317       171,701
hsqldb      330,856       324,493       549,55
jython      1092,869      1102,331      568,088
lusearch    1999,63       1971,813      1830,707
luindex     421,703       225,073       594,78
pmd         27,332        26,981        53,319
Average     482,5168816   434,5997662   481,3767025

Here we can see that DRLVM is a little bit faster, but the Dacapo recommendations say that the small workload is for testing and recommend "either reporting default or large in any performance analysis".

Default input:

            Build1        Build2        RI
bloat       17155,441     17131,63      13718,637
chart       13342,101     10924,038     9755,926
fop         2621,146      2584,326      2353,304
hsqldb      3153,212      3101,691      5737,304
jython      16240,515     15632,52      8299,957
lusearch    16280,762     16255,764     13518,751
luindex     12420,638     10730,491     15782,563
pmd         11027,172     11136,656     9689,841
Average     9538,259502   9063,946046   8638,4136

So, for default input we are 5-10% slower. I'll provide the results for large input as soon as the performance run completes.

Thanks,
Vladimir.

On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote:

Vladimir,
+1 more question: between TM integration and HARMONY-1942 incorrect behaviour of BBP could significantly slow down the execution. Did you do your measurements with Harmony-1942 applied?

On 10/27/06, Vladimir Strigun <[EMAIL PROTECTED]> wrote:
> > Mikhail,
> > Not yet. As I mentioned in the thread I'm still working on Dacapo.
> I'll let you know if I find any improvements for JIT. > > Thanks, > Vladimir. > > On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote: > > Vladimir, > > I see you removed some arraycopy operations in your patch as not > effective. > > I'm Ok with your solution but what to know if JIT could solve the > problem > > generating more effective code? Do you have any suggestions for JIT > here? > > > > On 10/27/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote: > > > > > > 10%-15%? That's amazing. How fast are we (DRLVM) compared to Sun 1.5 > > > using decapo? > > > > > > geir > > > > > > > > > Vladimir Strigun wrote: > > > > The optimization covers the following issues: > > > > - java.nio.charset.CharsetDecoder and > java.nio.charset.CharsetEncoder > > > > Streaming decoding/encoding was removed. Analysis of API hotspots > for > > > > Dacapo shows that CharsetDecoder is frequently used almost in all > > > > benchmark, especially in chart. We already discussed advantages of > > > > streaming decoding but the fix shows significant performance > > > > improvement on average for all Dacapo benchmarks. For instance, > boost > > > > for chart benchmark is about 16%. Paulex, you recently worked in > > > > nio_char module and if I correctly remember you introduce streaming > > > > operations, so could you please review the changes and let me know? > > > > Since streaming operation was removed, tests have been slightly > > > > modified as well (previous version of tests fails on RI). > > > > - java.io.BufferedReader > > > > readLine() method was slightly modified. Additional check whether > some > > > > characters available in cached buffer was added prior to main cycle. > > > > - java.io.InputStreamReader > > > > Cached char buffer was removed, read() , read(char[], int, int) > > > > methods were rewritten. Current implementation of read(char[], int, > > > > int) uses several invocation of System.arraycopy. 
Proposed solution > > > > wraps char[] arguments within char buffer and therefore doesn't use > > > > arraycopy. Decoding operation is also produced inside the method, so > > > > fillBuf() has been removed > > > > > > > > Thoughts? Comments? > > > > > > > > Thanks, > > > > Vladimir. > > > > > > > > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote: > > > >> [classlib][performance] performance improvement for luni and > nio_char > > > >> modules > > > >> > > > > - > > > >> > > > >> > > > >> Key: HARMONY-1980 > > > >> URL: > http://issues.apache.org/jira/browse/HARMONY-1980 > > > >> Project: Harmony > > > >> Issue Type: Improvement > > > >> Components: Classlib > > > >>
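For readers who want to check the "Average" rows: they appear to be geometric means of the per-benchmark times, which matches the methodology described in this message. The following is a minimal sketch, not code from the thread; the class and method names are made up for illustration, and the input is the Build1 small-input column with decimal commas written as points.

```java
class GeoMean {
    // Geometric mean computed as exp(mean(log(x))), which avoids
    // overflowing the running product for long lists of large timings.
    static double geometricMean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        double logSum = 0.0;
        for (double v : values) {
            if (v <= 0.0) {
                throw new IllegalArgumentException("timings must be positive");
            }
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }

    public static void main(String[] args) {
        // Build1, small input: bloat, chart, fop, hsqldb, jython,
        // lusearch, luindex, pmd (milliseconds).
        double[] build1Small = {
            1014.371, 1427.912, 243.426, 330.856,
            1092.869, 1999.63, 421.703, 27.332
        };
        System.out.printf("Build1 small average: %.4f ms%n",
                geometricMean(build1Small));
    }
}
```

The result should land very close to the 482,5168816 reported for Build1. An arithmetic mean of the same column would be noticeably higher, since it is dominated by the slow benchmarks such as lusearch and chart.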
Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980
Vladimir Strigun wrote:
> Small workload:
>            OrigBuild   Fixed       Sun1.5.0_06
> [SNIP]
> Average    449,91      408,60      471,71
>
> default workload:
> [SNIP]
> Average    9337,73     9281,87     8787,42
>
> large workload:
> [snip]
> Average    31345,21    27334,72    22348,3525
>
> "Fixed" is the same build plus H-1980 included. As you can see from the
> "Average" rows, the "Fixed" build is faster. The values are in millisec,
> so the less the better, i.e. we are still slower than RI.

Ah! Thanks. I was thinking in terms of "DeCapo Marks" or something.

So we're faster on small, 6% slower on default and 22% slower on large?

That's mighty respectable! (Who's slacking off for the large workload stuff? ;)

geir
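The percentages in this reply can be reproduced from the quoted "Average" rows. This is a small illustrative sketch with hypothetical class and method names; it compares the "Fixed" averages against the Sun 1.5.0_06 averages, with decimal commas written as points.

```java
class Slowdown {
    // Percentage by which `measured` exceeds `baseline`.
    // Inputs are times in milliseconds, so a positive result means slower
    // and a negative result means faster than the baseline.
    static double percentSlower(double measured, double baseline) {
        return (measured / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // "Fixed" (DRLVM + H-1980) average vs. Sun 1.5.0_06 average.
        System.out.printf("small:   %+.1f%%%n", percentSlower(408.60, 471.71));
        System.out.printf("default: %+.1f%%%n", percentSlower(9281.87, 8787.42));
        System.out.printf("large:   %+.1f%%%n", percentSlower(27334.72, 22348.3525));
    }
}
```

The small workload comes out negative (DRLVM faster), while default and large come out at roughly 6% and 22% slower, in line with the figures quoted in the message.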
Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980
On 10/27/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:

This is a nice note to wake up to...

Vladimir Strigun wrote:
> Here are the results:
>
> Small workload:
>            OrigBuild   Fixed       Sun1.5.0_06
> bloat      996,078     1024,85     955,589
> chart      1240,777    1068,112    953,096
> fop        250,433     232,957     174,901
> hsqldb     348,942     361,139     540,45
> jython     831,143     824,775     571,292
> lusearch   1854,95     1870,969    1830,589
> luindex    339,45      231,314     441,79
> pmd        29,704      23,638      61,638
>
> Average    449,91      408,60      471,71
>
> default workload:
>            OrigBuild   Fixed       Sun1.5.0_06
> bloat      16116,691   15618       13578,522
> chart      11701,546   10036,631   9790,247
> fop        2539,386    2502,518    2387,289
> hsqldb     3217,338    3078,331    5709,291
> jython     14639,278   14064,104   9456,167
> lusearch   14508,938   16175,085   13663,679
> luindex    16292,652   15501,713   15602,178
> pmd        10840,264   12937,255   9734,032
>
> Average    9337,73     9281,87     8787,42
>
> large workload:
>            OrigBuild   Fixed       Sun1.5.0_06
> bloat      168733,5    175493,46   138468,277
> chart      31651,79    25681,751   25599,38
> fop        2546,289    2512,045    2412,487
> hsqldb     22873,608   13555,515   15751,873
> jython     128207,3    92863,28    26183,716
> lusearch   29425,991   30064,153   26605,631
> luindex    17825,795   18083,898   14307,71
> pmd        44548,724   40225,694   46345,995
>
> Average    31345,21    27334,72    22348,3525
>
> At first glance the results are pretty good, but antlr benchmark works
> incorrectly with DRLVM (Harmony-1906) and there are no results for
> eclipse and xalan benchmarks. I'm still working on Dacapo analysis.

"Pretty good"? You're suggesting that DRLVM is faster than Sun 1.5. I would say "Wow!", not "pretty good..."

More info - what is "OrigBuild" and what is "Fixed"? Why is "Fixed" slower than "OrigBuild"?

"Fixed" is the same build plus H-1980 included. As you can see from the "Average" rows, the "Fixed" build is faster. The values are in millisec, so the less the better, i.e. we are still slower than RI.

Thanks,
Vladimir.

geir

>
> Thanks,
> Vladimir.
>
> On 10/26/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:
>> 10%-15%? That's amazing. How fast are we (DRLVM) compared to Sun 1.5
>> using decapo?
>> >> geir >> >> >> Vladimir Strigun wrote: >> > The optimization covers the following issues: >> > - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder >> > Streaming decoding/encoding was removed. Analysis of API hotspots for >> > Dacapo shows that CharsetDecoder is frequently used almost in all >> > benchmark, especially in chart. We already discussed advantages of >> > streaming decoding but the fix shows significant performance >> > improvement on average for all Dacapo benchmarks. For instance, boost >> > for chart benchmark is about 16%. Paulex, you recently worked in >> > nio_char module and if I correctly remember you introduce streaming >> > operations, so could you please review the changes and let me know? >> > Since streaming operation was removed, tests have been slightly >> > modified as well (previous version of tests fails on RI). >> > - java.io.BufferedReader >> > readLine() method was slightly modified. Additional check whether some >> > characters available in cached buffer was added prior to main cycle. >> > - java.io.InputStreamReader >> > Cached char buffer was removed, read() , read(char[], int, int) >> > methods were rewritten. Current implementation of read(char[], int, >> > int) uses several invocation of System.arraycopy. Proposed solution >> > wraps char[] arguments within char buffer and therefore doesn't use >> > arraycopy. Decoding operation is also produced inside the method, so >> > fillBuf() has been removed >> > >> > Thoughts? Comments? >> > >> > Thanks, >> > Vladimir. 
>> > >> > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote: >> >> [classlib][performance] performance improvement for luni and nio_char >> >> modules >> >> >> - >> >> >> >> >> >> >> Key: HARMONY-1980 >> >> URL: http://issues.apache.org/jira/browse/HARMONY-1980 >> >> Project: Harmony >> >> Issue Type: Improvement >> >> Components: Classlib >> >>Reporter: Vladimir Strigun >> >> Attachments: Harmony-1980.diff >> >> >> >> I've analyzed API frequently used in all Dacapo[1] benchmarks and >> >> found several places in luni and nio_char modules that can be >> >> improved. Suggested fix gives about 10-15% boost on average for Dacapo >> >> executed on DRLVM. I'll post more details to dev list. >> >> Attached fix contains modifications for the following classes: >> >> java.io.BufferedReader, java.io.InputStreamReader, >> >> java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder. >> >> >> >> Please have a look to
Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980
Vladimir, +1 more question: between TM integration and HARMONY-1942 incorrect behaviour of BBP could significantly slow down the execution. Did you do your measurements with Harmony-1942 applied? On 10/27/06, Vladimir Strigun <[EMAIL PROTECTED]> wrote: Mikhail, Not yet. As I mentioned in the thread I'm still working on Dacapo. I'll let you know if I find any improvements for JIT. Thanks, Vladimir. On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote: > Vladimir, > I see you removed some arraycopy operations in your patch as not effective. > I'm Ok with your solution but what to know if JIT could solve the problem > generating more effective code? Do you have any suggestions for JIT here? > > On 10/27/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote: > > > > 10%-15%? That's amazing. How fast are we (DRLVM) compared to Sun 1.5 > > using decapo? > > > > geir > > > > > > Vladimir Strigun wrote: > > > The optimization covers the following issues: > > > - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder > > > Streaming decoding/encoding was removed. Analysis of API hotspots for > > > Dacapo shows that CharsetDecoder is frequently used almost in all > > > benchmark, especially in chart. We already discussed advantages of > > > streaming decoding but the fix shows significant performance > > > improvement on average for all Dacapo benchmarks. For instance, boost > > > for chart benchmark is about 16%. Paulex, you recently worked in > > > nio_char module and if I correctly remember you introduce streaming > > > operations, so could you please review the changes and let me know? > > > Since streaming operation was removed, tests have been slightly > > > modified as well (previous version of tests fails on RI). > > > - java.io.BufferedReader > > > readLine() method was slightly modified. Additional check whether some > > > characters available in cached buffer was added prior to main cycle. 
> > > - java.io.InputStreamReader > > > Cached char buffer was removed, read() , read(char[], int, int) > > > methods were rewritten. Current implementation of read(char[], int, > > > int) uses several invocation of System.arraycopy. Proposed solution > > > wraps char[] arguments within char buffer and therefore doesn't use > > > arraycopy. Decoding operation is also produced inside the method, so > > > fillBuf() has been removed > > > > > > Thoughts? Comments? > > > > > > Thanks, > > > Vladimir. > > > > > > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote: > > >> [classlib][performance] performance improvement for luni and nio_char > > >> modules > > >> > > - > > >> > > >> > > >> Key: HARMONY-1980 > > >> URL: http://issues.apache.org/jira/browse/HARMONY-1980 > > >> Project: Harmony > > >> Issue Type: Improvement > > >> Components: Classlib > > >>Reporter: Vladimir Strigun > > >> Attachments: Harmony-1980.diff > > >> > > >> I've analyzed API frequently used in all Dacapo[1] benchmarks and > > >> found several places in luni and nio_char modules that can be > > >> improved. Suggested fix gives about 10-15% boost on average for Dacapo > > >> executed on DRLVM. I'll post more details to dev list. > > >> Attached fix contains modifications for the following classes: > > >> java.io.BufferedReader, java.io.InputStreamReader, > > >> java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder . 
> > >> > > >> Please have a look to the results of Dacapo execution (values are in > > >> millisec, so the less the better): > > >> > > >> Small workload > > >> > > >>OrigBuild Patched > > >> bloat 996,078 1024,85 > > >> chart 1240,7771068,112 > > >> fop 250,433 232,957 > > >> hsqldb 348,942 361,139 > > >> jython 831,143 824,775 > > >> lusearch1854,95 1870,969 > > >> luindex 339,45 231,314 > > >> pmd 29,704 23,638 > > >> > > >> > > >> default workload > > >>OrigBuild Patched > > >> bloat 168733,562 175493,467 > > >> chart 31651,792 25681,751 > > >> fop 2546,2892512,045 > > >> hsqldb 22873,608 13555,515 > > >> jython 128207,303 92863,28 > > >> lusearch29425,991 30064,153 > > >> luindex 17825,795 18083,898 > > >> pmd 44548,724 40225,694 > > >> > > >> > > >> > > >> [1] http://dacapobench.sourceforge.net > > >> > > >> > > >> -- > > >> This message is automatically generated by JIRA. > > >> - > > >> If you think it was sent incorrectly contact one of the > > >> administrators: > > http://issues.apache.org/jira/secure/Administrators.jspa > > >> - > > >> For more information on JIRA, see: > > http://www.atlassian.com/software/jira > > >> > > >> > > >> > > > > > > > > > -- > Mikhail Fursov > > -- Mikhail Fursov
Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980
This is a nice note to wake up to...

Vladimir Strigun wrote:

Here are the results:

Small workload:
           OrigBuild   Fixed       Sun1.5.0_06
bloat      996,078     1024,85     955,589
chart      1240,777    1068,112    953,096
fop        250,433     232,957     174,901
hsqldb     348,942     361,139     540,45
jython     831,143     824,775     571,292
lusearch   1854,95     1870,969    1830,589
luindex    339,45      231,314     441,79
pmd        29,704      23,638      61,638

Average    449,91      408,60      471,71

default workload:
           OrigBuild   Fixed       Sun1.5.0_06
bloat      16116,691   15618       13578,522
chart      11701,546   10036,631   9790,247
fop        2539,386    2502,518    2387,289
hsqldb     3217,338    3078,331    5709,291
jython     14639,278   14064,104   9456,167
lusearch   14508,938   16175,085   13663,679
luindex    16292,652   15501,713   15602,178
pmd        10840,264   12937,255   9734,032

Average    9337,73     9281,87     8787,42

large workload:
           OrigBuild   Fixed       Sun1.5.0_06
bloat      168733,5    175493,46   138468,277
chart      31651,79    25681,751   25599,38
fop        2546,289    2512,045    2412,487
hsqldb     22873,608   13555,515   15751,873
jython     128207,3    92863,28    26183,716
lusearch   29425,991   30064,153   26605,631
luindex    17825,795   18083,898   14307,71
pmd        44548,724   40225,694   46345,995

Average    31345,21    27334,72    22348,3525

At first glance the results are pretty good, but antlr benchmark works incorrectly with DRLVM (Harmony-1906) and there are no results for eclipse and xalan benchmarks. I'm still working on Dacapo analysis.

"Pretty good"? You're suggesting that DRLVM is faster than Sun 1.5. I would say "Wow!", not "pretty good..."

More info - what is "OrigBuild" and what is "Fixed"? Why is "Fixed" slower than "OrigBuild"?

geir

Thanks,
Vladimir.

On 10/26/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:

10%-15%? That's amazing. How fast are we (DRLVM) compared to Sun 1.5 using decapo?

geir

Vladimir Strigun wrote:
> The optimization covers the following issues:
> - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
> Streaming decoding/encoding was removed. Analysis of API hotspots for
> Dacapo shows that CharsetDecoder is frequently used almost in all
> benchmark, especially in chart.
We already discussed advantages of > streaming decoding but the fix shows significant performance > improvement on average for all Dacapo benchmarks. For instance, boost > for chart benchmark is about 16%. Paulex, you recently worked in > nio_char module and if I correctly remember you introduce streaming > operations, so could you please review the changes and let me know? > Since streaming operation was removed, tests have been slightly > modified as well (previous version of tests fails on RI). > - java.io.BufferedReader > readLine() method was slightly modified. Additional check whether some > characters available in cached buffer was added prior to main cycle. > - java.io.InputStreamReader > Cached char buffer was removed, read() , read(char[], int, int) > methods were rewritten. Current implementation of read(char[], int, > int) uses several invocation of System.arraycopy. Proposed solution > wraps char[] arguments within char buffer and therefore doesn't use > arraycopy. Decoding operation is also produced inside the method, so > fillBuf() has been removed > > Thoughts? Comments? > > Thanks, > Vladimir. > > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote: >> [classlib][performance] performance improvement for luni and nio_char >> modules >> - >> >> >> Key: HARMONY-1980 >> URL: http://issues.apache.org/jira/browse/HARMONY-1980 >> Project: Harmony >> Issue Type: Improvement >> Components: Classlib >>Reporter: Vladimir Strigun >> Attachments: Harmony-1980.diff >> >> I've analyzed API frequently used in all Dacapo[1] benchmarks and >> found several places in luni and nio_char modules that can be >> improved. Suggested fix gives about 10-15% boost on average for Dacapo >> executed on DRLVM. I'll post more details to dev list. >> Attached fix contains modifications for the following classes: >> java.io.BufferedReader, java.io.InputStreamReader, >> java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder. 
>> >> Please have a look to the results of Dacapo execution (values are in >> millisec, so the less the better): >> >> Small workload >> >>OrigBuild Patched >> bloat 996,078 1024,85 >> chart 1240,7771068,112 >> fop 250,433 232,957 >> hsqldb 348,942 361,139 >> jython 831,143 824,775 >> lusearch1854,95 1870,969 >> luindex 339,45 231,314 >> pmd 29,704 23,638 >> >> >> default workload >>OrigBuild Patched >> bloat 168733,562 175493,467 >> chart 31651,792 25681,751 >> fop 2546,289
Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980
Mikhail, Not yet. As I mentioned in the thread I'm still working on Dacapo. I'll let you know if I find any improvements for JIT. Thanks, Vladimir. On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote: Vladimir, I see you removed some arraycopy operations in your patch as not effective. I'm Ok with your solution but what to know if JIT could solve the problem generating more effective code? Do you have any suggestions for JIT here? On 10/27/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote: > > 10%-15%? That's amazing. How fast are we (DRLVM) compared to Sun 1.5 > using decapo? > > geir > > > Vladimir Strigun wrote: > > The optimization covers the following issues: > > - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder > > Streaming decoding/encoding was removed. Analysis of API hotspots for > > Dacapo shows that CharsetDecoder is frequently used almost in all > > benchmark, especially in chart. We already discussed advantages of > > streaming decoding but the fix shows significant performance > > improvement on average for all Dacapo benchmarks. For instance, boost > > for chart benchmark is about 16%. Paulex, you recently worked in > > nio_char module and if I correctly remember you introduce streaming > > operations, so could you please review the changes and let me know? > > Since streaming operation was removed, tests have been slightly > > modified as well (previous version of tests fails on RI). > > - java.io.BufferedReader > > readLine() method was slightly modified. Additional check whether some > > characters available in cached buffer was added prior to main cycle. > > - java.io.InputStreamReader > > Cached char buffer was removed, read() , read(char[], int, int) > > methods were rewritten. Current implementation of read(char[], int, > > int) uses several invocation of System.arraycopy. Proposed solution > > wraps char[] arguments within char buffer and therefore doesn't use > > arraycopy. 
Decoding operation is also produced inside the method, so > > fillBuf() has been removed > > > > Thoughts? Comments? > > > > Thanks, > > Vladimir. > > > > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote: > >> [classlib][performance] performance improvement for luni and nio_char > >> modules > >> > - > >> > >> > >> Key: HARMONY-1980 > >> URL: http://issues.apache.org/jira/browse/HARMONY-1980 > >> Project: Harmony > >> Issue Type: Improvement > >> Components: Classlib > >>Reporter: Vladimir Strigun > >> Attachments: Harmony-1980.diff > >> > >> I've analyzed API frequently used in all Dacapo[1] benchmarks and > >> found several places in luni and nio_char modules that can be > >> improved. Suggested fix gives about 10-15% boost on average for Dacapo > >> executed on DRLVM. I'll post more details to dev list. > >> Attached fix contains modifications for the following classes: > >> java.io.BufferedReader, java.io.InputStreamReader, > >> java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder. > >> > >> Please have a look to the results of Dacapo execution (values are in > >> millisec, so the less the better): > >> > >> Small workload > >> > >>OrigBuild Patched > >> bloat 996,078 1024,85 > >> chart 1240,7771068,112 > >> fop 250,433 232,957 > >> hsqldb 348,942 361,139 > >> jython 831,143 824,775 > >> lusearch1854,95 1870,969 > >> luindex 339,45 231,314 > >> pmd 29,704 23,638 > >> > >> > >> default workload > >>OrigBuild Patched > >> bloat 168733,562 175493,467 > >> chart 31651,792 25681,751 > >> fop 2546,2892512,045 > >> hsqldb 22873,608 13555,515 > >> jython 128207,303 92863,28 > >> lusearch29425,991 30064,153 > >> luindex 17825,795 18083,898 > >> pmd 44548,724 40225,694 > >> > >> > >> > >> [1] http://dacapobench.sourceforge.net > >> > >> > >> -- > >> This message is automatically generated by JIRA. 
> >> - > >> If you think it was sent incorrectly contact one of the > >> administrators: > http://issues.apache.org/jira/secure/Administrators.jspa > >> - > >> For more information on JIRA, see: > http://www.atlassian.com/software/jira > >> > >> > >> > > > -- Mikhail Fursov
Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980
Here are the results:

Small workload:
           OrigBuild   Fixed       Sun1.5.0_06
bloat      996,078     1024,85     955,589
chart      1240,777    1068,112    953,096
fop        250,433     232,957     174,901
hsqldb     348,942     361,139     540,45
jython     831,143     824,775     571,292
lusearch   1854,95     1870,969    1830,589
luindex    339,45      231,314     441,79
pmd        29,704      23,638      61,638

Average    449,91      408,60      471,71

default workload:
           OrigBuild   Fixed       Sun1.5.0_06
bloat      16116,691   15618       13578,522
chart      11701,546   10036,631   9790,247
fop        2539,386    2502,518    2387,289
hsqldb     3217,338    3078,331    5709,291
jython     14639,278   14064,104   9456,167
lusearch   14508,938   16175,085   13663,679
luindex    16292,652   15501,713   15602,178
pmd        10840,264   12937,255   9734,032

Average    9337,73     9281,87     8787,42

large workload:
           OrigBuild   Fixed       Sun1.5.0_06
bloat      168733,5    175493,46   138468,277
chart      31651,79    25681,751   25599,38
fop        2546,289    2512,045    2412,487
hsqldb     22873,608   13555,515   15751,873
jython     128207,3    92863,28    26183,716
lusearch   29425,991   30064,153   26605,631
luindex    17825,795   18083,898   14307,71
pmd        44548,724   40225,694   46345,995

Average    31345,21    27334,72    22348,3525

At first glance the results are pretty good, but the antlr benchmark works incorrectly with DRLVM (Harmony-1906) and there are no results for the eclipse and xalan benchmarks. I'm still working on Dacapo analysis.

Thanks,
Vladimir.

On 10/26/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:

10%-15%? That's amazing. How fast are we (DRLVM) compared to Sun 1.5 using decapo?

geir

Vladimir Strigun wrote:
> The optimization covers the following issues:
> - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
> Streaming decoding/encoding was removed. Analysis of API hotspots for
> Dacapo shows that CharsetDecoder is frequently used almost in all
> benchmark, especially in chart. We already discussed advantages of
> streaming decoding but the fix shows significant performance
> improvement on average for all Dacapo benchmarks. For instance, boost
> for chart benchmark is about 16%.
Paulex, you recently worked in > nio_char module and if I correctly remember you introduce streaming > operations, so could you please review the changes and let me know? > Since streaming operation was removed, tests have been slightly > modified as well (previous version of tests fails on RI). > - java.io.BufferedReader > readLine() method was slightly modified. Additional check whether some > characters available in cached buffer was added prior to main cycle. > - java.io.InputStreamReader > Cached char buffer was removed, read() , read(char[], int, int) > methods were rewritten. Current implementation of read(char[], int, > int) uses several invocation of System.arraycopy. Proposed solution > wraps char[] arguments within char buffer and therefore doesn't use > arraycopy. Decoding operation is also produced inside the method, so > fillBuf() has been removed > > Thoughts? Comments? > > Thanks, > Vladimir. > > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote: >> [classlib][performance] performance improvement for luni and nio_char >> modules >> - >> >> >> Key: HARMONY-1980 >> URL: http://issues.apache.org/jira/browse/HARMONY-1980 >> Project: Harmony >> Issue Type: Improvement >> Components: Classlib >>Reporter: Vladimir Strigun >> Attachments: Harmony-1980.diff >> >> I've analyzed API frequently used in all Dacapo[1] benchmarks and >> found several places in luni and nio_char modules that can be >> improved. Suggested fix gives about 10-15% boost on average for Dacapo >> executed on DRLVM. I'll post more details to dev list. >> Attached fix contains modifications for the following classes: >> java.io.BufferedReader, java.io.InputStreamReader, >> java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder. 
>> >> Please have a look to the results of Dacapo execution (values are in >> millisec, so the less the better): >> >> Small workload >> >>OrigBuild Patched >> bloat 996,078 1024,85 >> chart 1240,7771068,112 >> fop 250,433 232,957 >> hsqldb 348,942 361,139 >> jython 831,143 824,775 >> lusearch1854,95 1870,969 >> luindex 339,45 231,314 >> pmd 29,704 23,638 >> >> >> default workload >>OrigBuild Patched >> bloat 168733,562 175493,467 >> chart 31651,792 25681,751 >> fop 2546,2892512,045 >> hsqldb 22873,608 13555,515 >> jython 128207,303 92863,28 >> lusearch29425,991 30064,153 >> luindex 17825,795 18083,898 >> pmd 44548,724 40225,694 >> >> >> >> [1
Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980
Vladimir,

I see you removed some arraycopy operations in your patch because they were inefficient. I'm OK with your solution, but I want to know whether the JIT could solve the problem by generating more efficient code. Do you have any suggestions for the JIT here?

--
Mikhail Fursov
Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980
10%-15%? That's amazing.

How fast are we (DRLVM) compared to Sun 1.5 using Dacapo?

geir
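A minimal sketch of how the per-benchmark boost and an overall figure could be derived from the Dacapo numbers posted in this thread. The class and method names are hypothetical, the decimal commas are rewritten as dots, and taking a geometric mean of the OrigBuild/Patched ratios is an assumption: the thread does not say how the 10-15% average was computed.

```java
public class DacapoSpeedupSketch {
    // Default-workload times in milliseconds, copied from the posted table
    // (order: bloat, chart, fop, hsqldb, jython, lusearch, luindex, pmd).
    static final double[] ORIG_DEFAULT = {
        168733.562, 31651.792, 2546.289, 22873.608,
        128207.303, 29425.991, 17825.795, 44548.724
    };
    static final double[] PATCHED_DEFAULT = {
        175493.467, 25681.751, 2512.045, 13555.515,
        92863.28, 30064.153, 18083.898, 40225.694
    };

    // Ratio of original time to patched time; a value above 1.0 means the patch is faster.
    static double speedup(double origMillis, double patchedMillis) {
        return origMillis / patchedMillis;
    }

    // Geometric mean of the per-benchmark speedups (assumed aggregation, see lead-in).
    static double geometricMeanSpeedup(double[] orig, double[] patched) {
        double logSum = 0.0;
        for (int i = 0; i < orig.length; i++) {
            logSum += Math.log(speedup(orig[i], patched[i]));
        }
        return Math.exp(logSum / orig.length);
    }

    public static void main(String[] args) {
        String[] names = {"bloat", "chart", "fop", "hsqldb", "jython", "lusearch", "luindex", "pmd"};
        for (int i = 0; i < names.length; i++) {
            double boost = (speedup(ORIG_DEFAULT[i], PATCHED_DEFAULT[i]) - 1.0) * 100.0;
            System.out.printf("%-9s %+.1f%%%n", names[i], boost);
        }
        double mean = geometricMeanSpeedup(ORIG_DEFAULT, PATCHED_DEFAULT);
        System.out.printf("geometric mean boost: %+.1f%%%n", (mean - 1.0) * 100.0);
    }
}
```

Negative values in the output mark benchmarks where the patched build was slower (bloat, lusearch and luindex on the default workload).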
[classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980
The optimization covers the following issues:

- java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
Streaming decoding/encoding was removed. Analysis of API hotspots for
Dacapo shows that CharsetDecoder is used frequently in almost all
benchmarks, especially in chart. We already discussed the advantages of
streaming decoding, but the fix shows a significant performance
improvement on average for all Dacapo benchmarks. For instance, the boost
for the chart benchmark is about 16%. Paulex, you recently worked in the
nio_char module and, if I remember correctly, you introduced the streaming
operations, so could you please review the changes and let me know?
Since the streaming operation was removed, the tests have been slightly
modified as well (the previous version of the tests fails on RI).
- java.io.BufferedReader
The readLine() method was slightly modified. An additional check for
characters already available in the cached buffer was added before the
main loop.
- java.io.InputStreamReader
The cached char buffer was removed, and the read() and read(char[], int, int)
methods were rewritten. The current implementation of read(char[], int,
int) uses several invocations of System.arraycopy. The proposed solution
wraps the char[] argument in a char buffer and therefore doesn't use
arraycopy. Decoding is also performed inside the method, so
fillBuf() has been removed.

Thoughts? Comments?

Thanks,
Vladimir.

On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote:
> [classlib][performance] performance improvement for luni and nio_char
> modules
> -
>
> Key: HARMONY-1980
> URL: http://issues.apache.org/jira/browse/HARMONY-1980
> Project: Harmony
> Issue Type: Improvement
> Components: Classlib
> Reporter: Vladimir Strigun
> Attachments: Harmony-1980.diff
>
> I've analyzed the API frequently used in all Dacapo[1] benchmarks and
> found several places in the luni and nio_char modules that can be
> improved. The suggested fix gives about a 10-15% boost on average for
> Dacapo executed on DRLVM. I'll post more details to the dev list.
> The attached fix contains modifications for the following classes:
> java.io.BufferedReader, java.io.InputStreamReader,
> java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder.
>
> Please have a look at the results of the Dacapo execution (values are in
> milliseconds, so lower is better):
>
> Small workload
>
>            OrigBuild    Patched
> bloat      996,078      1024,85
> chart      1240,777     1068,112
> fop        250,433      232,957
> hsqldb     348,942      361,139
> jython     831,143      824,775
> lusearch   1854,95      1870,969
> luindex    339,45       231,314
> pmd        29,704       23,638
>
> Default workload
>
>            OrigBuild    Patched
> bloat      168733,562   175493,467
> chart      31651,792    25681,751
> fop        2546,289     2512,045
> hsqldb     22873,608    13555,515
> jython     128207,303   92863,28
> lusearch   29425,991    30064,153
> luindex    17825,795    18083,898
> pmd        44548,724    40225,694
>
> [1] http://dacapobench.sourceforge.net
>
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the
> administrators: http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see: http://www.atlassian.com/software/jira
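The InputStreamReader change described in this message (wrapping the caller's char[] in a char buffer so that decoded characters land in it directly, with no System.arraycopy) can be illustrated roughly as follows. This is a sketch using only the public java.nio API, not the code from Harmony-1980.diff; the class name and the decodeInto helper are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class WrapDecodeSketch {
    // Hypothetical helper: decodes bytes from 'in' straight into dst[offset .. offset+length).
    // Returns the number of chars written.
    static int decodeInto(CharsetDecoder decoder, ByteBuffer in,
                          char[] dst, int offset, int length) {
        // wrap() creates a view over the caller's array: position = offset,
        // limit = offset + length. No intermediate buffer, no arraycopy.
        CharBuffer out = CharBuffer.wrap(dst, offset, length);
        CoderResult result = decoder.decode(in, out, false);
        if (result.isError()) {
            throw new IllegalArgumentException("malformed or unmappable input: " + result);
        }
        return out.position() - offset;
    }

    public static void main(String[] args) {
        byte[] bytes = "Harmony".getBytes(StandardCharsets.UTF_8);
        char[] dst = new char[16];
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        int n = decodeInto(decoder, ByteBuffer.wrap(bytes), dst, 2, 10);
        System.out.println(n + " chars decoded: " + new String(dst, 2, n));
    }
}
```

The alternative that the message calls the current implementation would decode into a private cached buffer first and then copy into dst; the wrap-based variant trades that copy for one small CharBuffer allocation per call.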