Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980

2006-10-27 Thread Geir Magnusson Jr.



Vladimir Strigun wrote:

Mikhail,

It was pretty old build. Now I'm gathering info for the current DRLVM
(antlr, eclipse, xalan are still not included).
I've executed every benchmark 10 times and the result is the geometric
mean of the last 5 executions.
Machine: P4, 3Ghz, 1Gb RAM
Build1 = current Harmony build, svn = r468353, (Oct 27 2006),
Windows/ia32/msvc 1310, release build
Build2 = Build1+Harmony-1980
RI: jdk1.5.0_06

Arguments for DRLVM: -Xem:server -Xms700m -Xmx700m
Arguments for Sun: -XX:+AggressiveHeap -XX:+UseBiasedLocking
-XX:+UseParallelGC -XX:ParallelGCThreads=4 -Xss64k -Xms700m -Xmx700m

Results for small input:
          Build1       Build2       RI
bloat     1014,371     1024,618     968,976
chart     1427,912     1186,959     956,125
fop       243,426      244,317      171,701
hsqldb    330,856      324,493      549,55
jython    1092,869     1102,331     568,088
lusearch  1999,63      1971,813     1830,707
luindex   421,703      225,073      594,78
pmd       27,332       26,981       53,319

Average   482,5168816  434,5997662  481,3767025

Here we can see that DRLVM is a little bit faster, but the recommendations
for Dacapo say that the small workload is for testing and advise "either
reporting default or large in any performance analysis".

Default input:

          Build1       Build2       RI
bloat     17155,441    17131,63     13718,637
chart     13342,101    10924,038    9755,926
fop       2621,146     2584,326     2353,304
hsqldb    3153,212     3101,691     5737,304
jython    16240,515    15632,52     8299,957
lusearch  16280,762    16255,764    13518,751
luindex   12420,638    10730,491    15782,563
pmd       11027,172    11136,656    9689,841

Average   9538,259502  9063,946046  8638,4136

So, for default input we are 5-10% slower.
I'll provide the results for large input as soon as the performance run
completes.


I know that I'm going to be an annoying broken record here, filling up
people's mailboxes, but I'll say it again - that's mighty impressive.
I'll take within 20% of Sun at this point in our project's life any day
of the week.


(Of course, world-class performance - as measured by SPECjbb - is
currently held by IBM's J9 on Woodcrest, so that's probably the stretch
target ;)


geir



Thanks,
Vladimir.

On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote:

Vladimir,
+1 more question: between TM integration and HARMONY-1942 incorrect
behaviour of BBP could significantly slow down the execution.
Did you do your measurements with Harmony-1942 applied?

On 10/27/06, Vladimir Strigun <[EMAIL PROTECTED]> wrote:
>
> Mikhail,
>
> Not yet. As I mentioned in the thread I'm still working on Dacapo.
> I'll let you know if I find any improvements for JIT.
>
> Thanks,
> Vladimir.
>
> On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote:
> > Vladimir,
> > I see you removed some arraycopy operations in your patch as not effective.
> > I'm Ok with your solution but want to know if JIT could solve the problem
> > generating more effective code? Do you have any suggestions for JIT here?
> >
> > On 10/27/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:
> > >
> > > 10%-15%?  That's amazing.  How fast are we (DRLVM) compared to Sun 1.5
> > > using decapo?
> > >
> > > geir
> > >
> > >
> > > Vladimir Strigun wrote:
> > > > The optimization covers the following issues:
> > > > - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
> > > > Streaming decoding/encoding was removed. Analysis of API hotspots for
> > > > Dacapo shows that CharsetDecoder is frequently used almost in all
> > > > benchmark, especially in chart. We already discussed advantages of
> > > > streaming decoding but the fix shows significant performance
> > > > improvement on average for all Dacapo benchmarks. For instance, boost
> > > > for chart benchmark is about 16%. Paulex, you recently worked in
> > > > nio_char module and if I correctly remember you introduce streaming
> > > > operations, so could you please review the changes and let me know?
> > > > Since streaming operation was removed, tests have been slightly
> > > > modified as well (previous version of tests fails on RI).
> > > > - java.io.BufferedReader
> > > > readLine() method was slightly modified. Additional check whether some
> > > > characters available in cached buffer was added prior to main cycle.
> > > > - java.io.InputStreamReader
> > > > Cached char buffer was removed, read() , read(char[], int, int)
> > > > methods were rewritten. Current implementation of read(char[], int,
> > > > int) uses several invocation of System.arraycopy. Proposed solution
> > > > wraps char[] arguments within char buffer and therefore doesn't use
> > > > arraycopy. Decoding operation is also produced inside the method, so
> > > > fillBuf() has been removed
> > > >
> > > > Thoughts? Comments?
> > > >
> > > > Thanks,
> > > > Vladimir.
> > > >
> > > > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote:
> > > >> [classlib][performance] performance improvement for luni and
> nio_char
> > > >>

Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980

2006-10-27 Thread Vladimir Strigun

Mikhail,

It was pretty old build. Now I'm gathering info for the current DRLVM
(antlr, eclipse, xalan are still not included).
I've executed every benchmark 10 times and the result is the geometric
mean of the last 5 executions.
Machine: P4, 3Ghz, 1Gb RAM
Build1 = current Harmony build, svn = r468353, (Oct 27 2006),
Windows/ia32/msvc 1310, release build
Build2 = Build1+Harmony-1980
RI: jdk1.5.0_06

Arguments for DRLVM: -Xem:server -Xms700m -Xmx700m
Arguments for Sun: -XX:+AggressiveHeap -XX:+UseBiasedLocking
-XX:+UseParallelGC -XX:ParallelGCThreads=4 -Xss64k -Xms700m -Xmx700m
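
The scoring rule described above (10 runs per benchmark, geometric mean of the last 5) can be sketched in a few lines of Java. This is only an illustration of the arithmetic, not the harness actually used for these measurements; the class name, the method name and the sample timings are all hypothetical.

```java
class GeoMeanOfLastRuns {
    /**
     * Geometric mean of the last `keep` entries of `timesMs`.
     * Summing logarithms avoids overflow when multiplying large timings.
     */
    static double geoMeanOfLast(double[] timesMs, int keep) {
        if (keep <= 0 || keep > timesMs.length) {
            throw new IllegalArgumentException("keep must be in 1.." + timesMs.length);
        }
        double logSum = 0.0;
        for (int i = timesMs.length - keep; i < timesMs.length; i++) {
            logSum += Math.log(timesMs[i]);
        }
        return Math.exp(logSum / keep);
    }

    public static void main(String[] args) {
        // Hypothetical timings in milliseconds (not taken from this thread).
        // The first five runs act as warm-up and are ignored.
        double[] runs = {1300, 1250, 1100, 1050, 1020, 1010, 1000, 990, 1005, 995};
        System.out.println(geoMeanOfLast(runs, 5));
    }
}
```

Dropping the first five runs keeps JIT warm-up and class loading out of the reported number, which matters when comparing VMs with different compilation policies.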

Results for small input:
          Build1       Build2       RI
bloat     1014,371     1024,618     968,976
chart     1427,912     1186,959     956,125
fop       243,426      244,317      171,701
hsqldb    330,856      324,493      549,55
jython    1092,869     1102,331     568,088
lusearch  1999,63      1971,813     1830,707
luindex   421,703      225,073      594,78
pmd       27,332       26,981       53,319

Average   482,5168816  434,5997662  481,3767025

Here we can see that DRLVM is a little bit faster, but the recommendations
for Dacapo say that the small workload is for testing and advise "either
reporting default or large in any performance analysis".

Default input:

          Build1       Build2       RI
bloat     17155,441    17131,63     13718,637
chart     13342,101    10924,038    9755,926
fop       2621,146     2584,326     2353,304
hsqldb    3153,212     3101,691     5737,304
jython    16240,515    15632,52     8299,957
lusearch  16280,762    16255,764    13518,751
luindex   12420,638    10730,491    15782,563
pmd       11027,172    11136,656    9689,841

Average   9538,259502  9063,946046  8638,4136

So, for default input we are 5-10% slower.
I'll provide the results for large input as soon as the performance run completes.

Thanks,
Vladimir.

On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote:

Vladimir,
+1 more question: between TM integration and HARMONY-1942 incorrect
behaviour of BBP could significantly slow down the execution.
Did you do your measurements with Harmony-1942 applied?

On 10/27/06, Vladimir Strigun <[EMAIL PROTECTED]> wrote:
>
> Mikhail,
>
> Not yet. As I mentioned in the thread I'm still working on Dacapo.
> I'll let you know if I find any improvements for JIT.
>
> Thanks,
> Vladimir.
>
> On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote:
> > Vladimir,
> > I see you removed some arraycopy operations in your patch as not effective.
> > I'm Ok with your solution but want to know if JIT could solve the problem
> > generating more effective code? Do you have any suggestions for JIT here?
> >
> > On 10/27/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:
> > >
> > > 10%-15%?  That's amazing.  How fast are we (DRLVM) compared to Sun 1.5
> > > using decapo?
> > >
> > > geir
> > >
> > >
> > > Vladimir Strigun wrote:
> > > > The optimization covers the following issues:
> > > > - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
> > > > Streaming decoding/encoding was removed. Analysis of API hotspots for
> > > > Dacapo shows that CharsetDecoder is frequently used almost in all
> > > > benchmark, especially in chart. We already discussed advantages of
> > > > streaming decoding but the fix shows significant performance
> > > > improvement on average for all Dacapo benchmarks. For instance, boost
> > > > for chart benchmark is about 16%. Paulex, you recently worked in
> > > > nio_char module and if I correctly remember you introduce streaming
> > > > operations, so could you please review the changes and let me know?
> > > > Since streaming operation was removed, tests have been slightly
> > > > modified as well (previous version of tests fails on RI).
> > > > - java.io.BufferedReader
> > > > readLine() method was slightly modified. Additional check whether some
> > > > characters available in cached buffer was added prior to main cycle.
> > > > - java.io.InputStreamReader
> > > > Cached char buffer was removed, read() , read(char[], int, int)
> > > > methods were rewritten. Current implementation of read(char[], int,
> > > > int) uses several invocation of System.arraycopy. Proposed solution
> > > > wraps char[] arguments within char buffer and therefore doesn't use
> > > > arraycopy. Decoding operation is also produced inside the method, so
> > > > fillBuf() has been removed
> > > >
> > > > Thoughts? Comments?
> > > >
> > > > Thanks,
> > > > Vladimir.
> > > >
> > > > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote:
> > > >> [classlib][performance] performance improvement for luni and
> nio_char
> > > >> modules
> > > >>
> > >
> -
> > > >>
> > > >>
> > > >> Key: HARMONY-1980
> > > >> URL:
> http://issues.apache.org/jira/browse/HARMONY-1980
> > > >> Project: Harmony
> > > >>  Issue Type: Improvement
> > > >>  Components: Classlib
> > > >>   

Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980

2006-10-27 Thread Geir Magnusson Jr.



Vladimir Strigun wrote:


> Small workload:
> OrigBuild  Fixed  Sun1.5.0_06

[SNIP]

> Average  449,91  408,60  471,71
>
> default workload:

[SNIP]

>
> Average  9337,73  9281,87  8787,42
>
> large workload:

[snip]

>
> Average  31345,21  27334,72  22348,3525
>

"Fixed" is the same build plus H-1980 included. As you can see from
"average" rows "Fixed" build is faster. The values are in millisec, so
the less the better, i.e. we are still slower than RI.


Ah!  Thanks.  I was thinking in terms of "DeCapo Marks" or something

So we're faster on small, 6% slower on default and 22% slower on large?
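
Those percentages can be checked directly from the "Average" rows quoted above. The sketch below is just the arithmetic, with the "Fixed" and Sun 1.5.0_06 averages copied from the tables (decimal commas written as dots); the class and method names are made up for the example.

```java
import java.util.Locale;

class RelativeSlowdown {
    /** Percentage by which candidateMs is slower than baselineMs; negative means faster. */
    static double slowdownPercent(double candidateMs, double baselineMs) {
        return (candidateMs - baselineMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        String[] workloads = {"small", "default", "large"};
        // "Average" rows from the tables: Fixed (DRLVM + H-1980) versus Sun 1.5.0_06.
        double[] fixedMs = {408.60, 9281.87, 27334.72};
        double[] sunMs = {471.71, 8787.42, 22348.3525};
        for (int i = 0; i < workloads.length; i++) {
            double pct = slowdownPercent(fixedMs[i], sunMs[i]);
            System.out.println(String.format(Locale.ROOT, "%-8s %+.1f%%", workloads[i], pct));
        }
    }
}
```

This comes out at roughly -13.4% for small (DRLVM faster), +5.6% for default and +22.3% for large, which agrees with the rounded figures above.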

That's mighty respectable!  (Who's slacking off for the large workload 
stuff? ;)


geir


Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980

2006-10-27 Thread Vladimir Strigun

On 10/27/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:

This is a nice note to wake up to...


Vladimir Strigun wrote:
> Here are the results:
>
> Small workload:
>           OrigBuild    Fixed        Sun1.5.0_06
> bloat     996,078      1024,85      955,589
> chart     1240,777     1068,112     953,096
> fop       250,433      232,957      174,901
> hsqldb    348,942      361,139      540,45
> jython    831,143      824,775      571,292
> lusearch  1854,95      1870,969     1830,589
> luindex   339,45       231,314      441,79
> pmd       29,704       23,638       61,638
>
> Average   449,91       408,60       471,71
>
> default workload:
>           OrigBuild    Fixed        Sun1.5.0_06
> bloat     16116,691    15618        13578,522
> chart     11701,546    10036,631    9790,247
> fop       2539,386     2502,518     2387,289
> hsqldb    3217,338     3078,331     5709,291
> jython    14639,278    14064,104    9456,167
> lusearch  14508,938    16175,085    13663,679
> luindex   16292,652    15501,713    15602,178
> pmd       10840,264    12937,255    9734,032
>
> Average   9337,73      9281,87      8787,42
>
> large workload:
>           OrigBuild    Fixed        Sun1.5.0_06
> bloat     168733,5     175493,46    138468,277
> chart     31651,79     25681,751    25599,38
> fop       2546,289     2512,045     2412,487
> hsqldb    22873,608    13555,515    15751,873
> jython    128207,3     92863,28     26183,716
> lusearch  29425,991    30064,153    26605,631
> luindex   17825,795    18083,898    14307,71
> pmd       44548,724    40225,694    46345,995
>
> Average   31345,21     27334,72     22348,3525
>
> At first glance the results are pretty good, but antlr benchmark works
> incorrectly with DRLVM (Harmony-1906) and there are no results for
> eclipse and xalan benchmarks. I'm still working on Dacapo analysis.


"Pretty good"?  You're suggesting that DRLVM is faster than Sun 1.5.  I
would say "Wow!", not "pretty good..."

More info - what is "OrigBuild" and what is "Fixed"?  Why is "Fixed"
slower than "OrigBuild"?


"Fixed" is the same build plus H-1980 included. As you can see from
"average" rows "Fixed" build is faster. The values are in millisec, so
the less the better, i.e. we are still slower than RI.

Thanks,
Vladimir.


geir

>
> Thanks,
> Vladimir.
>
> On 10/26/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:
>> 10%-15%?  That's amazing.  How fast are we (DRLVM) compared to Sun 1.5
>> using decapo?
>>
>> geir
>>
>>
>> Vladimir Strigun wrote:
>> > The optimization covers the following issues:
>> > - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
>> > Streaming decoding/encoding was removed. Analysis of API hotspots for
>> > Dacapo shows that CharsetDecoder is frequently used almost in all
>> > benchmark, especially in chart. We already discussed advantages of
>> > streaming decoding but the fix shows significant performance
>> > improvement on average for all Dacapo benchmarks. For instance, boost
>> > for chart benchmark is about 16%. Paulex, you recently worked in
>> > nio_char module and if I correctly remember you introduce streaming
>> > operations, so could you please review the changes and let me know?
>> > Since streaming operation was removed, tests have been slightly
>> > modified as well (previous version of tests fails on RI).
>> > - java.io.BufferedReader
>> > readLine() method was slightly modified. Additional check whether some
>> > characters available in cached buffer was added prior to main cycle.
>> > - java.io.InputStreamReader
>> > Cached char buffer was removed, read() , read(char[], int, int)
>> > methods were rewritten. Current implementation of read(char[], int,
>> > int) uses several invocation of System.arraycopy. Proposed solution
>> > wraps char[] arguments within char buffer and therefore doesn't use
>> > arraycopy. Decoding operation is also produced inside the method, so
>> > fillBuf() has been removed
>> >
>> > Thoughts? Comments?
>> >
>> > Thanks,
>> > Vladimir.
>> >
>> > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote:
>> >> [classlib][performance] performance improvement for luni and nio_char
>> >> modules
>> >>
>> -
>>
>> >>
>> >>
>> >> Key: HARMONY-1980
>> >> URL: http://issues.apache.org/jira/browse/HARMONY-1980
>> >> Project: Harmony
>> >>  Issue Type: Improvement
>> >>  Components: Classlib
>> >>Reporter: Vladimir Strigun
>> >> Attachments: Harmony-1980.diff
>> >>
>> >> I've analyzed API frequently used in all Dacapo[1] benchmarks and
>> >> found several places in luni and nio_char modules that can be
>> >> improved. Suggested fix gives about 10-15% boost on average for Dacapo
>> >> executed on DRLVM. I'll post more details to dev list.
>> >> Attached fix contains modifications for the following classes:
>> >> java.io.BufferedReader, java.io.InputStreamReader,
>> >> java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder.
>> >>
>> >> Please have a look to

Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980

2006-10-27 Thread Mikhail Fursov

Vladimir,
+1 more question: between TM integration and HARMONY-1942 incorrect
behaviour of BBP could significantly slow down the execution.
Did you do your measurements with Harmony-1942 applied?

On 10/27/06, Vladimir Strigun <[EMAIL PROTECTED]> wrote:


Mikhail,

Not yet. As I mentioned in the thread I'm still working on Dacapo.
I'll let you know if I find any improvements for JIT.

Thanks,
Vladimir.

On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote:
> Vladimir,
> I see you removed some arraycopy operations in your patch as not effective.
> I'm Ok with your solution but want to know if JIT could solve the problem
> generating more effective code? Do you have any suggestions for JIT here?
>
> On 10/27/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:
> >
> > 10%-15%?  That's amazing.  How fast are we (DRLVM) compared to Sun 1.5
> > using decapo?
> >
> > geir
> >
> >
> > Vladimir Strigun wrote:
> > > The optimization covers the following issues:
> > > - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
> > > Streaming decoding/encoding was removed. Analysis of API hotspots for
> > > Dacapo shows that CharsetDecoder is frequently used almost in all
> > > benchmark, especially in chart. We already discussed advantages of
> > > streaming decoding but the fix shows significant performance
> > > improvement on average for all Dacapo benchmarks. For instance, boost
> > > for chart benchmark is about 16%. Paulex, you recently worked in
> > > nio_char module and if I correctly remember you introduce streaming
> > > operations, so could you please review the changes and let me know?
> > > Since streaming operation was removed, tests have been slightly
> > > modified as well (previous version of tests fails on RI).
> > > - java.io.BufferedReader
> > > readLine() method was slightly modified. Additional check whether some
> > > characters available in cached buffer was added prior to main cycle.
> > > - java.io.InputStreamReader
> > > Cached char buffer was removed, read() , read(char[], int, int)
> > > methods were rewritten. Current implementation of read(char[], int,
> > > int) uses several invocation of System.arraycopy. Proposed solution
> > > wraps char[] arguments within char buffer and therefore doesn't use
> > > arraycopy. Decoding operation is also produced inside the method, so
> > > fillBuf() has been removed
> > >
> > > Thoughts? Comments?
> > >
> > > Thanks,
> > > Vladimir.
> > >
> > > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote:
> > >> [classlib][performance] performance improvement for luni and
nio_char
> > >> modules
> > >>
> >
-
> > >>
> > >>
> > >> Key: HARMONY-1980
> > >> URL:
http://issues.apache.org/jira/browse/HARMONY-1980
> > >> Project: Harmony
> > >>  Issue Type: Improvement
> > >>  Components: Classlib
> > >>Reporter: Vladimir Strigun
> > >> Attachments: Harmony-1980.diff
> > >>
> > >> I've analyzed API frequently used in all Dacapo[1] benchmarks and
> > >> found several places in luni and nio_char modules that can be
> > >> improved. Suggested fix gives about 10-15% boost on average for Dacapo
> > >> executed on DRLVM. I'll post more details to dev list.
> > >> Attached fix contains modifications for the following classes:
> > >> java.io.BufferedReader, java.io.InputStreamReader,
> > >> java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder.
> > >>
> > >> Please have a look to the results of Dacapo execution (values are in
> > >> millisec, so the less the better):
> > >>
> > >> Small workload
> > >>
> > >>           OrigBuild    Patched
> > >> bloat     996,078      1024,85
> > >> chart     1240,777     1068,112
> > >> fop       250,433      232,957
> > >> hsqldb    348,942      361,139
> > >> jython    831,143      824,775
> > >> lusearch  1854,95      1870,969
> > >> luindex   339,45       231,314
> > >> pmd       29,704       23,638
> > >>
> > >>
> > >> default workload
> > >>           OrigBuild    Patched
> > >> bloat     168733,562   175493,467
> > >> chart     31651,792    25681,751
> > >> fop       2546,289     2512,045
> > >> hsqldb    22873,608    13555,515
> > >> jython    128207,303   92863,28
> > >> lusearch  29425,991    30064,153
> > >> luindex   17825,795    18083,898
> > >> pmd       44548,724    40225,694
> > >>
> > >>
> > >>
> > >> [1] http://dacapobench.sourceforge.net
> > >>
> > >>
> > >> --
> > >> This message is automatically generated by JIRA.
> > >> -
> > >> If you think it was sent incorrectly contact one of the
> > >> administrators:
> > http://issues.apache.org/jira/secure/Administrators.jspa
> > >> -
> > >> For more information on JIRA, see:
> > http://www.atlassian.com/software/jira
> > >>
> > >>
> > >>
> > >
> >
>
>
>
> --
> Mikhail Fursov
>
>





--
Mikhail Fursov


Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980

2006-10-27 Thread Geir Magnusson Jr.

This is a nice note to wake up to...


Vladimir Strigun wrote:

Here are the results:

Small workload:
          OrigBuild    Fixed        Sun1.5.0_06
bloat     996,078      1024,85      955,589
chart     1240,777     1068,112     953,096
fop       250,433      232,957      174,901
hsqldb    348,942      361,139      540,45
jython    831,143      824,775      571,292
lusearch  1854,95      1870,969     1830,589
luindex   339,45       231,314      441,79
pmd       29,704       23,638       61,638

Average   449,91       408,60       471,71

default workload:
          OrigBuild    Fixed        Sun1.5.0_06
bloat     16116,691    15618        13578,522
chart     11701,546    10036,631    9790,247
fop       2539,386     2502,518     2387,289
hsqldb    3217,338     3078,331     5709,291
jython    14639,278    14064,104    9456,167
lusearch  14508,938    16175,085    13663,679
luindex   16292,652    15501,713    15602,178
pmd       10840,264    12937,255    9734,032

Average   9337,73      9281,87      8787,42

large workload:
          OrigBuild    Fixed        Sun1.5.0_06
bloat     168733,5     175493,46    138468,277
chart     31651,79     25681,751    25599,38
fop       2546,289     2512,045     2412,487
hsqldb    22873,608    13555,515    15751,873
jython    128207,3     92863,28     26183,716
lusearch  29425,991    30064,153    26605,631
luindex   17825,795    18083,898    14307,71
pmd       44548,724    40225,694    46345,995

Average   31345,21     27334,72     22348,3525

At first glance the results are pretty good, but antlr benchmark works
incorrectly with DRLVM (Harmony-1906) and there are no results for
eclipse and xalan benchmarks. I'm still working on Dacapo analysis.



"Pretty good"?  You're suggesting that DRLVM is faster than Sun 1.5.  I 
would say "Wow!", not "pretty good..."


More info - what is "OrigBuild" and what is "Fixed"?  Why is "Fixed" 
slower than "OrigBuild"?


geir



Thanks,
Vladimir.

On 10/26/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:

10%-15%?  That's amazing.  How fast are we (DRLVM) compared to Sun 1.5
using decapo?

geir


Vladimir Strigun wrote:
> The optimization covers the following issues:
> - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
> Streaming decoding/encoding was removed. Analysis of API hotspots for
> Dacapo shows that CharsetDecoder is frequently used almost in all
> benchmark, especially in chart. We already discussed advantages of
> streaming decoding but the fix shows significant performance
> improvement on average for all Dacapo benchmarks. For instance, boost
> for chart benchmark is about 16%. Paulex, you recently worked in
> nio_char module and if I correctly remember you introduce streaming
> operations, so could you please review the changes and let me know?
> Since streaming operation was removed, tests have been slightly
> modified as well (previous version of tests fails on RI).
> - java.io.BufferedReader
> readLine() method was slightly modified. Additional check whether some
> characters available in cached buffer was added prior to main cycle.
> - java.io.InputStreamReader
> Cached char buffer was removed, read() , read(char[], int, int)
> methods were rewritten. Current implementation of read(char[], int,
> int) uses several invocation of System.arraycopy. Proposed solution
> wraps char[] arguments within char buffer and therefore doesn't use
> arraycopy. Decoding operation is also produced inside the method, so
> fillBuf() has been removed
>
> Thoughts? Comments?
>
> Thanks,
> Vladimir.
>
> On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote:
>> [classlib][performance] performance improvement for luni and nio_char
>> modules
>> 
- 


>>
>>
>> Key: HARMONY-1980
>> URL: http://issues.apache.org/jira/browse/HARMONY-1980
>> Project: Harmony
>>  Issue Type: Improvement
>>  Components: Classlib
>>Reporter: Vladimir Strigun
>> Attachments: Harmony-1980.diff
>>
>> I've analyzed API frequently used in all Dacapo[1] benchmarks and
>> found several places in luni and nio_char modules that can be
>> improved. Suggested fix gives about 10-15% boost on average for Dacapo
>> executed on DRLVM. I'll post more details to dev list.
>> Attached fix contains modifications for the following classes:
>> java.io.BufferedReader, java.io.InputStreamReader,
>> java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder.
>>
>> Please have a look to the results of Dacapo execution (values are in
>> millisec, so the less the better):
>>
>> Small workload
>>
>>           OrigBuild    Patched
>> bloat     996,078      1024,85
>> chart     1240,777     1068,112
>> fop       250,433      232,957
>> hsqldb    348,942      361,139
>> jython    831,143      824,775
>> lusearch  1854,95      1870,969
>> luindex   339,45       231,314
>> pmd       29,704       23,638
>>
>>
>> default workload
>>           OrigBuild    Patched
>> bloat     168733,562   175493,467
>> chart     31651,792    25681,751
>> fop       2546,289

Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980

2006-10-27 Thread Vladimir Strigun

Mikhail,

Not yet. As I mentioned in the thread I'm still working on Dacapo.
I'll let you know if I find any improvements for JIT.

Thanks,
Vladimir.

On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote:

Vladimir,
I see you removed some arraycopy operations in your patch as not effective.
I'm Ok with your solution but want to know if JIT could solve the problem
generating more effective code? Do you have any suggestions for JIT here?

On 10/27/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:
>
> 10%-15%?  That's amazing.  How fast are we (DRLVM) compared to Sun 1.5
> using decapo?
>
> geir
>
>
> Vladimir Strigun wrote:
> > The optimization covers the following issues:
> > - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
> > Streaming decoding/encoding was removed. Analysis of API hotspots for
> > Dacapo shows that CharsetDecoder is frequently used almost in all
> > benchmark, especially in chart. We already discussed advantages of
> > streaming decoding but the fix shows significant performance
> > improvement on average for all Dacapo benchmarks. For instance, boost
> > for chart benchmark is about 16%. Paulex, you recently worked in
> > nio_char module and if I correctly remember you introduce streaming
> > operations, so could you please review the changes and let me know?
> > Since streaming operation was removed, tests have been slightly
> > modified as well (previous version of tests fails on RI).
> > - java.io.BufferedReader
> > readLine() method was slightly modified. Additional check whether some
> > characters available in cached buffer was added prior to main cycle.
> > - java.io.InputStreamReader
> > Cached char buffer was removed, read() , read(char[], int, int)
> > methods were rewritten. Current implementation of read(char[], int,
> > int) uses several invocation of System.arraycopy. Proposed solution
> > wraps char[] arguments within char buffer and therefore doesn't use
> > arraycopy. Decoding operation is also produced inside the method, so
> > fillBuf() has been removed
> >
> > Thoughts? Comments?
> >
> > Thanks,
> > Vladimir.
> >
> > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote:
> >> [classlib][performance] performance improvement for luni and nio_char
> >> modules
> >>
> -
> >>
> >>
> >> Key: HARMONY-1980
> >> URL: http://issues.apache.org/jira/browse/HARMONY-1980
> >> Project: Harmony
> >>  Issue Type: Improvement
> >>  Components: Classlib
> >>Reporter: Vladimir Strigun
> >> Attachments: Harmony-1980.diff
> >>
> >> I've analyzed API frequently used in all Dacapo[1] benchmarks and
> >> found several places in luni and nio_char modules that can be
> >> improved. Suggested fix gives about 10-15% boost on average for Dacapo
> >> executed on DRLVM. I'll post more details to dev list.
> >> Attached fix contains modifications for the following classes:
> >> java.io.BufferedReader, java.io.InputStreamReader,
> >> java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder.
> >>
> >> Please have a look to the results of Dacapo execution (values are in
> >> millisec, so the less the better):
> >>
> >> Small workload
> >>
> >>           OrigBuild    Patched
> >> bloat     996,078      1024,85
> >> chart     1240,777     1068,112
> >> fop       250,433      232,957
> >> hsqldb    348,942      361,139
> >> jython    831,143      824,775
> >> lusearch  1854,95      1870,969
> >> luindex   339,45       231,314
> >> pmd       29,704       23,638
> >>
> >>
> >> default workload
> >>           OrigBuild    Patched
> >> bloat     168733,562   175493,467
> >> chart     31651,792    25681,751
> >> fop       2546,289     2512,045
> >> hsqldb    22873,608    13555,515
> >> jython    128207,303   92863,28
> >> lusearch  29425,991    30064,153
> >> luindex   17825,795    18083,898
> >> pmd       44548,724    40225,694
> >>
> >>
> >>
> >> [1] http://dacapobench.sourceforge.net
> >>
> >>
> >> --
> >> This message is automatically generated by JIRA.
> >> -
> >> If you think it was sent incorrectly contact one of the
> >> administrators:
> http://issues.apache.org/jira/secure/Administrators.jspa
> >> -
> >> For more information on JIRA, see:
> http://www.atlassian.com/software/jira
> >>
> >>
> >>
> >
>



--
Mikhail Fursov




Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980

2006-10-27 Thread Vladimir Strigun

Here are the results:

Small workload:
          OrigBuild    Fixed        Sun1.5.0_06
bloat     996,078      1024,85      955,589
chart     1240,777     1068,112     953,096
fop       250,433      232,957      174,901
hsqldb    348,942      361,139      540,45
jython    831,143      824,775      571,292
lusearch  1854,95      1870,969     1830,589
luindex   339,45       231,314      441,79
pmd       29,704       23,638       61,638

Average   449,91       408,60       471,71

default workload:
          OrigBuild    Fixed        Sun1.5.0_06
bloat     16116,691    15618        13578,522
chart     11701,546    10036,631    9790,247
fop       2539,386     2502,518     2387,289
hsqldb    3217,338     3078,331     5709,291
jython    14639,278    14064,104    9456,167
lusearch  14508,938    16175,085    13663,679
luindex   16292,652    15501,713    15602,178
pmd       10840,264    12937,255    9734,032

Average   9337,73      9281,87      8787,42

large workload:
          OrigBuild    Fixed        Sun1.5.0_06
bloat     168733,5     175493,46    138468,277
chart     31651,79     25681,751    25599,38
fop       2546,289     2512,045     2412,487
hsqldb    22873,608    13555,515    15751,873
jython    128207,3     92863,28     26183,716
lusearch  29425,991    30064,153    26605,631
luindex   17825,795    18083,898    14307,71
pmd       44548,724    40225,694    46345,995

Average   31345,21     27334,72     22348,3525

At first glance the results are pretty good, but antlr benchmark works
incorrectly with DRLVM (Harmony-1906) and there are no results for
eclipse and xalan benchmarks. I'm still working on Dacapo analysis.

Thanks,
Vladimir.

On 10/26/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:

10%-15%?  That's amazing.  How fast are we (DRLVM) compared to Sun 1.5
using decapo?

geir


Vladimir Strigun wrote:
> The optimization covers the following issues:
> - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
> Streaming decoding/encoding was removed. Analysis of API hotspots for
> Dacapo shows that CharsetDecoder is frequently used almost in all
> benchmark, especially in chart. We already discussed advantages of
> streaming decoding but the fix shows significant performance
> improvement on average for all Dacapo benchmarks. For instance, boost
> for chart benchmark is about 16%. Paulex, you recently worked in
> nio_char module and if I correctly remember you introduce streaming
> operations, so could you please review the changes and let me know?
> Since streaming operation was removed, tests have been slightly
> modified as well (previous version of tests fails on RI).
> - java.io.BufferedReader
> readLine() method was slightly modified. Additional check whether some
> characters available in cached buffer was added prior to main cycle.
> - java.io.InputStreamReader
> Cached char buffer was removed, read() , read(char[], int, int)
> methods were rewritten. Current implementation of read(char[], int,
> int) uses several invocation of System.arraycopy. Proposed solution
> wraps char[] arguments within char buffer and therefore doesn't use
> arraycopy. Decoding operation is also produced inside the method, so
> fillBuf() has been removed
>
> Thoughts? Comments?
>
> Thanks,
> Vladimir.
>
> On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote:
>> [classlib][performance] performance improvement for luni and nio_char
>> modules
>> -
>>
>>
>> Key: HARMONY-1980
>> URL: http://issues.apache.org/jira/browse/HARMONY-1980
>> Project: Harmony
>>  Issue Type: Improvement
>>  Components: Classlib
>>Reporter: Vladimir Strigun
>> Attachments: Harmony-1980.diff
>>
>> I've analyzed the APIs frequently used in all Dacapo[1] benchmarks and
>> found several places in the luni and nio_char modules that can be
>> improved. The suggested fix gives about a 10-15% boost on average for
>> Dacapo executed on DRLVM. I'll post more details to the dev list.
>> The attached fix contains modifications to the following classes:
>> java.io.BufferedReader, java.io.InputStreamReader,
>> java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder.
>>
>> Please have a look at the results of the Dacapo execution (values are
>> in milliseconds, so lower is better):
>>
>> Small workload
>>
>>           OrigBuild   Patched
>> bloat     996,078     1024,85
>> chart     1240,777    1068,112
>> fop       250,433     232,957
>> hsqldb    348,942     361,139
>> jython    831,143     824,775
>> lusearch  1854,95     1870,969
>> luindex   339,45      231,314
>> pmd       29,704      23,638
>>
>>
>> default workload
>>           OrigBuild    Patched
>> bloat     168733,562   175493,467
>> chart     31651,792    25681,751
>> fop       2546,289     2512,045
>> hsqldb    22873,608    13555,515
>> jython    128207,303   92863,28
>> lusearch  29425,991    30064,153
>> luindex   17825,795    18083,898
>> pmd       44548,724    40225,694
>>
>>
>>
>> [1

Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980

2006-10-26 Thread Mikhail Fursov

Vladimir,
I see you removed some arraycopy operations in your patch because they were
inefficient. I'm OK with your solution, but I want to know whether the JIT
could solve the problem by generating more efficient code. Do you have any
suggestions for the JIT here?

On 10/27/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:


10%-15%?  That's amazing.  How fast are we (DRLVM) compared to Sun 1.5
using Dacapo?

geir


Vladimir Strigun wrote:
> The optimization covers the following issues:
> - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
> Streaming decoding/encoding was removed. Analysis of API hotspots for
> Dacapo shows that CharsetDecoder is used frequently in almost all
> benchmarks, especially in chart. We have already discussed the
> advantages of streaming decoding, but the fix shows a significant
> performance improvement on average across all Dacapo benchmarks. For
> instance, the boost for the chart benchmark is about 16%. Paulex, you
> recently worked on the nio_char module and, if I remember correctly,
> you introduced the streaming operations, so could you please review
> the changes and let me know?
> Since the streaming operation was removed, the tests have been slightly
> modified as well (the previous version of the tests fails on RI).
> - java.io.BufferedReader
> The readLine() method was slightly modified. An additional check for
> characters already available in the cached buffer was added before the
> main loop.
> - java.io.InputStreamReader
> The cached char buffer was removed, and the read() and read(char[], int,
> int) methods were rewritten. The current implementation of read(char[],
> int, int) uses several invocations of System.arraycopy. The proposed
> solution wraps the char[] argument in a char buffer and therefore
> doesn't need arraycopy. Decoding is also performed inside the method,
> so fillBuf() has been removed.
>
> Thoughts? Comments?
>
> Thanks,
> Vladimir.
>
> On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote:
>> [classlib][performance] performance improvement for luni and nio_char
>> modules
>> -
>>
>>
>> Key: HARMONY-1980
>> URL: http://issues.apache.org/jira/browse/HARMONY-1980
>> Project: Harmony
>>  Issue Type: Improvement
>>  Components: Classlib
>>Reporter: Vladimir Strigun
>> Attachments: Harmony-1980.diff
>>
>> I've analyzed the APIs frequently used in all Dacapo[1] benchmarks and
>> found several places in the luni and nio_char modules that can be
>> improved. The suggested fix gives about a 10-15% boost on average for
>> Dacapo executed on DRLVM. I'll post more details to the dev list.
>> The attached fix contains modifications to the following classes:
>> java.io.BufferedReader, java.io.InputStreamReader,
>> java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder.
>>
>> Please have a look at the results of the Dacapo execution (values are
>> in milliseconds, so lower is better):
>>
>> Small workload
>>
>>           OrigBuild   Patched
>> bloat     996,078     1024,85
>> chart     1240,777    1068,112
>> fop       250,433     232,957
>> hsqldb    348,942     361,139
>> jython    831,143     824,775
>> lusearch  1854,95     1870,969
>> luindex   339,45      231,314
>> pmd       29,704      23,638
>>
>>
>> default workload
>>           OrigBuild    Patched
>> bloat     168733,562   175493,467
>> chart     31651,792    25681,751
>> fop       2546,289     2512,045
>> hsqldb    22873,608    13555,515
>> jython    128207,303   92863,28
>> lusearch  29425,991    30064,153
>> luindex   17825,795    18083,898
>> pmd       44548,724    40225,694
>>
>>
>>
>> [1] http://dacapobench.sourceforge.net
>>
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> If you think it was sent incorrectly contact one of the administrators:
>> http://issues.apache.org/jira/secure/Administrators.jspa
>> -
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
>>
>>
>





--
Mikhail Fursov


Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980

2006-10-26 Thread Geir Magnusson Jr.
10%-15%?  That's amazing.  How fast are we (DRLVM) compared to Sun 1.5
using Dacapo?


geir


Vladimir Strigun wrote:

The optimization covers the following issues:
- java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
Streaming decoding/encoding was removed. Analysis of API hotspots for
Dacapo shows that CharsetDecoder is used frequently in almost all
benchmarks, especially in chart. We have already discussed the
advantages of streaming decoding, but the fix shows a significant
performance improvement on average across all Dacapo benchmarks. For
instance, the boost for the chart benchmark is about 16%. Paulex, you
recently worked on the nio_char module and, if I remember correctly,
you introduced the streaming operations, so could you please review
the changes and let me know?
Since the streaming operation was removed, the tests have been slightly
modified as well (the previous version of the tests fails on RI).
- java.io.BufferedReader
The readLine() method was slightly modified. An additional check for
characters already available in the cached buffer was added before the
main loop.
- java.io.InputStreamReader
The cached char buffer was removed, and the read() and read(char[], int,
int) methods were rewritten. The current implementation of read(char[],
int, int) uses several invocations of System.arraycopy. The proposed
solution wraps the char[] argument in a char buffer and therefore
doesn't need arraycopy. Decoding is also performed inside the method,
so fillBuf() has been removed.

Thoughts? Comments?

Thanks,
Vladimir.

On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote:
[classlib][performance] performance improvement for luni and nio_char 
modules
- 



Key: HARMONY-1980
URL: http://issues.apache.org/jira/browse/HARMONY-1980
Project: Harmony
 Issue Type: Improvement
 Components: Classlib
   Reporter: Vladimir Strigun
Attachments: Harmony-1980.diff

I've analyzed the APIs frequently used in all Dacapo[1] benchmarks and
found several places in the luni and nio_char modules that can be
improved. The suggested fix gives about a 10-15% boost on average for
Dacapo executed on DRLVM. I'll post more details to the dev list.
The attached fix contains modifications to the following classes:
java.io.BufferedReader, java.io.InputStreamReader,
java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder.

Please have a look at the results of the Dacapo execution (values are
in milliseconds, so lower is better):


Small workload

          OrigBuild   Patched
bloat     996,078     1024,85
chart     1240,777    1068,112
fop       250,433     232,957
hsqldb    348,942     361,139
jython    831,143     824,775
lusearch  1854,95     1870,969
luindex   339,45      231,314
pmd       29,704      23,638


default workload
          OrigBuild    Patched
bloat     168733,562   175493,467
chart     31651,792    25681,751
fop       2546,289     2512,045
hsqldb    22873,608    13555,515
jython    128207,303   92863,28
lusearch  29425,991    30064,153
luindex   17825,795    18083,898
pmd       44548,724    40225,694



[1] http://dacapobench.sourceforge.net









[classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980

2006-10-26 Thread Vladimir Strigun

The optimization covers the following issues:
- java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
Streaming decoding/encoding was removed. Analysis of API hotspots for
Dacapo shows that CharsetDecoder is used frequently in almost all
benchmarks, especially in chart. We have already discussed the
advantages of streaming decoding, but the fix shows a significant
performance improvement on average across all Dacapo benchmarks. For
instance, the boost for the chart benchmark is about 16%. Paulex, you
recently worked on the nio_char module and, if I remember correctly,
you introduced the streaming operations, so could you please review
the changes and let me know?
Since the streaming operation was removed, the tests have been slightly
modified as well (the previous version of the tests fails on RI).
- java.io.BufferedReader
The readLine() method was slightly modified. An additional check for
characters already available in the cached buffer was added before the
main loop.
- java.io.InputStreamReader
The cached char buffer was removed, and the read() and read(char[], int,
int) methods were rewritten. The current implementation of read(char[],
int, int) uses several invocations of System.arraycopy. The proposed
solution wraps the char[] argument in a char buffer and therefore
doesn't need arraycopy. Decoding is also performed inside the method,
so fillBuf() has been removed.
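
To make the InputStreamReader item concrete, here is a minimal sketch of the
idea. It is not the code from Harmony-1980.diff. It assumes the "char buffer"
is a java.nio.CharBuffer created with CharBuffer.wrap, so the decoder writes
straight into the caller's array and no intermediate cache or
System.arraycopy is involved. The class name DirectDecodeReader, the
8192-byte buffer size, and the REPLACE error policy are assumptions made for
illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the class from the attached patch.
class DirectDecodeReader {
    private final InputStream in;
    private final CharsetDecoder decoder;
    // Raw bytes read from the stream but not yet decoded (size is an assumption).
    private final ByteBuffer bytes = ByteBuffer.allocate(8192);
    private boolean endOfInput = false;

    DirectDecodeReader(InputStream in, Charset charset) {
        this.in = in;
        this.decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        bytes.limit(0); // nothing buffered yet
    }

    // Decodes directly into the caller's array.
    // Returns the number of chars stored, or -1 at end of stream.
    int read(char[] buf, int offset, int length) throws IOException {
        if (length == 0) {
            return 0;
        }
        // Wrap the destination instead of decoding into a cache and copying out.
        CharBuffer out = CharBuffer.wrap(buf, offset, length);
        while (true) {
            CoderResult result = decoder.decode(bytes, out, endOfInput);
            int produced = out.position() - offset;
            if (produced > 0 || result.isOverflow()) {
                // Simplification: a surrogate pair that cannot fit yields 0 here.
                return produced;
            }
            if (endOfInput) {
                return -1;
            }
            // Underflow with nothing produced: pull more bytes from the stream.
            bytes.compact();
            int n = in.read(bytes.array(), bytes.position(), bytes.remaining());
            if (n == -1) {
                endOfInput = true;
            } else {
                bytes.position(bytes.position() + n);
            }
            bytes.flip();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        DirectDecodeReader reader =
                new DirectDecodeReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        char[] buf = new char[16];
        int n = reader.read(buf, 2, 8); // decode into buf starting at index 2
        System.out.println(n + " chars: " + new String(buf, 2, n));
    }
}
```

The sketch returns as soon as at least one char has been produced rather than
trying to fill the whole window, which keeps it from blocking on the stream
longer than necessary.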

Thoughts? Comments?

Thanks,
Vladimir.

On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote:

[classlib][performance] performance improvement for luni and nio_char modules
-

Key: HARMONY-1980
URL: http://issues.apache.org/jira/browse/HARMONY-1980
Project: Harmony
 Issue Type: Improvement
 Components: Classlib
   Reporter: Vladimir Strigun
Attachments: Harmony-1980.diff

I've analyzed the APIs frequently used in all Dacapo[1] benchmarks and found
several places in the luni and nio_char modules that can be improved. The
suggested fix gives about a 10-15% boost on average for Dacapo executed on
DRLVM. I'll post more details to the dev list.
The attached fix contains modifications to the following classes:
java.io.BufferedReader, java.io.InputStreamReader,
java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder.

Please have a look at the results of the Dacapo execution (values are in
milliseconds, so lower is better):

Small workload

          OrigBuild   Patched
bloat     996,078     1024,85
chart     1240,777    1068,112
fop       250,433     232,957
hsqldb    348,942     361,139
jython    831,143     824,775
lusearch  1854,95     1870,969
luindex   339,45      231,314
pmd       29,704      23,638


default workload
          OrigBuild    Patched
bloat     168733,562   175493,467
chart     31651,792    25681,751
fop       2546,289     2512,045
hsqldb    22873,608    13555,515
jython    128207,303   92863,28
lusearch  29425,991    30064,153
luindex   17825,795    18083,898
pmd       44548,724    40225,694
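
For anyone who wants to turn the tables above into per-benchmark percentages,
a small sketch follows. It assumes the comma in the figures is a decimal
separator (so "1240,777" means 1240.777 ms). The class DacapoSpeedup and its
method names are hypothetical, and the geometric mean of the patched/original
ratios is just one reasonable way to summarize them, not something the
results above were computed with. The data is copied from the small workload
table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading the tables; not part of the patch.
class DacapoSpeedup {
    // Parses a table cell that uses a decimal comma, e.g. "1240,777".
    static double parse(String cell) {
        return Double.parseDouble(cell.replace(',', '.'));
    }

    // Percentage of time saved by the patch; a negative value means a slowdown.
    static double improvementPercent(double orig, double patched) {
        return (orig - patched) / orig * 100.0;
    }

    // Geometric mean computed through logarithms.
    static double geometricMean(double[] values) {
        double logSum = 0.0;
        for (double v : values) {
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }

    public static void main(String[] args) {
        // Small workload: benchmark -> {OrigBuild, Patched}, in milliseconds.
        Map<String, String[]> small = new LinkedHashMap<>();
        small.put("bloat", new String[] {"996,078", "1024,85"});
        small.put("chart", new String[] {"1240,777", "1068,112"});
        small.put("fop", new String[] {"250,433", "232,957"});
        small.put("hsqldb", new String[] {"348,942", "361,139"});
        small.put("jython", new String[] {"831,143", "824,775"});
        small.put("lusearch", new String[] {"1854,95", "1870,969"});
        small.put("luindex", new String[] {"339,45", "231,314"});
        small.put("pmd", new String[] {"29,704", "23,638"});

        double[] ratios = new double[small.size()];
        int i = 0;
        for (Map.Entry<String, String[]> e : small.entrySet()) {
            double orig = parse(e.getValue()[0]);
            double patched = parse(e.getValue()[1]);
            ratios[i++] = patched / orig;
            System.out.printf("%-9s %+6.1f%%%n", e.getKey(), improvementPercent(orig, patched));
        }
        System.out.printf("geomean patched/orig ratio: %.3f%n", geometricMean(ratios));
    }
}
```

A ratio below 1.0 in the last line means the patched build is faster overall
on that workload.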



[1] http://dacapobench.sourceforge.net

