Re: [classlib][performance] performance improvement for luni and nio_char modules - Harmony-1980

Geir Magnusson Jr. Fri, 27 Oct 2006 08:33:01 -0700


Vladimir Strigun wrote:

Mikhail,

It was a pretty old build. Now I'm gathering info for the current DRLVM
(antlr, eclipse, xalan are still not included).
I've executed every benchmark 10 times and the result is the geometric
mean of the last 5 executions.
Machine: P4, 3 GHz, 1 GB RAM
Build1 = current Harmony build, svn = r468353, (Oct 27 2006),
Windows/ia32/msvc 1310, release build
Build2 = Build1+Harmony-1980
RI: jdk1.5.0_06
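
A minimal sketch of the scoring rule described above (10 runs, geometric mean of the last 5). The class and method names are hypothetical and the sample timings in main() are made up for illustration; this is not the script that was actually used. As a sanity check, feeding the Build1 column of the small-input results into the same function gives about 482,517, which matches the "Average" row, so that row appears to be a geometric mean as well.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Harmony or Dacapo.
final class BenchmarkStats {

    /** Geometric mean, computed via logarithms so the product cannot overflow. */
    static double geometricMean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double logSum = 0.0;
        for (double v : values) {
            if (v <= 0.0) {
                throw new IllegalArgumentException("timings must be positive: " + v);
            }
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }

    /** Ignores the earlier (warm-up) runs and scores only the last `keep` runs. */
    static double scoreOfLastRuns(double[] runsMillis, int keep) {
        if (keep <= 0 || keep > runsMillis.length) {
            throw new IllegalArgumentException("keep out of range: " + keep);
        }
        double[] tail = Arrays.copyOfRange(runsMillis, runsMillis.length - keep, runsMillis.length);
        return geometricMean(tail);
    }

    public static void main(String[] args) {
        // Made-up timings in milliseconds for one benchmark, 10 executions.
        double[] runs = {310.0, 280.5, 262.1, 251.0, 248.3, 244.9, 243.0, 245.2, 242.8, 243.6};
        System.out.println("score = " + scoreOfLastRuns(runs, 5));

        // Build1 column of the small-input table (decimal commas written as points).
        double[] build1Small = {1014.371, 1427.912, 243.426, 330.856,
                                1092.869, 1999.63, 421.703, 27.332};
        System.out.println("Build1 small average = " + geometricMean(build1Small));
    }
}
```
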

Arguments for DRLVM: -Xem:server -Xms700m -Xmx700m
Arguments for Sun: -XX:+AggressiveHeap -XX:+UseBiasedLocking
-XX:+UseParallelGC -XX:ParallelGCThreads=4 -Xss64k -Xms700m -Xmx700m

Results for small input:

Benchmark   Build1        Build2        RI
bloat       1014,371      1024,618      968,976
chart       1427,912      1186,959      956,125
fop         243,426       244,317       171,701
hsqldb      330,856       324,493       549,55
jython      1092,869      1102,331      568,088
lusearch    1999,63       1971,813      1830,707
luindex     421,703       225,073       594,78
pmd         27,332        26,981        53,319

Average     482,5168816   434,5997662   481,3767025

Here we can see that DRLVM is a little bit faster, but the recommendations
for Dacapo say that the small workload is for testing, and advise "either
reporting default or large in any performance analysis".

Default input:

Benchmark   Build1        Build2        RI
bloat       17155,441     17131,63      13718,637
chart       13342,101     10924,038     9755,926
fop         2621,146      2584,326      2353,304
hsqldb      3153,212      3101,691      5737,304
jython      16240,515     15632,52      8299,957
lusearch    16280,762     16255,764     13518,751
luindex     12420,638     10730,491     15782,563
pmd         11027,172     11136,656     9689,841

Average     9538,259502   9063,946046   8638,4136

So, for default input we are 5-10% slower.
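
The 5-10% figure can be reproduced from the "Average" row of the default-input table. The sketch below is only the arithmetic (class and method names are hypothetical); it works out to roughly 10.4% for Build1 and 4.9% for Build2 relative to the RI.

```java
import java.util.Locale;

// Hypothetical helper for the arithmetic only.
final class RelativeSlowdown {

    /** Percentage by which `ours` exceeds `reference`; times in ms, lower is better. */
    static double slowdownPercent(double ours, double reference) {
        return (ours / reference - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // "Average" row for default input (decimal commas written as points).
        double build1 = 9538.259502;
        double build2 = 9063.946046;
        double ri = 8638.4136;

        System.out.println(String.format(Locale.ROOT,
                "Build1 vs RI: %.1f%% slower", slowdownPercent(build1, ri)));
        System.out.println(String.format(Locale.ROOT,
                "Build2 vs RI: %.1f%% slower", slowdownPercent(build2, ri)));
    }
}
```
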

I'll provide the results for large input as soon as the performance run has completed.

I know that I'm going to be an annoying broken record here, filling up people's mailboxes, but I'll say it again - that's mighty impressive. I'll take within 20% of Sun at this point in our project's life any day of the week.

(Of course, world-class performance - as measured by SPECjbb - is currently held by IBM's J9 on Woodcrest, so that's probably the stretch target ;)


geir


Thanks,
Vladimir.

On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote:

Vladimir,
+1 more question: between TM integration and HARMONY-1942, incorrect
behaviour of BBP could significantly slow down the execution.
Did you do your measurements with Harmony-1942 applied?

On 10/27/06, Vladimir Strigun <[EMAIL PROTECTED]> wrote:
>
> Mikhail,
>
> Not yet. As I mentioned in the thread I'm still working on Dacapo.
> I'll let you know if I find any improvements for JIT.
>
> Thanks,
> Vladimir.
>
> On 10/27/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote:
> > Vladimir,
> > I see you removed some arraycopy operations in your patch as not effective.
> > I'm OK with your solution but want to know if JIT could solve the problem
> > by generating more effective code. Do you have any suggestions for JIT here?
> >
> > On 10/27/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:
> > >

> > > 10%-15%? That's amazing. How fast are we (DRLVM) compared toSun 1.5

> > > using decapo?
> > >
> > > geir
> > >
> > >
> > > Vladimir Strigun wrote:
> > > > > The optimization covers the following issues:
> > > > > - java.nio.charset.CharsetDecoder and java.nio.charset.CharsetEncoder
> > > > > Streaming decoding/encoding was removed. Analysis of API hotspots for
> > > > > Dacapo shows that CharsetDecoder is frequently used in almost all
> > > > > benchmarks, especially in chart. We already discussed the advantages of
> > > > > streaming decoding, but the fix shows significant performance
> > > > > improvement on average for all Dacapo benchmarks. For instance, the boost
> > > > > for the chart benchmark is about 16%. Paulex, you recently worked in the
> > > > > nio_char module and, if I remember correctly, you introduced streaming
> > > > > operations, so could you please review the changes and let me know?
> > > > > Since the streaming operation was removed, tests have been slightly
> > > > > modified as well (the previous version of the tests fails on RI).
> > > > > - java.io.BufferedReader
> > > > > The readLine() method was slightly modified. An additional check for
> > > > > whether some characters are available in the cached buffer was added
> > > > > prior to the main cycle.
> > > > > - java.io.InputStreamReader
> > > > > The cached char buffer was removed, and the read() and read(char[], int,
> > > > > int) methods were rewritten. The current implementation of read(char[],
> > > > > int, int) uses several invocations of System.arraycopy. The proposed
> > > > > solution wraps the char[] argument within a char buffer and therefore
> > > > > doesn't use arraycopy. The decoding operation is also performed inside
> > > > > the method, so fillBuf() has been removed.
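
To illustrate the InputStreamReader idea described in the list above (wrapping the caller's char[] so decoded characters land in it directly, with no intermediate char cache and no System.arraycopy), here is a rough sketch. It is not the code from Harmony-1980.diff: the class DirectDecodeReader, its buffer size and the decodeAll() helper are all hypothetical, and it simplifies error handling by replacing malformed input and by refusing a destination too small to hold a surrogate pair.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical reader, not the Harmony class.
final class DirectDecodeReader {
    private final InputStream in;
    private final CharsetDecoder decoder;
    private final ByteBuffer bytes = ByteBuffer.allocate(8192); // undecoded input bytes
    private boolean endOfInput = false; // underlying stream is exhausted
    private boolean done = false;       // decoder has been flushed

    DirectDecodeReader(InputStream in, Charset charset) {
        this.in = in;
        this.decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        bytes.limit(0); // start empty, in "ready to be read" state
    }

    int read(char[] buf, int offset, int length) throws IOException {
        if (length == 0) return 0;
        if (done) return -1;
        // A view over the caller's array: the decoder writes into buf itself.
        CharBuffer out = CharBuffer.wrap(buf, offset, length);
        while (out.position() == offset && !done) {
            CoderResult result = decoder.decode(bytes, out, endOfInput);
            if (out.position() > offset) break;
            if (result.isOverflow()) {
                throw new IOException("destination too small for a surrogate pair");
            }
            if (endOfInput) {
                decoder.flush(out);
                done = true;
            } else {
                refill();
            }
        }
        int produced = out.position() - offset;
        return produced == 0 ? -1 : produced;
    }

    /** Keeps any incomplete byte sequence and appends fresh bytes from the stream. */
    private void refill() throws IOException {
        bytes.compact();
        int n = in.read(bytes.array(), bytes.position(), bytes.remaining());
        if (n < 0) {
            endOfInput = true;
        } else {
            bytes.position(bytes.position() + n);
        }
        bytes.flip();
    }

    /** Convenience wrapper for trying the reader out on an in-memory byte array. */
    static String decodeAll(byte[] data, Charset charset, int chunk) {
        DirectDecodeReader reader = new DirectDecodeReader(new ByteArrayInputStream(data), charset);
        char[] dest = new char[chunk];
        StringBuilder sb = new StringBuilder();
        try {
            int n;
            while ((n = reader.read(dest, 0, chunk)) != -1) {
                sb.append(dest, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeAll(data, StandardCharsets.UTF_8, 4));
    }
}
```

The trade-off in this sketch is that every read() call goes through the decoder, so very small reads pay the decoder's per-call overhead instead of being served from a cached char buffer.
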
> > > >
> > > > Thoughts? Comments?
> > > >
> > > > Thanks,
> > > > Vladimir.
> > > >
> > > > On 10/26/06, Vladimir Strigun (JIRA) <[EMAIL PROTECTED]> wrote:
> > > > >> [classlib][performance] performance improvement for luni and nio_char
> > > > >> modules
> > > > >> -----------------------------------------------------------------------------
> > > > >>
> > > >>                 Key: HARMONY-1980
> > > >>                 URL:
> http://issues.apache.org/jira/browse/HARMONY-1980
> > > >>             Project: Harmony
> > > >>          Issue Type: Improvement
> > > >>          Components: Classlib
> > > >>            Reporter: Vladimir Strigun
> > > >>         Attachments: Harmony-1980.diff
> > > >>

> > > >> I've analyzed API frequently used in all Dacapo[1] benchmarksand

> > > >> found several places in luni and nio_char modules that can be
> > > >> improved. Suggested fix gives about 10-15% boost on average for
> Dacapo
> > > >> executed on DRLVM. I'll post more details to dev list.
> > > >> Attached fix contains modifications for the following classes:
> > > >> java.io.BufferedReader, java.io.InputStreamReader,

> > > >> java.nio.charset.CharsetDecoder andjava.nio.charset.CharsetEncoder

> .
> > > >>

> > > >> Please have a look to the results of Dacapo execution (valuesare

> in
> > > >> millisec, so the less the better):
> > > >>
> > > > >> Small workload
> > > > >>
> > > > >>            OrigBuild     Patched
> > > > >> bloat      996,078       1024,85
> > > > >> chart      1240,777      1068,112
> > > > >> fop        250,433       232,957
> > > > >> hsqldb     348,942       361,139
> > > > >> jython     831,143       824,775
> > > > >> lusearch   1854,95       1870,969
> > > > >> luindex    339,45        231,314
> > > > >> pmd        29,704        23,638
> > > > >>
> > > > >>
> > > > >> Default workload
> > > > >>
> > > > >>            OrigBuild     Patched
> > > > >> bloat      168733,562    175493,467
> > > > >> chart      31651,792     25681,751
> > > > >> fop        2546,289      2512,045
> > > > >> hsqldb     22873,608     13555,515
> > > > >> jython     128207,303    92863,28
> > > > >> lusearch   29425,991     30064,153
> > > > >> luindex    17825,795     18083,898
> > > > >> pmd        44548,724     40225,694
> > > >>
> > > >>
> > > >>
> > > >> [1] http://dacapobench.sourceforge.net
> > > >>
> > > >>
> > > >> --
> > > >> This message is automatically generated by JIRA.
> > > >> -
> > > > >> If you think it was sent incorrectly contact one of the administrators:
> > > > >> http://issues.apache.org/jira/secure/Administrators.jspa
> > > > >> -
> > > > >> For more information on JIRA, see: http://www.atlassian.com/software/jira
> > > >>
> > > >>
> > > >>
> > > >
> > >
> >
> >
> >
> > --
> > Mikhail Fursov
> >
> >
>



--
Mikhail Fursov
