On Thu, May 28, 2015 at 1:08 PM, Pierre Joye <pierre....@gmail.com> wrote:

>
> On May 28, 2015 1:47 PM, "Dmitry Stogov" <dmi...@zend.com> wrote:
> >
> > Hi Pierre,
> >
> > It looks like the training process is not a trivial task.
> > It may speed up some apps and slow down others.
> > In my experiments, training with many apps led to worse results.
> >
> > I'm trying to identify the optimization patterns that lead to speed
> > improvement, but without success yet
>
> Not sure how GCC works for the instrumentation but with these apps (and
> flows), we always have speed improvement.
>

GCC may split some functions into "hot" and "cold" paths and then put their
basic blocks into different segments.
Then all "hot" blocks are collected together and placed in the order
corresponding to call chains.
As a result, executing "hot" paths causes fewer i-cache and iTLB misses;
however, switching to a "cold" path is almost always more expensive.
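
To make the hot/cold split concrete, here is a rough sketch of how the same kind of hint can be given to GCC by hand, without a training run. It is only an illustration, not PHP source: `sum_checked` and `report_bad_input` are made-up names, and the compiler is free to ignore the hints. The `cold` attribute and `__builtin_expect` are existing GCC extensions; with FDO the compiler derives equivalent information from the collected profile.

```c
#include <assert.h>
#include <stddef.h>

/* Rarely executed error path (hypothetical helper for this sketch).
 * The "cold" attribute asks GCC to treat the function as unlikely to run;
 * on many targets it is also placed in a separate text subsection, away
 * from the hot code. */
__attribute__((cold, noinline))
static long report_bad_input(size_t index)
{
    /* A real application would log or raise an error here. */
    return -(long)index - 1;
}

/* Hot loop: sums the values; a negative value is the rare case. */
long sum_checked(const int *values, size_t count)
{
    long total = 0;

    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* __builtin_expect marks the branch as almost never taken, so the
         * hot path can be laid out as straight-line code. */
        if (__builtin_expect(values[i] < 0, 0)) {
            return report_bad_input(i);
        }
        total += values[i];
    }
    return total;
}
```

If the "rare" branch turns out to be common in some workload, the layout works against that workload, which matches the observation that a build trained on one app can slow down another.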

The more we train PHP, the more functions are considered hot and the
less improvement we actually get :)
But all of this applies to GCC 4.6-4.9.

GCC 5.1 provides interesting new FDO features.

> What are you using and what kind of degradation do you see? Or
> improvements
>

For now I train with just the Wordpress home page, and it gives me an 8%
improvement on Wordpress, 0-8% on other real apps, and some slowdown on
bench.php. When I trained PHP with both Wordpress and bench.php, I got a
slowdown on both.

Thanks. Dmitry.



> > Thanks. Dmitry.
> >
> > On Thu, May 28, 2015 at 5:46 AM, Pierre Joye <pierre....@gmail.com> wrote:
> >>
> >>
> >> On May 26, 2015 11:43 PM, "Dmitry Stogov" <dmi...@zend.com> wrote:
> >> >
> >> > On Tue, May 26, 2015 at 6:24 PM, Rasmus Lerdorf <ras...@lerdorf.com> wrote:
> >> >
> >> > > On 05/26/2015 07:33 AM, Dmitry Stogov wrote:
> >> > > > Commit:    7dac4d449f72d7eb029aa1a8ee87aaf38e17e1c5
> >> > > > Author:    Dmitry Stogov <dmi...@zend.com>         Tue, 26 May 2015 17:33:25 +0300
> >> > > > Parents:   ca31711625095c2d6e308d7f0fc9d371ad0934d4
> >> > > > Branches:  master
> >> > > >
> >> > > > Link:      http://git.php.net/?p=php-src.git;a=commitdiff;h=7dac4d449f72d7eb029aa1a8ee87aaf38e17e1c5
> >> > > >
> >> > > > Log:
> >> > > > Add targets to simplify building PHP with FDO (Feedback Directed Optimisation)
> >> > > > PHP should be built with the following steps:
> >> > > >
> >> > > > make clean
> >> > > > make -j4 prof-gen
> >> > > > ; now php should be trained with some scripts
> >> > > > ; for example `sapi/cgi/php -T 1000 /var/www/http/wordpress/index.php > /dev/null`
> >> > > > make prof-clean
> >> > > > make -j4 prof-use
> >> > > >
> >> > > > The "properly" trained build may give up to 10% real performance boost!
> >> > > > "Improperly" trained PHP might be even slower.
> >> > >
> >> > > Whoa, really 10%? I know there is AutoFDO coming in gcc 5.1 and I didn't
> >> > > think this was really practical until then. Perhaps it is. I wonder if
> >> > > this will spur php-wordpress, php-drupal and php-mediawiki tuned builds?
> >> > >
> >> >
> >> > I hope we will be able to identify the main sources of the speed difference
> >> > and provide a source-level solution that won't require FDO.
> >> > I'm going to work on this with a team of experts from Intel, but we don't
> >> > have a recipe yet.
> >> > If anyone finds something useful, please share it with me.
> >> >
> >> > At this point we know that the FDO build inlines most memcpy() and memset()
> >> > calls.
> >> >
> >> > Training scripts may speed up some apps and slow down others, but it looks
> >> > like most real-life apps benefit from the same things, while synthetic
> >> > benchmarks benefit from the opposite. So by improving real-life apps we
> >> > slow down bench.php, and vice versa :)
> >>
> >> It should be possible to use what we use for the Windows PGO profiling
> >> build. It's all PHP code anyway:
> >>
> >> https://github.com/OSTC/pgo-scripts/blob/master/README
> >>
> >> Let me know if you'd like to share the same base, it could save some work.
> >
> >
>
