On Feb 21, 2009, at 10:55 PM, shire wrote:
Hi Ronald,
Ronald Chmara wrote:
Wait... so if I understand this right, let's envision a code base where,
per some random page load, 70 functions are actually called, but, oh,
7,000, or even 700,000, are being included for whatever reason?
The speed optimization is in *not* copying a massive amount of things
that weren't even needed, or used, in the first place?
Essentially, yes, this is probably best summed up by the 80/20 rule
where we only use 20% of the code etc...
Well, I can see 80% actually *used* code, with 20% in there by
accident.... but 80% unused code? eep! ack! Call the villagers and
get the torches and pitchforks!...
...but environments vary, of course. ;)
However, there's still the horribly massive speed hit of semi-loading,
and marking, a fairly large amount of unused, un-needed, functions, as
available?
I don't agree with describing this as a "horribly massive speed hit",
at least in comparison with what was happening without lazy loading.
Fair enough. Before the patch, for example, I might describe it (80%
unused, 20% used code) as an "insanely awful, horribly massive speed
hit", and after the patch, as being reduced to a much lesser
"horribly massive speed hit", but these are just rhetorical, and
qualitative, language devices that I used to characterize code issues.
In both cases, a large amount of CPU is spent on (effectively) doing
nothing, but your patch (as I understand its design) reduces the
amount of CPU waste... doing nothing.
Also, as I said, there are further iterations I plan to make here, one
of these being improving the performance of marking functions as
available.
One thing I see as quite a beneficial future outcome of your work is
the ability to further profile code, and be able to seek out code
that marks massive amounts of functions as "available".... without
actually ever using them.
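To make the eager vs. lazy difference (and the profiling idea just
above) concrete, here is a toy model. It is a minimal sketch, assuming
the mechanism as described in this thread: eager loading copies every
function from the shared cache into the request's function table, while
lazy loading only marks names as available and copies a function on its
first call. The class and method names are hypothetical; this is not
APC's actual data structure.

```python
from typing import Callable, Dict, Iterable, Set


class FunctionTable:
    """Hypothetical per-request function table backed by a shared cache."""

    def __init__(self, shared_cache: Dict[str, Callable], lazy: bool) -> None:
        self._cache = shared_cache
        self._copied: Dict[str, Callable] = {}  # functions copied into this request
        self._available: Set[str] = set()       # names only marked as available
        self.lazy = lazy

    def load_file(self, names: Iterable[str]) -> None:
        for name in names:
            if self.lazy:
                # Lazy: just record that the name exists; nothing is copied yet.
                self._available.add(name)
            else:
                # Eager: copy every function up front, whether used or not.
                # (Here a "copy" is only a reference; in an opcode cache it
                # would be a real memory copy, which is where the cost lies.)
                self._copied[name] = self._cache[name]

    def call(self, name: str, *args):
        if name not in self._copied:
            if name not in self._available:
                raise NameError(f"call to undefined function {name}()")
            # First call in lazy mode: copy the function out of the cache now.
            self._copied[name] = self._cache[name]
        return self._copied[name](*args)

    def copies(self) -> int:
        return len(self._copied)

    def never_called(self) -> Set[str]:
        """Names marked available but never invoked (meaningful in lazy mode)."""
        return self._available - set(self._copied)


# 7,000 functions included, 70 actually called, as in the example above.
cache = {f"f{i}": (lambda i=i: i) for i in range(7000)}
eager = FunctionTable(cache, lazy=False)
lazy = FunctionTable(cache, lazy=True)
for table in (eager, lazy):
    table.load_file(cache)
    for i in range(70):
        table.call(f"f{i}")

print(eager.copies(), lazy.copies(), len(lazy.never_called()))  # 7000 70 6930
```

The `never_called()` set is the kind of report that could point at code
marking thousands of functions as available without ever using them.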
I do see the benefit of lazy loading; I'm just not very comfortable with
enabling a philosophy of loading up a massive amount of CPU and RAM with
"just in case they're wanted" features and code in the first place.
Well I am assuming that this is what a large amount of code does
already, except that without lazy loading the situation is
significantly worse.
Different code bases and philosophies vary.
Since much of what I do (enterprise PHP tuning) involves (among many
other things) finding, and eliminating, such code, I can say with
great confidence that there certainly are bloat-bases out there that
load metric hogsheads of libraries to show a single web email form,
but there are also code bases which do *not* rely on endless
libraries, frameworks, additional template abstractions and end user
libraries, or other pre-determined architectures... to complete the
simple task of showing a web email form.
To frame the issue another way, you are trying to make huge,
complicated, code sets less painful to use, and I am arguing that
huge, complicated, code sets are a major part of the problem..... but
since neither of us can wave a magic wand and reduce the problem to
simple, elegant, code sets, you're reducing the magnitude of pain
involved. Kudos to you.
Your point that we should be sure this does not encourage poor
coding practices is well taken, but it's been my experience that
code tends to take this form regardless so I'm hoping to make the
best of the situation ;-).
There will always be bad code, yes. ;-)
I'm trying to raise a token flag of discussion/resistance to making
bad code practices less painful, as it still enables bad code practices.
Also keep in mind that there are cases where you may not know in
advance which functions you will/will not call, but it's probably
fair to say that the 80/20 rule still holds, so including all the
functions you may need is not particularly a misuse of the
language, but rather a necessity of a dynamic application and
language.
It all depends on the use, and environment, I suppose.
It certainly can boost an APC code set such as Facebook, where many of
those files and functions *will* likely be used in the next 20 minutes
or so, but I also fear that it will encourage programmers to load
everything they have, every time, just in case they need it.... and 2GB
Apache processes (and APC space) can be.... ugly.
I'm not entirely clear on where code being used in the next 20 minutes
comes into play; what difference does 100 milliseconds vs. 20 minutes
make in APC/lazy loading?
FB seems to have a fair bit of traffic, with a semi-patrolled code
set, so it's likely that any single APC-loaded function will be
invoked *eventually*, within an hour or so.
Contrast this with 1,000 different sites hosted on a box, using a
less patrolled, fairly unregulated, 1,000 different batches of PHP
codesets, where "myTotallyCustomDateTime()" can have 1,000 different
variants, some of which are only actually used once every 3-5 weeks
or so.
Lazy loading would (as I understand it) speed up both, but lazy
loading would also encourage not just one code set, but all code
sets, to assume that the *language authors*, rather than the
*developers*, were responsible for making sure CPU was being managed
efficiently.
It's actually likely that only a fraction of the code at Facebook
will be used in a request, hence the need for lazy loading.
Ouch. Seriously.
I can't tell you how to build your code, but I think you might
seriously benefit from:
a) Lazy Loading (as you've done, great idea)
b) Using Lazy Loading to find out which apps/code are sucking up
massive CPU, and taking action as needed to help tune, or remove, the
offending code?
c) Breaking your Lazy Loading targets out, to where (a hypothetical)
the mytzyplk_scramble() function/class method is only included and
thus loaded as needed, rather than (as a guess) a function group (or
class method group) auto-loaded which may/may not be needed for a
given page load?
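The effect of suggestion (c) on how much gets pulled in can be shown
with a small count. This is a minimal sketch, assuming a hypothetical
layout in which each include file defines a list of functions and a
file is included whole the first time any of its functions is needed;
the file names and the `helperN` functions are made up for illustration.

```python
from typing import Dict, List, Set


def functions_loaded(calls: List[str], units: Dict[str, List[str]]) -> int:
    """Count functions pulled in when each file is included whole on first need.

    `units` maps a hypothetical include file to the functions it defines.
    """
    owner = {fn: unit for unit, fns in units.items() for fn in fns}
    included: Set[str] = {owner[fn] for fn in calls}
    return sum(len(units[unit]) for unit in included)


# One grouped file holding 500 functions vs. one file per function.
grouped = {"util.php": ["mytzyplk_scramble"] + [f"helper{i}" for i in range(499)]}
per_function = {f"{fn}.php": [fn] for fn in grouped["util.php"]}

print(functions_loaded(["mytzyplk_scramble"], grouped))       # 500
print(functions_loaded(["mytzyplk_scramble"], per_function))  # 1
```

With the grouped layout, a page that needs only mytzyplk_scramble()
still pays for all 500 definitions; with finer-grained files it pays for
one.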
Does that make sense? Or did you try it already? :)
-Bop
--
PHP Internals - PHP Runtime Development Mailing List