Re: [PHP-DEV] lstat call on each directory level

2008-07-16 Thread Amir Hardon
On Wed, 2008-07-16 at 06:45 -0700, Rasmus Lerdorf wrote:

 Arvids Godjuks wrote:
  Hello.
 
  I think this should be optimized.
  I'm not an expert, of course, but as I understand it there is only one
  case that needs special treatment: require_once/include_once, when the
  same file is included from different directories.
  You could add a switch that controls whether lstat is called in these
  cases. People who know what they are doing would switch it off and
  make sure their includes don't result in a fatal error (in my opinion
  it is bad design if such a thing happens anyway).
  Of course, open_basedir users won't get any benefit from it, but
  that's their choice.
  So I think you should think it through and get this optimization into
  the 5.3 release. It would be a great optimization, IMHO.
 
 But all these lstats should be getting cached, so I don't see how it 
 would affect performance very much.  If you are blowing your realpath 
 cache, you need to take a look at why that is happening.
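To make the caching argument concrete, here is a minimal C sketch of a path-resolution cache. It assumes POSIX realpath(3) does the per-component filesystem work on a miss. The names cached_realpath, cache_clear and CACHE_SLOTS are hypothetical, and PHP's own cache (sized by the realpath_cache_size ini setting) is organised differently. A hit costs no system calls. Clearing the cache forces every later lookup to walk the path again.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical direct-mapped cache of resolved paths; a sketch of the idea,
 * not PHP's realpath cache. Only successful lookups are stored. */
#define CACHE_SLOTS 64

struct cache_entry {
    int used;
    char key[PATH_MAX];       /* path as requested */
    char resolved[PATH_MAX];  /* canonical path from realpath(3) */
};

static struct cache_entry cache[CACHE_SLOTS];
static int cache_misses;      /* each miss costs a full path walk */

static unsigned long hash_path(const char *s)
{
    unsigned long h = 5381;   /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns the canonical path, or NULL if it cannot be resolved. */
static const char *cached_realpath(const char *path)
{
    struct cache_entry *e;
    char buf[PATH_MAX];

    assert(path != NULL);
    e = &cache[hash_path(path) % CACHE_SLOTS];
    if (e->used && strcmp(e->key, path) == 0)
        return e->resolved;                  /* hit: no filesystem access */

    cache_misses++;
    if (strlen(path) >= PATH_MAX || realpath(path, buf) == NULL)
        return NULL;                         /* failures are not cached */
    strcpy(e->key, path);                    /* a collision evicts the old entry */
    strcpy(e->resolved, buf);
    e->used = 1;
    return e->resolved;
}

/* Dropping every entry means every following lookup misses again. */
static void cache_clear(void)
{
    memset(cache, 0, sizeof cache);
}
```

Resolving "/" twice counts a single miss. After cache_clear(), the next lookup of the same path misses again.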
 
 We probably should disconnect clearstatcache() from the realpath_cache, 
 and we could perhaps look at doing partial path caches through our own 
 realpath implementation.  The other thing that can suck is when you have 
 an include_path miss.  We don't cache misses like this, so if you are 
 relying on include_path to find your files and you don't hit it on the 
 first try, you are going to see a bunch of stats.  But that is again 
 something that is easily fixed by not writing crappy code.
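The include_path cost can be sketched the same way. This hypothetical C function (find_in_include_path is not a PHP API) tries each colon-separated directory in turn and counts the stat() calls. A file found only in the third entry costs three calls on every lookup, because failed candidates are not remembered.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical include_path search: try each entry of a colon-separated
 * directory list until a regular file named `file` is found. Returns 0 and
 * leaves the full path in `out` on success, -1 if no entry matches.
 * *stat_calls counts the stat() calls; failed candidates are not cached,
 * so a repeated lookup pays for the same misses again. */
static int find_in_include_path(const char *include_path, const char *file,
                                char *out, size_t outlen, int *stat_calls)
{
    const char *dir = include_path;
    struct stat st;

    assert(include_path != NULL && file != NULL);
    *stat_calls = 0;
    for (;;) {
        const char *end = strchr(dir, ':');
        size_t dirlen = end ? (size_t)(end - dir) : strlen(dir);

        if (dirlen > 0) {
            int n = snprintf(out, outlen, "%.*s/%s", (int)dirlen, dir, file);
            if (n > 0 && (size_t)n < outlen) {
                (*stat_calls)++;
                if (stat(out, &st) == 0 && S_ISREG(st.st_mode))
                    return 0;
            }
        }
        if (end == NULL)
            return -1;
        dir = end + 1;
    }
}
```

If the directory that actually holds the file is listed first, the same lookup costs a single call.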
 
 I think that breaking code that looks like this:
 
 require_once './a.inc';
 require_once 'a.inc';
 require_once '../a.inc';
 require_once 'includes/a.inc';
 
 when these all refer to the same a.inc file depending on where the 
 parent file is and what the coder had for breakfast that morning would 
 be a very bad idea.
 
 -Rasmus
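A small C sketch shows why the _once functions need resolved paths. The names require_once_check and MAX_ONCE are hypothetical, not PHP internals. Each path is canonicalised with realpath(3) before it is compared with the list of files already included, so './a.inc' and 'includes/a.inc' match when they name the same file. The sketch resolves relative paths against the process's current directory only and ignores include_path.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ONCE 128

/* Canonical paths of files already included (hypothetical bookkeeping). */
static char included[MAX_ONCE][PATH_MAX];
static int n_included;

/* Returns 1 if the file should be included now, 0 if the same file was
 * already included under any spelling, -1 if the path cannot be resolved
 * or the table is full. */
static int require_once_check(const char *path)
{
    char canon[PATH_MAX];

    assert(path != NULL);
    if (realpath(path, canon) == NULL)
        return -1;
    for (int i = 0; i < n_included; i++)
        if (strcmp(included[i], canon) == 0)
            return 0;                 /* same file, different spelling */
    if (n_included == MAX_ONCE)
        return -1;
    strcpy(included[n_included++], canon);
    return 1;
}
```

For example, /tmp, /tmp/. and /tmp/../tmp all canonicalise to the same string, so only the first of those calls returns 1.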
 


Since the realpath cache is only relevant for a single request (right?),
removing these lstat calls would be a major benefit.

Before moving our portal directory to /, ~40% of our page requests
were slow on the server side (I'm not sure whether my company's policies
allow me to disclose exactly what counts as slow); after moving it,
~20% of the page requests were slow! This is significant.
And there are still many lstat calls made inside our portal's directory
tree.

So I think a php.ini directive for switching off these lstat calls,
set to off by default, would be a great thing.

-Amir.


[PHP-DEV] lstat call on each directory level

2008-07-15 Thread Amir Hardon
Hi,

I've noticed some odd behavior when accessing files from PHP:
PHP seems to make an lstat call on each parent directory of the
accessed file. For example, see this script:

<?php
$fp = fopen('/var/www/metacafe/test', 'r');
if ($fp !== false) {
    fclose($fp);
}
?>

When running with strace -e lstat I see this:
lstat("/var", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/var/www", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
lstat("/var/www/metacafe", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/var/www/metacafe/test", 0x7fbfff9b10) = -1 ENOENT (No such file or directory)
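The trace above matches a symlink-aware path resolution: each prefix of the path is checked with lstat() in turn, and the walk stops at the first component that does not exist. The following is a minimal C sketch under that assumption. walk_lstat is a hypothetical name. It is not PHP's implementation, and it does not actually follow symlinks.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical sketch: lstat() each prefix of an absolute path, as a
 * symlink-aware realpath() has to. Returns the number of lstat() calls;
 * *err is set to 0, or to the errno of the first failing call. */
static int walk_lstat(const char *path, int *err)
{
    char prefix[4096];
    struct stat st;
    size_t len = strlen(path);
    int calls = 0;

    assert(path[0] == '/' && len < sizeof prefix);
    *err = 0;
    for (size_t i = 1; i <= len; i++) {
        /* A component ends at each '/' and at the terminating '\0'. */
        if (path[i] != '/' && path[i] != '\0')
            continue;
        if (path[i - 1] == '/')   /* empty component, e.g. "//" or a trailing '/' */
            continue;
        memcpy(prefix, path, i);
        prefix[i] = '\0';
        calls++;
        if (lstat(prefix, &st) != 0) {
            *err = errno;         /* e.g. ENOENT, as in the last line of the trace */
            break;
        }
        /* A full implementation would test S_ISLNK(st.st_mode) here and
         * restart from the link target; that is omitted. */
    }
    return calls;
}
```

On the server above, where the three directories exist, /var/www/metacafe/test costs four lstat() calls, the same four shown in the trace. Each directory level removed from the path removes one call per uncached lookup. This is consistent with the gain seen after moving the portal to /.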

Measuring total syscall time for an Apache process on a production
server, I found that ~33% of the time it spends in syscalls is spent
in lstat.

I did a fairly deep web search on the issue and came up with nothing.
I'll also note that I did a small experiment, moving our root portal
folder to /, and it gave an amazing performance improvement!

So my questions are:
What is the reason for doing these lstat calls?
How can they be disabled? If not by configuration, maybe by patching PHP
(can you point me to where this is done in PHP's source?)


Thanks!
-Amir.


Re: [PHP-DEV] lstat call on each directory level

2008-07-15 Thread Amir Hardon
On Tue, 2008-07-15 at 11:40 -0700, Rasmus Lerdorf wrote:

 Amir Hardon wrote:
  [...]
 
 That's a realpath() call and it should be getting cached by the realpath 
 cache, so if you are seeing these on every request, try increasing your 
 realpath_cache size in your .ini.  Without checking the realpath, you 
 would be able to circumvent open_basedir checking really easily with a 
 symlink.
 
 -Rasmus
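Two hypothetical C checks illustrate the open_basedir point. allowed_naive compares the path as written against the base directory, so a symlink or '..' inside the base passes even if it leads outside. allowed_resolved first canonicalises the path with realpath(3), which is where the per-directory lookups come from. The base is assumed to be a canonical directory other than / with no trailing slash. The component-boundary test is a choice made for this sketch, not a description of PHP's own check.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* True if `path` is `base` itself or lies below it. `base` is assumed to be
 * canonical, not "/", and without a trailing slash; the boundary check stops
 * base "/srv/a" from matching path "/srv/ab". */
static int under_base(const char *path, const char *base)
{
    size_t n = strlen(base);

    assert(n > 1 && base[n - 1] != '/');
    return strncmp(path, base, n) == 0 && (path[n] == '/' || path[n] == '\0');
}

/* Compares the path as written: a symlink or ".." inside base that leads
 * elsewhere still passes. */
static int allowed_naive(const char *path, const char *base)
{
    return under_base(path, base);
}

/* Resolves symlinks and ".." first (the per-directory lookups), then
 * compares the canonical result. Unresolvable paths are rejected. */
static int allowed_resolved(const char *path, const char *base)
{
    char canon[PATH_MAX];

    if (realpath(path, canon) == NULL)
        return 0;
    return under_base(canon, base);
}
```

With base /tmp, the path /tmp/../etc passes the naive check but is rejected once it has been resolved.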


I've already increased the realpath cache size to the point where it gave
no further benefit (and it did help), but there are still many lstat
calls, and placing our portal directory in the root directory still gave
a huge performance benefit (after fine-tuning the realpath cache).
We don't use open_basedir.

I think it would be wise to make this directory check configurable, as
the performance impact is major.
In any case, can you please point me to the place in PHP's source where
this check is made, so that I can disable it manually?


Thanks!
-Amir.



Re: [PHP-DEV] lstat call on each directory level

2008-07-15 Thread Amir Hardon
On Tue, 2008-07-15 at 12:25 -0700, Rasmus Lerdorf wrote:

 Amir Hardon wrote:
  On Tue, 2008-07-15 at 11:40 -0700, Rasmus Lerdorf wrote:
  Amir Hardon wrote:
   [...]
 
  That's a realpath() call and it should be getting cached by the realpath 
  cache, so if you are seeing these on every request, try increasing your 
  realpath_cache size in your .ini.  Without checking the realpath, you 
  would be able to circumvent open_basedir checking really easily with a 
  symlink.
 
  -Rasmus
  
  I've already increased the realpath cache size to the point where it gave 
  no further benefit (and it did help), but there are still many lstat 
  calls, and placing our portal directory in the root directory still gave 
  a huge performance benefit (after fine-tuning the realpath cache).
  We don't use open_basedir.
  
  I think it would be wise to make this directory check configurable, as 
  the performance impact is major.
  In any case, can you please point me to the place in PHP's source where 
  this check is made, so that I can disable it manually?
 
 Well, it is used in other places too, like in figuring out _once paths.
 Including the same file using different paths still needs to be caught.
 
 Are you calling clearstatcache() manually anywhere?  That blows away the 
 entire realpath cache and completely destroys your performance, so you 
 will want to avoid doing that very often.
 
 -Rasmus
 


About clearstatcache(): we're not using it at all.

Correct me if I'm wrong, but this realpath cache is a per-request cache
(when using PHP as an Apache module), so unless I'm mistaken, the
performance benefit I get from moving the portal to the / directory is
to be expected (our code is split into many files, and each required
file generates a few lstat calls).

About the _once issue: does the patch Derick offered handle it?
(I haven't examined it yet.)
If not, I just need to make sure that the same file is never referenced
by two different paths and I'm safe, right?
(Assuming I adapt the patch to PHP 5, I mean.)



Thanks again to both of you!
-Amir.