[PHP-DEV] Academic papers on PHP (was Re: [PHP-DEV] RE: Optimizer discussion)

2009-06-07 Thread Paul Biggar
Hi Sebastian,

On Sun, Jun 7, 2009 at 6:56 AM, Sebastian Bergmann
<s...@sebastian-bergmann.de> wrote:
> Paul Biggar wrote:
>> They have a paper on PHP memory usage.
>
> Link? I am collecting papers that deal with PHP at
> http://delicious.com/sebastian_bergmann/academic_paper+php

This is great. Below is a list of all the papers I can think of. I
wonder if it's a good idea to move your page to the wiki?

Some of these papers are less academic than others - I don't know
where you'd like to draw the line :)

Paul



Papers:

I see you have the Pixy and Minamide papers.

This is the PLDI 2009 one. The paper is not yet published, so I can't
find a link.
A Study of Memory Management for Web-based Applications on Multicore Processors
by Hiroshi Inoue, Hideaki Komatsu, and Toshio Nakatani, IBM Tokyo
Research Laboratory


Sound and Precise Analysis of Web Applications for Injection Vulnerabilities
Gary Wassermann, Zhendong Su, PLDI'07.
http://wwwcsif.cs.ucdavis.edu/~wassermg/research/pldi07.pdf

Static Detection of Security Vulnerabilities in Scripting Languages
Yichen Xie and Alex Aiken
http://theory.stanford.edu/~yxie/sec.pdf

@conference{benda06,
   author = {Jan Benda and Tomas Matousek and Ladislav Prosek},
   year = {2006},
   title = {Phalanger: Compiling and Running {PHP} Applications on the
{Microsoft} {.NET} Platform},
   booktitle = {.NET Technologies 2006},
   month = {May},
   location = {Plzen, Czech Republic},
}

@article{johnson06,
   author = {Graeme Johnson and {Zo\"{e}} Slattery},
   title =  "{PHP}: A Language Implementer's Perspective",
   journal = "International PHP Magazine",
   year =   2006,
   pages =  "24--29",
   month =  Dec,
}

@techreport{deVries07,
  title = {Design and Implementation of a {PHP} Compiler Front-end},
  author = {Edsko de {Vries} and John Gilbert},
  institution = {Trinity College Dublin},
  type = {Dept. of Computer Science Technical Report},
  number = {TR-2007-47},
  year = {2007}
}

@inproceedings{1480908,
 author = {Tozawa, Akihiko and Tatsubori, Michiaki and Onodera, Tamiya
and Minamide, Yasuhiko},
 title = {Copy-on-write in the PHP language},
 booktitle = {POPL '09: Proceedings of the 36th annual ACM
SIGPLAN-SIGACT symposium on Principles of programming languages},
 year = {2009},
 isbn = {978-1-60558-379-2},
 pages = {200--212},
 location = {Savannah, GA, USA},
 doi = {http://doi.acm.org/10.1145/1480881.1480908},
 publisher = {ACM},
 address = {New York, NY, USA},
 }
http://www.trl.ibm.com/people/mich/pub/200901_popl2009phpsem.pdf

@inproceedings{biggar09,
author = {Paul Biggar and Edsko de Vries and David Gregg},
title = {A Practical Solution for Scripting Language Compilers},
booktitle = {SAC '09: Proceedings of the 2009 ACM symposium on
Applied computing},
year = {2009},
isbn = {978-1-60558-166-8},
pages = {1916--1923},
location = {Honolulu, Hawaii, U.S.A},

publisher = {ACM},
address = {New York, NY, USA},
}
https://www.cs.tcd.ie/~pbiggar/sac-2009.pdf



-- 
Paul Biggar
paul.big...@gmail.com

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] RE: Optimizer discussion

2009-06-07 Thread Rob Nicholson
Graham, Paul,

Paul Biggar <paul.big...@gmail.com> wrote on 07/06/2009 02:28:48:
>
> On Fri, Jun 5, 2009 at 11:23 PM, Nuno Lopes <nlop...@php.net> wrote:
>
>> About runkit & friends, I wouldn't worry much about them. If you're
>> running them you probably also don't care about optimizations. If you
>> want to be able to optimize something, you need to remove as many
>> degrees of freedom as you can.
>
> This is probably true of runkit. However, I would be careful what you
> remove for extra freedom. There is very likely PHP code out there that
> relies (possibly by accident) on some edge cases.
 

Firstly, it's great to see more and more folks experimenting with the
implementation of PHP. I think this will be good for the wider PHP
community as the design of PHP and the possible optimisations become
better understood.

I think you'll find that, as Paul mentions, there are a lot of edge cases
in PHP that PHP code relies on. I work on IBM's Project Zero and we have
hit quite a lot of them. Just one example to illustrate: we found that the
evaluation order within assignments is not at all what you might predict,
and that existing PHP applications actually rely on that evaluation order.
Consider the following, where foo(), bar() and baz() have some coupling:
$a[foo()]=$b[bar()][baz()];
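
As a rough sketch of how one might observe this (the function bodies below
are hypothetical stand-ins, not code from Project Zero or from the Zend test
suite), each function can log its own invocation so that the order becomes
visible. The idea is that an optimised and an unoptimised run should print
the same sequence:

```php
<?php
// Hypothetical stand-ins for foo(), bar() and baz(): each one records
// that it ran, so the shared log acts as the "coupling" between them.
$calls = array();

function foo() { global $calls; $calls[] = 'foo'; return 0; }
function bar() { global $calls; $calls[] = 'bar'; return 0; }
function baz() { global $calls; $calls[] = 'baz'; return 0; }

$a = array();
$b = array(array('value'));

$a[foo()] = $b[bar()][baz()];

// Print the order the engine actually used; compare this output between
// an optimised and an unoptimised run of the same script.
echo implode(', ', $calls), "\n";
```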

Even though the test coverage of the Zend Engine, as measured by line
coverage, is fairly complete, we found that there were missing test cases
to verify this behaviour. We've been following a policy of writing new
tests for any such behaviour that we find, so I would suggest that you
ensure that you can run and pass all the PHPT test cases under /tests/lang
and under /Zend.

For example, the tests for the behaviour I mention above are
tests/lang/engine_assignExecutionOrder_XXX.phpt

Then, if you find any more PHP code that does not run the same optimised
as it does unoptimised, it would be great if you could contribute test
cases for it.

Actually, for full disclosure, I should say that although most of the
tests we have written are now in CVS, we are still a little behind with
contributing all the engine tests we have written. Hopefully they'll all
be there before you need them.


Rob Nicholson












Re: [PHP-DEV] RE: Optimizer discussion

2009-06-07 Thread Ilia Alshanetsky

On 6-Jun-09, at 9:28 PM, Paul Biggar wrote:

> On Fri, Jun 5, 2009 at 11:23 PM, Nuno Lopes <nlop...@php.net> wrote:
>> I'm happy there's some interest in a PHP optimizer :)
>> I agree with Paul that PECL's optimizer duplicates way too much stuff
>> from the Zend engine, which is neither practical nor maintainable.
>> (compare for example with the simple constant folder I implemented
>> some years ago:
>> http://web.ist.utl.pt/nuno.lopes/zend_constant_folding.txt).
>
> This is certainly a much better demonstration of how the optimizer
> should work.

The existing optimizer already does constant folding...


Ilia Alshanetsky





Re: [PHP-DEV] RE: Optimizer discussion

2009-06-06 Thread Paul Biggar
On Fri, Jun 5, 2009 at 11:23 PM, Nuno Lopes <nlop...@php.net> wrote:
> I'm happy there's some interest in a PHP optimizer :)
> I agree with Paul that PECL's optimizer duplicates way too much stuff from
> the Zend engine, which is neither practical nor maintainable. (compare for
> example with the simple constant folder I implemented some years ago:
> http://web.ist.utl.pt/nuno.lopes/zend_constant_folding.txt).

This is certainly a much better demonstration of how the optimizer should work.

> About runkit & friends, I wouldn't worry much about them. If you're running
> them you probably also don't care about optimizations. If you want to be
> able to optimize something, you need to remove as many degrees of freedom
> as you can.

This is probably true of runkit. However, I would be careful what you
remove for extra freedom. There is very likely PHP code out there that
relies (possibly by accident) on some edge cases.


> P.S.: I'll try to meet with Paul at PLDI (in a week) and chat about these
> kinds of things. Is anyone else coming who wants to join the discussion?

You should probably mention this is in Dublin.

Some of the IBM Tokyo researchers who work on (or maybe close to)
Project Zero will be there, and might have interesting ideas. They
have a paper on PHP memory usage.



Paul

-- 
Paul Biggar
paul.big...@gmail.com




RE: [PHP-DEV] RE: Optimizer discussion

2009-06-06 Thread Graham Kelly
Hey,

Sorry I haven't replied sooner. I'm glad to see that there is some interest 
here. :-)

Basically, where I am with the project at the moment is trying to get 5.3
compatibility with the current optimizer, at which point I would like to more
or less dump the code base I have now in favor of starting from the ground up
on something that can be built into a much more powerful system. I do really
like your idea, Paul, about using eval or something similar for doing
compile-time evaluations. Hopefully I can implement much of the stuff (like all
the function optimizations) this way. That should help to reduce a LOT of the
duplicated code; which I agree is not a good thing to have. I've been working 
on whipping up an outline of where I want to take the project. I look forward 
to getting feedback on that :-).
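
To illustrate the eval idea, here is a minimal userland sketch. It is an
illustration under assumptions only: pecl/optimizer itself is written in C and
works on opcodes, and the fold_call() helper and the $pure_functions whitelist
are hypothetical names, not part of any existing code. The point is that the
engine computes the result itself, so nothing has to be reimplemented:

```php
<?php
// Hypothetical whitelist of side-effect-free functions that are safe to
// evaluate once at compile time; the names are examples only.
$pure_functions = array('md5', 'crc32', 'sha1', 'strlen');

// Fold a call to a whitelisted function with constant arguments by asking
// the engine itself for the answer instead of reimplementing the function.
// Returns array(true, value) on success and array(false, null) otherwise.
function fold_call($name, array $args, array $pure_functions)
{
    if (!in_array($name, $pure_functions, true)) {
        return array(false, null);
    }
    foreach ($args as $arg) {
        if (!is_scalar($arg)) {
            return array(false, null); // only fold literal scalar arguments
        }
    }
    // var_export() renders each literal as valid PHP source for the eval.
    $literals = array_map(function ($arg) {
        return var_export($arg, true);
    }, $args);
    $source = 'return ' . $name . '(' . implode(', ', $literals) . ');';
    return array(true, eval($source));
}

list($ok, $value) = fold_call('strlen', array('hello'), $pure_functions);
var_dump($ok, $value);
```

Anything not on the whitelist (file system functions, for instance) is simply
left alone rather than folded.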

As for runkit, I am not overly concerned with compatibility for extensions such 
as runkit or xdebug, etc at the moment. I don't really see this being too big 
of an issue for many people and if it turns out to be one I can look into it 
when the time comes.

Also, I wanted to let you know that I really enjoyed your tech talk, Paul. Your
papers also seem really interesting (from what I have read thus far).

- Graham





Re: [PHP-DEV] RE: Optimizer discussion

2009-06-06 Thread Sebastian Bergmann
Paul Biggar wrote:
> They have a paper on PHP memory usage.

 Link? I am collecting papers that deal with PHP at
 http://delicious.com/sebastian_bergmann/academic_paper+php

-- 
Sebastian Bergmann                    Co-Founder and Principal Consultant
http://sebastian-bergmann.de/   http://thePHP.cc/





[PHP-DEV] Re: Optimizer discussion

2009-06-05 Thread Paul Biggar
Hi Graham,

Simple things first:

On Fri, Jun 5, 2009 at 1:08 AM, Graham Kelly <grah...@facebook.com> wrote:
> I'm not sure which optimization you are talking about with the GLOBALS stuff
> but what you're saying makes sense. (It's been a while since I've looked at
> the code base myself; I'm just getting back to working on it.)

I copied that comment straight from the source, but I can't find it
now that I went looking for it. No matter.



> Why not start off with the big stuff, dataflow? I personally believe that
> working out good data flow for PHP is key to getting good optimizations. But
> you are right, it's a very tricky thing to do and in some cases impossible.
> Ultimately, I would like to move a lot of the optimizer work more into this
> direction and use the data flow to build a basic platform for code analysis
> on which optimizations can be done. For now though, pecl/optimizer is dumb
> about data types :-)


And now the hard stuff. To avoid me repeating myself, let me just pimp
my Tech Talk. Have a look at
http://www.youtube.com/watch?v=kKySEUrP7LA from about the 30:45 mark
until just before the 47:00 mark (slides at
https://www.cs.tcd.ie/~pbiggar/paul_biggar_google_18_mar_2009_notes.pdf).
That highlights most of the problems, and vaguely hints at their
solution. We can go into much greater detail on the solutions after.



Thanks,
Paul

-- 
Paul Biggar
paul.big...@gmail.com




[PHP-DEV] Re: Optimizer discussion

2009-06-05 Thread Paul Biggar
Hi Graham,

On Fri, Jun 5, 2009 at 12:03 PM, Paul Biggar <paul.big...@gmail.com> wrote:
>> Why not start off with the big stuff, dataflow? I personally believe that
>> working out good data flow for PHP is key to getting good optimizations. But
>> you are right, it's a very tricky thing to do and in some cases impossible.
>> Ultimately, I would like to move a lot of the optimizer work more into this
>> direction and use the data flow to build a basic platform for code analysis
>> on which optimizations can be done. For now though, pecl/optimizer is dumb
>> about data types :-)
>
>
> And now the hard stuff. To avoid me repeating myself, let me just pimp
> my Tech Talk. Have a look at
> http://www.youtube.com/watch?v=kKySEUrP7LA from about the 30:45 mark
> until just before the 47:00 mark (slides at
> https://www.cs.tcd.ie/~pbiggar/paul_biggar_google_18_mar_2009_notes.pdf).
> That highlights most of the problems, and vaguely hints at their
> solution. We can go into much greater detail on the solutions after.


Based on the fact that you want to do dataflow, I wonder if it's a good
idea to think about co-opting the phc optimizer to perform analysis on
bytecode. To my mind this seems much easier than re-implementing from
scratch. As I mentioned before, this incorporates about 2 years of
work (much of it research of course, so it might not take as long to
replicate). This would mean you could go straight to performing
analyses (though there will no doubt be work required on the optimizer
itself).

Technically speaking, this isn't a big problem. We'd probably need to
change the phc MIR to mirror the bytecode (no harm anyway in terms of
correctness), and have a bytecode-reader and -writer (though this
needn't involve serializing - likely a small interface instead).
Politically, I assume it won't be a problem either, since it's in PECL.


Thoughts?

Paul




-- 
Paul Biggar
paul.big...@gmail.com




Re: [PHP-DEV] RE: Optimizer discussion

2009-06-05 Thread Nuno Lopes

Hi,

I'm happy there's some interest in a PHP optimizer :)
I agree with Paul that PECL's optimizer duplicates way too much stuff from
the Zend engine, which is neither practical nor maintainable. (compare for
example with the simple constant folder I implemented some years ago:
http://web.ist.utl.pt/nuno.lopes/zend_constant_folding.txt).
About runkit & friends, I wouldn't worry much about them. If you're running
them you probably also don't care about optimizations. If you want to be
able to optimize something, you need to remove as many degrees of freedom
as you can.


Anyway, I don't know how much time you're going to invest in this optimizer, 
but I'll certainly be more than happy to discuss your ideas.



Nuno

P.S.: I'll try to meet with Paul at PLDI (in a week) and chat about these
kinds of things. Is anyone else coming who wants to join the discussion?




[PHP-DEV] RE: Optimizer discussion

2009-06-04 Thread Graham Kelly
Hey,

I always love having input. When you said it was vicious I was expecting more;
in fact, I agree completely with you on a lot of things :-)

Anyway, I'm not really sure how much detail you want me to go into (or how much
detail people on internals really want me to get into). So, I'll keep it brief
for now and can expand on anything.

Why not start off with the big stuff, dataflow? I personally believe that
working out good data flow for PHP is key to getting good optimizations. But
you are right, it's a very tricky thing to do and in some cases impossible.
Ultimately, I would like to move a lot of the optimizer work more into this
direction and use the data flow to build a basic platform for code analysis on
which optimizations can be done. For now though, pecl/optimizer is dumb about
data types :-)

The reimplementation of some engine code is messy, and work should probably be
done to remove it where possible. Also, I might be mistaken, but the
is_numeric_result stuff is partly left over from Turck MMCache, which to my
understanding this version of pecl/optimizer was based on. Some of the
stuff I was doing with building a function table (for optimizable and some
non-optimizable functions) was to try to get rid of rudimentary data type
detection like this. Actually folding in values from function calls is
happening over in the optimize_fcr.c file.

I 100% agree with you on the file system functions. They were in there when I
started working on the optimizer and I haven't really paid much attention to
them. The latest CVS version of pecl/optimizer has them at least removed from
being candidates for optimization (the code to actually optimize is still
there).

I'm not sure which optimization you are talking about with the GLOBALS stuff
but what you're saying makes sense. (It's been a while since I've looked at
the code base myself; I'm just getting back to working on it.)

As far as my future plans for pecl/optimizer, I should really gather up all my
ideas and stuff in the next week or so, so that you or anyone else who is
interested can give feedback. At the moment, I'm working on getting the current
version to a stable state. I'm also still trying to gauge demand for
pecl/optimizer to maybe help figure out direction for the project (or whether
there is really any interest or use).

From: Paul Biggar [paul.big...@gmail.com]
Sent: Thursday, June 04, 2009 4:20 PM
To: Graham Kelly
Cc: PHP Internals; Brian Shire
Subject: Optimizer discussion

Graham and I are having a brief chat about the work he's going to do
on the PECL optimizer. People have asked me to do this on-list (they
may have meant the PECL list, but optimizations on PHP seem more
relevant here), so here goes.


Hi Graham,

So the general gist of what I have to say is that dataflow
optimizations on PHP are very difficult, and nearly impossible at the
function-local level. Loop-invariant hoisting and other redundant
expression computation likewise. If you're planning on working on
them, we can go into more detail.


I guess the biggest thing is that I'm wondering what your plans are
for the PECL optimizer? I've spent about 2 years working on the phc
optimizer, (and a bit longer on relevant things) so I hope that my
advice will be relevant.



I've taken a look through the optimizer a few times over the last
while, (and even stolen some ideas from it). Here are my comments on
the current code:

- There is lots of code which reimplements parts of the engine, for
example: ini_bool_decode, optimizer_acosh and friends, optimize_md5,
optimize_crc32, optimize_sha1, optimize_class_exists and friends (to a
lesser extent). There are also lots of constant foldings, like casts
and 0 == false (etc) in optimize_code_block. I don't understand why
there is logic in the code for that, rather than simply executing the
opcodes, or constructing an eval and executing that.

- is_numeric_result: there has been great effort to figure out numeric
results from pure functions, when it seems straightforward to
optimize the results straight in. Maybe that is being done elsewhere?
If so, there may need to be some care taken to ensure that all
optimizations terminate.

- File system functions are very iffy. I would be surprised if people
have content that reads from files repeatedly, but where the files do
not change, and who are willing to use that flag.

- Most of the identity optimizations aren't safe. $x + 0 !== $x,
unfortunately, due to integer coercions (parallels exist for other
types/operators).

- I think I saw an optimization converting (45 + $x) into (45+$x) -
that's a great idea, which I will steal.

- How does runkit (and other weird extensions) affect optimizations on
constants, class_exists, etc?

- The optimization "unsafe: optimize out isset()/empty() ops on
GLOBALS['foo'] into $foo" is not safe, as GLOBALS['foo'] may not be
the same variable as $foo ($GLOBALS may be unset, and indeed, there