On Wed, 23 Dec 2009 12:52:38 +0100 Pavel Stehule <pavel.steh...@gmail.com> wrote:
> The problem is massive cycle. Plpgsql really isn't language for > this. This interpret missing own arithmetic unit. So every > expression is translated to SELECT statement > > IF a > c ~ SELECT a > c > a = a + 1 ~ SELECT a + 1 > these queries are executed in some special mode, but still it is > more expensive than C a = a + 1 I didn't get it. I'd expect that since plpgsql should shere SQL data types it could map easily something like if (a>b) then or a := a +b and something like a for in query loop was going to be highly optimised as well. plpgsql should be the most tightly bounded language to the internals of pg SQL. > > select sum(a) from data; takes 1999.492 ms. > > select count(*) from data; takes 1612.039 ms > it is slower, because there is one NULL test more. That didn't came as a surprise. It was there for comparison. > PLpgSQL quietly uses cursors for FOR SELECT, your plperl uses I didn't notice. That was my first plperl function in my life. Anyway that means that that test didn't say anything about interpreter speed and duck typing. > > Anyway I'm not able to justify the difference in speed between > > plpgsql and your solution in such case unless plpgsql is > > switching back and forward between binary data and their text > > representation. It would be nice if someone that know plpgsql > > internals explain where the cost comes from. > I thing so most expensive part in your plpgsql code is two > expression. I didn't understand. What's the reason the interpreter can't translate the if and the b := row._a + b; into something that very resemble compiled C? plpgsql is not doing anything different than: select count(*) from (select case when a>3 then 1 else 2 end, a+a from data limit 9000000) as f; One of the advantages of using plpgsql (and one annoyance too) is that variables are declared and plpgsql should know how to operate on them with native C code. BTW the best performer considering all constraints and data distribution seems to be the simplest one: select sum(s.a) from (select a from data where a>0 limit 9000000) s; Time: 2620.677 ms We've no negative int... and they are int, so they have to be no less than 1 if they are not 0. If I know there are no 0, the simplest version become even faster. When I randomly pick up values in [1,3] plpgsql and generate_series start to perform similarly but still the simplest version is leading. When the interval is in [1,6] the plpgsql becomes faster than the generate_series but the simplest version is still leading. Just when the interval is in [1,10] the plpgsql version and the simplest one becomes comparable. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql