date:20060411

Re: [HACKERS] GPUSort project

2006-04-11 Thread Martijn van Oosterhout

On Tue, Apr 11, 2006 at 04:02:07PM -0700, Mischa Sandberg wrote:
> Anybody on this list hear/opine anything pf the GPUSort project for 
> postgresql? I'm working on a radix-sort subcase for tuplesort, and there 
> are similarities.
> 
> http://www.andrew.cmu.edu/user/ngm/15-823/project/

I've heard it meantioned, didn't know they'd got it working. However,
none of my database servers have a 3D graphics anywhere near the power
they suggest in the article.

Is this of practical use for run-of-the-mill video cards?
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


signature.asc
Description: Digital signature

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Josh Berkus

Neil,

> Perhaps a compromise would be to enable pl/pgsql by default, but not
> grant the USAGE privilege on it. This would allow superusers to define
> pl/pgsql functions without taking any additional steps. Non-superusers
> could be given access to pl/pgsql via a simple GRANT -- either for all
> users via GRANT TO PUBLIC, or on a more granular basis as desired. This
> would lower the barrier to using pl/pgsql by a fairly significant
> margin, but not cause any additional security exposure that I can see.

Would this support PL/pgSQL based admin functions, though?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] How to implement oracle like rownum(function or seudocolumn)

2006-04-11 Thread Nicolas Barbier

2006/4/8, Tom Lane <[EMAIL PROTECTED]>:

> I've never understood what the conceptual model is for Oracle's rownum.
> Where along the SQL operational pipeline (FROM / WHERE / GROUP BY /
> aggregate / compute output columns / ORDER BY) is it supposed to be
> computed?  To be useful for the often-requested purpose of nicely
> labeling output with line numbers, it'd have to be assigned
> post-ORDER-BY, but then it doesn't make any sense at all to use it in
> WHERE, nor in sub-selects.
>
> A function implemented as per Michael's example would not give the
> results that I think people would expect for
>
> SELECT rownum(), * FROM foo ORDER BY whatever;
>
> unless the planner chances to do the ordering with an indexscan.
> If it does it with a sort step then the rownums will be computed before
> sorting :-(

I don't know about Oracle or ROW_NUM, but SQL apparently defines
ROW_NUMBER() OVER (..) (see
http://en.wikipedia.org/wiki/Select_(SQL)#ROW_NUMBER.28.29_window_function>)

This gives a number for each output row, according to some ordering
(in SQL, one cannot do ORDER BY in a subquery AFAIK). If used in a
subquery, one can then of course use the resulting column in the WHERE
clause of the outer query:

SELECT * FROM (
  SELECT
ROW_NUMBER() OVER (ORDER BY key ASC) AS rownumber,
columns
  FROM tablename
) AS foo
WHERE rownumber <= 10

(example stolen from the Wikipedia article linked above).

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Joshua D. Drake


> That's a fair point.
> 
> Perhaps a compromise would be to enable pl/pgsql by default, but not
> grant the USAGE privilege on it. This would allow superusers to define
> pl/pgsql functions without taking any additional steps. Non-superusers
> could be given access to pl/pgsql via a simple GRANT -- either for all
> users via GRANT TO PUBLIC, or on a more granular basis as desired. This
> would lower the barrier to using pl/pgsql by a fairly significant
> margin, but not cause any additional security exposure that I can see.

That seems reasonable.

Joshua D. Drake


> 
> -Neil
> 
> 
-- 

=== The PostgreSQL Company: Command Prompt, Inc. ===
  Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
  Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/





---(end of broadcast)---
TIP 6: explain analyze is your friend

[HACKERS] GPUSort project

2006-04-11 Thread Mischa Sandberg

Anybody on this list hear/opine anything pf the GPUSort project for postgresql? 
I'm working on a radix-sort subcase for tuplesort, and there are similarities.


http://www.andrew.cmu.edu/user/ngm/15-823/project/

--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Jim C. Nasby

On Tue, Apr 11, 2006 at 03:43:56PM -0700, Joshua D. Drake wrote:
> On Tue, 2006-04-11 at 19:35 -0300, Marc G. Fournier wrote:
> > On Tue, 11 Apr 2006, Joshua D. Drake wrote:
> > 
> > >
> > >> No, but does that mean we should increase the potential by adding in
> > >> something that not everyone that runs PostgreSQL actually uses?
> > >
> > > Using this argument I could say that we don't need primary keys, foreign
> > > keys, views or rules. Especially the latter 3 ;).
> > 
> > *slap forehead* *groan*
> > 
> > then again, if we could pull it out and move it into loadable modules ... 
> > h ... >:)
> 
> Oh goodness. We could declare that we are better then MySQL because our
> referential integrity is optional... oh wait...

Hey, if our RI was optional but we threw an error when you tried to use
it when it was disabled we *would* be better than MySQL...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Joshua D. Drake

On Tue, 2006-04-11 at 19:35 -0300, Marc G. Fournier wrote:
> On Tue, 11 Apr 2006, Joshua D. Drake wrote:
> 
> >
> >> No, but does that mean we should increase the potential by adding in
> >> something that not everyone that runs PostgreSQL actually uses?
> >
> > Using this argument I could say that we don't need primary keys, foreign
> > keys, views or rules. Especially the latter 3 ;).
> 
> *slap forehead* *groan*
> 
> then again, if we could pull it out and move it into loadable modules ... 
> h ... >:)

Oh goodness. We could declare that we are better then MySQL because our
referential integrity is optional... oh wait...

Joshua D. Drake

> 
> 
> 
> Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
> Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
> 
-- 

=== The PostgreSQL Company: Command Prompt, Inc. ===
  Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
  Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/





---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Marc G. Fournier


On Tue, 11 Apr 2006, Joshua D. Drake wrote:




No, but does that mean we should increase the potential by adding in
something that not everyone that runs PostgreSQL actually uses?


Using this argument I could say that we don't need primary keys, foreign
keys, views or rules. Especially the latter 3 ;).


*slap forehead* *groan*

then again, if we could pull it out and move it into loadable modules ... 
h ... >:)




Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Neil Conway

On Tue, 2006-04-11 at 17:20 -0400, Tom Lane wrote:
> No, I'm saying that having access to a PL renders certain classes of
> attacks significantly more efficient.  A determined attacker with
> unlimited time may not care, but in the real world, security is
> relative.

That's a fair point.

Perhaps a compromise would be to enable pl/pgsql by default, but not
grant the USAGE privilege on it. This would allow superusers to define
pl/pgsql functions without taking any additional steps. Non-superusers
could be given access to pl/pgsql via a simple GRANT -- either for all
users via GRANT TO PUBLIC, or on a more granular basis as desired. This
would lower the barrier to using pl/pgsql by a fairly significant
margin, but not cause any additional security exposure that I can see.

-Neil

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Suboptimal evaluation of CASE expressions

2006-04-11 Thread Tom Lane

Martijn van Oosterhout  writes:
> --f2QGlHpHGjS2mn6Y
> On Tue, Apr 11, 2006 at 11:22:53PM +0200, Andreas Tille wrote:
>> I'm lacking experience here so I perfectly trust you that keeping
>> the default case as it is.  The question is, whether adding an
>> option to change the default might make sense.

> Can you give an example of a simple case where PostgreSQL doesn't do
> this already.

I think we might be confusing each other with varying meanings for the
word "case" ;-).  The facts as I see them are:

1. The CASE expression does indeed not evaluate unneeded subexpressions.

2. However, aggregate functions are evaluated in a separate pass before
we start to evaluate the SELECT's output list (or the HAVING clause if
any).  So you cannot use a CASE to suppress evaluation of an aggregate's
finalfunc ... much less its state transition function.

3. There are other situations where a CASE might "not work" to suppress
contained evaluations.  For instance, this example is pretty misleading:

> test=3D# select case when true then 5 else 1/0 end;
>  case=20
> --
> 5
> (1 row)

> test=3D# select case when false then 5 else 1/0 end;
> ERROR:  division by zero

A counterexample is:

regression=# select f1, case when true then 5 else 1/0 end from int4_tbl;
 f1  | case
-+--
   0 |5
  123456 |5
 -123456 |5
  2147483647 |5
 -2147483647 |5
(5 rows)

regression=# select f1, case when f1 <> 42 then 5 else 1/0 end from int4_tbl;
ERROR:  division by zero
regression=#

The reason the latter fails is that constant-folding encounters the 1/0
before we actually start to run the SELECT.  The first three examples
work only because the WHEN clause is a plan-time constant and so the
constant folder never reaches the ELSE clause.

I'm not really inclined to remove the constant folder just to make the
world safe for silly examples like this, so the bottom line is that you
have to be aware of there being multiple passes of evaluation.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Andrew - Supernews

On 2006-04-11, Tom Lane <[EMAIL PROTECTED]> wrote:
> More realistically, though, the theoretical point that you can do
> arbitrary calculations by turning loops into recursive SQL functions is
> mostly just theoretical,

It's not at all theoretical. The very practical problem of trying to write
code that does useful stuff (like generate_series on 7.4 or parsing the
values in pg_trigger.tgargs) without using pl/pgsql is a wonderful
demonstration of just how much you can really do in plain SQL functions
by using appropriate techniques. Sure, it requires some specialised
approaches, but then so does system cracking ...

> and the reason is that you won't be able to
> loop very many times before running out of stack space.  (On my machine
> it looks like you can recurse a trivial SQL function only about 600
> times before hitting the default stack limit.)

600 times is enough for the function to do more computation than could
ever be done in the lifetime of the universe. (Consider: how long would it
take to do the Towers of Hanoi with 600 disks?)

> If you have an exploit
> that involves moderate amounts of calculation within the server --- say,
> brute force password cracking --- the availability of a PL will render
> that exploit actually practical, whereas with only SQL functions to work
> with it won't be.

Tom, when you're engaged in a debate on a topic, it's polite to actually
_read_ what other people are posting.

I've already posted a very straightforward example of code that will happily
loop over 300 million values using a recursion depth of no greater than 7,
and I specifically chose it because it shows how easily large brute-force
searches can be done in plain SQL. The existence of cross joins means that
arbitrarily large loops can be constructed without needing either deep
recursion or large materialized function result sets. In many cases these
methods give you code which is both simpler and faster than the equivalent
in pl/pgsql (why code naive nested loops in pl/pgsql, for example, when the
executor already has that functionality built in?).

Here's your brute-force password crack (try it! should only take an hour or
two) using the simple alpha(n) function example from my other post:

select a||b||c from alpha(3) s1(a), alpha(3) s2(b), alpha(2) s3(c)
 where md5(a||b||c||'andrew') = 'ff113aee991f0a3519c3d4f97414561a'
 limit 1;

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Andrew - Supernews

On 2006-04-11, Tom Lane <[EMAIL PROTECTED]> wrote:
> David Fetter <[EMAIL PROTECTED]> writes:
>> I don't get your not getting this 'cause you're a very smart guy.  Are
>> you under the impression that an attacker will stop because he has to
>> try a few times?
>
> No, I'm saying that having access to a PL renders certain classes of
> attacks significantly more efficient.

Not significantly, and I'll happily back up that assertion with code
examples. (I've already posted an example brute-force search to illustrate
that.)

> A determined attacker with
> unlimited time may not care, but in the real world, security is
> relative.  You don't have to make yourself an impenetrable target,
> only a harder target than the next IP address --- or at least hard
> enough that the attacker's likely to get noticed before he's succeeded.
> (And certainly, doing anything compute-intensive via recursive SQL
> functions is not the way to go unnoticed.)

Doing something compute-intensive with pl/pgsql functions will be just as
noticable.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Joshua D. Drake


> No, but does that mean we should increase the potential by adding in 
> something that not everyone that runs PostgreSQL actually uses?

Using this argument I could say that we don't need primary keys, foreign
keys, views or rules. Especially the latter 3 ;).

Sincerely,

Joshua D. Drake




> 
> 
> Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
> Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
> 
-- 

=== The PostgreSQL Company: Command Prompt, Inc. ===
  Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
  Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Thomas Hallgren


Tom Lane wrote:

In the end it's only one small component of security, but any security
expert will tell you that you take all the layers of security that you
can get.  If you don't need a given bit of functionality, it shouldn't
get installed.

I think any security expert would say that if let non trustworthy people get so far as to 
create their own SQL statements, you're in big trouble. Plpgsql or not. I fail to see what 
the real issue is here. Your argument is analog to saying "don't install bash on a Linux 
system by default. People might do bad things with it".


Regards,
Thomas Hallgren


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] How to implement oracle like rownum(function or seudocolumn)

2006-04-11 Thread Jim C. Nasby

On Sat, Apr 08, 2006 at 03:04:40PM -0400, Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > My humble guess is that c) is also the reason why the ANSI didn't find a 
> > ROWNUM desirable.
> 
> I've never understood what the conceptual model is for Oracle's rownum.
> Where along the SQL operational pipeline (FROM / WHERE / GROUP BY /
> aggregate / compute output columns / ORDER BY) is it supposed to be
> computed?  To be useful for the often-requested purpose of nicely
> labeling output with line numbers, it'd have to be assigned
> post-ORDER-BY, but then it doesn't make any sense at all to use it in
> WHERE, nor in sub-selects.

AFAIK rownum() is Oracle's solution to doing LIMIT ... OFFSET from
before those were ANSI. rownum() is applied as rows are leaving the
relevant node, which means you can't use rownum() in any part of a
SELECT statement other than the SELECT clause (you can't even use it in
a HAVING clause afaik, though I would think you should be able to). So,
if you want to actually do anything useful with rownum(), you have to
use it in a subquery and then operate at a higher level:

SELECT * FROM (SELECT rownum() AS row_number, * FROM table) z ORDER BY
row_number;

> A function implemented as per Michael's example would not give the
> results that I think people would expect for
> 
>   SELECT rownum(), * FROM foo ORDER BY whatever;
> 
> unless the planner chances to do the ordering with an indexscan.
> If it does it with a sort step then the rownums will be computed before
> sorting :-(

I think you're right.

If people are that hot-to-trot about having Oracle compatable rownum()
in PostgreSQL, perhaps EnterpriseDB has some code they could share.
Though I think it'd be better to understand what people actually want
this info for. Personally I think having a rank function (or a complete
suite of analytic functions) would be far more useful.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Suboptimal evaluation of CASE expressions

2006-04-11 Thread Tom Lane

Andreas Tille <[EMAIL PROTECTED]> writes:
> I'm lacking experience here so I perfectly trust you that keeping
> the default case as it is.  The question is, whether adding an
> option to change the default might make sense.

I don't think so.  The current API contract for aggregate functions is
that the finalfunc will be called exactly once per aggregate occurrence.
Changing that to say "maybe you'll get called and maybe you won't" does
not strike me as making life easier for aggregate authors, rather the
reverse --- for example, the finalfunc might need to clean up some
working state.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] plpgsql by default

2006-04-11 Thread David Fetter

On Tue, Apr 11, 2006 at 05:20:02PM -0400, Tom Lane wrote:
> David Fetter <[EMAIL PROTECTED]> writes:
> > I don't get your not getting this 'cause you're a very smart guy.
> > Are you under the impression that an attacker will stop because he
> > has to try a few times?
> 
> No, I'm saying that having access to a PL renders certain classes of
> attacks significantly more efficient.  A determined attacker with
> unlimited time may not care, but in the real world, security is
> relative.  You don't have to make yourself an impenetrable target,
> only a harder target than the next IP address --- or at least hard
> enough that the attacker's likely to get noticed before he's
> succeeded.  (And certainly, doing anything compute-intensive via
> recursive SQL functions is not the way to go unnoticed.)
> 
> In the end it's only one small component of security, but any
> security expert will tell you that you take all the layers of
> security that you can get.  If you don't need a given bit of
> functionality, it shouldn't get installed.

As others have mentioned, and I will reiterate here:

1.  Anyone who imagines that PL/PgSQL presents a bigger or more
vulnerable attack surface can remove it via DROP LANGUAGE.

2.  Anybody who wants to do harm inside the database can do it to
arbitrary levels of damage in SQL with RULEs, recursive functions,
set-returning functions, etc.

Cheers,
D
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Suboptimal evaluation of CASE expressions

2006-04-11 Thread Martijn van Oosterhout

On Tue, Apr 11, 2006 at 11:22:53PM +0200, Andreas Tille wrote:
> On Tue, 11 Apr 2006, Tom Lane wrote:
> 
> >We could maybe change things so that the finalfunc isn't run unless the
> >result value is actually demanded in the SELECT list or HAVING clause,
> >but for 99.99% of applications checking that would be a waste of cycles,
> >so I'm disinclined to do it.
> 
> I'm lacking experience here so I perfectly trust you that keeping
> the default case as it is.  The question is, whether adding an
> option to change the default might make sense.

Can you give an example of a simple case where PostgreSQL doesn't do
this already. For the really obvious cases without aggregates, it works
already:

test=# select case when true then 5 else 1/0 end;
 case 
--
5
(1 row)

test=# select case when false then 5 else 1/0 end;
ERROR:  division by zero

What we're saying is that as long as the SQL standard doesn't require
it, we're not going to write large chunks of code to avoid a small
amount of processing that nobody is going to notice anyway. i.e. you
can't *rely* on this behaviour, but improvement is merely an
optimisation, not a feature or a bug fix.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

signature.asc
Description: Digital signature

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Marc G. Fournier


On Tue, 11 Apr 2006, Joshua D. Drake wrote:



Can you guarantee unequivocally that there are absolutely not security 
issues in plpgsql?


Can you guarantee unequivocally that there are absolutely not security issues 
in PostgreSQL?


No, but does that mean we should increase the potential by adding in 
something that not everyone that runs PostgreSQL actually uses?



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Joshua D. Drake


> That's only going to be true for very high end systems with multiple raid
> controllers and dozens of spindles.

Not true. I have a system right now that would benifit. My database is
only 500 megs and I have 2 gig of ram and two processors... I only have
a raid 1, but that is o.k. because most things are cached in memory
anyway.

> 
> On the other hand even moderately sized dual-core systems could probably
> benefit from being able to perform multiple cpu-intensive operations
> simultaneously.

See above :)

Joshua D. Drake

> 
-- 

=== The PostgreSQL Company: Command Prompt, Inc. ===
  Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
  Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Mischa Sandberg

Are there are more possibilities for some bug in the plpgsql engine to allow an 
exploit: actually changing the stack through a buffer overflow, or a bug in an 
intrinsic function, or allowing an injection that crosses some privilege 
boundary, via someone else's EXECUTE?


It's a lot easier to verify the few places where straight SQL can interact with 
the outside world (NOTIFY, COPY, and trojan .so's come to mind). It is harder 
for someone to find an unexpected combined-effect exploit, since there's not 
much you can combine.


Perhaps somebody in the core team has reservations about possible points of 
error to certify in plpgsql: is every possible weird array-overflow case 
covered? Further, can some innocuous side-effects in execution (INOUT 
parameters; function ownership; schema settings) combine to create a hole?

There's just that much more to worry about.

As they say, in theory, theory and practice are the same.
In practice, they differ :0)

I can understand someone being cautious about making guarantees (or even risk 
estimates) about plpgsql versus
the core engine. And so, just like not INITIALLY letting the server listen on 
all TCP sockets, it's modest conservatism to let the default be a bit restricted.


--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Suboptimal evaluation of CASE expressions

2006-04-11 Thread Andreas Tille


On Tue, 11 Apr 2006, Tom Lane wrote:


We could maybe change things so that the finalfunc isn't run unless the
result value is actually demanded in the SELECT list or HAVING clause,
but for 99.99% of applications checking that would be a waste of cycles,
so I'm disinclined to do it.


I'm lacking experience here so I perfectly trust you that keeping
the default case as it is.  The question is, whether adding an
option to change the default might make sense.


As Martijn said, really you want to fix
the finalfunc so that it behaves sanely in corner cases.  An aggregate
that fails on zero rows needs work, period.


Fully ACK.  As I hopefully made clear I just used it as a sign /
proof, that something works differently than I would regard reasonable
(before I understand the problem with the aggregate).

Kind regards

  Andreas.

--
http://fam-tille.de

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Tom Lane

David Fetter <[EMAIL PROTECTED]> writes:
> I don't get your not getting this 'cause you're a very smart guy.  Are
> you under the impression that an attacker will stop because he has to
> try a few times?

No, I'm saying that having access to a PL renders certain classes of
attacks significantly more efficient.  A determined attacker with
unlimited time may not care, but in the real world, security is
relative.  You don't have to make yourself an impenetrable target,
only a harder target than the next IP address --- or at least hard
enough that the attacker's likely to get noticed before he's succeeded.
(And certainly, doing anything compute-intensive via recursive SQL
functions is not the way to go unnoticed.)

In the end it's only one small component of security, but any security
expert will tell you that you take all the layers of security that you
can get.  If you don't need a given bit of functionality, it shouldn't
get installed.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] plpgsql by default

2006-04-11 Thread David Fetter

On Tue, Apr 11, 2006 at 05:01:17PM -0400, Tom Lane wrote:
> David Fetter <[EMAIL PROTECTED]> writes:
> > On Tue, Apr 11, 2006 at 04:35:05PM -0400, Tom Lane wrote:
> >> More realistically, though, the theoretical point that you can do
> >> arbitrary calculations by turning loops into recursive SQL
> >> functions is mostly just theoretical, and the reason is that you
> >> won't be able to loop very many times before running out of stack
> >> space.  (On my machine it looks like you can recurse a trivial
> >> SQL function only about 600 times before hitting the default
> >> stack limit.)  If you have an exploit that involves moderate
> >> amounts of calculation within the server --- say, brute force
> >> password cracking --- the availability of a PL will render that
> >> exploit actually practical, whereas with only SQL functions to
> >> work with it won't be.
> 
> > The function I sent memoizes to a table, which avoids the stack
> > space problem you mentioned.
> 
> In general that's not possible, and even for the specific case, it
> still looks to me like fib(n) will use O(n) recursion levels if the
> table is initially empty.

I don't get your not getting this 'cause you're a very smart guy.  Are
you under the impression that an attacker will stop because he has to
try a few times?

Cheers,
D
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Tom Lane

David Fetter <[EMAIL PROTECTED]> writes:
> On Tue, Apr 11, 2006 at 04:35:05PM -0400, Tom Lane wrote:
>> More realistically, though, the theoretical point that you can do
>> arbitrary calculations by turning loops into recursive SQL functions
>> is mostly just theoretical, and the reason is that you won't be able
>> to loop very many times before running out of stack space.  (On my
>> machine it looks like you can recurse a trivial SQL function only
>> about 600 times before hitting the default stack limit.)  If you
>> have an exploit that involves moderate amounts of calculation within
>> the server --- say, brute force password cracking --- the
>> availability of a PL will render that exploit actually practical,
>> whereas with only SQL functions to work with it won't be.

> The function I sent memoizes to a table, which avoids the stack space
> problem you mentioned.

In general that's not possible, and even for the specific case, it still
looks to me like fib(n) will use O(n) recursion levels if the table is
initially empty.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Jim C. Nasby

On Mon, Apr 10, 2006 at 01:36:45PM -0400, Alvaro Herrera wrote:
> Markus Schiltknecht wrote:
> > On Mon, 2006-04-10 at 17:22 +0800, Qingqing Zhou wrote:
> > > Check the code InitPostgres(). These global varaibles are scattered in 
> > > many
> > > places, so I am not sure if it is easy to write clean code to clear up 
> > > these
> > > variables. But if you can come up with a patch to do reconnect without
> > > disconnect, that will be cool.
> > 
> > Yes, that's where I've been playing around already. Not with much
> > success until now, though. :-(
> 
> An idea arising in chat with Joshua Drake:  the retargetting code, if it
> turns out to work and not be excessively expensive, could also be useful
> to implement a server-side "connection pooling" of sorts: the postmaster
> could keep idle backends and retarget them to a database that receives
> an incoming connection.  However, we'd also need a mechanism to clean
> all backend state previous to reusing a connection, to leave it "as
> new" (no prepared statements, WITH HOLD cursors, etc.)

Oracle allows you to essentially re-connect to an existing connection by
saving connection state when a connection goes back to the connection
pool. Essentially, you connect, and then re-authenticate as a different
user. That user has a specific environment associated with it which is
then pulled in. This makes it reasonable to use Oracle's built-in code
for handling permissions, etc; you just give each system user a database
account. While this sounds scary to folk that haven't used it, it's
actually safer than rolling your own authentication and security
mechanism (which will likely have bugs in it) and having your middleware
connect to the database with some dedicated account.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Jim C. Nasby

On Mon, Apr 10, 2006 at 05:22:13PM +0800, Qingqing Zhou wrote:
> 
> "Markus Schiltknecht" <[EMAIL PROTECTED]> wrote
> > Hi Qingqing,
> >
> > >
> > > As Tom pointed out, without big change, a backend on database "D1" can't
> > > connect to "D2". This is because to connect to a database, we need to
> > > initialize a lot of variables. So when you reconnect to another one on
> the
> > > fly, you have to change these variables one by one.
> >
> > Sure, the question is: what is needed to retarget a backend?
> >
> Check the code InitPostgres(). These global varaibles are scattered in many
> places, so I am not sure if it is easy to write clean code to clear up these
> variables. But if you can come up with a patch to do reconnect without
> disconnect, that will be cool.

Something else to consider: most queries that would benefit from
parallel execution are expensive enough that the cost of spawning some
new backends wouldn't be that big a deal, so perhaps for an initial
version it would be best to KISS and just spawn parallel execution
backends as needed.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] plpgsql by default

2006-04-11 Thread David Fetter

On Tue, Apr 11, 2006 at 04:35:05PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > Rather than debate how turing complete SQL is, look at the real
> > issue: is a compromised system with plPGSQL installed more
> > dangerous than a compromised system without plPGSQL. As far as I
> > can see, it's not.
> 
> You're disregarding the possibility that plpgsql itself is the
> source of a security hole ...

So might SQL.

> More realistically, though, the theoretical point that you can do
> arbitrary calculations by turning loops into recursive SQL functions
> is mostly just theoretical, and the reason is that you won't be able
> to loop very many times before running out of stack space.  (On my
> machine it looks like you can recurse a trivial SQL function only
> about 600 times before hitting the default stack limit.)  If you
> have an exploit that involves moderate amounts of calculation within
> the server --- say, brute force password cracking --- the
> availability of a PL will render that exploit actually practical,
> whereas with only SQL functions to work with it won't be.

The function I sent memoizes to a table, which avoids the stack space
problem you mentioned.

Cheers,
D
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

OS cached buffers (was: [HACKERS] Support Parallel Query Execution in Executor)

2006-04-11 Thread Jim C. Nasby

On Mon, Apr 10, 2006 at 12:02:56PM -0700, Luke Lonergan wrote:
> Hannu,
> 
> On 4/10/06 2:23 AM, "Hannu Krosing" <[EMAIL PROTECTED]> wrote:
> 
> >> The cost of fetching a page from the OS is not really much of an
> >> overhead,
> > 
> > Have you tested this ?
> 
> I have - the overhead of fetching a page from Linux I/O cache to buffer
> cache is about an additional 20% over fetching it directly from buffer cache
> on PG 7.4.

Is there any pratcical way to tell the difference between a page comming
from the OS cache and one comming from disk? Or maybe for a set of pages
an estimate on how many came from cache vs disk? There's some areas
where having this information would be very useful, such as for vacuum
delay. It would make tuning much easier, and it would also give us some
insight on how heavily loaded disks were, which would also be useful
info for vacuum to have (so we could adjust vacuum_cost_delay
dynamically based on load).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Tom Lane

"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> Rather than debate how turing complete SQL is, look at the real issue:
> is a compromised system with plPGSQL installed more dangerous than a
> compromised system without plPGSQL. As far as I can see, it's not.

You're disregarding the possibility that plpgsql itself is the source of
a security hole ...

More realistically, though, the theoretical point that you can do
arbitrary calculations by turning loops into recursive SQL functions is
mostly just theoretical, and the reason is that you won't be able to
loop very many times before running out of stack space.  (On my machine
it looks like you can recurse a trivial SQL function only about 600
times before hitting the default stack limit.)  If you have an exploit
that involves moderate amounts of calculation within the server --- say,
brute force password cracking --- the availability of a PL will render
that exploit actually practical, whereas with only SQL functions to work
with it won't be.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] [PATCHES] schema-qualified SET CONSTRAINTS

2006-04-11 Thread Kris Jurka




On Mon, 10 Apr 2006, Tom Lane wrote:


Kris Jurka <[EMAIL PROTECTED]> writes:

The attached patch allows SET CONSTRAINTS to take a schema qualified
constraint name (myschema.t1_fk_t2) and when given a bare constraint name
it uses the search_path to determine the matching constraint instead of
the previous behavior of disabling all identically named constraints.


This patch seems egregiously non backwards compatible :-(.


Yes, it does change the existing behavior, but "egregiously"?  How many 
applications intentionally defer constraints in multiple schemas at once? 
Not many.  I would guess the more likely situation is that these 
applications don't even realize that they are deferring more than one 
constraint when it happens.  So there will be some very minor pain when 
they must select the desired constraint (if it doesn't happen already by 
search_path) or explicitly defer more than one constraint, but I'm OK 
with that.  The existing behavior of SET CONSTRAINTS affecting everything 
is not what a user would expect when we have tools like search_path 
available.


Kris Jurka

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Jim C. Nasby

On Mon, Apr 10, 2006 at 11:02:50PM -0700, David Fetter wrote:
> On Tue, Apr 11, 2006 at 12:47:03AM -0400, Tom Lane wrote:
> > "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> > > What does enabling plpgsql do via access that you can't just do from an 
> > > SQL query?
> > 
> > SQL isn't Turing-complete
> 
> With all due respect, SQL *is* Turing-complete.  Here's a little demo
> of this Turing-completeness:

Rather than debate how turing complete SQL is, look at the real issue:
is a compromised system with plPGSQL installed more dangerous than a
compromised system without plPGSQL. As far as I can see, it's not.
SQL makes it just as easy to DoS the machine (just select a large
cartesian product). plPGSQL doesn't provide any inherent ability to
damage data outside the database, and it doesn't make trashing the
database any easier than it is with plain SQL. About the only thing I
can think of that plPGSQL lets you do that SQL doesn't is to raise
arbitrary errors, but that hardly seems like much of an increased risk.

There is some limited truth to the argument that plPGSQL potentially
opens more potential for a machine to be compromised, but much less so
than allowing connections from any IP does for example. I haven't seen
any real reason not to include plPGSQL by default, especially since
removing whatever slight risk exists is a simple DROP LANGUAGE away.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Get explain output of postgresql in Tables

2006-04-11 Thread Jim C. Nasby

On Mon, Apr 10, 2006 at 10:44:15AM +0100, Richard Huxton wrote:
> Bruce Momjian wrote:
> >
> > * Allow EXPLAIN output to be more easily processed by scripts
> 
> Can I request an extension/additional point?
>  * Design EXPLAIN output to survive cut & paste on mailing-lists
> 
> Being able to paste into a web-form and get something readable formatted 
> back would be very useful on the lists. Sometimes it takes me longer to 
> reformat the explain than it does to understand the problem.

Actually, I've been wondering about better ways to handle this. One
thought is to come up with a non-human readable format that could easily
be cut and pasted into a website that would then provide something easy
to understand. Ideally that website could also produce graphical output
like pgAdmin does, since that makes it trivially easy to see what the
'critical path' is.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] semaphore usage "port based"?

2006-04-11 Thread Bruce Momjian

Stephen Frost wrote:
-- Start of PGP signed section.
> * Bruce Momjian (pgman@candle.pha.pa.us) wrote:
> > I updated the wording to say 'non-root users':
> > 
> > If running in FreeBSD jails by enabling sysconf's
> > security.jail.sysvipc_allowed, 
> > postmasters
> > running in different jails should be run by different operating 
> > system
> > users.  This improves security because it prevents non-root users
> > from interfering with shared memory or semaphores in a different 
> > jail,
> > and it allows the PostgreSQL IPC cleanup code to function properly.
> > (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly 
> > detect
> > processes in other jails, preventing the running of postmasters on 
> > the
> > same port in different jails.)
> 
> You're still saying it'll do something that it won't...  It doesn't
> prevent non-root users from messing with each other if they're the same
> UID, even if they're under different jails...  That's the whole problem
> here. :)

Uh, the first part says use different Unix users for different jails,
then it says why to do that (security).  Seems clear to me.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] semaphore usage "port based"?

2006-04-11 Thread Stephen Frost

* Bruce Momjian (pgman@candle.pha.pa.us) wrote:
> I updated the wording to say 'non-root users':
> 
> If running in FreeBSD jails by enabling sysconf's
> security.jail.sysvipc_allowed, postmasters
> running in different jails should be run by different operating system
> users.  This improves security because it prevents non-root users
> from interfering with shared memory or semaphores in a different jail,
> and it allows the PostgreSQL IPC cleanup code to function properly.
> (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
> processes in other jails, preventing the running of postmasters on the
> same port in different jails.)

You're still saying it'll do something that it won't...  It doesn't
prevent non-root users from messing with each other if they're the same
UID, even if they're under different jails...  That's the whole problem
here. :)

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] semaphore usage "port based"?

2006-04-11 Thread Bruce Momjian

Stephen Frost wrote:
-- Start of PGP signed section.
> * Bruce Momjian (pgman@candle.pha.pa.us) wrote:
> >  
> > + If running in FreeBSD jails by enabling sysconf's
> > + security.jail.sysvipc_allowed, 
> > postmasters
> > + running in different jails should be run by different operating 
> > system
> > + users.  This improves security because it prevents one jail from
> > + interfering with shared memory or semaphores in another, and it
> > + allows the PostgreSQL IPC cleanup code to function properly.  
> > + (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly 
> > detect
> > + processes in other jails, preventing the running of postmasters 
> > on the
> > + same port in different jails.)
> > +
> 
> This looks good, my only comment would be that we don't want people to
> believe that using different users somehow makes the sysv spaces
> seperate between the jails.  It doesn't.  Even when using different
> uids, a user who gets root in one jail would be able to mess with the
> Postgres instance in the other jail through IPC.
> 
> Perhaps change: 
> 
> "This improves security because it prevents one jail from
> interfering with shared memory or semaphores in another"
> 
> to:
> 
> "This improves security because it prevents the postgres user in one
> jail from interfering with shared memory or semaphores owned by a
> different user in another jail (with BSD jails, root, or the same 
> UID, in any jail can see and interfere with the shared memory and 
> semaphores in any other jail of the same UID, or all if root)"
> 
> That's still not great but I think it's a little better...

I updated the wording to say 'non-root users':

If running in FreeBSD jails by enabling sysconf's
security.jail.sysvipc_allowed, postmasters
running in different jails should be run by different operating system
users.  This improves security because it prevents non-root users
from interfering with shared memory or semaphores in a different jail,
and it allows the PostgreSQL IPC cleanup code to function properly.
(In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
processes in other jails, preventing the running of postmasters on the
same port in different jails.)

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] semaphore usage "port based"?

2006-04-11 Thread Stephen Frost

* Bruce Momjian (pgman@candle.pha.pa.us) wrote:
>  
> + If running in FreeBSD jails by enabling sysconf's
> + security.jail.sysvipc_allowed, 
> postmasters
> + running in different jails should be run by different operating 
> system
> + users.  This improves security because it prevents one jail from
> + interfering with shared memory or semaphores in another, and it
> + allows the PostgreSQL IPC cleanup code to function properly.  
> + (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly 
> detect
> + processes in other jails, preventing the running of postmasters on 
> the
> + same port in different jails.)
> +

This looks good, my only comment would be that we don't want people to
believe that using different users somehow makes the sysv spaces
seperate between the jails.  It doesn't.  Even when using different
uids, a user who gets root in one jail would be able to mess with the
Postgres instance in the other jail through IPC.

Perhaps change: 

"This improves security because it prevents one jail from
interfering with shared memory or semaphores in another"

to:

"This improves security because it prevents the postgres user in one
jail from interfering with shared memory or semaphores owned by a
different user in another jail (with BSD jails, root, or the same 
UID, in any jail can see and interfere with the shared memory and 
semaphores in any other jail of the same UID, or all if root)"

That's still not great but I think it's a little better...

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] semaphore usage "port based"?

2006-04-11 Thread Bruce Momjian


[ FreeBSD email list removed.]

I totally agree, and have added the attached documentation patch to
recommend using different users in FreeBSD jails.

---

Stephen Frost wrote:
-- Start of PGP signed section.
> * Marc G. Fournier ([EMAIL PROTECTED]) wrote:
> > On Mon, 3 Apr 2006, Stephen Frost wrote:
> > >Running the Postgres instances under different uids (as you'd probably
> > >expect to do anyway if not using the jails) is probably the right
> > >approach.  Doing that and using jails would probably work, just don't
> > >delude yourself into thinking that you're safe from a malicious user in
> > >one jail.
> > 
> > We don't ... we put all our databases on a central database server, even 
> > private ones, that nobody has shell access to ... we keep them isolated 
> > ...
> 
> I guess what I was trying to get at is this:
> 
> Running 2 Postgres instances under FreeBSD with (or without really, but
> I guess that's more obvious) jails but with the same UID is a bad idea. 
> Even if Postgres could be modified to allow this to work you're going to
> be in a position where the jail isn't really helping much except to give
> a somewhat false (in this case) sense of security.  We probably
> shouldn't encourage it and in fact it's something of a nice feature that
> it breaks.
> 
> The reasoning is pretty simple: if someone manages to get control of 
> one of the Postgres instances they're going to be able to wreck havoc on
> the other.  With different UIDs, with or without jails, this would be
> much more difficult (need to get root first).
> 
> Running 2 Postgres instances under FreeBSD with jails *and* different
> UIDs is *probably* better than w/o jails but since you have to enable
> the single-instance IPC system it might not be that great of a benefit
> over a simple chroot or similar.
> 
> Hope that helps...
> 
>   Thanks,
> 
>   Stephen
-- End of PGP section, PGP failed!

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/runtime.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
retrieving revision 1.366
diff -c -c -r1.366 runtime.sgml
*** doc/src/sgml/runtime.sgml   3 Apr 2006 23:35:02 -   1.366
--- doc/src/sgml/runtime.sgml   11 Apr 2006 19:23:27 -
***
*** 764,769 
--- 764,781 
 
  
 
+ If running in FreeBSD jails by enabling sysconf's
+ security.jail.sysvipc_allowed, postmasters
+ running in different jails should be run by different operating system
+ users.  This improves security because it prevents one jail from
+ interfering with shared memory or semaphores in another, and it
+ allows the PostgreSQL IPC cleanup code to function properly.  
+ (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
+ processes in other jails, preventing the running of postmasters on the
+ same port in different jails.)
+
+ 
+
  FreeBSD versions before 4.0 work like 
  NetBSD and 
  OpenBSD (see below).

---(end of broadcast)---
TIP 6: explain analyze is your friend

[HACKERS] pg_contrib default schema

2006-04-11 Thread Thomas Sondag

Hi pg Hackers,

First I using PostgreSQL for four year now, and I would like to thank you all for your work :)

I'm currently doing a study about the Open Source database.

So I'm testing some of the contrib modules like dblink, tsearch, admin81 and I was using postgis a lot in a previous life.
All that module add internal fonctions to the database, some in
pg_catalog (and I personaly don't like to mix Internal fonction and
contrib) others in pg_default.

Do you think it's possible to create a default schema called
pg_contrib, and write a "contrib modules coding rules document" to put
all functions / types ...  in pg_contrib (to tidy a little all
that things) ?



 Thomas

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Greg Stark

"Joshua D. Drake" <[EMAIL PROTECTED]> writes:

> Well I am go out on a limb here and gather to guess that sequential scans and
> index scans are still very relevant because the CPU could be bound by the scan
> (either one) based on the type of query being performed.
> 
> This doesn't really have anything to do with being IO bound as to the type of
> accesses to the data we have to deal with in regards to query processing.

It has everything to do with being i/o bound. The only way having two
processors perform part of an index or sequential scan would help is if your
disk subsystem is capable of providing data faster than a single processor is
capable of requesting it.

That's only going to be true for very high end systems with multiple raid
controllers and dozens of spindles.

On the other hand even moderately sized dual-core systems could probably
benefit from being able to perform multiple cpu-intensive operations
simultaneously.

-- 
greg

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] adding fields to pg_database

2006-04-11 Thread Jonah H. Harris

On 4/11/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> Good ways to answer this sort of question are:
>
> 1. Grep for references to some of the existing fields in the same catalog.
>
> 2. Look at the CVS diff for previous commits that added fields to the
> same catalog.

True, true.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] adding fields to pg_database

2006-04-11 Thread Tom Lane

"Jonah H. Harris" <[EMAIL PROTECTED]> writes:
> On 4/11/06, Markus Schiltknecht <[EMAIL PROTECTED]> wrote:
>> I've only added the fields in include/catalog/pg_database.h. Do I need
>> to fiddle other places?

> Make sure you updated Natts_pg_database, the bootstrap DATA line, and
> the stuff in src/backend/commands/dbcommands.c.

Good ways to answer this sort of question are:

1. Grep for references to some of the existing fields in the same catalog.

2. Look at the CVS diff for previous commits that added fields to the
same catalog.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] adding fields to pg_database

2006-04-11 Thread Jonah H. Harris

On 4/11/06, Markus Schiltknecht <[EMAIL PROTECTED]> wrote:
> dbcommands.c was the missing peace, thank you!

No problemo :)

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] adding fields to pg_database

2006-04-11 Thread Markus Schiltknecht

On Tue, 2006-04-11 at 14:07 -0400, Jonah H. Harris wrote:
> Make sure you updated Natts_pg_database, the bootstrap DATA line, and
> the stuff in src/backend/commands/dbcommands.c.

dbcommands.c was the missing peace, thank you!

Markus



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Joshua D. Drake


Greg Stark wrote:

Simon Riggs <[EMAIL PROTECTED]> writes:


I think it would be useful to think about exactly what type of
query/activity we are looking to improve the performance on. That way we
can understand the benefit of this proposal and take some baseline
measurements to analyse what is happening for those cases.


I find the focus on sequential scans, index scans, etc. quite odd when you're
discussing parallel query processing. The whole goal of parallel query
processing is to bring more *cpu* to bear on the problem. That's going to be
most relevant when you're cpu bound, not i/o bound.

The queries I would expect to be helped most by parallel query processing are
queries that involve sorting. For example, a big merge join with two sorts on
either side could perform the two sorts simultaneously. If they provide the
results of the final pass to a third thread it can execute the merge join and
the rest of the query plan while the sorts are still executing on two other
processors.


Well I am go out on a limb here and gather to guess that sequential 
scans and index scans are still very relevant because the CPU could be 
bound by the scan (either one) based on the type of query being performed.


This doesn't really have anything to do with being IO bound as to the 
type of accesses to the data we have to deal with in regards to query 
processing.


You are correct about parallel query processing helping mutliple sort 
queries but those sorts may or may not hit and index.


Sincerely,

Joshua D. Drake








--

=== The PostgreSQL Company: Command Prompt, Inc. ===
  Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
  Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] adding fields to pg_database

2006-04-11 Thread Jonah H. Harris

On 4/11/06, Markus Schiltknecht <[EMAIL PROTECTED]> wrote:
> I've only added the fields in include/catalog/pg_database.h. Do I need
> to fiddle other places?

Make sure you updated Natts_pg_database, the bootstrap DATA line, and
the stuff in src/backend/commands/dbcommands.c.

Other than that I don't know what's totally related to pg_database
itself as I haven't changed it in a long time.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Alvaro Herrera

Greg Stark wrote:

> Even on Solaris I'm sure parsing and preparing plans for all the queries,
> building up the system table cache for all the objects in the database, and so
> on are much much more expensive than fork(). I wouldn't be surprised if even
> on windows it was still a pretty close race.

Parsing/planning what queries?  Regarding system caches, they are
populated from a cache file; they are not read from the catalogs each
time.

But while we don't see a patch implementing the idea, this is all very
theoretical and probably wrong.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Greg Stark


Alvaro Herrera <[EMAIL PROTECTED]> writes:

> Greg Stark wrote:
> > Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > 
> > > An idea arising in chat with Joshua Drake:  the retargetting code, if it
> > > turns out to work and not be excessively expensive, could also be useful
> > > to implement a server-side "connection pooling" of sorts: the postmaster
> > > could keep idle backends and retarget them to a database that receives
> > > an incoming connection.  However, we'd also need a mechanism to clean
> > > all backend state previous to reusing a connection, to leave it "as
> > > new" (no prepared statements, WITH HOLD cursors, etc.)
> > 
> > Isn't all that work pretty much exactly the main cost of starting a new
> > backend?
> 
> On Linux and other systems were fork() has negligible cost, maybe; but
> on Windows and Solaris, it's certainly not.

Even on Solaris I'm sure parsing and preparing plans for all the queries,
building up the system table cache for all the objects in the database, and so
on are much much more expensive than fork(). I wouldn't be surprised if even
on windows it was still a pretty close race.

-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend

[HACKERS] adding fields to pg_database

2006-04-11 Thread Markus Schiltknecht

Hi,

I'm trying to add fields to pg_database in the system catalog. As long
as I add fixed-size fields before the VAR LENGTH ones, that's all fine.

But adding a VAR LENGTH (text) field initdb fails at: copying template1
to template0. The child process 'postgres' fails with signal 11. Strange
enough I suspect the segfault to occure at program termination (?). That
is just after the following initdb commands have been executed:


...

"REVOKE CREATE,TEMPORARY ON DATABASE template1 FROM public;\n",
"REVOKE CREATE,TEMPORARY ON DATABASE template0 FROM public;\n",

/*
 * Finally vacuum to clean up dead rows in pg_database
 */
"VACUUM FULL pg_database;\n",


I've only added the fields in include/catalog/pg_database.h. Do I need
to fiddle other places?

Regards

Markus



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Myron Scott

On Tue, 2006-04-11 at 07:47, Myron Scott wrote:
> client 
> or additional processing.  Am I missing something in this analysis?
> 
> I've attached my dtrace script.
> 

To answer my own question, I suppose my processors are
relatively slow compared to most setups.

Myron Scott


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Greg Stark

Simon Riggs <[EMAIL PROTECTED]> writes:

> I think it would be useful to think about exactly what type of
> query/activity we are looking to improve the performance on. That way we
> can understand the benefit of this proposal and take some baseline
> measurements to analyse what is happening for those cases.

I find the focus on sequential scans, index scans, etc. quite odd when you're
discussing parallel query processing. The whole goal of parallel query
processing is to bring more *cpu* to bear on the problem. That's going to be
most relevant when you're cpu bound, not i/o bound.

The queries I would expect to be helped most by parallel query processing are
queries that involve sorting. For example, a big merge join with two sorts on
either side could perform the two sorts simultaneously. If they provide the
results of the final pass to a third thread it can execute the merge join and
the rest of the query plan while the sorts are still executing on two other
processors.

-- 
greg

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Alvaro Herrera

Greg Stark wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> 
> > An idea arising in chat with Joshua Drake:  the retargetting code, if it
> > turns out to work and not be excessively expensive, could also be useful
> > to implement a server-side "connection pooling" of sorts: the postmaster
> > could keep idle backends and retarget them to a database that receives
> > an incoming connection.  However, we'd also need a mechanism to clean
> > all backend state previous to reusing a connection, to leave it "as
> > new" (no prepared statements, WITH HOLD cursors, etc.)
> 
> Isn't all that work pretty much exactly the main cost of starting a new
> backend?

On Linux and other systems were fork() has negligible cost, maybe; but
on Windows and Solaris, it's certainly not.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] RH9 postgresql 8.0.7 rpm

2006-04-11 Thread Joshua D. Drake


Gaetano Mendola wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi all,
I'm trying to build the rpms for RH9,
I downloaded the srpm for RH9 but I'm stuck on these errors:


RH9 is not a supported platform by RedHat or PGDG.



Attempt a:

# rpmbuild --rebuild postgresql-8.0.7-1PGDG.src.rpm
Installing postgresql-8.0.7-1PGDG.src.rpm
error: Failed build dependencies:
tcl-devel is needed by postgresql-8.0.7-1PGDG

why tcl-devel on rh9 version? tcl-devel doesn't exist on rh9 )


Because you are using a very old version of RH.

I strongly suggest you update your version of RH to FC5 or better yet
Ubuntu.

Also this really is better served on pgsql-general not hackers.

Sincerely,

Joshua D. Drake


--

=== The PostgreSQL Company: Command Prompt, Inc. ===
  Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
  Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Greg Stark

Alvaro Herrera <[EMAIL PROTECTED]> writes:

> An idea arising in chat with Joshua Drake:  the retargetting code, if it
> turns out to work and not be excessively expensive, could also be useful
> to implement a server-side "connection pooling" of sorts: the postmaster
> could keep idle backends and retarget them to a database that receives
> an incoming connection.  However, we'd also need a mechanism to clean
> all backend state previous to reusing a connection, to leave it "as
> new" (no prepared statements, WITH HOLD cursors, etc.)

Isn't all that work pretty much exactly the main cost of starting a new
backend?


-- 
greg


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Joshua D. Drake


Richard Huxton wrote:

Andrew - Supernews wrote:

On 2006-04-11, Tom Lane <[EMAIL PROTECTED]> wrote:

I don't feel a need to offer specific examples as requested by Andrew.


Why not? You're basing your entire argument on a false premise (that
pl/pgsql is more powerful than SQL); I can provide specific examples of
why this is not the case, or refute any that you care to provide. 


You can write trigger functions in plpgsql.


That doesn't make it more powerful, just that it has another feature.
Keep in mind that all internal functions that PostgreSQL includes are 
called from SQL.


Joshua D. Drake






--

=== The PostgreSQL Company: Command Prompt, Inc. ===
  Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
  Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Joshua D. Drake


Tom Lane wrote:

"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
What does enabling plpgsql do via access that you can't just do from an 
SQL query?


SQL isn't Turing-complete --- plpgsql is.  So if our would-be hacker has
a need to do some computation incidental to his hack, he can certainly
get it done in plpgsql, but not necessarily in plain SQL.


O.k. sure... but if the hackers wants to do something really bad it is 
easy to do so in SQL... TRUNCATE, DELETE FROM, VACUUM FULL, DROP... ,

SELECT generate_series()

Sincerely,

Joshua D. Drake


--

=== The PostgreSQL Company: Command Prompt, Inc. ===
  Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
  Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Joshua D. Drake



Can you guarantee unequivocally that there are absolutely not security 
issues in plpgsql?


Can you guarantee unequivocally that there are absolutely not security 
issues in PostgreSQL?




I believe Tom's point is that it is not possible to do so, and, since 
plpgsql isn't something that all applications need/use, it isn't 
something that needs to be 'loaded by default' ... its like loading 
mod_perl in apache for an application that only uses PHP ... you can do 
it, but why bother?


Well many distributions do but no it is not the same. plPGSQL is the 
default procedural language for PostgreSQL. It is not a contrib module,
and it is built by default. So why not install it by default to make it 
just one step easier for our community?


Sincerely,

Joshua D. Drake



If Tom could cite any security issues with plpgsql, he would have 
probably fixed it by now ... but I don't believe he'd go out on a limb 
and state that there weren't any either ...




Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664




--

=== The PostgreSQL Company: Command Prompt, Inc. ===
  Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
  Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

[HACKERS] RH9 postgresql 8.0.7 rpm

2006-04-11 Thread Gaetano Mendola

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi all,
I'm trying to build the rpms for RH9,
I downloaded the srpm for RH9 but I'm stuck on these errors:

Attempt a:

# rpmbuild --rebuild postgresql-8.0.7-1PGDG.src.rpm
Installing postgresql-8.0.7-1PGDG.src.rpm
error: Failed build dependencies:
tcl-devel is needed by postgresql-8.0.7-1PGDG

why tcl-devel on rh9 version? tcl-devel doesn't exist on rh9 )

Attempt b:
# rpmbuild --nodeps --rebuild postgresql-8.0.7-1PGDG.src.rpm

checking krb5.h presence... no
checking for krb5.h... no
configure: error: header file  is required for Kerberos 5
error: Bad exit status from /var/tmp/rpm-tmp.73067 (%build)


ok no kerberos now:

Attempt c:
# rpmbuild --nodeps  --rebuild --define 'kerberos 0' 
postgresql-8.0.7-1PGDG.src.rpm
.
checking for zlib.h... yes
checking openssl/ssl.h usability... no
checking openssl/ssl.h presence... no
checking for openssl/ssl.h... no
configure: error: header file  is required for OpenSSL
error: Bad exit status from /var/tmp/rpm-tmp.3109 (%build)

actually I have that file:

# locate openssl/ssl.h
/usr/include/openssl/ssl.h


Can someone help me in this ?


Regards
Gaetano Mendola
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEO9nu7UpzwH2SGd4RAi/nAJ9WoyVBUR1aSp0+TCPkNEnXhvSbzwCgmEYf
2xQem+7IA7cAF7HxclNv6Ts=
=Lj75
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Suboptimal evaluation of CASE expressions

2006-04-11 Thread Andreas Tille


On Tue, 11 Apr 2006, Martijn van Oosterhout wrote:


Because there were no non-null rows, the system passed a NULL to the
final func. Seems you have two ways of dealing with this. Mark the
finalfunc as STRICT so the system won't call it with NULL. Or give the
agrregate an INITCOND which is an empty array. This would also avoid
the NULL.


Ah.  Thanks, this might help for the original problem.


The problem in your example is that you're using aggrgates in the case
statement. Which means that as each row is processed, the aggregates
need to be calculated. It can't shortcut because if it first calculated
the max() and then the median() it would have to evaluate the entire
query twice.


A this sounds be reasonable.  So my assumption might have been wrong.


In the general case, PostgreSQL *may* avoid calculating redundant
clauses if it doesn't need to, but you can't rely on it.


Just theoretically spoken: Woouldn't it make sense to enforce to
avoid this calculation.


Fixing your underlying issue with the aggregate should solve everything
for you.


Sure.  I hope that I was able to trigger some ideas about optimisation
anyway.

Thanks for the quick help

 Andreas.

--
http://fam-tille.de

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Suboptimal evaluation of CASE expressions

2006-04-11 Thread Tom Lane

Martijn van Oosterhout  writes:
> The problem in your example is that you're using aggrgates in the case
> statement.

Yeah.  The aggregate results are all computed before we start to evaluate
the SELECT output list --- the fact that the aggregate is referenced
within a CASE doesn't save you if the aggregate's finalfunc fails.

We could maybe change things so that the finalfunc isn't run unless the
result value is actually demanded in the SELECT list or HAVING clause,
but for 99.99% of applications checking that would be a waste of cycles,
so I'm disinclined to do it.  As Martijn said, really you want to fix
the finalfunc so that it behaves sanely in corner cases.  An aggregate
that fails on zero rows needs work, period.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Suboptimal evaluation of CASE expressions

2006-04-11 Thread Martijn van Oosterhout

On Tue, Apr 11, 2006 at 04:43:33PM +0200, Andreas Tille wrote:
> Hi,
> 
> when dealing with a bug in the postgresql-plr interface I think
> I found a suboptimal method to process CASE statements.  First
> to the problem:

> SELECT median(nonnull), median(mightbenull) from plrtest where flag = 0;
> ERROR:  R interpreter expression evaluation error
> DETAIL:  Error in median(arg1) : need numeric data
> CONTEXT:  In PL/R function r_median

Because there were no non-null rows, the system passed a NULL to the
final func. Seems you have two ways of dealing with this. Mark the
finalfunc as STRICT so the system won't call it with NULL. Or give the
agrregate an INITCOND which is an empty array. This would also avoid
the NULL.

> I would expect NULL as result of the last query.
> 
> So I thought I will verify in a CASE statement whether there
> are only NULL values in the column by max(mightbenull) like this:

> The problem I want to discuss here is the following:  Usually in
> programming languages only one branch of the IF-THEN-ELSE statement
> will be calculated.  But here *both* branches are calculated
> (obviousely because of the error that occures).  If we just forget

Usually in programming languages, but not in SQL.

> that my goal was to circumvent the error by some hack, I think
> if there is some kind of complex query in the ELSE branche that
> calculation would just cost extra processing time with no need.
> I would regard this as a bug.

The problem in your example is that you're using aggrgates in the case
statement. Which means that as each row is processed, the aggregates
need to be calculated. It can't shortcut because if it first calculated
the max() and then the median() it would have to evaluate the entire
query twice.

In the general case, PostgreSQL *may* avoid calculating redundant
clauses if it doesn't need to, but you can't rely on it.

Fixing your underlying issue with the aggregate should solve everything
for you.

Hope this helps,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

signature.asc
Description: Digital signature

[HACKERS] Suboptimal evaluation of CASE expressions

2006-04-11 Thread Andreas Tille


Hi,

when dealing with a bug in the postgresql-plr interface I think
I found a suboptimal method to process CASE statements.  First
to the problem:

I'm using the Debian packaged version of PLR version 0.6.2-2 (Debian
testing) and found a problem calculating median from a set of
values that contain only NULL values.  The problem becomes clear
if you look at the following example:

-
$ psql -t test

create table plrtest( nonnull numeric not null, mightbenull numeric, flag int);
insert into plrtest values(42.0, 42.0, 1);
insert into plrtest values(17.0, 17.0, 1);
insert into plrtest values(23.0, NULL, 0 );
insert into plrtest values(4711.0, 4711.0, 1);
insert into plrtest values(174.0, NULL, 0);

CREATE OR REPLACE FUNCTION plr_call_handler()
RETURNS LANGUAGE_HANDLER
AS '\$libdir/plr' LANGUAGE C;

CREATE LANGUAGE plr HANDLER plr_call_handler;

create or replace function r_median(_numeric) returns numeric as 'median(arg1)' 
language 'plr';


CREATE OR REPLACE FUNCTION r_median(_numeric) returns numeric as '
  median(arg1)
' language 'plr';

CREATE OR REPLACE FUNCTION plr_array_accum (_numeric, numeric)
RETURNS numeric[]
AS '\$libdir/plr','plr_array_accum'
LANGUAGE 'C';


CREATE AGGREGATE median (
  sfunc = plr_array_accum,
  basetype = numeric,
  stype = _numeric,
  finalfunc = r_median
);

SELECT median(nonnull) from plrtest;
 42

SELECT median(mightbenull) from plrtest;
 42

SELECT median(nonnull), median(mightbenull) from plrtest where flag = 0;
ERROR:  R interpreter expression evaluation error
DETAIL:  Error in median(arg1) : need numeric data
CONTEXT:  In PL/R function r_median

-

I would expect NULL as result of the last query.

So I thought I will verify in a CASE statement whether there
are only NULL values in the column by max(mightbenull) like this:

# SELECT CASE WHEN max(mightbenull) IS NULL THEN 0 ELSE median(mightbenull) END 
from plrtest where flag = 0;
ERROR:  R interpreter expression evaluation error
DETAIL:  Error in median(arg1) : need numeric data
CONTEXT:  In PL/R function r_median

The problem I want to discuss here is the following:  Usually in
programming languages only one branch of the IF-THEN-ELSE statement
will be calculated.  But here *both* branches are calculated
(obviousely because of the error that occures).  If we just forget
that my goal was to circumvent the error by some hack, I think
if there is some kind of complex query in the ELSE branche that
calculation would just cost extra processing time with no need.
I would regard this as a bug.

Kind regards

  Andreas.

PS: Please CC me. I'm not subscribed.
--
http://fam-tille.de

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-11 Thread Myron Scott

On Mon, 2006-04-10 at 02:16, Martijn van Oosterhout wrote:

> The appears to be two seperate cases here though, one is to just farm
> out the read request to another process (basically aio), the other is
> to do actual processing there. The latter is obviously for more useful
> but requires a fair bit more infrastructure.
> 

I ran some tests to see where time is spent during SeqScans.  I did the
following.

tester=# vacuum analyze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 727960 nonremovable row
versions in 5353 pagesDETAIL:  0 dead row versions cannot be
removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.18s/0.27u sec elapsed 0.91 sec.
INFO:  analyzing "public.test"
INFO:  "test": scanned 3000 of 5353 pages, containing 407952
live rows and 0 dead rows; 3000 rows in sample, 727922 estimated
total rows
VACUUM
tester=# select version();
version

---
 PostgreSQL 8.2devel on sparc-sun-solaris2.11, compiled by GCC
gcc (GCC) 3.3.2
(1 row)

tester=# select count(random) from test;
 count

 727960
(1 row)

With the follow ing dtrace results...

# ./probediff2.d 514607
dtrace: script './probediff2.d' matched 10 probes
CPU IDFUNCTION:NAME
  0  46811ExecEndSeqScan:return scan time 20406
^C

  smgrread 
641566800
  Virtualized - smgrread   
439798800
  smgread - Call Count  
5353
  HeapTupleSatisfiesSnapshot  
6735471000
  Virtualized - HeapTupleSatisfiesSnapshot
3516556800
  HeapTupleSatisfiesSnapshot - Call Count 
727960
  Virtualized - ReadBuffer 
558230600
  ReadBuffer   
864931000
  Virtualized - ExecutePlan   
7331181400
  Virtualized - ExecSeqScan   
7331349600
  ExecutePlan
20405943000
  ExecSeqScan
20406161000

The virtualized times are supposed to be actual time spent on the CPU
with
the time spent in the probe factored out.  It seems here that half the
time
in SeqScan is spent time validating the tuples as opposed to 1/10th
doing IO.
I'm not sure that just farming out read IO is going to be all that
helpful 
in this situation.  That's why I think it's a good idea to create a
slave 
process that prefetchs pages and transfers valid ItemPointers to the
master.
There may not be much to be gained on simple SeqScans, however, in
complex 
queries that include a SeqScan, you may gain alot by offloading this
work
onto a slave thread.  A table with TOAST'ed attributes comes to mind. 
The
slave thread could be working away on the rest of the table while the
master
is PG_DETOAST_DATUM'ing the attributes for transmission back to the
client 
or additional processing.  Am I missing something in this analysis?

I've attached my dtrace script.

Myron Scott

#!/usr/sbin/dtrace -s
pid$1::ExecInitSeqScan:entry
{
ts = timestamp;
vts = vtimestamp;
timeon = 1;
}
pid$1::ExecEndSeqScan:return
/ts/
{
printf("scan time %d",(timestamp - ts) /100) ;
@val["ExecSeqScan"] = sum(timestamp - ts);
@val["Virtualized - ExecSeqScan"] = sum(vtimestamp - vts);
ts = 0;
vts = 0;
timeon = 0;
}
pid$1::HeapTupleSatisfiesSnapshot:entry
/timeon/
{
validity = timestamp;
vvalidity = vtimestamp;
}
pid$1::HeapTupleSatisfiesSnapshot:return
/validity/
{
@val["HeapTupleSatisfiesSnapshot"] = sum(timestamp - validity);
@val["Virtualized - HeapTupleSatisfiesSnapshot"] = sum(vtimestamp - 
vvalidity);
@val["HeapTupleSatisfiesSnapshot - Call Count"] = sum(1);
validity = 0;
vvalidity = 0;
}
pid$1::smgrread:entry
/timeon/
{
rt= timestamp;
vrt= vtimestamp;
}
pid$1::smgrread:return
/rt/
{
@val["smgrread"] = sum(timestamp - rt);
@val["Virtualized - smgrread"] = sum(vtimestamp - vrt);
@val["smgread - Call Count"] = sum(1);
rt = 0;
vrt = 0;
}
pid$1::ReadBuffer:entry
/timeon/
{
brt= timestamp;
vbrt= vtimestamp;
}
pid$1::ReadBuffer:return
/brt/
{
@val["ReadBuffer"] = sum(timestamp - brt);

Re: [HACKERS] using eclipse to compiler and debug the source code

2006-04-11 Thread Dave Cramer


Hi,


You have to start a background process, then attach to it. If this is  
possible in eclipse then it should work.


Typically everyone uses gdb.

Dave

On 9-Apr-06, at 10:55 PM, 李峰 wrote:


pgsql-hackers！

	Hi , I want to use eclipse to compiler the postgresql source code  
and debug them on winxp . I have tried some times with fails . Is  
there anyone to tell me how to do ?



thanks.

[EMAIL PROTECTED]
　　2006-04-10

---(end of  
broadcast)---

TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Andrew - Supernews

On 2006-04-11, Richard Huxton  wrote:
> Andrew - Supernews wrote:
>> On 2006-04-11, Tom Lane <[EMAIL PROTECTED]> wrote:
>>> I don't feel a need to offer specific examples as requested by Andrew.
>> 
>> Why not? You're basing your entire argument on a false premise (that
>> pl/pgsql is more powerful than SQL); I can provide specific examples of
>> why this is not the case, or refute any that you care to provide. 
>
> You can write trigger functions in plpgsql.

You can write rules without plpgsql.

While rules and triggers are not equivalent, I think you'll be hard-pressed
to come up with an example where a malicious intruder, with sufficient
access to the system to create pl/pgsql functions if pl/pgsql is loaded,
can carry out a useful attack using triggers that would not be possible
without them.

Let's try a simple example; changing the value of a column in future
inserts into a table. Doing it without a trigger turns out to be simple;
as a demonstration, this method allows an SQL function to be invoked:

create function foox(foo) returns integer language sql
  as $$ update foo set value='bogus' where id=$1.id; select 1; $$;

create rule foo_rule as on insert to foo do insert into bar values (foox(NEW));

insert into foo values (2,'bar');
INSERT 0 1

select * from foo;
 id | value 
+---
  1 | foo
  2 | bogus
(2 rows)

So that's triggers without pl/pgsql. Anyone else want to try a challenge?

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] plpgsql by default

2006-04-11 Thread Richard Huxton


Andrew - Supernews wrote:

On 2006-04-11, Tom Lane <[EMAIL PROTECTED]> wrote:

I don't feel a need to offer specific examples as requested by Andrew.


Why not? You're basing your entire argument on a false premise (that
pl/pgsql is more powerful than SQL); I can provide specific examples of
why this is not the case, or refute any that you care to provide. 


You can write trigger functions in plpgsql.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

[HACKERS] Question on win32 semaphore simulation

2006-04-11 Thread Qingqing Zhou

As I reviewed the win32/sema.c, there is some code that I am not clear, can
anybody explain please?

In semctl(SETVAL):

   if (semun.val < sem_counts[semNum])
sops.sem_op = -1;
   else
sops.sem_op = 1;

   /* Quickly lock/unlock the semaphore (if we can) */
   if (semop(semId, &sops, 1) < 0)
return -1;

When semun.val < sem_counts[semNum], it means we want to set the semaphore
to semun.val, but because somebody ReleaseSemaphore() for serveral times, so
we should wait for this semaphore several times (i.e., sem_counts[semNum] -
semun.val) to recover it. When semun.val > sem_counts[semNum], we should
ReleaseSemaphore() serveral times to recovery it.

That is, should the sem_op assignment logic be:

sops.sem_op = semun.val - sem_counts[semNum];

Of course, this would require we add a loop logic in semop().


Regards,
Qingqing



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

67 matches

Mail list logo