Re: [HACKERS] LATERAL
On Sat, Oct 17, 2009 at 10:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> That still leaves a lot of silly paths, though.  In many cases, if
>> you're thinking about joining A to {B C} using an index-accelerated
>> path, you'd be just as well off joining to B first and then to C.  So
>> it might be that we only need to consider index-accelerated paths when
>> there is no legal join between the LHS and a subset of the RHS.
>
> Yeah.  If there are no join order constraints, it's always possible to
> form a plan such that you join a given rel only once you have all the
> rels needed (to provide values for the inner indexscan) on the lefthand
> side.  I think this is probably the main reason why the issue was not
> treated in the planner originally.
>
>> So maybe the way to think about this is as a way of dealing with join
>> order constraints without losing the benefits of inner indexscans.
>> The other problem I see here is that the bottom-up approach that we
>> use in general is going to be difficult to apply here, because the set
>> of paths will vary depending on what parameters are pushed down from
>> the outer side.
>
> Well, we deal with that already --- the set of possible inner indexscan
> paths already varies depending on what the LHS is.  I think the point
> here is that we'd be considering an inner path against an LHS that it's
> not legal to join the inner rel to *directly*.  Such a path would only
> be legal if we later join to that LHS at a higher join level.  So we'd
> be keeping around some tentative paths that might not ever form a valid
> join plan.
>
> Maybe we should turn around the way that inner indexscan paths are
> made.  Currently we form them on-the-fly while considering a valid join
> combination.  Maybe we should build them all at the first level
> (driving this off the set of available join clauses for each base rel)
> and mark each such path as "requires a join to this other set of rels
> to be valid".  But then we'd go ahead and join such paths to *other*
> rels, keeping the resulting join paths still marked as requiring the
> same future join.  Once that join actually happens, the resulting path
> becomes fully valid.  Only a join to a proper subset of the future-join
> requirement would be disallowed meanwhile.
>
> I'm not even sure this would be slower or more complicated than what we
> do now --- if you look at the logic that caches potential inner
> indexscan plans, it's almost doing this already.  It would result in
> considering more join paths, but only ones that have some plausible
> use.

Wow, that's pretty sneaky.  I had to read this email twice before I
understood what you were proposing.  It sounds to me like it will work.
I'm not 100% sure what the impact on planner performance will be, but
it's at least plausible that it will be OK.

I think you should only ever join an incomplete inner-indexscan path to
(1) another inner-indexscan path or (2) the cheapest total path for
*exactly* the future-join requirement.  Anything else doesn't seem
productive.

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
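For concreteness, a sketch of the kind of query shape under discussion (schema and names invented for illustration): a LEFT JOIN to a {B, C} pair imposes a join-order constraint, while the desirable inner indexscan on b is parameterized by a value from a that is only available once a is on the lefthand side.

```sql
-- Hypothetical schema; the indexes make parameterized inner scans attractive.
CREATE TABLE a (id int PRIMARY KEY);
CREATE TABLE b (id int PRIMARY KEY, a_id int);
CREATE TABLE c (id int PRIMARY KEY, b_id int);
CREATE INDEX b_a_id_idx ON b (a_id);
CREATE INDEX c_b_id_idx ON c (b_id);

-- The outer join forces {b, c} to be joined before a is joined in, but the
-- indexscan on b using b.a_id = a.id needs a on the lefthand side.  Under
-- the proposal, a path for (b join c) parameterized by a.id would be built
-- early and marked "requires a future join to a".
EXPLAIN SELECT *
FROM a
LEFT JOIN (b JOIN c ON c.b_id = b.id) ON b.a_id = a.id;
```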
Re: [HACKERS] LATERAL
Robert Haas robertmh...@gmail.com writes:
> I think you should only ever join an incomplete inner-indexscan path
> to (1) another inner-indexscan path or (2) the cheapest total path for
> *exactly* the future-join requirement.  Anything else doesn't seem
> productive.

I don't see any reason to constrain the form of joins before the
future-join requirement.  Once you get to the join that satisfies that
requirement, obviously you must implement it as a nestloop with the
inner-indexscan path on the inside.  But there shouldn't be any more
constraint beyond that one for that join, either:

* you might want a fast-start plan, not cheapest total
* sort orderings of the outer path might be useful

We know that sort ordering of the inner indexscan or its joins won't be
useful once we reach the future-join level, so it might be possible to
discard paths that are not cheapest total cost below that level.  The
problem is to distinguish sort orderings that are useful within joins
below that level.  You could avoid that if you could convince yourself
that there's no point in ever doing a mergejoin at such a level, but I
don't think I believe that.

It may well be that there's no point in being too tense about this
issue anyway.  The planner will only consider early joins to rels that
have join clauses to the rel with the inner-indexscan path.  There
probably won't be that many of them.

			regards, tom lane
Re: [HACKERS] Rejecting weak passwords
Bruce Momjian wrote:
> Yep, this is illustrating something that is pretty basic to open
> source --- that is, open source often provides the tools for a
> solution, rather than a complete solution.  I often think of open
> source as providing a calculator with wires sticking out, rather than
> calculator buttons; the wires allow more flexibility, but they are
> harder to use.

I disagree.  Open source typically provides the complete solution too -
just not from the developer who programs one component of the solution.
Checklist writers intentionally use this to make straw-man arguments.
People used to say "linux doesn't even have a GUI" - noting that X11 is
a separate project.  Now people have database checkboxes for:

 * a GUI admin tool (which we have, though it's a separate package)
 * GIS data types (which we have, though it's a separate package)
 * server-side password filters (which we have, though LDAP, etc.)
 * replication (which we have, though many packages)
 * clustering (which we have, though hadoopdb)

The Linux guys successfully communicated that it isn't fair for
checklists to compare an OS kernel against commercial application
suites.  Seems it'd be good for the postgres project to similarly
communicate that the database kernel is the core of a platform that's
broader than just a database kernel.

   Ron
[HACKERS] Controlling changes in plpgsql variable resolution
As most of you will recall, plpgsql currently acts as though identifiers
in SQL queries should be resolved first as plpgsql variable names, and
only failing that do they get processed as names of the query.  The
plpgsql parser rewrite that I'm working on will fix that for the
obviously-silly cases where a plpgsql variable is substituted for a
table name or some other non-scalar-variable identifier.  However, what
should we do when a name could represent either a plpgsql variable or a
column of the query?

Historically we've resolved it as the plpgsql variable, but we've sure
heard a lot of complaints about that.  Oracle's PL/SQL has the
precedence the other way around: resolve first as the query column, and
only failing that as a PL variable.  The Oracle behavior is arguably
less surprising, because the query-provided names belong to the nearer
enclosing scope.

I believe that we ought to move to the Oracle behavior over time, but
how do we get there from here?  Changing it is almost surely going to
break a lot of people's functions, and in rather subtle ways.  I think
there are basically three behaviors that we could offer:

1. Resolve ambiguous names as plpgsql (historical PG behavior)
2. Resolve ambiguous names as query column (Oracle behavior)
3. Throw error if name is ambiguous (useful for finding problems)

(Another possibility is to throw a warning but proceed anyway.  It
would be easy to do that if we proceed with the Oracle behavior, but
*not* easy if we proceed with the historical PG behavior.  The reason
is that the code invoked by transformColumnRef may have already made
some side-effects on the query tree.  We discussed the implicit-RTE
behavior yesterday, but there are other effects of a successful name
lookup, such as marking columns for privilege checking.)

What I'm wondering about at the moment is which behaviors to offer and
how to control them.  The obvious answer is "use a GUC", but that
answer scares me because of the ease with which switching between #1
and #2 would break plpgsql functions.  It's not out of the question
that that could even amount to a security problem.  I could see using a
GUC to turn the error behavior (#3) on and off, but not to switch
between #1 and #2.

Another possibility is to control it on a per-function basis by adding
some special syntax to plpgsql function bodies to say which behavior to
use.  We could for instance extend the never-documented "#option"
syntax.  This is pretty ugly, and would be inconvenient to use too ---
if people have to go and add "#option something" to a function, they
might as well just fix whatever name conflicts it has instead.

I'm not seeing any choice that seems likely to make everybody happy.
Any comments or ideas?

			regards, tom lane
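To make the ambiguity concrete, a minimal sketch (table, function, and parameter names invented for illustration):

```sql
CREATE TABLE accounts (id int, owner text);

-- The parameter "owner" collides with the column "owner".
CREATE FUNCTION get_account_id(owner text) RETURNS int AS $$
DECLARE
    result int;
BEGIN
    -- Behavior #1 resolves the unqualified "owner" on the right as the
    -- function parameter (probably what the author meant).  Behavior #2
    -- resolves it as the query column, turning the WHERE clause into the
    -- always-true a.owner = a.owner.  Behavior #3 throws an error and
    -- forces an explicit qualification such as a.owner or
    -- get_account_id.owner.
    SELECT id INTO result FROM accounts a WHERE a.owner = owner;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```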
[HACKERS] Boosting cost estimates for some built-in functions
We recently saw a complaint about psql \d commands being quite slow with
many tables, which turned out to be because the planner was putting a
table_has_privilege() call where it would get executed a lot, before
other cheaper tests on pg_class.  This is not the planner's fault ---
it has no information that would suggest that table_has_privilege() is
any more expensive than any other built-in function.

In another case I was looking at just now, it seems that to_tsquery()
and to_tsvector() are noticeably slower than most other built-in
functions, which is not surprising given the amount of mechanism that
gets invoked inside them.  It would be useful to tell the planner about
that, to discourage it from picking seqscan plans that involve repeated
execution of these functions.

I'd like to suggest boosting the built-in cost estimates for the
xxx_has_privilege functions to perhaps 10.  to_tsquery and to_tsvector
maybe should be boosted even higher, but I don't have a good specific
number in mind.

Thoughts?

			regards, tom lane
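The per-function cost estimate is already visible and adjustable at the SQL level, so the effect of such a boost can be tried out on an existing installation before any catalog change (the value 10 below is the suggestion above, not a settled number):

```sql
-- Most built-ins default to procost = 1 (in units of cpu_operator_cost).
SELECT proname, procost
FROM pg_proc
WHERE proname IN ('has_table_privilege', 'to_tsvector', 'to_tsquery');

-- A superuser can experiment with the proposed boost directly; a higher
-- cost makes the planner schedule cheaper quals first and discourages
-- seqscan plans that call the function once per row.
ALTER FUNCTION has_table_privilege(text, text) COST 10;
```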
Re: [HACKERS] LATERAL
On Sun, Oct 18, 2009 at 11:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> I think you should only ever join an incomplete inner-indexscan path
>> to (1) another inner-indexscan path or (2) the cheapest total path
>> for *exactly* the future-join requirement.  Anything else doesn't
>> seem productive.
>
> I don't see any reason to constrain the form of joins before the
> future-join requirement.  Once you get to the join that satisfies that
> requirement, obviously you must implement it as a nestloop with the
> inner-indexscan path on the inside.  But there shouldn't be any more
> constraint beyond that one for that join, either:
> * you might want a fast-start plan not cheapest total
> * sort orderings of the outer path might be useful
> We know that sort ordering of the inner indexscan or its joins won't
> be useful once we reach the future-join level, so it might be possible
> to discard paths that are not cheapest total cost below that level.

Yeah, this was what I was driving at, but I got turned around in my
head and was explaining it incorrectly.

> The problem is to distinguish sort orderings that are useful within
> joins below that level.  You could avoid that if you could convince
> yourself that there's no point in ever doing a mergejoin at such a
> level, but I don't think I believe that.
>
> It may well be that there's no point in being too tense about this
> issue anyway.  The planner will only consider early joins to rels that
> have join clauses to the rel with the inner-indexscan path.  There
> probably won't be that many of them.

You could probably convince me that a merge join is not going to be too
useful (how often can you want a merge join on the inner side of a
nested loop? ... especially when there are partial index scans
involved), but I think your last point here is well-taken.  We've
certainly turned a corner from "it's just the exponential growth in the
number of join paths to consider that's a problem".

:-)

...Robert
Re: [HACKERS] Boosting cost estimates for some built-in functions
On Sun, 2009-10-18 at 14:50 -0400, Tom Lane wrote:
> I'd like to suggest boosting the built-in cost estimates for the
> xxx_has_privilege functions to perhaps 10.  to_tsquery and to_tsvector
> maybe should be boosted even higher, but I don't have a good specific
> number in mind.

ISTM we should set up a test that measures the cost of those and other
functions.

-- 
 Simon Riggs           www.2ndQuadrant.com
Re: [HACKERS] Boosting cost estimates for some built-in functions
On Sun, Oct 18, 2009 at 3:38 PM, Simon Riggs si...@2ndquadrant.com wrote:
> On Sun, 2009-10-18 at 14:50 -0400, Tom Lane wrote:
>> I'd like to suggest boosting the built-in cost estimates for the
>> xxx_has_privilege functions to perhaps 10.  to_tsquery and
>> to_tsvector maybe should be boosted even higher, but I don't have a
>> good specific number in mind.
>
> ISTM we should set up a test that measures the cost of those and other
> functions.

Benchmarking???  Heresy!

...Robert
Re: [HACKERS] Controlling changes in plpgsql variable resolution
On Sun, 2009-10-18 at 13:25 -0400, Tom Lane wrote:
> [...]
> I think there are basically three behaviors that we could offer:
> 1. Resolve ambiguous names as plpgsql (historical PG behavior)
> 2. Resolve ambiguous names as query column (Oracle behavior)
> 3. Throw error if name is ambiguous (useful for finding problems)
> [...]
> The obvious answer is "use a GUC", but that answer scares me because
> of the ease with which switching between #1 and #2 would break plpgsql
> functions.  It's not out of the question that that could even amount
> to a security problem.  I could see using a GUC to turn the error
> behavior (#3) on and off, but not to switch between #1 and #2.
> [...]

I'd suggest two options: one for name resolution (#1 or #2) and one for
the error level on ambiguity (none or ERROR).  GUCs are fine, now that
we have per-function GUC settings.

-- 
 Simon Riggs           www.2ndQuadrant.com
Re: [HACKERS] Controlling changes in plpgsql variable resolution
On Sun, Oct 18, 2009 at 1:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> [...]
> I think there are basically three behaviors that we could offer:
> 1. Resolve ambiguous names as plpgsql (historical PG behavior)
> 2. Resolve ambiguous names as query column (Oracle behavior)
> 3. Throw error if name is ambiguous (useful for finding problems)
> [...]
> I'm not seeing any choice that seems likely to make everybody happy.
> Any comments or ideas?

If we just change the default behavior from #1 to #2, it's going to be
insanely easy to dump a database using pg_dump for 8.4, restore into an
8.5 database, and end up with a function that does something different
and broken.  So I'm opposed to that plan, but amenable to any of the
other options in varying degrees.

I think it would make a fair amount of sense to make #3 the default
behavior.  If possible, I think we should try to engineer things so
that using pg_dump 8.5 on an 8.4 database and restoring the result into
an 8.5 database produces a function with identical semantics.

...Robert
Re: [HACKERS] LATERAL
Robert Haas robertmh...@gmail.com writes:
> You could probably convince me that a merge join is not going to be
> too useful (how often can you want a merge join on the inner side of a
> nested loop?

Why not?  As Andrew pointed out, what we're really trying to accomplish
here is consider sub-join plans that are parameterized by a value
obtained from an outer relation.  I think we shouldn't artificially
limit what we consider.

But anyway, I think we're on the same page here: what we ought to do is
try implementing this scheme without any extra restrictions on what it
considers, and see what the performance is like.  We can try to limit
what it considers if it turns out not to work well in the simplest
form.

			regards, tom lane
Re: [HACKERS] LATERAL
On Sun, Oct 18, 2009 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> You could probably convince me that a merge join is not going to be
>> too useful (how often can you want a merge join on the inner side of
>> a nested loop?
>
> Why not?  As Andrew pointed out, what we're really trying to
> accomplish here is consider sub-join plans that are parameterized by a
> value obtained from an outer relation.  I think we shouldn't
> artificially limit what we consider.
>
> But anyway, I think we're on the same page here: what we ought to do
> is try implementing this scheme without any extra restrictions on what
> it considers, and see what the performance is like.  We can try to
> limit what it considers if it turns out not to work well in the
> simplest form.

And then when we get done, we can consider implementing $SUBJECT, which
is no longer what this thread is about at all.  :-)

...Robert
Re: [HACKERS] Controlling changes in plpgsql variable resolution
Robert Haas robertmh...@gmail.com writes:
> If possible, I think we should try to engineer things so that using
> pg_dump 8.5 on an 8.4 database and restoring the result into an 8.5
> database produces a function with identical semantics.

Hmm ... actually, we could have pg_dump stick either a "#option" line
or a GUC SET parameter onto every plpgsql function it pulls from an old
database.  So if you're willing to assume that people do their upgrades
that way, it could be made reasonably safe, even if the default
behavior changes.

			regards, tom lane
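Both mechanisms already have SQL-level hooks; a sketch of what pg_dump could emit (the GUC name and #option keyword are hypothetical here, since neither has been settled on):

```sql
-- Per-function GUC setting attached by pg_dump, using the existing
-- CREATE FUNCTION ... SET clause (GUC name is invented for illustration):
CREATE FUNCTION f() RETURNS int AS $$
BEGIN
    RETURN 1;
END;
$$ LANGUAGE plpgsql
   SET plpgsql.name_resolution = 'plpgsql_first';

-- Or the "#option" form inside the function body (keyword invented):
CREATE FUNCTION g() RETURNS int AS $$
    #option resolve_plpgsql_first
BEGIN
    RETURN 1;
END;
$$ LANGUAGE plpgsql;
```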
Re: [HACKERS] Boosting cost estimates for some built-in functions
Simon Riggs si...@2ndquadrant.com writes:
> On Sun, 2009-10-18 at 14:50 -0400, Tom Lane wrote:
>> I'd like to suggest boosting the built-in cost estimates for the
>> xxx_has_privilege functions to perhaps 10.  to_tsquery and
>> to_tsvector maybe should be boosted even higher, but I don't have a
>> good specific number in mind.
>
> ISTM we should set up a test that measures the cost of those and other
> functions.

Well, some quick testing suggests that 10x is in the right ballpark for
has_table_privilege versus a simple function such as OID equality, and
that even fairly simple cases of to_tsvector/to_tsquery can be 100x the
cost of OID equality.  I'm not sure if there is a "typical" case for
the latter, given the wide range of possible tsearch configurations, so
trying to benchmark it with any degree of precision might be pointless.

			regards, tom lane
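One quick-and-dirty way to reproduce this kind of relative measurement (a sketch, not necessarily the test run above) is to time per-row calls of each function over the same rowset with psql's \timing:

```sql
\timing on

-- Baseline: a cheap built-in comparison, once per row.
SELECT count(*) FROM generate_series(1, 100000) g WHERE g::oid = g::oid;

-- Privilege-check function, once per row (~10x in the measurement above).
SELECT count(*) FROM generate_series(1, 100000) g
WHERE has_table_privilege('pg_class', 'SELECT');

-- Text-search functions, once per row (~100x in the measurement above).
SELECT count(*) FROM generate_series(1, 100000) g
WHERE to_tsvector('english', g::text) @@ to_tsquery('42');
```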
Re: [HACKERS] Controlling changes in plpgsql variable resolution
On Sun, Oct 18, 2009 at 4:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> If possible, I think we should try to engineer things so that using
>> pg_dump 8.5 on an 8.4 database and restoring the result into an 8.5
>> database produces a function with identical semantics.
>
> Hmm ... actually, we could have pg_dump stick either a "#option" line
> or a GUC SET parameter onto every plpgsql function it pulls from an
> old database.  So if you're willing to assume that people do their
> upgrades that way, it could be made reasonably safe, even if the
> default behavior changes.

I'm not completely willing to assume that.  I know we recommend it, but
I'm sure that people don't always do it.  And this is pretty subtle:
someone might screw it up and get a non-working function without
realizing it.  The advantage of making the default behavior throw an
error is that it should be pretty obvious if you've broken something.

And this isn't just about pg_dump, either.  I have a script that gets
run regularly on one of my production databases that goes and updates
the definitions of a whole bunch of functions to the latest version
from the source code repository.  Even if pg_dump DTRT, the next run of
a script of that type might subtly break a bunch of stuff.

The current behavior is a trap for the unwary, so I have no problem
with changing it.  But I think we should try really hard to avoid
creating a more subtle trap in the process.

...Robert
Re: [HACKERS] postgres 8.3.8 and Solaris 10_x86 64 bit problems?
Are you sure about this?  When I try to build and don't have openssl in
the lib/include path, it claims it needs it.  As I'm building 64-bit, I
can now build postgres in 64-bit with openssl 0.9.8k just fine.
However, when I run it I'm getting the same error message.

I'm curious if this is a lost hope.  My boss is recommending we flatten
the Sun box and install redhat linux (which I'm fine with).  I'd rather
not, as threading in Solaris is better.  Thoughts?

thanks

Zdenek Kotala wrote:
> You can look at
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=ghost_mothdt=2009-10-07%2021:06:00
> to see how it is built.  You also do not need your own version of
> OpenSSL; all security fixes are backported.  It is located in
> /usr/sfw/lib or /usr/sfw/lib/64.  Sometimes there are problems with
> gcc and the Solaris linker.  IIRC, I had a problem with PL/Perl
> compilation.
>
> 	Zdenek
>
> On 8.10.09 03:48, u235sentinel wrote:
>> So I compiled postgres with Solaris 10 and have problems running it.
>>
>> # ./pg_ctl
>> ld.so.1: pg_ctl: fatal: relocation error: R_AMD64_32: file
>> /usr/local/postgres64/lib/libpq.so.5: symbol (unknown): value
>> 0xfd7fff1cf210 does not fit
>> Killed
>>
>> # ldd pg_ctl
>>   libpq.so.5 => /usr/local/postgres64/lib/libpq.so.5
>>   libm.so.2 => /usr/lib/64/libm.so.2
>>   libxml2.so.2 => /usr/lib/64/libxml2.so.2
>>   libz.so.1 => /usr/lib/64/libz.so.1
>>   libreadline.so.6 => /usr/local/lib/libreadline.so.6
>>   libcurses.so.1 => /usr/lib/64/libcurses.so.1
>>   librt.so.1 => /usr/lib/64/librt.so.1
>>   libsocket.so.1 => /usr/lib/64/libsocket.so.1
>>   libc.so.1 => /usr/lib/64/libc.so.1
>>   libpthread.so.1 => /usr/lib/64/libpthread.so.1
>>   libnsl.so.1 => /lib/64/libnsl.so.1
>>   libgcc_s.so.1 => /usr/sfw/lib/amd64/libgcc_s.so.1
>>   libaio.so.1 => /lib/64/libaio.so.1
>>   libmd.so.1 => /lib/64/libmd.so.1
>>   libmp.so.2 => /lib/64/libmp.so.2
>>   libscf.so.1 => /lib/64/libscf.so.1
>>   libdoor.so.1 => /lib/64/libdoor.so.1
>>   libuutil.so.1 => /lib/64/libuutil.so.1
>>   libgen.so.1 => /lib/64/libgen.so.1
>>
>> # file /usr/local/postgres64/lib/libpq.so.5
>> /usr/local/postgres64/lib/libpq.so.5: ELF 64-bit LSB dynamic lib
>> AMD64 Version 1 [SSE CMOV], dynamically linked, not stripped
>>
>> What am I missing???  Here's my environment: Solaris 10 x86_64 with
>> postgres 8.3.8 and openssl 0.9.8k, using gcc version 3.4.3
>> (csl-sol210-3_4-branch+sol_rpath), Sun Studio 12.1 and GNU Make 3.80.
>> I've even monkeyed with LD_LIBRARY_PATH but am getting the same
>> issues.  Seems when I don't compile in openssl everything is fine.
>>
>> Thanks!
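Given the relocation error only appears with openssl compiled in, one thing worth double-checking is that the build links against the 64-bit OpenSSL under /usr/sfw rather than a 32-bit copy earlier on the search path. A build sketch along those lines (the paths follow Zdenek's note; the flags are the usual suspects for this situation, not a verified recipe):

```shell
# Point the build at the bundled 64-bit OpenSSL and record a runtime
# library path, so libpq does not resolve against 32-bit libssl/libcrypto.
CC="gcc -m64" ./configure --with-openssl \
    LDFLAGS="-L/usr/sfw/lib/64 -R/usr/sfw/lib/64"
make && make install

# Afterwards, verify that every dependency resolves to a 64-bit object:
ldd /usr/local/postgres64/lib/libpq.so.5
file /usr/sfw/lib/64/libssl.so
```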
Re: [HACKERS] postgres 8.3.8 and Solaris 10_x86 64 bit problems?
> I'm curious if this is a lost hope.  My boss is recommending we
> flatten the Sun box and install redhat linux (which I'm fine with).
> I'd rather not as threading in Solaris is better.

Maybe solaris threads were better 10-15 years ago, but I'm not
convinced that is still the case.  Any data supporting that argument,
solaris 10 threads vs. linux 2.6.11+ kernel (p)threads?

Another thing to consider in your decision is that Sun was just bought
by Oracle, leaving the solaris road map up in the air.  At least for
me, Linux is a bit more reassuring ... or freebsd.

-- 
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
[HACKERS] foreign-key inference join removal
On Thu, Sep 17, 2009 at 5:14 PM, I wrote:
> I think that the next project in this area should be to try to support
> removal of INNER joins.  (Removal of SEMI, ANTI, and FULL joins seems
> unlikely ever to be interesting.)  That will require teaching the
> planner about foreign key constraints, which interestingly opens up
> some other optimization opportunities.  An INNER join that can never
> match zero rows can alternatively be implemented as a LEFT join, which
> can be reordered in some situations where an inner join can't.  e.g.
> A LJ (B IJ C ON Pbc) ON Pab can be implemented as
> (A LJ B ON Pab) LJ/IJ C ON Pbc if it is the case that for every row in
> B, there will be at least one row in C such that Pbc holds.

A few more thoughts on this topic.

Suppose we define a new join type called inner_or_left_join.  This
means that we've proven that every outer row has at least one join
partner, so that we'll get the same results whichever way we implement
it.  We can prove this for either inner joins or left joins, or for
right joins with the sides reversed, by checking that:

(1) The inner rel is a baserel with no restriction clauses.
(2) All the join clauses are merge-joinable.
(3) There is a table on the outer side of the join with a foreign key
    constraint referencing the inner table, such that the columns of
    the foreign key constraint and the chosen equality operators
    exactly match up with the join clauses (no extra columns, no extra
    join clauses).
(4) All the relevant columns of the outer table are NOT NULL.

I am not quite sure how to check for #3 and #4, nor am I sure where the
best place to put this logic is.

Once this infrastructure is in place, it opens up a number of
interesting optimization opportunities.

1. Inner joins that are relabelled as inner_or_left joins can be
removed using the same logic that we already use for left joins.

2. Inner_or_left joins can be reordered more flexibly than either inner
joins or left joins.  Consider:

	A LJ (B IJ C ON Pbc) ON Pab

If B IJ C is inner_or_left and Pbc is strict, the joins commute.
Further, I believe that if the LJ is inner_or_left and Pab is strict,
we can commute the joins provided that, in the process, we change the
join between A and B to be an inner join (no longer inner_or_left).
But if neither join is inner_or_left, then no reordering is possible.

3. Some path types are implemented only for inner joins but not for
left joins; these can also be used for inner_or_left joins.  For
example, consider A LJ B ON Pab.  If the join can be proven
inner_or_left, we have the option of treating it as an inner join and
using A rather than B as the inner rel.

Thoughts?

...Robert
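A concrete shape of the case being targeted (schema invented for illustration):

```sql
CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child (
    id        int PRIMARY KEY,
    parent_id int NOT NULL REFERENCES parent (id)
);

-- Conditions (1)-(4) hold for the join below: parent (the inner rel) has
-- no restriction clauses, the join clause is merge-joinable equality,
-- child.parent_id references parent.id with exactly matching columns and
-- operator, and child.parent_id is NOT NULL.  Every child row therefore
-- has exactly one join partner, so the inner join can be relabelled
-- inner_or_left -- and, since no parent columns are used in the output,
-- removed outright.
SELECT child.id
FROM child
JOIN parent ON parent.id = child.parent_id;
```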
Re: [HACKERS] Reworks for Access Control facilities (r2363)
Heikki Linnakangas wrote: KaiGai Kohei wrote:

1) Creation of a database object

In the SELinux model, when a user tries to create a new object (not limited to database objects; a file or socket, for example), a default security context is assigned to the new object, and then SELinux checks whether or not the user has the privilege to create a new object labeled with that security context.

When we create a new table, the default PG model checks the ACL_CREATE privilege on the namespace that is to own the new table. DefineRelation() invokes pg_namespace_aclcheck() with the OID of the namespace, but we cannot see any properties of the new table from inside pg_namespace_aclcheck(). It checks permissions on the pair of a user and a namespace.

The SE-PG model, on the other hand, follows the above principle. When we create a new table, SE-PG computes a default security context to be assigned, and then checks the security policy to see whether the user is allowed to create a new table labeled with that context. It checks permissions on the pair of a user and the new table itself.

I don't think I buy that argument. Can't we simply decide that in PostgreSQL the granularity is different, and you can only create policies governing creation of objects on the basis of the schema+user combination, not on the properties of the new object? AFAICS it wouldn't violate the principle of Mandatory Access Control.

No, it violates the principle. I omitted a case to simplify the explanation. When we create a new object, we can provide an explicit security context to be assigned to the new object, instead of the default one. In that case, SELinux checks the privilege to create the object with the given security context. (If it is disallowed, the creation fails.)
If we checked the MAC permission to create a new object based on the pair of the user and the schema that owns the new one, it would also allow users to create a new object with an arbitrary security context, because the check is not applied to the security context of the new object itself. This is why SELinux is MAC: it never allows creation of a new object whose security context would violate the policy. The only way to enforce this is to check privileges on the pair of the user and the new object. Thus, SELinux defines its permission to create a new object for various kinds of objects, not limited to database objects: files, sockets, IPC, X windows and so on.

2) AND-condition for all the privileges

When a certain action requires multiple permissions at one time, the SELinux principle is that all of the permissions have to be checked. If any one of them is not allowed, the action is disallowed. In other words, all the conditions are chained by AND. This principle enables us to analyze the data flows between users and resources from the security policy alone, without implementation details. If a certain permission (e.g. db_table:{select}) could override another permission (e.g. db_column:{select}), it would implicitly mean a possibility of information leaks/manipulations, even if the security policy said this user cannot read data from the column.

The default PG model, on the other hand, allows checks on certain objects to be bypassed. For example, column-level privileges are only checked when a user does not have sufficient permissions on the target table. If SELECT a,b FROM t is given, pg_attribute_aclcheck() may not be invoked when the user has the needed privileges on the table t.

Hmm, I see. Yes, it does seem like we'd need to change such permission checks to accommodate both models.

I'm not clear why we need to rework the permission checks here. DAC and MAC operate orthogonally and independently. DAC allows table-level privileges to override column-level privileges, according to the default PG model.
That seems fine to me. MAC, on the other hand, checks both permissions. That is also fine.

3) The superuser is not an exception to access control.

This is a similar issue to 2). Yeah. The following code is part of AlterFunctionOwner_internal():

    /* Superusers can always do it */
    if (!superuser())
    {
        /* Otherwise, must be owner of the existing object */
        if (!pg_proc_ownercheck(procOid, GetUserId()))
            aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_PROC,
                           NameStr(procForm->proname));

        /* Must be able to become new owner */
        check_is_member_of_role(GetUserId(), newOwnerId);

        /* New owner must have CREATE privilege on namespace */
        aclresult = pg_namespace_aclcheck(procForm->pronamespace,
                                          newOwnerId, ACL_CREATE);
        if (aclresult != ACLCHECK_OK)
            aclcheck_error(aclresult, ACL_KIND_NAMESPACE,
                           get_namespace_name(procForm->pronamespace));
    }

From the perspective of the default PG model, this code
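The AND-chaining point in 2) can be illustrated with a small sketch (not SE-PostgreSQL code; the function names and boolean inputs are invented for illustration). It contrasts the default PG DAC behavior, where a table-level privilege short-circuits the column checks, with the MAC principle that every applicable permission must hold.

```python
# Illustrative sketch of DAC short-circuiting vs. MAC AND-chaining.
# Inputs are plain booleans standing in for the outcome of each check.

def dac_allows(table_priv, column_privs):
    # Default PG model: a table-level privilege satisfies the request,
    # so column-level checks may never even be invoked.
    return table_priv or all(column_privs)

def mac_allows(table_perm, column_perms):
    # SELinux model: all permissions are checked and ANDed together;
    # no permission can override another.
    return table_perm and all(column_perms)

# A user holds SELECT on table t but is denied on one of its columns.
assert dac_allows(True, [True, False]) is True    # DAC: allowed anyway
assert mac_allows(True, [True, False]) is False   # MAC: denied
assert mac_allows(True, [True, True]) is True     # MAC: all checks pass
```

This is why the two models can coexist without reworking the DAC checks: DAC and MAC are evaluated independently, and a request succeeds only if both models allow it.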
Re: [HACKERS] Reworks for Access Control facilities (r2363)
Robert Haas wrote: On Sat, Oct 17, 2009 at 9:53 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:

This raises an important point: we need *developer documentation* on how to write SE-PgSQL compliant permission checks. Not only for authors of 3rd party modules but for developers of PostgreSQL itself. Point 2) above needs to be emphasized; it's a big change in the way permission checks have to be programmed, and one that I hadn't realized before. I haven't been paying much attention, but neither have most other developers, so we need clear documentation.

This is a good point. Throughout these discussions, there has been a concern that whatever is implemented here will be unmaintainable, because we don't have any committers who are familiar with the ins and outs of SE-Linux and MAC (and not too many other community members interested in the topic, either). So some developer documentation seems like it might help. On the other hand, KaiGai has made several attempts at documentation and several attempts at patches, and we're not really any closer to having SE-PostgreSQL in core than we were a year ago. I think that's partly because KaiGai tried to bite off far too much initially (still?), partly because of technical problems with the patches, partly because the intersection of people who are experts in PostgreSQL and people who are experts in MAC seems to be empty, and partly because, as much as people sorta kinda like this feature, nobody other than KaiGai has really been willing to step up and pour into this project the kind of resources it will likely require to be successful. I have to admit that I'm kind of giving up hope. We seem to be going in circles, and I don't think anything new is being said on this thread that hasn't been said before.

We may not always be able to find the right way to the mountain summit. Indeed, it seems that we have returned to the original design, which deploys SE-PgSQL's hooks at the strategic points.
But there is a significant improvement: we learned that several designs we already tried were on a rocky path, although they looked like an easy path at first.

I agree with Heikki's suggestion. In addition to user documentation, we need documentation from the developer's viewpoint, describing what permissions are defined, what the purpose of SE-PgSQL's hooks is, and when/where they are called.

Thanks. BTW, as I noted in my last message, I have to devote my time to the Japan Linux Symposium this week, so responses may be delayed. Sorry. http://events.linuxfoundation.org/events/japan-linux-symposium/schedule

--
OSS Platform Development Division, NEC KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] COPY enhancements
Actually, I thought of a solution for the wrap-around some time back. Let me try to put my initial thoughts into it; maybe it will get refined over conversation.

Transaction wrap-around failure: this problem is present even in today's transaction id scenario, and the only way we avoid it is by freezing. Can we use a similar approach? Here, freezing would mean freezing the sub-transaction in order to avoid sub-transaction wrap-around failure.

I think when we use a sub-transaction, the tuple would have xmin set to the sub-transaction id (correct me if I am wrong). If the tuple insertion is successful, we make it equal to the parent transaction id. This way, we get a chance to re-use the previously used sub-transaction id. It would mean accessing the tuple again after the sub-transaction completes. On the recovery front, the freezing should be logged in redo.

With this approach, we need only one sub-transaction id for the entire COPY. If an insert succeeds for a copied row, we go to the tuple again and sub-freeze it. If it fails, we roll back the sub-transaction. But then, for every failed row, we would need a new transaction id. To avoid this, perhaps we can have one transaction id to mark the failures and freeze all the failed rows with that transaction id. I am just throwing out an idea for consideration.

Thanks, Gokul.
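The scheme above can be sketched as a toy model (this is not PostgreSQL code, just a simulation of the bookkeeping; the xid values and the `copy_rows` helper are invented for illustration). One sub-transaction id serves the whole COPY: a successfully inserted tuple has its xmin rewritten ("sub-frozen") to the parent xid so the same sub-xid can be reused for the next row, while failed rows are all tagged with a single shared failure xid that would later be rolled back as a group.

```python
# Toy model of the one-subxid COPY scheme sketched above.
PARENT_XID = 100   # the COPY's parent transaction
SUB_XID = 101      # the single reusable sub-transaction id
FAIL_XID = 102     # one shared xid marking all failed rows

def copy_rows(rows, insert_ok):
    """Simulate COPY: tag each tuple's xmin per the freeze/fail scheme."""
    table = []
    for row in rows:
        tup = {"data": row, "xmin": SUB_XID}
        if insert_ok(row):
            # "Sub-freeze": rewrite xmin to the parent xid so SUB_XID
            # is free to be reused for the next row.
            tup["xmin"] = PARENT_XID
        else:
            # Failed insert: mark with the shared failure xid, which is
            # rolled back once at the end, making these rows invisible.
            tup["xmin"] = FAIL_XID
        table.append(tup)
    return table

table = copy_rows(["a", "bad", "c"], lambda r: r != "bad")
assert [t["xmin"] for t in table] == [100, 102, 100]
```

The point of the simulation is the accounting: only three xids are consumed no matter how many rows are copied, instead of one sub-transaction id per failed row.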