Re: [HACKERS] Allowing NOT IN to use ANTI joins
On Wed, Jul 2, 2014 at 9:25 PM, Jeevan Chalke jeevan.cha...@enterprisedb.com wrote:

On Sun, Jun 29, 2014 at 4:18 PM, David Rowley dgrowle...@gmail.com wrote: I think I'm finally ready for a review again, so I'll update the commitfest app.

I have reviewed this at the code level.

1. The patch applies cleanly.
2. make / make install / make check are all fine.

No issues found so far. However, some cosmetic points:

1.
 * The API of this function is identical to convert_ANY_sublink_to_join's,
 * except that we also support the case where the caller has found NOT EXISTS,
 * so we need an additional input parameter under_not.

Since we now support NOT IN handling in convert_ANY_sublink_to_join(), we have an under_not parameter there too. So the above comment near convert_EXISTS_sublink_to_join() is no longer valid.

Nice catch. I've removed the last 2 lines from that comment now.

2. The following change is unnecessary and can be removed:

@@ -506,6 +560,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
 					return NULL;
 				}
 			}
+		}
 		/* Else return it unmodified */
 		return node;

Removed.

3. Typos:

a.
+ * queryTargetListListCanHaveNulls ...
+queryTargetListCanHaveNulls(Query *query)

The function name is queryTargetListCanHaveNulls(), not queryTargetListListCanHaveNulls().

Fixed.

b. "For example the presense of; col IS NOT NULL, or col = 42 would allow" - presense should be presence.

Fixed.

4. get_attnotnull() throws an error for an invalid relid/attnum. But I see other get_att* functions which do not throw an error and instead return some invalid value, e.g. NULL in the case of attname, InvalidOid in the case of atttype and InvalidAttrNumber in the case of attnum. I have observed that we cannot return such an invalid value for type boolean, and I guess that's the reason you are throwing an error. But it somehow looks unusual. You have put a note there, which is good. But yes, callers of this function should be careful enough and should not assume it works like the other get_att*() functions.
However, for this patch I would simply accept this solution. But I wonder if someone thinks the other way round.

hmmm, I remember thinking about that at the time. It was a choice between returning false or throwing an error. I decided that it probably should be an error, as that's what get_relid_attribute_name() was doing. Just search lsyscache.c for "cache lookup failed for attribute %d of relation %u".

Testing more on the SQL level.

Thank you for reviewing this.

I think in the attached I've managed to nail down the logic in find_inner_rels(). It was actually simpler than I thought. While working on this today I also noticed that RIGHT joins can still exist at this stage of planning, and also that FULL joins are not yet transformed. e.g.: a FULL JOIN b ON a.id = b.id WHERE a.id IS NOT NULL would later become a LEFT JOIN, but at this stage in planning it's still a FULL JOIN. I've added some new regression tests which test some of these join types.

With further testing I noticed that the patch was not allowing ANTI joins in cases like this:

explain select * from a where id not in(select x from b natural join c);

even if b.x and c.x were NOT NULL columns. This is because the TargetEntry for x has a varno which belongs to neither b nor c; it's actually the varno for the join itself. I've added some code to detect this in find_inner_rels(), but I'm not quite convinced yet that it's in the correct place... I'm wondering if a future use of find_inner_rels() would need to see only relations rather than join varnos. The code I added does get the above query using ANTI joins, but it does have a bit of a weird quirk, in that for it to perform an ANTI JOIN, b.x must be NOT NULL. If c.x is NOT NULL and b.x is nullable, then there will be no ANTI JOIN. There must be some code somewhere that chooses which of the 2 vars should go into the target list for natural joins.

The above got me thinking that the join conditions can also be used to prove not-nullness too.
Take the query above as an example: x can never actually be NULL; the condition b.x = c.x ensures that. I've only gone as far as adding some comments which explain that this is a possible future optimisation. I've not had time to look at this yet and I'd rather see the patch go in without it than risk delaying this until the next commitfest... unless of course someone thinks it's just too weird a quirk to have in the code.

Attached is the updated version of the patch.

Regards

David Rowley

However, assigning it to the author to think on the above cosmetic issues.

Thanks
--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

not_in_anti_join_5257082_2014-07-05.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
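The NOT NULL requirement running through this thread comes from NOT IN's three-valued-logic semantics. A small Python sketch (illustrative only, not PostgreSQL code; the helper names are invented) shows why rewriting "id NOT IN (subquery)" as an ANTI join is only safe when the inner column provably cannot produce NULLs:

```python
# Why "x NOT IN (...)" can only become an ANTI join when the inner side
# is provably NOT NULL: under SQL's three-valued logic, a single NULL on
# the inner side makes "x NOT IN (...)" UNKNOWN (never true) for every x.

def sql_not_in(value, inner_values):
    """Emulate SQL NOT IN for a non-NULL outer value (None stands in for NULL)."""
    if any(v is None for v in inner_values):
        return False   # comparison with NULL yields UNKNOWN, which is filtered out
    return value not in inner_values

def anti_join(outer, inner_values):
    """A naive anti join: keep outer rows with no match on the inner side."""
    return [o for o in outer if o not in inner_values]

outer = [1, 2, 3]

# With no NULLs on the inner side, NOT IN and the anti join agree:
assert [o for o in outer if sql_not_in(o, [2])] == anti_join(outer, [2]) == [1, 3]

# With a NULL on the inner side they diverge, so the transformation is unsafe:
print([o for o in outer if sql_not_in(o, [2, None])])   # []
print(anti_join(outer, [2, None]))                      # [1, 3]
```

This is why the patch goes to the trouble of proving not-nullness from catalog NOT NULL constraints, and why quals like b.x = c.x could in principle be used as an additional proof.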
Re: [HACKERS] Proposing pg_hibernate
Hello, I'm reviewing this patch. I find this feature useful, so keep up the good work. I've just begun the review of pg_hibernate.c, and have finished reviewing the other files. pg_hibernate.c will probably take some time to review, so let me give you the result of my review so far. I'm sorry for the trivial comments.

(1) The build on Windows with MSVC 2008 Express failed. The error messages are as follows (they were in Japanese; translated here):

.\contrib\pg_hibernator\pg_hibernator.c(50): error C2373: 'CreateDirectoryA': redefinition; different type modifiers
.\contrib\pg_hibernator\pg_hibernator.c(196): error C2373: 'CreateDirectoryA': redefinition; different type modifiers
.\contrib\pg_hibernator\pg_hibernator.c(740): error C2198: 'CreateDirectoryA': too few arguments in call

The cause is that CreateDirectory() is a Win32 API. When I renamed it, the build succeeded. I think you don't need to make it a function, because its processing is simple and it is used in only one place.

(2) Please look at the following messages. The Block Reader read all blocks successfully but exited with 1, while the Buffer Saver exited with 0. I think the Block Reader should also exit with 0 when it completes its job successfully, because exit code 0 gives the impression of success.

LOG: worker process: Buffer Saver (PID 2984) exited with exit code 0
...
LOG: Block Reader 1: all blocks read successfully
LOG: worker process: Block Reader 2 (PID 6064) exited with exit code 1

In addition, the names "Buffer Saver" and "Block Reader" don't correspond, although they both operate on the same objects. I suggest using the word Buffer or Block for both processes.

(3) Please add the definition of the variable PGFILEDESC in the Makefile. See pg_upgrade_support's Makefile for an example. It is necessary for the versioning info of DLLs on Windows. Currently, other contrib libraries lack the versioning info; Michael Paquier is adding the missing versioning info to other modules for 9.5.

(4) Remove the following #ifdef, because you are attempting to include this module in 9.5.
#if PG_VERSION_NUM >= 90400

(5) Add header comments at the beginning of source files like other files.

(6) Add a user documentation SGML file in doc/src/sgml instead of README.md. For reference, I noticed the following mistakes in README.md:

instalation -> installation
`Block Reader` threads -> `Block Reader` processes

(7) The message could not open \"%s\": %m should better be: could not open file \"%s\": %m, because the latter is already used in many places. Please find and use existing messages for other places as much as possible. That will reduce the translation effort for other languages.

(8) fileClose() never returns false despite its comment:

/* Returns true on success, doesn't return on error */

(9) I think it would be better for the Block Reader to include the name and/or OID of the database in its success message:

LOG: Block Reader 1: restored 14 blocks

(10) I think the save file name should better be "database OID.save" instead of "some arbitrary integer.save". That also means %u.save instead of %d.save. What do you think?

(11) Why don't you remove misc.c, removing unnecessary functions and keeping just the really useful ones in pg_hibernator.c? For example, I don't think fileOpen(), fileClose(), fileRead() and fileWrite() need to be separate functions (see src/backend/postmaster/pgstat.c). And, there's only one call site for each of the following functions:

readDBName
getSavefileNameBeingRestored
markSavefileBeingRestored

Regards
MauMau
[HACKERS] unused local variable
Commit 8b6010b8350a1756cd85595705971df81b5ffc07 eliminated the only usage of a variable called typeStruct in plpy_spi.c, but left the declaration and the code that sets a value for it. This is generating a warning when I build. I would have just pushed a fix, but I was concerned that it might have been left on purpose for some follow-on patch. Any objections to the attached, to silence the warning?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/pl/plpython/plpy_spi.c b/src/pl/plpython/plpy_spi.c
index 6c3eeff..465b316 100644
--- a/src/pl/plpython/plpy_spi.c
+++ b/src/pl/plpython/plpy_spi.c
@@ -93,7 +93,6 @@ PLy_spi_prepare(PyObject *self, PyObject *args)
 			HeapTuple	typeTup;
 			Oid			typeId;
 			int32		typmod;
-			Form_pg_type typeStruct;
 
 			optr = PySequence_GetItem(list, i);
 			if (PyString_Check(optr))
@@ -129,7 +128,6 @@ PLy_spi_prepare(PyObject *self, PyObject *args)
 			optr = NULL;
 
 			plan->types[i] = typeId;
-			typeStruct = (Form_pg_type) GETSTRUCT(typeTup);
 			PLy_output_datum_func(&plan->args[i], typeTup);
 			ReleaseSysCache(typeTup);
 		}
Re: [HACKERS] delta relations in AFTER triggers
Kevin Grittner kgri...@ymail.com wrote: Because this review advances the patch so far, it may be feasible to get it committed in this CF. I'll see what is needed to get there and maybe have a patch toward that end in a few days.

It appears that I need to create a new execution node and a way for SPI calls to use it. That seems beyond the scope of what is fair to include in this CF, even if I got something put together in the next couple of days. FWIW, I think that once I have the other pieces, what I initially posted is committable as the first patch of a series. A second patch would add the new execution node and code to allow SPI calls to use it. The patch that David submitted, as modified by myself and with further refinements that David is working on, would be the third patch. An implementation in plpgsql would be the fourth. Other PLs would be left for people more familiar with those languages to implement.

What I was hoping for in this CF was a review to confirm the approach before proceeding to build on this foundation. David found nothing to complain about, and went to the trouble of writing code to confirm that it was actually generating complete results which were usable. Robert doesn't feel this constitutes a serious code review. I'm not aware of any changes which are needed to the pending patch once the follow-on patches are complete.

I'm moving this to Needs Review status. People will have another chance to review this patch when the other code is available, but if we want incremental maintenance of materialized views in 9.5, delaying review of what I have submitted in this CF until the next CF will put that goal in jeopardy.
The one thing I don't feel great about is that it's using tuplestores of the actual tuples rather than just their TIDs; but it seems to me that we need the full tuple to support triggers on FDWs, so the TID approach would be an optimization for a subset of the cases, and would probably be more appropriate, if we do it at all, in a follow-on patch after this is working (although I think it is an optimization we should get into 9.5 if we are going to do it). If you disagree with that assessment, now would be a good time to explain your reasoning.

A minor point is psql tab-completion for the REFERENCING clause. It seems to me that's not critical, but I might slip it in anyway before commit.

I took a look at whether I could avoid making OLD and NEW non-reserved keywords, but I didn't see how to do that without making FOR at least partially reserved. If someone sees a way to do this without creating three new unreserved keywords (REFERENCING, OLD, and NEW), I'm all ears.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] postgresql.auto.conf and reload
Amit Kapila amit.kapil...@gmail.com writes: Please find the patch attached to address the above concern. I have updated the docs, so that users can be aware of such behaviour.

I'm in the camp that says that we need to do something about this, not just claim that it's operating as designed. AFAICS, the *entire* argument for having ALTER SYSTEM at all is ease of use; but this behavior is not what I would call facilitating ease of use. In particular, if you are conveniently able to edit postgresql.conf, what the heck do you need ALTER SYSTEM for?

One possible answer is to modify guc-file.l so that only the last value supplied for any one GUC gets processed. The naive way of doing that would be O(N^2) in the number of uncommented settings, which might or might not be a problem; if it is, we could no doubt devise a smarter data structure.

regards, tom lane
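The "last value wins" behaviour being suggested can be sketched outside of guc-file.l. The helper below is purely illustrative (it is not the actual flex-based parser), but it shows the "smarter data structure" point: a dict keyed by setting name gives last-value-wins de-duplication in one O(N) pass instead of the naive O(N^2) pairwise comparison:

```python
# Illustrative sketch (not guc-file.l): de-duplicate parsed config settings
# so that only the last value supplied for any one setting survives.
# A dict keyed by (case-folded) name makes this a single O(N) pass.

def dedupe_settings(parsed):
    """parsed: list of (name, value) pairs in file order."""
    last = {}
    for name, value in parsed:
        last[name.lower()] = value   # later occurrences overwrite earlier ones
    return list(last.items())

settings = [
    ("shared_buffers", "128MB"),     # from postgresql.conf
    ("work_mem", "4MB"),
    ("shared_buffers", "1GB"),       # e.g. appended by ALTER SYSTEM
]
print(dedupe_settings(settings))
# [('shared_buffers', '1GB'), ('work_mem', '4MB')]
```

The same idea applies whether the de-duplication happens while building the list (as in ParseConfigFp) or in a pass over the finished list.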
Re: [HACKERS] unused local variable
Kevin Grittner kgri...@ymail.com writes: Commit 8b6010b8350a1756cd85595705971df81b5ffc07 eliminated the only usage of a variable called typeStruct in plpy_spi.c, but left the declaration and the code that sets a value for it. This is generating a warning when I build. I would have just pushed a fix, but I was concerned that it might have been left on purpose for some follow-on patch. Any objections to the attached, to silence the warning?

Huh. Odd that I did not see such a warning here. No objection to removing the dead variable.

regards, tom lane
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: Attached is a delta patch between version 1.2 and 1.3, and also a completely updated patch.

Just to note that I've started looking at this, and I've detected a rather significant omission: there's no check that the join operator has anything to do with the subquery's grouping operator. I think we need to verify that they are members of the same opclass, as relation_has_unique_index_for does.

regards, tom lane
Re: [HACKERS] unused local variable
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kgri...@ymail.com writes: typeStruct in plpy_spi.c

No objection to removing the dead variable.

Done.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] 9.4 documentation: duplicate paragraph in logical decoding example
Hi, while reading the logical decoding docs, I came across a duplicated paragraph in doc/src/sgml/logicaldecoding.sgml - in the current master branch, lines 108 to 115 are the same as lines 117 to 124. I've attached a patch which removes the second instance of that paragraph.

In case it is intended to demonstrate that the changes in the stream were not consumed by pg_logical_slot_peek_changes(), the comment in line 117 should be removed, or reworded like "the changes have not been consumed by the previous command", just to avoid making it look like that paragraph had been duplicated by accident :)

Regards, Christoph
--
Spare Space

diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index a2108d6..5fa2a1e 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -114,15 +114,6 @@ postgres=# SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, N
  0/16E0B90 | 690 | COMMIT 690
 (3 rows)
 
-postgres=# -- You can also peek ahead in the change stream without consuming changes
-postgres=# SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL);
- location  | xid |                           data
------------+-----+----------------------------------------------------------
- 0/16E09C0 | 690 | BEGIN 690
- 0/16E09C0 | 690 | table public.data: INSERT: id[integer]:3 data[text]:'3'
- 0/16E0B90 | 690 | COMMIT 690
-(3 rows)
-
 postgres=# -- options can be passed to output plugin, to influence the formatting
 postgres=# SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'include-timestamp', 'on');
  location  | xid | data
[HACKERS] Performance regression: 9.2+ vs. ScalarArrayOpExpr vs. ORDER BY
Spent some time analyzing a severe performance regression on a 9.1 to 9.3 upgrade for a user on IRC. Narrowed it down to this: commit 807a40c5 fixed a bug in handling of the (new in 9.2) functionality of ScalarArrayOpExpr in btree index quals, forcing the results of scans including such a qual to be treated as unordered (because the order can in fact be wrong). However, this kills any chance of using the same index _without_ the SAOP to get the benefit of its ordering. For example:

regression=# create index on tenk1 (ten,unique1,thousand);
CREATE INDEX
regression=# set enable_sort = false;
SET
regression=# explain analyze select * from tenk1
  where thousand in (19,29,39,49,57,66,77,8,90,12,22,32)
    and (ten >= 5) and (ten > 5 or unique1 > 5000)
  order by ten,unique1 limit 1;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1000294.71..1000294.71 rows=1 width=244) (actual time=0.889..0.891 rows=1 loops=1)
   ->  Sort  (cost=1000294.71..1000294.81 rows=42 width=244) (actual time=0.884..0.884 rows=1 loops=1)
         Sort Key: ten, unique1
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on tenk1  (cost=44.34..294.50 rows=42 width=244) (actual time=0.194..0.687 rows=80 loops=1)
               Recheck Cond: (thousand = ANY ('{19,29,39,49,57,66,77,8,90,12,22,32}'::integer[]))
               Filter: ((ten >= 5) AND ((ten > 5) OR (unique1 > 5000)))
               Rows Removed by Filter: 40
               Heap Blocks: exact=100
               ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..44.33 rows=120 width=0) (actual time=0.148..0.148 rows=120 loops=1)
                     Index Cond: (thousand = ANY ('{19,29,39,49,57,66,77,8,90,12,22,32}'::integer[]))
 Planning time: 0.491 ms
 Execution time: 1.023 ms
(13 rows)

(The real case had larger rowcounts of course, but showed the same effect of using a Sort plan even when enable_sort=false.)

Obviously one can create a (ten,unique1) index (which would otherwise be almost completely redundant with the 3-column one), but that wastes space and eliminates some index-only-scan opportunities. Similarly one can hack the query, e.g. using WHERE (thousand+0) IN (...), but that is as ugly as all sin.

I've experimented with the attached patch, which detects when this situation might have occurred and does another pass to try and build ordered scans without the SAOP condition. However, the results may not be quite ideal, because at least in some test queries (not all) the scan with the SAOP included in the indexquals is being costed higher than the same scan with the SAOP moved to a Filter, which seems unreasonable. (One of the affected queries is the regression test for the original bug, which this patch does not yet try to fix and therefore currently fails regression on.)

Thoughts?

--
Andrew (irc:RhodiumToad)

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 42dcb11..6ae8e33 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -50,7 +50,8 @@ typedef enum
 {
 	SAOP_PER_AM,	/* Use ScalarArrayOpExpr if amsearcharray */
 	SAOP_ALLOW,		/* Use ScalarArrayOpExpr for all indexes */
-	SAOP_REQUIRE	/* Require ScalarArrayOpExpr to be used */
+	SAOP_REQUIRE,	/* Require ScalarArrayOpExpr to be used */
+	SAOP_SKIP_LOWER	/* Require lower ScalarArrayOpExpr to be eliminated */
 } SaOpControl;
 
 /* Whether we are looking for plain indexscan, bitmap scan, or either */
@@ -118,7 +119,8 @@ static void get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 static List *build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 				  IndexOptInfo *index, IndexClauseSet *clauses,
 				  bool useful_predicate,
-				  SaOpControl saop_control, ScanTypeControl scantype);
+				  SaOpControl saop_control, ScanTypeControl scantype,
+				  bool *saop_retry);
 static List *build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
 				List *clauses, List *other_clauses);
 static List *generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo *rel,
@@ -734,6 +736,7 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 {
 	List	   *indexpaths;
 	ListCell   *lc;
+	bool		saop_retry = false;
 
 	/*
 	 * Build simple index paths using the clauses.  Allow ScalarArrayOpExpr
@@ -742,7 +745,23 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	indexpaths = build_index_paths(root, rel,
 								   index, clauses,
 								   index->predOK,
-								   SAOP_PER_AM, ST_ANYSCAN);
+								   SAOP_PER_AM, ST_ANYSCAN, &saop_retry);
+
+	/*
+	 * If we allowed any ScalarArrayOpExprs on an index with a useful sort
+	 * ordering, then try again without them, since otherwise we miss important
+	 * paths where the index
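The ordering hazard that commit 807a40c5 guards against can be sketched outside the planner. In this illustrative Python (not planner code), a list stands in for an index on (a, b): a SAOP qual on the non-leading column b is executed as one ordered run per array element, and the concatenated runs are no longer ordered by a - which is exactly why such scans must be treated as unordered, and why moving the SAOP to a Filter preserves the index ordering:

```python
# Illustrative sketch: why a ScalarArrayOpExpr on a non-leading index
# column makes the scan's output unordered with respect to the index's
# sort order, while applying the same condition as a filter does not.

index = sorted((a, b) for a in range(3) for b in (10, 20))  # index on (a, b)

def saop_scan(b_keys):
    """One index descent per array element; runs are simply concatenated."""
    out = []
    for k in b_keys:                            # b = ANY ('{...}')
        out += [t for t in index if t[1] == k]  # each run is ordered by a
    return out

rows = saop_scan([10, 20])
print(rows)   # [(0, 10), (1, 10), (2, 10), (0, 20), (1, 20), (2, 20)]
assert [t[0] for t in rows] != sorted(t[0] for t in rows)  # not ordered by a

# Applying the SAOP as a filter over the ordered index scan instead keeps
# the (a, b) ordering, which is the shape of plan the patch tries to allow:
filtered = [t for t in index if t[1] in (10, 20)]
assert filtered == sorted(filtered)
```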
Re: [HACKERS] Allowing join removals for more join types
On 6 July 2014 03:20, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrowle...@gmail.com writes: Attached is a delta patch between version 1.2 and 1.3, and also a completely updated patch. Just to note that I've started looking at this, and I've detected a rather significant omission: there's no check that the join operator has anything to do with the subquery's grouping operator. I think we need to verify that they are members of the same opclass, as relation_has_unique_index_for does.

hmm, good point. If I understand this correctly, we can just ensure that the same operator is used for both the grouping and the join condition. I've attached a small delta patch which fixes this, and also attached the full updated patch.

Regards

David Rowley

subquery_leftjoin_removal_v1.4.patch
Description: Binary data

subquery_leftjoin_removal_v1.4_delta.patch
Description: Binary data
Re: [HACKERS] DISTINCT with btree skip scan
On 5 July 2014 02:03, Vik Fearing vik.fear...@dalibo.com wrote: [1] http://wiki.postgresql.org/wiki/Loose_indexscan

Thanks. It talks about DISTINCT, and also about using an index when you don't have the leading column in your WHERE clause (as well as MySQL (loose), at least Oracle (skip), SQLite (skip) and DB2 (jump) can do this). It looks like at least MySQL can also use loose index scans to implement GROUP BY in certain cases involving MIN or MAX aggregate functions (things like SELECT a, MIN(b) FROM foo GROUP BY a, given an index on (a, b)). But I'm only trying to implement the lowest hanging index skipping plan: plain old DISTINCT. I think I see roughly how to cost, plan and execute it... now to learn a lot more about PG internals...
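The loose/skip scan idea for DISTINCT can be sketched with a sorted list standing in for the btree's leading column (illustrative only, not PostgreSQL code; bisect stands in for a btree re-descent): rather than walking every index entry, the scan repeatedly seeks to the first entry greater than the last key found, so each distinct value costs roughly one O(log n) descent instead of a full walk.

```python
# Illustrative sketch of a "loose"/skip index scan for DISTINCT:
# repeatedly re-seek to the first entry greater than the current key,
# skipping over all of its duplicates in one step.

import bisect

def distinct_skip_scan(index_keys):
    """index_keys: sorted list standing in for an index's leading column."""
    result = []
    pos = 0
    while pos < len(index_keys):
        key = index_keys[pos]
        result.append(key)
        # "re-descend the btree": position on the first entry > key
        pos = bisect.bisect_right(index_keys, key, pos)
    return result

keys = [1] * 1000 + [2] * 1000 + [7] * 1000
print(distinct_skip_scan(keys))   # [1, 2, 7] after just 3 seeks, not 3000 steps
```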
Re: [HACKERS] tweaking NTUP_PER_BUCKET
* Greg Stark (st...@mit.edu) wrote: On Thu, Jul 3, 2014 at 11:40 AM, Atri Sharma atri.j...@gmail.com wrote: IIRC, last time when we tried doing bloom filters, I was short of some real world useful hash functions that we could use for building the bloom filter.

Last time, we wanted to use bloom filters in hash joins to filter out tuples that won't match any of the future hash batches, to reduce the amount of tuples that need to be spilled to disk. However the problem was that it was unclear, for a given amount of memory usage, how to pick the right size bloom filter and how to model how much it would save versus how much it would cost in reduced hash table size.

Right. There's only one hash function available, really (we don't currently support multiple hash functions..), unless we want to try and re-hash the 32bit hash value that we get back (not against trying that, but it isn't what I'd start with), and it would hopefully be sufficient for this.

I think it just required some good empirical tests and hash-join-heavy workloads to come up with some reasonable guesses. We don't need a perfect model, just some reasonable bloom filter size that we're pretty sure will usually help more than it hurts.

This would help out a lot of things, really.. Perhaps what Tomas is developing regarding test cases would help here also.

Thanks,

Stephen

signature.asc
Description: Digital signature
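For the "how big should the filter be" question raised above, the textbook bloom filter formulas give a starting point for the model: for n keys and a target false-positive rate p, the required bits are m = -n ln p / (ln 2)^2 and the hash-function count is k = (m/n) ln 2. This sketch is illustrative only (nothing from the hash join code; it also sidesteps the single-hash-function limitation mentioned above, which real code would address by re-hashing or double hashing):

```python
# Illustrative sizing sketch using the standard bloom filter formulas:
#   m = -n * ln(p) / (ln 2)^2   bits for n keys at false-positive rate p
#   k = (m / n) * ln 2          number of hash functions

import math

def bloom_parameters(n_keys, fp_rate):
    m = math.ceil(-n_keys * math.log(fp_rate) / (math.log(2) ** 2))  # bits
    k = max(1, round((m / n_keys) * math.log(2)))                    # hashes
    return m, k

# e.g. 1 million spilled inner tuples at a 1% false-positive target:
bits, hashes = bloom_parameters(1_000_000, 0.01)
print(bits // 8 // 1024, "kB,", hashes, "hash functions")
```

Such a formula only answers half the question; the harder half, as noted above, is modelling whether those bits are better spent on the hash table itself.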
Re: [HACKERS] postgresql.auto.conf and reload
On Sat, Jul 5, 2014 at 8:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: Amit Kapila amit.kapil...@gmail.com writes: Please find the patch attached to address the above concern. I have updated the docs, so that users can be aware of such behaviour. I'm in the camp that says that we need to do something about this, not just claim that it's operating as designed. AFAICS, the *entire* argument for having ALTER SYSTEM at all is ease of use; but this behavior is not what I would call facilitating ease of use. In particular, if you are conveniently able to edit postgresql.conf, what the heck do you need ALTER SYSTEM for? One possible answer is to modify guc-file.l so that only the last value supplied for any one GUC gets processed.

Another could be that during initdb all the uncommented settings are written to the postgresql.auto.conf file rather than to postgresql.conf. I think we can do this by changing code in initdb.c's setup_config(). This will ensure that unless the user is changing settings in a mixed way (some by ALTER SYSTEM and some manually by editing postgresql.conf), there will be no such problem.

The naive way of doing that would be O(N^2) in the number of uncommented settings, which might or might not be a problem; if it is, we could no doubt devise a smarter data structure.

Okay. To implement it, we can make sure during parsing of the config file that only unique elements can be present in the list. We can modify the function ParseConfigFp() to achieve this functionality. Another way could be that after the list is formed, we eliminate duplicate entries from it, though we might need to do this at multiple places. Do you have anything else in mind?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] RLS Design
Kaigai,

* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote: Can you clarify where this is coming from..? It sounds like you're referring to an existing implementation and, if so, it'd be good to get more information on how that works exactly.

Oracle VPD - Multiple Policies for Each Table, View, or Synonym
http://docs.oracle.com/cd/B19306_01/network.102/b14266/apdvpoli.htm#i1008351
It says - Note that all policies applied to a table are enforced with AND syntax.

While I'm not against using this as an example to consider, it's much more complex than what we're talking about here - and it supports application contexts which allow groups of RLS rights to be applied or not applied; essentially it allows both AND and OR for sets of RLS policies, along with default policies which are applied no matter what.

Not only Oracle VPD; it fits the attitude of defense in depth. Please assume a system that installs a network firewall, unix permissions and selinux. If somebody wants to reference an information asset within a file, he has to connect to the server from a network address allowed by the firewall configuration, AND both DAC and MAC have to allow his access.

These are not independent systems, and your argument would apply to our GRANT system also, which I hope it's agreed would make it far less useful. Note also that SELinux brings in another complexity - it needs to make system calls out to check the access.

Usually, we have to pass all of the access controls to reference the target information, not just one of the access control mechanisms installed.

This is true in some cases, and not in others. Only one role you are a member of needs to have access to a relation, not all of them. There are other examples of 'OR'-style security policies; this is merely one. I'm simply not convinced that it applies in the specific case we're talking about. In the end, I expect that either way people will be upset because they won't be able to specify fully which should be AND vs.
which should be OR with the kind of flexibility other systems provide. What I'm trying to get to is an initial implementation which is generally useful and is able to add such support later. This is similar to how roles work - your overall access includes all access granted to any roles you are a member of. You don't need SELECT rights granted to every role you are a member of to select from the table. Additionally, if an admin wants to AND the quals together then they can simply create a policy which does that rather than have 2 policies.

It seems to me a pain for database administration, if we have to pay attention not to make each RLS-policy conflict.

This notion of 'conflict' doesn't make much sense to me. What is 'conflicting' here? Each policy would simply need to stand on its own for the role which it's being applied to. That's very simple and straight-forward.

I expect 90% of RLS-policies will be configured for the PUBLIC user, to apply the same rules to everybody on access. In this case, the DBA has to ensure the target table has no policy, or that the existing policy does not conflict with the new policy to be set. I don't think it is a good idea to force these checks on the DBA.

If the DBA only uses PUBLIC then they have to ensure that each policy they set up for PUBLIC can stand on its own - though, really, I expect if they go that route they'd end up with just one policy that calls a stored procedure...

You are suggesting instead that if application 2 sets up policies on the table and then application 1 adds another policy, it should reduce what application 2's users can see? That doesn't make any sense to me. I'd actually expect these applications to at least use different roles anyway, which means they could each have a single role-specific policy which only returns what that application is allowed to see.

I don't think this assumption is reasonable.
Please expect two applications: app-X, a database security product which applies access control based on the remote ip-address of the client for any table access by any database role; and app-Y, a usual enterprise package for daily business data, with an RLS-policy.

What is the expected behavior in this case? That the DBA manages the rights on the tables. I expect that will be required for quite a while with PG. It's nice to think of these application products that will manage all access for users by setting up their own policies, but we have yet to even discuss how they would have appropriate rights on the table to be able to do so (and to not interfere with each other..). Let's at least get something which is generally useful in. I'm all for trying to plan out how to get there and would welcome suggestions you have which are specific to PG on what we could do here (I'm not keen on just trying to mimic another product completely...), but at the level we're talking about (either AND them all or OR them all), I don't think we'd actually solve the
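The OR-combination semantics being argued for in this subthread can be sketched as follows (hypothetical names; not the committed RLS design): per-role policies combine with OR, mirroring how membership in any one granted role is enough for table access, while a single policy can still AND conditions together internally.

```python
# Illustrative sketch of OR-combined row security policies: a row is
# visible if ANY policy applicable to the user's roles passes, the way
# role-based GRANTs work; a stricter AND is expressed *inside* one policy.

def row_visible(row, user_roles, policies):
    """policies: {role_name: predicate}. A policy applies if the user is a
    member of its role (or the policy is for 'public')."""
    applicable = [pred for role, pred in policies.items()
                  if role in user_roles or role == "public"]
    if not applicable:
        return False           # no applicable policy: row is not visible
    return any(pred(row) for pred in applicable)   # OR semantics

policies = {
    "sales": lambda r: r["region"] == "emea",
    "audit": lambda r: r["flagged"],
}
row = {"region": "emea", "flagged": False}
print(row_visible(row, {"sales"}, policies))           # True: sales policy passes
print(row_visible(row, {"audit"}, policies))           # False: audit policy fails
print(row_visible(row, {"sales", "audit"}, policies))  # True: any one suffices
```

Under AND semantics the third call would instead be False, which is the "application 1's policy reduces what application 2's users can see" behaviour being objected to above.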