Re: [HACKERS] [WIP] collation support revisited (phase 1)
Alvaro Herrera napsal(a): Zdenek Kotala escribió: The example is when you have translation data (vocabulary) in database. But the reason is that ANSI specify (chapter 4.2) charset as a part of string descriptor. See below: — The length or maximum length in characters of the character string type. — The catalog name, schema name, and character set name of the character set of the character string type. — The catalog name, schema name, and collation name of the collation of the character string type. We already support multiple charsets, and are able to do conversions between them. The set of charsets is hardcoded and it's hard to make a case that a user needs to create new ones. I concur with Martijn's suggestion -- there's no need for this to appear in a system catalog. Perhaps it could be argued that we need to be able to specify the charset a given string is in -- currently all strings are in the server encoding (charset) which is fixed at initdb time. Making the system support multiple server encodings would be a major undertaking in itself and I'm not sure that there's a point. Background: We specify encoding in initdb phase. ANSI specify repertoire, charset, encoding and collation. If I understand it correctly, then charset is subset of repertoire and specify list of allowed characters for language-collation. Encoding is mapping of character set to binary format. For example for Czech alphabet(charset) we have 6 different encoding for 8bit ASCII, but on other side for UTF8 there is specified multi charsets. I think if we support UTF8 encoding, than it make sense to create own charsets, because system locales could have defined collation for that. We need conversion only in case when client encoding is not compatible with charset and conversion is not defined. Any comments? Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuuming leaked temp tables (once again)
On Sat, 2008-07-12 at 00:57 +0100, Dave Page wrote: On Sat, Jul 12, 2008 at 12:41 AM, Simon Riggs [EMAIL PROTECTED] wrote: On Fri, 2008-07-11 at 17:26 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: So it would seem that we need a way of handling temp tables that doesn't rely on catalog entries at all. That's a complete non-starter; I need go no farther than to point out that it would break clients that expect to see their temp tables reflected in pg_class and so forth. What does the SQL Standard say about the Information Schema I wonder/ Many apps were written long before we had one. Not too mention that it doesn't provide anything like all the info that PostgreSQL-specific tool (though not necessarily user apps) would likely need. So are you saying a) that other sessions need to be able to see pg_class entries for temporary tables created by different sessions? b) that you need to be able to see pg_class entries for temporary tables created only in your session? Hopefully you just mean (b)?? a) would simply not be possible at the same time as having us avoid pg_class writes in Hot Standby mode. We would have a choice of seeing temp tables, or allowing temp tables to work in Hot Standby, not both. b) would is possible, if we follow the route of taking a locally inherited copy of pg_class. The SQL Standard Information Schema does show LOCAL TEMPORARY and GLOBAL TEMPORARY tables. Our implementation of temp tables differs from the standard, so I think (b) is fully in line with that. If anybody did want (a), then I'd suggest that we use the keyword GLOBAL TEMPORARY table to denote something that would be put in pg_class and visible to all, and thus unusable in hot standby mode. I'm not planning on building that. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuuming leaked temp tables (once again)
On Sat, Jul 12, 2008 at 9:16 AM, Simon Riggs [EMAIL PROTECTED] wrote: So are you saying a) that other sessions need to be able to see pg_class entries for temporary tables created by different sessions? b) that you need to be able to see pg_class entries for temporary tables created only in your session? Hopefully you just mean (b)?? Not necessarily - it may be useful for an admin to peek at what tables are created by other sessions. I don't feel strongly about that though, moreso b). The SQL Standard Information Schema does show LOCAL TEMPORARY and GLOBAL TEMPORARY tables. Our implementation of temp tables differs from the standard, so I think (b) is fully in line with that. My point is that any good admin tool will use pg_class directly, as information_schema doesn't include any of the Postgres specifc info that such a tool would want. FYI, pgAdmin doesn't do anything with temp tables at the moment anyway - I'm thinking of other apps that may do. An inheritance schema for pg_class may well work for them. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Follow-up on replication hooks for PostgreSQL
[EMAIL PROTECTED] (Marko Kreen) writes: Also the design should be based on assumption that the target side is exactly in sync. Eg. DROP CASCADE should be replicated as DROP CASCADE. We should not make scheme more complex to survive cases where target is not in sync. That way madness lies. The effect should be like same SQL statements are applied to target by hand, no more, no less. We have, already, in 8.4, a handling of triggers for TRUNCATE; the reason why support hasn't made it into Slony-I yet relates quite exactly to this... The trouble comes in if you do TRUNCATE CASCADE; I'm not quite sure how to collect together the multiple recordings of the trigger functions that would be collected as a result; for it all to work, safely, on the remote node, we'd need to apply all of those truncates at once. Note also that there is an issue with coordination of schemas; Slony-I shuts off the RI triggers on subscribers, so that the target is fairly certain to not be *entirely* in sync, by express intent. Those are legitimate differences between source and target. -- select 'cbbrowne' || '@' || 'linuxfinances.info'; http://cbbrowne.com/info/lsf.html Rules of the Evil Overlord #145. My dungeon cell decor will not feature exposed pipes. While they add to the gloomy atmosphere, they are good conductors of vibrations and a lot of prisoners know Morse code. http://www.eviloverlord.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending grant insert on tables to sequences
At 2008-07-11 11:57:37 -0500, [EMAIL PROTECTED] wrote: attached is a new version of the patch, it implements Alvaro's suggestion and fix a bug i found (it wasn't managing GRANT ALL) :( Looks good to me. About the SELECT issue, AFAIU Robert doesn't complaint he just asked what is the use case... if people think it should be removed ok, but OTOH: why? i don't think that affects anything... As I said, I don't feel too strongly about it. para ! Granting permission on a table automatically extend ! permissions to any sequences owned by the table, including ! sequences tied to typeSERIAL/ columns. /para Should be Granting permissions on a table automatically extends those permissions to + if ((istmt.objtype == ACL_OBJECT_RELATION) (istmt.all_privs || + (istmt.privileges (ACL_INSERT | ACL_UPDATE | ACL_SELECT + { The parentheses around the first comparison can go away, and also the ones around the ACL_* here: + if (istmt.privileges (ACL_INSERT)) + istmt_seq.privileges |= ACL_USAGE; + if (istmt.privileges (ACL_UPDATE)) + istmt_seq.privileges |= ACL_UPDATE; + if (istmt.privileges (ACL_SELECT)) + istmt_seq.privileges |= ACL_SELECT; -- ams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] posix advises ...
At 2008-07-12 00:52:42 +0100, [EMAIL PROTECTED] wrote: There was some discussion about this change and in fact if you look at CVS HEAD you'll find it already applied. Not as far as I can see. Incrementing the most significant index keys would maximize the distance we're jumpin around in the index tree. I see. Thanks. The later versions of mine had a GUC named effective_spindle_count which I think is nicely abstracted away from the implementation details. Yes, that does sound much better. (The patch I read had a preread_pages_bitmapscan variable instead.) -- ams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] collation support revisited (phase 1)
On Sat, Jul 12, 2008 at 10:02:24AM +0200, Zdenek Kotala wrote: Background: We specify encoding in initdb phase. ANSI specify repertoire, charset, encoding and collation. If I understand it correctly, then charset is subset of repertoire and specify list of allowed characters for language-collation. Encoding is mapping of character set to binary format. For example for Czech alphabet(charset) we have 6 different encoding for 8bit ASCII, but on other side for UTF8 there is specified multi charsets. Oh, so you're thinking of a charset as a sort of check constraint. If your locale is turkish and you have a column marked charset ASCII then storing lower('HI') results in an error. A collation must be defined over all possible characters, it can't depend on the character set. That doesn't mean sorting in en_US must do something meaningful with japanese characters, it does mean it can't throw an error (the usual procedure is to sort on unicode point). I think if we support UTF8 encoding, than it make sense to create own charsets, because system locales could have defined collation for that. We need conversion only in case when client encoding is not compatible with charset and conversion is not defined. The problem is that locales in POSIX are defined on an encoding, not a charset. In locale en_US.UTF-8 doesn't actually sort any differently than en_US.latin1, it's just that japanese characters are not representable in the latter. locale-gen can create a locale for any pair of (locale code,encoding), whether the result is meaningful is another question. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [HACKERS] [WIP] collation support revisited (phase 1)
Zdenek Kotala [EMAIL PROTECTED] writes: I think if we support UTF8 encoding, than it make sense to create own charsets, because system locales could have defined collation for that. Say what? I cannot imagine a scenario in which a user-defined encoding would be useful. The amount of infrastructure you need for a new encoding is so large that providing management commands is just silly --- anyone who can create the infrastructure can do the last little bit for themselves. The analogy to index access methods is on point, again. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] posix advises ...
Abhijit Menon-Sen [EMAIL PROTECTED] writes: At 2008-07-12 00:52:42 +0100, [EMAIL PROTECTED] wrote: There was some discussion about this change and in fact if you look at CVS HEAD you'll find it already applied. Not as far as I can see. The place where it matters is in ExecIndexAdvanceArrayKeys(), not initial setup. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuuming leaked temp tables (once again)
On Sat, 2008-07-12 at 09:56 +0100, Dave Page wrote: An inheritance schema for pg_class may well work for them. OK, proposal is something like this: We must avoid writes to many system tables to allow temp tables to function. Priority 1 pg_class - base definition of table pg_attribute - columns pg_attrdef - defaults pg_statistic - ability to store ANALYZE and VACUUM results pg_inherits Priority 2 - would we need any of these? pg_depend pg_rules pg_rewrite pg_triggers pg_indexes Each of the above tables would have a matching temp_* version. We wouldn't want to create a full temp schema every time we connect, we would only do that when a temp table was created. When first temp table requested we create temp schema, using reserved oids established at bootstrap time. We'll need a temp-bootstrap process for the creation of temp_pg_class and temp_pg_attribute, then we can create the other catalog tables more normally. We hack find_inheritance_children() so it returns the oid of the inherited temp catalog table for appropriate catalog tables. That way we don't need to write to pg_inherits in order to have the catalog tables recognise they have inheritance children. So all direct accesses to catalog tables would result in temp tables showing up in the result set, but only within the same session. In Hot Standby mode we would ignore the inheritance hint and *always* search pg_inherits every time we access a table. Would also need to go through all catalog code so that it accessed the correct catalog table depending upon whether its permanent or temp. Seems like it would be a fairly big patch. The temp-bootstrap process is still just hand-waving though. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuuming leaked temp tables (once again)
Simon Riggs [EMAIL PROTECTED] writes: Seems like it would be a fairly big patch. The temp-bootstrap process is still just hand-waving though. Yeah, and it seems fairly messy. The idea I'd had was that all the catalog entries for (a single set of) inheritance child tables are Just There in the output of initdb. The tricky part is that each session that makes use of these tables needs to have its own copy of their contents. (I note that this is real close to the SQL spec's notion of how a temp table works --- maybe we could usefully combine this with a patch to provide spec-compliant temp tables?) I think that could be solved with some magic that made their pg_class entries reflect a per-session relfilenode value. This seems no worse than your proposed magic in pg_inherit, and it eliminates the problem of needing to bootstrap all the temp-file catalog infrastructure in every session. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuuming leaked temp tables (once again)
On Sat, 2008-07-12 at 12:04 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Seems like it would be a fairly big patch. The temp-bootstrap process is still just hand-waving though. Yeah, and it seems fairly messy. The idea I'd had was that all the catalog entries for (a single set of) inheritance child tables are Just There in the output of initdb. Yeh, not having a temp-boostrap process at all is best. The tricky part is that each session that makes use of these tables needs to have its own copy of their contents. I think that could be solved with some magic that made their pg_class entries reflect a per-session relfilenode value. This seems no worse than your proposed magic in pg_inherit, and it eliminates the problem of needing to bootstrap all the temp-file catalog infrastructure in every session. Agreed. Perhaps the magic would be to create sub-directories in the pg_temp namespace based upon backend pid. That way all relfilenode values would be the same, but would refer to different objects. Side thought... We can't generate Oids in Hot Standby mode, since they'll end up duplicating values from the primary. We probably need to reserve the top 16384 Oid values for use as temp object Oids. (I note that this is real close to the SQL spec's notion of how a temp table works --- maybe we could usefully combine this with a patch to provide spec-compliant temp tables?) Yeh, I read that and thought something similar. But we're talking about temp additions to catalog tables, not all temp tables. If we tried to implement spec-compliant temp tables we would need to write to catalog tables again, which is what we are trying to avoid! -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuuming leaked temp tables (once again)
Simon Riggs [EMAIL PROTECTED] writes: Yeh, I read that and thought something similar. But we're talking about temp additions to catalog tables, not all temp tables. If we tried to implement spec-compliant temp tables we would need to write to catalog tables again, which is what we are trying to avoid! No, because a spec-compliant temp table is a persistent object and *should* be reflected in the permanent catalogs. What you meant to say is that hot-standby sessions would only be able to use our traditional type of temp tables. [ thinks for a bit ... ] actually, maybe a hot standby session could be allowed to use a *pre-existing* spec-compliant temp table. It couldn't make a new one though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v3
BTW, I looked at the SQL file (citext.sql.in) a bit. Some comments: * An explicit comment explaining that you're piggybacking on the I/O functions (and some others) for text wouldn't be out of place. * Lose the GRANT EXECUTEs on the I/O functions; they're redundant. (If you needed them, you'd need them on a lot more than these two.) I'd be inclined to lose the COMMENTs on the functions too; again these are about the least useful ones to comment on out of the whole module. * You should provide binary I/O (send/receive) functions, if you want this to be an industrial-strength module. It's easy since you can piggyback on text's. * The type declaration needs to say storage = extended, else the type won't be toastable. * The cast from bpchar to citext cannot be WITHOUT FUNCTION; it needs to invoke rtrim1. Compare the bpchar to text cast. * = is surely not its own commutator. You might try running the opr_sanity regression test on this module to see if it finds any other silliness. (Procedure: insert the citext definition script into the serial_schedule list just ahead of opr_sanity, run tests, make sure you understand the reason for any diffs in the opr_sanity result. There will be at least one from the uses of text-related functions for citext.) * I think you can and should drop all of the size functions and a lot of the miscellaneous functions: anyplace where the implicit coercion to text would serve, you don't need a piggyback function, and introducing one just creates risks of can't-resolve-ambiguous-function failures. The overloaded miscellaneous functions are only justifiable to the extent that it's important to preserve citext-ness of the result of a function, which seems at best dubious for many of these. It is likewise pretty pointless AFAICS to introduce regex functions taking citext pattern arguments. * Don't use the OPERATOR() notation when you don't need to. It's just clutter. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending grant insert on tables to sequences
On Sat, Jul 12, 2008 at 6:30 AM, Abhijit Menon-Sen [EMAIL PROTECTED] wrote: para ! Granting permission on a table automatically extend ! permissions to any sequences owned by the table, including ! sequences tied to typeSERIAL/ columns. /para Should be Granting permissions on a table automatically extends those permissions to what about extends them to... + if ((istmt.objtype == ACL_OBJECT_RELATION) (istmt.all_privs || + (istmt.privileges (ACL_INSERT | ACL_UPDATE | ACL_SELECT + { The parentheses around the first comparison can go away, and also the ones around the ACL_* here: ok -- regards, Jaime Casanova Soporte y capacitación de PostgreSQL Guayaquil - Ecuador Cel. (593) 87171157 Index: doc/src/sgml/ref/grant.sgml === RCS file: /home/postgres/cvshome/pgsql/doc/src/sgml/ref/grant.sgml,v retrieving revision 1.70 diff -c -r1.70 grant.sgml *** doc/src/sgml/ref/grant.sgml 3 Jul 2008 15:59:55 - 1.70 --- doc/src/sgml/ref/grant.sgml 12 Jul 2008 19:22:01 - *** *** 401,410 /para para ! Granting permission on a table does not automatically extend ! permissions to any sequences used by the table, including ! sequences tied to typeSERIAL/ columns. Permissions on ! sequence must be set separately. /para para --- 401,409 /para para ! Granting permissions on a table automatically extend ! them to any sequences owned by the table, including ! sequences tied to typeSERIAL/ columns. /para para Index: src/backend/catalog/aclchk.c === RCS file: /home/postgres/cvshome/pgsql/src/backend/catalog/aclchk.c,v retrieving revision 1.147 diff -c -r1.147 aclchk.c *** src/backend/catalog/aclchk.c 19 Jun 2008 00:46:03 - 1.147 --- src/backend/catalog/aclchk.c 12 Jul 2008 18:39:40 - *** *** 361,366 --- 361,406 } ExecGrantStmt_oids(istmt); + + /* + * If the objtype is a relation and the privileges includes INSERT, UPDATE + * or SELECT then extends the GRANT/REVOKE to the sequences owned by the + * relation + */ + if (istmt.objtype == ACL_OBJECT_RELATION (istmt.all_privs || + (istmt.privileges (ACL_INSERT | ACL_UPDATE | ACL_SELECT + { + InternalGrant istmt_seq; + + istmt_seq.is_grant = istmt.is_grant; + istmt_seq.objtype = ACL_OBJECT_SEQUENCE; + istmt_seq.grantees = istmt.grantees; + istmt_seq.grant_option = istmt.grant_option; + istmt_seq.behavior = istmt.behavior; + + istmt_seq.all_privs = false; + istmt_seq.privileges = ACL_NO_RIGHTS; + + if (istmt.all_privs) + istmt_seq.all_privs = true; + else + { + if (istmt.privileges ACL_INSERT) + istmt_seq.privileges |= ACL_USAGE; + if (istmt.privileges ACL_UPDATE) + istmt_seq.privileges |= ACL_UPDATE; + if (istmt.privileges ACL_SELECT) + istmt_seq.privileges |= ACL_SELECT; + } + + istmt_seq.objects = NIL; + foreach(cell, istmt.objects) + istmt_seq.objects = list_concat(istmt_seq.objects, + getOwnedSequences(lfirst_oid(cell))); + + if (istmt_seq.objects != NIL) + ExecGrantStmt_oids(istmt_seq); + } } /* Index: src/test/regress/expected/dependency.out === RCS file: /home/postgres/cvshome/pgsql/src/test/regress/expected/dependency.out,v retrieving revision 1.7 diff -c -r1.7 dependency.out *** src/test/regress/expected/dependency.out 3 Jul 2008 15:59:55 - 1.7 --- src/test/regress/expected/dependency.out 11 Jul 2008 16:53:14 - *** *** 13,22 -- can't drop neither because they have privileges somewhere DROP USER regression_user; ERROR: role regression_user cannot be dropped because some objects depend on it ! DETAIL: access to table deptest DROP GROUP regression_group; ERROR: role regression_group cannot be dropped because some objects depend on it ! DETAIL: access to table deptest -- if we revoke the privileges we can drop the group REVOKE SELECT ON deptest FROM GROUP regression_group; DROP GROUP regression_group; --- 13,24 -- can't drop neither because they have privileges somewhere DROP USER regression_user; ERROR: role regression_user cannot be dropped because some objects depend on it ! DETAIL: access to sequence deptest_f1_seq ! access to table deptest DROP GROUP regression_group; ERROR: role regression_group cannot be dropped because some objects depend on it ! DETAIL: access to sequence deptest_f1_seq ! access to table deptest -- if we revoke the privileges we can drop the group REVOKE SELECT ON deptest FROM GROUP regression_group; DROP GROUP regression_group; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
Re: [HACKERS] Extending grant insert on tables to sequences
At 2008-07-12 14:32:03 -0500, [EMAIL PROTECTED] wrote: Should be Granting permissions on a table automatically extends those permissions to what about extends them to... Yes, sounds fine, thanks. But I notice that nobody else has commented on whether they want this feature or not. Does anyone particularly dislike the idea? -- ams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending grant insert on tables to sequences
Jaime Casanova [EMAIL PROTECTED] writes: + if (istmt.all_privs) + istmt_seq.all_privs = true; + else + { + if (istmt.privileges ACL_INSERT) + istmt_seq.privileges |= ACL_USAGE; + if (istmt.privileges ACL_UPDATE) + istmt_seq.privileges |= ACL_UPDATE; + if (istmt.privileges ACL_SELECT) + istmt_seq.privileges |= ACL_SELECT; + } This definition of the derived rights seems pretty arbitrary and unprincipled. If you can't explain it precisely in a few words in the documentation (and I notice you didn't even attempt to explain it at all), then you probably need to think harder. In particular, I don't see why having UPDATE on the parent table should grant the right to use setval() on the sequence. If you had INSERT and DELETE as well, implying the right to make arbitrary changes in the parent table, then maybe setval() would be sensible to allow --- but this code can't really tell that, since it doesn't have a global view of the privileges previously granted. What I think makes sense is just to have parent INSERT privilege lead to USAGE on the sequence (thus granting nextval/currval rights, which are what you'd typically need in association with trying to do inserts). I don't see any need to automatically grant more than that. Selects and updates on the parent table don't need to touch the sequence, so why are those privileges granting anything? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending grant insert on tables to sequences
Abhijit Menon-Sen [EMAIL PROTECTED] writes: But I notice that nobody else has commented on whether they want this feature or not. Does anyone particularly dislike the idea? I think it's probably reasonable as long as we keep the implicitly granted rights as narrow as possible. INSERT on the parent table would normally be hard to use correctly if you can't nextval() the sequence, so automatically allowing nextval() seems pretty reasonable. I think the case for granting anything more than that is weak --- even without considering backwards-compatibility arguments. A fairly important practical problem is whether this will keep pg_dump from correctly reproducing the state of a database. Assume that someone did revoke the implicitly-granted rights on the sequence --- would a dump and reload correctly preserve that state? It'd depend on the order in which pg_dump issued the GRANTs, and I'm not at all sure pg_dump could be relied on to get that right. (Even if we fixed it to account for the issue today, what of older dump scripts?) Another issue is the interaction with the planned column-level GRANT feature. AFAICS, the obvious-sounding rule that usage of the sequence should be granted consequent to granting INSERT on the owning column would be exactly backwards. It's when you have *not* got INSERT on that column that you *must* rely on the default for it, and hence you'd better have the ability to do nextval() or your alleged insert privileges on other columns are worthless. So it seems that sequence usage should be granted if any column INSERT is granted, and revoked only when all column INSERT privileges are revoked --- and that latter rule is going to be hard to implement with this type of patch, because it doesn't know what column privileges are going to remain. I thought for a bit about abandoning the proposed implementation and instead having nextval/currval check at runtime: IOW, if the check for ACL_USAGE on the sequence fails, look to see if the sequence is owned and if so look to see if the user has ACL_INSERT on the parent table. (This seems a bit slow but maybe it wouldn't be a problem, or maybe we could arrange to cache the lookup results.) This would avoid the action at a distance behavior in GRANT and thereby cure both of the problems mentioned above. However, it would mean that it'd be impossible to grant INSERT without effectively granting sequence USAGE --- revoking USAGE on the sequence wouldn't stop anything. Plus, \z on the sequence would fail to tell you about those implicitly held rights. So I'm not sure I like this way any better. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gsoc, text search selectivity and dllist enhancments
OK, here's the (hopefully final) version of the typanalyze function for tsvectors. It applies to HEAD and passes regression tests. I now plan to move towards a selectivity function that'll use the gathered statistics. Thanks to everyone for their reviews and comments, Cheers, Jan -- Jan Urbanski GPG key ID: E583D7D2 ouden estin gsoc08-tss-04-lc.diff.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v3
On Jul 12, 2008, at 12:17, Tom Lane wrote: BTW, I looked at the SQL file (citext.sql.in) a bit. Some comments: Thanks a million for these, Tom. I greatly appreciate it. * An explicit comment explaining that you're piggybacking on the I/O functions (and some others) for text wouldn't be out of place. I've added SQL comments. Were you talking about a COMMENT? * Lose the GRANT EXECUTEs on the I/O functions; they're redundant. (If you needed them, you'd need them on a lot more than these two.) I'd be inclined to lose the COMMENTs on the functions too; again these are about the least useful ones to comment on out of the whole module. I wondered about that; those were copied from CITEXT 1. I've removed all GRANTs and COMMENTs. * You should provide binary I/O (send/receive) functions, if you want this to be an industrial-strength module. It's easy since you can piggyback on text's. I'm confused. Is that not what the citextin and citextout functions are? * The type declaration needs to say storage = extended, else the type won't be toastable. Ah, good, thanks. * The cast from bpchar to citext cannot be WITHOUT FUNCTION; it needs to invoke rtrim1. Compare the bpchar to text cast. Where do I find that? I have trouble finding the SQL that creates the core types. :-( * = is surely not its own commutator. Changed to =. You might try running the opr_sanity regression test on this module to see if it finds any other silliness. (Procedure: insert the citext definition script into the serial_schedule list just ahead of opr_sanity, run tests, make sure you understand the reason for any diffs in the opr_sanity result. There will be at least one from the uses of text-related functions for citext.) Thanks. Added to my list. * I think you can and should drop all of the size functions and a lot of the miscellaneous functions: anyplace where the implicit coercion to text would serve, you don't need a piggyback function, and introducing one just creates risks of can't-resolve-ambiguous-function failures. The overloaded miscellaneous functions are only justifiable to the extent that it's important to preserve citext-ness of the result of a function, which seems at best dubious for many of these. It is likewise pretty pointless AFAICS to introduce regex functions taking citext pattern arguments. I added most of those as I wrote tests and they failed to find the functions. Once I added the functions, they worked. But I'll do an audit to make sure that I didn't inadvertantly leave in any unneeded ones (I'm happy to have less code :-)). * Don't use the OPERATOR() notation when you don't need to. It's just clutter. Sorry, don't know what you're referring to here. CREATE OPERATOR appears to require parens… Thanks for the great feedback! I'll work on getting things all straightened out and a new patch in tonight. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v3
There was some discussion earlier about whether the proposed regression tests for citext are suitable for use in contrib or not. After playing with them for awhile, I have to come down very firmly on the side of not. I have these gripes: 1. The style is gratuitously different from every other regression test in the system. This is not a good thing. If it were an amazingly better way to do things, then maybe, but as far as I can tell the style pgTAP forces on you is really pretty darn poorly suited for SQL tests. You have to contort what could naturally be expressed in SQL as a table result into a scalar. Plus it's redundant with the expected-output file. 2. It's ridiculously slow; at least a factor of ten slower than doing equivalent tests directly in SQL. This is a very bad thing. Speed of regression tests matters a lot to those of us who run them a dozen times per day --- and I do not wish to discourage any developers who don't work that way from learning better habits ;-) Because of #1 and #2 I find the use of pgTAP to be a nonstarter. I have a couple of gripes about the substance of the tests as well: 3. What you are mostly testing is not the behavior of citext itself, but the behavior of the underlying strcoll function. This is pretty much doomed to failure, first because the regression tests are intended to work in multiple locales (and we are *not* giving that up for citext), and second because the behavior of strcoll isn't all that portable across platforms even given allegedly similar locale settings (as we already found out yesterday). Sadly, I think you have to give up attempts to check the interesting multibyte cases and confine yourself to tests using ASCII strings. 4. A lot of the later test cases are equally uselessly testing whether piggybacking over text functions works. The odds of ever finding anything with those tests are not distinguishable from zero (unless the underlying text function is busted, which is not your responsibility to test). So I don't see any point in putting them into the standard regression package. (What maybe *would* be useful to test, but you didn't, is whether the result of a function is considered citext rather than text.) I suggest something more like the attached as a suitable regression test. I got bored about halfway through and started to skim, so there might be a few tests toward the end of the original set that would be worth transposing into this one, but nothing jumped out at me. regards, tom lane -- -- Test citext datatype -- -- -- first, define the datatype. Turn off echoing so that expected file -- does not depend on contents of citext.sql. -- SET client_min_messages = warning; \set ECHO none \i citext.sql \set ECHO all RESET client_min_messages; -- Test the operators and indexing functions -- Test = and . SELECT 'a'::citext = 'a'::citext as t; SELECT 'a'::citext = 'A'::citext as t; SELECT 'a'::citext = 'A'::text as f;-- text wins the discussion SELECT 'a'::citext = 'b'::citext as f; SELECT 'a'::citext = 'ab'::citext as f; SELECT 'a'::citext 'ab'::citext as t; -- Test and = SELECT 'B'::citext 'a'::citext as t; SELECT 'b'::citext 'A'::citext as t; SELECT 'B'::citext 'b'::citext as f; SELECT 'B'::citext = 'b'::citext as t; -- Test and = SELECT 'a'::citext 'B'::citext as t; SELECT 'a'::citext = 'B'::citext as t; -- Test implicit casting. citext casts to text, but not vice-versa. SELECT 'a'::citext = 'a'::text as t; SELECT 'A'::text 'a'::citext as t; SELECT 'aardvark'::citext = 'aardvark'::citext as t; SELECT 'aardvark'::citext = 'aardVark'::citext as t; -- Check the citext_cmp() function explicitly. SELECT citext_cmp('aardvark'::citext, 'aardvark'::citext) as zero; SELECT citext_cmp('aardvark'::citext, 'aardVark'::citext) as zero; SELECT citext_cmp('B'::citext, 'a'::citext) as one; -- Do some tests using a table and index. CREATE TEMP TABLE try ( name citext PRIMARY KEY ); INSERT INTO try (name) VALUES ('a'), ('ab'), ('aba'), ('b'), ('ba'), ('bab'), ('AZ'); SELECT name, 'a' = name from try; SELECT name, 'a' = name from try where name = 'a'; SELECT name, 'A' = name from try; SELECT name, 'A' = name from try where name = 'A'; SELECT name, 'A' = name from try where name = 'A'; -- expected failures on duplicate key INSERT INTO try (name) VALUES ('a'); INSERT INTO try (name) VALUES ('A'); INSERT INTO try (name) VALUES ('aB'); -- Test aggregate functions and sort ordering CREATE TEMP TABLE srt ( name CITEXT ); INSERT INTO srt (name) VALUES ('aardvark'), ('AAA'), ('aba'), ('ABC'), ('abd'); -- Check the min() and max() aggregates, with and without index. set enable_seqscan = off; SELECT MIN(name) from srt; SELECT MAX(name) from srt; reset enable_seqscan; set enable_indexscan = off; SELECT MIN(name) from srt; SELECT MAX(name) from srt; reset enable_indexscan; -- Check sorting likewise set enable_seqscan = off; SELECT name FROM srt ORDER BY
Re: [HACKERS] PATCH: CITEXT 2.0 v3
On Jul 12, 2008, at 14:57, Tom Lane wrote: There was some discussion earlier about whether the proposed regression tests for citext are suitable for use in contrib or not. After playing with them for awhile, I have to come down very firmly on the side of not. I have these gripes: You're nothing if not thorough, Tom. 1. The style is gratuitously different from every other regression test in the system. This is not a good thing. If it were an amazingly better way to do things, then maybe, but as far as I can tell the style pgTAP forces on you is really pretty darn poorly suited for SQL tests. You have to contort what could naturally be expressed in SQL as a table result into a scalar. Plus it's redundant with the expected-output file. Sure. I wasn't aware of the existing regression test methodology when I wrote pgTAP and those tests. I'm fine to change them and use pgTAP for testing my app code, rather than PostgreSQL itself. 2. It's ridiculously slow; at least a factor of ten slower than doing equivalent tests directly in SQL. This is a very bad thing. Speed of regression tests matters a lot to those of us who run them a dozen times per day --- and I do not wish to discourage any developers who don't work that way from learning better habits ;-) Hrm. I'm wonder why it's so slow? The test functions don't really do a lot. Anyway, I agree that they should perform well. Because of #1 and #2 I find the use of pgTAP to be a nonstarter. I have a couple of gripes about the substance of the tests as well: 3. What you are mostly testing is not the behavior of citext itself, but the behavior of the underlying strcoll function. This is pretty much doomed to failure, first because the regression tests are intended to work in multiple locales (and we are *not* giving that up for citext), and second because the behavior of strcoll isn't all that portable across platforms even given allegedly similar locale settings (as we already found out yesterday). Yes, I *just* ran the tests on Ubuntu and opened my mail to ask about the bizarre differences when I saw this message. Thanks for answering my question before I asked it. :-) Sadly, I think you have to give up attempts to check the interesting multibyte cases and confine yourself to tests using ASCII strings. Grr. Kind of defeats the purpose. Is there no infrastructure for testing multibyte functionality? Are test database clusters all built using SQL_ASCII and the C locale? 4. A lot of the later test cases are equally uselessly testing whether piggybacking over text functions works. The odds of ever finding anything with those tests are not distinguishable from zero (unless the underlying text function is busted, which is not your responsibility to test). So I don't see any point in putting them into the standard regression package. (What maybe *would* be useful to test, but you didn't, is whether the result of a function is considered citext rather than text.) Well, I was doing test-driven development: I wrote the tests to ensure that I had added the functions for CITEXT properly, and when they passed, I could move on. As a unit tester, it'd feel weird for me to drop these. I'm not testing the underlying functions; I'm making sure that they work properly with CITEXT. I suggest something more like the attached as a suitable regression test. I got bored about halfway through and started to skim, so there might be a few tests toward the end of the original set that would be worth transposing into this one, but nothing jumped out at me. Thanks! I'll work this in. Best, David PS: I confirmed late yesterday that the memory leak I saw was with my version for 8.3, where I had my own str_tolower(). It clearly has a leak that I'll have to fix, but it has no bearing on the contribution to 8.4, which has no such leak. Thanks for running the test yourself to confirm. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v3
On Jul 12, 2008, at 15:13, David E. Wheeler wrote: Sadly, I think you have to give up attempts to check the interesting multibyte cases and confine yourself to tests using ASCII strings. Grr. Kind of defeats the purpose. Is there no infrastructure for testing multibyte functionality? Are test database clusters all built using SQL_ASCII and the C locale? I just tried to take your modified tests and add multibyte tests that run only on OS X with en_US.UTF-8. They worked like this: CREATE OR REPLACE FUNCTION test_multibyte () RETURNS BOOLEAN AS $$ SELECT version() ~ 'apple-darwin' AND (select setting from pg_settings where name = 'lc_collate') = 'en_US.UTF-8'; $$ LANGUAGE SQL IMMUTABLE; SELECT 'À'::citext = 'À'::citext WHERE test_multibyte() = true; SELECT 'À'::citext = 'à'::citext WHERE test_multibyte() = true; But then I realized that this would change the expected output depending on the platform, and thus make the tests fail. This is one reason why the inflexibility of the existing regression test model is a drag: it limits you to testing only what works everywhere! Grrr. I'll remove all the multibyte character tests, but I have to say that, as a result, the CITEXT 1 module would likely pass all such tests, but it still isn't locale-aware. How can one add regressions to ensure that something truly is locale-aware? Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: CITEXT 2.0 v3
On Jul 12, 2008, at 14:50, David E. Wheeler wrote: * An explicit comment explaining that you're piggybacking on the I/O functions (and some others) for text wouldn't be out of place. I've added SQL comments. Were you talking about a COMMENT? * Lose the GRANT EXECUTEs on the I/O functions; they're redundant. (If you needed them, you'd need them on a lot more than these two.) I'd be inclined to lose the COMMENTs on the functions too; again these are about the least useful ones to comment on out of the whole module. I wondered about that; those were copied from CITEXT 1. I've removed all GRANTs and COMMENTs. * You should provide binary I/O (send/receive) functions, if you want this to be an industrial-strength module. It's easy since you can piggyback on text's. I'm confused. Is that not what the citextin and citextout functions are? * The type declaration needs to say storage = extended, else the type won't be toastable. Ah, good, thanks. * The cast from bpchar to citext cannot be WITHOUT FUNCTION; it needs to invoke rtrim1. Compare the bpchar to text cast. Where do I find that? I have trouble finding the SQL that creates the core types. :-( Duh, you just told me. Added, thanks. * = is surely not its own commutator. Changed to =. You might try running the opr_sanity regression test on this module to see if it finds any other silliness. (Procedure: insert the citext definition script into the serial_schedule list just ahead of opr_sanity, run tests, make sure you understand the reason for any diffs in the opr_sanity result. There will be at least one from the uses of text-related functions for citext.) Thanks. Added to my list. * I think you can and should drop all of the size functions and a lot of the miscellaneous functions: anyplace where the implicit coercion to text would serve, you don't need a piggyback function, and introducing one just creates risks of can't-resolve-ambiguous-function failures. The overloaded miscellaneous functions are only justifiable to the extent that it's important to preserve citext-ness of the result of a function, which seems at best dubious for many of these. It is likewise pretty pointless AFAICS to introduce regex functions taking citext pattern arguments. I added most of those as I wrote tests and they failed to find the functions. Once I added the functions, they worked. But I'll do an audit to make sure that I didn't inadvertantly leave in any unneeded ones (I'm happy to have less code :-)). Of the size functions, I was able to remove only this one and keep all of my pgTAP tests passing: CREATE FUNCTION textlen(citext) RETURNS int4 AS 'textlen' LANGUAGE 'internal' IMMUTABLE STRICT; When I deleted any of the others, I got errors like this: psql:sql/citext.sql:865: ERROR: function length(citext) is not unique LINE 1: SELECT is( length( name ), length(name::text), 'length(' ||... ^ HINT: Could not choose a best candidate function. You might need to add explicit type casts. I think you can see now why I wrote the tests: I wanted to ensure that CITEXT really does work (almost) just like TEXT. I was able to eliminate *all* of the miscellaneous functions, but the upper() and lower() functions now return TEXT instead of CITEXT, which I don't think is exactly what we want, is it? For now, I'e left upper() and lower() in. It just seems like more expected functionality. * Don't use the OPERATOR() notation when you don't need to. It's just clutter. Sorry, don't know what you're referring to here. CREATE OPERATOR appears to require parens… Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers