Re: [HACKERS] [WIP] collation support revisited (phase 1)

2008-07-12 Thread Zdenek Kotala

Alvaro Herrera napsal(a):

Zdenek Kotala escribió:

The example is when you have translation data (vocabulary) in database. 
But the  reason is that ANSI specify (chapter 4.2) charset as a part of 
string descriptor. See below:


— The length or maximum length in characters of the character string type.
— The catalog name, schema name, and character set name of the character 
set of the character string type.
— The catalog name, schema name, and collation name of the collation of 
the character string type.


We already support multiple charsets, and are able to do conversions
between them.  The set of charsets is hardcoded and it's hard to make a
case that a user needs to create new ones.  I concur with Martijn's
suggestion -- there's no need for this to appear in a system catalog.

Perhaps it could be argued that we need to be able to specify the
charset a given string is in -- currently all strings are in the server
encoding (charset) which is fixed at initdb time.  Making the system
support multiple server encodings would be a major undertaking in itself
and I'm not sure that there's a point.



Background:
We specify encoding in initdb phase. ANSI specify repertoire, charset, encoding 
and collation. If I understand it correctly, then charset is subset of 
repertoire and specify list of allowed characters for language-collation. 
Encoding is mapping of character set to binary format. For example for Czech 
alphabet(charset) we have 6 different encoding for 8bit ASCII, but on other side 
for UTF8 there is specified multi charsets.



I think if we support UTF8 encoding, than it make sense to create own charsets, 
because system locales could have defined collation for that. We need conversion 
only in case when client encoding is not compatible with charset and conversion 
is not defined.


Any comments?

Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuuming leaked temp tables (once again)

2008-07-12 Thread Simon Riggs

On Sat, 2008-07-12 at 00:57 +0100, Dave Page wrote:
 On Sat, Jul 12, 2008 at 12:41 AM, Simon Riggs [EMAIL PROTECTED] wrote:
 
  On Fri, 2008-07-11 at 17:26 -0400, Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   So it would seem that we need a way of handling temp tables that doesn't
   rely on catalog entries at all.
 
  That's a complete non-starter; I need go no farther than to point out
  that it would break clients that expect to see their temp tables
  reflected in pg_class and so forth.
 
  What does the SQL Standard say about the Information Schema I wonder/
 
 Many apps were written long before we had one. Not too mention that it
 doesn't provide anything like all the info that PostgreSQL-specific
 tool (though not necessarily user apps) would likely need.

So are you saying

a) that other sessions need to be able to see pg_class entries for
temporary tables created by different sessions?

b) that you need to be able to see pg_class entries for temporary tables
created only in your session?

Hopefully you just mean (b)??

a) would simply not be possible at the same time as having us avoid
pg_class writes in Hot Standby mode. We would have a choice of seeing
temp tables, or allowing temp tables to work in Hot Standby, not both.

b) would is possible, if we follow the route of taking a locally
inherited copy of pg_class.

The SQL Standard Information Schema does show LOCAL TEMPORARY and GLOBAL
TEMPORARY tables. Our implementation of temp tables differs from the
standard, so I think (b) is fully in line with that.

If anybody did want (a), then I'd suggest that we use the keyword GLOBAL
TEMPORARY table to denote something that would be put in pg_class and
visible to all, and thus unusable in hot standby mode. I'm not planning
on building that.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuuming leaked temp tables (once again)

2008-07-12 Thread Dave Page
On Sat, Jul 12, 2008 at 9:16 AM, Simon Riggs [EMAIL PROTECTED] wrote:

 So are you saying

 a) that other sessions need to be able to see pg_class entries for
 temporary tables created by different sessions?

 b) that you need to be able to see pg_class entries for temporary tables
 created only in your session?

 Hopefully you just mean (b)??

Not necessarily - it may be useful for an admin to peek at what tables
are created by other sessions. I don't feel strongly about that
though, moreso b).

 The SQL Standard Information Schema does show LOCAL TEMPORARY and GLOBAL
 TEMPORARY tables. Our implementation of temp tables differs from the
 standard, so I think (b) is fully in line with that.

My point is that any good admin tool will use pg_class directly, as
information_schema doesn't include any of the Postgres specifc info
that such a tool would want.

FYI, pgAdmin doesn't do anything with temp tables at the moment anyway
- I'm thinking of other apps that may do. An inheritance schema for
pg_class may well work for them.


-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Follow-up on replication hooks for PostgreSQL

2008-07-12 Thread chris
[EMAIL PROTECTED] (Marko Kreen) writes:
 Also the design should be based on assumption that the target side
 is exactly in sync.  Eg. DROP CASCADE should be replicated as DROP CASCADE.
 We should not make scheme more complex to survive cases where target
 is not in sync.  That way madness lies.  The effect should be like
 same SQL statements are applied to target by hand, no more, no less.

We have, already, in 8.4, a handling of triggers for TRUNCATE; the
reason why support hasn't made it into Slony-I yet relates quite
exactly to this...

The trouble comes in if you do TRUNCATE CASCADE; I'm not quite sure
how to collect together the multiple recordings of the trigger
functions that would be collected as a result; for it all to work,
safely, on the remote node, we'd need to apply all of those truncates
at once.

Note also that there is an issue with coordination of schemas; Slony-I
shuts off the RI triggers on subscribers, so that the target is fairly
certain to not be *entirely* in sync, by express intent.

Those are legitimate differences between source and target.
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/lsf.html
Rules  of the  Evil Overlord  #145. My  dungeon cell  decor  will not
feature exposed pipes.  While they add to the  gloomy atmosphere, they
are good  conductors of vibrations and  a lot of  prisoners know Morse
code. http://www.eviloverlord.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extending grant insert on tables to sequences

2008-07-12 Thread Abhijit Menon-Sen
At 2008-07-11 11:57:37 -0500, [EMAIL PROTECTED] wrote:

 attached is a new version of the patch, it implements Alvaro's
 suggestion and fix a bug i found (it wasn't managing GRANT ALL) :(

Looks good to me.

 About the SELECT issue, AFAIU Robert doesn't complaint he just asked
 what is the use case... if people think it should be removed ok, but
 OTOH: why? i don't think that affects anything...

As I said, I don't feel too strongly about it.

  para
 ! Granting permission on a table automatically extend 
 ! permissions to any sequences owned by the table, including 
 ! sequences tied to typeSERIAL/ columns.
  /para

Should be Granting permissions on a table automatically extends those
permissions to

 + if ((istmt.objtype == ACL_OBJECT_RELATION)  (istmt.all_privs ||  
 + (istmt.privileges  (ACL_INSERT | ACL_UPDATE | ACL_SELECT 
 + {

The parentheses around the first comparison can go away, and also the
ones around the ACL_* here:

 + if (istmt.privileges  (ACL_INSERT)) 
 + istmt_seq.privileges |= ACL_USAGE;
 + if (istmt.privileges  (ACL_UPDATE)) 
 + istmt_seq.privileges |= ACL_UPDATE;
 + if (istmt.privileges  (ACL_SELECT)) 
 + istmt_seq.privileges |= ACL_SELECT;

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] posix advises ...

2008-07-12 Thread Abhijit Menon-Sen
At 2008-07-12 00:52:42 +0100, [EMAIL PROTECTED] wrote:

 There was some discussion about this change and in fact if you
 look at CVS HEAD you'll find it already applied.

Not as far as I can see.

 Incrementing the most significant index keys would maximize the
 distance we're jumpin around in the index tree.

I see. Thanks.

 The later versions of mine had a GUC named effective_spindle_count
 which I think is nicely abstracted away from the implementation
 details.

Yes, that does sound much better. (The patch I read had a
preread_pages_bitmapscan variable instead.)

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] collation support revisited (phase 1)

2008-07-12 Thread Martijn van Oosterhout
On Sat, Jul 12, 2008 at 10:02:24AM +0200, Zdenek Kotala wrote:
 Background:
 We specify encoding in initdb phase. ANSI specify repertoire, charset, 
 encoding and collation. If I understand it correctly, then charset is 
 subset of repertoire and specify list of allowed characters for 
 language-collation. Encoding is mapping of character set to binary format. 
 For example for Czech alphabet(charset) we have 6 different encoding for 
 8bit ASCII, but on other side for UTF8 there is specified multi charsets.

Oh, so you're thinking of a charset as a sort of check constraint. If
your locale is turkish and you have a column marked charset ASCII then
storing lower('HI') results in an error.

A collation must be defined over all possible characters, it can't
depend on the character set. That doesn't mean sorting in en_US must do
something meaningful with japanese characters, it does mean it can't
throw an error (the usual procedure is to sort on unicode point).

 I think if we support UTF8 encoding, than it make sense to create own 
 charsets, because system locales could have defined collation for that. We 
 need conversion only in case when client encoding is not compatible with 
 charset and conversion is not defined.

The problem is that locales in POSIX are defined on an encoding, not a
charset. In locale en_US.UTF-8 doesn't actually sort any differently
than en_US.latin1, it's just that japanese characters are not
representable in the latter.

locale-gen can create a locale for any pair of (locale code,encoding),
whether the result is meaningful is another question.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [HACKERS] [WIP] collation support revisited (phase 1)

2008-07-12 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes:
 I think if we support UTF8 encoding, than it make sense to create own 
 charsets, 
 because system locales could have defined collation for that.

Say what?  I cannot imagine a scenario in which a user-defined encoding
would be useful. The amount of infrastructure you need for a new
encoding is so large that providing management commands is just silly
--- anyone who can create the infrastructure can do the last little bit
for themselves.  The analogy to index access methods is on point, again.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] posix advises ...

2008-07-12 Thread Tom Lane
Abhijit Menon-Sen [EMAIL PROTECTED] writes:
 At 2008-07-12 00:52:42 +0100, [EMAIL PROTECTED] wrote:
 There was some discussion about this change and in fact if you
 look at CVS HEAD you'll find it already applied.

 Not as far as I can see.

The place where it matters is in ExecIndexAdvanceArrayKeys(),
not initial setup.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Vacuuming leaked temp tables (once again)

2008-07-12 Thread Simon Riggs

On Sat, 2008-07-12 at 09:56 +0100, Dave Page wrote:
 An inheritance schema for pg_class may well work for them.

OK, proposal is something like this:

We must avoid writes to many system tables to allow temp tables to
function.
Priority 1
pg_class - base definition of table
pg_attribute - columns
pg_attrdef - defaults
pg_statistic - ability to store ANALYZE and VACUUM results
pg_inherits

Priority 2 - would we need any of these?
pg_depend
pg_rules
pg_rewrite
pg_triggers
pg_indexes

Each of the above tables would have a matching temp_* version.

We wouldn't want to create a full temp schema every time we connect, we
would only do that when a temp table was created. 

When first temp table requested we create temp schema, using reserved
oids established at bootstrap time. We'll need a temp-bootstrap process
for the creation of temp_pg_class and temp_pg_attribute, then we can
create the other catalog tables more normally.

We hack find_inheritance_children() so it returns the oid of the
inherited temp catalog table for appropriate catalog tables. That way we
don't need to write to pg_inherits in order to have the catalog tables
recognise they have inheritance children.

So all direct accesses to catalog tables would result in temp tables
showing up in the result set, but only within the same session.

In Hot Standby mode we would ignore the inheritance hint and *always*
search pg_inherits every time we access a table.

Would also need to go through all catalog code so that it accessed the
correct catalog table depending upon whether its permanent or temp.

Seems like it would be a fairly big patch. The temp-bootstrap process is
still just hand-waving though.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuuming leaked temp tables (once again)

2008-07-12 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Seems like it would be a fairly big patch. The temp-bootstrap process is
 still just hand-waving though.

Yeah, and it seems fairly messy.  The idea I'd had was that all the
catalog entries for (a single set of) inheritance child tables are Just
There in the output of initdb.  The tricky part is that each session that
makes use of these tables needs to have its own copy of their contents.
(I note that this is real close to the SQL spec's notion of how a temp
table works --- maybe we could usefully combine this with a patch to
provide spec-compliant temp tables?)  I think that could be solved with
some magic that made their pg_class entries reflect a per-session
relfilenode value.  This seems no worse than your proposed magic in
pg_inherit, and it eliminates the problem of needing to bootstrap all
the temp-file catalog infrastructure in every session.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuuming leaked temp tables (once again)

2008-07-12 Thread Simon Riggs

On Sat, 2008-07-12 at 12:04 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Seems like it would be a fairly big patch. The temp-bootstrap process is
  still just hand-waving though.
 
 Yeah, and it seems fairly messy.  The idea I'd had was that all the
 catalog entries for (a single set of) inheritance child tables are Just
 There in the output of initdb.  

Yeh, not having a temp-boostrap process at all is best.

 The tricky part is that each session that
 makes use of these tables needs to have its own copy of their contents.
 I think that could be solved with
 some magic that made their pg_class entries reflect a per-session
 relfilenode value.  This seems no worse than your proposed magic in
 pg_inherit, and it eliminates the problem of needing to bootstrap
 all the temp-file catalog infrastructure in every session.

Agreed.

Perhaps the magic would be to create sub-directories in the pg_temp
namespace based upon backend pid. That way all relfilenode values would
be the same, but would refer to different objects.

Side thought... We can't generate Oids in Hot Standby mode, since
they'll end up duplicating values from the primary. We probably need to
reserve the top 16384 Oid values for use as temp object Oids.

 (I note that this is real close to the SQL spec's notion of how a temp
 table works --- maybe we could usefully combine this with a patch to
 provide spec-compliant temp tables?)  

Yeh, I read that and thought something similar. But we're talking about
temp additions to catalog tables, not all temp tables. If we tried to
implement spec-compliant temp tables we would need to write to catalog
tables again, which is what we are trying to avoid!

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuuming leaked temp tables (once again)

2008-07-12 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Yeh, I read that and thought something similar. But we're talking about
 temp additions to catalog tables, not all temp tables. If we tried to
 implement spec-compliant temp tables we would need to write to catalog
 tables again, which is what we are trying to avoid!

No, because a spec-compliant temp table is a persistent object and
*should* be reflected in the permanent catalogs.  What you meant to say
is that hot-standby sessions would only be able to use our traditional
type of temp tables.

[ thinks for a bit ... ] actually, maybe a hot standby session could be
allowed to use a *pre-existing* spec-compliant temp table.  It couldn't
make a new one though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread Tom Lane
BTW, I looked at the SQL file (citext.sql.in) a bit.  Some comments:

* An explicit comment explaining that you're piggybacking on the I/O
functions (and some others) for text wouldn't be out of place.

* Lose the GRANT EXECUTEs on the I/O functions; they're redundant.
(If you needed them, you'd need them on a lot more than these two.)
I'd be inclined to lose the COMMENTs on the functions too; again
these are about the least useful ones to comment on out of the 
whole module.

* You should provide binary I/O (send/receive) functions, if you want
this to be an industrial-strength module.  It's easy since you can
piggyback on text's.

* The type declaration needs to say storage = extended, else the type
won't be toastable.

* The cast from bpchar to citext cannot be WITHOUT FUNCTION;
it needs to invoke rtrim1.  Compare the bpchar to text cast.

* = is surely not its own commutator.  You might try running the
opr_sanity regression test on this module to see if it finds any
other silliness.  (Procedure: insert the citext definition script
into the serial_schedule list just ahead of opr_sanity, run tests,
make sure you understand the reason for any diffs in the opr_sanity
result.  There will be at least one from the uses of text-related
functions for citext.)

* I think you can and should drop all of the size functions and
a lot of the miscellaneous functions: anyplace where the implicit
coercion to text would serve, you don't need a piggyback function,
and introducing one just creates risks of
can't-resolve-ambiguous-function failures.  The overloaded miscellaneous
functions are only justifiable to the extent that it's important to
preserve citext-ness of the result of a function, which seems at
best dubious for many of these.  It is likewise pretty pointless AFAICS
to introduce regex functions taking citext pattern arguments.

* Don't use the OPERATOR() notation when you don't need to.
It's just clutter.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extending grant insert on tables to sequences

2008-07-12 Thread Jaime Casanova
On Sat, Jul 12, 2008 at 6:30 AM, Abhijit Menon-Sen [EMAIL PROTECTED] wrote:

  para
 ! Granting permission on a table automatically extend
 ! permissions to any sequences owned by the table, including
 ! sequences tied to typeSERIAL/ columns.
  /para

 Should be Granting permissions on a table automatically extends those
 permissions to


what about extends them to...

 + if ((istmt.objtype == ACL_OBJECT_RELATION)  (istmt.all_privs ||
 + (istmt.privileges  (ACL_INSERT | ACL_UPDATE | ACL_SELECT
 + {

 The parentheses around the first comparison can go away, and also the
 ones around the ACL_* here:


ok

-- 
regards,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Guayaquil - Ecuador
Cel. (593) 87171157
Index: doc/src/sgml/ref/grant.sgml
===
RCS file: /home/postgres/cvshome/pgsql/doc/src/sgml/ref/grant.sgml,v
retrieving revision 1.70
diff -c -r1.70 grant.sgml
*** doc/src/sgml/ref/grant.sgml	3 Jul 2008 15:59:55 -	1.70
--- doc/src/sgml/ref/grant.sgml	12 Jul 2008 19:22:01 -
***
*** 401,410 
 /para
  
 para
! Granting permission on a table does not automatically extend 
! permissions to any sequences used by the table, including 
! sequences tied to typeSERIAL/ columns.  Permissions on 
! sequence must be set separately.
 /para
  
 para
--- 401,409 
 /para
  
 para
! Granting permissions on a table automatically extend 
! them to any sequences owned by the table, including 
! sequences tied to typeSERIAL/ columns.
 /para
  
 para
Index: src/backend/catalog/aclchk.c
===
RCS file: /home/postgres/cvshome/pgsql/src/backend/catalog/aclchk.c,v
retrieving revision 1.147
diff -c -r1.147 aclchk.c
*** src/backend/catalog/aclchk.c	19 Jun 2008 00:46:03 -	1.147
--- src/backend/catalog/aclchk.c	12 Jul 2008 18:39:40 -
***
*** 361,366 
--- 361,406 
  	}
  
  	ExecGrantStmt_oids(istmt);
+ 
+ 	/*
+ 	 * If the objtype is a relation and the privileges includes INSERT, UPDATE 
+ 	 * or SELECT then extends the GRANT/REVOKE to the sequences owned by the 
+ 	 * relation
+ 	 */
+ 	if (istmt.objtype == ACL_OBJECT_RELATION  (istmt.all_privs ||  
+ 		(istmt.privileges  (ACL_INSERT | ACL_UPDATE | ACL_SELECT 
+ 	{
+ 		InternalGrant istmt_seq;
+ 
+ 		istmt_seq.is_grant = istmt.is_grant;
+ 		istmt_seq.objtype = ACL_OBJECT_SEQUENCE;
+ 		istmt_seq.grantees = istmt.grantees;
+ 		istmt_seq.grant_option = istmt.grant_option;
+ 		istmt_seq.behavior = istmt.behavior;
+ 
+ 		istmt_seq.all_privs = false;
+ 		istmt_seq.privileges = ACL_NO_RIGHTS;
+ 
+ 		if (istmt.all_privs)
+ 			istmt_seq.all_privs = true;
+ 		else
+ 		{
+ 			if (istmt.privileges  ACL_INSERT) 
+ istmt_seq.privileges |= ACL_USAGE;
+ 			if (istmt.privileges  ACL_UPDATE) 
+ istmt_seq.privileges |= ACL_UPDATE;
+ 			if (istmt.privileges  ACL_SELECT) 
+ istmt_seq.privileges |= ACL_SELECT;
+ 		}
+  
+ 		istmt_seq.objects = NIL;
+ 		foreach(cell, istmt.objects)
+ 	istmt_seq.objects = list_concat(istmt_seq.objects,
+ getOwnedSequences(lfirst_oid(cell)));
+  
+ 		if (istmt_seq.objects != NIL)
+ 			ExecGrantStmt_oids(istmt_seq);
+ 	} 
  }
  
  /*
Index: src/test/regress/expected/dependency.out
===
RCS file: /home/postgres/cvshome/pgsql/src/test/regress/expected/dependency.out,v
retrieving revision 1.7
diff -c -r1.7 dependency.out
*** src/test/regress/expected/dependency.out	3 Jul 2008 15:59:55 -	1.7
--- src/test/regress/expected/dependency.out	11 Jul 2008 16:53:14 -
***
*** 13,22 
  -- can't drop neither because they have privileges somewhere
  DROP USER regression_user;
  ERROR:  role regression_user cannot be dropped because some objects depend on it
! DETAIL:  access to table deptest
  DROP GROUP regression_group;
  ERROR:  role regression_group cannot be dropped because some objects depend on it
! DETAIL:  access to table deptest
  -- if we revoke the privileges we can drop the group
  REVOKE SELECT ON deptest FROM GROUP regression_group;
  DROP GROUP regression_group;
--- 13,24 
  -- can't drop neither because they have privileges somewhere
  DROP USER regression_user;
  ERROR:  role regression_user cannot be dropped because some objects depend on it
! DETAIL:  access to sequence deptest_f1_seq
! access to table deptest
  DROP GROUP regression_group;
  ERROR:  role regression_group cannot be dropped because some objects depend on it
! DETAIL:  access to sequence deptest_f1_seq
! access to table deptest
  -- if we revoke the privileges we can drop the group
  REVOKE SELECT ON deptest FROM GROUP regression_group;
  DROP GROUP regression_group;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] Extending grant insert on tables to sequences

2008-07-12 Thread Abhijit Menon-Sen
At 2008-07-12 14:32:03 -0500, [EMAIL PROTECTED] wrote:

  Should be Granting permissions on a table automatically extends
  those permissions to
 
 what about extends them to...

Yes, sounds fine, thanks.

But I notice that nobody else has commented on whether they want this
feature or not. Does anyone particularly dislike the idea?

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extending grant insert on tables to sequences

2008-07-12 Thread Tom Lane
Jaime Casanova [EMAIL PROTECTED] writes:
 + if (istmt.all_privs)
 + istmt_seq.all_privs = true;
 + else
 + {
 + if (istmt.privileges  ACL_INSERT) 
 + istmt_seq.privileges |= ACL_USAGE;
 + if (istmt.privileges  ACL_UPDATE) 
 + istmt_seq.privileges |= ACL_UPDATE;
 + if (istmt.privileges  ACL_SELECT) 
 + istmt_seq.privileges |= ACL_SELECT;
 + }

This definition of the derived rights seems pretty arbitrary and
unprincipled.  If you can't explain it precisely in a few words in the
documentation (and I notice you didn't even attempt to explain it at
all), then you probably need to think harder.

In particular, I don't see why having UPDATE on the parent table should
grant the right to use setval() on the sequence.  If you had INSERT and
DELETE as well, implying the right to make arbitrary changes in the
parent table, then maybe setval() would be sensible to allow --- but
this code can't really tell that, since it doesn't have a global view
of the privileges previously granted.

What I think makes sense is just to have parent INSERT privilege lead to
USAGE on the sequence (thus granting nextval/currval rights, which are
what you'd typically need in association with trying to do inserts).
I don't see any need to automatically grant more than that.  Selects and
updates on the parent table don't need to touch the sequence, so why are
those privileges granting anything?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extending grant insert on tables to sequences

2008-07-12 Thread Tom Lane
Abhijit Menon-Sen [EMAIL PROTECTED] writes:
 But I notice that nobody else has commented on whether they want this
 feature or not. Does anyone particularly dislike the idea?

I think it's probably reasonable as long as we keep the implicitly
granted rights as narrow as possible.  INSERT on the parent table
would normally be hard to use correctly if you can't nextval() the
sequence, so automatically allowing nextval() seems pretty reasonable.
I think the case for granting anything more than that is weak ---
even without considering backwards-compatibility arguments.

A fairly important practical problem is whether this will keep pg_dump
from correctly reproducing the state of a database.  Assume that someone
did revoke the implicitly-granted rights on the sequence --- would a
dump and reload correctly preserve that state?  It'd depend on the order
in which pg_dump issued the GRANTs, and I'm not at all sure pg_dump
could be relied on to get that right.  (Even if we fixed it to account
for the issue today, what of older dump scripts?)

Another issue is the interaction with the planned column-level GRANT
feature.  AFAICS, the obvious-sounding rule that usage of the sequence
should be granted consequent to granting INSERT on the owning column
would be exactly backwards.  It's when you have *not* got INSERT on
that column that you *must* rely on the default for it, and hence you'd
better have the ability to do nextval() or your alleged insert
privileges on other columns are worthless.  So it seems that sequence
usage should be granted if any column INSERT is granted, and revoked
only when all column INSERT privileges are revoked --- and that latter
rule is going to be hard to implement with this type of patch, because
it doesn't know what column privileges are going to remain.

I thought for a bit about abandoning the proposed implementation and
instead having nextval/currval check at runtime: IOW, if the check for
ACL_USAGE on the sequence fails, look to see if the sequence is owned
and if so look to see if the user has ACL_INSERT on the parent table.
(This seems a bit slow but maybe it wouldn't be a problem, or maybe we
could arrange to cache the lookup results.)  This would avoid the
action at a distance behavior in GRANT and thereby cure both of
the problems mentioned above.  However, it would mean that it'd be
impossible to grant INSERT without effectively granting sequence USAGE
--- revoking USAGE on the sequence wouldn't stop anything.  Plus, \z on
the sequence would fail to tell you about those implicitly held rights.
So I'm not sure I like this way any better.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-12 Thread Jan Urbański
OK, here's the (hopefully final) version of the typanalyze function for 
tsvectors. It applies to HEAD and passes regression tests.


I now plan to move towards a selectivity function that'll use the 
gathered statistics.


Thanks to everyone for their reviews and comments,

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin


gsoc08-tss-04-lc.diff.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread David E. Wheeler

On Jul 12, 2008, at 12:17, Tom Lane wrote:


BTW, I looked at the SQL file (citext.sql.in) a bit.  Some comments:


Thanks a million for these, Tom. I greatly appreciate it.


* An explicit comment explaining that you're piggybacking on the I/O
functions (and some others) for text wouldn't be out of place.


I've added SQL comments. Were you talking about a COMMENT?


* Lose the GRANT EXECUTEs on the I/O functions; they're redundant.
(If you needed them, you'd need them on a lot more than these two.)
I'd be inclined to lose the COMMENTs on the functions too; again
these are about the least useful ones to comment on out of the
whole module.


I wondered about that; those were copied from CITEXT 1. I've removed  
all GRANTs and COMMENTs.



* You should provide binary I/O (send/receive) functions, if you want
this to be an industrial-strength module.  It's easy since you can
piggyback on text's.


I'm confused. Is that not what the citextin and citextout functions are?


* The type declaration needs to say storage = extended, else the type
won't be toastable.


Ah, good, thanks.


* The cast from bpchar to citext cannot be WITHOUT FUNCTION;
it needs to invoke rtrim1.  Compare the bpchar to text cast.


Where do I find that? I have trouble finding the SQL that creates the  
core types. :-(



* = is surely not its own commutator.


Changed to =.


You might try running the
opr_sanity regression test on this module to see if it finds any
other silliness.  (Procedure: insert the citext definition script
into the serial_schedule list just ahead of opr_sanity, run tests,
make sure you understand the reason for any diffs in the opr_sanity
result.  There will be at least one from the uses of text-related
functions for citext.)


Thanks. Added to my list.


* I think you can and should drop all of the size functions and
a lot of the miscellaneous functions: anyplace where the implicit
coercion to text would serve, you don't need a piggyback function,
and introducing one just creates risks of
can't-resolve-ambiguous-function failures.  The overloaded  
miscellaneous

functions are only justifiable to the extent that it's important to
preserve citext-ness of the result of a function, which seems at
best dubious for many of these.  It is likewise pretty pointless  
AFAICS

to introduce regex functions taking citext pattern arguments.


I added most of those as I wrote tests and they failed to find the  
functions. Once I added the functions, they worked. But I'll do an  
audit to make sure that I didn't inadvertantly leave in any unneeded  
ones (I'm happy to have less code :-)).



* Don't use the OPERATOR() notation when you don't need to.
It's just clutter.


Sorry, don't know what you're referring to here. CREATE OPERATOR  
appears to require parens…


Thanks for the great feedback! I'll work on getting things all  
straightened out and a new patch in tonight.


Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread Tom Lane
There was some discussion earlier about whether the proposed regression
tests for citext are suitable for use in contrib or not.  After playing
with them for awhile, I have to come down very firmly on the side of
not.  I have these gripes:

1. The style is gratuitously different from every other regression test
in the system.  This is not a good thing.  If it were an amazingly
better way to do things, then maybe, but as far as I can tell the style
pgTAP forces on you is really pretty darn poorly suited for SQL tests.
You have to contort what could naturally be expressed in SQL as a table
result into a scalar. Plus it's redundant with the expected-output file.

2. It's ridiculously slow; at least a factor of ten slower than doing
equivalent tests directly in SQL.  This is a very bad thing.  Speed of
regression tests matters a lot to those of us who run them a dozen times
per day --- and I do not wish to discourage any developers who don't
work that way from learning better habits ;-)

Because of #1 and #2 I find the use of pgTAP to be a nonstarter.
I have a couple of gripes about the substance of the tests as well:

3. What you are mostly testing is not the behavior of citext itself,
but the behavior of the underlying strcoll function.  This is pretty
much doomed to failure, first because the regression tests are intended
to work in multiple locales (and we are *not* giving that up for
citext), and second because the behavior of strcoll isn't all that
portable across platforms even given allegedly similar locale settings
(as we already found out yesterday).  Sadly, I think you have to give up
attempts to check the interesting multibyte cases and confine yourself
to tests using ASCII strings.

4. A lot of the later test cases are equally uselessly testing whether
piggybacking over text functions works.  The odds of ever finding
anything with those tests are not distinguishable from zero (unless the
underlying text function is busted, which is not your responsibility to
test).  So I don't see any point in putting them into the standard
regression package.  (What maybe *would* be useful to test, but you
didn't, is whether the result of a function is considered citext rather
than text.)


I suggest something more like the attached as a suitable regression
test.  I got bored about halfway through and started to skim, so there
might be a few tests toward the end of the original set that would be
worth transposing into this one, but nothing jumped out at me.

regards, tom lane

--
--  Test citext datatype
--

--
-- first, define the datatype.  Turn off echoing so that expected file
-- does not depend on contents of citext.sql.
--
SET client_min_messages = warning;
\set ECHO none
\i citext.sql
\set ECHO all
RESET client_min_messages;

-- Test the operators and indexing functions

-- Test = and .
SELECT 'a'::citext = 'a'::citext as t;
SELECT 'a'::citext = 'A'::citext as t;
SELECT 'a'::citext = 'A'::text as f;-- text wins the discussion
SELECT 'a'::citext = 'b'::citext as f;
SELECT 'a'::citext = 'ab'::citext as f;
SELECT 'a'::citext  'ab'::citext as t;

-- Test  and =
SELECT 'B'::citext  'a'::citext as t;
SELECT 'b'::citext  'A'::citext as t;
SELECT 'B'::citext  'b'::citext as f;
SELECT 'B'::citext = 'b'::citext as t;

-- Test  and =
SELECT 'a'::citext  'B'::citext as t;
SELECT 'a'::citext = 'B'::citext as t;

-- Test implicit casting. citext casts to text, but not vice-versa.
SELECT 'a'::citext = 'a'::text as t;
SELECT 'A'::text  'a'::citext as t;

SELECT 'aardvark'::citext = 'aardvark'::citext as t;
SELECT 'aardvark'::citext = 'aardVark'::citext as t;

-- Check the citext_cmp() function explicitly.
SELECT citext_cmp('aardvark'::citext, 'aardvark'::citext) as zero;
SELECT citext_cmp('aardvark'::citext, 'aardVark'::citext) as zero;
SELECT citext_cmp('B'::citext, 'a'::citext) as one;

-- Do some tests using a table and index.

CREATE TEMP TABLE try (
name citext PRIMARY KEY
);

INSERT INTO try (name)
VALUES ('a'), ('ab'), ('aba'), ('b'), ('ba'), ('bab'), ('AZ');

SELECT name, 'a' = name from try;
SELECT name, 'a' = name from try where name = 'a';
SELECT name, 'A' = name from try;
SELECT name, 'A' = name from try where name = 'A';
SELECT name, 'A' = name from try where name = 'A';

-- expected failures on duplicate key
INSERT INTO try (name) VALUES ('a');
INSERT INTO try (name) VALUES ('A');
INSERT INTO try (name) VALUES ('aB');

-- Test aggregate functions and sort ordering

CREATE TEMP TABLE srt (
name CITEXT
);

INSERT INTO srt (name)
VALUES ('aardvark'),
   ('AAA'),
   ('aba'),
   ('ABC'),
   ('abd');

-- Check the min() and max() aggregates, with and without index.
set enable_seqscan = off;
SELECT MIN(name) from srt;
SELECT MAX(name) from srt;
reset enable_seqscan;
set enable_indexscan = off;
SELECT MIN(name) from srt;
SELECT MAX(name) from srt;
reset enable_indexscan;

-- Check sorting likewise
set enable_seqscan = off;
SELECT name FROM srt ORDER BY 

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread David E. Wheeler

On Jul 12, 2008, at 14:57, Tom Lane wrote:

There was some discussion earlier about whether the proposed  
regression
tests for citext are suitable for use in contrib or not.  After  
playing

with them for awhile, I have to come down very firmly on the side of
not.  I have these gripes:


You're nothing if not thorough, Tom.

1. The style is gratuitously different from every other regression  
test

in the system.  This is not a good thing.  If it were an amazingly
better way to do things, then maybe, but as far as I can tell the  
style

pgTAP forces on you is really pretty darn poorly suited for SQL tests.
You have to contort what could naturally be expressed in SQL as a  
table
result into a scalar. Plus it's redundant with the expected-output  
file.


Sure. I wasn't aware of the existing regression test methodology when  
I wrote pgTAP and those tests. I'm fine to change them and use pgTAP  
for testing my app code, rather than PostgreSQL itself.



2. It's ridiculously slow; at least a factor of ten slower than doing
equivalent tests directly in SQL.  This is a very bad thing.  Speed of
regression tests matters a lot to those of us who run them a dozen  
times

per day --- and I do not wish to discourage any developers who don't
work that way from learning better habits ;-)


Hrm. I'm wonder why it's so slow? The test functions don't really do a  
lot. Anyway, I agree that they should perform well.



Because of #1 and #2 I find the use of pgTAP to be a nonstarter.
I have a couple of gripes about the substance of the tests as well:

3. What you are mostly testing is not the behavior of citext itself,
but the behavior of the underlying strcoll function.  This is pretty
much doomed to failure, first because the regression tests are  
intended

to work in multiple locales (and we are *not* giving that up for
citext), and second because the behavior of strcoll isn't all that
portable across platforms even given allegedly similar locale settings
(as we already found out yesterday).


Yes, I *just* ran the tests on Ubuntu and opened my mail to ask about  
the bizarre differences when I saw this message. Thanks for answering  
my question before I asked it. :-)



Sadly, I think you have to give up
attempts to check the interesting multibyte cases and confine yourself
to tests using ASCII strings.


Grr. Kind of defeats the purpose. Is there no infrastructure for  
testing multibyte functionality? Are test database clusters all built  
using SQL_ASCII and the C locale?



4. A lot of the later test cases are equally uselessly testing whether
piggybacking over text functions works.  The odds of ever finding
anything with those tests are not distinguishable from zero (unless  
the
underlying text function is busted, which is not your responsibility  
to

test).  So I don't see any point in putting them into the standard
regression package.  (What maybe *would* be useful to test, but you
didn't, is whether the result of a function is considered citext  
rather

than text.)


Well, I was doing test-driven development: I wrote the tests to ensure  
that I had added the functions for CITEXT properly, and when they  
passed, I could move on. As a unit tester, it'd feel weird for me to  
drop these. I'm not testing the underlying functions; I'm making sure  
that they work properly with CITEXT.



I suggest something more like the attached as a suitable regression
test.  I got bored about halfway through and started to skim, so there
might be a few tests toward the end of the original set that would be
worth transposing into this one, but nothing jumped out at me.


Thanks! I'll work this in.

Best,

David

PS: I confirmed late yesterday that the memory leak I saw was with my  
version for 8.3, where I had my own str_tolower(). It clearly has a  
leak that I'll have to fix, but it has no bearing on the contribution  
to 8.4, which has no such leak. Thanks for running the test yourself  
to confirm.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread David E. Wheeler

On Jul 12, 2008, at 15:13, David E. Wheeler wrote:


Sadly, I think you have to give up
attempts to check the interesting multibyte cases and confine  
yourself

to tests using ASCII strings.


Grr. Kind of defeats the purpose. Is there no infrastructure for  
testing multibyte functionality? Are test database clusters all  
built using SQL_ASCII and the C locale?


I just tried to take your modified tests and add multibyte tests that  
run only on OS X with en_US.UTF-8. They worked like this:


CREATE OR REPLACE FUNCTION test_multibyte ()
RETURNS BOOLEAN AS $$
SELECT version() ~ 'apple-darwin'
   AND (select setting from pg_settings where name = 'lc_collate')
 = 'en_US.UTF-8';
$$ LANGUAGE SQL IMMUTABLE;

SELECT 'À'::citext = 'À'::citext WHERE test_multibyte() = true;
SELECT 'À'::citext = 'à'::citext WHERE test_multibyte() = true;

But then I realized that this would change the expected output  
depending on the platform, and thus make the tests fail. This is one  
reason why the inflexibility of the existing regression test model is  
a drag: it limits you to testing only what works everywhere!


Grrr.

I'll remove all the multibyte character tests, but I have to say that,  
as a result, the CITEXT 1 module would likely pass all such tests,   
but it still isn't locale-aware. How can one add regressions to ensure  
that something truly is locale-aware?


Thanks,

David
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread David E. Wheeler

On Jul 12, 2008, at 14:50, David E. Wheeler wrote:


* An explicit comment explaining that you're piggybacking on the I/O
functions (and some others) for text wouldn't be out of place.


I've added SQL comments. Were you talking about a COMMENT?


* Lose the GRANT EXECUTEs on the I/O functions; they're redundant.
(If you needed them, you'd need them on a lot more than these two.)
I'd be inclined to lose the COMMENTs on the functions too; again
these are about the least useful ones to comment on out of the
whole module.


I wondered about that; those were copied from CITEXT 1. I've removed  
all GRANTs and COMMENTs.



* You should provide binary I/O (send/receive) functions, if you want
this to be an industrial-strength module.  It's easy since you can
piggyback on text's.


I'm confused. Is that not what the citextin and citextout functions  
are?



* The type declaration needs to say storage = extended, else the type
won't be toastable.


Ah, good, thanks.


* The cast from bpchar to citext cannot be WITHOUT FUNCTION;
it needs to invoke rtrim1.  Compare the bpchar to text cast.


Where do I find that? I have trouble finding the SQL that creates  
the core types. :-(


Duh, you just told me. Added, thanks.


* = is surely not its own commutator.


Changed to =.


You might try running the
opr_sanity regression test on this module to see if it finds any
other silliness.  (Procedure: insert the citext definition script
into the serial_schedule list just ahead of opr_sanity, run tests,
make sure you understand the reason for any diffs in the opr_sanity
result.  There will be at least one from the uses of text-related
functions for citext.)


Thanks. Added to my list.


* I think you can and should drop all of the size functions and
a lot of the miscellaneous functions: anyplace where the implicit
coercion to text would serve, you don't need a piggyback function,
and introducing one just creates risks of
can't-resolve-ambiguous-function failures.  The overloaded  
miscellaneous

functions are only justifiable to the extent that it's important to
preserve citext-ness of the result of a function, which seems at
best dubious for many of these.  It is likewise pretty pointless  
AFAICS

to introduce regex functions taking citext pattern arguments.


I added most of those as I wrote tests and they failed to find the  
functions. Once I added the functions, they worked. But I'll do an  
audit to make sure that I didn't inadvertantly leave in any unneeded  
ones (I'm happy to have less code :-)).


Of the size functions, I was able to remove only this one and keep all  
of my pgTAP tests passing:


CREATE FUNCTION textlen(citext)
RETURNS int4 AS 'textlen'
LANGUAGE 'internal' IMMUTABLE STRICT;

When I deleted any of the others, I got errors like this:

psql:sql/citext.sql:865: ERROR:  function length(citext) is not unique
LINE 1: SELECT is( length( name ), length(name::text), 'length(' ||...
   ^
HINT:  Could not choose a best candidate function. You might need to  
add explicit type casts.


I think you can see now why I wrote the tests: I wanted to ensure that  
CITEXT really does work (almost) just like TEXT.


I was able to eliminate *all* of the miscellaneous functions, but the  
upper() and lower() functions now return TEXT instead of CITEXT, which  
I don't think is exactly what we want, is it? For now, I'e left  
upper() and lower() in. It just seems like more expected functionality.



* Don't use the OPERATOR() notation when you don't need to.
It's just clutter.


Sorry, don't know what you're referring to here. CREATE OPERATOR  
appears to require parens…


Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers