Re: [pgsql-advocacy] [HACKERS] GSoC on WAL-logging hash indexes
Is there a good reason for this thread being copied to the advocacy list? It seems to me to be on topic just for hackers.

-- Darren Duncan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [pgsql-advocacy] [HACKERS] First draft of update announcement
On 2014-03-18, 2:42 PM, Josh Berkus wrote:
> Other PostgreSQL 9.3 only fixes in this update include:
> * Add read-only data_checksum parameter

I recall being told last fall that this would not be added to 9.3.x (9.3.1 at the time, I think) and only to 9.4.x, because such a feature addition was something only allowed for major releases and not minor ones, which were just supposed to be security and bug fixes. So what changed that it is added in 9.3.x after all?

-- Darren Duncan
Re: [HACKERS] Deprecating RULES
On 2013.08.27 7:57 PM, Tom Lane wrote:
> Darren Duncan <dar...@darrenduncan.net> writes:
>> I have a proposal. Assuming we decide to do away with RULEs,
>
> You lost me already. If we had replacement functionality for everything that can be done with rules, we could start to think about when we might begin to tell people they can't use rules. Unfortunately, we don't have that precondition.
>
> regards, tom lane

That's a really old post/thread, and I'm not arguing for any kind of action related to RULEs; please disregard the message.

-- Darren Duncan
Re: [HACKERS] [PATCH] Exorcise zero-dimensional arrays (Was: Re: Should array_length() Return NULL)
On 2013.06.12 10:11 AM, Josh Berkus wrote:
>> Multidim arrays are why we can't have nice things.
>
> Yeah, I think that was not our best moment. :-( Actually, if you think about it, we've never had arrays in PostgreSQL ... we have always had matrixes. If you think about things that way, most of the current functionality makes sense.

If the type system were changed so that arrays were always just one-dimensional, you could define your matrix simply as a binary relation type whose primary-key attribute has the type of a fixed-length array of integers, where the number of elements in the array is the number of dimensions in the matrix, and the array elements themselves define the coordinates in the matrix. This design confers a number of benefits. Also, the case of the zero-dimension matrix needs no special treatment; the key array has zero elements. Would that not work?

-- Darren Duncan
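[A minimal sketch of that encoding, using a plain table to stand in for the proposed relation type; the table and column names, and the fixed arity of 2, are illustrative assumptions, not anything from the thread:]

```sql
-- A 2-dimensional matrix stored as a binary relation:
-- the coordinate array is the key, the cell value is the other attribute.
CREATE TABLE matrix (
    coords     integer[] NOT NULL,  -- one element per dimension
    cell_value numeric   NOT NULL,
    PRIMARY KEY (coords),
    CHECK (array_length(coords, 1) = 2)  -- fix the number of dimensions
);

INSERT INTO matrix VALUES ('{1,1}', 3.5), ('{1,2}', 7.0);

-- Look up one cell by its coordinates:
SELECT cell_value FROM matrix WHERE coords = '{1,2}';
```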
Re: [HACKERS] Proposal to add --single-row to psql
being too specific to details of one. And of course, cross-invocation of code written in multiple languages is made much easier.

Note, just to be clear: my proposal does not necessitate that all of a node tree be kept in memory at once. This design should be adaptable to a streaming approach, especially as it is expected to handle database dumps or transfers of arbitrary size, same as SQL engines can today. That is in contrast to what most application languages would probably assume, where everything fits in memory at once. But the ability to stream or not would largely be an implementation detail. Realistically, all code should fit in memory at once, and anything that would have to be buffered out of memory would be, say, embedded data literals (whether large strings or simply large relations).

If you're not sure how my proposal would address any of the needs or wants raised in the thread, go ahead and ask, and I will try to answer as time permits.

-- Darren Duncan

On 2013.05.11 9:27 AM, Tom Lane wrote:
> David Fetter <da...@fetter.org> writes:
>> On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
>>> Some kind of extendable parser would be awesome. It would need to tie into the rewriter also. No, I don't have a clue what the design looks like.
>>
>> That's a direction several of the proprietary RDBMS vendors have proposed. I think it'd be great :)
>> Pre-coffee (yeah, I know. Bad idea.) sketch of an idea: create an API to pass expression trees in and out. This could have other benefits as to clustering space, shortening the planning cycle, etc., but let's not go there for now. My knowledge is very, very sketchy, but when I squint, the expression trees we use look a lot like JSON. Are they isomorphic?
>
> By the time you've got an expression tree, the problem is mostly solved, at least so far as parser extension is concerned. More years ago than I care to admit, I worked on systems that had run-time-extensible parsers at Hewlett-Packard, so technology for this does exist. But my (vague) memory of those systems is that the parser's language capabilities were more limited than bison's, perhaps only LL(1). Parsing spec-compatible SQL that way might be a challenge. A larger issue is that if you don't have the whole grammar available to check, it's difficult to be sure there are no parsing conflicts. I seem to remember that we hit some conflicts between different extension ROMs back at HP :-(
> Another point is that extensions that are actually interesting require a lot more than new syntax. Robert mentioned the rewriter, but you'd typically need planner and executor additions as well. It's possible to see how whole new plan node types might be added by a plugin so far as the executor is concerned, but I haven't a clue how we'd get the planner to emit them ...
Re: [HACKERS] missing event trigger support functions in 9.3
On 2013.05.09 10:40 AM, Pavel Stehule wrote:
> I am writing an article about 9.3. I found that the event trigger functions are not complete. We have only the pg_event_trigger_dropped_objects() function. It looks really strange and asymmetric. Can we implement a similar function for CREATE, as a minimum, for 9.3? I am expecting this function should not be too complex - and it can be moved to contrib maybe (if it is too late now).

Really, the touted new event triggers feature in fact only works for dropping objects rather than for all DDL or other events? If so, then the feature seems seriously incomplete. Are drops the main expected use case for the feature, rather than "tell me when something happened"?

-- Darren Duncan
Re: [HACKERS] missing event trigger support functions in 9.3
On 2013.05.09 11:22 AM, Alvaro Herrera wrote:
> Darren Duncan wrote:
>> Really, the touted new event triggers feature in fact only works for dropping objects rather than for all DDL or other events?
>
> It works for all DDL. It provides complete information for drops, not other kinds of commands. This will be expanded later.
>
>> If so, then the feature seems seriously incomplete.
>
> Feel free to contribute during the 9.4 development cycle. Your appreciation for the work that already went into this feature is duly noted.

Sorry if I came across as not appreciating the work already done. I in fact do appreciate it. I made a mistake. Please disregard my initial comment.

-- Darren Duncan
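[For reference, a minimal sketch of the drop-reporting side as it stands in 9.3, using the one support function the thread mentions; the function and trigger names here are illustrative:]

```sql
-- Log every object dropped by any DDL command in this database.
CREATE FUNCTION report_drops() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    obj record;
BEGIN
    -- pg_event_trigger_dropped_objects() is only callable from a
    -- sql_drop event trigger; it returns one row per dropped object.
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects() LOOP
        RAISE NOTICE 'dropped % %', obj.object_type, obj.object_identity;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER log_drops ON sql_drop
    EXECUTE PROCEDURE report_drops();
```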
Re: [HACKERS] adding support for zero-attribute unique/etc keys
On 2013.03.26 1:40 AM, Albe Laurenz wrote:
> Darren Duncan wrote:
>> So, determining if 2 rows are the same involves an iteration of dyadic logical AND over the predicates for each column comparison. Now logical AND has an identity value, which is TRUE, because TRUE AND p (and p AND TRUE) results in p for all p. Therefore, any 2 rows with zero columns each are the same. Since any 2 rows with zero columns are the same, the UNIQUE predicate is FALSE any time there is more than 1 row in a table. Does anyone agree or disagree with this logic?
>
> Yes :^)
> You could use the same kind of argument like this: UNIQUE is true iff any two rows in T satisfy, for each column: the column in row 1 is null OR the column in row 2 is null OR the column in row 1 is distinct from the column in row 2. Now you iterate your logical AND over this predicate for all columns and come up with TRUE, since there are none. Consequently UNIQUE is satisfied, no matter how many rows there are.
> In a nutshell: all members of the empty set satisfy p, but also: all members of the empty set satisfy the negation of p. You can use this technique to make anything plausible.

Consider the context, however. We're talking about a UNIQUE constraint, and so what we want to do is prevent the existence of multiple tuples in a relation that are the same for some defined subset of their attributes. I would argue that logically, and commonsensically, two tuples with no attributes are the same, and hence a set of distinct tuples having zero attributes could have no more than one member, and so a UNIQUE constraint over zero attributes would say the relation can't have more than one tuple. So unless someone wants to argue that two tuples with no attributes are not the same, my interpretation makes more sense and is clearly the one to follow.

-- Darren Duncan
Re: [HACKERS] adding support for zero-attribute unique/etc keys
On 2013.03.25 1:17 AM, Albe Laurenz wrote:
> Darren Duncan wrote:
>> From my usage and http://www.postgresql.org/docs/9.2/interactive/sql-createtable.html I see that Postgres requires constraints like unique (and primary) keys, and foreign keys, to range over at least 1 attribute/column. I propose that this be generalized so that constraints may instead be nullary, that is, range over zero or more attributes/columns instead.
>
> I don't think that a new way to force single-row tables is a good enough use case, and "for completeness' sake" is only a good argument if it simplifies things and makes them more comprehensible, which I don't think is the case here.
> The semantics seem unclear to me: the standard defines UNIQUE on the basis of the UNIQUE predicate:
>     <unique predicate> ::= UNIQUE <table subquery>
> and states:
>     1) Let T be the result of the <table subquery>.
>     2) If there are no two rows in T such that the value of each column in one row is non-null and is not distinct from the value of the corresponding column in the other row, then the result of the <unique predicate> is *True*; otherwise, the result of the <unique predicate> is *False*.
> Since an imagined zero-column query would have an empty set of result columns, you could with equal force argue that these columns satisfy the condition or not, because the members of the empty set have all the properties you desire. So I see no compelling argument that such a UNIQUE constraint would force a single-row table.

I do see that compelling argument, and it has to do with identities.

The above definition of UNIQUE predicate says that the UNIQUE predicate is FALSE iff, for some pair of rows in T, the 2 rows of that pair are the same. Further, 2 rows are the same iff, for every corresponding column, the values in both rows are the same. Further, 2 such values are the same iff they are both not null and are mutually not distinct.

So, determining if 2 rows are the same involves an iteration of dyadic logical AND over the predicates for each column comparison. Now logical AND has an identity value, which is TRUE, because TRUE AND p (and p AND TRUE) results in p for all p. Therefore, any 2 rows with zero columns each are the same. Since any 2 rows with zero columns are the same, the UNIQUE predicate is FALSE any time there is more than 1 row in the table. Hence, a UNIQUE constraint over zero columns signifies a row-comparison predicate that unconditionally results in TRUE, and so no two rows at all would be allowed in the table with that constraint at once, thus restricting the table to at most one row.

Does anyone agree or disagree with this logic? And can the definition of UNIQUE predicate reasonably be read the way I stated? And if the SQL definition is ambiguous, could it reasonably be said that the behavior I stated is reasonable for Postgres and would match the expectations of people using it? Saying 2 empty rows are the same makes as much sense as saying 2 empty arrays are the same, does it not?

> The desired effect can be had today with a unique index:
>     CREATE TABLE singleton (id integer);
>     CREATE UNIQUE INDEX singleton_idx ON singleton((1));

Okay, that is helpful, and less of a kludge than what I was doing, but it is still a kludge compared to what I'm proposing, which I see as elegant.

-- Darren Duncan
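[A sketch of how the constant-expression unique index suggested in the reply enforces the single-row property; the names come from that example, and the failure comment describes expected stock PostgreSQL behavior:]

```sql
CREATE TABLE singleton (id integer);
-- Index every row on the constant 1, so any second row is a duplicate key.
CREATE UNIQUE INDEX singleton_idx ON singleton((1));

INSERT INTO singleton VALUES (42);  -- succeeds
INSERT INTO singleton VALUES (43);  -- fails: duplicate key value violates
                                    -- unique index "singleton_idx"
```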
Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On 2013.03.25 5:55 PM, Craig Ringer wrote:
> On 03/25/2013 10:28 PM, Tom Lane wrote:
>> Yeah, they are, because things break when they're set wrong.
>
> They also make debugging and support harder; you need to get an ever-growing list of GUC values from the user to figure out what their query does. bytea_output, standard_conforming_strings, etc. Yick. That said, I don't have a better answer for introducing non-BC changes.

Given the general trouble GUC values cause, is there a plan to deprecate and remove each of the existing ones over time? As long as post-removal there isn't any actual loss of functionality, but users might have to change their code to do it the one true way, that would seem a good thing.

-- Darren Duncan
Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On 2013.03.25 6:03 PM, Darren Duncan wrote:
> Given the general trouble GUC values cause, is there a plan to deprecate and remove each of the existing ones over time? As long as post-removal there isn't any actual loss of functionality, but users might have to change their code to do it the one true way, that would seem a good thing.

To clarify, I mean GUCs related to backwards-compatibility matters, such as bytea_output or standard_conforming_strings, that is, things that affect the logical behavior of code. I don't mean all GUCs, not at all; most of the ones I know about should remain configurable.

-- Darren Duncan
[HACKERS] adding support for zero-attribute unique/etc keys
From my usage and http://www.postgresql.org/docs/9.2/interactive/sql-createtable.html I see that Postgres requires constraints like unique (and primary) keys, and foreign keys, to range over at least 1 attribute/column. I propose that this be generalized so that constraints may instead be nullary, that is, range over zero or more attributes/columns instead. Supporting this gives us more logical completeness, while not allowing lists of zero attributes is akin to having math without zeroes.

The semantics of having a nullary / zero-attribute unique/primary key on a table is to restrict the table to having at most one row at a time.

Practically speaking, I find that nullary primary/unique keys are useful in the numerous situations where one might want to have a table consisting of just one record (or zero), because it stores data that there is just a single instance of, such as some kinds of config data, rather than a list of such. Such tables let you store such data in a strongly typed and well self-documented manner, same as normal tables. Currently, in order to do this, I have to work around the lack of nullary keys by creating a table with a dummy column having a check constraint saying its value must equal a constant, and have a unary unique/primary key on that. But this is an unpleasant kludge, and one shouldn't have to do that.

Nullary foreign keys can also be useful, or at least should be supported for logical completeness. The semantics of having one is to say that one table may only have rows if another table also has rows.

I was also going to say that it is important to be able to declare a composite type with zero attributes, which is a way to make a singleton type. However, I'm not sure that we don't already support this, so I won't say more on that. The primary request I have is nullary unique/primary key support, which I see has a solid use case.

(Note: logically speaking, if you have a nullary key, it would be the primary key, and all other unique keys are redundant. Except perhaps that others can exist in order to support implementation matters of foreign keys that require their targets to be unique.)

How much work would it be to support this? But also important: does anyone either agree it should be supported, or does anyone want to counter-argue that it shouldn't be supported?

-- Darren Duncan
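[A sketch of the dummy-column workaround described above, for a one-row config table; the table and column names are illustrative assumptions:]

```sql
-- Single-row config table: the dummy key column may only ever hold TRUE,
-- and the primary key makes that one value unique, so at most one row fits.
CREATE TABLE app_config (
    onerow          boolean PRIMARY KEY DEFAULT true CHECK (onerow),
    site_name       text    NOT NULL,
    max_connections integer NOT NULL
);

INSERT INTO app_config (site_name, max_connections) VALUES ('example', 100);
-- Any second INSERT fails with a primary-key violation.
```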
Re: [HACKERS] machine-parseable object descriptions
On 2013.03.18 1:03 PM, Alvaro Herrera wrote:
> For Dimitri's patch to add support for dropped objects in event triggers, there's an open question about how to report objects that are being dropped in a tabular format.

Now that JSON is a built-in type with 9.2+, could we not perhaps use that to represent some things in a more natural manner than a tabular format? JSON is designed to be machine-parseable, and some objects such as routine definitions are naturally trees of arbitrary depth.

-- Darren Duncan
Re: [HACKERS] MySQL search query is not executing in Postgres DB
I agree with Jeff. Options that change the language at initdb or create-database time just fragment the language. It is best to just have 1 language, where options are providable either dynamically per connection or otherwise lexically, so that then they are really just shorthands for the current local usage, and the language as a whole is the same. That also means you can have example code out there and know it will work on any Postgres install, invariant of static global options. If language modifiers are local or lexical, then any example code presumably would include the switches to turn them on for that example.

That all being said, I also think it is best to explicitly overload operators with extra parameter types, such as defining another operator with the signature of (Number, String) with the same base name as string catenation, rather than making numbers implicitly stringify. But I can also accept implicit stringification / language behavior changes if it is a lexical/temporary effect that the same user is still explicitly turning on.

-- Darren Duncan

Jeff Davis wrote:
> On Mon, 2012-12-10 at 14:07 -0500, Robert Haas wrote:
>> And we not only don't give them the behavior they want; we don't even have a meaningful way to give the option of opting into that behavior at initdb or create-database time.
>
> I strongly object to offering options that change the language in such a substantial way. initdb-time options still mean that we are essentially dividing our language, and therefore the applications that support postgres, in half (or worse). One of the things I really like about postgres is that we haven't forked the language with a million options like mysql has. I don't even like the fact that we have a GUC to control the output format of a BYTEA. For every developer who says "wow, that mysql query just worked without modification" there is another one who says "oh, I forgot to test with option XYZ... postgres is too complex to support, I'm going to drop it from the list of supported databases."
> I still don't see a compelling reason why opting out of overloading on a per-function basis won't work. Your objections seemed fairly minor in comparison to how strongly you are advocating this use case. In particular, I didn't get a response to: http://archives.postgresql.org/message-id/1354055056.1766.50.camel@sussancws0025
> For what it's worth, I'm glad that people like you are pushing on these usability issues, because it can be hard for insiders to see them sometimes.
> Regards, Jeff Davis
Re: [HACKERS] feature proposal - triggers by semantics
Craig Ringer wrote:
> On 11/15/2012 06:25 PM, Hannu Krosing wrote:
>> On 11/15/2012 09:48 AM, Craig Ringer wrote:
>>> If you want to prevent TRUNCATE, deny the privilege or add a trigger that aborts the command.
>>
>> You can abort the transaction but not skip the action, as currently it is only possible to skip in ROW-level triggers. So I'd modify this request to allow BEFORE EACH STATEMENT triggers to also be able to silently skip the current action, like BEFORE EACH ROW triggers can. Then this request would simply be satisfied by a simple trigger which rewrites TRUNCATE into DELETE.
>
> That seems sensible to me, too.

To further explain: what I'm desiring here with respect to TRUNCATE is that users are allowed to use TRUNCATE syntax as an alternative to DELETE (they are GRANTed both), but that any triggers defined for DELETE can also be applied to TRUNCATE without too much difficulty. So users can use different syntactic constructs that look similar without having to think about "is this going to be audited".

I understand that ROW-level triggers don't exist yet for TRUNCATE (this is already clearly documented in the manual), and adding them would mean TRUNCATE would do a table scan in their presence. I still think the syntax of TRUNCATE FOR EACH ROW would be useful, but if no one agrees, then I'll just make do with the alternative solutions mentioned here. That is, either a statement-level TRUNCATE trigger of some kind, or simply disallowing TRUNCATE privileges. Thank you.

-- Darren Duncan
Re: [HACKERS] feature proposal - triggers by semantics
Dimitri Fontaine wrote:
> Darren Duncan <dar...@darrenduncan.net> writes:
>> So, I'm partly proposing a specific narrow new feature, TRUNCATE FOR EACH ROW
>
> Kevin has been proposing that we consider an alternative approach in some other cases that I think would work better for you, too. Namely, to have access to OLD and NEW in FOR EACH STATEMENT triggers, where they would be relations rather than records.
>
> Regards,

Yes, I believe that would work very well. In fact, that would provide some power features. I look forward to this; probably the best solution.

-- Darren Duncan
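[For illustration, the relation-valued OLD/NEW idea discussed here eventually shipped in PostgreSQL 10 as transition tables. A minimal sketch of an audit trigger using them; the foo_audit table, its columns, and the trigger/function names are assumptions for the example:]

```sql
-- Statement-level DELETE trigger that sees all deleted rows as a relation.
CREATE FUNCTION audit_deletes() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- old_rows is the transition relation holding every row the
    -- statement deleted, available once per statement.
    INSERT INTO foo_audit (deleted_id, deleted_at)
        SELECT id, now() FROM old_rows;
    RETURN NULL;
END;
$$;

CREATE TRIGGER foo_delete_audit
    AFTER DELETE ON foo
    REFERENCING OLD TABLE AS old_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE audit_deletes();
```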
[HACKERS] feature proposal - triggers by semantics
I have a feature request which at one level should require little code change, but at another level may require more. Since Postgres 9.3 is going to be doing some significant feature additions for triggers, I'd like to see some more.

As they currently exist, triggers always fire based on the particular SQL syntax used, rather than on the semantics of what is actually going on. I would like to see a new class of triggers that fire when particular database operations happen, regardless of what SQL syntax was used.

As a simple example, I'd like to be able to define a trigger like AFTER DELETE ON foo FOR EACH ROW and have that trigger be invoked not only by a DELETE on foo but also by a TRUNCATE on foo. So I would like to do some auditing action when a row of foo is deleted, no matter how it happens. The reason this particular example is important is that TRUNCATE is documented as a data-manipulation action semantically equivalent to an unqualified DELETE in its effects, primarily. As such, I would expect the same triggers to fire as would for an unqualified DELETE. The reason I propose it be a new kind of trigger is so that we also retain the ability to declare triggers that fire on DELETE and not on TRUNCATE.

Less important, but also nice at least for being less verbose, is that said trigger could optionally also run when an UPDATE happens, since an UPDATE can be considered semantically a DELETE+INSERT. But adding the TRUNCATE support is most important, because it simply doesn't exist now, while UPDATE you can get just by adding "or update".

I suggest that the simplest way to add this feature is to just extend the existing syntax for defining a FOR EACH ROW trigger so that TRUNCATE is also an option, besides INSERT/UPDATE/DELETE.

In that case, the semantics of the TRUNCATE statement could be altered as follows: iff TRUNCATE foo is invoked and foo has a TRUNCATE FOR EACH ROW trigger defined on it, then an unqualified DELETE FROM foo will be performed instead, with its usual semantics. If such a trigger is not defined on foo, then the old TRUNCATE semantics happen. As such, this case of the feature can be added without breaking anything legacy.

So, I'm partly proposing a specific narrow new feature, TRUNCATE FOR EACH ROW, but I'm also proposing the ability to generally define triggers based not on the syntax used but on the actual action requested.

A tangential feature request is to provide a runtime config option that can cause TRUNCATE to always behave as unqualified DELETE FROM regardless of any triggers, as if it were just a syntactic shorthand. Or alternately/also provide extra syntax to TRUNCATE itself where one can specify which behavior to have, and both options can be given explicitly to override any config option.

-- Darren Duncan
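[For comparison, statement-level TRUNCATE triggers do already exist (since 8.4); a minimal sketch of the closest thing available today, with illustrative names. Note it fires once per statement and cannot see the individual rows being removed, which is exactly the gap the proposal targets:]

```sql
-- Statement-level trigger fired once when foo is truncated.
CREATE FUNCTION note_truncate() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'table % is being truncated', TG_TABLE_NAME;
    RETURN NULL;  -- return value is ignored for statement-level triggers
END;
$$;

CREATE TRIGGER foo_truncate
    BEFORE TRUNCATE ON foo
    FOR EACH STATEMENT
    EXECUTE PROCEDURE note_truncate();
```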
Re: [HACKERS] Deprecating RULES
Josh Berkus wrote:
>> I don't think you're listening, none of those things are problems and so not user hostile.
>
> Having an upgrade fail for mysterious reasons with a cryptic error message the user doesn't understand isn't user-hostile? Wow, you must have a very understanding group of users. Lemme try to make it clear to you exactly how user-hostile you're being:
> 1. User downloads 9.2 today.
> 2. User builds a new application.
> 3. User finds the doc page on RULEs, decides they're a nifty concept.
> 4. New application includes some RULEs.
> [snip]

I have a proposal. Assuming we decide to do away with RULEs, change the *documentation* for RULEs right away in all supported maintenance branches (including 9.2), saying that RULEs will be deprecated, but don't change any code / add any warnings until 9.3.

Then, no later than the next bug/security-fix minor release, 9.2.2/etc., the documentation for RULEs all says that yes, we have RULEs, but you shouldn't use them on any new projects as they are going away, and you should migrate any existing uses, and uses will warn starting in 9.3.0. This documentation change can also be highlighted in a bullet point in the 9.2.2/etc. release announcements. If necessary, also make reference in the docs to some tool or procedure to help find any uses of RULEs and help with the migration.

Since this isn't a code change, it should be very conservative and safe to include in maintenance branches, and it will alert users right where they're most likely to look, the RULEs documentation, without any undue delay. How does that sound?

-- Darren Duncan
Re: [HACKERS] Successor of MD5 authentication, let's use SCRAM
Stephen Frost wrote:
> * Josh Berkus (j...@agliodbs.com) wrote:
>> Problem is, the fact that setting up SSL correctly is hard is outside of our control.
>
> Agreed, though the packagers do make it easier..
>
>> Unless we can give people a "run these three commands on each server and you're now SSL authenticating" script, we can continue to expect the majority of users not to use SSL. And I don't think that level of simplicity is even theoretically possible.
>
> The Debian-based packages do quite a bit to ease this pain. Do the other distributions do anything to set up SSL certificates, etc. on install? Perhaps they could be convinced to?

This has bitten me. At my work we started a project on Debian, using the http://packages.debian.org/squeeze-backports/ version of Postgres 9.1, and it included the SSL out of the box; just install that regular Postgres or Pg client package and SSL was ready to go.

And now we're migrating to Red Hat for the production launch, using the http://www.postgresql.org/download/linux/redhat/ packages for Postgres 9.1, and these do *not* include the SSL. This change has been a pain, as we then disabled SSL when we otherwise would have used it. (Though all database access would be over a private server-server network, so the situation isn't as bad as going over the public internet.)

How much trouble would it be to make the http://www.postgresql.org/download/linux/redhat/ packages include SSL?

-- Darren Duncan
Re: [HACKERS] Successor of MD5 authentication, let's use SCRAM
John R Pierce wrote:
> On 10/12/12 9:00 PM, Darren Duncan wrote:
>> And now we're migrating to Red Hat for the production launch, using the http://www.postgresql.org/download/linux/redhat/ packages for Postgres 9.1, and these do *not* include the SSL.
>
> hmm? I'm using the 9.1 for CentOS 6 (RHEL 6) and libpq.so certainly has libssl3.so, etc. as references. ditto the postmaster/postgres main program has libssl3.so too. maybe your certificate chains don't come pre-built, I dunno, I haven't dealt with that end of things.

Okay, I'll have to look into that. All I know is that out of the box SSL just worked on Debian and it didn't on Red Hat; trying to enable SSL on out-of-the-box Postgres on Red Hat gave a fatal error on server start, at the very least needing the installation of SSL keys/certs, which I didn't have to do on Debian.

-- Darren Duncan
Re: [HACKERS] Deprecating RULES
Josh Berkus wrote: For 9.3, I suggest we create a DDL trigger by default which prevents RULEs and throws an ERROR that explains they are now deprecated. Well, even if we were considering this, the sequence would need to be: 1. Announce in 9.3 that RULES will be going away RSN. 2. In 9.4, send a warning every time someone loads/edits a user-defined RULE. 3. In 10.0, get rid of CREATE RULE. I think we can easily move up the first 2 steps: 1. Announce right now, effectively in 9.2, that RULES are going away soon. 2. In 9.3, send the warning. Then, optionally, 3. 9.4 can be where CREATE RULE is removed, or we stay with 10.0 and have a solid 2 years of warnings instead of one. It seems to me that step 1 is completely outside the release cycle, as it doesn't involve changing any code. Since 9.2 just came out, we can just do #1 as of 9.2. The only reason I see to delay #1 is if we aren't sure we're going to go ahead with it, and want to give a few months to think about it before announcing this major thing suddenly. Waiting until 9.3 just to make an announcement is silly. -- Darren Duncan
Re: is JSON really a type (Re: [HACKERS] data to json enhancements)
Hannu Krosing wrote: Reflecting over the dual possible interpretation of what it does mean to convert between text and json data types, it has dawned on me that the confusion may come mainly from wanting json to be two things at once: 1. - a serialisation of a subset of javascript objects to a string. 2. - a dynamic type represented by the above serialisation. case 1 -- If we stick with interpretation 1. then json datatype is really no more than a domain based on text type and having a CHECK is_valid_json() constraint. snip case 2 -- My suggestions on using typecasts for convert-to-json were the result of this interpretation of json-as-dynamic-type. Having thought more of this I now think that we probably should leave JSON alone and develop a separate dynamic type here. snip I think it would be best to have 2 main JSON-concerning data types: 1. A proper subset of text consisting of every value meeting some is_valid_json() constraint, as a DOMAIN; every value of this type is a text value. 2. A type that is disjoint from text, that is, no value of this type would compare as equal to any text value. It would be considered a collection type, similarly to how an array or tuple or relation is, but having arbitrary depth and being heterogeneous in the general case. You could say that #1 is to textual source code what #2 is to a parsed syntax tree of that code. Or that #1 is to textual XML what #2 is to an XML DOM. It would be type #2 above that is the primary JSON type, which has all the special operators for working with JSON, while type #1 would be opaque, just a character string, and must be cast as type #2 in order to use any special operators on it. Similarly, all the converting operators between other types and JSON would be with #2 only, and producing #1 must go through #2. So call #1 say JSON_source and #2 say JSON_model, or JSON_text and JSON respectively. That's how I think it should work. 
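To make the two-type split concrete, here is a minimal Python sketch of the idea. The names (JSON_text as a validated string, JsonModel as the parsed type) and all methods are illustrative only, not any actual Postgres API: #1 is just text that passes a CHECK, #2 is a parsed value deliberately disjoint from strings, and the operators live on #2.

```python
import json

def is_valid_json(s):
    # the CHECK constraint behind the proposed JSON_text domain (#1);
    # a value of that domain is just a str that passes this test
    try:
        json.loads(s)
        return True
    except ValueError:
        return False

class JsonModel:
    # the parsed "syntax tree" type (#2); deliberately not a str subclass,
    # so no JsonModel ever compares equal to a text value
    def __init__(self, value):
        self.value = value

    @classmethod
    def from_text(cls, s):
        # cast #1 -> #2 (the only way into the model type)
        return cls(json.loads(s))

    def to_text(self):
        # producing #1 goes through #2
        return json.dumps(self.value)

    def get(self, key):
        # a "special operator" defined only on the model type
        return self.value[key]

    def __eq__(self, other):
        return isinstance(other, JsonModel) and self.value == other.value

# usage: text in, model operations, text back out
m = JsonModel.from_text('{"a": 1}')
```

Note how `m == '{"a": 1}'` is False by construction: the model type is disjoint from text, exactly as proposed for type #2.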
-- Darren Duncan
Re: [HACKERS] Raise a WARNING if a REVOKE affects nothing?
That sounds like a good change to me. -- Darren Duncan Craig Ringer wrote: Hi all I'm seeing lots of confusion from people about why: REVOKE CONNECT ON DATABASE foo FROM someuser; doesn't stop them connecting. Users seem to struggle to understand that: - There's a default GRANT to public; and - REVOKE removes existing permissions, it doesn't add deny rules It'd really help if REVOKE consistently raised warnings when it didn't actually revoke anything. Even better, a special case for REVOKEs on objects that only have owner and public permissions could say: WARNING: REVOKE didn't remove any permissions for user blah. This table/db/whatever has default permissions, so there were no GRANTs for user blah to revoke. See the documentation for REVOKE for more information. Opinions? -- Craig Ringer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proof of concept: auto updatable views
My thoughts on this are that it would be a very valuable feature to have, and would make Postgres views behave more like they were always intended to behave: indistinguishable from tables in behavior wherever possible, with the reverse mapping derived automatically by the DBMS from only the view-defining SELECT, where possible. -- Darren Duncan Dean Rasheed wrote: I've been playing around with the idea of supporting automatically updatable views, and I have a working proof of concept. I've taken a different approach than the previous attempts to implement this feature (e.g., http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php), instead doing all the work in the rewriter, substituting the view for its base relation rather than attempting to auto-generate any rules or triggers. Basically what it does is this: in the first stage of query rewriting, just after any non-SELECT rules are applied, the new code kicks in - if the target relation is a view, and there were no unqualified INSTEAD rules, and there are no INSTEAD OF triggers, it tests if the view is simply updatable. If so, the target view is replaced by its base relation and columns are re-mapped. Then the remainder of the rewriting process continues, recursively handling any further non-SELECT rules or additional simply updatable views. This handles the case of views on top of views, with or without rules and/or triggers. snip Obviously there's still more work to do but the early signs seem to be encouraging. Thoughts?
Re: [HACKERS] relation complex types
Jaime Casanova wrote: I knew that we create an entry in pg_type for every table we create; what I didn't know is that we actually create 2 entries. For example, CREATE TABLE foo (i int); will create types foo and _foo. So, any reason to create 2 entries? I don't know offhand; maybe foo is the corresponding row/tuple type and _foo its array type? Anyway, what really caught my attention is that CREATE SEQUENCE also creates an entry in pg_type. Is there any reason for that? This, I'm fairly sure, is due to a sequence generator being implemented as a table/relation-typed variable, so it has a corresponding type like the regular table/relation-typed variables in the database. -- Darren Duncan
Re: [HACKERS] transformations between types and languages
Peter Eisentraut wrote: Here is a draft design for the transforms feature, which I'd like to work on. The purpose of this is to allow adapting types to languages. The most popular case is to enable converting hstore to something useful like a dict or a hash in PL/Python or PL/Perl, respectively. In general, the type and the language don't know of each other, and neither needs to be in core. Maybe you want to adapt PostGIS types to pygeometry objects in PL/Python (made up example, but you get the idea). This is a good idea in principle. I expect we should be able to use the same syntax both for system-defined types and user-defined types. I would expect, though, that in some common cases one can't avoid, say, having to call hstore_to_plpython() directly, in order to disambiguate, and we may want to provide terser syntax for using the desired TRANSFORM. For example, if we have a Perl 5 hash, that could reasonably map either to an hstore or to a tuple. Or a Perl 5 string with a false utf8 flag could map to either a character string or a byte string. Or a Perl 5 empty string (result of 1==0) could map to the false Boolean. Or a Perl 5 string that looks like a number could map to either a character string or some kind of numeric. Or a Perl 5 number 1 (result of 1==1) could map to either a numeric 1 or the true Boolean. Or we have to tighten the conversion rules so that things which are sometimes equivalent and sometimes not on one side have different interpretations in the transform. Ideally the feature would also work not only for interfacing with PLs but also with client languages, since conceptually it's alike, just differing in who calls whom. -- Darren Duncan
Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join
Your proposal makes me think of something similar which might be useful, INclusion constraints. As exclusion constraints might be thought of like a generalization of unique/key constraints, inclusion constraints are like a generalization of foreign key constraints. The inclusion constraints basically allow some comparison operator other than is-equal to test if values in one table match values in another table, and the constraint allows the former if the test results in true. An example of said inclusion test is whether the range in one table is contained in a range in another table. I assume/hope that, similarly, now that we have range types in 9.2, that the existing exclusion constraints can be used with range comparison operators. As to your actual proposal, it sounds like a generalization of the relational join or set intersection operator where instead of comparing sets defined in terms of an enumeration of discrete values we are comparing sets defined by a range, which conceptually have infinite values depending on the data type the range is defined over. But if we're doing this, then it would seem to make sense to go further and see if we have set analogies for all of our relational or set operators, should we want to do work with non-discrete sets. Now this sounds interesting in theory, but I would also assume that these could be implemented by an extension in terms of existing normal relational operators, where each range value is a discrete value, combined with operators for unioning or differencing etc ranges. A relation of ranges effectively can represent a discontinuous range; in that case, the empty discontinuous range is also canonically representable by a relation with zero tuples. Jeff, I get the impression your proposal is partly about helping performance by supporting this internally, rather than one just defining it as a SQL function, am I right? -- Darren Duncan Jeff Davis wrote: I hope this is not an inappropriate time for 9.3 discussions. 
The flip side of asking for submissions in the first couple commitfests means that I need to submit proposals now. What is a Range Join? See attached SQL for example. The key difference is that the join condition is not equality, but overlaps (&&). Problem statement: slow. Nested loops are the only option, although they can benefit from an inner GiST index if available. But if the join is happening up in the plan tree somewhere, then it's impossible for any index to be available. Proposed solution: a modified merge join that can handle ranges. 1. Order the ranges on both sides by the lower bound, then upper bound. Empty ranges can be excluded entirely. 2. Left := first range on left, Right := first range on right 3. If Left or Right is empty, terminate. 4. If lower(Left) > upper(Right), discard Right, goto 2 5. If lower(Right) > upper(Left), discard Left, goto 2 6. return (Left, Right) as joined tuple 7. Right := next range on right 8. goto 3 If step 4 or step 5 keeps getting triggered, and a btree index is available (ordered by lower bound), we can re-probe to go to the correct position, and consider that the new top range on that side. This is an optimization for the case where there are large numbers of ranges with no match on the other side. Thanks to Nathan Boley for helping me devise this algorithm. However, any bugs are mine alone ;) Weaknesses: I haven't thought through the optimization, but I suspect it will be hard to be very accurate in the costing. That might be OK, because there aren't very many options anyway, but I'll need to think it through. Questions: * Is this idea sane? -- that is, are ranges important enough that people are willing to maintain a new operator? * The more general problem might be spatial joins which can operate in N dimensions, and I doubt this would work very well in that case. Does someone know of a spatial join algorithm (without IP claims) that would be as good as this one for ranges? * Other thoughts? 
Regards, Jeff Davis
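For anyone wanting to experiment with the idea, here is a rough Python sketch of a sort-based range-overlap join over closed integer ranges. It is an assumption-laden simplification of the algorithm sketched in the proposal: it keeps a "mark" into the sorted right side instead of the discard/re-probe steps, and a real executor would stream joined tuples rather than build a list.

```python
def overlaps(a, b):
    # closed ranges: [a0, a1] && [b0, b1]
    return a[0] <= b[1] and b[0] <= a[1]

def range_merge_join(lefts, rights):
    # Step 1: exclude empty ranges, order by (lower, upper)
    lefts = sorted(r for r in lefts if r[0] <= r[1])
    rights = sorted(r for r in rights if r[0] <= r[1])
    out = []
    mark = 0  # first right-side range that could still match a future left
    for left in lefts:
        # Rights ending before this left begins can never match again,
        # because lefts are ordered by lower bound (the "discard" steps).
        while mark < len(rights) and rights[mark][1] < left[0]:
            mark += 1
        i = mark
        # Rights starting after this left ends cannot match it (sorted input).
        while i < len(rights) and rights[i][0] <= left[1]:
            if overlaps(left, rights[i]):
                out.append((left, rights[i]))
            i += 1
    return out

pairs = range_merge_join([(1, 3), (5, 9)], [(2, 4), (8, 10), (11, 12)])
```

The mark only ever advances, so each right-side range is skipped at most once; the inner rescan from the mark is what the btree re-probe in the proposal would accelerate.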
Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?
Pavel Stehule wrote: 2011/10/30 Darren Duncan dar...@darrenduncan.net: I agree that this feature would be quite useful and should be included in SQL. The exact syntax is less of an issue, but just the ability to cleanly say select all columns except for these. I have in fact argued for the same feature in the past. If you want to and can implement this feature then more power to you. I'll look forward to it being in Pg 9.2. I think then the only discussion point should be what (terse) syntax to use for it, not whether the feature should exist at all. Arguing against this feature is like arguing against supporting where not() or except or not in. One should be able to do complements not only of rows but of columns too. Basic good language design. In my practice this is not true - I know of this need only from badly designed or very badly designed projects. I don't see any reason to do it at the SQL level. It can make sense only in psql as a special filter - if we want to enhance the reporting features there. The SQL level is exactly the correct and proper place to do this. It's all about mathematical parity. That is the primary reason to do it. - SELECT * gives you a whole set. - SELECT foo, bar gives you a subset of that. - SELECT ALL BUT foo, bar gives you the complementary subset. There's a variety of uses for specifying complementary subsets, and when the clean syntax is available, people will start using it for cleaner code, even if they previously had workarounds. The complementary subset should be implemented in exactly the same place and manner as the forward subset, on the SQL level. Arguing against this is like arguing against a subtraction operator, because we can emulate it using addition plus negation, or saying subtraction should just be a special filter in a client app. -- Darren Duncan
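The parity argument can be illustrated with a tiny Python model of projection, in which the complementary form is exactly as fundamental as the forward form; the function names here are purely illustrative, not any proposed SQL syntax.

```python
def project(row, keep):
    # SELECT foo, bar -- the forward subset of columns
    return {c: row[c] for c in keep}

def project_all_but(row, drop):
    # SELECT ALL BUT foo, bar -- the complementary subset
    return {c: v for c, v in row.items() if c not in drop}

row = {"id": 1, "name": "x", "big_doc": "...lots of text..."}
```

Either call produces the same result for complementary column lists, which is the point: listing what to keep and listing what to drop are two spellings of one projection operation.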
Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?
I think the loose consensus I've picked up from people in this thread is that the ability to tersely specify a complementary subset of columns is something that is simple enough and wouldn't hurt us to have, but that its utility is limited such that a lot of people wouldn't want to do the work to implement it either. Eric B. Ridge wrote: Anyways, looks like it might be an uphill battle to get the idea accepted (let alone any code I write!), but I ain't givin' up just yet. I think this is the bottom line here. The real question to ask ourselves is: if Eric Ridge is willing to do all the work to implement this feature, and the code quality is up to the community standards and doesn't break anything else, then will the code be accepted? It's one thing to argue whether a new small feature is useful enough to go to the trouble to implement, and it's another thing to argue whether that feature is harmful enough to reject a free working implementation (of otherwise conforming code quality) from someone who has already gone to the trouble to implement it. Eric, if you want to implement this, I say more power to you, and I will use it. -- Darren Duncan
Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?
David Wilson wrote: On Sun, Oct 30, 2011 at 1:10 AM, Darren Duncan dar...@darrenduncan.net wrote: The SQL level is exactly the correct and proper place to do this. Its all about mathematical parity. That is the primary reason to do it. - SELECT * gives you a whole set. - SELECT foo, bar gives you a subset of that. - SELECT ALL BUT foo, bar gives you the complementary subset. That's not actually entirely true given the usual SQL (and mathematical) meaning of set. This feature relates to the set of attributes returned regarding elements of the returned set, not the set itself- the actual returned set is identical regardless of the column-specifier formulation. Claiming this as an SQL mathematical purity issue is a bit disingenuous, as SQL set manipulation takes place at the member level rather than the attribute level- SQL is otherwise quite explicit about requiring explicit listings of the attributes that the client is interested in regarding a returned set of member rows. SQL rowsets/tables are distinctly sets across two dimensions, both across columns and across rows. You have sets of rows and sets of columns at the same time. And it is useful to slice the rowset/table along either or both dimension, wherein you produce a subset in that dimension. We presently have the terse syntax for specifying both the subset we do want and the subset we want the complement of, for rows, but not for columns. It is true that columns and rows are not the same, but they are both still sets. Arguing against this is like arguing against a subtraction operator, because we can emulate using addition plus negation, or saying subtraction should just be a special filter in a client app. That would be true if this was an argument against WHERE or EXCEPT. Column specification and row specification are very different and cannot be conflated. Well I raised the WHERE/EXCEPT analogy in my initial reply. Not conflating rows and columns is why we have different syntax to work with them. 
That's not to say this proposal is without merit, merely that your arguments for it are poorly founded and not particularly relevant. I disagree, but regardless, other arguments have been made for the feature that are more based in utility, and I agree with those: having the feature can save a lot of users a lot of work. -- Darren Duncan
Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?
Tom Lane wrote: Darren Duncan dar...@darrenduncan.net writes: The real question to ask ourselves is, if Eric Ridge is willing to do all the work to implement this feature, and the code quality is up to the community standards and doesn't break anything else, then will the code be accepted? It's entirely possible that it will get bounced on standards-compliance grounds. In particular, I don't think it's acceptable to introduce a new reserved keyword for this --- that would fall under the fails to not break anything else category. regards, tom lane Well then we come up with a (SQL-level) syntax for the feature that doesn't introduce new reserved keywords. As I said before, the important thing is to have the feature, and that the exact syntax is the main point to discuss. Postgres already has a number of syntactic features that aren't in the SQL standard and coexist, and so we add one of those. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?
Eric Ridge wrote: On Sun, Oct 30, 2011 at 4:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: That's just a gut feeling, I've not tried it ... but the proposed syntax sure looks a lot like a call to a function named EXCLUDING. I think what makes it okay is that its new use is only defined to immediately follow an asterisk in the target_el production. If you look at gram.y:11578 (from git HEAD), I was thinking this: snip That's an interesting thought. I had previously envisioned EXCLUDING to appear exactly once in the SELECT list, where it functions as a column analogy to EXCEPT for rows. So you'd say: SELECT list of stuff to keep EXCLUDING list of stuff to skip FROM ... But then your comment makes me think that your EXCLUDING actually could have a much tighter scope, and as a result might appear multiple times, like AS can: SELECT foo.* EXCLUDING foo.x, bar.* EXCLUDING bar.y, baz.z, (a+b) AS c FROM ... So if you do this, EXCLUDING has syntactic precedence similar to AS, and while AS is only valid not following a *, EXCLUDING is only valid following a *, and so EXCLUDING directly modifies a single * and not the SELECT list in general. Is that where you're going with this? If so, I think that would make the feature even more valuable and more syntactically clean than I had previously thought. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?
Eric Ridge wrote: I don't actually like the term EXCLUDING, but it conveys what's happening and is already defined as a keyword. I thought about EXCEPT, but that doesn't work for obvious reasons, and NOT might just be confusing. How about BUT? Is that already in use by something? It's nice and short and conveys the "except" meaning. And there is already precedent for using that word for this purpose. CJ Date already uses ALL BUT in his literature as a modifier to his illustrative relation projection syntax to give the complementary projection, as with r{x,y} vs r{all but x,y}. Also, a more tenuous connection: Larry Wall likes "but" as a logical modifier. -- Darren Duncan
Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?
Mark Mielke wrote: On 10/30/2011 03:50 PM, Eric Ridge wrote: Changes of omission can break your code just as easily. I think I wasn't as clear as I intended. In many ways, I think use of * in the first place is wrong for code (despite that I do it as well). Therefore, * EXCLUDING (...) would also be wrong. It comes to does the code know what it wants? snip select * is not deterministic from a programming perspective. I understand what you're saying. However, we're stuck with * because it is in the standard and is widely used, and if we have * anyway, then the exclusion proposal is just an enhancement to that. So there is no reason to reject the complementary columns feature because of the problems with select *; you might as well argue to get rid of select *. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?
I agree that this feature would be quite useful and should be included in SQL. The exact syntax is less of an issue, but just the ability to cleanly say select all columns except for these. I have in fact argued for the same feature in the past. If you want to and can implement this feature then more power to you. I'll look forward to it being in Pg 9.2. I think then the only discussion point should be what (terse) syntax to use for it, not whether the feature should exist at all. Arguing against this feature is like arguing against supporting where not() or except or not in. One should be able to do complements not only of rows but of columns too. Basic good language design. -- Darren Duncan Eric Ridge wrote: Would y'all accept a patch that extended the SELECT * syntax to let you list fields to exclude from the A_Star? Quite regularly I'll be testing queries via psql and want to see all the columns from a fairly wide table except maybe a giant text or xml column. A syntax like: SELECT * EXCLUDING (big_col1, big_col2) FROM foo; would be pretty handy. It would definitely save some typing in certain cases. It seems like such a syntax would better document the intent of a query too, rather than leaving one wondering if big_col1 was supposed to be omitted from the target list or not. Anyways, I just wanted to run the idea by youse guys before I put too much more effort into it. I've already made what appear to be the minimum necessary changes to gram.y, and a few quick greps through the code make me think the rest will be pretty easy. Maybe the SQL spec says something about this and nobody's done the work yet? Thanks for your input! eric -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Back-branch releases upcoming this week
Tom Lane wrote: Dave Page dp...@pgadmin.org writes: On Tue, Sep 20, 2011 at 12:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: As has been mentioned a couple times, we're well overdue for updates of the back branches. Seems like time to get that done, so we'll be wrapping 8.2.x and up this Thursday for release Monday the 26th. 8.2 up, including 9.1.1? I'm not sure our QA guys will be able to cope with verification of so many individual installers in that timeframe - 8.2 - 9.0 is hard enough to do in one go. Well, all the pre-9.1 branches are definitely badly in need of updates. 9.1 maybe could go without at this point, but we do have one crasher bug and one serious memory leak fixed there, neither new in 9.1. I'd just as soon not establish a precedent for not releasing the same fixes at the same time in all branches. How about we wrap them all, but you could let your team slip the 9.1 update for a day or so if they need more time? It's certainly less critical than the older branches. I would prefer that all branches have synchronized patch releases as they seem to have had in the past, and that the latest production release is included, 9.1.1 in this case, even if its change set is more minor. -- Darren Duncan
Re: [HACKERS] Is there really no interest in SQL Standard?
FYI, one of the main goals of the Muldis D language is to be an open source SQL standard. It is intended to satisfy both relational and NoSQL folks, and predates UnQL significantly. Muldis D has always been published openly and is comprehensive enough to cover anything that SQL does, and anyone is welcome to improve it. Moreover, this standard has built-in resilience against embrace, extend and extinguish by including explicit versioning with authorities (Perl 6 inspired that feature), so that if anyone forks the language, it is possible for the different versions to be easily distinguishable and non-conflicting, and in a more benign respect it is designed to be extensible so DBMSs can be free to evolve under it, adding unique features, while not causing compatibility conflicts with other DBMSs in the process. Note that I have fallen behind in specifying a number of intended significant design improvements/simplifications to the spec proper, though much of this is hashed out in the laundry list TODO_DRAFT file in github. -- Darren Duncan Joe Abbate wrote: On 09/19/2011 12:40 PM, Christopher Browne wrote: On Mon, Sep 19, 2011 at 12:20 PM, David Fetter da...@fetter.org wrote: Actually, I think it *is* a bad idea, as it would require construction from whole cloth of kinds of mostly political infrastructure that we don't have, as a community and aren't necessarily notably competent to construct. The nearest sort of thing that *could* conceivably be sensible would be to participate in UnQL http://www.unqlspec.org/display/UnQL/Home. That's early enough in its process that it's likely somewhat guidable, and, with the popularity of NoSQL, being at the ground breaking of a common query language to access that would likely be useful to us. If we wanted to start a new standards process, I imagine it would best involve embracing truly relational, stepping back to PostQUEL, and promoting a standard based on something off more in that direction. 
If I were looking for something truly relational I wouldn't go towards JSON or NoSQL; I'd go with something like Dee (http://www.quicksort.co.uk/), which IIRC was interested in building a PostgreSQL interface. As much as that might sound like a terrible idea, trying to take over SQL by forking it strikes me as a much *worse* idea. My intention was not to take over anything. I only think it may be useful to discuss SQL features, informally or otherwise, with other open source competitors such as SQLite, MySQL (brethren), Firebird, etc., and Josh, having been close to the MySQL camp (even physically, from what I recall :-) is possibly well suited to start that discussion. Joe
Re: [HACKERS] PL/Perl Returned Array
David E. Wheeler wrote: On Aug 12, 2011, at 6:17 PM, Alex Hunsaker wrote: Anyway, the attached patch fixes it for me. That is when we don't have an array state, just return an empty array. (Also adds some additional comments) Fix confirmed, thank you! +1 to getting this committed before the next beta/RC. Policy question. If this problem hadn't been detected until after 9.1 was declared production, would it have been considered a bug to fix in 9.1.x or would it have been delayed to 9.2 since that fix might be considered an incompatibility? If the latter, then I'm really glad it was found now. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions
Christopher Browne wrote: Vis-a-vis the attempt to do nested naming, that is ns1.ns2.table1, there's a pretty good reason NOT to support that, namely that this breaks relational handling of tables. PostgreSQL is a *relational* database system, hence it's preferable for structures to be relational, as opposed to hierarchical, which is what any of the suggested nestings are. I won't argue with whether or not nested naming is a good idea, but I will argue with your other comment about breaking relational handling. A relational database is a database in which all data is kept in relation-typed variables, which SQL calls tables, and you can perform all queries and updates with just relation-valued expressions and statements. Organizing the tables into a multi-level namespace, either fixed-depth or variable-depth, rather than using a flat namespace, does not make the database any less relational, because the above definition and any others still hold. The less relational argument above is a red herring or distraction. One can argue against namespace nesting just fine without saying that. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions
I will put my support behind David Johnston's proposal, in principle, though minor details of syntax could be changed if using ! conflicts with something.

-- Darren Duncan

David Johnston wrote:
> On Mon, Jul 11, 2011 at 10:12 AM, Florian Pflug f...@phlo.org wrote:
>> On Jul11, 2011, at 07:08 , Darren Duncan wrote:
>>> Christopher Browne wrote:
>>>> Vis-a-vis the attempt to do nested naming, that is ns1.ns2.table1, there's a pretty good reason NOT to support that, namely that this breaks relational handling of tables. PostgreSQL is a *relational* database system, hence it's preferable for structures to be relational, as opposed to hierarchical, which is what any of the suggested nestings are.
>>
>> Rather, the argument is that it was intentional for the structuring of table naming to, itself, be relational, and changing that definitely has some undesirable characteristics. The need for recursive queries is the most obvious undesirable, but it's not the only undesirable thing, by any means.
>
> I do not see how recursive queries (really, iteration of records) even enter the picture...
>
> Right now I can emulate a hierarchical schema structure via a naming scheme - for example, schemabase_sub1_sub2_etc. I am simply looking for a formal way to do the above AND also tell the system that I want all schemas under schemabase to be in the search path. Heck, I guess just allowing simple pattern matching in search_path would be useful in this case regardless of the presence of an actual schema hierarchy. Using LIKE syntax, say:
>
>   SET search_path TO schemabase_sub1_%
>
> or something similar. The only missing ability then becomes a way for graphical tools to represent the schema hierarchy using a tree structure with multiple depths. I can see how adding . and .. and relative paths would confuse the issue; those are not necessary features of a multi-level schema depth.
>
> The above, combined with a different separator for intra-level namespace/schema delineation, would allow for an unambiguous way to define and use a hierarchical schema with seemingly minimal invasion into the current way of doing things. You could almost implement it just by requiring a specific character to act as the separator, and then construct the actual schema using single-level literals and supporting functions that can convert them into a hierarchy. In other words, the schema table would still only contain one field holding the full parent!child, as opposed to (schema, parent) with (VALUES('parent',null),('child','parent')).
>
> In other words, if we use ! as the separator, any schema named parent!child could be stored and referenced as such, but then if you run a getChildren(parent) function it would return child along with any other schemas of the form parent!%. In this case the % sign could maybe match everything except !, and the * symbol could be used to match ! as well. I could give more examples, but I hope the basic idea is obvious. The main thing is that the namespace hierarchy usage is standardized in such a way that pgAdmin and other GUI tools can reliably use it for display purposes, and that search_path can be constructed in a more compact format so that every schema and sub-schema is still absolutely referenced (you can even have the SET command resolve search_path at execution time and then remain static, just like CREATE VIEW ... AS SELECT * FROM table).
>
> David J.
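The `!`-separator scheme above can be simulated outside the database. The following Python sketch is purely my illustration of the proposal (the function names `pattern_to_regex` and `get_children` are hypothetical, not anything PostgreSQL provides): `%` matches any run of characters within one level, while `*` also crosses `!` boundaries.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # Translate the proposed wildcard syntax into a regular expression:
    # "%" stays within one hierarchy level, "*" may descend into sub-schemas.
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append("[^!]*")   # anything except the "!" separator
        elif ch == "*":
            parts.append(".*")      # anything, including "!"
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$")

def get_children(schemas, parent):
    # Direct children only: parent!<exactly one more level>
    rx = pattern_to_regex(parent + "!%")
    return sorted(s for s in schemas if rx.match(s))

schemas = ["parent", "parent!child", "parent!child!grand", "other"]
print(get_children(schemas, "parent"))   # ['parent!child']
```

A search_path pattern like `parent!*` would then select every descendant, while `parent!%` selects only the immediate children.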
Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions
Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> If for some reason we needed to have tables that happened to be called x.y.z and a.b.c accessible from a single SQL session, we could allow that much more simply by allowing schemas to be nested. Then we could allow arbitrary numbers of levels, not just three.
>
> FWIW, I actually tried to do that back when we first introduced schema support (the fact that the code calls them namespaces and not schemas is a leftover from that idea). It turns out to be a whole lot harder than it sounds, because of the ambiguity you get about which name goes at what level. A simple example of this is: if you write x.y in a query, is that meant to be table x's column y, or is it meant to be field y within a composite column x of some table in the query? We've resolved that by requiring you to write (x).y when you mean the latter, but it's not exactly an intuitive or pleasant answer. In the same way, if namespaces can be nested to different levels, it gets really messy to support abbreviations of any sort --- but the SQL spec requires us to be able to do so.

What if you used the context of the calling code and resolved in favor of whatever match is closest to it? The same problem comes up in general-purpose programming languages.

Basically: start looking in the lexical context for an x, and if you find one, use that. Otherwise, assuming we're talking about referencing code that lives in the database, such as a function, look at the innermost schema containing the referencing code and see if it has a direct child named x; otherwise go up one level to a parent schema, and so on until you get to the top, and finding none by then, say it doesn't exist. If there are several x in this search sequence, only use the first one, regardless of whether it has a y, so as to prevent bugs from too much complexity. The same applies when looking for an x by itself, in fact, not just an x.y.

For the case of calling code that doesn't live in the database, such as a client-side query, I believe there are session variables like "current schema" or such, and you can use that as the starting point for the search for x: look first at what that schema directly contains, then at its parent, and so on. Something like that. Or ignore what I said about starting in a lexical context and do what you already do there, but keep what I said about the relative order of schemas to search, only searching direct children of ancestors of the current code's context schema, starting with the current context.

You could also come up with some relative name syntax such as filesystems support with their ../ and such, but that's further from standard SQL.

-- Darren Duncan
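The search order described above can be modeled as a walk from the referencing code's schema up through its ancestors, taking the first schema that directly contains the sought name. This Python sketch is my own loose model of that rule (it is not how PostgreSQL resolves names, and it omits the lexical-context step):

```python
def resolve(namespaces, context, name):
    """namespaces maps a schema path (a tuple) to the set of names it
    directly contains; context is the path of the schema holding the
    referencing code.  The innermost match wins."""
    path = list(context)
    while True:
        if name in namespaces.get(tuple(path), set()):
            return tuple(path) + (name,)
        if not path:
            return None  # no ancestor defines it: "doesn't exist"
        path.pop()       # go up one level and try again

namespaces = {
    (): {"s1"},
    ("s1",): {"s2", "x"},
    ("s1", "s2"): {"f1"},
}
# Code living in s1.s2 that references "x" finds s1.x:
print(resolve(namespaces, ("s1", "s2"), "x"))  # ('s1', 'x')
```

Note that per the "only use the first one" rule, the search stops at the nearest schema containing an `x` even if that `x` lacks the member ultimately wanted.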
Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions
Jeff Davis wrote:
> On Fri, 2011-07-08 at 23:39 -0700, Darren Duncan wrote:
>> What if you used the context of the calling code and resolve in favor of whatever match is closest to it? The problem is related to general-purpose programming languages. Basically start looking in the lexical context for an x and if you find one use that; otherwise, assuming we're talking about referencing code that lives in the database such as a function, look at the innermost schema containing the referencing code and see if it has a direct child named x; otherwise go up one level to a parent schema, and so on until you get to the top, and finding none by then say it doesn't exist.
>
> This is an example of where data languages and normal programming languages have a crucial difference. With a data language, you have this problem:
>
> 1. An application uses a query referencing 'y.z.foo' that resolves to internal object with fully-qualified name 'x.y.z'.
> 2. An administrator creates object 'y.z.foo'.
>
> Now, the application breaks all of a sudden. In a normal programming language, if the schemas of the two foos are different, the compiler could probably catch the error. SQL really has no hope of catching it, though. PostgreSQL has this problem now in a couple of ways, but it's much easier to grasp what you might be conflicting with. If you have multiple nested levels to traverse and different queries using different levels of qualification, it gets a little more messy and I think a mistake is more likely.

Well, my search path suggestion was based on Tom Lane's comment that "the SQL spec requires us to be able to [support abbreviations]", and I expected it would be syntactically and semantically backwards compatible with how things work now.

FYI, with Muldis D, being more green-fields, there are no search paths in the general case, and every entity reference is unambiguous because it has to be fully qualified. However, I also support relative references, and in fact require their use for references within the same database, which carries a number of benefits, at the cost of being a few characters more verbose than when using a search path. So introducing new things with the same names in different namespaces won't break anything there, even if they are closer.

It's essentially like navigating a Unix filesystem, but with . rather than /. So for example, if you had 2 sibling schemas s1 and s2, each with 2 functions f1,f2 and a table t, then s1.f1 would reference s1.f2 and s1.t as sch.lib.f2 and sch.data.t respectively, while s1.f1 would refer to the entities in s2 as sch.par.s2.lib.f1 and sch.par.s2.data.t and such (a function can also refer to itself anonymously as rtn if it's recursive). The sch is like . in Unix and the par is like .. in Unix. The data is for data tables or views (and cat is for catalog tables/views), while lib is for user-defined types, routines, constraints, etc (and sys is for built-in types and routines, but sys may be omitted, and search paths exist just for built-ins). Synonyms are also supported.

I don't expect you would adopt relative (fully-qualified) references, because the syntax isn't in standard SQL (I think), but I did. Unless you like them and can come up with a syntax that will fit into how SQL does things.

-- Darren Duncan
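For what it's worth, the sch/par navigation reads like Unix path resolution with a different spelling. The following Python sketch is my own loose model of the description above, not Muldis D itself, and the semantics shown are assumptions drawn from the examples in the text:

```python
def resolve_relative(current_schema, ref):
    # "sch" plays the role of Unix "." (the current schema) and "par" the
    # role of ".." (the parent), per the description above.
    parts = ref.split(".")
    assert parts[0] == "sch", "relative refs start at the current schema"
    path = list(current_schema)
    for p in parts[1:]:
        if p == "par":
            path.pop()      # go up one level, like ".."
        else:
            path.append(p)  # descend into a named component
    return ".".join(path)

# From inside schema s1: s1's own function f2, and sibling s2's f1.
print(resolve_relative(["s1"], "sch.lib.f2"))         # s1.lib.f2
print(resolve_relative(["s1"], "sch.par.s2.lib.f1"))  # s2.lib.f1
```

Because every reference is anchored at the referencing code's own schema, creating a new object elsewhere with the same name can never silently capture an existing reference, which is the benefit claimed above.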
Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions
Jeff Davis wrote:
> On Thu, 2011-07-07 at 20:56 -0700, Darren Duncan wrote:
>>> When you create a temporary table, PostgreSQL needs to add rows in pg_class, pg_attribute, and probably other system catalogs. So there are writes, which aren't possible in a read-only transaction. Hence the error. And no, there is no workaround.
>>
>> That sounds like a deficiency to overcome. It should be possible for those system catalogs to be virtual, defined like union views over similar immutable tables for the read-only database plus mutable in-memory ones for the temporary tables.
>
> Ideally, yes, from a logical standpoint there are catalog entries that are only interesting to one backend. But that doesn't mean it's easy to do. Remember that catalog lookups (even though most go through a cache) are a path that is important to performance. Also, more complex catalog interpretations may introduce some extra bootstrapping challenges.
>
>> Are there any plans in the works to do this?
>
> I don't think so. It sounds like some fairly major work for a comparatively minor benefit. Suggestions welcome, of course, to either make the work look more minor or the benefits look more major ;)

What I said before was a simplification; below I present my real proposal.

I think an even better way to support this would be based on Postgres having support for directly using multiple databases within the same SQL session at once, as if namespaces were another level deep: the first level being the databases, the second level the schemas, and the third level the schema objects. Kind of like how the SQL standard defines its catalog/schema/object namespaces. This instead of needing to use federating or that contrib module to use multiple Pg databases of the same cluster at once.

Under this scenario, we make the property of being read-only or read-write for the current SQL session associated with each database rather than with the whole SQL session. A given transaction can read from any database but can only make changes to the ones that are not read-only. Also, the proper way to do temporary tables would be to put them in a database other than the main one, where the whole other database has the property of being temporary.

Under this scenario, there would be separate system catalogs for each database, and so the ones for read-only databases are read-only, and the ones for other databases aren't. Then the system catalog itself fundamentally isn't more complicated, per database, and anything extra to handle cross-database queries or whatever, if anything, is a separate layer. Code that only deals with a single database at once would be an optimized situation and perform no worse than it does now. Furthermore, federating databases is done with the same interface, by adding remote/foreign databases as extra databases at the top-level namespace.

Fundamentally, a SQL session would be associated with a Pg server, not a database managed by one. When one starts a SQL session, there are initially no databases visible, and the top-level namespace is empty. One then mounts a database, similarly to how one mounts an OS filesystem, by providing appropriate connection info: either just the database name, or also user/pass, or also remote host, etc. as applicable, these details being the difference between using a local/same-Pg-cluster db or a remote/federated one; the details also say whether it is temporary or initially read-only, etc. See also how SQLite works; this "mount" is analogous to their "attach".

Such a paradigm is also how my Muldis D language interfaces databases; this is the most flexible, portable, extensible, optimizable, and elegant approach I can think of.

-- Darren Duncan
Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions
Jeff Davis wrote:
> On Thu, 2011-07-07 at 23:21 -0700, Darren Duncan wrote:
>> I think an even better way to support this would be based on Postgres having support for directly using multiple databases within the same SQL session at once, as if namespaces were another level deep, the first level being the databases, the second level the schemas, and the third level the schema objects.
>>
>> [snip]
>>
>> Then the system catalog itself fundamentally isn't more complicated, per database, and anything extra to handle cross-database queries or whatever, if anything, is a separate layer. Code that only deals with a single database at once would be an optimized situation and perform no worse than it does now.
>
> One challenge that jumps to mind here is that an Oid would need to become a pair (catalog, oid). Even if the end result isn't much more complex, getting there is not trivial.

Yes, but that pairing would exist just in memory, or in temporary places external to every database. On disk, internal to a database, there would just be the oid. In fact, another aspect of the database model I defined is that each database is entirely self-contained; while you can do cross-database queries, you don't have cross-database constraints, in the general case.

>> See also how SQLite works; this mount being analogous to their attach.
>
> I'm not sure SQLite is the best example. It has a radically different architecture.

Still, it's an example I know of where you can access several clearly separable databases at once through a common namespace. While one might argue this is a substitute for multiple-schema support, I don't, because with multiple schemas you can have integrity constraints that cross schemas. The namespaces issue is largely orthogonal to self-containment or integrity in my model.

But look at Oracle too, at least as I understand it. Oracle supports CONNECT TO ... AUTHORIZE .../etc SQL, meaning you can define what databases you are accessing within the SQL session, rather than having to do it externally. I assume that Oracle's features correspond somewhat to my proposal, and so enable cross-database queries under the illusion that several databases are one.

Suffice it to say, I have thought through my proposed model for years, one of its (and Muldis D's) express purposes being to provide a common normalized paradigm that all the existing SQL DBMSs can map to with consistent behavior, whether Oracle or SQLite, and I haven't stated all of it here (a lot more is in my published language spec). Key mapping points are the boundaries of a database's self-definability. And namespace nesting is actually arbitrary-depth, so accounting for everything from no native schema support to schema-plus-package namespace support.

-- Darren Duncan
Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions
Robert Haas wrote:
> But if that's what you want, just don't put your data in different databases in the first place. That's what schemas are for. If for some reason we needed to have tables that happened to be called x.y.z and a.b.c accessible from a single SQL session, we could allow that much more simply by allowing schemas to be nested. Then we could allow arbitrary numbers of levels, not just three. The whole point of having databases and schemas as separate objects is that they do different things: schemas are just containers for names, allowing common access to data, and databases are completely separate entities, allowing privilege separation for (say) a multi-tenant hosting environment. We're not going to throw out the latter concept just so people can use two dots in their table names instead of one.

I agree with what you're saying in general, and that schema namespaces should be nestable to arbitrary levels. One dot or two isn't an issue I have.

Dividing based on databases or on schemas is a big and important distinction. I see that the semantic purpose of using multiple databases is to allow things to be completely independent and self-defined, where one can understand the meaning of any one database in isolation. So one can take each of the 2 databases and walk off with them in opposite directions, and each can still be used and understood. Whereas schemas are namespaces for organizing entities within a single database, where any of those entities may be interdependent, such as defining a data type in one schema and using it as the declared type of a routine or table or constraint in another. But just because you use multiple databases in order for them to be independent, sometimes one still wants to use them together, and an abstraction loosely like federating is useful here.

> I think you should make more of an effort to understand how the system works now, and why, before proposing radical redesigns.

Well yes, of course. But that will take time, and I think I already understand enough about it to make some useful contributions in the meantime. How much or what I already know may not always come across well. If this bothers people then I can make more of an effort to reduce my input until I have more solid things to back it up.

-- Darren Duncan
Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions
Guillaume Lelarge wrote [on pgsql-general]:
> On Thu, 2011-07-07 at 16:01 +0000, mike beeper wrote [on pgsql-general]:
>> I have a function that creates a temp table, populates it with results during intermediate processing, and reads from it at the end. When the transaction is marked as read only, it does not allow creation of the temp table, even though there are no permanent writes to the db. Are there any workarounds? The following block errors out:
>>
>>   SET TRANSACTION ISOLATION LEVEL READ COMMITTED READ ONLY;
>>   create temp table test(test int);
>
> When you create a temporary table, PostgreSQL needs to add rows in pg_class, pg_attribute, and probably other system catalogs. So there are writes, which aren't possible in a read-only transaction. Hence the error. And no, there is no workaround.

That sounds like a deficiency to overcome. It should be possible for those system catalogs to be virtual, defined like union views over similar immutable tables for the read-only database plus mutable in-memory ones for the temporary tables. Are there any plans in the works to do this?

On the other hand, if one can have lexical-scope tables (table-typed routine variables), and given that Pg 8.4+ has named subqueries which handle a lot of cases where temp tables would otherwise be used, I would certainly expect those to work when you're dealing with a read-only database.

-- Darren Duncan
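The union-view idea can be pictured as an overlay: lookups consult a mutable per-session layer first and fall back to the immutable on-disk catalog, so a temp table appears in the catalog without any disk write. This toy Python model is purely my illustration of the concept; PostgreSQL's catalogs do not work this way today, and the table names are invented:

```python
from collections import ChainMap

# Immutable on-disk catalog entries (the read-only database's pg_class).
disk_pg_class = {"accounts": {"kind": "r"}, "orders": {"kind": "r"}}

# Mutable per-session overlay; temp-table entries would live only here.
session_overlay = {}

# The "virtual catalog": a union view of overlay plus disk, overlay first.
pg_class = ChainMap(session_overlay, disk_pg_class)

# Simulate CREATE TEMP TABLE test(...): only the overlay is written.
session_overlay["test"] = {"kind": "r", "temp": True}

print("test" in pg_class, "accounts" in pg_class)  # True True
print("test" in disk_pg_class)                     # False: no disk write
```

The point of the sketch is that the read-only invariant holds for the on-disk layer while the session still sees its temporary objects through the union.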
Re: [HACKERS] Range Types, constructors, and the type system
Jeff Davis wrote:
> On Sun, 2011-06-26 at 00:57 -0700, Darren Duncan wrote:
>> I believe that the best general solution here is for every ordered base type to just have a single total order, which is always used with that type in any generic order-sensitive operation, including any ranges defined over it, or any ORDER BY, or any <, >, etc. The built-in generic text type would have exactly 1 system-defined collation that can't be changed, and it would be something simple and generic, such as simply sorting on the codepoints as integers.
>
> Well, we're trying to support SQL, and SQL supports collations, so I don't think we can just ignore that.

I'm not saying you can't support collations. See also my reply to Tom.

> I also agree with Tom that it's not a good idea. My reasons are:
>
> * Practical considerations, such as having a bunch of cruft from duplicated types all over the system. With sufficient changes to the type system, maybe that could be overcome. Or perhaps domains could be used to make that work for range types (sort of), but the result would not be very consistent with the rest of the system.

Yes, duplication can be avoided.

> * It doesn't seem to be based in any mathematical argument. A type is a set of values, and there's no reason it can't have several total orders; or no total order at all. So it appears to just be piggybacking on the type system infrastructure as a place to hold the metadata for a total order.

Yes, I agree that a type is a set of values, and a type can have 0..N total orders. My proposal is just that, for those types that have at least 1 total order, exactly 1 of those is defined to be used implicitly in contexts where a total order is desired and no explicit collation is given, such as in ranges.

> * Who's to say that a compare function is the only way to specify a total order? There might be other interfaces that would support something closer to a lexicographic sort. So, from a theoretical standpoint, trying to attach a single notion of total order to a type seems strange, because there might be multiple interfaces for specifying even one total order.

Thank you for bringing this up: the notion of multiple interfaces for specifying even one total order. My example of a compare function was just an example, and it is valuable to consider that this may not be the only way to do it.

> * It would require extra explicit type annotations. If you have 12 text types, the only way to practically use any text type is to constantly specify which more-specific text type it actually is (probably using the :: operator). That is not necessarily a bad choice if starting a language from scratch and forming the syntax in a way that makes it reasonable to do. But this is SQL, and lots of type annotations are un-SQL-like.

Well, sometimes it doesn't hurt to suggest solutions from the point of view that one can start the language from scratch, because that provides a clean way to conceptualize and explain a feature. And then people can find some middle ground that adapts benefits from that idea without changing SQL more than needed. Witness the various improvements to Perl 5 that were first expressed in terms of Perl 6.

-- Darren Duncan
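The "a type can have several total orders, with exactly one used implicitly by default" idea can be made concrete with a small sketch. This is my illustration only; the order names and key functions below are crude stand-ins, not real collations:

```python
# Each total order over text is modeled as a key function; exactly one is
# designated as the implicit default (e.g. for ranges or ORDER BY with no
# explicit collation).
orders = {
    "codepoint": lambda s: [ord(c) for c in s],  # simple, generic default
    "caseless":  str.casefold,                   # an alternative order
}
DEFAULT_ORDER = "codepoint"

def sort_text(values, order=None):
    # With no explicit order, the type's single default applies implicitly.
    return sorted(values, key=orders[order or DEFAULT_ORDER])

words = ["B", "a"]
print(sort_text(words))              # ['B', 'a']: 'B' is codepoint 66
print(sort_text(words, "caseless"))  # ['a', 'B']: compared as 'b' vs 'a'
```

The same set of values sorts differently under each order, which is exactly why the two sides of this thread disagree about whether the order belongs to the type or to the operation.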
Re: [HACKERS] Range Types, constructors, and the type system
Jeff Davis wrote:
> On Sun, 2011-06-26 at 22:29 -0700, Darren Duncan wrote:
>> Tom Lane wrote:
>>> Darren Duncan dar...@darrenduncan.net writes:
>>>> I believe that the best general solution here is for every ordered base type to just have a single total order, which is always used with that type in any generic order-sensitive operation, including any ranges defined over it, or any ORDER BY, or any <, >, etc.
>>>
>>> We've spent years and blood on making sure that Postgres could support multiple orderings for any datatype; and there are plenty of natural examples for the usefulness of that. So I'm not at all impressed by any line of reasoning that starts out by baldly throwing that away.
>>
>> I'm not saying that you can't use multiple orderings with a data type. I'm just saying that the type only has *at most* one (possibly none) *native* ordering, which is what is used when you do something order-sensitive with the type, such as have a range.
>
> So, are you saying that it would be impossible to have a range that uses a different ordering? What about ORDER BY? What about BTrees? And if those things can use different orders for the same type, then what is the difference between what you are suggesting and a default ordering for the type (which we already support)? I suppose it's hard to tell what you mean by "native".
>
> Regards,
> Jeff Davis

Maybe I'm just talking about default ordering then.

-- Darren Duncan
Re: [HACKERS] Range Types, constructors, and the type system
To elucidate my earlier comments on this subject ...

I believe that the best general solution here is for every ordered base type to just have a single total order, which is always used with that type in any generic order-sensitive operation, including any ranges defined over it, or any ORDER BY, or any <, >, etc. The built-in generic text type would have exactly 1 system-defined collation that can't be changed, and it would be something simple and generic, such as simply sorting on the codepoints as integers.

When we want to have some other native ordering for an existing type, such as when we want to use a different text collation, we do this by creating a *new base type*, using CREATE TYPE or some shorthand thereof, and this new type defines its own ordering, such as that a particular text collation is used. For example:

  CREATE TYPE text__en_US AS (v text);
  CREATE TYPE text__C AS (v text);

These will not compare equal to each other or to text, and that is good, because having a different text collation implies that we consider 'foo'::text__en_US and 'foo'::text__C to be different values.

I believe that any other approach is worse, and in particular I believe that creating a DOMAIN over text is worse, because DOMAINs are supposed to be subtypes of some other type, whose set of member values is a subset of the other type's values, and that have the same ordering. Multiple CREATE TYPEs over the same base type don't interfere with each other like multiple DOMAINs could.

Assuming that what CREATE TYPE produces is actually a base type, I believe there is no better solution using the facilities that SQL provides. If there is concern about performance related to CREATE TYPE producing a composite type, I'm sure it is possible to engineer an optimization for when the type has just 1 attribute, so that performance isn't an issue. The main point I'm trying to raise here is about semantics and good type systems.

Likewise, don't let concern about the syntax for using values of such composite types be a deterrent. Once again, there can be shorthands if necessary. In fact, Postgres could provide a general shorthand for creating a composite type of 1 attribute whose purpose is to make one type that is like, but unequal to, another, and using this shorthand could also cause the composite type to overload/polymorph all the operators of its attribute type, so that the syntax to define one is very short. For example:

  CREATE TYPE text__en_US WRAPS text COLLATE en_US;

... and I assume the name of that attribute would just be system-defined. Note that the above is specific to wrapping text, and the COLLATE is just shorthand for defining an ordering function for text__en_US. A more general form could be:

  CREATE TYPE bar WRAPS foo ORDER USING FUNCTION baz (lhs foo, rhs foo) ...;

And then we can say:

  RANGE OF text__en_US
  RANGE OF text

... similarly to how we declare array types with ARRAY. One can also just define range values as they do array values, such as like this:

  range('foo','bar')  -- default collation
  range('foo'::text__en_US, 'bar'::text__en_US)  -- en_US collation

If that seems verbose, I have a few words for you:

1. Users should in practice name their wrapper types after their intended meaning, not their mechanics, such as like this (not using text, for variety), and that may be more terse:

  CREATE TYPE acct_num WRAPS integer;  -- inherits integer ordering by default

2. If the wrapper types overload the base operators, either automatically or selectively (does it make sense to multiply an acct_num?), one doesn't have to keep unpacking and packing them to use them in most cases. For example, I'd expect many text wrappers to polymorph catenation or substring etc., so no extra syntax.

3. In practice, most literal values come from applications and are given to SQL code either as function parameters or bind parameter arguments. While lots of example code may have literal values in it, I would think that most real-work code would hardly have any, and hence you'd rarely see any 'foo'::text__en_US, for example. You'd more likely see the less common var::text__en_US or such.

So that's my position: CREATE TYPE on the regular types or the like is the best solution, and anything else is an inferior solution. Such a design is also how I do collations and ranges in my Muldis D language.

-- Darren Duncan

Jeff Davis wrote:
> Different ranges over the same subtype make sense when using different total orders for the subtype. This is most apparent with text collation, but makes sense (at least mathematically, if not practically) for any subtype. For instance: [a, Z) is a valid range in en_US, but not in C, so it makes sense to have multiple ranges over the same subtype with different collations. But what if you have a function (like a constructor), of the form: (anyelement, anyelement) -> anyrange ? To work with the type system, you need to be able to figure out
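The wrapper-type semantics argued for above (distinct types over the same representation, each carrying its own ordering, never comparing equal across types) can be mimicked in a general-purpose language. This Python sketch is only my illustration of those semantics, with made-up stand-in orderings, not a proposal for how Postgres would implement it:

```python
from functools import total_ordering

def make_text_wrapper(name, key):
    # Each wrapper type gets its own total order via `key`; values of
    # different wrapper types are never equal, even when the wrapped
    # text matches, mirroring 'foo'::text__en_US != 'foo'::text__C.
    @total_ordering
    class Wrapper:
        def __init__(self, v):
            self.v = v
        def __eq__(self, other):
            return type(other) is type(self) and self.v == other.v
        def __lt__(self, other):
            if type(other) is not type(self):
                return NotImplemented
            return key(self.v) < key(other.v)
        def __hash__(self):
            return hash((name, self.v))
    Wrapper.__name__ = name
    return Wrapper

# Codepoint order for "C"; casefolding as a crude stand-in for en_US.
text__C = make_text_wrapper("text__C", key=lambda s: [ord(c) for c in s])
text__en_US = make_text_wrapper("text__en_US", key=str.casefold)

print(text__C("foo") == text__en_US("foo"))  # False: distinct types
print(min(text__C("a"), text__C("B")).v)     # 'B' under codepoint order
```

Note how a generic operation like `min` automatically picks up each wrapper's own ordering, which is the point of attaching the order to the type rather than to the operation.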
Re: [HACKERS] Range Types, constructors, and the type system
Tom Lane wrote: Darren Duncan dar...@darrenduncan.net writes: I believe that the best general solution here is for every ordered base type to just have a single total order, which is always used with that type in any generic order-sensitive operation, including any ranges defined over it, or any ORDER BY or any ,,etc. We've spent years and blood on making sure that Postgres could support multiple orderings for any datatype; and there are plenty of natural examples for the usefulness of that. So I'm not at all impressed by any line of reasoning that starts out by baldly throwing that away. I'm not saying that you can't use multiple orderings with a data type. I'm just saying that the type only has *at most* one (possibly none) *native* ordering, which is what is used when you do something ordered-sensitive with the type, such as have a range. To be specific, if the type system supports a concept like Perl 6 roles (or other languages have similar concepts) where types can declare themselves members of a union type such as Ordered, then types of that union would have the native ordering and other types wouldn't and then generic range operators could be declared over ANYORDERED or such. When we want to have some other native ordering for an existing type, such as when we want to use a different text collation, we do this by creating a *new base type*, Nope. This has all sorts of problems that you're conveniently ignoring, beginning with the need to duplicate all of the infrastructure for the type (such as non-ordering-related operators), and then moving into difficulties arising from added ambiguity as to which operator is meant. Well a related solution is to have exactly 1 text wrapper type which has 2 attributes, one being the text value and the other being the collation name. Then you just have 1 type that does the job instead a separate one per collation. But to keep the semantics, the collation name is part of the identity of the type. 
For example: CREATE TYPE collated_text AS (t text, c collation); The key point I'm trying to support is that collation issues are firmly attached to the text type, not the range type. Anyway, if a better solution can be arrived at for the problem at hand, then good for the team; meanwhile, what I've proposed is the best one I can think of. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
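As a sketch of the wrapper-type idea above (illustrative only: PostgreSQL has no "collation" data type, so the collation is represented here by its name, and the attribute names are assumptions):

```sql
-- Hypothetical wrapper type whose identity includes the collation name.
CREATE TYPE collated_text AS (
    t text,  -- the text value itself
    c name   -- the name of the collation this value is ordered under
);
```

Under the proposal, any order-sensitive operation (and hence any range) over collated_text would consult c, so the range machinery itself stays collation-free.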
Re: [HACKERS] Range Types and extensions
Tom Lane wrote: Florian Pflug f...@phlo.org writes: On Jun20, 2011, at 19:16 , Merlin Moncure wrote: On Mon, Jun 20, 2011 at 11:21 AM, Jeff Davis pg...@j-davis.com wrote: hm, what if there *was* only one range type per base type, but in the various contexts where specific ordering and collation was important you could optionally pass them in? Meaning, the specific ordering was not bound rigidly to the type, but to the operation? I suggested that previously here http://archives.postgresql.org/pgsql-hackers/2011-06/msg00846.php In the ensuing discussion, however, it became clear that by doing so range types become little more than a pair of values. More specifically, a range then *doesn't* represent a set of values, because whether or not a value is in the range depends on a specific sort order. Yeah, that doesn't seem like the way to go. If a range value doesn't represent a well-defined set of base-type values, we lose a lot of the mathematical underpinnings for range operations. So ... just how awful would it be if we hard-wired range types to always use their base type's default btree sort ordering and the database's default collation? In principle that sucks, but I'm not sure how wide the use-cases actually will be for other choices. The other viable alternative seems to be to require those two properties (btree opclass and collation) to be part of a specific range type definition. The complaint about that seemed to be that we couldn't infer an ANYRANGE type given only ANYELEMENT, but could we alleviate that by identifying one range type as the default for the base type, and then using that one in cases where we have no ANYRANGE input? regards, tom lane I still think that the most elegant solution is for stuff like collation to just be built-in to the base types that the range is ranging over, meaning we have a separate text base type for each text collation, and the text operators are polymorphic over all those base types. 
Having collations and stuff as something off to the side not built-in to text/etc types is the root of the problem. The range-specific stuff can remain ANYELEMENT and no special-casing is required. Also, besides range constructors, a generic membership test like value in range is polymorphic. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
Florian Pflug wrote: On Jun20, 2011, at 20:58 , Tom Lane wrote: Darren Duncan dar...@darrenduncan.net writes: I still think that the most elegant solution is for stuff like collation to just be built-in to the base types that the range is ranging over, meaning we have a separate text base type for each text collation, and the text operators are polymorphic over all those base types. Having collations and stuff as something off to the side not built-in to text/etc types is the root of the problem. I tend to agree that this aspect of the SQL standard isn't terribly well designed, but it's the standard and we're stuck with it. We're not going to support two parallel methods of dealing with collations. Plus, you can always define a DOMAIN for every collation you intend to use, and stay clear of COLLATE clauses except as part of these domain definitions. Most interestingly, this is also the workaround Jeff Davis suggested for those who absolutely need two range types over the same base type (i.e. define one of the ranges over a domain). best regards, Florian Pflug That DOMAIN-based solution ostensibly sounds like a good one then, under the circumstances. What I *don't* want to see is for things like ranges to have their own collations and the like. From the perspective of all range-specific things, the types over which they're defined, like text, should just have their own native ordering, which defines the range's sense of before and after. If DOMAIN effectively does that for text types, then that is the way to go. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
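A minimal sketch of the DOMAIN workaround discussed above; the COLLATE clause on CREATE DOMAIN is what bakes the ordering into the type (collation names vary by platform, so "en_US" is an assumption here):

```sql
-- One domain per desired collation; each carries its own native ordering.
CREATE DOMAIN text_c  AS text COLLATE "C";
CREATE DOMAIN text_en AS text COLLATE "en_US";

-- Order-sensitive operations on the domain then need no per-operation
-- COLLATE clause; the domain's collation is used.
SELECT v FROM (VALUES ('a'::text_en), ('Z')) AS t(v) ORDER BY v;
```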
Re: [HACKERS] Range Types and extensions
Robert Haas wrote: On Sun, Jun 12, 2011 at 7:53 AM, Florian Pflug f...@phlo.org wrote: I think the collation is going to have to be baked into the type definition, no? You can't just up and change the collation of the column as you could for a straight text column, if that might cause the contents of some rows to be viewed as invalid. Now you've lost me. If a text range is simply a pair of strings, as I suggested, and collations are applied only during comparison and RANGE_EMPTY(), why would the collation have to be baked into the type? If you're referring to the case (1) Create table with text-range column and collation C1 (2) Add check constraint containing RANGE_EMPTY() (3) Add data (4) Alter column to have collation C2, possibly changing the result of RANGE_EMPTY() for existing ranges. then that points to a problem with ALTER COLUMN. No, I'm saying that you might have a column containing '[a, Z)', and someone might change the collation of the column from en_US to C. When the collation was en_US, the column could legally contain that value, but now that the collation is C, it can't. ALTER TABLE isn't going to recheck the validity of the data when someone changes the collation: that's only supposed to affect the sort order, not the definition of what is a legal value. You can have the same collation problem even without range types. Consider the following: (1) Create table with the 2 text columns {L,R} and both columns have the collation en_US. (2) Add check constraint requiring L <= R. (3) Add a record with the value 'a' for L and 'Z' for R. (4) Alter the columns to have the collation C. Good language design principles demand that the semantics for this simplified case and the semantics for replacing {L,R} with a single range-of-text-typed column be the same, including what happens with CHECK and ALTER TABLE. Likewise, anything that affects ORDER BY should affect {<, >, <=, >=} and friends the same way and vice-versa, and likewise should affect range validity. 
It makes sense for collation to be considered part of text data types, and changing collation is casting from one text type to another. Generally speaking, any inherent or applied aspect of a text or other value (such as collation) that affects the results of any deterministic operations on those values (such as sorting) should be considered part of the data type of those values. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
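PostgreSQL's expression-level COLLATE clause already behaves much like the cast described above, re-attaching a different ordering to a text value (a sketch; which comparison is true depends on the platform's locale data, so no results are asserted):

```sql
-- The same comparison under two collations: under "C" it is byte-wise,
-- under a linguistic locale such as "en_US" it follows dictionary order.
SELECT 'a' < 'Z' COLLATE "C";      -- byte-wise: 'Z' (0x5A) precedes 'a' (0x61)
SELECT 'a' < 'Z' COLLATE "en_US";  -- linguistic: primary weight of a precedes z
```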
Re: [HACKERS] Range Types and extensions
Jeff Davis wrote: On Tue, 2011-06-07 at 11:15 -0400, Tom Lane wrote: Merlin Moncure mmonc...@gmail.com writes: right. hm -- can you have multiple range type definitions for a particular type? In principle, sure, if the type has multiple useful sort orderings. Right. Additionally, you might want to use different canonical functions for the same subtype. I don't immediately see any core types for which we'd bother. Agreed. BTW, Jeff, have you worked out the implications of collations for textual range types? Well, "it seems to work" is about as far as I've gotten. As far as the implications, I'll need to do a little more research and thinking. But I don't immediately see anything too worrisome. I would expect ranges to have exactly the same semantics as ORDER BY etc. with respect to collations for textual range types. If collation is an attribute of a textual type, meaning that the textual type or its values have a sense of their collation built-in, then ranges for those textual types should just work without any extra range-specific syntax, same as you could say ORDER BY without any further qualifiers. If collation is not an attribute of a textual type, meaning that you normally have to qualify the desired collation for each order-sensitive operation using it (even if that can be defined by a session/etc setting which still just ultimately works at the operator rather than type level), or if a textual type can have it built in but it is overridable per operator, then either ranges should have an extra attribute saying what collation (or other type-specific order-determining function) to use, or all range operators take the optional collation parameter like with ORDER BY. Personally, I think it is a more elegant programming language design for an ordered type to have its own sense of a one true canonical ordering of its values, and where one could conceptually have multiple orderings, there would be a separate data type for each one. 
That is, while you probably only need a single type with respect to ordering for any real numeric type, for textual types you could have a separate textual type for each collation. In particular, I say separate type because a collation can sometimes affect differently what text values compare as same, as far as I know. On a tangent, I believe that various insensitive comparisons or sortings are very reasonably expressed as collations rather than some other mechanism, eg if you wanted sortings that compare different letter case as same or not, or with or without accents as same or not. So under this elegant system, there is no need to ever specify collation at the operator level (which could become quite verbose and unwieldy), but instead you can cast data types if you want to change their sense of canonical ordering. Now if the various text-specific operators are polymorphic across these text type variants, users don't generally have to know the difference except when it matters. On a tangent, I believe that the best definition of equal or same in a type system is global substitutability. Ignoring implementation details, if a program ever finds that 2 operands to the generic = (equality test) operator result in TRUE, then the program should feel free to replace all occurrences of one operand in the program with occurrences of the other, for optimization, because generic = returning TRUE means one is just as good as the other. This assumes generally that we're dealing with immutable value types. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
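The substitutability point above has a concrete counterexample in PostgreSQL's citext extension, where generic = reports two distinguishable values as equal (shown as an illustration of the caveat, not a recommendation):

```sql
CREATE EXTENSION IF NOT EXISTS citext;

-- citext's "=" compares case-insensitively ...
SELECT 'Foo'::citext = 'foo'::citext;          -- true
-- ... yet the two values are not globally substitutable:
SELECT 'Foo'::citext::text, 'foo'::citext::text;  -- 'Foo' vs 'foo'
```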
Re: [HACKERS] Range Types and extensions
Jeff Davis wrote: On Mon, 2011-06-06 at 14:42 -0700, Darren Duncan wrote: Can Pg be changed to support . in operator names as long as they don't just appear by themselves? What would this break to do so? Someone else would have to comment on that. My feeling is that it might create problems with qualified names, and also with PG's arg.function call syntax. With respect to qualified names or arg.function, then unless the function can be symbolic, I considered your examples to be the appear by themselves, hence . by itself wouldn't be a new operator, and I generally assumed here that any multi-character operators with . to be symbolic. In any event, I also saw Tom's reply about DOT_DOT being a token already. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
Jeff Davis wrote: On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote: Jeff Davis wrote: I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types. First of all, what are the advantages to being in core? I believe that ranges aka intervals are widely useful generic types, next after relations/tuples/arrays, and they *should* be supported in core, same as arrays are. I think we all agree that ranges are important. I am not suggesting that we sacrifice on the semantics to make it an extension; I'm just trying to see if involving extensions for some of the approximately 5000 lines would be a good idea. Generally speaking, the best way to go about this is to define the *generic* data type in the core, and leave most operators to extensions. So, in core, we need to have the way to select a range value over ANYTYPE either completely as a value literal or in terms of endpoint values from arbitrary expressions or variables, store the range value in a database, retrieve it, and access its component attributes (endpoints, open/closed) in user-defined constraint and operator definitions. The fundamental value of ranges is the fact that they're a concise way to store and express an interval over an ordered type, and to either compare such intervals or test whether individual values or sets of values are in intervals. And people do that a *lot* (such as with dates), so I see having this range type, which is generic and orthogonal to other types in the same way as arrays or tables are, in core just makes the most sense, and as previously illustrated, ranges are useful in places one might not always think about. Ranges are also much more flexible than BETWEEN for what it does, because AFAIK you can't indicate open or closed with BETWEEN. 
You should not need to define separate range types or operators for each ordered type, same as you should not have to do so for arrays, or where such functionality is defined should be similar; whatever functionality for arrays you do or don't define in core, do corresponding things for ranges. Now assuming that a range/interval value is generally defined in terms of a pair of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or {<, >, <=, >=} etc. or LIMIT makes sense), it will be essential that this value is capable of distinguishing open and closed intervals. Right, it already does that explicitly. I'd appreciate your input on some of the previous discussion though. On this note, here's a *big* thing that needs discussion ... Citing this whole FOREACH talk, we need to recognize that this talk about ranges is actually being overloaded for 2 very distinct concepts, which are probably best dealt with separately, possibly as distinct types. This discussion came up in the development of Perl 6 too, and that discussion is probably worth looking into. Ranges/intervals in the general sense can *not* be used to enumerate a list of values in a standard type-sensical manner, such as FOREACH requires. Ranges/intervals are about *comparison*, meaning combinations of tests of how 2 arbitrary values of an ordered type sort relative to each other, and that's it. This usage works for integers, other numbers, strings, dates, and so on, all in a natural manner. Value enumeration, such as in a FOREACH, is a *separate* concept. The comparison and enumeration tasks have distinct sets of operators and are used in distinct contexts. Enumeration requires next/prev-value operators, while ranges/intervals in general do not. Enumeration requires discrete types (or the faking of such) like integers while ranges work for continuous types. Moreover, in practice, one probably wants enumerations to be more flexible than just monotonic increases. 
With enumerations you'd probably want to go top-down or bottom-up, and you might want to increase geometrically or by some other formula rather than incrementally. I totally agree with sharing syntax and using ranges/intervals to define sequence generators, but a range value should be considered immutable like a number or string while a sequence generator may mutate. For syntax, one could use x..y to define an interval while x...y defines a sequence generator, or that's what Perl 6 does. See also http://perlcabal.org/syn/S03.html#Range_and_RangeIter_semantics that talks about how Perl 6 does ranges. Also, if Postgres has some concept of type-generic special values -Inf and +Inf (which always sort before or after any other value in the type system), those can be used as endpoints to indicate that the interval is unbounded. I already introduced +/- infinity to range types. They are not generic outside of ranges, however -- therefore you can't select the upper bound of an upper-infinite range. Well, what you have is the least one would want. Unless you have some
Re: [HACKERS] Range Types and extensions
Jeff Davis wrote: I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types. First of all, what are the advantages to being in core? I believe that ranges aka intervals are widely useful generic types, next after relations/tuples/arrays, and they *should* be supported in core, same as arrays are. In particular, the usefulness of ranges/intervals is often orthogonal to many other things, and for many types including numbers, strings, temporals. Now assuming that a range/interval value is generally defined in terms of a pair of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or {<, >, <=, >=} etc. or LIMIT makes sense), it will be essential that this value is capable of distinguishing open and closed intervals. For example, a range value can be represented by a tuple with 4 attributes, where two of those are the endpoint values, and two of those are booleans saying whether each of the endpoints is inside or outside the range/interval. Also, if Postgres has some concept of type-generic special values -Inf and +Inf (which always sort before or after any other value in the type system), those can be used as endpoints to indicate that the interval is unbounded. Unless you have some other syntax in mind, I suggest lifting the range literal syntax from Perl 6, where .. is an infix operator building a range between its arguments, and a ^ on either side means that side is open, I think; so there are 4 variants: {..,^..,..^,^..^}. Now as to general usefulness of intervals ... Any operation that wants to deal with a range somehow, such as the BETWEEN syntax, could instead use a range/interval; for example, both of: foo in 1..10 foo between 1 and 10 ... would mean the same thing, but the 1..10 can be replaced by an arbitrary value expression or variable reference. Likewise with: date in start..^end date >= start and date < end ... mean the same thing. 
The LIMIT clause could take a range to specify take and skip count at once. Array slicing can be done using foo[first..last] or such. A random number generator that takes endpoints can take a range argument. An array or relation of these range can represent ranges with holes, and the general results of range union operations. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
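Much of what this message asks for later shipped in PostgreSQL 9.2 as range types, with constructor functions and bracket notation standing in for the Perl-6-style .. and ^ markers (a sketch; <@ is the membership operator):

```sql
-- Closed-closed, the analogue of foo in 1..10 / foo BETWEEN 1 AND 10:
SELECT 5 <@ int4range(1, 10, '[]');   -- true

-- Half-open, the analogue of date in start..^end:
SELECT '2011-06-30'::date
       <@ daterange('2011-06-01', '2011-06-30', '[)');  -- false: upper bound excluded
```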
Re: [HACKERS] Range Types and extensions
Pavel Stehule wrote: 2011/6/6 Darren Duncan dar...@darrenduncan.net: Jeff Davis wrote: I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types. First of all, what are the advantages to being in core? it should be supported by FOREACH statement in PL/pgSQL Yes, absolutely. I know this feature is loved in Perl. But this usage would only work for a more limited range of data types, namely those over which one can build a sequence generator, such as integers, because they have a next-value/prev-value function defined. In other words, while range types in general would work for any ordered type, FOREACH would only work for the subset of those that are ordinal types. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
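In SQL terms, the enumeration side of this split already exists as set-returning functions like generate_series, which supply exactly the next-value notion (a step) that general ranges over merely ordered types lack:

```sql
-- Discrete enumeration needs a start, an end, and a step.
SELECT i FROM generate_series(1, 9, 2) AS g(i);  -- 1, 3, 5, 7, 9

-- Works for timestamps too, because the step ('1 day') provides the
-- next-value function that ordering alone does not:
SELECT d FROM generate_series('2011-06-01'::timestamp,
                              '2011-06-03', '1 day') AS g(d);
```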
[HACKERS] deprecating contrib for PGXN
I have missed it if this was discussed before but ... Would now be a good time to start deprecating the contrib/ directory as a way to distribute Pg add-ons, with favor given to PGXN and the like instead? It would make sense to leave contrib/ alone for 9.1, but I believe that it should start slimming down as we move towards 9.2, with any content that can easily be migrated to PGXN/etc being taken out of contrib/ . Or, the policy would be to stop adding new things to contrib/ except in the odd case where that is surely the best place to put it, so only the legacy things are there, and for the legacy things, they are removed case-by-case as workable distributions for them first appear on PGXN/etc. An analogy for policy here would be Perl 5 and what Perl modules it bundles. The Perl modules that have the most business being bundled with Perl are those minimal ones whose function is to go out to CPAN and install other modules. Another analogy would be Parrot and languages implemented over it. Originally, various language compilers were bundled with Parrot, and they gradually migrated to their own distributions, Rakudo for example. If this general policy of deprecating contrib/ is agreed on, then at the very least the documentation shipped with 9.1 should mention it being deprecated and talk about migration strategies. Or 9.1 could include a CPAN-like program that makes it easier to install PGXN extensions, if that is applicable, so there is an overlap period where people could get the legacy add-ons either way. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] deprecating contrib for PGXN
Robert Haas wrote: On Tue, May 17, 2011 at 4:45 PM, Joshua D. Drake j...@commandprompt.com wrote: On 05/17/2011 01:31 PM, Darren Duncan wrote: I have missed it if this was discussed before but ... Would now be a good time to start deprecating the contrib/ directory as a way to distribute Pg add-ons, with favor given to PGXN and the like instead? If PGXN moves into .Org infrastructure (which I believe is currently the plan) then yes, contrib should go away. What is the benefit of getting rid of it? Maybe something could be clarified for me first. Are the individual projects in contrib/ also distributed separately from Pg, on their own release schedules, so users can choose to upgrade them independently of upgrading Pg itself, or so their developers can have a lot of flexibility to make major changes without having to follow the same stability or deprecation timetables of Pg itself? If the only way to get a contrib/ project is bundled with Pg, then the project developers and users don't get the flexibility that they otherwise would have. That's the main answer, I think. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DOMAINs and CASTs
Jaime Casanova wrote: On Sun, May 15, 2011 at 9:14 PM, Robert Haas robertmh...@gmail.com wrote: we should probably try to agree on which of the various options you mention makes most sense. well... my original patch only handle the simplest case, namely, try to make the cast that the user wants and if none is defined fall to the base types... anything else will complicate things as you shown... actually, things looks very simple until we start creating trees of domains... what options look sane to you? The sanest option I see is don't overload the CAST syntax for subtypes. Just call the foo2bar() function instead. You still get code with the same level of terseness and that is just as easy to read and understand, and there is no question of semantics. Also, that solution works right now. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DOMAINs and CASTs
Jaime Casanova wrote: On Sat, May 14, 2011 at 8:42 PM, Darren Duncan dar...@darrenduncan.net wrote: First of all, what if cast(timestamp as int) was already defined? Which cast then would you expect to be invoked here? '1800-01-01 00:00:00'::int i will expect an error in that case... what you're doing there is casting an unknown to integer, for that to be valid you need an intermediate cast to timestamp or in my case to datetime Sorry, my bad; I meant to say (might be slightly misspelled): ('1800-01-01 00:00:00'::timestamp)::int Now, since all values of a DOMAIN are also values of the base type the DOMAIN is defined as being a subset of, then the sub-expression within the parenthesis denotes a value that is both a timestamp and a datetime at the same time. So, if a generic CAST(timestamp as int) is already defined, and you define a CAST(datetime as int), then what should the above code (correct for misspelling) do, or should it fail? -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DOMAINs and CASTs
Jaime Casanova wrote: If I create a DOMAIN and then want to create a CAST from that domain to another type it gives an error. Consider this example: create domain datetime as timestamp with time zone check (value between '1753-01-01 00:00:00' and '9999-12-31 23:59:59'); create function datetime2int(datetime) returns int language sql stable strict as $$ select $1::date - '1753-01-01'::date; $$; create cast(datetime as int) with function datetime2int(datetime); if I try to cast, I get this error: select now()::datetime::int; ERROR: cannot cast type datetime to integer The problem is that in find_coercion_pathway() the very first thing we do is to get the base type of both: the source and target types. So, the way to make it work is to create the function and the cast on the base types. But what if I create 2 domains on the same base types and want a different behaviour on a cast to the same target type? I think that overloading the same cast syntax to get different behavior for different domains over the same base type is a bad idea. First of all, what if cast(timestamp as int) was already defined? Which cast then would you expect to be invoked here? '1800-01-01 00:00:00'::int ... the one for timestamp or the one for datetime? Second of all, what if you had 2 domains defined over timestamp and they overlapped and they both defined a cast as you did, with generic syntax? And you were casting a value in both domains as an int? I think it would be best that the generic cast syntax only be useable for casts defined on the base type, and if you want a domain-specific one you should use the function syntax such as your datetime2int(). That way it is easier for users to predict what behavior will occur, and implementation will be easier too. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
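Given the base-type smashing in find_coercion_pathway() described above, the cast does resolve if the function and cast are declared on the base type itself; a sketch (ts2int is a hypothetical name, not part of the original example):

```sql
-- Declare the cast on the base type rather than the domain:
create function ts2int(timestamp with time zone) returns int
  language sql stable strict as $$
    select $1::date - '1753-01-01'::date;
$$;
create cast (timestamp with time zone as int)
  with function ts2int(timestamp with time zone);

-- Both of these now go through the base-type pathway:
select now()::int;
select now()::datetime::int;  -- the domain is smashed to timestamptz first
```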
Re: [HACKERS] DOMAINs and CASTs
Darren Duncan wrote: I think it would be best that the generic cast syntax only be useable for casts defined on the base type, and if you want a domain-specific one you should use the function syntax such as your datetime2int(). That way it is easier for users to predict what behavior will occur, and implementation will be easier too. Replying to myself, I offer another alternative: What you ask for is indeed supported, but that if for a given input value more than one cast applies to it, particularly for 2 overlapping domains, then which cast is invoked is undefined, so for example the DBMS may just use the first one it finds. It is then up to the user to ensure that when they define casts over domains that they just define ones that either produce the same outputs for the same overlapping inputs (the best answer) or they ensure that they don't overlap in their input domains. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
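A concrete shape of the overlap problem (hypothetical domains for illustration; values 1 through 50 belong to both):

```sql
CREATE DOMAIN small_pos AS int CHECK (VALUE BETWEEN 1 AND 100);
CREATE DOMAIN tiny_pos  AS int CHECK (VALUE BETWEEN 1 AND 50);
-- If each domain defined its own cast to, say, text, a value such as 42
-- satisfies both check constraints, and which cast applies is exactly
-- the undefined choice described above.
```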
Re: [HACKERS] VARIANT / ANYTYPE datatype
Robert Haas wrote: On Wed, May 11, 2011 at 11:43 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: On Tue, May 10, 2011 at 5:19 PM, Darren Duncan dar...@darrenduncan.net wrote: Examples of open union types could be number, which all the numeric types compose, and so you can know, say, that you can use the generic numeric operators on values you have simply if their types compose the number union type, and it still works if more numeric types appear later. Likewise, the string open union could include both text and blob, as both support catenation and substring matches or extraction, for example. This would aid operator overloading in a generic way, letting you use the same syntax for different types, but allowing types to mix is optional; eg, you could support add(int,int) and add(real,real) without supporting add(int,real) etc but the syntax add(x,y) is shared, and you do this while still having a strong type system; allowing the mixing is optional case-by-case. Coming from a Haskell perspective, this is a great idea, but I don't think the union feature should be used to implement it. I'm unclear what the point of such a feature would be. A union of all the common numeric types is not much different from the existing type numeric. In the case of the open union numeric, one point is that users or extensions could come up with new types that add themselves to the union, for example say a type for complex numbers (I didn't see a built-in one). But I'm just citing numeric as an example; there would be a lot more in practice, potentially one for every individual type, so for example if operators were defined for the open union rather than for the base type, then users/extensions could define their own types and easily declare you can use it like this type but it's different in some important way, which may just be an implementation difference. 
Operations that don't care about the differences can just be written against the open union type where they just work and those that do care can be more specific. Joseph Adams said: Coming from a Haskell perspective, this is a great idea, but I don't think the union feature should be used to implement it. Closed unions correspond to algebraic data types in Haskell, e.g.: data Ordering = LT | EQ | GT while open unions are better-suited to type classes: (+) :: (Num a) => a -> a -> a While closed unions would often be used for your first example, I see they're still useful for type classes as well. Especially in the face of open unions being available, the closed unions let users say, no, I don't really want the meaning of this union type to change just because someone else declares a new type in scope (that adds itself to the union). For example, one could declare a system_numeric closed union type that only includes Pg built-in numerics and users of that can be confident that nothing about it will change later unless the definition of system_numeric itself or the types it unions are changed. But open unions would be preferred in places they wouldn't cause trouble, where you want to allow easier user extensibility. I, for one, would like to see PostgreSQL steal some features from Haskell's type system. PostgreSQL seems to implement a subset of Haskell's system, without type classes and where functions can have only one type variable (anyelement). I think that Haskell and other functional languages have a lot to teach relational DBMSs and I see them as being highly compatible. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VARIANT / ANYTYPE datatype
To follow-up my earlier comments ... I suspect for practical purposes we may want to limit the scope of some type features. For example, the greatest benefits of open union / mixin types are with routines/operators, not so much with tables. So, Pg could choose to support open unions but only for routines, where the declared types of table columns must still be other types. Then you don't have to worry about searching your data for where one might be used, but only search your system catalog for routines/operators that use it. But closed unions should be supported in tables too. -- Darren Duncan
Re: [HACKERS] VARIANT / ANYTYPE datatype
Alvaro Herrera wrote: Excerpts from Bruce Momjian's message of mar may 10 16:21:36 -0400 2011: Darren Duncan wrote: To follow-up, an additional feature that would be useful and resembles union types is the variant where you could declare a union type first and then separately other types could declare they are a member of the union. I'm talking about loosely what mixins or type-roles or interfaces etc are in other languages. The most trivial example would be declaring an ENUM-alike first and then separately declaring the component values where the latter declare they are part of the ENUM, and this could make it easier to add or change ENUM values. But keep in mind that this is a distinct concept from what we're otherwise talking about as being union types. -- Darren Duncan Should this be a TODO item? The general idea of C-style unions, sure. Mixin-style stuff ... not sure. Seems like it'd be pretty painful. From the perspective of users, the single greatest distinction between these 2 kinds of unions is being closed versus being open, and that is the primary reason to choose one over the other. A closed union is the C-style, where the union type declares what other types or values it ranges over. The closed union is best when the union definer can reasonably assume that the union won't either ever or would rarely be changed, and in particular can assume that application or database code would have knowledge of the parts that it deals specially with, so it can be assumed that if the closed union type ever is changed then any code designed to use it may be changed at the same time. A good example for a closed union would be a boolean type which just ranges over the two singletons false and true or an order type which ranges just over the three singletons decrease, same, increase. Or a type which enumerates the 7 days of the week, as this is unlikely to change in the life of a system. 
An open union is the mixin style, where the component types declare they are part of the union. The open union is best when it is likely that there would be either user-defined or extension-defined new types for the union to come along later, and we want to have code that can be generic or polymorphic for any types that can be used in particular ways. Examples of open union types could be number, which all the numeric types compose, and so you know, say, that you can use the generic numeric operators on values you have simply if their types compose the number union type, and it still works if more numeric types appear later. Likewise, the string open union could include both text and blob, as both support catenation and substring matches or extraction, for example. This would aid operator overloading in a generic way, letting you use the same syntax for different types, but allowing types to mix is optional; eg, you could support add(int,int) and add(real,real) without supporting add(int,real) etc, but the syntax add(x,y) is shared, and you do this while still having a strong type system; allowing the mixing is optional case-by-case. Supporting the open union is closer to supporting ANYTYPE, while the closed union isn't so much. -- Darren Duncan
Re: [HACKERS] VARIANT / ANYTYPE datatype
Thanks for picking up on my mentioning union types; I wasn't sure if anyone did. Merlin Moncure wrote: On Fri, May 6, 2011 at 3:18 PM, Andrew Dunstan and...@dunslane.net wrote: On 05/06/2011 04:08 PM, Alvaro Herrera wrote: Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011: I see VARIANT/ANYTYPE as the most general case of supporting union types, which, say, could have more specific examples of allow any number or date here but nothing else. If VARIANT is supported, unions in general ought to be also. Okay, so aside from the performance (storage reduction) gained, there's this argument for having variant/union types. It seems to me that this is indeed possible to build. Completely general VARIANT, though, is rather complex. A declared union, where you specify exactly which types can be part of the union, can be catalogued, so that the system knows exactly where to look when a type needs to be modified. A general VARIANT however looks complex to me to solve. The problem is this: if an user attempts to drop a type, and this type is used in a variant somewhere, we would lose the stored data. So the drop needs to be aborted. Similarly, if we alter a type (easy example: a composite type) used in a variant, we need to cascade to modify all rows using that composite. If the unions that use a certain type are catalogued, we at least know what tables to scan to cascade. In a general variant, the system catalogs do not have the information of what type each variant masquerades as. We would need to examine the variant's masqueraded types on each insert; if the current type is not found, add it. This seems a bit expensive. So how is a declared union going to look and operate? Something like this? 
CREATE TYPE foo AS UNION (ival int, tval text, tsval timestamptz); CREATE TABLE bar (myunion foo); INSERT INTO bar (myunion) VALUES (ival=1), (tval='some text'); UPDATE bar SET myunion.tsval = CURRENT_TIMESTAMP; Something like that could actually be quite nice for a number of purposes. using your hypothetical example, could you cast types to the union? select 1::int::foo; record types would presumably work as well? you could do a lot of *really* neat stuff that way... Like with other respondents to this topic, I consider it much more useful and important, as well as practical, to support explicitly defined type unions than a wide-open ANYTYPE. As for how it would operate: The example above implies a union type implementation that is like C's concept of such, where you have to explicitly state how you want the value bit pattern to be interpreted, by naming ival/etc, rather than the DBMS just knowing that a particular value is of a specific type because per-value stored metadata says so (like with SQLite). While that might be best in practice for implementation, I had envisioned something more like set unions, so you could instead do it like this: CREATE TYPE foo AS UNION (int, text, timestamptz); CREATE TABLE bar (myunion foo); INSERT INTO bar (myunion) VALUES (1), ('some text'); UPDATE bar SET myunion = CURRENT_TIMESTAMP; Unlike a record type, where multiple attributes may have the same type, presumably with a union they would all be distinct, and so you could use the type name itself to refer to each option; you don't have to make up ival or whatever ... unless there are situations where types don't have names. When doing operations that are type-generic, such as equality test or assignment, especially assignment, between 2 things that are both declared to be of type foo, you could just do it with no verbosity, same as if you were doing 2 int or text or whatever. 
When doing operations specific to int or text or whatever, or if you are assigning a foo-declared thing to an int/text-declared thing, then you'd want an explicit cast or assertion, such as select myunion::int + 3 as answer from bar where is_a(myunion,'int'). As for whether you need an explicit cast to go the other way, I would assume you don't, like when you have DOMAINs; eg, I would expect the 4th line above to just work, because the system knows the type of CURRENT_TIMESTAMP and it knows that this is a member of the union type of myunion. I see a UNION type as being like a DOMAIN type in reverse. -- Darren Duncan
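Taken together, the set-union declaration, the is_a() test, and the cast syntax sketched in this message could behave roughly like the following Python model (all class and function names are invented for illustration; this is not actual or proposed PostgreSQL behavior):

```python
# Toy model of CREATE TYPE foo AS UNION (int, text, timestamptz):
# a value of the union holds exactly one member type, and the type
# name itself selects the option.
from datetime import datetime

FOO_MEMBERS = {"int": int, "text": str, "timestamptz": datetime}

class Foo:
    def __init__(self, value):
        for name, typ in FOO_MEMBERS.items():
            if isinstance(value, typ):   # no cast needed going in,
                self.type_name = name    # like a DOMAIN in reverse
                self.value = value
                return
        raise TypeError("value is not a member of union foo")

    def is_a(self, type_name):           # models is_a(myunion, 'int')
        return self.type_name == type_name

    def cast(self, type_name):           # models myunion::int
        if not self.is_a(type_name):
            raise TypeError(
                f"foo value holds {self.type_name}, not {type_name}")
        return self.value
```

Assignment into the union needs no annotation because the member type determines the tag, while reading out as a specific member requires the explicit cast or an is_a() guard, mirroring the asymmetry described above.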
Re: [HACKERS] VARIANT / ANYTYPE datatype
To follow-up, an additional feature that would be useful and resembles union types is the variant where you could declare a union type first and then separately other types could declare they are a member of the union. I'm talking about loosely what mixins or type-roles or interfaces etc are in other languages. The most trivial example would be declaring an ENUM-alike first and then separately declaring the component values where the latter declare they are part of the ENUM, and this could make it easier to add or change ENUM values. But keep in mind that this is a distinct concept from what we're otherwise talking about as being union types. -- Darren Duncan
Re: [HACKERS] VARIANT / ANYTYPE datatype
Alvaro Herrera wrote: A customer came to us with this request: a way to store any data in a column. We've gone back and forth trying to determine reasonable implementation restrictions, safety and useful semantics for them. I note that this has been requested in the past: http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php and both Oracle and MS-SQL have it and apparently people find them useful. I didn't find any indication that SQL contains anything resembling this. I see VARIANT/ANYTYPE as the most general case of supporting union types, which, say, could have more specific examples of "allow any number or date here but nothing else". If VARIANT is supported, unions in general ought to be also. The most effective way of supporting VARIANT or union types in general is having an implementation where, in the general case, each value in the database knows its own data type rather than the data type being provided by a context such as what table column it is in. For example, rather than storing a data value directly, we could store a 2-attribute struct naming a data type and pointing to or holding the data value. See how SQLite works as an example of how VARIANTs or unions could work, although that on its own would need to be made more comprehensive for Pg. I claim ignorance as to how Pg currently implements these matters. Where VARIANT/union types are supported by default, declaring more specific types is just a type constraint and an optimization. Of course, when we know the type of a column/etc isn't going to be VARIANT or some other union type, then a simple optimization allows us to just store the value and have its type provided by context rather than the struct. -- Darren Duncan
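The 2-attribute struct suggested above, a type name plus an opaque payload, can be sketched like this (Python for illustration only; the names are invented and this is not how PostgreSQL stores datums):

```python
# Sketch of SQLite-style per-value typing: each stored value carries
# its own type name alongside the payload.
class Variant:
    __slots__ = ("type_name", "payload")

    def __init__(self, type_name, payload):
        self.type_name = type_name   # the type this value masquerades as
        self.payload = payload       # the value itself, opaque to the column

# Reading back requires consulting the per-value tag, which is exactly
# the per-insert type-tracking cost discussed earlier in the thread.
def read_as(variant, type_name):
    if variant.type_name != type_name:
        raise TypeError(
            f"variant holds {variant.type_name}, not {type_name}")
    return variant.payload
```

A column declared with a specific type is then just the optimized case mentioned at the end of the message: the tag is implied by context, so only the payload need be stored.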
Re: [HACKERS] stored procedures - use cases?
Peter Eisentraut wrote: Another point, as there appear to be diverging camps about supertransactional stored procedures vs. autonomous transactions, what would be the actual use cases of any of these features? Let's collect some, so we can think of ways to make them work. An analogy I like to use for a very capable DBMS is that of an operating system, and each autonomous transaction is like a distinct process/thread in this system. The DBMS is like a virtual machine in which processes/autonomous transactions run. Like with an operating system, a process/auto-transaction can be started by another one, or by the OS/DBMS (or a root process/auto), and once running all processes are mutually independent to a large extent, in that each has its own separable privileges or state or view of the database, the database being an analogy to the file system. A process/auto-transaction can be started by a DBMS client, analogous to a user, but it doesn't have to be. The message passing feature that Pg has, listen/notify, is like inter-process communication between these processes/autos. A stored procedure always runs within the context of one process/auto, and a regular transaction or savepoint or whatever is specific to a process/auto. Has anyone else thought of the DBMS-as-operating-system analogy? I don't recall specifically reading this anywhere, but expect the thought may be common. -- Darren Duncan
Re: [HACKERS] stored procedures
Peter Eisentraut wrote: So the topic of real stored procedures came up again. Meaning a function-like object that executes outside of a regular transaction, with the ability to start and stop SQL transactions itself. I would like to collect some specs on this feature. So does anyone have links to documentation of existing implementations, or their own spec writeup? A lot of people appear to have a very clear idea of this concept in their own head, so let's start collecting those. I've thought a lot about this too. The general case of a stored procedure should be all powerful, and be able to directly invoke any code written in SQL or other languages that a DBMS client can directly invoke on the DBMS, as if it were a client, but such that the procedure is stored and executed entirely in the DBMS. But the stored procedure also has its own lexical variables and supports conditionals and iteration and recursion. A stored procedure is invoked as a statement and doesn't have a return value; in contrast, a function has a return value and is invoked within a value expression of a statement. A stored procedure can see and update the database, and can have IN/INOUT/OUT parameters. A stored procedure can have side-effects out of band, such as user I/O, if Pg supports that. The general stored procedure should be orthogonal to other concerns, in particular to transactions and savepoints; executing one should not implicitly start or commit or rollback a transaction or savepoint. However, it should be possible to explicitly declare that a procedure is a transaction, so that starts and ends are neatly paired regardless of how the procedure exits; that is, a transaction lifetime is attached to its lexical scope, but this would be optional. A stored procedure should be able to do data manipulation, data definition, explicit transaction control (except perhaps when defined to be a transaction), privilege control, message passing, and so on. 
As for semantics, let's say that when a stored procedure is invoked, its definition will be pulled from the system catalog in a snapshot and be compiled, then run normally no matter what it does, even if the definition of the procedure itself is changed during its execution; in the latter case, it just means that once the execution finishes, subsequent calls to it would then call the updated version or fail. So just compiling the procedure may need a catalog lock or whatever, but when it starts executing a transaction isn't required. Any stored procedure in general should be able to invoke stored procedures, to any level of nesting, just like in any normal programming language. There might be restrictions on what individual procedures can do depending on how they're declared; for example, if one is declared to have a scope-bound transaction, then it or ones it invokes can't have explicit transaction control statements. But such restrictions are an orthogonal or case-dependent matter. (When we have a distinct stored procedure, I also believe that a stored function should be more restricted, such as only having IN parameters and not being able to see the database but by way of parameters, and that it should be deterministic. But that ship has sailed and I'm not going to argue for any changes to functions.) -- Darren Duncan
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
Robert Haas wrote: I am halfway tempted to say that we need to invent our own procedural language that is designed not for compatibility with the SQL standard or Oracle, but for non-crappiness. I'm way ahead of you on that one. -- Darren Duncan
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
Merlin Moncure wrote: On Fri, Mar 25, 2011 at 10:05 PM, David E. Wheeler da...@kineticode.com wrote: On Mar 25, 2011, at 9:12 PM, Robert Haas wrote: As I've said before, I believe that the root cause of this problem is that using the same syntax for variables and column names is a bad idea in the first place. If we used $foo or ?foo or ${foo} or $.foo or foo!!$#? to mean the parameter called foo, then this would all be a non-issue. Yes *please*. Man that would make maintenance of such functions easier. +1 on using $foo. Even with the standardization risk I think it's the best choice. Prefer $foo to ${foo} though. The "foo" syntax should be orthogonal to everything else and not have anything specifically to do with parameters. Rather, "foo" anywhere is just a delimited case-sensitive identifier and can be used anywhere that foo can, where the latter is a case-insensitive identifier. As for the SQL standard for bind parameters, as I recall they use :foo, and so :"foo" would be the case-sensitive, more general case of that. -- Darren Duncan
Re: [HACKERS] Postgres 9.1 - Release Theme
I was under the impression that QUEL was actually a good language in some ways, and that it was more relational and better than SQL in some ways. http://en.wikipedia.org/wiki/QUEL_query_languages Maybe bringing it back would be a good idea, but as an alternative to SQL rather than a replacement. In any event, QUEL was somewhat similar to SQL. -- Darren Duncan Rajasekhar Yakkali wrote: Following a great deal of discussion, I'm pleased to announce that the PostgreSQL Core team has decided that the major theme for the 9.1 release, due in 2011, will be 'NoSQL'. ... the intention is to remove SQL support from Postgres, and replace it with a language called 'QUEL'. This will provide us with the flexibility we need to implement the features of modern NoSQL databases. With no SQL support there will obviously be some differences in the query syntax that must be used to access your data. hmm.. shock it is this shift for 9.1 due in mid 2011 is unexpectedly soon :) Curious to understand as to - how this relates to every feature that is provide at the moment based on RDBMS paradigm. ACID compliance, support for the features provided by SQL, referential integrity, joins, caching etc, .. - Also does this shift take into an assumption that all the use cases fit the likes of data access patterns usecases similar to facebook/twitter? or to address the the likes of those ? Thanks, Raj
Re: [HACKERS] resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)
Pavel Stehule wrote: 2011/3/26 Darren Duncan dar...@darrenduncan.net: I mention 2 possible solutions here, both which involve syntax alterations, each between the -- lines. I personally like the second/lower option more. -1 this is not based on any pattern on SQL. It's not simple, and it introduce a reserved keywords Okay, here's a much simpler proposal with the most important bit of the old one. 1. In all situations where there is ambiguity such that an identifier reference (not declaration) may be referring to either a lexical variable/parameter of the current routine, or to the name of the table column of the contextually current table of the current SQL statement, the ambiguity is always resolved in favor of the lexical var/param. If I am not mistaken, that is what PL/PgSQL already does since 9.0. 2. If an identifier reference has a leading . then that will force it to be interpreted as a column instead (and the code will fail if there is no such column), and so .colname is a shorthand for tablename.colname; but like with the old colname it only works when just 1 of the source tables has colname else it is still ambiguous like before. Example: select (.mycol + myvar * myparam) as mynewcol from mytbl; This solution is a very terse and understandable change. There are no reserved keywords. Legacy user code has no change where there were no conflicts before. Legacy user code has no change in the case of conflict if it was previously resolved to favor the lexical var/param. Legacy user code only gains a leading . in the few places where conflict was resolved in favor of a column name before where a same-named lexical/param existed. So what's not to like about this? -- Darren Duncan
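The two rules of this proposal amount to a small name-resolution function; here is a toy Python rendering of them (the data structures and function names are invented for illustration, not PL/pgSQL internals):

```python
# Rule 1: an unqualified identifier binds to a lexical var/param when
# one exists. Rule 2: a leading "." forces the column interpretation,
# failing when no (or more than one) source table has that column.
def resolve(name, lexicals, tables):
    """lexicals is a set of var/param names;
    tables maps table name -> set of column names."""
    if name.startswith("."):
        col = name[1:]
        owners = [t for t, cols in tables.items() if col in cols]
        if len(owners) != 1:
            raise LookupError(f"column {col!r} is absent or ambiguous")
        return ("column", owners[0], col)
    if name in lexicals:                 # lexicals always win
        return ("lexical", name)
    owners = [t for t, cols in tables.items() if name in cols]
    if len(owners) == 1:
        return ("column", owners[0], name)
    raise LookupError(f"cannot resolve {name!r}")
```

With a table mytbl(mycol), the example select (.mycol + myvar * myparam) resolves .mycol to the column and the bare names to lexicals, even if mytbl also had columns named myvar or myparam.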
Re: [HACKERS] resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)
Pavel Stehule wrote: sorry - I dislike this. The design is correct, but it is against to SQL verbosity. A reader must to thinking about missing tablenames. I dont't think so it is good solution, because it doesn't solve a backing compatibility problem - somebody must to fix a function still, and I think so it is much preferable to fix like: select (mytbl.mycol + myvar * myparam) as mynewcol from mytbl; your proposal saves a five chars, but it has a negative impacts on readability - there should be more tables. There are no reason to introduce a new concepts - SQL knows a aliases. Well, going forward, I know I would much rather have to say mytbl.mycol than have to say myfunc.myparam. And I certainly would want to expect that when one says ... as foo that this foo is treated as a declaration unambiguously and is never substituted for some parameter or there be other grief as I seem to recall having in 8.4. -- Darren Duncan
Re: [HACKERS] Transactional DDL, but not Serializable
Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: Well, basically, you can't have that. Example: you have an existing table with primary key, and while you're in the middle of doing some long transaction, somebody else creates a table with a foreign-key reference to the one you're about to do a delete from. Being serializable does not excuse you from the obligation to check for FK violations in that invisible table. It might be acceptable to fail entirely, but not to act as though the table isn't there. That's an excellent example and point. Is there a 'right' answer (with regard to the SQL spec, what other databases do, etc)? I'm not aware that anybody's got an amazingly satisfactory solution. PG's answer is of course to use up-to-the-minute DDL regardless of what the transaction might see for other purposes, which certainly has got disadvantages if you're hoping for truly serializable behavior. But I'm not sure there's a better answer. You could make an argument for failing any serializable transaction that's affected by DDL changes that happen after it started. I don't know whether that cure would be worse than the disease. If transaction A commits successfully before transaction B commits, regardless of when transaction B started, and transaction A changes/adds/etc any constraints on the database, then I would expect transaction B to only commit successfully if all of its data changes pass those new/changed constraints. If B were allowed to commit without that being the case, then it would leave the database in an inconsistent state, that is a state where its data doesn't conform to its constraints. A database should always be consistent on transaction boundaries, at the very least, if not on statement boundaries. 
Whether B's failure happens when it tries to commit, or happens earlier based on visibility of A's changes, doesn't matter to me so much (do what works best for you/others), but it should fail at some point if it would otherwise cause inconsistencies. -- Darren Duncan
[HACKERS] resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)
Robert Haas wrote: On Mar 25, 2011, at 9:22 PM, Joshua Berkus j...@agliodbs.com wrote: Tom, Personally I'd vote for *not* having any such dangerous semantics as that. We should have learned better by now from plpgsql experience. I think the best idea is to throw error for ambiguous references, period. As a likely heavy user of this feature, I agree with Tom here. I really don't want the column being silently preferred in SQL functions, when PL/pgSQL functions are throwing an error. I'd end up spending hours debugging this. Also, I don't understand why this would be a dump/reload issue if $1 and $2 continue to work. Because an identifier that previously referred unambiguously to a column might now be ambiguous, if there is a parameter with the same name. I mention 2 possible solutions here, both of which involve syntax alterations, each between the -- lines. I personally like the second/lower option more. Might it be reasonable, perhaps as a 9.2 feature, to add top-level-namespaces so that one could always explicitly qualify what they are referring to? For example, you could have the three TLNs sch, lex, and attr (I may have missed some useful ones). The sch TLN would unambiguously refer directly to a schema object, such as a database table. The lex TLN would unambiguously refer directly to a lexical, either a parameter of the current routine or to a lexical variable. The attr TLN would unambiguously refer to a table/etc column/attribute in the manner typical for SQL. Use them like: sch.foo - the table/etc foo lex.foo - the lexical variable foo attr.foo - the column foo Use of these TLNs is optional where there is no ambiguity. The TLNs are not reserved words, but if one has an entity named the same, then references to it must be TLN-qualified; eg: lex.sch lex.lex lex.attr Now these are just examples. You may find a different set works better. -- There are also alternate solutions. 
For example, it could be mandated that lexical-scope aliases for any data/var-like schema object are required in routines, where the aliases are distinct from all lexical vars/params/etc, and then all SQL/code in the routines may only refer to the schema objects by the aliases. Effectively this makes it so that routines can no longer see non-lexical vars but for those from parameters, and this aliasing is defining a parameter whose argument is supplied by the DBMS automatically rather than as an explicit routine caller argument. That way, inside a routine body there are only lexical names for things, and so no namespace-qualification is ever needed by the regular SQL. Similarly, if you always think of table column names as referring to an attribute or element of a table variable, then just reference the column qualified by the table name (or the lexical alias thereof). Same as you do in any other programming language. Of course, sometimes you don't have to qualify column name references as context could make it unambiguous. Or, a shorthand like a simple leading . could unambiguously say you're referring to a column of the particular table in context. With those in place, all unqualified references are straight to lexical variables or parameters. And so, this is also an effective way to resolve the ambiguity and I prefer the latter design personally. Here's an example in quasi-PL/PgSQL: create function myrtn (myparam integer, mytbl ::= mydb.myschema.mytbl) as declare myvar integer := 5; $body$ begin select (.mycol + myvar * myparam) as mynewcol from mytbl; end; $body$ Note that I've already thought through this last example as these methods of avoiding ambiguity are loosely-speaking how my language Muldis D avoids the problem faced by many SQL procedures. The .mycol syntax specifically was inspired originally for me by Perl 6 where the lack of something just before the . means that the implicit topic variable is referred to, like if you said $_.mycol. 
A Perl 6 analogy would be something like: $mytbl.map:{ .mycol + $myvar * $myparam } aka: $mytbl.map:{ $_.mycol + $myvar * $myparam } -- -- Darren Duncan
Re: [HACKERS] proposal: auxiliary functions for record type
Pavel Stehule wrote: Hello I wrote a few functions for record type - record_expand, record_get_fields, record_get_field, record_set_fields. A usage of this functions is described in my blog http://okbob.blogspot.com/2010/12/iteration-over-record-in-plpgsql.html Do you think, so these functions can be in core? These are relative strong and enough general with zero dependency. Sure, these functions are not defined in ANSI SQL. Regards Pavel Stehule That looks good in principle. I see it as being valuable and important that users can define generic relational operators/functions, meaning ones that can work with any relations like built-ins can, and the ability to iterate over record fields, or at least introspect a relation to see what fields it has, is a good foundation to support this. -- Darren Duncan
Re: [HACKERS] Refactoring the Type System
Daniel Farina wrote: On Sat, Nov 13, 2010 at 7:54 PM, Darren Duncan dar...@darrenduncan.net wrote: You don't have to kludge things by implementing arrays as blobs for example; you can implement them as relations instead. Geospatial types can just be tuples. Arrays of structured types can just be relations with an attribute per type attribute. Arrays of simple types can just be unary relations. In practice, my guess is the performance of these approaches would be terrible for a number of workloads. I don't agree that arrays having their own storage implementation is a kludge: there are even operators like unnest that can be used to turn them back into relations. I'm not discounting this at all. The data model can formally define types in one way and a DBMS can implement various well known cases in specialized ways that are more efficient. Arrays are a good example. But with other cases they still get a default implementation. If arrays are flexible enough, then different arrays could be implemented differently, eg some as blobs and some as relations; the former could be better for small or simple arrays and the latter for large or complex arrays. I have long thought (but never really gave voice to) there being value having first-class relation values, but I would also say that's another kettle of fish. I also want closures, and don't think that's completely nuts. I think that first-class functions are important at least, if not full-blown closures. You can define generic relational restrictions or summaries or whatever with such; eg, the general case of a WHERE is a function that takes a relation and a function as input, and results in a relation; more restricted cases of WHERE can be defined with simpler functions that take 2 relation inputs instead. You may want to learn more about this, first. Postgres's type system, while relatively simple, is not as ill-conceived or primitive as you seem to assume it is. 
There are also a lot of important system details, like expanding/replacing the typmod facility that only allows Postgres 32 bits of type-augmenting data (such as the (1) in the type numeric(1)). I'm not saying that Pg's system is primitive, etc. The thread starter said it needed an overhaul, indicating there are deficiencies, and I'm suggesting some effective ways to address them. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring the Type System
David Fetter wrote: For the past couple of years, I've been hearing from the PostGIS people among others that our type system just isn't flexible enough for their needs. It's really starting to show its age, or possibly design compromises that seemed reasonable a decade or more ago, but are less so now. To that end, I've put up a page on the wiki that includes a list of issues to be addressed. It's intended to be changed, possibly completely. http://wiki.postgresql.org/wiki/Refactor_Type_System What might the next version of the type system look like? Are you talking about changes to the type system as users see it or just changes to how the existing behavior is implemented internally? If you're talking about, as users see it, which the other replies to this thread seem to be saying, though not necessarily the url you pointed to which looks more internals ... As a statement which may surprise no one who's heard me talk about it before ... I've mostly completed a type system specification that would be useable by Postgres, as the most fundamental part of my Muldis D language. The type system is arguably the most central piece of any DBMS, around which everything else is defined and built. You have data, which is structured in some way, and has operators for it. If you look at a DBMS from the perspective of being a programming language implementation, you find that a database is just a variable that holds a value of a structured type. In the case of a relational database, said database is a tuple whose attribute values are relations; or in the case of namespaces/schemas, the database tuple has tuple attributes having relation attributes. If a database is a variable, then all database constraints are type constraints on the declared type of that variable, and you can make said constraints arbitrarily complicated. 
From basic structures like nestable tuples and relations, plus a complement of basic types like numbers and strings, and arbitrary constraints, you can define data types of any shape or form. A key component of a good type system is that users can define data types, and moreover where possible, system-defined types are defined in the same ways as users define types. For example, stuff like temporal types or geospatial types are prime candidates for being defined like user-defined types. If you define all structures using tuples and relations, you can easily flatten this out on the implementation end and basically do everything as associated flat relation variables as you do now. So what I propose is both very flexible and easy to implement, scale, and optimize, relatively speaking. You don't have to kludge things by implementing arrays as blobs for example; you can implement them as relations instead. Geospatial types can just be tuples. Arrays of structured types can just be relations with an attribute per type attribute. Arrays of simple types can just be unary relations. You can also emulate all of the existing Pg features and syntax that you have now over the type system I've defined, maintaining compatibility too. I also want to emphasize that, while I drew inspiration from many sources when defining Muldis D, and there was/is a lot I still didn't/don't know about Postgres, I have found that as I use and learn Postgres, I'm finding frequently that how Postgres does things is similar and compatible to how I independently came up with Muldis D's design; I'm finding more similarities all the time. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
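The arrays-as-relations idea can be sketched in today's SQL; here a matrix is stored as a relation keyed by a coordinate array, per the design described above (table and column names are illustrative only):

```sql
-- A matrix as a binary relation whose primary key is the coordinate
-- vector; the array length is the number of dimensions.
CREATE TABLE matrix (
  coords integer[] PRIMARY KEY,  -- one element per dimension
  val    numeric NOT NULL
);

INSERT INTO matrix VALUES (ARRAY[1,1], 1.0), (ARRAY[1,2], 2.5);

-- Element lookup is an ordinary key lookup:
SELECT val FROM matrix WHERE coords = ARRAY[1,2];

-- The zero-dimension case needs no special treatment: its key is '{}'.
```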
Re: [HACKERS] MULTISET and additional functions for ARRAY
I think that it would be best to implement MULTISET in the same way that a TABLE is implemented. Logically and structurally they are the same thing, except that a MULTISET typically is used as a field value of a table row. Aka, a table and a multiset are just different names for a relation, loosely speaking. The association of a multiset-typed attribute of a table with said table is like the association of a child and parent table in a many-to-one. So reuse your structure for tables to hold multisets. -- Darren Duncan Itagaki Takahiro wrote: Postgres supports ARRAY data types well, but there are some more array functions in the SQL standard. Also, the standard has the MULTISET data type, which is an unordered array. It looks easy to support the additional array functions. There might be some confusion in treating multi-dimensional arrays with them, but we could treat all arrays as one-dimensional, as unnest() does. MULTISET support is more difficult. We have corresponding type IDs for each array, but we might not want to add additional IDs for a multiset for each type. Any ideas for the issue? If we reuse the type IDs of arrays for multisets, the multisets would have some special typmod. For example, typmod = 0 could mean multiset, and a positive value an array with max cardinality. Note that the SQL standard doesn't mention multi-dimensional arrays. So, we can use typmod = -1 as a free-size and free-dimensional array for backward compatibility. If we have troublesome issues supporting multiset data types, I'm thinking of adding multiset functions that receive ARRAY types instead at first, because an ARRAY is a MULTISET by definition. Some of the functions for multisets seem to be useful for arrays, too. Comments and suggestions welcome. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
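The child-table representation suggested above might look like this sketch (all names are hypothetical):

```sql
-- A multiset-typed attribute of "orders" stored as a many-to-one
-- child table; no uniqueness constraint, so duplicates are allowed,
-- which is exactly the multiset semantics.
CREATE TABLE orders (
  order_id serial PRIMARY KEY
);

CREATE TABLE order_tags (            -- the "multiset" attribute
  order_id integer NOT NULL REFERENCES orders,
  tag      text    NOT NULL
);

-- The multiset value of one order is just a restriction of the child:
SELECT tag FROM order_tags WHERE order_id = 1;
```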
Re: [HACKERS] MULTISET and additional functions for ARRAY
Merlin Moncure wrote: On Thu, Nov 11, 2010 at 3:42 PM, David E. Wheeler da...@kineticode.com wrote: On Nov 11, 2010, at 12:08 PM, Alvaro Herrera wrote: That sounds like a composite type to me. No, it's perpendicular in the sense that while a composite type allows you to have different columns, this multiset thing lets you have rows (I initially thought about them as sets of scalars, but AFAIU they could in turn be rows) How is that different from an array of RECORDs? I could ask the same question about a TABLE, the ordering issue aside. This is one place where SQL made things more complicated than they needed to be. Multisets have generally the same structure *and* operators (union, etc) as tables, but they use different syntax for each. A better design would be to make tables and multisets interchangeable. It's an unnecessary distinction. not a whole lot outside of syntax details...there is a decent summary here: http://waelchatila.com/2005/05/18/1116485743467.html I like this part: Alternatively the SQL standard also permits the same construct with the bracket trigraphs ??( and ??) :-D As I recall, the concept of using stuff like ?( or ?) etc was so that SQL could be written in EBCDIC, which natively lacks some of the bracketing characters that ASCII has. Hence, such is an alternative way to spell either { } or [ ] (I forget which). -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Delimited identifier brhavior
Tatsuo Ishii wrote: test=# select * from t1_foo; select * from t1_foo; i --- (0 rows) It seems PostgreSQL thinks t1_foo is equivalent to t1. Is this an expected behavior? That code looks badly written in any event. Delimiters should be put around each part of an identifier, or around the chain as a whole, such as: select * from "t1_foo"; Or with schema-qualified objects, for example, any of these: "schema"."table" "schema".table schema."table" schema.table Personally, I treat all of my identifiers as being case-sensitive. Knowing that Pg treats non-delimited identifiers as being lowercase, I write undelimited when the identifier is entirely lowercase, and I delimit ones that have any uppercase. And by doing this consistently everything works correctly. Since most of my identifiers are lowercase anyway, the code also reads cleanly in general. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
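The case-folding rules described above can be seen directly (plain Postgres behavior, no assumptions):

```sql
-- Unquoted identifiers fold to lowercase; quoted ones are taken
-- verbatim, so these two column names are distinct:
CREATE TABLE t (a integer, "A" integer);

SELECT a, "A" FROM t;   -- ok: refers to two different columns
SELECT A FROM t;        -- unquoted A folds to a
SELECT "a" FROM t;      -- same column as unquoted a
```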
Re: [HACKERS] SQL command to edit postgresql.conf, with comments
Dimitri Fontaine wrote: Andrew Dunstan and...@dunslane.net writes: On 10/12/2010 05:02 PM, Dimitri Fontaine wrote: So, what you do is have a file per GUC, file name is the GUC name, first line contains *ONLY* current value, the rest of the file is comments. You're joking, right? No. I just want both comments and SQL commands. If you refuse this simple file scheme, keep your postgresql.conf and don't remote-edit it. That's my proposal, I'm happy that it comes with laughter :) Maybe I missed something important, but why is it not possible to retain the single existing postgresql.conf file format (human writable) *and* have it machine/SQL-editable *and* maintain the comments? I should think that it would be possible to do all of these without too much trouble. All you would need is for the file parser to retain the comments as metadata, include them in the relations that the SQL commands see where the latter can also edit them as data, and then write out the updated file with comments. The fact that Postgres already explicitly supports comment metadata in its system catalog means it must already know something about this. If something is missing, then expand the catalog so it represents all the details you want to preserve. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL command to edit postgresql.conf, with comments
Josh Berkus wrote: The fact that Postgres already explicitly supports comment metadata in its system catalog means it must already know something about this. We support what? Postgres has COMMENT ON ... SQL for various database objects and I assumed that said comments would be stored in the system catalog. http://www.postgresql.org/docs/9.0/interactive/sql-comment.html The solution to this is simple, and was previously discussed on this list: (a) have a postgresql.conf.auto (b) add a default include for postgresql.conf.auto at the beginning of postgresql.conf (c) SQL updates go to postgresql.conf.auto, which consists only of setting = value in alphabetical order. (d) We document that settings which are changed manually in postgresql.conf will override postgresql.conf.auto. (e) $$profit$$!! I agree that this looks like an effective solution. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
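Under the scheme in (a)-(d), the hand-edited file might begin like this sketch (the include directive is real postgresql.conf syntax; the settings shown are illustrative):

```conf
# postgresql.conf -- hand-edited. In Postgres, a later occurrence of a
# setting overrides an earlier one, so putting the include first means
# manual entries below it win over the machine-written file.
include 'postgresql.conf.auto'   # (b) default include, kept first

shared_buffers = 512MB           # (d) manual setting overrides .auto
```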
Re: [HACKERS] SQL command to edit postgresql.conf, with comments
Josh Berkus wrote first: Postgres has COMMENT ON ... SQL for various database objects and I assumed that said comments would be stored in the system catalog. Oh. Now that's an interesting perspective ... you're suggesting that we take the comments and apply them as COMMENTS on the specific pg_settings? Yes, something along those lines. I should point out that in general I consider a COMMENT ON to just be a less ambiguous form of a -- or /* */ and that it would be a good idea for a code or config parser to preserve comments given in the latter formats in the same way it preserves the former when it can reasonably infer what to associate the latter comments with. That wouldn't solve the issues of ordering, or of comments in the file not associated with a setting, but might be a good 80% solution. Yes, and for something like this an 80% solution is good. As for ordering, that can be handled with more metadata, like a visual order number. Tom Lane wrote second: Well, if the settings were stored in a catalog ... which they are not ... that might be a useful idea. The catalog is where they *should* be stored, at least from the user's perspective. The reason that COMMENT ON isn't a terribly helpful analogy is that it puts the burden on the user to associate specific comment texts with specific database objects. The problem with parsing and modifying freestyle comments as seen in postgresql.conf is exactly (or mostly) that there's no reasonable way of mechanically determining which comments are associated with which setting. We could invent some arbitrary rule or other, but it'd have little to do with the way people perceive what's in such a file. There's no reason that you can't have both kinds of comments. You could have comments that are specially formatted such that the parser will then recognize they are associated with something specific and so put them in the system catalog. 
And then you can have other comments not formatted that way and the parser will then pass over them like whitespace. I agree with Josh's proposal: keep mechanically-generated settings in a separate file, and don't even pretend to allow comments to be kept there. I agree with the separate files approach as being the most practical in at least the short term. However, I think that this file could support comments too, but they would just be limited to end-of-line comments on the same line as the setting, and it would be only these comments appearing in the system catalog by default. Comments in the file for user editing would only appear in the catalog if specially formatted, which for now could just mean taking end-of-line comments. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
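For reference, the COMMENT ON mechanism under discussion works like this on objects that do live in the catalog today (pg_settings currently has no such hook):

```sql
-- Attach a comment to a catalog object and read it back via
-- obj_description(); the comment is stored in pg_description.
CREATE TABLE t (a integer);
COMMENT ON TABLE t IS 'demo table';

SELECT obj_description('t'::regclass, 'pg_class');  -- 'demo table'
```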
Re: [HACKERS] SQL command to edit postgresql.conf, with comments
Richard Broersma wrote: On Tue, Oct 12, 2010 at 3:54 PM, Josh Berkus j...@agliodbs.com wrote: Oh. Now that's an interesting perspective ... you're suggesting that we take the comments and apply them as COMMENTS on the specific pg_settings? On a side note regarding comments, I'd like to make a request for a more comprehensive commenting mechanism. The first thought that comes to my mind would allow for comments to be stored and annotated using XML or SGML. It'd be nice to be able to generate user documentation from selected comments taken from application derived database objects. I don't know, maybe this is already possible. When you start going there, you have new issues to consider. (For the record I also prefer plain text for comments.) I should point out that the group making the Perl 6 language has already been looking into such matters extensively, essentially unifying code comments and code documentation into common metadata accessible both in the source code and programmatically at runtime. I think this approach goes beyond comments as we normally know them, which I just think of as plain text strings associated with some code element. But if you want to pursue bringing documentation into this, I suggest looking at what Perl 6, and other languages, have done. While some of the results of the Perl 6 discussion may have just been in the forums, these urls at least are related to it: - http://perlcabal.org/syn/S02.html#Whitespace_and_Comments - http://github.com/perl6/specs/raw/master/S26-documentation.pod I'm not proposing adopting their syntax, but some features or design may be useful to learn from. I also want to point out that the FoxPro language constitutes some prior art for including comments as data fields in their runtime-accessible objects. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
Greg Smith wrote: You didn't quote the next part of that, which says fsync() is not sufficient to guarantee that your data is on stable storage and on MacOS X we provide a fcntl(), called F_FULLFSYNC, to ask the drive to flush all buffered data to stable storage. That's exactly what turning on fsync_writethrough does in PostgreSQL. See http://archives.postgresql.org/pgsql-hackers/2005-04/msg00390.php as the first post on this topic that ultimately led to that behavior being implemented. From the perspective of the database, whether or not the behavior is standards compliant isn't the issue. Whether pages make it to physical disk or not when fsync is called, or when O_DSYNC writes are done on platforms that support them, is the important part. If the OS doesn't do that, it is doing nothing useful from the perspective of the database's expectations. And that's not true on Darwin unless you specify F_FULLFSYNC, which doesn't happen by default in PostgreSQL. It only does that when you switch wal_sync_method=fsync_writethrough. Greg Smith also wrote: The main downside to switching the default on either OS X or Windows is developers using those platforms for test deployments will suffer greatly from a performance drop for data they don't really care about. As those two in particular are much more likely to be client development platforms, too, that's a scary thing to consider. I think that, bottom line, Postgres should be defaulting to whatever the safest and most reliable behavior is, per each platform, because data integrity is the most important thing, ensuring that a returning commit has actually written data to disk. If performance is worse, then so what? Code that does nothing has the best performance of all, and is also generally useless. Whenever there is a tradeoff to be made, reliability for speed, then users should have to explicitly choose the less reliable option, which would demonstrate they know what they're doing. 
Let the testers explicitly choose a faster and less reliable option for the data they don't care about, and otherwise by default users who don't know better should get the safest option, for data they likely care about. That is a DBMS priority. This matter reminds me of a discussion on the SQLite list years ago about whether pragma synchronous=normal or synchronous=full should be the default, and thankfully 'full' won. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
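The safer non-default setting discussed above is a one-line change on OS X (and analogously on Windows):

```conf
# Opt into write-through on platforms where plain fsync lies:
# fsync_writethrough issues F_FULLFSYNC on Darwin so that a returning
# commit really is on stable storage, at a real performance cost.
wal_sync_method = fsync_writethrough
```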
Re: [HACKERS] Proposal: plpgsql - for in array statement
Alvaro Herrera wrote: What about FOR EACH var IN array_expr LOOP ... I think this requires reserving EACH, which could cause a regression for working code. Maybe there's a way to make it work? Code that quotes all of its identifiers, such as with: FOR EACH "var" IN "array_expr" LOOP ... ... would also gain a significant amount of future-proofing since then the language can add keywords at will, without there being conflicts with user-defined identifiers. Similarly, quoting identifiers also carries present-day advantages as then one can name identifiers whatever is most suitably descriptive for them without worrying whether the language has a pre-defined meaning for the used words. The quoting also has the nice bonus of making them case-sensitive. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
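A minimal sketch of the quoting style being advocated, using 9.0's DO for brevity. FOR EACH itself remains hypothetical, so this uses a plain FOR with a deliberately keyword-like variable name:

```sql
-- Quoting insulates identifiers from future keywords: "each" here is
-- an ordinary variable name even if EACH were ever reserved.
DO $$
DECLARE
  "each" integer;
BEGIN
  FOR "each" IN SELECT x FROM generate_series(1, 3) AS t(x) LOOP
    RAISE NOTICE '%', "each";
  END LOOP;
END;
$$;
```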
Re: [HACKERS] Proposal: plpgsql - for in array statement
Alvaro Herrera wrote: What about FOR EACH var IN array_expr LOOP ... I think this requires reserving EACH, which could cause a regression for working code. Maybe there's a way to make it work? What about saying FOR-EACH instead? A good general solution that I'd expect to not cause regressions is to separate multiple-word keywords with dashes rather than spaces. Since unquoted identifiers don't have dashes, I think, and moreover because the whole FOR-EACH would occupy the first position of the statement rather than the first two, there should be no ambiguity. Parsing should be easier, too, because keywords formatted like this would just be a single term rather than having infinite variations due to embedded whitespace. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: plpgsql - for in array statement
Andrew Dunstan wrote: On 09/28/2010 09:31 PM, Darren Duncan wrote: Code that quotes all of its identifiers, such as with: FOR EACH "var" IN "array_expr" LOOP ... This doesn't help in the least if the array is an expression rather than simply a variable - we're not going to start quoting expressions. I wrote that wrong. I should have said "array_var" not "array_expr". I am certainly not advocating quoting expressions, and didn't mean to imply that here. My point was that if a token is always interpreted as a keyword rather than an identifier when there is ambiguity, then quoting would let users name an identifier "each" or "EACH". In any event, I will not push this since it doesn't address the real issue of language changes not breaking the general case of legacy code; it only says how users can insulate themselves. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What happened to the is_type family of functions proposal?
Craig Ringer wrote: On 25/09/2010 11:51 AM, Darren Duncan wrote: There should just be a single syntax that works for all types, in the general case, for testing whether a value is a member of that type, or alternately whether a value can be cast to a particular type. snip Pg already gets it right in this regard by having a single general syntax for type casting, the value::type-name and value membership of a type should be likewise. or the standard: CAST(value AS typename) Indeed. The exact syntax doesn't matter to me but the point is that the type name is its own lexical element, conceptually a function argument, rather than being a substring of another one. Maybe to test if a value can be cast as a type, you can continue the :: mnemonic, say adding a ? for yes and a ! for no. For example, value?::type-name tests if the value can be cast as the type and value!::type-name or not value?::type-name tests the opposite. An expression like this results in a boolean. Personal opinion here: Blech, if I wanted to use Perl6 I'd do so ;-) I see that someone has been paying attention. Yes, the idea of using ? or ! to derive a boolean expression from some other expression did indeed come from Perl 6. The ? means is so, ! means is not. A very useful mnemonic in general. Good shorthand, I guess, but a CAST syntax extension or alternate CAST version would be a bonus for readability. Well, sure. But something consistent with cast syntax that Pg provides. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What happened to the is_type family of functions proposal?
Colin 't Hart wrote: The fact that this wraps would seem to me to make the implementation of is_date() difficult. Having separate is_foo() syntax per type is a bad design idea, same as having a different equality test like eq_int() or assignment syntax like assign_str() per type. There should just be a single syntax that works for all types, in the general case, for testing whether a value is a member of that type, or alternately whether a value can be cast to a particular type. For example, one could say is_type( value, type-name ) or it could be spelled isa() or if you wanted to be more ambitious it could be an infix op, like value isa type-name to test when a value is of a type already. Pg already gets it right in this regard by having a single general syntax for type casting, the value::type-name and value membership of a type should be likewise. Maybe to test if a value can be cast as a type, you can continue the :: mnemonic, say adding a ? for yes and a ! for no. For example, value?::type-name tests if the value can be cast as the type and value!::type-name or not value?::type-name tests the opposite. An expression like this results in a boolean. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
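Until something like value?::type-name exists, the test can be approximated in PL/pgSQL by attempting the cast and trapping failure. The function name and approach are illustrative, not a core facility, and typename is assumed to be trusted input (it is interpolated unquoted):

```sql
-- Returns true if the text value can be cast to the named type.
CREATE OR REPLACE FUNCTION is_castable(v text, typename text)
RETURNS boolean LANGUAGE plpgsql AS
$$
BEGIN
  EXECUTE 'SELECT CAST(' || quote_literal(v) || ' AS ' || typename || ')';
  RETURN true;
EXCEPTION WHEN others THEN
  RETURN false;
END;
$$;

-- SELECT is_castable('2010-01-01', 'date');  -- true
-- SELECT is_castable('not a date', 'date');  -- false
```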
Re: [HACKERS] Postgres 9.0.0 release scheduled
Tom Lane wrote on 2010.09.08: The core committee has decided that it's time to press forward with releasing 9.0. Barring catastrophic bug reports in the next week, 9.0.0 will be wrapped Thursday 9/16 for public announcement Monday 9/20. Is this still the timetable, or is it being affected at all by the delay in migrating repositories while the CVS is cleaned up? -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] bad variable subst after AS
I don't know if this is a bug or not, but if not, it looks like a misfeature ... When executing the following in Pg 8.4.4:

CREATE OR REPLACE FUNCTION f () RETURNS TABLE(a1 INTEGER) AS
$BODY$
BEGIN
  RETURN QUERY SELECT a0 AS a1 FROM rv;
  RETURN;
END;
$BODY$ LANGUAGE plpgsql;

... I get this error:

ERROR: syntax error at or near $1
SQL state: 42601

My impression of this is that Pg is treating the a1 after the AS like it was a variable reference and so substituted $1 for it. Now that just seems wrong to me. I can understand either a0 or rv getting a substitution, but something following an AS being substituted is just wrong. Is that a bug, and if not, then what is the rationale for working that way, and can it be changed? Meanwhile, what is the best way to write f to work around this misbehavior? Thank you. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bad variable subst after AS
Thank you to the 4 people who replied. Heikki Linnakangas wrote: It's a known misfeature; PL/pgSQL isn't very smart about replacing variables with parameter markers. The good news is that this has been completely rewritten in 9.0. The above will work in 9.0. If you can't upgrade to 9.0, you'll have to rename the variable or use a different alias in the AS clause. Good to hear that this is fixed in 9.0; and I do intend to upgrade any week now. Andrew Dunstan wrote: Remove the AS clause. You don't need it here at all. Okay, that seems to be the best workaround while running under 8.4; or I would use something like _a1 instead for documentation purposes or for referencing. Alvaro Herrera wrote: Meanwhile, what is the best way to write f to work around this misbehavior? Give the column a different alias, one not colliding with a variable name. In this circumstance, the whole point of the AS clause was that, because I declared in the function signature that it exported a table with a column named a1, I used the AS clause to make sure the selected column was named a1; otherwise, conceptually, there would be a type mismatch between the declared and actual result. Making them exactly the same is the whole point of the exercise. I'm in the school of thought that a table column's name is the only proper way to identify it, rather than the ordinal position being significant for identity, so even though SQL supports the latter, I consider it a misfeature of SQL that leads to error-prone code, and try not to rely on it when I can help it. The fact that Pg would make things work when the result column name is different than the declared name points squarely to ordinal position as identity, as that's the only way it could work. Considering that behavior, I agree that using a different name is reasonable under 8.4 to make this work. Pavel Stehule said: It's not a bug - you just cannot use a variable there. Table and column names are specified at plan time, and cannot be parametrized. 
p.s. you can use dynamic SQL - EXECUTE statement - RETURN QUERY EXECUTE - but it doesn't help you, because you cannot overwrite a function definition. I think you misunderstood what I was trying to do. In contrast to what you said, I was *not* expecting the a1 in AS a1 to be treated as a variable. But no worries; Heikki/Andrew/Alvaro understood what I meant. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
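Putting the two suggested 8.4 workarounds together against the original example (rv and a0 are the names from the earlier post):

```sql
-- 1. Drop the AS clause; the declared OUT column name a1 still
--    applies to the result, and no plpgsql variable is referenced:
CREATE OR REPLACE FUNCTION f () RETURNS TABLE(a1 integer) AS
$BODY$
BEGIN
  RETURN QUERY SELECT a0 FROM rv;
END;
$BODY$ LANGUAGE plpgsql;

-- 2. Or rename the declared column (e.g. _a1) so the alias no longer
--    collides with a plpgsql variable:
--    ... RETURNS TABLE(_a1 integer) ...
--    RETURN QUERY SELECT a0 AS _a1 FROM rv;
```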
Re: [HACKERS] autonomous transactions
Robert Haas wrote: On Thu, Sep 16, 2010 at 5:19 AM, Dimitri Fontaine dfonta...@hi-media.com wrote: I think they call that dynamic scope, in advanced programming language. I guess that's calling for a quote of Greenspun's Tenth Rule: Any sufficiently complicated C or Fortran program contains an ad hoc informally-specified bug-ridden slow implementation of half of Common Lisp. So the name of the game could be to find out a way to implement (a limited form of) dynamic scoping in PostgreSQL, in C, then find out all and any backend local variable that needs that to support autonomous transactions, then make it happen… Right? Interestingly, PostgreSQL was originally written in LISP, and there are remnants of that in the code today; for example, our heavy use of List nodes. But I don't think that has much to do with this project. I plan to reserve judgment on the best way of managing the relevant state until such time as someone has gone to the trouble of identifying what state that is. It would probably do Pg some good to try and recapture its functional language roots where reasonably possible. I believe that, design-wise, functional languages really are the best way to do object-relational databases, given that pure functions and immutable data structures are typically the best way to express anything one would do with them. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] autonomous transactions (was Re: TODO note)
Robert Haas wrote: On Wed, Sep 15, 2010 at 3:37 AM, Colin 't Hart colinth...@gmail.com wrote: I note that the implementation of tab completion for SET TRANSACTION in PSQL could benefit from the implementation of autonomous transactions (also TODO). I think it's safe to say that if we ever manage to get autonomous transactions working, there are a GREAT MANY things which will benefit from that. There's probably an easier way to get at that Todo item, though, if someone feels like beating on it. One problem with autonomous transactions is that you have to figure out where to store all the state associated with the autonomous transaction and its subtransactions. Another is that you have to avoid an unacceptable slowdown in the tuple-visibility checks in the process. As I understand it, in many ways, autonomous transactions are like distinct database client sessions, but the client in this case is another database session, especially if the autonomous transaction can make a commit that persists even if the initial session afterwards does a rollback. Similarly, using autonomous transactions is akin to multi-processing. Normal distinct database client sessions are like distinct processes, usually started externally to the DBMS, whereas autonomous transactions are like processes started within the DBMS. Also, under the assumption that everything in a DBMS session should be subject to transactions, so that both data-manipulation and data-definition can be rolled back, autonomous transactions are like a generalization of supporting sequence generators that remember their incremented state even when the action that incremented it is rolled back; the sequence generator update is effectively an autonomous transaction, in that case. The point being, the answer to how to implement autonomous transactions could be as simple as: do the same thing you do to manage multiple concurrent client sessions, more or less. 
If each client gets its own Postgres OS process, then an autonomous transaction just farms out to another one of those which does the work. Or maybe there could be a lighter weight version of this. Does this design principle seem reasonable? If autonomous transactions could be used a lot, then maybe the other process could be kept connected and be fed other subsequent autonomous actions, such as if it is being used to implement an activity log, so some kind of IPC would be going on. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
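For what it's worth, the "farm out to another backend" approach can already be emulated with the dblink contrib module: the inner statement commits in its own connection even if the outer transaction rolls back. A sketch (activity_log is a hypothetical table; CREATE EXTENSION assumes 9.1 or later, earlier releases install dblink differently):

```sql
CREATE EXTENSION IF NOT EXISTS dblink;

BEGIN;
-- The INSERT runs and commits in a second backend's own transaction:
SELECT dblink_exec('dbname=' || current_database(),
                   'INSERT INTO activity_log VALUES (now(), ''step 1'')');
ROLLBACK;
-- The log row survives the rollback: it was committed by the other
-- backend, which is the essence of an autonomous transaction.
```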