Re: [pgsql-advocacy] [HACKERS] GSoC on WAL-logging hash indexes

2014-04-30 Thread Darren Duncan
Is there a good reason for this thread being copied to the advocacy list?  It 
seems to me it's on-topic only for hackers. -- Darren Duncan






Re: [pgsql-advocacy] [HACKERS] First draft of update announcement

2014-03-19 Thread Darren Duncan

On 2014-03-18, 2:42 PM, Josh Berkus wrote:

Other PostgreSQL 9.3-only fixes in this update include:

* Add read-only data_checksum parameter


I recall being told last fall that this would not be added to 9.3.x (9.3.1 at 
the time, I think) and only to 9.4.x, because such a feature addition was 
allowed only in major releases and not in minor ones, which were supposed to 
contain just security and bug fixes.


So what changed, such that it is being added in 9.3.x after all?

-- Darren Duncan





Re: [HACKERS] Deprecating RULES

2013-08-27 Thread Darren Duncan

On 2013.08.27 7:57 PM, Tom Lane wrote:

Darren Duncan dar...@darrenduncan.net writes:

I have a proposal.



Assuming we decide to do away with RULEs,


You lost me already.

If we had replacement functionality for everything that can be done with
rules, we could start to think about when we might begin to tell people
they can't use rules.  Unfortunately, we don't have that precondition.

regards, tom lane


That's a really old post/thread, and I'm not arguing for any kind of action 
related to RULEs; please disregard the message. -- Darren Duncan







Re: [HACKERS] [PATCH] Exorcise zero-dimensional arrays (Was: Re: Should array_length() Return NULL)

2013-06-12 Thread Darren Duncan

On 2013.06.12 10:11 AM, Josh Berkus wrote:

Multidim arrays are why we can't have nice things.


Yeah, I think that was not our best moment.  :-(


Actually, if you think about it, we've never had arrays in PostgreSQL
... we have always had matrixes.  If you think about things that way,
most of the current functionality makes sense.


If the type system were changed so that arrays were always just 
one-dimensional, you could define your matrix simply as a binary relation type 
whose primary key attribute has the type of a fixed-length array of integers, 
where the number of elements in the array is the number of dimensions in the 
matrix, and the array elements themselves define the coordinates in the matrix. 
 This design confers a number of benefits.  Also, the case of the 
zero-dimension matrix needs no special treatment; the key array has zero 
elements.  Would that not work? -- Darren Duncan
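
For concreteness, a minimal sketch of that design in today's SQL; the table 
and column names here are made up for illustration:

  -- A 2-D matrix as a binary relation keyed by a coordinate array.
  CREATE TABLE matrix (
      coords integer[] NOT NULL PRIMARY KEY,  -- one element per dimension
      value  numeric NOT NULL,
      CHECK (array_length(coords, 1) = 2)     -- pin the dimensionality, here 2
  );
  INSERT INTO matrix VALUES (ARRAY[3,5], 42); -- the cell at coordinates (3,5)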






Re: [HACKERS] Proposal to add --single-row to psql

2013-05-20 Thread Darren Duncan
 being too specific 
to details of one.  And of course, cross-invocation of code written in multiple 
languages is made much easier.


Note, just to be clear, my proposal does not necessitate that all of a node tree 
be kept in memory at once.  This design should be adaptable to a 
streaming approach, especially as it is expected to be able to handle database 
dumps or transfers of arbitrary size, same as SQL engines can today.  That is in 
contrast to what most application languages would probably assume, where 
everything would fit in memory at once.  But the ability to stream or not would 
largely be an implementation detail.  Realistically, all code should fit in 
memory at once, and anything that would have to be buffered out of memory would 
be, say, embedded data literals (whether large strings or simply large relations).


If you're not sure how my proposal would address any of the needs or wants 
raised in the thread, go ahead and ask, and I will try to answer as time permits.


-- Darren Duncan

On 2013.05.11 9:27 AM, Tom Lane wrote:

David Fetter da...@fetter.org writes:

On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:

Some kind of extendable parser would be awesome.  It would need to tie
into the rewriter also.

No, I don't have a clue what the design looks like.



That's a direction several of the proprietary RDBMS vendors have
proposed.  I think it'd be great :)



Pre-coffee (yeah, I know.  Bad idea.) sketch of an idea: create an API
to pass expression trees in and out.  This could have other benefits
as to clustering space, shortening the planning cycle, etc., but let's
not go there for now.  My knowledge is very, very sketchy, but when I
squint, the expression trees we use look a lot like JSON.  Are they
isomorphic?


By the time you've got an expression tree, the problem is mostly solved,
at least so far as parser extension is concerned.

More years ago than I care to admit, I worked on systems that had
run-time-extensible parsers at Hewlett-Packard, so technology for this
does exist.  But my (vague) memory of those systems is that the parser's
language capabilities were more limited than bison's, perhaps only
LL(1).  Parsing spec-compatible SQL that way might be a challenge.

A larger issue is that if you don't have the whole grammar available
to check, it's difficult to be sure there are no parsing conflicts.
I seem to remember that we hit some conflicts between different
extension ROMs back at HP :-(

Another point is that extensions that are actually interesting require
a lot more than new syntax.  Robert mentioned the rewriter, but you'd
typically need planner and executor additions as well.  It's possible to
see how whole new plan node types might be added by a plugin so far as
the executor is concerned, but I haven't a clue how we'd get the planner
to emit them ...







Re: [HACKERS] missing event trigger support functions in 9.3

2013-05-09 Thread Darren Duncan

On 2013.05.09 10:40 AM, Pavel Stehule wrote:

I am writing an article about 9.3. I found that the event trigger functions are not
complete. We have only the pg_event_trigger_dropped_objects() function. It looks
really strange and asymmetric. Can we implement a similar function for CREATE, as
a minimum, in 9.3?

I am expecting that this function should not be too complex - and it can be moved to
contrib maybe (if it is too late now).


Really, the touted new event triggers feature in fact only works for dropping 
objects, rather than for all DDL or other events?  If so, then the feature seems 
seriously incomplete.  Are drops the main expected use case for the feature, 
rather than "tell me when something happened"? -- Darren Duncan
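
For reference, the one function that does exist is used roughly like this in 
9.3; the trigger body here is an illustrative sketch:

  CREATE FUNCTION report_drops() RETURNS event_trigger
  LANGUAGE plpgsql AS $$
  DECLARE
      obj record;
  BEGIN
      FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
      LOOP
          RAISE NOTICE 'dropped % %', obj.object_type, obj.object_identity;
      END LOOP;
  END;
  $$;

  CREATE EVENT TRIGGER log_drops ON sql_drop
      EXECUTE PROCEDURE report_drops();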






Re: [HACKERS] missing event trigger support functions in 9.3

2013-05-09 Thread Darren Duncan

On 2013.05.09 11:22 AM, Alvaro Herrera wrote:

Darren Duncan wrote:

On 2013.05.09 10:40 AM, Pavel Stehule wrote:

I am writing an article about 9.3. I found that the event trigger functions are not
complete. We have only the pg_event_trigger_dropped_objects() function. It looks
really strange and asymmetric. Can we implement a similar function for CREATE, as
a minimum, in 9.3?

I am expecting that this function should not be too complex - and it can be moved to
contrib maybe (if it is too late now).


Really, the touted new event triggers feature in fact only works for
dropping objects rather than for all DDL or other events?


It works for all DDL.  It provides complete information for drops, not
other kinds of commands.  This will be expanded later.


If so, then the feature seems seriously incomplete.


Feel free to contribute during the 9.4 development cycle.

Your appreciation for the work that already went into this feature is
duly noted.


Sorry if I came across as not appreciating the work already done.  I in fact do 
appreciate it.


I made a mistake.  Please disregard my initial comment.

-- Darren Duncan





Re: [HACKERS] adding support for zero-attribute unique/etc keys

2013-03-26 Thread Darren Duncan

On 2013.03.26 1:40 AM, Albe Laurenz wrote:

Darren Duncan wrote:

So, determining if 2 rows are the same involves an iteration of dyadic logical
AND over the predicates for each column comparison.  Now logical AND has an
identity value, which is TRUE, because "TRUE AND p" (and "p AND TRUE") results
in "p" for all p.  Therefore, any 2 rows with zero columns each are the same.

Since any 2 rows with zero columns are the same, the UNIQUE predicate is FALSE
any time there is more than 1 row in a table.

Does anyone agree or disagree with this logic?


Yes :^)

You could use the same kind of argument like this:

UNIQUE is true iff any two rows in T satisfy for each column:
the column in row 1 is null OR the column in row 2 is null OR
the column in row 1 is distinct from the column in row 2

Now you iterate your logical AND over this predicate
for all columns and come up with TRUE since there are none.
Consequently UNIQUE is satisfied, no matter how many rows there are.

In a nutshell:
All members of the empty set satisfy p, but also:
all members of the empty set satisfy the negation of p.

You can use this technique to make anything plausible.


Consider the context, however.  We're talking about a UNIQUE constraint, and so 
what we want to do is prevent the existence of multiple tuples in a relation 
that are the same for some defined subset of their attributes.  I would argue 
that logically, and commonsensically, two tuples with no attributes are the 
same, and hence a set of distinct tuples having zero attributes could have no 
more than one member, and so a UNIQUE constraint over zero attributes would say 
the relation can't have more than one tuple.  So unless someone wants to argue 
that two tuples with no attributes are not the same, my interpretation makes 
more sense and is clearly the one to follow. -- Darren Duncan
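
For clarity, the proposal amounts to allowing something like the following, 
which is hypothetical syntax and not valid in any current PostgreSQL:

  CREATE TABLE app_config (
      retry_limit integer NOT NULL,
      banner_text text NOT NULL,
      UNIQUE ()  -- zero-column key: any two rows would be "the same",
                 -- so at most one row may exist
  );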






Re: [HACKERS] adding support for zero-attribute unique/etc keys

2013-03-25 Thread Darren Duncan

On 2013.03.25 1:17 AM, Albe Laurenz wrote:

Darren Duncan wrote:

  From my usage and
http://www.postgresql.org/docs/9.2/interactive/sql-createtable.html I see that
Postgres requires constraints like unique (and primary) keys, and foreign keys,
to range over at least 1 attribute/column.

I propose that this be generalized so that constraints may instead be nullary,
that is, range over zero or more attributes/columns instead.


I don't think that a new way to force single-row tables is
a good enough use case, and for completeness' sake is only
a good argument if it simplifies things and makes them more
comprehensible, which I don't think is the case here.

The semantics seem unclear to me:
The standard defines UNIQUE on the basis of the UNIQUE predicate:
<unique predicate> ::= UNIQUE <table subquery>
and states:
1) Let T be the result of the table subquery.
2) If there are no two rows in T such that the value of each column
in one row is non-null and is not distinct
from the value of the corresponding column in the other row,
then the result of the unique predicate is
*True*; otherwise, the result of the unique predicate is *False*.

Since an imagined zero-column query would have an empty set of
result columns, you could with equal force argue that these columns
satisfy the condition or not, because the members of the empty
set have all the properties you desire.

So I see no compelling argument that such a UNIQUE constraint
would force a single-row table.


I do see that compelling argument, and it has to do with identities.

The above definition of the UNIQUE predicate says that the UNIQUE predicate is 
FALSE iff some pair of rows in T are the same.  Further, 2 rows are the same 
iff, for every corresponding column, the values in both rows are the same.  
Further, 2 such values are the same iff they are both not null and are mutually 
not distinct.


So, determining if 2 rows are the same involves an iteration of dyadic logical 
AND over the predicates for each column comparison.  Now logical AND has an 
identity value, which is TRUE, because "TRUE AND p" (and "p AND TRUE") results 
in "p" for all p.  Therefore, any 2 rows with zero columns each are the same.


Since any 2 rows with zero columns are the same, the UNIQUE predicate is FALSE 
any time there is more than 1 row in a table.


Hence, a UNIQUE constraint over zero columns signifies a row-comparison 
predicate that unconditionally results in TRUE, and so no two rows at all would 
be allowed in the table with that constraint at once, thus restricting the table 
to at most one row.


Does anyone agree or disagree with this logic?  And can the definition of 
UNIQUE predicate reasonably be read the way I stated?  And if the SQL 
definition is ambiguous, could it reasonably be said that the behavior I stated 
is reasonable for Postgres and would match expectations of people using it?


Saying 2 empty rows are the same makes as much sense as saying 2 empty arrays 
are the same, does it not?



The desired effect can be had today with a unique index:

CREATE TABLE singleton (id integer);
CREATE UNIQUE INDEX singleton_idx ON singleton((1));


Okay, that is helpful, and less of a kludge than what I was doing, but it is 
still a kludge compared to what I'm proposing, which I see as elegant.


-- Darren Duncan





Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-25 Thread Darren Duncan

On 2013.03.25 5:55 PM, Craig Ringer wrote:

On 03/25/2013 10:28 PM, Tom Lane wrote:

Yeah, they are, because things break when they're set wrong.

They also make debugging and support harder; you need to get an
ever-growing list of GUC values from the user to figure out what their
query does. bytea_output, standard_conforming_strings, etc. Yick.

That said, I don't have a better answer for introducing non-BC changes.


Given the general trouble GUC values cause, is there a plan to deprecate and 
remove each of the existing ones over time?  As long as there isn't any actual 
loss of functionality post-removal (though users might have to change their 
code to do it the one true way), that would seem a good thing. -- Darren Duncan
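
As a concrete example of the kind of backwards-compatibility GUC in question, 
bytea_output changes how the very same value is rendered to clients:

  SET bytea_output = 'hex';     -- the 9.0+ default
  SELECT '\xdeadbeef'::bytea;   -- renders as \xdeadbeef
  SET bytea_output = 'escape';  -- the legacy pre-9.0 format
  SELECT '\xdeadbeef'::bytea;   -- renders as \336\255\276\357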






Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-25 Thread Darren Duncan

On 2013.03.25 6:03 PM, Darren Duncan wrote:

On 2013.03.25 5:55 PM, Craig Ringer wrote:

On 03/25/2013 10:28 PM, Tom Lane wrote:

Yeah, they are, because things break when they're set wrong.

They also make debugging and support harder; you need to get an
ever-growing list of GUC values from the user to figure out what their
query does. bytea_output, standard_conforming_strings, etc. Yick.

That said, I don't have a better answer for introducing non-BC changes.


Given the general trouble GUC values cause, is there a plan to deprecate and
remove each of the existing ones over time?  As long as there isn't any actual
loss of functionality post-removal (though users might have to change their
code to do it the one true way), that would seem a good thing. -- Darren Duncan


To clarify, I mean GUCs related to backwards-compatibility matters, such as 
bytea_output or standard_conforming_strings, things that affect the logical 
behavior of code.  I don't mean all GUCs, not at all; most of the ones I know 
about should remain configurable. -- Darren Duncan






[HACKERS] adding support for zero-attribute unique/etc keys

2013-03-24 Thread Darren Duncan
From my usage and 
http://www.postgresql.org/docs/9.2/interactive/sql-createtable.html I see that 
Postgres requires constraints like unique (and primary) keys, and foreign keys, 
to range over at least 1 attribute/column.


I propose that this be generalized so that constraints may instead be nullary, 
that is, range over zero or more attributes/columns instead.


Supporting this gives us more logical completeness; not allowing lists of 
zero attributes is akin to having math without zeroes.


The semantics of having a nullary / zero-attribute unique/primary key on a table 
is to restrict the table to having at most one row at a time.


Practically speaking, I find that nullary primary/unique keys are useful in the 
numerous situations where one might want to have a table consisting of just one 
record (or zero), because it stores data that there is just a single instance 
of, such as some kinds of config data, rather than a list of such.


Such tables let you store such data in a strongly typed and well self-documented 
manner, same as normal tables.


Currently, in order to do this, I have to work around the lack of nullary keys by 
creating a table with a dummy column having a check constraint saying its value 
must equal a constant, and putting a unary unique/primary key on that.  But this is 
an unpleasant kludge and one shouldn't have to do that.
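
Roughly, the kludge looks like this; the column names are illustrative:

  CREATE TABLE config (
      lock boolean NOT NULL DEFAULT true UNIQUE CHECK (lock),  -- dummy column
      refresh_interval interval NOT NULL
  );
  -- The CHECK pins "lock" to the constant true, so the UNIQUE key on it
  -- permits at most one row.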


Nullary foreign keys can also be useful, or at least should be supported for 
logical completeness.  The semantics of having one is to say that one table may 
only have rows if another table also has rows.


I was also going to say that it is important to be able to declare a composite 
type with zero attributes, which is a way to make a singleton type.  However I'm 
not sure that we don't already support this so I won't say more on that.


The primary request I have is nullary unique/primary key support, which I see 
has a solid use case.


(Note:  Logically speaking, if you have a nullary key, it would be the primary 
key, and all other unique keys are redundant.  Except perhaps that others can 
exist in order to support implementation matters of foreign keys that require 
their targets to be unique.)


How much work would it be to support this?  More importantly, does anyone agree 
that it should be supported, or does anyone want to counter-argue that it 
shouldn't be?


-- Darren Duncan




Re: [HACKERS] machine-parseable object descriptions

2013-03-18 Thread Darren Duncan

On 2013.03.18 1:03 PM, Alvaro Herrera wrote:

For Dimitri's patch to add support for dropped objects in event
triggers, there's an open question about how to report objects that are
being dropped in a tabular format.


Now that JSON is a built-in type with 9.2+, could we not perhaps use that to 
represent some things in a more natural manner than a tabular format?  JSON is 
designed to be machine-parseable, and some objects such as routine definitions 
are naturally trees of arbitrary depth. -- Darren Duncan
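
For instance, a report of dropped objects might look something like the 
following; the exact shape is invented here for illustration:

  SELECT '{"operation": "drop",
           "objects": [
             {"type": "table", "schema": "public", "name": "foo"},
             {"type": "index", "schema": "public", "name": "foo_pkey"}
           ]}'::json;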







Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-12-10 Thread Darren Duncan

I agree with Jeff.

Options that change the language at initdb or create-database time just fragment 
the language.


It is best to just have one language, where options are providable either 
dynamically per connection or otherwise lexically, so that they are really 
just shorthands for the current local usage and the language as a whole is the 
same.


That also means you can have example code out there and know it will work on any 
Postgres install, regardless of static global options.  If language modifiers are 
local or lexical, then any example code presumably would include the switches to 
turn them on for that example.


That all being said, I also think it is best to explicitly overload operators 
with extra parameter types, such as defining another operator with the signature 
(Number, String) and the same base name as string catenation, rather than 
making numbers implicitly stringify.  But I can also accept implicit 
stringification / language behavior changes if it is a lexical/temporary effect 
that the same user is still explicitly turning on.
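
A sketch of the explicit-overload approach; the function and operator 
definitions here are illustrative, not existing built-ins:

  -- Define number-with-string concatenation explicitly, instead of
  -- implicitly stringifying numbers everywhere:
  CREATE FUNCTION concat_numeric_text(numeric, text) RETURNS text
      LANGUAGE sql IMMUTABLE AS 'SELECT $1::text || $2';

  CREATE OPERATOR || (
      LEFTARG   = numeric,
      RIGHTARG  = text,
      PROCEDURE = concat_numeric_text
  );

  SELECT 42::numeric || ' bottles'::text;  -- uses the explicit overload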


-- Darren Duncan

Jeff Davis wrote:
On Mon, 2012-12-10 at 14:07 -0500, Robert Haas wrote: 

And we not only don't give them the behavior they want; we
don't even have a meaningful way to give the option of opting into
that behavior at initdb or create-database time.


I strongly object to offering options that change the language in such a
substantial way. initdb-time options still mean that we are essentially
dividing our language, and therefore the applications that support
postgres, in half (or worse). One of the things I really like about
postgres is that we haven't forked the language with a million options
like mysql has. I don't even like the fact that we have a GUC to control
the output format of a BYTEA.

For every developer who says "wow, that mysql query just worked without
modification" there is another one who says "oh, I forgot to test with
option XYZ... postgres is too complex to support, I'm going to drop it
from the list of supported databases."

I still don't see a compelling reason why opting out of overloading on a
per-function basis won't work. Your objections seemed fairly minor in
comparison to how strongly you are advocating this use case.

In particular, I didn't get a response to:

http://archives.postgresql.org/message-id/1354055056.1766.50.camel@sussancws0025

For what it's worth, I'm glad that people like you are pushing on these
usability issues, because it can be hard for insiders to see them
sometimes.

Regards,
Jeff Davis





Re: [HACKERS] feature proposal - triggers by semantics

2012-11-15 Thread Darren Duncan

Craig Ringer wrote:

On 11/15/2012 06:25 PM, Hannu Krosing wrote:

On 11/15/2012 09:48 AM, Craig Ringer wrote:

If you want to prevent TRUNCATE, deny the privilege or add a trigger
that aborts the command.

You can abort the transaction, but not skip the action, as currently it is only
possible to skip in ROW-level triggers.

So I'd modify this request to allow BEFORE EACH STATEMENT triggers
to also be able to silently skip the current action, like BEFORE EACH ROW
triggers can.

Then this request would be satisfied by a simple trigger which
rewrites TRUNCATE into DELETE.

That seems sensible to me, too.


To further explain ...

What I'm desiring here with respect to TRUNCATE is that users are allowed to use 
TRUNCATE syntax as an alternative to DELETE (they are GRANTed both), but that any 
triggers defined for DELETE can also be applied to TRUNCATE without too much 
difficulty.  So users can use different syntactic constructs that look similar 
without having to think about "is this going to be audited?"


I understand that ROW level triggers don't exist yet for TRUNCATE (this is 
already clearly documented in the manual) and adding them would mean TRUNCATE 
would do a table scan in their presence.


I still think the syntax of TRUNCATE FOR EACH ROW would be useful, but if no one 
agrees, then I'll just make do with the alternative solutions mentioned here.  That 
is, either a statement-level TRUNCATE trigger of some kind, or simply disallowing 
TRUNCATE privileges.
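
The statement-level option already exists (since 8.4) and would look something 
like this; the audit_log table is an assumed application table:

  CREATE FUNCTION audit_truncate() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
      INSERT INTO audit_log(event, table_name, happened_at)
          VALUES ('TRUNCATE', TG_TABLE_NAME, now());
      RETURN NULL;
  END;
  $$;

  CREATE TRIGGER foo_truncate_audit
      BEFORE TRUNCATE ON foo
      FOR EACH STATEMENT EXECUTE PROCEDURE audit_truncate();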


Thank you.

-- Darren Duncan




Re: [HACKERS] feature proposal - triggers by semantics

2012-11-15 Thread Darren Duncan

Dimitri Fontaine wrote:

Darren Duncan dar...@darrenduncan.net writes:

So, I'm partly proposing a specific narrow new feature, TRUNCATE FOR EACH
ROW


Kevin has been proposing that we consider an alternative approach in
some other cases that I think would work better for you, too. Namely, to
have access to OLD and NEW in FOR EACH STATEMENT triggers, where they
would be relations rather than records.

Regards,


Yes, I believe that would work very well.  In fact, that would provide some 
power features.  I look forward to this; it is probably the best solution. -- Darren 
Duncan





[HACKERS] feature proposal - triggers by semantics

2012-11-14 Thread Darren Duncan
I have a feature request, which at one level should require little code change, 
but at another level may require more.


Since Postgres 9.3 is going to be doing some significant feature additions for 
triggers, I'd like to see some more.


As they currently exist, triggers always fire based on the particular SQL syntax 
used, rather than on the semantics of what is actually going on.


I would like to see a new class of triggers that fire when particular database 
operations happen regardless of what SQL syntax was used.


As a simple example, I'd like to be able to define a trigger like "AFTER DELETE 
ON foo FOR EACH ROW" and have that trigger be invoked not only by a DELETE on 
foo but also by a TRUNCATE on foo.  So I would like to do some auditing action 
when a row of foo is deleted, no matter how it happens.


The reason this particular example is important is that TRUNCATE 
is documented as a data-manipulation action semantically equivalent to an 
unqualified DELETE in its effects, primarily.  As such, I would expect the same 
triggers to fire as would for an unqualified DELETE.


The reason I propose it be a new kind of trigger is so that we also retain 
the ability to declare triggers that fire on DELETE and not on TRUNCATE.


Less important, but also nice at least for being less verbose, is that said 
trigger could optionally also run when an UPDATE happens, since an UPDATE can be 
considered semantically a DELETE+INSERT.  But adding the TRUNCATE support is most 
important because it simply doesn't exist now, while UPDATE you can get just by 
adding "OR UPDATE".


I suggest that the simplest way to add this feature is to just extend the 
existing syntax for defining a FOR EACH ROW trigger so that TRUNCATE is also an 
option, besides INSERT/UPDATE/DELETE.
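
Concretely, the extension would permit something like the following, which is 
hypothetical syntax today; TRUNCATE is currently rejected in FOR EACH ROW 
trigger definitions, and audit_row_delete() is a made-up function:

  CREATE TRIGGER foo_row_audit
      AFTER DELETE OR TRUNCATE ON foo
      FOR EACH ROW EXECUTE PROCEDURE audit_row_delete();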


In that case, the semantics of the TRUNCATE statement could be altered as 
follows:  Iff "TRUNCATE foo" is invoked and foo has a TRUNCATE FOR EACH ROW 
trigger defined on it, then an unqualified "DELETE FROM foo" will be performed 
instead, with its usual semantics.  If such a trigger is not defined on foo, then 
the old TRUNCATE semantics apply.


As such, this case of the feature can be added without breaking any legacy behavior.

So, I'm partly proposing a specific narrow new feature, TRUNCATE FOR EACH ROW, 
but I'm also proposing the ability to generally define triggers based not on the 
syntax used but on the actual action requested.


A tangential feature request is to provide a runtime config option that can 
cause TRUNCATE to always behave as an unqualified DELETE FROM regardless of any 
triggers, as if it were just a syntactic shorthand.  Or alternately (or 
additionally), provide extra syntax for TRUNCATE itself where one can specify 
which behavior to have; both options could be given explicitly to override any 
config option.


-- Darren Duncan





Re: [HACKERS] Deprecating RULES

2012-10-12 Thread Darren Duncan

Josh Berkus wrote:

I don't think you're listening; none of those things are problems, and
so they are not user-hostile.


Having an upgrade fail for mysterious reasons with a cryptic error
message the user doesn't understand isn't user-hostile?  Wow, you must
have a very understanding group of users.

Lemme try to make it clear to you exactly how user-hostile you're being:

1. User downloads 9.2 today.
2. User builds a new application.
3. User finds the doc page on RULEs, decides they're a nifty concept.
4. New application includes some RULEs.

snip

I have a proposal.

Assuming we decide to do away with RULEs, change the *documentation* for RULEs 
right away in all supported maintenance branches (including 9.2), saying that 
RULEs will be deprecated, but don't change any code / add any warnings until 9.3.


Then, no later than the next bug/security-fix minor release (9.2.2/etc.), the 
documentation for RULEs would all say that, yes, we have RULEs, but you shouldn't 
use them in any new projects as they are going away, and you should migrate any 
existing uses; uses will warn starting in 9.3.0.  This documentation change 
can also be highlighted in a bullet point in the 9.2.2/etc. release 
announcements.  If necessary, also make reference in the docs to some tool or 
procedure to help find any uses of RULEs and help with the migration.


Since this isn't a code change, it should be very conservative and safe to 
include in maintenance branches, and it will alert users right where they're 
most likely to look (the RULEs documentation) without any undue delay.


How does that sound?

-- Darren Duncan




Re: [HACKERS] Successor of MD5 authentication, let's use SCRAM

2012-10-12 Thread Darren Duncan

Stephen Frost wrote:

* Josh Berkus (j...@agliodbs.com) wrote:

Problem is, the fact that setting up SSL correctly is hard is outside of
our control.


Agreed, though the packagers do make it easier..


Unless we can give people a "run these three commands on each server and
you're now SSL authenticating" script, we can continue to expect the
majority of users not to use SSL.  And I don't think that level of
simplicity is even theoretically possible.


The Debian-based packages do quite a bit to ease this pain.  Do the
other distributions do anything to set up SSL certificates, etc on
install?  Perhaps they could be convinced to?


This has bit me.

At my work we started a project on Debian, using the 
http://packages.debian.org/squeeze-backports/ version of Postgres 9.1, and it 
included SSL out of the box: just install the regular Postgres or Pg client 
package and SSL was ready to go.


And now we're migrating to Red Hat for the production launch, using the 
http://www.postgresql.org/download/linux/redhat/ packages for Postgres 9.1, and 
these do *not* include the SSL.


This change has been a pain, as we then disabled SSL when we otherwise would 
have used it.


(Though all database access would be over a private server-server network, so 
the situation isn't as bad as going over the public internet.)


How much trouble would it be to make the 
http://www.postgresql.org/download/linux/redhat/ packages include SSL?


-- Darren Duncan




Re: [HACKERS] Successor of MD5 authentication, let's use SCRAM

2012-10-12 Thread Darren Duncan

John R Pierce wrote:

On 10/12/12 9:00 PM, Darren Duncan wrote:
And now we're migrating to Red Hat for the production launch, using 
the http://www.postgresql.org/download/linux/redhat/ packages for 
Postgres 9.1, and these do *not* include the SSL. 


hmm?  I'm using the 9.1 for CentOS 6 (RHEL 6) and libpq.so certainly has 
libssl3.so, etc. as references.  Ditto the postmaster/postgres main 
program has libssl3.so too.  Maybe your certificate chains don't come 
pre-built, I dunno; I haven't dealt with that end of things.


Okay, I'll have to look into that.  All I know is that out of the box SSL just 
worked on Debian and it didn't on Red Hat; trying to enable SSL on out-of-the-box 
Postgres on Red Hat gave a fatal error on server start, at the very least 
needing SSL keys/certs to be installed, which I didn't have to do on Debian. 
-- Darren Duncan





Re: [HACKERS] Deprecating RULES

2012-10-11 Thread Darren Duncan

Josh Berkus wrote:

For 9.3, I suggest we create a DDL trigger by default which prevents
RULEs and throws an ERROR that explains they are now deprecated.


Well, even if we were considering this, the sequence would need to be:

1. Announce in 9.3 that RULES will be going away RSN.
2. In 9.4, send a warning every time someone loads/edits a user-defined
RULE.
3. In 10.0, get rid of CREATE RULE.


I think we can easily move up the first 2 steps.

1. Announce right now, effectively in 9.2, that RULES are going away soon.
2. In 9.3, send the warning.

Then, optionally, 3: CREATE RULE is removed in 9.4, or we stay with 10.0 for 
that and have a solid 2 years of warnings instead of one.


It seems to me that step 1 is completely outside the release cycle, as it 
doesn't involve changing any code.  Since 9.2 just came out, we can just do #1 
as of 9.2.


The only reason I see to delay #1 is if we aren't sure we're going to go ahead 
with it, and want to give it a few months of thought rather than announcing this 
major thing suddenly.  Waiting until 9.3 just to make an announcement is silly.


-- Darren Duncan




Re: is JSON really a type (Re: [HACKERS] data to json enhancements)

2012-09-30 Thread Darren Duncan

Hannu Krosing wrote:
Reflecting on the two possible interpretations of what it means to 
convert between the text and json data types, it has dawned on me that 
the confusion may come mainly from wanting json to be two things at once:


1. - a serialisation of a subset of javascript objects to a string.

2. - a dynamic type represented by the above serialisation.

case 1
--

If we stick with interpretation 1, then the json datatype is really no more 
than a domain based on the text type, having a CHECK is_valid_json() 
constraint.

snip


case 2
--

My suggestions on using typecasts for convert-to-json were a result of 
this interpretation of json-as-dynamic-type.


Having thought more about this, I now think that we probably should leave 
JSON alone and develop a separate dynamic type here.

snip

I think it would be best to have 2 main JSON-concerning data types:

1. A proper subset of text consisting of every value meeting some 
is_valid_json() constraint, as a DOMAIN; every value of this type is a text.


2. A type that is disjoint from text, that is, no value of this type would 
compare as equal to any text value.  It would be considered a collection type, 
similarly to how an array or tuple or relation is, but having arbitrary depth 
and being heterogeneous in the general case.


You could say that #1 is to textual source code what #2 is to a parsed syntax 
tree of that code.  Or that #1 is to textual XML what #2 is to an XML DOM.


It would be type #2 above that is the primary JSON type, which has all the 
special operators for working with JSON, while type #1 would be opaque, just a 
character string, and must be cast to type #2 in order to use any special 
operators on it.  Similarly, all the converting operators between other types 
and JSON would be with #2 only, and producing #1 must go through #2.


So call #1, say, JSON_source and #2 JSON_model; or JSON_text and JSON, 
respectively.
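
Type #1 could be little more than a domain; is_valid_json() here is a 
hypothetical validation function, not an existing one:

  CREATE DOMAIN json_text AS text
      CHECK (is_valid_json(VALUE));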


That's how I think it should work.

-- Darren Duncan




Re: [HACKERS] Raise a WARNING if a REVOKE affects nothing?

2012-08-21 Thread Darren Duncan

That sounds like a good change to me. -- Darren Duncan

Craig Ringer wrote:

Hi all

I'm seeing lots of confusion from people about why:

REVOKE CONNECT ON DATABASE foo FROM someuser;

doesn't stop them connecting. Users seem to struggle to understand that:

- There's a default GRANT to public; and
- REVOKE removes existing permissions, it doesn't add deny rules

It'd really help if REVOKE consistently raised warnings when it didn't 
actually revoke anything.


Even better, a special case for REVOKEs on objects that only have owner 
and public permissions could say:


WARNING: REVOKE didn't remove any permissions for user "blah". This 
table/db/whatever has default permissions, so there were no GRANTs for 
user "blah" to revoke. See the documentation for REVOKE for more 
information.


Opinions?


--
Craig Ringer
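
To spell out the confusion Craig describes, a minimal illustration; the 
database and role names are made up:

  -- This removes nothing: CONNECT was granted to PUBLIC, not to someuser.
  REVOKE CONNECT ON DATABASE foo FROM someuser;

  -- What actually blocks the connection:
  REVOKE CONNECT ON DATABASE foo FROM PUBLIC;
  GRANT CONNECT ON DATABASE foo TO allowed_user;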








Re: [HACKERS] Proof of concept: auto updatable views

2012-07-01 Thread Darren Duncan
My thoughts on this are that it would be a very valuable feature to have; it 
would make Postgres views behave more like they were always intended to behave, 
which is indistinguishable to users from tables wherever possible, with the 
reverse mapping being automatic given only the view-defining SELECT, where 
possible. -- Darren Duncan
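
For illustration, the kind of view that would become transparently updatable 
under this patch; the names are made up:

  CREATE TABLE person (id serial PRIMARY KEY, name text,
                       hidden boolean NOT NULL DEFAULT false);
  CREATE VIEW visible_person AS
      SELECT id, name FROM person WHERE NOT hidden;

  -- Under the proposal, this would be rewritten onto "person" automatically:
  UPDATE visible_person SET name = 'Bob' WHERE id = 1;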


Dean Rasheed wrote:

I've been playing around with the idea of supporting automatically
updatable views, and I have a working proof of concept. I've taken a
different approach than the previous attempts to implement this
feature (e.g., 
http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php),
instead doing all the work in the rewriter, substituting the view for
its base relation rather than attempting to auto-generate any rules or
triggers.

Basically what it does is this: in the first stage of query rewriting,
just after any non-SELECT rules are applied, the new code kicks in -
if the target relation is a view, and there were no unqualified
INSTEAD rules, and there are no INSTEAD OF triggers, it tests if the
view is simply updatable. If so, the target view is replaced by its
base relation and columns are re-mapped. Then the remainder of the
rewriting process continues, recursively handling any further
non-SELECT rules or additional simply updatable views. This handles
the case of views on top of views, with or without rules and/or
triggers.


snip


Obviously there's still more work to do but the early signs seem to be
encouraging.

Thoughts?





Re: [HACKERS] relation complex types

2012-06-02 Thread Darren Duncan

Jaime Casanova wrote:

I knew that we create an entry in pg_type for every table we create;
what I didn't know is that we actually create 2 entries.

For example, CREATE TABLE foo (i int); will create types foo and _foo.
So, any reason to create 2 entries?


I don't know offhand; maybe it's the corresponding row/tuple type and 
table/relation type?
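
One can check directly; the catalogs show that the second, underscore-prefixed 
entry is in fact the array type over the table's row type:

  CREATE TABLE foo (i int);
  SELECT typname, typtype, typcategory FROM pg_type
   WHERE typname IN ('foo', '_foo');
  -- 'foo'  : typtype 'c' (composite), the table's row type
  -- '_foo' : typcategory 'A', the array type over that row type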



Anyway, what really kept my attention is that CREATE SEQUENCE also
creates an entry in pg_type. Is there any reason for that?


This, I'm fairly sure, is due to a sequence generator being implemented as a 
table/relation-typed variable, so it has a corresponding type just like the 
regular table/relation-typed variables in the database.


-- Darren Duncan



Re: [HACKERS] transformations between types and languages

2012-05-15 Thread Darren Duncan

Peter Eisentraut wrote:

Here is a draft design for the transforms feature, which I'd like to
work on.  The purpose of this is to allow adapting types to languages.
The most popular case is to enable converting hstore to something useful
like a dict or a hash in PL/Python or PL/Perl, respectively.  In
general, the type and the language don't know of each other, and neither
need to be in core.  Maybe you want to adapt PostGIS types to pygeometry
objects in PL/Python (made up example, but you get the idea).


This is a good idea in principle.

I expect we should be able to use the same syntax both for system-defined types 
and user-defined types.


I would expect, though, that in some common cases one can't avoid, say, having to 
call hstore_to_plpython() directly in order to disambiguate, and we may want to 
provide terser syntax for using the desired TRANSFORM.
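
For concreteness, a declaration along these lines is one shape the feature 
could take; this is a sketch against the draft design, using the conventional 
hstore/PL/Python function names, not syntax that existed at the time of this 
thread:

  CREATE TRANSFORM FOR hstore LANGUAGE plpythonu (
      FROM SQL WITH FUNCTION hstore_to_plpython(internal),
      TO   SQL WITH FUNCTION plpython_to_hstore(internal)
  );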


For example, if we have a Perl 5 hash, that could reasonably map either to an 
hstore or to a tuple.  Or a Perl 5 string with a false utf8 flag could map to 
either a character string or a byte string.  Or a Perl 5 empty string (the 
result of 1==0) could map to the false Boolean.  Or a Perl 5 string that looks 
like a number could map to either a character string or some kind of numeric.  
Or the Perl 5 number 1 (the result of 1==1) could map to either a numeric 1 or 
the true Boolean.


Or we have to tighten the conversion rules so that things which are sometimes 
equivalent and sometimes not on one side have different interpretations in the 
transform.


Ideally the feature would also work not only for interfacing with PLs but also 
with client languages, since conceptually it's alike, differing only in who 
calls whom.


-- Darren Duncan



Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Darren Duncan
Your proposal makes me think of something similar which might be useful: 
"INclusion" constraints.  As exclusion constraints might be thought of as a 
generalization of unique/key constraints, inclusion constraints are like a 
generalization of foreign key constraints.  An inclusion constraint 
basically allows some comparison operator other than is-equal to test if values 
in one table match values in another table, and the constraint allows the former 
if the test results in true.  An example of said inclusion test is whether the 
range in one table is contained in a range in another table.  I assume/hope, 
similarly, now that we have range types in 9.2, that the existing 
exclusion constraints can be used with range comparison operators.
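
They can; this is essentially the canonical 9.2 reservation example, using the 
btree_gist extension for the plain-equality column:

  CREATE EXTENSION btree_gist;
  CREATE TABLE reservation (
      room   int,
      during tsrange,
      EXCLUDE USING gist (room WITH =, during WITH &&)
  );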


As to your actual proposal, it sounds like a generalization of the relational 
join or set intersection operator where, instead of comparing sets defined in 
terms of an enumeration of discrete values, we are comparing sets defined by a 
range, which conceptually have infinitely many values, depending on the data 
type the range is defined over.  But if we're doing this, then it would seem to 
make sense to go further and see if we have set analogies for all of our 
relational or set operators, should we want to work with non-discrete sets.


Now this sounds interesting in theory, but I would also assume that these could 
be implemented by an extension in terms of existing normal relational operators, 
where each range value is a discrete value, combined with operators for unioning 
or differencing ranges, etc.  A relation of ranges effectively can represent a 
discontinuous range; in that case, the empty discontinuous range is also 
canonically representable by a relation with zero tuples.


Jeff, I get the impression your proposal is partly about helping performance by 
supporting this internally, rather than just defining it as a SQL function; am 
I right?


-- Darren Duncan

Jeff Davis wrote:

I hope this is not an inappropriate time for 9.3 discussions. The flip
side of asking for submissions in the first couple commitfests means
that I need to submit proposals now.

What is a Range Join?

See attached SQL for example. The key difference is that the join
condition is not equality, but overlaps (&&).

Problem statement: slow. Nested loops are the only option, although they
can benefit from an inner GiST index if available. But if the join is
happening up in the plan tree somewhere, then it's impossible for any
index to be available.

Proposed solution: a modified merge join that can handle ranges.

 1. Order the ranges on both sides by the lower bound, then upper bound.
Empty ranges can be excluded entirely.
 2. Left := first range on left, Right := first range on right
 3. If Left or Right is empty, terminate.
 4. If lower(Left) > upper(Right), discard Right, goto 2
 5. If lower(Right) > upper(Left), discard Left, goto 2
 6. return (Left, Right) as joined tuple
 7. Right := next range on right
 8. goto 3

If step 4 or step 5 keeps getting triggered, and a btree index is
available (ordered by lower bound), we can re-probe to go to the correct
position, and consider that the new top range on that side. This is an
optimization for the case where there are large numbers of ranges with
no match on the other side.

Thanks to Nathan Boley for helping me devise this algorithm. However,
any bugs are mine alone ;)

Weaknesses: I haven't thought through the optimization, but I suspect it
will be hard to be very accurate in the costing. That might be OK,
because there aren't very many options anyway, but I'll need to think it
through.

Questions:

 * Is this idea sane? -- that is, are ranges important enough that
people are willing to maintain a new operator?
 * The more general problem might be spatial joins which can operate
in N dimensions, and I doubt this would work very well in that case.
Does someone know of a spatial join algorithm (without IP claims) that
would be as good as this one for ranges?
 * Other thoughts?

Regards,
Jeff Davis




Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Darren Duncan

Pavel Stehule wrote:

2011/10/30 Darren Duncan dar...@darrenduncan.net:

I agree that this feature would be quite useful and should be included in
SQL. The exact syntax is less of an issue; what matters is the ability to cleanly
say "select all columns except for these".  I have in fact argued for the
same feature in the past.

If you want to and can implement this feature then more power to you.  I'll
look forward to it being in Pg 9.2.

I think then the only discussion point should be what (terse) syntax to use
for it, not whether the feature should exist at all.

Arguing against this feature is like arguing against supporting "where
not()" or "except" or "not in".  One should be able to do complements not
only of rows but of columns too.  Basic good language design.


My practice says this is not true - I have seen the need only in badly
designed or very badly designed projects.

I don't see any reason to do it at the SQL level.

It can make sense only in psql, as some special filter - if we want to
enhance the report features there.


The SQL level is exactly the correct and proper place to do this.

It's all about mathematical parity.  That is the primary reason to do it.

- SELECT * gives you a whole set.
- SELECT foo, bar gives you a subset of that.
- SELECT ALL BUT foo, bar gives you the complementary subset.

There's a variety of uses for specifying complementary subsets, and when the 
clean syntax is available, people will start using it for cleaner code, even if 
they previously had workarounds.


The complementary subset should be implemented in exactly the same place and 
manner as the forward subset, on the SQL level.


Arguing against this is like arguing against a subtraction operator because we 
can emulate it using addition plus negation, or saying subtraction should just 
be a special filter in a client app.


-- Darren Duncan



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Darren Duncan
I think the loose consensus I've picked up from people in this thread is that 
the ability to tersely specify a complementary subset of columns is something 
that is simple enough and wouldn't hurt us to have, but that its utility is 
limited enough that a lot of people wouldn't want to do the work to implement it 
either.


Eric B. Ridge wrote:

Anyways, looks like it might be an uphill battle to get the idea accepted
(let alone any code I write!), but I ain't givin' up just yet.


I think this is the bottom line here.

The real question to ask ourselves is, if Eric Ridge is willing to do all the 
work to implement this feature, and the code quality is up to the community 
standards and doesn't break anything else, then will the code be accepted?


It's one thing to argue whether a new small feature is useful enough to go to the 
trouble of implementing, and it's another thing to argue whether that feature is 
harmful enough to reject a free working implementation (of otherwise conforming 
code quality) from someone who has already gone to the trouble to implement it.


Eric, if you want to implement this, I say more power to you, and I will use it.

-- Darren Duncan



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Darren Duncan

David Wilson wrote:

On Sun, Oct 30, 2011 at 1:10 AM, Darren Duncan dar...@darrenduncan.net wrote:


The SQL level is exactly the correct and proper place to do this.

It's all about mathematical parity.  That is the primary reason to do it.

- SELECT * gives you a whole set.
- SELECT foo, bar gives you a subset of that.
- SELECT ALL BUT foo, bar gives you the complementary subset.


That's not actually entirely true given the usual SQL (and
mathematical) meaning of "set". This feature relates to the set of
attributes returned regarding elements of the returned set, not the
set itself - the actual returned set is identical regardless of the
column-specifier formulation. Claiming this as an SQL mathematical
purity issue is a bit disingenuous, as SQL set manipulation takes
place at the member level rather than the attribute level - SQL is
otherwise quite explicit about requiring explicit listings of the
attributes that the client is interested in regarding a returned set
of member rows.


SQL rowsets/tables are distinctly sets across two dimensions, both across 
columns and across rows.  You have sets of rows and sets of columns at the same 
time.  And it is useful to slice the rowset/table along either or both 
dimensions, wherein you produce a subset in that dimension.  We presently have 
terse syntax for specifying both the subset we do want and the subset we 
want the complement of for rows, but not for columns.  It is true that columns 
and rows are not the same, but they are both still sets.



Arguing against this is like arguing against a subtraction operator because
we can emulate it using addition plus negation, or saying subtraction should
just be a special filter in a client app.


That would be true if this was an argument against WHERE or
EXCEPT. Column specification and row specification are very
different and cannot be conflated.


Well, I raised the WHERE/EXCEPT analogy in my initial reply.

Not conflating rows and columns is why we have different syntax to work with 
them.


That's not to say this proposal is without merit, merely that your
arguments for it are poorly founded and not particularly relevant.


I disagree, but regardless, other arguments have been made for the feature that 
are more based in utility, and I agree with those: having the feature can 
save a lot of users a lot of work.


-- Darren Duncan



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Darren Duncan

Tom Lane wrote:

Darren Duncan dar...@darrenduncan.net writes:

The real question to ask ourselves is, if Eric Ridge is willing to do all the
work to implement this feature, and the code quality is up to the community 
standards and doesn't break anything else, then will the code be accepted?


It's entirely possible that it will get bounced on standards-compliance
grounds.  In particular, I don't think it's acceptable to introduce a
new reserved keyword for this --- that would fall under the "fails to
not break anything else" category.

regards, tom lane


Well then, we come up with a (SQL-level) syntax for the feature that doesn't 
introduce new reserved keywords.


As I said before, the important thing is to have the feature; the exact 
syntax is the main point to discuss.


Postgres already has a number of syntactic features that aren't in the SQL 
standard yet coexist with it, so we would just add one more of those.


-- Darren Duncan



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Darren Duncan

Eric Ridge wrote:

On Sun, Oct 30, 2011 at 4:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:

That's just a gut feeling, I've not tried it ... but the proposed
syntax sure looks a lot like a call to a function named EXCLUDING.


I think what makes it okay is that its new use is only defined to
immediately follow an asterisk in the target_el production.  If you
look at gram.y:11578 (from git HEAD), I was thinking this:

snip

That's an interesting thought.

I had previously envisioned EXCLUDING to appear exactly once in the SELECT list, 
where it functions as a column analogy to EXCEPT for rows.  So you'd say:


  SELECT <list of stuff to keep> EXCLUDING <list of stuff to skip> FROM ...

But then your comment makes me think that your EXCLUDING actually could have a 
much tighter scope, and as a result might appear multiple times, like AS can:


  SELECT foo.* EXCLUDING foo.x, bar.* EXCLUDING bar.y, baz.z, (a+b) AS c FROM 
...

So if you do this, EXCLUDING has syntactic precedence similar to AS, and while 
AS is only valid not following a *, EXCLUDING is only valid following a *, and 
so EXCLUDING directly modifies a single * and not the SELECT list in general.


Is that where you're going with this?

If so, I think that would make the feature even more valuable and more 
syntactically clean than I had previously thought.


-- Darren Duncan



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Darren Duncan

Eric Ridge wrote:

I don't actually like the term EXCLUDING, but it conveys what's
happening and is already defined as a keyword.  I thought about
EXCEPT, but that doesn't work for obvious reasons, and NOT might
just be confusing.


How about BUT?

Is that already in use by something?  It's nice and short and conveys the 
"except" meaning.


And there is already precedent for using that word for this purpose.

CJ Date already uses "ALL BUT" in his literature as a modifier to his 
illustrative relation projection syntax to give the complementary projection, 
like with r{x,y} vs r{all but x,y}.


Also, a more tenuous connection: Larry Wall likes "but" as a logical modifier.

-- Darren Duncan



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Darren Duncan

Mark Mielke wrote:

On 10/30/2011 03:50 PM, Eric Ridge wrote:

Changes of omission can break your code just as easily.


I think I wasn't as clear as I intended. In many ways, I think use of 
* in the first place is wrong for code (despite that I do it as well). 
Therefore, * EXCLUDING (...) would also be wrong. It comes down to: does 
the code know what it wants?

snip


select * is not deterministic from a programming perspective.


I understand what you're saying.  However, we're stuck with * because it is in 
the standard and is widely used, and if we have * anyway, then the exclusion 
proposal is just an enhancement to that.  So there is no reason to reject the 
complementary columns feature because of the problems with select *; you might 
as well argue to get rid of select *. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-29 Thread Darren Duncan
I agree that this feature would be quite useful and should be included in SQL. 
The exact syntax is less of an issue than just having the ability to cleanly say 
"select all columns except for these".  I have in fact argued for the same 
feature in the past.


If you want to and can implement this feature then more power to you.  I'll look 
forward to it being in Pg 9.2.


I think then the only discussion point should be what (terse) syntax to use for 
it, not whether the feature should exist at all.


Arguing against this feature is like arguing against supporting "where not(...)" 
or "except" or "not in".  One should be able to take complements not only of 
rows but of columns too.  Basic good language design.


-- Darren Duncan

Eric Ridge wrote:

Would y'all accept a patch that extended the SELECT * syntax to let
you list fields to exclude from the A_Star?

Quite regularly I'll be testing queries via psql and want to see all
the columns from a fairly wide table except maybe a giant text or xml
column.  A syntax like:

 SELECT * EXCLUDING (big_col1, big_col2) FROM foo;

would be pretty handy.  It would definitely save some typing in
certain cases.  It seems like such a syntax would better document the
intent of a query too, rather than leaving one wondering if big_col1
was supposed to be omitted from the target list or not.

Anyways, I just wanted to run the idea by youse guys before I put too
much more effort into it.  I've already made what appear to be the
minimum necessary changes to gram.y, and a few quick greps through the
code make me think the rest will be pretty easy.

Maybe the SQL spec says something about this and nobody's done the work yet?

Thanks for your input!

eric




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Back-branch releases upcoming this week

2011-09-20 Thread Darren Duncan

Tom Lane wrote:

Dave Page dp...@pgadmin.org writes:

On Tue, Sep 20, 2011 at 12:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:

As has been mentioned a couple times, we're well overdue for updates of
the back branches.  Seems like time to get that done, so we'll be
wrapping 8.2.x and up this Thursday for release Monday the 26th.



8.2 up, including 9.1.1? I'm not sure our QA guys will be able to cope
with verification of so many individual installers in that timeframe -
8.2 - 9.0 is hard enough to do in one go.


Well, all the pre-9.1 branches are definitely badly in need of updates.
9.1 maybe could go without at this point, but we do have one crasher bug
and one serious memory leak fixed there, neither new in 9.1.  I'd just
as soon not establish a precedent for not releasing the same fixes at
the same time in all branches.

How about we wrap them all, but you could let your team slip the 9.1
update for a day or so if they need more time?  It's certainly less
critical than the older branches.


I would prefer that all branches have synchronized patch releases, as they seem 
to have had in the past, and that the latest production release is included, 
9.1.1 in this case, even if its change set is more minor. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is there really no interest in SQL Standard?

2011-09-19 Thread Darren Duncan
FYI, one of the main goals of the Muldis D language is to be an open source SQL 
standard.  It is intended to satisfy both relational and NoSQL folks, and 
predates UnQL significantly.


Muldis D has always been published openly and is comprehensive enough to cover 
anything that SQL does, and anyone is welcome to improve it.


Moreover, this standard has built-in resilience against embrace, extend and 
extinguish by including explicit versioning with authorities (Perl 6 inspired 
that feature), so that if anyone forks the language, it is possible for the 
different versions to be easily distinguishable and non-conflicting, and in a 
more benign respect it is designed to be extensible so DBMSs can be free to 
evolve under it, adding unique features, while not causing compatibility 
conflicts with other DBMSs in the process.


Note that I have fallen behind in specifying a number of intended significant 
design improvements/simplifications to the spec proper, though much of this is 
hashed out in the laundry-list TODO_DRAFT file on GitHub.


-- Darren Duncan

Joe Abbate wrote:

On 09/19/2011 12:40 PM, Christopher Browne wrote:

On Mon, Sep 19, 2011 at 12:20 PM, David Fetter da...@fetter.org wrote:
Actually, I think it *is* a bad idea, as it would require construction
from whole cloth of kinds of mostly political infrastructure that we
don't have, as a community and aren't necessarily notably competent to
construct.

The nearest sort of thing that *could* conceivably be sensible would
be to participate in UnQL
http://www.unqlspec.org/display/UnQL/Home.  That's early enough in
its process that it's likely somewhat guidable, and, with the
popularity of NoSQL, being at the ground breaking of a common query
language to access that would likely be useful to us.

If we wanted to start a new standards process, I imagine it would best
involve embracing truly relational, stepping back to PostQUEL, and
promoting a standard based on something off more in that direction.


If I were looking for something truly relational I wouldn't go towards
JSON or NoSQL, I'd go with something like Dee
(http://www.quicksort.co.uk/ ) which IIRC were interested in building a
PostgreSQL inteface.


As much as that might sound like a terrible idea, trying to take
over SQL by forking it strikes me as a much *worse* idea.


My intention was not to take over anything.  I only think it may be
useful to discuss SQL features, informally or otherwise, with other open
source competitors such as SQLite, MySQL (brethren), Firebird, etc.,
and Josh, having been close to the MySQL camp (even physically, from
what I recall :-) is possibly well suited to start that discussion.

Joe




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Returned Array

2011-08-12 Thread Darren Duncan

David E. Wheeler wrote:

On Aug 12, 2011, at 6:17 PM, Alex Hunsaker wrote:

Anyway, the attached patch fixes it for me. That is when we don't have
an array state, just return an empty array.  (Also adds some
additional comments)


Fix confirmed, thank you!

+1 to getting this committed before the next beta/RC.


Policy question.  If this problem hadn't been detected until after 9.1 was 
declared production, would it have been considered a bug to fix in 9.1.x or 
would it have been delayed to 9.2 since that fix might be considered an 
incompatibility?  If the latter, then I'm really glad it was found now. -- 
Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-11 Thread Darren Duncan

Christopher Browne wrote:

Vis-a-vis the attempt to do nested naming, that is ns1.ns2.table1,
there's a pretty good reason NOT to support that, namely that this
breaks relational handling of tables.  PostgreSQL is a *relational*
database system, hence it's preferable for structures to be
relational, as opposed to hierarchical, which is what any of the
suggested nestings are.


I won't argue with whether or not nested naming is a good idea, but I will argue 
with your other comment about breaking relational handling.


A relational database is a database in which all data is kept in relation-typed 
variables, which SQL calls tables, and you can perform all queries and updates 
with just relation-valued expressions and statements.


Organizing the tables into a multi-level namespace, either fixed-depth or 
variable-depth, rather than using a flat namespace, does not make the database 
any less relational, because the above definition and any others still hold.


The "less relational" argument above is a red herring or distraction.  One can 
argue against namespace nesting just fine without saying that.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-11 Thread Darren Duncan
I will put my support behind David Johnston's proposal, in principle, though minor 
details of syntax could be changed if using ! conflicts with something. -- 
Darren Duncan


David Johnston wrote:

On Mon, Jul 11, 2011 at 10:12 AM, Florian Pflug f...@phlo.org wrote:

On Jul11, 2011, at 07:08 , Darren Duncan wrote:

Christopher Browne wrote:
Vis-a-vis the attempt to do nested naming, that is ns1.ns2.table1, 
there's a pretty good reason NOT to support that, namely that this 
breaks relational handling of tables.  PostgreSQL is a *relational* 
database system, hence it's preferable for structures to be 
relational, as opposed to hierarchical, which is what any of the 
suggested nestings are.



Rather, the argument is that it was intentional for the structuring of

table naming to, itself, be relational, and changing that definitely has
some undesirable characteristics.


The need for recursive queries is the most obvious undesirable, but it's

not the only undesirable thing, by any means.

I do not see how recursive queries (really iteration of records) even enter
the picture...

Right now I can emulate a hierarchical schema structure via a naming scheme
- for example  schemabase_sub1_sub2_etc.  I am simply looking for a formal
way to do the above AND also tell the system that I want all schemas under
schemabase to be in the search path.  Heck, I guess just allowing for
simply pattern matching in search_path would be useful in this case
regardless of the presence of an actual schema hierarchy.  Using LIKE
syntax say: SET search_path TO schemabase_sub1_% or something similar.
The only missing ability becomes a way for graphical tools to represent the
schema hierarchy using a tree-structure with multiple depths.

I can see how adding "." and ".." and relative paths would confuse the issue, but
those are not necessary features of a multi-level schema depth.

The above, combined with a different separator for intra-level
namespace/schema delineation, would allow for an unambiguous way to define
and use a hierarchical schema with seemingly minimal invasion into the
current way of doing things. You could almost implement it just by requiring
a specific character to act as the separator and then construct the actual
schema using single-level literals and supporting functions that can convert
them into an hierarchy.  In other words, the schema table would still only
contain one field with the full parent!child as opposed to (schema,
parent) with (VALUES('parent',null),('child','parent')).

In other words, if we use ! as the separator, any schema named
parent!child  could be stored and referenced as such but then if you run a
getChildren(parent) function it would return child along with any other
schemas of the form parent!%.  In this case the % sign could maybe only
match everything except ! and the * symbol could be used to match ! as
well.

I could give more examples but I hope the basic idea is obvious.  The main
thing is that the namespace hierarchy usage is standardized in such a way
that pgAdmin and other GUI tools can reliably use for display purposes and
that search_path can be constructed in a more compact format so that every
schema and sub-schema is still absolutely referenced (you can even have the
SET command resolve search_path at execution time and then remain static
just like CREATE VIEW ... AS SELECT * FROM table.

David J.







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-09 Thread Darren Duncan

Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

If for some reason we needed to have tables that happened to be called
x.y.z and a.b.c accessible from a single SQL session, we could allow
that much more simply by allowing schemas to be nested.  Then we could
allow arbitrary numbers of levels, not just three.


FWIW, I actually tried to do that back when we first introduced schema
support (the fact that the code calls them namespaces and not schemas
is a leftover from that idea).  It turns out to be a whole lot harder
than it sounds, because of the ambiguity you get about which name goes
at what level.  A simple example of this is: if you write x.y in a
query, is that meant to be table x's column y, or is it meant to be
field y within a composite column x of some table in the query?
We've resolved that by requiring you to write (x).y when you mean
the latter, but it's not exactly an intuitive or pleasant answer.
In the same way, if namespaces can be nested to different levels,
it gets really messy to support abbreviations of any sort --- but
the SQL spec requires us to be able to do so.


What if you used the context of the calling code and resolved in favor of 
whatever match is closest to it?  The problem is similar to name resolution in 
general-purpose programming languages.


Basically start looking in the lexical context for an x and if you find one 
use that; otherwise, assuming we're talking about referencing code that lives in 
the database such as a function, look at the innermost schema containing the 
referencing code and see if it has a direct child named x; otherwise go up one 
level to a parent schema, and so on until you get to the top, and finding none 
by then say it doesn't exist.


If there are several "x" in this search sequence, only use the first one 
regardless of whether it has a "y", so as to prevent bugs from too much complexity. 
The same applies when just looking for "x" by itself, not just an "x.y".


For the case of calling code that doesn't live in the database such as a 
client-side query, I believe there are session variables like current schema 
or such, and you can use this as the starting point for the search for x, 
looking first at what that schema directly contains, and then its parent, and so on.


Something like that.
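
As a rough sketch of that search in query form, where the parnamespace column 
on pg_namespace and the current_schema_oid() function are both hypothetical, 
since today's catalogs have no schema-parent link:

  WITH RECURSIVE ancestry AS (
      SELECT oid, parnamespace, 0 AS depth
        FROM pg_namespace WHERE oid = current_schema_oid()
      UNION ALL
      SELECT n.oid, n.parnamespace, a.depth + 1
        FROM pg_namespace n JOIN ancestry a ON n.oid = a.parnamespace
  )
  SELECT s.oid  -- the closest schema named x wins; ignore any deeper matches
    FROM ancestry a JOIN pg_namespace s ON s.parnamespace = a.oid
   WHERE s.nspname = 'x'
   ORDER BY a.depth
   LIMIT 1;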

Or ignore what I said about starting in a lexical context and do what you 
already do there, but keep what I said about relative order of schemas to 
search, only searching direct children of ancestors of the current code's 
context schema starting with the current context.


You could also come up with some relative name syntax such as filesystems 
support with their ../ and such, but that's further from standard SQL.


-- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-09 Thread Darren Duncan

Jeff Davis wrote:

On Fri, 2011-07-08 at 23:39 -0700, Darren Duncan wrote:
What if you used the context of the calling code and resolved in favor of 
whatever match is closest to it?  The problem is similar to name resolution in 
general-purpose programming languages.


Basically start looking in the lexical context for an x and if you find one 
use that; otherwise, assuming we're talking about referencing code that lives in 
the database such as a function, look at the innermost schema containing the 
referencing code and see if it has a direct child named x; otherwise go up one 
level to a parent schema, and so on until you get to the top, and finding none 
by then say it doesn't exist.


This is an example of where data languages and normal programming
languages have a crucial difference.

With a data language, you have this problem:
 1. An application uses a query referencing 'y.z.foo' that resolves to
internal object with fully-qualified name 'x.y.z'.
 2. An administrator creates object 'y.z.foo'.

Now, the application breaks all of a sudden.

In a normal prgramming language, if the schema of the two foos are
different, the compiler could probably catch the error. SQL really has
no hope of catching it though.

PostgreSQL has this problem now in a couple ways, but it's much easier
to grasp what you might be conflicting with. If you have multiple nested
levels to traverse and different queries using different levels of
qualification, it gets a little more messy and I think a mistake is more
likely.


Well, my search path suggestion was based on Tom Lane's comment that the SQL 
spec requires us to be able to [support abbreviations] and I expected it would 
be syntactically and semantically backwards compatible with how things work now.


FYI, with Muldis D, being more green fields, there are no search paths in the 
general case, and every entity reference is unambiguous because it has to be 
fully-qualified.


However, I also support relative references, and in fact require their use for 
references within the same database, which carries a number of benefits, at the 
cost of being a few characters more verbose than when using a search path.  So 
introducing new things with the same names in different namespaces won't break 
anything there, even if they are closer.  Its essentially like navigating a 
Unix filesystem but with . rather than /.


So for example, if you had 2 sibling schemas s1 and s2, each with 2 
functions f1,f2 and a table t, then s1.f1 would reference s1.f2 and s1.t 
as sch.lib.f2 and sch.data.t respectively, while s1.f1 would refer to the 
entities in s2 as sch.par.s2.lib.f1 and sch.par.s2.data.t and such (a function 
can also refer to itself anonymously as rtn if it's recursive).  The sch is 
like . in Unix and the par is like .. in Unix.  The data is for data 
tables or views (and cat is for catalog tables/views) while lib is for 
user-defined types, routines, constraints, etc (and sys is for built-in types 
and routines, but sys may be omitted and search paths exist just for 
built-ins).  Synonyms are also supported.


I don't expect you would adopt relative (fully-qualified) references, because 
the syntax isn't in standard SQL (I think), but I did.  Unless you like them and 
can come up with a syntax that will fit into how SQL does things.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-08 Thread Darren Duncan

Jeff Davis wrote:

On Thu, 2011-07-07 at 20:56 -0700, Darren Duncan wrote:

When you create a temporary table, PostgreSQL needs to add rows in
pg_class, pg_attribute, and probably other system catalogs. So there are
writes, which aren't possible in a read-only transaction. Hence the
error. And no, there is no workaround.

That sounds like a deficiency to overcome.

It should be possible for those system catalogs to be virtual, defined like 
union views over similar immutable tables for the read-only database plus 
mutable in-memory ones for the temporary tables.


Ideally, yes, from a logical standpoint there are catalog entries that
are only interesting to one backend.

But that doesn't mean it's easy to do. Remember that catalog lookups
(even though most go through a cache) are a path that is important to
performance. Also, more complex catalog interpretations may introduce
some extra bootstrapping challenges.


Are there any plans in the works to do this?


I don't think so. It sounds like some fairly major work for a
comparatively minor benefit.

Suggestions welcome, of course, to either make the work look more minor
or the benefits look more major ;)


What I said before was a simplification; below I present my real proposal.

I think an even better way to support this would be based on Postgres having 
support for directly using multiple databases within the same SQL session at 
once, as if namespaces were another level deep, the first level being the 
databases, the second level the schemas, and the third level the schema objects.


Kind of like how the SQL standard defines its catalog/schema/object namespaces.

This instead of needing to use federating or that contrib module to use multiple 
Pg databases of the same cluster at once.


Under this scenario, we make the property of a database being read-only or 
read-write for the current SQL session associated with a database rather than 
the whole SQL session.  A given transaction can read from any database but can 
only make changes to the ones not read-only.


Also, the proper way to do temporary tables would be to put them in a database 
separate from the main one, where the whole other database has the property of 
being temporary.


Under this scenario, there would be separate system catalogs for each database, 
and so the ones for read-only databases are read-only, and the ones for other 
databases aren't.


Then the system catalog itself fundamentally isn't more complicated, per 
database, and anything extra to handle cross-database queries or whatever, if 
anything, is a separate layer.  Code that only deals with a single database at 
once would be an optimized situation and perform no worse than it does now.


Furthermore, federating databases is done with the same interface, by adding 
remote/foreign databases as extra databases at the top level namespace.


Fundamentally, a SQL session would be associated with a Pg server, not a 
database managed by such.  When one starts a SQL session, there are initially no 
databases visible to them, and the top-level namespace is empty.


They then mount a database, similarly to how one mounts an OS filesystem, by 
providing appropriate connection info, either just the database name or also 
user/pass or also remote host etc as is applicable, these details being the 
difference between using a local/same-Pg-cluster db or a remote/federated one, 
and the details also say whether it is temporary or initially read-only etc.


See also how SQLite works; this mount being analogous to their attach.
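
In SQLite that looks like the following (real SQLite syntax, shown only as the 
analogy; the table names are made up):

  ATTACH DATABASE 'other.db' AS aux;
  SELECT * FROM main.accounts JOIN aux.accounts USING (id);
  DETACH DATABASE aux;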

Such a paradigm is also how my Muldis D language interfaces databases; this is 
the most flexible, portable, extensible, optimizable, and elegant approach I can 
think of.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-08 Thread Darren Duncan

Jeff Davis wrote:

On Thu, 2011-07-07 at 23:21 -0700, Darren Duncan wrote:
I think an even better way to support this would be based on Postgres having 
support for directly using multiple databases within the same SQL session at 
once, as if namespaces were another level deep, the first level being the 
databases, the second level the schemas, and the third level the schema objects.

snip
Then the system catalog itself fundamentally isn't more complicated, per 
database, and anything extra to handle cross-database queries or whatever, if 
anything, is a separate layer.  Code that only deals with a single database at 
once would be an optimized situation and perform no worse than it does now.


One challenge that jumps to mind here is that an Oid would need to
become a pair (catalog, oid). Even if the end result isn't much more
complex, getting there is not trivial.


Yes, but that would just be in-memory or in temporary places external to every 
database.  On disk internal to a database there would just be the oid.  In fact, 
another aspect of the database model I defined is that each database is 
entirely self-contained; while you can do cross-database queries, you don't have 
cross-database constraints, in the general case.



See also how SQLite works; this mount being analogous to their attach.


I'm not sure SQLite is the best example. It has a radically different
architecture.


Still, it's an example I know of where you can access several clearly separable 
databases at once through a common namespace.  While one might argue this is a 
substitute for multiple schema support, I don't because with multiple schemas 
you can have integrity constraints that cross schemas.  The namespaces issue is 
largely orthogonal to self-containment or integrity in my model.


But look at Oracle too, at least as I understand it.

Oracle supports CONNECT TO ... AUTHORIZE .../etc SQL, meaning you can define 
what databases you are accessing within the SQL session, rather than having to 
do it externally.  I assume that Oracle's features correspond somewhat to my 
proposal, and so enable cross-database queries in the illusion that several 
databases are one.


Suffice it to say, I have thought through my proposed model for years, with one 
of its (and Muldis D's) express purposes being to provide a common normalized 
paradigm that all the existing SQL DBMSs can map to with consistent behavior 
whether Oracle or SQLite, and I haven't stated all of it here (a lot more is in 
my published language spec).  Key mapping points are the boundaries of a 
database's self-definability.  And namespace nesting is actually 
arbitrary-depth, so accounting for everything from no native schema support to 
schema plus package namespace support.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-08 Thread Darren Duncan

Robert Haas wrote:

But if that's what you want, just don't put your data in different
databases in the first place.  That's what schemas are for.

If for some reason we needed to have tables that happened to be called
x.y.z and a.b.c accessible from a single SQL session, we could allow
that much more simply by allowing schemas to be nested.  Then we could
allow arbitrary numbers of levels, not just three.  The whole point of
having databases and schemas as separate objects is that they do
different things: schemas are just containers for names, allowing
common access to data, and databases are completely separate entities,
allowing privilege separation for (say) a multi-tenant hosting
environment.  We're not going to throw out the latter concept just so
people can use two dots in their table names instead of one.


I agree with what you're saying in general and that schema namespaces should be 
nestable to arbitrary levels.  One dot or two isn't an issue I have.


Dividing based on databases or on schemas is a big and important distinction.

I see that the semantic purpose of using multiple databases is to allow things 
to be completely independent and self-defined, where one can understand the 
meaning of any one database in isolation.  So one can take each of the 2 
databases and walk off with them in opposite directions, and each can still be 
used and understood.


Whereas, schemas are namespaces for organizing entities within a single database 
where any of those entities may be interdependent, such as defining a data type 
in one schema and using it as the declared type with a routine or table or 
constraint in another.


But just because you use multiple databases in order for them to be independent, 
sometimes one still wants to use them together, and an abstraction loosely like 
federating is useful here.



I think you should make more of an effort to understand how the system
works now, and why, before proposing radical redesigns.


Well yes, of course.  But that will take time and I think I already understand 
enough about it to make some useful contributions in the meantime.  How much or 
what I already know may not always come across well.  If this bothers people 
then I can make more of an effort to reduce my input until I have more solid 
things to back them up.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-07 Thread Darren Duncan

Guillaume Lelarge wrote [on pgsql-general]:

On Thu, 2011-07-07 at 16:01 +, mike beeper wrote [on pgsql-general]:

I have a function that creates a temp table, populate it with results
during intermediate processing, and reads from it at the end.  When
the transaction is marked as read only, it does not allow creation of
temp table, even though there are no permanent writes to the db.  Are
there any workarounds? The following block errors out.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED READ ONLY;
create temp table test(test int);


When you create a temporary table, PostgreSQL needs to add rows in
pg_class, pg_attribute, and probably other system catalogs. So there are
writes, which aren't possible in a read-only transaction. Hence the
error. And no, there is no workaround.


That sounds like a deficiency to overcome.

It should be possible for those system catalogs to be virtual, defined like 
union views over similar immutable tables for the read-only database plus 
mutable in-memory ones for the temporary tables.
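
As a sketch of what I mean, with hypothetical relation names (not something one 
could literally execute against the real catalogs today):

  CREATE VIEW pg_class AS
      SELECT * FROM pg_class_persistent      -- on disk; read-only here
      UNION ALL
      SELECT * FROM pg_class_session_temp;   -- backend-local and writable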


Are there any plans in the works to do this?

On the other hand, if one can have lexical-scope tables (table-typed routine 
variables), and I know Pg 8.4+ has named subqueries which handle a lot of cases 
where temp tables would otherwise be used, I would certainly expect those to 
work when you're dealing with a read-only database.
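
For instance, this form works today inside a read-only transaction, where the 
temp-table version above errors out:

  SET TRANSACTION ISOLATION LEVEL READ COMMITTED READ ONLY;
  WITH test AS (SELECT 1 AS test)
  SELECT * FROM test;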


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, constructors, and the type system

2011-06-27 Thread Darren Duncan

Jeff Davis wrote:

On Sun, 2011-06-26 at 00:57 -0700, Darren Duncan wrote:
I believe that the best general solution here is for every ordered base type to 
just have a single total order, which is always used with that type in any 
generic order-sensitive operation, including any ranges defined over it, or any 
ORDER BY or any <, >, etc.  The built-in generic text type would have exactly 1 
system-defined collation that can't be changed, and it would be something simple 
and generic, such as simply sorting on the codepoint as integers.


Well, we're trying to support SQL, and SQL supports collations, so I
don't think we can just ignore that.


I'm not saying you can't support collations.  See also my reply to Tom.


I also agree with Tom that it's not a good idea. My reasons are:

 * Practical considerations, such as having a bunch of cruft from
duplicated types all over the system. With sufficient changes to the
type system, maybe that could be overcome. Or perhaps domains could be
used to make that work for range types (sort of), but the result would
not be very consistent with the rest of the system.


Yes, duplication can be avoided.


 * It doesn't seem to be based in any mathematical argument. A type is a
set of values, and there's no reason it can't have several total orders;
or no total order at all. So it appears to just be piggybacking on the
type system infrastructure as a place to hold the metadata for a total
order.


Yes, I agree that a type is a set of values, and a type can have 0..N total 
orders.  My proposal is just that, for those types that have at least 1 total 
order, exactly 1 of those is defined to be used implicitly in contexts where a 
total order is desired and no explicit collation is given, such as in ranges.



 * Who's to say that a compare function is the only way to specify a
total order? There might be other interfaces that would support
something closer to a lexicographic sort. So, from a theoretical
standpoint, trying to attach a single notion of total order to a type
seems strange, because there might be multiple interfaces for specifying
even one total order.


Thank you for bringing this up, the notion of multiple interfaces for specifying 
even one total order.  My example of a compare function was just an example, and 
it is valuable to consider that this may not be the only way to do it.



 * It would require extra explicit type annotations. If you have 12 text
types, the only way to practically use any text type is to constantly
specify which more-specific text type it actually is (probably using
the :: operator). That is not necessarily a bad choice if starting a
language from scratch and forming the syntax in a way that it's
reasonable to do. But this is SQL, and lots of type annotations are
un-SQL-like.


Well sometimes it doesn't hurt to suggest solutions from the point of view that 
one can start the language from scratch, because that provides a clean way to 
conceptualize and explain a feature.  And then people can find some middle 
ground that adapts benefits from that idea for the feature without changing SQL 
more than needed.  Witness the various improvements to Perl 5 that were first 
expressed in terms of Perl 6.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, constructors, and the type system

2011-06-27 Thread Darren Duncan

Jeff Davis wrote:

On Sun, 2011-06-26 at 22:29 -0700, Darren Duncan wrote:

Tom Lane wrote:

Darren Duncan dar...@darrenduncan.net writes:
I believe that the best general solution here is for every ordered base type to 
just have a single total order, which is always used with that type in any 
generic order-sensitive operation, including any ranges defined over it, or any 
ORDER BY or any <, >, etc.

We've spent years and blood on making sure that Postgres could support
multiple orderings for any datatype; and there are plenty of natural
examples for the usefulness of that.  So I'm not at all impressed by
any line of reasoning that starts out by baldly throwing that away.
I'm not saying that you can't use multiple orderings with a data type.  I'm just 
saying that the type only has *at most* one (possibly none) *native* ordering, 
which is what is used when you do something ordered-sensitive with the type, 
such as have a range.


So, are you saying that it would be impossible to have a range that uses
a different ordering? What about ORDER BY? What about BTrees?

And if those things can use different orders for the same type, then
what is the difference between what you are suggesting and a default
ordering for the type (which we already support)?

I suppose it's hard to tell what you mean by native.

Regards,
Jeff Davis


Maybe I'm just talking about default ordering then. -- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, constructors, and the type system

2011-06-26 Thread Darren Duncan

To elucidate my earlier comments on this subject ...

I believe that the best general solution here is for every ordered base type to 
just have a single total order, which is always used with that type in any 
generic order-sensitive operation, including any ranges defined over it, or any 
ORDER BY or any <, >, etc.  The built-in generic text type would have exactly 1 
system-defined collation that can't be changed, and it would be something simple 
and generic, such as simply sorting on the codepoint as integers.


When we want to have some other native ordering for an existing type, such as 
when we want to use a different text collation, we do this by creating a *new 
base type*, using CREATE TYPE or some shorthand thereof, and this new type 
defines its own ordering, such as that a particular text collation is used.


For example:

  CREATE TYPE text__en_US AS (v text);

  CREATE TYPE text__C AS (v text);

These will not compare equal to each other or to text, and that is good, because 
having a different text collation implies that we consider 'foo'::text__en_US 
and 'foo'::text__C to be different values.


I believe that any other approach is worse, and in particular I believe that 
creating a DOMAIN over text is worse, because DOMAINs are supposed to be subtypes 
of some other type, whose set of member values is a subset of the other type's 
values, and that have the same ordering.  Multiple CREATE type over the same 
base type don't interfere with each other like multiple DOMAIN could.


Assuming that what CREATE TYPE produces is actually a base type, I believe there 
is no better solution using the facilities that SQL provides.


If there is concern about performance related to CREATE TYPE being a composite 
type, I'm sure it is possible to engineer an optimization for when the type has 
just 1 attribute so that performance isn't an issue.  The main point I'm trying 
to raise here is about semantics and good type systems.


Likewise, don't let concerns about the syntax for using values of such composite 
types deter you.  Once again, there can be shorthands if necessary.


In fact, Postgres could provide a general shorthand for creating a composite 
type of 1 attribute whose purpose is to make one type that is like but unequal 
to another, and using this shorthand could also cause the composite type to 
overload/polymorph all the operators of its attribute type, so that the syntax 
to define one is very short.


For example:

  CREATE TYPE text__en_US WRAPS text COLLATE en_US;

... and I assume the name of that attribute would just be system-defined.

Note that the above is specific to wrapping text, and the COLLATE is just 
shorthand for defining an ordering function for text__en_US.  A more general 
form could be:


  CREATE TYPE bar WRAPS foo ORDER USING FUNCTION baz (lhs foo, rhs foo) ...;

And then we can say:

  RANGE OF text__en_US

  RANGE OF text

... similarly to how we declare array types with ARRAY.

One can also just define range values as one does array values, like this:

  range('foo','bar')  # default collation

  range('foo'::text__en_US, 'bar'::text__en_US)  # en_us collation

If that seems verbose, I have a few words for you:

1.  Users should in practice name their wrapper types after their intended 
meaning, not their mechanics, for example like this (not using text, for variety), 
and that may be more terse:


  CREATE TYPE acct_num WRAPS integer;  # inherits integer ordering by default

2.  If the wrapper types overload the base operators, either automatically or 
selectively (does it make sense to multiply an acct_num?), one doesn't have to 
keep unpacking and packing them to use them in most cases.  For example, I'd 
expect many text wrappers to polymorph catenation or substring etc, so no extra 
syntax.


3.  In practice, most literal values come from applications and are given to SQL 
code either as function parameters or bind parameter arguments.  While lots of 
example code may have literal values in it, I would think that most real-work 
code would hardly have any, and hence you'd rarely see any 'foo'::text__en_US 
for example.  You'd more likely see the less common var::text__en_US or such.


So that's my position, CREATE TYPE on the regular types or the like is the best 
solution, and anything else is an inferior solution.


Such a design is also how I do collations and ranges in my Muldis D language.

-- Darren Duncan

Jeff Davis wrote:

Different ranges over the same subtype make sense when using different
total orders for the subtype. This is most apparent with text collation,
but makes sense (at least mathematically, if not practically) for any
subtype.

For instance:
 [a, Z)
is a valid range in en_US, but not in C, so it makes sense to have
multiple ranges over the same subtype with different collations.

But what if you have a function (like a constructor), of the form:
  (anyelement, anyelement) - anyrange
? To work with the type system, you need to be able to figure out

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-26 Thread Darren Duncan

Tom Lane wrote:

Darren Duncan dar...@darrenduncan.net writes:
I believe that the best general solution here is for every ordered base type to 
just have a single total order, which is always used with that type in any 
generic order-sensitive operation, including any ranges defined over it, or any 
ORDER BY or any <, >, etc.


We've spent years and blood on making sure that Postgres could support
multiple orderings for any datatype; and there are plenty of natural
examples for the usefulness of that.  So I'm not at all impressed by
any line of reasoning that starts out by baldly throwing that away.


I'm not saying that you can't use multiple orderings with a data type.  I'm just 
saying that the type only has *at most* one (possibly none) *native* ordering, 
which is what is used when you do something ordered-sensitive with the type, 
such as have a range.


To be specific, if the type system supports a concept like Perl 6 roles (or 
other languages have similar concepts) where types can declare themselves 
members of a union type such as Ordered, then types of that union would have 
the native ordering and other types wouldn't and then generic range operators 
could be declared over ANYORDERED or such.


When we want to have some other native ordering for an existing type, such as 
when we want to use a different text collation, we do this by creating a *new 
base type*,


Nope.  This has all sorts of problems that you're conveniently ignoring,
beginning with the need to duplicate all of the infrastructure for the
type (such as non-ordering-related operators), and then moving into
difficulties arising from added ambiguity as to which operator is meant.


Well a related solution is to have exactly 1 text wrapper type which has 2 
attributes, one being the text value and the other being the collation name. 
Then you just have 1 type that does the job instead of a separate one per 
collation.  But to keep the semantics, the collation name is part of the 
identity of the type.  For example:


  CREATE TYPE collated_text AS (t text, c collation);

The key point I'm trying to support is that collation issues are firmly attached 
to the text type, not the range type.


Anyway, if a better solution can be arrived at for the problem at hand, then 
good for the team; meanwhile, what I've proposed is the best one I can think of.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Darren Duncan

Tom Lane wrote:

Florian Pflug f...@phlo.org writes:

On Jun20, 2011, at 19:16 , Merlin Moncure wrote:

On Mon, Jun 20, 2011 at 11:21 AM, Jeff Davis pg...@j-davis.com wrote:
hm, what if there *was* only one range type per base type, but in the
various contexts where specific ordering and collation was important
you could optionally pass them in?  Meaning, the specific ordering was
not bound rigidly to the type, but to the operation?



I suggested that previously here
  http://archives.postgresql.org/pgsql-hackers/2011-06/msg00846.php



In the ensuing discussion, however, it became clear that by doing so
range types become little more than a pair of values. More specifically,
a range then *doesn't* represent a set of values, because whether or
not a value is in the range depends on a specific sort order.


Yeah, that doesn't seem like the way to go.  If a range value doesn't
represent a well-defined set of base-type values, we lose a lot of the
mathematical underpinnings for range operations.

So ... just how awful would it be if we hard-wired range types to always
use their base type's default btree sort ordering and the database's
default collation?  In principle that sucks, but I'm not sure how wide
the use-cases actually will be for other choices.

The other viable alternative seems to be to require those two properties
(btree opclass and collation) to be part of a specific range type
definition.  The complaint about that seemed to be that we couldn't
infer an ANYRANGE type given only ANYELEMENT, but could we alleviate
that by identifying one range type as the default for the base type,
and then using that one in cases where we have no ANYRANGE input?

regards, tom lane


I still think that the most elegant solution is for stuff like collation to just 
be built-in to the base types that the range is ranging over, meaning we have a 
separate text base type for each text collation, and the text operators are 
polymorphic over all those base types.  Having collations and stuff as something 
off to the side not built-in to text/etc types is the root of the problem.  The 
range-specific stuff can remain ANYELEMENT and no special-casing is required. 
Also, besides range constructors, a generic membership test like "value in 
range" is polymorphic. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Darren Duncan

Florian Pflug wrote:

On Jun20, 2011, at 20:58 , Tom Lane wrote:

Darren Duncan dar...@darrenduncan.net writes:
I still think that the most elegant solution is for stuff like collation to just 
be built-in to the base types that the range is ranging over, meaning we have a 
separate text base type for each text collation, and the text operators are 
polymorphic over all those base types.  Having collations and stuff as something 
off to the side not built-in to text/etc types is the root of the problem.

I tend to agree that this aspect of the SQL standard isn't terribly well
designed, but it's the standard and we're stuck with it.  We're not
going to support two parallel methods of dealing with collations.


Plus, you can always define a DOMAIN for every collation you intend to use,
and stay clear of COLLATE clauses except as part of these domain definitions.

Most interestingly, this is also the workaround Jeff Davis suggested for
those who absolutely need two range types over the same base type (i.e.
define one of the ranges over a domain).

best regards,
Florian Pflug


That DOMAIN-based solution ostensibly sounds like a good one then, under the 
circumstances.  What I *don't* want to see is for things like ranges to have 
their own collations and the like.  From the perspective of all range-specific 
things, the types over which they're defined (like text) should just have their 
own native ordering, which defines the range's sense of "before" and "after". 
If DOMAIN effectively does that for text types, then that is the way to go. -- 
Darren Duncan
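
PS: concretely, I gather the workaround looks something like this, as a sketch; 
it assumes the en_US collation exists in the cluster and that range types will 
accept a domain as their subtype:

  CREATE DOMAIN text_en_us AS text COLLATE "en_US";
  -- then define the range type over text_en_us rather than over plain text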


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types and extensions

2011-06-12 Thread Darren Duncan

Robert Haas wrote:

On Sun, Jun 12, 2011 at 7:53 AM, Florian Pflug f...@phlo.org wrote:

I think the collation is going to have to be baked into the type
definition, no?  You can't just up and change the collation of the
column as you could for a straight text column, if that might cause
the contents of some rows to be viewed as invalid.

Now you've lost me. If a text range is simply a pair of strings,
as I suggested, and collations are applied only during comparison
and RANGE_EMPTY(), why would the collation have to be baked into
the type?

If you're referring to the case
 (1) Create table with text-range column and collation C1
 (2) Add check constraint containing RANGE_EMPTY()
 (3) Add data
 (4) Alter column to have collation C2, possibly changing
 the result of RANGE_EMPTY() for existing ranges.
then that points to a problem with ALTER COLUMN.


No, I'm saying that you might have a column containing  '[a, Z)', and
someone might change the collation of the column from en_US to C.
When the collation was en_US, the column could legally contain that
value, but now that the collation is C, it can't.  ALTER TABLE isn't
going to recheck the validity of the data when someone changes the
collation: that's only supposed to affect the sort order, not the
definition of what is a legal value.


You can have the same collation problem even without range types.

Consider the following:
 (1) Create table with the 2 text columns {L,R} and both columns have the 
collation en_US.

 (2) Add check constraint requiring L <= R.
 (3) Add a record with the value 'a' for L and 'Z' for R.
 (4) Alter the columns to have the collation C.
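
In SQL terms the scenario is roughly this (Pg 9.1 collation syntax; whether 
step (4) rechecks the constraint is exactly the issue in question):

  CREATE TABLE pairs (
      l text COLLATE "en_US",
      r text COLLATE "en_US",
      CHECK (l <= r)
  );
  INSERT INTO pairs VALUES ('a', 'Z');        -- passes the check under en_US
  ALTER TABLE pairs
      ALTER COLUMN l TYPE text COLLATE "C",
      ALTER COLUMN r TYPE text COLLATE "C";   -- 'a' <= 'Z' is false under C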

Good language design principles demand that the semantics for this simplified 
case and the semantics for replacing {L,R} with a single range-of-text-typed 
column be the same, including what happens with CHECK and ALTER TABLE.


Likewise, anything that affects ORDER BY should affect {<,>,<=,>=} and friends 
the same way and vice-versa and likewise should affect range validity.


It makes sense for collation to be considered part of text data types, and 
changing collation is casting from one text type to another.  Generally 
speaking, any inherent or applied aspect of a text or other value (such as 
collation) that affects the results of any deterministic operations on those 
values (such as sorting) should be considered part of the data type of those values.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types and extensions

2011-06-07 Thread Darren Duncan

Jeff Davis wrote:

On Tue, 2011-06-07 at 11:15 -0400, Tom Lane wrote:

Merlin Moncure mmonc...@gmail.com writes:

right. hm -- can you have multiple range type definitions for a
particular type?

In principle, sure, if the type has multiple useful sort orderings.


Right. Additionally, you might want to use different canonical
functions for the same subtype.


I don't immediately see any core types for which we'd bother.


Agreed.


BTW, Jeff, have you worked out the implications of collations for
textual range types?


Well, "it seems to work" is about as far as I've gotten.

As far as the implications, I'll need to do a little more research and
thinking. But I don't immediately see anything too worrisome.


I would expect ranges to have exactly the same semantics as ORDER BY or < etc 
with respect to collations for textual range types.


If collation is an attribute of a textual type, meaning that the textual type or 
its values have a sense of their collation built-in, then ranges for those 
textual types should just work without any extra range-specific syntax, same 
as you could say ORDER BY without any further qualifiers.


If collation is not an attribute of a textual type, meaning that you normally 
have to qualify the desired collation for each order-sensitive operation using 
it (even if that can be defined by a session/etc setting which still just 
ultimately works at the operator rather than type level), or if a textual type 
can have it built in but it is overridable per operator, then either ranges 
should have an extra attribute saying what collation (or other type-specific 
order-determining function) to use, or all range operators take the optional 
collation parameter like with ORDER BY.


Personally, I think it is a more elegant programming language design for an 
ordered type to have its own sense of a one true canonical ordering of its 
values, and where one could conceptually have multiple orderings, there would be 
a separate data type for each one.  That is, while you probably only need a 
single type with respect to ordering for any real numeric type, for textual 
types you could have a separate textual type for each collation.


In particular, I say separate type because a collation can sometimes affect 
which text values compare as the same, as far as I know.


On a tangent, I believe that various insensitive comparisons or sortings are 
very reasonably expressed as collations rather than some other mechanism, eg if 
you wanted sortings that compare different letter case as same or not, or with 
or without accents as same or not.


So under this elegant system, there is no need to ever specify collation at 
the operator level (which could become quite verbose and unwieldy), but instead 
you can cast data types if you want to change their sense of canonical ordering.


Now if the various text-specific operators are polymorphic across these text 
type variants, users don't generally have to know the difference except when it 
matters.


On a tangent, I believe that the best definition of equal or same in a type 
system is global substitutability.  Ignoring implementation details, if a 
program ever finds that 2 operands to the generic = (equality test) operator 
result in TRUE, then the program should feel free to replace all occurrences of 
one operand in the program with occurrences of the other, for optimization, 
because generic = returning TRUE means one is just as good as the other.  This 
assumes generally that we're dealing with immutable value types.


-- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types and extensions

2011-06-07 Thread Darren Duncan

Jeff Davis wrote:

On Mon, 2011-06-06 at 14:42 -0700, Darren Duncan wrote:
Can Pg be changed to support "." in operator names as long as they don't just 
appear by themselves?  What would doing so break?


Someone else would have to comment on that. My feeling is that it might
create problems with qualified names, and also with PG's arg.function
call syntax.


With respect to qualified names or "arg.function", unless the function name 
can be symbolic, I considered your examples to be the "appear by themselves" 
case; hence "." by itself wouldn't be a new operator, and I generally assumed 
here that any multi-character operators containing "." would be symbolic.


In any event, I also saw Tom's reply about DOT_DOT being a token already.

-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Darren Duncan

Jeff Davis wrote:

On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote:

Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?
I believe that ranges aka intervals are widely useful generic types, next after 
relations/tuples/arrays, and they *should* be supported in core, same as arrays are.


I think we all agree that ranges are important. I am not suggesting that
we sacrifice on the semantics to make it an extension; I'm just trying
to see if involving extensions for some of the approximately 5000 lines
would be a good idea.


Generally speaking, the best way to go about this is to define the *generic* 
data type in the core, and leave most operators to extensions.  So, in core, we 
need to have the way to select a range value over ANYTYPE either completely as a 
value literal or in terms of endpoint values from arbitrary expressions or 
variables, store the range value in a database, retrieve it, and access its 
component attributes (endpoints, open/closed) in user-defined constraint and 
operator definitions.


The fundamental value of ranges is the fact that they're a concise way to store 
and express an interval over an ordered type, and to either compare such 
intervals or test whether individual values or sets of values are in intervals. 
 And people do that a *lot* (such as with dates), so I see having this range 
type, which is generic and orthogonal to other types in the same way as arrays 
or tables are, in core just makes the most sense, and as previously illustrated, 
ranges are useful in places one might not always think about.


Ranges are also much more flexible than BETWEEN for what it does, because AFAIK 
you can't indicate open or closed endpoints with BETWEEN.


You should not need to define separate range types or operators for each ordered 
type, same as you should not have to do so for arrays, or where such 
functionality is defined should be similar; whatever functionality for arrays 
you do or don't define in core, do corresponding things for ranges.


Now assuming that a range/interval value is generally defined in terms of a pair 
of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or 
{<,>,<=,>=} etc or LIMIT makes sense), it will be essential that this value is 
capable of distinguishing open and closed intervals.


Right, it already does that explicitly. I'd appreciate your input on
some of the previous discussion though.


On this note, here's a *big* thing that needs discussion ...

Citing this whole FOREACH talk, we need to recognize that this talk about ranges 
is actually being overloaded for 2 very distinct concepts, which are probably 
best dealt with separately, possibly as distinct types.


This discussion came up in the development of Perl 6 too, and that discussion is 
probably worth looking into.


Ranges/intervals in the general sense can *not* be used to enumerate a list of 
values in a standard type-sensical manner, such as FOREACH requires. 
Ranges/intervals are about *comparison*, meaning combinations of tests of how 2 
arbitrary values of an ordered type sort relative to each other, and that's it. 
 This usage works for integers, other numbers, strings, dates, and so on, all 
in a natural manner.


Value enumeration, such as in a FOREACH, is a *separate* concept.

The comparison and enumeration tasks have distinct sets of operators and are 
used in distinct contexts.  Enumeration requires next/prev-value operators, 
while ranges/intervals in general do not.  Enumeration requires discrete types 
(or the faking of such) like integers while ranges work for continuous types.


Moreover, in practice, one probably wants enumerations to be more flexible than 
just monotonic increases.  With enumerations you'd probably want to go 
top-down or bottom-up, and you might want to increase geometrically or by some other 
formula rather than incrementally.


I totally agree with sharing syntax and using ranges/intervals to define 
sequence generators, but a range value should be considered immutable like a 
number or string while a sequence generator may mutate.


For syntax, one could use x..y to define an interval while x...y for a 
sequence generator, or that's what Perl 6 does.


See also http://perlcabal.org/syn/S03.html#Range_and_RangeIter_semantics that 
talks about how Perl 6 does ranges.


Also, if Postgres has some concept of type-generic special values -Inf and +Inf 
(which always sort before or after any other value in the type system), those 
can be used as endpoints to indicate that the interval is unbounded.


I already introduced +/- infinity to range types. They are not generic
outside of ranges, however -- therefore you can't select the upper bound
of an upper-infinite range.


Well, what you have is the least one would want.

Unless you have some

Re: [HACKERS] Range Types and extensions

2011-06-05 Thread Darren Duncan

Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?


I believe that ranges aka intervals are widely useful generic types, next after 
relations/tuples/arrays, and they *should* be supported in core, same as arrays are.


In particular, the usefulness of ranges/intervals is often orthogonal to many 
other things, and for many types including numbers, strings, temporals.


Now assuming that a range/interval value is generally defined in terms of a pair 
of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or 
{<,>,<=,>=} etc or LIMIT makes sense), it will be essential that this value is 
capable of distinguishing open and closed intervals.


For example, a range value can be represented by a tuple with 4 attributes, 
where two of those are the endpoint values, and two of those are booleans saying 
whether each of the endpoints is inside or outside the range/interval.
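
A minimal sketch of that 4-attribute representation as a plain composite type, 
with a containment test written over it (all names here are invented for 
illustration):

  CREATE TYPE int_range AS (
      lower_val  integer,
      upper_val  integer,
      lower_incl boolean,   -- is the lower endpoint inside the range?
      upper_incl boolean    -- is the upper endpoint inside the range?
  );

  CREATE FUNCTION range_contains(int_range, integer) RETURNS boolean
  LANGUAGE sql IMMUTABLE AS $$
      SELECT ($2 > ($1).lower_val OR (($1).lower_incl AND $2 = ($1).lower_val))
         AND ($2 < ($1).upper_val OR (($1).upper_incl AND $2 = ($1).upper_val))
  $$;

  SELECT range_contains(ROW(1, 10, true, false)::int_range, 10);  -- false: open end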


Also, if Postgres has some concept of type-generic special values -Inf and +Inf 
(which always sort before or after any other value in the type system), those 
can be used as endpoints to indicate that the interval is unbounded.


Unless you have some other syntax in mind, I suggest lifting the range literal 
syntax from Perl 6, where .. is an infix operator building a range between its 
arguments, and a ^ on either side means that side is open, I think; so there 
are 4 variants: {..,^..,..^,^..^}.


Now as to general usefulness of intervals ...

Any operation that wants to deal with a range somehow, such as the BETWEEN 
syntax, could instead use a range/interval; for example, both of:


  foo in 1..10

  foo between 1 and 10

... would mean the same thing, but the 1..10 can be replaced by an arbitrary 
value expression or variable reference.


Likewise with:

  date in start ..^ end

  date >= start and date < end

... mean the same thing.

The LIMIT clause could take a range to specify take and skip count at once.

Array slicing can be done using foo[first..last] or such.

A random number generator that takes endpoints can take a range argument.

An array or relation of these ranges can represent ranges with holes, and the 
general results of range union operations.
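
For comparison, the first two tests above spelled with the range types that 
later shipped in PostgreSQL 9.2, where open/closed endpoints are written as 
bracket flags:

  SELECT 5  <@ int4range(1, 10, '[]');  -- closed range 1..10: true
  SELECT 10 <@ int4range(1, 10, '[)');  -- upper endpoint open (..^): false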


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types and extensions

2011-06-05 Thread Darren Duncan

Pavel Stehule wrote:

2011/6/6 Darren Duncan dar...@darrenduncan.net:

Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?


it should be supported by FOREACH statement in PL/pgSQL


Yes, absolutely.  I know this feature is loved in Perl.  But this usage would 
only work for a more limited range of data types, namely those over which one 
can build a sequence generator, such as integers, because they have a 
next-value/prev-value function defined.  In other words, while range types in 
general would work for any ordered type, FOREACH would only work for the subset 
of those that are ordinal types. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] deprecating contrib for PGXN

2011-05-17 Thread Darren Duncan

I have missed it if this was discussed before but ...

Would now be a good time to start deprecating the contrib/ directory as a way to 
distribute Pg add-ons, with favor given to PGXN and the like instead?


It would make sense to leave contrib/ alone for 9.1, but I believe that it 
should start slimming down as we move towards 9.2, with any content that can 
easily be migrated to PGXN/etc being taken out of contrib/ .


Or, the policy would be to stop adding new things to contrib/ except in the odd 
case where that is surely the best place to put it, so only the legacy things 
are there, and for the legacy things, they are removed case-by-case as workable 
distributions for them first appear on PGXN/etc.


An analogy for policy here would be Perl 5 and what Perl modules it bundles. 
The Perl modules that have the most business being bundled with Perl are those 
minimal ones whose function is to go out to CPAN and install other modules.


Another analogy would be Parrot and languages implemented over it.  Originally, 
various language compilers were bundled with Parrot, and they gradually migrated 
to their own distributions, Rakudo for example.


If this general policy of deprecating contrib/ is agreed on, then at the very 
least the documentation shipped with 9.1 should mention it being deprecated and 
talk about migration strategies.  Or 9.1 could include a CPAN-like program that 
makes it easier to install PGXN extensions, if that is applicable, so there is 
an overlap period where people could get the legacy add-ons either way.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] deprecating contrib for PGXN

2011-05-17 Thread Darren Duncan

Robert Haas wrote:

On Tue, May 17, 2011 at 4:45 PM, Joshua D. Drake j...@commandprompt.com wrote:

On 05/17/2011 01:31 PM, Darren Duncan wrote:

I have missed it if this was discussed before but ...

Would now be a good time to start deprecating the contrib/ directory as
a way to distribute Pg add-ons, with favor given to PGXN and the like
instead?

If PGXN moves into .Org infrastructure (which I believe is currently the
plan) then yes, contrib should go away.


What is the benefit of getting rid of it?


Maybe something could be clarified for me first.

Are the individual projects in contrib/ also distributed separately from Pg, on 
their own release schedules, so users can choose to upgrade them independently 
of upgrading Pg itself, or so their developers can have a lot of flexibility to 
make major changes without having to follow the same stability or deprecation 
timetables of Pg itself?


If the only way to get a contrib/ project is bundled with Pg, then the project 
developers and users don't get the flexibility that they otherwise would have.


That's the main answer, I think.

-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DOMAINs and CASTs

2011-05-16 Thread Darren Duncan

Jaime Casanova wrote:

On Sun, May 15, 2011 at 9:14 PM, Robert Haas robertmh...@gmail.com wrote:

we should probably try to agree on which
of the various options you mention makes most sense.


well... my original patch only handle the simplest case, namely, try
to make the cast that the user wants and if none is defined fall to
the base types...

anything else will complicate things as you shown... actually, things
looks very simple until we start creating trees of domains...
what options look sane to you?


The sanest option I see is don't overload the CAST syntax for subtypes.  Just 
call the foo2bar() function instead.  You still get code with the same level of 
terseness and that is just as easy to read and understand, and there is no 
question of semantics.  Also, that solution works right now. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DOMAINs and CASTs

2011-05-15 Thread Darren Duncan

Jaime Casanova wrote:

On Sat, May 14, 2011 at 8:42 PM, Darren Duncan dar...@darrenduncan.net wrote:

First of all, what if cast(timestamp as int) was already defined?  Which
cast then would you expect to be invoked here?

 '1800-01-01 00:00:00'::int


i will expect an error in that case... what you're doing there is
casting an unknown to integer, for that to be valid you need an
intermediate cast to timestamp or in my case to datetime


Sorry, my bad; I meant to say (might be slightly misspelled):

  ('1800-01-01 00:00:00'::timestamp)::int

Now, since all values of a DOMAIN are also values of the base type that the 
DOMAIN is defined as a subset of, the sub-expression within the parentheses 
denotes a value that is both a timestamp and a datetime at the same time.


So, if a generic CAST(timestamp as int) is already defined, and you define a 
CAST(datetime as int), then what should the above code (correct for 
misspelling) do, or should it fail?


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DOMAINs and CASTs

2011-05-14 Thread Darren Duncan

Jaime Casanova wrote:

If i create a DOMAIN an then want to create a CAST from that domain to
another type it gives an error.
Consider this example:

create domain datetime as timestamp with time zone
   check (value between '1753-01-01 00:00:00' and '-12-31 23:59:59');

create function datetime2int(datetime) returns int
  language sql stable strict as $$
select $1::date - '1753-01-01'::date;
$$;

create cast(datetime as int) with function datetime2int(datetime);


if i try to cast, get this error:
select now()::datetime::int;
ERROR:  cannot cast type datetime to integer

The problem is that in find_coercion_pathway() the very first thing we
do is to get the base type of both: the source and target types. So,
the way to make it work is to create the function and the cast on the
base types.
But what if i create 2 domains on the same base types and want a
different behaviour on a cast to the same target type?


I think that overloading the same cast syntax to get different behavior for 
different domains over the same base type is a bad idea.


First of all, what if cast(timestamp as int) was already defined?  Which cast 
then would you expect to be invoked here?


  '1800-01-01 00:00:00'::int

... the one for timestamp or the one for datetime?

Second of all, what if you had 2 domains defined over timestamp and they 
overlapped and they both defined a cast as you did, with generic syntax?  And 
you were casting a value in both domains as an int?


I think it would be best that the generic cast syntax only be useable for casts 
defined on the base type, and if you want a domain-specific one you should use 
the function syntax such as your datetime2int().


That way it is easier for users to predict what behavior will occur, and 
implementation will be easier too.
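
With the definitions quoted above, the recommended spelling is then an ordinary 
function call, leaving the cast syntax to the base type:

  SELECT datetime2int(now()::datetime);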


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DOMAINs and CASTs

2011-05-14 Thread Darren Duncan

Darren Duncan wrote:
I think it would be best that the generic cast syntax only be useable 
for casts defined on the base type, and if you want a domain-specific 
one you should use the function syntax such as your datetime2int().


That way it is easier for users to predict what behavior will occur, and 
implementation will be easier too.


Replying to myself, I offer another alternative:

What you ask for is indeed supported, but if for a given input value more 
than one cast applies to it, particularly for 2 overlapping domains, then which 
cast is invoked is undefined; for example, the DBMS may just use the first one 
it finds.


It is then up to the user to ensure that when they define casts over domains 
that they just define ones that either produce the same outputs for the same 
overlapping inputs (the best answer) or they ensure that they don't overlap in 
their input domains.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Darren Duncan

Robert Haas wrote:

On Wed, May 11, 2011 at 11:43 AM, Joseph Adams
joeyadams3.14...@gmail.com wrote:

On Tue, May 10, 2011 at 5:19 PM, Darren Duncan dar...@darrenduncan.net wrote:

Examples of open union types could be number, which all the numeric types
compose, and so you can know say that you can use the generic numeric
operators on values you have simply if their types compose the number union
type, and it still works if more numeric types appear later.  Likewise, the
string open union could include both text and blob, as both support
catenation and substring matches or extraction, for example.

This would aid to operator overloading in a generic way, letting you use the
same syntax for different types, but allowing types to mix is optional; eg,
you could support add(int,int) and add(real,real) without supporting
add(int,real) etc but the syntax add(x,y) is shared, and you do this
while still having a strong type system; allowing the mixing is optional
case-by-case.

Coming from a Haskell perspective, this is a great idea, but I don't
think the union feature should be used to implement it.


I'm unclear what the point of such a feature would be.  A union of all
the common numeric types is not much different from the existing type
numeric.


In the case of the open union numeric, one point is that users or extensions 
could come up with new types that add themselves to the union, for example say a 
type for complex numbers (I didn't see a built-in one).


But I'm just citing numeric as an example; there would be a lot more in 
practice, potentially one for every individual type, so for example if operators 
were defined for the open union rather than for the base type, then 
users/extensions could define their own types and easily declare "you can use it 
like this type, but it's different in some important way", which may just be an 
implementation difference.  Operations that don't care about the differences can 
just be written against the open union type where they just work and those that 
do care can be more specific.


Joseph Adams said:

Coming from a Haskell perspective, this is a great idea, but I don't
think the union feature should be used to implement it.  Closed
unions correspond to algebraic data types in Haskell, e.g.:

data Ordering = LT | EQ | GT

while open unions are better-suited to type classes:

(+) :: (Num a) => a -> a -> a


While closed unions would often be used for your first example, I see they're 
still useful for type classes as well.  Especially in the face of open unions 
being available, the closed unions let users say, no, I don't really want the 
meaning of this union type to change just because someone else declares a new 
type in scope (that adds itself to the union).  For example, one could declare a 
system_numeric closed union type that only includes Pg built-in numerics and 
users of that can be confident that nothing about it will change later unless 
the definition of system_numeric itself or the types it unions are changed.  But 
open unions would be preferred in places they wouldn't cause trouble, where you 
want to allow easier user extensibility.



I, for one, would like to see PostgreSQL steal some features from
Haskell's type system.  PostgreSQL seems to implement a subset of
Haskell's system, without type classes and where functions can have
only one type variable (anyelement).


I think that Haskell and other functional languages have a lot to teach 
relational DBMSs and I see them as being highly compatible.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Darren Duncan

To follow-up my earlier comments ...

I suspect for practical purposes we may want to limit the scope of some type 
features.


For example, the greatest benefits for open union / mixin types are with 
routines/operators, not so much with tables.


So, Pg could choose to support open unions but only for routines, where the 
declared types of table columns must still be other types.  Then you don't have 
to worry about searching your data for where one might be used, but only search 
your system catalog for routines/operators that use it.


But closed unions should be supported in tables too.

-- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Darren Duncan

Alvaro Herrera wrote:

Excerpts from Bruce Momjian's message of mar may 10 16:21:36 -0400 2011:

Darren Duncan wrote:
To follow-up, an additional feature that would be useful and resembles union 
types is the variant where you could declare a union type first and then 
separately other types could declare they are a member of the union.  I'm 
talking about loosely what mixins or type-roles or interfaces etc are in other 
languages.  The most trivial example would be declaring an ENUM-alike first and 
then separately declaring the component values where the latter declare they are 
part of the ENUM, and this could make it easier to add or change ENUM values. 
But keep in mind that this is a distinct concept from what we're otherwise 
talking about as being union types. -- Darren Duncan

Should this be a TODO item?


The general idea of C-style unions, sure.  Mixin-style stuff ... not sure.
Seems like it'd be pretty painful.


From the perspective of users, the single greatest distinction between these 2 
kinds of unions is being closed versus being open, and that is the primary 
reason to choose one over the other.


A closed union is the C-style, where the union type declares what other types or 
values it ranges over.  The closed union is best when the union definer can 
reasonably assume that the union won't either ever or would rarely be changed, 
and in particular can assume that application or database code would have 
knowledge of the parts that it deals specially with, so it can be assumed that 
if the closed union type ever is changed then any code designed to use it may be 
changed at the same time.


A good example for a closed union would be a boolean type which just ranges over 
the two singletons false and true or an order type which ranges just over the 
three singletons decrease, same, increase.  Or a type which enumerates the 7 
days of the week, as this is unlikely to change in the life of a system.


An open union is the mixin style, where the component types declare they are 
part of the union.  The open union is best when it is likely that there would be 
either user-defined or extension-defined new types for the union to come along 
later, and we want to have code that can be generic or polymorphic for any types 
that can be used in particular ways.


Examples of open union types could be number, which all the numeric types 
compose, and so you can know say that you can use the generic numeric operators 
on values you have simply if their types compose the number union type, and it 
still works if more numeric types appear later.  Likewise, the string open union 
could include both text and blob, as both support catenation and substring 
matches or extraction, for example.


This would aid to operator overloading in a generic way, letting you use the 
same syntax for different types, but allowing types to mix is optional; eg, you 
could support add(int,int) and add(real,real) without supporting 
add(int,real) etc but the syntax add(x,y) is shared, and you do this while 
still having a strong type system; allowing the mixing is optional case-by-case.


Supporting the open union is closer to supporting ANYTYPE while the closed union 
isn't so much.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-06 Thread Darren Duncan

Thanks for picking up on my mentioning union types; I wasn't sure if anyone did.

Merlin Moncure wrote:

On Fri, May 6, 2011 at 3:18 PM, Andrew Dunstan and...@dunslane.net wrote:


On 05/06/2011 04:08 PM, Alvaro Herrera wrote:

Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011:


I see VARIANT/ANYTYPE as the most general case of supporting union types,
which,
say, could have more specific examples of allow any number or date here
but
nothing else.  If VARIANT is supported, unions in general ought to be
also.

Okay, so aside from the performance (storage reduction) gained, there's
this argument for having variant/union types.  It seems to me that this
is indeed possible to build.  Completely general VARIANT, though, is
rather complex.  A declared union, where you specify exactly which types
can be part of the union, can be catalogued, so that the system knows
exactly where to look when a type needs to be modified.  A general
VARIANT however looks complex to me to solve.

The problem is this: if an user attempts to drop a type, and this type
is used in a variant somewhere, we would lose the stored data.  So the
drop needs to be aborted.  Similarly, if we alter a type (easy example:
a composite type) used in a variant, we need to cascade to modify all
rows using that composite.

If the unions that use a certain type are catalogued, we at least know
what tables to scan to cascade.

In a general variant, the system catalogs do not have the information of
what type each variant masquerades as.  We would need to examine the
variant's masqueraded types on each insert; if the current type is not
found, add it.  This seems a bit expensive.

So how is a declared union going to look and operate? Something like this?

   CREATE TYPE foo AS UNION (ival int, tval text, tsval timestamptz):
   CREATE TABLE bar (myunion foo);
   INSERT INTO bar (myunion) VALUES (ival=1), (tval='some text');
   UPDATE bar SET myunion.tsval = CURRENT_TIMESTAMP;


Something like that could actually be quite nice for a number of purposes.


using your hypothetical example, could you cast types to the union?

select 1::int::foo;

record types would presumably work as well?  you could do a lot of
*really* neat stuff that way...


Like with other respondents to this topic, I consider it much more useful and 
important, as well as practical, to support explicitly defined type unions than 
a wide-open ANYTYPE.


As for how it would operate:  The example above implies a union type 
implementation that is like C's concept of such, where you have to explicitly 
state how you want the value bit pattern to be interpreted, by naming ival/etc, 
rather than the DBMS just knowing that a particular value is of a specific type, 
because per-value stored metadata says so (like with SQLite).


While that might be best in practice for implementation, I had envisioned 
something more like set unions, so you could instead do it like this:


CREATE TYPE foo AS UNION (int, text, timestamptz):
CREATE TABLE bar (myunion foo);
INSERT INTO bar (myunion) VALUES (1), ('some text');
UPDATE bar SET myunion = CURRENT_TIMESTAMP;

Unlike a record type, where multiple attributes may have the same type, 
presumably with a union, they would all be distinct, and so you could use the 
type name itself to refer to each option; you don't have to make up ival or 
whatever ... unless there are situations where types don't have names.


When doing operations that are type-generic, such as equality test or 
assignment, especially assignment, between 2 things that are both declared to be 
of type foo, you could just do it with no verbosity same as if you were doing 2 
int or text or whatever.


When doing operations specific to int or text or whatever, or if you are 
assigning a foo-declared thing to an int/text-declared thing, then you'd want an 
explicit cast or assertion, such as select myunion::int + 3 as answer from bar 
where is_a(myunion,'int').


As for whether you need an explicit cast to go the other way, I would assume you 
don't, like when you have DOMAINs; eg, I would expect the 4th line above to 
just work, because the system knows the type of CURRENT_TIMESTAMP and it knows 
that this is a member of the union type of myunion.  I see a UNION type as being 
like a DOMAIN type in reverse.
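
As an aside, absent real union types, a rough present-day emulation is a row of 
nullable alternatives plus a CHECK that exactly one is set; a sketch mirroring 
the bar example above, with the hypothetical is_a() test approximated by an 
IS NOT NULL check:

  CREATE TABLE bar (
      ival  integer,
      tval  text,
      tsval timestamptz,
      CHECK (((ival  IS NOT NULL)::int
            + (tval  IS NOT NULL)::int
            + (tsval IS NOT NULL)::int) = 1)
  );

  INSERT INTO bar (ival) VALUES (1);
  INSERT INTO bar (tval) VALUES ('some text');

  -- select myunion::int + 3 ... where is_a(myunion,'int') becomes:
  SELECT ival + 3 AS answer FROM bar WHERE ival IS NOT NULL;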


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-06 Thread Darren Duncan
To follow-up, an additional feature that would be useful and resembles union 
types is the variant where you could declare a union type first and then 
separately other types could declare they are a member of the union.  I'm 
talking about loosely what mixins or type-roles or interfaces etc are in other 
languages.  The most trivial example would be declaring an ENUM-alike first and 
then separately declaring the component values where the latter declare they are 
part of the ENUM, and this could make it easier to add or change ENUM values. 
But keep in mind that this is a distinct concept from what we're otherwise 
talking about as being union types. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Darren Duncan

Alvaro Herrera wrote:

A customer came to us with this request: a way to store any data in a
column.  We've gone back and forth trying to determine reasonable
implementation restrictions, safety and useful semantics for them.
I note that this has been requested in the past:
http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php
and both Oracle and MS-SQL have it and apparently people find them
useful.  I didn't find any indication that SQL contains anything
resembling this.


I see VARIANT/ANYTYPE as the most general case of supporting union types, which, 
say, could have more specific examples of allow any number or date here but 
nothing else.  If VARIANT is supported, unions in general ought to be also.


The most effective way of supporting VARIANT or union types in general is having 
an implementation where in the general case each value in the database knows its 
own data type rather than the data type being provided by a context such as what 
table column it is in.  For example, rather than storing a data value directly, 
we could store a 2-attribute struct naming a data type and pointing to or 
holding the data value.
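
A sketch of that 2-attribute struct in plain SQL, with invented names: a type 
tag stored alongside the value, the latter serialized through its own type's 
text I/O:

  CREATE TABLE variant_vals (
      id  serial  PRIMARY KEY,
      typ regtype NOT NULL,  -- the value's own type, SQLite-style
      val text    NOT NULL   -- the value, in its type's text representation
  );

  INSERT INTO variant_vals (typ, val) VALUES
      ('integer',     '42'),
      ('timestamptz', '2011-05-04 15:33:33-03');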


See how SQLite works as an example of how VARIANTs or unions could work, 
although that on its own would need to be made more comprehensive for Pg.


I claim ignorance as to how Pg currently implements these matters.

Where VARIANT/union types are supported by default, declaring more specific 
types is just a type constraint and an optimization.


Of course, when we know the type of a column/etc isn't going to be VARIANT or 
some other union type, then a simple optimization allows us to just store the 
value and have its type provided by context rather than the struct.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stored procedures - use cases?

2011-04-25 Thread Darren Duncan

Peter Eisentraut wrote:

Another point, as there appear to be diverging camps about
supertransactional stored procedures vs. autonomous transactions, what
would be the actual use cases of any of these features?  Let's collect
some, so we can think of ways to make them work.


An analogy I like to use for a very capable DBMS is that of an operating system, 
and each autonomous transaction is like a distinct process/thread in this 
system.  The DBMS is like a virtual machine in which processes/autonomous 
transactions run.


Like with an operating system, a process/auto-transaction can be started by 
another one, or by the OS/DBMS (or a root process/auto), and once running all 
processes are mutually independent to a large extent, in that each has its own 
separatable privileges or state or view of the database, the database being an 
analogy to the file system.


A process/auto-transaction can be started by a DBMS client, analogous to a user, 
but it doesn't have to be.  The message passing feature that Pg has, 
listen/notify, is like inter-process communication between these processes/autos.


A stored procedure always runs within the context of one process/auto, and a 
regular transaction or savepoint or whatever is specific to a process/auto.


Has anyone else thought of the "DBMS as operating system" analogy?  I don't recall 
specifically reading this anywhere, but expect the thought may be common.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stored procedures

2011-04-21 Thread Darren Duncan

Peter Eisentraut wrote:

So the topic of real stored procedures came up again.  Meaning a
function-like object that executes outside of a regular transaction,
with the ability to start and stop SQL transactions itself.

I would like to collect some specs on this feature.  So does anyone have
links to documentation of existing implementations, or their own spec
writeup?  A lot of people appear to have a very clear idea of this
concept in their own head, so let's start collecting those.


I've thought a lot about this too.

The general case of a stored procedure should be all powerful, and be able to 
directly invoke any code written in SQL or other languages that a DBMS client 
can directly invoke on the DBMS, as if it were a client, except that the procedure 
is stored and executed entirely in the DBMS.  But the stored procedure also has 
its own lexical variables and supports conditionals and iteration and recursion.


A stored procedure is invoked as a statement and doesn't have a return value; 
in contrast, a function has a return value and is invoked within a value 
expression of a statement.  A stored procedure can see and update the database, 
and can have IN/INOUT/OUT parameters.  A stored procedure can have side-effects 
out of band, such as user I/O, if Pg supports that.


The general stored procedure should be orthogonal to other concerns, in 
particular to transactions and savepoints; executing one should not 
implicitly start or commit or rollback a transaction or savepoint.  However, it 
should be possible to explicitly declare that procedure is a transaction, so 
that starts and ends are neatly paired regardless of how the procedure exits, 
that is a transaction lifetime is attached to its lexical scope, but this would 
be optional.


A stored procedure should be able to do data manipulation, data definition, 
explicit transaction control (except perhaps when defined to be a transaction), 
privilege control, message passing, and so on.


As for semantics, let's say that when a stored procedure is invoked, its 
definition will be pulled from the system catalog in a snapshot and be compiled, 
then run normally no matter what it does, even if the definition of the 
procedure itself is changed during its execution; in the latter case, it just 
means that once the execution finishes, subsequent calls to it would then call 
the updated version or fail.  So just compiling the procedure may need a catalog 
lock or whatever, but when it starts executing a transaction isn't required.


Any stored procedure in general should be able to invoke stored procedures, to 
any level of nesting, just like in any normal programming language.  There might 
be restrictions on what individual procedures can do depending on how they're 
declared; for example, if one is declared to have a scope-bound transaction, 
then it or ones it invokes can't have explicit transaction control statements. 
But such restrictions are an orthogonal or case-dependent matter.


(When we have a distinct stored procedure, I also believe that a stored function 
should be more restricted, such as only having IN parameters and not being able 
to see the database but by way of parameters, and that it should be 
deterministic.  But that ship has sailed and I'm not going to argue for any 
changes to functions.)
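
For comparison, semantics much like these eventually shipped as PostgreSQL 11's 
procedures: invoked as a statement via CALL, no return value, and explicit 
transaction control allowed in the body.  A sketch with assumed table names:

  CREATE PROCEDURE archive_live() LANGUAGE plpgsql AS $$
  BEGIN
      INSERT INTO archive SELECT * FROM live;
      DELETE FROM live;
      COMMIT;  -- legal in a procedure, not in a function
  END;
  $$;

  CALL archive_live();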


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name

2011-04-07 Thread Darren Duncan

Robert Haas wrote:

I am halfway tempted to say that we need to invent our own procedural
language that is designed not for compatibility with the SQL standard
or Oracle, but for non-crappiness.


I'm way ahead of you on that one. -- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name

2011-04-05 Thread Darren Duncan

Merlin Moncure wrote:

On Fri, Mar 25, 2011 at 10:05 PM, David E. Wheeler da...@kineticode.com wrote:

On Mar 25, 2011, at 9:12 PM, Robert Haas wrote:


As I've said before, I believe that the root cause of this problem is
that using the same syntax for variables and column names is a bad
idea in the first place.  If we used $foo or ?foo or ${foo} or $.foo
or foo!!$#? to mean the parameter called foo, then this would all
be a non-issue.

Yes *please*. Man that would make maintenance of such functions easier.


+1 on using $foo.  Even with the standardization risk I think it's the
best choice. Prefer $foo to ${foo} though.


The "foo" syntax should be orthogonal to everything else and not have anything 
specifically to do with parameters.  Rather, "foo" anywhere is just a delimited 
case-sensitive identifier and can be used anywhere that foo can, where the latter 
is a case-insensitive identifier.


As for the SQL standard for bind parameters, as I recall they use :foo, and so 
:"foo" would be the case-sensitive, more general case of that.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres 9.1 - Release Theme

2011-04-01 Thread Darren Duncan
I was under the impression that QUEL was actually a good language in some ways, 
and that it was more relational and better than SQL in some ways.


  http://en.wikipedia.org/wiki/QUEL_query_languages

Maybe bringing it back would be a good idea, but as an alternative to SQL rather 
than a replacement.


In any event, QUEL was somewhat similar to SQL.

-- Darren Duncan

Rajasekhar Yakkali wrote:

Following a great deal of discussion, I'm pleased to announce that the
PostgreSQL Core team has decided that the major theme for the 9.1
release, due in 2011, will be 'NoSQL'.

... the intention is to remove SQL support from
Postgres, and replace it with a language called 'QUEL'. This will
provide us with the flexibility we need to implement the features of
modern NoSQL databases. With no SQL support there will obviously be
some differences in the query syntax that must be used to access your
data. 

hmm..  shock it is this shift for 9.1 due in mid 2011 is unexpectedly
soon :)

Curious to understand as to

- how this relates to every feature that is provide at the moment based on
RDBMS paradigm.

ACID compliance, support for the features provided by SQL,  referential
integrity, joins, caching etc, ..

-  Also does this shift take into an assumption that all the use cases fit
the likes of data access patterns  usecases similar to facebook/twitter?
or to address the the likes of those ?

Thanks,
Raj




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)

2011-03-26 Thread Darren Duncan

Pavel Stehule wrote:

2011/3/26 Darren Duncan dar...@darrenduncan.net:

I mention 2 possible solutions here, both of which involve syntax alterations, 
each between the -- lines.  I personally like the second/lower
option more.


-1

this is not based on any pattern on SQL. It's not simple, and it
introduce a reserved keywords


Okay, here's a much simpler proposal with the most important bit of the old one.

1.  In all situations where there is ambiguity such that an identifier reference 
(not declaration) may be referring to either a lexical variable/parameter of the 
current routine, or to the name of the table column of the contextually current 
table of the current SQL statement, the ambiguity is always resolved in favor of 
the lexical var/param.  If I am not mistaken, that is what PL/PgSQL already does 
since 9.0.


2.  If an identifier reference has a leading "." then that will force it to be 
interpreted as a column instead (and the code will fail if there is no such 
column), and so ".colname" is a shorthand for "tablename.colname"; but like with 
the old "colname" it only works when just 1 of the source tables has "colname", 
else it is still ambiguous like before.


Example:

select (.mycol + myvar * myparam) as mynewcol from mytbl;

This solution is a very terse and understandable change.

There are no reserved keywords.  Legacy user code has no change where there were 
no conflicts before.  Legacy user code has no change in the case of conflict if 
it was previously resolved to favor the lexical var/param.


Legacy user code only gains a leading "." in the few places where conflict was 
resolved in favor of a column name before where a same-named lexical/param existed.


So what's not to like about this?
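
For reference on point 1: PL/pgSQL as of 9.0 makes that resolution a 
per-function option, though its default is to raise an error on ambiguity; a 
sketch assuming a table mytbl with a column also named mycol:

  CREATE FUNCTION count_matches(mycol integer) RETURNS integer
  LANGUAGE plpgsql AS $$
  #variable_conflict use_variable
  BEGIN
      -- Bare mycol resolves to the parameter; the column must be qualified.
      RETURN (SELECT count(*) FROM mytbl WHERE mytbl.mycol = mycol);
  END;
  $$;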

-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)

2011-03-26 Thread Darren Duncan

Pavel Stehule wrote:

sorry - I dislike this. The design is correct, but it is against to
SQL verbosity. A reader must to thinking about missing tablenames. I
dont't think so it is good solution, because it doesn't solve a
backing compatibility problem - somebody must to fix a function still,
and I think so it is much preferable to fix like:

  select (mytbl.mycol + myvar * myparam) as mynewcol from mytbl;

your proposal saves a five chars, but it has a negative impacts on
readability - there should be more tables.

There are no reason to introduce a new concepts - SQL knows a aliases.


Well, going forward, I know I would much rather have to say mytbl.mycol than 
have to say myfunc.myparam.  And I certainly would want to expect that when 
one says ... as foo that this foo is treated as a declaration unambiguously 
and is never substituted for some parameter or there be other grief as I seem to 
recall having in 8.4. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactional DDL, but not Serializable

2011-03-25 Thread Darren Duncan

Tom Lane wrote:

Stephen Frost sfr...@snowman.net writes:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

Well, basically, you can't have that.  Example: you have an existing
table with primary key, and while you're in the middle of doing some
long transaction, somebody else creates a table with a foreign-key
reference to the one you're about to do a delete from.  Being
serializable does not excuse you from the obligation to check for
FK violations in that invisible table.  It might be acceptable to
fail entirely, but not to act as though the table isn't there.



That's an excellent example and point.  Is there a 'right' answer (with
regard to the SQL spec, what other databases do, etc)?


I'm not aware that anybody's got an amazingly satisfactory solution.
PG's answer is of course to use up-to-the-minute DDL regardless of what
the transaction might see for other purposes, which certainly has got
disadvantages if you're hoping for truly serializable behavior.  But I'm
not sure there's a better answer.  You could make an argument for
failing any serializable transaction that's affected by DDL changes that
happen after it started.  I don't know whether that cure would be worse
than the disease.


If transaction A commits successfully before transaction B commits, regardless 
of when transaction B started, and transaction A changes/adds/etc any 
constraints on the database, then I would expect transaction B to only commit 
successfully if all of its data changes pass those new/changed constraints.


If B were allowed to commit without that being the case, then it would leave the 
database in an inconsistent state, that is a state where its data doesn't 
conform to its constraints.  A database should always be consistent on 
transaction boundaries, at the very least, if not on statement boundaries.


As to whether B's failure happens when it tries to commit or happens earlier, 
based on visibility issues with A's changes, doesn't matter to me so much (do 
what works best for you/others), but it should fail at some point if it would 
otherwise cause inconsistencies.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)

2011-03-25 Thread Darren Duncan

Robert Haas wrote:

On Mar 25, 2011, at 9:22 PM, Joshua Berkus j...@agliodbs.com wrote:

Tom,


Personally I'd vote for *not* having any such dangerous semantics as
that. We should have learned better by now from plpgsql experience.
I think the best idea is to throw error for ambiguous references,
period. 

As a likely heavy user of this feature, I agree with Tom here.  I really don't 
want the column being silently preferred in SQL functions, when PL/pgSQL 
functions are throwing an error.  I'd end up spending hours debugging this.

Also, I don't understand why this would be a dump/reload issue if $1 and $2 
continue to work.


Because an identifier that previously referred unambiguously to a column might 
now be ambiguous, if there is a parameter with the same name.


I mention 2 possible solutions here, both of which involve syntax alterations, each 
between the -- lines.  I personally like the second/lower option more.




Might it be reasonable, perhaps as a 9.2 feature, to add top-level-namespaces so 
that one could always explicitly qualify what they are referring to?


For example, you could have the 3: "sch", "lex", "attr" (I may have missed some 
useful ones).


The sch TLN would unambiguously refer directly to a schema object, such as a 
database table.


The lex TLN would unambiguously refer directly to a lexical, either a 
parameter of the current routine or to a lexical variable.


The attr TLN would unambiguously refer to a table/etc column/attribute in the 
manner typical for SQL.


Use them like:

  sch.foo - the table/etc foo
  lex.foo - the lexical variable foo
  attr.foo - the column foo

Use of these TLN are optional where there is no ambiguity.

The TLN are not reserved words, but if one has an entity named the same, then 
references to it must be TLN-qualified; eg:


  lex.sch
  lex.lex
  lex.attr

Now these are just examples.  You may find a different set works better.

--

There are also alternate solutions.

For example, it could be mandated that lexical-scope aliases for any 
data/var-like schema object are required in routines, where the aliases are 
distinct from all lexical vars/params/etc, and then all SQL/code in the routines 
may only refer to the schema objects by the aliases.


Effectively this makes it so that routines can no longer see non-lexical vars 
except for those from parameters, and this aliasing defines a parameter whose 
argument is supplied by the DBMS automatically rather than as an explicit 
routine caller argument.


That way, inside a routine body there are only lexical names for things, and so 
no namespace-qualification is ever needed by the regular SQL.


Similarly, if you always think of table column names as referring to an 
attribute or element of a table variable, then just reference the column 
qualified by the table name (or the lexical alias thereof).  Same as you do in 
any other programming language.  Of course, sometimes you don't have to qualify 
column name references as context could make it unambiguous.  Or, a shorthand 
like a simple leading . could unambiguously say you're referring to a column 
of the particular table in context.


With those in place, all unqualified references are straight to lexical 
variables or parameters.


And so, this is also an effective way to resolve the ambiguity and I prefer the 
latter design personally.


Here's an example in quasi-PL/PgSQL:

  create function myrtn (myparam integer, mytbl ::= mydb.myschema.mytbl) as
  declare
  myvar integer := 5;
  $body$
  begin
  select (.mycol + myvar * myparam) as mynewcol from mytbl;
  end;
  $body$

Note that I've already thought through this last example as these methods of 
avoiding ambiguity are loosely-speaking how my language Muldis D avoids the 
problem faced by many SQL procedures.


The .mycol syntax specifically was inspired originally for me by Perl 6 where 
the lack of something just before the . means that the implicit topic variable 
is referred to, like if you said $_.mycol.


A Perl 6 analogy being something like:

  $mytbl.map:{ .mycol + $myvar * $myparam }

aka:

  $mytbl.map:{ $_.mycol + $myvar * $myparam }

--

-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: auxiliary functions for record type

2010-12-10 Thread Darren Duncan

Pavel Stehule wrote:

Hello

I wrote a few functions for record type - record_expand,
record_get_fields, record_get_field, record_set_fields.

A usage of this functions is described in my blog
http://okbob.blogspot.com/2010/12/iteration-over-record-in-plpgsql.html

Do you think, so these functions can be in core? These are relative
strong and enough general with zero dependency. Sure, these functions
are not defined in ANSI SQL.

Regards

Pavel Stehule


That looks good in principle.  I see it as being valuable and important that 
users can define generic relational operators/functions, meaning ones that can 
work with any relations like built-ins can.  The ability to iterate over record 
fields, or at least introspect a relation to see what fields it has, is a good 
foundation to support this. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring the Type System

2010-11-14 Thread Darren Duncan

Daniel Farina wrote:

On Sat, Nov 13, 2010 at 7:54 PM, Darren Duncan dar...@darrenduncan.net wrote:

You don't have to kludge things by implementing arrays as blobs for example;
you can implement them as relations instead.  Geospatial types can just be
tuples. Arrays of structured types can just be relations with an attribute
per type attribute.  Arrays of simple types can just be unary relations.


In practice, my guess is the performance of these approaches would be
terrible for a number of workloads. I don't agree that arrays having
their own storage implementation is a kludge: there are even operators
like unnest that can be used to turn them back into relations.


I'm not discounting this at all.  The data model can formally define types in 
one way and a DBMS can implement various well known cases in specialized ways 
that are more efficient.  Arrays are a good example.  But with other cases they 
still get a default implementation.  If arrays are flexible enough, then 
different arrays could be implemented differently, eg some as blobs and some as 
relations; the former could be better for small or simple arrays and the latter 
for large or complex arrays.



I have long thought (but never really gave voice to) there being value
having first-class relation values, but I would also say that's
another kettle of fish. I also want closures, and don't think that's
completely nuts.


I think that first-class functions are important at least, if not full-blown 
closures.  You can define generic relational restrictions or summaries or 
whatever with such; eg, the general case of a WHERE is a function that takes a 
relation and a function as input, and results in a relation; more restricted 
cases of WHERE can be defined with simpler functions that take 2 relation inputs 
instead.
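
Lacking first-class functions, present-day PostgreSQL can only approximate such 
a generic restriction by passing a function's name and dispatching dynamically; 
a rough sketch over arrays, with all names invented:

  CREATE FUNCTION is_even(integer) RETURNS boolean
  LANGUAGE sql IMMUTABLE AS $$ SELECT $1 % 2 = 0 $$;

  -- A generic WHERE: takes values plus a predicate, returns the survivors.
  CREATE FUNCTION restrict_vals(vals integer[], pred regprocedure)
  RETURNS integer[] LANGUAGE plpgsql AS $$
  DECLARE
      v    integer;
      keep boolean;
      kept integer[] := '{}';
  BEGIN
      FOREACH v IN ARRAY vals LOOP
          EXECUTE format('SELECT %s($1)', pred::oid::regproc) INTO keep USING v;
          IF keep THEN
              kept := kept || v;
          END IF;
      END LOOP;
      RETURN kept;
  END;
  $$;

  SELECT restrict_vals(ARRAY[1, 2, 3, 4], 'is_even(integer)');  -- {2,4}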



You may want to learn more about this, first. Postgres's type system,
while relatively simple, is not as ill-conceived or primitive as you
seem to assume it is. There are also a lot of important system
details, like expanding/replacing the typmod facility that only allows
Postgres 32 bits of type-augmenting data (such as the (1) in the type
numeric(1)).


I'm not saying that Pg's system is primitive et al.  The thread starter said it 
needed an overhaul, so indicating there are deficiencies, and I'm suggesting 
some effective ways to do that.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring the Type System

2010-11-13 Thread Darren Duncan

David Fetter wrote:

For the past couple of years, I've been hearing from the PostGIS
people among others that our type system just isn't flexible enough
for their needs.  It's really starting to show its age, or possibly
design compromises that seemed reasonable a decade or more ago, but
are less so now.

To that end, I've put up a page on the wiki that includes a list of
issues to be addressed.  It's intended to be changed, possibly
completely.

http://wiki.postgresql.org/wiki/Refactor_Type_System

What might the next version of the type system look like?


Are you talking about changes to the type system as users see it or just changes 
to how the existing behavior is implemented internally?  If you're talking 
about, as users see it, which the other replies to this thread seem to be 
saying, though not necessarily the url you pointed to which looks more internals ...


As a statement which may surprise no one who's heard me talk about it before ...

I've mostly completed a type system specification that would be useable by 
Postgres, as the most fundamental part of my Muldis D language.


The type system is arguably the most central piece of any DBMS, around which 
everything else is defined and built.


You have data, which is structured in some way, and has operators for it.

If you look at a DBMS from the perspective of being a programming language 
implementation, you find that a database is just a variable that holds a value 
of a structured type.  In the case of a relational database, said database is a 
tuple whose attribute values are relations; or in the case of 
namespaces/schemas, the database tuple has tuple attributes having relation 
attributes.


If a database is a variable, then all database constraints are type constraints 
on the declared type of that variable, and you can make said constraints 
arbitrarily complicated.


From basic structures like nestable tuples and relations, plus a complement of 
basic types like numbers and strings, and arbitrary constraints, you can define 
data types of any shape or form.
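
A small concrete instance of structure plus constraint in stock SQL (all names 
invented for illustration):

  CREATE DOMAIN latitude  AS float8 CHECK (VALUE BETWEEN  -90 AND  90);
  CREATE DOMAIN longitude AS float8 CHECK (VALUE BETWEEN -180 AND 180);

  -- A tuple-shaped structure built from constrained components:
  CREATE TABLE places (
      name text PRIMARY KEY,
      lat  latitude  NOT NULL,
      lon  longitude NOT NULL
  );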


A key component of a good type system is that users can define data types, and 
moreover where possible, system-defined types are defined in the same ways as 
users define types.  For example, stuff like temporal types or geospatial types 
are prime candidates for being defined like user-defined types.


If you define all structures using tuples and relations, you can easily flatten 
this out on the implementation end and basically do everything as associated 
flat relation variables, as you do now.


So what I propose is both very flexible and easy to implement, scale, and 
optimize, relatively speaking.


You don't have to kludge things by implementing arrays as blobs for example; you 
can implement them as relations instead.  Geospatial types can just be tuples. 
Arrays of structured types can just be relations with an attribute per type 
attribute.  Arrays of simple types can just be unary relations.
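
For example, a value of an ordered integer-array type could be stored as a 
binary relation from ordinal position to element, rather than as an opaque 
blob (a sketch; the names are illustrative):

  CREATE TABLE int_array_elems (
    array_id integer NOT NULL,  -- which array value the row belongs to
    idx      integer NOT NULL,  -- ordinal position within that array
    elem     integer NOT NULL,
    PRIMARY KEY (array_id, idx)
  );

  -- element access and length become ordinary relational queries:
  -- SELECT elem FROM int_array_elems WHERE array_id = 1 AND idx = 3;
  -- SELECT count(*) FROM int_array_elems WHERE array_id = 1;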


You can also emulate all of the existing Pg features and syntax that you have 
now over the type system I've defined, maintaining compatibility too.


I also want to emphasize that, while I drew inspiration from many sources when 
defining Muldis D, and there was (and is) a lot I didn't (and don't) know about 
Postgres, I have found as I use and learn Postgres that how it does things is 
frequently similar and compatible with how I independently designed Muldis D; 
I'm finding more similarities all the time.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MULTISET and additional functions for ARRAY

2010-11-11 Thread Darren Duncan
I think that it would be best to implement MULTISET in the same way that a TABLE 
is implemented.  Logically and structurally they are the same thing, except that a 
MULTISET is typically used as a field value of a table row.  That is, a table and 
a multiset are just different names for a relation, loosely speaking.


The association of a multiset-typed attribute of a table with said table is like 
the association of a child table with its parent table in a many-to-one relationship.


So reuse your structure for tables to hold multisets.
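
As an illustration (a sketch; all the names are made up), a table with a 
multiset-valued column, in hypothetical syntax:

  CREATE TABLE person (
    person_id integer PRIMARY KEY,
    nicknames text MULTISET  -- hypothetical column syntax
  );

... could be stored internally as an ordinary pair of tables:

  CREATE TABLE person (person_id integer PRIMARY KEY);
  CREATE TABLE person_nicknames (
    person_id integer NOT NULL REFERENCES person,  -- the many-to-one link
    nickname  text NOT NULL  -- no uniqueness: multisets allow duplicates
  );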

-- Darren Duncan

Itagaki Takahiro wrote:

Postgres supports ARRAY data types well, but there are some
more array functions in the SQL standard. Also, the standard
has MULTISET data type, that is an unordered array.

It looks easy to support additional array functions. There
might be some confusion to treat multi-dimensional arrays
with them, but we could treat all arrays as one-dimensional
like as unnest().

MULTISET supports are more difficult. We have corresponding
type IDs for each array, but we might not want to add additional
IDs for multiset for each type. Any ideas for the issue?

If we reuse type IDs of arrays for multisets, the multisets would
have some special typmod. For example, typmod = 0 means multiset,
and positive value means array with max cardinality. Note that
the SQL standard doesn't mention about multi-dimensional arrays.
So, we can use typmod = -1 as a free-size and free-dimensional
array for backward compatibility.

If we have troublesome issues to support multiset data types,
I'm thinking to add multiset functions that receives ARRAY
types instead at time first time, because an ARRAY is a
MULTISET by definition. Some of functions for multisets
seems to be useful for arrays, too.

Comments and suggestions welcome.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MULTISET and additional functions for ARRAY

2010-11-11 Thread Darren Duncan

Merlin Moncure wrote:

On Thu, Nov 11, 2010 at 3:42 PM, David E. Wheeler <da...@kineticode.com> wrote:

On Nov 11, 2010, at 12:08 PM, Alvaro Herrera wrote:


That sounds like a composite type to me.

No, it's perpendicular in the sense that while a composite type allows
you to have different columns, this multiset thing lets you have rows
(I initially thought about them as sets of scalars, but AFAIU they could
in turn be rows)

How is that different from an array of RECORDs?


I could ask the same question about a TABLE, the ordering issue aside.

This is one place where SQL made things more complicated than they needed to be. 
 Multisets have generally the same structure *and* operators (union, etc.) as 
tables, but SQL uses different syntax for each.  A better design would be to 
make tables and multisets interchangeable.  It's an unnecessary distinction.



not a whole lot outside of syntax details...there is a decent summary
here: http://waelchatila.com/2005/05/18/1116485743467.html

I like this part: Alternatively the SQL standard also permits the
same construct with the bracket trigraphs ??( and ??)  :-D


As I recall, the concept of using stuff like ??( or ??) was so that SQL could 
be written in EBCDIC, which natively lacks some of the bracketing characters that 
ASCII has.  Hence, such a trigraph is an alternative way to spell either { } or [ ] 
(I forget which).


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Delimited identifier brhavior

2010-11-11 Thread Darren Duncan

Tatsuo Ishii wrote:

test=# select * from "t1"_foo;
select * from "t1"_foo;
 i 
---

(0 rows)

It seems PostgreSQL thinks "t1"_foo is equivalent to "t1".  Is this an
expected behavior?


That code looks badly written in any event.  Delimiters should be put around 
each part of an identifier chain, or around a single identifier as a whole, such as:

  select * from "t1_foo";

Or with schema-qualified objects, for example, any of these:

  "schema"."table"

  "schema".table

  schema."table"

  schema.table

Personally, I treat all of my identifiers as case-sensitive.  Knowing that Pg 
folds non-delimited identifiers to lowercase, I write an identifier undelimited 
when it is entirely lowercase, and I delimit ones that have any uppercase.  By 
doing this consistently, everything works correctly.  Since most of my 
identifiers are lowercase anyway, the code also reads cleanly in general.
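
A minimal illustration of the difference (assuming a table named t1 exists):

  select * from "t1"_foo;   -- parsed as: from "t1" as _foo, so it reads t1
  select * from "t1_foo";   -- refers to a table literally named t1_foo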


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL command to edit postgresql.conf, with comments

2010-10-12 Thread Darren Duncan

Dimitri Fontaine wrote:

Andrew Dunstan <and...@dunslane.net> writes:

On 10/12/2010 05:02 PM, Dimitri Fontaine wrote:

  So, what you do is have a file per GUC, file name is the GUC name, first
line contains *ONLY* current value, the rest of the file is comments.

You're joking, right?


No. I just want both comments and SQL commands. If you refuse this
simple file scheme, keep your postgresql.conf and don't remote edit it.

That's my proposal, I'm happy that it comes with laughter :)


Maybe I missed something important, but why is it not possible to retain the 
single existing postgresql.conf file format (human-writable) *and* have it 
machine/SQL-editable *and* maintain the comments?  I should think that it would 
be possible to do all of these without too much trouble.  All you would need is 
for the file parser to retain the comments as metadata, include them in the 
relations that the SQL commands see (where the latter can also edit them as data), 
and then write out the updated file with comments.  The fact that Postgres 
already explicitly supports comment metadata in its system catalog means it must 
already know something about this.  If something is missing, then expand the 
catalog so it represents all the details you want to preserve. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL command to edit postgresql.conf, with comments

2010-10-12 Thread Darren Duncan

Josh Berkus wrote:

 The fact that Postgres already
explicitly supports comment metadata in its system catalog means it must
already know something about this.


We support what?


Postgres has COMMENT ON ... SQL for various database objects and I assumed 
that said comments would be stored in the system catalog.


  http://www.postgresql.org/docs/9.0/interactive/sql-comment.html


The solution to this is simple, and was previously discussed on this list:

(a) have a postgresql.conf.auto
(b) add a default include for postgresql.conf.auto at the beginning of
PostgreSQL.conf
(c) SQL updates go to postgresql.conf.auto, which consists only of
setting = value in alphabetical order.
(d) We document that settings which are changed manually in
postgresql.conf will override postgresql.conf.auto.
(e) $$profit$$!!


I agree that this looks like an effective solution.
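
For what it's worth, step (b) needs no new mechanism, since postgresql.conf 
already supports an include directive; a sketch of the top of the file:

  include 'postgresql.conf.auto'

Since the last occurrence of a setting wins, anything set manually below that 
line overrides the auto file, which is exactly point (d).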

-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL command to edit postgresql.conf, with comments

2010-10-12 Thread Darren Duncan

Josh Berkus wrote first:

Postgres has COMMENT ON ... SQL for various database objects and I
assumed that said comments would be stored in the system catalog.


Oh.  Now that's an interesting perspective ... you're suggesting that we
take the comments and apply them as COMMENTS on the specific pg_settings?


Yes, something along those lines.

I should point out that in general I consider a "COMMENT ON" to just be a less 
ambiguous form of a "--" or "/* */" comment, and that it would be a good idea for 
a code or config parser to preserve comments given in the latter formats in the 
same way it preserves the former, when it can reasonably infer what to associate 
the latter comments with.



That wouldn't solve the issues of ordering, or of comments in the file
not associated with a setting, but might be a good 80% solution.


Yes, and for something like this an 80% solution is good.

As for ordering, that can be handled with more metadata, like a visual order 
number.


Tom Lane wrote second:

Well, if the settings were stored in a catalog ... which they are not
... that might be a useful idea.


The catalog is where they *should* be stored, at least from the user's 
perspective.


The reason that COMMENT ON isn't a terribly helpful analogy is that it
puts the burden on the user to associate specific comment texts with
specific database objects.  The problem with parsing and modifying
freestyle comments as seen in postgresql.conf is exactly (or mostly)
that there's no reasonable way of mechanically determining which
comments are associated with which setting.  We could invent some
arbitrary rule or other, but it'd have little to do with the way
people perceive what's in such a file.


There's no reason that you can't have both kinds of comments.  You could have 
comments that are specially formatted such that the parser will then recognize 
they are associated with something specific and so put them in the system 
catalog.  And then you can have other comments not formatted that way and the 
parser will then pass over them like whitespace.



I agree with Josh's proposal: keep mechanically-generated settings in a
separate file, and don't even pretend to allow comments to be kept there.


I agree with the separate-files approach as being the most practical in at least 
the short term.  However, I think that the auto-generated file could support 
comments too; they would just be limited to end-of-line comments on the same line 
as the setting, and only these comments would appear in the system catalog by 
default.  Comments in the file meant for user editing would only appear in the 
catalog if specially formatted, which for now could just mean end-of-line comments.
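
Concretely, a line in the auto-generated file might look like this (a sketch):

  # free-floating comments would be passed over like whitespace
  shared_buffers = 512MB  # an end-of-line comment, captured into the catalog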


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL command to edit postgresql.conf, with comments

2010-10-12 Thread Darren Duncan

Richard Broersma wrote:

On Tue, Oct 12, 2010 at 3:54 PM, Josh Berkus <j...@agliodbs.com> wrote:

Oh.  Now that's an interesting perspective ... you're suggesting that we
take the comments and apply them as COMMENTS on the specific pg_settings?


On a side note regarding comments, I'd like to make a request for a
more comprehensive commenting mechanism.  The first thought that comes
to my mind would allow for comments to be stored and annotated using
XML or sgml.  It'd be nice to be able to generate user documentation
from selected comments taken from application derived database
objects.

I don't know, maybe this is already possible.


When you start going there, you have new issues to consider.  (For the record I 
also prefer plain text for comments.)


I should point out that the group making the Perl 6 language has already looked 
into such matters extensively, essentially unifying code comments and code 
documentation into common metadata that is both visible in the source code and 
accessible programmatically at runtime.


I think this approach goes beyond comments as we normally know them, which I 
just think of as plain text strings associated with some code element.


But if you want to pursue bringing documentation into this, I suggest looking at 
what Perl 6, and other languages, have done.


While some of the results of the Perl 6 discussion may have just been in the 
forums, these URLs at least are related to it:


 - http://perlcabal.org/syn/S02.html#Whitespace_and_Comments
 - http://github.com/perl6/specs/raw/master/S26-documentation.pod for

I'm not proposing adopting their syntax, but some features or design may be 
useful to learn from.


I also want to point out that the FoxPro language provides some prior art for 
including comments as data fields in its runtime-accessible objects.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] O_DSYNC broken on MacOS X?

2010-09-30 Thread Darren Duncan

Greg Smith wrote:
You didn't quote the next part of that, which says "fsync() is not 
sufficient to guarantee that your data is on stable storage" and "on MacOS X 
we provide a fcntl(), called F_FULLFSYNC, to ask the drive to flush all 
buffered data to stable storage".  That's exactly 
what turning on fsync_writethrough does in PostgreSQL.  See 
http://archives.postgresql.org/pgsql-hackers/2005-04/msg00390.php as the 
first post on this topic that ultimately led to that behavior being 
implemented.


From the perspective of the database, whether or not the behavior is 
standards-compliant isn't the issue.  Whether pages make it to physical 
disk or not when fsync is called, or when O_DSYNC writes are done on 
platforms that support them, is the important part.  If the OS 
doesn't do that, it is doing nothing useful from the perspective of the 
database's expectations.  And that's not true on Darwin unless you 
specify F_FULLFSYNC, which doesn't happen by default in PostgreSQL.  It 
only does that when you switch wal_sync_method=fsync_writethrough.


Greg Smith also wrote:

The main downside to switching the default on either OS X or Windows is

developers using those platforms for test deployments will suffer greatly from a
performance drop for data they don't really care about. As those two in
particular are much more likely to be client development platforms, too, that's
a scary thing to consider.

I think that, bottom line, Postgres should be defaulting to whatever the safest 
and most reliable behavior is, per each platform, because data integrity is the 
most important thing, ensuring that a returning commit has actually written data 
to disk.  If performance is worse, then so what?  Code that does nothing has the 
best performance of all, and is also generally useless.


Whenever there is a tradeoff to be made, reliability for speed, then users 
should have to explicitly choose the less reliable option, which would 
demonstrate they know what they're doing.  Let the testers explicitly choose a 
faster and less reliable option for the data they don't care about; otherwise, 
by default, users who don't know better should get the safest option, for 
data they likely care about.  That is a DBMS priority.
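
Concretely, on OS X that would mean shipping with the safe setting and making 
testers opt out knowingly (a sketch of the relevant postgresql.conf lines):

  wal_sync_method = fsync_writethrough  # safe: forces the drive cache too
  #wal_sync_method = fsync              # faster, but unsafe on this platform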


This matter reminds me of a discussion on the SQLite list years ago about 
whether pragma synchronous=normal or synchronous=full should be the default, and 
thankfully 'full' won.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: plpgsql - for in array statement

2010-09-28 Thread Darren Duncan

Alvaro Herrera wrote:

What about

FOR EACH var IN array_expr LOOP ...

I think this requires reserving EACH, which could cause a regression for
working code.  Maybe there's a way to make it work?


Code that quotes all of its identifiers, such as with:

  FOR EACH "var" IN "array_expr" LOOP ...

... would also gain a significant amount of future-proofing since then the 
language can add keywords at will, without there being conflicts with 
user-defined identifiers.


Similarly, quoting identifiers also carries present-day advantages, since one 
can then give identifiers whatever names are most suitably descriptive without 
worrying about whether the language has a predefined meaning for those words.


The quoting also has the nice bonus of making them case-sensitive.

-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: plpgsql - for in array statement

2010-09-28 Thread Darren Duncan

Alvaro Herrera wrote:

What about

FOR EACH var IN array_expr LOOP ...

I think this requires reserving EACH, which could cause a regression for
working code.  Maybe there's a way to make it work?


What about saying FOR-EACH instead?

A good general solution that I'd expect to not cause regressions is to separate 
multiple-word keywords with dashes rather than spaces.


Since unquoted identifiers don't have dashes, I think, and moreover because the 
whole FOR-EACH would occupy the first position of the statement rather than the 
first two, there should be no ambiguity.


Parsing should be easier, too, because keywords formatted like this would just 
be a single term rather than having infinite variations due to embedded whitespace.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: plpgsql - for in array statement

2010-09-28 Thread Darren Duncan

Andrew Dunstan wrote:

On 09/28/2010 09:31 PM, Darren Duncan wrote:


Code that quotes all of its identifiers, such as with:

  FOR EACH "var" IN "array_expr" LOOP ...


This doesn't help in the least if the array is an expression rather than 
simply a variable - we're not going to start quoting expressions.


I wrote that wrong.  I should have said "array_var", not "array_expr".  I am 
certainly not advocating quoting expressions, and didn't mean to imply that 
here.  My point was that if a token is always interpreted as a keyword rather 
than an identifier when there is ambiguity, then quoting would let users name an 
identifier "each" or "EACH".  In any event, I will not push this, since it 
doesn't address the real issue of language changes not breaking the general case 
of legacy code; it only says how users can insulate themselves. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What happened to the is_type family of functions proposal?

2010-09-25 Thread Darren Duncan

Craig Ringer wrote:

On 25/09/2010 11:51 AM, Darren Duncan wrote:

There should just be a single syntax that works for all types, in the
general case, for testing whether a value is a member of that type, or
alternately whether a value can be cast to a particular type.

snip

Pg already gets it right in this regard by having a single general
syntax for type casting, the "value::type-name", and value membership
of a type should be likewise.


or the standard:

CAST(value AS typename)


Indeed.  The exact syntax doesn't matter to me but the point is that the type 
name is its own lexical element, conceptually a function argument, rather than 
being a substring of another one.



Maybe to test if a value can be cast as a type, you can continue the "::"
mnemonic, say adding a "?" for yes and a "!" for no.

For example, "value?::type-name" tests if the value can be cast as
the type, and "value!::type-name" or "not value?::type-name"
tests the opposite.  An expression like this results in a boolean.


Personal opinion here: Blech, if I wanted to use Perl6 I'd do so ;-)


I see that someone has been paying attention.

Yes, the idea of using "?" or "!" to derive a boolean expression from some other 
expression did indeed come from Perl 6.  The "?" means "is so", the "!" means 
"is not".  A very useful mnemonic in general.


Good shorthand, I guess, but a CAST syntax extension or alternate CAST 
version would be a bonus for readability.


Well, sure.  But something consistent with cast syntax that Pg provides.

-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What happened to the is_type family of functions proposal?

2010-09-24 Thread Darren Duncan

Colin 't Hart wrote:
The fact that this wraps would seem to me to make the implementation of 
is_date() difficult.


Having separate is_foo() syntax per type is a bad design idea, same as having a 
different equality test like eq_int() or assignment syntax like assign_str() per 
type.


There should just be a single syntax that works for all types, in the general 
case, for testing whether a value is a member of that type, or alternately 
whether a value can be cast to a particular type.


For example, one could say "is_type( value, type-name )", or it could be 
spelled "isa()"; or, if you wanted to be more ambitious, it could be an infix op, 
like "value isa type-name", to test when a value is of a type already.


Pg already gets it right in this regard by having a single general syntax for 
type casting, the "value::type-name", and value membership of a type should 
be likewise.


Maybe to test if a value can be cast as a type, you can continue the "::" 
mnemonic, say adding a "?" for yes and a "!" for no.


For example, "value?::type-name" tests if the value can be cast as the type, 
and "value!::type-name" or "not value?::type-name" tests the opposite. 
An expression like this results in a boolean.
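
Until something like that exists, the can-it-be-cast test can be approximated 
with an ordinary function; a hedged sketch (the name is made up, and the target 
type is fixed per function since this can't be parametrized over types):

  CREATE FUNCTION can_cast_to_date(v text) RETURNS boolean AS $$
  BEGIN
    PERFORM v::date;   -- attempt the cast, discarding the result
    RETURN true;
  EXCEPTION WHEN others THEN
    RETURN false;      -- any cast failure means "not castable"
  END;
  $$ LANGUAGE plpgsql;

  -- SELECT can_cast_to_date('2010-09-24');  -- true
  -- SELECT can_cast_to_date('not a date');  -- false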


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres 9.0.0 release scheduled

2010-09-18 Thread Darren Duncan

Tom Lane wrote on 2010.09.08:

The core committee has decided that it's time to press forward with
releasing 9.0.  Barring catastrophic bug reports in the next week,
9.0.0 will be wrapped Thursday 9/16 for public announcement Monday 9/20.


Is this still the timetable, or is it being affected at all by the delay in 
migrating repositories while the CVS is cleaned up? -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] bad variable subst after AS

2010-09-16 Thread Darren Duncan

I don't know if this is a bug or not, but if not, it looks like a misfeature ...

When executing the following in Pg 8.4.4:

  CREATE OR REPLACE FUNCTION f () RETURNS TABLE(a1 INTEGER) AS
  $BODY$
  BEGIN
RETURN QUERY SELECT a0 AS a1 FROM rv;
RETURN;
  END;
  $BODY$
  LANGUAGE plpgsql;

... I get this error:

  ERROR: syntax error at or near "$1"
  SQL state: 42601

My impression of this is that Pg is treating the "a1" after the "AS" like it was
a variable reference and so substituted "$1" for it.

Now that just seems wrong to me.  I can understand either "a0" or "rv" getting a
substitution, but something following an "AS" being substituted is just wrong.

Is that a bug, and if not, what is the rationale for working that way, and
can it be changed?

Meanwhile, what is the best way to write f to work around this misbehavior?

Thank you.

-- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bad variable subst after AS

2010-09-16 Thread Darren Duncan

Thank you to the 4 people who replied.

Heikki Linnakangas wrote:

It's a known misfeature, PL/pgSQL isn't very smart about replacing variables 
with parameter markers.

The good news is that this has been completely rewritten in 9.0. The above will 
work in 9.0.

If you can't upgrade to 9.0, you'll have to rename the variable or use a 
different alias in the AS clause.


Good to hear that this is fixed in 9.0; and I do intend to upgrade any week now.

Andrew Dunstan wrote:

Remove the AS clause. You don't need it here at all.


Okay, that seems to be the best workaround while running under 8.4; or I would 
use something like "_a1" instead, for documentation purposes or for referencing.
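
That is, under 8.4 either of these forms of the query avoids the bad 
substitution (a sketch):

  RETURN QUERY SELECT a0 FROM rv;          -- no alias at all
  RETURN QUERY SELECT a0 AS _a1 FROM rv;   -- an alias that can't collide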


Alvaro Herrera wrote:

Meanwhile, what is the best way to write f to work around this misbehavior?


Give the column a different alias, one not colliding with a variable name.


In this circumstance, the whole point of the AS clause is this: because I was 
declaring in the function signature that it exported a table with a column named 
"a1", I used the AS clause to make sure the selected column was named "a1"; 
otherwise there would conceptually be a type mismatch between the declared and 
actual result.  Making them exactly the same is the whole point of the exercise.


I'm in the school of thought that a table column's name is the only proper way 
to identify it, rather than its ordinal position being significant for identity; 
so even though SQL supports the latter, I consider it a misfeature of SQL that 
leads to error-prone code, and I try not to rely on it when I can help it.


The fact that Pg would make things work when the result column name is different 
from the declared name points squarely to ordinal position as identity, as 
that's the only way it could work.


Considering that behavior, I agree that using a different name is reasonable 
under 8.4 to make this work.


Pavel Stehule said:

It's not a bug - just you cannot use a variable there. Table name,
column names are specified in planner time, and cannot be
parametrized.

p.s. you can use a dynamic SQL - EXECUTE statement - RETURN QUERY
EXECUTE - but it doesn't help you, because you cannot overwrite a
function definition.


I think you misunderstood what I was trying to do.  In contrast to what you 
said, I was *not* expecting the "a1" in "AS a1" to be treated as a variable.  But 
no worries; Heikki/Andrew/Alvaro understood what I meant.


-- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autonomous transactions

2010-09-16 Thread Darren Duncan

Robert Haas wrote:

On Thu, Sep 16, 2010 at 5:19 AM, Dimitri Fontaine <dfonta...@hi-media.com>
wrote:

I think they call that "dynamic scope", in advanced programming
languages. I guess that's calling for a quote of Greenspun's Tenth Rule:

 Any sufficiently complicated C or Fortran program contains an ad hoc
 informally-specified bug-ridden slow implementation of half of Common
 Lisp.

So the name of the game could be to find out a way to implement (a
limited form of) dynamic scoping in PostgreSQL, in C, then find out all
and any backend local variable that needs that to support autonomous
transactions, then make it happen… Right?


Interestingly, PostgreSQL was originally written in LISP, and there
are remnants of that in the code today; for example, our heavy use of
List nodes.  But I don't think that has much to do with this project.
I plan to reserve judgment on the best way of managing the relevant
state until such time as someone has gone to the trouble of
identifying what state that is.


It would probably do Pg some good to try and recapture its functional language 
roots where reasonably possible.  I believe that, design-wise, functional 
languages really are the best way to do object-relational databases, given that 
pure functions and immutable data structures are typically the best way to 
express anything one would do with them. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] autonomous transactions (was Re: TODO note)

2010-09-15 Thread Darren Duncan

Robert Haas wrote:

On Wed, Sep 15, 2010 at 3:37 AM, Colin 't Hart <colinth...@gmail.com> wrote:

I note that the implementation of tab completion for SET TRANSACTION in PSQL
could benefit from the implementation of autonomous transactions (also
TODO).


I think it's safe to say that if we ever manage to get autonomous
transactions working, there are a GREAT MANY things which will benefit
from that.  There's probably an easier way to get at that Todo item,
though, if someone feels like beating on it.

One problem with autonomous transactions is that you have to figure
out where to store all the state associated with the autonomous
transaction and its subtransactions.  Another is that you have to
avoid an unacceptable slowdown in the tuple-visibility checks in the
process.


As I understand it, in many ways autonomous transactions are like distinct 
database client sessions, except that the client in this case is another database 
session, especially if the autonomous transaction can make a commit that 
persists even if the initial session afterwards does a rollback.


Similarly, using autonomous transactions is akin to multi-processing.  Normal 
distinct database client sessions are like distinct processes that are usually 
started externally to the DBMS, whereas autonomous transactions are like 
processes started within the DBMS.


Also, under the assumption that everything in a DBMS session should be subject 
to transactions, so that both data-manipulation and data-definition can be 
rolled back, autonomous transactions are like a generalization of supporting 
sequence generators that remember their incremented state even when the action 
that incremented it is rolled back; the sequence generator update is effectively 
an autonomous transaction, in that case.
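
That behavior is easy to see today (assuming a sequence s already exists):

  BEGIN;
  SELECT nextval('s');  -- say this returns 5
  ROLLBACK;
  SELECT nextval('s');  -- returns 6; the increment survived the rollback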


The point being, the answer to how to implement autonomous transactions could be 
as simple as: do more or less the same thing you do to manage multiple concurrent 
client sessions.  If each client gets its own Postgres OS process, then 
an autonomous transaction just farms out to another one of those, which does the 
work.  Or maybe there could be a lighter-weight version of this.
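
In fact, one can approximate exactly that today with contrib/dblink over a 
loopback connection; a sketch (the table and column names are illustrative):

  -- the INSERT commits in the other session even if we roll back here
  SELECT dblink_exec(
    'dbname=' || current_database(),
    'INSERT INTO activity_log (at, note) VALUES (now(), ''attempted'')'
  );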


Does this design principle seem reasonable?

If autonomous transactions end up being used a lot, then maybe the other process 
could be kept connected and fed subsequent autonomous actions, for example if it 
is being used to implement an activity log; some kind of IPC would be going on.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

