Re: [HACKERS] Feature suggestion: ON CONFLICT DO NOTHING RETURNING which returns existing row data

2016-10-06 Thread Tom Dunstan

> On 5 Oct 2016, at 8:11 PM, Pantelis Theodosiou  wrote:
> 
> This can be solved by chaining modifying CTEs. 
> 
> Something like this (not tested)  that can work with multiple rows inserted:

Thanks for the suggestion, but it was actually slower than our current 
implementation, I believe due to always looking up t1’s id in that join rather 
than only doing it when we didn’t get an id back from the insert. My hope with 
this feature suggestion / request was that we wouldn’t have to do that 
subsequent lookup at all, as pg would just give it back to us.

Maybe it would be a win if we were inserting multiple rows, but this code is 
actually in a trigger on a dummy table that we COPY data in to - thus it can’t 
be rewritten as a rule or a multi-row insert like that.
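
For reference, the chained-CTE shape under discussion looks roughly like the
sketch below - hypothetical and untested, using the t1/t2 schema from the
original post further down. Note that it still probes t1 on every insert,
which is exactly the cost described above:

WITH ins AS (
    INSERT INTO t1 (bk1, bk2 /*, other columns */)
    VALUES (bk1val, bk2val /*, other values */)
    ON CONFLICT (bk1, bk2) DO NOTHING
    RETURNING id
)
INSERT INTO t2 (t1_id /*, other columns */)
SELECT id FROM ins
UNION ALL
SELECT id FROM t1 WHERE bk1 = bk1val AND bk2 = bk2val;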

Thanks

Tom





[HACKERS] Feature suggestion: ON CONFLICT DO NOTHING RETURNING which returns existing row data

2016-10-04 Thread Tom Dunstan
Hi all

We recently moved to using 9.5 and were hoping to use the new upsert 
functionality, but unfortunately it doesn’t quite do what we need.

Our setup is something like this:

CREATE TABLE t1 (
  id BIGSERIAL NOT NULL PRIMARY KEY,
  bk1 INT,
  bk2 UUID
  -- other columns
);
CREATE UNIQUE INDEX t1_bk ON t1 (bk1, bk2);

CREATE TABLE t2 (
  t1_id BIGINT NOT NULL REFERENCES t1
  -- other stuff
);

Data comes in as inserts of one tuple each of t1 and t2. We expect inserts to 
t1 to be heavily duplicated. That is, for stuff coming in we expect a large 
number of rows to have duplicate (bk1, bk2), and we wish to discard those, but 
not discard the t2 tuple - those should always be inserted and reference the 
correct t1 record.

So we currently have an insertion function that does this:

BEGIN
  INSERT INTO t1 (bk1, bk2, other columns)
  VALUES (bk1val, bk2val, other values)
  RETURNING id
  INTO t1_id;
EXCEPTION WHEN unique_violation THEN
  SELECT id
  FROM t1
  WHERE bk1 = bk1val AND bk2 = bk2val
  INTO t1_id;
END;

INSERT INTO t2(t1_id, other columns) VALUES(t1_id, other values);

We were hoping that we’d be able to do something like this:

INSERT INTO t1 (bk1, bk2, other columns)
  VALUES (bk1val, bk2val, other values)
  ON CONFLICT (bk1, bk2) DO NOTHING
  RETURNING id
  INTO t1_id;
INSERT INTO t2(t1_id, other columns) VALUES(t1_id, other values);

But unfortunately it seems that the RETURNING clause returns null when there’s 
a conflict, rather than the existing row’s value.

I understand that there is ambiguity if there were multiple rows that were in 
conflict. I think this sort of functionality really only makes sense where the 
conflict target is a unique constraint, so IMO it would make sense to only 
support returning columns in that case.

I imagine that this would be possible to do more efficiently than the 
subsequent query that we are currently doing given that postgres has already 
found the rows in question, in the index at least. I have no idea how hard it 
would actually be to implement though. FWIW my use-case would be supported even 
if this only worked for indexes where the to-be-returned columns were stored in 
the index using Anastasia’s covering + unique index patch, when that lands.

Thoughts?

Tom





Re: [HACKERS] IPv6 link-local addresses and inet data type

2016-05-30 Thread Tom Dunstan

> On 31 May 2016, at 2:07 AM, Tom Lane  wrote:
> 
> The impression I have is that scopes are not very well standardized ---
> eg, OS X reports things like "fe80::1%lo0" not just "%0".  If we could
> get around that problem it would be worth doing.

Yeah, we’d have to just store it as a string I think. That’s why I was happy to 
see that inet was already a varlena but only with known-length content.

Cheers

Tom





[HACKERS] IPv6 link-local addresses and inet data type

2016-05-29 Thread Tom Dunstan
Hi all

I just ran into an issue that was originally reported way back in 2007 - 
https://www.postgresql.org/message-id/flat/0262b803-b664-4ebe-85b4-3c9a40ea6...@mobygames.com
 


Basically the inet data type cannot store or parse valid ipv6 address literals 
with a scope / zone id suffix. Apparently the combination of virtualised linux, 
ipv6 network and JVM that we are using has combined to report connections on 
localhost as coming from '::1%0', which our app is unsuccessfully attempting to 
store in the db in an inet column. This is the first time that I have ever seen 
this, but perhaps it will get more common as ipv6-first usage increases.
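
A minimal reproduction, assuming a stock server of the era discussed here (the
error text is approximate):

SELECT '::1%0'::inet;
-- ERROR:  invalid input syntax for type inet: "::1%0"
SELECT 'fe80::1%lo0'::inet;
-- ERROR:  invalid input syntax for type inet: "fe80::1%lo0"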

Given that inet is a varlena struct with only known-length fields, it seems 
potentially possible to extend it to add an optional, variable length zone id 
on the end, with the result being backwards compatible with existing data.

Thoughts?

Cheers

Tom



Re: [HACKERS] Alter or rename enum value

2016-04-04 Thread Tom Dunstan
Just stumbled across this thread while looking for something else…

> On 28 Mar 2016, at 12:50 AM, Tom Lane  wrote:
> What you really need is to prevent the new value from being inserted
> into any indexes, but checking that directly seems far more difficult,
> ugly, and expensive than the above.
> 
> I do not know whether this would be a meaningful improvement for
> common use-cases, though.  (It'd help if people were more specific
> about the use-cases they need to work.)

My team’s use case is: We have to add new values to an enum (no removal or 
renames) during occasional database schema migration as part of software 
releases. We’re using a db migration tool that understands that postgres can do 
most schema changes in a transaction, so it attempts to run our add enum value 
statements in a transaction, even if we mark them as individual changes. With 
some huffing and puffing we’ve managed to work around this limitation in the 
tool, but it would definitely be nice to keep our enum-related migrations with 
our other changes and drop the custom non-transactional migration code that 
we’ve had to write.

The suggested solution of disallowing any use of the new value during the same 
transaction that is added in would work fine for us. In the (so far 
unprecedented) case that we need to use the new value in a migration at the 
same time, we’d always have the option of splitting the migration up into two 
transactions.
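
For anyone unfamiliar with the restriction being worked around, this is what
the migration tool runs into (hypothetical enum name; error text approximate
for the releases current at the time):

BEGIN;
ALTER TYPE order_status ADD VALUE 'archived';
-- ERROR:  ALTER TYPE ... ADD cannot run inside a transaction block
ROLLBACK;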

Cheers

Tom





Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension

2014-07-03 Thread Tom Dunstan
On 4 July 2014 11:37, Tom Lane  wrote:
>
> > Sure it can - it can append RETURNING PRIMARY KEY and hand back a ResultSet
> > from whatever was returned. It's CURRENTLY doing that, but it's appending
> > RETURNING * and leaving it up to the caller of getGeneratedKeys() to work
> > out which columns the caller is interested in.
>
> I might be mistaken, but as I read the spec for getGeneratedKeys, whether
> or not a column is in a primary key has got nothing to do with whether
> that column should be returned.  It looks to me like what you're supposed
> to return is columns with computed default values, eg, serial columns.
> (Whether we should define it as being *exactly* columns created by the
> SERIAL macro is an interesting question, but for sure those ought to be
> included.)  Now, the pkey might be a serial column ... but it equally
> well might not be; it might not have any default expression at all.
> And there certainly could be serial columns that weren't in the pkey.
>
> The fact that the spec is kinda fuzzy doesn't entitle us to ignore the
> plain meaning of the term "generated key".
>

Well, strictly speaking no, but in reality it doesn't look like other
databases that support this feature support anything other than returning
auto-increment fields on primary keys, often only supporting a single
column (see [1]). Oracle doesn't support even that - callers have to call
the variant that specifies a column list, since historically there was no
real support for recognising such columns, oracle being sequence based.

As such, there can't be any callers in the wild that will expect this to
return anything other than the primary key, because there's no support for
doing anything else. And if the caller really DOES want other columns
returned, they can always call the variant that allows them to specify
those columns, or just issue a normal query with a returning clause. This
feature really exists to support the default case where those columns
aren't specified by the caller, and given current driver support (and
sanity) the only thing any caller will expect from such a result is the
primary key.

Cheers

Tom

[1]
http://www.postgresql.org/message-id/cappfruwqy0z66trv4xmdqnyv0prjky+38x+p4vkhmrrw5rb...@mail.gmail.com


Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension

2014-07-03 Thread Tom Dunstan
On 4 July 2014 00:07, Tom Lane  wrote:

> TBH, I thought that RETURNING PRIMARY KEY was already fairly iffy
> in that the application would have little idea what it was getting back.
> IF EXISTS would make it so spongy as to be useless, IMO.
>

IF EXISTS is pretty pointless - while the behaviour of getGeneratedKeys()
isn't defined for cases where there aren't any, it's only meaningful if the
caller has previously asked for the keys to be returned, and someone asking
to do that where it doesn't make sense can get an error as far as I'm
concerned. No-one does this in practice.

Looking at the feature as a more general SQL construct, ISTM that if
someone requests RETURNING PRIMARY KEY where there isn't one, an error is
appropriate. And for the IF EXISTS case, when on earth will someone request
a primary key even if they're not sure one exists?


> It sounds to me like designing this for JDBC's getGeneratedKeys method
> is a mistake.  There isn't going to be any way that the driver can support
> that without having looked at the table's metadata for itself, and if
> it's going to do that then it doesn't need a crutch that only partially
> solves the problem anyhow.
>

Sure it can - it can append RETURNING PRIMARY KEY and hand back a ResultSet
from whatever was returned. It's CURRENTLY doing that, but it's appending
RETURNING * and leaving it up to the caller of getGeneratedKeys() to work
out which columns the caller is interested in.

Turns out that's actually ok - most Java-based ORMs have more than enough
metadata about the tables they manage to know which columns are the primary
key. It's the driver that doesn't have that information, which is where
RETURNING PRIMARY KEY can help by not forcing the driver to request that
every single column is returned.
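
For concreteness, here is the difference in what gets appended, using the dept
example quoted elsewhere in this thread. The RETURNING PRIMARY KEY form is the
proposal, not syntax the server currently accepts:

-- What the driver effectively issues today when generated keys are requested:
INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK') RETURNING *;

-- What it could issue instead under the proposal:
INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK') RETURNING PRIMARY KEY;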

The only downside that I see is cases where someone requests the keys to be
returned but already has a RETURNING clause in their insert statement -
what if the requested columns don't include the primary key? IMO it's
probably enough to document that if the caller wants to issue a RETURNING
clause then they better make sure it contains the columns they're
interested in. This is already way outside anything that standard ORMs will
do (they don't know about RETURNING) so anyone doing this is hand-crafting
anyway.

Cheers

Tom


Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension

2014-06-26 Thread Tom Dunstan
On 27 June 2014 06:14, Gavin Flower  wrote:

> On 27/06/14 00:12, Rushabh Lathia wrote:
>
>> INSERT INTO dept VALUES (10,'ACCOUNTING','NEW YORK') returning primary
>> key, dname;
>>
>> I think allowing other columns with PRIMARY KEY would be more useful
>> syntax.
>> Even in later versions if we want to extend this syntax to return UNIQUE
>> KEY,
>> SEQUENCE VALUES, etc.. comma separation syntax will be more handy.
>>
>>
> I agree 100%.
>

If the query is being hand-crafted, what's to stop the query writer from
just listing the id columns in the returning clause? And someone specifying
RETURNING * is getting all the columns anyway.

The target use-case for this feature is a database driver that has just
done an insert and doesn't know what the primary key columns are - in that
case mixing them with any other columns is actually counter-productive as
the driver won't know which columns are which. What use cases are there
where the writer of the query knows enough to write specific columns in the
RETURNING clause but not enough to know which column is the id column?

Consistency is nice, and I can understand wanting to treat the PRIMARY KEY
bit as just another set of columns in the list to return, but I'd hate to
see this feature put on the back-burner to support use-cases that are
already handled by the current RETURNING feature. Maybe it's easy to do,
though.. I haven't looked into the implementation at all.

Cheers

Tom


Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension

2014-06-10 Thread Tom Dunstan
>
> Is it going to save enough to justify depending on a syntax that won't
> be universal for a long time to come?
>

Oh, and on the won't-be-universal-for-a-while point - the status quo works
fine, it's just less efficient than it should be. Once someone upgrades to
9.5 or whatever, and upgrades to the matching JDBC driver version, they'll
get the newer efficient call for free.

In the python world PEP249 has a lastrowid property that drivers can
implement, but I don't know how much it's used or our support for it. Any
python devs out there want to chime in? I don't know about other drivers.

Obviously anyone hand-crafting their queries won't be able to do that until
they know it's safe. But that's always the case with new syntax.

Cheers

Tom


Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension

2014-06-10 Thread Tom Dunstan
On 11 June 2014 10:09, Tom Lane  wrote:

> I'm not even 100% sold that automatically returning the primary key
> is going to save any application logic.  Could somebody point out
> *exactly* where an app is going to save effort with this type of
> syntax, compared to requesting the columns it wants by name?
> Is it going to save enough to justify depending on a syntax that won't
> be universal for a long time to come?
>

Well, in e.g. Hibernate there's a piece of code which calls
getGeneratedKeys() to fetch the inserted primary key (it only supports
fetching a single generated key column in this way) if the underlying
database supports that. The postgresql dialect specifies that it does
support that code path, so at the moment any hibernate users who aren't
explicitly specifying the "sequence" type for their id generation will be
calling that, and the JDBC driver will be appending "RETURNING *" under the
hood for all inserts.

Looking at the oracle hibernate dialect is instructive as to the state of
support for the explicit-column-list variation:

// Oracle driver reports to support getGeneratedKeys(), but they only
// support the version taking an array of the names of the columns to
// be returned (via its RETURNING clause).  No other driver seems to
// support this overloaded version.


And so hibernate doesn't support the explicit-column-list version at all
since apparently no-one else supports it, and just marks that code path as
unsupported for oracle. I presume that the situation is similar in other
java-based ORMs.

Looking at some other drivers that I would expect to support
getGeneratedKeys() in a sane way given their identity/auto-increment
semantics reveals:

 - JTDS driver for MSSQL/Sybase piggybacks a second query to do "SELECT
SCOPE_IDENTITY() AS blah" / "SELECT @@IDENTITY AS blah" to fetch the key if
that was requested. It looks like this driver does support specifying the
column name, but it only allows a single column to be given, and it
promptly ignores the passed in value and calls the non-specified version.

 - MySQL driver internally returns a single ID with the query result, and
the driver then appears to add an auto-increment amount to calculate the
rest of the values. I guess MySQL must allocate the ids in
guaranteed-sequential chunks. MySQL only supports a single auto-increment
key. If called with the explicit column version, the passed-in column names
are ignored.

So looks like other JDBC driver/server combos only support this for
single-column primary keys. But for those cases things pretty much work as
expected. It would be nice to be able to support at least primary keys with
this feature.

We could try to teach every ORM out there to call the explicit column-list
version, but given other lack of support for it I doubt they'll be
interested, especially if the reason is because we don't want to add enough
support to make getGeneratedKeys() work efficiently.

FWIW I reckon for most users of ORMs at least it will be enough to support
this for direct inserts to tables - views is a nice-to-have but I'd take
tables-only over not at all.

Cheers

Tom


Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension

2014-06-10 Thread Tom Dunstan
On 10 June 2014 17:49, Hannu Krosing  wrote:

> RETURNING GENERATED KEYS perhaps, but then how do we determine that?
>
>  What about RETURNING CHANGED FIELDS ?
>
> Might be quite complicated technically, but this is what is probably
> wanted.
>

Seems to be getting further away from something that describes the main use
case - changed fields sounds like something that would apply to an update
statement.

>  Any column that was filled with a default value? But that's potentially
> returning far more values than the user will want - I bet 99% of users just
> want their generated primary key.
>
>
> Probably not true - you would want your ORM model to be in sync with what
> is in the database after you save it if you plan to do any further processing
> using it.
>

Well, yes, but since RETURNING is non-standard most ORMs are unlikely to
support fetching other generated values that way anyway. The ones that I've
dealt with will do an insert, then a select to get the extra fields. I
don't know if other JDBC drivers allow applications to just specify any old
list of non-key columns to the execute method, but I suspect not, given
that the way they fetch those columns is rather less general-purpose than
our RETURNING syntax.


>
> The second paragraph refers to [3] and [4] where the application can
> specify which columns it's after. Given that there's a mechanism to specify
> which keys the application wants returned in the driver, and the driver in
> that case can just issue a RETURNING clause with a column list, my gut feel
> would be to just support returning primary keys as that will handle most
> cases of e.g. middleware like ORMs fetching that without needing to know
> the specific column names.
>
> Why not then just leave the whole thing as it is on server side, and let
> the ORM specify which "generated keys" it wants ?
>

Because java-based ORMs (at least) mostly don't have to - other
server/driver combos manage to implement getGeneratedKeys() without being
explicitly given a column list, they just do the sane thing and return the
appropriate identity column or whatever for the inserted row.

I agree that in hand-crafted JDBC there's no particular problem in making a
user specify a column list, (although I don't think I've EVER seen anyone
actually do that in the wild), but most middleware will expect
getGeneratedKeys() to just work and we should try to do something about
making that case work a bit more efficiently than it does now.

Cheers

Tom


Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension

2014-06-09 Thread Tom Dunstan
A definite +1 on this feature. A while ago I got partway through hacking
the hibernate postgres dialect to make it issue a RETURNING clause to spit
out the primary key before I realised that the driver was already doing a
RETURNING * internally.

On 10 June 2014 05:53, Jim Nasby  wrote:

> I was wondering that myself. I think it's certainly reasonable to expect
> someone would want RETURNING SEQUENCE VALUES, which would return the value
> of every column that owned a sequence (ie: ALTER SEQUENCE ... OWNED BY). ISTM
> that would certainly handle the performance aspect of this, and it sounds
> more in line with what I'd expect getGeneratedKeys() to do.

Keep in mind that not all generated keys come from sequences. Many people
have custom key generator functions, including UUIDs and other exotic
things like Instagram's setup [1].
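
A hypothetical example of such a key - generated, but with no sequence
involved, so a RETURNING SEQUENCE VALUES form would miss it (assumes
uuid_generate_v4() from the uuid-ossp extension):

CREATE TABLE account (
  id   uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  name text
);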

RETURNING GENERATED KEYS perhaps, but then how do we determine that? Any
column that was filled with a default value? But that's potentially
returning far more values than the user will want - I bet 99% of users just
want their generated primary key.

The spec is a bit vague [2]:

Retrieves any auto-generated keys created as a result of executing
this Statement object. If this Statement object did not generate any
keys, an empty ResultSet object is returned.

Note: If the columns which represent the auto-generated keys were
not specified, the JDBC driver implementation will determine the
columns which best represent the auto-generated keys.

The second paragraph refers to [3] and [4] where the application can
specify which columns it's after. Given that there's a mechanism to specify
which keys the application wants returned in the driver, and the driver in
that case can just issue a RETURNING clause with a column list, my gut feel
would be to just support returning primary keys as that will handle most
cases of e.g. middleware like ORMs fetching that without needing to know
the specific column names.

Cheers

Tom


[1]
http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
[2]
http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getGeneratedKeys()
[3]
http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#execute(java.lang.String,%20int[])
[4]
http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#execute(java.lang.String,%20java.lang.String[])


Re: [HACKERS] jsonb and nested hstore

2014-02-10 Thread Tom Dunstan
On 10 February 2014 20:11, Hannu Krosing  wrote:
> The fastest and lowest parsing cost format for "JSON" is tnetstrings
> http://tnetstrings.org/ why not use it as the binary wire format ?
>
> It would be as binary as it gets and still be generally parse-able by
> lots of different platforms, at least by all of those we care about.

If we do go down the binary encoding path in a future release, can I
please suggest *not* using something like tnetstrings, which suffers
the same problem that a few binary transport formats suffer,
particularly when they're developed by people whose native language
doesn't distinguish between byte arrays and strings - all strings are
considered byte arrays and it's up to an application to decide on
character encoding and which things are data vs strings in the
application.

This makes writing a parser in a language which does treat byte arrays
and strings differently very difficult, see e.g. the java tnetstrings
API [1] which is forced into treating strings as byte arrays until the
programmer then asks it to parse the thing again, but please treat
everything as a string this time. The msgpack people after much
wrangling have ended up issuing a new version of the protocol which
avoids this issue and which they are strongly encouraging users to
switch to, see [2] for the gory details.

While we may not ever store types in our jsonb format other than the
standard json data types (I can foresee people wanting to do it,
though), I would strongly recommend picking a format which at least is
clear that a value is a string (text, whatever), and preferably makes
it clear what the character encoding is. Or maybe it should just
follow whatever the client encoding is at the time - as long as that
is completely unambiguous to a client.

Cheers

Tom

[1] https://github.com/asinger/tnetstringsj
[2] https://github.com/msgpack/msgpack/issues/128




Re: [HACKERS] extension_control_path

2014-01-14 Thread Tom Dunstan
On 15 January 2014 03:07, Stephen Frost  wrote:
> For my 2c, I could absolutely see it as being worthwhile to have an
> independent directory to install not-from-package extensions.  That
> would keep things which are "managed by the package system" and things
> which are installed independent separate, which is absolutely a good
> thing, imv.

Another use case previously mentioned is that it makes
user-installable extensions for developer-targeted bundles like
Postgres.app possible. Postgres.app ships with contrib and a few other
extensions by default, but if you want to install more, you have to
chuck them into extensions dir inside the app bundle itself, so minor
updates which replace the bundle will then lose your installed
extensions. A nicer approach would be to allow it to also look for
extensions under ~/Library/ as well as in the bundled distribution,
but that's not possible if postgres only looks in one place, short of
hand hacking fs links in the extension dir.

Cheers

Tom




Re: [HACKERS] Proposed feature: Selective Foreign Keys

2013-12-04 Thread Tom Dunstan

On 5 Dec 2013, at 03:48, Andrew Dunstan  wrote:
>>> Well I guess we could say something like:
>>> 
>>> FOREIGN KEY (a-col) WHERE (a-condition) REFERENCES b(b-col) WHERE
>>> (b-condition)
>>> 
> 
> OK, those make sense. I wonder whether this should be done via a USING clause 
> on the constraint that pointed to the partial unique index. Or would that be 
> too obscure?

Well you could put a USING clause on the end and it would read pretty 
unambiguously. Requiring that the user specify it rather than trying to guess 
which index to use would also probably be an easier path to getting that 
feature in, at least for a first cut.
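
A rough rendering of that shape, purely illustrative - none of it is
implemented, and the trailing USING clause naming the partial unique index is
only the idea floated above:

ALTER TABLE a ADD CONSTRAINT a_b_fk
  FOREIGN KEY (a_col) WHERE (a_condition)
  REFERENCES b (b_col) WHERE (b_condition)
  USING INDEX b_partial_unique_idx;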

I won’t be doing work towards putting a where clause on the referenced side 
just yet, though. One thing at a time.

Cheers

Tom







Re: [HACKERS] Proposed feature: Selective Foreign Keys

2013-12-04 Thread Tom Dunstan

On 5 Dec 2013, at 06:10, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> Well I guess we could say something like:
>>
>> FOREIGN KEY (a-col) WHERE (a-condition) REFERENCES b(b-col) WHERE
>> (b-condition)
>
> I like what you have above.

Yeah. Given both the apparent ambiguity of the current placement, and the fact 
that the current placement would be right where you’d put a where clause on the 
referenced table, that’s the only sane way to do it. And it’s not so bad.

Cheers

Tom



Re: [HACKERS] Proposed feature: Selective Foreign Keys

2013-12-03 Thread Tom Dunstan
On 4 December 2013 01:24, Robert Haas  wrote:
> Yeah, more or less, but the key is ensuring that it wouldn't let you
> create the constraint in the first place if the partial index
> specified *didn't* match the WHERE clause.  For example, suppose the
> partial index says WHERE parent_entity = 'event' but the constraint
> definition is WHERE parent_event = 'somethingelse'.  That ought to
> fail, just as creating a regular foreign constraint will fail if
> there's no matching unique index.

The where clause only applies to queries against the FK table, and we
don’t currently fail if there isn’t a matching index on the fk column
when creating a FK (I’ve been bitten by that before).

We fail if there isn’t a unique index on the referenced
table/column(s), but queries against that table on insert/update not
the FK table are unchanged (save that we don’t bother with them at all
if the where clause expression fails for the given tuple).

Cheers

Tom




Re: [HACKERS] Proposed feature: Selective Foreign Keys

2013-12-02 Thread Tom Dunstan
On 3 Dec 2013, at 12:37, Tom Lane  wrote:

> Tom Dunstan  writes:
>> Well, with this patch, under the hood the FK query is doing (in the case of 
>> RESTRICT):
> 
>> SELECT 1 FROM ONLY "public"."comment" x WHERE (the id) 
>> OPERATOR(pg_catalog.=) "parent_id" AND (parent_entity = 'event') FOR KEY 
>> SHARE OF x;
> 
> Hm.  The RI trigger code goes to extreme lengths to make sure that the
> query strings it generates will invoke exactly the operators it intends
> --- thus the OPERATOR(pg_catalog.=) syntax, which would otherwise be
> merely overkill.  The added text you are showing above seems trivially
> vulnerable to unwanted behavior and even security bugs, if executed in
> say an unexpected search_path context.  I am not sure that we have the
> technology to automatically make arbitrary expressions proof against that
> sort of hazard, but in any case you don't seem to be trying very hard.

Not trying at all.

The source for that clause has come out of deparse_expression - it doesn’t seem 
like it would be totally impossible to create a deparse_expression_qualified 
which could get get_rule_expr to emit fully qualified operators, functions etc. 
I can see that my regression testing to test for different expressions is going 
to get quite large to guard against this stuff. Thanks for the pointer.

> Another issue that would need to be thought about is trojan-horse
> functions in the WHERE clause.  IIRC, RI trigger queries might run as the
> owner of either the referencing or the referenced tables.  If those two
> don't fully trust each other then this is opening the gates for mischief.

On insert / update in the table with the fk, the where clause is evaluated 
separately first against the current trigger tuple. If the check passed, we 
then do the check against the referenced table (as that table’s owner), which 
is unchanged from now and does not have a where clause.

For updates / deletes on the referenced table, the queries are run against the 
FK table, as that table’s owner, with the extra where clause. But given that 
the FK is defined on that table, I’m not sure that I see a problem - presumably 
the table’s owner was the one who created the FK, right?

Am I missing a scenario here?

Thanks

Tom



Re: [HACKERS] Extension Templates S03E11

2013-12-02 Thread Tom Dunstan
On 3 December 2013 12:12, Tom Lane  wrote:
> "Problem"?  It's not a bug that you get hstore 1.2 when you dump from 9.2
> and reload into 9.3; that's a feature.  You wanted an upgrade, presumably,
> or you'd not have been going to 9.3 in the first place.  The entire reason
> why the extension mechanism works like it does is to allow that sort of
> reasonably-transparent upgrade.  It would not be a step forward to break
> that by having pg_dump prevent it (which it would fail to do anyway,
> likely, since the receiving installation might not have 1.1 available
> to install).

Well, I was suggesting further down that extensions which we'd like to
consider built-in because they make guarantees about backwards
compatibility should simply be versionless, since we expect a
transparent upgrade, and as you point out the old version isn't likely
to be there. So maybe hstore wasn't a good example. Those should
continue to be dumped without versions.

(The counter argument is that if hstore 1.2 has awesome new features,
why should people wait until the next major version to enjoy them,
unless they depend on something changing in core?)

Extensions in contrib live in a weird place. Totally builtin stuff
should obviously be dumped without versions, and stuff which is
completely separate and follows its own release schedule should
obviously be versioned. I guess we consider all modules in contrib to
offer the same transparent upgrade guarantees as builtins, so they
shouldn't be versioned, but it feels like some of them should be, if
only because some aren't particularly tied in to the backend all that
tightly. But I guess that's a bogus metric, the true metric is whether
we want people to treat them as basically built-in, with the upgrade
guarantees that go along with that.

Cheers

Tom




Re: [HACKERS] Extension Templates S03E11

2013-12-02 Thread Tom Dunstan
On 3 December 2013 02:02, Dimitri Fontaine  wrote:
> Stephen Frost  writes:
>> On the other hand, I can appreciate the concern that we don't really
>> want a dump/restore to include the extension definition when it's
>> already on the filesystem.  That said, it amazes me that we don't
>> include the version # of the extension in pg_dump's 'CREATE EXTENSION'
>> command..  How is that not a problem?
>
> Including the version number would be a problem.
>
> When you install PostgreSQL 9.1, you only have hstore 1.0.
> When you install PostgreSQL 9.2, you only have hstore 1.1.
> When you install PostgreSQL 9.3, you only have hstore 1.2.

ISTM that the real solution to this particular problem is to decouple
the extensions that are currently in contrib from a specific postgres
version. We have an extension mechanism now, and a distribution
mechanism (which people may or may not like, personally I'd still
rather install rpms) so why do we still need to ship these things as a
blessed bundle which is tied to a specific release?

If things were split out, it would be much easier for extension
authors to maintain branches targeting different major versions of
pgsql. Users could then upgrade those separately from upgrading the
major db version.

If this were considered seriously, we could still package up a contrib
tarball with the same set as we have now, and for testing we could
either teach the buildfarm client to pull the contrib modules from
their respective homes, or replace the contrib dirs with git submodule
refs.

I suppose there's a case to be made that there will always be some
extensions which are inseparable - plpgsql is the obvious case - but
we do go to a fair bit of effort to keep backward compatibility in
that case. So the version number issue isn't as much of a deal. Most
extensions we'd expect to be more fluid, and we'd expect users to
upgrade at (perhaps) a different rate to the main db. Is there a case
for a small number of "built-in" extensions being versionless, but
everything else requiring a version number AND dumping out with the
version number?
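
Concretely, the question is which of these two forms pg_dump should emit. The
first is roughly what it produces today; VERSION is existing CREATE EXTENSION
syntax:

CREATE EXTENSION IF NOT EXISTS hstore WITH SCHEMA public;
CREATE EXTENSION IF NOT EXISTS hstore WITH SCHEMA public VERSION '1.1';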

Cheers

Tom




Re: [HACKERS] Proposed feature: Selective Foreign Keys

2013-12-02 Thread Tom Dunstan

On 3 Dec 2013, at 03:37, Robert Haas  wrote:
> I also like this feature.   It would be really neat if a FOREIGN KEY
> constraint with a WHERE clause could use a *partial* index on the
> foreign table provided that the index would be guaranteed to be predOK
> for all versions of the foreign key checking query.  That might be
> hard to implement, though.

Well, with this patch, under the hood the FK query is doing (in the case of 
RESTRICT):

SELECT 1 FROM ONLY "public"."comment" x WHERE (the id) OPERATOR(pg_catalog.=) 
"parent_id" AND (parent_entity = 'event') FOR KEY SHARE OF x;

If we stick a partial index on the column, disable seq scans and run the query, 
we get:

tom=# create index comment_event_id on comment (parent_id) where parent_entity 
= 'event';
CREATE INDEX
tom=# set enable_seqscan = off;
SET
tom=# explain SELECT 1 FROM ONLY "public"."comment" x WHERE 20 
OPERATOR(pg_catalog.=) "parent_id" AND (parent_entity = 'event') FOR KEY SHARE 
OF x;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 LockRows  (cost=0.12..8.15 rows=1 width=6)
   ->  Index Scan using comment_event_id on comment x  (cost=0.12..8.14 rows=1 width=6)
         Index Cond: (20 = parent_id)
         Filter: (parent_entity = 'event'::commentable_entity)
(4 rows)

Is that what you had in mind?

> Whether that works or not, it seems to me that a good deal of thought
> will need to be given to what dependencies get created when creating a
> constraint of this type.

Hmm, yeah I hadn’t thought about that. OTOH, it seems that at least some of the 
expected functionality works anyway:

tom=# alter table comment drop column parent_entity ;
ERROR:  cannot drop table comment column parent_entity because other objects 
depend on it
DETAIL:  constraint comment_blog_fk on table comment depends on table comment 
column parent_entity
constraint comment_event_fk on table comment depends on table comment column 
parent_entity

I guess those bits that I copied from the check constraint code must have 
included creating the appropriate pg_depend entries. :)

I’ll add some more checks to the regression tests.

Did you have other scenarios in mind?

Thanks

Tom





Re: [HACKERS] Proposed feature: Selective Foreign Keys

2013-12-02 Thread Tom Dunstan

On 3 Dec 2013, at 01:34, Andrew Dunstan  wrote:

> We wanted to apply FK constraints to a very large table, but grandfather in 
> certain cases that didn't meet the constraint. That could have been done very 
> simply using this feature.

Yeah, references to old data is the other obvious case for conditionally 
turning foreign keys off. I decided against using it as an example when trying 
to sell the feature as there are much easier workarounds than in the 
different-parent case, and as a solution it really is a bit more ugly. But I 
suspect that it’s one that people have to deal with fairly regularly.

It might encourage people to keep e.g. old ids in the same column when they 
should be putting them in a separate one, but then sometimes you don’t get to 
just rewrite the schema as there are legacy systems etc etc.

Cheers

Tom



Re: [HACKERS] Proposed feature: Selective Foreign Keys

2013-12-02 Thread Tom Dunstan
Hi Laurenz!

On 2 Dec 2013, at 19:27, Albe Laurenz  wrote:
> What strikes me is that since foreign key constraints are implemented
> as triggers in PostgreSQL, this solution would probably not have many
> performance benefits over a self-written trigger that implements the
> same functionality.  Since you need two triggers for your example,
> the performance might even be worse than a single self-written trigger.

Well, the main cost on insert in the FK table should be looking for matching 
rows in the referenced tables, which the patch avoids for non-matching rows. So 
while you’ll get the overhead of N triggers firing, you only pay the expected 
query cost (which will even use a partial index if you’ve got one set up). Each 
of the referenced tables is only involved in one FK, so there’s no difference 
in cost there.

> Now performance isn't everything, but that would mean that the benefit
> of your proposal is entirely on the usability side.

Well, I don’t think there’s much of a performance hit, and I don’t think any of 
the alternatives would perform much better in practice, but certainly 
performance wasn’t  a motivating factor for this feature, it was a) correctness 
and b) avoiding the ugliness of the existing solutions. 

> I personally don't think that it is so difficult to write a trigger
> for that functionality yourself, but I guess that the argument for
> this feature rests on how coveted such a functionality would be
> (to justify the trade-off in code complexity).

The patch is pretty small so far - and more than half of it is regression 
tests. So there’s not much extra code complexity IMO. I wouldn’t want to touch 
the FK system with anything but the lightest touch.

Cheers

Tom



[HACKERS] Proposed feature: Selective Foreign Keys

2013-11-27 Thread Tom Dunstan
Hi all!

The Problem
-----------
One case that traditional SQL doesn't handle very well is when you have a child 
entity which can be attached to a number of different parent entities. Examples 
might be comments, tags or file attachments - we might have 20 different 
entities in the system that we would like our users to be able add comments to, 
but the existing solutions for mapping this all have downsides.

Existing solution 1: Join tables ahoy
If I can have a list of comments on every other object in the system, and I 
want to have referrential integrity, then the obvious thing to do is create a 
join table between each entity and the comments table.
Pros:
 - Straight forward, traditional object-with-collection-of-child SQL structure
Cons:
 - If a parent object gets deleted here, we can't use foreign keys to delete 
e.g. a child comment, so we'll have to either explicitly do it as part of our 
delete logic or have a cleanup process to catch orphans. Or do a dance with 
delete triggers on the join tables deleting the comment.
 - For n entities requiring comments in the system, we need n join tables. If we 
want both comments and e.g. tags and likes on all of our entities, we now have 
3n join tables for what should be some relatively self-contained on-the-side 
data - this could be more tables than the entire rest of the system
 - It's difficult to create any kind of self-contained component for building 
applications in this scenario, as it will need to know about every other entity 
in the system, or be able to poke around inside whatever ORM or data access 
system that you have to work out what join tables it needs when running queries.
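
A sketch of this approach for a single parent entity (hypothetical table
names, mirroring the blog/comment tables used in the proposal below); one such
table is needed per commentable entity:

CREATE TABLE blog_comment (
  blog_id    int8 NOT NULL REFERENCES blog (id) ON DELETE CASCADE,
  comment_id int8 NOT NULL REFERENCES comment (id),
  PRIMARY KEY (blog_id, comment_id)
);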

Existing solution 2: Enter the matrix
Instead of having n join tables, let's just mash them all together, with a 
column per parent object, and a check constraint to force exactly one of those 
columns to be set.
Pros:
 - Less bloat in the number of tables
Cons:
 - Doesn't solve orphan problem
 - Addition of a new entity which needs comments and we now have to add another 
column onto it, potentially rewriting the whole thing
 - Ugly

Existing solution 3: Embed the matrix
Kinda like the dependency matrix table, except that all the columns referencing 
potential parent objects we put into the comment table instead.
Pros:
 - Everything contained in column table
 - No orphans, since cascaded deletes will now delete the actual comment
Cons:
 - Comment table now has references to every single type that it may be 
attached to
 - Addition of a new entity and we probably have to rewrite the comment table 
now

Existing solution 4: Abandon ye all referential integrity
Have a column indicating parent type and another one for the id. In the case of 
comments this would be directly on the comment table itself. In the case of 
something like tags that we might expect to be shared between entities, it 
would be in a single join table. 
Pros:
 - Pretty self-contained
 - Data model which has neither lots of empty columns or lots of tables
 - Can make new entities "commentable" without rewriting anything
 - Because it's self-contained, can build application components that don't 
need to know much about the rest of your system. For example this is the 
approach that the grails taggable and commentable plugins take.
Cons:
 - No referential integrity, since we can't have a single column pointing to 
different tables with existing foreign key infrastructure
 - Since there's no real db support for doing things this way, existing ORMs 
etc don't really know how to use a single column to join against multiple 
different tables based on a discriminator or 'switch' column.

Existing solution 5: Everything's a thing
Make your entity hierarchy have a base level object which can have comments 
attached, and then everything that you need to be "commentable" has to extend 
that. You can do that in an ORM, or with table inheritance in the database.
Pros:
 - Single top-level thing to hang your data on
Cons:
 - You've polluted your object hierarchy just to hang some stuff off of the end 
of it rather than it being driven by behaviours
 - You're going to be paying a performance penalty - everything that extends 
that base level object will now join against it incessantly, and you now have a 
global id sequence or whatever that you may not want.

Basically none of the above handle the situation very well. The cleanest is 
solution 4, but lack of RI sucks.

Feature Proposal: Selective foreign keys.
-----------------------------------------
Allow foreign keys to have where clauses. The above comment example using 
solution 4 might then look like then following:

CREATE TABLE comment (
  id bigserial primary key,
  content text not null,
  parent_entity regclass not null,
  parent_id int8
);
ALTER TABLE comment ADD CONSTRAINT comment_blog_fk FOREIGN KEY (parent_id) 
REFERENCES blog(id) WHERE (parent_entity = 'blog');
ALTER TABLE comment ADD CONSTRAINT comment_event_fk FOREIGN KEY (parent_id) 
REFERENCES event(id) WHERE (parent_entity = 'event');

[HACKERS] Evaluate arbitrary expression on tuple inside trigger function?

2013-07-28 Thread Tom Dunstan
Hi all

I'm trying to hack a trigger function to evaluate an expression on the
tuple that the trigger has been fired for, kinda like a check
constraint. I looked at ExecRelCheck in execMain.c which does
more-or-less what I want to do, and I have the parsed node tree all
ready to go. The problem that I'm facing is that ExecRelCheck uses a
passed in EState to set up the executor in the right mode, and with
the right memory context, but the EState doesn't get passed in to the
trigger function, and I can't see anything obvious hanging off the
TriggerData that does get passed in that would give me access to it.

Can anyone either point me to where I might be able to get a handle on
the current EState, or otherwise recommend a way forward?

Thanks

Tom




Re: [HACKERS] Patch to add support of "IF NOT EXISTS" to others "CREATE" statements

2013-06-12 Thread Tom Dunstan
On 13 June 2013 04:30, Peter Eisentraut  wrote:

> I'm wondering where "IF NOT EXISTS" and "OR REPLACE" will meet.
>

CREATE OR REPLACE (or ALTER / UPDATE ?) would definitely be useful for
enums, where it would be nice if we could teach an ORM to generate DDL
based on the current values of the enum in code, and know that after the
operation had completed, the database enum type matched the code enum type.
I don't think a sequence of ALTER TYPE ADD VALUE IF NOT EXISTS quite does
the trick, as it doesn't guarantee that the db enum is in the same order as
the code enum, which may or may not be important. I'd expect a CREATE OR
ALTER for enums to raise an error if any of the elements were out of order.
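
For comparison, what exists versus what is wished for here (the mood enum is
hypothetical, and the CREATE OR REPLACE form is not implemented - it is just
the behaviour sketched above):

-- Available today: idempotent, but ordering must be managed by hand.
ALTER TYPE mood ADD VALUE IF NOT EXISTS 'ecstatic' AFTER 'happy';

-- Wished for: declare the full, ordered value list and have the database
-- converge on it, erroring out if existing values are in a different order.
CREATE OR REPLACE TYPE mood AS ENUM ('happy', 'ecstatic', 'meh', 'sad');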

Currently to get to a known state for enums you have to write manual
migration scripts, and while that tends to be how I roll anyway, often when
starting projects in rails / grails / hibernate etc people rely on db
schemas generated by the framework as it lets them prototype with less
mucking around. It would be nice for those frameworks to be able to
generate enum types in a known state.

Cheers

Tom


Re: [HACKERS] Configurable location for extension .control files

2013-06-12 Thread Tom Dunstan
On 12 June 2013 17:30, Dave Page  wrote:

> Messing with the path (or the dynamic load path) can cause all sorts
> of fun and interesting problems for users, as we found in the early
> days with the EDB installers. I realise it doesn't help these users
> (who doubtless don't know it exists) but what we do these days is drop
> a "pg_env.sh" file in the installation directory that the user can
> source to set their PATH and various PG* environment variables when
> they need/want to.
>

Well, I was imagining something like a helpful dialog box saying "would you
like me to fix your path? I'm just going to source
/Applications/Postgres.app/env.sh in your bash_profile" and the user can
click "ok" or "no thanks I'll do it myself". It might lead to even more
confusion, but it's got to be better than the pg gem silently linking
against the wrong libpq and subsequently failing in interesting ways.

Of course, if they've already installed the pg gem then it's too late
anyway, but at least reinstalling it would then work.

Blech. The more I think about it, the more I like the idea of libpq bundled
with the gem.

Cheers

Tom


Re: [HACKERS] Configurable location for extension .control files

2013-06-12 Thread Tom Dunstan
On 12 June 2013 16:30, Craig Ringer  wrote:

> None of this is hard if you have a clue what you're doing. Rebuild the Pg
> gem against the right libpq by fixing your PATH so it finds the right
> pg_config, set host=/tmp, or set host=localhost. Any of the three will
> work. Unfortunately most of these users seem to struggle with that, and
> their approach to "it didn't work" appears to be "find another
> tool/tutorial and try that instead".
>

So we need an "official" tutorial? But which distribution would we point
people to? :)


> The postgres.app documentation its self doesn't look quite right when it
> comes to Ruby, actually. For Ruby/Rails it says the user should use "gem
> install pg" but it doesn't tell them to set the PATH first, so they'll get
> whatever random pg_config is on the PATH first, often Apple's elderly Pg
> with its different socket directory path, etc. Sure, they can get around
> that just by setting host: localhost, but it'd be nice to see that improved
> so it tells them how to build their Pg gem against the correct libpq. Or,
> better, has Postgres.app automatically install it for them when they
> install it.
>

Hmm, but where to install it? People using rvm or bundler will have their
gems tucked away in a variety of places.

Cheers

Tom


Re: [HACKERS] Configurable location for extension .control files

2013-06-12 Thread Tom Dunstan
On 12 June 2013 16:12, Craig Ringer  wrote:

> Yes, they do - including a horde of deeply confused and frustrated Rails
> users struggling to understand why they're getting "no such file or
> directory" or "permission denied" messages about Pg's unix socket,
> because of course they're linked to Apple's libpq which has a different
> default unix socket path, and unless they explicitly specify `host:
> localhost` in their Rails database.yml they get a unix socket connection.
>

Maybe we could ask the rails people to stick a "host: localhost" into their
postgresql examples? Might help a bit, and most users at that level won't
need the absolutely most up-to-date libpq. Of course, there are probably
hundreds of tutorials all over the net that won't have the fix.


> Another option would be to have to explicitly allow use of Apple's
> PostgreSQL (based on known install paths) though; think
> "--use-system-postgresql".
>

Yeah, or --allow-old-libpq or something like that. How will the gem
installer know if the pg_config that it has found is a system one or not?
Going by version number is probably easier.


> I get the strong impression from what I've been reading that a fairly
> typical Rails user setup is:
>
> * Install homebrew
> * Install PostgreSQL using homebrew but don't start it
> * Build the Pg gem against homebrew postgresql's libpq
> * Download and run postgres.app
> * Run your Pg gem using the libpq from homebrew against the postgres.app
> server
>
> Ugh.
>

Hmm. Seems like all the more reason to steer people away from socket-based
comms.


> Good point... though that also raises more concerns regarding consumers
> of the Pg library. And extensions, for that matter; if extensions are
> out-of-tree you need versioned subdirectories, otherwise you'll have
> conflicts between 9.2 and 9.3 (for example) versions of the same
> extensions.
>

Right. I was picturing something like
~/Library/Postgres.app/9.2/extensions. We shouldn't be breaking extensions
within a major release.


> It's also another issue with libpq. User upgrades Postgres.app and
> suddenly their Ruby gems stop working with some linkage error they
> probably don't understand.
>
> (All this is, IMO, really a lesson in why Apple should introduce a
> non-awful packaging system into OS X).


Well, I'm not holding my breath on their packaging changing anytime soon. :)

I wonder if a better approach might be to actually have the gem bundle its
own copy of libpq. Then there's no question of linkage errors when the
server on the system changes, and if users are using tcp rather than unix
sockets, they should be pretty well insulated from those sorts of issues.
Just specify the right port and you're good to go.

Is libpq buildable without building the whole tree? Is it downloadable
without downloading the whole distribution? Hmm.

Cheers

Tom


Re: [HACKERS] Configurable location for extension .control files

2013-06-11 Thread Tom Dunstan
On 12 June 2013 14:19, Craig Ringer  wrote:

> Postgres.app is the source of quite a lot of other pain too, though. One
> of the bigger problems is that people want/need to link to its libpq
> from client drivers like Ruby's Pg gem, but almost inevitably instead
> link to libpq from Apple's ancient pre-installed PostgreSQL.
>

Oh, interesting. Do the ruby/rails folks use that rather than a pure-ruby
driver? I guess I'm spoiled - most of my development happens on the JVM,
and the JDBC driver doesn't use libpq.


> Without a solution to how to sanely share the client libraries I'm not
> sure private-tree-packaged PostgreSQL is interesting enough to really
> worry about making extensions easier to install.


Hmm, so what might a sane solution look like? It looks like the proper way
to build the pg gem is to specify the full path to pg_config. Maybe we
could convince the pg gem authors to error out if the found version of
postgresql is too old? I presume that we only discover the problems when
someone tries to actually use the driver - or do we find out at gem
installation time?

Another alternative is for the Postgres.app to add its bin dir to the
user's (or system's) path on first startup. Then the correct pg_config will
be found (and the correct psql, pgdump etc etc as well). The app could in
theory even go looking for existing pg gem installed under .rvm or whatever
and prompt the user to reinstall the gem.


> After all, users can
> currently just open Postgres.app as a folder and drop the exts in there,
> or use PGXS and "make install", just like usual.
>

They can do either of those, but if they then upgrade the app, I presume
that the extensions will disappear, and they'll need to rebuild or
reinstall them, which is a bit of a pain.

Cheers

Tom


Re: [HACKERS] Configurable location for extension .control files

2013-06-11 Thread Tom Dunstan
Hi Josh

On 11 June 2013 04:37, Josh Berkus  wrote:

> I don't personally see a reason for plural locations, but it would be
> nice if it recursed (that is, looked for .so's in subdirectories).  My
> reason for this is that I work on applications which have in-house
> extensions as well as public ones, and I'd like to keep the two
> separated by directory.


I gave one example of a use-case for multiple directories upthread - the
Postgres.app mac app has contrib, plv8 and postgis bundled under its
application folder, but it would be nice to allow users to drop extra
extensions under ~/Library/Postgres.app somewhere.

If we had directory recursion then you could sort of fake it using symlinks
(as long as we follow the symlinks) but it's pretty messy, the wrapper app
would have to make a dir under ~/Library the actual extension dir and have
a symlink back to its bundled extensions. Not the end of the world though.

For any of that to work the dir (or dirs) would need to be settable by
config or startup option - compiled in wouldn't cut it, since the absolute
dir of the end user's home directory isn't known at compile time.

Cheers

Tom


Re: [HACKERS] Configurable location for extension .control files

2013-06-04 Thread Tom Dunstan
On 5 June 2013 05:58, Andres Freund  wrote:

> Yea, I know of Dimitri's work ;). But I really would like this to work
> for C extensions as well. For me personally its no problem at all that
> this wouldn't work on conservatively configured machines. Heck, I
> *don't* want it to work on production machines. But being able to
> configure a dev machine to allow it would be very helpful.


Just the other day I was looking at what it would take to drop some extra
compiled extensions somewhere that the Mac Postgres.app could find them,
and was mildly disappointed to see that it would have to be inside the app
bundle itself, so they would then disappear on upgrade etc. The more maccy
way to install extensions for a user app generally is to stick them under
~/Library/.

Note that Postgres.app is very much an application developer workstation
targeted distribution, so the security issues don't really come into the
picture. It would also be enough in this case to allow multiple paths to be
compiled in rather than pulled from postgresql.conf, but either would work.

Cheers

Tom


Re: [HACKERS] [JDBC] JPA + enum == Exception

2013-02-07 Thread Tom Dunstan
> -Original Message-
> From: pgsql-jdbc-ow...@postgresql.org 
> [mailto:pgsql-jdbc-ow...@postgresql.org] On Behalf Of Marc G. Fournier
> I'm trying to use enum's in a database, but the java guys are telling me that 
> they are having problems with inserts ...
> reading from the database isn't a problem, but there appears to be an issue 
> with converting from string -> enum when saving it back again ...

This is interesting; it seems to be a difference between executing the
SQL directly and using a prepared statement:

tomtest=# create type mood as enum ('happy', 'meh', 'sad');
CREATE TYPE
tomtest=# create table enumcast  (current_mood mood);
CREATE TABLE
tomtest=# insert into enumcast values ('sad');
INSERT 0 1
tomtest=# select * from enumcast ;
 current_mood
--------------
 sad
(1 row)


That works ok, but when attempting to use a prepared statement:

ps = con.prepareStatement("insert into enumcast values (?)");
ps.setString(1, "meh");
ps.executeUpdate();

we get a

org.postgresql.util.PSQLException: ERROR: column "current_mood" is of
type mood but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.

Cue sad trombone. You can fix this with implicit casts using CREATE
CAST, or an explicit cast in the query, but this shouldn't really be
necessary for what is a basic use case for enums. In any case ORMs
won't know how to do that without writing custom converters, which
makes me sad. I had intended that ORMs could just treat enum fields as
text fields basically and not have to care about the underlying
implementation.
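
For the record, the workarounds look something like this (untested). Either
cast explicitly in the statement that the driver prepares:

insert into enumcast values (?::mood)

or create a one-off implicit cast from varchar (which is what the driver
sends) to the enum type:

CREATE CAST (varchar AS mood) WITH INOUT AS IMPLICIT;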

Cc'ing hackers - why the difference here? I presume that the input
function is getting triggered when the value is inline in the SQL, but
not so when the statement is prepared. Should we consider creating an
implicit cast from text to enums when we create an enum? Or is there
some other way to get the expected behaviour here?

Cheers

Tom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL extensions packaging

2008-07-23 Thread Tom Dunstan
Oops, sent with wrong from header...


-- Forwarded message --
From: "Tom Dunstan" <[EMAIL PROTECTED]>
To: "Dimitri Fontaine" <[EMAIL PROTECTED]>
Date: Wed, 23 Jul 2008 19:40:30 -0400
Subject: Re: [HACKERS] PostgreSQL extensions packaging
Hi!

On Wed, Jul 23, 2008 at 5:08 PM, Dimitri Fontaine
<[EMAIL PROTECTED]> wrote:
> I promised to have an in-depth look at the archives before to spend time on
> my ideas for $subject, but failed to do so.

I guess that means you missed both the original discussion at
http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php and
my initial patch in that direction and subsequent discussion at
http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php then
:(.

There were two core components to my idea of modules/packages:
 - Separation of installation at an OS level (RPM/yum, deb/dpkg, MSI
installer etc) and installation into a database. The intention was a)
to standardize package installation generally so that users didn't
have to read n different sets of installation instructions for n
different packages, and b) so that a db owner could install into their
own database any module that had been installed on the system, even if
that might include e.g. C functions that they otherwise would not be
able to install without being a superuser.

- Have dependency tracking so that pg_dump could emit e.g. "LOAD
MODULE foo;" rather than all the different instructions to recreate
the module.

So the proposed installation procedure would be more along the lines of:

yum install postgresql-module-postgis
echo "load module postgis" | psql mydb

My intention was to use whatever native package manager was
appropriate for your distro rather than trying to recreate CPAN,
although some people in the original discussion wanted to go down that
route.

The patch that I provided didn't do any of the dependency stuff yet -
I had been investigating various ways to do it automagically, although
I haven't worked on it for a little while. It may be that the straight
forward explicit declaration that you have here is a better way to do
it.

I didn't have versioning and interdependencies between modules yet,
although it's an obvious extension to the idea.

> A package can also host variables, which visibility are
> package global: any SQL into the package can refer directly to package
> variables.

That was way out of scope for my more modest suggestion - I certainly
wasn't going to change pl/pgsql semantics. For example, how do those
variables behave upon a transaction rollback?

> Now, what would be really good to have would be this pg_pkg command I was
> dreaming about in another -hacker mail:

This turns into recreating CPAN. I like the idea of a "blessed" set of
packages, but would rather not require all postgresql users to have a
full build environment (especially on windows) and have to replace
their native packaging solution. It seems that you agree that
fetching/installing should be separate from loading/installing into
the database. Good. Some posters on the original thread were
suggesting that the fetch/install step should somehow do the database
installation as well, which sounded like a huge can of worms.


I think that we can come up with a package/module format that allows
installation at the OS level without demanding a whole set of download
/ build machinery. If someone then wants to build that and have it
install packages, then fine, but we definitely should not require it
to be able to install stuff.

Look forward to your comments

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash in pgCrypto?

2008-06-17 Thread Tom Dunstan
Coming to this thread a bit late as I've been out of email
connectivity for the past week...

On Tue, Jun 17, 2008 at 2:43 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> In any case, trying to define a module as a schema doesn't help at all
> to solve the hard problem, which is how to get this stuff to play nice
> with pg_dump.  I think that the agreed-on solution was that pg_dump
> should emit some kind of "LOAD MODULE foo" command, and *not* dump any
> of the individual objects in the module.  We can't have that if we try
> to equate modules with schemas instead of making them a new kind of
> object.

This is certainly the end result that I'd like, and intend to work
towards. My main concern has been cases where a module-owned table
gets updated with data that would not be recreated/updated by the LOAD
MODULE in the dump. PostGIS support tables are one example of this,
PL/Java classpath / function information is another. There are
probably many more.

I see two potential solutions:

a) explicitly mark such tables as requiring data to be dumped somehow,
and have pg_dump emit "upsert" statements for all rows in the table (a
rough sketch of such a statement is below).

b) allow modules to define a function that pg_dump can call to
emit appropriate extra restore commands, above whatever LOAD MODULE
foo does. This has the downside of requiring more work from module
owners (though perhaps a default function that effectively does option
a) could be provided), with a potential upside of allowing module
dumps to become upgrade-friendly by not being tied to a particular
version's table layout.
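
To make option a) concrete, the emitted statements might look something like
this (illustrative only - the table and values are borrowed from postgis's
spatial_ref_sys, and real output would need to handle column lists and
quoting properly):

INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid, srtext, proj4text)
SELECT 4326, 'EPSG', 4326, '...', '...'
WHERE NOT EXISTS (SELECT 1 FROM spatial_ref_sys WHERE srid = 4326);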

Thoughts?

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] Database owner installable modules patch

2008-05-11 Thread Tom Dunstan
On Sat, May 10, 2008 at 11:02 AM, Bruce Momjian <[EMAIL PROTECTED]> wrote:
>
> Where are we on this?

I haven't had time to do any work since the original patch. That patch
was fairly basic - it just ran install / uninstall scripts and created
catalog entries, and introduced some slightly exotic syntax to do it
(INSTALL/UNINSTALL vs CREATE/DROP). The next version is intended to
handle dependencies, which should make uninstallation straightforward
for most cases. I was intending to revert the syntax creativity and
make the commands CREATE/DROP too.

I'll get a bit of time to look at both this and the enum patch this week.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Protection from SQL injection

2008-04-29 Thread Tom Dunstan
On Tue, Apr 29, 2008 at 12:25 AM, Thomas Mueller
<[EMAIL PROTECTED]> wrote:

>  What do you think about it? Do you think it makes sense to implement
>  this security feature in PostgreSQL as well? If not why not? Does
>  PostgreSQL have another solution or plan to solve the SQL injection
>  problem?

Damn, am I the only person who likes the idea?

To those suggesting that it's just treating the symptom: well of
course it is. But measures like Exec-Shield, PIE and stack protection
weren't bad ideas just because buffer overflows are the fault of the
application developer. And who wants to grep through every module they
install on their system every time they do an update just in case some
feature that they never use has added a bad query? Assuming they have
the source. PHP apps are notorious for it, of course, but it isn't
just them.

Now, I reckon the only way to sanely do it without mucking up people's
ad-hoc queries would be to have it as a permission that would default
to on, but be REVOKE-able. Then it can be revoked from the user/role
that $untrusted application connects as, but still allow people to get
in from a trusted account and get their hands dirty when they need to.
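
Purely hypothetical syntax, just to make the shape of the idea concrete
(nothing like this exists today, and the privilege name is made up):

-- hypothetical privilege covering statements that contain literals
GRANT LITERAL SQL ON DATABASE myapp TO admin_role;
REVOKE LITERAL SQL ON DATABASE myapp FROM webapp_role;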

Would it catch ALL holes? No, as we saw in the order by case, and
there are probably others (not sure if I like the proposed solution
for that, btw). Would it catch a fair number? Absolutely.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Update: < * Allow adding enumerated values to an existing

2008-04-28 Thread Tom Dunstan
On Mon, Apr 28, 2008 at 2:24 PM, Zeugswetter Andreas OSB SD
<[EMAIL PROTECTED]> wrote:
>  I think you are not considering existing btree indexes here
>  (for the reordering case) ?

You're quite right, I had not considered existing indexes. There's no
easy way to deal with that other than rebuilding them. :(

I *still* think someone with a big table would prefer to drop/create
their indexes rather than go through a nasty ALTER COLUMN which would
have the seemingly much worse outcome of rebuilding their whole table
AND any indexes. But whatever - I'll implement option 1 and submit it,
as a marked improvement over the status quo. If I can make option 2
work fast enough I'll consider submitting it as a feature improvement
thereafter, but given the general consensus for option 1 I'm not
pushing option 2 much any more.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Update: < * Allow adding enumerated values to an existing

2008-04-25 Thread Tom Dunstan
On Sat, Apr 26, 2008 at 2:51 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
>  I'm not ... it strikes me that it will add implementation complexity and
>  runtime overhead for a feature that two days ago we didn't think we
>  needed at all, and IMHO one we still shouldn't be thinking to expend a
>  lot of work on.

Well, I *did* think it was necessary, I just hadn't spent the effort
in coming up with a solution. And on the effort side, I'm not going to
be hacking the optimizer any day soon. :)

>  I like #1 with no rewrite support.  That strikes me as covering 99%
>  of the requirements with 10% of the work.

>  Further, as already noted, if you do have to rewrite then a series of
>  manual ALTER COLUMN TYPE operations would probably be a *better* answer
>  than a monolithic implementation, because of the locking problems
>  involved in doing it in one transaction.

I don't understand this if it's calling option 2 the monolithic
implementation. I was intending that the values be permanent tokens if
you like, so that ZERO rewriting would be required for any types of
modification. So I don't see where locking comes in. I don't want
rewriting either.

Hmm, maybe I haven't explained it properly. Here's an example:

CREATE TYPE thing AS ENUM('vegetable', 'animal');

Hypothetical new pg_enum:
 enum_id | value | order | label
---------+-------+-------+-----------
       0 |     0 |     0 | vegetable
       0 |     1 |     1 | animal


ALTER TYPE thing AS ENUM('animal', 'mineral', 'vegetable');

pg_enum:
 enum_id | value | order | label
---------+-------+-------+-----------
       0 |     0 |     2 | vegetable
       0 |     1 |     0 | animal
       0 |     2 |     1 | mineral

So we store the 'value' column on disk, and it never changes. The
downside is that we now have to look up the order when we call a
non-equality operator, but I reckon we can pretty efficiently cache
that, so the lookup is just a couple of array index operations. The
upside is that we can reorder, and we'll never run out of values
"in-between" existing ones.

Anyway, sorry if all of the above *was* clear and I just misunderstood
the comment. If there's consensus to go with option 1 I'll pursue that
path. It's much less of a change to go from option 1 to option 2 than
the current code to either of them anyway, so doing some benchmarking
of both options shouldn't be hard if I want to.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Update: < * Allow adding enumerated values to an existing

2008-04-25 Thread Tom Dunstan
Oops, sorry for the crossed emails, slight delay in my mail being received.

On Sat, Apr 26, 2008 at 2:18 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
>  I think with something like your 16bit/16bit design, and say ten free
>  codes between each original assignment, it'd be okay to not support the
>  rewriting stuff at all.

Yeah. I'm more concerned about someone wanting reordering, hence the
second option in my other mail.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Update: < * Allow adding enumerated values to an existing

2008-04-25 Thread Tom Dunstan
On Sat, Apr 26, 2008 at 2:07 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
>  > The very first idea that I had was to have an enum value as
>  > the combination of both an enum id and the ordinal value.
>
>  I seem to remember that we discussed that and rejected it, but I don't
>  remember the reasoning...

I don't think there was any terribly strong objection. IIRC I
originally proposed trying to fit everything into 2 bytes, you
objected to that as "unnecessary bit-shaving" and proposed 8 bytes, I
didn't want to give up pass-by-value, plus my initial pg_enum design
was rather denormalized - the current solution was a compromise that
fixed that and kept everyone happy. :) But we didn't really consider
updates too carefully. Maybe it was just a bit too cute a solution.

So two alternative proposals, both with a 2 byte "enum id" and a 2 byte "value":

1 - We space the values out as evenly as we can across the 65000ish
range and allow people to delete, insert and append, but not reorder.
If they do the above gratuitously we might have to do a rewrite, but
they'll have to get fairly busy to do it. Rewrite would be required
for reorderings.

2 - We totally give up the idea of storing a value on disk that is
directly comparable (other than equality), and simply number from zero
up, using that number to index into an array (or use as syscache key
or whatever) containing the real ordering information. We can then
reorder or do any other operations to our heart's content.

I'm actually favouring option 2 - I think it can be done in such a way
as to not be much of an overhead compared to the status quo, and you
know that if we don't implement proper reordering now, someone will
ask for it, and we'll be having this discussion at a similar time
after 8.4 goes out.

I'm happy to work on a patch for this if it meets general approval.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Update: < * Allow adding enumerated values to an existing

2008-04-25 Thread Tom Dunstan
On Fri, Apr 25, 2008 at 11:57 PM, Alvaro Herrera
<[EMAIL PROTECTED]> wrote:
>  We already support rewriting tables ... (albeit only one at a time, I
>  admit.  Doing it for more than one can cause deadlocks).
>
>  Still, if the user wants to pay the cost, why should we prohibit it?

One scenario I'm not happy about is this: the friendly db admin has
happily added an extra value to the end before and the operation has
been a snap - no rewriting required. But this time either a) oid
wraparound has occurred, b) she's inserted one or c) she's reordered
them. Bam - we start rewriting the entire database. That's not the
kind of surprise I like giving people, and the current options - either
don't allow updates at all, or avoid surprises by always rewriting
everything - seem pretty deficient. And I don't want to only allow
updates when they won't cause a rewrite, since that's nondeterministic.

I've already suggested some alternatives in the reply to Brendan that
would solve some of this, but I suppose another gross-seeming way to
stop surprise rewrites would be to never do one unless given a FORCE
REWRITE clause on the ALTER statement or something like that, and fail
if a rewrite is required but not specified.

>  So the user can create a new enum with the options he
>  wants, then rewrite his tables one by one, then drop the original.

They can pretty much do this now, they just need to define an implicit
cast I think.
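
Something like this, roughly (untested, and it uses an explicit USING
expression rather than an implicit cast; if 8.3 won't cast the old enum
straight to text, a one-line SQL function doing that conversion can stand in
for the middle cast - names here are illustrative):

CREATE TYPE mood_new AS ENUM ('ecstatic', 'happy', 'meh', 'sad');
ALTER TABLE person
  ALTER COLUMN current_mood TYPE mood_new
  USING current_mood::text::mood_new;
DROP TYPE mood;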

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Update: < * Allow adding enumerated values to an existing

2008-04-25 Thread Tom Dunstan
On Sat, Apr 26, 2008 at 12:10 AM, Brendan Jurd <[EMAIL PROTECTED]> wrote:
>  Has anyone had a close look at how hard it would be allow just the
>  "add to the end" capability?

If the OIDs haven't wrapped around since the enum was created, it's
trivial. If they have, well someone with more OID-fu than me will have
to explain what we'd have to do to guarantee getting a new OID higher
than a certain value. And if your highest enum oid happens to be
2^32-1, you're in trouble :).

I wonder if it's worth revisiting the decision to save enums on disk
as oids. The very first idea that I had was to have an enum value as
the combination of both an enum id and the ordinal value. We would
presumably make both say 16 bits so we could still be passed by
value. This would restrict us to 2^16 enum types per database and 2^16
values per type, but if anyone is getting within an order of magnitude
of either limit I'd be very interested in seeing what they're doing.

The nice thing about the above approach is that we could space out the
ordinal values so as to allow people to insert a fair number of extra
values before forcing a rewrite of the table. The only thing we really
couldn't handle that way would be reordering - we'd need an extra
layer of indirection for that, which would have some performance
penalties. None of the standard operators for enums require a syscache
lookup currently; only I/O does.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Lessons from commit fest

2008-04-16 Thread Tom Dunstan
On Wed, Apr 16, 2008 at 1:07 PM, Magnus Hagander <[EMAIL PROTECTED]> wrote:

>  I think pg_indent has to be made a lot more portable and easy to use
>  before that can happen :-) I've run it once or twice on linux machines,
>  and it comes out with huge changes compared to what Bruce gets on his
>  machine. Other times, it doesn't :-) So yeah, it could be that it just
>  needs to be made easier to use, because I may certainly have done
>  something wrong.

Yeah, I recall trying it a while ago and not having happy memories. It
should be possible (and indeed, mandatory) for patch submitters to run
it over their code before submitting it - having core guys whose time
is so valuable dealing with indentation etc seems an incredible waste.
It won't fix everything, as Tom points out, but it's a better starting
point.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Tom Dunstan
(I've already said that I think the wiki is a great step forward,
FWIW, and I'm not in any way suggesting that we should just drop it
and pick my favorite issue tracker or anything. However, for those
interested in discussion about the theoretical pros and cons of the
different systems...)

On Fri, Apr 11, 2008 at 8:14 PM, Gregory Stark <[EMAIL PROTECTED]> wrote:
>  For the umpteenth time bug trackers *still* require someone to maintain the
>  list. It's more structured which is great if it matches the structure you
>  want, but it still requires someone to go open bugs when a bug is reported by
>  email, close bogus bugs or bugs fixed via collateral damage, update bugs when
>  discussions happen on the list about them, etc.

Perhaps I wasn't clear. I was describing the specific case where a
patch submitter would be required by project policy to submit a patch
to a tracker of some kind before discussing it on the list. So there
wouldn't be much of an opportunity for those to fall through. And
owners of a particular patch would be expected to keep them up to date
re discussions. I wasn't discussing emailed bug reports.

The problem with a tracker is that it will give you a list of every
damn thing that people have put in there, and the data in there can
stagnate. The problem with manually maintained lists is that stuff
might not get on there at all. What I and at least one other person
have tried to say is that the problem of dead issues needing to be
closed is a much easier problem to deal with than the problem of an
issue not being there at all. Heck, *I* could trawl a tracker and
email authors of seemingly dead patches. But there's no way I could
maintain a patch list manually without following each and every
discussion.

>  I've seen no discussion about the structure the various bug trackers use and
>  which would match better with our processes. *That* would be the only useful
>  discussion to be having. What attributes do you think patch submissions have,
>  what is the workflow which sets which attributes at which time, who is
>  involved in which step of this workflow? Etc.

Well, I do recall reading at least one thread (not terribly recently)
discussing people's favourite trackers, but IIRC it turned into
something similar to what happens when we discuss CVS replacements :)

>  Proposing specific tools without a consensus on what that process is putting
>  the cart before the horse. You pick the tools to fit what you want to do. We
>  haven't decided what we want to do yet.

This is true. But your processes get shaped by your tools, too. Our
processes might be shaped by what we've got, and so continue forever.
There should be some awareness of what else is out there.

An example of this is the way code flows around the Linux kernel. I'm
not for a minute advocating their general structure, but git seems a
far better tool than the combination of CVS and emailing patches.
Patch announcements aren't always "here's the patch" as much as
"please pull from over here". Their tool support seems rather better
than ours. And it's changed the way they work.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Tom Dunstan
On Fri, Apr 11, 2008 at 1:32 PM, Magnus Hagander <[EMAIL PROTECTED]> wrote:

>  > > The apache group seems to say the patches are indeed ignored, rather
>  > > then just delayed --- for us, every patch does get a reply, however
>  > > delayed.
>  > >
>  >
>  > Bruce, I think that this comes back to the perception versus reality
>  > discussion you and I have had on more than one occasion :). You are
>  > correct that we always, eventually reply but, until we do (especially
>  > when it takes a long time) it appears as if people are being ignored.
>
>  I will continue to claim that no, we don't always do that. The vast
>  majority of the time we do, but there is no way that we can claim to
>  respond to them all. No, I cannot point you to an example where this
>  has happened.

Well, I can provide an easy example: my first patch [1]. We hashed out
the design on -hackers as contributors are encouraged to do, and I
submitted my first patch to -patches. It included a bunch of
first-time-contributor questions that I had about the proper pgsql way
to do things. It got zero responses. It was as if I had dropped it
into a black hole. Eventually I re-submitted it after 8.2 was
released, and some time after that I got a your-patch-has-been-saved
email.

I have no idea how often that happens, perhaps I'm an exception, but
it was incredibly discouraging.

However I see this as being a side-issue - the problem is knowing the
current status of patches, not the occasional patch that drops
through. And if I as a submitter can stick a patch up on a wiki or
tracker and then email the list for feedback that's probably good
enough, and we could probably do away with -patches altogether,
dealing with the fragmentation issue. That alone would reassure a
contributor that their patch wouldn't get lost, though it wouldn't
guarantee that anyone would look at it.

The reason a tracker is better imo than a wiki is that a wiki still
needs someone to maintain an index page (or multiple index pages for
different queues), so there's still an opportunity for something to
fall through. Or are we suggesting that a first-time contributor
should be editing a patch queue index page on the wiki? Trackers don't
have these issues though - managing lists like this is what they were
born to do.

Cheers

Tom

[1] http://archives.postgresql.org/pgsql-patches/2006-09/msg0.php

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-10 Thread Tom Dunstan
On Thu, Apr 10, 2008 at 3:41 PM, Gregory Stark <[EMAIL PROTECTED]> wrote:
>  >> What's wrong with a patch submitter submitting a patch to a tracker,
>  >> but then emailing the list for actual discussion?
>
>  What's what we have today with the wiki. We don't need any special software
>  to do that. It does require some patch queue maintainer(s) to make sure things
>  get added or updated.

Right, which is what a tracker gives you. A patch submitter can stick
a patch up as WIP or whatever, and update it to
ready-for-commit-review when they're ready, and it's easy to get a
list of ready-to-review patches. If someone wants a patch to get
reviewed in a commit fest, then it better have the latest version and
an up-to-date status. I don't think getting submitters to follow the
rules will be very hard - as someone pointed out it's trivial compared
to the effort of writing a patch. The problem is more likely to be
cleaning up old patches that people submit that never make it to prime
time, but that's easier work for non-core people to help with.

Anyway, I've said my piece and I don't want to discourage movement to
a wiki - it seems a vast improvement in submitter-participation over
the status quo. I just think there are even better tools for the job.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-10 Thread Tom Dunstan
On Thu, Apr 10, 2008 at 3:03 PM, Stefan Kaltenbrunner
<[EMAIL PROTECTED]> wrote:

>  well what about having the tracker being subscribed to the list and let it
> create a bug/patch/ticket id automatically for new mails - that way all
> stuff is automatically tracked ? - That way it can be categorized in the
> course of the following discussion but no history gets lost.

I presume you mean for -patches and not -hackers. Even so I reckon
that would create vastly more noise than signal in the eventual
tracker - part of the existing problem has been that wading through
list archives is a pain for someone wanting to know the current status
of a patch. I can't see the above helping that.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-10 Thread Tom Dunstan
On Thu, Apr 10, 2008 at 1:07 PM, Gregory Stark <[EMAIL PROTECTED]> wrote:
> > The typical way to solve this is to have the tracker send an automatic
>  > notification email to a list saying "Hey, there's a new ticket at ,
>  > come and check it out".
>
>  Unfortunately that is the typical way to "solve" this. And it's awful.
>  It's like the ubiquitous cryptic phone call in movies saying "can't talk
>  right now but there's something you should know. Meet me under the bridge"

Yeah, it sucks, because people won't bother looking. It fails Tom's
"sniff" test.  (Although I can attest to having submitted a previously
discussed patch to -patches and received *zero* feedback, even
something like "we're too busy getting 8.2 out, come back later").

What's wrong with a patch submitter submitting a patch to a tracker,
but then emailing the list for actual discussion? "Hi there, I just
upload patch #12345 which implements TODO item n, can people please
have a look? I've done x, y and z, not sure about p and q". Then
discussion still happens on-list which is a much better discussion
medium, and the patch has a proper status page which the author can
keep up to date with the latest version etc etc.

If we feel the need to link patch status pages to the email archive,
there's no harm in asking that the original email contain the bug
number in the subject or something like that. That's going towards a
more structured approach than a wiki, but I don't personally see that
as a bad thing.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Database owner installable modules patch

2008-04-07 Thread Tom Dunstan
On Mon, Apr 7, 2008 at 7:55 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Tom Dunstan" <[EMAIL PROTECTED]> writes:
>  > OK, I found an example that does NOT fit the "just drop all
>  > dependencies" scenario, but that I would still like to support. I just
>  > had a look at the postgis pl/java support, and its install does stuff
>  > like "SELECT sqlj.install_jar('file://${PWD}/postgis_pljava.jar',
>  > 'postgis_pljava_jar',  false);" and "SELECT
>  > sqlj.add_type_mapping('geometry', 'org.postgis.pljava.PLJGeometry');".
>  > There's no way we can deal with that sort of thing automatically, so
>  > we'll have to support uninstall scripts regardless.
>
>  Well, that just begs the question of what those commands actually *do*.
>  It seems not unlikely that they'd be inserting data into tables that
>  would belong to the module, in which case an uninstall that dropped
>  the table would be fine.

Those tables belong to a *different* module, though. I'm picturing
three modules here: pljava, postgis, and a postgis-pljava support
module that requires the first two, since it should be possible to
install postgis without requiring pljava. The above stuff was from the
install script of the postgis-pljava code, but inserted data into
tables owned by pljava.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Database owner installable modules patch

2008-04-07 Thread Tom Dunstan
Sorry to keep replying to myself, but part of the point of doing a
patch was to force myself (and whoever else is interested) to examine
stuff that comes up...

On Mon, Apr 7, 2008 at 11:46 AM, Tom Dunstan <[EMAIL PROTECTED]> wrote:
>  None of that suggests that an uninstaller script would be needed if we
>  understood the deps well enough, but only allowing creates for
>  installs seems a bit restrictive.

OK, I found an example that does NOT fit the "just drop all
dependencies" scenario, but that I would still like to support. I just
had a look at the postgis pl/java support, and its install does stuff
like "SELECT sqlj.install_jar('file://${PWD}/postgis_pljava.jar',
'postgis_pljava_jar',  false);" and "SELECT
sqlj.add_type_mapping('geometry', 'org.postgis.pljava.PLJGeometry');".
There's no way we can deal with that sort of thing automatically, so
we'll have to support uninstall scripts regardless.

The question then becomes: is it worth trying to do stuff
automatically if we provide a manual method anyway? I think the answer
is probably yes, because having pgsql clean up automatically for the
vast majority of cases is a good thing. If it's only exotic cases that
need a manual uninstall script, why force one on everyone else?

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Database owner installable modules patch

2008-04-07 Thread Tom Dunstan
On Mon, Apr 7, 2008 at 11:46 AM, Tom Dunstan <[EMAIL PROTECTED]> wrote:
> On Mon, Apr 7, 2008 at 3:59 AM, Gregory Stark <[EMAIL PROTECTED]> wrote:
>  >  I wonder if there's much of a use case for any statements aside from CREATE
>  >  statements. If we restrict it to CREATE statements we could hack things to
>  >  create pg_depend entries automatically. In which case we wouldn't need an
>  >  uninstall script at all.

>  >  The hacks to do this seem pretty dirty but on the other hand the idea of
>  >  having modules consist of a bunch of objects rather than arbitrary SQL
>  >  actually seems cleaner and more robust.
>
>  It *does* seem cleaner for the examples that I've looked at. Are they
>  totally representative though? Not sure. It also implies a bunch more
>  work to create stuff, as we need to understand what's going on so as
>  to create those pg_depend entries.

This has been bouncing around in my head a bit. I was picturing the
module code itself having to understand all the CREATE statements in
order to set up the dependencies... but perhaps an easier way would
simply be to have the create statements themselves insert a pg_depend
entry when they're done, if they detect that we're currently
installing a module. There's already a flag for that that the
superuser code looks at in the patch. Maybe you were ahead of me, and
this was the hack that you were referring to. :) I tend to hate global
flags like that because they leave weird non-obvious dependencies
across the codebase, but perhaps it's the best way to do it in this
case. It would mean hacking every create command in the system to
understand it, though. :(

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Database owner installable modules patch

2008-04-06 Thread Tom Dunstan
On Mon, Apr 7, 2008 at 3:59 AM, Gregory Stark <[EMAIL PROTECTED]> wrote:
>  I wonder if there's much of a use case for any statements aside from CREATE
>  statements. If we restrict it to CREATE statements we could hack things to
>  create pg_depend entries automatically. In which case we wouldn't need an
>  uninstall script at all.

Well, the example that got me interested in this stuff originally was
trying to make pl/java easier to install. It does a bunch of
CREATEs... and some GRANTs. Plus ISTM that a pretty common case might
be to create a table for some reference data and then fill it with
default values. Also, I just had a look at the postgis install script,
which at the very least seems to update an opclass entry after
creating it.

None of that suggests that an uninstaller script would be needed if we
understood the deps well enough, but only allowing creates for
installs seems a bit restrictive.

One thing that's nice about arbitrary sql for install / uninstall is
that module authors can test it outside the context of doing an actual
module installation - they just execute their scripts.

>  The hacks to do this seem pretty dirty but on the other hand the idea of
>  having modules consist of a bunch of objects rather than arbitrary SQL
>  actually seems cleaner and more robust.

It *does* seem cleaner for the examples that I've looked at. Are they
totally representative though? Not sure. It also implies a bunch more
work to create stuff, as we need to understand what's going on so as
to create those pg_depend entries. I'm receptive to the idea of
uninstall simply attempting to drop anything related to the module in
pg_depend in the correct order. I can't think of anything created by a
module that we couldn't represent there, and it's a nice way of
ensuring that an uninstall script cleans up properly.
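
For what it's worth, if modules grew their own catalog (call it pg_module,
purely hypothetically - the pg_depend columns below are real), the uninstall
pass could enumerate everything to drop along these lines, then hand each
row to the normal dependency-aware drop machinery:

SELECT classid::regclass AS catalog, objid
FROM pg_depend
WHERE refclassid = 'pg_module'::regclass
  AND refobjid = (SELECT oid FROM pg_module WHERE modname = 'postgis')
ORDER BY objid DESC;  -- roughly reverse creation order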

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] modules

2008-04-05 Thread Tom Dunstan
On Sat, Apr 5, 2008 at 5:11 PM, Martijn van Oosterhout
<[EMAIL PROTECTED]> wrote:
>  It even went so far that you could build the install/deinstall scripts
>  into the module itself, so all postgres had to do was dlopen() the
>  module it could access the install script. It fails due to the fact
>  that modules that don't require compilation are left out in the cold...

Well, there's no reason you can't allow both... look for
module_install() in the lib if it's there, or install.sql if it's not.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] modules

2008-04-04 Thread Tom Dunstan
On Sat, Apr 5, 2008 at 12:22 AM, Gregory Stark <[EMAIL PROTECTED]> wrote:
> "Aidan Van Dyk" <[EMAIL PROTECTED]> writes:
>
>  > What if you didn't need super-user privileges to load "C" functions, on
>  > the conditions that:
>  > 1) There is no / in the obj_file filename (or some other "sanitizing"
>  >rules)
>  > 2) You're database owner
>
>  That's an interesting idea. It has the property that no super-user is required
>  to do any fiddling *inside* your database. That is, the ISP can just do CREATE
>  DATABASE and then leave you have at it without having to deal with installing
>  modules or granting any permissions inside your database.

Maybe it didn't come across, but it's exactly what I've been
suggesting in this thread, albeit a slightly different solution.

My idea was to have an installed "module", and rather than allowing
$database_owner to create C language functions based on what are
really implementation details for the given extension, just allow them
to say e.g. "install module postgis;" or equivalent - this would then
run either some init function or an appropriately named and placed
install script that would take care of everything. A module would be
expected to provide an uninstall script, too, to allow "uninstall
module foo" or whatever.

Under this scenario end users don't need access to the install
scripts, don't need to know the exact library name, and aren't given
the ability to e.g. create C language functions in ways that weren't
intended, like declaring one taking the wrong variable types or
something. The sysadmin can trust the module to do the right thing -
they don't have to trust the user.

As far as getting sysadmins to install contrib, this then becomes
really easy - just install them all by default and let database owners
install them into their own dbs as they wish.

>  It also opens the door to .deb packagers being able to put pgfoundry modules
>  in the same space. No other suggestion has offered any help to anything except
>  blessed contrib modules.

Well, I actually was approaching the problem from the point of view of
creating yum installable rpms - killing contrib was just a nice side
effect :)

>  I would suggest a guc for the "safe" place and I would suggest it be a list of
>  places. And I would suggest that for OS packagers they really want two
>  locations on that list, something like:
>   /usr/lib/postgresql/modules;/usr/local/lib/postgresql/modules
>  That way users can compile and install their own modules into /usr/local
>  without interfering with modules which come from OS packages.

+1.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] modules

2008-04-04 Thread Tom Dunstan
On Fri, Apr 4, 2008 at 10:48 AM, Jeremy Drake <[EMAIL PROTECTED]> wrote:

>  My opinion is, it doesn't matter what you call the modules/contrib stuff
>  if I can't use it, and I can't use it  if it is not loaded in my
>  database, and I can't load it without superuser privileges.

Right. Which is why some of us have been suggesting a model where all
modules currently in contrib are installed by default, but not enabled
until a database owner actually issues some sort of "Install module
foo" or whatever it looks like. Database owner privs are probably as
low as we can reasonably set the bar... is that sufficiently low to be
useful? If not, I suppose that we could add a specific "install /
uninstall module" privilege that could be granted to non-db-owner
users if that's the way the ISP prefers to work.

Regarding PostGIS etc, my hope is that if we standardize the
installation of postgresql modules in this manner, it will be much
easier for sysadmins to e.g. yum install postgis - they don't have to
run any SQL scripts by hand, they can get packages built for their
platform and distributed using the preferred platform distribution
method, and the modules will only be enabled for those users that
specifically enable them in their databases. We can't force sysadmins
to install random third party extensions to postgresql, but we can
make it a lot easer than it currently is.

Alternately, if that's still not enough, then if we do standardise the
interface it would be easier to bundle third party modules that live
outside the main source tree - just stick em in /modules when building
the tar files and they'll end up installed and ready-to-enable when
built.

Hmm. We could even do that for existing contrib modules if we want
them to live outside the core project - for example because their
maintainers don't need commit access to core. It would be easy enough
to have the buildfarm fetch blessed modules from their real location
(pgfoundry?) so that we maintain good test coverage.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] modules

2008-04-03 Thread Tom Dunstan
On Thu, Apr 3, 2008 at 10:36 PM, Andrew Dunstan <[EMAIL PROTECTED]> wrote:
>  No. I don't want to deprecate it, I want to get rid of it, lock, stock and
> barrel. If you think that we need more than renaming then we can discuss it,
> but I don't want a long death, I want one that is certain and swift.

I'll admit that I had thought that moving contrib modules over to a
modules dir as they were, uh, modularized would be the way forward.
Anything that doesn't fit the database-owner-installable pattern
(pgbench? start-scripts? others?) could end up in a utils dir, and
anything left in contrib shows us what's left to do before e.g. 8.4.
The end goal would be no more contrib dir by the next major release.

As a side note, how were you intending to rename contrib? Directory
shenanigans in CVS are horrible, particularly if you want  all your
old branches to still work.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] modules

2008-04-03 Thread Tom Dunstan
On Thu, Apr 3, 2008 at 9:17 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
>  > It's hard to see ISPs who won't install contrib from installing
>  > ${random module} from the big bad internet as has been discussed in
>  > this thread, but who knows?
>
>  Sure it is. The very word contrib brings about ideas of things like:
>
>  Unstable, Cooker, unofficial.

Point taken, and I completely agree. Part of the problem is that we
have explicitly encouraged this perception, ie "it's in contrib so the
barrier to entry is lower". That may not be the case anymore, or it
may just be that the bar is really really high for non-contrib stuff
vs other projects. Whatever the actual case is, I agree that the name
is unfortunate.

When I wrote the above I was thinking about it from the other way
around: doing a cpan or gem install of some random module seems even
less safe to me, but maybe I'm just revealing confidence in pgsql or
fear of some cpan code etc that ISPs don't share.

>  This would install all the modules but not enable them in the database
>  itself (of course). This could also be extended to the pls so that we
>  have exactly one mechanism to control those options as well.
>
>  ./configure --enable-module=pgcrypto --enable-module=plperl

That's basically where I was heading, although I took it a step
further: why not build and install all possible modules by default, if
we think they're up to quality?

One answer is: what do you do if some required library isn't
available? Do you fail with an error message or just don't build that
module? I don't like the idea of e.g. accidentally and silently not
installing pl/perl just because the sysadmin hadn't installed their
perl-devel package or whatever.

--enable-module=ALL could be pretty good, though, especially if it
built pl/perl etc (which most sysadmins will want to install) with
fewer configure args. :)

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] modules

2008-04-03 Thread Tom Dunstan
On Thu, Apr 3, 2008 at 8:25 PM, Andrew Dunstan <[EMAIL PROTECTED]> wrote:

>  If this were at all true we would not not have seen the complaints from
> people along the lines of "My ISP won't install contrib". But we have, and
> quite a number of times. We have concrete evidence that calling it contrib
> actually works against us.

It's hard to see ISPs who won't install contrib from installing
${random module} from the big bad internet as has been discussed in
this thread, but who knows?

If we go with a solution that allows users to say "install mymodule;"
or whatever into their own database, is there any reason not to
install (as in make install) all modules currently called contrib by
default? Are there any security issues with modules in there? I seem
to remember something coming up involving dblink a while back...

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] modules

2008-04-03 Thread Tom Dunstan
I had some thoughts about similar issues when looking at what it would
take to make pl/java yum-installable. The end goal was to be able to
say e.g. yum install pljava; echo "create language pljava;" | psql
mydb. Currently there's a non-trivial install process involving
running an SQL script and a Java class.

My idea was to have a createlang_init kind of function that could be
called when installing a language to set up appropriate functions,
tables etc. There would be a similar function to clean up when
dropping the lang.

On Thu, Apr 3, 2008 at 6:12 AM, Ron Mayer <[EMAIL PROTECTED]> wrote:
> Agreed.  Such a mechanism would only really apply for things
> that are installed in the database.   But from an end user's
> point of view, installing functions, index types, languages,
> data types, etc all see to fit the "pg_install postgis -d mydb",
> "pg_install pl_ruby -d mydb", etc. pattern pretty well.
>

Well, there are a couple of major differences. Firstly cpan, gem etc
are able to install all required dependencies themselves, at least
where no native compilation is required, because they are basically
their own platform. PG libs more or less require a build environment.

Secondly, and more importantly, module installation for those
environments happens once and is global; installation of native libs
for pgsql is different to installation in a database. What happens in
the above scenario when the postgis libs are already installed? And
what about cleanup? Also, it would seem that such an install process
requires the server to be running - so much for packaging as
RPMs/debs/win32 installer etc.

I think a better solution would be to have pg_install be a
distribution mechanism capable of installing binaries / scripts /
other resources, but have pgsql itself handle module installation into
a particular database. I'm thinking a "CREATE MODULE foo;" kind of
thing that would be capable of finding either a module install script
or a foo_init() function in libfoo.so/foo.dll. Similarly for cleanup,
so cleanup isn't dependent on pg_install lying around or the version
that was installed still being the latest when pg_install looks for an
uninstall script.

This would allow modules to be installed site-wide but optionally
created / dropped from specific databases in a much saner manner, and
standard pgsql permissions could apply to installation of modules. It
would also allow creation of rpms etc that can be shipped by a
distribution, and then enabled by the user by calling the appropriate
command.

>> Finally, setting up modules so they can be built for Windows,
especially using MSVC, will probably be quite a challenge.
>>
>
> Indeed.   Seems ruby gems give you the option of installing a "ruby"
> version or a "windows" version that I'm guessing has pre-compiled
> object files.

Yeah, setting up Cygwin to build postgres is a pain (or was when I
last did so). If we're serious about setting up a central repository,
we should consider having a virtualized windows machine capable of
building binaries for the modules that people upload.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread Tom Dunstan
On Fri, Feb 29, 2008 at 9:26 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Andrew Sullivan <[EMAIL PROTECTED]> writes:
>  > "Be conservative in what you send, and liberal in what you accept."
>
>  Yeah, I was about to quote that same maxim myself.  I don't have a big
>  problem with allowing uuid_in to accept known format variants.  (I'm
>  not sure about allowing a hyphen *anywhere*, because that could lead to
>  accepting things that weren't meant to be a UUID at all, but this HP
>  format seems regular enough that that's not a serious objection to it.)

This seems like a good enough opportunity to mention an idea that I
had while/after doing the enum patch. The patch was fairly intrusive
for something that was just adding a type because postgresql isn't
really set up for parameterized types other than core types. The idea
would be to extend the enum mechanism to allow UDTs etc to be
parameterized, and enums would just become one use of the mechanism.
Other obvious examples that I had in mind were allowing variable
lengths for that binary data type with hex IO for e.g. differently
sized checksums that people want, and allowing different formats for
uuids.

So the idea as applied to this case would be to do the enum-style
typesafe thing, ie:

create type coldfusion_uuid as generic_uuid('---');

...then just use that. I had some thoughts about whether it would be
worth allowing inline declarations of such types inside table creation
statements as well, and there are various related issues and thoughts
on implementation which I won't go into in this email. Do people think
the idea has legs, though?
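
For the other examples I mentioned, the same kind of thing
(generic_hexbytes is made up here, purely to show the shape of it):

  create type sha1sum as generic_hexbytes(20);  -- fixed 20-byte checksum, hex IO
  create type md5sum  as generic_hexbytes(16);
  create table files (name text, digest sha1sum);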

>  What I was really complaining about was Josh's suggestion that we invent
>  a function to let users *output* UUIDs in random-format-of-the-week.
>  I can't imagine much good coming of that.  I think we should keep
>  uuid_out emitting only the RFC-standardized format.

Well, if the application is handing them to us in that format, it
might be a bit surprised if it gets back a "fixed" one. The custom
type approach wouldn't have that side effect.

Cheers

Tom

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Tom Dunstan
On Wed, Feb 27, 2008 at 9:26 PM, Florian G. Pflug <[EMAIL PROTECTED]> wrote:
>  I was thinking more along the line of letting a datatype specify a
>  function "void* ioprepare(typmod)" which returns some opaque object
>  specifying all that the input and output function needs to know.
>  We could than establish the rule that input/output functions may not
>  access the catalog, and instead pass them a pointer to that opaque object.

Callers of IO functions don't always know which type they're dealing
with - we had to go to some lengths to pass type information along
with the enum value itself so that it could be looked up in the
syscache in the output function. I think the main culprits are the
P/Ls, but I think there was a security related concern about passing
the type through to the IO function as well. If you want to do
something like this, it would certainly be possible to cache the enum
info for a particular type, but you might want to have a separate set
of io functions just for this case.

On the plus side, if such a cache were to be used by IO generally, we
could reimplement enums to just store the ordinal on disk and save a
couple of bytes, like I wanted but was unable to do the first time
around. :)

Enums are an easy case, though, as there's very little data to deal
with. I don't know about other UDTs out there - do any require more
extensive catalog access?

Cheers

Tom

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] pg_dump additional options for performance

2008-02-26 Thread Tom Dunstan
On Tue, Feb 26, 2008 at 5:35 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Tue, 2008-02-26 at 12:46 +0100, Dimitri Fontaine wrote:
>  > As a user I'd really prefer all of this to be much more transparent, and could
>  > well imagine the -Fc format to be some kind of TOC + zip of table data + post
>  > load instructions (organized per table), or something like this.
>  > In fact just what you described, all embedded in a single file.
>
>  If its in a single file then it won't perform as well as if its separate
>  files. We can put separate files on separate drives. We can begin
>  reloading one table while another is still unloading. The OS will
>  perform readahead for us on single files whereas on one file it will
>  look like random I/O. etc.

Yeah, writing multiple unknown-length streams to a single file in
parallel is going to be all kinds of painful, and this use case seems
to be the biggest complaint against a zip file kind of approach. I
didn't know about the custom file format when I suggested the zip file
one yesterday*, but a zip or equivalent has the major benefit of
allowing the user to do manual inspection / tweaking of the dump
because the file format is one that can be manipulated by standard
tools. And zip wins over tar because it's indexed - if you want to
extract just the schema and hack on it you don't need to touch your
multi-GBs of data.

Perhaps a compromise: we specify a file system layout for table data
files, pre/post scripts and other metadata that we want to be made
available to pg_restore. By default, it gets dumped into a zip file /
whatever, but a user who wants to get parallel unloads can pass a flag
that tells pg_dump to stick it into a directory instead, with exactly
the same file layout. Or how about this: if the filename given to
pg_dump is a directory, spit out files in there, otherwise
create/overwrite a single file.

While it's a bit fiddly, putting data on separate drives would then
involve something like symlinking the tablename inside the dump dir
off to an appropriate mount point, but that's probably not much worse
than running n different pg_dump commands specifying different files.
Heck, if you've got lots of data and want very particular behavior,
you've got to specify it somehow. :)

Cheers

Tom

* The custom file format does not seem well advertised. None of the
examples on the pg_dump page use it, and I've never come across it in
my travels on the vast interwebs. Heck, I've even hacked on pg_dump
and I didn't know about it :). I won't suggest advertising it more
while this discussion is going on though, since it may be obsoleted by
whatever the final outcome is here.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pg_dump additional options for performance

2008-02-25 Thread Tom Dunstan
On Sun, Feb 24, 2008 at 6:52 PM, Jochem van Dieten <[EMAIL PROTECTED]> wrote:
>  Or we could have a switch that specifies a directory and have pg_dump
>  split the dump not just in pre-schema, data and post-schema, but also
>  split the data in a file for each table. That would greatly facilitate
>  a parallel restore of the data through multiple connections.



I'll admit to thinking something similar while reading this thread,
mostly because having to specify multiple filenames just to do a dump
and then do them all on the way back in seemed horrible. My idea was
to stick the multiple streams into a structured container file rather
than a directory though - a zip file a la JAR/ODF leapt to mind. That
has the nice property of being a single dump file with optional built
in compression that could store all the data as separate streams and
would allow a smart restore program to do as much in parallel as makes
sense. Mucking around with directories or three different filenames or
whatever is a pain. I'll bet most users want to say "pg_dump
--dump-file=foo.zip foo", back up foo.zip as appropriate, and when
restoring saying "pg_restore --dump-file=foo.zip -j 4" or whatever and
having pg_restore do the rest. The other nice thing about using a zip
file as a container is that you can inspect it with standard tools if
you need to.

Another thought is that doing things this way would allow us to add
extra metadata to the dump in later versions without giving the user
yet another command line switch for an extra file. Or even, thinking a
bit more outside the box, allow us to store data in binary format if
that's what the user wants at some point (thinking of the output from
binary io rather than on disk representation, obviously). Exposing all
the internals of this stuff via n command line args is pretty
constraining - it would be nice if pg_dump just produced the most
efficient dump, and if we decide at a later date that that means doing
things a bit differently, then we bump the dump file version and just
do it.

Just a thought...

Cheers

Tom

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] unclear enum error messages

2007-11-28 Thread Tom Dunstan
On Nov 28, 2007 11:01 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> > What is an actual enum type?  And how should a user react if he got this
> > message?  I would ask, "why not?".
>
> Yeah, I would too, but without a concrete example to look at it's hard
> to say if the situation could be improved.

Hmm. I've just tried, unsuccessfully, to trigger that branch both
calling those functions directly from psql and from plpython calling
plpy.prepare() and passing bogus type info in. It's harder than it
looks. I'm sure that there are code paths that will call those
functions without filling out that type info (e.g. from C user defined
function?), but they must surely fall outside of the expected use
cases. Hence the error code is probably correct. Unsure about the
message... we could make it extremely explicit e.g. "you're calling
this without filling in fcinfo" but we'd want to be really sure that
it's definitely only possible to trigger via user error before getting
that detailed. I'm not sure that we're completely confident of that,
so perhaps the vague-ish message is ok.

I wouldn't get too worried, though. These are defensive error messages
that are really just checking for sane input, and they seem difficult
to deliberately trip, let alone accidentally, so stressing about them
is probably unnecessary.

Cheers

Tom

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [hibernate-team] PostgreSQLDialect

2007-11-12 Thread Tom Dunstan
On Nov 12, 2007 4:08 PM, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> > What should the driver report then ? I believe the backend code considers 8
> > to be the major version, and 0123 to be the minor versions ?
>
> No, 8.1 is the major version.  In 8.2.5, 8.2 is the major, 5 is the
> minor version.

Which is nice in theory, except that the JDBC API doesn't give us the
option of a non-int major version number. We could fudge it with 80,
81 etc, but that's pretty ugly. You can imagine some database client
out there reporting that you're connected to a postgresql 82.5
database, rather than using the getDatabaseProductVersion() method
which is intended for that sort of thing.

For the most part, getting the combination of the major and minor
numbers as currently implemented should be enough for anything using
the driver, as we normally don't care about the difference between
8.2.1 and 8.2.2 in application code (heaven help mysql :)). It only
came up in this case because the minor number (as reported by the JDBC
driver) wasn't passed through.

Cheers

Tom

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [hibernate-team] PostgreSQLDialect

2007-11-12 Thread Tom Dunstan
On Nov 12, 2007 2:13 PM, Andrew Dunstan <[EMAIL PROTECTED]> wrote:
> > Oh, that's nice. Unfortunately, though. it only seems to support major
> > version number differentiation as an int. Apparently the idea that you
> > might have a version number like 8.3 didn't occur to whoever wrote it,
> > although to be fair it looks like the only implementation that
> > actually uses it is Oracle, where that assumption probably holds.
> > Probably wouldn't be that hard to hack to our purposes though...
>
> 800, 801 ...

Nice try :), but as I read the javadoc for DialectFactory it seems to
suggest that hibernate gets the major number from our JDBC driver,
which dutifully reports it as 8. I doubt that we're suggesting hacking
the JDBC driver to lie just to get around this wrinkle when the
obvious solution is to submit a patch to hibernate that makes it pass
both major and minor numbers through, and the Oracle code could
happily ignore the latter.

Cheers

Tom

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [hibernate-team] PostgreSQLDialect

2007-11-12 Thread Tom Dunstan
On Nov 12, 2007 1:08 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:

> If we do this, then it looks like we can hack this file also
> http://anonsvn.jboss.org/repos/hibernate/core/trunk/core/src/main/java/org/hibernate/dialect/DialectFactory.java

Oh, that's nice. Unfortunately, though, it only seems to support major
version number differentiation as an int. Apparently the idea that you
might have a version number like 8.3 didn't occur to whoever wrote it,
although to be fair it looks like the only implementation that
actually uses it is Oracle, where that assumption probably holds.
Probably wouldn't be that hard to hack to our purposes though...

Cheers

Tom

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [hibernate-team] PostgreSQLDialect

2007-11-12 Thread Tom Dunstan
> All of this should work for functions, but operators are a whole
> different story. I strongly suspect that someone is not going to be
> able to use e.g. @@ in a HQL query. Are there ways to do tsearch type
> queries just using functions and more standard operators?

Of course, if someone's using tsearch then they've already thrown
database agnosticism out the window, so they could always just knock
up a native SQL query directly. But it can get quite fiddly if there
are a lot of fields coming back in the result set - that's why it
would be nice if hibernate could handle these cases itself.

Cheers

Tom

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [hibernate-team] PostgreSQLDialect

2007-11-12 Thread Tom Dunstan
[oops, sent with non-subscribed from: address first time]

On Nov 12, 2007 10:55 AM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> I've posted files to pgsql-patches, as well as to Diego directly.

I dropped them into a Hibernate 3.2.5.ga source tree and ran the
hibernate tests with the 8.3 dialect against pgsql HEAD and got a few
errors. Diego, I assume that the hibernate tests are in a state where
we expect them to all pass? I didn't bother trying the original
dialect that hibernate shipped with, so I'm not sure if it passes or
not. Given that these seem like an improvement, I'll assume not.

> There are 3 files
> PostgreSQL8Dialect.java which implements enough for 8.0 and 8.1
> PostgreSQL82Dialect.java which extends PostgreSQL8Dialect
> PostgreSQL83Dialect.java which extends PostgreSQL8Dialect

Given that our releases are generally a feature superset of previous
ones, should we just make PostgreSQL83Dialect extend
PostgreSQL82Dialect? I note that atm they are identical. Or does that
offend anyone's delicate OO sensibilities?

> We can then push out a new file every release.

Yes, I like the general approach.

> - GUID support is possible, but really opens up the debate about how
> extensibility features should be handled.

Yeah. Should the GUID be mapped to e.g. java.util.UUID? Or just
string? etc. I had some thoughts about enums, but if someone's using
the annotation stuff (either JPA or hibernate specific) then they
already have a mechanism to map between a Java enum and a string, so
the only thing that wouldn't work would be DDL generation, since
hibernate wouldn't understand the necessary CREATE TYPE commands.

> - For now, I think we should document the procedure for adding a local
> site Dialect which implements additional functions, with GUID as an
> example

Oh, were you just referring to making GUID functions available? Yeah
that shouldn't be too hard, but again I wonder if we should look at an
automatic way to generate those function declarations. Given that the
dialect can't read the database when it's instantiated, perhaps the
way to go would be to ship a resource file containing the expected
functions and have the dialect parse that before calling the
registration functions. There would then be a process that a user
could run against their own database to regenerate that file, and
they'd just need to drop it into their classpath for it to be picked
up.
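
The regeneration process could be not much more than a catalog query -
something along these lines (just a sketch), with the output filtered
down to the types hibernate understands and written to the resource
file:

  SELECT p.proname, p.pronargs, t.typname AS return_type
  FROM pg_proc p
       JOIN pg_type t ON t.oid = p.prorettype
  WHERE p.pronamespace = (SELECT oid FROM pg_namespace
                          WHERE nspname = 'pg_catalog')
  ORDER BY p.proname, p.pronargs;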

All of this should work for functions, but operators are a whole
different story. I strongly suspect that someone is not going to be
able to use e.g. @@ in a HQL query. Are there ways to do tsearch type
queries just using functions and more standard operators?

Cheers

Tom

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [hibernate-team] PostgreSQLDialect

2007-11-11 Thread Tom Dunstan
> Hi, I've never used Hibernate but it seems to be that table of
> functions could be generated automatically.

That's the obvious solution. It would be nice if the dialect could
query the database itself to get a list of functions, since there will
be different sets of functions for different server versions, and the
functions can change when people install contrib modules or their own
functions. However, it doesn't look like the constructor for the
dialect gets given a connection or anything, so we'll probably have to
settle for static lists. It wouldn't be very hard to write a little
bit of java to parse pg_proc.h, but you'd want to filter out the types
that hibernate doesn't understand. One problem is that hibernate users
can install their own "types" - so hibernate might understand e.g.
polygons or whatever, but we won't know that at dialect initialization
time.

As someone who has contributed patches to both hibernate and pgsql I'd
be happy to help out on this, whatever the best way forward happens to
be. Top notch postgresql support in hibernate is something that I'd
very much like to see (and that goes for other JPA implementations as
well). I wasn't aware that it was particularly lacking, but clearly if
a function must be registered in the dialect to be usable by HQL,
there are an awful lot of functions that won't be available. I wonder
what happens with custom operators like tsearch provides...

> - You map "text" to CLOB. Not exactly sure what CLOB refers to but text
> column are not generally used for large objects. I mean, you can store
> up to a GB in them, but most such columns are not going to be large.

Actually, it's clob being mapped to text. I don't see a huge problem
with that, really, it'll often be mapped to a String at the java end
anyway. Think about it from the perspective of someone writing a
database agnostic hibernate application - they want a field to store
character data which can potentially be quite big - big enough that
they don't want to set arbitrary limits on it. So text pretty much
fits the bill since toasting was introduced.

It would be nice if we could register string data with no explicit
length as belonging to text as well, but it's not obvious how to do
that. Hmm.

The BLOB mapping is the one that looks wrong to me - surely that
should be bytea as well as varbinary, unless hibernate is explicitly
invoking the large object api. Perhaps it is.
Although:

    public boolean useInputStreamToInsertBlob() {
        return false;
    }

and in particular:

    public boolean supportsExpectedLobUsagePattern() {
        // seems to have spotty LOB support
        return false;
    }

I wonder what the fallback lob usage pattern is. Someone with better
knowledge of our jdbc driver might be able to point out whether the
above functions are saying the right things or not.

> - You have supportsRowValueConstructorSyntax commented out. It does, if
> you have a recent enough version, or do you mean something else?

The way to fix both that and the differing available functions would
probably be to have a subclass of the dialect for each server version.
MySQL seems to have about 5 :)
http://www.hibernate.org/hib_docs/v3/api/org/hibernate/dialect/package-summary.html.

As a side note to Diego, I'll say that it's great to see a hibernate
commiter being proactive about improving these things. Getting
attention to a bug or bugfix hasn't always been easy.

Cheers

Tom

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] "anyelement2" pseudotype

2007-02-19 Thread Tom Dunstan

Tom Lane wrote:

I realized that I can probably fix ATAddForeignKeyConstraint to do the
right thing by having it pass the two actual column types to
can_coerce_type, thus allowing check_generic_type_consistency to kick
in and detect the problem.


Yeah, I came to the same conclusion. No amount of refactoring in 
parse_coerce.c is going to get the original concrete types back to 
compare. That should fix the problem with arrays, enums and any 
potential future generic types without mentioning them explicitly in 
there a la the hack there currently, thankfully.


Cheers

Tom


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] "anyelement2" pseudotype

2007-02-16 Thread Tom Dunstan

Tom Lane wrote:

So it seems neither can_coerce_type() nor find_coercion_pathway() are
really particularly well thought out in terms of what they test or don't
test.  I'm not very sure what a good refactoring would look like,
but I am sure that I don't want all their call sites having to
individually account for ANYfoo types.  Any thoughts?


Yeah, I remember thinking at the time that some of it was a bit 
backwards, but it's been almost 6 months since I did the original enum 
patch, so I'll need to refresh my memory. I'll have a look over the 
weekend and see if I can come up with something that'll work for these 
various cases. To begin with I'll need to do a survey of the call sites 
to see what they really need, since perhaps it isn't what the coerce 
functions are currently offering. :) I completely agree that anything 
requiring call sites to understand specifics about ANY* types is a bad 
idea, the most that we would want would be a generic IsGeneric(typoid) 
macro, but it would be nice to hide that inside a coerce function as 
well. We'll see.


Cheers

Tom

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] "anyelement2" pseudotype

2007-02-15 Thread Tom Dunstan

Tom Lane wrote:

Actually ... now that I re-read that remark, I think you may have done
the wrong things with ANYENUM.  I think that ANYENUM may in fact be
closer to ANYARRAY than it is to ANYELEMENT, because ANYELEMENT pretty
nearly means "anything at all" whereas ANYARRAY identifies a subset of
types that share some properties, which is an accurate description of
ANYENUM as well.  In particular, it is sensible to have b-tree index
opclasses that are declared to operate on ANYARRAY.  If you've
got b-tree support for ANYENUM, as I hope you do, then you'll have to
patch that same spot in ri_triggers that now knows about ANYARRAY.

So you might want to take another pass through the code and see if you
shouldn't be modeling ANYENUM more closely on ANYARRAY than ANYELEMENT.


OK, thanks, I'll do that. Of course, they get used together all over the 
place as well, lots of

if (typoid == ANYARRAY || typoid == ANYELEMENT) {
type of stuff in the code.

I do have b-tree (and hash) support for enums, so it sounds like I'll 
have to hit the same spot. I've got what I thought was a reasonably 
comprehensive test of all the enum features which passes make check, so 
if there's a likely failure in that code then I'm missing a test 
somewhere. Did you have a test case for the ri_triggers stuff that you 
did? What's going to fail?


Thanks

Tom



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] "anyelement2" pseudotype

2007-02-15 Thread Tom Dunstan

Tom Dunstan wrote:

Tom Lane wrote:

As for actually adding it, grep for all references to ANYELEMENT and add
code accordingly; shouldn't be that hard.  Note you'd need to add an
anyarray2 at the same time for things to keep working sanely.


The enum patch [1] does exactly this with an ANYENUM pseudo-type. It 
should provide a pretty good overview of what will be required.


Whoops. I just had a look at the mail that Matt referenced at the top of 
this thread. An anyelement2 would require a bit more than what anyenum 
does, as the type-matching code that ensures that all generic args are 
of the same type would have to be changed, unlike anyenum. Hope I didn't 
lead you down the wrong path, Matt. OTOH, following the enum patch 
should land you in roughly the right areas, and you'd still need to add 
ANYELEMENT2 references in all the places that I had to add ANYENUM as well.


Cheers

Tom






---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] "anyelement2" pseudotype

2007-02-14 Thread Tom Dunstan

Tom Lane wrote:

Andrew Dunstan <[EMAIL PROTECTED]> writes:

Tom Lane wrote:

ANYENUM?  What's the use-case for that?


Well ... *somebody* suggested it here ... 
http://archives.postgresql.org/pgsql-hackers/2005-11/msg00457.php


Well, in that usage (ie, for enum I/O functions) it's not actually
necessary that the type system as a whole understand ANYENUM as
"something that any enum type can be cast to", because you're going to
hot-wire the pg_type entries during CREATE ENUM anyway.


Well, it's not just I/O functions in pg_type, it's functions, operators, 
aggregates, index methods etc. There are 34 OIDs used up by the enum 
patch, and most of those catalog entries would have to be duplicated per 
enum type by CREATE TYPE in the absence of ANYENUM; since you'd given 
the hand-wavy suggestion anyway, it seemed better not to spam the catalogs.


Regarding the type system understanding ANYENUM, most of the type system 
treats ANYENUM identically to ANYELEMENT, the only parts that really 
need to understand it are the bits that try to tie down concrete types. 
For those, non-enum types are rejected if the generic type is ANYENUM. 
That's it, basically.



What I'm
wondering is if there's a use-case for it during ordinary user
operations with enums.


Not really. I allowed it to occur in plpgsql, mostly for completeness, 
but I didn't bother for the other P/Ls as there didn't seem to be much 
of a use case.


Cheers

Tom

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] "anyelement2" pseudotype

2007-02-14 Thread Tom Dunstan

Tom Lane wrote:

As for actually adding it, grep for all references to ANYELEMENT and add
code accordingly; shouldn't be that hard.  Note you'd need to add an
anyarray2 at the same time for things to keep working sanely.


The enum patch [1] does exactly this with an ANYENUM pseudo-type. It 
should provide a pretty good overview of what will be required.


Cheers

Tom

[1] http://archives.postgresql.org/pgsql-patches/2007-02/msg00239.php


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] effective_cache_size vs units

2006-12-20 Thread Tom Dunstan

Tom Lane wrote:


(Hmm, I wonder what Tom Dunstan's enum patch does about case
sensitivity...)


Currently enum labels are case sensitive. I was a bit ambivalent about 
it... case insensitivity can lead to fewer surprises in some cases, but 
many programming languages that have enums are case sensitive, and so 
this wouldn't be a direct map for them. OTOH, if someone's doing evil 
things like sticking labels that differ only in case into an enum, 
perhaps they *should* be dissuaded. :)


The question is where does it end, though? Should we treat letters with 
accents and umlauts as equivalent as well? Do we remove punctuation 
characters? It gets into a (for me) more murky localization issue, and 
I'm not familiar with the postgresql apis for handling that. Maybe it's 
easy.


Since we basically accept any old thing into an enum label, I think we 
probably shouldn't muck with it. If we want to have some sort of 
normalized version, then we should probably restrict the characters that 
we accept fairly severely.


Also note that enum values are far more likely to be set by application 
code than by a human typing the value in directly, so in that sense the 
need for case insensitivity seems somewhat diminished.


I suppose we should think about mysql refugees at some point, though. I 
wonder what they do. The documentation is silent on the matter (and all 
their examples are in lower case). Mysql is generally case insensitive, 
right?


Cheers

Tom

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-19 Thread Tom Dunstan

Peter Eisentraut wrote:
An objection to enums on the ground that foreign keys can accomplish the 
same thing could be extended to object to any data type with a finite 
domain.


Exactly. The extreme case is the boolean type, which could easily be 
represented by a two-value enum. Or, if you were feeling masochistic, a 
FK to a separate table. Which is easier?


People regularly do stuff like having domains over finite text values, 
or having a FK to a separate (static) table, or using some sort of EAV. 
Enums are type-safe, easily ordered, relatively efficient and don't 
leave zillions of little static tables all over the place, a combination 
of attributes that none of the alternative solutions in this space present.
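
Just to make the comparison concrete (enum syntax as per the patch,
give or take):

  -- enum: one type-safe, ordered type, no extra table
  CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
  CREATE TABLE person (name text, current_mood mood);

  -- the FK alternative: yet another little static table
  CREATE TABLE mood_values (mood text PRIMARY KEY);
  INSERT INTO mood_values VALUES ('sad'), ('ok'), ('happy');
  CREATE TABLE person2 (name text, current_mood text REFERENCES mood_values);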


Cheers

Tom


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-19 Thread Tom Dunstan

Alvaro Herrera wrote:

I don't, because there are always those that are knowledgeable enough to
know how to reduce space lost to padding.  So it would be nice to have
2-byte enums on-disk, and resolve them based on the column's typid.  But
then, I'm not familiar with the patch at all so I'm not sure if it's
possible.


Not with this patch, and AFAIK not possible generally, without writing 
separate I/O functions for each type. I'd love to be able to do that, 
but I don't think it's possible currently. The main stumbling block is 
the output function (and cast-to-text function), because output 
functions do not get provided the oid of the type that they're dealing 
with, for security reasons IIRC. It was never clear to me why I/O 
functions should ever be directly callable by a user (and hence open to 
security issues), but apparently it was enough to purge any that were 
designed like that from the system, so I wasn't going to go down that 
road with the patch.


Cheers

Tom



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Custom Data Type Question

2006-11-15 Thread Tom Dunstan

Hi Greg

Greg Mitchell wrote:
I'm trying to create a custom data type similar to an enumeration type. 
However, I'd like the mapping of the int<->string to be dynamic instead 
of hard coded. I'd like to have a table that contains this mapping that 
can be appended to.  Creating this type is not very difficult. However,
for performance reasons, I'd like to cache the mapping so that the table 
is only queried once every connection unless it changes.


A simpler way to do this might be to only cache the list per query 
context. In your IO functions, you could whack a pointer to your cache 
onto fcinfo->flinfo->fn_extra, and the same flinfo gets passed in for 
e.g. all output function calls for that column for that query, IIRC. 
This was what I had in mind originally when I did the enum patch, but I 
ended up just using syscaches, which I think would be unavailable to you 
writing a UDT.


The upside of the above is that for a given query, the contents of your 
 table shouldn't change, so there's no mucking about with trying to 
keep things in other backends up to date. The downside is that you have 
to do the lookup per query, but if you're dealing with lots of data then 
it'll get dwarfed by the actual query, and if not, who cares?


The other question that leaps to mind is whether you want to have more 
than one of these types. If you do, you may have to have multiple 
versions of the IO functions, otherwise e.g. your output function might 
be passed the value 0, but was that the 0 representing the 'red' string 
from the rgb enum, or the 'northern' string from the hemisphere enum? 
You don't know, and postgresql won't tell you directly.


There are a few ways around this. In your case, it might be ok to 
compile different versions of the IO functions for each enum which point 
to different tables, or the same table with a discriminator. Or you 
could see the various different proposals when my patch was first 
discussed. See the thread starting at 
http://archives.postgresql.org/pgsql-hackers/2006-08/msg00979.php or if 
you want a peek at the patch, see 
http://archives.postgresql.org/pgsql-patches/2006-09/msg0.php. A 
rather simpler starting point might be Andrew's enumkit 
http://www.oreillynet.com/pub/a/databases/2006/01/06/enumerated-fields-in-postgresql.html?page=last&x-showcontent=text, 
or possibly Martijn's tagged types at 
http://svana.org/kleptog/pgsql/taggedtypes.html.


Cheers

Tom


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Backup and restore through JDBC

2006-09-29 Thread Tom Dunstan

Markus Schaber wrote:

Marlon Petry wrote:

But I would need to have installed pg_dump and pg_restore in machine
client?
Without having installed pg_dump and pg_restore,how I could make


pg_dump and pg_restore should be runnable (possible with a small shell /
bash wrapper script) without any "installation", simply having them and
all neded libs lying in the current directory.


There's probably a case for having static builds of pg_dump and 
pg_restore around for various architectures, if only to help people out 
when they don't have access to a build environment etc. Either a set of 
static binaries on the website, or an easy way to build them from the 
source tree (they could then be copied to the target system).


It strikes me that Marlon hasn't really explained why he wants to use 
JDBC. I assume that your application is Java based, but trust me, 
invoking pg_dump through Runtime.exec() or whatever is going to be much, 
much easier than any of the other things you've suggested, such as 
making a pg_dump API and using JNI to call it. That's just pain city, in 
a bunch of ways.


Do you need to process the dump inside your program in some way? Or do 
you just need to store a dump and restore it later? Why the fascination 
with using an API?



On a unix box, when you're really crazy, and want to ignore all security
restrictions, you could even install pg_dump via inetd, and then
everyone connecting via TCP on the appropriate port gets a dump of the
database. :-)


Oh, man, my head just exploded reading that. That's taking evil and 
being *creative* with it. :)


Cheers

Tom


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] 8.3 Development Cycle

2006-09-22 Thread Tom Dunstan

Tom Lane wrote:

Tom Dunstan <[EMAIL PROTECTED]> writes:

Joshua's original mail suggested that only certain features would go in.
Is that still on the cards, or will other features be considered if
they're ready?


You'll note that Dave's mail said no such thing.


No, but it did explicitly mention features that just missed 8.2, so I 
just wanted some clarification, which you and Dave have now provided. 
Thanks.


Tom

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 8.3 Development Cycle

2006-09-22 Thread Tom Dunstan

Josh Berkus wrote:

I'm obviously thinking of enums which was ready (for review at least) a
few weeks ago, but has probably bitrotted slightly since then given the
number of patches that have landed in the tree. I intended to brush it
up as soon as the 8.3 tree was open and resubmit it. Will that be a
waste of time?


Ooops.   Are you sure these weren't committed?


Pretty sure. :) Why the oops? They haven't been mentioned in some PR 
material or something have they?


Cheers

Tom


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 8.3 Development Cycle

2006-09-22 Thread Tom Dunstan

Dave Page wrote:

This will obviously be a short development cycle which will allow us to
get some of the features that just missed 8.2 out of the door, as well
as giving us the opportunity to try releasing before the summer (for
those in the northern hemisphere) rather than after.


Joshua's original mail suggested that only certain features would go in.
Is that still on the cards, or will other features be considered if
they're ready?

I'm obviously thinking of enums which was ready (for review at least) a
few weeks ago, but has probably bitrotted slightly since then given the
number of patches that have landed in the tree. I intended to brush it
up as soon as the 8.3 tree was open and resubmit it. Will that be a
waste of time?

Thanks

Tom




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] Patch for UUID datatype (beta)

2006-09-20 Thread Tom Dunstan

[EMAIL PROTECTED] wrote:

Really this whole debate only reinforces the point that there isn't
a single way of doing UUID generation. There are multiple libraries
out there each with pros and cons. It makes more sense to have
multiple pgfoundry UUID generating modules.


Exactly. If I lead you to the impression that I want UUIDv1 in core, this
was not the intent. What I intend to say is that different people want
different implementations, and one of the most useful versions, in my
opinion, is difficult to implement portably.


Actually, you could do it very portably, at the cost of a minute or so's 
worth of configuration. Simply have a GUC variable called, say, 
uuid_mac_address. Then the person who gets a box of dud NICs or who, 
like me, has a virtual server somewhere without a true ethernet port 
visible to the operating system, can easily set it. No cross-platform 
code, no requirement to build a third party module in contrib (at least 
not for v1 uuids).
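
Something like this is all I mean - uuid_mac_address and new_v1_uuid()
are made-up names, just to show the shape of it:

  # postgresql.conf on the box with the dud or missing NIC:
  uuid_mac_address = '00:0d:93:4e:2a:f1'

  -- after which v1 generation just works:
  CREATE TABLE widgets (
    id   uuid DEFAULT new_v1_uuid() PRIMARY KEY,
    name text
  );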


I actually DO think that we should have at least one default generation 
routine in core, even if the above idea doesn't float and it's just v4 
random numbers. If we advertise that we have uuids, people will not 
expect to have to install a contrib module just to get some values 
generated. The SQL server function newsequentialid() which gives v1 
uuids, sort of, is ONLY available as a default value for a column, you 
can't use it in normal expressions (figure that out). So people clearly 
will expect to be able to generate these at the database level.


Using either v1s as configured above or v4s, there's no portability 
issue. Indeed MS SQL Server has a both available (newsequentialid() and 
newid()). And sufficient documentation should allow people to make their 
minds up regarding what their needs are. If they really want funky v3 
namespace ones then they can install a contrib, no problem with that.


Cheers

Tom

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Opinion wanted on UUID/GUID datatype output formats.

2006-09-20 Thread Tom Dunstan

devdb=# select * from tbluuid;
                pk                |
----------------------------------+
 6b13c5a1afb4dcf5ce8f8b4656b6c93c |
 01e40a79b55b6e226bffb577e960453d |
(2 rows)
The UUID standards define a single perfectly clear format, and the one 
you show is not it.



I was wondering if we want to have a formatting function to be able
to provide other common formats of the uuid/guid?
If you stick to the standard format, I don't think that will be 
necessary.


+1. For people that care about the non-standard MSSQL format, they can
easily create their own function that will wrap it in {}.


Having been reading through this thread, I was about to make the above 
points, but was glad to see that I was beaten to it.


The dashless format is neither standards compliant nor compatible with 
other databases that have uuid functions (notably MS SQL Server and 
MySQL), nor with microsoft tools where they're used frequently. 
(ignoring the {} wrapping stuff which is trivial).
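
For the record, taking one of the values quoted above, the formats in
question look like this:

  standard:         01e40a79-b55b-6e22-6bff-b577e960453d
  dashless:         01e40a79b55b6e226bffb577e960453d
  MS-style braces:  {01e40a79-b55b-6e22-6bff-b577e960453d}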


If we add a UUID type to core, I think that a vast majority of the 
people who are going to want to use it out there will be expecting the 
standard format with dashes. And asking them to put a formatting 
function into every query is beyond horrific.


If we want a general raw hex type then let's call it something else, 
because calling it UUID will just confuse people. Everyone else follows 
the standard on this; we should too.


Tom

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] An Idea for OID conflicts

2006-09-19 Thread Tom Dunstan
Whoops, I hadn't read this far when I sent the last message on the other 
thread.



Gevik Babakhani wrote:

3. Make a small utility that goes through a patch, finds the new OIDs
and changes them back to a value specified by the committer(s).


This is effectively what I ended up suggesting in the aforementioned 
post. There's no reason that the OIDs would have to start at 1, 
though, and that would probably necessitate a change in what I 
understand is the policy of starting a db cluster at OID 1. OTOH, 
while waiting for patch acceptance, there's probably benefit in using 
OIDs well above the current first free OID: you don't have to repeatedly 
run the script. If you started at 9000, say, you'd only have to run the 
update script when you were about to submit the patch. If you're right 
on the line, you'd have to run it every week or whatever.



Would this be workable?


Well, *I* think so, with the suggestions made above.

Andrew Dunstan wrote:
My idea was to have a file called reserved_oids.h which would contain 
lines like:


#error "do not include this file anywhere"
CATALOG(reserved_for_foo_module,9876) /* 2006-09-18 */

and which would be examined by the unused_oids script.

To get oids reserved, a committer would add lines to that file for you. 
When your patch was done, it would include stuff to remove those lines 
from the file.


That way you will be working with the oids your patch will eventually 
use - no later fixup necessary.


This sounds like a recipe for gaps in the OID list to me. If the patch 
gets rejected / dropped, you've got a gap. If the user doesn't use all 
the allocated OIDs, you've got a gap. What happens if the patch author 
decides that they need more OIDs? More time from a committer to reserve 
some. Plus it has the downside that the author has to ask for some to be 
reserved up front, they need to have a very good idea of how many 
they'll need, and a committer has to agree with them. If you'd asked me 
how many I thought I'd need when I started the enum patch, I would have 
told you a number much smaller than the number that I ended up using. :)


Of course, if you don't care about gaps, then half of the objections 
above go away, and the other half can be solved by reserving more than 
you'd need. Given the fairly contiguous nature of OIDs in the catalogs 
currently, it would appear that we do care about gaps, though.


I like the script idea much better. It wouldn't be bad to even allow 
patches to be submitted with OIDs in the high 9000 or whatever range. 
The committer responsible for committing the patch could just run the 
update script before committing the code.


Cheers

Tom

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] OID conflicts

2006-09-19 Thread Tom Dunstan

Andrew Dunstan wrote:
You misunderstood me. I meant he wanted to make use of such a facility, 
not that he wanted to build it.


Yeah, I had enough things on my plate just getting enums to work :)

By all means float ideas on this - the problem seems to me to be 
ensuring that reservations are time limited, but no doubt Tom Lane and 
Bruce will have opinions ...


OK, one way for such a scheme to work (ie so that patch authors wouldn't 
have to worry about OIDs disappearing out from under them) would be to 
be able to specify OIDs in a slightly more symbolic form. All the OIDs 
allocated for a given patch are likely to be in a block, so you could do 
something like (FEATURE_NAME+0), (FEATURENAME+1) or something like that, 
 rather than hardcoding 2989 or whatever, and hope that the precompiler 
could sort it all out. Then you could just have some featureoid.h file 
which would define the starting points, and you could just update that 
file when someone allocates some oids to point to the latest free one.


I don't know whether there are enough patches which allocate OIDs for 
it to be worth building such infrastructure, though. It wouldn't just be 
the first line in catalog entries, it would have to be all over the 
place, e.g. pg_proc has a column whose value in the catalog file is a 
string with the oid numbers of the parameters that the function takes. I 
suspect, reluctantly, that the pain of patch authors maintaining these 
is likely to be less than the pain of hacking the catalog generation 
stuff to support something smarter. I'd love to be proven wrong, though.


Hmm. A simpler way lessen the pain might be to have a script which could 
update OIDs in your catalog files. You'd have to run it BEFORE doing a 
merge. Ie, suppose I've allocated 2978-2991 in my patch, but since then 
OIDs have been allocated up to and including 2980. The way to tell that 
this is the case would be to do a clean checkout and run the unused_oids 
script. Then you could feed the numbers  2978, 2991 and 2980 into the 
magical update_oids script which could do a fairly simple sed type of 
job on the catalog files. Then you could CVS update and look on in 
happiness when duplicate_oids returns nothing and unused_oids show no 
gaps. Obviously, being the good patch author that you are, you would 
then run your extensive regression tests to make sure that everything is 
sweet. :)


If there's interest in such a thing, I might have a go at it. If it 
works as described above, would it have a fighting chance of inclusion? 
I'd probably use perl rather than pure bourne shell / sed.


Cheers

Tom



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] An Idea for OID conflicts

2006-09-18 Thread Tom Dunstan

Tom Lane wrote:

The scary thing about a script is the assumption that it will make all
and only the changes needed.  Four-digit magic numbers are not that
uncommon in C code.  I think it might be safer if we made the arbitrary
OID range for an uncommitted patch be large, say eight digits (maybe
"12345xxx").  The script would knock these down to 4-digit numbers,
ie removing one tab stop, so it wouldn't be too hard on your formatting.
Given the current OID generation mechanism, the presence of large OIDs
in the initial catalogs shouldn't be a problem for testing patches,
even assuming that the OID counter gets that high in your test database.


Well, the only files that a script would touch would be in 
src/include/catalog, since any other reference to them should be through 
a #define anyway IMO. And I figured that the 9000 range was as good a 
magic number as any, at least in that directory. The nice thing about 
9000 numbers is that they're still under the 10000 magic starting point 
for initdb, so you're guaranteed not to run into that range when 
inserting data while testing your patch. I'm a little shady on how OID 
uniqueness works though, so maybe it's not a problem. Are OIDs 
guaranteed to be unique across a whole cluster, or just in a given 
relation (including wraparound scenarios)?


The other point about the script scariness is that obviously you'd have 
to a) pass make check including whatever tests test out your patch (and 
there should be some if you've added a new proc or type or whatever), 
and b) the patch would have to survive review, where OID weirdness 
should leap out when the patch is viewed stripped of all the surrounding 
catalog noise. Maybe. :)


Anyway if having OIDs up above 10000 isn't a problem, then it doesn't 
really matter either way, so having them stand out by being longer seems 
just as good to me as my suggestion.


Thanks

Tom



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] An Idea for OID conflicts

2006-09-18 Thread Tom Dunstan

Gregory Stark wrote:

Those types, functions and operators that aren't used by system tables could
be created by a simple SQL script instead. It's a hell of a lot easier to
write a CREATE OPERATOR CLASS call than to get all the OIDs in in four
different include files to line up properly.


No kidding.

Just FYI that wouldn't have worked for the enums patch, though, because 
of the pseudo anyenum type. That stuff really did need to be in the 
backend. For more common user defined types like uuid that are being 
discussed, it might work well. Heck, a bunch of the existing casts etc 
could probably be changed to SQL, and would become a great deal more 
readable in the process. Not that I'm advocating fixing a non-broken 
thing... :)


Cheers

Tom

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] OID conflicts

2006-09-18 Thread Tom Dunstan

Martijn van Oosterhout wrote:


Seems awfully complicated. The numbers don't mean anything, they don't
have to be contiguous. If you want to reduce the chance of conflict,
find a nice big block in unused_oids, add a random number between 0 and
100 and use that. Or squeeze yourself into a block that exactly fits
what you need. There's simply not that many patches that need numbers
to worry about anything complicated.

When I needed a few dozen OIDs for an (unapplied) patch I simply picked
2900 and went up from there. That was a while ago and there's no
conflict yet.


Well, since there were no gaps for a significant way up to the last used 
OID (I forget where the last gap was), I figured that there was a no-gap 
policy. If that's not the case, then indeed, writing a script might be 
overkill.


OTOH, I would think that the last allocated OID will continue to rise, 
and eventually your patch will have a conflict. Or alternately it'll get 
applied before then, and someone else will have to worry about whether 
the gap between the end of the contiguous OIDs and where your patch 
starts allocating them is enough for their needs, or whether they should 
stake out some other part of the reserved OID space for themselves. And 
of course, someone else might submit a patch in the meantime that uses 
OIDs in the gap or stakes out the same space. Any of the above cases, 
having a tool to remap the OIDs might be useful.


I suppose the main point is that given the hardcoded nature of OIDs used 
in the catalogs, allocating them is not a process that lends itself to 
parallel development. An easy way to remap OIDs in a patch sidesteps a 
bunch of issues.


Cheers

Tom

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] An Idea for OID conflicts

2006-09-18 Thread Tom Dunstan

Whoops, I hadn't read this far when I sent the last message on the other
thread.


Gevik Babakhani wrote:

3. Make a small utility that goes through a patch, finds the new OIDs
and changes them back to a value specified by the committer(s).


This is effectively what I ended up suggesting in the aforementioned
post. There's no reason that the OIDs would have to start at 1,
though, and that would probably necessitate a change in what I
understand is the policy of starting a db cluster at OID 1. OTOH,
while waiting for patch acceptance, there's probably benefit in using
OIDs well above the current first free OID: you don't have to repeatedly
run the script. If you started at 9000, say, you'd only have to run the
update script when you were about to submit the patch. If you're right
on the line, you'd have to run it every week or whatever.


Would this be workable?


Well, *I* think so, with the suggestions made above.

Andrew Dunstan wrote:
My idea was to have a file called reserved_oids.h which would contain 
lines like:


#error "do not include this file anywhere"
CATALOG(reserved_for_foo_module,9876) /* 2006-09-18 */

and which would be examined by the unused_oids script.

To get oids reserved, a committer would add lines to that file for you. 
When your patch was done, it would include stuff to remove those lines 
from the file.


That way you will be working with the oids your patch will eventually 
use - no later fixup necessary.


This sounds like a recipe for gaps in the OID list to me. If the patch
gets rejected / dropped, you've got a gap. If the user doesn't use all
the allocated OIDs, you've got a gap. What happens if the patch author
decides that they need more OIDs? More time from a committer to reserve
some. Plus it has the downside that the author has to ask for some to be
reserved up front, they need to have a very good idea of how many
they'll need, and a committer has to agree with them. If you'd asked me
how many I thought I'd need when I started the enum patch, I would have
told you a number much smaller than the number that I ended up using. :)

Of course, if you don't care about gaps, then half of the objections
above go away, and the other half can be solved by reserving more than
you'd need. Given the fairly contiguous nature of OIDs in the catalogs
currently, it would appear that we do care about gaps, though.

I like the script idea much better. It wouldn't be bad to even allow
patches to be submitted with OIDs up in the 9000s or whatever range. The
committer responsible for the patch could just run the update script
before committing the code.
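
To make that concrete, the update script could be as dumb as the sketch
below. It's untested, assumes GNU sed, and assumes the author used a
contiguous staging block (9000 and up, say); the script name and the
numbers are invented:

#!/bin/sh
# remap_oids.sh - sketch only, no such tool exists in the tree
# usage: remap_oids.sh patchfile first_staging_oid first_assigned_oid count
# Rewrites the staging OIDs onto the assigned block, touching only added
# ("+") lines of the unified diff so context lines and hunk headers are
# left alone.
patch=$1; staging=$2; assigned=$3; count=$4
out=$patch.remapped
cp "$patch" "$out"
i=0
while [ "$i" -lt "$count" ]; do
    old=$((staging + i))
    new=$((assigned + i))
    # \b is a GNU sed word boundary, so 9000 doesn't also rewrite 19000
    sed -i "/^+[^+]/ s/\b${old}\b/${new}/g" "$out"
    i=$((i + 1))
done
echo "wrote $out"

Running "remap_oids.sh enums-v1.patch 9000 2990 25" (again, invented
numbers) and then eyeballing the diff between the two files would be the
whole workflow.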

Cheers

Tom




Re: [HACKERS] Mid cycle release?

2006-09-16 Thread Tom Dunstan

Joshua D. Drake wrote:
> O.k. that was negative, sorry. Frankly I think that turning autovacuum
> on by default pretty much equates to, "I am lazy, and I don't want to
> actually evaluate my needs. Lets just go with MS Access"
>
> Please ignore my negativity today. I apologize. I do not want autovacuum
> turned on by default but it isn't that big of a deal.


Dammit, I was halfway through a brilliant rebuttal! I'm going to post it 
anyway, since I think it's important to discuss the issues if we're 
going to make the right call. Your repented negativity is noted, though. :)


I can definitely see where you're coming from, it's a sort of tough-love 
scenario. There are legitimate counter arguments, though. The most 
obvious is that anyone who *does* evaluate their needs properly 
shouldn't have too much trouble turning it off, whereas there are lots 
of small database users out there who find having to set up a vacuum 
cron a pain. Example: I'm in the process of setting up a Typo blog,
using postgresql of course, but the database setup was secondary to the 
main thing that I was doing, and I'd completely forgotten about setting 
up a cron. Now I'm unlikely to produce blog posts at a rate that will 
cause the database to grow out of the "minuscule" range, but it should 
still be done, right?
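
(Not that it's much work once you remember: a one-line cron entry roughly
like the sketch below, with an invented schedule, covers the simple cases.)

# /etc/cron.d/postgresql - sketch: nightly vacuum and analyze of every db
30 3 * * *   postgres   vacuumdb --all --analyze --quiet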


I have to ask, what's wrong with lazy users? Software which allows you 
to be lazy gives you a warm tingly feeling, and you install it on your 
intranet server when no-one's looking. We want people to think of 
postgresql that way.


There are lots of MySQL specific pieces of software out there that 
started out as some guy/girl with a PHP and MySQL type of book. We can't 
turn that clock back, but making postgresql easier for the masses has to 
be a good thing for its adoption. The native win32 port is the poster 
child for this. It was a big PR win, no?


I would argue that leaving autovacuum off is only justifiable if we feel 
that it's going to be a bad choice for the majority of users. Many of 
the users who frequent postgresql lists understand the trade-off, but 
the ones that we're trying to attract don't. Is it better for them to 
discover manual vacuums when they're trying to incrementally improve 
performance (with the risk that they never discover them at all), or 
when their database is running like a dog because they've never vacuumed 
it at all?


One solution might be to turn it on in turn-key solutions: linux distro 
RPMs, Win32 installer (is it on there already?) etc, but leave it turned 
off in the source release. Would that help you, or are your clients 
using RPMs or whatever?
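
For the packagers that would mostly mean flipping a couple of settings in
the shipped postgresql.conf. IIRC, for 8.1/8.2 the sketch would look
something like this, since autovacuum needs row-level stats collection:

# postgresql.conf fragment a turn-key package might ship
stats_start_collector = on
stats_row_level = on
autovacuum = on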


Cheers

Tom




Re: [HACKERS] Mid cycle release?

2006-09-14 Thread Tom Dunstan

Joshua D. Drake wrote:
> No one would expect Oracle to install Oracle and walk away. We are not
> MySQL, nor MS Access.


And thank goodness for that!

Tom




Re: [HACKERS] Getting a move on for 8.2 beta

2006-09-13 Thread Tom Dunstan

Jeremy Drake wrote:
> On Wed, 13 Sep 2006, Tom Dunstan wrote:
>
>> I was under the impression that most VM products are x86 centric, which
>> wouldn't lead to huge amounts of diversity in the buildfarm results. At
>> least, not as far as architecture goes.
>
> I have played with QEmu (www.qemu.org) which is open source and supports
> multiple target architectures.  I'm not sure how stable all of the
> different targets are, I know that sparc64 is not quite done yet.


Oh, I didn't realize Qemu did non-x86 architectures. Is it considered
good enough at emulating e.g. a SPARC for it to be useful to us? PearPC
was a PowerPC emulator that got some press a while ago, although it
appears that the project has stagnated a bit (probably because people
who wanted to run OS X on Intel hardware have a legit way to do it now :) )


The problem with these things is that if something goes wrong, you can't
tell whether it was the patch that failed or the not-quite-perfect VM
product. To cut down on those sorts of problems, I suppose we could have
it do a clean, non-patched run first, and then only do the patched version
if the clean version passed. We'd have to be reasonably unlucky to have a
patch trigger a VM bug under those circumstances, I would think.
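
A wrapper as dumb as the following would do it. This is untested and
assumes the --patch option from my hacked buildfarm client mentioned
elsewhere in this thread; the patch path is just the earlier example:

#!/bin/sh
# Only test the patch if an unpatched run of the same member passes, so an
# emulator or environment problem shows up against clean sources first.
./run_build.pl --nosend --nostatus --verbose || exit 1
exec ./run_build.pl --nosend --nostatus --verbose \
  --patch=/home/tom/src/enums-v1.patch --patch-level=1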


Cheers

Tom



Re: [HACKERS] Getting a move on for 8.2 beta

2006-09-13 Thread Tom Dunstan

Jim Nasby wrote:
> That's something I'd be willing to do. And for many people that aren't
> committers but are still trusted in the community, we could probably
> bypass the checking.


That's a worthwhile point. How many patches come from known community
members versus out of the blue? Patches from regulars could probably get
a free pass, which might cut down the review burden substantially.


> Another possibility would be to test these patches in some kind of
> virtual machine that gets blown away every X days, so that even if
> someone did get something malicious in there it wouldn't last long.


Yeah, nasties could be roughly separated into two categories: stuff 
which affects your box, or stuff which uses your box to affect someone 
else. A VM fixes the first, and a firewall blocking outgoing connections
(with exceptions for the CVS server and the patch-buildfarm server, or
whatever it ends up being) largely fixes the second.
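
The firewall side is only a handful of rules. A sketch in iptables terms,
with the host names standing in as placeholders for whatever the real CVS
and results servers turn out to be:

# drop all outbound traffic except loopback, replies, DNS and the two hosts
iptables -P OUTPUT DROP
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT
iptables -A OUTPUT -d anoncvs.postgresql.org -j ACCEPT
iptables -A OUTPUT -d buildfarm.postgresql.org -j ACCEPT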


I was under the impression that most VM products are x86 centric, which 
wouldn't lead to huge amounts of diversity in the buildfarm results. At 
least, not as far as architecture goes.


Cheers

Tom




Re: [HACKERS] Getting a move on for 8.2 beta

2006-09-13 Thread Tom Dunstan

Jim C. Nasby wrote:
> There's been talk in the past of having some kind of system that
> automatically attempts to build things that are in the patch queue, both
> as an initial sanity-check and as a means to detect when something
> bit-rots... perhaps it's becoming worthwhile to set that up.


After writing the enum patch, I hacked the buildfarm client code to 
apply a patch to the checked out code before building. You could then 
run it thusly:


./run_build.pl --nosend --nostatus --verbose \
  --patch=/home/tom/src/enums-v1.patch --patch-level=1

The idea was that patch authors could either run it manually or stick it 
in a cron so they could get emailed when the patch no longer cleanly 
applied, or when the patched source failed in make, make check etc. 
Obviously my motivation was to keep the enum patch up to date until we 
hit 8.3 and someone looks at it. To that end it might also be useful for 
it to die if duplicate_oids finds anything.


I submitted a patch to Andrew, but it needed a couple of tweaks 
(disabling patching on vpath builds, for example) and I don't think I 
ever got around to resubmitting it, but if there's more general interest 
I'll do so.


Note that it was intended for patch authors to run themselves rather 
than any kind of central mechanism to test the patch queue. While it 
would obviously be nice to know what the current status of any given 
patch in the queue is, the thing about the patch queue is that it 
contains patches that we haven't had time to review yet. It'll only take
one patch containing a security vulnerability, or worse, a trojan, getting
into the queue for the whole idea to seem unfortunate.


I had thoughts of hacking the buildfarm server to allow the posting of a 
patch along with results, so that authors could report results for their 
own patches, but ran out of time. Is there interest in doing that? 
Obviously it'd be a different server to the existing buildfarm.


Cheers

Tom


