Re: [GENERAL] Large PostgreSQL servers

2012-03-22 Thread Merlin Moncure
On Wed, Mar 21, 2012 at 2:31 PM, Kjetil Nygård  wrote:
> Hi,
>
> We are considering to migrate some of our databases to PostgreSQL.
>
> We wonder if someone could give some hardware / configuration specs for
> large PostgreSQL installations.
> We're interested in:
>        - Number of CPUs
>        - Memory on the server
>        - shared_buffers
>        - Size of the database on disk
>
>
>
> PS: I have read in "PostgreSQL 9.0 High Performance" that one should not
> use more than 8GB for shared_buffers. But Robert Haas and comments say
> that one can use a lot more.
> http://rhaas.blogspot.com/2012/03/tuning-sharedbuffers-and-walbuffers.html

If your database (or at least, the portion of it that sees regular
activity) fits completely in shared_buffers, it's a win: shared buffers
are faster than the o/s filesystem cache and pages don't have to get
paged in and out.  OTOH, if your database does not fit, you can get
performance issues related to pages getting pushed in and out.
Another disadvantage of a large shared_buffers setting is that it
reduces the amount of memory available for other things, like temporary
demands (sorts, large result sets) or cached structures like plpgsql
plans.  Once you put more than 50% of memory into shared buffers, it's
pretty easy to overcommit your server and burn yourself.  Of course,
50% of a 256GB server is a very different animal than 50% of a 4GB server.

Here's the takeaway for shared_buffers.
*) it's a nuanced setting.  for single-user workloads its effects are
usually undetectable
*) it's more important for high-write-activity workloads.  for low-user,
high-read OLAP-type workloads, I usually set it lower, perhaps
even to 256MB -- it doesn't help all that much and I'd rather have
that memory be on demand for the o/s
*) don't be afraid to buck the conventional wisdom if you're not
seeing the performance you think you should be getting (especially on
writes).  higher or lower shared_buffers can work
*) lots of other variables are at play -- o/s page flush policy for example.
*) it's unclear right now what the upcoming revolution in faster
storage means for database configuration and tuning.  my gut feeling
is that it's going to be generally less important as databases become
primarily CPU, lock, and algorithm (query plan) bound.
*) beware memory overcommit.
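
If you want to check rather than guess, the pg_buffercache contrib
module will show what's actually occupying shared_buffers.  Something
like this (just a sketch; it assumes the module is installed and the
default 8KB block size):

create extension pg_buffercache;

select c.relname,
       count(*) * 8 / 1024 as buffered_mb
from pg_buffercache b
join pg_class c on b.relfilenode = c.relfilenode
where b.reldatabase in (0, (select oid from pg_database
                            where datname = current_database()))
group by c.relname
order by 2 desc
limit 10;

If the hot relations fit comfortably, you probably don't need a bigger
setting.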

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Altering column type from text to bytea

2012-03-21 Thread Merlin Moncure
On Wed, Mar 21, 2012 at 2:56 PM, Alexander Reichstadt  wrote:

> Hi,
>
> when trying to change a text column to bytea I am getting the following
> error:
>
> *SQL error:*
>
> ERROR:  column "comment" cannot be cast to type bytea
>
>
> *In statement:*
> ALTER TABLE "public"."persons" ALTER COLUMN "comment" TYPE bytea
>

Try ALTER TABLE ... ALTER COLUMN ... TYPE ... USING, with either a
direct cast or a conversion function such as convert_to() or decode().
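
For example, something along these lines (a sketch only; it assumes the
column holds text you want as UTF8 bytes -- adjust the target encoding
to taste):

ALTER TABLE "public"."persons"
  ALTER COLUMN "comment" TYPE bytea
  USING convert_to("comment", 'UTF8');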

Please don't post emails in HTML format, thanks!

merlin


Re: [GENERAL] POSTGRESQL Newbie

2012-03-21 Thread Merlin Moncure
On Wed, Mar 21, 2012 at 2:04 AM, Geek Matter  wrote:
> scott,
>
> thanks for quick response you mean all  the dowmain .info and .org domains
> are using postgresql?
> my background from SQL server which has powerful graphical tools for data
> modeling, replication, and etc.
> how about postgresql? does it has free graphical tools for modeling,
> replication ?

The biggest difference you'll find between postgres and a product like
sql server is that postgres is less tool driven (although there are
many great tools) but provides a richer programming environment.  If
database development is your objective as opposed to high level design
and administration, you'll find postgres to be the best product in the
market.  The price is right and the community support is fantastic.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [BUGS] [GENERAL] Altering a table with a rowtype column

2012-03-20 Thread Merlin Moncure
On Tue, Mar 20, 2012 at 12:16 PM, Robert Haas  wrote:
> I think Tom's correct about what the right behavior would be if
> composite types supported defaults, but they don't, never have, and
> maybe never will.  I had a previous argument about this with Tom, and
> lost, though I am not sure that anyone other than Tom thinks that the
> current behavior is for the best.  But see commits
> a06e41deebdf74b8b5109329dc75b2e9d9057962 and
> a40b1e0bf32b1da46c1baa9bc7da87f207cd37d8.

I'll go further than that -- given the current infrastructure I'd say
that composite type defaults, quite apart from not being implemented,
are not very well defined or useful. The way things work now:

create type foo as(a int, b int);
create table bar(f foo default row(1,2));

works perfectly ok.  how would you proxy the default from one of those
two columns?  does it make sense to do so?  defaults are applied to
table columns, not to types (you could argue that domains violate that
rule but IMO it's not the same thing).

type constraints are another matter.  this would be useful and
valuable but may end up being impossible to add for a lot of reasons,
such as backwards compatibility and the standard's lack of any way
(implemented nowhere in postgres except for the very special case of
IS NULL) to distinguish between the type itself being null and its
fields being null (which makes type constraints smack into plpgsql
variable declarations).
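
To illustrate that last point with the foo type from above:

select row(null, null)::foo is null;   -- true: all fields are null
select null::foo is null;              -- also true: the composite itself is null

IS NULL can't tell those two cases apart.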

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] nice'ing the postgres COPY backend process to make pg_dumps run more "softly"

2012-03-19 Thread Merlin Moncure
On Mon, Mar 19, 2012 at 3:51 PM, Guillaume Lelarge
 wrote:
> On Sun, 2012-03-18 at 21:06 -0700, Aleksey Tsalolikhin wrote:
>> Hi.  When pg_dump runs, our application becomes inoperative (too
>> slow).  I was going to ask if nice'ing the postgres backend process
>> that handles the COPY would help but I just realized probably the
>> pg_dump takes out locks when it runs and nice'ing it would just make
>> it run longer...
>>
>> However the man page says "pg_dump does not block other users
>> accessing the database  (readers  or writers)."  But if we run a
>> pg_dump, the phone starts ringing, users are complaining that the web
>> app is not working.
>>
>> Would appreciate some pointer to help me reconcile these two
>> apparently contradictory facts.
>>
>
> Depends on what your app is doing. It doesn't block any usual use of the
> database: DML are all accepted. But you cannot drop a table that pg_dump
> must save, you cannot change its definition. So there are some DDL
> commands you cannot use during a dump.
>
> Other than this, your users shouldn't be blocked. If it happens again,
> you should look at pg_locks and pg_stat_activity to understand what's
> going on.

This.  pg_dump essentially runs a read-only, database-wide transaction
that touches all objects.   This will allow all updates, selects,
deletes, etc., but will block, say, a CREATE INDEX or ALTER TABLE.  Now
suppose some other transaction fires off one of those commands and has
in turn already touched a customer record... bam.  Classic
priority inversion -- slowing down pg_dump will only make the problem
worse.

So the very first thing to determine is if you are seeing generalized
load issues (we don't have the info for that) or locking issues
because the solution will be entirely different depending on what
you're seeing.  Most likely case is you're having iowait issues and
moving the dump to another machine will fix the problem.   Next most
likely case is cpu (courtesy zlib) -- also easily fixed.  But you have
to know before changing things.  So:

What's general load (from top) during pg_dump "bad" times?
pg_locks, etc as Guillaume asked
iowait

etc.
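
For the pg_locks side of it, something like this (a rough sketch using
9.1 column names; it doesn't match every lock field, but it's usually
enough to see who is stuck behind whom):

select bl.pid            as blocked_pid,
       wa.current_query  as blocked_query,
       hl.pid            as holding_pid,
       ha.current_query  as holding_query
from pg_locks bl
join pg_stat_activity wa on wa.procpid = bl.pid
join pg_locks hl on hl.locktype = bl.locktype
                and hl.database is not distinct from bl.database
                and hl.relation is not distinct from bl.relation
                and hl.granted
                and hl.pid <> bl.pid
join pg_stat_activity ha on ha.procpid = hl.pid
where not bl.granted;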

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY and indices?

2012-03-14 Thread Merlin Moncure
2012/3/14 François Beausoleil :
> Le mardi 13 mars 2012 à 11:15, Merlin Moncure a écrit :
>
>> 2012/3/13 François Beausoleil > (mailto:franc...@teksol.info)>:
>> >
>> > I'll go with the COPY, since I can live with the batched requirements just 
>> > fine.
>>
>> 30-40 'in transaction' i/o bound inserts is so slow as to not really
>> be believable unless each record is around 1 megabyte because being in
>> transaction removes storage latency from the equation. Even on a
>> crappy VM. As a point of comparison my sata workstation drive can do
>> in the 10s of thousands. How many records are you inserting per
>> transaction?
>>
>
>
> I took the time to gather statistics about the database server: 
> https://gist.github.com/07bbf8a5b05b1c37a7f2
>
> The files are a series of roughly 30 second samples, while the system is 
> under production usage. When I quoted 30-40 transactions per second, I was 
> actually referring to the number of messages processed from my message queue. 
> Going by the PostgreSQL numbers, xact_commit tells me I manage 288 commits 
> per second. It's much better than I anticipated.
>
> Anyways, if anybody has comments on how I could increase throughput, I'd 
> appreciate. My message queues are almost always backed up by 1M messages, and 
> it's at least partially related to PostgreSQL: if the DB can write faster, I 
> can manage my backlog better.
>
> I'm still planning on going with batch processing, but I need to do something 
> ASAP to give me just a bit more throughput.

well your iowait numbers are through the roof which makes things
pretty simple from a diagnosis point of view: your storage is
overloaded.  the only remedies are to try and make your queries more
efficient so that you are doing less writing, better use of
transactions, etc.  but looking at the log it appears the low hanging
fruit is already grabbed (synchronous_commit=off, etc).  so you have
to choose from a list of not very pleasant options:

*) fsync=off
*) tune the application
*) bring more/faster storage online.  a single ssd would probably make
your problem disappear.  in the vm world, hopefully you can at least
bring another volume online and move your wal to that.
*) HARDWARE.

In the entirety of my career, I have never found anything more
perplexing than the general reluctance to upgrade hardware to solve
hardware-related performance bottlenecks.   Virtualization is great
technology but in my experience is nowhere near good enough to handle
high-transaction-rate database servers.  A $5,000 server will solve your
issue, and you'll spend that much in two days of scratching your head
trying to figure out the problem (irritating your customers all the while).

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY and indices?

2012-03-13 Thread Merlin Moncure
2012/3/13 François Beausoleil :
>
>
> Le mardi 13 mars 2012 à 10:48, Merlin Moncure a écrit :
>
>> 2012/3/12 François Beausoleil > (mailto:franc...@teksol.info)>:
>> > Currently, I can sustain 30-40 writes per second on a Rackspace VPS. I 
>> > know it's not the ideal solution, but that's what I'm working with. 
>> > Following vmstat, the server is spending 30 to 40% of it's time in iowait. 
>> > I don't have measurements as to what files are touched, and I'd welcome 
>> > suggestions to measure the time PostgreSQL actually spends writing indices 
>> > vs data.
>> > >
>> >
>> >
>>
>>
>> you're almost certainly blocking on fsync. A real quick'n'dirty way
>> to confirm this (although it wont be as fast as COPY) would be to wrap
>> your inserts in a transaction. VMs tend to have really horrible
>> storage latency which can hurt postgres performance. Another option
>> would be to relax your commit policy (for example by flipping
>> synchronous_commit) if that fits within your safety requirements.
>>
>
>
> I already applied the tricks you have here: I have a transaction, and 
> synchronous_commit is off. I also have checkpoint_segments set to 96, and 10 
> minutes.
>
> I'll go with the COPY, since I can live with the batched requirements just 
> fine.

30-40 'in transaction' i/o bound inserts is so slow as to not really
be believable unless each record is around 1 megabyte because being in
transaction removes storage latency from the equation.  Even on a
crappy VM.  As a point of comparison my sata workstation drive can do
in the 10s of thousands.  How many records are you inserting per
transaction?

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY and indices?

2012-03-13 Thread Merlin Moncure
2012/3/12 François Beausoleil :
> Hi all,
>
> When using COPY FROM STDIN to stream thousands of rows (20k and more hourly), 
> what happens with indices? Are they updated only once after the operation, or 
> are they updated once per row? Note that I'm not replacing the table's data: 
> I'm appending to what's already there. I suspect batching writes will be 
> faster than writing each individual row using an INSERT statement.
>
> Currently, I can sustain 30-40 writes per second on a Rackspace VPS. I know 
> it's not the ideal solution, but that's what I'm working with. Following 
> vmstat, the server is spending 30 to 40% of it's time in iowait. I don't have 
> measurements as to what files are touched, and I'd welcome suggestions to 
> measure the time PostgreSQL actually spends writing indices vs data.

you're almost certainly blocking on fsync.  A real quick'n'dirty way
to confirm this (although it wont be as fast as COPY) would be to wrap
your inserts in a transaction.  VMs tend to have really horrible
storage latency which can hurt postgres performance.  Another option
would be to relax your commit policy (for example by flipping
synchronous_commit) if that fits within your safety requirements.
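
As a trivial sketch of what I mean (table and columns made up):

begin;
insert into measurements (ts, value) values (now(), 1);
insert into measurements (ts, value) values (now(), 2);
-- ... many more rows ...
commit;

One fsync per COMMIT instead of one per row is where the win comes from.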

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Call for Google Summer of Code (GSoC) 2012: Project ideas?

2012-03-09 Thread Merlin Moncure
On Fri, Mar 9, 2012 at 10:19 AM, Andy Colson  wrote:
> On 3/9/2012 9:47 AM, Merlin Moncure wrote:
>>
>> On Thu, Mar 8, 2012 at 2:01 PM, Andy Colson  wrote:
>>>
>>> I know toast compresses, but I believe its only one row.  page level
>>> would
>>> compress better because there is more data, and it would also decrease
>>> the
>>> amount of IO, so it might speed up disk access.
>>
>>
>> er, but when data is toasted it's spanning pages.  page level
>> compression is a super complicated problem.
>>
>> something that is maybe more attainable on the compression side of
>> things is a userland api for compression -- like pgcrypto is for
>> encryption.  even if it didn't make it into core, it could live on
>> reasonably as a pgfoundry project.
>>
>> merlin
>
>
> Agreed its probably too difficult for a GSoC project.  But userland api
> would still be row level, which, in my opinion is useless.  Consider rows
> from my apache log that I'm dumping to database:

It's useless for what you're trying to do, but it would be useful to
people trying to compress large datums (data, I  know) before storage
using algorithms that postgres can't support, like lzo.

> date, url, status
> 2012-3-9 10:15:00, '/index.php?id=4', 202
> 2012-3-9 10:15:01, '/index.php?id=5', 202
> 2012-3-9 10:15:02, '/index.php?id=6', 202
>
> That wont compress at all on a row level.  But it'll compress 99% on a
> "larger" (page/multirow/whatever/?) level.

sure, but you can only get those rates by giving up the segmented view
of the data that postgres requires.  your tuples are very small and I
only see compression happening on the userland side by employing
tricks specific to your dataset (like using "char" to map the status,
mapping urls, etc).

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Call for Google Summer of Code (GSoC) 2012: Project ideas?

2012-03-09 Thread Merlin Moncure
On Thu, Mar 8, 2012 at 2:01 PM, Andy Colson  wrote:
> I know toast compresses, but I believe its only one row.  page level would
> compress better because there is more data, and it would also decrease the
> amount of IO, so it might speed up disk access.

er, but when data is toasted it's spanning pages.  page level
compression is a super complicated problem.

something that is maybe more attainable on the compression side of
things is a userland api for compression -- like pgcrypto is for
encryption.  even if it didn't make it into core, it could live on
reasonably as a pgfoundry project.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [BUGS] [GENERAL] Altering a table with a rowtype column

2012-03-08 Thread Merlin Moncure
On Wed, Mar 7, 2012 at 2:49 PM, Merlin Moncure  wrote:
> On a practical level, the error blocks nothing -- you can bypass it
> trivially.   It's just an annoyance that prevents things that users
> would like to be able to do with table row types.  So I'd argue to
> remove the check, although I can kinda see the argument that it's not
> a bug unless the check was recently introduced so that it broke older
> code.

The behavior hasn't changed since at least as far back as 8.1, so
you're correct (once again) -- not a bug.  I'm really surprised I
haven't already bumped into this.  I usually don't mix
tables-as-storage with tables-as-composites though.

Mike, on 9.1, you'll probably get more mileage out of using the hstore
type for row storage if you want to do auditing in that style.
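
Something in this direction, for example (a sketch only -- table and
trigger names are made up, and it assumes the hstore extension is
installed):

create extension hstore;   -- if not already installed

create table a_audit_hs (
  id       serial primary key,
  changed  timestamptz not null default now(),
  old_row  hstore,
  new_row  hstore
);

create or replace function a_audit_trg() returns trigger as
$$
begin
  insert into a_audit_hs (old_row, new_row) values (hstore(OLD), hstore(NEW));
  return NEW;
end;
$$ language plpgsql;

create trigger a_audit_upd after update on a
  for each row execute procedure a_audit_trg();

This sidesteps the rowtype dependency problem entirely, since the audit
table no longer references a's row type.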

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Synchronous replication + Fusion-io = waste of money OR significant performance boost? (compared to normal SATA-based SSD-disks)?

2012-03-08 Thread Merlin Moncure
On Wed, Mar 7, 2012 at 2:54 AM, Joel Jacobson  wrote:
> My company is in the process of migrating to a new pair of servers, running 
> 9.1.
>
> The database performance monetary transactions, we require
> synchronous_commit on for all transactions.
>
> Fusion-io is being considered, but will it give any significant
> performance gain compared to normal SATA-based SSD-disks, due to the
> fact we must replicate synchronously?
>
> To make it more complicated, what about SLC vs MLC (for synchronous
> replication)?
>
> Assume optimal conditions, both servers have less than a meter between
> each other, with the best possible network link between them providing
> the lowest latency possible, maxed out RAM, maxed out CPUs, etc.
>
> I've already asked this question to one of the core members, but the
> answer was basically "you will have to test", I was therefore hoping
> someone in the community already had some test results to avoid
> wasting money.
>
> Thank you for any advice!

flash, just like hard drives, has some odd physical characteristics
that impose performance constraints, especially when writing, and
doubly so when MLC flash is used.  modern flash drives employ
non-volatile buffers to work around these constraints, and they work
pretty well *most* of the time.  since MLC is much cheaper,
improvements in flash controller technology are basically pushing SLC
out of the market except in high-end applications.

if you need zero-latency storage all the time and are willing to spend
the extra bucks, then pci-e based SLC is definitely worth looking at
(you'll have another product to evaluate soon when the intel 720
ramsdale hits the market).  a decent MLC drive might work for you
though; I'd suggest testing there first and upgrading to the expensive
proprietary stuff if and only if you really need it.

my experience with flash and postgres is that even with low-mid range
drives like the intel 320 it's quite a challenge to make postgres be
i/o bound.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Advisory transaction lock for 128-bit space

2012-03-08 Thread Merlin Moncure
On Thu, Mar 8, 2012 at 2:05 AM, Kiriakos Georgiou
 wrote:
> Indeed, if there is not some sort of implementation limitation, it would be
> cool to be able to lock two big integers like so:
>
>     pg_try_advisory_xact_lock(key1 bigint, key2 bigint)

Well, this would require expanding the structure that holds the
in-memory lock.  This is not free, since it's also used by the
database for internal lock tables.  I would advise trying to work
under the constraints of the current system.

If you want a database-wide advisory lock for rows, probably the best
bet is to make a sequence that is shared by all tables that want to
participate in advisory locking.  This is simple and works very well
regardless of how your keys are defined (uuid, natural, etc).  It's a
good use for a domain:

create sequence lockid_seq;
create domain lockid_t bigint default nextval('lockid_seq');
alter table foo add column lockid lockid_t;

etc.  You'll never exhaust a 64 bit sequence.  In fact, you can
reserve a few bits off the top in case you want to do some other
advisory locking for different reasons.  A bit hacky maybe, but it
works quite well.
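
For example, to take the transaction-scoped lock on a particular row
(using the foo table from above; the WHERE clause is just illustrative):

select pg_try_advisory_xact_lock(lockid) from foo where id = 42;

It returns true if the lock was acquired, false if someone else holds it.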

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [BUGS] [GENERAL] Altering a table with a rowtype column

2012-03-07 Thread Merlin Moncure
On Wed, Mar 7, 2012 at 2:31 PM, Tom Lane  wrote:
> Merlin Moncure  writes:
>> On Wed, Mar 7, 2012 at 11:45 AM, Mike Blackwell  
>> wrote:
>>> alter table a add column even_more_stuff boolean not null default false;
>
>> aha! that's not what you posted last time.  you appended 'not null
>> default false'; which inexplicably breaks the ALTER.
>
>> try this:
>> ALTER TABLE a ADD COLUMN even_more_stuff text not null;
>> ALTER TABLE a ALTER even_more_stuff set default false;
>> ALTER TABLE a DROP COLUMN even_more_stuff;
>> ALTER TABLE a ADD COLUMN even_more_stuff boolean not null default false;
>
>> (this really looks like a bug in postgres, cc-ing to bugs)
>
> It is not a bug.  The ALTER ADD ... DEFAULT ... form implies rewriting
> every existing tuple of the rowtype to insert a non-null value in the
> added column, and we don't have support for doing that to rowtype
> columns, only to the target table and descendants.

I'm not buying that... it implies no such thing.  In particular, for
table-as-rowtype columns, there's no way that I can see to have
default values be generated.  So why does it follow that the dependent
table has to be rewritten?  Column constraints are not enforced on the
rowtype, so it follows that default shouldn't be either considering
there's no way to get the default to fire.  Composite type (or table
based composite) defaults are applied to the composite as a whole, not
to specific fields.

On a practical level, the error blocks nothing -- you can bypass it
trivially.   It's just an annoyance that prevents things that users
would like to be able to do with table row types.  So I'd argue to
remove the check, although I can kinda see the argument that it's not
a bug unless the check was recently introduced so that it broke older
code.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Altering a table with a rowtype column

2012-03-07 Thread Merlin Moncure
On Wed, Mar 7, 2012 at 1:17 PM, Mike Blackwell  wrote:
> As a followup, the workaround fails if there is data in the source table due
> to the initial null value placed in the existing data rows.
>
> [wcs1459@aclnx-cisp01 ~]$ psql --port=5433 -e -f x
> begin;
> BEGIN
> create table a (
>   id serial,
>   stuff text,
>   more_stuff text
> );
> psql:x:6: NOTICE:  CREATE TABLE will create implicit sequence "a_id_seq" for
> ser
>     ial column "a.id"
> CREATE TABLE
> create table a_audit (
>   id serial,
>   a_old a,
>   a_new a
> );
> psql:x:12: NOTICE:  CREATE TABLE will create implicit sequence
> "a_audit_id_seq"
>                  for serial column "a_audit.id"
> CREATE TABLE
> insert into a (stuff, more_stuff) values ('some', 'thing');
> INSERT 0 1
> ALTER TABLE a ADD COLUMN even_more_stuff boolean not null;
> psql:x:17: ERROR:  column "even_more_stuff" contains null values
> ALTER TABLE a ALTER even_more_stuff set default false;
> psql:x:18: ERROR:  current transaction is aborted, commands ignored until
> end of
>        transaction block
> ALTER TABLE a DROP COLUMN even_more_stuff;
> psql:x:19: ERROR:  current transaction is aborted, commands ignored until
> end of
>        transaction block
> ALTER TABLE a ADD COLUMN even_more_stuff boolean not null default false;
> psql:x:20: ERROR:  current transaction is aborted, commands ignored until
> end of
>        transaction block
> rollback;
> ROLLBACK

yup (please respond to the list) -- you can work around the workaround
by UPDATEing the table to set the field before applying the not null
bit.  Note that if you did this, the other table that uses the rowtype
would see the new column as all nulls.
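
Concretely, something like this (sketch):

ALTER TABLE a ADD COLUMN even_more_stuff boolean;
UPDATE a SET even_more_stuff = false;
ALTER TABLE a ALTER COLUMN even_more_stuff SET NOT NULL;
ALTER TABLE a ALTER COLUMN even_more_stuff SET DEFAULT false;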

IMO, the server is being too strict on the dependency check.  Perhaps
there are some defenses here that are an early form of trying to get
field constraints to pass through to the dependent column, or it's just
a plain old bug.  I took a quick look at tablecmds.c to see if I could
find an easy fix, but it wasn't clear why the default was forcing a
dependency error and I punted.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Converting stored procedures from SQL Anywhere to PostGres.

2012-03-07 Thread Merlin Moncure
On Wed, Mar 7, 2012 at 12:51 PM,   wrote:
> I am trying to convert stored procedures from SQL Anywhere to Postgres.
> I am getting error
>
> ERROR:  syntax error at or near "return"
> LINE 2: return integer AS
>        ^
>
>
> ** Error **
>
> ERROR: syntax error at or near "return"
> SQL state: 42601
> Character: 81
>
> when I run this.  I'm not sure what the problem is here.  Any help would
> be appreciated.
>
>
> CREATE OR REPLACE FUNCTION "iss"."ConfigAddKey" (in
> isscontrib.citext,pkeyname)
> return integer AS
> $BODY$
> begin
>
> declare l:retval integer;
> declare l:id integer;
>
> if exists(select id into l:id from iss.configkeys where keyname =
> pkeyname) then
>  l:retval := l:id
> else
>  insert into "iss".configkeys(keyname) values (pKeyname);
> end if;
> return retval
>
> end;
> $BODY$
> LANGUAGE 'plpgsql';

Well, the languages are obviously not compatible.  You're going to
have to convert them all, and to do that you're going to need a decent
understanding of both languages.  I doubt there are any automatic
tools to do it.

start here:
http://www.postgresql.org/docs/current/static/plpgsql.html
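
For what it's worth, a rough plpgsql translation of the function above
might look something like this (untested sketch -- it assumes
configkeys.id is what you want returned, which seems to be the
original's intent):

CREATE OR REPLACE FUNCTION iss."ConfigAddKey"(pkeyname isscontrib.citext)
RETURNS integer AS
$BODY$
DECLARE
  l_id integer;
BEGIN
  SELECT id INTO l_id FROM iss.configkeys WHERE keyname = pkeyname;
  IF NOT FOUND THEN
    INSERT INTO iss.configkeys (keyname) VALUES (pkeyname)
      RETURNING id INTO l_id;
  END IF;
  RETURN l_id;
END;
$BODY$
LANGUAGE plpgsql;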

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] corrupted table postgresql 8.3

2012-03-07 Thread Merlin Moncure
On Tue, Mar 6, 2012 at 6:34 PM, Matteo Sgalaberni  wrote:
> - Original Message -
>> On 6.3.2012 21:24, Matteo Sgalaberni wrote:
>> > Hi people!
>> >
>> > I have a pg 8.3. Today I issued in a database that comand:
>>
>> Which minor version? The last one in this branch is 8.3.18 and if
>> you're
>> running an old one, there might be an important bugfix ...
>
> 8.3.0, I read quickly all releases notes of 8.3.x and I didn't saw any fix 
> that could be directly related to what is happened to me...there are issues 
> related to ALTER TABLE but with other options like PRIMARY KEY or other 
> parameters.
> Are there fixes that could be related to what's happened to me?

hm:

* Also, a VACUUM FULL that failed partway through vacuuming a system
catalog could result in cache corruption in concurrent database
sessions.

* Fix btree index corruption from insertions concurrent with vacuuming
(Tom Lane)

* These bugs could result in index corruption after reindexing a
system catalog. They are not believed to affect user indexes.

I mostly agree, but it's possible you got hit by a fixed bug.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Altering a table with a rowtype column

2012-03-07 Thread Merlin Moncure
On Wed, Mar 7, 2012 at 11:45 AM, Mike Blackwell  wrote:
>
> works for me -- what version are you on?
>
> merlin
>
> --
>
> [wcs1459@aclnx-cisp01 ~]$ psql --version
> psql (PostgreSQL) 9.1.1
> contains support for command-line editing
>
>
> [wcs1459@aclnx-cisp01 ~]$ cat x
> create table a (
>   id serial,
>   stuff text,
>   more_stuff text
> );
>
> create table a_audit (
>   id serial,
>   a_old a,
>   a_new a
> );
>
> alter table a add column even_more_stuff boolean not null default false;
>
>
> [wcs1459@aclnx-cisp01 ~]$ psql -f x
> psql:x:5: NOTICE:  CREATE TABLE will create implicit sequence "a_id_seq" for
> serial column "a.id"
> CREATE TABLE
> psql:x:11: NOTICE:  CREATE TABLE will create implicit sequence
> "a_audit_id_seq" for serial column "a_audit.id"
> CREATE TABLE
> psql:x:13: ERROR:  cannot alter table "a" because column "a_audit.a_new"
> uses its row type

aha! that's not what you posted last time.  you appended 'not null
default false', which inexplicably breaks the ALTER.

try this:
ALTER TABLE a ADD COLUMN even_more_stuff text not null;
ALTER TABLE a ALTER even_more_stuff set default false;
ALTER TABLE a DROP COLUMN even_more_stuff;
ALTER TABLE a ADD COLUMN even_more_stuff boolean not null default false;

(this really looks like a bug in postgres, cc-ing to bugs)

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Altering a table with a rowtype column

2012-03-07 Thread Merlin Moncure
On Wed, Mar 7, 2012 at 11:22 AM, Mike Blackwell  wrote:
> Given a pair of tables:
>
> create table a (
>   id serial,
>   stuff text,
>   more_stuff text
> );
>
> create table a_audit (
>   id serial,
>   old_record a,
>   new_record a
> );
>
> How can one alter the structure of table a?  Attempting
>
> ALTER TABLE a ADD COLUMN even_more_stuff text;
>
> results in the message:
>
>  ERROR:  cannot alter table "a" because column "a_audit.new_record" uses its
> row type
>
> A solution that doesn't lose the existing data is preferable.

works for me -- what version are you on?

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Memory usage and configuration settings

2012-03-05 Thread Merlin Moncure
On Mon, Mar 5, 2012 at 6:37 AM, Mike C  wrote:
> Hi,
>
> I have been using table 17-2, Postgres Shared Memory Usage
> (http://www.postgresql.org/docs/9.1/interactive/kernel-resources.html)
> to calculate approximately how much memory the server will use. I'm
> using Postgres 9.1 on a Linux 2.6 (RHEL 6) 64bit system, with 8GB RAM.
> Database is approximately 5GB, and is a mixture of read/write.
> Postgres is occasionally being killed by the linux oom-killer. I am
> trying to understand how much memory postgres could use, and how to
> change the configuration to bring it down to a level that won't get it
> killed.
>
> Key configuration values are:
>
> max_connections = 350
> shared_buffers = 4GB
> temp_buffers = 24MB
> max_prepared_transactions = 211
> work_mem = 16MB
> maintenance_work_mem = 131MB
> wal_buffers = -1
> wal_keep_segments = 128
> checkpoint_segments = 64
> effective_cache_size = 4GB
> autovacuum_max_workers = 4
>
> which I have interpreted to be:
>
> max_locks_per_transaction = 64
> max_connections = 350
> autovacuum_max_workers =4
> max_prepared_transactions = 211 (I've since realised this can be 0; I
> use prepared statements, not 2PC)
> shared_buffers = 4294967296
> wal_block_size = 8192
> wal_buffers = 16777216 (actually, -1, but following the documentation
> of max(16MB, shared_buffers/32) it should be 16MB).
> and wal segment size = 16777216, block_size = 8192
>
> And using the equations on the kernel resources page, I get:
>
> Connections = 6,678,000
>                     = (1800 + 270 * max_locks_per_transaction) *
> max_connections
>                     = (1800 + 270 * 64) * 350
> Autovacuum Workers = 76,320
>                                    = (1800 + 270 *
> max_locks_per_transaction) * autovacuum_max_workers
>                                    = (1800 + 270 * 64) * 4
> Prepared Transactions = 3,808,550
>                                      = (770 + 270 *
> max_locks_per_transaction) * max_prepared_transactions
>                                      = (770 + 270 * 64) * 211
> Shared Disk Buffers      = 36,077,725,286,400
>                                 = (block_size + 208) * shared_buffers
>                                 = (8192 + 208) * 4294967296
>                                 = ~33TB
> WAL Buffers = 137,573,171,200
>                     = (wal_block_size + 8) * wal_buffers
>                     = (8192 + 8) * 16777216
>                     = ~128GB
> Fixed Space Requirements = 788,480
> Overall = 36,215,309,808,950 bytes (~33.2 TB!)
>
> 33.2TB doesn't seem right, and while I know the equations are just
> approximations, this seems too much. What have I done wrong? I read a
> prior thread about this on the pgsql lists which seemed to indicate
> the equations for shared disk and wall buffers should be divided by
> the block_size 8192, and looking at it closer, wonder if the equation
> for both should just be overhead + buffer?
>
> Also what is the relationship between memory and work_mem (set to 16M
> in my case). I understand work_mem is per sort, and in most cases our
> queries only have a single sort. Does this mean an additional 16M per
> sorting client (350 * 16M = 5.6GB), or presumably it only uses the
> work memory as it needs it (i.e. does it preallocate all 16M for each
> sort, or on an as-needed basis depending on the size of sorted data?)
>
> Are there any other ways to calculate the worst case memory usage of a
> given postgres configuration?
>
> My gut feeling is to reduce shared_buffer to 1GB or less and reduce
> connections to ~150-200 (to reduce worst case work_mem impact).

One easy thing to neglect when doing memory counting is backend
private memory.  Each postgres process typically eats around 1mb and
this will grow if your processes are long-lived as the backend starts
to cache various structures.  As a rule of thumb I tend to use 4mb per
backend (you can confirm this yourself by subtracting SHR from RES).
In absolutely pathological cases (like heavy plpgsql backends with a
lot of tables and views) it can be worse.  4mb * 350 = 1.4gb...so
you're cutting things fairly close.

Looking at your postgresql.conf, your memory setting for
shared_buffers is more aggressive than the often-suggested 25% rule,
but I bet it's the backend memory that's pushing you over the edge.

Rather than reducing the connection limit directly, I'd consider
(carefully) using pgbouncer to reduce the overall connection count.  Or
you can reduce shared_buffers -- in your case I'd probably cut it to
1GB if it were me.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Return unknown resultset from a function

2012-03-05 Thread Merlin Moncure
On Mon, Mar 5, 2012 at 10:08 AM, Tom Lane  wrote:
> Merlin Moncure  writes:
>> The only exception to this rule is cursors.  Reading from cursors via
>> FETCH allows you to pull data from a refcursor that was set up in a
>> previous function call and works pretty well, but comes with the giant
>> downside that the results can be directed only to the client.
>
> Hmm, couldn't you do a FETCH into a record variable in plpgsql?  Not
> that you'd not have problems manipulating the record variable, since
> plpgsql is pretty strongly typed itself.

Yeah -- good point on both sides -- you can do it, but it's pretty
limiting: you can only fetch a row at a time and the result data can't
be further expressed in another query.  A CTE based FETCH has been
suggested a couple of times as a hypothetical workaround.

Whether the data is processed on the server or the client, the result
is essentially the same: you're forced into a highly
iterative method of programming that I try to avoid whenever possible.

TBH though I find the textual workarounds to the type system to work
pretty well, meaning that most of the useful things which were
historically only possible in C have been nicely wrapped; what remains
seems to be just plain impossible anyway (like handling mixed-type
variadic functions, or receiving generic RECORDs as arguments).

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Return unknown resultset from a function

2012-03-05 Thread Merlin Moncure
On Sun, Mar 4, 2012 at 1:52 PM, Jan Meyland Andersen  wrote:
> How do I return an unknown resultset from a function
>
> My main problem is that I do not know how many columns or the data type of
> the columns before runtime.
> It this possible at all?
>
> I also tried to return the data as a text array but I also have trouble with
> that.

The basic deal with postgres is that while it's legal to return
untyped RECORDs from functions, only C functions can do that. Also,
when calling C functions you still have to decorate the returned
record with types when the query is called.  For a couple of good
examples of that, see (\d+) the pg_locks view, which wraps the
pg_lock_status() record-returning function, or check out dblink, which
makes heavy use of record-returning functions.

The only exception to this rule is cursors.  Reading from cursors via
FETCH allows you to pull data from a refcursor that was set up in a
previous function call and works pretty well, but comes with the giant
downside that the results can be directed only to the client.

For pure server-side manipulation of untyped structures you have to
flatten everything to text.  You can do it yourself:

CREATE OR REPLACE FUNCTION get_records(table_name TEXT) RETURNS SETOF TEXT AS
$$
DECLARE
  query TEXT;
BEGIN
  query = format('SELECT %s::text FROM %s', table_name, table_name);

  RETURN QUERY EXECUTE query;
END;
$$ LANGUAGE PLPGSQL STABLE;

select get_records('foo');

Once you have the record in text representation you can throw it
around until it has to get casted back to 'foo' record type:

select (get_records('foo')::foo).*;

You can also do lots of wonderful things with the hstore type, or the
non plpgsql server-side languages (which basically flatten everything
to text).

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: [PERFORM] Disable-spinlocks while compiling postgres 9.1 for ARM Cortex A8

2012-02-27 Thread Merlin Moncure
On Sun, Feb 26, 2012 at 6:16 AM, Jayashankar K B
 wrote:
> Ok. I did a manual patch and it Postgres 9.1.1 compiled for me without using 
> the --disable-spinlocks option.
> Thanks a lot for the patch. :)
> By the way, could you please point me to the explanation on the significance 
> of spinlocks for Postgres?

spinlocks are used all over the place to synchronize access to shared
data structures (see here: http://en.wikipedia.org/wiki/Spinlock also
see here: http://rhaas.blogspot.com/2011/01/locking-in-postgresql.html).
 you can awkwardly implement them in high level languages like C but
typically there are hardware primitives that are much faster and
better to use.

very generally speaking, spinlocks are better than semaphores when
the lock duration is very short, contention isn't terrible, and the
time taken to acquire the lock matters.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: [PERFORM] Disable-spinlocks while compiling postgres 9.1 for ARM Cortex A8

2012-02-23 Thread Merlin Moncure
On Thu, Feb 23, 2012 at 3:09 PM, Jayashankar K B
 wrote:
> Hi Tom,
>
> Sorry about the cross-post.
> I am not aware of the procedures for patch etc.
> Could you please tell me how to use the patch ?

see general instructions here:
http://jungels.net/articles/diff-patch-ten-minutes.html

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] select as params to function

2012-02-17 Thread Merlin Moncure
On Fri, Feb 17, 2012 at 2:32 PM, Andy Colson  wrote:
> On 2/17/2012 2:11 PM, Pavel Stehule wrote:
>>
>> Hello
>>
>> PostgreSQL has no table data type. You can use a array of records instead
>>
>> and then
>>
>> select fx((select array(select * from tab)));

select fx(array(select t from tab t));

fx takes tab[]

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to dereference 2 dimensional array?

2012-02-16 Thread Merlin Moncure
On Thu, Feb 16, 2012 at 9:48 AM, ChoonSoo Park  wrote:
> I would like to construct hstore array from 2 dimensional array.
>
>
> For example,
>
>
> '{{f1, 1, f2, 123, f3, ABC, ...}, {f1, 2, f2, 345, f3, DEF, ...}}'
>
>
> Should be converted to
>
>
> 2 hstore values
>
> "f1"=>"1", "f2"=>"123", "f3"=>"ABC", ...
>
> "f2"=>"2", "f2"=>"345", "f3"=>"DEF", ...
>
>
> create or replace function convertHStore(p1 text[][]) RETURNS hstore[]
>
>
> hstore function requires text[] to convert array to hstore. Therefore I
> should be able to dereference 2 dimensional array element.
>
> Inside this custom plpgsql function, p1[1] is not valid syntax to
> dereference the 1st element in p1.
>
>
> Anyone knows how to solve this problem?

This is a good use of the 9.1 SLICE feature:

CREATE FUNCTION slice_hstore(text[]) RETURNS SETOF hstore AS $$
DECLARE
  x text[];
BEGIN
  FOREACH x SLICE 1 IN ARRAY $1
  LOOP
return next hstore(x);
  END LOOP;
END;
$$ LANGUAGE plpgsql;

select slice_hstore('{{g1, 1, f2, 123, f3, ABC}, {f1, 2, f2, 345, f3,
DEF}}'::text[]);

postgres=# select slice_hstore('{{g1, 1, f2, 123, f3, ABC}, {f1, 2,
f2, 345, f3, DEF}}'::text[]);
slice_hstore
-
 "f2"=>"123", "f3"=>"ABC", "g1"=>"1"
 "f1"=>"2", "f2"=>"345", "f3"=>"DEF"
(2 rows)

postgres=# select array(select slice_hstore('{{g1, 1, f2, 123, f3,
ABC}, {f1, 2, f2, 345, f3, DEF}}'::text[]));
   ?column?
---
 {"\"f2\"=>\"123\", \"f3\"=>\"ABC\", \"g1\"=>\"1\"","\"f1\"=>\"2\",
\"f2\"=>\"345\", \"f3\"=>\"DEF\""}
(1 row)


merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] " " around fields with psql

2012-02-13 Thread Merlin Moncure
On Sat, Feb 11, 2012 at 2:03 AM, Jasen Betts  wrote:
> On 2012-02-10, Steve Clark  wrote:
>
>> Is there a way with psql to get column output to be
>> "data1","data2",...,"datan"
>
> assuming you are trying to be compatible with CSV:
>
>  copy ( your_query_here ) to stdout with csv header ;

yeah -- that's the best way if you want actual csv,  from psql you'd
probably want to do \copy:
postgres=# \copy (select 1, '"', 'ab,c') to stdout csv header;
?column?,?column?,?column?
1,,"ab,c"

note that per csv rules columns are only quoted when needed to
protect against ambiguous parsing.  also, double quotes in your field
will be escaped.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] easy function or trigger to UPPER() all alpha data

2012-02-13 Thread Merlin Moncure
On Mon, Feb 13, 2012 at 11:42 AM,  wrote:
>
> Thank you very much.  This is most helpful.
>

you're welcome.  Keep in mind hstore features you need start with
postgres 9.0 and it's an extension aka contrib you have to add to the
database.  (also as Andreas noted, please try to keep responses
on-list).

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] easy function or trigger to UPPER() all alpha data

2012-02-13 Thread Merlin Moncure
On Wed, Feb 8, 2012 at 10:51 AM, Andreas Kretschmer
 wrote:
> mgo...@isstrucksoftware.net  wrote:
>
>> We need to ensure that our data is in upper case only in the db.  Is there a
>> easy way to do this via a function without having to name each column
>> separately?
>
> You can define a TRIGGER for such tasks (befor insert or update), but
> you have to name each column (maybe not within triggers written in
> pl/perl, i'm not sure ...)

you can skirt the restriction with some hstore (ab)use...

create or replace function all_upper() returns trigger as
$$
begin
  new := populate_record(new,
           (select hstore(array_agg(key), array_agg(upper(value)))
              from each(hstore(new))));
  return new;
end;
$$ language plpgsql;

create trigger on_foo_insert before insert on foo
  for each row execute procedure all_upper();

postgres=# insert into foo values (1, 'abc', 'def');
INSERT 0 1
Time: 3.388 ms

postgres=# select * from foo;
 a |  b  |  c
---+-+-
 1 | ABC | DEF
 (1 row)

of course, if some non-text datatype is sensitive to case in its
textual formatting, this might break.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] " " around fields with psql

2012-02-10 Thread Merlin Moncure
On Fri, Feb 10, 2012 at 1:33 PM, Steve Clark  wrote:
> On 02/10/2012 02:12 PM, Scott Marlowe wrote:
>
> On Fri, Feb 10, 2012 at 11:26 AM, Steve Clark  wrote:
>
> Hello,
>
> Is there a way with psql to get column output to be
> "data1","data2",...,"datan"
>
> I tried -F "," but that left off the first and last quote.
>
> I can't seem to find a way in the man page.
>
> Well, you can do it yourself kinda like this:
>
> select '""||field1||'", "||field2||'" from sometable where yada.
>
> Ok that will work

for 9.1+ you can use built in format() function for a lot of fields:
select format('"%s", "%s", "%s", "%s"', procpid, usename, waiting,
query_start) from pg_stat_activity;

also with recent postgres you can use hstore to convert virtually any
query as such:
select '"' || array_to_string(avals(hstore(a)), '", "') || '"' from
pg_stat_activity a;

postgres=# select '"' || array_to_string(avals(hstore(q)), '", "') ||
'"' from (select 1 as a,2 as b,3 as c) q;
   ?column?
---
 "1", "2", "3"

etc.
merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Merlin Moncure
On Thu, Feb 9, 2012 at 4:20 PM, Andy Colson  wrote:
> On 2/9/2012 4:10 PM, David Salisbury wrote:
>>
>>
>>
>> On 2/9/12 10:08 AM, Rich Shepard wrote:
>>>
>>> I have reports containing macroinvertebrate collection data for several
>>> hundred (or several thousand) of taxa. There is no natural key since
>>> there
>>> are multiple rows for each site/date pair. Years ago Joe Celko taught
>>> me to
>>> seek natural keys whenever they might exist. They don't here. That's
>>> why I
>>> specifically mentioned that in my message.
>>
>>
>>
>> Interesting. I used to think natural keys were okay, but have since
>> decided
>> that surrogates are the way to go. That second layer of abstraction allows
>> for much easier data modifications when needed. What would be an example
>> of a natural key that would be good to use, and why would it be
>> preferable??
>>
>> I'd think the key value must never change, and even say kingdom values in
>> a
>> taxa table could possibly change.. might discover something new and do a
>> little reordering. :) Also natural keys might be strings, which I'm
>> thinking
>> would not be as efficient as integers for an index.
>>
>> -ds
>>
>
>
> Yeah, this is a Vim vs Emacs war.  (Vim, :-) )
>
> I prefer surrogates like you.  Its way to easy to pick something that one
> day has to change.
>
> Within the last year I remember a long thread about this same thing.

Sure, you can use surrogates, but you should still define or at least
be aware of a natural key if there is one.  If you can't (which happens
with various types of data), then the surrogate is giving the illusion
of row uniqueness when there isn't any.   This is really a design error:
other keys could depend on this table's primary key, which is a
provably ambiguous relationship.   Since your rows are not
informationally distinct from each other, why do you need to be able
to point at a specific one?

natural/surrogate is a performance/usability debate with various
tradeoffs.  but using a surrogate to 'create' uniqueness is a logical
design error; maybe a very forgivable one for various reasons, but the
point stands.
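
For example (hypothetical names), you can keep the surrogate for
convenience while still declaring the natural key when one exists:

create table samples (
  sample_id   serial primary key,
  site_id     integer not null,
  sampled_on  date not null,
  taxon       text not null,
  unique (site_id, sampled_on, taxon)
);

The serial column makes joins easy; the unique constraint documents and
enforces what actually makes a row a row.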

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Merlin Moncure
On Thu, Feb 9, 2012 at 10:49 AM, Rich Shepard  wrote:
>  I have a lot of data currently in .pdf files. I can extract the relevant
> data to plain text and format it to create a large text file of "INSERT INTO
> ..." rows. I need a unique ID for each row and there are no columns that
> would make a natural key so the serial data type would be appropriate.

The record should be logically unique as well as physically unique (or
if it isn't, why bother making a unique constraint at all?).
Sometimes you *have* to force a surrogate, for example if certain
(broken) client tools need a primary key to work, but aside from that
you shouldn't rely on a surrogate to generate uniqueness.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to write in Postgres

2012-02-07 Thread Merlin Moncure
On Tue, Feb 7, 2012 at 8:03 AM,  wrote:
>
>  I have a SQL statement that I'm trying to convert from a SQL Anywhere 
> function, but I'm getting a error.  Can't seem to figure out why.  Is the 
> substring usage incorrect and if it is how do I accomplish this.  I'm trying 
> to step through the item 1 character at a time so that I can reformat the 
> input correctly.
>
> This is inside a WHILE loop (While idx < 11 loop. idx has been initialized to 
> 1 and is declared as a integer.
> set chr = substr(lfeid,idx,1);

in plpgsql:

WHILE idx < 11
LOOP
  chr := substr(lfeid,idx,1);
END LOOP;

http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html

I could probably help you work it into a single query if you provided
some more detail.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Puzzling full database lock

2012-02-06 Thread Merlin Moncure
On Fri, Feb 3, 2012 at 2:55 PM, Christopher Opena  wrote:
> Merlin, thanks for the response.

no problem.  if you're open to architecture suggestions you might also
want to consider going with HS/SR and getting those large olap queries
off your main database.  you'll have to configure it to be very
forgiving of long running queries (which can pause replication) but
this should be a huge win in terms of total load on the main server.
also, you can skimp on hardware to a certain degree to cut costs.  we
still want to get to the bottom of this obviously but it can't hurt to
get a plan 'B' going...

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] multiple parameters to an AGGREGATE function

2012-02-03 Thread Merlin Moncure
On Fri, Feb 3, 2012 at 11:27 AM, Graham  wrote:
> Hi,
>   first time poster here ...
>
> I'm trying to write a simple Aggregate function which returns the nth
> element in a collection - ultimately I want to find 95th, 90th percentiles
> and so on.
>
> It'd be called like:
>
> select nth_element( value, 95 ) from something group by ...
>
> I'm basing this on an example I found on the Wiki:
>
> http://wiki.postgresql.org/wiki/Aggregate_Mode
>
> So, I have:
>
> CREATE OR REPLACE FUNCTION _final_nth( anyarray, integer )
>  RETURNS anyelement AS
> $BODY$
>    SELECT a
>    FROM unnest( $1 ) a
>    ORDER BY a
>    offset $2
>    LIMIT 1;
> $BODY$
> LANGUAGE 'sql' IMMUTABLE;
>
> CREATE AGGREGATE nth_element( anyelement, integer ) (
>  SFUNC=array_append,
>  STYPE=anyarray,
>  FINALFUNC=_final_nth,
>  INITCOND='{}'
> );
>
> Where the 2nd parameter would be, say 5 for the 5th element an so on.
>
> The function declaration seems fine. But the CREATE AGGREGATE declaration
> fails with:
>
> ERROR:  function array_append(anyarray, anyelement, integer) does not exist
>
> so, I suppose it's decided to call array_append with all the parameters on
> the command line, rather than just the array in the 1st element. Is there
> any way to stop it doing this?

I'm too busy to check, but it's probably calling your function at the
end with the last element scanned.  Just create it the way it wants
and log the arguments -- it should be usable.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Puzzling full database lock

2012-02-02 Thread Merlin Moncure
On Wed, Feb 1, 2012 at 6:38 PM, Christopher Opena  wrote:
> Hello folks,
>
> We've been running into some very strange issues of late with our PostgreSQL
> database(s).  We have an issue where a couple of queries push high CPU on a
> few of our processors and the entire database locks (reads, writes, console
> cannot be achieved unless the high CPU query procs are killed).  Further
> investigation shows ~59% total cpu usage (we have 16 total cores), low io,
> and mid-to-low memory usage (we have 74GB of memory, shared_buffers=16GB).
>  We had previously seen some high io problems but those turned out to be
> unconnected and ultimately solved, yet we are still seeing a complete lock
> of the DB occasionally as previously described.
>
> The queries themselves are not any different than normal usage on other
> databases; they are pulling back a little more data but there's nothing that
> stands out about them as far as query construction.
>
> One thing that we aren't sure of is whether or not we are running into a
> general connection pooling issue.  Our typical number of postgresql
> processes fluctuates between 1,400 and 1,600 - most of which are idle - as
> we have a number of application servers all connecting to a central
> read/write master (the master replicates out to a secondary via streaming
> replication).  We have max_processes set to 3,000 after tweaking some kernel
> memory parameters so at least we know we aren't exceeding that, but is there
> a practical "real world" limit or issue with setting this too high?
>
> Ultimately, the problem we're seeing is a full read/write lock on a system
> that is apparently at medium usage levels once we got rid of our high io red
> herring.  Honestly I'm a little stumped as to where to look next; is there
> some specific metric I might be missing here?
>
> Any help is greatly appreciated,

Random thoughts/suggestions:
*) Look for some correlation between non-idle process count and the
locking situation.  You are running a lot of processes and if I were in
your shoes I would be looking hard at pgbouncer to handle connection
pooling (a minimal config sketch follows this list).  You could be
hitting contention in the database or (worse) the kernel.

*) Try logging checkpoints to see if there is any correlation with your locks.

*) An strace of both the 'high cpu' process and one of the blocked
process might give some clues -- in particular if you are being
blocked on a system call

*) Given enough time, do your high cpu queries ever complete? Are they
writing or reading?

*) What happens to overall system load if you lower shared_buffers to, say, 4gb?
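
As a rough illustration of the checkpoint-logging and pooling points
above (not from the original exchange -- the database name and pool
numbers here are made up):

# postgresql.conf -- log each checkpoint so stalls can be correlated with them
log_checkpoints = on

# pgbouncer.ini -- funnel ~1,500 client connections into a small server pool
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction
default_pool_size = 32
max_client_conn = 2000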

merlin



Re: [GENERAL] Issue with CREATE EXTENSION tablefunc

2012-02-01 Thread Merlin Moncure
On Wed, Feb 1, 2012 at 8:52 AM, Chris Travers  wrote:
> Hi all;
>
> We have gotten a report from a user who is having issues with CREATE
> EXTENSION tablefunc.  I figured I would ask for additional insight and
> assistance at this point.
>
> When the user tries to run CREATE EXTENSION tablefunc; the following occurs:
>
> -bash-4.2$ dropdb ext_test
> -bash-4.2$ createdb ext_test
> -bash-4.2$ psql ext_test
> psql (9.1.2)
> Type "help" for help.
>
> ext_test=# select version();
>                                                    version
> -
>  PostgreSQL 9.1.2 on x86_64-redhat-linux-gnu, compiled by gcc (GCC)
> 4.6.2 20111027 (Red Hat 4.6.2-1), 64-bit
> (1 row)
>
> ext_test=# CREATE EXTENSION tablefunc;
> ERROR:  type "tablefunc_crosstab_2" already exists
>
> This lead me to conclude that we needed to CREATE EXTENSION FROM UNPACKAGED
> thinking this might be an upgrade issue.  However no luck.
>
> ext_test=# CREATE EXTENSION tablefunc FROM unpackaged;
> ERROR:  function normal_rand(integer, double precision, double
> precision) does not exist
>
>
> What should be tried next?

by any chance did you create the extension in the template1 database?
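
A quick way to check that guess (not from the original exchange): see
whether the leftover type already lives in template1, which every new
database is cloned from.

\c template1
SELECT n.nspname, t.typname
FROM pg_type t
JOIN pg_namespace n ON n.oid = t.typnamespace
WHERE t.typname LIKE 'tablefunc_crosstab%';

If it shows up there, drop the extension (or the stray objects) in
template1 and re-create the new database.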

merlin



Re: [GENERAL] Index on parent/child hierarchy

2012-01-31 Thread Merlin Moncure
On Sun, Jan 29, 2012 at 5:55 AM, Dmitriy Igrishin  wrote:
>> The point here is that you can exploit the tree structure with a btree
>> index.  Before we got recursive queries, this was often the best way
>> to do it, but now it's kind of a niche solution to be used when
>> certain things fall into place.
>>
>> merlin
>
> Another approarch  is to use ltree. It's easy and robust.
> http://www.postgresql.org/docs/9.1/static/ltree.html

yeah -- good point.

merlin



Re: [GENERAL] Don't Thread On Me (PostgreSQL related)

2012-01-26 Thread Merlin Moncure
On Thu, Jan 26, 2012 at 3:52 PM, Rodrigo E. De León Plicet
 wrote:
> Quote:
>
> ==
>
> This thread
>
> http://postgresql.1045698.n5.nabble.com/Multithread-Query-Planner-td5143643.html
>
> was mentioned in a performance sub-group posting. Give it a read.
>
> Back? It means, so far as I can see, that PG is toast. It will fall
> down to being the cheap and dirty alternative to MySql, which even
> has, at least two, multi-threaded engines. DB2 switched it's *nix
> engine to threads from processes with release 9.5. Oracle claims it
> for releases going back to 7 (I haven't tried to determine which parts
> or applications; Larry has bought so many tchochtkes over the
> years...). SQL Server is threaded.
>
> Given that cpu's are breeding threads faster than cores,
> PG will fall into irrelevance.

The author of that post apparently doesn't understand that even though
postgresql hasn't 'switched to threads', it can still do more than one
thing at once.  Each process is itself an execution thread.  A
multi-threaded query planner is perfectly possible in the postgresql
architecture -- however each one must reside in its own process and
you have to use shared memory instead of pthreads and locking.
Big whoop.  The only thing at stake with a multi threaded planner is
optimizing single user tasks which is, while important, a niche
optimization.  PostgreSQL is far more scalable than mysql for
multi-user loads and the gap is increasing.

merlin



Re: [GENERAL] Composite Type : pros and cons

2012-01-26 Thread Merlin Moncure
On Thu, Jan 26, 2012 at 3:22 AM, Leguevaques Alex
 wrote:
>
> Hello,
> I'm new to Pg and exploring its advanced functionalities for a project.
> I find composite type very interesting, but what are problems/limitations ?
> I'd want to create this structure for example:

Composite types add a little value in that you can reuse a frequently
grouped set of fields across table definitions.  This can save a little
typing.  The downside is you are diverging from classic sql mechanics a
little bit, which can make some things awkward.

Where they really shine though is as variables if you are doing a lot
of backend programming with functions.  You can pass them to and from
functions and make arrays out of them...this is very powerful once you
get the hang of it.
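
A rough sketch of that (the type, function, and values here are
invented, not from the original mail):

CREATE TYPE address AS (street text, city text, zip text);

CREATE OR REPLACE FUNCTION format_address(address) RETURNS text AS
$$
    SELECT ($1).street || ', ' || ($1).city || ' ' || ($1).zip;
$$ LANGUAGE sql IMMUTABLE;

-- pass a composite value to a function
SELECT format_address(ROW('1 Main St', 'Springfield', '12345')::address);

-- ...and build arrays of them
SELECT ARRAY[ROW('1 Main St', 'Springfield', '12345'),
             ROW('2 Elm St',  'Shelbyville',  '54321')]::address[];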

If you are using a client stack that understands postgres composite
types, you can (ab)use this to send complicated datasets to and from
the database.

merlin



Re: [GENERAL] Index on parent/child hierarchy

2012-01-25 Thread Merlin Moncure
On Wed, Jan 25, 2012 at 5:54 AM, Jason Armstrong  wrote:
> Hi
>
> I'm looking for advice on the best way to index a table that is defined as:
>
> create table uuid.master(id uuid, parent uuid references
> uuid.master(id), type_id smallint, primary key(id));
>
> Besides the primary key, I have these two indices on the table too:
> CREATE INDEX master_parent_idx ON uuid.master(parent);
> CREATE INDEX master_type_idx ON uuid.master(type_id);
>
> I have data arranged in four levels (ie type_id is from 1 to 4):
>
> 1. id=A type_id=1
> 2.  id=B parent=A type_id=2
> 3.   id=C parent=B type_id=3
> 4.    id=D parent=C type_id=4
> 2.  id=E parent=A type_id=2
> 3.   id=F parent=E type_id=3
> 4.    id=G parent=F type_id=4
> 4.    id=H parent=F type_id=4
> 4.    id=I parent=F type_id=4
> 3.   id=J parent=E type_id=3
> 4.    id=K parent=J type_id=4
>
> I want to count all type_id=4 for a particular type_id=1 uuid.
>
> I use this query:
>
> SELECT count(t4.id)
> FROM uuid.master AS t4
> INNER JOIN uuid.master AS t3 ON t4.parent=t3.id
> INNER JOIN uuid.master AS t2 ON t3.parent=t2.id
> INNER JOIN uuid.master AS t1 ON t2.parent=t1.id
> WHERE t1.id=UUID
>
> Apart from creating a separate table to keep track of the counts, is
> there a good way to index the table to help?

If your data is organized as a tree, tends to run deep (causing many
recursive self-joins), and you often make sweeping subset operations
starting from a parent node, and your data isn't too heavily updated,
you might want to consider materialized path organization instead of
parent/child.  Both arrays and strings work for that.

 id=I parents=A,E,F type_id=4

SELECT count(*) FROM uuid.master WHERE parents LIKE 'A,E%' and type_id = 4;

The point here is that you can exploit the tree structure with a btree
index.  Before we got recursive queries, this was often the best way
to do it, but now it's kind of a niche solution to be used when
certain things fall into place.
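
A sketch of what the materialized-path layout could look like (table and
index names invented here; the text_pattern_ops opclass lets the LIKE
prefix search above use the btree index regardless of locale):

CREATE TABLE uuid.master_mp (
    id      uuid PRIMARY KEY,
    parents text NOT NULL,       -- e.g. 'A,E,F'
    type_id smallint NOT NULL
);

CREATE INDEX master_mp_parents_idx
    ON uuid.master_mp (parents text_pattern_ops);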

merlin



Re: [GENERAL] Best way to create unique primary keys across schemas?

2012-01-24 Thread Merlin Moncure
On Tue, Jan 24, 2012 at 5:23 AM, panam  wrote:
> Wow, this is pretty useful. Just to fit it more to my original use case, I
> used this:
>
> CREATE schema schema1;
> CREATE schema schema2;
> CREATE TABLE tbl (ID serial primary key,foo varchar,bar varchar);  --in
> public schema
> CREATE TABLE schema1.tbl (LIKE public.tbl INCLUDING ALL);  --draws ids from
> sequence in public schema
> CREATE TABLE schema2.tbl (LIKE public.tbl INCLUDING ALL);  --draws ids from
> sequence in public schema
> INSERT INTO schema1.tbl (foo,bar) VALUES ('asdf','qwer');
> INSERT INTO schema2.tbl (foo,bar) VALUES ('hello','world');
> INSERT INTO schema1.tbl (foo,bar) VALUES ('asdf','qwer');
> INSERT INTO schema2.tbl (foo,bar) VALUES ('hello','world');
>
> Thanks, I now consider this my best practice. This way, I don't have to
> allocate ranges any more a priori :)

Another quirky way to do it is with domains:

create sequence global_seq;
create domain gid bigint default nextval('global_seq');
create table foo (gid gid, f1 text);
create table bar (gid gid, f2 int);
etc.

This looks very appealing on the surface but domains have some quirks
that should give pause.  In particular, you can't make arrays of them,
although you can make arrays of rowtypes that have a domain in them.

Barring domains, you can just manually apply the default instead of
using a serial type:

create table foo (gid bigint default nextval('global_seq'));

merlin



Re: [GENERAL] unnest array of row type

2012-01-12 Thread Merlin Moncure
On Thu, Jan 12, 2012 at 8:47 AM, Pavel Stehule  wrote:
> 2012/1/12  :
>> Hi!
>>
>> CREATE TYPE my_row_type AS (c1 SMALLINT,c2 TEXT);
>>
>> CREATE FUNCTION tmp_get_c1(my_row_type) RETURNS SMALLINT AS 'SELECT $1.c1' 
>> LANGUAGE SQL;
>>
>> CREATE FUNCTION tmp_get_c2(my_row_type) RETURNS TEXT AS 'SELECT $1.c2' 
>> LANGUAGE SQL;
>>
>> CREATE TABLE test (c1 SMALLINT,c2 TEXT,c3 TEXT);
>>
>> INSERT INTO test
>> SELECT tmp_get_c1(r),tmp_get_c2(r),'x'
>> FROM (
>>        SELECT 
>> UNNEST(ARRAY[ROW(1::SMALLINT,'a'::TEXT),ROW(2::SMALLINT,'b'::TEXT)]) AS r
>> ) s;
>>
>> I get error "record type has not been registered" from the previous INSERT.
>>
>> I have tested version 9.1 and have confirmed that PL/PGSQL "FOREACH ... IN 
>> ARRAY ... LOOP ... END LOOP;" does the job. Because I wonder "INSERT INTO 
>> ... SELECT ... FROM" being faster than "LOOP ... END LOOP;", I raise this 
>> question.
>>
>
> insert into test select tmp_get_c1(r), tmp_get_c2(r), 'x' from (SELECT
> (x,y)::my_row_type as r from
> UNNEST(ARRAY[ROW(1::SMALLINT,'a'::TEXT),ROW(2::SMALLINT,'b'::TEXT)])
> AS (x smallint, y text)) x;

I don't think the tmp_get* functions are necessary (and even if they
were, you should mark them 'immutable').  Also that's unnecessarily
verbose. I would write it like this:

INSERT INTO test SELECT (r).c1, (r).c2, 'x' FROM
(
  SELECT UNNEST(ARRAY[ROW(1,'a'),ROW(2,'b')]::my_row_type[]) r
) x;

merlin



Re: [GENERAL] question about the money type

2012-01-11 Thread Merlin Moncure
On Wed, Jan 11, 2012 at 9:32 AM, Szymon Guz  wrote:
> Hi,
> in the documentation for 8.2
> (http://www.postgresql.org/docs/8.2/interactive/datatype-money.html) there
> is info:
>
> Note: The money type is deprecated. Use numeric or decimal instead, in
> combination with theto_char function.
>
> However in later versions the info disappeared. It doesn't exist even in the
> 8.3.
>
> I think removing this type would a good solution... any plans for that?
>
> What is the status of Money type, is that OK or deprecated?

in 8.3 it was expanded to 64 bits.  Lack of range was the main reason
for the deprecation, so its status was reverted.  I happen to still
think it's pretty silly as a core type, but it's not going
anywhere.

merlin



Re: [GENERAL] (check) constraints on composite type

2012-01-10 Thread Merlin Moncure
On Tue, Jan 10, 2012 at 1:44 PM, David Johnston  wrote:
> Sorry for not answering the main question but you really need to avoid
> phrases like "near future".  That said, new features are only released
> during major releases and so at best you would have to wait for 9.2 which is
> probably 9+ months out since 9.1 was just recently released.  The better
> question is whether 9.2 is likely to have what you need or whether it has a
> chance to be during 9.3 (1.5+ years out).  More generically the question is
> whether anyone is currently working on the feature.

yeah -- I've seen a bit of interest in this feature, but AFAIK
nobody's working on it, and no plausible design has been submitted
(which is a giant red flag in terms of 9.2 acceptance).  fair warning:
this is one of those items that sounds neat but is a lot more
complicated than it looks.

in the meantime, just do the workaround I do: make an immutable
function that takes your type as input, and manually apply the
constraint in the various places that need it (like as a table
constraint).
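
A rough sketch of that workaround (type, function and table names are
invented here):

CREATE TYPE price AS (amount numeric, currency text);

CREATE OR REPLACE FUNCTION price_ok(price) RETURNS boolean AS
$$
    SELECT ($1).amount >= 0 AND ($1).currency IN ('USD', 'EUR');
$$ LANGUAGE sql IMMUTABLE;

CREATE TABLE order_line (
    id   serial PRIMARY KEY,
    cost price,
    CHECK (price_ok(cost))
);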

merlin



Re: [GENERAL] Radial searches of cartesian points?

2012-01-05 Thread Merlin Moncure
On Thu, Jan 5, 2012 at 11:01 AM,   wrote:
> Hi,
>
> I have a data set of several hundred thousand points. Each point is saved as 
> a three dimensional coordinate, i.e. (x, y, z). What I'd like to do is given 
> a point in that space, get a list of all of the points in the table within 
> some radius.
>
> I'm familiar with the q3c package that does this for points that lie on a 
> sphere, but is there something comparable for radial searches on 3D cartesian 
> points? Speed is definitely an issue given the number of points I have.
>
> Thanks for any suggestions!

see:
http://www.postgresql.org/docs/9.1/interactive/cube.html

and pay special attention to the gist indexing portions.  cube only
indexes box operations, but you can let the index find everything inside
a bounding cube and then cull down to the sphere with the 3d distance
formula (only the points between the inner and outer bounding cubes
actually need the distance check).
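
A rough sketch of the simpler outer-cube-plus-distance-check variant
(table, column, point and radius are all made up here):

CREATE EXTENSION cube;   -- contrib/cube.sql on pre-9.1 releases

CREATE TABLE points (id serial PRIMARY KEY, c cube);
CREATE INDEX points_c_idx ON points USING gist (c);

-- everything within radius 5.0 of (1, 2, 3)
SELECT id
FROM points
WHERE c <@ cube_enlarge(cube(ARRAY[1.0, 2.0, 3.0]), 5.0, 3)   -- indexed box
  AND cube_distance(c, cube(ARRAY[1.0, 2.0, 3.0])) <= 5.0;    -- exact sphere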

merlin



Re: [GENERAL] Duplicated entries are not ignored even if a "do instead nothing" rule is added.

2012-01-03 Thread Merlin Moncure
On Tue, Jan 3, 2012 at 1:42 AM, Alban Hertroys  wrote:
> On 3 Jan 2012, at 5:20, 邓尧 wrote:
>
>> Hi,
>>
>> I'm new to pgsql, I need the do something like the "INSERT IGNORE" in mysql. 
>> After some searching I got a solution, which is adding a "do instead 
>> nothing" rule to the corresponding table, but it fails sometimes.
>
> Yeah, if a concurrent transaction tries to create the same record, one of the 
> transactions is going to find that it already exists on transaction commit. 
> An INSERT-rule is not going to protect you against that.

It will if you lock the table first in the same transaction...note
this will greatly hurt concurrency and you have to watch for
deadlocks.

An INSERT ... SELECT ... WHERE NOT EXISTS is going to be vastly superior
to a rule based approach, obviously.
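
Something along these lines (the table and values are hypothetical).
Note a concurrent transaction can still sneak in between the check and
the insert, so be prepared to retry on a unique violation:

INSERT INTO users (email, name)
SELECT 'a@example.com', 'Alice'
WHERE NOT EXISTS (SELECT 1 FROM users WHERE email = 'a@example.com');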

merlin



Re: [GENERAL] PostgreSQL 9.1 pg_dump setval() sets wrong value

2011-12-29 Thread Merlin Moncure
On Thu, Dec 29, 2011 at 10:53 AM, Greg Donald  wrote:
> On Thu, Dec 29, 2011 at 10:27 AM, Merlin Moncure  wrote:
>> if you take a bzipped schema only dump (pg_dump -s), I'd be happy to
>> look it over and eliminate the 'operator error' class of issues that
>> Tom is thinking might be happening.  private mail is ok.
>
> Operator error?  Wow.. so now I'm doing it wrong?
>
> I'll pass, thanks for all your "help" guys.  It's been a blast.

I did not suggest that.  I suggested they *might* be happening.  In
lieu of a reproducible test case (a couple of us took a stab at
creating one ourselves and could not come up with your issue) or some
other smoking gun, it is something that has to be ruled out.
Statistically speaking, operator error is by far the most common cause
of bug reports and as engineers we tend to rule out the most likely
culprits first.  It's nothing personal.

By the way, I've been diagnosing and fixing other people's problems on
this list for years and have a pretty good feel for what's a real
problem or not.  Yours is one of those that can go either way and I was
willing to take time out of my day, gratis, to help you fix it on your
end and/or suggest a fix to the database itself.  Since you are
obviously unaware of the value of that help, not to mention Tom's
incredibly precious time, it is your curse to have to reset your
sequences for ever and ever.  Happy Holidays.

merlin



Re: [GENERAL] PostgreSQL 9.1 pg_dump setval() sets wrong value

2011-12-29 Thread Merlin Moncure
On Thu, Dec 29, 2011 at 10:20 AM, Greg Donald  wrote:
> On Wed, Dec 28, 2011 at 4:57 PM, Tom Lane  wrote:
>>> < SELECT pg_catalog.setval('cp_state_id_seq', 52, true);
>>> > SELECT pg_catalog.setval('cp_state_id_seq', 1, false);
>>
>> These "grep" calls are showing just exactly not enough to prove
>> anything.
>
> Those grep calls prove my old backups with 8.4 pg_dump were good to go
> and now they are not with 9.1 pg_dump.
>
>> I remain unclear as to what state is actually in the
>> database, or what is being dumped,
>
> The whole thing is being dumped.  One command /usr/bin/pg_dump cp,
> that's it, nothing special.

if you take a bzipped schema only dump (pg_dump -s), I'd be happy to
look it over and eliminate the 'operator error' class of issues that
Tom is thinking might be happening.  private mail is ok.

merlin



Re: [GENERAL] PostgreSQL 9.1 pg_dump setval() sets wrong value

2011-12-28 Thread Merlin Moncure
On Wed, Dec 28, 2011 at 12:34 PM, Adrian Klaver  wrote:
> On Wed, Dec 28, 2011 at 8:38 AM, Merlin Moncure  wrote:
>>
>> On Wed, Dec 28, 2011 at 10:20 AM, Adrian Klaver 
>> wrote:
>>
>> >
>> > I was thinking of the more generic case. The problem with 'new' in the
>> > above bug
>> > is that it is a reserved word and the list of reserved words recognized
>> > by
>> > by pg_dump changed in 9.0 or 9.1(?). If the table definitions you showed
>> > are correct
>> > you have sequence columns named id. id is in the list of reserved words
>> > also:
>> >
>> > http://www.postgresql.org/docs/9.1/static/sql-keywords-appendix.html
>>
>> hm -- if sequences named 'id' are not dumping properly, I'd call that
>> a pretty significant bug.
>
>
>
> Well just tested using 9.1.1 pg_dump to dump an 8.4.9 database with table:
> CREATE TABLE id_test (
>     id integer NOT NULL,
>     fld_1 text
> );
>
> Everything worked including:
>
> SELECT pg_catalog.setval('id_test_id_seq', 5, true)
>
> which represents the correct sequence value.
>
> So something else is going on.

yeah -- I was just about to report the same.  I also renamed the
sequence itself to 'id' with no issues.  This was on 9.1.0.

merlin



Re: [GENERAL] PostgreSQL 9.1 pg_dump setval() sets wrong value

2011-12-28 Thread Merlin Moncure
On Wed, Dec 28, 2011 at 10:20 AM, Adrian Klaver  wrote:
> On Wednesday, December 28, 2011 7:51:24 am Greg Donald wrote:
>> On Wed, Dec 28, 2011 at 9:30 AM, Adrian Klaver 
> wrote:
>> > Wonder if it is related to this:
>> > http://archives.postgresql.org/pgsql-bugs/2011-11/msg00098.php
>>
>> None of my sequences are named 'new'.  What do you mean?
>
> I was thinking of the more generic case. The problem with 'new' in the above 
> bug
> is that it is a reserved word and the list of reserved words recognized by
> pg_dump changed in 9.0 or 9.1(?). If the table definitions you showed are 
> correct
> you have sequence columns named id. id is in the list of reserved words also:
>
> http://www.postgresql.org/docs/9.1/static/sql-keywords-appendix.html

hm -- if sequences named 'id' are not dumping properly, I'd call that
a pretty significant bug.

merlin



Re: [GENERAL] Kindly Please Help Me

2011-12-27 Thread Merlin Moncure
On Tue, Dec 27, 2011 at 5:11 PM,   wrote:
> Hi every1 how are u all??? Members i am new in postgres and want to work on
> pgrouting but i am facing some issue will u please help me???
> I have loaded my line shapefile in pgadmin environment but when i made a
> query at it it will show that ur table (Route dont have relation)..wht is
> this???

you created the table with an upper case letter in the name ("Route").
This means all queries that touch the table must also double-quote the
name, because postgres folds unquoted identifiers to lower case.
Advise changing the table name from Route to route.
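
To illustrate (hypothetical statements):

SELECT * FROM Route;                  -- fails: folded to lowercase "route"
SELECT * FROM "Route";                -- works, but every query must quote it
ALTER TABLE "Route" RENAME TO route;  -- the simplest fix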

merlin



Re: [GENERAL] invalid memory alloc request size

2011-12-27 Thread Merlin Moncure
On Tue, Dec 27, 2011 at 4:07 PM, Tomas Vondra  wrote:
>>> Googling around, it sounds like this is often due to table corruption, 
>>> which would be unfortunate, but usually seems to be repeatable. I can 
>>> re-run that query without issue, and in fact can select * from the entire 
>>> table without issue. I do see the row was updated a few minutes after this 
>>> error, so is it wishful thinking that vacuum came around and successfully 
>>> removed the old, corrupted row version?
>>
>> It also happens that 18446744073709551613 is -3 in 64-bit 2's complement if 
>> it was unsigned. Is it possible that -3 was some error return code that got 
>> cast and then passed directly to malloc()?
>
> That's not likely. The corruption is usually the cause, when it hits
> varlena header - that's where the length info is stored. In that case
> PostgreSQL suddenly thinks the varlena field has a negative value (and
> malloc accepts unsigned integers).

If the problem truly went away, one likely possibility is that the bad
tuple was simply deleted -- occasionally the corruption is limited to
a tuple or two but doesn't spill over into the page itself -- in such
situations some judicious deletion of rows can get you to a point
where you can pull off a dump.

merlin



Re: [GENERAL] postgresql triggers - defining a global resource (java)

2011-12-27 Thread Merlin Moncure
On Mon, Dec 26, 2011 at 8:32 AM, Aman Gupta  wrote:
> Hey Alban,
>
> Thanks for the reply. I had a follow up question w.r.t listen/notify:
>
> I am planning to associate a NOTIFY with an update on a table - a trigger is
> associated with the update, and we execute NOTIFY in the trigger code. The
> NOTIFY directly goes to a remote server and contains the updated row data in
> the payload (serialized). Each update will result in a diffferent payload
> (timestamp will be in it). My question is whether LISTEN/NOTIFY was designed
> to handle this scenario? The update rate may scale to very high levels, and
> each of those updates will do a trigger call and issue a notification. Also,
> the size of the payload may be large (maybe 10KB per notification).

postgresql's notify payload is limited to 8000 bytes.  you need to write
the payload into a table and have a LISTENer process it as a queue.
This type of thing can get real complex real fast, especially if you
need to distribute the work to multiple processors.  A single processor
is probably ok with an ad hoc implementation.
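
A rough sketch of that pattern (assumes 9.0+ for pg_notify; the table,
channel and trigger names are invented here):

CREATE TABLE outbound_queue (
    id         bigserial PRIMARY KEY,
    payload    text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION queue_update() RETURNS trigger AS
$$
BEGIN
    INSERT INTO outbound_queue (payload) VALUES (NEW::text);
    PERFORM pg_notify('work_available', '');  -- just a wake-up, no payload
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER some_table_queue
    AFTER UPDATE ON some_table
    FOR EACH ROW EXECUTE PROCEDURE queue_update();

-- the worker runs LISTEN work_available; then drains outbound_queue in batches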

If you are struggling with this (although I'd still give it a go), you
may want to check out PGQ
(http://wiki.postgresql.org/wiki/PGQ_Tutorial).

merlin



Re: [GENERAL] PostgreSQL DBA in SPAAAAAAAACE

2011-12-20 Thread Merlin Moncure
On Tue, Dec 6, 2011 at 10:56 AM, Joe Miller  wrote:
> You may have seen this, but RedGate software is sponsoring a contest
> to send a DBA on a suborbital space flight.
>
> And there is a PostgreSQL representative... me!
>
> https://www.dbainspace.com/finalists/joe-miller
>
> Voting is open for 7 days. Don't let one of those Oracle or SQL Server
> punks win :p

guess who won! :-D

merlin



Re: [GENERAL] Logical Aggregate Functions (eg ANY())

2011-12-19 Thread Merlin Moncure
On Mon, Dec 19, 2011 at 3:42 AM, Marti Raudsepp  wrote:
> In fact, there's no reason why bool_or/bool_and couldn't do the same
> thing. bool_or() is like the max() for boolean values, and bool_and()
> is min().
>
> CREATE AGGREGATE my_bool_or(bool) (sfunc=boolor_statefunc, stype=bool,
> sortop= >);
> CREATE AGGREGATE my_bool_and(bool) (sfunc=booland_statefunc,
> stype=bool, sortop= <);
>
> db=# explain analyze select bool_and(b) from bools;
>  Aggregate  (cost=1693.01..1693.02 rows=1 width=1)
>   ->  Seq Scan on bools  (cost=0.00..1443.01 rows=11 width=1)
>  Total runtime: 29.736 ms
>
> db=# explain analyze select my_bool_and(b) from bools;
>  Result  (cost=0.03..0.04 rows=1 width=0)
>   InitPlan 1 (returns $0)
>     ->  Limit  (cost=0.00..0.03 rows=1 width=1)
>           ->  Index Scan using bools_b_idx on bools
> (cost=0.00..3300.28 rows=11 width=1)
>                 Index Cond: (b IS NOT NULL)
>  Total runtime: 0.109 ms
>
> Now obviously this still has limitations -- it doesn't do index
> accesses in a GROUP BY query -- but it's a fairly simple modification.

That's really clever...bravo.

merlin



Re: [GENERAL] Logical Aggregate Functions (eg ANY())

2011-12-18 Thread Merlin Moncure
On Sat, Dec 17, 2011 at 6:06 PM, Robert James  wrote:
> On 12/15/11, Marti Raudsepp  wrote:
>> On Thu, Dec 15, 2011 at 18:10, Robert James  wrote:
>>> How do I do the equivalent of an ANY() or ALL() in PG Aggregate SQL?
>>
>> Note that in many cases, writing an EXISTS(SELECT ...) or NOT
>> EXISTS(...) subquery is faster, since the planner can often optimize
>> those to a single index access -- whereas an aggregate function would
>> necessarily need to walk through and evaluate all potential rows.
>>
>
> Really? The planner can't tell that, for instance, BOOL_AND (false, *)
> is automatically false?

No (by the way, I really should have known about the bool_x aggregate
functions before suggesting a hand rolled one!), that would require
that the planner have very special understanding of the internal
workings of aggregate functions.  There are a couple of cases where
the planner *does* have that knowledge, for example it can convert
max(v) to 'order by v desc limit 1' to exploit the index, but that's the
exception rather than the rule.

Most queries that can be equivalently expressed in aggregate and
non-aggregate form are faster without aggregates.   However,
aggregation can be a cleaner expression of the problem which is
important as well (performance isn't everything!).

merlin



Re: [GENERAL] CREATE OR REPLACE AGGREGATE

2011-12-15 Thread Merlin Moncure
On Thu, Dec 15, 2011 at 11:29 AM, Robert James  wrote:
> Is there anyway to do the equivalent of CREATE OR REPLACE AGGREGATE?
> Or - maybe even better - CREATE AGGREGATE if it doesn't already exist?

Well, you have DROP AGGREGATE [IF EXISTS], which should cover at least
some of the use cases.
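
That is, something like this (the aggregate definition here is just for
illustration):

DROP AGGREGATE IF EXISTS my_bool_or(bool);
CREATE AGGREGATE my_bool_or(bool) (sfunc = boolor_statefunc, stype = bool);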

I'm not sure an OR REPLACE for aggregates is a good idea -- you can
already REPLACE the internal function (which is mostly what you need).
The aggregate itself mainly defines the input types, state, and
output types.  The main advantage of replace is you get to preserve
the oid and avoid dependency issues -- if you are changing the
aggregate, it's pretty likely you'd have to go through that anyways.

merlin



Re: [GENERAL] Logical Aggregate Functions (eg ANY())

2011-12-15 Thread Merlin Moncure
On Thu, Dec 15, 2011 at 10:10 AM, Robert James  wrote:
> I see Postgres (I'm using 8.3) has bitwise aggregate functions
> (bit_or), but doesn't seem to have logical aggregate functions.
>
> How do I do the equivalent of an ANY() or ALL() in PG Aggregate SQL?



CREATE OR REPLACE FUNCTION OrAgg(bool, bool) RETURNS BOOL AS
$$
  SELECT COALESCE($1 or $2, false);
$$ LANGUAGE SQL IMMUTABLE;

create aggregate "any"(bool)
(
  sfunc=OrAgg,
  stype=bool
);

postgres=# select "any"(v) from (values (false), (true)) q(v);
 any
-
 t
(1 row)

etc

note: I don't like the use of double quoted "any" -- but I'm too lazy
to come up with a better name.  :-)

merlin



Re: [GENERAL] Controlling complexity in queries

2011-12-14 Thread Merlin Moncure
On Tue, Dec 13, 2011 at 4:27 PM, Jay Levitt  wrote:
> Merlin Moncure wrote:
>>
>> Breaking your large queries into functions OTOH can make significant
>> changes to the plan, often to the worse.
>
>
> As an end-user, I think this is an area where PostgreSQL could really stand
> out (that and the moon launch).  In Rails-land, you don't have The DBA that
> writes queries. You have a developer and an ORM, and when they step outside
> that ORM to do the cool things SQL can do, all their DRY habits fall apart,
> because it's 1990 again and you can either write clear code or fast code but
> not both.

As far as ORMs are concerned, I'll take 1990 over 2011 all day long.
The heavy emphasis on procedural/OO coding tactics to model business
applications IMNSHO is, and has always been, a total catastrophe.  The
reasons for that are numerous: insane defect rates, brittle functional
relationships, poor concurrency model etc.  Enterprises move off SQL
because they hate paying the dollars top SQL talent demands only to
pay the ultimate price when development gets crushed under the weight
of maintaining all that crappy code.  This sad state of affairs has
been encouraged by some of the top software vendors.

> But having to write one big query for performance feels exactly like having
> to write one big C function with unrolled loops.  I'm currently taking a
> well-factored, function-based query and turning it INTO what Robert James is
> trying to get OUT of: a monolithic query.

SQL has a very powerful abstraction feature: it's called a view.  Good
use of views is a key design feature for complex databases.

Functions are generally not a good choice for query abstraction unless:
*) you are working with scalars (string manipulation etc)
*) you need non relational features like plpgsql exception
handling/notice printing, etc
*) this particular operation is not really optimizable anyways and you
want to wrap it (WITH RECURSIVE for example)
*) your function is inline-able (generally, a one liner sql function
that is stable or immutable)
etc

>> In the end, the performance of your queries is going to be directly
>> related to how well you map the problem into relational logic
>
>
> It's not just that, though; it's quite possible to think relationally and
> still fall down. There are plenty of cases where the human eye can see that
> a modular function can be inlined, but the optimizer can't.  I have a
> pathological case: a query against a database with just a few thousand users
> takes 1.5 seconds on fast hardware, because it ends up scanning a cartesian
> product to get 16 rows, even before you get to the nested loops. In fact,
> most of the time the optimizer does a great job of inlining all my
> set-returning functions, once 9.0.6/9.1.2 rolled out.
>
> I've seen at least three equally ominous pieces that would have to happen to
> allow DRY, composable SQL:
>
> 1. Optional optimization of non-recursive WITH
> 2. Optional pushdown of WHERE clauses into GROUP BY[1]
> 3. LATERAL
>
> AFAIK, none of these are on anyone's short-term to-do list, and I'm sure
> none are easy.
>
> [1] Since this is my current favorite problem, the pathological case is:
>
> select questions.id
> from questions
> join (
>  select u.id
>  from users as u
>  group by u.id
> ) as s
> on s.id = questions.user_id
> where questions.id = 1;
>
> With users.id as a primary key, it's obvious that this can return only one
> row, but it has to scan the users table to get there.  See the "Subjquery in
> a JOIN not getting restricted?" thread on pgsql-performance for Tom's
> explanation of why that's a hard problem to solve.

Yeah -- here and there you run into difficult to optimize queries.
(For my part, I'd just have converted that to WHERE EXISTS for the
semi-join).
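
For reference, a sketch of that rewrite against the pathological query
above (not from the original post):

select questions.id
from questions
where questions.id = 1
  and exists (select 1 from users u where u.id = questions.user_id);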

merlin



Re: [GENERAL] query for all values linked to a field

2011-12-13 Thread Merlin Moncure
On Tue, Dec 13, 2011 at 2:37 PM, anisoptera  wrote:
> hi,
>
> i'm trying to output all values that can be reached from a specific record
> by a foreign key link.
>
> for example, if tblimage has image_id, image_info and tblstack has
> stack_id=tblimage.image_id, stack_info_1, stack_info_2
> i want to be able to get image_id, image_info, stack_info_1, stack_info_2.
>
> of course, my database has many more tables and links. i can use
> information_schema to get the links and recursively troll through all the
> tables, i was just wondering if there's a way to do this with one query
> instead of programmatically.

I didn't quite parse that.  Using recursive queries, you should be able
to pretty easily query out, given a table name, all tables that link to
that table and the columns of the keys in the constraints (and, if
necessary, the links from other tables to those tables).
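
A rough sketch of that kind of catalog walk, starting from the tblimage
table in your example (untested, and it only reports table-to-table
links, not the key columns):

WITH RECURSIVE links AS (
    SELECT conrelid, confrelid
    FROM pg_constraint
    WHERE contype = 'f' AND confrelid = 'tblimage'::regclass
    UNION
    SELECT c.conrelid, c.confrelid
    FROM pg_constraint c
    JOIN links l ON c.confrelid = l.conrelid
    WHERE c.contype = 'f'
)
SELECT conrelid::regclass  AS referencing_table,
       confrelid::regclass AS referenced_table
FROM links;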

If I was looking for specific *records*, not tables and columns, I
would probably be generating a query from the above.  (If I'm
understanding you properly that is).

merlin



Re: [GENERAL] order of (escaped) characters in regex range

2011-12-13 Thread Merlin Moncure
On Tue, Dec 13, 2011 at 10:53 AM, David Johnston  wrote:
> Aside from backward compatibility, and the various warnings, is there any
> reason to prefer dollar-quoting over a non-SQL-escaped string literal (i.e.,
> '3[A-Z\-\(\) ]'   ) ?

yeah -- because sooner or later you have to stick a single quote in
there (of course, you can double the ', but I personally think that's
awful).

merlin



Re: [GENERAL] order of (escaped) characters in regex range

2011-12-13 Thread Merlin Moncure
On Tue, Dec 13, 2011 at 7:51 AM, InterRob  wrote:
> Thanks guys, i see what you mean.
>
> I do intend to use the PG escaping, in order to avoid that annoying
> warning... Hence, my expression should indeed be:
> SELECT regexp_matches('123-A' , E'(3[A-Z\\-\\(\\) ])');
>
> In the above expression i added the parentheses as I whish to match these as
> well :))

I advise dollar quoting when writing complicated regular expressions:

E'(3[A-Z\\-\\(\\) ])'
becomes
$$(3[A-Z\-\(\) ])$$

merlin



Re: [GENERAL] Controlling complexity in queries

2011-12-12 Thread Merlin Moncure
On Sun, Dec 11, 2011 at 9:10 PM, Craig Ringer  wrote:
> On 12/12/2011 09:15 AM, David Johnston wrote:
>>
>> Use a WITH clause on the SELECT statement.
>
> Note that WITH is an optimisation fence, so if you're relying on Pg pushing
> WHERE clauses down into subqueries or anything like that you may find that
> your query runs a LOT slower when broken up as WITH expressions.
>
> There's been talk of a Pg extension that allows optimisation through WITH,
> but it's not currently possible.
>
> Another option is to wrap things up in SQL functions or views.

A note about that:  abstracting via views and abstracting via functions
are completely different approaches.  Views will not significantly change the way your
query works now -- they are inlined as macros and the final query is
going to be more or less the same as your hand rolled one.
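
A tiny illustration of the inlining point (hypothetical schema): the
query against the view below is planned as if you had written the
underlying SELECT by hand, so it can still use an index on
orders(customer_id).

CREATE VIEW active_orders AS
    SELECT o.id, o.customer_id, o.total
    FROM orders o
    WHERE o.status = 'active';

SELECT * FROM active_orders WHERE customer_id = 42;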

Breaking your large queries into functions OTOH can make significant
changes to the plan, often for the worse.  This is because functions,
especially complicated plpgsql set returning ones with procedural
logic, are black boxes to the sql optimizer.  The upshot of this is
that functions tend to encourage nestloop style plans because the
function has to be serially executed.

Functions (also WITH) are great in that they can provide very high
levels of abstraction when composing complex queries, but there is a
price in the sense that you are taking away some of the database's
ability to plan and optimize the query.  I prefer views unless there
is a good reason not to use them.

In the end, the performance of your queries is going to be directly
related to how well you map the problem into relational logic...the
database thinks relationally, so you (the OP) should learn to do so as
well.

merlin



Re: [GENERAL] Hope for a new PostgreSQL era?

2011-12-08 Thread Merlin Moncure
On Thu, Dec 8, 2011 at 10:11 AM, Marc Cousin  wrote:
> Le Thu, 8 Dec 2011 09:29:28 -0600,
> Merlin Moncure  a écrit :
>
>> On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin 
>> wrote:
>> > Le Thu, 8 Dec 2011 12:27:22 +,
>> > Simon Riggs  a écrit :
>> >
>> >> On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer
>> >>  wrote:
>> >>
>> >> > Areas in which Pg seems significantly less capable include:
>> >>
>> >> Please can you explain the features Oracle has in these area, I'm
>> >> not clear. Thanks.
>> >
>> > Maybe I can answer from my own Oracle experience. I hope it will be
>> > what Craig had in mind :)
>> >
>> >>
>> >>
>> >> > - admission control, queuing and resource limiting to optimally
>> >> > load a machine. Some limited level is possible with external
>> >> > pooling, but only by limiting concurrent workers.
>> >
>> > Oracle has natively two ways of handling inbound connections:
>> > - Dedicated, which is very similar to the PostgreSQL way of
>> > accepting connections: accept(), fork() and so on
>> > - Shared, which is based on processes listening and handling the
>> >  connections (called dispatchers) and processes doing the real work
>> >  (called workers, obviously). All of this works internally with
>> >  some sort of queuing and storing results in shared memory (I don't
>> >  remember the details of it)
>> >
>> > The advantage of this second architecture being of course that you
>> > can't have more than N workers hitting your database
>> > simultaneously. So it's easier to keep the load on the server to a
>> > reasonable value.
>>
>> you have a couple of very good options to achieve the same in postgres
>> -- pgbouncer, pgpool.
>>
>
> I wish it was the same (I use and like both pgbouncer and pgpool too,
> and they do a good job, I'm not arguing on that). But unfortunately it
> isn't: you still have the notion of session for each connected client
> in Oracle when using the shared servers model.
>
> It means you keep your session variables, your prepared statements,
> your running transaction, etc… in each individual session while having
> the multiplexing equivalent of a 'statement level' from pgbouncer.

yeah -- maybe we could use a server side feature that could allow you
to save a session state and load it up later to make life easier for
connection pooled applications.  however, it's not really that much
work to organize most of the things you'd use for this in an
application managed session instead of a database managed one.

regarding the "enterprises won't use community supplied postgresql add
ons" point, this completely true in many cases. I do think pgbouncer
should be seriously considered for advancement as a core feature. That
said, this should be done on its own merits, not to satisfy the
capricious whims of enterprises.

merlin



Re: [GENERAL] Hope for a new PostgreSQL era?

2011-12-08 Thread Merlin Moncure
On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin  wrote:
> Le Thu, 8 Dec 2011 12:27:22 +,
> Simon Riggs  a écrit :
>
>> On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer 
>> wrote:
>>
>> > Areas in which Pg seems significantly less capable include:
>>
>> Please can you explain the features Oracle has in these area, I'm not
>> clear. Thanks.
>
> Maybe I can answer from my own Oracle experience. I hope it will be what
> Craig had in mind :)
>
>>
>>
>> > - admission control, queuing and resource limiting to optimally
>> > load a machine. Some limited level is possible with external
>> > pooling, but only by limiting concurrent workers.
>
> Oracle has natively two ways of handling inbound connections:
> - Dedicated, which is very similar to the PostgreSQL way of accepting
>  connections: accept(), fork() and so on
> - Shared, which is based on processes listening and handling the
>  connections (called dispatchers) and processes doing the real work
>  (called workers, obviously). All of this works internally with
>  some sort of queuing and storing results in shared memory (I don't
>  remember the details of it)
>
> The advantage of this second architecture being of course that you
> can't have more than N workers hitting your database simultaneously. So
> it's easier to keep the load on the server to a reasonable value.

you have a couple of very good options to achieve the same in postgres
-- pgbouncer, pgpool.

merlin



Re: [GENERAL] Hope for a new PostgreSQL era?

2011-12-08 Thread Merlin Moncure
On Wed, Dec 7, 2011 at 8:52 PM, Rodrigo E. De León Plicet
 wrote:
> http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/
>
> Some of the points mentioned:
> - MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL
>  in some ways.  (Database extensibility if nothing else.)

There is simply no comparing mysql's backend programming features with
those of postgres.  Postgres is a development platform in a box, mysql
is not.

merlin



Re: [GENERAL] dblink performance

2011-12-07 Thread Merlin Moncure
On Wed, Dec 7, 2011 at 3:37 PM, Marc Mamin  wrote:
>> >
>> > I find dblink being a nice tool as long as the data volume to
>> transfer
>> > remains low.
>> > I've evaluated it to implement a clustered Postgres environment, but
>> > gave it up due to the poor performances.
>> > Still waiting for the binary transfer before the next try ;-)
>>
>> Binary transfer is not a super big deal in terms of performance
>> actually in the general case. It's only substantially faster in a few
>> cases like timestamp, geo types, and of course bytea.  Lack of
>> parameterization I find to be a bigger deal actually -- it's more of a
>> usability headache than a performance thing.
>>
>> Also FYI binary dblink between databases is going to be problematic
>> for any non built in type unless the type oids are synchronized across
>> databases.
>>
>> merlin
>
> Thanks,
> ... so I don't really understand where all the time get lost in the
> example I posted a few weeks ago:
>
> http://archives.postgresql.org/pgsql-general/2011-09/msg00436.php

you wrote:
"select count(*) from
(
select * from dblink('a','select * from test_cluster')as t1(a int)
union all
select * from dblink('a','select * from test_cluster')as t1(a int)
union all
select * from dblink('a','select * from test_cluster')as t1(a int)
)foo

is about 5 times slower than an equivalent query run locally.

working with asynchron. queries (dblink_send_query) does not bring much
benefit
so that much time seems to be spent for transfer and merge"

it's not exactly fair to compare dblink to a local query -- the dblink
method has to marshal all the data over the protocol and un-marshal it
on the other end.  I was seeing a 3-5x difference vs the local query,
but this is to be expected.  note the 'union all' had nothing to do
with your performance problems.  Also, in the local case the server can
do a very special trick for count(*) since it only needs to check tuple
visibility -- that can't be done when doing select count(*) over the
dblink() subqueries.

My point up thread is that dblink is going to be comparable to other
methods that involve querying the data off the server and doing the
processing on the client side.

merlin



Re: [GENERAL] dblink performance

2011-12-07 Thread Merlin Moncure
On Wed, Dec 7, 2011 at 2:47 PM, Marc Mamin  wrote:
>> On Wed, Dec 7, 2011 at 10:14 AM, Alex Bible 
> wrote:
>> >
>> > Hello All,
>> >
>> > I'm currently on a development team utilizing PostgreSQL and we are
>> looking into the possibility of using dblink to reference an external
>> database (read only). Our system administrator and DBA were concerned
>> about the performance impact that cross-database queries would have on
>> a production application. Are there any known performance issues or
>> anything of the like that I would need to know before pushing this
>> issue further? I have been using PostgreSQL for the past couple months
>> but this is my first time using dblink. I really just need an opinion
>> from someone who has used this technology before. Thanks!
>>
>> dblink is a very thin wrapper for libpq.  From the querying database,
>> the overhead is pretty light -- basically the query is fired and the
>> results are interpreted from text into whatever the database has in
>> the receiving result via the various typein functions.   For all
>> intents and purposes, this is pretty similar to sending in queries
>> over the regular sql interface.  One gotcha of course is that libpq
>> buffers the entire result in memory which can be dangerous, so be
>> advised.
>>
>> To the receiving database, dblink queries are no different from any
>> other query, except that they are not parameterized.  Lack of
>> parameterization and access to the binary protocol are the major
>> downsides when using dblink.  IMNSHO, dblink needs a variable argument
>> call that uses the paramterized interface.  Also support for binary
>> transfer of data would be nice.
>>
>> merlin
>
> I find dblink being a nice tool as long as the data volume to transfer
> remains low.
> I've evaluated it to implement a clustered Postgres environment, but
> gave it up due to the poor performances.
> Still waiting for the binary transfer before the next try ;-)

Binary transfer is not a super big deal in terms of performance
actually in the general case. It's only substantially faster in a few
cases like timestamp, geo types, and of course bytea.  Lack of
parameterization I find to be a bigger deal actually -- it's more of a
usability headache than a performance thing.

Also FYI binary dblink between databases is going to be problematic
for any non built in type unless the type oids are synchronized across
databases.

merlin



Re: [GENERAL] dblink performance

2011-12-07 Thread Merlin Moncure
On Wed, Dec 7, 2011 at 10:14 AM, Alex Bible  wrote:
>
> Hello All,
>
> I’m currently on a development team utilizing PostgreSQL and we are looking 
> into the possibility of using dblink to reference an external database (read 
> only). Our system administrator and DBA were concerned about the performance 
> impact that cross-database queries would have on a production application. 
> Are there any known performance issues or anything of the like that I would 
> need to know before pushing this issue further? I have been using PostgreSQL 
> for the past couple months but this is my first time using dblink. I really 
> just need an opinion from someone who has used this technology before. Thanks!

dblink is a very thin wrapper for libpq.  From the querying database,
the overhead is pretty light -- basically the query is fired and the
results are converted from text into whatever types the receiving result
set declares, via the various type input functions.   For all
intents and purposes, this is pretty similar to sending in queries
over the regular sql interface.  One gotcha of course is that libpq
buffers the entire result in memory which can be dangerous, so be
advised.
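
For anyone who hasn't used it, a minimal example (the connection string,
table and column list are made up):

CREATE EXTENSION dblink;   -- contrib/dblink.sql on pre-9.1 releases

SELECT *
FROM dblink('host=otherhost dbname=otherdb user=readonly',
            'SELECT id, name FROM remote_table')
     AS t(id integer, name text);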

To the receiving database, dblink queries are no different from any
other query, except that they are not parameterized.  Lack of
parameterization and access to the binary protocol are the major
downsides when using dblink.  IMNSHO, dblink needs a variable argument
call that uses the parameterized interface.  Also support for binary
transfer of data would be nice.

merlin



Re: [GENERAL] PostgreSQL DBA in SPAAAAAAAACE

2011-12-06 Thread Merlin Moncure
On Tue, Dec 6, 2011 at 10:56 AM, Joe Miller  wrote:
> You may have seen this, but RedGate software is sponsoring a contest
> to send a DBA on a suborbital space flight.
>
> And there is a PostgreSQL representative... me!
>
> https://www.dbainspace.com/finalists/joe-miller
>
> Voting is open for 7 days. Don't let one of those Oracle or SQL Server
> punks win :p

so jealous -- I didn't make the cut.  Well, you'll have my vote.

merlin



Re: [GENERAL] How to convert HEX to ASCII?

2011-12-02 Thread Merlin Moncure
On Fri, Dec 2, 2011 at 2:01 PM, Pavel Stehule  wrote:
>>> so bytea_agg - one param aggregate has sense
>>>
>>> it's very easy to implement it
>>
>> yup:
>>
>> create aggregate bytea_agg (bytea)
>> (
>>  sfunc=byteacat,
>>  stype=bytea
>> );
>
> this is workaround :)
>
> without a memory preallocating it has same speed like cycle in plpgsql.

sure, but I prefer to code against the workaround because it's cleaner
and it makes things easier to port over when such a feature makes it
into core.  also, a one liner sql function has a better chance of being
inlined as a general rule.

merlin



Re: [GENERAL] How to convert HEX to ASCII?

2011-12-02 Thread Merlin Moncure
On Fri, Dec 2, 2011 at 11:15 AM, Pavel Stehule  wrote:
> 2011/12/2 Merlin Moncure :
>> On Fri, Dec 2, 2011 at 10:42 AM, Marti Raudsepp  wrote:
>>> Sorry, but AFAICT this makes a mess of encodings and only works by
>>> pure luck. The server thinks it's sending the client LATIN1 text, but
>>> it's actually UTF8-encoded and the last decoding step is done by your
>>> terminal.
>>
>> yup -- your're right -- what a coincidence!  I still prefer the 1
>> liner sql variant vs plpgsql loop though.  nicely done.
>
> so bytea_agg - one param aggregate has sense
>
> it's very easy to implement it

yup:

create aggregate bytea_agg (bytea)
(
  sfunc=byteacat,
  stype=bytea
);

merlin



Re: [GENERAL] How to convert HEX to ASCII?

2011-12-02 Thread Merlin Moncure
On Fri, Dec 2, 2011 at 10:42 AM, Marti Raudsepp  wrote:
> Sorry, but AFAICT this makes a mess of encodings and only works by
> pure luck. The server thinks it's sending the client LATIN1 text, but
> it's actually UTF8-encoded and the last decoding step is done by your
> terminal.

yup -- you're right -- what a coincidence!  I still prefer the 1
liner sql variant vs plpgsql loop though.  nicely done.

merlin



Re: [GENERAL] Limiting number of connections to PostgreSQL per IP (not per DB/user)?

2011-12-02 Thread Merlin Moncure
On Wed, Nov 30, 2011 at 6:03 PM, Tomas Vondra  wrote:
> On 29.11.2011 23:38, Merlin Moncure wrote:
>> On Tue, Nov 29, 2011 at 7:49 AM, Heiko Wundram  
>> wrote:
>>> Hello!
>>>
>>> Sorry for that subscribe post I've just sent, that was bad reading on my
>>> part (for the subscribe info on the homepage).
>>>
>>> Anyway, the title says it all: is there any possibility to limit the number
>>> of connections that a client can have concurrently with a PostgreSQL-Server
>>> with "on-board" means (where I can't influence which user/database the
>>> clients use, rather, the clients mostly all use the same user/database, and
>>> I want to make sure that a single client which runs amok doesn't kill
>>> connectivity for other clients)? I could surely implement this with a proxy
>>> sitting in front of the server, but I'd rather implement this with
>>> PostgreSQL directly.
>>>
>>> I'm using (and need to stick with) PostgreSQL 8.3 because of the frontend
>>> software in question.
>>>
>>> Thanks for any hints!
>>
>> I think the (hypothetical) general solution for these types of
>> problems is to have logon triggers.  It's one of the (very) few things
>> I envy from SQL Server -- see  here:
>> http://msdn.microsoft.com/en-us/library/bb326598.aspx.
>
> I'd like to have logon triggers too, but I don't think that's the right
> solution for this problem. For example the logon triggers would be
> called after forking the backend, which means overhead.
>
> The connection limits should be checked when creating the connection
> (validation username/password etc.), before creating the backend.

I disagree. I'm not convinced the overhead is really worth worrying
about and having a trigger under the user's control allows the feature
to cover a much broader array of scenarios.  If the overhead *was* a
big deal, then you should be using a connection pooler anyways.

>> Barring the above, if you can trust the client to call a function upon
>> connection I'd just do that and handle the error on the client with a
>> connection drop. Barring *that*, I'd be putting my clients in front of
>> pgbouncer with some patches to the same to get what I needed
>> (pgbouncer is single threaded making firewally type features quite
>> easy to implement in an ad hoc fashion).
>
> The connection pooler somehow easier and more complex at the same time.
>
> You can use connect_query to execute whatever you want after connecting
> to the database (not trusting the user to do that), but why would you do
> that? But the database will see the IP of the pgbouncer, not the IP of
> the original client. So executing the query is pointless.
>
> You can modify pgbouncer and it should be quite simple, but you can
> achieve different username/password (pgbouncer) to each customer,
> different database, set pool_size for each of the connections. It won't
> use IP to count connections, but the user's won't 'steal' connections
> from the other.

Yeah, pgbouncer is an ideal platform for coding features like
firewalling, query whitelists, etc. while still having SQL access to the
database if you need it.  You also have access to the client's real
IP, and can pass that down to some code that would presumably be
shared with your logon trigger.

merlin



Re: [GENERAL] How to convert HEX to ASCII?

2011-12-02 Thread Merlin Moncure
On Fri, Dec 2, 2011 at 8:16 AM, Torsten Zuehlsdorff
 wrote:
> Damien Churchill schrieb:
>
>
>>> after several attempts I have finally succeeded in developing a
>>> urlencode()
>>> function to encode text correctly like defined in RFC 1738.
>>>
>>> Now i have a big problem: how to decode the text?
>>>
>>> Example:
>>> # SELECT urlencode('Hellö World!');
>>>      urlencode
>>> ---
>>>  Hell%C3%B6%20World%21
>>>
>>> Does anybody know a way to convert '%21' back to '!' and '%C3%B6' to 'ö'?
>>
>>
>> I've extracted the unquote method [0] from urllib in the python stdlib
>> that decodes urlencoded strings. Hopefully be some use!
>
>
> Not directly, but it gives me some helpful hints. For example i'm now able
> to decode some basic characters, for example:
>
> # SELECT chr(x'21'::int);
>  chr
> -
>  !
> (1 row)
>
> But i clearly have a missunderstanding of other chars, like umlauts or utf-8
> chars. This, for example, should return a 'ö':
>
> # SELECT chr(x'C3B6'::int);
>  chr
> -
>  쎶
> (1 row)
>
> Also i'm not sure how to figure out, when to decode '%C3' and when to decode
> '%C3%B6'.
>
> Thanks for your help,

You're welcome.  get ready for some seriously abusive sql:

create or replace function unencode(text) returns text as
$$
with q as
(
  select (regexp_matches($1, '(%..|.)', 'g'))[1] as v
)
select string_agg(case when length(v) = 3 then chr(replace(v, '%',
'x')::bit(8)::int) else v end, '') from q;
$$ language sql immutable;

set client_encoding to latin1;
SET

postgres=# select unencode('Hell%C3%B6%20World%21');
   unencode
---
 Hellö World!
(1 row)

Time: 1.908 ms

(maybe this isn't really an immutable function, but oh well).

merlin



Re: [GENERAL] Limiting number of connections to PostgreSQL per IP (not per DB/user)?

2011-11-30 Thread Merlin Moncure
On Tue, Nov 29, 2011 at 7:49 AM, Heiko Wundram  wrote:
> Hello!
>
> Sorry for that subscribe post I've just sent, that was bad reading on my
> part (for the subscribe info on the homepage).
>
> Anyway, the title says it all: is there any possibility to limit the number
> of connections that a client can have concurrently with a PostgreSQL-Server
> with "on-board" means (where I can't influence which user/database the
> clients use, rather, the clients mostly all use the same user/database, and
> I want to make sure that a single client which runs amok doesn't kill
> connectivity for other clients)? I could surely implement this with a proxy
> sitting in front of the server, but I'd rather implement this with
> PostgreSQL directly.
>
> I'm using (and need to stick with) PostgreSQL 8.3 because of the frontend
> software in question.
>
> Thanks for any hints!

I think the (hypothetical) general solution for these types of
problems is to have logon triggers.  It's one of the (very) few things
I envy from SQL Server -- see  here:
http://msdn.microsoft.com/en-us/library/bb326598.aspx.

Barring the above, if you can trust the client to call a function upon
connection I'd just do that and handle the error on the client with a
connection drop. Barring *that*, I'd be putting my clients in front of
pgbouncer with some patches to the same to get what I needed
(pgbouncer is single threaded making firewally type features quite
easy to implement in an ad hoc fashion).

merlin



Re: [GENERAL] Incremental backup with RSYNC or something?

2011-11-22 Thread Merlin Moncure
On Sun, Nov 13, 2011 at 5:38 AM, Phoenix Kiula  wrote:
> Hi.
>
> I currently have a cronjob to do a full pgdump of the database every
> day. And then gzip it for saving to my backup drive.
>
> However, my db is now 60GB in size, so this daily operation is making
> less and less sense. (Some of you may think this is foolish to begin
> with).
>
> Question: what can I do to rsync only the new additions in every table
> starting 00:00:01 until 23:59:59 for each day?
>
> Searching google leads to complex things like "incremental WAL" and
> whatnot, or talks of stuff like pgcluster. I'm hoping there's a more
> straightforward core solution without additional software or PHD
> degrees.

My standard advice for this problem is to do an HS/SR (hot standby /
streaming replication) setup, which solves a number of problems
simultaneously.  It still makes sense to take a full snapshot once in a
while though -- but you can take it from the standby.
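
For the archives, a minimal 9.0/9.1-era sketch of what that looks like
(host names, user, and paths are made up, not taken from your setup):

# primary: postgresql.conf
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 256        # or configure archive_command as a safety net

# primary: pg_hba.conf -- let the standby connect for replication
host  replication  replicator  192.0.2.10/32  md5

# standby: postgresql.conf
hot_standby = on

# standby: recovery.conf, after restoring a base backup of the primary
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator password=secret'

Once the standby is up, the occasional full snapshot (pg_dump) can be
taken against it instead of the primary.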

merlin



Re: [GENERAL] possible race condition in trigger functions on insert operations?

2011-11-22 Thread Merlin Moncure
On Tue, Nov 22, 2011 at 2:43 PM, David Johnston  wrote:
> Just create a single sequence for each year and then call the proper one
> on-the-fly.  You can create multiple sequences in advance and possible even
> auto-create the sequence the first time one is attempted to be used in a
> given year.  If you can live with possible (but probably unlikely) gaps in
> the sequence then all the concurrency will be handled for you and you can
> focus on writing a function that, given a year, will return the proper
> value.

I personally think the 'record the next value to be inserted in a
table somewhere' approach is better unless you are trying to support a
lot of concurrent operations.  Also the gap issue is more likely to come
up than you're letting on -- a rolled back transaction is all it takes.

merlin



Re: [GENERAL] possible race condition in trigger functions on insert operations?

2011-11-22 Thread Merlin Moncure
On Tue, Nov 22, 2011 at 11:25 AM, Kenneth Tilton  wrote:
> Bit of a trigger NOOB Q:
>
> I am trying to use a trigger function to automatically populate new
> rows in a table with  a public ID of the form -NNN such that the
> 42nd row created in 2011 would get the ID "2011-042". Each row is
> associated via an iasid column with a row in an audit table that has a
> timestamp column called created. This works OK, but I am worried about
> two rows getting the same case_no if they come in at the same time
> (whatever that means):
>
> declare
>  case_yr integer;
>  yr_case_count bigint;
> begin
>  select date_part('year', created) into case_yr
>         from audit
>         where audit.sid = NEW.iasid;
>
>  select count(*) into yr_case_count
>    from fwa_case, audit
>    where fwa_case.iasid=audit.sid
>      and date_part('year', created) = case_yr;
>
>  NEW.case_no = to_char( case_yr, '' ) || '-' ||
> to_char(1+yr_case_count, 'FM000');
>  return NEW;
> end;
>
> Do I have to worry about this, or does ACID bail me out? If the
> former, what do I do? I am thinking first put a uniqueness constraint
> on the column and then figure out how to do retries in a trigger
> function.

ACID does not bail you out -- you've put no synchronization in to
prevent two concurrent counts from running at roughly the same time and
getting the same answer.  A uniqueness constraint is definitely a good
idea.  In terms of doing a gapless sequence generally, see here:
http://www.varlena.com/GeneralBits/130.php.  Basically the general
idea is to keep a counter field somewhere that you lock and update.
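
A minimal sketch of that approach, wired into your trigger (the
case_counter table is made up):

create table case_counter (yr int primary key,
                           last_no bigint not null default 0);

-- in the trigger function, in place of the count(*) query:
update case_counter
   set last_no = last_no + 1
 where yr = case_yr
returning last_no into yr_case_count;  -- row lock serializes concurrent inserts

if not found then
  -- first case of the year; pre-create a row per year if you want to
  -- avoid the (unlikely) race on this branch
  insert into case_counter (yr, last_no) values (case_yr, 1);
  yr_case_count := 1;
end if;

yr_case_count is then the case number itself, so drop the '1 +' when you
build NEW.case_no.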

merlin



Re: [GENERAL] function within a function/rollbacks/exception handling

2011-11-08 Thread Merlin Moncure
On Tue, Nov 8, 2011 at 9:13 AM, Lori Corbani  wrote:
> Richard,
>
> I manage to find one comment about an implicit rollback in a section of
> the developer's guide when porting from Oracle-to-Postgres:  "when an
> exception is caught by an EXECPTION clause, all database changes since
> the block's BEGIN are automatically rolled back"
>
> Do you know of any other place in the documentation this discusses the
> implicit rollback in more detail?  Or do you know of a good online site
> that contains some good examples or best-practices for these
> function-to-function calls?
>
> We are starting to port our Sybase database (200 stored procedures) over
> to Postgres and I am finding the online Postgres documentation and the
> Douglas book a bit lacking in some of the more specific examples that I
> am interested in finding.
>
> Thanks.
> Lori
>
>
> 
> From: Lori Corbani [l...@informatics.jax.org]
> Sent: Tuesday, November 08, 2011 8:46 AM
> To: Richard Huxton
> Cc: Lori Corbani; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] function within a function/rollbacks/exception handling
>
> Richard,
>
> I manage to find one comment about an implicit rollback in a section of
> the developer's guide when porting from Oracle-to-Postgres:  "when an
> exception is caught by an EXECPTION clause, all database changes since
> the block's BEGIN are automatically rolled back"
>
> Do you know of any other place in the documentation this discusses the
> implicit rollback in more detail?  Or do you know of a good online site
> that contains some good examples or best-practices for these
> function-to-function calls?
>
> We are starting to port our Sybase database (200 stored procedures) over
> to Postgres and I am finding the online Postgres documentation and the
> Douglas book a bit lacking in some of the more specific examples that I
> am interested in finding.

Implicit rollback is a fundamental underpinning of transactions in
SQL.  Any error will abort either A. the entire transaction or B. all
activity since the last savepoint.

In all languages except plpgsql, savepoints are explicitly set (with
the SAVEPOINT command) and you restore to the savepoint with ROLLBACK TO.

Savepoints in plpgsql are implicitly created anytime you enter a
BEGIN/END block with an EXCEPTION handler defined.  Unlike vanilla
SQL savepoints, plpgsql savepoints can nest, so each EXCEPTION
block you enter adds an error handler onto a stack (which is much
more similar to exceptions in the vein of Java or C++).  Whether you
call functions from within functions or not is totally immaterial to
error handling generally; you can have multiple nested handlers in a
single function, or none at all.
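
For illustration, a quick sketch of the implicit savepoint behavior
(t1 and t2 are made-up tables assumed to exist):

create or replace function demo_handlers() returns void as
$$
begin
  insert into t1 values (1);      -- survives if the function completes
  begin
    insert into t2 values (1);    -- undone by the handler below
    raise exception 'oops';
  exception when others then
    raise notice 'only work since the inner BEGIN was rolled back';
  end;
  -- execution continues here; the t1 insert is still in place
end;
$$ language plpgsql;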

merlin



Re: [GENERAL] Behavior of negative OFFSET

2011-11-07 Thread Merlin Moncure
On Mon, Nov 7, 2011 at 3:47 PM, Robert James  wrote:
> I've been using a query on Postgres 8.4 with a negative OFFSET, which
> works fine:
>
>   SELECT DISTINCT s.* FROM s WHERE ... ORDER BY s.bday ASC, s.name
> ASC LIMIT 15 OFFSET -15
>
> When I run the same query on Postgres 9.1, I get an error:
>   ERROR: OFFSET must not be negative
>
>
> Question:
> 1. Was this changed in a version of Postgres? Is this configurable?
> 2. How do I get the original behavior of negative offsets?

the original behavior was undefined. to kinda sorta get it,
create function oldoffset(int) returns int as
$$
  select case when $1 < 0 then 0 else $1 end;
$$ language sql immutable;

select v from generate_series(1,15) v limit 15 offset oldoffset(-15);

merlin



Re: [GENERAL] PostgreSQL references in the Middle East

2011-11-07 Thread Merlin Moncure
On Sun, Nov 6, 2011 at 11:04 PM, Jan Geluk (Collis)  wrote:
> Dear all,
>
> For our customer in Dubai, we are looking for references of PostgreSQL 
> implementations in the Middle East, preferably in the GCC countries, 
> preferably in the United Arab Emirates (Dubai / Abu Dhabi).
>
> Please help me in this matter.
>
> Thanks in advance!

some quick googling turned up this company in the UAE that does
postgres hosting... you might find some good contacts there:
http://www.uaehosting.org/

this guy looks like a good contact:
http://www.kannayath.com/

merlin



Re: [GENERAL] Recommendations for SSDs in production?

2011-11-02 Thread Merlin Moncure
On Wed, Nov 2, 2011 at 12:01 PM, Benjamin Smith
 wrote:
> Well,
>
> After reading several glowing reviews of the new OCZ Vertex3 SSD last spring,
> we did some performance testing in dev on RHEL6. (CentOS)
>
> The results were nothing short of staggering. Complex query results returned
> in 1/10th the time as a pessimistic measurement. System loads dropped from 2+
> to 0.1 or less.
>
> Wow.
>
> So after months of using this SSD without any issues at all, we tentatively
> rolled this out to production, and had blissful, sweet beauty until about 2
> weeks ago, now we are running into sudden death scenarios. We have excellent
> backup system, so the damage is reduced to roadbumps, but are looking for a
> longer term solution that doesn't compromise performance too much.
>
> The config is super-basic, basically no tuning at all was done:
>
> # fdisk /dev/NNN;
> mke2fs -j $partn;
> mount $partn /var/lib/pgsql;
> rsync -vaz /var/lib/pgsql.old /var/lib/pgsql;
> service postgresql start;
>
> I don't mind spending some money. Can anybody comment on a recommended drive
> in real world use?
>
> After some review I found:
>
> 1) Micron P300 SSD: claims excellent numbers, can't find them for sale
> anywhere.
>
> 2) Intel X25E - good reputation, significantly slower than the Vertex3. We're
> buying some to reduce downtime.
>
> 3) OCZ "Enterprise" - reviews are mixed.
>
> 4) Kingston "Enterprise" drives appear to be identical to consumer drives with
> a different box.
>
> 5) STEC drives are astronomically expensive. (EG: "You're kidding, right?")
>
> 6) Corsair consumer drives getting excellent reviews, Aberdeen Inc recommended
> in use with RAID 1.
>
> 7) Seagate Pulsar drives, XT.2 drives are expensive SLC but can't find a
> vendor, Pulsar .2 drives are more available but having trouble finding reviews
> other than rehashed press releases.
>
> Thanks!

The X25-E is now essentially being replaced by the 710.  I would look there.

merlin



Re: [GENERAL] select where not exists returning multiple rows?

2011-11-02 Thread Merlin Moncure
On Wed, Nov 2, 2011 at 8:20 AM, Chris Dumoulin  wrote:
> On 11-11-02 09:13 AM, Martijn van Oosterhout wrote:
>>
>> On Wed, Nov 02, 2011 at 07:22:09AM -0400, Chris Dumoulin wrote:
>>>
>>> And we're doing an insert like this:
>>> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
>>> ( SELECT NULL FROM Item WHERE Sig=$4)
>>>
>>> In this case $1 and $4 should always be the same.
>>
>> FWIW, If they're always going to be the same, you can put that it the
>> query,
>> like so:
>>
>> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
>> ( SELECT NULL FROM Item WHERE Sig=$1)
>>
>> Saves a parameter.
>>
>>> I don't see how it's possible to get duplicate rows here, unless
>>> maybe the "select where not exists" is somehow returning multiple
>>> rows.
>>> Any ideas what's going on here?
>>
>> As pointed out by others, you don't say if it this is a race condition
>> between processes or if it always does this.
>
> It's only happening intermittently, but it doesn't appear to be a race
> condition; I'm pretty sure there's only one thread or process issuing this
> statement.

Pretty sure? You need to be 100% sure.  *Somebody* was worried about
concurrency in the code, because the actual statement in the log has
'FOR UPDATE' -- your example does not.  Intermittent failures are
classic race condition behavior.  The reason for the race is that your
select happens before the insert does, so processes A and B can
select at approximately the same time and both make the decision to
insert on the same key...bam.  Logging all statements will positively
prove this.

The SELECT ... WHERE NOT EXISTS never returns more than one row, and
there is precisely 0% chance you've uncovered a server bug that is
causing it to :-).

solve the problem by:
a. LOCK the table before making the insert, making sure to wrap the
lock and the insert in the same transaction (this should be the
default method -- a minimal sketch follows below)
b. retry the transaction on failure in the client
c. or on the server if you push the insert into a function.
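
A minimal sketch of option a, using the table from your post:

begin;
lock table item in share row exclusive mode;  -- blocks other writers, not readers
insert into item (sig, type, data)
  select $1, $2, $3
  where not exists (select 1 from item where sig = $1);
commit;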

merlin



Re: [GENERAL] select where not exists returning multiple rows?

2011-11-02 Thread Merlin Moncure
On Wed, Nov 2, 2011 at 6:22 AM, Chris Dumoulin  wrote:
> We're using postgresql 9.1, and we've got a table that looks like this:
>
> testdb=# \d item
> Table "public.item"
>  Column   |   Type   | Modifiers
> ---+--+---
>  sig   | bigint   | not null
>  type  | smallint |
>  data  | text     |
> Indexes:
>    "item_pkey" PRIMARY KEY, btree (sig)
>
> And we're doing an insert like this:
> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT
> NULL FROM Item WHERE Sig=$4)
>
> In this case $1 and $4 should always be the same. The idea is to insert if
> the row doesn't already exist.
> We're getting primary key constraint violations:
>
> 011-10-31 22:50:26 CDT STATEMENT:  INSERT INTO Item (Sig, Type, Data) SELECT
> $1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$4 FOR UPDATE)
> 2011-10-31 22:52:56 CDT ERROR:  duplicate key value violates unique
> constraint "item_pkey"
> 2011-10-31 22:52:56 CDT DETAIL:  Key (sig)=(-4668668895560071572) already
> exists.
>
> I don't see how it's possible to get duplicate rows here, unless maybe the
> "select where not exists" is somehow returning multiple rows.
> Any ideas what's going on here?

race condition.  lock the table first or retry the insert.

merlin



Re: [GENERAL] [PL/pgSQL] function call

2011-10-31 Thread Merlin Moncure
On Mon, Oct 31, 2011 at 8:31 AM, Tarlika Elisabeth Schmitz
 wrote:
> I have created a function log_insert(), which is simply a shorthand for
> an INSERT table and which I want to call from various trigger functions.
>
> CREATE OR REPLACE FUNCTION log_insert(vseverity text, vtrigger text,
> vtriggertable text, vtriggerid text, vmessage text) RETURNS boolean AS
> $BODY$
> BEGIN
>  INSERT INTO log
>  (severity, trigger,triggertable, triggerid, message)
>  VALUES
>  (vseverity, vtrigger,vtriggertable, vtriggerid, vmessage);
> END
> $BODY$
> LANGUAGE plpgsql VOLATILE;
>
>
> I tried:
> log_insert('I', TG_NAME, TG_TABLE_NAME, NEW.id, 'some msg');
> => I get a syntax error on CREATE TRIGGER.
>
> SELECT log_insert(...)
> => passes the syntax check but throws an error when run:
> "function log_insert(unknown, unknown, unknown, integer, unknown) does
> not exist Hint: No function matches the given name and argument types.
> You might need to add explicit type casts."
>
>
> Any help would be greatly appreciated.

There is some context you are not passing here -- the log_insert
function is being called from inside a trigger function, which is where
your error actually is.  However, in pl/pgsql, you always call functions
with PERFORM or SELECT, depending on whether you want to process the
result.
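
Note also the hint in your second error: NEW.id is an integer while
log_insert's fourth argument is text, so cast it.  Something like this
(the trigger function and table names are made up):

create or replace function some_table_log() returns trigger as
$$
begin
  perform log_insert('I', TG_NAME, TG_TABLE_NAME, NEW.id::text, 'some msg');
  return new;
end
$$ language plpgsql;

create trigger some_table_log_trg
  before insert on some_table
  for each row execute procedure some_table_log();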

also, FWIW, I don't like a simple wrapper for an insert statement like
that -- the syntax brevity is outweighed by the loss of SQL features
such as being able to pass DEFAULT for columns.

merlin



Re: [GENERAL] nextval skips values between consecutive calls

2011-10-28 Thread Merlin Moncure
On Fri, Oct 28, 2011 at 11:32 AM,   wrote:
>> -Original Message-
>> From: Merlin Moncure [mailto:mmonc...@gmail.com]
>> Sent: Friday, October 28, 2011 8:29 PM
>> To: Dmitry Epstein
>> Cc: t...@sss.pgh.pa.us; pgsql-general@postgresql.org; Peter Gagarinov
>> Subject: Re: [GENERAL] nextval skips values between consecutive calls
>>
>> On Fri, Oct 28, 2011 at 10:28 AM,   wrote:
>> >> -Original Message-
>> >> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
>> >> Sent: Friday, October 28, 2011 7:22 PM
>> >> To: Dmitry Epstein
>> >> Cc: pgsql-general@postgresql.org; Peter Gagarinov
>> >> Subject: Re: [GENERAL] nextval skips values between consecutive calls
>> >>
>> >>  writes:
>> >> > -- This is rather surprising
>> >> > select nextval(' test_sequence'), generate_series(1, 1); -- 3, 1
>> >> > select nextval(' test_sequence'), generate_series(1, 1); -- 5, 1
>> >>
>> >> > Is there any explanation for why nextval skips a value in the second
>> case?
>> >>
>> >> The targetlist is evaluated twice because of the presence of the
>> >> set-returning function.  On the second pass, generate_series reports
>> >> that it's done, and so evaluation stops ... but nextval() was already 
>> >> called a
>> second time.
>> >>
>> >> SRFs in SELECT targetlists are a pretty dangerous thing, with a lot
>> >> of surprising behaviors, especially if you combine them with other
>> >> volatile functions.  I recommend avoiding them.  They'll probably be
>> >> deprecated altogether as soon as we have LATERAL.
>> >>
>> >>                       regards, tom lane
>> >
>> > What's a good alternative in the meantime? Suppose I need to
>> > incorporate some unnests into my select, for example? (Well, I already
>> > found one alternative that seems to work, but I am not sure that's
>> > optimal.)
>>
>> Typically for guaranteed LATERAL-like behaviors you need to use a CTE.
>>
>> merlin
>
> What's a CTE?

with foo as (select generate_series(1, 1) ind)
select nextval(' test_sequence'), ind from foo;

merlin



Re: [GENERAL] nextval skips values between consecutive calls

2011-10-28 Thread Merlin Moncure
On Fri, Oct 28, 2011 at 10:28 AM,   wrote:
>> -Original Message-
>> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
>> Sent: Friday, October 28, 2011 7:22 PM
>> To: Dmitry Epstein
>> Cc: pgsql-general@postgresql.org; Peter Gagarinov
>> Subject: Re: [GENERAL] nextval skips values between consecutive calls
>>
>>  writes:
>> > -- This is rather surprising
>> > select nextval(' test_sequence'), generate_series(1, 1); -- 3, 1
>> > select nextval(' test_sequence'), generate_series(1, 1); -- 5, 1
>>
>> > Is there any explanation for why nextval skips a value in the second case?
>>
>> The targetlist is evaluated twice because of the presence of the 
>> set-returning
>> function.  On the second pass, generate_series reports that it's done, and so
>> evaluation stops ... but nextval() was already called a second time.
>>
>> SRFs in SELECT targetlists are a pretty dangerous thing, with a lot of 
>> surprising
>> behaviors, especially if you combine them with other volatile functions.  I
>> recommend avoiding them.  They'll probably be deprecated altogether as
>> soon as we have LATERAL.
>>
>>                       regards, tom lane
>
> What's a good alternative in the meantime? Suppose I need to incorporate some 
> unnests into my select, for example? (Well, I already found one alternative 
> that seems to work, but I am not sure that's optimal.)

Typically for guaranteed LATERAL-like behaviors you need to use a CTE.

merlin



Re: [GENERAL] missing chunk 0 for toast value ...

2011-10-25 Thread Merlin Moncure
On Tue, Oct 25, 2011 at 3:47 PM, Andrew Hammond
 wrote:
> I found the following error message in my logfiles.
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-1]
> user=nutricate,db=admin_production ERROR:  missing chunk number 0
> for toast value 2411466 in pg_toast_2619
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-2]
> user=nutricate,db=admin_production STATEMENT:  SELECT "devices"."id",
> "devices"."name", "devices"."info", "devices"."pos_id",
> "devices"."device_group_id", "devices"."header_id", "devices"."footer_id",
> "devices"."device_type_id", "devices"."auth_code", "devices"."pos_comm_id",
> "devices"."printer_comm_id", "devices"."sw_version", "devices"."hw_version",
> "devices"."device_status", "devices"."entity_id", "devices"."address",
> "devices"."created_by", "devices"."create_method", "devices"."created_on",
> "devices"."modified_by", "devices"."updated_on", "devices"."version_id",
> "devices"."unique_id", "devices"."hw_key", "devices"."config_status",
> "devices"."activated_on", "devices"."store_id",
> "devices"."last_status_update", "devices"."loaded_on",
> "devices"."header2_id", "devices"."footer2_id", "devices"."timezone_id",
> "devices"."scanner_comm_id", "devices"."public_address",
> "devices"."hostname", "devices"."update_sw_version",
> "devices"."proxy_address", "devices"."proxy_type_id",
> "devices"."build_error",
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-3]  "devices"."local_info",
> "devices"."associated_on" FROM "devices" WHERE ("devices"."entity_id" IN
> (SELECT U0."id" FROM "entities" U0 WHERE ((U0."lft" > 280  AND U0."rgt" <
> 2597 ) OR U0."id" = 140 )) AND "devices"."auth_code" = E'0063' )
> I tried reproducing it with
> SELECT * FROM devices WHERE ("devices"."entity_id" IN (SELECT U0."id" FROM
> "entities" U0 WHERE ((U0."lft" > 280  AND U0."rgt" < 2597 ) OR U0."id" = 140
> )) AND "devices"."auth_code" = E'0063' )
> This returned no rows and did not reproduce the error. I tried reproducing
> with an even broader approach
> CREATE TEMP TABLE foo AS SELECT * FROM devices;
> CREATE TEMP TABLE bar AS SELECT * FROM entities;
> Still no repro. This is the second time I've seen this error, and the last
> time it also did not reproduce. I'm wondering a couple of things. First, is
> there some other way to reproduce this error? Second, would simply deleting
> the row cause the problem to go away? I wouldn't think so, but why else is
> it not reproducing? Third, what is the recommended course of action here. We
> have hourly backups, but this is a production database, so I would prefer
> not to have to restore. I tested a restore from an hour ago and it ran
> successfully.

hrm, that's bizarre -- version etc?

merlin



Re: [GENERAL] Primary key Index Error

2011-10-25 Thread Merlin Moncure
On Tue, Oct 25, 2011 at 2:41 AM, Raghavendra
 wrote:
> On Tue, Oct 25, 2011 at 11:40 AM, Manoj K P  wrote:
>>
>> Server log Oct 1 00:06:59 server_host_name postgres[1453]: [5-1]
>> 2011-10-01 00:06:59.831 EDT 1453 4e869041.5ad postgres [local] postgres LOG:
>> duration: 418583.238 ms statement: select pg_start_backup('fortnightly');
>> Oct 2 03:03:18 server_host_name postgres[1453]: [6-1] 2011-10-02
>> 03:03:18.243 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration:
>> 8034.385 ms statement: select pg_stop_backup(); In between stop and start
>> process server_host_name is receiving all type of DML & DDL and generating
>> new WAL file Taking base backup in between start and stop process Client Log
>> Details Oct 25 05:16:18 client_server_name postgres[28858]: [2-1] 2011-10-25
>> 05:16:18.202 BST 28858 LOG: could not open file
>> "pg_xlog/271047B1008C" (log file 18353, segment 140): No such
>> file or directory Oct 25 05:16:18 client_server_name postgres[28858]: [3-1]
>> 2011-10-25 05:16:18.203 BST 28858 LOG: invalid checkpoint record Oct 25
>> 05:16:18 client_server_name postgres[28858]: [4-1] 2011-10-25 05:16:18.203
>> BST 28858 FATAL: could not locate required checkpoint record Oct 25 05:16:18
>> client_server_name postgres[28858]: [4-2] 2011-10-25 05:16:18.203 BST 28858
>> HINT: If you are not restoring from a backup, try removing the file
>> "/mnt/new_cluster/backup_label". Oct 25 05:16:18 client_server_name
>> postgres[28857]: [1-1] 2011-10-25 05:16:18.205 BST 28857 LOG: startup
>> process (PID 28858) exited with exit code 1 Oct 25 05:16:18
>> client_server_name postgres[28857]: [2-1] 2011-10-25 05:16:18.205 BST 28857
>> LOG: aborting startup due to startup process failure Oct 25 05:20:53
>> client_server_name postgres[29030]: [2-1] 2011-10-25 05:20:53.630 BST 29030
>> LOG: could not open file "pg_xlog/271047B10068" (log file 18353,
>> segment 104): No such file or directory Oct 25 05:20:53 client_server_name
>> postgres[29030]: [3-1] 2011-10-25 05:20:53.630 BST 29030 FATAL: could not
>> find redo location referenced by checkpoint record Oct 25 05:20:53
>> client_server_name postgres[29030]: [3-2] 2011-10-25 05:20:53.630 BST 29030
>> HINT: If you are not restoring from a backup, try removing the file
>> "/mnt/new_cluster/backup_label". Oct 25 05:20:53 client_server_name
>> postgres[29029]: [1-1] 2011-10-25 05:20:53.633 BST 29029 LOG: startup
>> process (PID 29030) exited with exit code 1 Oct 25 05:20:53
>> client_server_name postgres[29029]: [2-1] 2011-10-25 05:20:53.633 BST 29029
>> LOG: aborting startup due to startup process failure manually copy following
>> file to pg_xlog folder 271047B1008C 271047B10068
>> After words i can start postgres and accessing the database , but same
>> error
>
> As per the logs, do you see missing XLOG files in Archive Destination ? becz
> these kind of situations mostly missing files will be in WAL-Archive
> location. You need to copy to pg_xlog directory and start the instance.
>
> As Merlin Said, you need to dig more to know why its crashing by increasing
> the LOG-DEBUG level's. Increasing DEBUG level may occupy good amount of
> space in log-location, so make sure you have good space for logs to get what
> exactly happening at the time of backup in particular. Am not sure whether
> its safe to attach ***backtrace*** to instance for information.

yeah. also, what's the setting of archive_command (or is it even set)?
 taking a 'hot' filesystem backup without having an archive_command
and not doing any other intervention to guarantee the necessary WAL
segments are present will not give you a complete backup.  my money is
on you having an invalid backup procedure.  the only way to take a
filesystem snapshot without dealing with WAL files is to bring the
database down.
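
For reference, the usual shape of a hot base backup (the paths and the
archive_command are illustrative, not taken from your setup):

# postgresql.conf on the server being backed up (8.3+)
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'

# then, around the filesystem copy:
psql -c "select pg_start_backup('fortnightly')"
rsync -a $PGDATA/ /backup/pgdata/
psql -c "select pg_stop_backup()"

# every WAL segment archived between start and stop has to be kept with
# the copy -- without them it can never reach a consistent state.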

merlin



Re: [GENERAL] Primary key Index Error

2011-10-24 Thread Merlin Moncure
On Mon, Oct 24, 2011 at 3:22 AM, Manoj K P  wrote:
> Postgres recovered from data folder , after that some queries shows error
>
>
> select * from table2 order by app_id  ; - its work ( 5000 data)
> select * from table2 order by app_id   desc ; - its work
>
> Here app_id contains  binary index
>
>
> select * from table2 order by id  ; - its work ( 5000 data)
> select * from table2 order by id   desc ; - shows following error
>
> manoj[local] postgres ERROR:  index "tbl2_id_pkey" contains unexpected zero
> page at block 311121
> manoj [local] postgres HINT:  Please REINDEX it.
>
> index "tbl2_id_pkey" is primary key
>
> Reindex not pratical this table (250 GB data)
>
> How i can solve this ?

It looks like you have data corruption.  Your first priority should be
to do the minimum necessary to get a usable database dump and then to
get back online.  I would consider your database offline until this
is done. Essentially, the only practical ways to repair the error you
see would be to reindex or restore from backup.  Did you have any
unexpected power events?  Server crashes?

merlin



Re: [GENERAL] Primary key Index Error

2011-10-24 Thread Merlin Moncure
On Mon, Oct 24, 2011 at 8:32 AM, manoj  wrote:
> On 10/24/2011 06:38 PM, Merlin Moncure wrote:
>
> On Mon, Oct 24, 2011 at 3:22 AM, Manoj K P  wrote:
>
> Postgres recovered from data folder , after that some queries shows error
>
>
> select * from table2 order by app_id  ; - its work ( 5000 data)
> select * from table2 order by app_id   desc ; - its work
>
> Here app_id contains  binary index
>
>
> select * from table2 order by id  ; - its work ( 5000 data)
> select * from table2 order by id   desc ; - shows following error
>
> manoj[local] postgres ERROR:  index "tbl2_id_pkey" contains unexpected zero
> page at block 311121
> manoj [local] postgres HINT:  Please REINDEX it.
>
> index "tbl2_id_pkey" is primary key
>
> Reindex not pratical this table (250 GB data)
>
> How i can solve this ?
>
> It looks like you have data corruption.  Your first priority should be
> to do the minimum necessary to get a usable database dump and then to
> get back online.  I would consider your database off line until this
> is done. Essentially, the only practical ways to repair the error you
> see would be to reindex or restore from backup.  Did you have any
> unexpected power events?  Server crashes?
>
> merlin
>
>   Actually this is backup server ,no power failure & Server crashes happens
> in between , The data folder is base backup(pg_start_backup) on the server.
> pg_dump not practical in this case  because of data size.
>   After recovering from base backup run  WAL recovery file successfully and
> its updates current date data.
>   When i am running query on big table(more than 5 Million  data)  shows
> error other wise its work fine.
>
>   select id , name  from table2 order by id  ; It uses the index
>
>   select id , name  from  table2 order by id desc ; - shows the error
>
> Error happens only using (DESC clause against primary key) , all other case
> its work fine
>
>  REINDEX will take ages

do you happen to still have the database logs on the backup from
startup to end of recovery?  anything interesting in there?

merlin



Re: [GENERAL] a set of key/value pairs

2011-10-20 Thread Merlin Moncure
On Thu, Oct 20, 2011 at 1:53 PM, J.V.  wrote:
> Does anyone have a specific example of how to:
>    1) create a variable that is a set or array of key value pairs?
>    2) how to populate each element of the array (each element being a key /
> value pair)
>    3) how to iterate through the list or array retrieving each key/value
> pair?

create type pair_t as (key text, value text);

select array[('a', '123'), ('b', '234')]::pair_t[];

select (unnest(array[('a', '123'), ('b', '234')]::pair_t[])).*;
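
And to iterate over the pairs in plpgsql (a sketch; FOREACH needs 9.1,
use FOR ... IN SELECT unnest(...) on older versions):

create or replace function show_pairs(pairs pair_t[]) returns void as
$$
declare
  p pair_t;
begin
  foreach p in array pairs loop
    raise notice 'key=% value=%', p.key, p.value;
  end loop;
end;
$$ language plpgsql;

select show_pairs(array[('a', '123'), ('b', '234')]::pair_t[]);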

merlin



Re: [GENERAL] Reading PG data from MySQL stored procedure

2011-10-20 Thread Merlin Moncure
On Thu, Oct 20, 2011 at 2:22 PM, Gauthier, Dave  wrote:
> Here's a real long shot, but what the heck...
>
>
>
> I have a user that's using a system that gives him a single MySQL DB handle
> to work with.  He must get all his data through this handle.  He wants some
> of my PG based data.  Not only can't he open a new DB handle to my PG DB,
> but he cannot even run something at the system level (otherwise I'd just
> give him a "psql -P pager=off  -c "select foo from bar" "). So it has to
> come "through" MySQL somehow, MySQL V5.1 no less.
>
>
>
> It appears that they have some sort of external language capability for
> stored procedures in their V6.  And they have something else which'll run an
> external object (UDF something or other).  But what I'm throwing out there
> is a question regarding any known data interchange functionality that might
> exist between these 2 different SQL DB engines.

First question: why in the world can't you do the processing on the
client side?  There are a gajillion and one data transfer/ETL tools!
mysql of that vintage has pretty much zero server-side extensibility
AFAIK.

Of course, it would be quite possible from postgres to query a mysql
database, but I digress...

merlin



Re: [GENERAL] Composite type, DEFAULT, NOT NULL, REFERENCES

2011-10-19 Thread Merlin Moncure
On Wed, Oct 19, 2011 at 9:32 AM, erhaminus  wrote:
>
> Hi,
>
> Is a way to define DEFAULT, NOT NULL and REFERENCES for members of composite 
> type?
>
> For example:
>
> -- type
> CREATE TYPE bibl.bibliography AS
> (
>        edition TEXT,
>        publisher_id BIGINT
> );
>
> -- table def
> create table bibl.monograph
> (
>        id BIGSERIAL PRIMARY KEY NOT NULL,
>        bibl bibl.bibliography
> );
>
> -- how to do that, assuming that:
> --    we want edition NOT NULL, DEFAULT with value "first"
> --    we want publisher_id reffering to bibl.publisher(id)
>
> This does not work for me:
>
> ALTER TABLE bibl.monograph ALTER COLUMN (bibl).is_bibliography SET NOT NULL;
> ALTER TABLE bibl.monograph ALTER COLUMN (bibl).is_bibliography SET DEFAULT 
> false;

sure:

create table bibl.monograph
(
       id BIGSERIAL PRIMARY KEY NOT NULL,
       bibl bibl.bibliography
         check((bibl).edition is not null)
         DEFAULT row('first', NULL)::bibl.bibliography
);

create unique index on bibl.monograph(bibl);

create table a_table
(
  bibl bibl.bibliography references bibl.monograph(bibl)
);

Not saying this is a good idea -- but it can be done.  The only thing
you can't do is create a DOMAIN over a composite type, so you have to
manually add the check constraint to every table containing the type if
you want to enforce it.

merlin



Re: [GENERAL] many sql file and one transaction

2011-10-18 Thread Merlin Moncure
On Tue, Oct 18, 2011 at 7:57 AM, Cédric Villemain
 wrote:
> 2011/10/18 salah jubeh :
>> Hello,
>> Thanks for the reply.
>> I considered  cat as an option but I did not go for it, because of the
>> number of sql files I have is large  which makes the code not readable
>> The second thing, which is more important is because I have some advantages
>> with  using -f such as the line number which causing the error.
>
> you can do :
>
> cat module1.sql \
>      module2.sql \
>      module_etc.sql \
>  | psql -f -
>
>
>
>> Regards
>>
>>
>>
>>
>>
>> 
>> From: "andr...@a-kretschmer.de" 
>> To: salah jubeh 
>> Sent: Tuesday, October 18, 2011 2:23 PM
>> Subject: Re: [GENERAL] many sql file and one transaction
>>
>>
>> Zitat von salah jubeh :
>>
>>> Hello,
>>>
>>>
>>> I have many SQL script files to update schema, delete data, unit
>>> test etc.  I want to run all the files in one transaction using
>>> shell script to ease the installation procedure. I can do that from
>>> the psql client by using the \i option
>>>
>>>
>>> BEGIN;
>>>
>>> \i  / .../ module1.sql
>>>
>>> \i  / .../ module2.sql
>>>
>>> \i  / .../ module_etc.sql
>>> COMMIT;
>>>
>>>
>>> Is there a way to do that  using psql command  shell script.
>>>
>>>
>>> I.E.
>>>
>>> I want to run the following  in one transaction,
>>>
>>> psql  -f  module1.sql
>>>
>>> psql  -f  module2.sql
>>>
>>> psql  -f  module_etc.sql
>>
>>
>> cat module1.sql module2.sql module_etc.sql | psql

also don't forget the -1 option to psql, which will wrap all your
commands in a transaction, and:
\set ON_ERROR_STOP

which will abort if there's an error.  either put this in all your
scripts, your .psqlrc, or echo it into psql like this:

cat <(echo "\set ON_ERROR_STOP") \
  module1.sql \
  module2.sql \
  module_etc.sql \
  | psql -1

you'll want to abort on error just to avoid the chance that a token in
your script will inadvertently close the transaction and cause a bunch
of garbage to be committed in the database -- this is likely to happen,
say, if you have a quotation error in a function body.

merlin



Re: [GENERAL] plpgsql; execute query inside exists

2011-10-17 Thread Merlin Moncure
On Mon, Oct 17, 2011 at 10:28 AM, Alban Hertroys  wrote:
> On 17 October 2011 16:24, Merlin Moncure  wrote:
>> On Mon, Oct 17, 2011 at 8:44 AM, Alban Hertroys  wrote:
>>> On 17 October 2011 15:20, Merlin Moncure  wrote:
>>>> A better way to do this is to query information_schema:
>>>>
>>>> PERFORM 1 FROM information_schema.tables where schema_name = x and
>>>> table_name = y;
>>>>
>>>> IF FOUND THEN
>>>>  CREATE TABLE ...
>>>> END IF;
>>>>
>>>> (there is a race condition in the above code -- do you see it? if
>>>> concurrent access to this function is an issue, you have to LOCK an
>>>> object before running the PERFORM or perhaps use an advisory lock).
>>>
>>> Is there? You'd think that with transactional DDL and the code running
>>> in a single transaction (namely inside a stored function) it would be
>>> concurrency-safe.
>>
>> Transactional DDL does not protect you from race conditions any more
>> than MVCC protects you from race conditions in regular DML.  What
>> transactional DDL does is roll back the changes in the event of an
>> error so you don't have half written schema changes in your database.
>> MVCC gives a rigorous definition of visibility rules and transactions
>> guarantee only a complete unit of work getting committed to the
>> database.  You still have to code defensively against multi-user
>> access however.  The good news is that multi user coding is about an
>> order of magnitude easier in sql (especially postgres variant) than in
>> any other development platform that I'm aware of.
>>
>> The race condition here is basically the same problem that affects
>> 'upsert' patterns:
>>
>> test record(s) if found update if not found insert;
>>
>> The problem comes that in between the test and the insert case someone
>> else can also test and get the insert in before you do.  You have two
>> general strategies to get around this: locking and retry.  I greatly
>> advise going the locking route unless your concurrency requirements
>> are very high.  It's much simpler, and since you're not invoking a
>> subtransaction, faster in the uncontested case.
>
> So what would happen if you don't lock? I think it's this:
>
>  Session A              | Session B
> +-
>  SELECT x               | SELECT x
>  NOT FOUND              | NOT FOUND
>   ...                  | CREATE TABLE
>  CREATE TABLE   | 
>                  |
>               |
>
> If I understand correctly, if you don't mind the error and the
> subsequent rollback in Session A, than there's not much need to lock,
> or is there? It is important to be aware of the possible rollback of
> such a transaction, of course.

right -- allowing for rollback and retrying (either from the client or
in the procedure via sub-transaction) is always an option.  I prefer
to lock -- it's faster (usually) and gives more regular behavior.

> And what would you lock? A record in information_schema.tables? That's
> a read-only view. A table that doesn't exist yet? Can't do. A record
> in the pg_ schema? Rather not...
> I suppose you could work around that problem by keeping track of your
> own tables that were generated using aforementioned plpgsql function.
> Then you have a table (that you own) with records to lock.

yeah -- you could use an advisory lock or a special table created for
that purpose, or a row of a table that does your tracking.  Agree that
locking system catalogs is *not* advisable.
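
A sketch of the advisory lock flavor (the lock key and the created
table's definition are arbitrary):

create or replace function create_table_if_missing(p_schema text, p_table text)
returns void as
$$
begin
  perform pg_advisory_lock(42);      -- any agreed-upon key serializes callers
  perform 1 from information_schema.tables
    where table_schema = p_schema and table_name = p_table;
  if not found then
    execute 'create table ' || quote_ident(p_schema) || '.'
            || quote_ident(p_table) || ' (id int)';
  end if;
  perform pg_advisory_unlock(42);    -- session-level lock, release it explicitly
end;
$$ language plpgsql;

On 9.1 you could use pg_advisory_xact_lock() instead and let end of
transaction release it, which also covers the case where the CREATE
errors out before the unlock.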

merlin



Re: [GENERAL] plpgsql; execute query inside exists

2011-10-17 Thread Merlin Moncure
On Mon, Oct 17, 2011 at 8:44 AM, Alban Hertroys  wrote:
> On 17 October 2011 15:20, Merlin Moncure  wrote:
>> A better way to do this is to query information_schema:
>>
>> PERFORM 1 FROM information_schema.tables where schema_name = x and
>> table_name = y;
>>
>> IF FOUND THEN
>>  CREATE TABLE ...
>> END IF;
>>
>> (there is a race condition in the above code -- do you see it? if
>> concurrent access to this function is an issue, you have to LOCK an
>> object before running the PERFORM or perhaps use an advisory lock).
>
> Is there? You'd think that with transactional DDL and the code running
> in a single transaction (namely inside a stored function) it would be
> concurrency-safe.

Transactional DDL does not protect you from race conditions any more
than MVCC protects you from race conditions in regular DML.  What
transactional DDL does is roll back the changes in the event of an
error so you don't have half written schema changes in your database.
MVCC gives a rigorous definition of visibility rules and transactions
guarantee only a complete unit of work getting committed to the
database.  You still have to code defensively against multi-user
access however.  The good news is that multi user coding is about an
order of magnitude easier in sql (especially postgres variant) than in
any other development platform that I'm aware of.

The race condition here is basically the same problem that affects
'upsert' patterns:

test the record(s); if found, update; if not found, insert.

The problem is that in between the test and the insert someone
else can also test and get their insert in before you do.  You have two
general strategies to get around this: locking and retry.  I greatly
advise going the locking route unless your concurrency requirements
are very high.  It's much simpler, and since you're not invoking a
subtransaction, faster in the uncontested case.
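
The retry flavor looks roughly like the merge_db example in the plpgsql
docs (counters here is a made-up table with a primary key on key):

create or replace function upsert_counter(k text, delta int) returns void as
$$
begin
  loop
    update counters set n = n + delta where key = k;
    if found then
      return;
    end if;
    begin
      insert into counters (key, n) values (k, delta);
      return;
    exception when unique_violation then
      -- somebody inserted concurrently; loop around and update instead
    end;
  end loop;
end;
$$ language plpgsql;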

merlin



Re: [GENERAL] index bloat question

2011-10-17 Thread Merlin Moncure
On Mon, Oct 17, 2011 at 2:15 AM, Szymon Guz  wrote:
>
>
> On 17 October 2011 02:01, Scott Marlowe  wrote:
>>
>> On Sun, Oct 16, 2011 at 2:41 PM, Szymon Guz  wrote:
>> > Hi,
>> > just a couple of questions:
>> > will there be an index bloat if I have:
>> > - a serial column and only add rows to the table?
>> > - a text column and I only add rows to the table?
>> > For the serial column the numbers are only incremented, for the text
>> > column
>> > I add random strings.
>>
>> With no deletes or updates, the only bloat will be from a non 100% fill
>> factor.
>
> Hi Scott,
>
> if there is no bloat, how could you explain this:
>
> Simple test:
>
> CREATE TABLE test (
> id text primary key,
> category_id text not null
> );
>
> CREATE INDEX i_category ON test (category_id);
>
>
> I make 500k inserts in one transaction using a python script.
> For the random text I use random uuid from the function:
>
> uuid.uuid4()
>
> After those inserts I create another index:
>
> CREATE INDEX i_new ON test (category_id);
>
>
> select
> pg_size_pretty(pg_relation_size('i_category')),
> pg_size_pretty(pg_relation_size('i_new'))
> ;
>
> Results:
>
> '37 MB';'28 MB'

You didn't post your insertion script.  btree indexes are always going
to have extra space in them due to pages splitting and being only
partially filled -- insertion order over the range of your data plays
into this (you'll get different index arrangements from random vs
ordered insertion).  for kicks, try reindexing both indexes and see
what the size is afterwards.

merlin


