Re: [GENERAL] Table with seemingly duplicated primary key values

2015-12-22 Thread Melvin Davidson
Please. Always, ALWAYS, give the PostgreSQL version and O/S when reporting
a problem.

First, WHAT IS THE POSTGRESQL VERSION?
   WHAT IS THE O/S?

   Then try this:
select a.ctid, a.id, a.field1,
  b.ctid, b.id, b.field1
  from some_table a,
 some_table b
 WHERE a.ctid <> b.ctid
   AND a.id = b.id;
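If you want to check the whole table rather than a single id, the following
sketch (assuming the primary key column is id, as above) lists every
duplicated key together with the physical locations of the row versions
involved:

```sql
-- List all primary-key values that appear in more than one heap tuple;
-- ctid gives the physical location of each row version.
SELECT id, count(*) AS copies, array_agg(ctid) AS row_locations
FROM some_table
GROUP BY id
HAVING count(*) > 1;
```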


On Tue, Dec 22, 2015 at 4:03 AM, Aleksander Łukasz 
wrote:

> Hi,
>
> a table in our database with about 3 million rows ended up in a state
> where it seems to have duplicated entries (duplicated primary key values):
>
> # \d some_table;
>             Table "public.some_table"
>  Column |            Type             |                        Modifiers
> --------+-----------------------------+----------------------------------------------------------
>  id     | integer                     | not null default nextval('some_table_id_seq'::regclass)
>  field1 | character varying(40)       |
>  field2 | character varying(128)      |
>  ts     | timestamp without time zone |
>
> Indexes:
> "some_table_pkey" PRIMARY KEY, btree (id)
> "ix_some_table_field1" btree (field1)
> "ix_some_table_field2" btree (field2)
> "ix_some_table_ts" btree (ts)
>
>
> # select id, field1, field2 from some_table where field1 is null and
> field2 is not null;
>    id    | field1 |   field2
> ---------+--------+-------------
>  2141750 |        | some_value2
> (1 row)
>
>
> # select id, field1, field2 from some_table where id = 2141750;
>    id    |   field1    |   field2
> ---------+-------------+-------------
>  2141750 | some_value1 | some_value2
> (1 row)
>
> Another way this manifests itself is that running this:
>
> # update some_table
>set field2 = field1
> where
>id = 2141750;
>
> works perfectly fine (but doesn't change the result of the first two
> queries above),
> but this results in an error:
>
> # update some_table
>set field2 = field1
> where
>field1 is not null
>and field2 is null
>and ts between '2015-12-01' and '2015-12-02';
>
> ERROR:  duplicate key value violates unique constraint "some_table_pkey"
> DETAIL:  Key (id)=(2141750) already exists.
>
> Do you have any idea what could be happening and what measures should be
> undertaken to fix this issue? Thanks.
>
>


-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


[GENERAL] Table with seemingly duplicated primary key values

2015-12-22 Thread Aleksander Łukasz
Hi,

a table in our database with about 3 million rows ended up in a state
where it seems to have duplicated entries (duplicated primary key values):

# \d some_table;
            Table "public.some_table"
 Column |            Type             |                        Modifiers
--------+-----------------------------+----------------------------------------------------------
 id     | integer                     | not null default nextval('some_table_id_seq'::regclass)
 field1 | character varying(40)       |
 field2 | character varying(128)      |
 ts     | timestamp without time zone |

Indexes:
"some_table_pkey" PRIMARY KEY, btree (id)
"ix_some_table_field1" btree (field1)
"ix_some_table_field2" btree (field2)
"ix_some_table_ts" btree (ts)


# select id, field1, field2 from some_table where field1 is null and field2
is not null;
   id    | field1 |   field2
---------+--------+-------------
 2141750 |        | some_value2
(1 row)


# select id, field1, field2 from some_table where id = 2141750;
   id    |   field1    |   field2
---------+-------------+-------------
 2141750 | some_value1 | some_value2
(1 row)

Another way this manifests itself is that running this:

# update some_table
   set field2 = field1
where
   id = 2141750;

works perfectly fine (but doesn't change the result of the first two
queries above),
but this results in an error:

# update some_table
   set field2 = field1
where
   field1 is not null
   and field2 is null
   and ts between '2015-12-01' and '2015-12-02';

ERROR:  duplicate key value violates unique constraint "some_table_pkey"
DETAIL:  Key (id)=(2141750) already exists.

Do you have any idea what could be happening and what measures should be
undertaken to fix this issue? Thanks.


Re: [GENERAL] Table with seemingly duplicated primary key values

2015-12-22 Thread Tom Lane
Aleksander Łukasz writes:
> a table in our database with about 3 million rows ended up in a state
> where it seems to have duplicated entries (duplicated primary key values):
> ...
> Do you have any idea what could be happening and what measures should be
> undertaken to fix this issue? Thanks.

Looks like corruption of the primary key index.  Try REINDEXing that
index.  This might fail if there actually are duplicate rows in the table
and not just incorrect pointers to them in the index, in which case you'll
have to clean up the duplicates somehow and try again till the REINDEX
succeeds.
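In psql, that would be something along these lines (a sketch, using the index
and table names from the original report):

```sql
-- Rebuild just the corrupted primary key index:
REINDEX INDEX some_table_pkey;

-- Or rebuild all indexes on the table in one go:
REINDEX TABLE some_table;
```

Note that REINDEX blocks writes to the table while it runs.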

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Session Identifiers

2015-12-22 Thread oleg yusim
Thanks Michael, you are right, that is a very good alternative solution.

Oleg

On Tue, Dec 22, 2015 at 6:27 AM, Michael Paquier 
wrote:

> On Tue, Dec 22, 2015 at 1:42 AM, Stephen Frost  wrote:
> > Oleg,
> >
> > * oleg yusim (olegyu...@gmail.com) wrote:
> >> tcp_keepalives_idle = 900
> >> tcp_keepalives_interval=0
> >> tcp_keepalives_count=0
> >>
> >> Doesn't terminate the connection to the database after 15 minutes of
> >> inactivity at the psql prompt. So it looks like that would only work for
> >> the case where the network connection is broken and the session is left
> >> hanging. For the psql prompt case, it looks like pg_terminate_backend()
> >> would be the only solution.
> >
> > Those settings aren't for controlling idle timeout of a connection.
> >
> > pg_terminate_backend() will work and could be run out of a cronjob.
>
> Or a background worker if you are using PG >= 9.3:
> https://github.com/michaelpq/pg_plugins/tree/master/kill_idle
> This has the advantage of not having the cronjob error out should the
> server be stopped. That's less error handling to take care of at the
> frontend level.
> --
> Michael
>


Re: [GENERAL] Table with seemingly duplicated primary key values

2015-12-22 Thread Aleksander Łukasz
Hi, thanks for your reply.

2015-12-22 16:34 GMT+01:00 Melvin Davidson :

> Please. Always, ALWAYS, give the PostgreSQL version and O/S when reporting
> a problem.
>
> First, WHAT IS THE POSTGRESQL VERSION?
>WHAT IS THE O/S?
>

# select version();
                                               version
------------------------------------------------------------------------------------------------------
 PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
(1 row)



>Then try this:
> select a.ctid, a.id, a.field1,
>   b.ctid, b.id, b.field1
>   from some_table a,
>  some_table b
>  WHERE a.ctid <> b.ctid
>AND a.id = b.id;
>

    ctid    |   id    |   field1    |    ctid    |   id    |   field1
------------+---------+-------------+------------+---------+-------------
 (79664,59) | 2141750 |             | (79772,23) | 2141750 | some_value1
 (79772,23) | 2141750 | some_value1 | (79664,59) | 2141750 |
(2 rows)


[GENERAL] Shared system resources

2015-12-22 Thread oleg yusim
Greetings,

I'm looking at the following security control right now:

*The DBMS must prevent unauthorized and unintended information transfer via
shared system resources.*

The purpose of this control is to prevent information, including encrypted
representations of information, produced by the actions of a prior
user/role (or the actions of a process acting on behalf of a prior
user/role) from being available to any current user/role (or current
process) that obtains access to a shared system resource (e.g., registers,
main memory, secondary storage) after the resource has been released back
to the information system. Control of information in shared resources is
also referred to as object reuse.

From previous discussions, I understood that a session in PostgreSQL is
closely associated with a process, and that essentially a new process is
created for every user connection. In that regard, my question would be:

When a user disconnects, the process is terminated and all resources are
released. Does the memory where session/process information was stored get
"sanitized", or is it just released as-is?

Thanks,

Oleg


Re: [GENERAL] Table with seemingly duplicated primary key values

2015-12-22 Thread Jim Nasby

On 12/22/15 1:31 PM, Melvin Davidson wrote:

The fact that you have rows with an identical id of 2141750 verifies a
corrupted primary index.
To correct it, you need to decide which row to keep.

So review the results of
SELECT * FROM some_table WHERE ctid = '(79664,59)' OR ctid = '(79772,23)';

DELETE FROM some_table
WHERE id = 2141750
 AND field1 = {value for field1 of row to delete}
 AND field2 = {value for field2 of row to delete}
...
...
AND fieldn = {value for fieldn of row to delete};

Then
  REINDEX TABLE some_table;


And most importantly: review your logs for hardware errors and your 
Postgres and filesystem settings for anything dangerous. Index 
corruption is not normal and indicates the underlying hardware or OS is 
faulty (or maybe a bug in Postgres, but that's very unlikely).


You should also consider turning on page checksums if you haven't already.
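Whether checksums are enabled can be checked from SQL (a sketch;
data_checksums is a read-only setting, and enabling it requires
re-initializing the cluster with initdb --data-checksums):

```sql
-- Reports 'on' if the cluster was initialized with page checksums:
SHOW data_checksums;
```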
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [GENERAL] Shared system resources

2015-12-22 Thread Jim Nasby

On 12/22/15 6:03 PM, oleg yusim wrote:

Absolutely. But we are not talking about that type of data leakage here.
We are talking about a potential situation where a user who doesn't have
access to the database, but has (or has gained) access to the Linux box
the DB is installed on, gets his hands on data the database processes
stored in memory (memory would be the shared resource here).


Of far larger concern at that point is unauthorized access to the 
database files.


Basically, if someone gains access to the OS user that Postgres is 
running as, or to root, it's game-over.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [GENERAL] Shared system resources

2015-12-22 Thread oleg yusim
Jim,

Yes, you are right. Generally the security control here is encryption of
data at rest (TDE), but PostgreSQL doesn't support it, to my knowledge. I
know about that vulnerability, but here I posed the question about a
different one. I agree it is a smaller one compared to the absence of TDE,
but I would like to find out whether these gates are open too.

Thanks,

Oleg

On Tue, Dec 22, 2015 at 8:48 PM, Jim Nasby  wrote:

> On 12/22/15 6:03 PM, oleg yusim wrote:
>
>> Absolutely. But we are not talking about that type of data leakage here.
>> We are talking about a potential situation where a user who doesn't have
>> access to the database, but has (or has gained) access to the Linux box
>> the DB is installed on, gets his hands on data the database processes
>> stored in memory (memory would be the shared resource here).
>>
>
> Of far larger concern at that point is unauthorized access to the database
> files.
>
> Basically, if someone gains access to the OS user that Postgres is running
> as, or to root, it's game-over.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [GENERAL] Test disk reliability (or HGST HTS721010A9E630 surprisingly reliable)

2015-12-22 Thread Jim Nasby

On 12/21/15 8:22 AM, Bill Moran wrote:

>> Why? Just because a disk isn't enterprise-grade doesn't mean it has to lie
>> about fsync, which is the only thing diskchecker.pl tests for.
>
> I was thinking that since the disk has a 32M write cache (with no
> battery) it would lie to the OS (and postgres) about when data is really
> on disk (not in the disk write cache). But maybe that thinking was wrong.


There are ways to make on-disk write caches safe without a battery. IIRC 
some hard drives would use the inertia of the platter (turning the motor 
into a generator) to write contents out on power-off. You could also use 
a "super cap".



It varies by vendor and product, which is why diskchecker.pl exists.
It's even possible that the behavior is configurable ... check to see
if the vendor provides a utility for configuring it.


Your OS might let you control it too; I know FreeBSD has support for 
this. (Whether the drive obeys or not is a different matter...)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [GENERAL] Shared system resources

2015-12-22 Thread oleg yusim
John,

Absolutely. But we are not talking about that type of data leakage here.
We are talking about a potential situation where a user who doesn't have
access to the database, but has (or has gained) access to the Linux box
the DB is installed on, gets his hands on data the database processes
stored in memory (memory would be the shared resource here).

Thanks,

Oleg

On Tue, Dec 22, 2015 at 5:28 PM, John R Pierce  wrote:

> On 12/22/2015 2:52 PM, oleg yusim wrote:
>
>
> *The DBMS must prevent unauthorized and unintended information transfer
> via shared system resources.*
>
>
>
> you realize the database *itself* is a shared system resource and of
> and by itself has no idea about unauthorized/unintended information
> transfer, eg, any user with the proper credentials to connect to the
> database can query any tables those credentials are allowed to?
>
> --
> john r pierce, recycling bits in santa cruz
>
>


Re: [GENERAL] Table with seemingly duplicated primary key values

2015-12-22 Thread Melvin Davidson
BTW, Jim is referring to the O/S logs for hardware errors, not the
PostgreSQL logs.

Also, another way of deleting the bad row would be
DELETE FROM some_table where ctid = '(79664,59)';

or

DELETE FROM some_table where ctid = '(79772,23)';

On Tue, Dec 22, 2015 at 9:44 PM, Jim Nasby  wrote:

> On 12/22/15 1:31 PM, Melvin Davidson wrote:
>
>> The fact that you have rows with an identical id of 2141750 verifies a
>> corrupted primary index.
>> To correct it, you need to decide which row to keep.
>>
>> So review the results of
>> SELECT * FROM some_table WHERE ctid = '(79664,59)' OR ctid = '(79772,23)';
>>
>> DELETE FROM some_table
>> WHERE id = 2141750
>>  AND field1 = {value for field1 of row to delete}
>>  AND field2 = {value for field2 of row to delete}
>> ...
>> ...
>> AND fieldn = {value for fieldn of row to delete};
>>
>> Then
>>   REINDEX TABLE some_table;
>>
>
> And most importantly: review your logs for hardware errors and your
> Postgres and filesystem settings for anything dangerous. Index corruption
> is not normal and indicates the underlying hardware or OS is faulty (or
> maybe a bug in Postgres, but that's very unlikely).
>
> You should also consider turning on page checksums if you haven't already.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>



-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Table with seemingly duplicated primary key values

2015-12-22 Thread Melvin Davidson
The fact that you have rows with an identical id of 2141750 verifies a
corrupted primary index.
To correct it, you need to decide which row to keep.

So review the results of
SELECT * FROM some_table WHERE ctid = '(79664,59)' OR ctid = '(79772,23)';

DELETE FROM some_table
WHERE id = 2141750
AND field1 = {value for field1 of row to delete}
AND field2 = {value for field2 of row to delete}
...
...
   AND fieldn = {value for fieldn of row to delete};

Then
 REINDEX TABLE some_table;



Re: [GENERAL] Shared system resources

2015-12-22 Thread David Wilson
On Tue, Dec 22, 2015 at 04:52:23PM -0600, oleg yusim wrote:
> Greetings,
> 
> I'm looking at the following security control right now:
> 
> The DBMS must prevent unauthorized and unintended information transfer via
> shared system resources.
> 
> The purpose of this control is to prevent information, including encrypted
> representations of information, produced by the actions of a prior user/role
> (or the actions of a process acting on behalf of a prior user/role) from being
> available to any current user/role (or current process) that obtains access to
> a shared system resource (e.g., registers, main memory, secondary storage)
> after the resource has been released back to the information system. Control
> of information in shared resources is also referred to as object reuse.
> 
> From previous discussions, I understood that session in PostgreSQL is closely
> associated with process, and it is essentially new process for every user
> connection. In regards to that, my question would be:
> 
> When a user disconnects, the process is terminated and all resources are
> released. Does the memory where session/process information was stored get
> "sanitized", or is it just released as-is?

On Linux the memory pages of an exiting process aren't sanitized at
exit, however it is impossible(?) for userspace to reallocate them
without the kernel first zeroing their contents.

It might be possible for root to use some debugging mechanisms to access
the freed memory, but certainly there is no mechanism for a non-root
user to do so.

Regarding PG in particular, I can't speak for any shared internal state
that might be maintained after a session has exited (e.g. in the SHM
regions PG maintains). Since that state lives longer than an individual
process, it's possible some information leakage could occur that way,
but as far as "object reuse" goes, it seems doubtful.


David




Re: [GENERAL] Shared system resources

2015-12-22 Thread John R Pierce

On 12/22/2015 2:52 PM, oleg yusim wrote:


/The DBMS must prevent unauthorized and unintended information 
transfer via shared system resources./



you realize the database *itself* is a shared system resource and of 
and by itself has no idea about unauthorized/unintended information 
transfer, eg, any user with the proper credentials to connect to the 
database can query any tables those credentials are allowed to?


--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Secret Santa List

2015-12-22 Thread David Rowley
On 23 December 2015 at 16:49, Lou Duchez  wrote:

> I have a company with four employees who participate in a Secret Santa
> program, where each buys a gift for an employee chosen at random.  (For
> now, I do not mind if an employee ends up buying a gift for himself.)  How
> can I make this work with an SQL statement?
>
> Here is my Secret Santa table:
>
> --
> create table secretsanta
> (giver text,
> recipient text,
> primary key (giver));
>
> insert into secretsanta (giver) values ('Frank'), ('Joe'), ('Steve'),
> ('Earl');
> --
>
> Here is the SQL statement I am using to populate the "recipient" column:
>
> --
> update secretsanta set recipient =
> ( select giver from secretsanta s2 where not exists (select * from
> secretsanta s3 where s3.recipient = s2.giver) order by random() limit 1 );
> --
>
> The problem: every time I run this, a single name is chosen at random and
> used to populate all the rows.  So all four rows will get a recipient of
> "Steve" or "Earl" or whatever single name is chosen at random.
>

Why not generate the required results in a SELECT, then update from that?
row_number() can assign a random number to each giver; generate another
random number for each recipient and join on those random numbers. That
gives you a giver and recipient combination.
e.g:

select giver,recipient from
(select row_number() over (order by random()) rn, giver from secretsanta) g
inner join
(select row_number() over (order by random()) rn, giver recipient from
secretsanta) r on g.rn = r.rn

You can then wrap that up in a CTE, something along the lines of:

with cte (giver, recipient) as (
select giver,recipient from
(select row_number() over (order by random()) rn, giver from secretsanta) g
inner join
(select row_number() over (order by random()) rn, giver recipient from
secretsanta) r on g.rn = r.rn
)
update secretsanta set recipient = cte.recipient from cte WHERE cte.giver =
secretsanta.giver;


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[GENERAL] Secret Santa List

2015-12-22 Thread Lou Duchez
I have a company with four employees who participate in a Secret Santa 
program, where each buys a gift for an employee chosen at random.  (For 
now, I do not mind if an employee ends up buying a gift for himself.)  
How can I make this work with an SQL statement?


Here is my Secret Santa table:

--
create table secretsanta
(giver text,
recipient text,
primary key (giver));

insert into secretsanta (giver) values ('Frank'), ('Joe'), ('Steve'), 
('Earl');

--

Here is the SQL statement I am using to populate the "recipient" column:

--
update secretsanta set recipient =
( select giver from secretsanta s2 where not exists (select * from 
secretsanta s3 where s3.recipient = s2.giver) order by random() limit 1 );

--

The problem: every time I run this, a single name is chosen at random 
and used to populate all the rows.  So all four rows will get a 
recipient of "Steve" or "Earl" or whatever single name is chosen at random.


I suppose the problem is that the "exists" subquery does not re-evaluate 
for each record.  How do I prevent this from happening? Can I use a 
"lateral" join of some kind, or somehow tell PostgreSQL to not be so 
optimized?







Re: [GENERAL] BDR error while adding 3rd node to cluster

2015-12-22 Thread Amit Bondwal
Hi Craig,

I removed all the BDR packages, reinstalled them, and set up the BDR
cluster again; I'm still facing the same issue on the 3rd node.

On Tue, Dec 22, 2015 at 10:58 AM, Amit Bondwal 
wrote:

>
> On Tue, Dec 22, 2015 at 10:05 AM, Craig Ringer 
> wrote:
>
>> select * from bdr.bdr_connections;
>>
>
>
> Hi Craig, thanks for your reply. These commands show nothing on the 3rd node.
>
>
>
> *on node3:-*
> hakuna=# select * from bdr.bdr_nodes;
>  node_sysid | node_timeline | node_dboid | node_status | node_name | node_local_dsn | node_init_from_dsn
> ------------+---------------+------------+-------------+-----------+----------------+--------------------
> (0 rows)
>
> hakuna=# select * from bdr.bdr_connections;
>  conn_sysid | conn_timeline | conn_dboid | conn_origin_sysid | conn_origin_timeline | conn_origin_dboid | conn_is_unidirectional | conn_dsn | conn_apply_delay | conn_replication_sets
> ------------+---------------+------------+-------------------+----------------------+-------------------+------------------------+----------+------------------+-----------------------
> (0 rows)
>
>
>
> *on node1:-*
>
> hakuna=# select * from bdr.bdr_nodes;
>      node_sysid      | node_timeline | node_dboid | node_status | node_name |          node_local_dsn          |        node_init_from_dsn
> ---------------------+---------------+------------+-------------+-----------+----------------------------------+----------------------------------
>  6229651184159874988 |             1 |      18719 | r           | node1     | host=pg1 port=5432 dbname=hakuna |
>  6229651217067355961 |             1 |      17161 | r           | node2     | host=pg2 port=5432 dbname=hakuna | host=pg1 port=5432 dbname=hakuna
> (2 rows)
>
> It seems to conflict with the ID of node 2, as the output of the above
> command on node 1 shows.
>
> My OS is debian Jessie and I installed it from bdr repo. Current version
> of bdr is 0.9.3
>
>
>


Re: [GENERAL] Session Identifiers

2015-12-22 Thread Michael Paquier
On Tue, Dec 22, 2015 at 1:42 AM, Stephen Frost  wrote:
> Oleg,
>
> * oleg yusim (olegyu...@gmail.com) wrote:
>> tcp_keepalives_idle = 900
>> tcp_keepalives_interval=0
>> tcp_keepalives_count=0
>>
>> Doesn't terminate the connection to the database after 15 minutes of
>> inactivity at the psql prompt. So it looks like that would only work for
>> the case where the network connection is broken and the session is left
>> hanging. For the psql prompt case, it looks like pg_terminate_backend()
>> would be the only solution.
>
> Those settings aren't for controlling idle timeout of a connection.
>
> pg_terminate_backend() will work and could be run out of a cronjob.

Or a background worker if you are using PG >= 9.3:
https://github.com/michaelpq/pg_plugins/tree/master/kill_idle
This has the advantage of not having the cronjob error out should the
server be stopped. That's less error handling to take care of at the
frontend level.
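The cronjob variant might look something like this (a sketch, assuming
PG >= 9.2 for pg_stat_activity.state and a 15-minute idle cutoff):

```sql
-- Terminate backends that have been idle for more than 15 minutes,
-- skipping our own session.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - interval '15 minutes'
  AND pid <> pg_backend_pid();
```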
-- 
Michael




Re: [GENERAL] uuid-ossp: Performance considerations for different UUID approaches?

2015-12-22 Thread Bill Moran
On Tue, 22 Dec 2015 11:07:30 -0600
Brendan McCollam  wrote:

> (previously posted to the pgsql-performance list)
> 
> Hello,
> 
> We're in the process of designing the database for a new service, and 
> some of our tables are going to be using UUID primary key columns.
> 
> We're trying to decide between:
> 
> * UUIDv1 (timestamp/MAC uuid) and
> 
> * UUIDv4 (random uuid)
> 
> And the separate but related choice between:
> 
> * Generating the UUIDs client-side with the Python uuid library 
> (https://docs.python.org/2/library/uuid.html) or
> 
> * Letting PostgreSQL handle uuid creation with the uuid-ossp extension 
> (http://www.postgresql.org/docs/9.4/static/uuid-ossp.html)
> 
> In terms of insert and indexing/retrieval performance, is there one 
> clearly superior approach? If not, could somebody speak to the 
> performance tradeoffs of different approaches?
> 
> There seem to be sources online (e.g. 
> https://blog.starkandwayne.com/2015/05/23/uuid-primary-keys-in-postgresql/ 
> http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/) 
> that claim that UUIDv4 (random) will lead to damaging keyspace 
> fragmentation and using UUIDv1 will avoid this.

There's no substance to these claims. Chasing the links around we finally
find this article:
http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/
which makes the reasonable argument that random primary keys can cause
performance robbing fragmentation on clustered indexes.
But Postgres doesn't _have_ clustered indexes, so that article doesn't
apply at all. The other authors appear to have missed this important
point.

One could make the argument that the index itself becoming fragmented
could cause some performance degradation, but I've yet to see any
convincing evidence that index fragmentation produces any measurable
performance issues (my own experiments have been inconclusive).

Looking at it another way, a quick experiment shows that PG can fit
about 180 UUID primary keys per database page, which means a million-row
table will use about 5600 pages, to the tune of about 46MB. On
modern hardware, that index is likely to be wholly in memory all the
time.
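Those figures are easy to check against a real table (a sketch;
some_table_pkey is a hypothetical index name):

```sql
-- Human-readable on-disk size of the primary key index:
SELECT pg_size_pretty(pg_relation_size('some_table_pkey'));
```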

If your performance requirements are really so dire, then you should
probably consider ditching UUIDs as keys. Taking the same million-row
table I postulated in the previous paragraph, but using ints instead
of UUIDs for the primary key, the primary key index would be about
3200 pages (~26MB) ... or almost half the size -- making it more likely
to all be in memory at any point in time.

I seriously doubt that trying to make your UUIDs generate in a
predictable fashion will produce any measurable improvement, and I
see no evidence that the claims otherwise in the articles you cited
have any real basis or were made by anyone knowledgeable enough
to know.

-- 
Bill Moran




[GENERAL] uuid-ossp: Performance considerations for different UUID approaches?

2015-12-22 Thread Brendan McCollam

(previously posted to the pgsql-performance list)

Hello,

We're in the process of designing the database for a new service, and 
some of our tables are going to be using UUID primary key columns.


We're trying to decide between:

* UUIDv1 (timestamp/MAC uuid) and

* UUIDv4 (random uuid)

And the separate but related choice between:

* Generating the UUIDs client-side with the Python uuid library 
(https://docs.python.org/2/library/uuid.html) or


* Letting PostgreSQL handle uuid creation with the uuid-ossp extension 
(http://www.postgresql.org/docs/9.4/static/uuid-ossp.html)


In terms of insert and indexing/retrieval performance, is there one 
clearly superior approach? If not, could somebody speak to the 
performance tradeoffs of different approaches?


There seem to be sources online (e.g. 
https://blog.starkandwayne.com/2015/05/23/uuid-primary-keys-in-postgresql/ 
http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/) 
that claim that UUIDv4 (random) will lead to damaging keyspace 
fragmentation and using UUIDv1 will avoid this.


Is that true? If so, does that mean UUIDv1 must always be generated on 
the same machine (with same MAC address) in order to benefit from the 
better clustering of UUIDs? What about uuid_generate_v1mc() in 
uuid-ossp? Not exposing a server's real MAC address seems like a good 
security feature, but does it compromise the clustering properties of 
UUIDv1?


Thanks in advance,
Brendan McCollam

