Re: [GENERAL] UUIDs & Clustered Indexes

2016-08-30 Thread Mike Sofen
From: George Neuner  Sent: Tuesday, August 30, 2016 5:54 PM



>Mike Sofen wrote: So in this scenario, I'm using 

>BOTH bigserials as the PK and uuids as AKs in the core tables.  I 

>reference the bigints for all joins and (have to) use the uuids for the 

>filters.  It's been working ok so far, lookup performance on a table 

>with a few million rows, using the uuid (indexed) is instantaneous.  

>I'll soon have a 100 million+ rows loaded into a single table and know a
bit more.

> 

>The uuids are also design insurance for me in case I need to shard, 

>since I'll need/want that uniqueness across servers.

 

FYI:  articles about sharding using bigint keys.

 

 

http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-ins
tagram

 
http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/

 

George

 

I remember reading these articles a long time ago, forgot about them...and
appreciate the reminder!  

 

I really liked the enhanced Instagram function from Rob Conery in the second
link, but so far haven't needed to deal with it.  However, an upcoming
project may require huge data storage - approaching hundreds of billions of
rows, and I'm sticking with Postgres - so this will be a great way to test
drive the function.  And I may try my hand at a further enhancement, time
permitting.  Thanks for the links!

 

Mike

 



Re: [GENERAL] UUIDs & Clustered Indexes

2016-08-30 Thread George Neuner
On Tue, 30 Aug 2016 09:40:33 -0700, "Mike Sofen" 
wrote:

>From: Tom Lane  Sent: Tuesday, August 30, 2016 7:16 AM
>
>>Do you actually *need* UUID keys, and if so why?  A plain old bigint column
>>is smaller, cheaper to index, and the natural mechanism for generating it
>>(ie a sequence) will tend to preserve ordering for free. 
>
>I agree with Tom for a "normal" application - I would always use bigints
>(bigserial) as a PK column.  The app I currently working on is a high
>security web app for which the app coders require guids for all identifiers
>flowing around the system.  So in this scenario, I'm using BOTH bigserials
>as the PK and uuids as AKs in the core tables.  I reference the bigints for
>all joins and (have to) use the uuids for the filters.  It's been working ok
>so far, lookup performance on a table with a few million rows, using the
>uuid (indexed) is instantaneous.  I'll soon have a 100 million+ rows loaded
>into a single table and know a bit more.
>
> 
>
>The uuids are also design insurance for me in case I need to shard, since
>I'll need/want that uniqueness across servers.

FYI:  articles about sharding using bigint keys.

http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/

George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUIDs & Clustered Indexes

2016-08-30 Thread Mike Sofen
From: Tom Lane  Sent: Tuesday, August 30, 2016 7:16 AM



Luke Gordon <  gord...@gmail.com> writes:

> However, according to a message on this mailing list, Postgres doesn't 

> have clustered indexes:

> "But Postgres doesn't _have_ clustered indexes, so that article 

> doesn't apply at all. The other authors appear to have missed this
important point."

>  
https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu

 

> But, doing a quick check, it appears Postgres does indeed have a 

> mechanism for a clustered index:

>  
https://www.postgresql.org/docs/9.5/static/sql-cluster.html

 

CLUSTER just does a one-time sort to put the table into index order.

There is no mechanism that would cause subsequent insertions of new keys to
respect that ordering, so it's pretty much irrelevant to the argument about
whether new UUID keys need to be generated in some ordered fashion.

 

Do you actually *need* UUID keys, and if so why?  A plain old bigint column
is smaller, cheaper to index, and the natural mechanism for generating it
(ie a sequence) will tend to preserve ordering for free.

 

 regards, tom lane

 

I agree with Tom for a "normal" application - I would always use bigints
(bigserial) as a PK column.  The app I currently working on is a high
security web app for which the app coders require guids for all identifiers
flowing around the system.  So in this scenario, I'm using BOTH bigserials
as the PK and uuids as AKs in the core tables.  I reference the bigints for
all joins and (have to) use the uuids for the filters.  It's been working ok
so far, lookup performance on a table with a few million rows, using the
uuid (indexed) is instantaneous.  I'll soon have a 100 million+ rows loaded
into a single table and know a bit more.

 

The uuids are also design insurance for me in case I need to shard, since
I'll need/want that uniqueness across servers.

 

Mike Sofen



Re: [GENERAL] UUIDs & Clustered Indexes

2016-08-30 Thread Francisco Olarte
Luke:

On Tue, Aug 30, 2016 at 3:59 PM, Luke Gordon  wrote:
> I'm trying to decide on which UUID generator to use for my Postgres
> database, and I've narrowed it down to gen_random & uuid_generate_v1mc.
>
> There's a fascinating article that discusses performance implications
> between gen_random_uuid & uuid_generate_v1mc:
> https://www.starkandwayne.com/blog/uuid-primary-keys-in-postgresql/
> TL;DR, the article suggests:
> "Random produces very fragmented inserts that destroy tables. Use
> uuid_generate_v1mc() [instead].."

He probably means destroy index, not tables, tables are not index
ordered in postgres ( like some kind of clustered tables in other
products )

>
> However, according to a message on this mailing list, Postgres doesn't have
> clustered indexes:
> "But Postgres doesn't _have_ clustered indexes, so that article doesn't
> apply at all. The other authors appear to have missed this important point."
> https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu


No, he hasn't. If you are generating them to use as a kind of primary
key, like a serial, having a timestamp based means they are generated
in ascending order, which means you append on the table ( at the end )
and on the index, so they end up better filled and perform better.
Specially in the index, ordered insertions tend to be well optimized,
being a common case, and perform quite well. It's not going to be as
advantegous as on a clustered table system, but it will help ( but not
that much ).



> But, doing a quick check, it appears Postgres does indeed have a mechanism
> for a clustered index:
> https://www.postgresql.org/docs/9.5/static/sql-cluster.html
> So, does anyone know which is best? Or are the performance differences so
> minute they'd only matter in extremely unique circumstances?

Cluster just resorts the table and rebuild the index, it's already
being told in other messages. It's equivalent to create temp table tt
as select * from the_table, truncate th_table, insert into the_table
select * from tt order by index_expression, drop table tt. It is nice
to do it for tables that are normally ordered but somehow lost it.
Like having a log table with an indexed field for insertion timestamp
and updating it a lot, or purging many old records. As you normally
would typically query it with a range condition on the timestamp, a
cluster will help.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUIDs & Clustered Indexes

2016-08-30 Thread Tom Lane
Luke Gordon  writes:
> PS I attempted to reply to the above thread, but downloading the "raw" link
> resulted in a basic auth challenge.  If I entered my mailing list
> credentials, I'd just get rechallenged with basic auth...

BTW, so far as that goes: it's just a weak anti-bot measure.
If you try it in Firefox you'll get an explanation:

A username and password are being requested by
https://www.postgresql.org. The site says: "Please authenticate
with user archives and password antispam"

Unfortunately, we've found that most other browsers don't present that
message :-(

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUIDs & Clustered Indexes

2016-08-30 Thread Luke Gordon
Tom,

Ah, that makes more sense.  Thank you very much!

On Tue, Aug 30, 2016 at 9:15 AM, Tom Lane  wrote:

> Luke Gordon  writes:
> > However, according to a message on this mailing list, Postgres doesn't
> have
> > clustered indexes:
> > "But Postgres doesn't _have_ clustered indexes, so that article doesn't
> > apply at all. The other authors appear to have missed this important
> point."
> > https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu
>
> > But, doing a quick check, it appears Postgres does indeed have a
> mechanism
> > for a clustered index:
> > https://www.postgresql.org/docs/9.5/static/sql-cluster.html
>
> CLUSTER just does a one-time sort to put the table into index order.
> There is no mechanism that would cause subsequent insertions of new keys
> to respect that ordering, so it's pretty much irrelevant to the argument
> about whether new UUID keys need to be generated in some ordered fashion.
>
> Do you actually *need* UUID keys, and if so why?  A plain old bigint
> column is smaller, cheaper to index, and the natural mechanism for
> generating it (ie a sequence) will tend to preserve ordering for free.
>
> regards, tom lane
>


Re: [GENERAL] UUIDs & Clustered Indexes

2016-08-30 Thread Melvin Davidson
On Tue, Aug 30, 2016 at 9:59 AM, Luke Gordon  wrote:

> I'm trying to decide on which UUID generator to use for my Postgres
> database, and I've narrowed it down to gen_random & uuid_generate_v1mc.
>
> There's a fascinating article that discusses performance implications
> between gen_random_uuid & uuid_generate_v1mc:
> https://www.starkandwayne.com/blog/uuid-primary-keys-in-postgresql/
> TL;DR, the article suggests:
> "Random produces very fragmented inserts that destroy tables. Use
> uuid_generate_v1mc() [instead].."
>
> However, according to a message on this mailing list, Postgres doesn't
> have clustered indexes:
> "But Postgres doesn't _have_ clustered indexes, so that article doesn't
> apply at all. The other authors appear to have missed this important point."
> https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu
>
> But, doing a quick check, it appears Postgres does indeed have a mechanism
> for a clustered index:
> https://www.postgresql.org/docs/9.5/static/sql-cluster.html
>
> So, does
> anyone know which is best? Or are the performance differences so minute
> they'd only matter in extremely unique circumstances?
>
> Regards,
>
> Luke Gordon
>
> PS I attempted to reply to the above thread, but downloading the "raw"
> link resulted in a basic auth challenge.  If I entered my mailing list
> credentials, I'd just get rechallenged with basic auth...
>
>



*You did not mention your PostgreSQL version or O/S, which would be helpful
for future reference, but please take
note.https://www.postgresql.org/docs/9.4/static/sql-cluster.html
*

*Yes, PostgreSQL does have a mechanism to cluster the index, BUT... as
stated in the docs, any subsequent insert or update will not be clustered,
so a periodic *

*reCLUSTER is required to maintain it.*


-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] UUIDs & Clustered Indexes

2016-08-30 Thread Tom Lane
Luke Gordon  writes:
> However, according to a message on this mailing list, Postgres doesn't have
> clustered indexes:
> "But Postgres doesn't _have_ clustered indexes, so that article doesn't
> apply at all. The other authors appear to have missed this important point."
> https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu

> But, doing a quick check, it appears Postgres does indeed have a mechanism
> for a clustered index:
> https://www.postgresql.org/docs/9.5/static/sql-cluster.html

CLUSTER just does a one-time sort to put the table into index order.
There is no mechanism that would cause subsequent insertions of new keys
to respect that ordering, so it's pretty much irrelevant to the argument
about whether new UUID keys need to be generated in some ordered fashion.

Do you actually *need* UUID keys, and if so why?  A plain old bigint
column is smaller, cheaper to index, and the natural mechanism for
generating it (ie a sequence) will tend to preserve ordering for free.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general