Re: [GENERAL] Alternatives to a unique indexes with NULL

2015-01-18 Thread Martijn van Oosterhout
On Sat, Jan 17, 2015 at 02:03:34PM +0100, Andreas Kretschmer wrote:
 Peter Hicks peter.hi...@poggs.co.uk wrote:
 
  All,
 
  I have a Rails application on 9.3 in which I want to enforce a unique  
  index on a set of fields, one of which includes a NULL-able column.
 
  According to  
  http://www.postgresql.org/docs/9.3/static/indexes-unique.html, btree  
  indexes can't handle uniqueness on NULL columns, so I'm looking for  
  another way to achieve what I need.
 
 
 something like that?:
 
 test=# create table peter_hicks (id int);
 CREATE TABLE
 Time: 1,129 ms
 test=*# create unique index idx_1 on peter_hicks ((case when id is null
 then 'NULL' else '' end)) where id is null;
 CREATE INDEX
 Time: 14,803 ms


Note: COALESCE is probably the better choice here.
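[Editor's sketch] The COALESCE variant can be tried without a PostgreSQL server; the sketch below uses Python's sqlite3 purely as a stand-in (SQLite also treats NULLs as distinct in unique indexes and supports unique indexes on expressions, and the CREATE INDEX statement is the same in PostgreSQL). Table and index names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE peter_hicks (code TEXT)")  # NULL-able column
# Unique index on an expression: NULL collapses to '' for uniqueness checks.
conn.execute("CREATE UNIQUE INDEX idx_code ON peter_hicks (COALESCE(code, ''))")

conn.execute("INSERT INTO peter_hicks VALUES ('A')")
conn.execute("INSERT INTO peter_hicks VALUES (NULL)")      # first NULL is accepted
try:
    conn.execute("INSERT INTO peter_hicks VALUES (NULL)")  # second NULL collides
    print("duplicate NULL accepted")
except sqlite3.IntegrityError:
    print("duplicate NULL rejected")
# Caveat: a genuine empty string '' would also collide with NULL under this
# index, and duplicate non-NULL values are constrained as well.
```

Note the caveat in the last comment: unlike the CASE/partial-index version above, this single index enforces uniqueness across all rows, with '' and NULL sharing one key.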

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [GENERAL] Alternatives to a unique indexes with NULL

2015-01-17 Thread John McKown
On Sat, Jan 17, 2015 at 6:27 AM, Peter Hicks peter.hi...@poggs.co.uk
wrote:

 All,

 I have a Rails application on 9.3 in which I want to enforce a unique
 index on a set of fields, one of which includes a NULL-able column.

 According to http://www.postgresql.org/docs/9.3/static/indexes-unique.html,
 btree indexes can't handle uniqueness on NULL columns, so I'm looking for
 another way to achieve what I need.

 My initial thought is to replace the null with a single space (it's a
 character varying(1) column), which will require some changes to
 application code, but result in a cleaner process than the application
 enforcing the uniqueness constraint.

 Is there a better or cleaner way to do what I want?


I read the above. As I understand it, you can have a unique index on a
column which is NULL-able. That will guarantee that all the non-NULL values
are unique. What it will not guarantee is that there will be at most one
NULL value in the indexed column. Are you saying that what you want is a
column with a unique index where you cannot have two or more rows with NULL
in the indexed column? If so, then you will need to have a value to
indicate the equivalent of NULL. Personally, I use a zero-length string ''
instead of a single blank ' '. This is viable since you say this column is a
character varying(1). Which seems a bit strange to me, but I don't know
your application.






 Peter


-- 
While a transcendent vocabulary is laudable, one must be eternally careful
so that the calculated objective of communication does not become ensconced
in obscurity.  In other words, eschew obfuscation.

111,111,111 x 111,111,111 = 12,345,678,987,654,321

Maranatha! 
John McKown


Re: [GENERAL] Alternatives to a unique indexes with NULL

2015-01-17 Thread Peter Hicks

Hi John

On 17/01/15 12:39, John McKown wrote:
 I read the above. As I understand it, you can have a unique index on
 a column which is NULL-able. That will guarantee that all the non-NULL
 values are unique. What it will not guarantee is that there will be at
 most one NULL value in the indexed column. Are you saying that what
 you want is a column with a unique index where you cannot have two or
 more rows with NULL in the indexed column?

That's correct - the application reads in a very old format of
fixed-length file and, if any field in there is just a single space,
replaces it with a NULL, since a single space implies a null. However,
only one of the records needs a constraint on one of these fields.

 If so, then you will need to have a value to indicate the equivalent
 of NULL. Personally, I use a zero-length string '' instead of a single
 blank ' '. This is viable since you say this column is a character
 varying(1). Which seems a bit strange to me, but I don't know your
 application.

OK, that makes sense and it was more-or-less along the lines of what I
expected.  I like the idea of a zero-length string versus a single
space, so I'll go implement that.
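[Editor's sketch] The sentinel approach Peter settles on can be checked quickly; below is a minimal sketch using Python's sqlite3 as a stand-in database (a plain unique index behaves the same way here as in PostgreSQL once NULL is no longer involved). The `store` helper and table name are illustrative, not from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sentinel approach: the column is NOT NULL and '' stands in for "no value",
# so an ordinary unique index also covers the formerly-NULL case.
conn.execute("CREATE TABLE records (code TEXT NOT NULL DEFAULT '')")
conn.execute("CREATE UNIQUE INDEX records_code ON records (code)")

def store(field):
    # Application-level mapping: a single space in the source file means "null",
    # which we store as the zero-length string.
    conn.execute("INSERT INTO records VALUES (?)", ('' if field == ' ' else field,))

store('A')
store(' ')          # stored as '', accepted
try:
    store(' ')      # a second '' violates the unique index
    print("duplicate accepted")
except sqlite3.IntegrityError:
    print("duplicate rejected")
```

The trade-off versus the partial-index trick from later in the thread: this needs application-side changes, but the schema stays a plain NOT NULL column with a plain unique index.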


I believe the column type is a Rails-ism, which creates 'string' fields 
with a length constraint of 1 as 'character varying(1)'.  Probably not 
ideal, but there's usually a trade-off somewhere.


Thanks very much for your help and quick response!


Peter



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Alternatives to a unique indexes with NULL

2015-01-17 Thread Andreas Kretschmer
Peter Hicks peter.hi...@poggs.co.uk wrote:

 All,

 I have a Rails application on 9.3 in which I want to enforce a unique  
 index on a set of fields, one of which includes a NULL-able column.

 According to  
 http://www.postgresql.org/docs/9.3/static/indexes-unique.html, btree  
 indexes can't handle uniqueness on NULL columns, so I'm looking for  
 another way to achieve what I need.


something like that?:

test=# create table peter_hicks (id int);
CREATE TABLE
Time: 1,129 ms
test=*# create unique index idx_1 on peter_hicks ((case when id is null
then 'NULL' else '' end)) where id is null;
CREATE INDEX
Time: 14,803 ms
test=*# insert into peter_hicks values (1);
INSERT 0 1
Time: 0,385 ms
test=*# insert into peter_hicks values (2);
INSERT 0 1
Time: 0,145 ms
test=*# insert into peter_hicks values (null);
INSERT 0 1
Time: 0,355 ms
test=*# insert into peter_hicks values (null);
ERROR:  duplicate key value violates unique constraint "idx_1"
DETAIL:  Key ((
CASE
    WHEN id IS NULL THEN 'NULL'::text
    ELSE ''::text
END))=(NULL) already exists.
Time: 0,376 ms
test=*#
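[Editor's sketch] Andreas's CASE-plus-partial-index trick can be reproduced outside psql; here is a minimal sketch using Python's sqlite3 (SQLite also supports partial and expression indexes and likewise treats NULLs as distinct in plain unique indexes). Unlike a COALESCE-based index, this index covers only the NULL rows, so duplicate non-NULL values remain allowed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE peter_hicks (id INTEGER)")
# Partial unique index over only the NULL rows: every such row indexes the
# same constant key, so at most one NULL row can exist.
conn.execute("""
    CREATE UNIQUE INDEX idx_1 ON peter_hicks
        ((CASE WHEN id IS NULL THEN 'NULL' ELSE '' END))
        WHERE id IS NULL
""")
conn.execute("INSERT INTO peter_hicks VALUES (1)")
conn.execute("INSERT INTO peter_hicks VALUES (1)")     # duplicate non-NULLs are fine
conn.execute("INSERT INTO peter_hicks VALUES (NULL)")  # first NULL is accepted
try:
    conn.execute("INSERT INTO peter_hicks VALUES (NULL)")
    print("second NULL accepted")
except sqlite3.IntegrityError:
    print("second NULL rejected")
```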


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
If I was god, I would recompile penguin with --enable-fly.   (unknown)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°




Re: [GENERAL] 'alternatives'

2012-11-28 Thread Andres Freund
Hi Christophe,

On 2012-11-28 13:07:12 -0800, Christophe Pettus wrote:
 In a query plan, I noticed the following:
 
  Join Filter: (((all_permissions.role_recursive AND
  (alternatives: SubPlan 5 or hashed SubPlan 6)) OR
  (permitted_e.id = deployed_e.id)) AND (NOT (SubPlan 13)))

Check the first item of
http://www.postgresql.org/docs/current/interactive/release-9-2.html#AEN110503

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] 'alternatives'

2012-11-28 Thread Christophe Pettus
Hi, Andres,

Thanks!

On Nov 28, 2012, at 1:58 PM, Andres Freund wrote:

 http://www.postgresql.org/docs/current/interactive/release-9-2.html#AEN110503

Does that apply to views as well?  (This particular plan was not from a 
prepared or PL/pgSQL statement, but did include views.)

--
-- Christophe Pettus
   x...@thebuild.com





Re: [GENERAL] 'alternatives'

2012-11-28 Thread Andres Freund
Hi,

On 2012-11-28 14:16:18 -0800, Christophe Pettus wrote:
 Thanks!

Not much to thank for, the answer was actually wrong...

 Does that apply to views as well?  (This particular plan was not from a 
 prepared or PL/pgSQL statement, but did include views.)

It's not really relevant for views, no.

The real answer is that this is actually an 8.4 feature, not the
aforementioned 9.2 feature.

The commit introducing this is:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=bd3daddaf232d95b0c9ba6f99b0170a0147dd8af

What that does is add hashing support for IN(). But hashing can be
pessimal in comparison to an explicit check if only a few values come in,
so this can be decided at runtime after the above commit...

Greetings,

Andres Freund




Re: [GENERAL] 'alternatives'

2012-11-28 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 The commit introducing this is:
 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=bd3daddaf232d95b0c9ba6f99b0170a0147dd8af
 What that does is add hashing support for IN(). But hashing can be
 pessimal in comparison to an explicit check if only a few values come in,
 so this can be decided at runtime after the above commit...

Yeah.  If you look at the subplans, one is designed for retail probes
and the other is designed for sucking up the entire subquery result into
a hashtable.  EXPLAIN ANALYZE will show you that only one gets used at
runtime.  (The idea of dynamic switchover hasn't gotten anywhere yet.)
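[Editor's sketch] Conceptually (plain Python for illustration only, not PostgreSQL internals), the two alternative subplans correspond to two strategies for evaluating `x IN (SELECT ...)`: rescan the subquery output per outer row, or build a hash table once and probe it. The names and data below are invented:

```python
# Imagine this list is the subquery's output.
subquery_rows = [3, 1, 4, 1, 5, 9]

def retail_probe(x, rows):
    # Plain SubPlan: rescan the subquery output for every outer row, O(n) each.
    return any(x == r for r in rows)

def hashed_probe(x, hash_table):
    # Hashed SubPlan: pay a one-time build cost, then each probe is O(1).
    return x in hash_table

hash_table = set(subquery_rows)  # the one-time build cost
for probe in (4, 7):
    assert retail_probe(probe, subquery_rows) == hashed_probe(probe, hash_table)
print("both strategies agree")
```

This is why keeping both alternatives is attractive: for one or two probes the hash-table build is wasted work, while for many probes it wins decisively.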

regards, tom lane




Re: [GENERAL] Alternatives to very large tables with many performance-killing indicies?

2012-08-23 Thread Martijn van Oosterhout
On Wed, Aug 22, 2012 at 05:56:27PM -0700, Jeff Janes wrote:
  It's wide-ish, too, 98 columns.
 
 How many of the columns are NULL for any given row?  Or perhaps
 better, what is the distribution of values for any given column?  For
 a given column, is there some magic value (NULL, 0, 1, -1, '')
 which most of the rows have?

In particular, if the data is sparse, as in lots of NULLs, and you
don't need to search on those, you might consider partial indexes.  If
you create partial indexes for only the non-NULL entries, postgres is
smart enough to use it when you query it for something not NULL. 
Example:

db=# create temp table foo (a int4, b int4);
CREATE TABLE
db=# insert into foo (a) select generate_series(1,10);
INSERT 0 10
db=# update foo set b=1 where a=1;
UPDATE 1
db=# create index bar on foo(b) where b is not null;
CREATE INDEX
db=# explain select * from foo where b=1;
 QUERY PLAN
------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=4.38..424.59 rows=500 width=8)
   Recheck Cond: (b = 1)
   ->  Bitmap Index Scan on bar  (cost=0.00..4.26 rows=500 width=0)
         Index Cond: (b = 1)
(4 rows)

In this case an update of a row whose b is NULL need not touch the index
at all, which may cut the overhead considerably.
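[Editor's sketch] The same behaviour can be observed outside psql; below is a small sketch using Python's sqlite3 (SQLite also supports partial indexes, and its planner will use one when the query's predicate implies the index's WHERE clause). Table size and names are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a INTEGER, b INTEGER)")
# Sparse data: b is NULL for all but one of 1000 rows.
conn.executemany("INSERT INTO foo (a) VALUES (?)", [(i,) for i in range(1, 1001)])
conn.execute("UPDATE foo SET b = 1 WHERE a = 1")
# Partial index covering only the non-NULL entries.
conn.execute("CREATE INDEX bar ON foo (b) WHERE b IS NOT NULL")

# b = 1 implies b IS NOT NULL, so the planner can pick the partial index.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM foo WHERE b = 1").fetchall()
print(plan[-1][-1])  # typically a SEARCH ... USING INDEX bar (b=?) step
```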

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [GENERAL] Alternatives to very large tables with many performance-killing indicies?

2012-08-22 Thread Jeff Janes
On Thu, Aug 16, 2012 at 1:54 PM, Wells Oliver wellsoli...@gmail.com wrote:
 Hey folks, a question. We have a table that's getting large (6 million rows
 right now, but hey, no end in sight).

Does it grow in chunks, or one row at a time?

 It's wide-ish, too, 98 columns.

How many of the columns are NULL for any given row?  Or perhaps
better, what is the distribution of values for any given column?  For
a given column, is there some magic value (NULL, 0, 1, -1, '')
which most of the rows have?

 The problem is that each of these columns needs to be searchable quickly at
 an application level, and I'm far too responsible an individual to put 98
 indexes on a table.

That is somewhat melodramatic.  Sure, creating 98 indexes does not
come for free.  And it is great that you are aware of this.  But just
because they are not free does not mean they are not worth their cost.
Look at all the other costs of using an RDBMS.  Each letter of ACID
does not come for free.  But it is often worth the price.

In the generic case, you have a large amount of data to index.
Indexing a lot of data requires a lot of resources.  There is no magic
bullet for this.

 Wondering what you folks have come across in terms of
 creative solutions that might be native to postgres. I can build something
 that indexes the data and caches it and runs separately from PG, but I
 wanted to exhaust all native options first.

If the data is frequently updated/inserted, then how would you
invalidate the cache when needed?  And if the data is not frequently
updated/inserted, then what is the problem with the obvious PG solution
(building 96 indexes)?

If your queries are of the nature of:

where
  col1=:1 or
  col2=:1 or
  col3=:1 or
...
  col96=:1 or

then a full text index would probably be a better option.

Otherwise, it is hard to say.  You could replace 96 columns with a
single hstore column which has 96 different keys.  But from what I can
tell, maintaining a gin index on that hstore column would probably be
slower than maintaining 96 individual btree indexes.

And if you go with a gist index on the single hstore column, the cost
of maintenance is greatly reduced relative to gin.  But the index is
basically useless, you might as well just drop the index and do the
full table scan instead.

Cheers,

Jeff




Re: [GENERAL] Alternatives to very large tables with many performance-killing indicies?

2012-08-22 Thread Scott Marlowe
On Thu, Aug 16, 2012 at 2:54 PM, Wells Oliver wellsoli...@gmail.com wrote:
 Hey folks, a question. We have a table that's getting large (6 million rows
 right now, but hey, no end in sight). It's wide-ish, too, 98 columns.

 The problem is that each of these columns needs to be searchable quickly at
 an application level, and I'm far too responsible an individual to put 98
 indexes on a table. Wondering what you folks have come across in terms of
 creative solutions that might be native to postgres. I can build something
 that indexes the data and caches it and runs separately from PG, but I
 wanted to exhaust all native options first.

I submit that you're far better off working with the users to see
which fields they really need indexes on, and especially which
combinations of fields with functional and / or partial indexes serve
them the best.

To start with you can create indexes willy-nilly if you want and then
use the pg_stat*index views to see which are or are not getting used
and start pruning them as time goes by.  But keep an eye out for long
running queries with your logging and investigate to see what
specialized indexes might help the most for those queries.  Often a
simple index on (a,b) where x is not null, or something similar, can
give greater improvements than bitmap scans of multiple indexes ever
could, especially on large data sets.




Re: [GENERAL] Alternatives to very large tables with many performance-killing indicies?

2012-08-16 Thread Merlin Moncure
On Thu, Aug 16, 2012 at 3:54 PM, Wells Oliver wellsoli...@gmail.com wrote:
 Hey folks, a question. We have a table that's getting large (6 million rows
 right now, but hey, no end in sight). It's wide-ish, too, 98 columns.

 The problem is that each of these columns needs to be searchable quickly at
 an application level, and I'm far too responsible an individual to put 98
 indexes on a table. Wondering what you folks have come across in terms of
 creative solutions that might be native to postgres. I can build something
 that indexes the data and caches it and runs separately from PG, but I
 wanted to exhaust all native options first.

Well, you could explore normalizing your table, particularly if many
of your 98 columns are null most of the time.  Another option would be
to implement hstore for attributes and index with GIN/GIST --
especially if you need to filter on multiple columns.  Organizing big
data for fast searching is a complicated topic and requires
significant thought in terms of optimization.

merlin

