Re: [GENERAL] Alternatives to a unique index with NULL
On Sat, Jan 17, 2015 at 02:03:34PM +0100, Andreas Kretschmer wrote:
> Peter Hicks <peter.hi...@poggs.co.uk> wrote:
> > All,
> >
> > I have a Rails application on 9.3 in which I want to enforce a unique
> > index on a set of fields, one of which includes a NULL-able column.
> > According to http://www.postgresql.org/docs/9.3/static/indexes-unique.html,
> > btree indexes can't handle uniqueness on NULL columns, so I'm looking
> > for another way to achieve what I need.
>
> Something like that? :
>
> test=# create table peter_hicks (id int);
> CREATE TABLE
> Time: 1,129 ms
> test=*# create unique index idx_1 on peter_hicks ((case when id is null then 'NULL' else '' end)) where id is null;
> CREATE INDEX
> Time: 14,803 ms

Note: COALESCE is probably the better choice here.

Have a nice day,
-- 
Martijn van Oosterhout   <klep...@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he
> does not attach much importance to his own thoughts.
  -- Arthur Schopenhauer
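[Editor's note: a minimal sketch of the COALESCE variant for the original use case - a unique constraint over several fields where one NULL-able column should behave like an ordinary value. Table and column names are hypothetical; note that this deliberately makes NULL and the sentinel '' collide, which matches the behaviour the thread is after.]

```sql
-- Hypothetical table standing in for the poster's set of fields,
-- where c is the NULL-able character varying(1) column.
CREATE TABLE records (a int, b int, c varchar(1));

-- COALESCE folds NULL into a sentinel inside the index expression, so
-- two rows that differ only by c being NULL vs '' count as duplicates:
CREATE UNIQUE INDEX records_uq ON records (a, b, (COALESCE(c, '')));

INSERT INTO records VALUES (1, 1, NULL);  -- accepted
INSERT INTO records VALUES (1, 1, '');    -- rejected: duplicate key
```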
Re: [GENERAL] Alternatives to a unique index with NULL
On Sat, Jan 17, 2015 at 6:27 AM, Peter Hicks <peter.hi...@poggs.co.uk> wrote:
> All,
>
> I have a Rails application on 9.3 in which I want to enforce a unique
> index on a set of fields, one of which includes a NULL-able column.
> According to http://www.postgresql.org/docs/9.3/static/indexes-unique.html,
> btree indexes can't handle uniqueness on NULL columns, so I'm looking for
> another way to achieve what I need.
>
> My initial thought is to replace the null with a single space (it's a
> character varying(1) column), which will require some changes to
> application code, but result in a cleaner process than the application
> enforcing the uniqueness constraint.
>
> Is there a better or cleaner way to do what I want?

I read the above. As I understand it, you can have a unique index on a
column which is NULL-able. That will guarantee that all the non-NULL values
are unique. What it will not guarantee is that there will be at most one
NULL value in the indexed column. Are you saying that what you want is a
column with a unique index where you cannot have two or more rows with NULL
in the indexed column?

If so, then you will need to have a value to indicate the equivalent of
NULL. Personally, I use a zero-length string instead of a single blank ' '.
This is valid since you say this column is a character varying(1) - which
seems a bit strange to me, but I don't know your application.

Peter
-- 
While a transcendent vocabulary is laudable, one must be eternally careful
so that the calculated objective of communication does not become ensconced
in obscurity. In other words, eschew obfuscation.

111,111,111 x 111,111,111 = 12,345,678,987,654,321

Maranatha!
John McKown
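[Editor's note: a short illustration of the behaviour being discussed, with a hypothetical table. NULLs are never considered equal to each other for uniqueness purposes, so a plain unique index admits any number of NULL rows; a zero-length-string sentinel restores the duplicate check.]

```sql
-- NULL <> NULL for uniqueness, so both inserts below succeed:
CREATE TABLE widgets (code varchar(1));
CREATE UNIQUE INDEX widgets_code_uq ON widgets (code);

INSERT INTO widgets VALUES (NULL);   -- accepted
INSERT INTO widgets VALUES (NULL);   -- also accepted

-- With a zero-length string as the "no value" sentinel,
-- duplicates are caught:
INSERT INTO widgets VALUES ('');     -- accepted
INSERT INTO widgets VALUES ('');     -- ERROR: duplicate key value
```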
Re: [GENERAL] Alternatives to a unique index with NULL
Hi John

On 17/01/15 12:39, John McKown wrote:
> I read the above. As I understand it, you can have a unique index on a
> column which is NULL-able. That will guarantee that all the non-NULL
> values are unique. What it will not guarantee is that there will be at
> most one NULL value in the indexed column. Are you saying that what you
> want is a column with a unique index where you cannot have two or more
> rows with NULL in the indexed column?

That's correct - the application reads in a very old fixed-length file
format and, if any field in there is just a single space, replaces it with
a NULL, since a single space implies a null. However, only one of the
records needs a constraint on one of these fields.

> If so, then you will need to have a value to indicate the equivalent of
> NULL. Personally, I use a zero-length string instead of a single blank
> ' '. This is valid since you say this column is a character varying(1) -
> which seems a bit strange to me, but I don't know your application.

OK, that makes sense and it was more-or-less along the lines of what I
expected. I like the idea of a zero-length string versus a single space, so
I'll implement that.

I believe the column type is a Rails-ism: Rails creates 'string' fields
with a length constraint of 1 as 'character varying(1)'. Probably not
ideal, but there's usually a trade-off somewhere.

Thanks very much for your help and quick response!


Peter

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Alternatives to a unique index with NULL
Peter Hicks <peter.hi...@poggs.co.uk> wrote:
> All,
>
> I have a Rails application on 9.3 in which I want to enforce a unique
> index on a set of fields, one of which includes a NULL-able column.
> According to http://www.postgresql.org/docs/9.3/static/indexes-unique.html,
> btree indexes can't handle uniqueness on NULL columns, so I'm looking for
> another way to achieve what I need.

Something like that? :

test=# create table peter_hicks (id int);
CREATE TABLE
Time: 1,129 ms
test=*# create unique index idx_1 on peter_hicks ((case when id is null then 'NULL' else '' end)) where id is null;
CREATE INDEX
Time: 14,803 ms
test=*# insert into peter_hicks values (1);
INSERT 0 1
Time: 0,385 ms
test=*# insert into peter_hicks values (2);
INSERT 0 1
Time: 0,145 ms
test=*# insert into peter_hicks values (null);
INSERT 0 1
Time: 0,355 ms
test=*# insert into peter_hicks values (null);
ERROR:  duplicate key value violates unique constraint "idx_1"
DETAIL:  Key ((CASE WHEN id IS NULL THEN 'NULL'::text ELSE ''::text END))=(NULL) already exists.
Time: 0,376 ms
test=*#


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°
Re: [GENERAL] 'alternatives'
Hi Christophe,

On 2012-11-28 13:07:12 -0800, Christophe Pettus wrote:
> In a query plan, I noticed the following:
>
> Join Filter: (((all_permissions.role_recursive AND (alternatives: SubPlan 5 or hashed SubPlan 6)) OR (permitted_e.id = deployed_e.id)) AND (NOT (SubPlan 13)))

Check the first item of
http://www.postgresql.org/docs/current/interactive/release-9-2.html#AEN110503

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [GENERAL] 'alternatives'
Hi, Andres,

Thanks!

On Nov 28, 2012, at 1:58 PM, Andres Freund wrote:
> http://www.postgresql.org/docs/current/interactive/release-9-2.html#AEN110503

Does that apply to views as well? (This particular plan was not from a
prepared or PL/pgSQL statement, but did include views.)

--
-- Christophe Pettus
   x...@thebuild.com
Re: [GENERAL] 'alternatives'
Hi,

On 2012-11-28 14:16:18 -0800, Christophe Pettus wrote:
> Thanks!

Not much to thank for, the answer was actually wrong...

> Does that apply to views as well? (This particular plan was not from a
> prepared or PL/pgSQL statement, but did include views.)

It's not really relevant for views, no. The real answer is that this is
actually an 8.4 feature, not the aforementioned 9.2 feature. The commit
introducing it is:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=bd3daddaf232d95b0c9ba6f99b0170a0147dd8af

What that does is add hashing support for IN(). But hashing can be pessimal
in comparison to an explicit check if only a few values come in, so since
the above commit that choice can be made at runtime.

Greetings,

Andres Freund
Re: [GENERAL] 'alternatives'
Andres Freund <and...@anarazel.de> writes:
> The commit introducing it is:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=bd3daddaf232d95b0c9ba6f99b0170a0147dd8af
>
> What that does is add hashing support for IN(). But hashing can be
> pessimal in comparison to an explicit check if only a few values come in,
> so since the above commit that choice can be made at runtime.

Yeah. If you look at the subplans, one is designed for retail probes and
the other is designed for sucking up the entire subquery result into a
hashtable. EXPLAIN ANALYZE will show you that only one gets used at
runtime. (The idea of dynamic switchover hasn't gotten anywhere yet.)

			regards, tom lane
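[Editor's note: a sketch of the kind of query that produces such alternative subplans, with hypothetical table and column names. An IN subquery sitting under an OR cannot be flattened into a semi-join, so the planner keeps both alternatives, as in the Join Filter quoted earlier in the thread.]

```sql
-- Hypothetical tables; the point is the shape of the WHERE clause.
CREATE TABLE accounts    (id int PRIMARY KEY, active boolean);
CREATE TABLE permissions (account_id int, perm text);

-- Because the IN sits under an OR, it cannot be flattened into a join,
-- so the plan may show "alternatives: SubPlan N or hashed SubPlan M":
-- a retail subplan probed once per outer row, and a hashed subplan that
-- loads the whole subquery result into a hashtable.
EXPLAIN ANALYZE
SELECT *
FROM accounts a
WHERE a.active
   OR a.id IN (SELECT p.account_id FROM permissions p
               WHERE p.perm = 'admin');

-- In the ANALYZE output, the alternative not chosen at runtime is
-- typically marked "(never executed)".
```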
Re: [GENERAL] Alternatives to very large tables with many performance-killing indices?
On Wed, Aug 22, 2012 at 05:56:27PM -0700, Jeff Janes wrote:
> > It's wide-ish, too, 98 columns.
>
> How many of the columns are NULL for any given row? Or perhaps better,
> what is the distribution of values for any given column? For a given
> column, is there some magic value (NULL, 0, 1, -1, '') which most of the
> rows have?

In particular, if the data is sparse, as in lots of NULLs, and you don't
need to search on those, you might consider partial indexes. If you create
partial indexes for only the non-NULL entries, postgres is smart enough to
use them when you query for something not NULL. Example:

db=# create temp table foo (a int4, b int4);
CREATE TABLE
db=# insert into foo (a) select generate_series(1,10);
INSERT 0 10
db=# update foo set b=1 where a=1;
UPDATE 1
db=# create index bar on foo(b) where b is not null;
CREATE INDEX
db=# explain select * from foo where b=1;
                            QUERY PLAN
-------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=4.38..424.59 rows=500 width=8)
   Recheck Cond: (b = 1)
   ->  Bitmap Index Scan on bar  (cost=0.00..4.26 rows=500 width=0)
         Index Cond: (b = 1)
(4 rows)

In this case a row update will only update indexes with non-NULL rows,
which may cut the overhead considerably.

Have a nice day,
-- 
Martijn van Oosterhout   <klep...@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he
> does not attach much importance to his own thoughts.
  -- Arthur Schopenhauer
Re: [GENERAL] Alternatives to very large tables with many performance-killing indices?
On Thu, Aug 16, 2012 at 1:54 PM, Wells Oliver <wellsoli...@gmail.com> wrote:
> Hey folks, a question. We have a table that's getting large (6 million
> rows right now, but hey, no end in sight).

Does it grow in chunks, or one row at a time?

> It's wide-ish, too, 98 columns.

How many of the columns are NULL for any given row? Or perhaps better,
what is the distribution of values for any given column? For a given
column, is there some magic value (NULL, 0, 1, -1, '') which most of the
rows have?

> The problem is that each of these columns needs to be searchable quickly
> at an application level, and I'm far too responsible an individual to put
> 98 indexes on a table.

That is somewhat melodramatic. Sure, creating 98 indexes does not come for
free, and it is great that you are aware of this. But just because they are
not free does not mean they are not worth their cost. Look at all the other
costs of using an RDBMS: each letter of ACID does not come for free, but it
is often worth the price.

In the generic case, you have a large amount of data to index, and indexing
a lot of data requires a lot of resources. There is no magic bullet for
this.

> Wondering what you folks have come across in terms of creative solutions
> that might be native to postgres. I can build something that indexes the
> data and caches it and runs separately from PG, but I wanted to exhaust
> all native options first.

If the data is frequently updated/inserted, then how would you invalidate
the cache when needed? And if the data is not frequently updated/inserted,
then what about the obvious PG solution (building 96 indexes) is a problem?

If your queries are of the nature of:

  where col1=:1 or col2=:1 or col3=:1 or ... or col96=:1

then a full text index would probably be a better option. Otherwise, it is
hard to say.

You could replace the 96 columns with a single hstore column which has 96
different keys. But from what I can tell, maintaining a gin index on that
hstore column would probably be slower than maintaining 96 individual btree
indexes. And if you go with a gist index on the single hstore column, the
cost of maintenance is greatly reduced relative to gin, but the index is
basically useless - you might as well just drop the index and do the full
table scan instead.

Cheers,

Jeff
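[Editor's note: a minimal sketch of the hstore approach discussed above, with hypothetical table and key names. It assumes the hstore extension is installed; a single GIN index then covers containment queries on any key.]

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE wide_table (
    id    serial PRIMARY KEY,
    attrs hstore          -- stands in for the 96 sparse columns
);

-- One GIN index serves containment searches on every key:
CREATE INDEX wide_table_attrs_gin ON wide_table USING gin (attrs);

INSERT INTO wide_table (attrs) VALUES ('col1 => 5, col7 => abc');

-- A search on any "column" becomes a containment test:
SELECT * FROM wide_table WHERE attrs @> 'col7 => abc';
```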
Re: [GENERAL] Alternatives to very large tables with many performance-killing indices?
On Thu, Aug 16, 2012 at 2:54 PM, Wells Oliver <wellsoli...@gmail.com> wrote:
> Hey folks, a question. We have a table that's getting large (6 million
> rows right now, but hey, no end in sight). It's wide-ish, too, 98 columns.
>
> The problem is that each of these columns needs to be searchable quickly
> at an application level, and I'm far too responsible an individual to put
> 98 indexes on a table.
>
> Wondering what you folks have come across in terms of creative solutions
> that might be native to postgres. I can build something that indexes the
> data and caches it and runs separately from PG, but I wanted to exhaust
> all native options first.

I submit that you're far better off working with the users to see which
fields they really need indexes on, and especially which combinations of
fields with functional and/or partial indexes serve them the best.

To start with you can create indexes willy-nilly if you want and then use
the pg_stat*index tables to see which are or are not getting used, and
start pruning them as time goes by. But keep an eye out for long-running
queries with your logging, and investigate to see what specialized indexes
might help the most for those queries. Often a simple index on (a,b) where
x is not null, or something similar, can give greater improvements than any
bitmap scans of multiple indexes ever could, especially on large data sets.
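[Editor's note: a sketch of the pruning workflow suggested above. The statistics view and its columns are standard (pg_stat_user_indexes); the table and index names in the second statement are illustrative only.]

```sql
-- Find indexes the statistics collector has never seen used:
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname, indexrelname;

-- ...and the kind of specialized index suggested above: a composite,
-- partial index serving a known hot query (names are hypothetical):
CREATE INDEX orders_ab_partial ON orders (a, b) WHERE x IS NOT NULL;
```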
Re: [GENERAL] Alternatives to very large tables with many performance-killing indices?
On Thu, Aug 16, 2012 at 3:54 PM, Wells Oliver <wellsoli...@gmail.com> wrote:
> Hey folks, a question. We have a table that's getting large (6 million
> rows right now, but hey, no end in sight). It's wide-ish, too, 98 columns.
>
> The problem is that each of these columns needs to be searchable quickly
> at an application level, and I'm far too responsible an individual to put
> 98 indexes on a table.
>
> Wondering what you folks have come across in terms of creative solutions
> that might be native to postgres. I can build something that indexes the
> data and caches it and runs separately from PG, but I wanted to exhaust
> all native options first.

Well, you could explore normalizing your table, particularly if many of
your 98 columns are null most of the time. Another option would be to
implement hstore for attributes and index with GIN/GiST -- especially if
you need to filter on multiple columns. Organizing big data for fast
searching is a complicated topic and requires significant thought in terms
of optimization.

merlin
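[Editor's note: a sketch of the normalization option mentioned above, as an entity/attribute layout with hypothetical names. Sparse NULL columns simply become absent rows, and one composite index serves searches on every attribute.]

```sql
CREATE TABLE items (
    id serial PRIMARY KEY
);

CREATE TABLE item_attributes (
    item_id int  NOT NULL REFERENCES items (id),
    name    text NOT NULL,   -- e.g. 'col42'
    value   text,
    PRIMARY KEY (item_id, name)
);

-- A single btree on (name, value) covers lookups on any attribute:
CREATE INDEX item_attributes_search ON item_attributes (name, value);

SELECT item_id FROM item_attributes
WHERE name = 'col42' AND value = '7';
```

The trade-off is that per-row typing is lost (everything becomes text) and multi-attribute filters require joins or aggregation, which is part of why the thread also weighs hstore and plain per-column indexes.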