[GENERAL] Migrating from 9.2.4 to 9.3.0 with XML DOCTYPE

2014-06-04 Thread Tim Kane
Hi all,

I’m migrating a database from 9.2.4 to 9.3.0 and encountering an issue with
an XML field failing to restore.

I’m using pg_dump 9.3.0 as follows (with 9.2.4 on 5433 and 9.3.0 on 5432).
I have verified my paths, it’s definitely using the 9.3.0 binaries. Both
source and target are using UTF8 encoding.


pg_dump -p 5433 content_xml_test | psql content_xml_test -p 5432 -v ON_ERROR_STOP=1

CREATE SEQUENCE
Time: 0.870 ms
ALTER TABLE
Time: 0.199 ms
ALTER SEQUENCE
Time: 0.380 ms
ALTER TABLE
Time: 0.931 ms
Time: 10.883 ms

ERROR:  invalid XML content
DETAIL:  line 1: StartTag: invalid element name

 ^
CONTEXT:  COPY xml_test, line 13, column document: "



However, if I do the following:


ALTER DATABASE content_xml_test SET XMLOPTION TO DOCUMENT;

ERROR:  invalid XML document
DETAIL:  line 1: Start tag expected, '<' not found
1046710
^
CONTEXT:  COPY test_lookup, line 1, column external_ref: "{1046710,1046711}"
Time: 14.828 ms


This is even more interesting, since the external_ref column is a totally
different field, and a different type.  I suspect the failure is still
valid, but it’s reporting the wrong field name as being the culprit.


I suspect this might be resolved in a newer point release of 9.3, however I
don’t have the luxury of upgrading on this particular host right now.

Any suggestions how I might convince 9.3.0 to accept the dump from 9.2.4?


Cheers,

Tim




Re: [GENERAL] Migrating from 9.2.4 to 9.3.0 with XML DOCTYPE

2014-06-04 Thread Tim Kane
> 
> 
> From:  Tom Lane 
> 
> Hm, can you restore it into 9.2 either?
> 
> AFAICS, pg_dump has absolutely no idea that it should be worried about the
> value of xmloption, despite the fact that that setting affects what is
> considered valid XML data.  What's worse, even if it were attempting to do
> something about xmloption, I don't see how it could deal with a case like
> this where you have different values inside the same database that require
> two different settings in order to parse.
> 
> This isn't a 9.3.x bug, it's an aboriginal misdesign of the XML datatype.
> Not sure what we can do about it at this point.  Perhaps we could invent
> a "document_or_content" setting that would tell xml_in to accept either
> case?  And then have pg_dump force that setting to be used during restore?


This sounds reasonable. My use case is purely as a document store, with the
ability to perform xml parse functions against it – as such, I’m not
concerned whether it’s a document or content – which is why we have both types
recorded against that field.

For the minute, I’m getting around the restore problem by mangling the dump
such that the table is created using the text type rather than xml.  This at
least gets the data onto a 9.3 cluster, even if it’s cosmetically
represented as text instead of xml.  I can worry about the document vs
content problem at a later stage.
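
For reference, a rough sketch of that mangling, assuming the column is
xml_test.document as per the error output above (the sed pattern is purely
illustrative and would need to match the actual CREATE TABLE line in the dump):

pg_dump -p 5433 content_xml_test \
  | sed 's/^    document xml/    document text/' \
  | psql content_xml_test -p 5432 -v ON_ERROR_STOP=1

Once the document-vs-content question is settled, the column could presumably be
converted back with something like
ALTER TABLE xml_test ALTER COLUMN document TYPE xml USING document::xml;
which will re-validate under whatever xmloption is in force at that point.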

> 
> PS: BTW, I agree with the advice expressed by David J: under no
> circumstances put any data you care about on 9.3.0.  That release
> was rather a disaster from a quality-control standpoint :-(
> But that's unrelated to your XML issue.


Ack. Thanks for the info. I’ll push the upgrade-path agenda a little harder.









[GENERAL] Feature request: pg_restore with tablespace rules

2013-05-13 Thread Tim Kane
Hi all...

Forgive me if this is not the correct forum for such suggestions.

I'm about to begin using the custom format options for pg_restore, so that
we can leverage the performance gains from --jobs options.
However I would like to be able to specify the tablespace for individual
tables/indexes at the time of the restore operation.

Is this something that could be achieved using the pg_restore -L  style
list files?  I realise that there is no tablespace information exposed in
this output at this time, but it could be a logical way to achieve this
kind of behaviour in the future?

For now I will look at ALTER'ing the individual tables/indexes post-restore
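
For reference, the post-restore approach boils down to statements along these
lines (object and tablespace names hypothetical):

-- move a table and its index after the restore completes
ALTER TABLE big_table SET TABLESPACE fast_ssd;
ALTER INDEX big_table_pkey SET TABLESPACE fast_ssd;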

Cheers,

Tim


[GENERAL] Foreign Key constraint violation

2013-06-18 Thread Tim Kane
Hi all,

I've discovered one of our databases has managed to get into a state where
it is violating a foreign key constraint.

The facts are these:

Table B (row_id) references Table A (row_id).
Table B contains multiple row_id's that do not exist in Table A.
There also exists a BEFORE INSERT OR UPDATE trigger on Table B.

The entirety of the trigger procedure is:
BEGIN
   NEW.updated_at = now();
   RETURN NEW;
END;


A dump/restore of these tables on another instance fails to create the
foreign key constraint (as you would expect).

I'm unable to recreate these conditions elsewhere, it simply is.

We are running Postgres 9.2.1

My questions are:

- Are there any known issues around foreign key integrity in this version
of Postgres?
- Are there any health-checks that I can perform against a foreign-key
index?

I presume the only reasonable fix for this is to drop the constraint,
remove the offending row(s) and recreate the constraint?
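
A sketch of that repair sequence, assuming the tables are table_a/table_b with
column row_id and a constraint named table_b_row_id_fkey (all names
hypothetical):

-- identify the orphaned rows first
SELECT b.row_id
FROM table_b b
LEFT JOIN table_a a USING (row_id)
WHERE a.row_id IS NULL;

BEGIN;
ALTER TABLE table_b DROP CONSTRAINT table_b_row_id_fkey;
DELETE FROM table_b b
WHERE NOT EXISTS (SELECT 1 FROM table_a a WHERE a.row_id = b.row_id);
ALTER TABLE table_b
  ADD CONSTRAINT table_b_row_id_fkey FOREIGN KEY (row_id) REFERENCES table_a (row_id);
COMMIT;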

Thanks

Tim


[GENERAL] Order by with column ordinal and collate - fails to parse

2013-06-20 Thread Tim Kane
Hi all,

I seem to be having problems constructing a query that combines the use of
distinct, order by, and collate key words.

For instance:

# select distinct(value)  from properties order by 1 collate "C";
ERROR:  collations are not supported by type integer
LINE 1: ... distinct(value)  from properties order by 1 collate "C...
 ^
# select distinct(value)  from properties order by distinct(value) collate
"C";
ERROR:  syntax error at or near "distinct"
LINE 1: ...ct distinct(value)  from properties order by distinct(v...
 ^
# select distinct(value) as foo from properties order by foo collate "C";
ERROR:  column "foo" does not exist
LINE 1: ...tinct(value) as foo from properties order by foo collat...


Am I just being a numpty here? I can work around it with a subquery, but
this seems like a bug to me. Particularly the first example where my
ordinal field reference is treated as an integer literal. I should note
that the field 'value' is of type 'text' (not integer).
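
The subquery workaround mentioned above looks roughly like this:

select value
from (select distinct value from properties) s
order by value collate "C";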

Any input appreciated. Thanks :)


Re: [GENERAL] Order by with column ordinal and collate - fails to parse

2013-06-20 Thread Tim Kane
Nice one. Yep, that works. Cheers..
I'll submit a bug report for this, though I'm on the fence as to whether
this is actually a bug per se.. I would have reasonably expected my
original syntax to have worked (as it does without any ORDER BY)..


On Thu, Jun 20, 2013 at 3:44 PM, Adrian Klaver wrote:

> On 06/20/2013 07:05 AM, Tim Kane wrote:
>
>> Hi all,
>>
>> I seem to be having problems constructing a query that combines the use
>> of distinct, order by, and collate key words.
>>
>> For instance:
>>
>> # select distinct(value)  from properties order by 1 collate "C";
>> ERROR:  collations are not supported by type integer
>> LINE 1: ... distinct(value)  from properties order by 1 collate "C...
>>   ^
>>
>
> How about:
>
> select distinct(value) collate "C" from properties order by 1 ;
>
>
>
>> Am I just being a numpty here? I can work around it with a subquery, but
>> this seems like a bug to me. Particularly the first example where my
>> ordinal field reference is treated as an integer literal. I should note
>> that the field 'value' is of type 'text' (not integer).
>>
>> Any input appreciated. Thanks :)
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@gmail.com
>


[GENERAL] Hot Standby and Foreign Tables

2013-07-18 Thread Tim Kane
Hi all,

I'm currently playing a few ideas out in my head and wanted to see if this
was feasible. There may be some limitation I am not aware of.

I would like to create a foreign table into a hot standby instance using
FDW's.  I appreciate that the hot standby is always going to be read-only,
and that foreign tables are currently read-only anyway.

Can anyone tell me if there is any reason this wouldn't work?

I'm unable to test it just yet as I've not setup replication, nor am I on a
recent enough postgres. yet. ;-)


While I'm at it, is there any facility (future?) to provide a *foreign
schema* variant - such that I could access an entire schema using FDW's?

Cheers,

Tim


Re: [GENERAL] Hot Standby and Foreign Tables

2013-07-19 Thread Tim Kane
Brilliant. Thanks Michael. That looks great.






On Thu, Jul 18, 2013 at 11:54 PM, Michael Paquier  wrote:

>
>
> On Thu, Jul 18, 2013 at 11:15 PM, Tim Kane  wrote:
> > Hi all,
> >
> > I'm currently playing a few ideas out in my head and wanted to see if
> this
> > was feasible. There may be some limitation I am not aware of.
> >
> > I would like to create a foreign table into a hot standby instance using
> > FDW's.  I appreciate that the hot standby is always going to be
> read-only,
> > and that foreign tables are currently read-only anyway.
> >
> > Can anyone tell me if there is any reason this wouldn't work?
> Yes it will work, the only thing necessary is to create the foreign table
> on master. Then the foreign table created is propagated to the slaves.
>
> Here is the example of a foreign table referencing a table created on
> master and linked directly to the slave, using postgres_fdw. The cluster
> uses one master in sync with one slave, master listening to port 5432 and
> slave to 5532 on the same server.
> $ ps x | egrep "5432|5532"
>   787 pts/0S  0:00 /home/mpaquier/bin/pgsql/bin/postgres -D
> /home/mpaquier/bin/pgsql/master -i -p 5432
>   809 pts/0S  0:00 /home/mpaquier/bin/pgsql/bin/postgres -D
> /home/mpaquier/bin/pgsql/slave1 -i -p 5532
>
> $ psql postgres
> postgres=# SELECT pid, application_name,
> pg_xlog_location_diff(sent_location, flush_location) AS replay_delta,
> sync_state FROM pg_stat_replication;
>  pid | application_name | replay_delta | sync_state
> -+--+--+
>  821 | slave1   |0 | sync
> (1 row)
> postgres=# CREATE EXTENSION postgres_fdw;
> CREATE EXTENSION
> postgres=# CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw
> OPTIONS (host 'localhost', port '5532', dbname 'postgres');
> CREATE SERVER
> postgres=# CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS
> (password '');
> CREATE USER MAPPING
> postgres=# CREATE TABLE foo AS SELECT generate_series(1,3) AS a;
> SELECT 3
> postgres=# CREATE FOREIGN TABLE aa_foreign (a int) SERVER postgres_server
> OPTIONS (table_name 'foo');
> CREATE FOREIGN TABLE
> postgres=# select * from aa_foreign;
>  a
> ---
>  1
>  2
>  3
> (3 rows)
> In this case the process has read the data directly from table from with a
> foreign table queried on master.
>
> As the foreign table has been created on master, you can as well read the
> foreign table directly on slave (in this case the foreign table will simply
> read data on the same node as you connect to).
> $ psql -p 5532 postgres -c 'select * from aa_foreign'
>  a
> ---
>  1
>  2
>  3
> (3 rows)
>
> Of course you can as well link multiple clusters like that, etc.
>
> >
> > I'm unable to test it just yet as I've not setup replication, nor am I
> on a
> > recent enough postgres. yet. ;-)
> There are many scripts and manuals around to help you in that.
>
> > While I'm at it, is there any facility (future?) to provide a foreign
> schema
> > variant - such that I could access an entire schema using FDW's?
> No plans AFAIK.
>
> Regards,
> --
> Michael
>


Re: [GENERAL] Convincing the query planner to play nice

2013-08-10 Thread Tim Kane

Okay, so I've played with this a bit more. I think I have it cracked.

I had to increase random_page_cost and significantly reduce 
effective_cache_size in order for the planner to prefer a sequential scan. (It 
turns out this is what the 8.3.8 instance was doing all along, so it's not 
anything specific to 9.1.9).

Assuming that effective_cache_size has no bearing on postgres behaviour outside 
of query planning, then I guess that's ok. It must be simply that the index 
based method causes a lot of random order reads of the relation.

A better way however, seems to be clustering the table based on the address_id 
index.  This seems to have done the job nicely, allowing the pages to be read 
in the order they're recorded on disk.  In fact, it performs incredibly well 
now. Who knew! :)
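
For reference, the clustering step amounts to something like the following
(index name taken from the plan below):

CLUSTER archive.users USING address_id_users;
ANALYZE archive.users;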


=# explain (analyse,buffers) select * from archive.users inner join 
live.addresses using (address_id);
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=756.82..151865.53 rows=1982043 width=131) (actual 
time=6.417..3851.314 rows=1862872 loops=1)
   Merge Cond: (addresses.address_id = users.address_id)
   Buffers: shared hit=10 read=65799
   ->  Index Scan using addresses_pkey on addresses  (cost=0.00..52602.26 
rows=1872220 width=22) (actual time=0.011..638.291 rows=1872220 loops=1)
 Buffers: shared hit=6 read=18328
   ->  Index Scan using address_id_users on users  (cost=0.00..2630954.66 
rows=74700184 width=117) (actual time=6.391..1657.213 rows=1862873 loops=1)
 Buffers: shared hit=4 read=47471
 Total runtime: 3954.146 ms
(8 rows)



Again, just thinking out loud here..   In a scenario where specific clustering 
isn't an option...
I wonder if the query planner should consider the physical 
distribution/ordering of values on disk, and use that as a factor when applying 
the random_page_cost in the QEP's?
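
For what it's worth, per-column physical ordering is tracked in
pg_stats.correlation; a quick way to inspect it for the join column (table and
column names as per the plan above):

SELECT schemaname, tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'users' AND attname = 'address_id';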

I'm sure I've missed something fundamental here, happy to be corrected :)

Cheers,

Tim 







On 10 Aug 2013, at 18:32, Timothy Kane  wrote:

> 
> Hi all,
> 
> I seem to be having some grief with the 9.1.9 query planner favouring an 
> index scan + merge join, over a sequential scan + hash join.
> Logically I would have considered the index+merge to be faster, as suggested 
> by the explain output - but in practice, it is in fact slower by orders of 
> magnitude.
> 
> In my timings below, I've tried to reduce the impact of any OS or 
> shared_buffer level caching (restarting postgres, and flushing OS cache 
> between queries-).
> 
> I've provided my settings as shown:
> 
> 
> =# show seq_page_cost;
>  seq_page_cost 
> ---
>  1
> (1 row)
> 
> Time: 0.355 ms
> =# show random_page_cost;
>  random_page_cost 
> --
>  2.2
> (1 row)
> 
> Time: 0.084 ms
> =# show cpu_tuple_cost;
>  cpu_tuple_cost 
> 
>  0.01
> (1 row)
> 
> Time: 0.077 ms
> =# show cpu_index_tuple_cost;
>  cpu_index_tuple_cost 
> --
>  0.005
> (1 row)
> 
> Time: 0.065 ms
> =# show cpu_operator_cost;
>  cpu_operator_cost 
> ---
>  0.0025
> (1 row)
> 
> Time: 0.064 ms
> =# show effective_cache_size;
>  effective_cache_size 
> --
>  12GB
> (1 row)
> 
> 
> 
> -- QEP's for 9.1.9
> =# explain (analyse,buffers) select * from archive.users inner join 
> live.addresses using (address_id);
> 
>                                                       QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=18.79..159615.38 rows=1887786 width=131) (actual 
> time=0.023..602386.955 rows=1862872 loops=1)
>Merge Cond: (addresses.address_id = users.address_id)
>Buffers: shared hit=1655113 read=382486
>->  Index Scan using addresses_pkey on addresses  (cost=0.00..52609.75 
> rows=1872220 width=22) (actual time=0.008..1440.294 rows=1872220 loops=1)
>  Buffers: shared hit=473352 read=18328
>->  Index Scan using address_id_users on users  (cost=0.00..3075311.08 
> rows=73741592 width=117) (actual time=0.005..598455.258 rows=1862873 loops=1)
>  Buffers: shared hit=1181761 read=364158
>  Total runtime: 602548.352 ms
> (8 rows)
> 
> Time: 603090.399 ms
> 
> 
> 
> =# set enable_indexscan=off;
> SET
> Time: 0.219 ms
> =# explain (analyse,buffers) select * from archive.users inner join 
> live.addresses using (address_id);
> QUERY PLAN
> 
> ---

Re: [GENERAL] Convincing the query planner to play nice

2013-08-10 Thread Tim Kane

Ahh, thanks Tom.
I hadn't seen your email before I posted my own followup.

I guess the clustering approach managed to work around the need to mess with 
the statistics target.  I did previously increase the target to 1000 (from 100) 
for that field and had no impact, but this is an aspect of tuning I'm not so 
familiar with - I didn't consider pushing it all the way to 11.
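
For reference, the per-column knob in question looks like this (the value is
illustrative; 10000 is the maximum):

ALTER TABLE live.addresses ALTER COLUMN address_id SET STATISTICS 10000;
ANALYZE live.addresses;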


On 11 Aug 2013, at 00:28, Tom Lane  wrote:

> Timothy Kane  writes:
>> I seem to be having some grief with the 9.1.9 query planner favouring an 
>> index scan + merge join, over a sequential scan + hash join.
> 
> I believe the reason it's preferring the merge join plan is that it thinks
> the executor will be able to terminate the merge join early as a
> consequence of the range of join keys in "addresses" being only a fraction
> of the range of join keys in "users".  Notice that the total estimated
> cost for the merge join is just a fraction of the full estimated cost of
> the indexscan on "users"; the only way that's possible is if the indexscan
> on "users" doesn't have to run through all of the table.  Probably, the
> range of join keys is wider than the planner thinks and so the merge join
> can't terminate early.  The fix therefore is to crank the stats target for
> "addresses" up high enough that you get a reasonable value in pg_statistic
> for the largest address_id value (look at the last histogram entry).
> 
>> Interestingly, on another instance of this same database running on postgres 
>> 8.3.8, the query planner correctly chooses the sequential scan method - 
>> having more sane cost estimates for the index scan method.
> 
> I think the 8.3 planner didn't take this effect into account.  Or maybe it
> did, but by chance the upper histogram value is closer to reality on the
> older database.
> 
>   regards, tom lane





Re: [GENERAL] Convincing the query planner to play nice

2013-08-10 Thread Tim Kane

Yep, the effective_cache_size was specifically because we have lots of RAM to 
play with, and want to ensure we are caching wherever possible.
The reduced random_page_cost was primarily to allow for the fact we're using 
relatively fast disk (indexes are SSD, table data on SAS drives), though I 
didn't fully appreciate how the combination of these settings can influence the 
preference towards a sequential vs index scan.

I think i'll stop tweaking for now, and see how it performs in the next few 
days.  I feel like I have a much better handle on how the planner is pulling 
everything together. Cheers.

Tim


On 11 Aug 2013, at 01:38, Tom Lane  wrote:

> Tim Kane  writes:
>> I guess the clustering approach managed to work around the need to mess with 
>> the statistics target.  I did previously increase the target to 1000 (from 
>> 100) for that field and had no impact, but this is an aspect of tuning I'm 
>> not so familiar with - I didn't consider pushing it all the way to 11.
> 
> Yeah, I had actually started to write an email recommending that you dial
> down effective_cache_size and increase random_page_cost, before I noticed
> the discrepancy in the merge join cost and realized what was really going
> on.
> 
> The question now is why you had those settings like that before, and
> whether changing them back in the direction of the defaults might not be
> pessimizing the behavior for other queries.  If you have a lot of RAM and
> mostly-cached queries, the previous settings didn't sound unreasonable.
> 
>   regards, tom lane





[GENERAL] WHERE 'Something%' LIKE ANY (array_field)

2013-08-14 Thread Tim Kane
Hi all,

It seems like it isn't possible to perform a wildcard LIKE evaluation
against array objects. Is this a bug, or just unsupported?


See the queries in bold, that I would have expected to return a value.
Postgres 9.1.9


=# create temp table ids (id varchar(12)[]);
CREATE TABLE
Time: 185.516 ms
=# insert into ids values ('{s1,s452334,s89}');
INSERT 0 1
Time: 0.728 ms
=# insert into ids values ('{s89}');
INSERT 0 1
Time: 0.300 ms
=# insert into ids values ('{s9323,s893}');
INSERT 0 1
Time: 0.133 ms
=# insert into ids values ('{s9323,s893,s89}');
INSERT 0 1
Time: 0.110 ms
=# select * from ids;
id
--
 {s1,s452334,s89}
 {s89}
 {s9323,s893}
 {s9323,s893,s89}
(4 rows)

Time: 0.155 ms
=# select * from ids where 's89' = ANY (id);
id
--
 {s1,s452334,s89}
 {s89}
 {s9323,s893,s89}
(3 rows)

Time: 0.121 ms
*clone=# select * from ids where 's45%' LIKE ANY (id);*
 id

(0 rows)

Time: 0.124 ms

*clone=# select * from ids where 's452334%' LIKE ANY (id);*
 id

(0 rows)

Time: 0.278 ms
clone=# select * from ids where 's452334' LIKE ANY (id);
id
--
 {s1,s452334,s89}
(1 row)

Time: 0.134 ms
clone=# select * from ids where 's452334' = ANY (id);
id
--
 {s1,s452334,s89}
(1 row)


Re: [GENERAL] WHERE 'Something%' LIKE ANY (array_field)

2013-08-14 Thread Tim Kane
Thanks Pavel,

Unless I'm being bleary eyed and not quite grasping it...  I'm not sure
that answers my question.
I'm using a single LIKE clause against an array parameter, rather than
multiple LIKE clauses against a single parameter.


It seems I'm so far stuck with a FOREACH style traversal within plpgsql
(which is fine, as this is all to be used within a function anyway).




On Wed, Aug 14, 2013 at 10:55 AM, Pavel Stehule wrote:

> Hello
>
> http://postgres.cz/wiki/PostgreSQL_SQL_Tricks#LIKE_to_list_of_patterns
>
> Regards
>
> Pavel Stehule
>
>
> 2013/8/14 Tim Kane 
>
>> Hi all,
>>
>> It seems like it isn't possible to perform a wildcard LIKE evaluation
>> against array objects. Is this a bug, or just unsupported?
>>
>>
>> See the queries in bold, that I would have expected to return a value.
>> Postgres 9.1.9
>>
>>
>> =# create temp table ids (id varchar(12)[]);
>> CREATE TABLE
>> Time: 185.516 ms
>> =# insert into ids values ('{s1,s452334,s89}');
>> INSERT 0 1
>> Time: 0.728 ms
>> =# insert into ids values ('{s89}');
>> INSERT 0 1
>> Time: 0.300 ms
>> =# insert into ids values ('{s9323,s893}');
>> INSERT 0 1
>> Time: 0.133 ms
>> =# insert into ids values ('{s9323,s893,s89}');
>> INSERT 0 1
>> Time: 0.110 ms
>> =# select * from ids;
>> id
>> --
>>  {s1,s452334,s89}
>>  {s89}
>>  {s9323,s893}
>>  {s9323,s893,s89}
>> (4 rows)
>>
>> Time: 0.155 ms
>> =# select * from ids where 's89' = ANY (id);
>> id
>> --
>>  {s1,s452334,s89}
>>  {s89}
>>  {s9323,s893,s89}
>> (3 rows)
>>
>> Time: 0.121 ms
>> *clone=# select * from ids where 's45%' LIKE ANY (id);*
>>  id
>> 
>> (0 rows)
>>
>> Time: 0.124 ms
>>
>> *clone=# select * from ids where 's452334%' LIKE ANY (id);*
>>  id
>> 
>> (0 rows)
>>
>> Time: 0.278 ms
>> clone=# select * from ids where 's452334' LIKE ANY (id);
>> id
>> --
>>  {s1,s452334,s89}
>> (1 row)
>>
>> Time: 0.134 ms
>> clone=# select * from ids where 's452334' = ANY (id);
>> id
>> --
>>  {s1,s452334,s89}
>> (1 row)
>>
>
>


Re: [GENERAL] Convincing the query planner to play nice

2013-08-21 Thread Tim Kane

Thanks Jeff. These queries in particular relate to a set of data that is
rebuilt on a periodic basis. For all intents and purposes, the data is
newly populated and unlikely to reside in cache - hence the need to
perform my tests under similar conditions.

It's probably better that I adjust the random_page_cost for that
particular session, and leave things be otherwise.
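
A sketch of scoping that change to the rebuild session only (the value is
illustrative):

SET random_page_cost = 20;   -- lasts for the session
-- or, to limit it to a single transaction:
BEGIN;
SET LOCAL random_page_cost = 20;
-- ... rebuild queries here ...
COMMIT;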


Cheers.



On 13/08/2013 17:27, "Jeff Janes"  wrote:

>On Sat, Aug 10, 2013 at 10:32 AM, Timothy Kane  wrote:
>>
>> Hi all,
>>
>> I seem to be having some grief with the 9.1.9 query planner favouring an
>> index scan + merge join, over a sequential scan + hash join.
>> Logically I would have considered the index+merge to be faster, as
>>suggested
>> by the explain output - but in practice, it is in fact slower by orders
>>of
>> magnitude.
>>
>> In my timings below, I've tried to reduce the impact of any OS or
>> shared_buffer level caching (restarting postgres, and flushing OS cache
>> between queries-).
>
>
>Are you sure that that is the right thing to do?  It seems unlikely
>that your production server is constantly executing your query from a
>cold start.  Why test it that way?
>
>>
>> I've provided my settings as shown:
>>
>>
>> =# show seq_page_cost;
>>  seq_page_cost
>> ---
>>  1
>> (1 row)
>>
>> Time: 0.355 ms
>> =# show random_page_cost;
>>  random_page_cost
>> --
>>  2.2
>> (1 row)
>
>Given that you are testing your query from a cold start (and assuming
>against odds that that is the correct thing to do), 2.2 is probably a
>factor of 20 too small for this setting.
>
>Cheers,
>
>Jeff






[GENERAL] UDP buffer drops / statistics collector

2017-04-18 Thread Tim Kane
Hi all,

I'm seeing sporadic (but frequent) UDP buffer drops on a host that so far
I've not been able to resolve.

The drops are originating from postgres processes, and from what I know -
the only UDP traffic generated by postgres should be consumed by the
statistics collector - but for whatever reason, it's failing to read the
packets quickly enough.

Interestingly, I'm seeing these drops occur even when the system is idle..
 but every 15 minutes or so (not consistently enough to isolate any
particular activity) we'll see in the order of ~90 packets dropped at a
time.
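
For reference, the drops are visible in the kernel's UDP counters; a sketch of
how to watch them (assumes net-tools is installed):

# "receive buffer errors" / "packet receive errors" climb when a socket overflows
netstat -su | grep -A 6 '^Udp:'
# per-socket view: the last column (drops) of /proc/net/udp
cat /proc/net/udp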

I'm running 9.6.2, but the issue was previously occurring on 9.2.4 (on the
same hardware)


If it's relevant..  there are two instances of postgres running (and
consequently, 2 instances of the stats collector process) though 1 of those
instances is most definitely idle for most of the day.

In an effort to try to resolve the problem, I've increased (x2) the UDP
recv buffer sizes on the host - but it seems to have had no effect.

cat /proc/sys/net/core/rmem_max
1677216

The following parameters are configured

track_activities on
track_counts on
track_functions  none
track_io_timing  off


There are approximately 80-100 connections at any given time.

It seems that the issue started a few weeks ago, around the time of a
reboot on the given host... but it's difficult to know what (if anything)
has changed, or why :-/


Incidentally... the documentation doesn't seem to have any mention of UDP
whatsoever.  I'm going to use this as an opportunity to dive into the
source - but perhaps it's worth improving the documentation around this?

My next step is to try disabling track_activities and track_counts to see
if they improve matters any, but I wouldn't expect these to generate enough
data to flood the UDP buffers :-/

Any ideas?


Re: [GENERAL] UDP buffer drops / statistics collector

2017-04-18 Thread Tim Kane
Okay, so I've run an strace on the collector process during a buffer drop
event.
I can see evidence of a recvfrom loop pulling in a *maximum* of 142kb.

While I've had already increased rmem_max, it would appear this is not
being observed by the kernel.
rmem_default is set to 124kb, which would explain the above read maxing out
just slightly beyond this (presuming a ring buffer filling up behind the
read).

I'm going to try increasing rmem_default and see if it has any positive
effect.. (and then investigate why the kernel doesn't want to consider
rmem_max)..
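
For reference, a sketch of the kernel knobs involved (values purely
illustrative):

# default receive buffer for newly created sockets, and the ceiling for SO_RCVBUF
sysctl -w net.core.rmem_default=8388608
sysctl -w net.core.rmem_max=16777216

(As it turned out further down the thread, a listening process has to re-create
its socket before a new default is picked up.)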





On Tue, Apr 18, 2017 at 8:05 AM Tim Kane  wrote:

> Hi all,
>
> I'm seeing sporadic (but frequent) UDP buffer drops on a host that so far
> I've not been able to resolve.
>
> The drops are originating from postgres processes, and from what I know -
> the only UDP traffic generated by postgres should be consumed by the
> statistics collector - but for whatever reason, it's failing to read the
> packets quickly enough.
>
> Interestingly, I'm seeing these drops occur even when the system is idle..
>  but every 15 minutes or so (not consistently enough to isolate any
> particular activity) we'll see in the order of ~90 packets dropped at a
> time.
>
> I'm running 9.6.2, but the issue was previously occurring on 9.2.4 (on the
> same hardware)
>
>
> If it's relevant..  there are two instances of postgres running (and
> consequently, 2 instances of the stats collector process) though 1 of those
> instances is most definitely idle for most of the day.
>
> In an effort to try to resolve the problem, I've increased (x2) the UDP
> recv buffer sizes on the host - but it seems to have had no effect.
>
> cat /proc/sys/net/core/rmem_max
> 1677216
>
> The following parameters are configured
>
> track_activities on
> track_counts on
> track_functions  none
> track_io_timing  off
>
>
> There are approximately 80-100 connections at any given time.
>
> It seems that the issue started a few weeks ago, around the time of a
> reboot on the given host... but it's difficult to know what (if anything)
> has changed, or why :-/
>
>
> Incidentally... the documentation doesn't seem to have any mention of UDP
> whatsoever.  I'm going to use this as an opportunity to dive into the
> source - but perhaps it's worth improving the documentation around this?
>
> My next step is to try disabling track_activities and track_counts to see
> if they improve matters any, but I wouldn't expect these to generate enough
> data to flood the UDP buffers :-/
>
> Any ideas?
>
>
>
>


Re: [GENERAL] UDP buffer drops / statistics collector

2017-04-19 Thread Tim Kane
Well, this is frustrating..
The buffer drops are still occurring - so I thought it worth trying to use a
ramdisk and set *stats_temp_directory* accordingly.

I've reloaded the instance, and can see that the stats directory is now
being populated in the new location.  *Except* - there is one last file (
pgss_query_texts.stat) that continues to be updated in the *old* pg_stat_tmp
path..  Is that supposed to happen?


Fairly similar to this guy (but not quite the same).
https://www.postgresql.org/message-id/d6e71befad7beb4fbcd8ae74fadb1265011bb40fc...@win-8-eml-ex1.eml.local

I can see the packets arriving and being consumed by the collector..  and,
the collector is indeed updating in the new stats_temp_directory.. just not
for that one file.


It also failed to resolve the buffer drops.. At this point, I'm not sure I
expected it to.  They tend to occur semi-regularly (every 8-13 minutes) but
I can't correlate them with any kind of activity (and if I'm honest, it's
possibly starting to drive me a little bit mad).




On Tue, Apr 18, 2017 at 2:53 PM Tim Kane  wrote:

> Okay, so I've run an strace on the collector process during a buffer drop
> event.
> I can see evidence of a recvfrom loop pulling in a *maximum* of 142kb.
>
> While I've had already increased rmem_max, it would appear this is not
> being observed by the kernel.
> rmem_default is set to 124kb, which would explain the above read maxing
> out just slightly beyond this (presuming a ring buffer filling up behind
> the read).
>
> I'm going to try increasing rmem_default and see if it has any positive
> effect.. (and then investigate why the kernel doesn't want to consider
> rmem_max)..
>
>
>
>
>
> On Tue, Apr 18, 2017 at 8:05 AM Tim Kane  wrote:
>
>> Hi all,
>>
>> I'm seeing sporadic (but frequent) UDP buffer drops on a host that so far
>> I've not been able to resolve.
>>
>> The drops are originating from postgres processes, and from what I know -
>> the only UDP traffic generated by postgres should be consumed by the
>> statistics collector - but for whatever reason, it's failing to read the
>> packets quickly enough.
>>
>> Interestingly, I'm seeing these drops occur even when the system is
>> idle..  but every 15 minutes or so (not consistently enough to isolate any
>> particular activity) we'll see in the order of ~90 packets dropped at a
>> time.
>>
>> I'm running 9.6.2, but the issue was previously occurring on 9.2.4 (on
>> the same hardware)
>>
>>
>> If it's relevant..  there are two instances of postgres running (and
>> consequently, 2 instances of the stats collector process) though 1 of those
>> instances is most definitely idle for most of the day.
>>
>> In an effort to try to resolve the problem, I've increased (x2) the UDP
>> recv buffer sizes on the host - but it seems to have had no effect.
>>
>> cat /proc/sys/net/core/rmem_max
>> 1677216
>>
>> The following parameters are configured
>>
>> track_activities on
>> track_counts on
>> track_functions  none
>> track_io_timing  off
>>
>>
>> There are approximately 80-100 connections at any given time.
>>
>> It seems that the issue started a few weeks ago, around the time of a
>> reboot on the given host... but it's difficult to know what (if anything)
>> has changed, or why :-/
>>
>>
>> Incidentally... the documentation doesn't seem to have any mention of UDP
>> whatsoever.  I'm going to use this as an opportunity to dive into the
>> source - but perhaps it's worth improving the documentation around this?
>>
>> My next step is to try disabling track_activities and track_counts to
>> see if they improve matters any, but I wouldn't expect these to generate
>> enough data to flood the UDP buffers :-/
>>
>> Any ideas?
>>
>>
>>
>>


Re: [GENERAL] UDP buffer drops / statistics collector

2017-04-20 Thread Tim Kane
Ok, fixed it! :D
Posting here for future me (others like me).

It would seem (having not read kernel source) that increasing the kernel
buffer sizes (rmem_default / rmem_max) does *not* take effect for any
processes that are *already* bound or listening to a port/socket.  I had
previously assumed this was a global kernel buffer, perhaps not.

I had been trying to manage these buffer sizes to resolve the UDP drop
issues, but I had not at any time restarted the statistics collector
process.  I restarted the cluster in a moment of last resort (something I
had tried numerous times *before* playing with the buffer sizes) and lo and
behold!! no more buffer drops!

Problem solved.
The pgss_query_texts.stat still wants to live in the default *pg_stat_tmp*
directory, whether by design or not..  but that's a non-issue for me now.

Thanks for listening :)



On Wed, Apr 19, 2017 at 7:36 PM Tim Kane  wrote:

> Well, this is frustrating..
> The buffer drops are still occurring - so I thought it worth trying use a
> ramdisk and set *stats_temp_directory* accordingly.
>
> I've reloaded the instance, and can see that the stats directory is now
> being populated in the new location.  *Except* - there is one last file (
> pgss_query_texts.stat) that continues to be updated in the *old* pg_stat_tmp
> path..  Is that supposed to happen?
>
>
> Fairly similar to this guy (but not quite the same).
>
> https://www.postgresql.org/message-id/d6e71befad7beb4fbcd8ae74fadb1265011bb40fc...@win-8-eml-ex1.eml.local
>
> I can see the packets arriving and being consumed by the collector..  and,
> the collector is indeed updating in the new stats_temp_directory.. just not
> for that one file.
>
>
> It also failed to resolve the buffer drops.. At this point, I'm not sure I
> expected it to.  They tend to occur semi-regularly (every 8-13 minutes) but
> I can't correlate them with any kind of activity (and if I'm honest, it's
> possibly starting to drive me a little bit mad).
>
>
>
>
> On Tue, Apr 18, 2017 at 2:53 PM Tim Kane  wrote:
>
>> Okay, so I've run an strace on the collector process during a buffer drop
>> event.
>> I can see evidence of a recvfrom loop pulling in a *maximum* of 142kb.
>>
>> While I've had already increased rmem_max, it would appear this is not
>> being observed by the kernel.
>> rmem_default is set to 124kb, which would explain the above read maxing
>> out just slightly beyond this (presuming a ring buffer filling up behind
>> the read).
>>
>> I'm going to try increasing rmem_default and see if it has any positive
>> effect.. (and then investigate why the kernel doesn't want to consider
>> rmem_max)..
>>
>>
>>
>>
>>
>> On Tue, Apr 18, 2017 at 8:05 AM Tim Kane  wrote:
>>
>>> Hi all,
>>>
>>> I'm seeing sporadic (but frequent) UDP buffer drops on a host that so
>>> far I've not been able to resolve.
>>>
>>> The drops are originating from postgres processes, and from what I know
>>> - the only UDP traffic generated by postgres should be consumed by the
>>> statistics collector - but for whatever reason, it's failing to read the
>>> packets quickly enough.
>>>
>>> Interestingly, I'm seeing these drops occur even when the system is
>>> idle..  but every 15 minutes or so (not consistently enough to isolate any
>>> particular activity) we'll see in the order of ~90 packets dropped at a
>>> time.
>>>
>>> I'm running 9.6.2, but the issue was previously occurring on 9.2.4 (on
>>> the same hardware)
>>>
>>>
>>> If it's relevant..  there are two instances of postgres running (and
>>> consequently, 2 instances of the stats collector process) though 1 of those
>>> instances is most definitely idle for most of the day.
>>>
>>> In an effort to try to resolve the problem, I've increased (x2) the UDP
>>> recv buffer sizes on the host - but it seems to have had no effect.
>>>
>>> cat /proc/sys/net/core/rmem_max
>>> 1677216
>>>
>>> The following parameters are configured
>>>
>>> track_activities on
>>> track_counts on
>>> track_functions  none
>>> track_io_timing  off
>>>
>>>
>>> There are approximately 80-100 connections at any given time.
>>>
>>> It seems that the issue started a few weeks ago, around the time of a
>>> reboot on the given host... but it's difficult to know what (if anything)
>>> has changed, or why :-/
>>>
>>>
>>> Incidentally... the documentation doesn't seem to have any mention of
>>> UDP whatsoever.  I'm going to use this as an opportunity to dive into the
>>> source - but perhaps it's worth improving the documentation around this?
>>>
>>> My next step is to try disabling track_activities and track_counts to
>>> see if they improve matters any, but I wouldn't expect these to generate
>>> enough data to flood the UDP buffers :-/
>>>
>>> Any ideas?
>>>
>>>
>>>
>>>


[GENERAL] psql --command option ignores --variable's

2013-09-02 Thread Tim Kane
Hi all,

It seems the behaviour of the --command / -c option is such that it will
ignore any variables set on the command-line.


:~$ psql --variable TESTVAR='123' --command 'select :TESTVAR;'
ERROR:  syntax error at or near ":"
LINE 1: select :TESTVAR;
   ^

:~$ psql --variable TESTVAR='123'
psql (9.1.9)
Type "help" for help.

timk=> select :TESTVAR;
 ?column?
--
  123
(1 row)



Obviously, there are ways around this, but is this expected behaviour?
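
One of those ways around it, for reference: interpolation does happen when the
statement arrives on stdin rather than via -c, e.g.

:~$ echo 'select :TESTVAR;' | psql --variable TESTVAR='123'
 ?column?
----------
      123
(1 row)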


Tim




Re: [GENERAL] psql --command option ignores --variable's

2013-09-02 Thread Tim Kane

Ahh, ok. It is documented. I'll get back in my box :)



On 02/09/2013 16:04, "Adrian Klaver"  wrote:

>On 09/02/2013 07:54 AM, Tim Kane wrote:
>> Hi all,
>>
>> It seems the behaviour of the --command / -c option is such that it will
>> ignore any variables set on the command-line.
>>
>>
>> :~ psql --variable TESTVAR='123' --command 'select :TESTVAR;'
>> ERROR:  syntax error at or near ":"
>> LINE 1: select :TESTVAR;
>> ^
>>
>> :~$ psql --variable TESTVAR='123'
>> psql (9.1.9)
>> Type "help" for help.
>>
>> timk=> select :TESTVAR;
>>   ?column?
>> --
>>123
>> (1 row)
>>
>>
>>
>> Obviously, there are ways around this, but is this expected behaviour?
>
>See  *--> section below.
>http://www.postgresql.org/docs/9.1/interactive/app-psql.html
>
>-c command
>--command=command
>Specifies that psql is to execute one command string, command, and then
>exit. This is useful in shell scripts. Start-up files (psqlrc and
>~/.psqlrc) are ignored with this option.
>
>*-->command must be either a command string that is completely parsable
>by the server (i.e., it contains no psql-specific features), or a single
>backslash command. Thus you cannot mix SQL and psql meta-commands with
>this option. To achieve that, you could pipe the string into psql, like
>this: echo '\x \\ SELECT * FROM foo;' | psql. (\\ is the separator
>meta-command.)
><--*
>
>If the command string contains multiple SQL commands, they are processed
>in a single transaction, unless there are explicit BEGIN/COMMIT commands
>included in the string to divide it into multiple transactions. This is
>different from the behavior when the same string is fed to psql's
>standard input. Also, only the result of the last SQL command is returned.
>
>
>
>>
>>
>> Tim
>
>
>-- 
>Adrian Klaver
>adrian.kla...@gmail.com






[GENERAL] psql client memory usage

2013-09-06 Thread Tim Kane
Hi all,

I have a fairly simple query, running on a particularly large table.  For
illustration:

echo "select * from really_big_table;" | psql my_database > /dev/null


When I monitor the memory usage of the psql session, it continually grows.
In fact, for this particularly large table – it grows to the point of
consuming all swap, before the OOM killer takes steps to resolve it.
Clearly, this isn't what I'd like to happen.


My settings are:
Postgresql 9.1.9
work_mem = 256MB
effective_cache_size = 12GB
shared_buffers = 6GB

I have 24GB physical ram to play with.


Regardless of these settings however, I'm surprised that psql wouldn't
release that memory as it flushes each record, but maybe I've misunderstood
how memory usage works on the client side.






Re: [GENERAL] psql client memory usage

2013-09-06 Thread Tim Kane

Ahh. All these years (albeit sporadic), I never knew about FETCH_COUNT.
That makes sense. Thanks muchly.



On 06/09/2013 14:11, "Suzuki Hironobu"  wrote:

>(13/09/06 21:06), Tim Kane wrote:
>> Hi all,
>>
>> I have a fairly simple query, running on a particularly large table.
>>For
>> illustration:
>>
>> echo "select * from really_big_table;" | psql my_database > /dev/null
>>
>>
>> When I monitor the memory usage of the psql session, it continually
>>grows.
>> In fact, for this particularly large table – it grows to the point of
>> consuming all swap, before the OOM killer takes steps to resolve it.
>> Clearly, this isn't what I'd like to happen.
>>
>>
>> My settings are:
>> Postgresql 9.1.9
>> work_mem = 256MB
>> effective_cache_size = 12GB
>> shared_buffers = 6GB
>>
>> I have 24GB physical ram to play with.
>>
>
>This is a client side problem (not server size).
>See the description of FETCH_COUNT, please.
>http://www.postgresql.org/docs/9.2/static/app-psql.html
>
>
>echo "select * from really_big_table;" | psql --variable=FETCH_COUNT=100
>my_database > /dev/null
>
>
>Regards,
>
>
>
>
>






Re: [GENERAL] Query plan for currently executing query?

2013-09-18 Thread Tim Kane


On 18/09/2013 14:44, "Rowan Collins"  wrote:

>
>Running an EXPLAIN on the exact query should give you the plan. An
>EXPLAIN ANALYZE would have to wait for the query to complete either way,
>so you wouldn't be able to get it mid-way through a running process.
>
>If you don't know the exact query running, then if the
>stats_command_string config setting is on, you should be able to get it
>by querying the pg_stat_activity view.
>
>The only other problem I can think of is if there are temporary tables
>or other session-specific objects that would exist only within the
>running process.
>
>Another possibility to consider (which would also show up in
>pg_stat_activity or similar views) is that the query is waiting on some
>kind of lock, rather than just executing slowly.


I think the OP was hoping for a solution that would allow him to retrieve
the query plan that was generated at execution time.  I've certainly
wished for this type of facility in the past, in the scenario where a
given table may have been ANALYZEd in the interim - making it difficult to
perform any useful diagnostics on the problem query.

In cases such as that, it is sometimes better to cancel the currently
running query and restart it in order to take advantage of the newer
statistics and generate a better QEP.


What might be a useful feature addition is if, at the time of the ANALYZE
operation, postgres could identify those currently running queries and
flag them as having begun execution with a dirty/stale QEP.  Possibly this
could be extended to heuristically determine if a query might return
faster if it were cancelled and re-executed under a new QEP?

Tim







[GENERAL] CREATE DATABASE … [ TEMPLATE [=] template ] [ TABLESPACE [=] tablespace_name ]

2013-09-26 Thread Tim Kane

I have a question regarding the behaviour of CREATE DATABASE when used with
TEMPLATE and TABLESPACE options together.


The documentation describes the tablespace parameter as:

> The name of the tablespace that will be associated with the new database, or
> DEFAULT to use the template database's tablespace. This tablespace will be
> the default tablespace used for objects created in this database. See CREATE
> TABLESPACE for more information.

I would take that to mean that all objects from the template will be created
in the new database within the specified tablespace.

However it seems this is not the case, and that the newly specified
tablespace will only apply to objects that do not already have a tablespace
defined in the templated database.

Is this a documentation issue, or a behavioural one?  It isn't clear to me.
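
A minimal way to check where templated objects actually land, with hypothetical
names throughout:

CREATE DATABASE new_db TEMPLATE template_db TABLESPACE ts_new;
\c new_db
SELECT schemaname, tablename, tablespace FROM pg_tables WHERE schemaname = 'public';
-- a NULL tablespace means the database default (ts_new here); an explicit name
-- means the table kept the tablespace it had in the template, as described above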











[GENERAL] COPY table FROM STDIN doesn't show count tag

2013-10-17 Thread Tim Kane

According to the docs, the output of a copy command should return a COPY
count.  It turns out this only happens when copying from a file, and not
from STDIN.

> Outputs
> On successful completion, a COPY command returns a command tag of the form
> COPY count
> The count is the number of rows copied.
> 
clone_test_rename=# create temp table aaa (field varchar(255));
CREATE TABLE
Time: 3.806 ms
clone_test_rename=# copy aaa from '/tmp/test.dat';
COPY 3
Time: 0.623 ms
clone_test_rename=# copy aaa from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 123
>> 456
>> hello
>> \.
Time: 8472.305 ms


Tested on 9.3.0
Is there another way I can determine the number of rows copied, short of
performing a pre and post count?   (other than not using stdin, that is).
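
One workaround sketch: stage the stdin copy into a temp table, then move the
rows with a statement that does report a count (names hypothetical):

create temp table aaa_stage (like aaa);
-- paste the data after the next command, ending with \. as usual
copy aaa_stage from stdin;
insert into aaa select * from aaa_stage;   -- reports INSERT 0 <count>
drop table aaa_stage;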






[GENERAL] Undocumented feature? USING INDEX TABLESPACE

2013-10-30 Thread Tim Kane
Hi all,

I've been looking for a facility to specify the tablespace of an index that
is implicitly created when adding a PK constraint. It turns out that this
does indeed exist, though I could find no mention of it in the
documentation.

> ALTER TABLE {table} ADD CONSTRAINT {constraint} PRIMARY KEY ({column(s)}) USING
> INDEX TABLESPACE {tablespace}


The only mention I could find is in reference to the original patch that
implemented this feature.
http://www.postgresql.org/message-id/41072747.8030...@familyhealth.com.au


I'm glad to see it's there, but I wonder if this should be documented?

Tim




Re: [GENERAL] Undocumented feature? USING INDEX TABLESPACE

2013-10-30 Thread Tim Kane

> 
> This is indicated in the "CREATE TABLE" documentation (to which much of the
> "ALTER TABLE" documentation defers).

So it is… My bad, apologies for the noise.





[GENERAL] psql variable interpolation with subsequent underscore

2013-11-08 Thread Tim Kane
Hi all,

I'm having difficulty using variable interpolation within psql, where that
variable is within a table name…


Like so..

=# \set MM 201310
=# select :MM;
 ?column?
----------
   201310
(1 row)


=# alter table my_table rename to my_table_:MM_raw;
ERROR:  syntax error at or near ":"
LINE 1: …my_table rename to my_table_:MM_ra...
 ^

The problem is that psql tries to interpret 'MM_raw' as the variable
name, but my intention is to only interpret 'MM' followed by a literal
underscore.

I can't find any other way to encapsulate the variable name in this way… Is
there a trick to it?

I suspect I'll need to work around this by altering the naming convention
such that the MM is at the 'end' of the table name.  Maybe..   Thoughts?


Tim







Re: [GENERAL] psql variable interpolation with subsequent underscore

2013-11-08 Thread Tim Kane

On 08/11/2013 16:47, "Adrian Klaver"  wrote:

>>
>test=> \set tbl_name 'my_table_':MM'_raw';
>
>test=> alter table my_table rename to :tbl_name;
>ALTER TABLE
>
>test=> \d my_table_201310_raw
>Table "public.my_table_201310_raw"
>  Column |  Type   | Modifiers
>+-+---
>  id | integer |
>


Ooh, sneaky.  Thanks Adrian.


Remi, to answer your question - this is being called as part of a shell
script. Implementing it in another language would be overkill for what I'm
trying to do.

Cheers,

Tim






[GENERAL] unexplainable psql exit status 1

2013-12-04 Thread Tim Kane
HI all,

Is anyone aware of cases where psql will occasionally return an exit status
of 1, despite there being no obvious error condition?


I have a regular (daily) cron that executes a fairly simple script of the
form:

> psql --set ON_ERROR_STOP=1 -h $DB_HOST -U $DB_USER -a -v DATE="'${DATE}'" -v
> MM="${DATE_MM}" -f duplicate_removal.sql $DB_NAME >> $DB_LOG 2>&1
> 

The execution of this duplicate_removal.sql script appears to run without
error. It’s fairly simple, and never fails to execute in and of itself.
However, on very rare occasion, the psql session will run to completion,
without visible/logged error – but it returns an exit status 1.

The only thing close to an error coming from this script is a notice,
resulting from a CREATE TABLE IF NOT EXISTS. This is no different from any
other execution of the script.
> NOTICE:  relation "discarded" already exists, skipping
> 

I can see that the output is correctly redirected to $DB_LOG and I can see
the successful COMMIT response returned at the end of the session. I can
also verify that the behaviour of the script is as expected, to completion.


The psql man page suggests that an exit status of 1 is a fatal error of its
own.

> EXIT STATUS
>psql returns 0 to the shell if it finished normally, 1 if a fatal error
> of its own occurs (e.g. out of memory, file not found), 2 if the connection to
> the server went bad and the session was not
>interactive, and 3 if an error occurred in a script and the variable
> ON_ERROR_STOP was set.
> 

There is no indication in the postgres log file of anything adverse.

At this point my only option is to set log_statement = all, enable shell
debugging, and wait a few months for it to happen again (I’ve seen this
occur only twice, over a 2-3 month period).  Has anyone encountered this
behaviour before?
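
A sketch of the wrapper with the exit status logged explicitly, using the same
command as above:

set -x
psql --set ON_ERROR_STOP=1 -h "$DB_HOST" -U "$DB_USER" -a -v DATE="'${DATE}'" \
     -v MM="${DATE_MM}" -f duplicate_removal.sql "$DB_NAME" >> "$DB_LOG" 2>&1
rc=$?
echo "$(date '+%F %T') psql exited with status ${rc}" >> "$DB_LOG"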

Thanks

Tim




Re: [GENERAL] unexplainable psql exit status 1

2013-12-04 Thread Tim Kane

I should have mentioned, psql client and server version 9.2.4


From:  Tim Kane 
Date:  Wednesday, 4 December 2013 10:59
To:  pgsql-general General 
Subject:  unexplainable psql exit status 1

HI all,

Is anyone aware of cases where psql will occasionally return an exit status
of 1, despite there being no obvious error condition?


I have a regular (daily) cron that executes a fairly simple script of the
form:

> psql --set ON_ERROR_STOP=1 -h $DB_HOST -U $DB_USER -a -v DATE="'${DATE}'" -v
> MM="${DATE_MM}" -f duplicate_removal.sql $DB_NAME >> $DB_LOG 2>&1
> 

The execution of this duplicate_removal.sql script appears to run without
error. It’s fairly simple, and never fails to execute in and of itself.
However, on very rare occasion, the psql session will run to completion,
without visible/logged error – but it returns an exit status 1.

The only thing close to an error coming from this script is a notice,
resulting from a CREATE TABLE IF NOT EXISTS. This is no different from any
other execution of the script.
> NOTICE:  relation "discarded" already exists, skipping
> 

I can see that the output is correctly redirected to $DB_LOG and I can see
the successful COMMIT response returned at the end of the session. I can
also verify that the behaviour of the script is as expected, to completion.


The psql man page suggests that an exit status of 1 is a fatal error of its
own.

> EXIT STATUS
>psql returns 0 to the shell if it finished normally, 1 if a fatal error
> of its own occurs (e.g. out of memory, file not found), 2 if the connection to
> the server went bad and the session was not
>interactive, and 3 if an error occurred in a script and the variable
> ON_ERROR_STOP was set.
> 

There is no indication in the postgres log file of anything adverse.

At this point my only option is to set log_statement = all, enable shell
debugging, and wait a few months for it to happen again (I’ve seen this
occur only twice, over a 2-3 month period).  Has anyone encountered this
behaviour before?

Thanks

Tim




Re: [GENERAL] unexplainable psql exit status 1

2013-12-04 Thread Tim Kane

> According to the documentation you quoted, it would be useless to set
> log_statement=all, because the problem is on the client end.
> 
> Shell debugging sounds slightly more promising, but it will probably
> not tell you much more than that psql returned with exit code 1.
> 
> A quick look at the code gave me the impression that psql will always
> write an error message before exiting with 1, but I may have missed a case.
> 
> Are you sure that there is nothing on stdout or stderr in the cases where
> psql returns 1?
> 
> What is the last psql command that is executed?
> 
> Yours,
> Laurenz Albe


Thanks Laurenz.
I’ve enabled shell debugging to be 100% sure that I’m not munging the return
code anywhere. It’s entirely possible there is something going on at the
shell side of things, though I fail to see how just at the minute :)
The output of the script is as follows, bearing in mind that all STDERR is
being redirected to STDOUT – there should be no other output to be found.
I’ve verified stdout just in case, to no avail.


\set QUIET 1
\timing
\set HISTSIZE 2000
\unset QUIET
DISCARD ALL;
DISCARD ALL
Time: 0.321 ms
-- SELECT STATEMENT SNIPPED
SELECT 7713
Time: 22.081 ms
CREATE TABLE IF NOT EXISTS duplicates(like source_table);
psql:/path/to/duplicate_removal.sql:36: NOTICE:  relation "duplicates"
already exists, skipping
CREATE TABLE
Time: 0.416 ms
BEGIN;
BEGIN
Time: 0.210 ms
-- INSERT STATEMENT SNIPPED
INSERT 0 7713
Time: 93.855 ms

-- DELETE STATEMENT SNIPPED
DELETE 7713
Time: 33.939 ms

COMMIT;
COMMIT
Time: 0.242 ms


Cheers,

Tim




[GENERAL] pg_stat_tmp

2013-12-16 Thread Tim Kane
Hi all,

The past few days I’ve been encountering the following error, followed by a
full db restart and recovery


2013-12-16 07:12:53 GMT LOG:  could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on device


This occurs at a time of moderate load, during the same set of operations
each morning.
Interestingly, when I execute this manually at any other time of date, the
process completes normally.

I presume that the pg_stat_tmp location is system-wide and likely is not
impacted by temp_tablespaces
The root partition, where postgresql is installed does not have a lot of
disk available (4GB).

My first instinct here is to symlink pg_stat_tmp against another disk with a
little more room to breathe, however I’m surprised that pgstat.tmp would
grow to be so large in the first place – possibly there is something else at
play here.
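
For reference, the statistics temp file location is governed by a setting in
9.2, so it need not be a symlink; a minimal check, assuming
stats_temp_directory has been left at its default:

-- Shows where the collector writes pgstat.tmp; the default is 'pg_stat_tmp'
-- inside the data directory. Pointing this at a roomier filesystem in
-- postgresql.conf (a reload is enough, no restart needed) has the same
-- effect as a symlink without touching the data directory layout.
SHOW stats_temp_directory;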

I should note that this issue has only recently occurred, no major changes
have been introduced.

Thanks for any advice on this.

Tim




Re: [GENERAL] pg_stat_tmp

2013-12-17 Thread Tim Kane

Thanks Jeff, Magnus

Thanks for the suggestions.
This morning the same issue occurred, but this time it also complained that
it failed to write to pg_clog  (previous days, the only failure was in
writing to pg_stat_tmp)


2013-12-17 07:13:04 GMT DETAIL:  Could not write to file "pg_clog/0004" at
offset 212992: No space left on device.
2013-12-17 07:13:05 GMT ERROR:  could not access status of transaction 0
2013-12-17 07:13:05 GMT DETAIL:  Could not write to file "pg_clog/0004" at
offset 212992: No space left on device.
2013-12-17 07:13:06 GMT ERROR:  could not access status of transaction 0
2013-12-17 07:13:06 GMT DETAIL:  Could not write to file "pg_clog/0004" at
offset 212992: No space left on device.
2013-12-17 07:13:07 GMT LOG:  could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:07 GMT LOG:  could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:08 GMT LOG:  could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:08 GMT LOG:  could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on dev2013-12-17 07:25:15 GMT
WARNING:  terminating connection because of crash of another server
 process
2013-12-17 07:25:15 GMT DETAIL:  The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared m
emory.
2013-12-17 07:25:15 GMT HINT:  In a moment you should be able to reconnect
to the database and repeat your command.
2013-12-17 07:25:16 GMT LOG:  all server processes terminated;
reinitializing
2013-12-17 07:25:17 GMT LOG:  database system was interrupted; last known up
at 2013-12-17 07:08:22 GMT
2013-12-17 07:25:17 GMT LOG:  database system was not properly shut down;
automatic recovery in progress
2013-12-17 07:25:17 GMT LOG:  redo starts at 5BC/CF3F9888
2013-12-17 07:25:18 GMT LOG:  could not open file
"pg_xlog/000105BC00D6" (log file 1468, segment 214): No such
file or directory
2013-12-17 07:25:18 GMT LOG:  redo done at 5BC/D5FFFEC0
2013-12-17 07:25:18 GMT LOG:  last completed transaction was at log time
2013-12-17 07:25:12.989653+00
2013-12-17 07:25:19 GMT LOG:  database system is ready to accept connections
2013-12-17 07:25:19 GMT LOG:  autovacuum launcher started



It’s never failed to write pg_xlog’s though – or at least, never reported a
failure.

For now, I’ve altered the pg_stat_tmp path and we’ll see what happens
tomorrow. Ideally though, yes, I’d like to catch this happening at runtime
and get a better picture of what’s blowing out here.



Further to this however, I notice that the base directory is consuming more
disk than I would have expected it to (all our relations are stored in
tablespaces assigned to other disks).
Looking for the larger culprits, I note the following files consuming 4.4GB

9.2/main/base/74641/2260957
9.2/main/base/74641/2260957.1
9.2/main/base/74641/2260957.2
9.2/main/base/74641/2260957.3
9.2/main/base/74641/2260957.4

I notice the timestamps are just shy of the time we experience that
pg_stat_tmp issue.

-rw--- 1 postgres postgres 1073741824 Dec 17 06:56 2260957
-rw--- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.1
-rw--- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.2
-rw--- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.3
-rw--- 1 postgres postgres  328466432 Dec 17 06:56 2260957.4


They appear to be orphaned oids, though I’m not sure this is definitive:

clone=# select pg_relation_filepath(2260957);
 pg_relation_filepath
----------------------

(1 row)



I probably would have thought these to be temporary files, except that I
have temp_tablespaces set elsewhere.

Are these truly orphaned files that PostgreSQL no longer needs (possibly due
to a crash)? Is there a mechanism to scan the PGDATA directories to
locate such files?
FYI, we’re running 9.2.4
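
For cross-checking, a rough sketch only (run in the database that owns
base/74641); contrib's oid2name gives a similar mapping:

-- Every relfilenode the catalogs know about in this database. Files under
-- base/<db oid>/ whose base names (ignoring .1/.2 segment suffixes and the
-- _fsm/_vm forks) are not in this list are candidates for having been
-- orphaned by a crash.
SELECT relfilenode, relname, relkind
FROM pg_class
WHERE relfilenode <> 0
ORDER BY relfilenode;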


Thanks again,

Tim



From:  Jeff Janes 
Date:  Monday, 16 December 2013 17:32
To:  Tim Kane 
Cc:  pgsql-general General 
Subject:  Re: [GENERAL] pg_stat_tmp

On Mon, Dec 16, 2013 at 5:57 AM, Tim Kane  wrote:
> Hi all,
> 
> The past few days I’ve been encountering the following error, followed by a
> full db restart and recovery
> 
> 
> 2013-12-16 07:12:53 GMT LOG:  could not write temporary statistics file
> "pg_stat_tmp/pgstat.tmp": No space left on device

Is that the only thing in the logs?  pg_stat_tmp problems should not bring
down your database.  But problems with pg_xlog running out of space
certainly can--but they should also be logged.
 

> 
> This occurs at a time of moderate load, during the same set of operations each
> morning.
> Interestingly, when I execute this manually at any other time of day, the
> process completes 

[GENERAL] ON_ERROR_EXIT and interactive mode (or, how to force interactive mode off)

2013-12-27 Thread Tim Kane
HI all,

I have a fairly complex python script which calls a number of ‘psql’
sessions via a subprocess to perform bulk load operations.
The bulk loads are generated on the fly, and are written directly to the
stdin of the subprocess.


I would like to rely on the ON_ERROR_STOP=1 functionality to force the
process to exit with an error code – however, psql rightly believes the
process it is operating under is an interactive one – since it is receiving
its input from stdin.

For performance (and disk) reasons, I’d rather not have to rely on writing
my input to a file, before passing it to psql with –f


This part of the man page gave me hope…



> -f filename
> --file filename
> Use the file filename as the source of commands instead of reading commands
> interactively. After the file is processed, psql terminates. This is in many
> ways equivalent to the internal command \i.
> If filename is - (hyphen), then standard input is read.
> Using this option is subtly different from writing psql < filename. In
> general, both will do what you expect, but using -f enables some nice features
> such as error messages with line numbers. There is also a slight chance that
> using this option will reduce the start-up overhead. On the other hand, the
> variant using the shell's input redirection is (in theory) guaranteed to yield
> exactly the same output that you would have gotten had you entered everything
> by hand.


Eg:
psql -v ON_ERROR_STOP=1 -f -

But it seems this is still understood by psql to be an interactive session,
and fails to exit the process on an error.


Is there any way to force psql to read from stdin, in a non-interactive
manner?  (that is, to ensure it exits the process on an error condition?)


Thanks for any advice

Tim









Re: [GENERAL] ON_ERROR_EXIT and interactive mode (or, how to force interactive mode off)

2013-12-30 Thread Tim Kane

Thanks Chris. I didn’t realise psycopg2 had a COPY interface, that’s quite
handy. I’ll have a play, cheers.

Tim

From:  Chris Curvey 
Reply-To:  
Date:  Friday, 27 December 2013 21:18
To:  Tim Kane 
Cc:  pgsql-general General 
Subject:  Re: [GENERAL] ON_ERROR_EXIT and interactive mode (or, how to force
interactive mode off)


On Fri, Dec 27, 2013 at 7:30 AM, Tim Kane  wrote:
> HI all,
> 
> I have a fairly complex python script which calls a number of ‘psql’ sessions
> via a subprocess to perform bulk load operations.
> The bulk loads are generated on the fly, and are written directly to the stdin
> of the subprocess.
> 
> 
> I would like to rely on the ON_ERROR_STOP=1 functionality to force the process
> to exit with an error code – however, psql rightly believes the process it is
> operating under is an interactive one – since it is receiving its input from
> stdin.
> 
> For performance (and disk) reasons, I’d rather not have to rely on writing my
> input to a file, before passing it to psql with –f
> 
> 
> This part of the man page gave me hope…
> 
> 
> 
>> -f filename
>> --file filename
>> Use the file filename as the source of commands instead of reading commands
>> interactively. After the file is processed, psql terminates. This is in many
>> ways equivalent to the internal command \i.
>> If filename is - (hyphen), then standard input is read.
>> Using this option is subtly different from writing psql < filename. In
>> general, both will do what you expect, but using -f enables some nice
>> features such as error messages with line numbers. There is also a slight
>> chance that using this option will reduce the start-up overhead. On the other
>> hand, the variant using the shell's input redirection is (in theory)
>> guaranteed to yield exactly the same output that you would have gotten had
>> you entered everything by hand.
> 
> 
> Eg:
> psql -v ON_ERROR_STOP=1 -f -
> 
> But it seems this is still understood by psql to be an interactive session,
> and fails to exit the process on an error.
> 
> 
> Is there any way to force psql to read from stdin, in a non-interactive
> manner?  (that is, to ensure it exits the process on an error condition?)
> 
> 
> Thanks for any advice
> 
> Tim
> 
> 
I'm afraid I can't answer your question directly (see my sig), but if you're
already using Python to create the files, would you be better off using
psycopg2's COPY interfaces?  If you just connect directly from Python to
Postgres, I would think that your success and error handling would be much
simpler.

http://initd.org/psycopg/docs/usage.html#using-copy-to-and-copy-from



-- 
I asked the Internet how to train my cat, and the Internet told me to get a
dog.




[GENERAL] Possible to improve optimisation / index usage based on domain properties of a function

2014-02-19 Thread Tim Kane
Hi all,

Take the following scenario

I have a set of partitions inherited from a parent table, called streams.
One of the properties of these tables is a timestamp field, nothing fancy
about it.

I also have a qualified index on this field.


I’ve noticed that if I perform the following query, the planner will
correctly use the CHECK constraints to determine the partition, and then use
the indexes available to retrieve the streams between the specified dates.


select count(*) from streams where stream_date >= '2013-01-08' and
stream_date < '2013-01-09';


If, however, I provide the query below, it uses a sequential scan based
plan.  The planner is unable to utilise any indexes because it can’t know
what the function is going to return, and is thus unable to constrain the
range at the time of planning the execution.

select count(*) from streams where date(stream_date) = '2013-01-08';



I’m wondering if we could build into postgres some level of metadata
regarding the properties of a function, such that the optimiser could filter
against the range of values that the function is expected to return.

In this case, it could deduce that for date(stream_date) to equal a given
value, stream_date must fall within a bounded maximum and minimum range.
Thus the planner could scan the index for all values of stream_date falling
within +/- 24 hours of the right operand, and then check/re-check the
results.


I suspect this would only be suitable for very basic functions, such as
date() or date_trunc(); essentially, any function that reduces cardinality
to a predictable degree.

Thoughts?

Tim






Re: [GENERAL] Possible to improve optimisation / index usage based on domain properties of a function

2014-02-20 Thread Tim Kane

Thanks Alban, Sameer.

My use of partitions should have been more of a side note really.
I was particularly interested in whether the query planner could optimise a
date-folded equality expression into a range query, for the case where it
could benefit from an existing index on the non-folded values.

Granted, an expression based index would solve this.
It just seemed like an opportunity to give the planner more options, at
least for the simple case.
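
For completeness, a minimal sketch of that expression-index workaround,
assuming stream_date is timestamp without time zone (the cast from
timestamptz to date is only stable, so it couldn't be indexed this way), and
noting that on an inheritance set the index would need creating on each child:

-- An index on the folded value lets the equality form use an index scan
-- without rewriting the predicate into an explicit range.
CREATE INDEX streams_stream_day_idx ON streams (date(stream_date));

-- This predicate can now match the expression index directly.
SELECT count(*) FROM streams WHERE date(stream_date) = '2013-01-08';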

Cheers,

Tim


From:  Sameer Kumar 
Date:  Thursday, 20 February 2014 07:40
To:  Alban Hertroys 
Cc:  Tim Kane , pgsql-general General

Subject:  Re: [GENERAL] Possible to improve optimisation / index usage based
on domain properties of a function


On Thu, Feb 20, 2014 at 3:34 PM, Alban Hertroys  wrote:
>> > If however, I was to provide the below query, it uses a sequential scan
>> based plan.  The planner is unable to utilise any indexes because it can’t
>> know what the function is going to return – thus unable to constrain the
>> range at the time of planning the execution.
>> >
>> > select count(*) from streams where date(stream_date) = ‘2013-01-08’;
> 
> The inverse of that expression, if it’s possible to formulate one, would most
> likely use the index though:
> 
> select count(*) from streams where stream_date = inv_date(‘2013-01-08’);
> 

He has already posted that:
select count(*) from streams where stream_date >= '2013-01-08' and
stream_date < '2013-01-09';
This would use the index.

 
>> >
>>> >> I’m wondering if we could build into postgres some level of metadata
>>> regarding the properties of a function, such that the optimiser could filter
>>> against the range of values that the function is expected to return.
>>> >>
>>> >> In this case, it could deduce that the date function will only ever
>>> return a value for stream_date to within a certain maximum and minimum
>>> range.
>>> >> Thus the planner could scan the index for all values of stream_date
>>> falling within +/- 24 hours of the right operand, and then check/re-check
>>> the results.
>>> >>
>> > If you can't go for the smarter query, go for more optimum index by
>> "expression based index"
>> >
>> > http://www.postgresql.org/docs/9.1/static/indexes-expressional.html
> 
> AFAIK, you can’t use expression based indexes to partition a table, so that
> won’t help the OP much.

1. I think Tim is talking about index usage [an index which he has on
stream_date] and not partition

2. Tim, the index usage or non-usage in your two queries would remain same
even if you have a single huge table

3. I believe the partitioning trigger/rule could re-direct records based on
an expression too [e.g. when date(stream_date)=24-Jan-2014, send it to
Partition_24Jan14]. I am not sure if the query planner would go to a
particular partition when we query based on date(stream_date). I need to test this.



Best Regards,

Sameer Kumar | Database Consultant

ASHNIK PTE. LTD.

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533

M: +65 8110 0350  T: +65 6438 3504 | www.ashnik.com <http://www.ashnik.com/>

 


 

This email may contain confidential, privileged or copyright material and is
solely for the use of the intended recipient(s).




[GENERAL] Playing with 9.4devel - unnest

2014-03-10 Thread Tim Kane
Hi all,

I decided to have a play with postgresql-9.4devel as I wanted to explore the
functionality provided by

unnest (anyarray, anyarray [, …])


I’ve taken the nightly snapshot, compiled, installed. All good.  (Obtained
from http://ftp.postgresql.org/pub/snapshot/dev/ )
However it seems the expected multi-argument unnest function doesn’t exist.

Have I missed something?

Cheers,

Tim




Re: [GENERAL] Playing with 9.4devel - unnest

2014-03-10 Thread Tim Kane

Hmm. So it is. 
My bad, thanks Tom.

I hadn’t noticed the documentation where it clearly says "This is only
allowed in the FROM clause”

xml_test=# select unnest(*) from (select array[1,2],array[1,2,3]) foo;
ERROR:  function unnest() does not exist


And, yes.. I was expecting the function signature to change. Thanks for
setting me straight.

Tim



From:  Tom Lane 
Date:  Monday, 10 March 2014 15:10
To:  Tim Kane 
Cc:  pgsql-general General 
Subject:  Re: [GENERAL] Playing with 9.4devel - unnest

Tim Kane  writes:
>  I decided to have a play with postgresql-9.4devel as I wanted to explore the
>  functionality provided by
>  unnest (anyarray, anyarray [, …])
>  I’ve taken the nightly snapshot, compiled, installed. All good.  (Obtained
>  from http://ftp.postgresql.org/pub/snapshot/dev/ )
>  However it seems the expected multi-argument unnest function doesn’t exist.

>  Have I missed something?

It's there:

regression=# select * from unnest(array[1,2], array[3,4]);
 unnest | unnest 
--------+--------
      1 |      3
      2 |      4
(2 rows)

If you were expecting this to change, it didn't:

regression=# \df unnest
                              List of functions
   Schema   |  Name  | Result data type | Argument data types |  Type  
------------+--------+------------------+---------------------+--------
 pg_catalog | unnest | SETOF anyelement | anyarray            | normal
(1 row)

because the new functionality arises from a parser transformation,
not from a simple function.

regards, tom lane





Re: [GENERAL] Playing with 9.4devel - unnest

2014-03-10 Thread Tim Kane

I think I may have misunderstood the use case of this..

I can do the following:


select * from unnest(array[1,2,3], array[1,3,4]);
 unnest | unnest 
--------+--------
      1 |      1
      2 |      3
      3 |      4
(3 rows)


But what I really wanted to do, is unnest multiple sets of array values as
returned from a table/query..

Eg:


xml_test=# create temp table z (foo integer[], bar integer[]);
CREATE TABLE

xml_test=# insert into z values (array[1,2,3], array[4,5,6]);
INSERT 0 1

xml_test=# select * from z;
   foo   |   bar   
---------+---------
 {1,2,3} | {4,5,6}
(1 row)

xml_test=# select * from unnest (select foo, bar from z);
ERROR:  syntax error at or near "select"
LINE 1: select * from unnest (select foo, bar from z);
  ^

xml_test=# select * from unnest (select * from z);
ERROR:  syntax error at or near "select"
LINE 1: select * from unnest (select * from z);
  ^


xml_test=#  select (array[1,2,3], array[4,5,6]);
          row          
-----------------------
 ("{1,2,3}","{4,5,6}")
(1 row)



xml_test=# select row(foo,bar) from z;
          row          
-----------------------
 ("{1,2,3}","{4,5,6}")
(1 row)



xml_test=# select * from unnest(array[1,2,3], array[1,3,4]);
 unnest | unnest 
--------+--------
      1 |      1
      2 |      3
      3 |      4
(3 rows)


xml_test=# select * from unnest ( select row(foo,bar) from z );
ERROR:  syntax error at or near "select"
LINE 1: select * from unnest ( select row(foo,bar) from z );
   ^


xml_test=# select * from unnest ( (select row(foo,bar) from z) );
ERROR:  function unnest(record) does not exist
LINE 1: select * from unnest ( (select row(foo,bar) from z) );
  ^
HINT:  No function matches the given name and argument types. You might need
to add explicit type casts.


Any suggestions? Or should the parser be allowing a subquery as a parameter
to unnest?
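
For what it's worth, one formulation that does seem to fit against the temp
table z above (a sketch only): since the multi-argument form is only valid in
the FROM clause, the array columns can be fed in via LATERAL rather than a
subquery.

-- Pairs up the elements of z.foo and z.bar row by row, for every row of z.
SELECT u.a, u.b
FROM z, LATERAL unnest(z.foo, z.bar) AS u(a, b);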


Tim



From:  Tim Kane 
Date:  Monday, 10 March 2014 15:26
To:  Tom Lane 
Cc:  pgsql-general General 
Subject:  Re: [GENERAL] Playing with 9.4devel - unnest


Hmm. So it is. 
My bad, thanks Tom.

I hadn’t noticed the documentation where it clearly says "This is only
allowed in the FROM clause”

xml_test=# select unnest(*) from (select array[1,2],array[1,2,3]) foo;
ERROR:  function unnest() does not exist


And, yes.. I was expecting the function signature to change. Thanks for
setting me straight.

Tim



From:  Tom Lane 
Date:  Monday, 10 March 2014 15:10
To:  Tim Kane 
Cc:  pgsql-general General 
Subject:  Re: [GENERAL] Playing with 9.4devel - unnest

Tim Kane  writes:
>  I decided to have a play with postgresql-9.4devel as I wanted to explore the
>  functionality provided by
>  unnest (anyarray, anyarray [, …])
>  I’ve taken the nightly snapshot, compiled, installed. All good.  (Obtained
>  from http://ftp.postgresql.org/pub/snapshot/dev/ )
>  However it seems the expected multi-argument unnest function doesn’t exist.

>  Have I missed something?

It's there:

regression=# select * from unnest(array[1,2], array[3,4]);
 unnest | unnest 
--------+--------
      1 |      3
      2 |      4
(2 rows)

If you were expecting this to change, it didn't:

regression=# \df unnest
                              List of functions
   Schema   |  Name  | Result data type | Argument data types |  Type  
------------+--------+------------------+---------------------+--------
 pg_catalog | unnest | SETOF anyelement | anyarray            | normal
(1 row)

because the new functionality arises from a parser transformation,
not from a simple function.

regards, tom lane





[GENERAL] XML validation of whitespace values

2014-03-14 Thread Tim Kane
HI all,

I’ve had an interesting problem trying to perform an UPDATE based on the
results of processing an xpath against a field of type xml.
Specifically, my query would fail with:

> ERROR:  could not parse XML document
> DETAIL:  line 1: Start tag expected, '<' not found


I thought this strange, since the xml type is (I believe) supposed to
guarantee that the value is well formed (whether it’s a document or content).

After some amount of digging to locate the rogue entry, it turns out the
field value is actually a single whitespace character – which apparently is
well formed.

Thus I’m left with the following situation…



clone=# select xml_is_well_formed(' ');
 xml_is_well_formed 
--------------------
 t
(1 row)


clone=# select xpath_exists('//test', ' ');
ERROR:  could not parse XML document
DETAIL:  line 1: Start tag expected, '<' not found



clone=# create temp table xml_test (document xml);
CREATE TABLE
clone=# insert into xml_test values (' ');
INSERT 0 1
clone=# select xml_is_well_formed(document::text) from xml_test;
 xml_is_well_formed 
--------------------
 t
(1 row)

clone=# select xpath_exists ('test', document) from xml_test;
ERROR:  could not parse XML document
DETAIL:  line 1: Start tag expected, '<' not found



This is confusing: if the XML is well formed, it probably shouldn’t cause
the parser to error.  And if it isn’t well formed, then the instantiation of
the value as an xml type should fail.
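
A sketch that seems to narrow down the inconsistency (these functions exist
alongside xml_is_well_formed): a lone space is well-formed XML content but
not a well-formed document, and the xpath functions appear to parse their
input as a document, which is what the error above is complaining about.

-- Passes the content check, fails the document check.
SELECT xml_is_well_formed_content(' ');   -- t
SELECT xml_is_well_formed_document(' ');  -- f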





[GENERAL] hstore binary representation of keys

2014-04-22 Thread Tim Kane
Hi all,

I’ve been using hstore to record various key/val pairs, but I’ve noticed it
consumes a lot more disk than I would have expected.
I don’t have any hard figures to illustrate, but empirical testing has shown
that if I record these pairs as traditional column based fields, I can save
a significant amount of disk.

What I think I’m seeing here, is that the hstore representation needs to
store the entirety of the key alongside each value.

Let’s say I have a table of 1000 records, and 900 of them have a key named
‘A_REALLY_REALLY_REALLY_LONG_KEY’; that key will then be written to disk 900
times, along with the appropriate values.
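
A rough way to see that per-value overhead, assuming the hstore extension is
installed:

-- The key text is carried inside every hstore value, so a long key is paid
-- for on every row, whereas a dedicated column stores only the value.
SELECT pg_column_size(hstore('A_REALLY_REALLY_REALLY_LONG_KEY', '1')) AS as_hstore,
       pg_column_size(1::integer) AS as_plain_column;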


I guess there are two options open to me here.

1. I could transpose these values into a dedicated field
2. I could use shorter key names

Does hstore2 go any way toward improving this situation? Some kind of
enumerated key based system?



Cheers,

Tim





[GENERAL] DDL statement blocked on long-running query with no runtime dependancy

2014-05-06 Thread Tim Kane
Hi all,

I’ve just noticed that some DDL statements will block if there are other
queries accessing the relation in question, even where those queries are not
dependent on the underlying structures (such as a new index, created after
the initial query).

For instance…

> SELECT long_running_query FROM foo;   -- Query A,
> 
> --   meanwhile…
> 
> CREATE UNIQUE INDEX ON foo USING btree (blah);
> -- Fine, no problem
> 
> DROP INDEX foo_blah_idx;
> -- The above DDL statement will block until Query A completes, even though the
> query cannot possibly rely on the newly created index
> 


Is this behaviour by design? Is it worth allowing DDL statements to occur
where it can be determined that no currently running query will be affected?

Cheers,

Tim




[GENERAL] DDL statement blocked on long-running query with no runtime dependancy

2014-05-06 Thread Tim Kane
Hi all,

I’ve just noticed that some DDL statements will block if there are other
queries accessing the relation in question, even where those queries are not
dependent on the underlying structures.

For instance…

SELECT long_running_query FROM foo;   -- Query A,

--   meanwhile…

CREATE UNIQUE INDEX ON foo USING btree (blah);
-- Fine, no problem

DROP INDEX foo_blah_idx;
-- The above DDL statement will block until Query A completes, even though the 
query cannot possibly rely on the newly created index



Is this behaviour by design? Is it worth allowing DDL statements to occur where 
it can be determined that no currently running query will be affected?






Tim Kane
Senior Development Engineer Lead
t: +44 20 7092 8700
m:
blinkbox music - the easiest way to listen to the music you love, for free
www.blinkboxmusic.com



Re: [GENERAL] DDL statement blocked on long-running query with no runtime dependancy

2014-05-06 Thread Tim Kane

Thanks Tom, I see there are a lot of considerations. I guess I just found it
to be an interesting scenario, although not entirely unexpected. It’s not a
big deal of course.

Cheers,

Tim


From:  Tom Lane 
Date:  Tuesday, 6 May 2014 15:05
To:  Tim Kane 
Cc:  pgsql-general General 
Subject:  Re: [GENERAL] DDL statement blocked on long-running query with no
runtime dependancy

Tim Kane  writes:
>  [ DROP INDEX requires exclusive lock on index's table ]

>  Is this behaviour by design?

Yes.  Even if you assumed that the DROP had complete information about
all concurrent queries (which it does not, and we could not cheaply
provide that), it would still be possible for an incoming query to start
using the index immediately after DROP looks.  Table-level locks are the
only means we have to prevent such race conditions.

In theory we could narrow the scope of the exclusive locking to just
the targeted index, but that would not actually help much in practice:
all incoming queries would still need to conflict with the DROP, because
the planner will want to inspect every index on the table to see if it's
potentially useful for the new query.

Recent versions of PG do have a DROP INDEX CONCURRENTLY operation, which
goes through a multiple-transaction sequence involving marking the index
invalid, and a lot of waiting, in order to avoid blocking other
transactions.  I'm not sure that that answers your concern though, as
what you seem to want is for the DROP to not wait.

regards, tom lane
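
For reference, a sketch of the variant Tom mentions (available since 9.2); it
still waits out transactions that might be using the index, but it does not
hold the table-level exclusive lock that blocks other queries while it waits.

-- Cannot be run inside a transaction block.
DROP INDEX CONCURRENTLY foo_blah_idx;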





[GENERAL] Partitioning such that key field of inherited tables no longer retains any selectivity

2014-05-11 Thread Tim Kane
The subject line may not actually describe what I want to illustrate…

Basically, let’s say we have a nicely partitioned data-set. Performance is a
net win and I’m happy with it.
The partitioning scheme is equality based, rather than range based.

That is, each partition contains a subset of the data where partition_key =
{some_value}, and of course we let constraint exclusion enable the optimiser
to do its thing.

As such, all of the data contained in a given partition has the same value
for partition_key. That field, within the scope of its partition, isn’t
terribly useful anymore and in my mind is wasting bytes; its only purpose
really is to allow the CHECK constraint to verify the data is what it should
be.
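
To make the setup concrete, a minimal sketch of the layout being described
(table and column names are illustrative only):

-- Each child holds exactly one partition_key value; the equality CHECK
-- constraint is what lets constraint exclusion prune the other children.
CREATE TABLE streams_all (partition_key integer NOT NULL, payload text);
CREATE TABLE streams_p1 (CHECK (partition_key = 1)) INHERITS (streams_all);
CREATE TABLE streams_p2 (CHECK (partition_key = 2)) INHERITS (streams_all);

-- Only streams_p1 is scanned, yet every one of its rows still stores
-- partition_key = 1 on disk.
SELECT count(*) FROM streams_all WHERE partition_key = 1;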


Wouldn’t it be nice if we could somehow create a child table where we could
define a constant field value that did not need to be stored on disk at the
tuple level?
This would allow the check constraint to supply the optimiser with the
information it needs, while removing the need to consume disk to record a
field whose value is always the same.


Extending this idea..
PostgreSQL could possibly look at any equality-based check constraint for a
table and, instead of storing each field value verbatim, implicitly
optimise away the need to write those field values to disk, on the
understanding that those values can never change (unless the constraint is
removed/altered).

I’m sure there are all kinds of worms in this canister, but I thought it
might be an interesting discussion.


Cheers,

Tim




Re: [GENERAL] Re: Partitioning such that key field of inherited tables no longer retains any selectivity

2014-05-11 Thread Tim Kane

> 
> From:  Tom Lane 
> David G Johnston  writes:
>>  Two approaches:
>>  1. Standard virtual column name that, when used, gets rewritten into a
>>  constant that is stored at the table level.
>>  2. A way for a column's value to be defined as a function call.
> 
> Recent versions of the SQL spec have a notion of "generated columns"
> that I think subsumes both of these concepts.  We had a draft patch
> awhile back that attempted to implement that feature.  It crashed
> and burned for reasons I don't recall ... but certainly implementing
> an already-standardized feature is more attractive than just inventing
> behavior on our own.
> 

That sounds interesting.
Is this what you are referring to?  Actually, it looks like it would fit the
bill and then some.


—
4.14.8 Base columns and generated columns

A column of a base table is either a base column or a generated column. A
base column is one that is not a generated column. A generated column is one
whose values are determined by evaluation of a generation expression, a
<value expression> whose declared type is by implication that of the column.
A generation expression can reference base columns of the base table to
which it belongs but cannot otherwise access SQL-data. Thus, the value of
the field corresponding to a generated column in row R is determined by the
values of zero or more other fields of R.

A generated column GC depends on each column that is referenced by a
<column reference> in its generation expression, and each such referenced
column is a parametric column of GC.

 —








[GENERAL] materialised views vs unlogged table (also, ize vs ise)

2014-05-18 Thread Tim Kane
HI all,

I’m curious about materialised views.
I was just setting something up, and noticed there is no support for
UNLOGGED materialised views
(http://www.postgresql.org/message-id/E1UZNrD-0008Eh-83@gemulon.postgresql.org)

I also notice that the creation of an unlogged table appears to be a lot
quicker than that of a materialised view (as you might expect).
Read operations seem indistinguishable, though I’ve not tested with cold
buffers.

Aside from the convenience of the REFRESH functionality, are there any other
factors I should consider?
(the base tables get dropped/recreated each night, so I can’t rely on
REFRESH functionality – I’ll need to recreate the table/view in either
case).
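
For concreteness, the two alternatives being weighed look like this
(base_table and the column names are stand-ins): both produce the same
result set, but only the first supports REFRESH, and only the second skips
WAL entirely (and so is not crash-safe).

CREATE MATERIALIZED VIEW nightly_summary AS
    SELECT some_key, count(*) AS n FROM base_table GROUP BY some_key;

CREATE UNLOGGED TABLE nightly_summary_t AS
    SELECT some_key, count(*) AS n FROM base_table GROUP BY some_key;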



Oh, I also noticed we don’t support alternate spellings of MATERIALIZE, as
we do for ANALYZE. 
I’m not sure if we do this anywhere else, maybe it’s just analyze being the
odd one out.


=# create materialized view testview as select 1;
SELECT 1
Time: 21.760 ms
=# create materialised view testview as select 1;
ERROR:  syntax error at or near "materialised"
LINE 1: create materialised view testview as select 1;


Not that it’s difficult for users to simply spell it one way or the other;
it just seems a minor inconsistency.

Cheers,

Tim




Re: [GENERAL] materialised views vs unlogged table (also, ize vs ise)

2014-05-19 Thread Tim Kane

> 
>>  Aside from the convenience of the REFRESH functionality, are there any other
>>  factors I should consider?
> An exclusive lock is taken on the materialized view during a REFRESH
> operation, blocking any read or write queries attempted on it. You
> can tackle this limitation in the upcoming 9.4 by using REFRESH
> CONCURRENTLY, a unique index being necessary on the materialized view.
> -- 

Yep. Thanks Michael. I was actually trying to say that I have no need for
refresh functionality in this instance. :)

- The table/views I need will be destroyed and recreated each night.
- Refresh functionality isn’t helpful in this instance, as the underlying
tables will also be destroyed
- Crash recovery isn’t necessary

So, in this scenario, will I get any benefit from a materialised view that
I wouldn't have from an unlogged table?

Cheers