Re: [HACKERS] pg_dump versus rules, once again

2016-12-30 Thread Benedikt Grundmann
On 30 December 2016 at 11:58, Benedikt Grundmann <bgrundm...@janestreet.com>
wrote:

>
> On 17 November 2016 at 03:45, Robert Haas <robertmh...@gmail.com> wrote:
>
>> On Wed, Nov 16, 2016 at 10:14 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> > Robert Haas <robertmh...@gmail.com> writes:
>> >> On Wed, Nov 16, 2016 at 10:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> >>> The changes in pg_backup_archiver.c would have to be back-patched
>> >>> into all versions supporting --if-exists, so that they don't fail
>> >>> on dump archives produced by patched versions.
>> >
>> >> Even if you patch future minor releases, past minor releases are still
>> >> going to exist out there in the wild for a long, long time.
>> >
>> > Yeah, but it would only matter if you try to use pg_restore --clean
>> --if-exists
>> > with an archive file that happens to contain a view that has this issue.
>> > Such cases would previously have failed anyway, because of precisely
>> > the bug at issue ... and there aren't very many of them, or we'd have
>> > noticed the problem before.  So I don't feel *too* bad about this,
>> > I just want to make sure we have a solution available.
>>
>> Right, OK.
>>
>
> For what it is worth we just run into this problem on our postgres 9.2.17
> installation on a hunch we (after reading Tom's initial email replaced the
> view that caused this by this
>
> create view ... as select * from (...original view definition...)
> hack_around_pg_dump_versus_rules_bug;
>
> Which caused pg_dump to change its behavior and instead emit create view
>  which is what we wanted (because we take filtered down and dependency
> ordered outputs of pg_dump as the starting point for new patches to the
> db).  But it surprised me mildly that the hack "worked" so I thought I
> would mention it here.   It might just mean that I'm misunderstanding the
> bug but if there was really a dependency in the original that dependency
> still exists now.
>
>
N/m, it turns out that using pg_dump -t  isn't a good way to test whether
the hack works, because then it always does the right thing.


>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>
>


Re: [HACKERS] pg_dump versus rules, once again

2016-12-30 Thread Benedikt Grundmann
On 17 November 2016 at 03:45, Robert Haas  wrote:

> On Wed, Nov 16, 2016 at 10:14 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> On Wed, Nov 16, 2016 at 10:00 PM, Tom Lane  wrote:
> >>> The changes in pg_backup_archiver.c would have to be back-patched
> >>> into all versions supporting --if-exists, so that they don't fail
> >>> on dump archives produced by patched versions.
> >
> >> Even if you patch future minor releases, past minor releases are still
> >> going to exist out there in the wild for a long, long time.
> >
> > Yeah, but it would only matter if you try to use pg_restore --clean
> --if-exists
> > with an archive file that happens to contain a view that has this issue.
> > Such cases would previously have failed anyway, because of precisely
> > the bug at issue ... and there aren't very many of them, or we'd have
> > noticed the problem before.  So I don't feel *too* bad about this,
> > I just want to make sure we have a solution available.
>
> Right, OK.
>

For what it is worth, we just ran into this problem on our postgres 9.2.17
installation.  On a hunch (after reading Tom's initial email) we replaced the
view that caused this with this:

create view ... as select * from (...original view definition...)
hack_around_pg_dump_versus_rules_bug;

That caused pg_dump to change its behavior and instead emit a plain create
view, which is what we wanted (because we take filtered-down and
dependency-ordered output of pg_dump as the starting point for new patches to
the db).  But it mildly surprised me that the hack "worked", so I thought I
would mention it here.  It might just mean that I'm misunderstanding the bug,
but if there was really a dependency in the original, that dependency still
exists now.
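
For context, the dump shape at issue (as I understand Tom's description) is
the one pg_dump falls back to when a view is part of a circular dependency:
instead of a plain CREATE VIEW it emits a dummy table and then an ON SELECT
rule that turns the table into the view.  A rough sketch of that shape, with
made-up table and view names, not actual pg_dump output:

create table t (a integer);
create table v (a integer);   -- dummy table standing in for the view
create rule "_RETURN" as on select to v do instead
  select t.a from t;          -- an unconditional ON SELECT rule makes v a view
                              -- (accepted by the server versions discussed here)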


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


[HACKERS] Decoding proacl

2016-10-03 Thread Benedikt Grundmann
I'm trying to understand how to decode proacl in pg_proc.  The
documentation says:

PostgreSQL grants default privileges on some types of objects to PUBLIC.
... EXECUTE privilege for functions; ... Also, these initial default
privilege settings can be changed using the ALTER DEFAULT PRIVILEGES
command.

I also found this email by Tom saying that NULL means the default of execute
to public.

Questions:

a) Does NULL mean execute to public?  Or does it mean whatever
pg_default_acl contains for functions?

b) What does it mean if pg_default_acl is empty?

c) If NULL means execute to public can somebody explain why this happens:

postgres_prod@proddb_testing=# select proacl from pg_proc where proname =
'hstore_eq';
─[ RECORD 1 ]
proacl │ ¤

Time: 87.862 ms
postgres_prod@proddb_testing=# grant execute on function hstore_eq(hstore,
hstore) to public;
GRANT
Time: 88.931 ms
postgres_prod@proddb_testing=# select proacl from pg_proc where proname =
'hstore_eq';
─[ RECORD 1 ]
proacl │ *{=X/postgres_prod,postgres_prod=X/postgres_prod}*

I would have expected the bold to still be NULL.  Also I haven't found any
combination of statements to set it back to NULL (short of dropping and
recreating hstore_eq).

Which leads me to, I guess, the most important question.

d) Other than compactness in the representation of acls, is there any
practical difference between the above representation and having NULL in
proacl?
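
In case it is useful to readers of the archive: an explicit proacl like the
one above can be unpacked with aclexplode() (grantee 0 denotes PUBLIC; a NULL
proacl simply means the implicit default, so there is nothing to expand).  A
minimal sketch against the hstore_eq example:

select proname, (aclexplode(proacl)).*
from pg_proc
where proname = 'hstore_eq';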

Thanks in advance,

Bene


Re: [HACKERS] between not propated into a simple equality join

2016-05-10 Thread Benedikt Grundmann
On Tue, May 10, 2016 at 7:41 AM, David Rowley <david.row...@2ndquadrant.com>
wrote:

> On 10 May 2016 at 16:34, David G. Johnston <david.g.johns...@gmail.com>
> wrote:
> > On Mon, May 9, 2016 at 8:53 AM, Benedikt Grundmann
> > <bgrundm...@janestreet.com> wrote:
> >>
> >> We just run into a very simple query that the planner does much worse on
> >> than we thought it would (in production the table in question is ~ 100
> GB).
> >> It surprised us given the planner is generally quite good, so I thought
> I
> >> share our surprise
> >>
> >> Setup:
> >>
> >> postgres_prod@proddb_testing=# select version();[1]
> >> version
> >>
> >>
> 
> >>  PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
> >> 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
> >> (1 row)
> >>
> >> Time: 69.246 ms
> >>
> >> postgres_prod@proddb_testing=# create table toy_data3 (the_date date, i
> >> int);
> >> CREATE TABLE
> >> Time: 67.096 ms
> >> postgres_prod@proddb_testing=# insert into toy_data3
> >>   (select current_date-(s.idx/1000), s.idx from
> generate_series(1,100)
> >> as s(idx));
> >> INSERT 0 100
> >> Time: 1617.483 ms
> >> postgres_prod@proddb_testing=# create index toy_data_date3 on
> >> toy_data3(the_date);
> >> CREATE INDEX
> >> Time: 660.166 ms
> >> postgres_prod@proddb_testing=# analyze toy_data3;
> >> ANALYZE
> >> Time: 294.984 ms
> >>
> >> The bad behavior:
> >>
> >> postgres_prod@proddb_testing=# explain analyze
> >>   select * from (
> >>select td1.the_date, td1.i
> >> from toy_data3 td1, toy_data3 td2  where td1.the_date = td2.the_date
> >> and td1.i = td2.i
> >>   ) foo
> >>   where the_date between current_date and current_date;
> >>QUERY
> >> PLAN
> >>
> >>
> ───
> >>  Hash Join  (cost=55.49..21980.50 rows=1 width=8) (actual
> >> time=0.336..179.374 rows=999 loops=1)
> >>Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))
> >>->  Seq Scan on toy_data3 td2  (cost=0.00..14425.00 rows=100
> >> width=8) (actual time=0.007..72.510 rows=100 lo
> >>->  Hash  (cost=40.44..40.44 rows=1003 width=8) (actual
> >> time=0.321..0.321 rows=999 loops=1)
> >>  Buckets: 1024  Batches: 1  Memory Usage: 40kB
> >>  ->  Index Scan using toy_data_date3 on toy_data3 td1
> >> (cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.
> >>Index Cond: ((the_date >= ('now'::cstring)::date) AND
> >> (the_date <= ('now'::cstring)::date))
> >>  Total runtime: 179.440 ms
> >> (8 rows)
> >>
> >> Time: 246.094 ms
> >>
> >> Notice the red.  Which is sad because one would like it to realize that
> it
> >> could propagate the index constraint onto td2.  That is on both sides
> of the
> >> join do the green.
> >>
> >
> > FWIW
> >
> > This is my plan result:
> > version
> > PostgreSQL 9.5.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> > 4.8.2-19ubuntu1) 4.8.2, 64-bit
> > All default settings
> >
> > using "BETWEEN"
> >
> >  QUERY PLAN
> > Nested Loop  (cost=0.86..48.91 rows=1 width=8) (actual
> time=0.042..168.512
> > rows=999 loops=1)
> >   ->  Index Scan using toy_data_date3 on toy_data3 td1  (cost=0.43..8.46
> > rows=1 width=8) (actual time=0.022..1.388 rows=999 loops=1)
> > Index Cond: ((the_date >= ('now'::cstring)::date) AND (the_date
> <=
> > ('now'::cstring)::date))
> >   ->  Index Scan using toy_data_date3 on toy_data3 td2  (cost=0.42..40.44
> > rows=1 width=8) (actual time=0.078..0.160 rows=1 loops=999)
> > Index Cond: (the_date = td1.the_date)
> > Filter: (td1.i = i)
> > Rows Removed by Filter: 998
> > Planning time: 0.353 ms
> > Execution time: 169.692 ms
> >
> > using "="
> >
> > QUERY PLAN
> > Hash Join  (cost=49.89..90.46 rows=1 width=8) (actual time=2.320..5.652
> > rows=99

[HACKERS] between not propated into a simple equality join

2016-05-09 Thread Benedikt Grundmann
We just ran into a very simple query that the planner does much worse on
than we thought it would (in production the table in question is ~ 100 GB).
It surprised us given that the planner is generally quite good, so I thought
I'd share our surprise.

Setup:

postgres_prod@proddb_testing=# select version();[1]
version


 PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)

Time: 69.246 ms

postgres_prod@proddb_testing=# create table toy_data3 (the_date date, i
int);
CREATE TABLE
Time: 67.096 ms
postgres_prod@proddb_testing=# insert into toy_data3

  (select current_date-(s.idx/1000), s.idx from generate_series(1,100)
as s(idx));
INSERT 0 100
Time: 1617.483 ms
postgres_prod@proddb_testing=# create index toy_data_date3 on
toy_data3(the_date);
CREATE INDEX
Time: 660.166 ms
postgres_prod@proddb_testing=# analyze toy_data3;
ANALYZE
Time: 294.984 ms

The bad behavior:

postgres_prod@proddb_testing=# explain analyze
  select * from (
   select td1.the_date, td1.i
from toy_data3 td1, toy_data3 td2  where td1.the_date = td2.the_date
and td1.i = td2.i
  ) foo
  where the_date between current_date and current_date;
   QUERY
PLAN
───
 Hash Join  (cost=55.49..21980.50 rows=1 width=8) (actual
time=0.336..179.374 rows=999 loops=1)
   Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))
   ->  Seq Scan on toy_data3 td2  (cost=0.00..14425.00 rows=100
width=8) (actual time=0.007..72.510 rows=100 lo
   ->  Hash  (cost=40.44..40.44 rows=1003 width=8) (actual
time=0.321..0.321 rows=999 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 40kB
 ->  Index Scan using toy_data_date3 on toy_data3 td1
 (cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.
   Index Cond: ((the_date >= ('now'::cstring)::date) AND
(the_date <= ('now'::cstring)::date))
 Total runtime: 179.440 ms
(8 rows)

Time: 246.094 ms

Notice the seq scan on td2 (the part marked in red in the original message),
which is sad because one would like the planner to realize that it could
propagate the index constraint onto td2, i.e. do what the index scan on td1
does (marked in green) on both sides of the join.

As it does correctly when one explicitly uses equality (bold below; but of
course we sometimes have multi-day ranges in production, and we only used a
single-day range above to make it extra interesting for the planner to NOT
do a seqscan):

postgres_prod@proddb_testing=# explain analyze
  select * from (
   select td1.the_date, td1.i
from toy_data3 td1, toy_data3 td2  where td1.the_date = td2.the_date
and td1.i = td2.i) foo
  where *the_date = current_date*;
   QUERY
PLAN
───
 Hash Join  (cost=50.47..92.17 rows=1 width=8) (actual time=0.300..0.652
rows=999 loops=1)
   Hash Cond: (td1.i = td2.i)
   ->  Index Scan using toy_data_date3 on toy_data3 td1  (cost=0.00..37.93
rows=1003 width=8) (actual time=0.023..0.169
 Index Cond: (the_date = ('now'::cstring)::date)
   ->  Hash  (cost=37.93..37.93 rows=1003 width=8) (actual
time=0.270..0.270 rows=999 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 40kB
 ->  Index Scan using toy_data_date3 on toy_data3 td2
 (cost=0.00..37.93 rows=1003 width=8) (actual time=0.007.
   Index Cond: (the_date = ('now'::cstring)::date)
 Total runtime: 0.713 ms
(9 rows)

Time: 66.904 ms
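
A workaround sketch for the BETWEEN case: repeat the range condition on td2
explicitly, since the planner only derives equality conditions (not
inequalities such as BETWEEN) across the join.  Nothing here beyond the
toy_data3 setup above is assumed:

explain analyze
  select td1.the_date, td1.i
  from toy_data3 td1, toy_data3 td2
  where td1.the_date = td2.the_date
    and td1.i = td2.i
    and td1.the_date between current_date and current_date
    and td2.the_date between current_date and current_date;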

Cheers,

Bene


Re: [HACKERS] pg_basebackup compression TODO item

2016-03-07 Thread Benedikt Grundmann
On Sun, Mar 6, 2016 at 7:36 PM, Euler Taveira  wrote:

> On 03-03-2016 14:44, Magnus Hagander wrote:
> > On Thu, Mar 3, 2016 at 6:34 PM, Andres Freund  > > wrote:
> >
> > On 2016-03-03 18:31:03 +0100, Magnus Hagander wrote:
> > > I think we want it at protocol level rather than pg_basebackup
> level.
> >
> > I think we may want both eventually, but I do agree that protocol
> level
> > has a lot higher "priority" than that. Something like protocol level
> > compression has a bit of different tradeofs than compressing base
> > backups, and it's nice not to compress, uncompress, compress again.
> >
> >
> >
> > Yeah, good point, we definitely want both. Based on the field experience
> > I've had (which might differ from others), having it protocol level
> > would help more people tough, so should be higher prio.
> >
> Some time ago, I started a thread [1] to implement compression at
> protocol level. The use cases are data load over slow links and reduce
> bandwidth consumption during replication.
>
> At that time, there wasn't a consensus about which compression algorithm
> to choose. After the WAL compression feature, I think we can do some POC
> with LZ compression (that is already available in common).
>
> I'll try to update the code and do some benchmarks.
>
>
+1 to protocol-level compression.  In our case the primary reason why we
use third-party magic networking appliances as a middle man between our
offices is to compress postgres network traffic (which is very compressible;
>95% reduction is normal).  And the presence of those devices introduces all
kinds of weird additional error cases and administrative overhead (+ of
course cost).  So I would personally consider protocol-level compression a
bigger killer feature than any other feature that has made it into postgres
since the 9.2 release.  But of course YMMV ;-)



> [1] http://www.postgresql.org/message-id/4fd9698f.2090...@timbira.com
>
>
> --
>Euler Taveira   Timbira - http://www.timbira.com.br/
>PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Death by regexp_replace

2016-01-18 Thread Benedikt Grundmann
thanks

On Fri, Jan 15, 2016 at 7:22 PM, Devrim Gündüz  wrote:

> Hi,
>
> That is the version of the *repo* RPM, not PostgreSQL itself. Once you install
> it, you can grab the latest version with
>
> yum install postgresql92-server
>
> Regards, Devrim
>
> On January 15, 2016 7:48:53 PM GMT+02:00, Robert Haas <
> robertmh...@gmail.com> wrote:
>>
>>  Hmm I just wanted to get the rpm for the latest 9.2 release for centos6 but
>>>  it looks like you haven't released at least the link on this page for 9.2
>>>
>>>  http://yum.postgresql.org/repopackages.php
>>>
>>>  says 7 in the filename which is certainly not 14 ;-)
>>>
>>>  
>>> http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-7.noarch.rpm
>>>
>>>  Is that expected?
>>>
>>
>> Adding Devrim, who I believe maintains that stuff.
>>
>>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>


[HACKERS] Death by regexp_replace

2016-01-15 Thread Benedikt Grundmann
Today we discovered that we had a backend whose client had gone away.  The
automatic query-watching process had sent both pg_cancel and
pg_terminate_backend, but nevertheless the process was sitting there
consuming resources and had been for over 1 day...

gdb revealed that we were sitting in pg_regexec (we forced it to return 16,
aka invalid regex, to get our system back into a good state).

Here is the regular expression and the text to run on:

*WARNING DO NOT DO THIS ON A PRODUCTION BOX*

select regexp_replace('VODI GR,VOD LN,VOD LN,VODN MM,VODPF US,VOD US,VZC
LN', '([^,]+)(,*\1)+', '\1');

This was in postgres 9.2
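
For reference, the tightened variant discussed elsewhere in this thread
(,+ instead of ,*, which gives the backreference group a real constraint)
would look like this; a sketch, not something we have put into production:

select regexp_replace('VODI GR,VOD LN,VOD LN,VODN MM,VODPF US,VOD US,VZC LN',
                      '([^,]+)(,+\1)+', '\1');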

Cheers,

Bene


Re: [HACKERS] Death by regexp_replace

2016-01-15 Thread Benedikt Grundmann
On Fri, Jan 15, 2016 at 4:39 PM, Benedikt Grundmann <
bgrundm...@janestreet.com> wrote:

>
> On Fri, Jan 15, 2016 at 4:26 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> Kevin Grittner <kgri...@gmail.com> writes:
>> > On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> >> (FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
>> >> practically no constraint there, leading to having to consider O(N^2)
>> >> or more possibilities.)
>>
>> > On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
>> > but it seems to be in an endless loop until you do that.
>>
>> A bit of further experimentation suggests the runtime growth is actually
>> more like O(2^N).  It will terminate in a reasonable amount of time if the
>> input string is about half as long as the given example.
>>
>> The problem is that so far as the DFA engine is concerned, the pattern
>> substring '(,*\1)+' can match almost anything at all, because it's
>> equivalent to '(,*[^,]+)+' which is easily seen to match any string
>> whatever that's got at least one non-comma.  So, for each possible match
>> to the substring '([^,]+)', of which there are lots, it has to consider
>> every possible way of breaking up all the rest of the string into one or
>> more substrings.  The vast majority of those ways will fail when the
>> backref match is checked, but there's no way to realize it before that.
>>
>> To be clear I'm perfectly happy with that query taking forever (I didn't
> write it ;-)).  The only thing I was unhappy about was that
> pg_cancel/terminate_backend didn't work.  If that is fixed great.
>
>
>> regards, tom lane
>>
>
>
Hmm, I just wanted to get the rpm for the latest 9.2 release for centos6, but
it looks like you haven't released it; at least the link on this page for 9.2

http://yum.postgresql.org/repopackages.php

says 7 in the filename which is certainly not 14 ;-)

http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-7.noarch.rpm


Is that expected?

Thanks,

Bene


Re: [HACKERS] Death by regexp_replace

2016-01-15 Thread Benedikt Grundmann
9.2.6

On Fri, Jan 15, 2016 at 3:48 PM, Kevin Grittner  wrote:

> On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane  wrote:
>
> >> *WARNING DO NOT DO THIS ON A PRODUCTION BOX*
> >> select regexp_replace('VODI GR,VOD LN,VOD LN,VODN MM,VODPF US,VOD US,VZC
> >> LN', '([^,]+)(,*\1)+', '\1');
>
> > This responds to cancel just fine for me.
>
> > (FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
> > practically no constraint there, leading to having to consider O(N^2)
> > or more possibilities.)
>
> On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
> but it seems to be in an endless loop until you do that.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Death by regexp_replace

2016-01-15 Thread Benedikt Grundmann
On Fri, Jan 15, 2016 at 4:26 PM, Tom Lane  wrote:

> Kevin Grittner  writes:
> > On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane  wrote:
> >> (FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
> >> practically no constraint there, leading to having to consider O(N^2)
> >> or more possibilities.)
>
> > On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
> > but it seems to be in an endless loop until you do that.
>
> A bit of further experimentation suggests the runtime growth is actually
> more like O(2^N).  It will terminate in a reasonable amount of time if the
> input string is about half as long as the given example.
>
> The problem is that so far as the DFA engine is concerned, the pattern
> substring '(,*\1)+' can match almost anything at all, because it's
> equivalent to '(,*[^,]+)+' which is easily seen to match any string
> whatever that's got at least one non-comma.  So, for each possible match
> to the substring '([^,]+)', of which there are lots, it has to consider
> every possible way of breaking up all the rest of the string into one or
> more substrings.  The vast majority of those ways will fail when the
> backref match is checked, but there's no way to realize it before that.
>
To be clear, I'm perfectly happy with that query taking forever (I didn't
write it ;-)).  The only thing I was unhappy about was that
pg_cancel/terminate_backend didn't work.  If that is fixed, great.


> regards, tom lane
>


Re: [HACKERS] More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

2015-12-30 Thread Benedikt Grundmann
On Wed, Dec 30, 2015 at 7:16 AM, David Rowley 
wrote:

>
> A number of ideas were suggested on the other thread about how we might go
> about solving this problem. In [3] Simon talked about perhaps enabling
> extra optimisations when the planner sees that the plan will cost more than
> some given threshold. That's perhaps an option, but may not work well for
> optimisations which must take place very early in planning, for example [4].
>

A small tweak on [3] to deal with [4]: if the returned plan cost is quite
high (say you estimate minutes+), you could just restart planning from
scratch with all costly planning options enabled, because even in the worst
case (that is, the additional options don't find a better plan) the total
planning cost won't matter much in the grand scheme of things.


Re: [HACKERS] Proposal: Incremental Backup

2014-08-08 Thread Benedikt Grundmann
On Thu, Aug 7, 2014 at 6:29 PM, Gabriele Bartolini 
gabriele.bartol...@2ndquadrant.it wrote:

 Hi Marco,

  With the current full backup procedure they are backed up, so I think
  that having them backed up with a rsync-like algorithm is what an user
  would expect for an incremental backup.

 Exactly. I think a simple, flexible and robust method for file based
 incremental backup is all we need. I am confident it could be done for
 9.5.

 I would like to quote every single word Simon said. Block level
 incremental backup (with Robert's proposal) is definitely the ultimate
 goal for effective and efficient physical backups. I see file level
 incremental backup as a very good compromise, a sort of intermediate
 release which could nonetheless produce a lot of benefits to our user
 base, for years to come too.

 Thanks,
 Gabriele


I haven't been following this discussion closely at all.  But at Jane Street
we have been using pg_start_backup together with rsync --link-dest (onto a
big NFS) to achieve incrementally stored backups.  In our experience this
works very well; it is however advisable to look into whatever is used to
serve the NFS, as we had to set some options to increase the maximum number
of hardlinks.

Cheers,

Bene



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



[HACKERS] Is there a way to temporarily disable a index

2014-07-11 Thread Benedikt Grundmann
That is, is it possible to tell the planner that an index is off limits,
i.e. that it should never generate a plan using it?

Rationale: schema changes on big tables.  I might have convinced myself /
hold strong beliefs that for all queries that I need to be fast the planner
does not need to use a given index (e.g. other possible plans are fast
enough).  However if I just drop the index and it turns out I'm wrong, I
might be in a world of pain because it might just take way too long to
recreate the index.

I know that I can use pg_stat* to figure out if an index is used at all.
But in the presence of multiple indices and complex queries the planner
might prefer the index-to-be-dropped even when the difference to the
alternatives available is immaterial.

The current best alternative we have is to test such changes on a testing
database that gets regularly restored from production.  However at least in
our case we simply don't know all possible queries (and logging all of them
is not an option).
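
One partial workaround (it does not really answer the question, and it takes
an ACCESS EXCLUSIVE lock on the table for the duration of the transaction) is
to drop the index inside a transaction, look at the plans you care about, and
roll back; a sketch reusing the toy_data3 table and index from another thread
in this archive:

begin;
drop index toy_data_date3;     -- the planner can no longer see it
explain select * from toy_data3 where the_date = current_date;
rollback;                      -- the index is back, nothing to rebuild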

Cheers,

Bene


Re: [HACKERS] gettimeofday is at the end of its usefulness?

2014-05-15 Thread Benedikt Grundmann
I posted this on this mailing list before: at Jane Street we have developed
very fast code to get timing information based on the TSC if available.  It's
all OCaml but well documented and mostly just calls to C functions, so it
should be easy to port to C, and we release it under a very liberal license
so it should be no problem to take the ideas:

https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli

Hope this is useful.

Bene


On Wed, May 14, 2014 at 12:41 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, May 13, 2014 at 11:34 PM, Greg Stark st...@mit.edu wrote:
  I always assumed the kernel used rdtsc to implement some of the high
  performance timers. It can save the current time in a mapped page when
  it schedules a process and then in the vdso syscall (ie in user-space)
  it can use rdtsc to calculate the offset needed to adjust that
  timestamp to the current time. This seems consistent with your
  calculations that showed the 40ns overhead with +/- 10ns precision.

 Crazy idea: Instead of trying to time precisely the amount of time we
 spend in each node, configure a very-high frequency timer interrupt
 (or background thread?) that does:

 SomeGlobalVariablePointingToTheCurrentNode-profiling_counter++;

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] gettimeofday is at the end of its usefulness?

2014-05-15 Thread Benedikt Grundmann
On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann 
bgrundm...@janestreet.com wrote:

 I posted this on this mailing list before at Jane Street we have developed
 very fast code to get timing information based on TSC if available.  It's
 all ocaml but well documented and mostly just calls to c functions so
 should be easy to port to C and we release it under a very liberal license
 so it should be no problem to take the ideas:

 https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli

 Hope this is useful.

 Bene


Also I'm sorry for top posting.  hackers is the only mailing list I'm on
that requires this, and some others require top posting, so this runs counter
to my habits and I only realized after sending...




 On Wed, May 14, 2014 at 12:41 PM, Robert Haas robertmh...@gmail.comwrote:

 On Tue, May 13, 2014 at 11:34 PM, Greg Stark st...@mit.edu wrote:
  I always assumed the kernel used rdtsc to implement some of the high
  performance timers. It can save the current time in a mapped page when
  it schedules a process and then in the vdso syscall (ie in user-space)
  it can use rdtsc to calculate the offset needed to adjust that
  timestamp to the current time. This seems consistent with your
  calculations that showed the 40ns overhead with +/- 10ns precision.

 Crazy idea: Instead of trying to time precisely the amount of time we
 spend in each node, configure a very-high frequency timer interrupt
 (or background thread?) that does:

 SomeGlobalVariablePointingToTheCurrentNode-profiling_counter++;

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers





Re: [HACKERS] gettimeofday is at the end of its usefulness?

2014-05-15 Thread Benedikt Grundmann
On Thu, May 15, 2014 at 11:31 AM, Greg Stark st...@mit.edu wrote:

 On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann
 bgrundm...@janestreet.com wrote:
  I posted this on this mailing list before at Jane Street we have
 developed
  very fast code to get timing information based on TSC if available.  It's
  all ocaml but well documented and mostly just calls to c functions so
 should
  be easy to port to C and we release it under a very liberal license so it
  should be no problem to take the ideas:

 What OS do you run it on though? How fast is your implementation
 compared to the kernel implementation of clock_gettime()?

 Are you sure your implementation is actually faster? And are you sure
 you're protected against clocks going backwards? I think you should
 put some i/o in the loop in the test and start several threads running
 it to make it more likely the thread is rescheduled to a different
 processor during the test. It suspect you'll find the rdtsc goes
 backwards sometimes or produces crazy results when switching
 processors.


There are benchmarks in the link I posted (obtained by a micro-benchmarking
library we developed / use internally which takes great care to obtain
reliable numbers).  We use POSIX threads extensively.  We internally spend a
lot of time setting up ntp and monitoring systems so that clocks going
backwards never happens (in other words I wouldn't be surprised if the
library does NOT work correctly when it does -- our protection is outside).
I do not believe we have seen the rdtsc going backwards on the thread context
switch you mention (and as said we use lots of threads).  OS?  CentOS 6.5
primarily.







--
 greg



[HACKERS] How to do fast performance timing

2013-12-09 Thread Benedikt Grundmann
At Jane Street we have recently spent a lot of time trying to get a fast
gettimeofday.  I saw lots of references in various postgres hacker threads
related to the lack of such a facility, so ...

The culmination of those efforts can be read here:

https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli
and

https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.ml

It's all OCaml but the code is mostly imperative and very well documented.
In particular we made an effort to document our assumptions.  There are a
few which are OCaml-specific, but a lot of the lessons we have learned here
should be applicable to postgres.

Hope this will be useful,

Cheers,

Bene

PS: We are releasing our code under the Apache license so you should feel
free to reuse the ideas.


Re: [HACKERS] record identical operator

2013-09-13 Thread Benedikt Grundmann
On Thu, Sep 12, 2013 at 11:27 PM, Kevin Grittner kgri...@ymail.com wrote:

 Attached is a patch for a bit of infrastructure I believe to be
 necessary for correct behavior of REFRESH MATERIALIZED VIEW
 CONCURRENTLY as well as incremental maintenance of matviews.
 [...]
 The patch adds an identical operator (===) for the record type:


[...]

 The new operator is logically similar to IS NOT DISTINCT FROM for a
 record, although its implementation is very different.  For one
 thing, it doesn't replace the operation with column level operators
 in the parser.  For another thing, it doesn't look up operators for
 each type, so the identical operator does not need to be
 implemented for each type to use it as shown above.  It compares
 values byte-for-byte, after detoasting.  The test for identical
 records can avoid the detoasting altogether for any values with
 different lengths, and it stops when it finds the first column with
 a difference.

 I toyed with the idea of supporting hashing of records using this
 operator, but could not see how that would be a performance win.

 The identical (===) and not identical (!==) operator names were
 chosen because of a vague similarity to the exactly equals
 concepts in JavaScript and PHP, which use that name.  The semantics
 aren't quite the same, but it seemed close enough not to be too
 surprising.  The additional operator names seemed natural to me
 based on the first two, but I'm not really that attached to these
 names for the operators if someone has a better idea.

 Since the comparison of record values is not documented (only
 comparisons involving row value constructors), it doesn't seem like
 we should document this special case.  It is intended primarily for
 support of matview refresh and maintenance, and it seems likely
 that record comparison was not documented on the basis that it is
 intended primarily for support of such things as indexing and merge
 joins -- so leaving the new operators undocumented seems consistent
 with existing policy.  I'm open to arguments that the policy should
 change.

 -


Wouldn't it be slightly less surprising / magical to not declare new
operators but just new primitive functions?  In the least invasive version
they could even be called matview_is_record_identical or similar.
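
To make the distinction concrete: the cases the patch targets are values that
compare as equal but are not byte-for-byte identical, e.g. numeric values
with a different display scale.  A quick sketch (no patch needed to see the
"equal" half):

select '1.0'::numeric = '1.00'::numeric as equal,                      -- true
       '1.0'::numeric::text = '1.00'::numeric::text as identical_text; -- false
-- equal, yet stored and displayed differently: exactly the difference the
-- proposed byte-wise "identical" comparison is meant to detect for records.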

cheers,

Bene


Re: [HACKERS] Backup throttling

2013-08-27 Thread Benedikt Grundmann
On Tue, Aug 27, 2013 at 12:58 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Aug 20, 2013 at 2:37 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  Throttling in the client seems much better to me. TCP is designed to
 handle
  a slow client.

 Other people have already offered some good points in this area, but
 let me just add one thought that I don't think has been mentioned yet.
  We have a *general* need to be able to throttle server-side resource
 utilization, particularly I/O.  This is a problem not only for
 pg_basebackup, but for COPY, CLUSTER, VACUUM, and even things like
 UPDATE.  Of all of those, the only one for which we currently have any
 kind of a solution is VACUUM.  Now, maybe pg_basebackup also needs its
 own special-purpose solution, but I think we'd do well to consider a
 general I/O rate-limiting strategy and then consider particular needs
 in the light of that framework.  In that context, server-side seems
 better to me, because something like CLUSTER isn't going to produce
 anything that the client can effectively limit.


+1.  It is very easy at the moment to, for example, run a manual VACUUM
FULL / CLUSTER against a big table and generate WAL so quickly that the hot
standby disconnects because it gets too far behind.
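
For context, the VACUUM-only throttle Robert refers to above is the
cost-based delay machinery; a minimal sketch of slowing down a manual run,
with a made-up table name:

set vacuum_cost_delay = 20;      -- sleep 20ms whenever the cost budget is used up
set vacuum_cost_limit = 200;     -- cost budget accumulated between sleeps
vacuum verbose some_big_table;   -- some_big_table is hypothetical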


Re: [HACKERS] hardware donation

2013-07-19 Thread Benedikt Grundmann
The server is already turned off and in our NYC office (I'm based in the
LDN one).  But I'm pretty sure it's an LSI MegaRAID SAS 9285.


On Thu, Jul 18, 2013 at 11:58 PM, Greg Smith g...@2ndquadrant.com wrote:

 On 7/10/13 12:53 PM, Benedikt Grundmann wrote:

 The server will probably be most interesting for the disks in it.  That
 is where we spend the largest amount of time optimizing (for sequential
 scan speed in particular):
 22x600GB disks in a Raid6+0 (Raid0 of 2x 10disk raid 6 arrays) + 2 spare
 disks. Overall size 8.7 TB in that configuration.


 What is the RAID controller used in the server?  That doesn't impact the
 donation, I'm just trying to fit this one into my goals for finding useful
 community performance testing equipment.

 There are a good number of systems floating around the community with HP
 controllers--I have even one myself now--but we could use more LSI Logic
 and Adaptec based systems.

 --
 Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



[HACKERS] column b is of type X but expression is of type text

2013-07-12 Thread Benedikt Grundmann
A third party application we use generates SQL queries.  Here is a query it
generated that broke today and for which I have a hard time arguing that the
postgres behavior is correct (minimally, the error message is confusing):

=# create temporary table foo (b double precision );
CREATE TABLE
Time: 40.368 ms
=# insert into foo select min(NULL);
ERROR:  column b is of type double precision but expression is of type
text
LINE 1: insert into foo select min(NULL);
   ^
HINT:  You will need to rewrite or cast the expression.

So why does min(NULL) have type text?  According to the docs it has the type
of the input.  The value is itself NULL, which is a valid member of all types
in SQL, isn't it?

So what is going on?
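
For the archive: the rewrite the HINT asks for is just an explicit cast,
either on the NULL itself or on the aggregate's result; a sketch against the
foo table above:

insert into foo select min(NULL::double precision);
-- or
insert into foo select min(NULL)::double precision;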

Thanks,

Bene


Re: [HACKERS] column b is of type X but expression is of type text

2013-07-12 Thread Benedikt Grundmann
Thanks David,

I like the fact that postgres is explicit in its types.  All I'm arguing is
that the error message is misleading, and that I had a hard time
understanding why what happened happened.  The part I was missing is that
despite supporting an "any" type, the necessary type inference is very local
and quickly resorts to the default type.

thanks everyone,

Bene


On Fri, Jul 12, 2013 at 3:17 PM, David Johnston pol...@yahoo.com wrote:

 Benedikt Grundmann wrote
  A third party application we use generates SQL queries.  Here is query it
  generated that broke today and for which I have a hard time arguing that
  the postgres behavior is correct (minimally the error message is
  confusing):
 
  =# create temporary table foo (b double precision );
  CREATE TABLE
  Time: 40.368 ms
  =# insert into foo select min(NULL);
  ERROR:  column b is of type double precision but expression is of type
  text
  LINE 1: insert into foo select min(NULL);
 ^
  HINT:  You will need to rewrite or cast the expression.
 
  So why does min(NULL) have type text?  According to the docs it has the
  type of the input.  The value is itself NULL which is a valid member of
  all
  types in SQL isn't it?
 
  So what is going on?
 
  Thanks,
 
  Bene

 Ideally PostgreSQL would be smart enough to recognize that min(NULL) is
 of
 an unknown type and thus would use the definition of foo to coerce NULL
 to
 the desired type.  I cannot explain why it does not do this but from the
 example it cannot.

 Using a literal NULL without an explicit type-cast is not recommended as
 the
 system cannot always accurately figure out what type you mean for it to
 use.
 Being a valid value for all types does not mean it magically switches to
 fit
 whatever usage is required.  Columns are typed, not values per-se, and so
 NULL can belong in any column but once it is part of that column it takes
 on
 that column's type.

 The query you show is pretty pointless since the intent of min is to take
 a column over which to aggregate; not a literal which will only ever return
 itself.

 In short the SELECT query is trying its best to execute and so in the
 presence of an unadorned NULL - and being unable to infer the type from
 context - it simply uses the default type which is text.  The SELECT
 executes just fine, and outputs a min column of type text which when
 supplied to the table foo causes the type mis-match for column b on
 foo.

 The PostgreSQL behavior is simple because it does not infer the type of
 NULL from the column in foo but it is not required to do so its failure is
 not wrong.  The error message, given what does occur, makes perfect sense
 and is easy enough to trace (i.e., what column is feeding foo.b from the
 SELECT statement; then, why is that column being seen as text).

 PostgreSQL is in the opinion of some too verbose in its requirement to be
 explicit regarding types but it does make for less buggy code overall.
  This
 particular use-case may be solvable but I'd argue that your example is not
 likely to convince anyone that it is a serious enough problem worth the
 effort it would take to do so.

 David J.






 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/column-b-is-of-type-X-but-expression-is-of-type-text-tp5763586p5763587.html
 Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



[HACKERS] hardware donation

2013-07-10 Thread Benedikt Grundmann
Jane Street has a spare server we would like to donate to the postgres
community.  We originally planed to use it for one of our database clusters
and it matches exactly what we use in production at the moment.

Rough specs:
CPU: 8x Intel(R) Xeon(R) CPU   X5570  @ 2.93GHz
MEM: 48GB

The server will probably be most interesting for the disks in it.  That is
where we spend the largest amount of time optimizing (for sequential scan
speed in particular):

22x600GB disks in a Raid6+0 (Raid0 of 2x 10disk raid 6 arrays) + 2 spare
disks. Overall size 8.7 TB in that configuration.

Is this something the community would find useful?  If so, who is the right
person to talk to about shipping/delivery?

Cheers,

Bene


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Benedikt Grundmann
On Fri, Jun 14, 2013 at 10:11 AM, Samrat Revagade revagade.sam...@gmail.com
 wrote:

 Hello,


 We have already started a discussion on pgsql-hackers for the problem of
 taking fresh backup during the failback operation here is the link for that:




 http://www.postgresql.org/message-id/caf8q-gxg3pqtf71nvece-6ozraew5pwhk7yqtbjgwrfu513...@mail.gmail.com



 Let me again summarize the problem we are trying to address.



 When the master fails, last few WAL files may not reach the standby. But
 the master may have gone ahead and made changes to its local file system
 after flushing WAL to the local storage.  So master contains some file
 system level changes that standby does not have.  At this point, the data
 directory of master is ahead of standby's data directory.

 Subsequently, the standby will be promoted as new master.  Later when the
 old master wants to be a standby of the new master, it can't just join the
 setup since there is inconsistency in between these two servers. We need to
 take the fresh backup from the new master.  This can happen in both the
 synchronous as well as asynchronous replication.



 Fresh backup is also needed in case of clean switch-over because in the
 current HEAD, the master does not wait for the standby to receive all the
 WAL up to the shutdown checkpoint record before shutting down the
 connection. Fujii Masao has already submitted a patch to handle clean
 switch-over case, but the problem is still remaining for failback case.



 The process of taking fresh backup is very time consuming when databases
 are of very big sizes, say several TB's, and when the servers are connected
 over a relatively slower link.  This would break the service level
 agreement of disaster recovery system.  So there is need to improve the
 process of disaster recovery in PostgreSQL.  One way to achieve this is to
 maintain consistency between master and standby which helps to avoid need
 of fresh backup.



 So our proposal on this problem is that we must ensure that master should
 not make any file system level changes without confirming that the
 corresponding WAL record is replicated to the standby.



An alternative proposal (which will probably just reveal my lack of
understanding about what is or isn't possible with WAL): provide a way to
restart the master so that it rolls back the WAL changes that the slave
hasn't seen.

 There are many suggestions and objections pgsql-hackers about this problem
 The brief summary is as follows:





Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Benedikt Grundmann
On Fri, Jun 14, 2013 at 2:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Marko Kreen mark...@gmail.com writes:
  On Thu, Jun 13, 2013 at 12:35 PM, Dean Rasheed dean.a.rash...@gmail.com
 wrote:
  Attached is a patch implementing a new aggregate function md5_agg() to
  compute the aggregate MD5 sum across a number of rows.

  It's more efficient to calculate per-row md5, and then sum() them.
  This avoids the need for ORDER BY.

 Good point.  The aggregate md5 function also fails to distinguish the
 case where we have 'xyzzy' followed by 'xyz' in two adjacent rows
 from the case where they contain 'xyz' followed by 'zyxyz'.

 Now, as against that, you lose any sensitivity to the ordering of the
 values.

 Personally I'd be a bit inclined to xor the per-row md5's rather than
 sum them, but that's a small matter.

 regards, tom lane


xor works, but only if each row is different (i.e. at the very least all
columns together make a unique key).
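
For what it's worth, a sketch of the per-row-hash-then-aggregate approach
being discussed (some_table is a made-up name; the ('x' || ...)::bit(64)::bigint
dance just reinterprets the first 64 bits of the hex digest as an integer):

select sum(('x' || substr(md5(t::text), 1, 16))::bit(64)::bigint) as fingerprint
from some_table as t;
-- order-insensitive like sum(); unlike xor, two identical rows do not
-- cancel each other out, though they are still indistinguishable from
-- other collisions of the sum.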





 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-28 Thread Benedikt Grundmann
, Benedikt Grundmann
 bgrundm...@janestreet.com wrote:
  We are seeing these errors on a regular basis on the testing box now.
  We
  have even changed the backup script to
  shutdown the hot standby, take lvm snapshot, restart the hot standby,
 rsync
  the lvm snapshot.  It still happens.
 
  We have never seen this before we introduced the hot standby.  So we
 will
  now revert to taking the backups from lvm snapshots on the production
  database.  If you have ideas of what else we should try / what
 information
  we can give you to debug this let us know and we will try to so.
 
  Until then we will sadly operate on the assumption that the combination
 of
  hot standby and frozen snapshot backup of it is not production ready.

 I'm pretty suspicious that your backup procedure is messed up in some
 way.  The fact that you got invalid page headers is really difficult
 to attribute to a PostgreSQL bug.  A number of the other messages that
 you have posted also tend to indicate either corruption, or that WAL
 replay has stopped early.  It would be interesting to see the logs
 from when the clone was first started up, juxtaposed against the later
 WAL flush error messages.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company





Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-21 Thread Benedikt Grundmann
We are seeing these errors on a regular basis on the testing box now.  We
have even changed the backup script to shut down the hot standby, take the
LVM snapshot, restart the hot standby, and rsync the LVM snapshot.  It still
happens.

We have never seen this before we introduced the hot standby.  So we will
now revert to taking the backups from LVM snapshots on the production
database.  If you have ideas of what else we should try / what information
we can give you to debug this, let us know and we will try to do so.

Until then we will sadly operate on the assumption that the combination of
hot standby and frozen snapshot backup of it is not production ready.

Thanks,

Bene




On Thu, May 16, 2013 at 8:10 AM, David Powers dpow...@janestreet.comwrote:

 I'll try to get the primary upgraded over the weekend when we can afford a
 restart.

 In the meantime I have a single test showing that a shutdown, snapshot,
 restart produces a backup that passes the vacuum analyze test.  I'm going
 to run a full vacuum today.

 -David


 On Wed, May 15, 2013 at 3:53 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 On 15.05.2013 22:50, Benedikt Grundmann wrote:

 On Wed, May 15, 2013 at 2:50 PM, Heikki Linnakangashlinnakangas@**
 vmware.com hlinnakan...@vmware.com

 The subject says 9.2.3. Are you sure you're running 9.2.4 on all the

 servers? There was a fix to a bug related to starting a standby server
 from
 a filesystem snapshot. I don't think it was quite the case you have, but
 pretty close.


 So this is delightfully embarrassing I just went back to double check and

 - primary box is 9.2.3
 - standby is 9.2.4
 - testing is 9.2.4

 I guess that alone could possibly explain it?


 Hmm, no, it should still work. There haven't been any changes in the WAL
 format. I do recommend upgrading the primary, of course, but I don't really
 see how that would explain what you're seeing.

 - Heikki





Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-15 Thread Benedikt Grundmann
On Wed, May 15, 2013 at 2:50 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 15.05.2013 15:42, David Powers wrote:

 First, thanks for the replies.  This sort of thing is frustrating and hard
 to diagnose at a distance, and any help is appreciated.

 Here is some more background:

 We have 3 9.2.4 databases using the following setup:


 The subject says 9.2.3. Are you sure you're running 9.2.4 on all the
 servers? There was a fix to a bug related to starting a standby server from
 a filesystem snapshot. I don't think it was quite the case you have, but
 pretty close.


So this is delightfully embarrassing.  I just went back to double-check, and:

- primary box is 9.2.3
- standby is 9.2.4
- testing is 9.2.4

I guess that alone could possibly explain it?

Thanks,

Bene


[HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-14 Thread Benedikt Grundmann
Today we have seen this on our testing database instance:

ERROR:  could not open file base/16416/291498116.3 (target block 431006):
No such file or directory

That database gets created by rsyncing the LVM snapshot of the standby,
which is a read-only backup of proddb using streaming replication.

We do not put the standby database into backup mode before rsyncing the LVM
snapshot, trusting postgres' ability to recover from a crash.  Now we are not
sure anymore...  Thoughts?  It's worth noting that we have been using this
method of taking a backup for a long, long time, but only recently (after 9.2
came out) did we switch to doing a streaming replica and then taking the
frozen snapshot backup from the streaming replica.

Let me know if there is more information you need, or if this is a known
problem.

Thanks,

Bene


Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-14 Thread Benedikt Grundmann
It's on the production database and the streaming replica.  But not on the
snapshot.

production
-rw--- 1 postgres postgres 312778752 May 13 21:28
/database/postgres/base/16416/291498116.3

streaming replica
-rw--- 1 postgres postgres 312778752 May 13 23:50
/database/postgres/base/16416/291498116.3

Is there a way to find out what the file contains?

We just got some more information.  All of the following was done / seen in
the logs of the snapshot database.

After we saw this we ran a VACUUM FULL on the table we suspect to be backed
by this file.  This happened:


WARNING:  concurrent insert in progress within table js_equity_daily_diff


Over and over again.  So we killed the VACUUM full by hitting ctrl-c again
and again.  After that the logs contained:

fgrep ERROR postgresql-2013-05-14.csv  | head
2013-05-14 00:23:16.028 EDT,postgres_prod,proddb_testing,15698,
172.24.65.103:54536,5191bc31.3d52,3,SELECT,2013-05-14 00:23:13
EDT,2/3330,0,ERROR,58P01,could not open file base/16416/291498116.3
(target block 431006): No such file or directory,SQL statement
select max(equity_date) from js_equity_daily_diff

The above was the original error that made us investigate.

2013-05-14 09:21:47.121 EDT,postgres_prod,proddb_testing,21002,
172.27.41.24:36815,51923a67.520a,38756,VACUUM,2013-05-14 09:21:43
EDT,2/21611,645995272,ERROR,57014,canceling statement due to user
request,,vacuum full js_equity_daily_diff,,,psql
2013-05-14 09:22:04.700 EDT,postgres_prod,proddb_testing,21008,
172.27.41.24:36831,51923a75.5210,115909,VACUUM,2013-05-14 09:21:57
EDT,2/21618,645995273,ERROR,57014,canceling statement due to user
request,,vacuum full js_equity_daily_diff,,,psql

That was us cancelling the vacuum full.

2013-05-14 09:22:13.947 EDT,,,30911,,51919d78.78bf,1,,2013-05-13 22:12:08
EDT,,0,ERROR,XX000,xlog flush request 1D08/9B57FCD0 is not satisfied ---
flushed only to 1CEE/31266090,writing block 0 of relation
base/16416/291498116
2013-05-14 09:22:14.964 EDT,,,30911,,51919d78.78bf,2,,2013-05-13 22:12:08
EDT,,0,ERROR,XX000,xlog flush request 1D08/9B57FCD0 is not satisfied ---
flushed only to 1CEE/31266090,writing block 0 of relation
base/16416/291498116

And after that these started appearing in the logs (and they get repeated
every second now):

[root@nyc-dbc-001 pg_log]# fgrep ERROR postgresql-2013-05-14.csv  | tail -n
2
2013-05-14 09:47:43.301 EDT,,,30911,,51919d78.78bf,3010,,2013-05-13
22:12:08 EDT,,0,ERROR,XX000,xlog flush request 1D08/9B57FCD0 is not
satisfied --- flushed only to 1CEE/3C869588,writing block 0 of
relation base/16416/291498116
2013-05-14 09:47:44.317 EDT,,,30911,,51919d78.78bf,3012,,2013-05-13
22:12:08 EDT,,0,ERROR,XX000,xlog flush request 1D08/9B57FCD0 is not
satisfied --- flushed only to 1CEE/3C869588,writing block 0 of
relation base/16416/291498116

There are no earlier ERROR's in the logs.

2013-05-14 09:38:03.115 EDT,,,30911,,51919d78.78bf,1868,,2013-05-13
22:12:08 EDT,,0,ERROR,XX000,xlog flush request 1D08/9B57FCD0 is not
satisfied --- flushed only to 1CEE/3C869588,writing block 0 of
relation base/16416/291498116
2013-05-14 09:38:03.115 EDT,,,30911,,51919d78.78bf,1869,,2013-05-13
22:12:08 EDT,,0,WARNING,58030,could not write block 0 of
base/16416/291498116,Multiple failures --- write error might be
permanent.

The disk is not full nor are there any messages in the kernel logs.

Cheers,

Bene



On Tue, May 14, 2013 at 9:27 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 14.05.2013 14:57, Benedikt Grundmann wrote:

 Today we have seen this on our testing database instance:

 ERROR:  could not open file base/16416/291498116.3 (target block
 431006):
 No such file or directory

 That database get's created by rsyncing the LVM snapshot of the standby,
 which is a readonly backup of proddb
 using streaming replication.

 We do not put the standby database into backup mode before rsyncing the
 LVM
 snapshot, trusting postgres ability to recover
 from crash.  Now we are not sure anymore...  Thoughts?  It's worth noting
 that we have been using this method of taking a backup
 for a long long time, but we only recently (after 9.2 came out we switched
 to doing a streaming replica and then to take the
 frozen snapshot backup method from the streaming replica).

 Let me know if there is more information you need, or if this is a known
 problem.


 That certainly should work. I'd suggest that you narrow down the problem a
 bit more. Is the file present in the original system? Anything else that
 might be related in the logs?

 - Heikki



Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-14 Thread Benedikt Grundmann
That's one possible explanation.  It's worth noting that we never saw this
before moving to the streaming-rep-first setup, and we have been using that
snapshot method for a long time.


On Tue, May 14, 2013 at 11:34 AM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 14.05.2013 16:48, Benedikt Grundmann wrote:

 It's on the production database and the streaming replica.  But not on the
 snapshot.


 So, the LVM snapshot didn't work correctly?

 - Heikki



Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-14 Thread Benedikt Grundmann
I think my previous message wasn't clear enough: I do *NOT* think that the
LVM snapshot is the culprit.

However I cannot discount it as one of the possibilities.  But I have no
evidence in either /var/log/messages or in dmesg that the LVM snapshot went
into a bad state AND we have been using this method for a long time.

The only thing that is *new* is that we took the snapshot from the streaming
replica.  So again my best guess as of now is that if the database crashes
while it is in streaming standby, an invalid disk state can result during the
following startup (in rare and as of now unclear circumstances).

You seem to be quite convinced that it must be LVM; can you elaborate why?

Thanks,

Bene




On Tue, May 14, 2013 at 12:09 PM, Benedikt Grundmann 
bgrundm...@janestreet.com wrote:

 That's one possible explanation.  It's worth noting that we haven't seen
 this before moving to streaming rep first and we have been using that
 method for a long time.


 On Tue, May 14, 2013 at 11:34 AM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 On 14.05.2013 16:48, Benedikt Grundmann wrote:

 It's on the production database and the streaming replica.  But not on
 the
 snapshot.


 So, the LVM snapshot didn't work correctly?

 - Heikki





Re: [HACKERS] json api WIP patch

2013-02-04 Thread Benedikt Grundmann
On Mon, Feb 4, 2013 at 4:10 PM, Andrew Dunstan and...@dunslane.net wrote:


 On 02/04/2013 10:47 AM, Robert Haas wrote:


 The SQL standards considerations seem worth thinking about, too.
 We've certainly gone through a lot of pain working toward eliminating
  => as an operator name, and if the SQL standard has commandeered ->
 for some purpose or other, I'd really rather not add to the headaches
 involved should we ever decide to reclaim it.



 OK, but I'd like to know what is going to be safe. There's no way to
  future-proof the language. I'm quite prepared to replace -> with something
  else, and if I do then ->> will need to be adjusted accordingly, I think.

  My suggestion would be ~> and ~>>. I know David Wheeler didn't like that
 on the ground that some fonts elevate ~ rather than aligning it in the
 middle as most monospaced fonts do, but I'm tempted just to say then use a
  different font. Other possibilities that come to mind are +> and +>>,
 although I think they're less attractive. But I'll be guided by the
 consensus, assuming there is one ;-)

 As a user I would be much in favor of just functions and no additional
operators if the sole difference is syntactical.  I think custom operators
are much harder to remember than function names (assuming reasonably well
chosen function names).

Now Robert seems to suggest that there will also be a speed / planner
difference, which seems sad (I would have expected operators to be just
syntactic sugar for specially named functions, and once we are past the
parser there should be no difference).
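
To make the comparison concrete, the two spellings look roughly like this (a
sketch only; json_get is an illustrative name, not necessarily what the patch
provides, and the -> operator is the one under discussion):

-- operator spelling under discussion
SELECT '{"a": {"b": 1}}'::json -> 'a';

-- function-only spelling
SELECT json_get('{"a": {"b": 1}}'::json, 'a');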


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-01-09 Thread Benedikt Grundmann
On Wed, Jan 9, 2013 at 2:01 AM, Josh Berkus j...@agliodbs.com wrote:

 All,

  Well, the problem of finding out the box's physical RAM is doubtless
  solvable if we're willing to put enough sweat and tears into it, but
  I'm dubious that it's worth the trouble.  The harder part is how to know
  if the box is supposed to be dedicated to the database.  Bear in mind
  that the starting point of this debate was the idea that we're talking
  about an inexperienced DBA who doesn't know about any configuration knob
  we might provide for the purpose.

For what it is worth, even if it is a dedicated database box 75% might be
way too high.  I remember investigating bad performance on our biggest
database server that in the end turned out to be caused by a too-high setting
of effective_cache_size.  From reading the code back then, my rationale for it
being too high was that the code that makes use of effective_cache_size
tries very hard to account for what the current query would do to the cache,
but doesn't take into account how many queries (on separate datasets!) are
currently being executed (and competing for the same cache).  On that box
we often have 100+ active connections, many looking at different big
datasets.
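
For anyone wanting to reproduce the effect, effective_cache_size can be
changed per session, so the planner's choice can be re-checked under a
smaller assumed cache (a sketch only; the table name is made up):

SHOW effective_cache_size;
SET effective_cache_size = '4GB';   -- pretend each query only sees a slice of the cache
EXPLAIN SELECT * FROM some_big_table WHERE some_col = 42;
RESET effective_cache_size;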

Cheers,

bene


Re: [HACKERS] Improve compression speeds in pg_lzcompress.c

2013-01-08 Thread Benedikt Grundmann
 Personally, my biggest gripe about the way we do compression is that
 it's easy to detoast the same object lots of times.  More generally,
 our in-memory representation of user data values is pretty much a
 mirror of our on-disk representation, even when that leads to excess
 conversions.  Beyond what we do for TOAST, there's stuff like numeric
 where we not only toast but then post-process the results into yet
 another internal form before performing any calculations - and then of
 course we have to convert back before returning from the calculation
 functions.  And for things like XML, JSON, and hstore we have to
 repeatedly parse the string, every time someone wants to do anything
 with it.  Of course, solving this is a very hard problem, and not
 solving it isn't a reason not to have more compression options - but
 more compression options will not solve the problems that I personally
 have in this area, by and large.

At the risk of saying something totally obvious and stupid, as I haven't
looked at the actual representation: this sounds like a memoisation
problem.  In OCaml terms:

type 'a rep =
  | On_disk_rep of Byte_sequence.t
  | In_memory_rep of 'a

type 'a t = 'a rep ref

let get_mem_rep t converter =
  match !t with
  | On_disk_rep seq ->
    (* convert once, then cache the in-memory representation *)
    let res = converter seq in
    t := In_memory_rep res;
    res
  | In_memory_rep x -> x
;;

... (if you need the other direction, that's straightforward too) ...

Translating this into C is relatively straightforward if you have the
luxury of a fresh start and don't have to be super efficient:

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
  rep_kind_t rep_kind;
  union {
    char *on_disk;
    void *in_memory;
  } rep;
} t;

void *get_mem_rep(t *t, void *(*converter)(char *)) {
  void *res;
  switch (t->rep_kind) {
    case ON_DISK_REP:
      /* convert once, then cache the in-memory representation */
      res = converter(t->rep.on_disk);
      t->rep.in_memory = res;
      t->rep_kind = IN_MEMORY_REP;
      return res;
    case IN_MEMORY_REP:
      return t->rep.in_memory;
  }
  return NULL; /* not reached */
}

Now of course fitting this into the existing types and ensuring that there
is neither too-early freeing of memory nor memory leaks or other bugs is
probably a nightmare, and presumably why you said that this is a hard problem.

Cheers,

Bene


Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-08-29 Thread Benedikt Grundmann
On Tue, Aug 28, 2012 at 9:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
  On Tue, Aug 28, 2012 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Oh, I'd forgotten that worked that way.  Frankly, that makes me quite a
  bit more concerned about this proposal than I was before.  I do *not*
  want to re-introduce silent cross-category casts to text, not even if
  there's no other way to match the function/operator.  I think that hack
  was/is tolerable for actual assignment to a table column, because there
  is very little chance that the semantics of such an assignment will come
  out differently than the user expected.

  Well, I think that when there is only one LPAD function, there is also
  very little chance that the results will come out differently than the
  user expected.

 [ shrug... ]  I'm having a hard time resisting the temptation to point
 out that there are two.  The real point here though is that the proposed
 behavior change will affect all functions, not only the cases where you
 think there is only one sane behavior.  And features such as search paths
 and default parameters frequently mean that there are more potential
 matches than the user thought of while writing the query.

 In the end, SQL is a fairly strongly typed language, especially in our
 manifestation of it.  I don't think we should give that up, especially
 not for benefits as dubious as not having to write a cast to make it
 clear that yes you really do want a timestamp to be treated as text.
 IMO, saving people from the errors that inevitably arise from that sort
 of sloppy thinking is a benefit, not a cost, of having a typed language.

 regards, tom lane

+a very big number

I remember the pain we had when we upgraded from 8.1 to 8.4, but I also
distinctly remember that after the upgrade I was a little bit more
confident that our SQL code does the right thing.  But we are an OCaml shop;
if there is one thing we believe in with ferocity it is that a STRICT type
checker is a good thing (TM).  You pay a little verbosity tax, but in return
all the stupid little obvious bugs get caught, and maybe even more
importantly, when you later change your types you are forced to reconsider
all cases where you used a value of the (now different) type (and that is
A VERY GOOD THING in a big code base).  Admittedly we are not there yet in
Postgres, as functions are only (re)checked upon execution.

My 2cents,

Bene


[HACKERS] postgres 8.4.9: running out of memory (malloc fails) when running a transaction that runs a LOT of selects

2012-07-20 Thread Benedikt Grundmann
We yesterday encountered a program that in a degenerate case issued a
huge number of selects in a single transaction (each select in a separate
call to PGExec) (huge = ~400,000).

That transaction would continue to eat memory up until a point where
calls to malloc (in aset.c) would fail and log for example:

,out of memory,Failed on request of size 11.

Both from that transaction and random others.  An additional observation
of interest to us was that while both VIRT and RES were growing, VIRT was
always roughly double RES.  Which seems to indicate that whatever
allocations were done, not all of the memory allocated was actually
touched (the server is a CentOS 6 box).

So I have two questions:

   - Is that expected behaviour?  The transaction was
 in READ COMMITTED mode, and my best guess is that this implies
 that some snapshot is taken before each subselect and all
 of them are only freed once the transaction is finished

   - Any recommendations on the use of overcommit?  We had it
 disabled on the assumption that with overcommit the OOM
 killer might kill a random process and that it is better
 to instead have a process that is actually allocating fail
 (and on the additional assumption that any memory allocated
 by postgres would actually be touched).

This is not a huge production issue for us as we can fix the
program to no longer issue huge numbers of selects.  But we
would like to understand.

Thank you very much,

Bene

PS:
The machine has huge amounts of memory and during normal operations
it looks like this:

-bash-4.1$ cat /proc/meminfo
MemTotal:   49413544 kB
MemFree: 1547604 kB
Buffers:5808 kB
Cached: 43777988 kB
SwapCached:0 kB
Active: 18794732 kB
Inactive:   27309980 kB
Active(anon):   13786796 kB
Inactive(anon):  1411296 kB
Active(file):5007936 kB
Inactive(file): 25898684 kB
Unevictable:   71928 kB
Mlocked:   55576 kB
SwapTotal:   2047992 kB
SwapFree:2047992 kB
Dirty: 12100 kB
Writeback: 79684 kB
AnonPages:   2393372 kB
Mapped: 12887392 kB
Shmem:  12871676 kB
Slab:1050468 kB
SReclaimable: 190068 kB
SUnreclaim:   860400 kB
KernelStack:4832 kB
PageTables:   450584 kB
NFS_Unstable:  23312 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:26754764 kB
Committed_AS:   17394312 kB
VmallocTotal:   34359738367 kB
VmallocUsed:  120472 kB
VmallocChunk:   34333956900 kB
HardwareCorrupted: 0 kB
AnonHugePages:   1599488 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
DirectMap4k:5604 kB
DirectMap2M: 2078720 kB
DirectMap1G:48234496 kB



Re: [HACKERS] postgres 8.4.9: running out of memory (malloc fails) when running a transaction that runs a LOT of selects

2012-07-20 Thread Benedikt Grundmann
First of all thanks to everyone who has replied so far.

On Fri, Jul 20, 2012 at 10:04 AM, Andres Freund and...@2ndquadrant.com wrote:

 Hi,

 On Friday, July 20, 2012 09:19:31 AM Benedikt Grundmann wrote:
  We yesterday encountered a program that in a degenerate case
  issued in a single transaction a huge number of selects (in a
  single transaction but each select in a separate call to PGExec)
  (huge = ~ 400,000).

  That transaction would continue to eat memory up until a point
  where calls to malloc (in aset.c) would fail and log for example:

  ,out of memory,Failed on request of size 11.
 Could you show us the different statements you're running in that transaction?

They all look like this:

DECLARE sqmlcursor51587 CURSOR FOR select
entry_time,source,bad_fields,isin,sedol,cusip,bloomberg,reuters,exchange_code,currency,description,bbg_instrument_type,instrument_type,specifics,definer,primary_exchange,is_primary_security,is_primary_listing,tags,bloomberg_id,status
from vw_instruments_v7 where jane_symbol = E'FOO BAR' and true and
effective_until = (select max(effective_until) from
vw_instruments_v7)

Sorry, I imagine that the fact that this generates a cursor every time
is important, but it had honestly escaped my attention, because the library
we use to query the database uses CURSORs for basically every select, so
that it can process the data in batches (in this particular case that is
conceptually unnecessary, as the query will only return one row, but the
library does not know that).
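
For reference, the pattern the library follows is roughly this (a sketch;
cursor and table names are made up), and the CLOSE at the end is the part
that matters for memory:

BEGIN;
DECLARE sqmlcursor1 CURSOR FOR SELECT * FROM some_table;
FETCH FORWARD 1000 FROM sqmlcursor1;   -- process rows in batches
FETCH FORWARD 1000 FROM sqmlcursor1;
CLOSE sqmlcursor1;                     -- without this the portal lives until COMMIT
COMMIT;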

 Are you using any user defined functions, deferred foreign keys, or anything
 like that?

No.  There are several versions of the above view, and while the one
mentioned above contains calls to regexp_replace, I can reproduce
the same behaviour with a different version of the view that just
renames columns of the underlying table.

 After you've got that out of memory message, the log should show
 a list of memory contexts with the amount of memory allocated in
 each. Could you include that in a mail?

We are using csv logging, which threw me off for a moment because I couldn't
find it in there.  But indeed in the .log file I see memory contexts, but
I don't know how to correlate them.

I assume they only get written when out of memory happens, so I have included
below the very first one.

TopMemoryContext: 268528136 total in 31 blocks; 37640 free (160
chunks); 268490496 used
  TopTransactionContext: 24576 total in 2 blocks; 14872 free (4
chunks); 9704 used
  Local Buffer Lookup Table: 2088960 total in 8 blocks; 234944 free
(25 chunks); 1854016 used
  Type information cache: 24576 total in 2 blocks; 11888 free (5
chunks); 12688 used
  Operator lookup cache: 24576 total in 2 blocks; 11888 free (5
chunks); 12688 used
  PL/PgSQL function context: 24576 total in 2 blocks; 14384 free (14
chunks); 10192 used
  CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0
chunks); 6512 used
  PLpgSQL function cache: 24520 total in 2 blocks; 3744 free (0
chunks); 20776 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  MessageContext: 131072 total in 5 blocks; 54792 free (5 chunks); 76280 used
  smgr relation table: 24576 total in 2 blocks; 5696 free (4 chunks); 18880 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0
chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (1 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 848 free (0 chunks); 176 used
  ExecutorState: 65600 total in 4 blocks; 33792 free (18 chunks); 31808 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 24576 total in 2 blocks; 11792 free (3 chunks); 12784 used
  CacheMemoryContext: 1342128 total in 21 blocks; 290016 free (11
chunks); 1052112 used
idx_raw_cfd_bear_commodity_position_records_position_date: 2048
total in 1 blocks; 752 free (0 chunks); 1296 used
CachedPlan: 3072 total in 2 blocks; 2008 free (2 chunks); 1064 used
CachedPlanSource: 3072 total in 2 blocks; 1656 free (0 chunks); 1416 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 3072 total in 2 blocks; 1984 free (1 chunks); 1088 used
CachedPlanSource: 3072 total in 2 blocks; 1696 free (0 chunks); 1376 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
CachedPlanSource: 3072 total in 2 blocks; 1584 free (0 chunks); 1488 used
SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
CachedPlan: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
CachedPlanSource: 3072 total in 2 blocks; 1960 free (3 chunks); 1112 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 1024 total in 1 blocks; 304 free (0 chunks); 720 used

Re: [HACKERS] postgres 8.4.9: running out of memory (malloc fails) when running a transaction that runs a LOT of selects

2012-07-20 Thread Benedikt Grundmann
On Fri, Jul 20, 2012 at 9:10 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 20.07.2012 10:19, Benedikt Grundmann wrote:

 We yesterday encountered a program that in a degenerate case issued in
 a single transaction a huge number of selects (in a single transaction
 but each select in a separate call to PGExec) (huge = ~ 400,000).

 That transaction would continue to eat memory up until a point where
 calls to malloc (in aset.c) would fail and log for example:

 ,out of memory,Failed on request of size 11.

 ...


 - Is that expected behaviour?  The transaction was
   in READ COMMITTED mode, and my best guess is that this implies
   that some snapshot is taken before each subselect and all
   of them are only freed once the transaction is finished


 In more complicated scenarios, with e.g subtransactions, it's normal that
 there's some growth in memory usage like that. But with simple SELECTs, I
 don't think there should be.

 Can you write a simple self-contained test script that reproduces this? That
 would make it easier to see where the memory is going.

Assuming it isn't now obvious (now that I have realized that we generate a
cursor every time), I will give it a shot.

 PS, you should upgrade, the latest minor version in 8.4.x series is 8.4.12.
 If possible, upgrading to a more recent major version would be a good idea
 too. I don't know if it will help with this problem, but it might..

We are in fact automatically doing an upgrade in testing to 9.1 every day
at the moment.  We plan to pull the trigger in production in a few weeks.

Thanks,

Bene



Re: [HACKERS] postgres 8.4.9: running out of memory (malloc fails) when running a transaction that runs a LOT of selects

2012-07-20 Thread Benedikt Grundmann
On Fri, Jul 20, 2012 at 10:46 AM, Benedikt Grundmann
bgrundm...@janestreet.com wrote:

 DECLARE sqmlcursor51587 CURSOR FOR select
 entry_time,source,bad_fields,isin,sedol,cusip,bloomberg,reuters,exchange_code,currency,description,bbg_instrument_type,instrument_type,specifics,definer,primary_exchange,is_primary_security,is_primary_listing,tags,bloomberg_id,status
 from vw_instruments_v7 where jane_symbol = E'FOO BAR' and true and
 effective_until = (select max(effective_until) from
 vw_instruments_v7)

 Sorry I imagine that the fact that this generates a cursor every time
 is important
 but it had honestly escaped my attention, because the library we use to query
 the database uses CURSORs basically for every select, so that it can process
 the data in batches (in this particular case that is conceptually unnecessary 
 as
 the query will only return one row, but the library does not know that).

Actually I believe this must be it.  I just went back and checked the library,
and it does not CLOSE the cursors.  This is normally not a problem, as most
transactions we run issue only one or two queries...  I'll patch the library
to CLOSE the cursor when all the data has been delivered and test whether the
error still happens.

I also noticed just now that all TopMemoryContexts after the first one
look significantly different.  They contain large PortalMemory sections.
Are those related to cursors?

 TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0
chunks); 32 used
  Portal hash: 8380416 total in 10 blocks; 3345088 free (34 chunks);
5035328 used
  PortalMemory: 16769024 total in 11 blocks; 2737280 free (15 chunks);
14031744 used
PortalHeapMemory: 56320 total in 9 blocks; 4320 free (0 chunks); 52000 used
  ExecutorState: 57344 total in 3 blocks; 15248 free (3 chunks); 42096 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
PortalHeapMemory: 56320 total in 9 blocks; 4320 free (0 chunks); 52000 used
  ExecutorState: 57344 total in 3 blocks; 15248 free (3 chunks); 42096 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
...


Thanks everyone,

Bene



Re: [HACKERS] postgres 8.4.9: running out of memory (malloc fails) when running a transaction that runs a LOT of selects

2012-07-20 Thread Benedikt Grundmann
On Fri, Jul 20, 2012 at 10:56 AM, Benedikt Grundmann
bgrundm...@janestreet.com wrote:
 On Fri, Jul 20, 2012 at 10:46 AM, Benedikt Grundmann
 bgrundm...@janestreet.com wrote:

 Actually I believe this must be it.  I just went back and checked the library
 and it does not CLOSE the cursors.  This is normally not a problem as most
 transactions we have run one or two queries only...  I'll patch the library
 to CLOSE the cursor when all the data has been delivered and test if the
 error does not happen then.

Just to confirm: that indeed fixed it.  Sorry for the noise.

Bene



Re: [HACKERS] Schema version management

2012-05-21 Thread Benedikt Grundmann
On Mon, May 21, 2012 at 5:03 AM, Joel Jacobson j...@trustly.com wrote:

 http://archives.postgresql.org/pgsql-hackers/2010-12/msg02301.php
 The initial feedback was on the usage of OIDs as file names.
 This was indeed a bad idea and was changed, see
 http://archives.postgresql.org/pgsql-hackers/2010-12/msg02318.php
 Gurjeet Singh pointed out the problem with functions sharing the same
 name but having different arguments.
 As of now, it's not certain they will always be dumped into the same files.
 This is a valid point, and needs to be solved in an elegant way.
 The arguments needs to be made part of the path somehow.

This is interesting: at Jane Street we actually have a small tool
that parses the output of pg_dump (well, it applies a set of regular
expressions plus a little bit of guesswork).  We use this to do three things,
all of which I would love to see supported by the postgres tool chain proper:

1) split the output into one file per thing (basically as per this
   thread) each file named type_name_running-integer for use
   with a VCS.  So if we have an overloaded function foo we end up with
   several function_foo_1.sql function_foo_2.sql ...  The order of the
   enumeration is just the order the functions occurred in the pg_dump
   which seems to be stable and therefore good enough.

2) extract a patch.  You give the tool the name of one or more roots
   (e.g. a table or set of tables you want to modify).  It finds all
   things that depend on it (well sort of just turn the body of each
   definition into a list of words and a depends on b if the name of b
   occurs in a).  Do a topological sort (if there are cycles because
   of the hack dependency check break them but continue and produce a
   warning). Output a file that first drops the definitions in inverse
   dependency order and then recreates them (in dependency order).
   The file starts with a begin but does NOT end with a commit so you
   are forced to enter it yourself.

   This tool is fantastic if you have a big set of plpgsql functions as
   it is otherwise hard to make sure that you have modified all places
   when refactoring, changing a column, etc...

3) Find all leaves.  E.g. do the topsort on the whole pg_dump and list
   the names of all things nothing depends on.  This is mostly useful if
   you want to make sure that you are not accumulating cruft that isn't
   used by anything.  Of course you separately need a list or knowledge
   about the entry points of your application(s).

Cheers,

Bene



Re: [HACKERS] Finer Extension dependencies

2012-03-29 Thread Benedikt Grundmann
On Thu, Mar 29, 2012 at 1:01 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 I gather from previous posts that the intent isn't to allow different
 packages from different authors to provide a common and compatible
 feature; but what happens in the current design if someone
 accidentally or maliciously produces an extension which provides the
 same feature name as another extension?

 Would we need some registry?

A good (documented) convention should make that unnecessary, such
as:

packagename.feature

for example:

provides hstore.populate_record

Or something along those lines.  Or maybe even prefix it with a Java-style
inverse URL of the package's author.

Cheers,

Bene



Re: [HACKERS] psql tab completion for SELECT

2012-02-10 Thread Benedikt Grundmann
On 10/02/12 08:50, Robert Haas wrote:
 On Fri, Feb 10, 2012 at 1:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Peter Eisentraut pete...@gmx.net writes:
  That seems pretty useful, and it's more or less a one-line change, as in
  the attached patch.
 
  That seems pretty nearly entirely bogus.  What is the argument for
  supposing that the word right after SELECT is a function name?  I would
  think it would be a column name (from who-knows-what table) much more
  often.
 
 It isn't necessarily, but it might be.  It'd certainly be nice to type:
 
  SELECT pg_si<TAB>
 
 and get:
 
 SELECT pg_size_pretty(
 

Well the important problem in completion is how the dictionary of
possible terms is determined and how much context info is used to
reduce / enhance that dictionary.

case 1:

  select x<TAB>

case 2:

  select x<TAB> from bar

Possible dictionaries in case 1:

  1.1 complete nothing
  1.2 complete assuming the union of all columns from all tables 
  in the search_path as the dictionary.
  1.3 complete assuming the union of all function names in the
  search_path as the dictionary
  1.4 complete assuming 1.2 and 1.3

Possible dictionaries in case 2:

  2.1 treat it like case 1
  2.2 complete assuming the union of all columns from bar 
  in the search_path as the dictionary
  2.3   2.2 + 1.3

Now these are heuristics and the decision becomes a question
of usefulness vs amount of time it costs to implement and 
possibly how expensive computing the dictionary is.

I personally would like 1.3 in case 1 and 2.3 in case 2,
because I expect 1.2 to show too many possible
completions to be useful.  But even 1.3 in both cases wouldn't
be that bad.
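
For what it is worth, dictionary 1.3 is cheap to build from the catalogs;
something along these lines would do (a sketch, not the query psql actually
issues):

SELECT DISTINCT proname
FROM pg_catalog.pg_proc
WHERE proname LIKE 'pg_si%'
ORDER BY proname;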

Cheers,

Bene



Re: [HACKERS] random_page_cost vs seq_page_cost

2012-02-08 Thread Benedikt Grundmann
On 07/02/12 19:58, Bruce Momjian wrote:
 On Tue, Feb 07, 2012 at 05:06:18PM -0500, Greg Smith wrote:
  On 02/07/2012 03:23 PM, Bruce Momjian wrote:
  Where did you see that there will be an improvement in the 9.2
  documentation?  I don't see an improvement.
  
  I commented that I'm hoping for an improvement in the documentation
  of how much timing overhead impacts attempts to measure this area
  better.  That's from the add timing of buffer I/O requests feature
  submission.  I'm not sure if Bene read too much into that or not; I
  didn't mean to imply that the docs around random_page_cost have
  gotten better.

I guess I did.  But I'm very glad that as a side effect Bruce and Greg 
have improved it ;-)



Re: [HACKERS] Vacuum rate limit in KBps

2012-01-23 Thread Benedikt Grundmann
On 19/01/12 17:39, Greg Smith wrote:
 On 1/19/12 1:10 PM, Robert Haas wrote:
 I have to say that I find that intensely counterintuitive.  The
 current settings are not entirely easy to tune correctly, but at least
 they're easy to explain.
 
 If there's anyone out there who has run a larger PostgreSQL database
 and not at some point been extremely frustrated with how the current
 VACUUM settings are controlled, please speak up and say I'm wrong
 about this. I thought it was well understood the UI was near unusably
 bad, it just wasn't obvious what to do about it.
 
We are frustrated, but mostly our frustration is not about the
somewhat inscrutable knobs but about the inscrutable meters, or lack
thereof.

Postgres vacuuming and analyzing (auto or manual, for that matter)
is essentially a performance tuning problem without a good way to
measure the current performance; the fact that the knobs to turn
are confusing as well is secondary.

What I think is missing is a clear way to know if you are vacuuming 
(and analyzing) enough, and how much you are paying for that.  

At the moment we are basically changing the knobs blindly based on
some back-of-the-envelope calculations and hearsay.  Then sometimes
months later we find out that oops, we haven't been analyzing enough
and that's why on that particular table the planner is now picking
a bad plan.

What I want is for that page

http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html

to start with "Here is how you know if you are vacuuming enough ..."

In an ideal world one would like some meter in a statistics table
or similar that returns a percentage: 100% means just enough, 50%
means you have to double, 150% means 50% too much (i.e. wasted)...
But I could do with a boolean as well.  A complicated extension,
or the recommendation to install 3 different extensions, would
be better than what is there right now, but only very barely.  Of
course a meter wouldn't tell you that if traffic doubled you would
still keep up; for that you need a complicated calculation (or
you just keep looking at the meter and adjust).

But at the moment there is no such meter (at least I don't know
of one) and that is the actual problem.
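
The raw ingredients for such a meter do exist in the statistics views; a
very rough proxy (a sketch only) is to look at accumulated dead tuples and
the last vacuum/analyze times per table:

SELECT relname,
       n_live_tup,
       n_dead_tup,          -- estimated dead tuples not yet cleaned up
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;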

My 2cents,

Bene




Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-11 Thread Benedikt Grundmann
(replying just to you)
On 10/01/12 15:22, Greg Smith wrote:
 On 1/5/12 5:04 AM, Benedikt Grundmann wrote:
 That sort of thing is one reason why all attempts so far to set
 random_page_cost based on physical characteristics haven't gone
 anywhere useful.  The setting is sort of overloaded right now, it's a
 fuzzy mix of true random seek cost blended with some notion of cache
 percentage. Trying to bring some measurements to bear on it is a less
 effective approach than what people actually do here.  Monitor the
 profile of query execution, change the value, see what happens.  Use
 that as feedback for what direction to keep going; repeat until
 you're just spinning with no improvements.
 
Thank you very much for the reply, it is very interesting.  I'm
excited to hear that documentation in that area will improve in
9.2.  It's interesting: postgres has remarkably good documentation,
but it is a sufficiently complex system that to actually tune the
knobs provided sensibly you have to understand quite a lot about
what is going on.  A colleague of mine likes to say
"all abstractions leak", which seems very appropriate in this case.

 We are not sure if the database used to choose differently
 before the move to the new hardware and the hardware is
 performing worse for random seeks.  Or if the planner is
 now making different choices.
 
 I don't recommend ever deploying new hardware without first doing
 some low-level benchmarks to validate its performance.  Once stuff
 goes into production, you can't do that anymore.  See
 http://www.2ndquadrant.com/en/talks/ for my hardware benchmarking
 talks if you'd like some ideas on what to collect.
 
We had actually done lots of tests on the sequential read performance.
But you are right we could have done better (and I'll definitely read
through your talks).

Did you see my follow up?  Based on the feedback we did further tests
and it is now clear that neither the hardware nor the database version
is at fault.  A different plan is chosen by both the new and the old
database version if spun up on the database as it is right now.

Our best guess is that the CLUSTERs we ran after we had moved to the
hardware (it having more disk space and faster sequential I/O made that
possible) changed the planner's perception of how the joins will
perform in relation to each other.

Cheers,

Bene



Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-11 Thread Benedikt Grundmann
On 11/01/12 08:26, Benedikt Grundmann wrote:
 (replying just to you)
Clearly I didn't.  Sigh. Getting myself a coffee now.



Re: [HACKERS] Page Checksums

2012-01-10 Thread Benedikt Grundmann
On 10/01/12 09:07, Simon Riggs wrote:
  You can repeat that argument ad infinitum. Even if the CRC covers all the
  pages in the OS buffer cache, it still doesn't cover the pages in the
  shared_buffers, CPU caches, in-transit from one memory bank to another etc.
  You have to draw the line somewhere, and it seems reasonable to draw it
  where the data moves between long-term storage, ie. disk, and RAM.
 
 We protect each change with a CRC when we write WAL, so doing the same
 thing doesn't sound entirely unreasonable, especially if your database
 fits in RAM and we aren't likely to be doing I/O anytime soon. The
 long term storage argument may no longer apply in a world with very
 large memory.
 
I'm not so sure about that.  The experience we have is that storage
and memory don't grow as fast as demand.  Maybe we are in a minority,
but at Jane Street memory size < database size is sadly true for most
of the important databases.

Concretely, the two most important databases are 

715 GB

and

473 GB 

in size (the second used to be much closer to the first one in size but
we recently archived a lot of data).

In both databases there is a small set of tables that use the majority of
the disk space.  Those tables are also the most used tables.  Typically
the size of one of those tables is between 1-3x size of memory.  And the
cumulative size of all indices on the table is normally roughly the same
size as the table.

Cheers,

Bene



Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-09 Thread Benedikt Grundmann
On 07/01/12 23:01, Peter Eisentraut wrote:
 On tor, 2012-01-05 at 10:04 +, Benedikt Grundmann wrote:
  We have recently upgrade two of our biggest postgres databases 
  to new hardware and minor version number bump (8.4.5 - 8.4.9).
  
  We are experiencing a big performance regression in some queries.
  In those cases the planner seems to choose a nested loop index
  scan instead of hashing the index once and then joining.
 
 There was a planner regression introduced in version 8.4.8, which was
 thought to be fixed in 8.4.9.  Maybe you got caught by that.  See
 
 Message-Id: 760c0206-b5f4-4dc6-9296-b7a730b7f...@silentmedia.com
 
 for some information.  Check if your queries match that pattern.

Good idea.  But that is not it.  We checked by using 8.4.5 on
the new hardware (and the new database) which produced the same
(bad) plans as 8.4.10 (with both the old and the new postgres config).

We are again speculating that it might be:
  For some of those tables we have also recently (as part
  of the move) clustered for the first time in ages and it was
  speculated that that might have changed statistics (such
  as correlation) and increased the attractiveness of the
  index scan to the planner.

Is that possible?  If so what is the best way to prove / disprove
this theory? And ideally if true what knobs are available to tune
this?
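
For reference, the correlation the planner sees is visible in pg_stats, so
it can at least be compared before and after a CLUSTER (a sketch; the table
name is a placeholder).  Values close to 1.0 make index scans look much
cheaper:

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'your_big_table'
ORDER BY attname;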

Thanks,

Bene



[HACKERS] random_page_cost vs seq_page_cost

2012-01-05 Thread Benedikt Grundmann
Hello list,

I have a question of how to benchmark hardware to determine
the appropriate ratio of seq_page_cost vs random_page_cost.

Emails in this mailing list's archive seem to indicate that
1.0 vs 3.0 - 4.0 are appropriate values on modern hardware.

Which surprised me a bit, as I had thought that on actual
hard drives (ignoring SSDs) random_page_cost is higher.
I guess that the number tries to reflect caching of the
relevant pages in memory, and on modern hardware you have
more of that?

Anyhow it would be great to have a scientific way of
setting those numbers.

Background:

We have recently upgrade two of our biggest postgres databases 
to new hardware and minor version number bump (8.4.5 - 8.4.9).

We are experiencing a big performance regression in some queries.
In those cases the planner seems to choose a nested loop index
scan instead of hashing the index once and then joining.

The new hardware was optimized for seq scans and does those
very fast.  

We are not sure if the database used to choose differently
before the move to the new hardware and the hardware is 
performing worse for random seeks.  Or if the planner is
now making different choices.

As a counter measure we are experimenting with 
enable_nestloop = off
random_page_cost = 20 (instead of the previous 4).
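
Both settings can also be changed per session, which makes it easy to
compare plans for a single problem query before touching postgresql.conf
(a sketch, with a placeholder query):

SET enable_nestloop = off;
SET random_page_cost = 20;
EXPLAIN ANALYZE SELECT count(*) FROM big_a JOIN big_b USING (id);
RESET enable_nestloop;
RESET random_page_cost;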

It is worth noting that for many small tables the nestloop
is indeed marginally faster (in the "doesn't really matter
because both cases are fast enough" case).  But the regression
for the big tables (big in the sense that the index just fits into
memory, but in practice might not because there are other frequently
accessed big tables) is a show stopper.

For some of those tables we have also recently (as part
of the move) clustered for the first time in ages and it was
speculated that that might have changed statistics (such
as correlation) and increased the attractiveness of the 
index scan to the planner.

Another thing that I have thought before might provide
some enlightenment would be a
explain log select ...

That would show all the sub plans considered and why they 
were discarded or something approximating this.

Thanks in advance for any reply, and sorry that this email
turned out to be a rather long stream-of-consciousness dump.

Bene

-- relevant parts of our postgresql.conf ---

shared_buffers = 12GB   # min 128kB
# (change requires restart)
temp_buffers = 512MB# min 800kB
#max_prepared_transactions = 0  # zero disables the feature
# (change requires restart)
# Note:  Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
work_mem = 192MB# min 64kB
maintenance_work_mem = 1GB  # min 1MB
#max_stack_depth = 2MB  # min 100kB

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
# (change requires restart)
#shared_preload_libraries = ''  # (change requires restart)

# - Cost-Based Vacuum Delay -

vacuum_cost_delay = 100ms   # 0-100 milliseconds
vacuum_cost_page_hit = 1# 0-10000 credits
vacuum_cost_page_miss = 10  # 0-10000 credits
vacuum_cost_page_dirty = 20 # 0-10000 credits
vacuum_cost_limit = 7500# 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100# 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0  # 0-10.0 multiplier on buffers scanned/round

# - Asynchronous Behavior -

effective_io_concurrency = 40# 1-1000. 0 disables prefetching

# WRITE AHEAD LOG

fsync = on  # turns forced synchronization on or off
synchronous_commit = off# immediate fsync at commit
#wal_sync_method = fsync# the default is the first option 
# supported by the operating system:
#   open_datasync
#   fdatasync
#   fsync
#   fsync_writethrough
#   open_sync
full_page_writes = on   # recover from partial page writes
wal_buffers = 16MB  # min 32kB
# (change requires restart)
#wal_writer_delay = 200ms   # 1-10000 milliseconds

commit_delay = 1000 # range 0-100000, in microseconds
commit_siblings = 5 # range 1-1000

# - Checkpoints -

checkpoint_segments = 128   # in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min  # range 30s-1h
checkpoint_completion_target = 0.9  # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 30s# 0 disables

# - Archiving -

archive_mode = off  # allows 

Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-05 Thread Benedikt Grundmann
On 05/01/12 10:04, Benedikt Grundmann wrote:
 
 As a counter measure we are experimenting with 
 enable_nestloop = off
 random_page_cost = 20 (instead of the previous 4).
 
For what it is worth we had to revert the enable_nestloop = off 
change.  It just moved the pain around by making other queries
perform much worse than before. 



Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Benedikt Grundmann
For what it's worth, here are the numbers on one of our biggest databases
(same system as I posted about separately wrt seq_page_cost vs
random_page_cost).


0053 1001
00BA 1009
0055 1001
00B9 1020
0054 983
00BB 1010
0056 1001
00BC 1019
0069 0
00BD 1009
006A 224
00BE 1018
006B 1009
00BF 1008
006C 1008
00C0 1006
006D 1004
00C1 1014
006E 1016
00C2 1023
006F 1003
00C3 1012
0070 1011
00C4 1000
0071 1011
00C5 1002
0072 1005
00C6 982
0073 1009
00C7 996
0074 1013
00C8 973
0075 1002
00D1 987
0076 997
00D2 968
0077 1007
00D3 974
0078 1012
00D4 964
0079 994
00D5 981
007A 1013
00D6 964
007B 999
00D7 966
007C 1000
00D8 971
007D 1000
00D9 956
007E 1008
00DA 976
007F 1010
00DB 950
0080 1001
00DC 967
0081 1009
00DD 983
0082 1008
00DE 970
0083 988
00DF 965
0084 1007
00E0 984
0085 1012
00E1 1004
0086 1004
00E2 976
0087 996
00E3 941
0088 1008
00E4 960
0089 1003
00E5 948
008A 995
00E6 851
008B 1001
00E7 971
008C 1003
00E8 954
008D 982
00E9 938
008E 1000
00EA 931
008F 1008
00EB 956
0090 1009
00EC 960
0091 1013
00ED 962
0092 1006
00EE 933
0093 1012
00EF 956
0094 994
00F0 978
0095 1017
00F1 292
0096 1004
0097 1005
0098 1014
0099 1012
009A 994
0035 1003
009B 1007
0036 1004
009C 1010
0037 981
009D 1024
0038 1002
009E 1009
0039 998
009F 1011
003A 995
00A0 1015
003B 996
00A1 1018
003C 1013
00A5 1007
003D 1008
00A3 1016
003E 1007
00A4 1020
003F 989
00A7 375
0040 989
00A6 1010
0041 975
00A9 3
0042 994
00A8 0
0043 1010
00AA 1
0044 1007
00AB 1
0045 1008
00AC 0
0046 991
00AF 4
0047 1010
00AD 0
0048 997
00AE 0
0049 1002
00B0 5
004A 1004
00B1 0
004B 1012
00B2 0
004C 999
00B3 0
004D 1008
00B4 0
004E 1007
00B5 807
004F 1010
00B6 1007
0050 1004
00B7 1007
0051 1009
00B8 1006
0052 1005
0057 1008
00C9 994
0058 991
00CA 977
0059 1000
00CB 978
005A 998
00CD 944
005B 971
00CC 972
005C 1005
00CF 969
005D 1010
00CE 988
005E 1006
00D0 975
005F 1015
0060 989
0061 998
0062 1014
0063 1000
0064 991
0065 990
0066 1000
0067 947
0068 377
00A2 1011


On 23/12/11 14:23, Kevin Grittner wrote:
 Jeff Janes jeff.ja...@gmail.com wrote:
  
  Could we get some major OLTP users to post their CLOG for
  analysis?  I wouldn't think there would be much
  security/propietary issues with CLOG data.
  
 FWIW, I got the raw numbers to do my quick check using this Ruby
 script (put together for me by Peter Brant).  If it is of any use to
 anyone else, feel free to use it and/or post any enhanced versions
 of it.
  
 #!/usr/bin/env ruby
 
 Dir.glob("*") do |file_name|
   contents = File.read(file_name)
   total =
     contents.enum_for(:each_byte).enum_for(:each_slice, 256).inject(0) do |count, chunk|
       if chunk.all? { |b| b == 0x55 }
         count + 1
       else
         count
       end
     end
   printf "%s %d\n", file_name, total
 end
  
 -Kevin
 



Re: [HACKERS] JSON for PG 9.2

2011-12-22 Thread Benedikt Grundmann
Let me mention another lightweight data-interchange format.

At http://www.janestreet.com we have developed a small C module to deal
with S-expressions (sexps) as a way to store arbitrary data.  As we write
most of our code in OCaml, sexps are a natural way for us to store data
(http://hg.ocaml.info/release/sexplib/ provides automatic ways to convert
any OCaml value into a sexp).

The extension is still pretty new, but we use it successfully on a daily
basis.  After we have upgraded to 9.x we will package it as an extension
and release it open source.

API-wise, the module at the moment offers the following:

sexp_validate(text) returns boolean
Validate that the passed-in text is a valid s-expression.

create domain sexp as text check (sexp_validate(value));

BTW: It is a PITA that arrays of domains are not valid types.

And several functions to take apart or modify sexps using a path into
the sexp (similar to what XPath does for XML).

Such as:

sexp_get(sexp, text) returns sexp
Get the sub sexp of sexp identified by the path.  
Returns NULL if path is not a valid path in sexp.
Example:
   path=.a space.b.[1].x
   ((ignore this) (a space ((b (0 ((also ignored) (x The Value)) )) )))
-> The Value

And sexp_get_atom(sexp, text) returns text
Get the sub atom of sexp identified by the path.  
Returns NULL if path is not a valid path in sexp or
does not identify an atom.
Example:
   path=.a space.b.[1].x
   ((ignore this) (a space ((b (0 ((also ignored) (x The Value)) )) )))
-> The Value
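
Putting it together, usage looks roughly like this (a sketch based on the
descriptions above; the extension is not released yet, so names and the
path syntax may still change):

SELECT sexp_validate('((a 1) (b (2 3)))');        -- true
SELECT sexp_get('((a 1) (b (2 3)))', '.b');       -- (2 3)
SELECT sexp_get_atom('((a 1) (b (2 3)))', '.a');  -- 1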

Cheers,

Bene

On 20/12/11 19:39, Claes Jakobsson wrote:
 On Dec 20, 2011, at 12:39 AM, David E. Wheeler wrote:
  On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote:
  
  My understanding is that JSON is a subset of ECMAscript
  
  Well, no, JSON is formally “a lightweight data-interchange format.” It’s 
  derived from JavaScript syntax, but it is not a programming language, so I 
  wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript.
  
   http://json.org/
 
 Are people explicitly asking for a) *JSON* datatype or b) a type that lets 
 you store arbitrary complex semi-untyped data structures?
 
 if b) then this might get a lot more interesting
 
 Cheers,
 Claes



Re: [HACKERS] Real-life range datasets

2011-12-22 Thread Benedikt Grundmann
Hello,

We have a table in a postgres 8.4 database that would make use of date
ranges and exclusion constraints if they were available.  Sadly I cannot
give you the data, as it is based on data we are paying for, and as part
of the relevant licenses we are obliged not to give the data to third
parties.

Basically the tables keep meta data about financial instruments on
a given day.  Thanks to corporate actions (companies merging, splitting
up, etc...) that meta data can be different from one day to the next.

One way to model such a table is:

identifier, date, payload columns...


unique index on (date, identifier)

(and in fact some of the payload columns are unique per day indices
as well).

But because there are a large number of rows per day and most don't
change this is a very wasteful representation.

Instead we use this

identifier, effective_from, effective_until, payload columns...

And we have some clever plpgsql functions that merge a day's snapshot
into that representation.  That happens only a few times per day and
is currently quite slow, mostly because a lot of time is spent in
validation triggers that check that there are no overlapping
effective_from/effective_until entries for jane_symbol and a few other
identifiers.
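
For context, the date ranges and exclusion constraints mentioned at the top
of this mail would let that overlap check be declared instead of enforced by
triggers; a minimal sketch (assuming btree_gist, and collapsing
effective_from/effective_until into a daterange; everything else is
illustrative):

CREATE EXTENSION btree_gist;

CREATE TABLE instruments (
    identifier  text      NOT NULL,
    validity    daterange NOT NULL,   -- replaces effective_from / effective_until
    -- payload columns ...
    EXCLUDE USING gist (identifier WITH =, validity WITH &&)
);

-- "all rows of a given day" becomes a containment test:
-- SELECT * FROM instruments WHERE validity @> DATE '2011-12-21';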

The most common operations are: 

  Get all or most rows of a given day 

(select ... from instruments where :date between effective_from and 
effective_until)

left join of the instruments (again in the normal case constrained to
one day, but in some cases periods of a week or a few months)

select ... from t left join instruments on
  t.jane_symbol = instruments.jane_symbol
  and t.date between instruments.effective_from and instruments.effective_until
where t.date = X
  and additional constraints on the number of rows from t

With t a huge table clustered on date with roughly 500,000 to 2,000,000 
entries per day.  The left join would work most of the time (my guess is
more than 90%).  But there are entries in t where the jane_symbol would
not be in instruments (sadly).

Current size (immediately after a cluster):

  table   | toast  | (all indices) | total
  1268 MB | 900 MB | 693 MB        | 2861 MB

= select min(effective_from), max(effective_from) from instruments;
    min     |    max     
------------+------------
 2011-05-30 | 2011-12-21
(1 row)

b= select count(*) from instruments where current_date - 1 between 
effective_from and effective_until ;
 count  
--------
 358741
(1 row)

I should be able to give you a table with the same characteristics as
the instruments table but bogus data by replacing all entries in the
table with random strings of the same length or something like that.
I can probably take a little bit of time during this or the next week
to generate such fake real world data ;-)   Is there an ftp site to
upload the gzipped pg_dump file to?

Cheers,

Bene

On 20/12/11 16:48, Alexander Korotkov wrote:
 Hackers,
 
 For better GiST indexing of range types it's important to have real-life
 datasets for testing on. Real-life range datasets would help to proof (or
 reject) some concepts and get more realistic benchmarks. Also, it would be
 nice to know what queries you expect to run fast on that datasets. Ideally
 it should be real-life set of queries, but it also could be your
 presentation of what are typical queries  for such datasets.
 Thanks!
 
 -
 With best regards,
 Alexander Korotkov.



Re: [HACKERS] Typed hstore proposal

2011-12-22 Thread Benedikt Grundmann
On 22/12/11 10:44, Tom Lane wrote:
 Johann 'Myrkraverk' Oskarsson joh...@2ndquadrant.com writes:
  I mean to create a typed hstore, called tstore for now.
 
 Um ... what is the point of this, exactly?  From what I've seen, most
 applications for hstore are pretty happy with the fact that hstore is
 only weakly typed, and if an entry *is* an integer, or a float, or
 whatever else, it's not hard to cast to and from text as needed.

Moreover it is also easy with the current hstore to add constraints like this:

  contracts_is_an_integer CHECK ((tags -> 'contracts'::text) ~ '^[0-9]+$'::text)

to ensure that it actually is.
