Re: [HACKERS] pg_dump versus rules, once again

2016-12-30 Thread Benedikt Grundmann
On 30 December 2016 at 11:58, Benedikt Grundmann <bgrundm...@janestreet.com>
wrote:

>
> On 17 November 2016 at 03:45, Robert Haas <robertmh...@gmail.com> wrote:
>
>> On Wed, Nov 16, 2016 at 10:14 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> > Robert Haas <robertmh...@gmail.com> writes:
>> >> On Wed, Nov 16, 2016 at 10:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> >>> The changes in pg_backup_archiver.c would have to be back-patched
>> >>> into all versions supporting --if-exists, so that they don't fail
>> >>> on dump archives produced by patched versions.
>> >
>> >> Even if you patch future minor releases, past minor releases are still
>> >> going to exist out there in the wild for a long, long time.
>> >
>> > Yeah, but it would only matter if you try to use pg_restore --clean
>> --if-exists
>> > with an archive file that happens to contain a view that has this issue.
>> > Such cases would previously have failed anyway, because of precisely
>> > the bug at issue ... and there aren't very many of them, or we'd have
>> > noticed the problem before.  So I don't feel *too* bad about this,
>> > I just want to make sure we have a solution available.
>>
>> Right, OK.
>>
>
> For what it is worth we just run into this problem on our postgres 9.2.17
> installation on a hunch we (after reading Tom's initial email replaced the
> view that caused this by this
>
> create view ... as select * from (...original view definition...)
> hack_around_pg_dump_versus_rules_bug;
>
> Which caused pg_dump to change its behavior and instead emit create view
>  which is what we wanted (because we take filtered down and dependency
> ordered outputs of pg_dump as the starting point for new patches to the
> db).  But it surprised me mildly that the hack "worked" so I thought I
> would mention it here.   It might just mean that I'm misunderstanding the
> bug but if there was really a dependency in the original that dependency
> still exists now.
>
>
N/m, it turns out that using pg_dump -t  isn't a good way to test whether
the hack works, because then it always does the right thing.


>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>
>


Re: [HACKERS] pg_dump versus rules, once again

2016-12-30 Thread Benedikt Grundmann
On 17 November 2016 at 03:45, Robert Haas  wrote:

> On Wed, Nov 16, 2016 at 10:14 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> On Wed, Nov 16, 2016 at 10:00 PM, Tom Lane  wrote:
> >>> The changes in pg_backup_archiver.c would have to be back-patched
> >>> into all versions supporting --if-exists, so that they don't fail
> >>> on dump archives produced by patched versions.
> >
> >> Even if you patch future minor releases, past minor releases are still
> >> going to exist out there in the wild for a long, long time.
> >
> > Yeah, but it would only matter if you try to use pg_restore --clean
> --if-exists
> > with an archive file that happens to contain a view that has this issue.
> > Such cases would previously have failed anyway, because of precisely
> > the bug at issue ... and there aren't very many of them, or we'd have
> > noticed the problem before.  So I don't feel *too* bad about this,
> > I just want to make sure we have a solution available.
>
> Right, OK.
>

For what it is worth, we just ran into this problem on our postgres 9.2.17
installation.  On a hunch (after reading Tom's initial email) we replaced the
view that caused this with this:

create view ... as select * from (...original view definition...)
hack_around_pg_dump_versus_rules_bug;

That caused pg_dump to change its behavior and instead emit a plain create
view, which is what we wanted (because we take filtered-down and
dependency-ordered output of pg_dump as the starting point for new patches to
the db).  But it mildly surprised me that the hack "worked", so I thought I
would mention it here.  It might just mean that I'm misunderstanding the bug,
but if there was really a dependency in the original, that dependency still
exists now.
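
For context, the dump shape at issue (as I understand Tom's description) is
the one pg_dump falls back to when a view is part of a circular dependency:
instead of a plain CREATE VIEW it emits a dummy table and then an ON SELECT
rule that turns the table into the view.  A rough sketch of that shape, with
made-up table and view names, not actual pg_dump output:

create table t (a integer);
create table v (a integer);   -- dummy table standing in for the view
create rule "_RETURN" as on select to v do instead
  select t.a from t;          -- an unconditional ON SELECT rule makes v a view
                              -- (accepted by the server versions discussed here)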


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


[HACKERS] Decoding proacl

2016-10-03 Thread Benedikt Grundmann
I'm trying to understand how to decode proacl in pg_proc.  The
documentation says:

PostgreSQL grants default privileges on some types of objects to PUBLIC.
... EXECUTE privilege for functions; ... Also, these initial default
privilege settings can be changed using the ALTER DEFAULT PRIVILEGES
command.

I also found this email by Tom saying that NULL means the default of execute
to public.

Questions:

a) Does NULL mean execute to public?  Or does it mean whatever
pg_default_acl contains for functions?

b) What does it mean if pg_default_acl is empty?

c) If NULL means execute to public can somebody explain why this happens:

postgres_prod@proddb_testing=# select proacl from pg_proc where proname =
'hstore_eq';
─[ RECORD 1 ]
proacl │ ¤

Time: 87.862 ms
postgres_prod@proddb_testing=# grant execute on function hstore_eq(hstore,
hstore) to public;
GRANT
Time: 88.931 ms
postgres_prod@proddb_testing=# select proacl from pg_proc where proname =
'hstore_eq';
─[ RECORD 1 ]
proacl │ *{=X/postgres_prod,postgres_prod=X/postgres_prod}*

I would have expected the bold to still be NULL.  Also I haven't found any
combination of statements to set it back to NULL (short of dropping and
recreating hstore_eq).

Which leads me to, I guess, the most important question.

d) Other than compactness in the representation of acls, is there any
practical difference between the above representation and having NULL in
proacl?
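
In case it is useful to readers of the archive: an explicit proacl like the
one above can be unpacked with aclexplode() (grantee 0 denotes PUBLIC; a NULL
proacl simply means the implicit default, so there is nothing to expand).  A
minimal sketch against the hstore_eq example:

select proname, (aclexplode(proacl)).*
from pg_proc
where proname = 'hstore_eq';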

Thanks in advance,

Bene


Re: [HACKERS] between not propated into a simple equality join

2016-05-10 Thread Benedikt Grundmann
On Tue, May 10, 2016 at 7:41 AM, David Rowley <david.row...@2ndquadrant.com>
wrote:

> On 10 May 2016 at 16:34, David G. Johnston <david.g.johns...@gmail.com>
> wrote:
> > On Mon, May 9, 2016 at 8:53 AM, Benedikt Grundmann
> > <bgrundm...@janestreet.com> wrote:
> >>
> >> We just run into a very simple query that the planner does much worse on
> >> than we thought it would (in production the table in question is ~ 100
> GB).
> >> It surprised us given the planner is generally quite good, so I thought
> I
> >> share our surprise
> >>
> >> Setup:
> >>
> >> postgres_prod@proddb_testing=# select version();[1]
> >> version
> >>
> >>
> 
> >>  PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
> >> 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
> >> (1 row)
> >>
> >> Time: 69.246 ms
> >>
> >> postgres_prod@proddb_testing=# create table toy_data3 (the_date date, i
> >> int);
> >> CREATE TABLE
> >> Time: 67.096 ms
> >> postgres_prod@proddb_testing=# insert into toy_data3
> >>   (select current_date-(s.idx/1000), s.idx from
> generate_series(1,100)
> >> as s(idx));
> >> INSERT 0 100
> >> Time: 1617.483 ms
> >> postgres_prod@proddb_testing=# create index toy_data_date3 on
> >> toy_data3(the_date);
> >> CREATE INDEX
> >> Time: 660.166 ms
> >> postgres_prod@proddb_testing=# analyze toy_data3;
> >> ANALYZE
> >> Time: 294.984 ms
> >>
> >> The bad behavior:
> >>
> >> postgres_prod@proddb_testing=# explain analyze
> >>   select * from (
> >>select td1.the_date, td1.i
> >> from toy_data3 td1, toy_data3 td2  where td1.the_date = td2.the_date
> >> and td1.i = td2.i
> >>   ) foo
> >>   where the_date between current_date and current_date;
> >>QUERY
> >> PLAN
> >>
> >>
> ───
> >>  Hash Join  (cost=55.49..21980.50 rows=1 width=8) (actual
> >> time=0.336..179.374 rows=999 loops=1)
> >>Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))
> >>->  Seq Scan on toy_data3 td2  (cost=0.00..14425.00 rows=100
> >> width=8) (actual time=0.007..72.510 rows=100 lo
> >>->  Hash  (cost=40.44..40.44 rows=1003 width=8) (actual
> >> time=0.321..0.321 rows=999 loops=1)
> >>  Buckets: 1024  Batches: 1  Memory Usage: 40kB
> >>  ->  Index Scan using toy_data_date3 on toy_data3 td1
> >> (cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.
> >>Index Cond: ((the_date >= ('now'::cstring)::date) AND
> >> (the_date <= ('now'::cstring)::date))
> >>  Total runtime: 179.440 ms
> >> (8 rows)
> >>
> >> Time: 246.094 ms
> >>
> >> Notice the red.  Which is sad because one would like it to realize that
> it
> >> could propagate the index constraint onto td2.  That is on both sides
> of the
> >> join do the green.
> >>
> >
> > FWIW
> >
> > This is my plan result:
> > version
> > PostgreSQL 9.5.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> > 4.8.2-19ubuntu1) 4.8.2, 64-bit
> > All default settings
> >
> > using "BETWEEN"
> >
> >  QUERY PLAN
> > Nested Loop  (cost=0.86..48.91 rows=1 width=8) (actual
> time=0.042..168.512
> > rows=999 loops=1)
> >   ->  Index Scan using toy_data_date3 on toy_data3 td1  (cost=0.43..8.46
> > rows=1 width=8) (actual time=0.022..1.388 rows=999 loops=1)
> > Index Cond: ((the_date >= ('now'::cstring)::date) AND (the_date
> <=
> > ('now'::cstring)::date))
> >   ->  Index Scan using toy_data_date3 on toy_data3 td2  (cost=0.42..40.44
> > rows=1 width=8) (actual time=0.078..0.160 rows=1 loops=999)
> > Index Cond: (the_date = td1.the_date)
> > Filter: (td1.i = i)
> > Rows Removed by Filter: 998
> > Planning time: 0.353 ms
> > Execution time: 169.692 ms
> >
> > using "="
> >
> > QUERY PLAN
> > Hash Join  (cost=49.89..90.46 rows=1 width=8) (actual time=2.320..5.652
> > rows=99

[HACKERS] between not propated into a simple equality join

2016-05-09 Thread Benedikt Grundmann
We just ran into a very simple query that the planner does much worse on
than we thought it would (in production the table in question is ~ 100 GB).
It surprised us given that the planner is generally quite good, so I thought
I'd share our surprise.

Setup:

postgres_prod@proddb_testing=# select version();[1]
version


 PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)

Time: 69.246 ms

postgres_prod@proddb_testing=# create table toy_data3 (the_date date, i
int);
CREATE TABLE
Time: 67.096 ms
postgres_prod@proddb_testing=# insert into toy_data3

  (select current_date-(s.idx/1000), s.idx from generate_series(1,100)
as s(idx));
INSERT 0 100
Time: 1617.483 ms
postgres_prod@proddb_testing=# create index toy_data_date3 on
toy_data3(the_date);
CREATE INDEX
Time: 660.166 ms
postgres_prod@proddb_testing=# analyze toy_data3;
ANALYZE
Time: 294.984 ms

The bad behavior:

postgres_prod@proddb_testing=# explain analyze
  select * from (
   select td1.the_date, td1.i
from toy_data3 td1, toy_data3 td2  where td1.the_date = td2.the_date
and td1.i = td2.i
  ) foo
  where the_date between current_date and current_date;
   QUERY
PLAN
───
 Hash Join  (cost=55.49..21980.50 rows=1 width=8) (actual
time=0.336..179.374 rows=999 loops=1)
   Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))
   ->  Seq Scan on toy_data3 td2  (cost=0.00..14425.00 rows=100
width=8) (actual time=0.007..72.510 rows=100 lo
   ->  Hash  (cost=40.44..40.44 rows=1003 width=8) (actual
time=0.321..0.321 rows=999 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 40kB
 ->  Index Scan using toy_data_date3 on toy_data3 td1
 (cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.
   Index Cond: ((the_date >= ('now'::cstring)::date) AND
(the_date <= ('now'::cstring)::date))
 Total runtime: 179.440 ms
(8 rows)

Time: 246.094 ms

Notice the seq scan on td2 (the part marked in red in the original message),
which is sad because one would like the planner to realize that it could
propagate the index constraint onto td2, i.e. do what the index scan on td1
does (marked in green) on both sides of the join.

As it does correctly when one explicitly uses equality (bold below; but of
course we sometimes have multi-day ranges in production, and we only used a
single-day range above to make it extra interesting for the planner to NOT
do a seqscan):

postgres_prod@proddb_testing=# explain analyze
  select * from (
   select td1.the_date, td1.i
from toy_data3 td1, toy_data3 td2  where td1.the_date = td2.the_date
and td1.i = td2.i) foo
  where *the_date = current_date*;
   QUERY
PLAN
───
 Hash Join  (cost=50.47..92.17 rows=1 width=8) (actual time=0.300..0.652
rows=999 loops=1)
   Hash Cond: (td1.i = td2.i)
   ->  Index Scan using toy_data_date3 on toy_data3 td1  (cost=0.00..37.93
rows=1003 width=8) (actual time=0.023..0.169
 Index Cond: (the_date = ('now'::cstring)::date)
   ->  Hash  (cost=37.93..37.93 rows=1003 width=8) (actual
time=0.270..0.270 rows=999 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 40kB
 ->  Index Scan using toy_data_date3 on toy_data3 td2
 (cost=0.00..37.93 rows=1003 width=8) (actual time=0.007.
   Index Cond: (the_date = ('now'::cstring)::date)
 Total runtime: 0.713 ms
(9 rows)

Time: 66.904 ms
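
A workaround sketch for the BETWEEN case: repeat the range condition on td2
explicitly, since the planner only derives equality conditions (not
inequalities such as BETWEEN) across the join.  Nothing here beyond the
toy_data3 setup above is assumed:

explain analyze
  select td1.the_date, td1.i
  from toy_data3 td1, toy_data3 td2
  where td1.the_date = td2.the_date
    and td1.i = td2.i
    and td1.the_date between current_date and current_date
    and td2.the_date between current_date and current_date;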

Cheers,

Bene


Re: [HACKERS] pg_basebackup compression TODO item

2016-03-07 Thread Benedikt Grundmann
On Sun, Mar 6, 2016 at 7:36 PM, Euler Taveira  wrote:

> On 03-03-2016 14:44, Magnus Hagander wrote:
> > On Thu, Mar 3, 2016 at 6:34 PM, Andres Freund  > > wrote:
> >
> > On 2016-03-03 18:31:03 +0100, Magnus Hagander wrote:
> > > I think we want it at protocol level rather than pg_basebackup
> level.
> >
> > I think we may want both eventually, but I do agree that protocol
> level
> > has a lot higher "priority" than that. Something like protocol level
> > compression has a bit of different tradeofs than compressing base
> > backups, and it's nice not to compress, uncompress, compress again.
> >
> >
> >
> > Yeah, good point, we definitely want both. Based on the field experience
> > I've had (which might differ from others), having it protocol level
> > would help more people tough, so should be higher prio.
> >
> Some time ago, I started a thread [1] to implement compression at
> protocol level. The use cases are data load over slow links and reduce
> bandwidth consumption during replication.
>
> At that time, there wasn't a consensus about which compression algorithm
> to choose. After the WAL compression feature, I think we can do some POC
> with LZ compression (that is already available in common).
>
> I'll try to update the code and do some benchmarks.
>
>
+1 to protocol-level compression.  In our case the primary reason why we
use third-party magic networking appliances as a middle man between our
offices is to compress postgres network traffic (which is very compressible;
>95% reduction is normal).  And the presence of those devices introduces all
kinds of weird additional error cases and administrative overhead (+ of
course cost).  So I would personally consider protocol-level compression a
bigger killer feature than any other feature that has made it into postgres
since the 9.2 release.  But of course YMMV ;-)



> [1] http://www.postgresql.org/message-id/4fd9698f.2090...@timbira.com
>
>
> --
>Euler Taveira   Timbira - http://www.timbira.com.br/
>PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Death by regexp_replace

2016-01-18 Thread Benedikt Grundmann
thanks

On Fri, Jan 15, 2016 at 7:22 PM, Devrim Gündüz  wrote:

> Hi,
>
> That is the version of the *repo* RPM, not PostgreSQL itself. Once you install
> it, you can grab the latest version with
>
> yum install postgresql92-server
>
> Regards, Devrim
>
> On January 15, 2016 7:48:53 PM GMT+02:00, Robert Haas <
> robertmh...@gmail.com> wrote:
>>
>>  Hmm I just wanted to get the rpm for the latest 9.2 release for centos6 but
>>>  it looks like you haven't released at least the link on this page for 9.2
>>>
>>>  http://yum.postgresql.org/repopackages.php
>>>
>>>  says 7 in the filename which is certainly not 14 ;-)
>>>
>>>  
>>> http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-7.noarch.rpm
>>>
>>>  Is that expected?
>>>
>>
>> Adding Devrim, who I believe maintains that stuff.
>>
>>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>


[HACKERS] Death by regexp_replace

2016-01-15 Thread Benedikt Grundmann
Today we discovered that we had a backend whose client had gone away.  The
automatic query-watching process had sent both pg_cancel and
pg_terminate_backend, but nevertheless the process was sitting there
consuming resources and had been for over 1 day...

gdb revealed that we were sitting in pg_regexec (we forced it to return 16,
aka invalid regex, to get our system back into a good state).

Here is the regular expression and the text to run on:

*WARNING DO NOT DO THIS ON A PRODUCTION BOX*

select regexp_replace('VODI GR,VOD LN,VOD LN,VODN MM,VODPF US,VOD US,VZC
LN', '([^,]+)(,*\1)+', '\1');

This was in postgres 9.2
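
For reference, the tightened variant discussed elsewhere in this thread
(,+ instead of ,*, which gives the backreference group a real constraint)
would look like this; a sketch, not something we have put into production:

select regexp_replace('VODI GR,VOD LN,VOD LN,VODN MM,VODPF US,VOD US,VZC LN',
                      '([^,]+)(,+\1)+', '\1');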

Cheers,

Bene


Re: [HACKERS] Death by regexp_replace

2016-01-15 Thread Benedikt Grundmann
On Fri, Jan 15, 2016 at 4:39 PM, Benedikt Grundmann <
bgrundm...@janestreet.com> wrote:

>
> On Fri, Jan 15, 2016 at 4:26 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> Kevin Grittner <kgri...@gmail.com> writes:
>> > On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> >> (FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
>> >> practically no constraint there, leading to having to consider O(N^2)
>> >> or more possibilities.)
>>
>> > On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
>> > but it seems to be in an endless loop until you do that.
>>
>> A bit of further experimentation suggests the runtime growth is actually
>> more like O(2^N).  It will terminate in a reasonable amount of time if the
>> input string is about half as long as the given example.
>>
>> The problem is that so far as the DFA engine is concerned, the pattern
>> substring '(,*\1)+' can match almost anything at all, because it's
>> equivalent to '(,*[^,]+)+' which is easily seen to match any string
>> whatever that's got at least one non-comma.  So, for each possible match
>> to the substring '([^,]+)', of which there are lots, it has to consider
>> every possible way of breaking up all the rest of the string into one or
>> more substrings.  The vast majority of those ways will fail when the
>> backref match is checked, but there's no way to realize it before that.
>>
>> To be clear I'm perfectly happy with that query taking forever (I didn't
> write it ;-)).  The only thing I was unhappy about was that
> pg_cancel/terminate_backend didn't work.  If that is fixed great.
>
>
>> regards, tom lane
>>
>
>
Hmm, I just wanted to get the rpm for the latest 9.2 release for centos6, but
it looks like you haven't released it; at least the link on this page for 9.2

http://yum.postgresql.org/repopackages.php

says 7 in the filename which is certainly not 14 ;-)

http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-7.noarch.rpm


Is that expected?

Thanks,

Bene


Re: [HACKERS] Death by regexp_replace

2016-01-15 Thread Benedikt Grundmann
9.2.6

On Fri, Jan 15, 2016 at 3:48 PM, Kevin Grittner  wrote:

> On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane  wrote:
>
> >> *WARNING DO NOT DO THIS ON A PRODUCTION BOX*
> >> select regexp_replace('VODI GR,VOD LN,VOD LN,VODN MM,VODPF US,VOD US,VZC
> >> LN', '([^,]+)(,*\1)+', '\1');
>
> > This responds to cancel just fine for me.
>
> > (FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
> > practically no constraint there, leading to having to consider O(N^2)
> > or more possibilities.)
>
> On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
> but it seems to be in an endless loop until you do that.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Death by regexp_replace

2016-01-15 Thread Benedikt Grundmann
On Fri, Jan 15, 2016 at 4:26 PM, Tom Lane  wrote:

> Kevin Grittner  writes:
> > On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane  wrote:
> >> (FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
> >> practically no constraint there, leading to having to consider O(N^2)
> >> or more possibilities.)
>
> > On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
> > but it seems to be in an endless loop until you do that.
>
> A bit of further experimentation suggests the runtime growth is actually
> more like O(2^N).  It will terminate in a reasonable amount of time if the
> input string is about half as long as the given example.
>
> The problem is that so far as the DFA engine is concerned, the pattern
> substring '(,*\1)+' can match almost anything at all, because it's
> equivalent to '(,*[^,]+)+' which is easily seen to match any string
> whatever that's got at least one non-comma.  So, for each possible match
> to the substring '([^,]+)', of which there are lots, it has to consider
> every possible way of breaking up all the rest of the string into one or
> more substrings.  The vast majority of those ways will fail when the
> backref match is checked, but there's no way to realize it before that.
>
To be clear, I'm perfectly happy with that query taking forever (I didn't
write it ;-)).  The only thing I was unhappy about was that
pg_cancel/terminate_backend didn't work.  If that is fixed, great.


> regards, tom lane
>


Re: [HACKERS] More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

2015-12-30 Thread Benedikt Grundmann
On Wed, Dec 30, 2015 at 7:16 AM, David Rowley 
wrote:

>
> A number of ideas were suggested on the other thread about how we might go
> about solving this problem. In [3] Simon talked about perhaps enabling
> extra optimisations when the planner sees that the plan will cost more than
> some given threshold. That's perhaps an option, but may not work well for
> optimisations which must take place very early in planning, for example [4].
>

A small tweak on [3] to deal with [4]: if the returned plan cost is quite
high (say you estimate minutes+), you could just restart planning from
scratch with all costly planning options enabled, because even in the worst
case (that is, the additional options don't find a better plan) the total
planning cost won't matter much in the grand scheme of things.


Re: [HACKERS] Proposal: Incremental Backup

2014-08-08 Thread Benedikt Grundmann
On Thu, Aug 7, 2014 at 6:29 PM, Gabriele Bartolini 
gabriele.bartol...@2ndquadrant.it wrote:

 Hi Marco,

  With the current full backup procedure they are backed up, so I think
  that having them backed up with a rsync-like algorithm is what an user
  would expect for an incremental backup.

 Exactly. I think a simple, flexible and robust method for file based
 incremental backup is all we need. I am confident it could be done for
 9.5.

 I would like to quote every single word Simon said. Block level
 incremental backup (with Robert's proposal) is definitely the ultimate
 goal for effective and efficient physical backups. I see file level
 incremental backup as a very good compromise, a sort of intermediate
 release which could nonetheless produce a lot of benefits to our user
 base, for years to come too.

 Thanks,
 Gabriele


I haven't been following this discussion closely at all.  But at Jane Street
we have been using pg_start_backup together with rsync --link-dest (onto a
big NFS) to achieve incrementally stored backups.  In our experience this
works very well; it is however advisable to look into whatever is used to
serve the NFS, as we had to set some options to increase the maximum number
of hardlinks.

Cheers,

Bene



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



[HACKERS] Is there a way to temporarily disable a index

2014-07-11 Thread Benedikt Grundmann
That is, is it possible to tell the planner that an index is off limits,
i.e. that it should never generate a plan using it?

Rationale: schema changes on big tables.  I might have convinced myself /
hold strong beliefs that for all queries that I need to be fast the planner
does not need to use a given index (e.g. other possible plans are fast
enough).  However if I just drop the index and it turns out I'm wrong, I
might be in a world of pain because it might just take way too long to
recreate the index.

I know that I can use pg_stat* to figure out if an index is used at all.
But in the presence of multiple indices and complex queries the planner
might prefer the index-to-be-dropped even when the difference to the
alternatives available is immaterial.

The current best alternative we have is to test such changes on a testing
database that gets regularly restored from production.  However at least in
our case we simply don't know all possible queries (and logging all of them
is not an option).
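
One partial workaround (it does not really answer the question, and it takes
an ACCESS EXCLUSIVE lock on the table for the duration of the transaction) is
to drop the index inside a transaction, look at the plans you care about, and
roll back; a sketch reusing the toy_data3 table and index from another thread
in this archive:

begin;
drop index toy_data_date3;     -- the planner can no longer see it
explain select * from toy_data3 where the_date = current_date;
rollback;                      -- the index is back, nothing to rebuild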

Cheers,

Bene


Re: [HACKERS] gettimeofday is at the end of its usefulness?

2014-05-15 Thread Benedikt Grundmann
I posted this on this mailing list before: at Jane Street we have developed
very fast code to get timing information based on the TSC if available.  It's
all OCaml but well documented and mostly just calls to C functions, so it
should be easy to port to C, and we release it under a very liberal license
so it should be no problem to take the ideas:

https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli

Hope this is useful.

Bene


On Wed, May 14, 2014 at 12:41 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, May 13, 2014 at 11:34 PM, Greg Stark st...@mit.edu wrote:
  I always assumed the kernel used rdtsc to implement some of the high
  performance timers. It can save the current time in a mapped page when
  it schedules a process and then in the vdso syscall (ie in user-space)
  it can use rdtsc to calculate the offset needed to adjust that
  timestamp to the current time. This seems consistent with your
  calculations that showed the 40ns overhead with +/- 10ns precision.

 Crazy idea: Instead of trying to time precisely the amount of time we
 spend in each node, configure a very-high frequency timer interrupt
 (or background thread?) that does:

 SomeGlobalVariablePointingToTheCurrentNode-profiling_counter++;

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] gettimeofday is at the end of its usefulness?

2014-05-15 Thread Benedikt Grundmann
On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann 
bgrundm...@janestreet.com wrote:

 I posted this on this mailing list before at Jane Street we have developed
 very fast code to get timing information based on TSC if available.  It's
 all ocaml but well documented and mostly just calls to c functions so
 should be easy to port to C and we release it under a very liberal license
 so it should be no problem to take the ideas:

 https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli

 Hope this is useful.

 Bene


Also I'm sorry for top posting.  hackers is the only mailing list I'm on
that requires this, and some others require top posting, so this runs counter
to my habits and I only realized after sending...




 On Wed, May 14, 2014 at 12:41 PM, Robert Haas robertmh...@gmail.comwrote:

 On Tue, May 13, 2014 at 11:34 PM, Greg Stark st...@mit.edu wrote:
  I always assumed the kernel used rdtsc to implement some of the high
  performance timers. It can save the current time in a mapped page when
  it schedules a process and then in the vdso syscall (ie in user-space)
  it can use rdtsc to calculate the offset needed to adjust that
  timestamp to the current time. This seems consistent with your
  calculations that showed the 40ns overhead with +/- 10ns precision.

 Crazy idea: Instead of trying to time precisely the amount of time we
 spend in each node, configure a very-high frequency timer interrupt
 (or background thread?) that does:

 SomeGlobalVariablePointingToTheCurrentNode-profiling_counter++;

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers





Re: [HACKERS] gettimeofday is at the end of its usefulness?

2014-05-15 Thread Benedikt Grundmann
On Thu, May 15, 2014 at 11:31 AM, Greg Stark st...@mit.edu wrote:

 On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann
 bgrundm...@janestreet.com wrote:
  I posted this on this mailing list before at Jane Street we have
 developed
  very fast code to get timing information based on TSC if available.  It's
  all ocaml but well documented and mostly just calls to c functions so
 should
  be easy to port to C and we release it under a very liberal license so it
  should be no problem to take the ideas:

 What OS do you run it on though? How fast is your implementation
 compared to the kernel implementation of clock_gettime()?

 Are you sure your implementation is actually faster? And are you sure
 you're protected against clocks going backwards? I think you should
 put some i/o in the loop in the test and start several threads running
 it to make it more likely the thread is rescheduled to a different
 processor during the test. It suspect you'll find the rdtsc goes
 backwards sometimes or produces crazy results when switching
 processors.


There are benchmarks in the link I posted (obtained by a micro-benchmarking
library we developed / use internally which takes great care to obtain
reliable numbers).  We use POSIX threads extensively.  We internally spend a
lot of time setting up ntp and monitoring systems so that clocks going
backwards never happens (in other words I wouldn't be surprised if the
library does NOT work correctly when it does -- our protection is outside).
I do not believe we have seen the rdtsc going backwards on the thread context
switch you mention (and as said we use lots of threads).  OS?  CentOS 6.5
primarily.







--
 greg



[HACKERS] How to do fast performance timing

2013-12-09 Thread Benedikt Grundmann
At Jane Street we have recently spent a lot of time trying to get a fast
gettimeofday.  I saw lots of references in various postgres hacker threads
related to the lack of such a facility, so ...

The culmination of those efforts can be read here:

https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli
and

https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.ml

It's all OCaml but the code is mostly imperative and very well documented.
In particular we made an effort to document our assumptions.  There are a
few which are OCaml-specific, but a lot of the lessons we have learned here
should be applicable to postgres.

Hope this will be useful,

Cheers,

Bene

PS: We are releasing our code under the Apache license so you should feel
free to reuse the ideas.


Re: [HACKERS] record identical operator

2013-09-13 Thread Benedikt Grundmann
On Thu, Sep 12, 2013 at 11:27 PM, Kevin Grittner kgri...@ymail.com wrote:

 Attached is a patch for a bit of infrastructure I believe to be
 necessary for correct behavior of REFRESH MATERIALIZED VIEW
 CONCURRENTLY as well as incremental maintenance of matviews.
 [...]
 The patch adds an identical operator (===) for the record type:


[...]

 The new operator is logically similar to IS NOT DISTINCT FROM for a
 record, although its implementation is very different.  For one
 thing, it doesn't replace the operation with column level operators
 in the parser.  For another thing, it doesn't look up operators for
 each type, so the identical operator does not need to be
 implemented for each type to use it as shown above.  It compares
 values byte-for-byte, after detoasting.  The test for identical
 records can avoid the detoasting altogether for any values with
 different lengths, and it stops when it finds the first column with
 a difference.

 I toyed with the idea of supporting hashing of records using this
 operator, but could not see how that would be a performance win.

 The identical (===) and not identical (!==) operator names were
 chosen because of a vague similarity to the exactly equals
 concepts in JavaScript and PHP, which use that name.  The semantics
 aren't quite the same, but it seemed close enough not to be too
 surprising.  The additional operator names seemed natural to me
 based on the first two, but I'm not really that attached to these
 names for the operators if someone has a better idea.

 Since the comparison of record values is not documented (only
 comparisons involving row value constructors), it doesn't seem like
 we should document this special case.  It is intended primarily for
 support of matview refresh and maintenance, and it seems likely
 that record comparison was not documented on the basis that it is
 intended primarily for support of such things as indexing and merge
 joins -- so leaving the new operators undocumented seems consistent
 with existing policy.  I'm open to arguments that the policy should
 change.

 -


Wouldn't it be slightly less surprising / magical to not declare new
operators but just new primitive functions?  In the least invasive version
they could even be called matview_is_record_identical or similar.
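
To make the distinction concrete: the cases the patch targets are values that
compare as equal but are not byte-for-byte identical, e.g. numeric values
with a different display scale.  A quick sketch (no patch needed to see the
"equal" half):

select '1.0'::numeric = '1.00'::numeric as equal,                      -- true
       '1.0'::numeric::text = '1.00'::numeric::text as identical_text; -- false
-- equal, yet stored and displayed differently: exactly the difference the
-- proposed byte-wise "identical" comparison is meant to detect for records.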

cheers,

Bene


Re: [HACKERS] Backup throttling

2013-08-27 Thread Benedikt Grundmann
On Tue, Aug 27, 2013 at 12:58 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Aug 20, 2013 at 2:37 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  Throttling in the client seems much better to me. TCP is designed to
 handle
  a slow client.

 Other people have already offered some good points in this area, but
 let me just add one thought that I don't think has been mentioned yet.
  We have a *general* need to be able to throttle server-side resource
 utilization, particularly I/O.  This is a problem not only for
 pg_basebackup, but for COPY, CLUSTER, VACUUM, and even things like
 UPDATE.  Of all of those, the only one for which we currently have any
 kind of a solution is VACUUM.  Now, maybe pg_basebackup also needs its
 own special-purpose solution, but I think we'd do well to consider a
 general I/O rate-limiting strategy and then consider particular needs
 in the light of that framework.  In that context, server-side seems
 better to me, because something like CLUSTER isn't going to produce
 anything that the client can effectively limit.


+1.  It is very easy at the moment to, for example, run a manual VACUUM
FULL / CLUSTER against a big table and generate WAL so quickly that the hot
standby disconnects because it gets too far behind.
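
For context, the VACUUM-only throttle Robert refers to above is the
cost-based delay machinery; a minimal sketch of slowing down a manual run,
with a made-up table name:

set vacuum_cost_delay = 20;      -- sleep 20ms whenever the cost budget is used up
set vacuum_cost_limit = 200;     -- cost budget accumulated between sleeps
vacuum verbose some_big_table;   -- some_big_table is hypothetical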


Re: [HACKERS] hardware donation

2013-07-19 Thread Benedikt Grundmann
The server is already turned off and in our NYC office (I'm based in the
LDN one).  But I'm pretty sure it's an LSI MegaRAID SAS 9285.


On Thu, Jul 18, 2013 at 11:58 PM, Greg Smith g...@2ndquadrant.com wrote:

 On 7/10/13 12:53 PM, Benedikt Grundmann wrote:

 The server will probably be most interesting for the disks in it.  That
 is where we spend the largest amount of time optimizing (for sequential
 scan speed in particular):
 22x600GB disks in a Raid6+0 (Raid0 of 2x 10disk raid 6 arrays) + 2 spare
 disks. Overall size 8.7 TB in that configuration.


 What is the RAID controller used in the server?  That doesn't impact the
 donation, I'm just trying to fit this one into my goals for finding useful
 community performance testing equipment.

 There are a good number of systems floating around the community with HP
 controllers--I have even one myself now--but we could use more LSI Logic
 and Adaptec based systems.

 --
 Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



[HACKERS] column b is of type X but expression is of type text

2013-07-12 Thread Benedikt Grundmann
A third party application we use generates SQL queries.  Here is a query it
generated that broke today and for which I have a hard time arguing that the
postgres behavior is correct (minimally, the error message is confusing):

=# create temporary table foo (b double precision );
CREATE TABLE
Time: 40.368 ms
=# insert into foo select min(NULL);
ERROR:  column b is of type double precision but expression is of type
text
LINE 1: insert into foo select min(NULL);
   ^
HINT:  You will need to rewrite or cast the expression.

So why does min(NULL) have type text?  According to the docs it has the type
of the input.  The value is itself NULL, which is a valid member of all types
in SQL, isn't it?

So what is going on?
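
For the archive: the rewrite the HINT asks for is just an explicit cast,
either on the NULL itself or on the aggregate's result; a sketch against the
foo table above:

insert into foo select min(NULL::double precision);
-- or
insert into foo select min(NULL)::double precision;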

Thanks,

Bene


Re: [HACKERS] column b is of type X but expression is of type text

2013-07-12 Thread Benedikt Grundmann
Thanks David,

I like the fact that postgres is explicit in its types.  All I'm arguing is
that the error message is misleading, and that I had a hard time
understanding why what happened happened.  The part I was missing is that
despite supporting an "any" type, the necessary type inference is very local
and quickly resorts to the default type.

thanks everyone,

Bene


On Fri, Jul 12, 2013 at 3:17 PM, David Johnston pol...@yahoo.com wrote:

 Benedikt Grundmann wrote
  A third party application we use generates SQL queries.  Here is query it
  generated that broke today and for which I have a hard time arguing that
  the postgres behavior is correct (minimally the error message is
  confusing):
 
  =# create temporary table foo (b double precision );
  CREATE TABLE
  Time: 40.368 ms
  =# insert into foo select min(NULL);
  ERROR:  column b is of type double precision but expression is of type
  text
  LINE 1: insert into foo select min(NULL);
 ^
  HINT:  You will need to rewrite or cast the expression.
 
  So why does min(NULL) have type text?  According to the docs it has the
  type of the input.  The value is itself NULL which is a valid member of
  all
  types in SQL isn't it?
 
  So what is going on?
 
  Thanks,
 
  Bene

 Ideally PostgreSQL would be smart enough to recognize that min(NULL) is
 of
 an unknown type and thus would use the definition of foo to coerce NULL
 to
 the desired type.  I cannot explain why it does not do this but from the
 example it cannot.

 Using a literal NULL without an explicit type-cast is not recommended as
 the
 system cannot always accurately figure out what type you mean for it to
 use.
 Being a valid value for all types does not mean it magically switches to
 fit
 whatever usage is required.  Columns are typed, not values per-se, and so
 NULL can belong in any column but once it is part of that column it takes
 on
 that column's type.

 The query you show is pretty pointless since the intent of min is to take
 a column over which to aggregate; not a literal which will only ever return
 itself.

 In short the SELECT query is trying its best to execute and so in the
 presence of an unadorned NULL - and being unable to infer the type from
 context - it simply uses the default type which is text.  The SELECT
 executes just fine, and outputs a min column of type text which when
 supplied to the table foo causes the type mis-match for column b on
 foo.

 The PostgreSQL behavior is simple because it does not infer the type of
 NULL from the column in foo but it is not required to do so its failure is
 not wrong.  The error message, given what does occur, makes perfect sense
 and is easy enough to trace (i.e., what column is feeding foo.b from the
 SELECT statement; then, why is that column being seen as text).

 PostgreSQL is in the opinion of some too verbose in its requirement to be
 explicit regarding types but it does make for less buggy code overall.
  This
 particular use-case may be solvable but I'd argue that your example is not
 likely to convince anyone that it is a serious enough problem worth the
 effort it would take to do so.

 David J.






 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/column-b-is-of-type-X-but-expression-is-of-type-text-tp5763586p5763587.html
 Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



[HACKERS] hardware donation

2013-07-10 Thread Benedikt Grundmann
Jane Street has a spare server we would like to donate to the postgres
community.  We originally planed to use it for one of our database clusters
and it matches exactly what we use in production at the moment.

Rough specs:
CPU: 8x Intel(R) Xeon(R) CPU   X5570  @ 2.93GHz
MEM: 48GB

The server will probably be most interesting for the disks in it.  That is
where we spend the largest amount of time optimizing (for sequential scan
speed in particular):

22x600GB disks in a Raid6+0 (Raid0 of 2x 10disk raid 6 arrays) + 2 spare
disks. Overall size 8.7 TB in that configuration.

Is this something the community would find useful?  If so, who is the right
person to talk to about shipping/delivery?

Cheers,

Bene


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Benedikt Grundmann
On Fri, Jun 14, 2013 at 10:11 AM, Samrat Revagade revagade.sam...@gmail.com
 wrote:

 Hello,


 We have already started a discussion on pgsql-hackers for the problem of
 taking fresh backup during the failback operation here is the link for that:




 http://www.postgresql.org/message-id/caf8q-gxg3pqtf71nvece-6ozraew5pwhk7yqtbjgwrfu513...@mail.gmail.com



 Let me again summarize the problem we are trying to address.



 When the master fails, last few WAL files may not reach the standby. But
 the master may have gone ahead and made changes to its local file system
 after flushing WAL to the local storage.  So master contains some file
 system level changes that standby does not have.  At this point, the data
 directory of master is ahead of standby's data directory.

 Subsequently, the standby will be promoted as new master.  Later when the
 old master wants to be a standby of the new master, it can't just join the
 setup since there is inconsistency in between these two servers. We need to
 take the fresh backup from the new master.  This can happen in both the
 synchronous as well as asynchronous replication.



 Fresh backup is also needed in case of clean switch-over because in the
 current HEAD, the master does not wait for the standby to receive all the
 WAL up to the shutdown checkpoint record before shutting down the
 connection. Fujii Masao has already submitted a patch to handle clean
 switch-over case, but the problem is still remaining for failback case.



 The process of taking fresh backup is very time consuming when databases
 are of very big sizes, say several TB's, and when the servers are connected
 over a relatively slower link.  This would break the service level
 agreement of disaster recovery system.  So there is need to improve the
 process of disaster recovery in PostgreSQL.  One way to achieve this is to
 maintain consistency between master and standby which helps to avoid need
 of fresh backup.



 So our proposal on this problem is that we must ensure that master should
 not make any file system level changes without confirming that the
 corresponding WAL record is replicated to the standby.



An alternative proposal (which will probably just reveal my lack of
understanding about what is or isn't possible with WAL): provide a way to
restart the master so that it rolls back the WAL changes that the slave
hasn't seen.

 There are many suggestions and objections pgsql-hackers about this problem
 The brief summary is as follows:





Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Benedikt Grundmann
On Fri, Jun 14, 2013 at 2:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Marko Kreen mark...@gmail.com writes:
  On Thu, Jun 13, 2013 at 12:35 PM, Dean Rasheed dean.a.rash...@gmail.com
 wrote:
  Attached is a patch implementing a new aggregate function md5_agg() to
  compute the aggregate MD5 sum across a number of rows.

  It's more efficient to calculate per-row md5, and then sum() them.
  This avoids the need for ORDER BY.

 Good point.  The aggregate md5 function also fails to distinguish the
 case where we have 'xyzzy' followed by 'xyz' in two adjacent rows
 from the case where they contain 'xyz' followed by 'zyxyz'.

 Now, as against that, you lose any sensitivity to the ordering of the
 values.

 Personally I'd be a bit inclined to xor the per-row md5's rather than
 sum them, but that's a small matter.

 regards, tom lane


xor works, but only if each row is different (i.e. at the very least all
columns together make a unique key).
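
For what it's worth, a sketch of the per-row-hash-then-aggregate approach
being discussed (some_table is a made-up name; the ('x' || ...)::bit(64)::bigint
dance just reinterprets the first 64 bits of the hex digest as an integer):

select sum(('x' || substr(md5(t::text), 1, 16))::bit(64)::bigint) as fingerprint
from some_table as t;
-- order-insensitive like sum(); unlike xor, two identical rows do not
-- cancel each other out, though they are still indistinguishable from
-- other collisions of the sum.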





 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-28 Thread Benedikt Grundmann
, Benedikt Grundmann
 bgrundm...@janestreet.com wrote:
  We are seeing these errors on a regular basis on the testing box now.
  We
  have even changed the backup script to
  shutdown the hot standby, take lvm snapshot, restart the hot standby,
 rsync
  the lvm snapshot.  It still happens.
 
  We have never seen this before we introduced the hot standby.  So we
 will
  now revert to taking the backups from lvm snapshots on the production
  database.  If you have ideas of what else we should try / what
 information
  we can give you to debug this let us know and we will try to so.
 
  Until then we will sadly operate on the assumption that the combination
 of
  hot standby and frozen snapshot backup of it is not production ready.

 I'm pretty suspicious that your backup procedure is messed up in some
 way.  The fact that you got invalid page headers is really difficult
 to attribute to a PostgreSQL bug.  A number of the other messages that
 you have posted also tend to indicate either corruption, or that WAL
 replay has stopped early.  It would be interesting to see the logs
 from when the clone was first started up, juxtaposed against the later
 WAL flush error messages.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company





Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-21 Thread Benedikt Grundmann
We are seeing these errors on a regular basis on the testing box now.  We
have even changed the backup script to shut down the hot standby, take the
LVM snapshot, restart the hot standby, and rsync the LVM snapshot.  It still
happens.

We have never seen this before we introduced the hot standby.  So we will
now revert to taking the backups from LVM snapshots on the production
database.  If you have ideas of what else we should try / what information
we can give you to debug this, let us know and we will try to do so.

Until then we will sadly operate on the assumption that the combination of
hot standby and frozen snapshot backup of it is not production ready.

Thanks,

Bene




On Thu, May 16, 2013 at 8:10 AM, David Powers dpow...@janestreet.comwrote:

 I'll try to get the primary upgraded over the weekend when we can afford a
 restart.

 In the meantime I have a single test showing that a shutdown, snapshot,
 restart produces a backup that passes the vacuum analyze test.  I'm going
 to run a full vacuum today.

 -David


 On Wed, May 15, 2013 at 3:53 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 On 15.05.2013 22:50, Benedikt Grundmann wrote:

 On Wed, May 15, 2013 at 2:50 PM, Heikki Linnakangashlinnakangas@**
 vmware.com hlinnakan...@vmware.com

 The subject says 9.2.3. Are you sure you're running 9.2.4 on all the

 servers? There was a fix to a bug related to starting a standby server
 from
 a filesystem snapshot. I don't think it was quite the case you have, but
 pretty close.


 So this is delightfully embarrassing I just went back to double check and

 - primary box is 9.2.3
 - standby is 9.2.4
 - testing is 9.2.4

 I guess that alone could possibly explain it?


 Hmm, no, it should still work. There haven't been any changes in the WAL
 format. I do recommend upgrading the primary, of course, but I don't really
 see how that would explain what you're seeing.

 - Heikki





Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-15 Thread Benedikt Grundmann
On Wed, May 15, 2013 at 2:50 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 15.05.2013 15:42, David Powers wrote:

 First, thanks for the replies.  This sort of thing is frustrating and hard
 to diagnose at a distance, and any help is appreciated.

 Here is some more background:

 We have 3 9.2.4 databases using the following setup:


 The subject says 9.2.3. Are you sure you're running 9.2.4 on all the
 servers? There was a fix to a bug related to starting a standby server from
 a filesystem snapshot. I don't think it was quite the case you have, but
 pretty close.


So this is delightfully embarrassing.  I just went back to double-check, and:

- primary box is 9.2.3
- standby is 9.2.4
- testing is 9.2.4

I guess that alone could possibly explain it?

Thanks,

Bene


[HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-14 Thread Benedikt Grundmann
Today we have seen this on our testing database instance:

ERROR:  could not open file base/16416/291498116.3 (target block 431006):
No such file or directory

That database gets created by rsyncing the LVM snapshot of the standby,
which is a read-only backup of proddb using streaming replication.

We do not put the standby database into backup mode before rsyncing the LVM
snapshot, trusting postgres' ability to recover from a crash.  Now we are not
sure anymore...  Thoughts?  It's worth noting that we have been using this
method of taking a backup for a long, long time, but only recently (after 9.2
came out) did we switch to doing a streaming replica and then taking the
frozen snapshot backup from the streaming replica.

Let me know if there is more information you need, or if this is a known
problem.

Thanks,

Bene


Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-14 Thread Benedikt Grundmann
It's on the production database and the streaming replica.  But not on the
snapshot.

production
-rw--- 1 postgres postgres 312778752 May 13 21:28
/database/postgres/base/16416/291498116.3

streaming replica
-rw--- 1 postgres postgres 312778752 May 13 23:50
/database/postgres/base/16416/291498116.3

Is there a way to find out what the file contains?

We just got some more information.  All of the following was done / seen in
the logs of the snapshot database.

After we saw this we ran a VACUUM FULL on the table we suspect to be backed
by this file.  This happened:


WARNING:  concurrent insert in progress within table js_equity_daily_diff


Over and over again.  So we killed the VACUUM full by hitting ctrl-c again
and again.  After that the logs contained:

fgrep ERROR postgresql-2013-05-14.csv  | head
2013-05-14 00:23:16.028 EDT,postgres_prod,proddb_testing,15698,
172.24.65.103:54536,5191bc31.3d52,3,SELECT,2013-05-14 00:23:13
EDT,2/3330,0,ERROR,58P01,could not open file base/16416/291498116.3
(target block 431006): No such file or directory,SQL statement
select max(equity_date) from js_equity_daily_diff

The above was the original error that made us investigate.

2013-05-14 09:21:47.121 EDT,postgres_prod,proddb_testing,21002,
172.27.41.24:36815,51923a67.520a,38756,VACUUM,2013-05-14 09:21:43
EDT,2/21611,645995272,ERROR,57014,canceling statement due to user
request,,vacuum full js_equity_daily_diff,,,psql
2013-05-14 09:22:04.700 EDT,postgres_prod,proddb_testing,21008,
172.27.41.24:36831,51923a75.5210,115909,VACUUM,2013-05-14 09:21:57
EDT,2/21618,645995273,ERROR,57014,canceling statement due to user
request,,vacuum full js_equity_daily_diff,,,psql

That was us cancelling the vacuum full.

2013-05-14 09:22:13.947 EDT,,,30911,,51919d78.78bf,1,,2013-05-13 22:12:08
EDT,,0,ERROR,XX000,xlog flush request 1D08/9B57FCD0 is not satisfied ---
flushed only to 1CEE/31266090,writing block 0 of relation
base/16416/291498116
2013-05-14 09:22:14.964 EDT,,,30911,,51919d78.78bf,2,,2013-05-13 22:12:08
EDT,,0,ERROR,XX000,xlog flush request 1D08/9B57FCD0 is not satisfied ---
flushed only to 1CEE/31266090,writing block 0 of relation
base/16416/291498116

And after that these started appearing in the logs (and they get repeated
every second now):

[root@nyc-dbc-001 pg_log]# fgrep ERROR postgresql-2013-05-14.csv  | tail -n
2
2013-05-14 09:47:43.301 EDT,,,30911,,51919d78.78bf,3010,,2013-05-13
22:12:08 EDT,,0,ERROR,XX000,xlog flush request 1D08/9B57FCD0 is not
satisfied --- flushed only to 1CEE/3C869588,writing block 0 of
relation base/16416/291498116
2013-05-14 09:47:44.317 EDT,,,30911,,51919d78.78bf,3012,,2013-05-13
22:12:08 EDT,,0,ERROR,XX000,xlog flush request 1D08/9B57FCD0 is not
satisfied --- flushed only to 1CEE/3C869588,writing block 0 of
relation base/16416/291498116

There are no earlier ERROR's in the logs.

2013-05-14 09:38:03.115 EDT,,,30911,,51919d78.78bf,1868,,2013-05-13
22:12:08 EDT,,0,ERROR,XX000,xlog flush request 1D08/9B57FCD0 is not
satisfied --- flushed only to 1CEE/3C869588,writing block 0 of
relation base/16416/291498116
2013-05-14 09:38:03.115 EDT,,,30911,,51919d78.78bf,1869,,2013-05-13
22:12:08 EDT,,0,WARNING,58030,could not write block 0 of
base/16416/291498116,Multiple failures --- write error might be
permanent.

The disk is not full nor are there any messages in the kernel logs.

Cheers,

Bene



On Tue, May 14, 2013 at 9:27 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 14.05.2013 14:57, Benedikt Grundmann wrote:

 Today we have seen this on our testing database instance:

 ERROR:  could not open file base/16416/291498116.3 (target block
 431006):
 No such file or directory

 That database get's created by rsyncing the LVM snapshot of the standby,
 which is a readonly backup of proddb
 using streaming replication.

 We do not put the standby database into backup mode before rsyncing the
 LVM
 snapshot, trusting postgres ability to recover
 from crash.  Now we are not sure anymore...  Thoughts?  It's worth noting
 that we have been using this method of taking a backup
 for a long long time, but we only recently (after 9.2 came out we switched
 to doing a streaming replica and then to take the
 frozen snapshot backup method from the streaming replica).

 Let me know if there is more information you need, or if this is a known
 problem.


 That certainly should work. I'd suggest that you narrow down the problem a
 bit more. Is the file present in the original system? Anything else that
 might be related in the logs?

 - Heikki



Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-14 Thread Benedikt Grundmann
That's one possible explanation.  It's worth noting that we never saw this
before moving to the streaming-rep-first setup, and we have been using that
snapshot method for a long time.


On Tue, May 14, 2013 at 11:34 AM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 14.05.2013 16:48, Benedikt Grundmann wrote:

 It's on the production database and the streaming replica.  But not on the
 snapshot.


 So, the LVM snapshot didn't work correctly?

 - Heikki



Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-14 Thread Benedikt Grundmann
I think my previous message wasn't clear enough: I do *NOT* think that the
LVM snapshot is the culprit.

However I cannot discount it as one of the possibilities.  But I have no
evidence in either /var/log/messages or in dmesg that the LVM snapshot went
into a bad state AND we have been using this method for a long time.

The only thing that is *new* is that we took the snapshot from the streaming
replica.  So again my best guess as of now is that if the database crashes
while it is in streaming standby, an invalid disk state can result during the
following startup (in rare and as of now unclear circumstances).

You seem to be quite convinced that it must be LVM; can you elaborate why?

Thanks,

Bene




On Tue, May 14, 2013 at 12:09 PM, Benedikt Grundmann 
bgrundm...@janestreet.com wrote:

 That's one possible explanation.  It's worth noting that we haven't seen
 this before moving to streaming rep first and we have been using that
 method for a long time.


 On Tue, May 14, 2013 at 11:34 AM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 On 14.05.2013 16:48, Benedikt Grundmann wrote:

 It's on the production database and the streaming replica.  But not on
 the
 snapshot.


 So, the LVM snapshot didn't work correctly?

 - Heikki





Re: [HACKERS] json api WIP patch

2013-02-04 Thread Benedikt Grundmann
On Mon, Feb 4, 2013 at 4:10 PM, Andrew Dunstan and...@dunslane.net wrote:


 On 02/04/2013 10:47 AM, Robert Haas wrote:


 The SQL standards considerations seem worth thinking about, too.
 We've certainly gone through a lot of pain working toward eliminating
  => as an operator name, and if the SQL standard has commandeered ->
 for some purpose or other, I'd really rather not add to the headaches
 involved should we ever decide to reclaim it.



 OK, but I'd like to know what is going to be safe. There's no way to
  future-proof the language. I'm quite prepared to replace -> with something
  else, and if I do then ->> will need to be adjusted accordingly, I think.

  My suggestion would be ~> and ~>>. I know David Wheeler didn't like that
 on the ground that some fonts elevate ~ rather than aligning it in the
 middle as most monospaced fonts do, but I'm tempted just to say then use a
  different font. Other possibilities that come to mind are +> and +>>,
 although I think they're less attractive. But I'll be guided by the
 consensus, assuming there is one ;-)

 As a user I would be much in favor of just functions and no additional
operators if the sole difference is syntactical.  I think custom operators
are much harder to remember than function names (assuming reasonably well
chosen function names).

Now Robert seems to suggest that there will also be a speed / planner
difference, which seems sad (I would have expected operators to be just
syntactic sugar for specially named functions, and once we are past the
parser there should be no difference).
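
To make the comparison concrete, the two spellings look roughly like this (a
sketch only; json_get is an illustrative name, not necessarily what the patch
provides, and the -> operator is the one under discussion):

-- operator spelling under discussion
SELECT '{"a": {"b": 1}}'::json -> 'a';

-- function-only spelling
SELECT json_get('{"a": {"b": 1}}'::json, 'a');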


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-01-09 Thread Benedikt Grundmann
On Wed, Jan 9, 2013 at 2:01 AM, Josh Berkus j...@agliodbs.com wrote:

 All,

  Well, the problem of finding out the box's physical RAM is doubtless
  solvable if we're willing to put enough sweat and tears into it, but
  I'm dubious that it's worth the trouble.  The harder part is how to know
  if the box is supposed to be dedicated to the database.  Bear in mind
  that the starting point of this debate was the idea that we're talking
  about an inexperienced DBA who doesn't know about any configuration knob
  we might provide for the purpose.

For what it is worth, even if it is a dedicated database box 75% might be
way too high.  I remember investigating bad performance on our biggest
database server that in the end turned out to be caused by a too-high setting
of effective_cache_size.  From reading the code back then, my rationale for it
being too high was that the code that makes use of effective_cache_size
tries very hard to account for what the current query would do to the cache,
but doesn't take into account how many queries (on separate datasets!) are
currently being executed (and competing for the same cache).  On that box
we often have 100+ active connections, many looking at different big
datasets.
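
For anyone wanting to reproduce the effect, effective_cache_size can be
changed per session, so the planner's choice can be re-checked under a
smaller assumed cache (a sketch only; the table name is made up):

SHOW effective_cache_size;
SET effective_cache_size = '4GB';   -- pretend each query only sees a slice of the cache
EXPLAIN SELECT * FROM some_big_table WHERE some_col = 42;
RESET effective_cache_size;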

Cheers,

bene


Re: [HACKERS] Improve compression speeds in pg_lzcompress.c

2013-01-08 Thread Benedikt Grundmann
 Personally, my biggest gripe about the way we do compression is that
 it's easy to detoast the same object lots of times.  More generally,
 our in-memory representation of user data values is pretty much a
 mirror of our on-disk representation, even when that leads to excess
 conversions.  Beyond what we do for TOAST, there's stuff like numeric
 where we not only toast but then post-process the results into yet
 another internal form before performing any calculations - and then of
 course we have to convert back before returning from the calculation
 functions.  And for things like XML, JSON, and hstore we have to
 repeatedly parse the string, every time someone wants to do anything
 with it.  Of course, solving this is a very hard problem, and not
 solving it isn't a reason not to have more compression options - but
 more compression options will not solve the problems that I personally
 have in this area, by and large.

At the risk of saying something totally obvious and stupid, as I haven't
looked at the actual representation: this sounds like a memoisation
problem.  In OCaml terms:

type 'a rep =
  | On_disk_rep of Byte_sequence.t
  | In_memory_rep of 'a

type 'a t = 'a rep ref

let get_mem_rep t converter =
  match !t with
  | On_disk_rep seq ->
    (* convert once, then cache the in-memory representation *)
    let res = converter seq in
    t := In_memory_rep res;
    res
  | In_memory_rep x -> x
;;

... (if you need the other direction, that's straightforward too) ...

Translating this into C is relatively straightforward if you have the
luxury of a fresh start and don't have to be super efficient:

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
  rep_kind_t rep_kind;
  union {
    char *on_disk;
    void *in_memory;
  } rep;
} t;

void *get_mem_rep(t *t, void *(*converter)(char *)) {
  void *res;
  switch (t->rep_kind) {
    case ON_DISK_REP:
      /* convert once, then cache the in-memory representation */
      res = converter(t->rep.on_disk);
      t->rep.in_memory = res;
      t->rep_kind = IN_MEMORY_REP;
      return res;
    case IN_MEMORY_REP:
      return t->rep.in_memory;
  }
  return NULL; /* not reached */
}

Now of course fitting this into the existing types and ensuring that there
is neither too-early freeing of memory nor memory leaks or other bugs is
probably a nightmare, and presumably why you said that this is a hard problem.

Cheers,

Bene


Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-08-29 Thread Benedikt Grundmann
On Tue, Aug 28, 2012 at 9:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
  On Tue, Aug 28, 2012 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Oh, I'd forgotten that worked that way.  Frankly, that makes me quite a
  bit more concerned about this proposal than I was before.  I do *not*
  want to re-introduce silent cross-category casts to text, not even if
  there's no other way to match the function/operator.  I think that hack
  was/is tolerable for actual assignment to a table column, because there
  is very little chance that the semantics of such an assignment will come
  out differently than the user expected.

  Well, I think that when there is only one LPAD function, there is also
  very little chance that the results will come out differently than the
  user expected.

 [ shrug... ]  I'm having a hard time resisting the temptation to point
 out that there are two.  The real point here though is that the proposed
 behavior change will affect all functions, not only the cases where you
 think there is only one sane behavior.  And features such as search paths
 and default parameters frequently mean that there are more potential
 matches than the user thought of while writing the query.

 In the end, SQL is a fairly strongly typed language, especially in our
 manifestation of it.  I don't think we should give that up, especially
 not for benefits as dubious as not having to write a cast to make it
 clear that yes you really do want a timestamp to be treated as text.
 IMO, saving people from the errors that inevitably arise from that sort
 of sloppy thinking is a benefit, not a cost, of having a typed language.

 regards, tom lane

+a very big number

I remember the pain we had when we upgraded from 8.1 to 8.4, but I also
distinctly remember that after the upgrade I was a little bit more
confident that our SQL code does the right thing.  But we are an OCaml shop;
if there is one thing we believe in with ferocity it is that a STRICT type
checker is a good thing (TM).  You pay a little verbosity tax, but in return
all the stupid little obvious bugs get caught, and maybe even more
importantly, when you later change your types you are forced to reconsider
all cases where you used a value of the (now different) type (and that is
A VERY GOOD THING in a big code base).  Admittedly we are not there yet in
Postgres, as functions are only (re)checked upon execution.

My 2cents,

Bene


[HACKERS] postgres 8.4.9: running out of memory (malloc fails) when running a transaction that runs a LOT of selects

2012-07-20 Thread Benedikt Grundmann
We yesterday encountered a program that in a degenerate case issued a
huge number of selects in a single transaction (each select in a separate
call to PGExec) (huge = ~400,000).

That transaction would continue to eat memory up until a point where
calls to malloc (in aset.c) would fail and log for example:

,out of memory,Failed on request of size 11.

Both from that transaction and random others.  An additional observation
of interest to us was that while both VIRT and RES were growing, VIRT was
always roughly double RES.  Which seems to indicate that whatever
allocations were done, not all of the memory allocated was actually
touched (the server is a CentOS 6 box).

So I have two questions:

   - Is that expected behaviour?  The transaction was
 in READ COMMITTED mode, and my best guess is that this implies
 that some snapshot is taken before each subselect and all
 of them are only freed once the transaction is finished

   - Any recommendations on the use of overcommit?  We had it
 disabled on the assumption that with overcommit the OOM
 killer might kill a random process and that it is better
 to instead have a process that is actually allocating fail
 (and on the additional assumption that any memory allocated
 by postgres would actually be touched).

This is not a huge production issue for us as we can fix the
program to no longer issue huge numbers of selects.  But we
would like to understand.

Thank you very much,

Bene

PS:
The machine has huge amounts of memory and during normal operations
it looks like this:

-bash-4.1$ cat /proc/meminfo
MemTotal:   49413544 kB
MemFree: 1547604 kB
Buffers:5808 kB
Cached: 43777988 kB
SwapCached:0 kB
Active: 18794732 kB
Inactive:   27309980 kB
Active(anon):   13786796 kB
Inactive(anon):  1411296 kB
Active(file):5007936 kB
Inactive(file): 25898684 kB
Unevictable:   71928 kB
Mlocked:   55576 kB
SwapTotal:   2047992 kB
SwapFree:2047992 kB
Dirty: 12100 kB
Writeback: 79684 kB
AnonPages:   2393372 kB
Mapped: 12887392 kB
Shmem:  12871676 kB
Slab:1050468 kB
SReclaimable: 190068 kB
SUnreclaim:   860400 kB
KernelStack:4832 kB
PageTables:   450584 kB
NFS_Unstable:  23312 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:26754764 kB
Committed_AS:   17394312 kB
VmallocTotal:   34359738367 kB
VmallocUsed:  120472 kB
VmallocChunk:   34333956900 kB
HardwareCorrupted: 0 kB
AnonHugePages:   1599488 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
DirectMap4k:5604 kB
DirectMap2M: 2078720 kB
DirectMap1G:48234496 kB



Re: [HACKERS] postgres 8.4.9: running out of memory (malloc fails) when running a transaction that runs a LOT of selects

2012-07-20 Thread Benedikt Grundmann
First of all thanks to everyone who has replied so far.

On Fri, Jul 20, 2012 at 10:04 AM, Andres Freund and...@2ndquadrant.com wrote:

 Hi,

 On Friday, July 20, 2012 09:19:31 AM Benedikt Grundmann wrote:
  We yesterday encountered a program that in a degenerate case
  issued in a single transaction a huge number of selects (in a
  single transaction but each select in a separate call to PGExec)
  (huge = ~ 400,000).

  That transaction would continue to eat memory up until a point
  where calls to malloc (in aset.c) would fail and log for example:

  ,out of memory,Failed on request of size 11.
 Could you show us the different statements you're running in that transaction?

They all look like this:

DECLARE sqmlcursor51587 CURSOR FOR select
entry_time,source,bad_fields,isin,sedol,cusip,bloomberg,reuters,exchange_code,currency,description,bbg_instrument_type,instrument_type,specifics,definer,primary_exchange,is_primary_security,is_primary_listing,tags,bloomberg_id,status
from vw_instruments_v7 where jane_symbol = E'FOO BAR' and true and
effective_until = (select max(effective_until) from
vw_instruments_v7)

Sorry, I imagine that the fact that this generates a cursor every time
is important, but it had honestly escaped my attention, because the library
we use to query the database uses CURSORs for basically every select, so
that it can process the data in batches (in this particular case that is
conceptually unnecessary, as the query will only return one row, but the
library does not know that).
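
For reference, the pattern the library follows is roughly this (a sketch;
cursor and table names are made up), and the CLOSE at the end is the part
that matters for memory:

BEGIN;
DECLARE sqmlcursor1 CURSOR FOR SELECT * FROM some_table;
FETCH FORWARD 1000 FROM sqmlcursor1;   -- process rows in batches
FETCH FORWARD 1000 FROM sqmlcursor1;
CLOSE sqmlcursor1;                     -- without this the portal lives until COMMIT
COMMIT;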

 Are you using any user defined functions, deferred foreign keys, or anything
 like that?

No.  There are several versions of the above view, and while the one
mentioned above contains calls to regexp_replace, I can reproduce
the same behaviour with a different version of the view that just
renames columns of the underlying table.

 After you've got that out of memory message, the log should show
 a list of memory contexts with the amount of memory allocated in
 each. Could you include that in a mail?

We are using csv logging, which threw me off for a moment because I couldn't
find it in there.  But indeed in the .log file I see memory contexts, but
I don't know how to correlate them.

I assume they only get written when out of memory happens, so I have included
below the very first one.

TopMemoryContext: 268528136 total in 31 blocks; 37640 free (160
chunks); 268490496 used
  TopTransactionContext: 24576 total in 2 blocks; 14872 free (4
chunks); 9704 used
  Local Buffer Lookup Table: 2088960 total in 8 blocks; 234944 free
(25 chunks); 1854016 used
  Type information cache: 24576 total in 2 blocks; 11888 free (5
chunks); 12688 used
  Operator lookup cache: 24576 total in 2 blocks; 11888 free (5
chunks); 12688 used
  PL/PgSQL function context: 24576 total in 2 blocks; 14384 free (14
chunks); 10192 used
  CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0
chunks); 6512 used
  PLpgSQL function cache: 24520 total in 2 blocks; 3744 free (0
chunks); 20776 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  MessageContext: 131072 total in 5 blocks; 54792 free (5 chunks); 76280 used
  smgr relation table: 24576 total in 2 blocks; 5696 free (4 chunks); 18880 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0
chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (1 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 848 free (0 chunks); 176 used
  ExecutorState: 65600 total in 4 blocks; 33792 free (18 chunks); 31808 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 24576 total in 2 blocks; 11792 free (3 chunks); 12784 used
  CacheMemoryContext: 1342128 total in 21 blocks; 290016 free (11
chunks); 1052112 used
idx_raw_cfd_bear_commodity_position_records_position_date: 2048
total in 1 blocks; 752 free (0 chunks); 1296 used
CachedPlan: 3072 total in 2 blocks; 2008 free (2 chunks); 1064 used
CachedPlanSource: 3072 total in 2 blocks; 1656 free (0 chunks); 1416 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 3072 total in 2 blocks; 1984 free (1 chunks); 1088 used
CachedPlanSource: 3072 total in 2 blocks; 1696 free (0 chunks); 1376 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
CachedPlanSource: 3072 total in 2 blocks; 1584 free (0 chunks); 1488 used
SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
CachedPlan: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
CachedPlanSource: 3072 total in 2 blocks; 1960 free (3 chunks); 1112 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 1024 total in 1 blocks; 304 free (0 chunks); 720 used

Re: [HACKERS] postgres 8.4.9: running out of memory (malloc fails) when running a transaction that runs a LOT of selects

2012-07-20 Thread Benedikt Grundmann
On Fri, Jul 20, 2012 at 9:10 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 20.07.2012 10:19, Benedikt Grundmann wrote:

 We yesterday encountered a program that in a degenerate case issued in
 a single transaction a huge number of selects (in a single transaction
 but each select in a separate call to PGExec) (huge = ~ 400,000).

 That transaction would continue to eat memory up until a point where
 calls to malloc (in aset.c) would fail and log for example:

 ,out of memory,Failed on request of size 11.

 ...


 - Is that expected behaviour?  The transaction was
   in READ COMMITTED mode, and my best guess is that this implies
   that some snapshot is taken before each subselect and all
   of them are only freed once the transaction is finished


 In more complicated scenarios, with e.g subtransactions, it's normal that
 there's some growth in memory usage like that. But with simple SELECTs, I
 don't think there should be.

 Can you write a simple self-contained test script that reproduces this? That
 would make it easier to see where the memory is going.

Assuming it isn't now obvious (now that I have realized that we generate a
cursor every time), I will give it a shot.

 PS, you should upgrade, the latest minor version in 8.4.x series is 8.4.12.
 If possible, upgrading to a more recent major version would be a good idea
 too. I don't know if it will help with this problem, but it might..

We are in fact automatically doing an upgrade in testing to 9.1 every day
at the moment.  We plan to pull the trigger in production in a few weeks.

Thanks,

Bene



Re: [HACKERS] postgres 8.4.9: running out of memory (malloc fails) when running a transaction that runs a LOT of selects

2012-07-20 Thread Benedikt Grundmann
On Fri, Jul 20, 2012 at 10:46 AM, Benedikt Grundmann
bgrundm...@janestreet.com wrote:

 DECLARE sqmlcursor51587 CURSOR FOR select
 entry_time,source,bad_fields,isin,sedol,cusip,bloomberg,reuters,exchange_code,currency,description,bbg_instrument_type,instrument_type,specifics,definer,primary_exchange,is_primary_security,is_primary_listing,tags,bloomberg_id,status
 from vw_instruments_v7 where jane_symbol = E'FOO BAR' and true and
 effective_until = (select max(effective_until) from
 vw_instruments_v7)

 Sorry I imagine that the fact that this generates a cursor every time
 is important
 but it had honestly escaped my attention, because the library we use to query
 the database uses CURSORs basically for every select, so that it can process
 the data in batches (in this particular case that is conceptually unnecessary 
 as
 the query will only return one row, but the library does not know that).

Actually I believe this must be it.  I just went back and checked the library,
and it does not CLOSE the cursors.  This is normally not a problem, as most
transactions we run issue only one or two queries...  I'll patch the library
to CLOSE the cursor when all the data has been delivered and test whether the
error still happens.

I also noticed just now that all TopMemoryContexts after the first one
look significantly different.  They contain large PortalMemory sections.
Are those related to cursors?

 TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0
chunks); 32 used
  Portal hash: 8380416 total in 10 blocks; 3345088 free (34 chunks);
5035328 used
  PortalMemory: 16769024 total in 11 blocks; 2737280 free (15 chunks);
14031744 used
PortalHeapMemory: 56320 total in 9 blocks; 4320 free (0 chunks); 52000 used
  ExecutorState: 57344 total in 3 blocks; 15248 free (3 chunks); 42096 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
PortalHeapMemory: 56320 total in 9 blocks; 4320 free (0 chunks); 52000 used
  ExecutorState: 57344 total in 3 blocks; 15248 free (3 chunks); 42096 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
...


Thanks everyone,

Bene



Re: [HACKERS] postgres 8.4.9: running out of memory (malloc fails) when running a transaction that runs a LOT of selects

2012-07-20 Thread Benedikt Grundmann
On Fri, Jul 20, 2012 at 10:56 AM, Benedikt Grundmann
bgrundm...@janestreet.com wrote:
 On Fri, Jul 20, 2012 at 10:46 AM, Benedikt Grundmann
 bgrundm...@janestreet.com wrote:

 Actually I believe this must be it.  I just went back and checked the library
 and it does not CLOSE the cursors.  This is normally not a problem as most
 transactions we have run one or two queries only...  I'll patch the library
 to CLOSE the cursor when all the data has been delivered and test if the
 error does not happen then.

Just to confirm: that indeed fixed it.  Sorry for the noise.

Bene



Re: [HACKERS] Schema version management

2012-05-21 Thread Benedikt Grundmann
On Mon, May 21, 2012 at 5:03 AM, Joel Jacobson j...@trustly.com wrote:

 http://archives.postgresql.org/pgsql-hackers/2010-12/msg02301.php
 The initial feedback was on the usage of OIDs as file names.
 This was indeed a bad idea and was changed, see
 http://archives.postgresql.org/pgsql-hackers/2010-12/msg02318.php
 Gurjeet Singh pointed out the problem with functions sharing the same
 name but having different arguments.
 As of now, it's not certain they will always be dumped into the same files.
 This is a valid point, and needs to be solved in an elegant way.
 The arguments needs to be made part of the path somehow.

This is interesting: at Jane Street we actually have a small tool
that parses the output of pg_dump (well, it applies a set of regular
expressions plus a little bit of guesswork).  We use this to do three things,
all of which I would love to see supported by the postgres tool chain proper:

1) split the output into one file per thing (basically as per this
   thread) each file named type_name_running-integer for use
   with a VCS.  So if we have an overloaded function foo we end up with
   several function_foo_1.sql function_foo_2.sql ...  The order of the
   enumeration is just the order the functions occurred in the pg_dump
   which seems to be stable and therefore good enough.

2) extract a patch.  You give the tool the name of one or more roots
   (e.g. a table or set of tables you want to modify).  It finds all
   things that depend on it (well sort of just turn the body of each
   definition into a list of words and a depends on b if the name of b
   occurs in a).  Do a topological sort (if there are cycles because
   of the hack dependency check break them but continue and produce a
   warning). Output a file that first drops the definitions in inverse
   dependency order and then recreates them (in dependency order).
   The file starts with a begin but does NOT end with a commit so you
   are forced to enter it yourself.

   This tool is fantastic if you have a big set of plpgsql functions as
   it is otherwise hard to make sure that you have modified all places
   when refactoring, changing a column, etc...

3) Find all leaves.  E.g. do the topsort on the whole pg_dump and list
   the names of all things nothing depends on.  This is mostly useful if
   you want to make sure that you are not accumulating cruft that isn't
   used by anything.  Of course you separately need a list or knowledge
   about the entry points of your application(s).

Cheers,

Bene



Re: [HACKERS] Finer Extension dependencies

2012-03-29 Thread Benedikt Grundmann
On Thu, Mar 29, 2012 at 1:01 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 I gather from previous posts that the intent isn't to allow different
 packages from different authors to provide a common and compatible
 feature; but what happens in the current design if someone
 accidentally or maliciously produces an extension which provides the
 same feature name as another extension?

 Would we need some registry?

A good (documented) convention should make that unnecessary, such
as:

packagename.feature

for example:

provides hstore.populate_record

Or something along those lines.  Or maybe even prefix it with a Java-style
inverse URL of the package's author.

Cheers,

Bene



Re: [HACKERS] psql tab completion for SELECT

2012-02-10 Thread Benedikt Grundmann
On 10/02/12 08:50, Robert Haas wrote:
 On Fri, Feb 10, 2012 at 1:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Peter Eisentraut pete...@gmx.net writes:
  That seems pretty useful, and it's more or less a one-line change, as in
  the attached patch.
 
  That seems pretty nearly entirely bogus.  What is the argument for
  supposing that the word right after SELECT is a function name?  I would
  think it would be a column name (from who-knows-what table) much more
  often.
 
 It isn't necessarily, but it might be.  It'd certainly be nice to type:
 
  SELECT pg_si<TAB>
 
 and get:
 
 SELECT pg_size_pretty(
 

Well the important problem in completion is how the dictionary of
possible terms is determined and how much context info is used to
reduce / enhance that dictionary.

case 1:

  select x<TAB>

case 2:

  select x<TAB> from bar

Possible dictionaries in case 1:

  1.1 complete nothing
  1.2 complete assuming the union of all columns from all tables 
  in the search_path as the dictionary.
  1.3 complete assuming the union of all function names in the
  search_path as the dictionary
  1.4 complete assuming 1.2 and 1.3

Possible dictionaries in case 2:

  2.1 treat it like case 1
  2.2 complete assuming the union of all columns from bar 
  in the search_path as the dictionary
  2.3   2.2 + 1.3

Now these are heuristics and the decision becomes a question
of usefulness vs amount of time it costs to implement and 
possibly how expensive computing the dictionary is.

I personally would like 1.3 in case 1 and 2.3 in case 2,
because I expect 1.2 to show too many possible
completions to be useful.  But even 1.3 in both cases wouldn't
be that bad.
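
For what it is worth, dictionary 1.3 is cheap to build from the catalogs;
something along these lines would do (a sketch, not the query psql actually
issues):

SELECT DISTINCT proname
FROM pg_catalog.pg_proc
WHERE proname LIKE 'pg_si%'
ORDER BY proname;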

Cheers,

Bene



Re: [HACKERS] random_page_cost vs seq_page_cost

2012-02-08 Thread Benedikt Grundmann
On 07/02/12 19:58, Bruce Momjian wrote:
 On Tue, Feb 07, 2012 at 05:06:18PM -0500, Greg Smith wrote:
  On 02/07/2012 03:23 PM, Bruce Momjian wrote:
  Where did you see that there will be an improvement in the 9.2
  documentation?  I don't see an improvement.
  
  I commented that I'm hoping for an improvement in the documentation
  of how much timing overhead impacts attempts to measure this area
  better.  That's from the add timing of buffer I/O requests feature
  submission.  I'm not sure if Bene read too much into that or not; I
  didn't mean to imply that the docs around random_page_cost have
  gotten better.

I guess I did.  But I'm very glad that as a side effect Bruce and Greg 
have improved it ;-)



Re: [HACKERS] Vacuum rate limit in KBps

2012-01-23 Thread Benedikt Grundmann
On 19/01/12 17:39, Greg Smith wrote:
 On 1/19/12 1:10 PM, Robert Haas wrote:
 I have to say that I find that intensely counterintuitive.  The
 current settings are not entirely easy to tune correctly, but at least
 they're easy to explain.
 
 If there's anyone out there who has run a larger PostgreSQL database
 and not at some point been extremely frustrated with how the current
 VACUUM settings are controlled, please speak up and say I'm wrong
 about this. I thought it was well understood the UI was near unusably
 bad, it just wasn't obvious what to do about it.
 
We are frustrated, but mostly our frustration is not about the
somewhat inscrutable knobs but about the inscrutable meters, or lack
thereof.

Postgres vacuuming and analyzing (auto or manual, for that matter)
is essentially a performance tuning problem without a good way to
measure the current performance; the fact that the knobs to turn
are confusing as well is secondary.

What I think is missing is a clear way to know if you are vacuuming 
(and analyzing) enough, and how much you are paying for that.  

At the moment we are basically changing the knobs blindly based on
some back-of-the-envelope calculations and hearsay.  Then sometimes
months later we find out that oops, we haven't been analyzing enough
and that's why on that particular table the planner is now picking
a bad plan.

What I want is for that page

http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html

to start with "Here is how you know if you are vacuuming enough ..."

In an ideal world one would like some meter in a statistics table
or similar that returns a percentage: 100% means just enough, 50%
means you have to double, 150% means 50% too much (i.e. wasted)...
But I could do with a boolean as well.  A complicated extension,
or the recommendation to install 3 different extensions, would
be better than what is there right now, but only very barely.  Of
course a meter wouldn't tell you that if traffic doubled you would
still keep up; for that you need a complicated calculation (or
you just keep looking at the meter and adjust).

But at the moment there is no such meter (at least I don't know
of one) and that is the actual problem.
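
The raw ingredients for such a meter do exist in the statistics views; a
very rough proxy (a sketch only) is to look at accumulated dead tuples and
the last vacuum/analyze times per table:

SELECT relname,
       n_live_tup,
       n_dead_tup,          -- estimated dead tuples not yet cleaned up
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;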

My 2cents,

Bene




Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-11 Thread Benedikt Grundmann
(replying just to you)
On 10/01/12 15:22, Greg Smith wrote:
 On 1/5/12 5:04 AM, Benedikt Grundmann wrote:
 That sort of thing is one reason why all attempts so far to set
 random_page_cost based on physical characteristics haven't gone
 anywhere useful.  The setting is sort of overloaded right now, it's a
 fuzzy mix of true random seek cost blended with some notion of cache
 percentage. Trying to bring some measurements to bear on it is a less
 effective approach than what people actually do here.  Monitor the
 profile of query execution, change the value, see what happens.  Use
 that as feedback for what direction to keep going; repeat until
 you're just spinning with no improvements.
 
Thank you very much for the reply, it is very interesting.  I'm
excited to hear that documentation in that area will improve in
9.2.  It's interesting: postgres has remarkably good documentation,
but it is a sufficiently complex system that to actually tune the
knobs provided sensibly you have to understand quite a lot about
what is going on.  A colleague of mine likes to say
"all abstractions leak", which seems very appropriate in this case.

 We are not sure if the database used to choose differently
 before the move to the new hardware and the hardware is
 performing worse for random seeks.  Or if the planner is
 now making different choices.
 
 I don't recommend ever deploying new hardware without first doing
 some low-level benchmarks to validate its performance.  Once stuff
 goes into production, you can't do that anymore.  See
 http://www.2ndquadrant.com/en/talks/ for my hardware benchmarking
 talks if you'd like some ideas on what to collect.
 
We had actually done lots of tests on the sequential read performance.
But you are right we could have done better (and I'll definitely read
through your talks).

Did you see my follow up?  Based on the feedback we did further tests
and it is now clear that neither the hardware nor the database version
is at fault.  A different plan is chosen by both the new and the old
database version if spun up on the database as it is right now.

Our best guess is that the CLUSTERs we ran after we had moved to the
hardware (it having more disk space and faster sequential I/O made that
possible) changed the planner's perception of how the joins will
perform in relation to each other.

Cheers,

Bene



Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-11 Thread Benedikt Grundmann
On 11/01/12 08:26, Benedikt Grundmann wrote:
 (replying just to you)
Clearly I didn't.  Sigh. Getting myself a coffee now.



Re: [HACKERS] Page Checksums

2012-01-10 Thread Benedikt Grundmann
On 10/01/12 09:07, Simon Riggs wrote:
  You can repeat that argument ad infinitum. Even if the CRC covers all the
  pages in the OS buffer cache, it still doesn't cover the pages in the
  shared_buffers, CPU caches, in-transit from one memory bank to another etc.
  You have to draw the line somewhere, and it seems reasonable to draw it
  where the data moves between long-term storage, ie. disk, and RAM.
 
 We protect each change with a CRC when we write WAL, so doing the same
 thing doesn't sound entirely unreasonable, especially if your database
 fits in RAM and we aren't likely to be doing I/O anytime soon. The
 long term storage argument may no longer apply in a world with very
 large memory.
 
I'm not so sure about that.  The experience we have is that storage
and memory don't grow as fast as demand.  Maybe we are in a minority,
but at Jane Street memory size < database size is sadly true for most
of the important databases.

Concretely, the two most important databases are 

715 GB

and

473 GB 

in size (the second used to be much closer to the first one in size but
we recently archived a lot of data).

In both databases there is a small set of tables that use the majority of
the disk space.  Those tables are also the most used tables.  Typically
the size of one of those tables is between 1-3x size of memory.  And the
cumulative size of all indices on the table is normally roughly the same
size as the table.

Cheers,

Bene



Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-09 Thread Benedikt Grundmann
On 07/01/12 23:01, Peter Eisentraut wrote:
 On tor, 2012-01-05 at 10:04 +, Benedikt Grundmann wrote:
  We have recently upgrade two of our biggest postgres databases 
  to new hardware and minor version number bump (8.4.5 - 8.4.9).
  
  We are experiencing a big performance regression in some queries.
  In those cases the planner seems to choose a nested loop index
  scan instead of hashing the index once and then joining.
 
 There was a planner regression introduced in version 8.4.8, which was
 thought to be fixed in 8.4.9.  Maybe you got caught by that.  See
 
 Message-Id: 760c0206-b5f4-4dc6-9296-b7a730b7f...@silentmedia.com
 
 for some information.  Check if your queries match that pattern.

Good idea.  But that is not it.  We checked by using 8.4.5 on
the new hardware (and the new database) which produced the same
(bad) plans as 8.4.10 (with both the old and the new postgres config).

We are again speculating that it might be:
  For some of those tables we have also recently (as part
  of the move) clustered for the first time in ages and it was
  speculated that that might have changed statistics (such
  as correlation) and increased the attractiveness of the
  index scan to the planner.

Is that possible?  If so what is the best way to prove / disprove
this theory? And ideally if true what knobs are available to tune
this?
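
For reference, the correlation the planner sees is visible in pg_stats, so
it can at least be compared before and after a CLUSTER (a sketch; the table
name is a placeholder).  Values close to 1.0 make index scans look much
cheaper:

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'your_big_table'
ORDER BY attname;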

Thanks,

Bene



[HACKERS] random_page_cost vs seq_page_cost

2012-01-05 Thread Benedikt Grundmann
Hello list,

I have a question of how to benchmark hardware to determine
the appropriate ratio of seq_page_cost vs random_page_cost.

Emails in this mailing list's archive seem to indicate that
1.0 vs 3.0 - 4.0 are appropriate values on modern hardware.

Which surprised me a bit, as I had thought that on actual
hard drives (ignoring SSDs) random_page_cost is higher.
I guess that the number tries to reflect caching of the
relevant pages in memory, and on modern hardware you have
more of that?

Anyhow it would be great to have a scientific way of
setting those numbers.

Background:

We have recently upgrade two of our biggest postgres databases 
to new hardware and minor version number bump (8.4.5 - 8.4.9).

We are experiencing a big performance regression in some queries.
In those cases the planner seems to choose a nested loop index
scan instead of hashing the index once and then joining.

The new hardware was optimized for seq scans and does those
very fast.  

We are not sure if the database used to choose differently
before the move to the new hardware and the hardware is 
performing worse for random seeks.  Or if the planner is
now making different choices.

As a counter measure we are experimenting with 
enable_nestloop = off
random_page_cost = 20 (instead of the previous 4).
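
Both settings can also be changed per session, which makes it easy to
compare plans for a single problem query before touching postgresql.conf
(a sketch, with a placeholder query):

SET enable_nestloop = off;
SET random_page_cost = 20;
EXPLAIN ANALYZE SELECT count(*) FROM big_a JOIN big_b USING (id);
RESET enable_nestloop;
RESET random_page_cost;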

It is worth noting that for many small tables the nestloop
is indeed marginally faster (in the "doesn't really matter
because both cases are fast enough" case).  But the regression
for the big tables (big in the sense that the index just fits into
memory, but in practice might not because there are other frequently
accessed big tables) is a show stopper.

For some of those tables we have also recently (as part
of the move) clustered for the first time in ages and it was
speculated that that might have changed statistics (such
as correlation) and increased the attractiveness of the 
index scan to the planner.

Another thing that I have thought before might provide
some enlightenment would be a
explain log select ...

That would show all the sub plans considered and why they 
were discarded or something approximating this.

Thanks in advance for any reply, and sorry that this email
turned out to be a rather long stream-of-consciousness dump.

Bene

-- relevant parts of our postgresql.conf ---

shared_buffers = 12GB   # min 128kB
# (change requires restart)
temp_buffers = 512MB# min 800kB
#max_prepared_transactions = 0  # zero disables the feature
# (change requires restart)
# Note:  Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
work_mem = 192MB# min 64kB
maintenance_work_mem = 1GB  # min 1MB
#max_stack_depth = 2MB  # min 100kB

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
# (change requires restart)
#shared_preload_libraries = ''  # (change requires restart)

# - Cost-Based Vacuum Delay -

vacuum_cost_delay = 100ms   # 0-100 milliseconds
vacuum_cost_page_hit = 1# 0-10000 credits
vacuum_cost_page_miss = 10  # 0-10000 credits
vacuum_cost_page_dirty = 20 # 0-10000 credits
vacuum_cost_limit = 7500# 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100# 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0  # 0-10.0 multiplier on buffers scanned/round

# - Asynchronous Behavior -

effective_io_concurrency = 40# 1-1000. 0 disables prefetching

# WRITE AHEAD LOG

fsync = on  # turns forced synchronization on or off
synchronous_commit = off# immediate fsync at commit
#wal_sync_method = fsync# the default is the first option 
# supported by the operating system:
#   open_datasync
#   fdatasync
#   fsync
#   fsync_writethrough
#   open_sync
full_page_writes = on   # recover from partial page writes
wal_buffers = 16MB  # min 32kB
# (change requires restart)
#wal_writer_delay = 200ms   # 1-10000 milliseconds

commit_delay = 1000 # range 0-100000, in microseconds
commit_siblings = 5 # range 1-1000

# - Checkpoints -

checkpoint_segments = 128   # in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min  # range 30s-1h
checkpoint_completion_target = 0.9  # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 30s# 0 disables

# - Archiving -

archive_mode = off  # allows 

Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-05 Thread Benedikt Grundmann
On 05/01/12 10:04, Benedikt Grundmann wrote:
 
 As a counter measure we are experimenting with 
 enable_nestloop = off
 random_page_cost = 20 (instead of the previous 4).
 
For what it is worth we had to revert the enable_nestloop = off 
change.  It just moved the pain around by making other queries
perform much worse than before. 



Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Benedikt Grundmann
For what it's worth, here are the numbers on one of our biggest databases
(same system as I posted about separately wrt seq_page_cost vs
random_page_cost).


0053 1001
00BA 1009
0055 1001
00B9 1020
0054 983
00BB 1010
0056 1001
00BC 1019
0069 0
00BD 1009
006A 224
00BE 1018
006B 1009
00BF 1008
006C 1008
00C0 1006
006D 1004
00C1 1014
006E 1016
00C2 1023
006F 1003
00C3 1012
0070 1011
00C4 1000
0071 1011
00C5 1002
0072 1005
00C6 982
0073 1009
00C7 996
0074 1013
00C8 973
0075 1002
00D1 987
0076 997
00D2 968
0077 1007
00D3 974
0078 1012
00D4 964
0079 994
00D5 981
007A 1013
00D6 964
007B 999
00D7 966
007C 1000
00D8 971
007D 1000
00D9 956
007E 1008
00DA 976
007F 1010
00DB 950
0080 1001
00DC 967
0081 1009
00DD 983
0082 1008
00DE 970
0083 988
00DF 965
0084 1007
00E0 984
0085 1012
00E1 1004
0086 1004
00E2 976
0087 996
00E3 941
0088 1008
00E4 960
0089 1003
00E5 948
008A 995
00E6 851
008B 1001
00E7 971
008C 1003
00E8 954
008D 982
00E9 938
008E 1000
00EA 931
008F 1008
00EB 956
0090 1009
00EC 960
0091 1013
00ED 962
0092 1006
00EE 933
0093 1012
00EF 956
0094 994
00F0 978
0095 1017
00F1 292
0096 1004
0097 1005
0098 1014
0099 1012
009A 994
0035 1003
009B 1007
0036 1004
009C 1010
0037 981
009D 1024
0038 1002
009E 1009
0039 998
009F 1011
003A 995
00A0 1015
003B 996
00A1 1018
003C 1013
00A5 1007
003D 1008
00A3 1016
003E 1007
00A4 1020
003F 989
00A7 375
0040 989
00A6 1010
0041 975
00A9 3
0042 994
00A8 0
0043 1010
00AA 1
0044 1007
00AB 1
0045 1008
00AC 0
0046 991
00AF 4
0047 1010
00AD 0
0048 997
00AE 0
0049 1002
00B0 5
004A 1004
00B1 0
004B 1012
00B2 0
004C 999
00B3 0
004D 1008
00B4 0
004E 1007
00B5 807
004F 1010
00B6 1007
0050 1004
00B7 1007
0051 1009
00B8 1006
0052 1005
0057 1008
00C9 994
0058 991
00CA 977
0059 1000
00CB 978
005A 998
00CD 944
005B 971
00CC 972
005C 1005
00CF 969
005D 1010
00CE 988
005E 1006
00D0 975
005F 1015
0060 989
0061 998
0062 1014
0063 1000
0064 991
0065 990
0066 1000
0067 947
0068 377
00A2 1011


On 23/12/11 14:23, Kevin Grittner wrote:
 Jeff Janes jeff.ja...@gmail.com wrote:
  
  Could we get some major OLTP users to post their CLOG for
  analysis?  I wouldn't think there would be much
  security/propietary issues with CLOG data.
  
 FWIW, I got the raw numbers to do my quick check using this Ruby
 script (put together for me by Peter Brant).  If it is of any use to
 anyone else, feel free to use it and/or post any enhanced versions
 of it.
  
 #!/usr/bin/env ruby
 
 Dir.glob("*") do |file_name|
   contents = File.read(file_name)
   total =
     contents.enum_for(:each_byte).enum_for(:each_slice, 256).inject(0) do |count, chunk|
       if chunk.all? { |b| b == 0x55 }
         count + 1
       else
         count
       end
     end
   printf "%s %d\n", file_name, total
 end
  
 -Kevin
 



Re: [HACKERS] JSON for PG 9.2

2011-12-22 Thread Benedikt Grundmann
Let me mention another lightweight data-interchange format.

At http://www.janestreet.com we have developed a small C module to deal
with S-expressions (sexps) as a way to store arbitrary data.  As we write
most of our code in OCaml, sexps are a natural way for us to store data
(http://hg.ocaml.info/release/sexplib/ provides automatic ways to convert
any OCaml value into a sexp).

The extension is still pretty new, but we use it successfully on a daily
basis.  After we have upgraded to 9.x we will package it as an extension
and release it open source.

API-wise, the module at the moment offers the following:

sexp_validate(text) returns boolean
Validate that the passed-in text is a valid s-expression.

create domain sexp as text check (sexp_validate(value));

BTW: It is a PITA that arrays of domains are not valid types.

And several functions to take apart or modify sexps using a path into
the sexp (similar to what XPath does for XML).

Such as:

sexp_get(sexp, text) returns sexp
Get the sub sexp of sexp identified by the path.  
Returns NULL if path is not a valid path in sexp.
Example:
   path=.a space.b.[1].x
   ((ignore this) (a space ((b (0 ((also ignored) (x The Value)) )) )))
-> The Value

And sexp_get_atom(sexp, text) returns text
Get the sub atom of sexp identified by the path.  
Returns NULL if path is not a valid path in sexp or
does not identify an atom.
Example:
   path=.a space.b.[1].x
   ((ignore this) (a space ((b (0 ((also ignored) (x The Value)) )) )))
-> The Value
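
Putting it together, usage looks roughly like this (a sketch based on the
descriptions above; the extension is not released yet, so names and the
path syntax may still change):

SELECT sexp_validate('((a 1) (b (2 3)))');        -- true
SELECT sexp_get('((a 1) (b (2 3)))', '.b');       -- (2 3)
SELECT sexp_get_atom('((a 1) (b (2 3)))', '.a');  -- 1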

Cheers,

Bene

On 20/12/11 19:39, Claes Jakobsson wrote:
 On Dec 20, 2011, at 12:39 AM, David E. Wheeler wrote:
  On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote:
  
  My understanding is that JSON is a subset of ECMAscript
  
  Well, no, JSON is formally “a lightweight data-interchange format.” It’s 
  derived from JavaScript syntax, but it is not a programming language, so I 
  wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript.
  
   http://json.org/
 
 Are people explicitly asking for a) *JSON* datatype or b) a type that lets 
 you store arbitrary complex semi-untyped data structures?
 
 if b) then this might get a lot more interesting
 
 Cheers,
 Claes



Re: [HACKERS] Real-life range datasets

2011-12-22 Thread Benedikt Grundmann
Hello,

We have a table in a postgres 8.4 database that would make use of date
ranges and exclusion constraints if they were available.  Sadly I cannot
give you the data, as it is based on data we are paying for, and as part
of the relevant licenses we are obliged not to give the data to third
parties.

Basically the tables keep meta data about financial instruments on
a given day.  Thanks to corporate actions (companies merging, splitting
up, etc...) that meta data can be different from one day to the next.

One way to model such a table is:

identifier, date, payload columns...


unique index on (date, identifier)

(and in fact some of the payload columns are unique per day indices
as well).

But because there are a large number of rows per day and most don't
change this is a very wasteful representation.

Instead we use this

identifier, effective_from, effective_until, payload columns...

And we have some clever plpgsql functions that merge a day's snapshot
into that representation.  That happens only a few times per day and
is currently quite slow, mostly because a lot of time is spent in
validation triggers that check that there are no overlapping
effective_from/effective_until entries for jane_symbol and a few other
identifiers.
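
For context, the date ranges and exclusion constraints mentioned at the top
of this mail would let that overlap check be declared instead of enforced by
triggers; a minimal sketch (assuming btree_gist, and collapsing
effective_from/effective_until into a daterange; everything else is
illustrative):

CREATE EXTENSION btree_gist;

CREATE TABLE instruments (
    identifier  text      NOT NULL,
    validity    daterange NOT NULL,   -- replaces effective_from / effective_until
    -- payload columns ...
    EXCLUDE USING gist (identifier WITH =, validity WITH &&)
);

-- "all rows of a given day" becomes a containment test:
-- SELECT * FROM instruments WHERE validity @> DATE '2011-12-21';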

The most common operations are: 

  Get all or most rows of a given day 

(select ... from instruments where :date between effective_from and 
effective_until)

left join of the instruments (again in the normal case constrained to
one day, but in some cases periods of a week or a few months)

select ... from t left join instruments on
  t.jane_symbol = instruments.jane_symbol
  and t.date between instruments.effective_from and instruments.effective_until
where t.date = X
  and additional constraints on the number of rows from t

With t a huge table clustered on date with roughly 500,000 to 2,000,000 
entries per day.  The left join would work most of the time (my guess is
more than 90%).  But there are entries in t where the jane_symbol would
not be in instruments (sadly).

Current size (immediately after a cluster):

  table   | toast  | (all indices) | total
  1268 MB | 900 MB | 693 MB        | 2861 MB

= select min(effective_from), max(effective_from) from instruments;
    min     |    max     
------------+------------
 2011-05-30 | 2011-12-21
(1 row)

b= select count(*) from instruments where current_date - 1 between 
effective_from and effective_until ;
 count  
--------
 358741
(1 row)

I should be able to give you a table with the same characteristics as
the instruments table but bogus data by replacing all entries in the
table with random strings of the same length or something like that.
I can probably take a little bit of time during this or the next week
to generate such fake real world data ;-)   Is there an ftp site to
upload the gzipped pg_dump file to?

Cheers,

Bene

On 20/12/11 16:48, Alexander Korotkov wrote:
 Hackers,
 
 For better GiST indexing of range types it's important to have real-life
 datasets for testing on. Real-life range datasets would help to proof (or
 reject) some concepts and get more realistic benchmarks. Also, it would be
 nice to know what queries you expect to run fast on that datasets. Ideally
 it should be real-life set of queries, but it also could be your
 presentation of what are typical queries  for such datasets.
 Thanks!
 
 -
 With best regards,
 Alexander Korotkov.



Re: [HACKERS] Typed hstore proposal

2011-12-22 Thread Benedikt Grundmann
On 22/12/11 10:44, Tom Lane wrote:
 Johann 'Myrkraverk' Oskarsson joh...@2ndquadrant.com writes:
  I mean to create a typed hstore, called tstore for now.
 
 Um ... what is the point of this, exactly?  From what I've seen, most
 applications for hstore are pretty happy with the fact that hstore is
 only weakly typed, and if an entry *is* an integer, or a float, or
 whatever else, it's not hard to cast to and from text as needed.

Moreover it is also easy with the current hstore to add constraints like this:

  contracts_is_an_integer CHECK ((tags -> 'contracts'::text) ~ '^[0-9]+$'::text)

to ensure that it actually is.
