[HACKERS] Many processes blocked at ProcArrayLock

2014-12-02 Thread Xiaoyulei
Test configuration:
Hardware:
4-socket Intel server, 60 cores / 120 hardware threads.
Memory: 512 GB
SSD: 2.4 TB

PG:
max_connections = 160   # (change requires restart)
shared_buffers = 32GB   
work_mem = 128MB
maintenance_work_mem = 32MB 
bgwriter_delay = 100ms  # 10-10000ms between rounds
bgwriter_lru_maxpages = 200 # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 2.0   # 0-10.0 multiplier on buffers scanned/round
wal_level = minimal # minimal, archive, or hot_standby
wal_buffers = 256MB # min 32kB, -1 sets based on shared_buffers
autovacuum = off
checkpoint_timeout=60min
checkpoint_segments = 1000
archive_mode = off
synchronous_commit = off
fsync = off
full_page_writes = off  


We used TPC-C and pgbench to test PostgreSQL 9.4beta2 performance, and found
that tps/tpmC does not increase as the number of terminals increases. The
detailed information is in the attachment.

Many processes are blocked. I dumped the call stacks and found that they are
blocked on ProcArrayLock: about 60% of the processes are in
ProcArrayEndTransaction waiting for ProcArrayLock in EXCLUSIVE mode, and about
20% are in GetSnapshotData waiting for ProcArrayLock in SHARED mode. Other
locks, such as those taken by XLogFlush and WALInsertLock, are not under heavy
contention.
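
For readers less familiar with this code path, the contention pattern looks
roughly like this (a simplified sketch of the relevant 9.4 functions, not a
verbatim excerpt; bodies elided):

/* Sketch: every transaction commit clears its state under an exclusive lock */
void
ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
{
    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);  /* serializes all committers */
    /* clear the proc's xid/xmin, advance latestCompletedXid, ... */
    LWLockRelease(ProcArrayLock);
}

/* ... while every snapshot scans the whole proc array under a shared lock */
Snapshot
GetSnapshotData(Snapshot snapshot)
{
    LWLockAcquire(ProcArrayLock, LW_SHARED);     /* blocked by any committer */
    /* copy the XIDs of all in-progress transactions into the snapshot */
    LWLockRelease(ProcArrayLock);
    return snapshot;
}

With many concurrent clients, the exclusive acquisitions at commit time and
the shared acquisitions at snapshot time queue up on the same lock, which
matches the stack traces described above.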

Is there any way to solve this problem?




Re: [HACKERS] using Core Foundation locale functions

2014-12-02 Thread Peter Geoghegan
On Fri, Nov 28, 2014 at 8:43 AM, Peter Eisentraut pete...@gmx.net wrote:
 At the moment, this is probably just an experiment that shows where
 refactoring and better abstractions might be suitable if we want to
 support multiple locale libraries.  If we want to pursue ICU, I think
 this could be a useful third option.

FWIW, I think that the richer API that ICU provides for string
transformations could be handy in optimizing sorting using abbreviated
keys. For example, ICU will happily only produce parts of sort keys
(the equivalent of strxfrm() blobs) if that is all that is required
[1].

I think that ICU also allows clients to parse individual primary
weights in a principled way (primary weights tend to be isomorphic to
the Unicode code points in the original string). I think that this
will enable order-preserving compression of the type anticipated by
the Unicode collation algorithm [2]. That could be useful for certain
languages, like Russian, where the primary weight level usually
contains multi-byte code points with glibc's strxfrm() (this is
generally not true of languages that use the Latin alphabet, or of
East Asian languages).

Note that there is already naturally a form of what you might call
compression with strxfrm() [3]. This is very useful for abbreviated
keys.
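
As a rough illustration of that point, an abbreviated key can be built from
just the leading bytes of a strxfrm() blob and compared with memcmp(); here
is a minimal standalone sketch (the 8-byte width and the fall-back-to-strcoll()
convention are assumptions for the example, not ICU or PostgreSQL code):

#include <locale.h>
#include <stdio.h>
#include <string.h>

#define ABBREV_LEN 8

/* Abbreviated key: the first ABBREV_LEN bytes of the strxfrm() blob. */
static void
abbreviate(const char *s, unsigned char out[ABBREV_LEN])
{
    char    buf[1024];          /* assumed large enough for these examples */

    memset(buf, 0, sizeof(buf));
    strxfrm(buf, s, sizeof(buf));
    memcpy(out, buf, ABBREV_LEN);
}

int
main(void)
{
    unsigned char a[ABBREV_LEN], b[ABBREV_LEN];

    setlocale(LC_COLLATE, "");  /* use the environment's collation */
    abbreviate("apple", a);
    abbreviate("apricot", b);

    /* Cheap comparison; equality here only means "tie", and a real sort
     * would have to fall back to strcoll() on the original strings. */
    printf("abbreviated comparison: %d\n", memcmp(a, b, ABBREV_LEN));
    return 0;
}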

[1] http://userguide.icu-project.org/collation/architecture
[2] http://www.unicode.org/reports/tr10/#Run-length_Compression
[3] 
http://www.postgresql.org/message-id/cam3swztywe5j69tapvzf2cm7mhskke3uhhnk9gluqckkwqo...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] excessive amounts of consumed memory (RSS), triggering OOM killer

2014-12-02 Thread Tomas Vondra

On 2014-12-02 02:52, Tom Lane wrote:

Tomas Vondra t...@fuzzy.cz writes:

On 2.12.2014 01:33, Tom Lane wrote:

What I suspect you're looking at here is the detritus of creation of
a huge number of memory contexts. mcxt.c keeps its own state about
existing contexts in TopMemoryContext. So, if we posit that those
contexts weren't real small, there's certainly room to believe that
there was some major memory bloat going on recently.



Aha! MemoryContextCreate allocates the memory for the new context from
TopMemoryContext explicitly, so that it survives resets of the parent
context. Is that what you had in mind by keeping state about existing
contexts?


Right.


That'd probably explain the TopMemoryContext size, because array_agg()
creates a separate context for each group. So if you have 1M groups, you
have 1M contexts. Although I don't see how the size of those contexts
would matter?


Well, if they're each 6K, that's your 6GB right there.


Yeah, but this memory should be freed after the query finishes, no?


Maybe we could move this info (list of child contexts) to the parent
context somehow, so that it's freed when the context is destroyed?


We intentionally didn't do that, because in many cases it'd result in
parent contexts becoming nonempty when otherwise they'd never have
anything actually in them.  The idea was that such shell parent contexts
should be cheap, requiring only a control block in TopMemoryContext and
not an actual allocation arena.  This idea of a million separate child
contexts was never part of the design of course; we might need to rethink
whether that's a good idea.  Or maybe there need to be two different
policies about where to put child control blocks.


Maybe. For me, the 130MB is not really a big deal, because for this to
happen there really needs to be many child contexts at the same time,
consuming much more memory. With 6.5GB consumed in total, 130MB amounts
to ~2% which is negligible. Unless we can fix the RSS bloat.

Also, this explains the TopMemoryContext size, but not the RSS size (or
am I missing something)?


Very possibly you're left with islands that prevent reclaiming very
much of the peak RAM usage.  It'd be hard to be sure without some sort
of memory map, of course.


Yes, that's something I was thinking about too - I believe what happens
is that allocations of info in TopMemoryContext and the actual contexts
are interleaved, and at the end only the memory contexts are deleted.
The blocks allocated in TopMemoryContext are kept, creating the islands.

If that's the case, allocating the child context info within the parent
context would solve this, because these pieces would be reclaimed with
the rest of the parent memory.

But then again, there are probably other ways to create such islands
(e.g. allocating an additional block in a long-lived context while the
child contexts exist).
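
To make the pattern concrete, here is a minimal sketch using the 9.4
memory-context API (aggcontext/ngroups are assumed names and this is
illustrative only, not the actual array_agg() code) of the kind of loop
that leaves one context header in TopMemoryContext per group:

static void
accumulate_groups_sketch(MemoryContext aggcontext, int ngroups)
{
    int     i;

    for (i = 0; i < ngroups; i++)
    {
        /* The header (control block) for this context is allocated in
         * TopMemoryContext by MemoryContextCreate(); the context's own
         * data comes from blocks owned by the context itself. */
        MemoryContext groupcxt =
            AllocSetContextCreate(aggcontext,
                                  "per-group state",
                                  ALLOCSET_DEFAULT_MINSIZE,
                                  ALLOCSET_DEFAULT_INITSIZE,
                                  ALLOCSET_DEFAULT_MAXSIZE);

        /* ... accumulate this group's values in groupcxt ... */
        (void) groupcxt;
    }
}

Deleting each per-group context frees its blocks and its header, but by then
the headers in TopMemoryContext and the per-group data blocks are interleaved
in the process heap, which is how the "islands" described above can arise.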

regards
Tomas




[HACKERS] Serialization exception : Who else was involved?

2014-12-02 Thread Olivier MATROT
Hello,

 

I'm using PostgreSQL 9.2.8 on Windows from a .NET application using
Npgsql.

I'm working in the Radiology Information System field.

 

We have thousands of users against a big accounting database.

We're using the SERIALIZABLE isolation level to ensure data consistency.

 

Because of the large number of users, and probably because of the
database design, we're facing serialization exceptions, and we retry our
transactions.

So far so good.

 

I was wondering if there was a log level in PostgreSQL that could tell
me which query was the trigger of a doomed transaction.

The goal is to understand the failures to improve the database and
application designs.

 

I pushed the logs to the DEBUG5 level with no luck.

 

After carefully reviewing the documentation, it seems that there was
nothing.

So I downloaded the code and looked at it.

 

Serialization conflict detection is done in
src/backend/storage/lmgr/predicate.c, where transactions that are doomed
to fail are marked as such with the SXACT_FLAG_DOOMED flag.

 

I simply added elog(...) calls at the NOTICE level each time the flag
is set, compiled the code, and gave it a try.
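
For reference, the kind of change being described looks roughly like this
(a sketch, not the submitted patch; the victim's pid field name is an
assumption):

/* In src/backend/storage/lmgr/predicate.c, wherever a victim transaction
 * is marked as doomed: */
sxact->flags |= SXACT_FLAG_DOOMED;
ereport(NOTICE,     /* a production version would more likely use LOG */
        (errmsg("serializable conflict: backend %d dooms transaction of backend %d",
                MyProcPid, sxact->pid)));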

 

The results are amazing for me, because this simple modification allows
me to know which query is marking other running transactions to fail.

I'm pretty sure that in the production environment of our major
customers, there should be no more than a few transactions involved.

 

I would like to see this useful and simple addition in a future version
of PostgreSQL.

Is this in the spirit of what is done to ease the work of the developer?

Maybe the level I've chosen is not appropriate?

 

Please let me know what you think.

 

Kind Regards.

 

Olivier.

 



Re: [HACKERS] Role Attribute Bitmask Catalog Representation

2014-12-02 Thread Stephen Frost
Adam,

* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
 Ok.  Though, this would affect how CATUPDATE is handled.  Peter Eisentraut
 previously raised a question about whether superuser checks should be
 included with catupdate which led me to create the following post.
 
 http://www.postgresql.org/message-id/cakrt6cqovt2kiykg2gff7h9k8+jvu1149zlb0extkkk7taq...@mail.gmail.com
 
 Certainly, we could keep has_rolcatupdate for this case and put the
 superuser check in role_has_attribute, but it seems like it might be worth
 taking a look at whether a superuser can bypass catupdate or not.  Just a
 thought.

My recollection matches the documentation- rolcatupdate should be
required to update the catalogs.  The fact that rolcatupdate is set by
AlterUser to match rolsuper is an interesting point and one which we
might want to reconsider, but that's beyond the scope of this patch.

Ergo, I'd suggest keeping has_rolcatupdate, but have it do the check
itself directly instead of calling down into role_has_attribute().

There's an interesting flip side to that though, which is the question
of what to do with pg_roles and psql.  Based on the discussion this far,
it seems like we'd want to keep the distinction for pg_roles and psql
based on what bits have explicitly been set rather than what's actually
checked for.  As such, I'd have one other function-
check_has_attribute() which *doesn't* have the superuser allow-all check
and is what is used in pg_roles and by psql.  I'd expose both functions
at the SQL level.
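
In other words, a rough sketch of the two entry points being proposed
(RoleAttr and the internal lookup helper are assumed names from the patch
discussion, not existing server functions):

static bool rolattr_is_set(Oid roleid, RoleAttr attribute);  /* reads the bitmask */

/* For actual permission checks: superusers bypass everything. */
bool
role_has_attribute(Oid roleid, RoleAttr attribute)
{
    if (superuser_arg(roleid))
        return true;
    return rolattr_is_set(roleid, attribute);
}

/* For pg_roles and psql: report only what has explicitly been set. */
bool
check_has_attribute(Oid roleid, RoleAttr attribute)
{
    return rolattr_is_set(roleid, attribute);
}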

 Ok.  I had originally thought for this patch that I would try to minimize
 these types of changes, though perhaps this is that opportunity previously
 mentioned in refactoring those.  However, the catupdate question still
 remains.

It makes sense to me, at least, to include removing those individual
attribute functions in this patch.

 I have no reason for one over the other, though I did ask myself that
 question.  I did find it curious that in some cases there is has_X and
 then in others pg_has_X.  Perhaps I'm not looking in the right places,
 but I haven't found anything that helps to distinguish when one vs the
 other is appropriate (even if it is a general rule of thumb).

Given that we're changing things anyway, it seems to me that the pg_
prefix makes sense.

 Yes, we were, however the latter causes a syntax error with initdb. :-/

Ok, then just stuff the 255 back there and add a comment about why it's
required and mention that cute tricks to calculate the value won't work.

Thanks!

Stephen




Re: [HACKERS] excessive amounts of consumed memory (RSS), triggering OOM killer

2014-12-02 Thread Tomas Vondra
On 2 December 2014 at 10:59, Tomas Vondra wrote:
 On 2014-12-02 02:52, Tom Lane wrote:
 Tomas Vondra t...@fuzzy.cz writes:

 Also, this explains the TopMemoryContext size, but not the RSS size
 (or am I missing something)?

 Very possibly you're left with islands that prevent reclaiming very
 much of the peak RAM usage.  It'd be hard to be sure without some sort
 of memory map, of course.

 Yes, that's something I was thinking about too - I believe what happens
 is that allocations of info in TopMemoryContext and the actual contexts
 are interleaved, and at the end only the memory contexts are deleted.
 The blocks allocated in TopMemoryContexts are kept, creating the
 islands.

 If that's the case, allocating the child context info within the parent
 context would solve this, because these pieces would be reclaimed with
 the rest of the parent memory.

On second thought, I'm not sure this explains the continuous increase of
consumed memory. When the first iteration consumes 2,818g of memory, why
should the following iterations consume significantly more? The allocation
patterns should be (almost) exactly the same, reusing the already
allocated memory (either at the system or TopMemoryContext level).

regards
Tomas





[HACKERS] pg_stat_statement normalization fails due to temporary tables

2014-12-02 Thread Andres Freund
Hi,

pg_stat_statement's query normalization fails when temporary tables are
used in a query. That's because JumbleRangeTable() uses the relid in the
RTE to build the query fingerprint. I think in this case pgss from 9.1
actually would do better than 9.2+ as the hash lookup previously didn't
use the relid.

I don't really have a good idea about fixing this though. The best thing
that comes to mind is to simply use eref->aliasname for the
disambiguation...
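
For context, the relevant part of JumbleRangeTable() looks roughly like this
(a simplified sketch of the 9.2+ pg_stat_statements code, with the name-based
alternative shown for comparison; not a patch):

/* Current behaviour: the table's OID goes into the query fingerprint,
 * so each session's temp table of the same name jumbles differently. */
case RTE_RELATION:
    APP_JUMB(rte->relid);
    break;

/* The alternative floated above would be something like: */
case RTE_RELATION:
    APP_JUMB_STRING(rte->eref->aliasname);   /* name-based; loses schema info */
    break;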

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-02 Thread Alex Shulgin

Michael Paquier michael.paqu...@gmail.com writes:

 On Tue, Dec 2, 2014 at 1:58 AM, Alex Shulgin a...@commandprompt.com wrote:
 Here's the patch rebased against current HEAD, that is including the
 recently committed action_at_recovery_target option.
 If this patch gets in, it gives a good argument to jump to 10.0 IMO.
 That's not a bad thing, only the cost of making recovery params as
 GUCs which is still a feature wanted.

 The default for the new GUC is 'pause', as in HEAD, and
 pause_at_recovery_target is removed completely in favor of it.
 Makes sense. Another idea that popped out was to rename this parameter
 as recovery_target_action as well, but that's not really something
 this patch should care about.

Indeed, but changing the name after the fact is straightforward.

 I've also taken the liberty to remove that part that errors out when
 finding $PGDATA/recovery.conf.
 I am not in favor of this part. It may be better to let the users know
 that their old configuration is not valid anymore with an error. This
 patch cuts in the flesh with a huge axe, let's be sure that users do
 not ignore the side pain effects, or recovery.conf would be simply
 ignored and users would not be aware of that.

Yeah, that is a good point.

I'd be in favor of a solution that works the same way as before the
patch, without the need for extra trigger files, etc., but that doesn't
seem to be nearly possible.  Whatever tricks we might employ will likely
be defeated by the fact that the oldschool user will fail to *include*
recovery.conf in the main conf file.

--
Alex




Re: [HACKERS] [v9.5] Custom Plan API

2014-12-02 Thread Robert Haas
On Tue, Nov 25, 2014 at 3:44 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 Today I had a talk with Hanada-san to clarify which parts can be common
 between them and how to implement it. We concluded that both features can
 share most of the infrastructure.
 Let me give an introduction to join replacement by foreign-/custom-scan below.

 The overall design is to inject a foreign-/custom-scan node in place of
 the built-in join logic (based on the estimated cost). From the viewpoint of
 the core backend, it looks like a sub-query scan that joins the relations
 internally.

 What we need to do is below:

 (1) Add a hook add_paths_to_joinrel()
 It gives extensions (including FDW drivers and custom-scan providers) chance
 to add alternative paths towards a particular join of relations, using
 ForeignScanPath or CustomScanPath, if it can run instead of the built-in ones.

 (2) Informs the core backend varno/varattno mapping
 One thing we need to pay attention to is that a foreign-/custom-scan node
 that runs instead of the built-in join node must return a mixture of values
 coming from both relations. When an FDW driver fetches a remote record (or a
 record computed by an external computing resource), the most reasonable way
 is to store it in ecxt_scantuple of the ExprContext, then perform projection
 with varnodes that reference this slot.
 It needs an infrastructure that tracks the relationship between the original
 varnode and the alternative varno/varattno. We think it should be mapped to
 INDEX_VAR and a virtual attribute number to reference ecxt_scantuple
 naturally, and this infrastructure is quite helpful for both ForeignScan and
 CustomScan.
 We'd like to add a List *fdw_varmap / *custom_varmap variable to both plan
 nodes. It contains the list of the original Var nodes that shall be mapped
 to positions according to the list index (e.g., the first varnode becomes
 varno=INDEX_VAR and varattno=1).

 (3) Reverse mapping on EXPLAIN
 For EXPLAIN support, the above varnodes on the pseudo relation scan need to
 be resolved. All we need to do is initialize dpns->inner_tlist in
 set_deparse_planstate() according to the above mapping.

 (4) case of scanrelid == 0
 To skip opening/closing (foreign) tables, we need a mark that tells the
 backend not to initialize the scan node according to the table definition,
 but according to the pseudo varnode list.
 As the earlier custom-scan patch does, scanrelid == 0 is a straightforward
 mark to show that the scan node is not tied to a particular real relation.
 So, we also need to add special-case handling around the foreign-/custom-scan
 code.

 We expect the above changes are small enough to implement basic join
 push-down functionality (one that does not involve external computation of
 complicated expression nodes), but valuable to support in v9.5.

 Please comment on the proposition above.

I don't really have any technical comments on this design right at the
moment, but I think it's an important area where PostgreSQL needs to
make some progress sooner rather than later, so I hope that we can get
something committed in time for 9.5.
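
For reference, the hook sketched in (1) above might end up looking something
like this (hypothetical names mirroring the parameters of the existing
add_paths_to_joinrel(); not a committed API):

typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
                                             RelOptInfo *joinrel,
                                             RelOptInfo *outerrel,
                                             RelOptInfo *innerrel,
                                             JoinType jointype,
                                             SpecialJoinInfo *sjinfo,
                                             List *restrictlist);

extern set_join_pathlist_hook_type set_join_pathlist_hook;

A custom-scan provider or FDW would set the hook from its _PG_init() and,
when it can beat the built-in join paths on cost, add a CustomScanPath or
ForeignScanPath to joinrel via add_path().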

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_stat_statement normalization fails due to temporary tables

2014-12-02 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 pg_stat_statement's query normalization fails when temporary tables are
 used in a query. That's because JumbleRangeTable() uses the relid in the
 RTE to build the query fingerprint. I think in this case pgss from 9.1
 actually would do better than 9.2+ as the hash lookup previously didn't
 use the relid.

 I don't really have a good idea about fixing this though. The best thing
 that comes to mind is to simply use eref->aliasname for the
 disambiguation...

Hmm ... by "fails" I suppose you mean "doesn't treat two different
instances of the same temp table name as the same"?  I'm not sure
that's a bug.

If we go over to using the aliasname then "select x from foo a" and
"select x from bar a" would be treated as the same query, which
clearly *is* a bug.  More generally, I don't think that schema1.tab
and schema2.tab should be considered the same table for this purpose.
So it's hard to see how to do much better than using the OID.

regards, tom lane




Re: [HACKERS] pg_stat_statement normalization fails due to temporary tables

2014-12-02 Thread Andres Freund
On 2014-12-02 09:59:00 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  pg_stat_statement's query normalization fails when temporary tables are
  used in a query. That's because JumbleRangeTable() uses the relid in the
  RTE to build the query fingerprint. I think in this case pgss from 9.1
  actually would do better than 9.2+ as the hash lookup previously didn't
  use the relid.
 
  I don't really have a good idea about fixing this though. The best thing
  that comes to mind is to simply use eref->aliasname for the
  disambiguation...
 
 Hmm ... by "fails" I suppose you mean "doesn't treat two different
 instances of the same temp table name as the same"?  I'm not sure
 that's a bug.

Well, it did work < 9.2...

 If we go over to using the aliasname then "select x from foo a" and
 "select x from bar a" would be treated as the same query, which
 clearly *is* a bug.

Yep. The fully qualified name + alias would probably work - but IIRC
isn't available in the RTE without further syscache lookups...

  More generally, I don't think that schema1.tab
 and schema2.tab should be considered the same table for this purpose.
 So it's hard to see how to do much better than using the OID.

Well, from a practical point of view it's highly annoying if half of
pg_stat_statement suddenly consists of the same query, just issued
in a different session.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] test_shm_mq failing on mips*

2014-12-02 Thread Robert Haas
On Tue, Nov 25, 2014 at 10:42 AM, Christoph Berg c...@df7cb.de wrote:
 Re: To Robert Haas 2014-11-24 20141124200824.ga22...@msg.df7cb.de
  Does it fail every time when run on a machine where it fails sometimes?

 So far there's a consistent host-to-fail-or-not mapping, but most
 mips/mipsel build hosts have seen only one attempt so far that
 actually got far enough to run the shm_mq test.

 I got the build rescheduled on the same machine and it's hanging
 again.

 Atm I don't have access to the boxes where it was failing (the builds
 succeed on the mips(el) porter hosts available to Debian developers).
 I'll see if I can arrange access there and run a test.

 Julien Cristau was so kind to poke into the hanging processes. The
 build has been stuck now for about 4h, while that postgres backend has
 only consumed 4s of CPU time (according to plain ps). The currently
 executing query is:

 SELECT test_shm_mq_pipelined(16384, (select 
 string_agg(chr(32+(random()*95)::int), '') from generate_series(1,27)), 
 200, 3);

 (Waiting f, active, backend_start 6s older than xact_start, xact_start
 same as query_start, state_change 19µs newer, all 4h old)

I can't tell from this exactly what's going wrong.  Questions:

1. Are there any background worker processes running at the same time?
 If so, how many?  We'd expect to see 3.
2. Can we get a printout of the following variables in stack frame 3
(test_shm_mq_pipelined)?  send_count, loop_count, *outqh, *inqh,
inqh->mqh_queue[0], outqh->mqh_queue[0]
3. What does a backtrace of each background worker process look like?
If they are stuck inside copy_messages(), can you print *outqh, *inqh,
inqh->mqh_queue[0], outqh->mqh_queue[0] from that stack frame?

Sorry for the hassle; I just don't have a better idea how to debug this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Nitpicky doc corrections for BRIN functions of pageinspect

2014-12-02 Thread Alvaro Herrera
Peter Geoghegan wrote:
 On Wed, Nov 19, 2014 at 8:02 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Just a small thing I noticed while looking at pageinspect.sgml, the
  set of SQL examples related to BRIN indexes uses lower-case characters
  for reserved keywords. This has been introduced by 7516f52.
  Patch is attached.
 
 My nitpicky observation about pageinspect + BRIN was that the comments
 added to the pageinspect SQL file for the SQL-callable function
 brin_revmap_data() have a comment header style slightly inconsistent
 with the existing items.

Pushed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] On partitioning

2014-12-02 Thread Robert Haas
On Tue, Nov 25, 2014 at 8:20 PM, Amit Langote
langote_amit...@lab.ntt.co.jp wrote:
 Before going too much further with this I'd mock up schemas for your
 proposed catalogs and a list of DDL operations to be supported, with
 the corresponding syntax, and float that here for comment.

More people should really comment on this.  This is a pretty big deal
if it goes forward, so it shouldn't be based on what one or two people
think.

 * Catalog schema:

 CREATE TABLE pg_catalog.pg_partitioned_rel
 (
    partrelid   oid        NOT NULL,
    partkind    oid        NOT NULL,
    partissub   bool       NOT NULL,
    partkey     int2vector NOT NULL, -- partitioning attributes
    partopclass oidvector,

    PRIMARY KEY (partrelid, partissub),
    FOREIGN KEY (partrelid)   REFERENCES pg_class (oid),
    FOREIGN KEY (partopclass) REFERENCES pg_opclass (oid)
 )
 WITHOUT OIDS ;

So, we're going to support exactly two levels of partitioning?
partitions with partissub=false and subpartitions with partissub=true?
 Why not support only one level of partitioning here but then let the
children have their own pg_partitioned_rel entries if they are
subpartitioned?  That seems like a cleaner design and lets us support
an arbitrary number of partitioning levels if we ever need them.

 CREATE TABLE pg_catalog.pg_partition_def
 (
partitionid  oid NOT NULL,
partitionparentrel   oid   NOT NULL,
partitionisoverflow bool  NOT NULL,
partitionvalues anyarray,

PRIMARY KEY (partitionid),
FOREIGN KEY (partitionid) REFERENCES pg_class(oid)
 )
 WITHOUT OIDS;

 ALTER TABLE pg_catalog.pg_class ADD COLUMN relispartitioned;

What is an overflow partition and why do we want that?

What are you going to do if the partitioning key has two columns of
different data types?

 * DDL syntax (no multi-column partitioning, sub-partitioning support as yet):

 -- create partitioned table and child partitions at once.
 CREATE TABLE parent (...)
 PARTITION BY [ RANGE | LIST ] (key_column) [ opclass ]
 [ (
  PARTITION child
{
VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
  | VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST
}
[ WITH ( ... ) ] [ TABLESPACE tbs ]
  [, ...]
   ) ] ;

How are you going to dump and restore this, bearing in mind that you
have to preserve a bunch of OIDs across pg_upgrade?  What if somebody
wants to do pg_dump --table name_of_a_partition?

I actually think it will be much cleaner to declare the parent first
and then have separate CREATE TABLE statements that glue the children
in, like CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1,
1).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] superuser() shortcuts

2014-12-02 Thread Robert Haas
On Wed, Nov 26, 2014 at 10:12 AM, Stephen Frost sfr...@snowman.net wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 In the context at hand, I think most of the messages in question are
 currently phrased like must be superuser to do X.  I'd be fine with
 changing that to permission denied to do X, but not to just
 permission denied.

 Apologies for the terseness of my (earlier) reply.  This is exactly what
 I'm suggesting.  What was in the patch was this change:

 ! ERROR:  must be superuser or replication role to use replication slots

 ---

 ! ERROR:  permission denied to use replication slots
 ! HINT:  You must be superuser or replication role to use replication slots.

Your proposed change takes two lines to convey the same amount of
information that we are currently conveying in one line.  How is that
better?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.2 recovery/startup problems

2014-12-02 Thread Robert Haas
On Wed, Nov 26, 2014 at 7:13 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 If I do a pg_ctl stop -mf, then both files go away.  If I do a pg_ctl stop
 -mi, then neither goes away.  It is only with the /sbin/reboot that I get
 the fatal combination of _init being gone but the other still present.

Eh?  That sounds wonky.

I mean, reboot normally kills processes with SIGTERM or SIGKILL, in
which case I'd expect the outcome to match what you get with pg_ctl
stop -mf or pg_ctl stop -mi.  The only way I can see that you'd get a
different behavior is if you did a hard reboot (like echo b >
/proc/sysrq-trigger); if that changes things, then we might have a
missing-fsync bug.  How is that reboot managing to leave the main fork
behind while losing the init fork?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] memory explosion on planning complex query

2014-12-02 Thread Robert Haas
On Wed, Nov 26, 2014 at 7:24 PM, Andrew Dunstan and...@dunslane.net wrote:
 On 11/26/2014 05:00 PM, Andrew Dunstan wrote:
 Attached is some anonymized DDL for a fairly complex schema from a
 PostgreSQL Experts client. Also attached is an explain query that runs
 against the schema. The client's problem is that in trying to run the
 explain, Postgres simply runs out of memory. On my untuned 9.3 test rig,
 (Scientific Linux 6.4 with 24Gb of RAM and 24Gb of swap) vmstat clearly
 shows the explain chewing up about 7Gb of memory. When it's done the free
 memory jumps back to where it was. On a similar case on the clients test rig
 we saw memory use jump lots more.

 The client's question is whether this is not a bug. It certainly seems
 like it should be possible to plan a query without chewing up this much
 memory, or at least to be able to limit the amount of memory that can be
 grabbed during planning. Going from humming along happily to OOM conditions
 all through running explain somequery is not very friendly.


 Further data point - thanks to Andrew Gierth (a.k.a. RhodiumToad) for
 pointing this out. The query itself grabs about 600Mb to 700Mb to run,
 whereas the EXPLAIN takes vastly more - on my system 10 times more. Surely
 that's not supposed to happen?

Hmm.  So you can run the query but you can't EXPLAIN it?

That sounds like it could well be a bug, but I'm thinking you might
have to instrument palloc() to find out where all of that space is
being allocated to figure out why it's happening - or maybe connect
gdb to the server while the EXPLAIN is chewing up memory and pull some
backtraces to figure out what section of code it's stuck in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] dblink_get_connections() result for no connections

2014-12-02 Thread Robert Haas
On Tue, Nov 25, 2014 at 3:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 While fooling around with the array_agg(anyarray) patch, I noted
 that dblink_get_connections() returns NULL if there are no active
 connections.  It seems like an empty array might be a saner
 definition --- thoughts?

I don't think it matters much.  I wouldn't break backward
compatibility to change it from this behavior to that one, and if we'd
picked the other one I wouldn't break compatibility to change it to
this behavior.  Anybody who cares can wrap a COALESCE() around it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] copy.c handling for RLS is insecure

2014-12-02 Thread Robert Haas
On Thu, Nov 27, 2014 at 2:03 AM, Stephen Frost sfr...@snowman.net wrote:
 Alright, I've done the change to use the RangeVar from CopyStmt, but
 also added a check wherein we verify that the relation's OID returned
 from the planned query is the same as the relation's OID that we did the
 RLS check on- if they're different, we throw an error.  Please let me
 know if there are any remaining concerns.

That's clearly an improvement, but I'm not sure it's water-tight.
What if the name that originally referenced a table ended up
referencing a view?  Then you could get
list_length(plan->relationOids) != 1.

(And, in that case, I also wonder if you could get
eval_const_expressions() to do evil things on your behalf while
planning.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] test_shm_mq failing on mips*

2014-12-02 Thread Christoph Berg
Re: Robert Haas 2014-12-02 
CA+Tgmoa9HE=1GObU7MKuGLDBjBNSNRW0bDE4P3JA=P=9mqg...@mail.gmail.com
 I can't tell from this exactly what's going wrong.  Questions:
 
 1. Are there any background worker processes running at the same time?
  If so, how many?  We'd expect to see 3.
 2. Can we get a printout of the following variables in stack frame 3
 (test_shm_mq_pipelined)?  send_count, loop_count, *outqh, *inqh,
 inqh->mqh_queue[0], outqh->mqh_queue[0]
 3. What does a backtrace of each background worker process look like?
 If they are stuck inside copy_messages(), can you print *outqh, *inqh,
 inqh->mqh_queue[0], outqh->mqh_queue[0] from that stack frame?
 
 Sorry for the hassle; I just don't have a better idea how to debug this.

No problem, I'm aware this isn't an easy target.

I didn't have access to the build env myself, I'm not sure if I can
find someone to retry the test there. I'll let you know when I have
more data, but that will take a while (if at all :-/).

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] initdb: Improve error recovery.

2014-12-02 Thread Robert Haas
On Thu, Nov 27, 2014 at 7:28 AM, Mats Erik Andersson b...@gisladisker.se 
wrote:
 I would like to improve error recovery of initdb when the
 password file is empty. The present code declares "Error 0"
 as the cause of failure. It suffices to use ferror(), since
 fgets() returns NULL also at a premature EOF.

 In addition, a minor point: a line feed is needed in order
 to print the error message on a line of its own, which
 seems desirable.

 Best regards,
   Mats Erik Andersson
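
A minimal standalone sketch of the distinction being described (illustrative
only, not the submitted initdb patch):

#include <errno.h>
#include <stdio.h>
#include <string.h>

int
main(int argc, char **argv)
{
    char    line[1024];
    FILE   *fp;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s PASSWORD_FILE\n", argv[0]);
        return 1;
    }
    if ((fp = fopen(argv[1], "r")) == NULL)
    {
        fprintf(stderr, "could not open %s: %s\n", argv[1], strerror(errno));
        return 1;
    }
    if (fgets(line, sizeof(line), fp) == NULL)
    {
        /* fgets() returns NULL on both error and EOF; only ferror()
         * distinguishes a real read error from an empty file.  The leading
         * newline keeps the message on a line of its own. */
        if (ferror(fp))
            fprintf(stderr, "\ncould not read password from file: %s\n",
                    strerror(errno));
        else
            fprintf(stderr, "\npassword file is empty\n");
        return 1;
    }
    fclose(fp);
    return 0;
}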

Please add your patch here so that it doesn't get forgotten about:

https://commitfest.postgresql.org/action/commitfest_view/open

Also, note that the PostgreSQL project prefers for patches to be
attached rather than inline, as mailers have a tendency to mangle
them.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] superuser() shortcuts

2014-12-02 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Nov 26, 2014 at 10:12 AM, Stephen Frost sfr...@snowman.net wrote:
  * Tom Lane (t...@sss.pgh.pa.us) wrote:
  In the context at hand, I think most of the messages in question are
  currently phrased like must be superuser to do X.  I'd be fine with
  changing that to permission denied to do X, but not to just
  permission denied.
 
  Apologies for the terseness of my (earlier) reply.  This is exactly what
  I'm suggesting.  What was in the patch was this change:
 
  ! ERROR:  must be superuser or replication role to use replication slots
 
  ---
 
  ! ERROR:  permission denied to use replication slots
  ! HINT:  You must be superuser or replication role to use replication slots.
 
 Your proposed change takes two lines to convey the same amount of
 information that we are currently conveying in one line.  How is that
 better?

It includes the actual error message, unlike what we have today, which is
guidance as to what's required to get past the permission-denied error.

In other words, I disagree that the same amount of information is being
conveyed.
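
As backend code, the two styles under discussion look roughly like this (a
sketch using the standard ereport() idiom):

/* current style: the requirement is the whole message */
ereport(ERROR,
        (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
         errmsg("must be superuser or replication role to use replication slots")));

/* proposed style: generic message plus a hint naming the requirement */
ereport(ERROR,
        (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
         errmsg("permission denied to use replication slots"),
         errhint("You must be superuser or replication role to use replication slots.")));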

Thanks,

Stephen




Re: [HACKERS] copy.c handling for RLS is insecure

2014-12-02 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Thu, Nov 27, 2014 at 2:03 AM, Stephen Frost sfr...@snowman.net wrote:
  Alright, I've done the change to use the RangeVar from CopyStmt, but
  also added a check wherein we verify that the relation's OID returned
  from the planned query is the same as the relation's OID that we did the
  RLS check on- if they're different, we throw an error.  Please let me
  know if there are any remaining concerns.
 
 That's clearly an improvement, but I'm not sure it's water-tight.
 What if the name that originally referenced a table ended up
 referencing a view?  Then you could get
 list_length(plan->relationOids) != 1.

I'll test it out and see what happens.  Certainly a good question and
if there's an issue there then I'll get it addressed.

 (And, in that case, I also wonder if you could get
 eval_const_expressions() to do evil things on your behalf while
 planning.)

If it can be made to reference a view then there's an issue, as the view
might itself include a function call provided by the attacker.
I'm not sure that we have to really worry about anything more
complicated than that.

Clearly, if we found a relation originally then we need that same
relation with the same OID after the conversion to a query.
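
The kind of cross-check being discussed might look roughly like this (a
sketch; plan is assumed to be the PlannedStmt built for the rewritten COPY
query and relid the OID the RLS check was performed against):

if (list_length(plan->relationOids) != 1 ||
    linitial_oid(plan->relationOids) != relid)
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("relation referenced by COPY statement has changed")));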

Thanks,

Stephen




Re: [HACKERS] superuser() shortcuts

2014-12-02 Thread Robert Haas
On Tue, Dec 2, 2014 at 11:29 AM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Nov 26, 2014 at 10:12 AM, Stephen Frost sfr...@snowman.net wrote:
  * Tom Lane (t...@sss.pgh.pa.us) wrote:
  In the context at hand, I think most of the messages in question are
  currently phrased like must be superuser to do X.  I'd be fine with
  changing that to permission denied to do X, but not to just
  permission denied.
 
  Apologies for the terseness of my (earlier) reply.  This is exactly what
  I'm suggesting.  What was in the patch was this change:
 
  ! ERROR:  must be superuser or replication role to use replication slots
 
  ---
 
  ! ERROR:  permission denied to use replication slots
  ! HINT:  You must be superuser or replication role to use replication 
  slots.

 Your proposed change takes two lines to convey the same amount of
 information that we are currently conveying in one line.  How is that
 better?

 It includes the actual error message, unlike what we have today which is
 guidence as to what's required to get past the permission denied error.

 In other words, I disagree that the same amount of information is being
 conveyed.

So, you think that someone might see the message "must be superuser or
replication role to use replication slots" and fail to understand that
they have a permissions problem?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add CREATE support to event triggers

2014-12-02 Thread Robert Haas
 Normally you would replicate between an older master and a newer
 replica, so this shouldn't be an issue.  I find it unlikely that we
 would de-support some syntax that works in an older version: it would
 break pg_dump, for one thing.

The most common way in which we break forward-compatibility is by
reserving additional keywords.  Logical replication can deal with this
by quoting all identifiers, so it's not a big issue.

It's not the only possible issue; it is certainly conceivable that we
could change something in the server and then try to change pg_dump to
compensate.  I think we wouldn't do that with anything very big, but
suppose spgist were supplanted by a new and better access method
tsigps.  Well, we might figure that there are few enough people
accustomed to the current syntax that we can get away with hacking the
pg_dump output, and yes, anyone with an existing dump from an older
system will have problems, but there won't be many of them, and they
can always adjust the dump by hand.  If we ever decided to do such a
thing, whatever syntax transformation we did in pg_dump would have to
be mimicked by any logical replication solution.

I think it's legitimate to say that this could be a problem, but I
think it probably won't be a problem very often, and I think when it
is a problem it probably won't be a very big problem, because we
already have good reasons not to break the ability to restore older
dumps on newer servers more often than absolutely necessary.

One of the somewhat disappointing things about this is that we're
talking about adding a lot of new deparsing code to the server that
probably, in some sense, duplicates pg_dump.  Perhaps in an ideal
world there would be a way to avoid that, but in the world we really
live in, there probably isn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2014-12-02 Thread Robert Haas
On Thu, Nov 27, 2014 at 8:49 AM, Bruce Momjian br...@momjian.us wrote:
 On Thu, Nov  6, 2014 at 05:46:42PM -0500, Peter Eisentraut wrote:
 Finally, the fact that a configuration change is in progress is
 privileged information.  Unprivileged users can deduce from the presence
 of this message that administrators are doing something, and possibly
 that they have done something wrong.

 I think it's fine to log a message in the server log if the pg_hba.conf
 file needs reloading.  But the client shouldn't know about this at all.

 Should we do this for postgresql.conf too?

It doesn't really make sense; or at least, the exact same thing
doesn't make any sense.  If an authentication attempt fails
unexpectedly, it might be because of a pg_hba.conf change that wasn't
reloaded, so it makes sense to try to tip off the DBA.  But it can't
really be because of a pending postgresql.conf change that hasn't been
reloaded, because those don't generally affect authentication.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.2 recovery/startup problems

2014-12-02 Thread Jeff Janes
On Tue, Dec 2, 2014 at 7:41 AM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Nov 26, 2014 at 7:13 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  If I do a pg_ctl stop -mf, then both files go away.  If I do a pg_ctl
 stop
  -mi, then neither goes away.  It is only with the /sbin/reboot that I get
  the fatal combination of _init being gone but the other still present.

 Eh?  That sounds wonky.

 I mean, reboot normally kills processes with SIGTERM or SIGKILL, in
 which case I'd expect the outcome to match what you get with pg_ctl
 stop -mf or pg_ctl stop -mi.  The only way I can see that you'd get a
 different behavior is if you did a hard reboot (like echo b 
 /proc/sysrq-trigger); if that changes things, then we might have a
 missing-fsync bug.  How is that reboot managing to leave the main fork
 behind while losing the init fork?


During abort processing after getting a SIGTERM, the back end truncates
59288 to zero size, and unlinks all the other files
(including 59288_init).  The actual removal of 59288 is left until the
checkpoint.  So if you SIGTERM the backend, then take down the server
uncleanly before the next checkpoint completes, you are left with just
59288.

Here is the strace:

open("base/16416/59288", O_RDWR)        = 8
ftruncate(8, 0)                         = 0
close(8)                                = 0
unlink("base/16416/59288.1")            = -1 ENOENT (No such file or directory)
unlink("base/16416/59288_fsm")          = -1 ENOENT (No such file or directory)
unlink("base/16416/59288_vm")           = -1 ENOENT (No such file or directory)
unlink("base/16416/59288_init")         = 0
unlink("base/16416/59288_init.1")       = -1 ENOENT (No such file or directory)

Cheers,

Jeff


Re: [HACKERS] why is PG_AUTOCONF_FILENAME in pg_config_manual.h?

2014-12-02 Thread Robert Haas
On Thu, Nov 27, 2014 at 12:56 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Nov 27, 2014 at 11:58 PM, Peter Eisentraut pete...@gmx.net wrote:
 Surely that's not a value that we expect users to be able to edit.  Is
 pg_config_manual.h just abused as a place that's included everywhere?

 (I suggest utils/guc.h as a better place.)

 +1

+1

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Report search_path value back to the client.

2014-12-02 Thread Alexander Kukushkin
Hi,

As of now postgres reports only the really important variables, but among
them there are also some not-so-important ones, like 'application_name'. The
only reason to report it was to help pgbouncer, and perhaps other connection
poolers. The change was introduced by commit
59ed94ad0c9f74a3f057f359316c845cedc4461e.

This makes me wonder why the 'search_path' value is not reported back to the
client. The use case is essentially the same as with 'application_name', but
a little more important.

Our databases provide different versions of stored procedures, which are
located in different schemas: 'api_version1', 'api_version2', 'api_version5',
etc. When an application establishes a connection to the database, it sets
search_path to, for example, api_version1. At the same time, a newer version
of the same application will set search_path to api_version2. Sometimes we
have hundreds of application instances that use different versions of the
stored procedures located in different schemas.

It's really crazy to keep so many (hundreds of) connections to the database,
and it would be much better to have something like pgbouncer in front of
postgres. Right now that's not possible, because pgbouncer is not aware of
search_path, and it's not really possible to fix pgbouncer because there is
no easy way to get the current value of search_path.

I would like to mark 'search_path' as GUC_REPORT:
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2904,7 +2904,7 @@ static struct config_string ConfigureNamesString[] =
{search_path, PGC_USERSET, CLIENT_CONN_STATEMENT,
gettext_noop(Sets the schema search order for
names that are not schema-qualified.),
NULL,
-   GUC_LIST_INPUT | GUC_LIST_QUOTE
+   GUC_LIST_INPUT | GUC_LIST_QUOTE | GUC_REPORT
},
namespace_search_path,
\$user\,public,
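
With GUC_REPORT set, the server sends a ParameterStatus message whenever
search_path changes, and a libpq-based pooler could pick it up the same way
it already picks up application_name; a small sketch (conn is an assumed
open PGconn):

const char *path = PQparameterStatus(conn, "search_path");

if (path != NULL)
{
    /* the pooler can remember this and replay SET search_path = ...
     * when the server connection is handed to a different client */
    printf("server reported search_path = %s\n", path);
}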

What do you think?

Regards,
--
Alexander Kukushkin


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-02 Thread Robert Haas
On Wed, Nov 26, 2014 at 11:00 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Nov 26, 2014 at 8:27 PM, Syed, Rahila rahila.s...@nttdata.com wrote:
 Don't we need to initialize doPageCompression  similar to doPageWrites in 
 InitXLOGAccess?
 Yep, you're right. I missed this code path.

 Also , in the earlier patches compression was set 'on' even when fpw GUC is 
 'off'. This was to facilitate compression of FPW which are forcibly written 
 even when fpw GUC is turned off.
  doPageCompression in this patch is set to true only if value of fpw GUC is 
 'compress'. I think its better to compress forcibly written full page writes.
 Meh? (stealing a famous quote).
 This is backward-incompatible in the fact that forcibly-written FPWs
 would be compressed all the time, even if FPW is set to off. The
 documentation of the previous patches also mentioned that images are
 compressed only if this parameter value is switched to compress.

If we have a separate GUC to determine whether to do compression of
full page writes, then it seems like that parameter ought to apply
regardless of WHY we are doing full page writes, which might be either
that full_page_writes=on in general, or that we've temporarily turned
them on for the duration of a full backup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Testing DDL deparsing support

2014-12-02 Thread Robert Haas
On Thu, Nov 27, 2014 at 11:43 PM, Ian Barwick i...@2ndquadrant.com wrote:
 DDL deparsing is a feature that allows collection of DDL commands as they
 are
 executed in a server, in some flexible, complete, and fully-contained format
 that allows manipulation, storage, and transmission.  This feature has
 several
 use cases; the two best known ones are DDL replication and DDL auditing.

 We have come up with a design that uses a JSON structure to store commands.
 It is similar to the C sprintf() call in spirit: there is a base format
 string, which is a generic template for each command type, and contains
 placeholders that represent the variable parts of the command.  The values
 for
 the placeholders in each specific command are members of the JSON object.  A
 helper function is provided that expands the format string and replaces the
 placeholders with the values, and returns the SQL command as text.  This
 design lets the user operate on the JSON structure in either a read-only
 fashion (for example to block table creation if the names don't satisfy a
 certain condition), or by modifying it (for example, to change the schema
 name
 so that tables are created in different schemas when they are replicated to
 some remote server).

 This design is mostly accepted by the community.  The one sticking point is
 testing: how do we ensure that the JSON representation we have created
 correctly deparses back into a command that has the same effect as the
 original command.  This was expressed by Robert Haas:
 http://www.postgresql.org/message-id/CA+TgmoZ=vzrijmxlkqi_v0jg4k4leapmwusc6rwxs5mquxu...@mail.gmail.com

 The problem cannot be solved by a standard regression test module which runs
 a
 bunch of previously-defined commands and verifies the output.  We need not
 only check the output for the commands as they exist today, but also we need
 to ensure that this does not get broken as future patches modify the
 existing
 commands as well as create completely new commands.

 The challenge here is to create a new testing framework that ensures the DDL
 deparsing module will be maintained by future hackers as the DDL grammar is
 modified.

 What and How to Test
 

 Our goal should be that patch authors run make check-world in their
 patched
 copies and notice that the DDL deparse test is failing; they can then modify
 deparse_utility.c to add support for the new commands, which should in
 general
 be pretty straightforward.  This way, maintaining deparsing code would be
 part
 of new patches just like we require pg_dump support and documentation for
 new
 features.

 It would not work to require patch authors to add their new commands to a
 new
 pg_regress test file, because most would not be aware of the need, or they
 would just forget to do it, and patches would be submitted and possibly even
 committed without any realization of the breakage caused.

 There are two things we can rely on: standard regression tests, and pg_dump.

 Standard regression tests are helpful because patch authors include new test
 cases in the patches that stress their new options or commands.  This new
 test
 framework needs to be something that internally runs the regression tests
 and
 exercises DDL deparsing as the regression tests execute DDL.  That would
 mean
 that new commands and options would automatically be deparse-tested by our
 new
 framework as soon as patch authors add standard regress support.

 One thing is first-grade testing, that is ensure that the deparsed version
 of
 a DDL command can be executed at all, for if the deparsed version throws an
 error, it's immediately obvious that the deparse code is bogus.  This is
 because we know the original command did not throw an error: otherwise, the
 deparse code would not have run at all, because ddl_command_end triggers are
 only executed once the original command has completed execution.  So
 first-grade testing ensures that no trivial bugs are present.

 But there's second-grade verification as well: is the object produced by the
 deparsed version identical to the one produced by the original command?  One
 trivial but incomplete approach is to run the command, then save the
 deparsed
 version; run the deparsed version, and deparse that one too; compare both.
 The problem with this approach is that if the deparse code is omitting some
 clause (say it omits IN TABLESPACE in a CREATE TABLE command), then both
 deparsed versions would contain the same bug yet they would compare equal.
 Therefore this approach is not good enough.

 The best idea we have so far to attack second-grade testing is to trust
 pg_dump to expose differences: accumulate commands as they run in the
 regression database, then run the deparsed versions in a different database;
 then pg_dump both databases and compare the dumped outputs.

 Proof-of-concept
 

 We have now implemented this as a proof-of-concept; the code is available
 in the deparse branch at:

   

Re: [HACKERS] Using pg_rewind for differential backup

2014-12-02 Thread Robert Haas
On Fri, Nov 28, 2014 at 2:49 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 It also would be quite straightforward to write a separate tool to do just
 that. Would be better than conflating pg_rewind with this. You could use
 pg_rewind as the basis for it - it's under the same license as PostgreSQL.

If we had such a tool in core, would that completely solve the
differential backup problem, or would more be needed?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fillfactor for GIN indexes

2014-12-02 Thread Robert Haas
On Fri, Nov 28, 2014 at 4:27 AM, Alexander Korotkov
aekorot...@gmail.com wrote:
 On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:
 Please find attached a simple patch adding fillfactor as storage parameter
 for GIN indexes. The default value is the same as the one currently aka 100
 to have the pages completely packed when a GIN index is created.


 That's not true. Let us discuss it a little bit.
 In the btree access method, the index is created by sorting index tuples.
 Once they are sorted it can write a tree with any fullness of the pages.
 With completely filled pages you will get a flood of index page splits on
 table updates or inserts. In order to avoid that, the default fillfactor is
 90. Besides index creation, the btree access method tries to follow the
 given fillfactor when inserting ordered data: when the rightmost page
 splits, fillfactor percent of the data goes to the left page.
 A btree page's life cycle runs between being freshly created by a split
 (50% filled) and being fully packed (100% filled) and then split again.
 Thus we can roughly estimate the average fullness of a btree page to be
 75%. This "equilibrium" state is reached after a huge number of inserts
 into the btree index.

 Fillfactor was later spread to other access methods, for instance GiST. In
 GiST the tree is always constructed by page splits, so a freshly created GiST
 index is already in its equilibrium state. Moreover, GiST doesn't split pages
 in halves; the split ratio depends on the opclass. So the equilibrium fullness
 of a particular GiST index is even less than 75%. What does fillfactor do for
 GiST? It makes GiST pages split at fillfactor fullness during index build, so
 GiST ends up even less full. But why? You don't get a flood of splits on a
 fresh GiST index anyway, because it's in its equilibrium state. That's why I
 would rather delete fillfactor from GiST until we have some algorithm to
 construct GiST without page splits.

 What is going on with GIN? GIN is, basically, a two-level nested btree, so
 fillfactor should be applicable here. But GIN is not constructed like
 btree...
 GIN accumulates maintenance_work_mem of data and then inserts it into the
 tree. So a fresh GIN index is the result of inserting maintenance_work_mem
 buckets, and each bucket is sorted. On a small index (or with a large
 maintenance_work_mem) there would be only one bucket. GIN has two levels of
 btree: the entry tree and the posting tree. Currently the entry tree has no
 special logic to control fullness during index build. So with only one bucket
 the entry tree will be 50% filled. With many buckets the entry tree will be at
 its equilibrium state. In some specific cases (about two buckets) the entry
 tree can be almost fully packed. Insertion into the posting tree is always
 ordered during index build because the posting tree is a tree of TIDs. When a
 posting tree page splits it leaves the left page fully packed during index
 build and 75% packed during insertion (because it expects some sequential
 inserts).

 The idea of the GIN build algorithm is that the entry tree is expected to be
 relatively small and fit in cache, and insertions into posting trees are
 ordered. Thus this algorithm should be IO efficient and have low overhead.
 However, with large entry trees this algorithm can perform much worse.

 My summary is the following:
 1) In order to have fully correct support of fillfactor in GIN we need to
 rewrite the GIN build algorithm.
 2) Without rewriting the GIN build algorithm, not much can be done with the
 entry tree. However, you can implement some heuristics.
 3) You definitely need to touch the code that selects the split ratio in
 dataPlaceToPageLeaf (starting with if (!btree->isBuild)).
 4) GIN data pages are always compressed except in pg_upgraded indexes from
 pre-9.4. Take care about that in the following code.
   if (GinPageIsCompressed(page))
   freespace = GinDataLeafPageGetFreeSpace(page);
 + else if (btree->isBuild)
 + freespace = BLCKSZ * (100 - fillfactor) / 100;

This is a very interesting explanation; thanks for writing it up!

It does leave me wondering why anyone would want fillfactor for GIN at
all, even if they were willing to rewrite the build algorithm.
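
For reference, the interface under discussion is just the ordinary reloptions
syntax; assuming the patch follows the existing btree/GiST precedent, usage
would look like the sketch below (the table, column, and the value 90 are
made-up examples, and stock 9.4 GIN has no such option):

    CREATE INDEX idx_docs_tsv ON docs USING gin (tsv) WITH (fillfactor = 90);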

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace

2014-12-02 Thread Robert Haas
On Fri, Nov 28, 2014 at 11:38 AM, Alex Shulgin a...@commandprompt.com wrote:
 Should I add this patch to the next CommitFest?

If you don't want it to get forgotten about, yes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf

2014-12-02 Thread Robert Haas
On Fri, Nov 28, 2014 at 11:59 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 I'm a bit late to the party, but wouldn't

 recovery_target_action = ...

 have been a better name for this? It'd be in line with the other
 recovery_target_* parameters, and also a bit shorter than the imho
 somewhat ugly action_at_recovery_target.

 FWIW, I too think that recovery_target_action is a better name.

 I agree.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Testing DDL deparsing support

2014-12-02 Thread Alvaro Herrera
Robert Haas wrote:
 On Thu, Nov 27, 2014 at 11:43 PM, Ian Barwick i...@2ndquadrant.com wrote:

  A simple schedule to demonstrate this is available; execute from the
  src/test/regress/ directory like this:
 
  ./pg_regress \
--temp-install=./tmp_check \
--top-builddir=../../.. \
--dlpath=. \
--schedule=./schedule_ddl_deparse_demo
 
 I haven't read the code, but this concept seems good to me.

Excellent, thanks.

 It has the unfortunate weakness that a difference could exist during
 the *middle* of the regression test run that is gone by the *end* of
 the run, but our existing pg_upgrade testing has the same weakness, so
 I guess we can view this as one more reason not to be too aggressive
 about having regression tests drop the unshared objects they create.

Agreed.  Not dropping objects also helps test pg_dump itself; the normal
procedure there is to run the regression tests, then pg_dump the regression
database.  Objects that are dropped never exercise their corresponding
pg_dump support code, which I think is a bad thing.  I think we should
institute a policy that regression tests must keep the objects they
create; maybe not all of them, but at least a sample large enough to
cover all interesting possibilities.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-02 Thread Josh Berkus
On 12/02/2014 06:25 AM, Alex Shulgin wrote:
 I am not in favor of this part. It may be better to let the users know
  that their old configuration is not valid anymore with an error. This
  patch cuts in the flesh with a huge axe, let's be sure that users do
  not ignore the side pain effects, or recovery.conf would be simply
  ignored and users would not be aware of that.
 Yeah, that is good point.
 
 I'd be in favor of a solution that works the same way as before the
 patch, without the need for extra trigger files, etc., but that doesn't
 seem to be nearly possible.  

As previously discussed, there are ways to avoid having a trigger file
for replication.  However, it's hard to avoid having one for PITR
recovery; at least, I can't think of a method which isn't just as
awkward, and we might as well stick with the awkward method we know.

 Whatever tricks we might employ will likely
 be defeated by the fact that the oldschool user will fail to *include*
 recovery.conf in the main conf file.

Well, can we merge this patch and then fight out what to do for the
transitional users as a separate patch?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Review of GetUserId() Usage

2014-12-02 Thread Robert Haas
On Sat, Nov 29, 2014 at 3:00 PM, Stephen Frost sfr...@snowman.net wrote:
 * Stephen Frost (sfr...@snowman.net) wrote:
 * Stephen Frost (sfr...@snowman.net) wrote:
  Attached is a patch to address the pg_cancel/terminate_backend and the
  statistics info as discussed previously.  It sounds like we're coming to

 And I forgot the attachment, of course.  Apologies.

 Updated patch attached which also changes the error messages (which
 hadn't been updated in the prior versions and really should be).

 Barring objections, I plan to move forward with this one and get this
 relatively minor change wrapped up.  As mentioned in the commit, while
 this might be an arguably back-patchable change, the lack of field
 complaints and the fact that it changes existing behavior mean it should
 go only against master, imv.

This patch does a couple of different things:

1. It makes more of the crappy error message change that Andres and I
already objected to on the other thread.  Whether you disagree with
those objections or not, don't make an end-run around them by putting
more of the same stuff into patches on other threads.

2. It changes the functions in pgstatfuncs.c so that you can see the
relevant information not only for your own role, but also for roles of
which you are a member.  That seems fine, but do we need a
documentation change someplace?

3. It messes around with pg_signal_backend().  There are currently no
cases in which pg_signal_backend() throws an error, which is good,
because it lets you write queries against pg_stat_activity() that
don't fail halfway through, even if you are missing permissions on
some things.  This patch introduces such a case, which is bad.
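
For instance (an illustrative query, not one from the thread), the signalling
functions are often used in set-returning queries of this shape; any code path
that raises an error partway through makes the whole statement abort rather
than processing the remaining rows:

    SELECT pid, usename, pg_cancel_backend(pid)
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
      AND pid <> pg_backend_pid();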

I think it's unfathomable that you would consider anything in this
patch a back-patchable bug fix.  It's clearly a straight-up behavior
change... or more properly three different changes, only one of which
I agree with.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?

2014-12-02 Thread Robert Haas
On Sat, Nov 29, 2014 at 11:46 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 What do you mean by never succeed? Is it skipping a large number of pages?
 Might re-trying the locks within the same vacuum help, or are the user locks
 too persistent?

You are confused.  He's talking about the relation-level lock that
vacuum attempts to take before doing any work at all on a given table,
not the per-page cleanup locks that it takes while processing each
page.  If the relation-level lock can't be acquired, the whole table
is skipped.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-02 Thread Alvaro Herrera
Josh Berkus wrote:
 On 12/02/2014 06:25 AM, Alex Shulgin wrote:

  Whatever tricks we might employ will likely
  be defeated by the fact that the oldschool user will fail to *include*
  recovery.conf in the main conf file.
 
 Well, can we merge this patch and then fight out what to do for the
 transitional users as a separate patch?

You seem to be saying "I don't have any good idea how to solve this
problem now, but I will magically have one once this is committed."  I'm
not sure that works very well.

In any case, the proposal upthread that we raise an error if
recovery.conf is found seems sensible enough.  Users will see it and
they will adjust their stuff -- it's a one-time thing.  It's not like
they switch a version forwards one week and back the following week.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?

2014-12-02 Thread Alvaro Herrera
Robert Haas wrote:
 On Sat, Nov 29, 2014 at 11:46 PM, Jim Nasby jim.na...@bluetreble.com wrote:
  What do you mean by never succeed? Is it skipping a large number of pages?
  Might re-trying the locks within the same vacuum help, or are the user locks
  too persistent?
 
 You are confused.  He's talking about the relation-level lock that
 vacuum attempts to take before doing any work at all on a given table,
 not the per-page cleanup locks that it takes while processing each
 page.  If the relation-level lock can't be acquired, the whole table
 is skipped.

Almost there.  Autovacuum takes the relation-level lock, starts
processing.  Some time later, another process wants a lock that
conflicts with the one autovacuum has.  This is flagged by the deadlock
detector, and a signal is sent to autovacuum, which commits suicide.

If the table is large, the time window for this to happen is large also;
there might never be a time window large enough between two lock
acquisitions for one autovacuum run to complete in a table.  This
starves the table from vacuuming completely, until things are bad enough
that an emergency vacuum is forced.  By then, the bloat is disastrous.

I think it's that suicide that Andres wants to disable.
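
To make the sequence concrete, here is a sketch of the kind of interaction
being described (the table name is made up); any lock request that conflicts
with the SHARE UPDATE EXCLUSIVE lock autovacuum holds will trigger it:

    -- session 1: an autovacuum worker is busy processing big_table
    -- session 2:
    LOCK TABLE big_table IN SHARE MODE;  -- conflicts with SHARE UPDATE EXCLUSIVE,
                                         -- so the deadlock detector signals the
                                         -- worker and it cancels itself

An anti-wraparound vacuum is the existing exception: it does not cancel itself
in this situation.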

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-02 Thread Josh Berkus
On 12/02/2014 10:31 AM, Alvaro Herrera wrote:
 Josh Berkus wrote:
 On 12/02/2014 06:25 AM, Alex Shulgin wrote:
 
 Whatever tricks we might employ will likely
 be defeated by the fact that the oldschool user will fail to *include*
 recovery.conf in the main conf file.

 Well, can we merge this patch and then fight out what to do for the
 transitional users as a separate patch?
 
 You seem to be saying I don't have any good idea how to solve this
 problem now, but I will magically have one once this is committed.  I'm
 not sure that works very well.

No, I'm saying this problem is easy to solve technically, but we have
intractable arguments on this list about the best way to solve it, even
though the bulk of the patch isn't in dispute.

 In any case, the proposal upthread that we raise an error if
 recovery.conf is found seems sensible enough.  Users will see it and
 they will adjust their stuff -- it's a one-time thing.  It's not like
 they switch a version forwards one week and back the following week.

I'm OK with that solution.  Apparently others aren't though.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?

2014-12-02 Thread Josh Berkus
On 12/02/2014 10:35 AM, Alvaro Herrera wrote:
 If the table is large, the time window for this to happen is large also;
 there might never be a time window large enough between two lock
 acquisitions for one autovacuum run to complete in a table.  This
 starves the table from vacuuming completely, until things are bad enough
 that an emergency vacuum is forced.  By then, the bloat is disastrous.
 
 I think it's that suicide that Andres wants to disable.

A much better solution for this ... and one which would solve a *lot* of
other issues with vacuum and autovacuum ... would be to give vacuum a
way to track which blocks an incomplete vacuum had already visited.
This would be even more valuable for freeze.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?

2014-12-02 Thread Andres Freund
On 2014-12-02 11:02:07 -0800, Josh Berkus wrote:
 On 12/02/2014 10:35 AM, Alvaro Herrera wrote:
  If the table is large, the time window for this to happen is large also;
  there might never be a time window large enough between two lock
  acquisitions for one autovacuum run to complete in a table.  This
  starves the table from vacuuming completely, until things are bad enough
  that an emergency vacuum is forced.  By then, the bloat is disastrous.
  
  I think it's that suicide that Andres wants to disable.

Correct.

 A much better solution for this ... and one which would solve a *lot* of
 other issues with vacuum and autovacuum ... would be to give vacuum a
 way to track which blocks an incomplete vacuum had already visited.
 This would be even more valuable for freeze.

That's pretty much a different problem. Yes, some more persistent state would
be helpful - although it'd need to be *much* more than which pages it
has visited - but you'd still be vulnerable to the same issue.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?

2014-12-02 Thread Josh Berkus
On 12/02/2014 11:08 AM, Andres Freund wrote:
 On 2014-12-02 11:02:07 -0800, Josh Berkus wrote:
 On 12/02/2014 10:35 AM, Alvaro Herrera wrote:
 If the table is large, the time window for this to happen is large also;
 there might never be a time window large enough between two lock
 acquisitions for one autovacuum run to complete in a table.  This
 starves the table from vacuuming completely, until things are bad enough
 that an emergency vacuum is forced.  By then, the bloat is disastrous.

 I think it's that suicide that Andres wants to disable.
 
 Correct.
 
 A much better solution for this ... and one which would solve a *lot* of
 other issues with vacuum and autovacuum ... would be to give vacuum a
 way to track which blocks an incomplete vacuum had already visited.
 This would be even more valuable for freeze.
 
  That's pretty much a different problem. Yes, some more persistent state would
 be helpful - although it'd need to be *much* more than which pages it
 has visited - but you'd still be vulnerable to the same issue.

If we're trying to solve the problem that vacuums of large, high-update
tables never complete, it's solving the same problem.  And in a much
better way.

And yeah, doing a vacuum placeholder wouldn't be simple, but it's the
only solution I can think of that's worthwhile.  Just disabling the
"vacuum releases sharelock" behavior puts the user in the situation of
deciding between maintenance and uptime.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Removing INNER JOINs

2014-12-02 Thread Robert Haas
On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bottom line, given all the restrictions on whether the optimization can
 happen, I have very little enthusiasm for the whole idea.  I do not think
 the benefit will be big enough to justify the amount of mess this will
 introduce.

This optimization applies to a tremendous number of real-world cases,
and we really need to have it.  This was a huge problem for me in my
previous life as a web developer.  The previous work that we did to
remove LEFT JOINs was an enormous help, but it's not enough; we need a
way to remove INNER JOINs as well.
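
As a concrete illustration of the kind of query in question (the schema and
names are invented here, not taken from the thread): with orders.customer_id
declared NOT NULL and referencing customers.id, the join below can neither
remove nor duplicate rows, so in principle the planner could drop it whenever
the referential integrity constraint is known to hold at execution time:

    SELECT o.id, o.total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;   -- no columns of c are used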

I thought that David's original approach of doing this in the planner
was a good one.  That fell down because of the possibility that
apparently-valid referential integrity constraints might not be valid
at execution time if the triggers were deferred.  But frankly, that
seems like an awfully nitpicky thing for this to fall down on.  Lots
of web applications are going to issue only SELECT statements that run
as single-statement transactions, and so that issue, so troubling
in theory, will never occur in practice.  That doesn't mean that we
don't need to account for it somehow to make the code safe, but any
argument that it abridges the use case significantly is, in my
opinion, not credible.

Anyway, David was undeterred by the rejection of that initial approach
and rearranged everything, based on suggestions from Andres and later
Simon, into the form it's reached now.  Kudos to him for his
persistence.  But your point that we might have chosen a whole
different plan if it had known that this join was cheaper is a good
one.  However, that takes us right back to square one, which is to do
this at plan time.  I happen to think that's probably better anyway,
but I fear we're just going around in circles here.  We can either do
it at plan time and find some way of handling the fact that there
might be deferred triggers that haven't fired yet; or we can do it at
execution time and live with the fact that we might have chosen a plan
that is not optimal, though still better than executing a
completely-unnecessary join.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?

2014-12-02 Thread Andres Freund
On 2014-12-02 11:12:40 -0800, Josh Berkus wrote:
 On 12/02/2014 11:08 AM, Andres Freund wrote:
  On 2014-12-02 11:02:07 -0800, Josh Berkus wrote:
  On 12/02/2014 10:35 AM, Alvaro Herrera wrote:
  If the table is large, the time window for this to happen is large also;
  there might never be a time window large enough between two lock
  acquisitions for one autovacuum run to complete in a table.  This
  starves the table from vacuuming completely, until things are bad enough
  that an emergency vacuum is forced.  By then, the bloat is disastrous.
 
  I think it's that suicide that Andres wants to disable.
  
  Correct.
  
  A much better solution for this ... and one which would solve a *lot* of
  other issues with vacuum and autovacuum ... would be to give vacuum a
  way to track which blocks an incomplete vacuum had already visited.
  This would be even more valuable for freeze.
  
  That's pretty much a different problem. Yes, some more persistent state would
  be helpful - although it'd need to be *much* more than which pages it
  has visited - but you'd still be vulnerable to the same issue.
 
 If we're trying to solve the problem that vacuums of large, high-update
 tables never complete, it's solving the same problem.

Which isn't what I'm talking about.

The problem is that vacuum is cancelled if a conflicting lock request is
acquired. Plain updates don't do that. But there's workloads where you
need more heavyweight updates, and then it can easily happen.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Sequence Access Method WIP

2014-12-02 Thread Andres Freund
On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote:
 To be clear: I don't think this API is very good for its stated purpose, for
 implementing global sequences for use in a cluster. For the reasons I've
 mentioned before.  I'd like to see two changes to this proposal:
 
 1. Make the AM implementation solely responsible for remembering the last
 value. (if it's a global or remote sequence, the current value might not be
 stored in the local server at all)

I think that reason isn't particularly good. The practical applicability
for such an implementation doesn't seem to be particularly large.

 2. Instead of the single amdata field, make it possible for the
 implementation to define any number of fields with any datatype in the
 tuple. That would make debugging, monitoring etc. easier.

My main problem with that approach is that it pretty much nails the door
shut for moving sequences into a catalog table instead of the current,
pretty insane, approach of a physical file per sequence. Currently, with
or without seqam, it'd not be all that hard to force it into a catalog,
taking care to force each tuple onto a separate page...


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?

2014-12-02 Thread Jeff Janes
On Tue, Dec 2, 2014 at 11:12 AM, Josh Berkus j...@agliodbs.com wrote:

 On 12/02/2014 11:08 AM, Andres Freund wrote:
  On 2014-12-02 11:02:07 -0800, Josh Berkus wrote:
  On 12/02/2014 10:35 AM, Alvaro Herrera wrote:
  If the table is large, the time window for this to happen is large
 also;
  there might never be a time window large enough between two lock
  acquisitions for one autovacuum run to complete in a table.  This
  starves the table from vacuuming completely, until things are bad
 enough
  that an emergency vacuum is forced.  By then, the bloat is disastrous.
 
  I think it's that suicide that Andres wants to disable.
 
  Correct.
 
  A much better solution for this ... and one which would solve a *lot* of
  other issues with vacuum and autovacuum ... would be to give vacuum a
  way to track which blocks an incomplete vacuum had already visited.
  This would be even more valuable for freeze.
 
  That's pretty much a different problem. Yes, some more persistent state would
  be helpful - although it'd need to be *much* more than which pages it
  has visited - but you'd still be vulnerable to the same issue.

 If we're trying to solve the problem that vacuums of large, high-update
 tables never complete, it's solving the same problem.  And in a much
 better way.

 And yeah, doing a vacuum placeholder wouldn't be simple, but it's the
 only solution I can think of that's worthwhile.  Just disabling the
 vacuum releases sharelock behavior puts the user in the situation of
 deciding between maintenance and uptime.


I think it would be more promising to work on downgrading lock strengths so
that fewer things conflict, and it would be not much more work than what
you propose.

What operations are people doing on a regular basis that take locks which
cancel vacuum?  create index?

Cheers,

Jeff


Re: [HACKERS] Getting references for OID

2014-12-02 Thread Borodin Vladimir

 Hello,
  
 How can I get all dependent objects for the OID of an object? For example, I have the OID 
 of a database, db_id, and I want to get all OIDs of the namespaces associated with this database. 
 Thank you!

Hello, Dmitry.

Actually, this list is for developers (as described here [0]). You should ask 
this question on pgsql-general@, for example, or you can do it on 
pgsql-ru-general@, in Russian.

And if I understood you right, you can use the oid2name tool, which is part of 
contrib. More info about it can be found here [1].

[0] http://www.postgresql.org/list/
[1] http://www.postgresql.org/docs/current/static/oid2name.html

  
 -- 
 Best regards, Dmitry Voronin
  


--
Vladimir






Re: [HACKERS] superuser() shortcuts

2014-12-02 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Dec 2, 2014 at 11:29 AM, Stephen Frost sfr...@snowman.net wrote:
  It includes the actual error message, unlike what we have today which is
  guidance as to what's required to get past the permission denied error.
 
  In other words, I disagree that the same amount of information is being
  conveyed.
 
 So, you think that someone might see the message "must be superuser or
 replication role to use replication slots" and fail to understand that
 they have a permissions problem?

I think that the error message about a permissions problem should be
that there is a permissions problem, first and foremost.

We don't say 'You must have SELECT rights on relation X to select from
it.' when you try to do a SELECT that you're not allowed to.  We throw a
permissions denied error.  As pointed out up-thread, we do similar
things for other cases of permissions denied and even beyond permission
denied errors we tend to match up the errmsg() with the errcode(), which
makes a great deal of sense to me and is why I'm advocating that
position.

Thanks,

Stephen




Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?

2014-12-02 Thread Andres Freund
On 2014-12-02 11:23:31 -0800, Jeff Janes wrote:
 I think it would be more promising to work on downgrading lock strengths so
 that fewer things conflict, and it would be not much more work than what
 you propose.

I think you *massively* underestimate the effort required to to lower
lock levels. There's some very ugly corners you have to think about to
do so. Just look at how long it took to implement the lock level
reductions for ALTER TABLE - and those were the simpler cases.

 What operations are people doing on a regular basis that take locks
 which cancel vacuum?  create index?

Locking tables against modifications in this case.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Review of GetUserId() Usage

2014-12-02 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 On Sat, Nov 29, 2014 at 3:00 PM, Stephen Frost sfr...@snowman.net wrote:
  * Stephen Frost (sfr...@snowman.net) wrote:
  Updated patch attached which also changes the error messages (which
  hadn't been updated in the prior versions and really should be).
 
  Barring objections, I plan to move forward with this one and get this
  relatively minor change wrapped up.  As mentioned in the commit, while
  this might be an arguably back-patchable change, the lack of field
  complaints and the fact that it changes existing behavior mean it should
  go only against master, imv.
 
 This patch does a couple of different things:
 
 1. It makes more of the crappy error message change that Andres and I
 already objected to on the other thread.  Whether you disagree with
 those objections or not, don't make an end-run around them by putting
 more of the same stuff into patches on other threads.

The error message clearly needed to be updated either way or I wouldn't
have touched it.  I changed it to match what I feel is the prevalent and
certainly more commonly seen messaging from PG when it comes to
permissions errors, and drew attention to it by commenting on the fact
that I changed it.  Doing otherwise would have drawn similar criticism
(as it did upthread, by Peter or Alvaro, I believe..) that I wasn't
updating it to match the messaging which we should be using.

 2. It changes the functions in pgstatfuncs.c so that you can see the
 relevant information not only for your own role, but also for roles of
 which you are a member.  That seems fine, but do we need a
 documentation change someplace?

Yes, I've added the documentation changes to my branch, just hadn't
posted an update yet (travelling today).

 3. It messes around with pg_signal_backend().  There are currently no
 cases in which pg_signal_backend() throws an error, which is good,
 because it lets you write queries against pg_stat_activity() that
 don't fail halfway through, even if you are missing permissions on
 some things.  This patch introduces such a case, which is bad.

Good point, I'll move that check up into the other functions, which will
allow for a more descriptive error as well.

 I think it's unfathomable that you would consider anything in this
 patch a back-patchable bug fix.  It's clearly a straight-up behavior
 change... or more properly three different changes, only one of which
 I agree with.

I didn't think it was back-patchable and stated as much.  I anticipated
that argument and provided my thoughts on it.  I *do* think it's wrong
to be using GetUserId() in this case and it's only very slightly
mollified by being documented that way.

Thanks,

Stephen




Re: [HACKERS] Removing INNER JOINs

2014-12-02 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Bottom line, given all the restrictions on whether the optimization can
  happen, I have very little enthusiasm for the whole idea.  I do not think
  the benefit will be big enough to justify the amount of mess this will
  introduce.
 
 This optimization applies to a tremendous number of real-world cases,
 and we really need to have it.  This was a huge problem for me in my
 previous life as a web developer.  The previous work that we did to
 remove LEFT JOINs was an enormous help, but it's not enough; we need a
 way to remove INNER JOINs as well.

For my 2c, I'm completely with Robert on this one.  There are a lot of
cases this could help with, particularly things coming out of ORMs
(which, yes, might possibly be better written, but that's a different
issue).

 I thought that David's original approach of doing this in the planner
 was a good one.  That fell down because of the possibility that
 apparently-valid referential integrity constraints might not be valid
 at execution time if the triggers were deferred.  But frankly, that
 seems like an awfully nitpicky thing for this to fall down on.  Lots
 of web applications are going to issue only SELECT statements that run
 as as single-statement transactions, and so that issue, so troubling
 in theory, will never occur in practice.  That doesn't mean that we
 don't need to account for it somehow to make the code safe, but any
 argument that it abridges the use case significantly is, in my
 opinion, not credible.

Agreed with this also; deferred triggers are not commonplace in my
experience and when it *does* happen, ime at least, it's because you
have a long-running data load or similar where you're not going to
care one bit that large, complicated JOINs aren't as fast as they
might have been otherwise.

 Anyway, David was undeterred by the rejection of that initial approach
 and rearranged everything, based on suggestions from Andres and later
 Simon, into the form it's reached now.  Kudos to him for his
 persistance.  But your point that we might have chosen a whole
 different plan if it had known that this join was cheaper is a good
 one.  However, that takes us right back to square one, which is to do
 this at plan time.  I happen to think that's probably better anyway,
 but I fear we're just going around in circles here.  We can either do
 it at plan time and find some way of handling the fact that there
 might be deferred triggers that haven't fired yet; or we can do it at
 execution time and live with the fact that we might have chosen a plan
 that is not optimal, though still better than executing a
 completely-unnecessary join.

Right, we can't get it wrong in the face of deferred triggers either.
Have we considered only doing the optimization for read-only
transactions?  I'm not thrilled with that, but at least we'd get out
from under this deferred triggers concern.  Another way might be an
option to say use the optimization, but throw an error if you run
into a deferred trigger, or perhaps save both plans and use whichever
one we can when we get to execution time?  That could make planning
time go up too much to work, but perhaps it's worth testing..

Thanks,

Stephen




Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?

2014-12-02 Thread Jeff Janes
On Tue, Dec 2, 2014 at 11:41 AM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-12-02 11:23:31 -0800, Jeff Janes wrote:
  I think it would be more promising to work on downgrading lock strengths
 so
  that fewer things conflict, and it would be not much more work than what
  you propose.

 I think you *massively* underestimate the effort required to to lower
 lock levels. There's some very ugly corners you have to think about to
 do so. Just look at how long it took to implement the lock level
 reductions for ALTER TABLE - and those were the simpler cases.


Or maybe I overestimate how hard it would be to make vacuum restartable.
You would have to save a massive amount of state (up to maintenance_work_mem
worth of TID list, plus the block you left off at in both the table and all of
the indexes of that table), and you would somehow have to validate that saved
state against any changes that might have occurred to the table or the indexes
while it was saved and you were not holding the lock, which seems like it
would be almost as full of corner cases as weakening the lock in the first
place.  Aren't they logically the same thing?  If we could drop the lock
and take it up again later, maybe the answer is not to save the state, but
just to pause the vacuum until the lock becomes free again, in effect
saving the state in situ.  That would allow autovac worker to be held
hostage to anyone taking a lock, though.

The only easy way to do it that I see is to have it only stop at the end of
a index-cleaning cycle, which probably takes too long to block for.  Or
record a restart point at the end of each index-cleaning cycle, and then
when it yields the lock it abandons all work since the last cycle end,
rather than since the beginning.  That would be better than what we have,
but seems like a far cry from actual restarting from any point.



  What operations are people doing on a regular basis that take locks
  which cancel vacuum?  create index?

 Locking tables against modifications in this case.


So in share mode, then?  I don't think there is any reason that there
can't be a lock mode that conflicts with ROW EXCLUSIVE but not SHARE
UPDATE EXCLUSIVE.  Basically something that conflicts with logical
changes, but not with physical changes.
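
To spell out the gap being pointed at (the table name is made up): today the
weakest mode that blocks writes also blocks vacuum, because SHARE conflicts
with the SHARE UPDATE EXCLUSIVE lock autovacuum takes:

    LOCK TABLE accounts IN SHARE MODE;                    -- blocks INSERT/UPDATE/DELETE,
                                                          -- but also conflicts with autovacuum
    LOCK TABLE accounts IN SHARE UPDATE EXCLUSIVE MODE;   -- doesn't block writes at all

The hypothetical new mode would sit between these two: conflicting with ROW
EXCLUSIVE but not with SHARE UPDATE EXCLUSIVE.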

Cheers,

Jeff


Re: [HACKERS] Review of GetUserId() Usage

2014-12-02 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote:
  3. It messes around with pg_signal_backend().  There are currently no
  cases in which pg_signal_backend() throws an error, which is good,
  because it lets you write queries against pg_stat_activity() that
  don't fail halfway through, even if you are missing permissions on
  some things.  This patch introduces such a case, which is bad.
 
 Good point, I'll move that check up into the other functions, which will
 allow for a more descriptive error as well.

Err, I'm missing something here, as pg_signal_backend() is a misc.c
static internal function?  How would you be calling it from a query
against pg_stat_activity()?

I'm fine making the change anyway, just curious..

Thanks,

Stephen




Re: [HACKERS] Review of GetUserId() Usage

2014-12-02 Thread Robert Haas
On Tue, Dec 2, 2014 at 2:50 PM, Stephen Frost sfr...@snowman.net wrote:
 1. It makes more of the crappy error message change that Andres and I
 already objected to on the other thread.  Whether you disagree with
 those objections or not, don't make an end-run around them by putting
 more of the same stuff into patches on other threads.

 The error message clearly needed to be updated either way or I wouldn't
 have touched it.  I changed it to match what I feel is the prevelant and
 certainly more commonly seen messaging from PG when it comes to
 permissions errors, and drew attention to it by commenting on the fact
 that I changed it.  Doing otherwise would have drawn similar criticism
 (as it did upthread, by Peter or Alvaro, I believe..) that I wasn't
 updating it to match the messaging which we should be using.

OK, I guess that's a fair point.

 I think it's unfathomable that you would consider anything in this
 patch a back-patchable bug fix.  It's clearly a straight-up behavior
 change... or more properly three different changes, only one of which
 I agree with.

 I didn't think it was back-patchable and stated as much.  I anticipated
 that argument and provided my thoughts on it.  I *do* think it's wrong
 to be using GetUserId() in this case and it's only very slightly
 mollified by being documented that way.

It's not wrong.  It's just different than what you happen to prefer.
It's fine to want to change things, but "not the way I would have done
it" is not the same as "arguably a bug".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?

2014-12-02 Thread Andres Freund
On 2014-12-02 12:22:42 -0800, Jeff Janes wrote:
 Or maybe I overestimate how hard it would be to make vacuum
 restartable.

That's a massive project. Which is why I'm explicitly *not* suggesting
that. What I instead suggest is a separate threshold after which vacuum
isn't going to abort automatically after a lock conflict. So after that
threshold just behave like anti wraparound vacuum already does.

Maybe autovacuum_vacuum/analyze_force_threshold or similar. If set to
zero, the default, that behaviour is disabled. If set to a positive
value it's an absolute one, if negative it's a factor of the normal
autovacuum_vacuum/analyze_threshold.
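
Spelled out in postgresql.conf terms, the proposal would look roughly like this
(the parameter names and values are hypothetical, taken straight from the
sketch above; nothing like this exists yet):

    autovacuum_vacuum_force_threshold = 100000   # absolute threshold in tuples: past this,
                                                  # stop yielding to conflicting lock requests
    autovacuum_analyze_force_threshold = -2      # negative: 2x autovacuum_analyze_threshold
    # 0 (the default) keeps today's behaviour: autovacuum cancels itself on conflict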


Greetings,

Andres Freund




Re: [HACKERS] superuser() shortcuts

2014-12-02 Thread Robert Haas
On Tue, Dec 2, 2014 at 2:35 PM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Dec 2, 2014 at 11:29 AM, Stephen Frost sfr...@snowman.net wrote:
  It includes the actual error message, unlike what we have today which is
   guidance as to what's required to get past the permission denied error.
 
  In other words, I disagree that the same amount of information is being
  conveyed.

  So, you think that someone might see the message "must be superuser or
  replication role to use replication slots" and fail to understand that
 they have a permissions problem?

 I think that the error message about a permissions problem should be
 that there is a permissions problem, first and foremost.

I can't disagree with that, but it's not like the current message is:

ERROR: I ran into some kind of a problem but I'm not going to tell you
what it is for a long time OK I guess that's enough well actually you
see there is a problem and it happens to do have to do with
permissions and in particular that they are denied.

It's pretty clear that the current message is complaining about a
permissions problem, so the fact that it doesn't specifically include
the words permission and denied doesn't bother me.  Let me ask the
question again: what information do you think is conveyed by your
proposed rewrite that is not conveyed by the current message?

 We don't say 'You must have SELECT rights on relation X to select from
 it.' when you try to do a SELECT that you're not allowed to.  We throw a
 permissions denied error.  As pointed out up-thread, we do similar
 things for other cases of permissions denied and even beyond permission
 denied errors we tend to match up the errmsg() with the errcode(), which
 makes a great deal of sense to me and is why I'm advocating that
 position.

I don't think that trying to match up the errmsg() with the errcode()
is a useful exercise.  That's just restricting the ways that people
can write error messages in a way that is otherwise unnecessary.  The
errmsg() and errcode() have different purposes; the former is to
provide a concise, human-readable description of the problem, and the
latter is to group errors into categories so that related errors can
be handled programmatically.  They need not be, and in many existing
cases are not, even vaguely similar; and the fact that many of our
permission denied errors happen to use that phrasing is not a
sufficient justification for changing the rest unless the new phrasing
is a bona fide improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-02 Thread Andres Freund
On 2014-12-02 17:25:14 +0300, Alex Shulgin wrote:
 I'd be in favor of a solution that works the same way as before the
 patch, without the need for extra trigger files, etc., but that doesn't
 seem to be nearly possible.  Whatever tricks we might employ will likely
 be defeated by the fact that the oldschool user will fail to *include*
 recovery.conf in the main conf file.

I think removing the ability to define a trigger file is pretty much
unacceptable. It's not *too* bad to rewrite recovery.conf's contents
into actual valid postgresql.conf format, but it's harder to change an
existing complex failover setup that relies on the existence of such a
trigger.  I think aiming for removal of that is a sure way to prevent
the patch from getting in.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Robert Haas
On Tue, Nov 25, 2014 at 1:38 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Nov 25, 2014 at 4:01 AM, Robert Haas robertmh...@gmail.com wrote:
 - This appears to needlessly reindent the comments for PG_CACHE_LINE_SIZE.

 Actually, the word "only" is removed (because PG_CACHE_LINE_SIZE has a
 new client now). So it isn't quite the same paragraph as before.

Oy, I missed that.

 - I really don't think we need a #define in pg_config_manual.h for
 this.  Please omit that.

 You'd prefer to not offer a way to disable abbreviation? Okay. I guess
 that makes sense - it should work well as a general optimization.

I'd prefer not to have a #define in pg_config_manual.h.  Only stuff
that we expect a reasonably decent number of users to need to change
should be in that file, and this is too marginal for that.  If anybody
other than the developers of the feature is disabling this, the whole
thing is going to be ripped back out.

 I'm not sure about that. I'd prefer to have tuplesort (and one or two
 other sites) set the "abbreviation is possible in principle" flag.
 Otherwise, sortsupport needs to assume that the leading attribute is
 going to be the abbreviation-applicable one, which might not always be
 true. Still, it's not as if I feel strongly about it.

When wouldn't that be true?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Selectivity estimation for inet operators

2014-12-02 Thread Emre Hasegeli
 I spent a fair chunk of the weekend hacking on this patch to make
 it more understandable and fix up a lot of what seemed to me pretty
 clear arithmetic errors in the upper layers of the patch.  However,
 I couldn't quite convince myself to commit it, because the business
 around estimation for partial histogram-bucket matches still doesn't
 make any sense to me.  Specifically this:

 /* Partial bucket match. */
 left_divider = inet_hist_match_divider(left, query, opr_codenum);
 right_divider = inet_hist_match_divider(right, query, opr_codenum);

 if (left_divider >= 0 || right_divider >= 0)
 match += 1.0 / pow(2.0, Max(left_divider, right_divider));

 Now unless I'm missing something pretty basic about the divider
 function, it returns larger numbers for inputs that are further away
 from each other (ie, have more not-in-common significant address bits).
 So the above calculation seems exactly backwards to me: if one endpoint
 of a bucket is close to the query, or even an exact match, and the
 other endpoint is further away, we completely ignore the close/exact
 match and assign a bucket match fraction based only on the further-away
 endpoint.  Isn't that exactly backwards?

You are right that partial bucket match calculation isn't fair on
some circumstances.

 I experimented with logic like this:

 if (left_divider >= 0 && right_divider >= 0)
 match += 1.0 / pow(2.0, Min(left_divider, right_divider));
 else if (left_divider >= 0 || right_divider >= 0)
 match += 1.0 / pow(2.0, Max(left_divider, right_divider));

 ie, consider the closer endpoint if both are valid.  But that didn't seem
 to work a whole lot better.  I think really we need to consider both
 endpoints not just one to the exclusion of the other.

I have tried many combinations like this.  Including arithmetic,
geometric, logarithmic mean functions.  I could not get good results
with any of them, so I left it in a basic form.

Max() works pretty well most of the time, because if the query matches
the bucket generally it is close to both of the endpoints.  By using
Max(), we are actually crediting the ones which are close to both
of the endpoints.  
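
A worked example with made-up numbers: taking the divider as roughly the count
of not-in-common significant bits (per the description above), if left_divider
= 2 and right_divider = 5, the code credits the bucket with
1 / 2^max(2, 5) = 1/32, about 0.03, rather than the 1/4 that the closer
endpoint alone would suggest.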

 I'm also not exactly convinced by the divider function itself,
 specifically about the decision to fail and return -1 if the masklen
 comparison comes out wrong.  This effectively causes the masklen to be
 the most significant part of the value (after the IP family), which seems
 totally wrong.  ISTM we ought to consider the number of leading bits in
 common as the primary indicator of how far apart a query and a
 histogram endpoint are.

The partial match calculation with Max() is especially unfair on
the buckets where more significant bits change.  For example 63/8 and
64/8.  Returning -1 instead of a high divider, forces it to use
the divider for the other endpoint.  We consider the number of leading
bits in common as the primary indicator, just for the other endpoint.

I have also experimented with the count of the common bits of
the endpoints of the bucket for better partial match calculation.
I could not find out a meaningful equation with it.

 Even if the above aspects of the code are really completely right, the
 comments fail to explain why.  I spent a lot of time on the comments,
 but so far as these points are concerned they still only explain what
 is being done and not why it's a useful calculation to make.

I couldn't write better comments because I don't have strong arguments
about it.  We can say that we don't try to make use of both of
the endpoints, because we don't know how to combine them.  We only use
the one with matching family and masklen, and when both of them match
we use the distant one to be on the safer side.

Thank you for looking at it.  Comments look much better now.




Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-12-02 Thread Robert Haas
On Tue, Nov 25, 2014 at 7:13 PM, Peter Geoghegan p...@heroku.com wrote:
 Alright, so let me summarize what I think are the next steps in
 working towards getting this patch committed. I should produce a new
 revision which:

 * Uses default costings.

 * Applies a generic final quality check that enforces a distance of no
 greater than 50% of the total string size. (The use of default
 costings removes any reason to continue to do this)

 * Work through Robert's suggestions on other aspects that need work
 [1], most of which I already agreed to.

Sounds good so far.

 What is unclear is whether or not I should continue to charge extra
 for non-matching user supplied alias (and, I think more broadly,
 consider multiple RTEs iff the user did use an alias) - Robert was
 skeptical, but didn't seem to have made his mind up. I still think I
 should cost things based on aliases, and consider multiple RTEs even
 when the user supplied an alias (the penalty should just be a distance
 of 1 and not 3, though, in light of other changes to the
 weighing/costing). If I don't hear anything in the next day or two,
 I'll more or less preserve aliases-related aspects of the patch.

Basically, the case in which I think it's helpful to issue a
suggestion here is when the user has used the table name rather than
the alias name.  I wonder if it's worth checking for that case
specifically, in lieu of what you've done here, and issuing a totally
different hint in that case (HINT: You must refer to this column
as prime_minister.id rather than cameron.id).
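
To make the scenario concrete (the names are invented, following the hint text
above): the table is aliased, and the query then refers to it by a name that is
no longer visible once the alias is applied:

    SELECT cameron.id
    FROM cameron AS prime_minister;
    -- the reference "cameron.id" is invalid here; the idea is to hint
    -- "prime_minister.id" in this specific case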

Another idea, which I think I like less well, is to check the
Levenshtein distance between the allowed alias and the entered alias
and, if that's within the half-the-shorter-length threshold, consider
possible matches from that RTE, charge the distance between the
correct alias and the entered alias as a penalty to each potential
column match.

What I think won't do is to look at a situation where the user has
entered automobile.id and suggest that maybe they meant student.iq, or
even student.id. The amount of difference between the names has got to
matter for the RTE names, just as it does for the column names.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] superuser() shortcuts

2014-12-02 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Dec 2, 2014 at 2:35 PM, Stephen Frost sfr...@snowman.net wrote:
  * Robert Haas (robertmh...@gmail.com) wrote:
  On Tue, Dec 2, 2014 at 11:29 AM, Stephen Frost sfr...@snowman.net wrote:
   It includes the actual error message, unlike what we have today which is
    guidance as to what's required to get past the permission denied error.
  
   In other words, I disagree that the same amount of information is being
   conveyed.
 
   So, you think that someone might see the message "must be superuser or
   replication role to use replication slots" and fail to understand that
  they have a permissions problem?
 
  I think that the error message about a permissions problem should be
  that there is a permissions problem, first and foremost.
 
 I can't disagree with that, but it's not like the current message is:
 
 ERROR: I ran into some kind of a problem but I'm not going to tell you
 what it is for a long time OK I guess that's enough well actually you
 see there is a problem and it happens to do have to do with
 permissions and in particular that they are denied.

I agree that doing this would be rather overkill.

 It's pretty clear that the current message is complaining about a
 permissions problem, so the fact that it doesn't specifically include
 the words permission and denied doesn't bother me.  Let me ask the
 question again: what information do you think is conveyed by your
 proposed rewrite that is not conveyed by the current message?

I do like that it explicitly states that the problem is with permissions
and not some lack of capability in the software, and I like the
consistency of messaging.  (To the point where I was suggesting somewhere
along the way that we update the error messaging documentation to be
explicit that the errmsg and the errcode should make sense and match up -
NOT that we should use some simple mapping from one to the other, as we'd
then be reducing the information provided.  I've seen a number of cases
where people have the SQL error code also returned in their psql session;
I've never been a fan of that, as I much prefer words over error codes,
but I wonder if they've done it because, in some cases, it *isn't* clear
what the errcode is from the errmsg..)

  We don't say 'You must have SELECT rights on relation X to select from
  it.' when you try to do a SELECT that you're not allowed to.  We throw a
  permissions denied error.  As pointed out up-thread, we do similar
  things for other cases of permissions denied and even beyond permission
  denied errors we tend to match up the errmsg() with the errcode(), which
  makes a great deal of sense to me and is why I'm advocating that
  position.
 
 I don't think that trying to match up the errmsg() with the errcode()
 is a useful exercise.  That's just restricting the ways that people
 can write error messages in a way that is otherwise unnecessary.  The
 errmsg() and errcode() have different purposes; the former is to
 provide a concise, human-readable description of the problem, and the
 latter is to group errors into categories so that related errors can
 be handled programmatically.  They need not be, and in many existing
 cases are not, even vaguely similar; and the fact that many of our
 permission denied errors happen to use that phrasing is not a
 sufficient justification for changing the rest unless the new phrasing
 is a bona fide improvement.

It provides a consistency of messaging that I feel is an improvement.
That they aren't related in a number of existing cases strikes me as an
opportunity to improve rather than cases where we really know better.
Perhaps in some of those cases the problem is that there isn't a good
errcode, and maybe we should actually add an error code instead of
changing the message.  Perhaps there are cases where we just know better
or it's a very generic errcode without any real meaning.  I agree that
we want the messaging to be good, I'd like it to also be consistent with
itself and with the errcode.

I've not gone and tried to do a deep look at the errmsg vs errcode, and
that's likely too much to bite off at once anyway, but I have gone and
looked at the role attribute uses and the other permissions denied
errcode cases and we're not at all consistent today and we change from
release-to-release depending on changes to the permissions system (see:
TRUNCATE).  I specifically don't like that.

If we want to say that role-attribute-related error messages should be
"You need X to do Y", then I'll go make those changes, removing the
'permission denied' where those are used and be happier that we're at
least consistent, but it'll be annoying if we ever make those
role-attributes be GRANT'able rights (GRANT .. ON CLUSTER?)  as those
errmsg's will then change from you need X to permission denied to do
Y.  Having the errdetail change bothers me a lot less.

How would you feel about changing the various usages of
have_createrole_privilege() 

Re: [HACKERS] Selectivity estimation for inet operators

2014-12-02 Thread Emre Hasegeli
 Actually, there's a second large problem with this patch: blindly
 iterating through all combinations of MCV and histogram entries makes the
 runtime O(N^2) in the statistics target.  I made up some test data (by
 scanning my mail logs) and observed the following planning times, as
 reported by EXPLAIN ANALYZE:

 explain analyze select * from relays r1, relays r2 where r1.ip = r2.ip;
 explain analyze select * from relays r1, relays r2 where r1.ip  r2.ip;

  stats target    eqjoinsel    networkjoinsel
 
  100             0.27 ms      1.85 ms
  1000            2.56 ms      167.2 ms
  10000           56.6 ms      13987.1 ms

 I don't think it's necessary for network selectivity to be quite as
 fast as eqjoinsel, but I doubt we can tolerate 14 seconds planning
 time for a query that might need just milliseconds to execute :-(

 It seemed to me that it might be possible to reduce the runtime by
 exploiting knowledge about the ordering of the histograms, but
 I don't have time to pursue that right now.

I made it break the loop when we have passed the last possible match.  Patch
attached.  It reduces the runtime by almost 50% with large histograms.

We can also make it use only some elements of the MCV and histogram
for join estimation.  I have experimented with reducing the right and
the left hand side of the lists on the previous versions.  I remember
it was better to reduce only the left hand side.  I think it would be
enough to use log(n) elements of the right hand side MCV and histogram.
I can make the change, if you think selectivity estimation function
is the right place for this optimization.
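
A minimal standalone sketch of that log(n) sampling idea (the names and the
Datum placeholder are invented here; this is not part of the attached patch):

#include <math.h>

typedef unsigned long Datum;	/* stand-in for the real histogram element type */

/*
 * Copy roughly log2(nvalues) evenly spaced entries of a sorted histogram
 * into sampled[], always keeping both endpoints, so that the nested
 * join-estimation loop becomes O(n log n) rather than O(n^2).
 */
static int
sample_histogram(const Datum *values, int nvalues,
				 Datum *sampled, int max_sampled)
{
	int			target;
	int			i,
				n = 0;

	if (nvalues <= 2)
		target = nvalues;
	else
		target = (int) log2((double) nvalues) + 1;
	if (target > max_sampled)
		target = max_sampled;

	if (target < 2 || target >= nvalues)
	{
		/* too few entries to thin out: just copy what fits */
		for (i = 0; i < nvalues && i < max_sampled; i++)
			sampled[n++] = values[i];
		return n;
	}

	for (i = 0; i < target; i++)
		sampled[n++] = values[(int) ((double) i * (nvalues - 1) / (target - 1))];
	return n;
}

Only one side of the join's statistics would be thinned this way, per the
observation above.
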
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index f854847..16f39db 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -612,20 +612,23 @@ inet_hist_value_sel(Datum *values, int nvalues, Datum constvalue,
 		return 0.0;
 
 	query = DatumGetInetPP(constvalue);
 
 	/* left is the left boundary value of the current bucket ... */
 	left = DatumGetInetPP(values[0]);
 	left_order = inet_inclusion_cmp(left, query, opr_codenum);
 
 	for (i = 1; i < nvalues; i++)
 	{
+		if (left_order == 256)
+			break;
+
 		/* ... and right is the right boundary value */
 		right = DatumGetInetPP(values[i]);
 		right_order = inet_inclusion_cmp(right, query, opr_codenum);
 
 		if (left_order == 0 && right_order == 0)
 		{
 			/* The whole bucket matches, since both endpoints do. */
 			match += 1.0;
 		}
 		else if ((left_order <= 0 && right_order >= 0) ||
@@ -854,20 +857,23 @@ inet_opr_codenum(Oid operator)
 static int
 inet_inclusion_cmp(inet *left, inet *right, int opr_codenum)
 {
 	if (ip_family(left) == ip_family(right))
 	{
 		int			order;
 
 		order = bitncmp(ip_addr(left), ip_addr(right),
 		Min(ip_bits(left), ip_bits(right)));
 
+		if (order > 0)
+			return 256;
+
 		if (order != 0)
 			return order;
 
 		return inet_masklen_inclusion_cmp(left, right, opr_codenum);
 	}
 
 	return ip_family(left) - ip_family(right);
 }
 
 /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 1:00 PM, Robert Haas robertmh...@gmail.com wrote:
 I'd prefer not to have a #define in pg_config_manual.h.  Only stuff
 that we expect a reasonably decent number of users to need to change
 should be in that file, and this is too marginal for that.  If anybody
 other than the developers of the feature is disabling this, the whole
 thing is going to be ripped back out.

I agree. The patch either works well in general and it goes in, or it
doesn't and should be rejected. That doesn't mean that the standard
applied is that regressions are absolutely unacceptable, but the
standard shouldn't be too far off that.  I feel pretty confident that
we'll be able to meet that standard, though, because database luminary
Jim Gray recommended this technique in about 1995. Even if the details
of what I have here could stand to be tweaked, there is no getting
around the fact that a good sort routine needs to strongly consider
locality. That was apparent even in 1995, but now it's a very major
trend.
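
To make the locality argument concrete, a generic sketch (this is not the
patch's SortSupport interface, and it just packs leading bytes rather than
strxfrm() output):

#include <string.h>

typedef struct SortItem
{
	unsigned long abbrev;		/* leading key bytes, packed big-endian */
	const char *full;			/* pointer to the full, out-of-line key */
} SortItem;

static unsigned long
abbreviate(const char *s)
{
	unsigned long v = 0;
	int			i;

	for (i = 0; i < (int) sizeof(v); i++)
	{
		v <<= 8;
		if (*s)
			v |= (unsigned char) *s++;
	}
	return v;
}

static int
compare_items(const SortItem *a, const SortItem *b)
{
	if (a->abbrev < b->abbrev)
		return -1;
	if (a->abbrev > b->abbrev)
		return 1;
	/* abbreviated keys tied: only now pay for the cache-missing pointer chase */
	return strcmp(a->full, b->full);
}

Most comparisons resolve on the abbrev field sitting in the dense array of
SortItems, so the sort mostly stays within cache; the pointer is chased only
on ties, which is where the entropy and correlation concerns below come in.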

Incidentally, I think that an under-appreciated possible source of
regressions here is abbreviation of attributes that have a strong
physical/logical correlation. I could see a small regression for one
such case even though my cost model indicated that it should be very
profitable. On the other hand, on other occasions my cost model (i.e.
considering how good a proxy abbreviated key cardinality is for full
key cardinality) was quite pessimistic. Although, at least it was only
a small regression, even though the correlation was something like
0.95. And at least the sort will be very fast in any case.

You'll recall that Heikki's test case involved correlation like that,
even though it was mostly intended to make a point about the entropy
in abbreviated keys. Correlation was actually the most important
factor there. I think it might be generally true that it's the most
important factor, in practice more important even than capturing
sufficient entropy in the abbreviated key representation.

 I'm not sure about that. I'd prefer to have tuplesort (and one or two
 other sites) set the abbreviation is possible in principle flag.
 Otherwise, sortsupport needs to assume that the leading attribute is
 going to be the abbreviation-applicable one, which might not always be
 true. Still, it's not as if I feel strongly about it.

 When wouldn't that be true?

It just feels a bit wrong to me. There might be a future in which we
want to use the datum1 field for a non-leading attribute. For example,
when it is known ahead of time that there are low cardinality integers
in the leading key/attribute. Granted, that's pretty speculative, but
then it's not as if I'm insisting that it must be done that way. I
defer to you.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-12-02 Thread Bruce Momjian
On Fri, Nov 14, 2014 at 02:51:32PM +1300, David Rowley wrote:
 Likely for most aggregates, like count, sum, max, min, bit_and and bit_or the
 merge function would be the same as the transition function, as the state type
 is just the same as the input type. It would only be aggregates like avg(),
 stddev*(), bool_and() and bool_or() that would need a new merge function
 made... These would be no more complex than the transition functions... Which
 are just a few lines of code anyway.
 
 We'd simply just not run parallel query if any aggregates used in the query
 didn't have a merge function.
 
 When I mentioned this, I didn't mean to appear to be placing a road block.
 I was just bringing to the table the information that COUNT(*) + COUNT(*)
 works ok for merging COUNT(*)'s sub totals, but AVG(n) + AVG(n) does not.

Sorry, late reply, but, FYI, I don't think our percentile functions
can be parallelized in the same way:

test=> \daS *percent*
                                               List of aggregate functions
   Schema   |      Name       |  Result data type  |             Argument data types              |             Description
------------+-----------------+--------------------+----------------------------------------------+-------------------------------------
 pg_catalog | percent_rank    | double precision   | VARIADIC "any" ORDER BY VARIADIC "any"       | fractional rank of hypothetical row
 pg_catalog | percentile_cont | double precision   | double precision ORDER BY double precision   | continuous distribution percentile
 pg_catalog | percentile_cont | double precision[] | double precision[] ORDER BY double precision | multiple continuous percentiles
 pg_catalog | percentile_cont | interval           | double precision ORDER BY interval           | continuous distribution percentile
 pg_catalog | percentile_cont | interval[]         | double precision[] ORDER BY interval         | multiple continuous percentiles
 pg_catalog | percentile_disc | anyelement         | double precision ORDER BY anyelement         | discrete percentile
 pg_catalog | percentile_disc | anyarray           | double precision[] ORDER BY anyelement       | multiple discrete percentiles

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2 recovery/startup problems

2014-12-02 Thread Robert Haas
On Tue, Dec 2, 2014 at 11:54 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 During abort processing after getting a SIGTERM, the back end truncates
 59288 to zero size, and unlinks all the other files (including 59288_init).
 The actual removal of 59288 is left until the checkpoint.  So if you SIGTERM
 the backend, then take down the server uncleanly before the next checkpoint
 completes, you are left with just 59288.

 Here is the strace:

 open("base/16416/59288", O_RDWR)        = 8
 ftruncate(8, 0)                         = 0
 close(8)                                = 0
 unlink("base/16416/59288.1")            = -1 ENOENT (No such file or directory)
 unlink("base/16416/59288_fsm")          = -1 ENOENT (No such file or directory)
 unlink("base/16416/59288_vm")           = -1 ENOENT (No such file or directory)
 unlink("base/16416/59288_init")         = 0
 unlink("base/16416/59288_init.1")       = -1 ENOENT (No such file or directory)

Hmm, that's not good.

I guess we can either adopt your suggestion of adjusting
ResetUnloggedRelationsInDbspaceDir() to cope with the possibility that
the situation has changed during recovery, or else figure out how to
be more stringent about the order in which forks get removed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Robert Haas
On Tue, Dec 2, 2014 at 4:21 PM, Peter Geoghegan p...@heroku.com wrote:
 I'm not sure about that. I'd prefer to have tuplesort (and one or two
 other sites) set the abbreviation is possible in principle flag.
 Otherwise, sortsupport needs to assume that the leading attribute is
 going to be the abbreviation-applicable one, which might not always be
 true. Still, it's not as if I feel strongly about it.

 When wouldn't that be true?

 It just feels a bit wrong to me. There might be a future in which we
 want to use the datum1 field for a non-leading attribute. For example,
 when it is known ahead of time that there are low cardinality integers
 in the leading key/attribute. Granted, that's pretty speculative, but
 then it's not as if I'm insisting that it must be done that way. I
 defer to you.

Well, maybe you should make the updates we've agreed on and I can take
another look at it.  But I didn't think that I was proposing to change
anything about the level at which the decision about whether to
abbreviate or not was made; rather, I thought I was suggesting that we
pass that flag down to the code that initializes the sortsupport
object as an argument rather than through the sortsupport structure
itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-02 Thread Alex Shulgin

Andres Freund and...@2ndquadrant.com writes:

 On 2014-12-02 17:25:14 +0300, Alex Shulgin wrote:
 I'd be in favor of a solution that works the same way as before the
 patch, without the need for extra trigger files, etc., but that doesn't
 seem to be nearly possible.  Whatever tricks we might employ will likely
 be defeated by the fact that the oldschool user will fail to *include*
 recovery.conf in the main conf file.

 I think removing the ability to define a trigger file is pretty much
 unacceptable. It's not *too* bad to rewrite recovery.conf's contents
 into actual valid postgresql.conf format, but it's harder to change an
 existing complex failover setup that relies on the existance of such a
 trigger.  I think aiming for removal of that is a sure way to prevent
 the patch from getting in.

To make it clear, I was talking not about trigger_file, but about
standby.enabled file which triggers the recovery/pitr/standby scenario
in the current form of the patch and stands as a replacement check
instead of the original check for the presence of recovery.conf.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote:
 Well, maybe you should make the updates we've agreed on and I can take
 another look at it.

Agreed.

 But I didn't think that I was proposing to change
 anything about the level at which the decision about whether to
 abbreviate or not was made; rather, I thought I was suggesting that we
 pass that flag down to the code that initializes the sortsupport
 object as an argument rather than through the sortsupport structure
 itself.

The flag I'm talking about concerns the *applicability* of
abbreviation, and not whether or not it will actually be used (maybe
the opclass lacks support, or decides not to for some platform
specific reason). Tuplesort has a contract with abbreviation +
sortsupport that considers whether or not the function pointer used to
abbreviate is set, which relates to whether or not abbreviation will
*actually* be used. Note that for non-abbreviation-applicable
attributes, btsortsupport_worker() never sets the function pointer
(nor, incidentally, does it set the other abbreviation related
function pointers in the struct).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Robert Haas
On Tue, Dec 2, 2014 at 5:16 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote:
 Well, maybe you should make the updates we've agreed on and I can take
 another look at it.

 Agreed.

 But I didn't think that I was proposing to change
 anything about the level at which the decision about whether to
 abbreviate or not was made; rather, I thought I was suggesting that we
 pass that flag down to the code that initializes the sortsupport
 object as an argument rather than through the sortsupport structure
 itself.

 The flag I'm talking about concerns the *applicability* of
 abbreviation, and not whether or not it will actually be used (maybe
 the opclass lacks support, or decides not to for some platform
 specific reason). Tuplesort has a contract with abbreviation +
 sortsupport that considers whether or not the function pointer used to
 abbreviate is set, which relates to whether or not abbreviation will
 *actually* be used. Note that for non-abbreviation-applicable
 attributes, btsortsupport_worker() never sets the function pointer
 (nor, incidentally, does it set the other abbreviation related
 function pointers in the struct).

Right, and what I'm saying is that maybe the applicability flag
shouldn't be stored in the SortSupport object, but passed down as an
argument.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas robertmh...@gmail.com wrote:
 Right, and what I'm saying is that maybe the applicability flag
 shouldn't be stored in the SortSupport object, but passed down as an
 argument.

But then how does that information get to any given sortsupport
routine? That's the place that really needs to know if abbreviation is
useful. In general, they're only passed a SortSupport object. Are you
suggesting revising the signature required of SortSupport routines to
add that extra flag as an additional argument?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas robertmh...@gmail.com wrote:
 Right, and what I'm saying is that maybe the applicability flag
 shouldn't be stored in the SortSupport object, but passed down as an
 argument.

 But then how does that information get to any given sortsupport
 routine? That's the place that really needs to know if abbreviation is
 useful. In general, they're only passed a SortSupport object. Are you
 suggesting revising the signature required of SortSupport routines to
 add that extra flag as an additional argument?

I think that is what he's suggesting, and I too am wondering why it's
a good idea.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Nitpicky doc corrections for BRIN functions of pageinspect

2014-12-02 Thread Michael Paquier
On Wed, Dec 3, 2014 at 12:23 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Peter Geoghegan wrote:
 On Wed, Nov 19, 2014 at 8:02 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Just a small thing I noticed while looking at pageinspect.sgml, the
  set of SQL examples related to BRIN indexes uses lower-case characters
  for reserved keywords. This has been introduced by 7516f52.
  Patch is attached.

 My nitpicky observation about pageinspect + BRIN was that the comments
 added to the pageinspect SQL file for the SQL-callable function
 brin_revmap_data() have a comment header style slightly inconsistent
 with the existing items.

 Pushed.
Thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-02 Thread Michael Paquier
On Wed, Dec 3, 2014 at 2:17 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Nov 26, 2014 at 11:00 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 On Wed, Nov 26, 2014 at 8:27 PM, Syed, Rahila rahila.s...@nttdata.com 
 wrote:
 Don't we need to initialize doPageCompression  similar to doPageWrites in 
 InitXLOGAccess?
 Yep, you're right. I missed this code path.

 Also , in the earlier patches compression was set 'on' even when fpw GUC is 
 'off'. This was to facilitate compression of FPW which are forcibly written 
 even when fpw GUC is turned off.
  doPageCompression in this patch is set to true only if value of fpw GUC is 
 'compress'. I think it's better to compress forcibly written full page 
 writes.
 Meh? (stealing a famous quote).
 This is backward-incompatible in the fact that forcibly-written FPWs
 would be compressed all the time, even if FPW is set to off. The
 documentation of the previous patches also mentioned that images are
 compressed only if this parameter value is switched to compress.

 If we have a separate GUC to determine whether to do compression of
 full page writes, then it seems like that parameter ought to apply
 regardless of WHY we are doing full page writes, which might be either
 that full_pages_writes=on in general, or that we've temporarily turned
 them on for the duration of a full backup.

In the latest versions of the patch, control of compression is done
within full_page_writes by assigning a new value 'compress'. Something
that I am scared of is that if we enforce compression when
full_page_writes is off for forcibly-written pages and if a bug shows
up in the compression/decompression algorithm at some point (that's
unlikely to happen as this has been used for years with toast but
let's say if), we may corrupt a lot of backups. Hence: why not simply
have a new GUC parameter to fully control it? First versions of the
patch did that, and ISTM that it is better than enforcing the use of a
new feature on our user base.

Now, something that has not been mentioned on this thread is to make
compression the default behavior in all cases, so that we wouldn't even
need a GUC parameter. We are usually conservative about changing
default behaviors so I don't really think that's the way to go. Just
mentioning the possibility.

Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?

2014-12-02 Thread Jim Nasby

On 12/2/14, 2:22 PM, Jeff Janes wrote:

Or maybe I overestimate how hard it would be to make vacuum restartable.  You 
would have to save a massive amount of state (up to a maintenance_work_mem tid 
list, plus the block you left off at in both the table and all of the indexes in 
that table), and you would somehow have to validate that saved state against any 
changes that might have occurred to the table or the indexes while it was saved 
and you were not holding the lock, which seems like it would be almost as full of 
corner cases as weakening the lock in the first place.  Aren't they logically 
the same thing?  If we could drop the lock and take it up again later, maybe 
the answer is not to save the state, but just to pause the vacuum until the 
lock becomes free again, in effect saving the state in situ.  That would allow 
the autovac worker to be held hostage by anyone taking a lock, though.


Yeah, rather than messing with any of that, I think it would make a lot more 
sense to split vacuum into smaller operations that don't require such a huge 
chunk of time.


The only easy way to do it that I see is to have it only stop at the end of an 
index-cleaning cycle, which probably takes too long to block for.  Or record a 
restart point at the end of each index-cleaning cycle, and then when it yields 
the lock it abandons all work since the last cycle end, rather than since the 
beginning.  That would be better than what we have, but seems like a far cry 
from actual restarting from any point.


Now that's not a bad idea. This would basically mean just saving a block number 
in pg_class after every intermediate index clean and then setting that back to 
zero when we're done with that relation, right?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-12-02 Thread Kouhei Kaigai
 -Original Message-
 From: Simon Riggs [mailto:si...@2ndquadrant.com]
 Sent: Thursday, November 27, 2014 8:48 PM
 To: Kaigai Kouhei(海外 浩平)
 Cc: Robert Haas; Thom Brown; Kohei KaiGai; Tom Lane; Alvaro Herrera; Shigeru
 Hanada; Stephen Frost; Andres Freund; PgHacker; Jim Mlodgenski; Peter
 Eisentraut
 Subject: Re: [HACKERS] [v9.5] Custom Plan API
 
 On 27 November 2014 at 10:33, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 
  The reason why the documentation portion was not yet committed is, sorry,
  the quality of the documentation from the standpoint of a native
  English speaker.
  Now, I'm writing up the documentation according to the latest code
  base; please wait for several days and help to improve it.
 
 Happy to help with that.
 
 Please post to the Wiki first so we can edit it communally.
 
Simon, 

I tried to describe how custom-scan provider interact with the core backend,
and expectations to individual callbacks here.

  https://wiki.postgresql.org/wiki/CustomScanInterface

I'd like to see which kind of description should be added, from a third
person's viewpoint.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Many processes blocked at ProcArrayLock

2014-12-02 Thread Michael Paquier
On Tue, Dec 2, 2014 at 5:07 PM, Xiaoyulei xiaoyu...@huawei.com wrote:
 Test configuration:
 Hardware:
 4P intel server, 60 core 120 hard thread.
 Memory:512G
 SSD:2.4T

 PG:
 max_connections = 160   # (change requires restart)
 shared_buffers = 32GB
 work_mem = 128MB
 maintenance_work_mem = 32MB
 bgwriter_delay = 100ms  # 10-1ms between rounds
 bgwriter_lru_maxpages = 200 # 0-1000 max buffers written/round
 bgwriter_lru_multiplier = 2.0   # 0-10.0 multipler on buffers 
 scanned/round
 wal_level = minimal # minimal, archive, or hot_standby
 wal_buffers = 256MB # min 32kB, -1 sets based on 
 shared_buffers
 autovacuum = off
 checkpoint_timeout=60min
 checkpoint_segments = 1000
 archive_mode = off
 synchronous_commit = off
 fsync = off
 full_page_writes = off


 We use tpcc and pgbench to test postgresql 9.4beat2 performance. And we found 
 the tps/tpmc could not increase with the terminal increase. The detail 
 information is in attachment.

 Many processes is blocked, I dump the call stack, and found these processes 
 is blocked at: ProcArrayLock. 60% processes is blocked in 
 ProcArrayEndTransaction with ProcArrayLock EXCLUSIVE, 20% is in 
 GetSnapshotData with ProcArrayLock SHARED. Others locks like XLogFlush and 
 WALInsertLock are not very heavy.

 Is there any way we solve this problem?
Providing complete backtraces showing in which code paths those
processes are blocked would help in better understanding what may be
going on.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 2:16 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote:
 Well, maybe you should make the updates we've agreed on and I can take
 another look at it.

 Agreed.

Attached, revised patchset makes these updates. I continue to use the
sortsupport struct to convey that a given attribute of a given sort is
abbreviation-applicable (although the field is now a bool, not an
enum).

-- 
Peter Geoghegan
From 865a1abeb6bfb601b1ec605afb1e339c0e444e10 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan p...@heroku.com
Date: Sun, 9 Nov 2014 14:38:44 -0800
Subject: [PATCH 2/2] Estimate total number of rows to be sorted

Sortsupport opclasses now accept a row hint, indicating the estimated
number of rows to be sorted.  This gives opclasses a sense of proportion
about how far along the copying of tuples is when considering aborting
abbreviation.

Estimates come from various sources.  The text opclass now always avoids
aborting abbreviation if the total number of rows to be sorted is high
enough, without considering cardinality at all.
---
 src/backend/access/nbtree/nbtree.c |  5 ++-
 src/backend/access/nbtree/nbtsort.c| 14 +-
 src/backend/commands/cluster.c |  4 +-
 src/backend/executor/nodeAgg.c |  5 ++-
 src/backend/executor/nodeSort.c|  1 +
 src/backend/utils/adt/orderedsetaggs.c |  2 +-
 src/backend/utils/adt/varlena.c| 80 --
 src/backend/utils/sort/tuplesort.c | 14 --
 src/include/access/nbtree.h|  2 +-
 src/include/utils/sortsupport.h|  7 ++-
 src/include/utils/tuplesort.h  |  6 +--
 11 files changed, 121 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index d881525..d26c60b 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -109,14 +109,15 @@ btbuild(PG_FUNCTION_ARGS)
 		elog(ERROR, index \%s\ already contains data,
 			 RelationGetRelationName(index));
 
-	buildstate.spool = _bt_spoolinit(heap, index, indexInfo->ii_Unique, false);
+	buildstate.spool = _bt_spoolinit(heap, index, indexInfo->ii_Unique,
+	 indexInfo->ii_Predicate != NIL, false);
 
 	/*
 	 * If building a unique index, put dead tuples in a second spool to keep
 	 * them out of the uniqueness check.
 	 */
 	if (indexInfo->ii_Unique)
-		buildstate.spool2 = _bt_spoolinit(heap, index, false, true);
+		buildstate.spool2 = _bt_spoolinit(heap, index, false, true, true);
 
 	/* do the heap scan */
 	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 593571b..473ac54 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -73,6 +73,7 @@
 #include "storage/smgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/rel.h"
+#include "utils/selfuncs.h"
 #include "utils/sortsupport.h"
 #include "utils/tuplesort.h"
 
@@ -149,10 +150,13 @@ static void _bt_load(BTWriteState *wstate,
  * create and initialize a spool structure
  */
 BTSpool *
-_bt_spoolinit(Relation heap, Relation index, bool isunique, bool isdead)
+_bt_spoolinit(Relation heap, Relation index, bool isunique, bool ispartial,
+			  bool isdead)
 {
 	BTSpool    *btspool = (BTSpool *) palloc0(sizeof(BTSpool));
 	int			btKbytes;
+	double		estRows;
+	float4		relTuples;
 
 	btspool->heap = heap;
 	btspool->index = index;
@@ -165,10 +169,16 @@ _bt_spoolinit(Relation heap, Relation index, bool isunique, bool isdead)
 	 * unique index actually requires two BTSpool objects.  We expect that the
 	 * second one (for dead tuples) won't get very full, so we give it only
 	 * work_mem.
+	 *
+	 * Certain cases will always have a relTuples of 0, such as reindexing as
+	 * part of a CLUSTER operation, or when reindexing toast tables.  This is
+	 * interpreted as no estimate available.
 	 */
 	btKbytes = isdead ? work_mem : maintenance_work_mem;
+	relTuples = RelationGetForm(heap)->reltuples;
+	estRows =  relTuples * (isdead || ispartial ?  DEFAULT_INEQ_SEL : 1);
 	btspool->sortstate = tuplesort_begin_index_btree(heap, index, isunique,
-	 btKbytes, false);
+	 btKbytes, estRows, false);
 
 	return btspool;
 }
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index bc5f33f..8e5f536 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -890,7 +890,9 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	/* Set up sorting if wanted */
 	if (use_sort)
 		tuplesort = tuplesort_begin_cluster(oldTupDesc, OldIndex,
-			maintenance_work_mem, false);
+			maintenance_work_mem,
+			RelationGetForm(OldHeap)->reltuples,
+			false);
 	else
 		tuplesort = NULL;
 
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 89de755..95143c3 100644
--- 

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 5:28 PM, Peter Geoghegan p...@heroku.com wrote:
 Attached, revised patchset makes these updates.

Whoops. Missed some obsolete comments. Here is a third commit that
makes a further small modification to one comment.

-- 
Peter Geoghegan
From 8d1aba80f95e05742047cba5bd83d8f17aa5ef37 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan p...@heroku.com
Date: Tue, 2 Dec 2014 17:42:21 -0800
Subject: [PATCH 3/3] Alter comments to reflect current naming

---
 src/include/utils/sortsupport.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/include/utils/sortsupport.h b/src/include/utils/sortsupport.h
index 659233b..f7c73b3 100644
--- a/src/include/utils/sortsupport.h
+++ b/src/include/utils/sortsupport.h
@@ -127,9 +127,9 @@ typedef struct SortSupportData
 	 * Returning zero from the alternative comparator does not indicate
 	 * equality, as with a conventional support routine 1, though -- it
 	 * indicates that it wasn't possible to determine how the two abbreviated
-	 * values compared.  A proper comparison, using auth_comparator/
-	 * ApplySortComparatorFull() is therefore required.  In many cases this
-	 * results in most or all comparisons only using the cheap alternative
+	 * values compared.  A proper comparison, using abbrev_full_comparator/
+	 * ApplySortAbbrevFullComparator() is therefore required.  In many cases
+	 * this results in most or all comparisons only using the cheap alternative
 	 * comparison func, which is typically implemented as code that compiles to
 	 * just a few CPU instructions.  CPU cache miss penalties are expensive; to
 	 * get good overall performance, sort infrastructure must heavily weigh
-- 
1.9.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] About xmllint checking for the validity of postgres.xml in 9.5

2014-12-02 Thread Michael Paquier
Hi all,

Since commit 5d93ce2d, the output of xmllint is checked by passing
--valid to it. Isn't that a regression from what we were doing for
previous versions? For example, with 9.4 and older versions it is
possible to compile man pages even if the xml spec is not entirely
valid when using docbook 4.2.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About xmllint checking for the validity of postgres.xml in 9.5

2014-12-02 Thread Michael Paquier
On Wed, Dec 3, 2014 at 12:09 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Since commit 5d93ce2d, the output of xmllint is checked by passing
 --valid to it. Isn't that a regression from what we were doing for
 previous versions? For example, with 9.4 and older versions it is
 possible to compile man pages even if the xml spec is not entirely
 valid when using docbook 4.2.

Another thing coming to my mind is why don't we simply have a variable
to pass flags to xmllint similarly to xsltproc? Packagers would then
be free to pass the arguments they want. (Note that in some of the
environments where I build the docs postgres.xml is found as invalid,
making build fail for master only, not for older branches).

In any case, attached is a patch showing the idea, bringing more
flexibility in the build, default value being --valid --noout if the
flag is not passed by the caller.
Regards,
-- 
Michael
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 8bdd26c..99ce106 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -48,6 +48,10 @@ ifndef XMLLINT
 XMLLINT = $(missing) xmllint
 endif
 
+ifndef XMLLINTCFLAGS
+XMLLINTCFLAGS = --valid --noout
+endif
+
 ifndef XSLTPROC
 XSLTPROC = $(missing) xsltproc
 endif
@@ -82,7 +86,7 @@ override SPFLAGS += -wall -wno-unused-param -wno-empty -wfully-tagged
 man distprep-man: man-stamp
 
 man-stamp: stylesheet-man.xsl postgres.xml
-	$(XMLLINT) --noout --valid postgres.xml
+	$(XMLLINT) $(XMLLINTCFLAGS) postgres.xml
 	$(XSLTPROC) $(XSLTPROCFLAGS) $(XSLTPROC_MAN_FLAGS) $^
 	touch $@
 
@@ -259,13 +263,13 @@ endif
 xslthtml: xslthtml-stamp
 
 xslthtml-stamp: stylesheet.xsl postgres.xml
-	$(XMLLINT) --noout --valid postgres.xml
+	$(XMLLINT) $(XMLLINTCFLAGS) postgres.xml
 	$(XSLTPROC) $(XSLTPROCFLAGS) $(XSLTPROC_HTML_FLAGS) $^
 	cp $(srcdir)/stylesheet.css html/
 	touch $@
 
 htmlhelp: stylesheet-hh.xsl postgres.xml
-	$(XMLLINT) --noout --valid postgres.xml
+	$(XMLLINT) $(XMLLINTCFLAGS) postgres.xml
 	$(XSLTPROC) $(XSLTPROCFLAGS) $^
 
 %-A4.fo.tmp: stylesheet-fo.xsl %.xml
@@ -287,7 +291,7 @@ FOP = fop
 
 epub: postgres.epub
 postgres.epub: postgres.xml
-	$(XMLLINT) --noout --valid $<
+	$(XMLLINT) $(XMLLINTCFLAGS) $<
 	$(DBTOEPUB) $<
 
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-02 Thread Robert Haas
On Tue, Dec 2, 2014 at 7:16 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 In the latest versions of the patch, control of compression is done
 within full_page_writes by assigning a new value 'compress'. Something
 that I am scared of is that if we enforce compression when
 full_page_writes is off for forcibly-written pages and if a bug shows
 up in the compression/decompression algorithm at some point (that's
 unlikely to happen as this has been used for years with toast but
 let's say if), we may corrupt a lot of backups. Hence why not simply
 having a new GUC parameter to fully control it. First versions of the
 patch did that but ISTM that it is better than enforcing the use of a
 new feature for our user base.

That's a very valid concern.  But maybe it shows that
full_page_writes=compress is not the Right Way To Do It, because then
there's no way for the user to choose the behavior they want when
full_page_writes=off but yet a backup is in progress.  If we had a
separate GUC, we could know the user's actual intention, instead of
guessing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-02 Thread Michael Paquier
On Wed, Dec 3, 2014 at 12:35 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Dec 2, 2014 at 7:16 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 In the latest versions of the patch, control of compression is done
 within full_page_writes by assigning a new value 'compress'. Something
 that I am scared of is that if we enforce compression when
 full_page_writes is off for forcibly-written pages and if a bug shows
 up in the compression/decompression algorithm at some point (that's
 unlikely to happen as this has been used for years with toast but
 let's say if), we may corrupt a lot of backups. Hence why not simply
 having a new GUC parameter to fully control it. First versions of the
 patch did that but ISTM that it is better than enforcing the use of a
 new feature for our user base.

 That's a very valid concern.  But maybe it shows that
 full_page_writes=compress is not the Right Way To Do It, because then
 there's no way for the user to choose the behavior they want when
 full_page_writes=off but yet a backup is in progress.  If we had a
 separate GUC, we could know the user's actual intention, instead of
 guessing.
Note that implementing a separate parameter for this patch would not
be very complicated if the core portion does not change much. What
about the long name full_page_compression or the longer name
full_page_writes_compression?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On partitioning

2014-12-02 Thread Amit Langote

Hi Robert,

From: Robert Haas [mailto:robertmh...@gmail.com]
  * Catalog schema:
 
  CREATE TABLE pg_catalog.pg_partitioned_rel
  (
     partrelid    oid         NOT NULL,
     partkind     oid         NOT NULL,
     partissub    bool        NOT NULL,
     partkey      int2vector  NOT NULL, -- partitioning attributes
     partopclass  oidvector,

     PRIMARY KEY (partrelid, partissub),
     FOREIGN KEY (partrelid)   REFERENCES pg_class (oid),
     FOREIGN KEY (partopclass) REFERENCES pg_opclass (oid)
  )
  WITHOUT OIDS ;
 
 So, we're going to support exactly two levels of partitioning?
 partitions with partissub=false and subpartitions with partissub=true?
  Why not support only one level of partitioning here but then let the
 children have their own pg_partitioned_rel entries if they are
 subpartitioned?  That seems like a cleaner design and lets us support
 an arbitrary number of partitioning levels if we ever need them.
 

Yeah, that's what I thought at some point in favour of dropping partissub 
altogether. 

However, not that this design solves it, but there is one question - would we 
want to support defining both a partition key and a sub-partition key for a 
table in advance?  That is, without having defined a first-level partition yet; 
in that case, what level do we associate the sub-(sub-)partitioning key with, 
or, more to the point, where do we keep it?  One way is to replace partissub by 
partkeylevel, with level 0 being the topmost-level partitioning key and so on, 
while keeping partrelid equal to the pg_class.oid of the parent.  That brings 
us to the next question of managing hierarchies in pg_partition_def 
corresponding to partkeylevel in the definition of the topmost partitioned 
relation.  But I guess those are implementation details rather than 
representational, unless I am being too naïve.

  CREATE TABLE pg_catalog.pg_partition_def
  (
     partitionid          oid   NOT NULL,
     partitionparentrel   oid   NOT NULL,
     partitionisoverflow  bool  NOT NULL,
     partitionvalues      anyarray,

     PRIMARY KEY (partitionid),
     FOREIGN KEY (partitionid) REFERENCES pg_class(oid)
  )
  WITHOUT OIDS;
 
  ALTER TABLE pg_catalog.pg_class ADD COLUMN relispartitioned;
 
 What is an overflow partition and why do we want that?
 

That would be a default partition. That is, where the tuples that don't belong 
elsewhere (other defined partitions) go. VALUES clause of the definition for 
such a partition would look like:

(a range partition) ... VALUES LESS THAN MAXVALUE 
(a list partition) ... VALUES DEFAULT

There has been discussion about whether there should be such a place for 
tuples to go at all. That is, whether it should generate an error if a tuple 
can't go anywhere (or support auto-creating a new one, like in interval 
partitioning?)

 What are you going to do if the partitioning key has two columns of
 different data types?
 

Sorry, this totally eluded me. Perhaps, the 'values' needs some more thought. 
They are one of the most crucial elements of the scheme.

I wonder if your suggestion of pg_node_tree plays well here. This then could be 
a list of CONSTs or some such... And I am thinking it's a concern only for 
range partitions, no? (that is, a multicolumn partition key)

I think partkind switches the interpretation of the field as appropriate. Am I 
missing something? By the way, I had mentioned we could have two values fields 
each for range and list partition kind.

  * DDL syntax (no multi-column partitioning, sub-partitioning support as 
  yet):
 
  -- create partitioned table and child partitions at once.
  CREATE TABLE parent (...)
  PARTITION BY [ RANGE | LIST ] (key_column) [ opclass ]
  [ (
   PARTITION child
 {
 VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
   | VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST
 }
 [ WITH ( ... ) ] [ TABLESPACE tbs ]
   [, ...]
) ] ;
 
 How are you going to dump and restore this, bearing in mind that you
 have to preserve a bunch of OIDs across pg_upgrade?  What if somebody
 wants to do pg_dump --table name_of_a_partition?
 

Assuming everything (including the partitioned relation and partitions at all 
levels) has got a pg_class entry of its own, would OIDs be a problem? Or what 
would be the nature of the problem, if indeed there is one?

If someone pg_dump's an individual partition as a table, we could let it be 
dumped as just a plain table. I am thinking we should be able to do that or 
should be doing just that (?)

 I actually think it will be much cleaner to declare the parent first
 and then have separate CREATE TABLE statements that glue the children
 in, like CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1,
 1).
 

Oh, do you mean to do away with any syntax for defining partitions in 
CREATE TABLE parent?

By the way, do you mean the following:

CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1, 1)


Re: [HACKERS] using Core Foundation locale functions

2014-12-02 Thread Noah Misch
On Fri, Nov 28, 2014 at 11:43:28AM -0500, Peter Eisentraut wrote:
 In light of the recent discussions about using ICU on OS X, I looked
 into the Core Foundation locale functions (Core Foundation = traditional
 Mac API in OS X, as opposed to the Unix/POSIX APIs).
 
 Attached is a proof of concept patch that just about works for the
 sorting aspects.  (The ctype aspects aren't there yet and will crash,
 but they could be done similarly.)  It passes an appropriately adjusted
 collate.linux.utf8 test, meaning that it does produce language-aware
 sort orders that are equivalent to what glibc produces.
 
 At the moment, this is probably just an experiment that shows where
 refactoring and better abstractions might be suitable if we want to
 support multiple locale libraries.  If we want to pursue ICU, I think
 this could be a useful third option.

Does this make the backend multi-threaded?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] using Core Foundation locale functions

2014-12-02 Thread Craig Ringer
On 12/02/2014 12:52 AM, David E. Wheeler wrote:
 Gotta say, I’m thrilled to see movement on this front, and especially pleased 
 to see how consensus seems to be building around an abstracted interface to 
 keep options open. This platform-specific example really highlights the need 
 for it (I had no idea that there was separate and more up-to-date collation 
 support in Core Foundation than in the UNIX layer of OS X).

It'd also potentially let us make use of Windows' native locale APIs,
which AFAIK receive considerably more love on that platform than their
POSIX back-country cousins.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2014-12-02 Thread Kouhei Kaigai
 On Tue, Nov 25, 2014 at 3:44 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
  Today, I had a talk with Hanada-san to clarify which can be a common
  portion of them and how to implement it. Then, we concluded both of
  features can be shared most of the infrastructure.
  Let me put an introduction of join replacement by foreign-/custom-scan
 below.
 
  Its overall design intends to inject foreign-/custom-scan node instead
  of the built-in join logic (based on the estimated cost). From the
  viewpoint of core backend, it looks like a sub-query scan that
  contains relations join internally.
 
  What we need to do is below:
 
  (1) Add a hook add_paths_to_joinrel()
  It gives extensions (including FDW drivers and custom-scan providers)
  chance to add alternative paths towards a particular join of
  relations, using ForeignScanPath or CustomScanPath, if it can run instead
 of the built-in ones.
 
  (2) Informs the core backend varno/varattno mapping One thing we need
  to pay attention is, foreign-/custom-scan node that performs instead
  of the built-in join node must return mixture of values come from both
  relations. In case when FDW driver fetch a remote record (also, fetch
  a record computed by external computing resource), the most reasonable
  way is to store it on ecxt_scantuple of ExprContext, then kicks
  projection with varnode that references this slot.
  It needs an infrastructure that tracks relationship between original
  varnode and the alternative varno/varattno. We thought, it shall be
  mapped to INDEX_VAR and a virtual attribute number to reference
  ecxt_scantuple naturally, and this infrastructure is quite helpful for
 both of ForegnScan/CustomScan.
  We'd like to add List *fdw_varmap/*custom_varmap variable to both of plan
 nodes.
  It contains list of the original Var node that shall be mapped on the
  position according to the list index. (e.g, the first varnode is
  varno=INDEX_VAR and
  varattno=1)
 
  (3) Reverse mapping on EXPLAIN
  For EXPLAIN support, above varnode on the pseudo relation scan needed
  to be solved. All we need to do is initialization of dpns-inner_tlist
  on
  set_deparse_planstate() according to the above mapping.
 
  (4) case of scanrelid == 0
  To skip open/close (foreign) tables, we need to have a mark to
  introduce the backend not to initialize the scan node according to
  table definition, but according to the pseudo varnodes list.
  As earlier custom-scan patch doing, scanrelid == 0 is a
  straightforward mark to show the scan node is not combined with a
 particular real relation.
  So, it also need to add special case handling around foreign-/custom-scan
 code.
 
  We expect above changes are enough small to implement basic join
  push-down functionality (that does not involves external computing of
  complicated expression node), but valuable to support in v9.5.
 
  Please comment on the proposition above.
 
 I don't really have any technical comments on this design right at the moment,
 but I think it's an important area where PostgreSQL needs to make some
 progress sooner rather than later, so I hope that we can get something
 committed in time for 9.5.
 
I tried to implement the interface portion, as attached.
Hanada-san may be developing postgres_fdw based on this interface
definition towards the next commit fest.

The overall design of this patch is identical to what I described above.
It intends to allow extensions (FDW drivers or custom-scan providers) to
replace a join by a foreign/custom-scan which internally returns the result
set of a join of relations computed externally. It looks like a relation scan
on a pseudo relation.

One thing we need to pay attention to is how setrefs.c fixes up varno/varattno,
unlike for a regular join structure. I found that IndexOnlyScan already has
similar infrastructure that redirects references of a varnode to a certain
column on ecxt_scantuple of ExprContext, using a pair of INDEX_VAR and an
alternative varattno.

This patch puts a new field, fdw_ps_tlist, on ForeignScan, and
custom_ps_tlist on CustomScan. It is the extension's role to set a pseudo-
scan target list (hence, ps_tlist) for the foreign/custom-scan that replaced
a join.
If it is not NIL, set_plan_refs() takes another strategy to fix them up.
It calls fix_upper_expr() to map varnodes of the expression list onto INDEX_VAR
according to the ps_tlist; the extension is then expected to put values/isnull
pairs on the scan state's ss_ScanTupleSlot according to the ps_tlist
constructed beforehand.

Regarding the primary hook to add an alternative foreign/custom-scan
path instead of the built-in join paths, I added the following hook in
add_paths_to_joinrel().

  /* Hook for plugins to get control in add_paths_to_joinrel() */
  typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
   RelOptInfo *joinrel,
   RelOptInfo *outerrel,
   RelOptInfo *innerrel,

Re: [HACKERS] using Core Foundation locale functions

2014-12-02 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 10:07 PM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 12/02/2014 12:52 AM, David E. Wheeler wrote:
 Gotta say, I’m thrilled to see movement on this front, and especially 
 pleased to see how consensus seems to be building around an abstracted 
 interface to keep options open. This platform-specific example really 
 highlights the need for it (I had no idea that there was separate and more 
 up-to-date collation support in Core Foundation than in the UNIX layer of OS 
 X).

 It'd also potentially let us make use of Windows' native locale APIs,
 which AFAIK receive considerably more love on that platform than their
 POSIX back-country cousins.

Not to mention the fact that a MultiByteToWideChar() call could be
saved, and sortsupport for text would just work on Windows.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-12-02 Thread Kouhei Kaigai
 On Fri, Nov 14, 2014 at 02:51:32PM +1300, David Rowley wrote:
  Likely for most aggregates, like count, sum, max, min, bit_and and
  bit_or the merge function would be the same as the transition
  function, as the state type is just the same as the input type. It
  would only be aggregates like avg(), stddev*(), bool_and() and
  bool_or() that would need a new merge function made... These would be
  no more complex than the transition functions... Which are just a few
 lines of code anyway.
 
  We'd simply just not run parallel query if any aggregates used in the
  query didn't have a merge function.
 
  When I mentioned this, I didn't mean to appear to be placing a road
 block. I was just bringing to the table the information that COUNT(*) +
  COUNT(*) works ok for merging COUNT(*)'s sub totals, but AVG(n) + AVG(n)
 does not.
 
 Sorry, late reply, but, FYI, I don't think our percentile functions can
 be parallelized in the same way:
 
   test=> \daS *percent*
                                                List of aggregate functions
    Schema   |      Name       |  Result data type  |             Argument data types              |             Description
  -----------+-----------------+--------------------+----------------------------------------------+-------------------------------------
  pg_catalog | percent_rank    | double precision   | VARIADIC "any" ORDER BY VARIADIC "any"       | fractional rank of hypothetical row
  pg_catalog | percentile_cont | double precision   | double precision ORDER BY double precision   | continuous distribution percentile
  pg_catalog | percentile_cont | double precision[] | double precision[] ORDER BY double precision | multiple continuous percentiles
  pg_catalog | percentile_cont | interval           | double precision ORDER BY interval           | continuous distribution percentile
  pg_catalog | percentile_cont | interval[]         | double precision[] ORDER BY interval         | multiple continuous percentiles
  pg_catalog | percentile_disc | anyelement         | double precision ORDER BY anyelement         | discrete percentile
  pg_catalog | percentile_disc | anyarray           | double precision[] ORDER BY anyelement       | multiple discrete percentiles
 
Yep, these seem to me to be the type of aggregate function that is not
obvious to split into multiple partitions.
I think it is still valuable even if we can only push down the subset of
aggregate functions that is well known to the core planner.
For example, we know count(*) = sum(nrows), and we also know avg(X) can
be rewritten as an enhanced avg() that takes both nrows and the partial
sum of X. We can utilize this knowledge to break down aggregate
functions.
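
As a hedged illustration of that break-down (the names and types below are
made up for this sketch, not proposed catalog entries), avg(X) could carry an
(nrows, sum) state that a per-worker transition function updates and a merge
function combines, with the division deferred to a single final function:

  #include <stdint.h>

  typedef struct AvgState
  {
      int64_t nrows;   /* partial count(*) */
      double  sum;     /* partial sum(X)   */
  } AvgState;

  /* transition: fold one input value into a worker's partial state */
  static void
  avg_accum(AvgState *state, double x)
  {
      state->nrows += 1;
      state->sum += x;
  }

  /* merge: combine two partial states; this is the piece avg() lacks today */
  static AvgState
  avg_merge(AvgState a, AvgState b)
  {
      AvgState r = { a.nrows + b.nrows, a.sum + b.sum };

      return r;
  }

  /* final: count(*) = sum(nrows); avg(X) = total sum / total nrows */
  static double
  avg_final(AvgState s)
  {
      return (s.nrows > 0) ? s.sum / (double) s.nrows : 0.0;
  }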

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] printing table in asciidoc with psql

2014-12-02 Thread Michael Paquier
On Mon, Nov 17, 2014 at 7:48 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2014-11-07 22:37 GMT+01:00 Alvaro Herrera alvhe...@2ndquadrant.com:


 I did \o /tmp/tst, then
 \dS
 create table eh | oh ();
 \dS

 and then filtered the output file to HTML.  The CREATE TABLE tag ended
 up in the same line as the title of the following table.  I think
  there's a newline missing somewhere.


 The good news is that the | in the table name was processed correctly;
 but I noticed that the table title is not using the escaped print, so I
 would imagine that if I put a | in the title, things would go wrong.
 (I don't know how to put titles on tables other than editing the
 hardcoded titles for \ commands in psql).

  Another thing is that spaces around the | seem gratuitous and produce
 bad results.  I tried select * from pg_class which results in a very
 wide table, and then the HTML output contains some cells with the values
 in the second line; this makes all rows taller than they must be,
 because some cells use the first line and other cells in the same row
 use the second line for the text.  I hand-edited the asciidoc and
 removed the spaces around | which makes the result nicer.  (Maybe
 removing the trailing space is enough.)


  I see the trailing spaces, but I don't see the described effect. Please, can you
  send a more specific test case?

This formatting problem is trivial to reproduce:
=# create table foo ();

CREATE TABLE
Time: 9.826 ms
=# \d

.List of relations
[options=header,cols=l,l,
l,l,frame=none]
|
^l| Schema ^l| Name ^l| Type ^l| Owner
| public | foo | table | ioltas
|


(1 row)


I just tested this patch, and yes, I agree with Alvaro that it would be
good to minimize the extra spaces around the table separators '|'. Now
we need to be careful as well, and I think that we should only remove
the spaces on the right of the separators, as cell values controlling
for example spans would otherwise result in incorrect output, stuff
like that:
5 2.2+^.^
9 2+

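As a hedged C sketch of the cell-output rule being discussed (this is not the
patch's code, and print_asciidoc_cell is an invented name): nothing is printed
to the right of the separator, literal pipes inside values are escaped, and a
single space is kept to the left of the next separator so that span-like
values such as the ones above are not misparsed.

  #include <stdio.h>

  /*
   * Emit one asciidoc table cell: the value follows the '|' separator with
   * no intervening space, literal '|' characters are escaped as \|, and one
   * space is kept before the next separator so that a value ending in
   * something like "2+" is not misread as a span specifier.
   */
  static void
  print_asciidoc_cell(FILE *fout, const char *value)
  {
      const char *p;

      fputc('|', fout);
      for (p = value; *p; p++)
      {
          if (*p == '|')
              fputc('\\', fout);
          fputc(*p, fout);
      }
      fputc(' ', fout);
  }
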
Also, something a bit surprising is that this format always produces
one extra newline for each command, for example in the case of a DDL:
=# create table foo ();

CREATE TABLE
I think that this extra newline should be removed as well, no?

This patch has been marked as Waiting on Author for a couple of
weeks, and the problems mentioned before have not been completely
addressed, hence marking this patch as returned with feedback. It
would be nice to see progress for the next CF.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] printing table in asciidoc with psql

2014-12-02 Thread Michael Paquier
On Wed, Dec 3, 2014 at 3:52 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 This patch has been marked as Waiting on Author for a couple of
 weeks, and the problems mentioned before have not been completely
 addressed, hence marking this patch as returned with feedback. It
 would be nice to see progress for the next CF.
Btw, here is a resource showing table formatting in asciidoc:
http://asciidoc.org/newtables.html
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-12-02 Thread Michael Paquier
On Tue, Nov 18, 2014 at 12:33 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Nov 17, 2014 at 10:31 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
 On 2014-11-17 10:21:04 -0500, Robert Haas wrote:
 Andres, where are we with this patch?

 1. You're going to commit it, but haven't gotten around to it yet.

 2. You're going to modify it some more and repost, but haven't gotten
 around to it yet.

 3. You're willing to see it modified if somebody else does the work,
 but are out of time to spend on it yourself.

 4. Something else?

 I'm working on it. Amit had found a hang on PPC that I couldn't
 reproduce on x86. Since then I've reproduced it and I think yesterday I
 found the problem. Unfortunately it always took a couple hours to
 trigger...

 I've also made some, in my opinion, cleanups to the patch since
 then. Those have the nice side effect of making the size of struct
  LWLock smaller, but that wasn't actually the intended effect.

 I'll repost once I've verified the problem is fixed and I've updated all
 commentary.

 The current problem is that I seem to have found a problem that's also
 reproducible with master :(. After a couple of hours a
 pgbench -h /tmp -p 5440 scale3000 -M prepared -P 5 -c 180 -j 60 -T 2 -S
 against a
 -c max_connections=200 -c shared_buffers=4GB
 cluster seems to hang on PPC. With all the backends waiting in buffer
 mapping locks. I'm now making sure it's really master and not my patch
 causing the problem - it's just not trivial with 180 processes involved.

 Ah, OK.  Thanks for the update.
Ping? This patch has been in a stale state for a couple of weeks and is
still marked as waiting on author for this CF.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-12-02 Thread Michael Paquier
On Wed, Dec 3, 2014 at 3:23 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 On Fri, Nov 14, 2014 at 02:51:32PM +1300, David Rowley wrote:
  Likely for most aggregates, like count, sum, max, min, bit_and and
  bit_or the merge function would be the same as the transition
  function, as the state type is just the same as the input type. It
  would only be aggregates like avg(), stddev*(), bool_and() and
  bool_or() that would need a new merge function made... These would be
  no more complex than the transition functions... Which are just a few
 lines of code anyway.
 
  We'd simply just not run parallel query if any aggregates used in the
  query didn't have a merge function.
 
  When I mentioned this, I didn't mean to appear to be placing a road
  block. I was just bringing to the table the information that COUNT(*) +
  COUNT(*) works ok for merging COUNT(*)'s sub totals, but AVG(n) + AVG(n)
 does not.

 Sorry, late reply, but, FYI, I don't think our percentile functions can
 be parallelized in the same way:

   test= \daS *percent*
                                              List of aggregate functions
   Schema   |      Name       |  Result data type  |              Argument data types              |             Description
------------+-----------------+--------------------+-----------------------------------------------+--------------------------------------
 pg_catalog | percent_rank    | double precision   | VARIADIC any ORDER BY VARIADIC any            | fractional rank of hypothetical row
 pg_catalog | percentile_cont | double precision   | double precision ORDER BY double precision    | continuous distribution percentile
 pg_catalog | percentile_cont | double precision[] | double precision[] ORDER BY double precision  | multiple continuous percentiles
 pg_catalog | percentile_cont | interval           | double precision ORDER BY interval            | continuous distribution percentile
 pg_catalog | percentile_cont | interval[]         | double precision[] ORDER BY interval          | multiple continuous percentiles
 pg_catalog | percentile_disc | anyelement         | double precision ORDER BY anyelement          | discrete percentile
 pg_catalog | percentile_disc | anyarray           | double precision[] ORDER BY anyelement        | multiple discrete percentiles

 Yep, these seem to me to be the type of aggregate function that is not
 obvious to split into multiple partitions.
 I think it is still valuable even if we can only push down the subset of
 aggregate functions that is well known to the core planner.
 For example, we know count(*) = sum(nrows), and we also know avg(X) can
 be rewritten as an enhanced avg() that takes both nrows and the partial
 sum of X. We can utilize this knowledge to break down aggregate
 functions.
Postgres-XC (Postgres-XL) implemented such parallel aggregate logic
some time ago, using a set of sub-functions and a finalization
function to do the work.
My 2c.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers