Re: [HACKERS] Interrupting long external library calls

2012-05-28 Thread Sandro Santilli
On Fri, May 25, 2012 at 12:34:54PM -0400, Tom Lane wrote:
 Sandro Santilli s...@keybit.net writes:
  I ended up providing an explicit mechanism to request interruption of
  whatever the library is doing, and experimented (successfully so far)
  requesting the interruption from a SIGINT handler.
 
  Do you see any major drawback in doing so ?
 
 This seems a bit fragile.  It might work all right in Postgres, where
 we tend to set up signal handlers just once at process start, but ISTM
 other systems might assume they can change their signal handlers at
 any time.  The handler itself looks less than portable anyway ---
 what about the SIGINFO case?

Indeed setting the handler from within the library itself was a temporary
implementation to see how effective it would have been. The idea is to
move the registration of the handler outside the library and into the
user (PostGIS in this case). The library itself would only expose 
GEOS_requestInterrupt/GEOS_cancelInterrupt API calls.

I'm guessing PostGIS should use the more portable pqsignal functions ?

What should I know about SIGINFO ?

 I assume that the geos::util::Interrupt::request() call sets a flag
 somewhere that's going to be periodically checked in long-running
 loops. 

Yes, this is what will happen.

 Would it be possible for the periodic checks to include a
 provision for a callback into Postgres-specific glue code, wherein
 you could test the same flags CHECK_FOR_INTERRUPTS does?  A similar
 approach might then be usable in other contexts, and it seems safer
 to me than messing with a host environment's signal handling.

Would it be enough for the signal handler (installed by PostGIS) 
to check those flags and call the GEOS_requestInterrupt function
when appropriate ?
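
Or, to do it via the callback you suggest, the PostGIS-side glue could look
roughly like this (just a sketch: GEOS_requestInterrupt is the call proposed
above, while the GEOS_setInterruptChecker registration hook is hypothetical):

#include "postgres.h"
#include "miscadmin.h"      /* QueryCancelPending, ProcDiePending */

extern void GEOS_requestInterrupt(void);                  /* proposed GEOS API */
extern void GEOS_setInterruptChecker(void (*cb)(void));   /* hypothetical hook */

/* GEOS would call this from the periodic checks in its long-running loops,
 * so no signal handler has to be installed by the library at all. */
static void
postgis_geos_interrupt_check(void)
{
    /* same flags CHECK_FOR_INTERRUPTS() tests, but without ereport(),
     * since we are deep inside a GEOS computation here */
    if (QueryCancelPending || ProcDiePending)
        GEOS_requestInterrupt();
}

void
postgis_install_geos_interrupt_hook(void)
{
    /* called once at module load time (e.g. from _PG_init) */
    GEOS_setInterruptChecker(postgis_geos_interrupt_check);
}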

--strk; 

  ,--o-. 
  |   __/  |Delivering high quality PostGIS 2.0 !
  |  / 2.0 |http://strk.keybit.net - http://vizzuality.com
  `-o--'


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Bogus nestloop rows estimate in 8.4.7

2012-05-28 Thread Marti Raudsepp
Hi list,

This bug isn't causing me any immediate problems -- the plan works out
well regardless -- but PostgreSQL 8.4.7 is somehow overestimating the
number of rows coming from a nestloop join, when joining 2 large
partitioned tables. Maybe it's been fixed in more recent versions,
sadly it's an EOL Linux distro and I have no immediate plans to
upgrade.

It's estimating to join 135957 x 281 rows, but the product is somehow
2946151270877

In reality, it's joining 132577 x ~1 rows to get 133116 results

QUERY PLAN
 GroupAggregate  (cost=852067259163.57..977278688175.85
rows=2946151270877 width=36)
   ->  Sort  (cost=852067259163.57..859432637340.77 rows=2946151270877 width=36)
 Sort Key: b.banner_id, b.client_body_id,
(COALESCE(b.partner_body_id, a.partner_body_id)), b.space_id,
b.campaign_id, a.evt_type_id
 ->  Nested Loop  (cost=0.00..213859871.55 rows=2946151270877 width=36)
   Join Filter: (a.request_id = b.request_id)
   ->  Append  (cost=0.00..5905.69 rows=135957 width=20)
 ->  Index Scan using XIF01request on request a
(cost=0.00..8.27 rows=1 width=20)
   Index Cond: ((request_time >= '2012-05-28
09:00:00'::timestamp without time zone) AND (request_time <
'2012-05-28 10:00:00'::timestamp without time zone))
 ->  Index Scan using XIF01request_1222 on
request_1222 a  (cost=0.00..5897.42 rows=135956 width=20)
   Index Cond: ((request_time >= '2012-05-28
09:00:00'::timestamp without time zone) AND (request_time <
'2012-05-28 10:00:00'::timestamp without time zone))
   ->  Append  (cost=0.00..1569.44 rows=281 width=32)
 ->  Seq Scan on request_data b  (cost=0.00..11.30
rows=130 width=32)
 ->  Index Scan using
IX_relationship64_request_d_c_1150 on request_d_c_1150 b
(cost=0.00..9.56 rows=2 width=32)
   Index Cond: (b.request_id = a.request_id)
*snip lots of partition index scans*

Query:
SELECT '2012-05-28T09:00:00', count(*),
uniq(sort(array_agg(visitor_id))), banner_id, client_body_id,
partner_body_id, space_id, campaign_id, evt_type_id FROM stats_request
WHERE stats_request.request_time >= '2012-05-28T09:00:00' AND
stats_request.request_time < (timestamp '2012-05-28T09:00:00' +
interval E'1 hour')::timestamp
GROUP BY banner_id, client_body_id, partner_body_id, space_id,
campaign_id, evt_type_id ORDER BY banner_id, client_body_id,
partner_body_id, space_id, campaign_id, evt_type_id;

Full EXPLAIN ANALYZE is attached.

Regards,
Marti

   QUERY PLAN   


 GroupAggregate  (cost=879052459671.79..1008131430885.41 rows=3037152263850 
width=36) (actual time=53254.277..53366.138 rows=205 loops=1)
   ->  Sort  (cost=879052459671.79..886645340331.41 rows=3037152263850 
width=36) (actual time=53252.215..53271.619 rows=133116 loops=1)
 Sort Key: b.banner_id, b.client_body_id, (COALESCE(b.partner_body_id, 
a.partner_body_id)), b.space_id, b.campaign_id, a.evt_type_id
 Sort Method:  quicksort  Memory: 16544kB
 ->  Nested Loop  (cost=0.00..220465186.10 rows=3037152263850 width=36) 
(actual time=2.079..52841.371 rows=133116 loops=1)
   Join Filter: (a.request_id = b.request_id)
   ->  Append  (cost=0.00..6085.67 rows=140156 width=20) (actual 
time=0.115..147.640 rows=132577 loops=1)
 ->  Index Scan using XIF01request on request a  
(cost=0.00..8.27 rows=1 width=20) (actual time=0.014..0.014 rows=0 loops=1)
   Index Cond: ((request_time >= '2012-05-28 
09:00:00'::timestamp without time zone) AND (request_time < '2012-05-28 
10:00:00'::timestamp without time zone))
 ->  Index Scan using XIF01request_1222 on request_1222 a 
 (cost=0.00..6077.40 rows=140155 width=20) (actual time=0.099..117.756 
rows=132577 loops=1)
   Index Cond: ((request_time >= '2012-05-28 
09:00:00'::timestamp without time zone) AND (request_time < '2012-05-28 
10:00:00'::timestamp without time zone))
   ->  Append  (cost=0.00..1569.44 rows=281 width=32) (actual 
time=0.217..0.365 rows=1 loops=132577)
 ->  Seq Scan on request_data b  (cost=0.00..11.30 rows=130 
width=32) (actual time=0.000..0.000 rows=0 loops=132577)
 ->  Index Scan using IX_relationship64_request_d_c_1150 
on request_d_c_1150 b  (cost=0.00..9.56 rows=2 width=32) (actual 
time=0.003..0.003 rows=0 loops=132577)
   Index Cond: (b.request_id = a.request_id)
 ->  Index Scan using IX_relationship64_request_d_n_1150 
on request_d_n_1150 b  

Re: [HACKERS] Bogus nestloop rows estimate in 8.4.7

2012-05-28 Thread Marti Raudsepp
On Mon, May 28, 2012 at 10:45 AM, Marti Raudsepp ma...@juffo.org wrote:
 Query:
 SELECT '2012-05-28T09:00:00', count(*),
 uniq(sort(array_agg(visitor_id))), banner_id, client_body_id,
 partner_body_id, space_id, campaign_id, evt_type_id FROM stats_request
 WHERE stats_request.request_time >= '2012-05-28T09:00:00' AND
 stats_request.request_time < (timestamp '2012-05-28T09:00:00' +
 interval E'1 hour')::timestamp
 GROUP BY banner_id, client_body_id, partner_body_id, space_id,
 campaign_id, evt_type_id ORDER BY banner_id, client_body_id,
 partner_body_id, space_id, campaign_id, evt_type_id;

Oh, I forgot to mention that stats_request is a view and that's where
the JOIN is coming from:

CREATE VIEW stats_request AS
 SELECT a.request_id, a.request_time, b.b2s_id, a.evt_type_id,
b.space_id, b.banner_id, COALESCE(b.visitor_id, a.visitor_id) AS
visitor_id, COALESCE(b.partner_body_id, a.partner_body_id) AS
partner_body_id, b.client_body_id, b.campaign_id
   FROM request a
   JOIN request_data b USING (request_id);

request and request_data are both large partitioned tables.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-28 Thread Greg Sabino Mullane
On Sun, May 27, 2012 at 05:44:15PM -0700, Jeff Frost wrote:
 On May 27, 2012, at 12:53 PM, Tom Lane wrote:
  occurring, they'd take long enough to expose the process to sinval
  overrun even with not-very-high DDL rates.
 As it turns out, there are quite a few temporary tables created.

For the record, same here. We do *lots* of DDL (hence the cronjobs 
to vac/reindex system catalogs).

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8




Re: [HACKERS] pg_stat_statements temporary file

2012-05-28 Thread Andres Freund
On Friday, May 25, 2012 05:19:28 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote:
  Where do you suggest the file be written to?
  
  One could argue stats_temp_directory would be the correct place.
 
 No, that would be exactly the *wrong* place, because that directory can
 be on a RAM disk.  We need to put this somewhere where it'll survive
 a shutdown.
I had assumed it would do the writeout regularly to survive a database crash. 
As it does not do that my argument is clearly bogus, sorry for that.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-28 Thread Florian Pflug
On May28, 2012, at 02:46 , Noah Misch wrote:
 On Thu, May 24, 2012 at 07:31:37PM +0200, Florian Pflug wrote:
 Since the security barrier flag carries a potentially hefty performance
 penalty, I think it should be optional. Application which don't allow
 SQL-level access to the database might still benefit from row-level security,
 because it saves them from having to manually add the WHERE clause to every
 statement, or having to wrap all their tables with views. Yet without direct
 SQL-level access, the security barrier thing isn't really necessary, so
 it'd be nice if they wouldn't have to pay for it. How about
 
  ALTER TABLE ? SET ROW POLICY ? WITH (security_barrier)
 
 Backward compatibility concerns limited our options when retrofitting the
 security_barrier treatment for views, but I'd rather not add a knob completely
 disabling it in the context of a brand new feature.  A better avenue is to
 enhance our facilities for identifying safe query fragments.  For example,
 ALTER FUNCTION ... LEAKPROOF is superuser-only.  Adding a way for a table
 owner to similarly trust functions for the purpose of his own tables would
 help close the gap that motivates such an all-or-nothing knob.

Hm, I'm not sure a per-function flag is really the solution here. Neither,
however, is a per-RLS flag as your arguments made me realize. There really are
three orthogonal concepts here, all of which allow security barriers to be
ignored, namely

  A) Trusting the user not to exploit leaks, i.e. what you call a trusted query
 generator

  B) There being no leaks, i.e. all functions being LEAKPROOF

  C) Not caring about leaks, i.e. the security_barrier flag

Concept B is handled adequately by the LEAKPROOF flag. Concept C is handled
by the security_barrier flag. However, as you pointed out, it's a bit of a
dubious concept and only really necessary for backwards compatibility because
it reflects pre-9.2 behaviour.

Concept A is what I was aiming for. Per the above, a per-RLS flag is clearly the
wrong tool for the job, so consider my suggestion withdrawn. What we actually
want, I think, is a per-role flag which marks a role as leak trusted. Queries
issued by such a role would behave as if all functions are LEAKPROOF, since even
if there is a leak, the role is trusted not to exploit it.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Function call hierarchy/path since getting the buffer until access its data

2012-05-28 Thread Waldecir Faria

Good morning, I am doing a study on buffer management to improve the 
performance of a program that does heavy I/O operations. After looking at and 
reading source code and documentation from different software, a friend suggested 
that I take a look at the PostgreSQL code. I have already looked at the PostgreSQL 
buffer management modules (freelist.c and company), but now I am a bit confused 
about how buffer reads and writes work; I tried to see what PostgreSQL does to 
get, for example, a char array from one buffer. Looking at rawpage.c I think I 
found a good example, using the following call sequence starting at the function 
get_raw_page_internal():

StrategyGetBuffer -> BufferAlloc -> ReadBuffer_common
-> ReadBufferExtended -> BufferGetPage -> memcpy page to buf

But I need more examples or explanations to understand it better. Does anyone 
have more examples, or can anyone recommend an article that covers this?
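
If it helps, my reading of that path is roughly the following (a simplified
sketch of what get_raw_page_internal() in contrib/pageinspect/rawpage.c does;
error handling and block-number validation omitted):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

static void
copy_raw_page(Relation rel, ForkNumber forknum, BlockNumber blkno,
              char *raw_page_data)   /* caller-provided BLCKSZ-sized buffer */
{
    Buffer      buf;

    /* ReadBufferExtended ends up in ReadBuffer_common, which calls
     * BufferAlloc/StrategyGetBuffer when the page is not already cached */
    buf = ReadBufferExtended(rel, forknum, blkno, RBM_NORMAL, NULL);

    /* hold the content lock while copying the page out of shared buffers */
    LockBuffer(buf, BUFFER_LOCK_SHARE);
    memcpy(raw_page_data, BufferGetPage(buf), BLCKSZ);
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);

    ReleaseBuffer(buf);
}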

Thanks,
-Waldecir
  

Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-28 Thread Andres Freund
On Tuesday, May 15, 2012 05:30:27 PM Andres Freund wrote:
 On Monday, May 14, 2012 07:55:32 PM Fujii Masao wrote:
  On Mon, May 14, 2012 at 6:32 PM, Andres Freund and...@2ndquadrant.com
 
 wrote:
   On Friday, May 11, 2012 08:45:23 PM Tom Lane wrote:
   Andres Freund and...@2ndquadrant.com writes:
Its the only place though which knows whether its actually sensible
to wakeup the walsender. We could make it return whether it wrote
anything and do the wakeup at the callers. I count 4 different
callsites which would be an annoying duplication but I don't really
see anything better right now.
   
   Another point here is that XLogWrite is not only normally called with
   the lock held, but inside a critical section.  I see no reason to take
   the risk of doing signal sending inside critical sections.
   
   BTW, a depressingly large fraction of the existing calls to
   WalSndWakeup are also inside critical sections, generally for no good
   reason that I can see.  For example, in EndPrepare(), why was the
   call placed where it is and not down beside SyncRepWaitForLSN?
   
   Hm. While I see no real problem moving it out of the lock I don't
   really see a way to cleanly move it outside critical sections everywhere. The
   impact of doing so seems to be rather big to me. The only externally
   visible place where it actually is known whether we write out data and
   thus do the wakeup is XLogInsert, XLogFlush and XLogBackgroundFlush.
   The first two of those are routinely called inside a critical section.
  
  So what about moving the existing calls of WalSndWakeup() out of a
  critical section and adding new call of WalSndWakeup() into
  XLogBackgroundFlush()? Then all WalSndWakeup()s are called outside a
  critical section and after releasing WALWriteLock. I attached the patch.
 
 IMO it's simply not sensible to call WalSndWakeup at *any* of the current
 locations. They simply don't have the necessary information. They will
 wake up too often (because with concurrency commits often won't require
 additional WAL writes) and too seldom (because a flush caused by
 XLogInsert won't cause a wakeup).
Does anybody have a better idea than to either call WalSndWakeup() at 
essentially the wrong places or calling it inside a critical section?

Tom, what danger do you see from calling it in a critical section?

Greetings,

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-28 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes:
 On Sun, May 27, 2012 at 05:44:15PM -0700, Jeff Frost wrote:
 On May 27, 2012, at 12:53 PM, Tom Lane wrote:
 occurring, they'd take long enough to expose the process to sinval
 overrun even with not-very-high DDL rates.

 As it turns out, there are quite a few temporary tables created.

 For the record, same here. We do *lots* of DDL (hence the cronjobs 
 to vac/reindex system catalogs).

I wonder if it could've been something like a transient problem with
a cronjob leading to massive bloat of pg_attribute, eventually
triggering the syncscan issue, then fixed by a successful VAC FULL
before we thought to look closely at the table size.  The syncscan
issue definitely was there in 8.3.5, it's only the question of
pg_attribute size that made me doubt it was happening for you.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How could we make it simple to access the log as a table?

2012-05-28 Thread Dimitri Fontaine
Stephen Frost sfr...@snowman.net writes:
 This is really where I was hoping to eventually get to with the logging
 changes that have been discussed over the past couple of years.  We need
 to have a mechanism to allow logging to different places, based on
 information included in the log message and/or context, where one of
 those places could potentially be some kind of receiver process that
 then dumps the message into a table, either on the local system or on a
 remote system.

That looks like syslog features?
  
 What exactly all of that looks like and how it works is a pretty big
 discussion, but I'd definitely love to hear from others who are
 interested in changes to our logging infrastructure.

I thought about exposing the log CSV files as partitions of a log table,
each partition being a FDW able to read the file.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How could we make it simple to access the log as a table?

2012-05-28 Thread Christopher Browne
On Mon, May 28, 2012 at 11:39 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Stephen Frost sfr...@snowman.net writes:
 This is really where I was hoping to eventually get to with the logging
 changes that have been discussed over the past couple of years.  We need
 to have a mechanism to allow logging to different places, based on
 information included in the log message and/or context, where one of
 those places could potentially be some kind of receiver process that
 then dumps the message into a table, either on the local system or on a
 remote system.

 That looks like syslog features?

 What exactly all of that looks like and how it works is a pretty big
 discussion, but I'd definitely love to hear from others who are
 interested in changes to our logging infrastructure.

 I thought about exposing the log CSV files as partitions of a log table,
 each partition being a FDW able to read the file.

All interesting...

We've got several models as to how logs are likely to be captured,
which mean that it'll be difficult to have one uniform answer.

1.  An easy traditional default is to capture logs in a log directory.

An FDW might be able to get at this, with the cost of some additional
configuration.  Unfortunately, that injures the simplicity of this way
of logging.

2.  Ask Syslog

My favorite way to configure *my* PG instances (e.g. - those that I
use for testing) is for them to forward messages to syslog.  That way
they, and my Slony test instances, are all logging to one common
place, rather than the logs getting strewn in a bunch of places.

An FDW that could talk to syslog would be a nifty idea, though there
are enough different syslog implementations around to, again, injure
the simplicity of this.

[Also, mumble, mumble, syslog might be forwarding to a remote server,
further complications...]

3.  Lossy logging is desired by some doing high performance systems
where they can't afford to capture everything

http://archives.postgresql.org/pgsql-hackers/2011-11/msg01437.php

One approach that I know Theo has used has been to throw events onto a
Spread channel, and have a listener pulling and aggregating the events
on a best-efforts basis.  I'm not sure if I should treat that as a
separate answer, or as part of the same.

4.  For a while, I had my syslog set up to capture logs into a
Postgres table.  Very cool, but pretty big slowdown.

What's notably messy, right now, is that we've got a bunch of logging
targets where there's nothing resembling a uniform way of *accessing*
the logs.  It seems to me that the messiness and non-uniformity are
the tough part of the problem.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade libraries check

2012-05-28 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Well, the scheme I had in mind would require pg_upgrade to verify that
 the new cluster contains an extension control file for each extension in
 the old cluster (which is something it really oughta do anyway, if it
 doesn't already).  After that, though, it ought not be looking at the
 individual functions defined by an extension --- it is the extension's
 business which libraries those are in.

I have some plans that we will be discussing later in the new dev cycle
and that would impact such a method if we're to follow them. To better
solve both the per-system (not even cluster) and per-database extension
versions and the inline/os-packaged extension discrepancy, I'm thinking
that we should move the extension support files from their shared OS
location to a per-database location at CREATE EXTENSION time.

That would mean managing several copies of those so that each database
can actually implement a different version and upgrade cycle of an
extension, including of its shared object libraries (some more session
state) in simplest cases (no extra shared object dependencies).

I don't intend to be already discussing the details of that proposal
here, that's just a hint so that you know that I intend to rework the
current control file strategy that we have, so any work on that in
pg_upgrade in next cycle will possibly be impacted.

 The only reason for pg_upgrade to still look at probin at all would be
 to cover C functions that weren't within extensions.  In the long run it
 might be possible to consider those unsupported, or at least not common
 enough to justify a safety check in pg_upgrade.

I'm thinking about optionally forbidding the creation of functions written in C
or installed into pg_catalog when not done via an extension script, and
maybe later down the road making that forbidding the default.

The pg_catalog case makes sense as going via an extension's script is
the only way I know about to dump/restore the function.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade libraries check

2012-05-28 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 I have some plans that we will be discussing later in the new dev cycle
 and that would impact such a method if we're to follow them. To better
 solve both the per-system (not even cluster) and per-database extension
 versions and the inline/os-packaged extension discrepancy, I'm thinking
 that we should move the extension support files from their shared OS
 location to a per-database location at CREATE EXTENSION time.

As a packager, I can say that moving shared libraries in such a way is
an absolute nonstarter, as in don't even bother to propose it because it
is not going to happen.  Putting shared libraries into a
postgres-writable directory will be seen (correctly) as a security hole
of the first magnitude, not to mention that in many systems it'd require
root privilege anyway to adjust the dynamic linker's search path.  You
could possibly make per-database copies of the control and script files,
but I don't see much point in that if you can't similarly
version-control the shared libraries.

I think we're better off sticking to the assumption that the files
constituting an extension are read-only so far as the database server is
concerned.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Attempting to do a rolling move to 9.2Beta (as a slave) fails

2012-05-28 Thread Tom Lane
Karl Denninger k...@denninger.net writes:
 I am attempting to validate the path forward to 9.2, and thus tried the
 following:

 1. Build 9.2Beta1; all fine.

 2. Run a pg_basebackup from the current master machine (running 9.1) to
 a new directory on the slave machine, using the 9.2Beta1 pg_basebackup
 executable.

 3. Run a pg_upgrade against that from the new binary directory,
 producing a 9.2Beta1 data store.

I do not think this can work, unless pg_basebackup is more magic than I
think it is.  AFAIK, what you have after step 2 is a non-self-consistent
data directory that needs to be fixed by WAL replay before it is
consistent.  And pg_upgrade needs a consistent starting point.

 4. Attempt to start the result as a SLAVE against the existing 9.1 master.

This is definitely not going to work.  You can only log-ship between
servers of the same major version.

 But the last step fails, claiming that wal_level was set to minimal
 when the WAL records were written.  No it wasn't.  Not only was it not
 on the master where the base backup came from, it wasn't during the
 upgrade either nor is it set that way on the new candidate slave.
 Is this caused by the version mismatch?  Note that it does NOT bitch
 about the versions not matching.

That sounds like a bug, or poorly sequenced error checks.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proclock table corrupted

2012-05-28 Thread Tom Lane
Harshitha S hershe...@gmail.com writes:
 Sorry, the OS is WindRiver Linux.

What I would suspect first is issues with the MIPS spinlock assembly
code (look into s_lock.h) not being portable to your platform.  That
code hasn't been tested on very many machines, I suspect.  It's not
impossible that it doesn't work at all on multiprocessor machines;
is yours one?

 Yes, I am talking of the fast path locking patch discussed in the link
 below.
 http://postgresql.1045698.n5.nabble.com/bug-in-fast-path-locking-td5626629.html

Not relevant to 9.0.x, for sure.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [GENERAL] Attempting to do a rolling move to 9.2Beta (as a slave) fails

2012-05-28 Thread Karl Denninger
On 5/28/2012 11:44 AM, Tom Lane wrote:
 Karl Denninger k...@denninger.net writes:
 I am attempting to validate the path forward to 9.2, and thus tried the
 following:
 1. Build 9.2Beta1; all fine.
 2. Run a pg_basebackup from the current master machine (running 9.1) to
 a new directory on the slave machine, using the 9.2Beta1 pg_basebackup
 executable.
 3. Run a pg_upgrade against that from the new binary directory,
 producing a 9.2Beta1 data store.
 I do not think this can work, unless pg_basebackup is more magic than I
 think it is.  AFAIK, what you have after step 2 is a non-self-consistent
 data directory that needs to be fixed by WAL replay before it is
 consistent.  And pg_upgrade needs a consistent starting point.
Actually when pg_upgrade starts it starts the old binary against the old
data directory first, and thus replays the WAL records until it reaches
consistency before it does the upgrade.  It /*does*/ work; you have to
specify that you want the WAL records during the pg_basebackup (e.g.
-x=stream) so you have the WAL files for the old binary to consider
during the startup (or manually ship them after the backup completes.)

 4. Attempt to start the result as a SLAVE against the existing 9.1 master.
 This is definitely not going to work.  You can only log-ship between
 servers of the same major version.
OK.
 But the last step fails, claiming that wal_level was set to minimal
 when the WAL records were written.  No it wasn't.  Not only was it not
 on the master where the base backup came from, it wasn't during the
 upgrade either nor is it set that way on the new candidate slave.
 Is this caused by the version mismatch?  Note that it does NOT bitch
 about the versions not matching.
 That sounds like a bug, or poorly sequenced error checks.

   regards, tom lane
Well, at least I know why it fails and that it's a bad error message
(and can't work) rather than something stupid in the original setup
(which looked ok.)

-- 
-- Karl Denninger
/The Market Ticker ®/ http://market-ticker.org
Cuda Systems LLC


Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-28 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Does anybody have a better idea than to either call WalSndWakeup() at 
 essentially the wrong places or calling it inside a critical section?

 Tom, what danger do you see from calling it in a critical section?

My concern was basically that it might throw an error.  Looking at the
current implementation of SetLatch, it seems that's not possible, but
I wonder whether we want to lock ourselves into that assumption.

Still, if the alternatives are worse, maybe that's the best answer.
If we do that, though, let's add comments to WalSndWakeup and SetLatch
mentioning that they mustn't throw error.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How could we make it simple to access the log as a table?

2012-05-28 Thread Robert Haas
On May 28, 2012, at 11:57 AM, Christopher Browne cbbro...@gmail.com wrote:
 All interesting...
 
 We've got several models as to how logs are likely to be captured,
 which mean that it'll be difficult to have one uniform answer.
 
 1.  An easy traditional default is to capture logs in a log directory.
 
 An FDW might be able to get at this, with the cost of some additional
 configuration.  Unfortunately, that injures the simplicity of this way
 of logging.
 
 2.  Ask Syslog
 
 My favorite way to configure *my* PG instances (e.g. - those that I
 use for testing) is for them to forward messages to syslog.  That way
 they, and my Slony test instances, are all logging to one common
 place, rather than the logs getting strewn in a bunch of places.
 
 An FDW that could talk to syslog would be a nifty idea, though there
 are enough different syslog implementations around to, again, injure
 the simplicity of this.

What does "talk to syslog" mean in this context?  Syslog doesn't store any 
data; it just routes it around.

 [Also, mumble, mumble, syslog might be forwarding to a remote server,
 further complications...]
 
 3.  Lossy logging is desired by some doing high performance systems
 where they can't afford to capture everything
 
 http://archives.postgresql.org/pgsql-hackers/2011-11/msg01437.php
 
 One approach that I know Theo has used has been to throw events onto a
 Spread channel, and have a listener pulling and aggregating the events
 on a best-efforts basis.  I'm not sure if I should treat that as a
 separate answer, or as part of the same.
 
 4.  For a while, I had my syslog set up to capture logs into a
 Postgres table.  Very cool, but pretty big slowdown.
 
 What's notably messy, right now, is that we've got a bunch of logging
 targets where there's nothing resembling a uniform way of *accessing*
 the logs.  It seems to me that the messiness and non-uniformity are
 the tough part of the problem.

Yeah, I agree.  I think what is missing here is something that can be read (and 
maybe indexed?) like a table, but written by a pretty dumb process.  It's not 
terribly workable to have PG log to PG, because there are too many situations 
where the problem you're trying to report would frustrate your attempt to 
report it.  At the other end of the spectrum, our default log format is easy to 
generate but (a) impoverished, not even including a time stamp by default and 
(b) hard to parse, especially because two customers with the same 
log_line_prefix is a rare nicety.  The  CSV format is both rich and 
machine-parseable (good start!) but it takes an unreasonable amount of work to 
make it usefully queryable.  We need something that looks more like a big red 
button.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bogus nestloop rows estimate in 8.4.7

2012-05-28 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes:
 This bug isn't causing me any immediate problems -- the plan works out
 well regardless -- but PostgreSQL 8.4.7 is somehow overestimating the
 number of rows coming from a nestloop join, when joining 2 large
 partitioned tables.

This sounds familiar, but a quick trawl through the commit logs didn't
immediately turn up any related-looking patches.  Can you put together
a self-contained test case?

Also, what do you have constraint_exclusion set to?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How could we make it simple to access the log as a table?

2012-05-28 Thread Christopher Browne
On Mon, May 28, 2012 at 1:45 PM, Robert Haas robertmh...@gmail.com wrote:
 On May 28, 2012, at 11:57 AM, Christopher Browne cbbro...@gmail.com wrote:
 2.  Ask Syslog

 My favorite way to configure *my* PG instances (e.g. - those that I
 use for testing) is for them to forward messages to syslog.  That way
 they, and my Slony test instances, are all logging to one common
 place, rather than the logs getting strewn in a bunch of places.

 An FDW that could talk to syslog would be a nifty idea, though there
 are enough different syslog implementations around to, again, injure
 the simplicity of this.

 What does talk to syslog mean in this context?  Syslog doesn't store any 
 data; it just routes it around.

Right, I guess that's a bit like saying, let's have something
listening to procmail, when that's really just a filter.

If there was some output form that was particularly amenable to our
use, [e.g. - simple to configure via the big red button that you
suggest], that would be nice.

 [Also, mumble, mumble, syslog might be forwarding to a remote server,
 further complications...]

 3.  Lossy logging is desired by some doing high performance systems
 where they can't afford to capture everything

 http://archives.postgresql.org/pgsql-hackers/2011-11/msg01437.php

 One approach that I know Theo has used has been to throw events onto a
 Spread channel, and have a listener pulling and aggregating the events
 on a best-efforts basis.  I'm not sure if I should treat that as a
 separate answer, or as part of the same.

 4.  For a while, I had my syslog set up to capture logs into a
 Postgres table.  Very cool, but pretty big slowdown.

 What's notably messy, right now, is that we've got a bunch of logging
 targets where there's nothing resembling a uniform way of *accessing*
 the logs.  It seems to me that the messiness and non-uniformity are
 the tough part of the problem.

 Yeah, I agree.  I think what is missing here is something that can be read 
 (and maybe indexed?) like a table, but written by a pretty dumb process.  
 It's not terribly workable to have PG log to PG, because there are too many 
 situations where the problem you're trying to report would frustrate your 
 attempt to report it.  At the other end of the spectrum, our default log 
 format is easy to generate but (a) impoverished, not even including a time 
 stamp by default and (b) hard to parse, especially because two customers with 
 the same log_line_prefix is a rare nicety.  The  CSV format is both rich and 
 machine-parseable (good start!) but it takes an unreasonable amount of work 
 to make it usefully queryable.  We need something that looks more like a big 
 red button.

There's a case to be made for some lossier NoSQL-y thing here.  But
I'm not sure what size fits enough.  I hate the idea of requiring the
deployment of *another* DBMS (however lite), but reading from text
files isn't particularly nice either.

Perhaps push the logs into an unlogged table on an extra PG instance,
where an FDW tries to make that accessible?  A fair bit of process
needs to live behind that big red button, and that's at least a
plausible answer.

What's needed is to figure out what restrictions are acceptable to
impose to have something that's button-worthy.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Upcoming back-branch PG releases

2012-05-28 Thread Tom Lane
A minor security issue (CVE-2012-2143) has been found in code that
Postgres shares with several other projects.  After some discussion it
was decided that the issue is not of enough severity to justify exactly
coordinated security releases, which would be quite difficult anyway
given the projects' disparate release procedures.  The plan that's been
settled on is that the other projects will publish their fixes this
Wednesday (May 30).  Once the issue is public, we will commit the
corresponding fix to our git repository, and then move forward with our
usual over-the-weekend release process to produce security updates on
Monday June 4.

Accordingly, expect to see back-branch release activity this week.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bogus nestloop rows estimate in 8.4.7

2012-05-28 Thread Marti Raudsepp
On Mon, May 28, 2012 at 8:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Also, what do you have constraint_exclusion set to?

The only sane value, partition

 This sounds familiar, but a quick trawl through the commit logs didn't
 immediately turn up any related-looking patches.  Can you put together
 a self-contained test case?

Sure, tested on 8.4.7, 8.4.11, with all default configuration. Does
not occur in >=9.0

create table a_parent (i int);
create table a_child1 () inherits (a_parent);
create table a_child2 () inherits (a_parent);
insert into a_child1 select generate_series(1,10);
insert into a_child2 select generate_series(11,20);
create index a1_i_idx on a_child1(i);
create index a2_i_idx on a_child2(i);

create table b_parent (i int);
create table b_child1 () inherits (b_parent);
create table b_child2 () inherits (b_parent);
insert into b_child1 select generate_series(1,10);
insert into b_child1 select generate_series(1,10);
insert into b_child2 select generate_series(11,20);
insert into b_child2 select generate_series(11,20);
create index b1_i_idx on b_child1(i);
create index b2_i_idx on b_child2(i);

analyze;
explain select * from a_parent join b_parent using (i) where i between 1 and 2;

Actually returns 4 rows, but estimate is 28168

QUERY PLAN
Nested Loop  (cost=0.00..1276.16 rows=28168 width=4)
  Join Filter: (public.a_parent.i = public.b_parent.i)
  ->  Append  (cost=0.00..62.56 rows=14 width=4)
->  Seq Scan on a_parent  (cost=0.00..46.00 rows=12 width=4)
  Filter: ((i >= 1) AND (i <= 2))
->  Index Scan using a1_i_idx on a_child1 a_parent
(cost=0.00..8.28 rows=1 width=4)
  Index Cond: ((i >= 1) AND (i <= 2))
->  Index Scan using a2_i_idx on a_child2 a_parent
(cost=0.00..8.28 rows=1 width=4)
  Index Cond: ((i >= 1) AND (i <= 2))
  ->  Append  (cost=0.00..56.64 rows=2404 width=4)
->  Seq Scan on b_parent  (cost=0.00..34.00 rows=2400 width=4)
->  Index Scan using b2_i_idx on b_child2 b_parent
(cost=0.00..11.31 rows=2 width=4)
  Index Cond: (public.b_parent.i = public.a_parent.i)
->  Index Scan using b1_i_idx on b_child1 b_parent
(cost=0.00..11.32 rows=2 width=4)
  Index Cond: (public.b_parent.i = public.a_parent.i)
(15 rows)


There was a similar case in 9.0.4 with WHERE i=1, but that has been
fixed in 9.0.7

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bogus nestloop rows estimate in 8.4.7

2012-05-28 Thread Marti Raudsepp
On Mon, May 28, 2012 at 10:32 PM, Marti Raudsepp ma...@juffo.org wrote:
 There was a similar case in 9.0.4 with WHERE i=1, but that has been
 fixed in 9.0.7

Oh, it's been fixed in 9.0.7, but apparently not in 8.4.11; the empty
parent tables are confusing the estimate:

explain select * from a_parent join b_parent using (i) where i=1;

QUERY PLAN
Nested Loop  (cost=56.57..123.65 rows=224 width=4)
  ->  Append  (cost=0.00..62.60 rows=16 width=4)
->  Seq Scan on b_parent  (cost=0.00..40.00 rows=12 width=4)
  Filter: (i = 1)
->  Index Scan using b2_i_idx on b_child2 b_parent
(cost=0.00..11.30 rows=2 width=4)
  Index Cond: (i = 1)
->  Index Scan using b1_i_idx on b_child1 b_parent
(cost=0.00..11.30 rows=2 width=4)
  Index Cond: (i = 1)
  ->  Materialize  (cost=56.57..56.71 rows=14 width=4)
->  Append  (cost=0.00..56.56 rows=14 width=4)
  ->  Seq Scan on a_parent  (cost=0.00..40.00 rows=12 width=4)
Filter: (i = 1)
  ->  Index Scan using a1_i_idx on a_child1 a_parent
(cost=0.00..8.28 rows=1 width=4)
Index Cond: (i = 1)
  ->  Index Scan using a2_i_idx on a_child2 a_parent
(cost=0.00..8.28 rows=1 width=4)
Index Cond: (i = 1)

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq URL syntax vs SQLAlchemy

2012-05-28 Thread Peter Eisentraut
On mån, 2012-05-14 at 18:16 +0300, Alex Shulgin wrote:
 Upon closer inspection of the issue I came to believe that the proper
 fix is to drop support for special treatment of host part starting
 with slash altogether.
 
 Attached is a patch to do that. 

Committed.

I also updated the documentation and tests to show that percent-encoding
a host part starting with slash also works, as discussed upthread.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How could we make it simple to access the log as a table?

2012-05-28 Thread Robert Haas
On Mon, May 28, 2012 at 2:21 PM, Christopher Browne cbbro...@gmail.com wrote:
 Yeah, I agree.  I think what is missing here is something that can be read 
 (and maybe indexed?) like a table, but written by a pretty dumb process.  
 It's not terribly workable to have PG log to PG, because there are too many 
 situations where the problem you're trying to report would frustrate your 
 attempt to report it.  At the other end of the spectrum, our default log 
 format is easy to generate but (a) impoverished, not even including a time 
 stamp by default and (b) hard to parse, especially because two customers 
 with the same log_line_prefix is a rare nicety.  The  CSV format is both 
 rich and machine-parseable (good start!) but it takes an unreasonable amount 
 of work to make it usefully queryable.  We need something that looks more 
 like a big red button.

 There's a case to be made for some lossier NoSQL-y thing here.  But
 I'm not sure what size fits enough.  I hate the idea of requiring the
 deployment of *another* DBMS (however lite), but reading from text
 files isn't particularly nice either.

 Perhaps push the logs into an unlogged table on an extra PG instance,
 where an FDW tries to make that accessible?  A fair bit of process
 needs to live behind that big red button, and that's at least a
 plausible answer.

 What's needed is to figure out what restrictions are acceptable to
 impose to have something that's button-worthy.

I am not fired up about needing a second instance of PG; it seems to
me that that requirement by itself makes it considerably more involved
than pushing a big red button.  I agree with you that deploying
another DBMS, even a lightweight one, is also not a good solution.

As far as CSV goes, I think the biggest deficiency is that there's a
mismatch between the way that log files are typically named (e.g. one
per day, or one per hour) and the way that a CSV foreign table is
created (you've got to point it at one particular file).  Maybe we
could have a CSV reader that understands PostgreSQL-format CSV logs,
but you point it at a directory, rather than a single file, and it
reads all the CSV files in the directory.  And maybe it could also be
smart enough that if you've got a WHERE clause that filter by date, it
uses that to skip any files that can be proven irrelevant.  So the
user can just turn on CSV logging, point the FDW at the log directory,
and away they go.
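
For comparison, this is roughly what the single-file setup looks like today
(a sketch using file_fdw against csvlog output; the filename is just an
example, and the column list follows the 9.1/9.2 csvlog format):

CREATE EXTENSION file_fdw;
CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE pglog (
  log_time timestamp(3) with time zone,
  user_name text,
  database_name text,
  process_id integer,
  connection_from text,
  session_id text,
  session_line_num bigint,
  command_tag text,
  session_start_time timestamp with time zone,
  virtual_transaction_id text,
  transaction_id bigint,
  error_severity text,
  sql_state_code text,
  message text,
  detail text,
  hint text,
  internal_query text,
  internal_query_pos integer,
  context text,
  query text,
  query_pos integer,
  location text,
  application_name text
) SERVER pglog
  OPTIONS (filename '/path/to/pg_log/postgresql.csv', format 'csv');

-- and then, e.g.:
SELECT log_time, error_severity, message
FROM pglog
WHERE error_severity = 'ERROR'
ORDER BY log_time;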

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-Database Roles

2012-05-28 Thread Robert Haas
On Sun, May 27, 2012 at 2:53 PM, Peter Eisentraut pete...@gmx.net wrote:
 On tis, 2012-05-22 at 10:19 -0400, Robert Haas wrote:
 I think we should have made roles and tablespaces database
 objects rather than shared objects,

 User names are global objects in the SQL standard, which is part of the
 reason that the current setup was never seriously challenged.

Does the SQL standard really discriminate between the database and the
cluster?  Wow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-Database Roles

2012-05-28 Thread Robert Haas
On Fri, May 25, 2012 at 11:12 PM, Bruce Momjian br...@momjian.us wrote:
 On Fri, May 25, 2012 at 10:34:54PM -0400, Stephen Frost wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
  On Thu, May 24, 2012 at 6:21 PM, Bruce Momjian br...@momjian.us wrote:
   Yes, pre-1996.  I think the fact that authentication/user names appear
   in pg_hba.conf really locked the user name idea into global objects, and
   we have never really been able to make a dent in that.
 
  Eh?  Why would the presence of usernames in pg_hba.conf mean that they
  have to be global objects?

 I haven't had a chance (yet) to look, but perhaps the current code
 attempts to validate the role before figuring out what database is being
 requested?  We'd have to essentially invert that, of course, for this..
 One thing I was wondering about is if we're going to have an issue
 supporting things like "tell me what databases exist" (psql -l), which
 connect to the 'postgres' database by default, for local-only roles.  I'm not
 sure that I actually care, to be honest, but it's something to consider.
 I don't think we should require users to create every local role also in
 postgres, nor do I feel that we should allow connections to postgres by
 any role, nor do I want to break tools which use 'postgres' to basically
 get access to shared catalogs- but I don't see an immediate or easy
 solution..

 Yes.  In a simple case, you have a username, you want to validate it
 against LDAP or kerberos --- how do you partition the external
 authentication tool based on database name?  Seems like an obvious
 problem to me.

Well, when people try to connect to database it, you set up
pg_hba.conf to point them at the Kerberos server, but when they try to
connect to database sales, you just use MD5 for that.  Or whatever
your site policy happens to be.  I'm not seeing the problem;
pg_hba.conf already allows different authentication methods for
different databases.
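
E.g. something like this in pg_hba.conf (database names, the network, and the
Kerberos method name are placeholders for whatever the site actually uses):

# TYPE  DATABASE        USER            ADDRESS             METHOD
host    it              all             192.0.2.0/24        gss
host    sales           all             192.0.2.0/24        md5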

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade libraries check

2012-05-28 Thread Robert Haas
On Sun, May 27, 2012 at 11:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I don't recall exactly what problems drove us to make pg_upgrade do
 what it does with extensions, but we need a different fix for them.

Well, you need pg_upgrade to preserve the OIDs of objects that are
part of extensions just as you do for any other objects.  If pg_dump
--binary-upgrade just emits CREATE EXTENSION snarfle, for some
extension snarfle that provides an eponymous type, then the new
cluster is going to end up with a type with a different OID than
whatever existed in the old cluster, and that's going to break all
sorts of things - e.g. arrays already on disk that contain the old
type OID.

I think that it would be an extremely fine thing if we could fix the
world so that we need not preserve OIDs across a pg_upgrade; that would be
all kinds of wonderful.  However, I think that's likely to be quite a
difficult project.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_basebackup --xlog compatibility break

2012-05-28 Thread Peter Eisentraut
In 9.1, the pg_basebackup option --xlog takes no argument.  In 9.2, it
takes a required argument.  I think such compatibility breaks should be
avoided, especially in client-side programs.  Now you can't write a
script running pg_basebackup that works with 9.1 and 9.2, if you need to
include the WAL.

I think the behavior of -x/--xlog should be restored to the state of
9.1, and a new option should be added to select between the fetch and
stream methods.  (With a suitable default, this would also increase
usability a bit.)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bogus nestloop rows estimate in 8.4.7

2012-05-28 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes:
 On Mon, May 28, 2012 at 10:32 PM, Marti Raudsepp ma...@juffo.org wrote:
 There was a similar case in 9.0.4 with WHERE i=1, but that has been
 fixed in 9.0.7

 Oh, it's been fixed in 9.0.7, but apparently not in 8.4.11; the empty
 parent tables are confusing the estimate:

Hmm ... what your test case seems to be exhibiting is related to this:

Author: Tom Lane t...@sss.pgh.pa.us
Branch: master [f3ff0433a] 2011-07-14 17:30:57 -0400
Branch: REL9_1_STABLE Release: REL9_1_0 [cf8245285] 2011-07-14 17:31:12 -0400
Branch: REL9_0_STABLE Release: REL9_0_5 [0dd46a776] 2011-07-14 17:31:25 -0400

In planner, don't assume that empty parent tables aren't really empty.

There's a heuristic in estimate_rel_size() to clamp the minimum size
estimate for a table to 10 pages, unless we can see that vacuum or analyze
has been run (and set relpages to something nonzero, so this will always
happen for a table that's actually empty).  However, it would be better
not to do this for inheritance parent tables, which very commonly are
really empty and can be expected to stay that way.  Per discussion of a
recent pgsql-performance report from Anish Kejariwal.  Also prevent it
from happening for indexes (although this is more in the nature of
documentation, since CREATE INDEX normally initializes relpages to
something nonzero anyway).

Back-patch to 9.0, because the ability to collect statistics across a
whole inheritance tree has improved the planner's estimates to the point
where this relatively small error makes a significant difference.  In the
referenced report, merge or hash joins were incorrectly estimated as
cheaper than a nestloop with inner indexscan on the inherited table.
That was less likely before 9.0 because the lack of inherited stats would
have resulted in a default (and rather pessimistic) estimate of the cost
of a merge or hash join.

However, the error in your original example is far too large to be
explained by that, so I think it was tripping over something different.
When I run your test case in 8.4, I get

 Nested Loop  (cost=0.00..1276.12 rows=28168 width=4)
   Join Filter: (public.a_parent.i = public.b_parent.i)
   ->  Append  (cost=0.00..62.57 rows=14 width=4)
 ->  Seq Scan on a_parent  (cost=0.00..46.00 rows=12 width=4)
   Filter: ((i >= 1) AND (i <= 2))
 ->  Index Scan using a1_i_idx on a_child1 a_parent  (cost=0.00..8.29 
rows=1 width=4)
...
   ->  Append  (cost=0.00..56.63 rows=2404 width=4)
 ->  Seq Scan on b_parent  (cost=0.00..34.00 rows=2400 width=4)
 ->  Index Scan using b1_i_idx on b_child1 b_parent  (cost=0.00..11.34 
rows=2 width=4)
...

and that join size estimate is not too out of line if you accept the
admittedly-bogus numbers for the appendrel sizes.  There seems to be
something else going on in your original example.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup --xlog compatibility break

2012-05-28 Thread Magnus Hagander
On Mon, May 28, 2012 at 10:11 PM, Peter Eisentraut pete...@gmx.net wrote:
 In 9.1, the pg_basebackup option --xlog takes no argument.  In 9.2, it
 takes a required argument.  I think such compatibility breaks should be
 avoided, especially in client-side programs.  Now you can't write a
 script running pg_basebackup that works with 9.1 and 9.2, if you need to
 include the WAL.

 I think the behavior of -x/--xlog should be restored to the state of
 9.1, and a new option should be added to select between the fetch and
 stream methods.  (With a suitable default, this would also increase
 usability a bit.)

Just to be clear - it's not possible to actually accept -x with an
*optional* parameter, is it? Meaning -x would mean the same as -x
fetch and therefore become backwards compatible?

IIRC I did try that, and didn't get it to work - but if that's doable,
that seems like the cleanest way?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] FDW / list of needed columns, WHERE conditions (in PlanForeignScan)

2012-05-28 Thread Tomas Vondra
Hi,

I'm writing my first FDW, and I need to get the list of columns I
actually need to fetch when planning the query. I do want to fetch only
the columns that are actually needed, not all of them.

Initially I've used RelOptInfo->reltargetlist but it seems it does not
cover cases like

   SELECT 1 FROM FDW WHERE column_a = 1

and so on. Even RelOptInfo->attr_needed seems not to work as this
returns "not needed" for all columns:

for (i = baserel->min_attr; i <= baserel->max_attr; i++) {
    if (baserel->attr_needed[i - baserel->min_attr] != NULL)
        elog(WARNING, "attr %d needed", i);
    else
        elog(WARNING, "attr %d not needed", i);
}

Where to get this info?

Also, I'd like to apply as many restrictions as possible when executing
the plan. I see there's PlanState->qual, described as "implicitly-ANDed
qual conditions", which I assume is what I need. But this seems to be
available only in BeginForeignScan (as ss.ps.qual in ForeignScanState),
not in planning which is the place where I need to compute estimates
etc. Where do I get this, when planning the query?

kind regards
Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FDW / list of needed columns, WHERE conditions (in PlanForeignScan)

2012-05-28 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes:
 I'm writing my first FDW, and I need to get the list of columns I
 actually need to fetch when planning the query. I do want to fetch only
 the columns that are actually needed, not all of them.

reltargetlist and attr_needed only tell you about columns the scan has
to *output* (ie, they are used in join conditions or the final result).
Vars that are only mentioned in baserestrict conditions aren't
included.  So you'd need to do something like pull_varattnos on the
baserestrictinfo list and union that with attr_needed.
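
Roughly, that might look like the following (an untested sketch; it assumes
the 9.2-era three-argument pull_varattnos, keeps the
FirstLowInvalidHeapAttributeNumber offset that pull_varattnos itself uses,
and get_needed_attrs is just an illustrative name):

#include "postgres.h"
#include "access/sysattr.h"
#include "nodes/relation.h"
#include "optimizer/var.h"

/*
 * Sketch: collect the attnos the FDW really has to fetch, i.e. everything
 * the scan must output plus everything referenced by the local quals.
 * Attnos are stored offset by FirstLowInvalidHeapAttributeNumber, the same
 * convention pull_varattnos uses.
 */
static Bitmapset *
get_needed_attrs(RelOptInfo *baserel)
{
    Bitmapset  *attrs_used = NULL;
    ListCell   *lc;
    int         i;

    /* columns needed above the scan: join clauses, targetlist, etc. */
    for (i = baserel->min_attr; i <= baserel->max_attr; i++)
    {
        if (!bms_is_empty(baserel->attr_needed[i - baserel->min_attr]))
            attrs_used = bms_add_member(attrs_used,
                                i - FirstLowInvalidHeapAttributeNumber);
    }

    /* columns referenced only in the local quals */
    foreach(lc, baserel->baserestrictinfo)
    {
        RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

        pull_varattnos((Node *) rinfo->clause, baserel->relid, &attrs_used);
    }

    return attrs_used;
}

Whether to keep the offset or convert back to plain attnos afterwards is up
to the FDW.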

 Also, I'd like to apply as many restrictions as possible when executing
 the plan. I see there's PlanState->qual, described as "implicitly-ANDed
 qual conditions", which I assume is what I need. But this seems to be
 available only in BeginForeignScan (as ss.ps.qual in ForeignScanState),
 not in planning which is the place where I need to compute estimates
 etc. Where do I get this, when planning the query?

Same answer, rel->baserestrictinfo.
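
And for the plan-time row estimate itself, the usual pattern is to feed that
list to clauselist_selectivity and scale the raw row count by the result
(again only an untested sketch; estimate_fdw_rows is a made-up name):

#include "postgres.h"
#include "optimizer/cost.h"

/*
 * Sketch: estimate output rows of the foreign scan by applying the
 * combined selectivity of the implicitly-ANDed baserestrictinfo quals
 * to the raw (unfiltered) row count the FDW knows about.
 */
static double
estimate_fdw_rows(PlannerInfo *root, RelOptInfo *baserel, double raw_rows)
{
    Selectivity sel;

    sel = clauselist_selectivity(root,
                                 baserel->baserestrictinfo,
                                 0,          /* varRelid */
                                 JOIN_INNER,
                                 NULL);      /* no outer-join context */

    return clamp_row_est(raw_rows * sel);
}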

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bogus nestloop rows estimate in 8.4.7

2012-05-28 Thread Marti Raudsepp
On Mon, May 28, 2012 at 11:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, the error in your original example is far too large to be
 explained by that, so I think it was tripping over something different.

Good point. But I generated a bigger data set with the above test case
and it gets progressively worse with more rows and partitions. (The
original database has 2x4 billion rows in over 100 partitions)

Here's a bigger test case, 2GB total (will take a few minutes to
generate). It gives a total estimate of 3900158, even though the
Append nodes suggest 13x2406 rows.

create table a_parent (i int);
create table a_child1 () inherits (a_parent);
create table a_child2 () inherits (a_parent);
create table a_child3 () inherits (a_parent);
insert into a_child1 select generate_series(0001,1000);
insert into a_child2 select generate_series(1001,2000);
insert into a_child3 select generate_series(2001,3000);
create index a1_i_idx on a_child1(i);
create index a2_i_idx on a_child2(i);
create index a3_i_idx on a_child3(i);
alter table a_child1 add check (i between 0001 and 1000);
alter table a_child2 add check (i between 1001 and 2000);
alter table a_child3 add check (i between 2001 and 3000);

create table b_parent (i int);
create table b_child1 () inherits (b_parent);
create table b_child2 () inherits (b_parent);
create table b_child3 () inherits (b_parent);
insert into b_child1 select generate_series(0001,1000);
insert into b_child1 select generate_series(0001,1000);
insert into b_child2 select generate_series(1001,2000);
insert into b_child2 select generate_series(1001,2000);
insert into b_child3 select generate_series(2001,3000);
insert into b_child3 select generate_series(2001,3000);
create index b1_i_idx on b_child1(i);
create index b2_i_idx on b_child2(i);
create index b3_i_idx on b_child3(i);
alter table b_child1 add check (i between 0001 and 1000);
alter table b_child2 add check (i between 1001 and 2000);
alter table b_child3 add check (i between 2001 and 3000);

analyze;
explain select * from a_parent join b_parent using (i) where i between 1 and 2;

Nested Loop  (cost=0.00..1413.71 rows=3900158 width=4)
  Join Filter: (public.a_parent.i = public.b_parent.i)
  ->  Append  (cost=0.00..55.37 rows=13 width=4)
->  Seq Scan on a_parent  (cost=0.00..46.00 rows=12 width=4)
  Filter: ((i >= 1) AND (i <= 2))
->  Index Scan using a1_i_idx on a_child1 a_parent
(cost=0.00..9.37 rows=1 width=4)
  Index Cond: ((i >= 1) AND (i <= 2))
  ->  Append  (cost=0.00..74.41 rows=2406 width=4)
->  Seq Scan on b_parent  (cost=0.00..34.00 rows=2400 width=4)
->  Index Scan using b1_i_idx on b_child1 b_parent
(cost=0.00..13.43 rows=2 width=4)
  Index Cond: (public.b_parent.i = public.a_parent.i)
->  Index Scan using b2_i_idx on b_child2 b_parent
(cost=0.00..13.50 rows=2 width=4)
  Index Cond: (public.b_parent.i = public.a_parent.i)
->  Index Scan using b3_i_idx on b_child3 b_parent
(cost=0.00..13.48 rows=2 width=4)
  Index Cond: (public.b_parent.i = public.a_parent.i)

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bogus nestloop rows estimate in 8.4.7

2012-05-28 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes:
 On Mon, May 28, 2012 at 11:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, the error in your original example is far too large to be
 explained by that, so I think it was tripping over something different.

 Good point. But I generated a bigger data set with the above test case
 and it gets progressively worse with more rows and partitions. (The
 original database has 2x4 billion rows in over 100 partitions)

 Here's a bigger test case, 2GB total (will take a few minutes to
 generate). It gives a total estimate of 3900158, even though the
 Append nodes suggest 13x2406 rows.

On reflection I think this is an artifact of the lack of
inheritance-tree stats in 8.4.  The estimated size of the join does
*not* come from the product of the two appendrel sizes shown in EXPLAIN,
because the inner one is an inner indexscan using a parameter from the
outer side (what we would now call a parameterized path).  Rather, the
estimated size is join selectivity times outer relation size times inner
relation size.  The outer relation size, after applying its restriction
clause, is indeed only 13 rows, but the inner relation size is 60e6 rows
because it has no restriction clause.  If we had an accurate join
selectivity estimate that'd be fine, but for lack of any stats about the
inheritance tree eqjoinsel just punts and returns DEFAULT_EQ_SEL, ie
0.005.  And that works out to your result.
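(Concretely: 13 outer rows * ~60e6 inner rows * 0.005 = ~3.9 million, which
is essentially the 3900158 estimate shown above.)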

So, nothing to see here ... 8.4 is just not very good with this type
of problem.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup --xlog compatibility break

2012-05-28 Thread Fujii Masao
On Tue, May 29, 2012 at 5:38 AM, Magnus Hagander mag...@hagander.net wrote:
 On Mon, May 28, 2012 at 10:11 PM, Peter Eisentraut pete...@gmx.net wrote:
 In 9.1, the pg_basebackup option --xlog takes no argument.  In 9.2, it
 takes a required argument.  I think such compatibility breaks should be
 avoided, especially in client-side programs.  Now you can't write a
 script running pg_basebackup that works with 9.1 and 9.2, if you need to
 include the WAL.

 I think the behavior of -x/--xlog should be restored to the state of
 9.1, and a new option should be added to select between the fetch and
 stream methods.  (With a suitable default, this would also increase
 usability a bit.)

 Just to be clear - it's not possible to actually accept -x with an
 *optional* parameter, is it? Meaning "-x" would mean the same as "-x
 fetch" and therefore become backwards compatible?

 IIRC I did try that, and didn't get it to work - but if that's doable,
 that seems like the cleanest way?

+1 ISTM you need to change getopt_long() to do that.
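
A standalone sketch of that (untested; it uses GNU getopt_long's
optional_argument plus the "x::" short-option syntax, which means the value
has to be attached as -xfetch or --xlog=fetch rather than given as a
separate word; that restriction may be why it didn't behave as hoped):

#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>

int
main(int argc, char **argv)
{
    static struct option long_options[] = {
        {"xlog", optional_argument, NULL, 'x'},
        {NULL, 0, NULL, 0}
    };
    const char *xlog_mode = NULL;
    int         c;

    /* "x::" means -x takes an *optional* argument (GNU extension) */
    while ((c = getopt_long(argc, argv, "x::", long_options, NULL)) != -1)
    {
        switch (c)
        {
            case 'x':
                /* no value given: fall back to the 9.1 behaviour, i.e. fetch */
                xlog_mode = optarg ? optarg : "fetch";
                break;
            default:
                exit(1);
        }
    }

    printf("xlog mode: %s\n", xlog_mode ? xlog_mode : "(not requested)");
    return 0;
}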

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-28 Thread Peter Geoghegan
The attached very simple patch moves the commit_delay +
commit_siblings sleep into XLogFlush, where the leader alone sleeps.
This appears to be a much more effective site for a delay.
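
In rough outline, the change is something like this (a sketch of the idea,
not the attached patch itself; CommitDelay, CommitSiblings, enableFsync,
MinimumActiveBackends() and pg_usleep() are the pieces the existing
commit-time code already uses):

/*
 * Inside XLogFlush(), once this backend has become the one that will
 * actually perform the flush (the "leader"), wait briefly so that other
 * backends' commit records can queue up behind it and be flushed by the
 * same write/fsync.
 */
if (CommitDelay > 0 && enableFsync &&
    MinimumActiveBackends(CommitSiblings))
    pg_usleep(CommitDelay);

Followers that were merely waiting on the leader never pay the extra delay
themselves, which is the point.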

Benchmark results, with and without a delay of 3000ms (commit_siblings
is 0 in both cases) are available from:

http://leadercommitdelay.staticloud.com

The only way this differed from my usual setup for these benchmarks
was that, as it happened, shared_buffers was set to 256MB, which would
of course have affected the actual value of wal_buffers, which was 8MB.

I surmise that the reason this works so well is that it increases the
rate of batching at lower client counts, without needlessly delaying
each and every follower beyond when their transaction is likely to
have committed. A more detailed analysis will have to wait for
tomorrow.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


move_delay_2012_05_29.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-28 Thread Tatsuo Ishii
 On Tue, May 22, 2012 at 3:27 PM, Tatsuo Ishii is...@postgresql.org wrote:
 
  Thanks for your comments. They clarify a lot.
  But I still don't realize how can we distinguish IS_LCPRV2 and IS_LC2?
  Isn't it possible for them to produce same pg_wchar?

 If LB is in 0x90 - 0x99 range, then they are LC2.
 If LB is in 0xf0 - 0xff range, then they are LCPRV2.

 
 Thanks. I rewrote inverse conversion from pg_wchar to mule. New version of
 patch is attached.

[forgot to cc: to the list]

I looked into your patch, especially: pg_wchar2euc_with_len(const
pg_wchar *from, unsigned char *to, int len)

I think there's a little room to enhance the function.

if (*from >> 24)
{
    *to++ = *from >> 24;
    *to++ = (*from >> 16) & 0xFF;
    *to++ = (*from >> 8) & 0xFF;
    *to++ = *from & 0xFF;
    cnt += 4;
}

Since the function walks through this for every single wchar, something like:

if ((c = *from >> 24))
{
    *to++ = c;
    *to++ = (*from >> 16) & 0xFF;
    *to++ = (*from >> 8) & 0xFF;
    *to++ = *from & 0xFF;
    cnt += 4;
}

will save a few cycles (though I'm not sure whether the optimizer produces
similar code for the above anyway).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq URL syntax vs SQLAlchemy

2012-05-28 Thread Alex

Peter Eisentraut pete...@gmx.net writes:

 On Mon, 2012-05-14 at 18:16 +0300, Alex Shulgin wrote:
 Upon closer inspection of the issue I came to believe that the proper
 fix is to drop support for special treatment of host part starting
 with slash altogether.
 
 Attached is a patch to do that. 

 Committed.

Many thanks!

 I also updated the documentation and tests to show that percent-encoding
 a host part starting with slash also works, as discussed upthread.

Yes, that's a side-effect, but still might be useful to know.

--
Regards,
Alex

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] AbortOutOfAnyTransaction is a few bricks shy of a load

2012-05-28 Thread Tom Lane
While poking at the slow-relcache-rebuild issue recently reported by
Jeff Frost and Greg Mullane, I had set up a test case that used a
small function to create and drop enough temp tables to provoke an
sinval reset.  I had also modified sinval.c so that it reported sinval
reset processing as elog(LOG) not elog(DEBUG4).  The test script just
ran until killed, and I happened to notice that killing the client
resulted in something bizarre in the postmaster log:

2012-05-28 22:06:23.943 EDT 19701 LOG:  statement: select maketemps(30);
2012-05-28 22:06:24.018 EDT 19701 LOG:  sinval reset received
2012-05-28 22:06:24.018 EDT 19701 LOG:  statement: select maketemps(30);
2012-05-28 22:06:24.091 EDT 19701 LOG:  sinval reset received
2012-05-28 22:06:24.091 EDT 19701 LOG:  statement: select maketemps(30);
2012-05-28 22:06:24.162 EDT 19701 LOG:  sinval reset received
2012-05-28 22:06:24.163 EDT 19701 LOG:  statement: select maketemps(30);
2012-05-28 22:06:24.235 EDT 19701 LOG:  could not send data to client: Broken 
pipe
2012-05-28 22:06:24.235 EDT 19701 LOG:  sinval reset received
2012-05-28 22:06:24.235 EDT 19701 FATAL:  connection to client lost
2012-05-28 22:06:24.235 EDT 19701 WARNING:  StartTransaction while in START 
state
2012-05-28 22:06:24.235 EDT 19701 WARNING:  GUC nest level = 1 at transaction 
start
2012-05-28 22:06:24.235 EDT 19701 LOG:  disconnection: session time: 
0:00:12.148 user=tgl database=foo host=[local]

Most of that is expected, but not the WARNINGs.  Digging into this,
I found the following stack trace leading to the first warning:

#3  0x00708431 in elog_finish (elevel=<optimized out>, 
fmt=0x73bfa8 "StartTransaction while in %s state") at elog.c:1212
#4  0x0048b17d in StartTransaction () at xact.c:1689
#5  StartTransactionCommand () at xact.c:2440
#6  0x004cac05 in RemoveTempRelationsCallback (code=<optimized out>, 
arg=<optimized out>)
at namespace.c:3753
#7  RemoveTempRelationsCallback (code=<optimized out>, arg=<optimized out>) at 
namespace.c:3747
#8  0x00628cdd in shmem_exit (code=1) at ipc.c:221
#9  0x00628d55 in proc_exit_prepare (code=1) at ipc.c:181
#10 0x00628dd8 in proc_exit (code=1) at ipc.c:96
#11 0x00705ef3 in errfinish (dummy=<optimized out>) at elog.c:518
-- this is ereport(FATAL, (errmsg("connection to client lost")));
#12 0x00641149 in ProcessInterrupts () at postgres.c:2875
#13 ProcessInterrupts () at postgres.c:2824
#14 0x00705eb4 in errfinish (dummy=<optimized out>) at elog.c:543
#15 0x00708431 in elog_finish (elevel=<optimized out>, 
fmt=0x85387b "sinval reset received") at elog.c:1212
#16 0x0062d08a in ReceiveSharedInvalidMessages (
invalFunction=0x6f6800 <LocalExecuteInvalidationMessage>, 
resetFunction=0x6f6140 <InvalidateSystemCaches>) at sinval.c:112
#17 0x0048b0f8 in AtStart_Cache () at xact.c:792
#18 StartTransaction () at xact.c:1791
#19 StartTransactionCommand () at xact.c:2440
#20 0x0062cefd in ProcessCatchupEvent () at sinval.c:331
#21 0x0062d185 in EnableCatchupInterrupt () at sinval.c:267
#22 0x00641665 in prepare_for_client_read () at postgres.c:515
#23 0x0059cfc1 in secure_read (port=0x2a49060, ptr=0xb23260, len=8192) 
at be-secure.c:302
#24 0x005a43ef in pq_recvbuf () at pqcomm.c:816
#25 0x005a4e65 in pq_getbyte () at pqcomm.c:857
#26 0x00642ba1 in SocketBackend (inBuf=0x7fff060ba630) at postgres.c:342
#27 ReadCommand (inBuf=0x7fff060ba630) at postgres.c:490
#28 PostgresMain (argc=<optimized out>, argv=<optimized out>, 
username=<optimized out>)
at postgres.c:3929

That is, we're trying to do an elog(FATAL) exit after having realized
the client is dead, and within that we have to run a transaction to get
rid of the temp tables we made, and StartTransactionCommand is unhappy
with the transaction state.

Now, RemoveTempRelationsCallback calls AbortOutOfAnyTransaction before
it does StartTransactionCommand, so why are we getting this whining?
The answer appears to be that AbortOutOfAnyTransaction is satisfied if
it sees blockState == TBLOCK_DEFAULT.  In contrast,
AbortCurrentTransaction knows that it should poke a little deeper and
check for state == TRANS_DEFAULT; otherwise, we failed partway through
transaction start and still need to do cleanup.  I added the latter code
back in 2005, in commit 60b2444cc3ba037630c9b940c3c9ef01b954b87b, which
is otherwise entirely unrelated.  I suppose testing of that patch had
shown me that AbortCurrentTransaction needed to handle this case, but
I overlooked that AbortOutOfAnyTransaction might need to do likewise.

I intend to add the identical code to AbortOutOfAnyTransaction, as per
attached patch against HEAD; this eliminates the WARNING messages in my
test case.  I think this had probably better get back-patched, too.
In my test the warnings seem only cosmetic, but I'm afraid there could
be more severe consequences in some cases.
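
For reference, the TBLOCK_DEFAULT handling being copied over looks roughly
like this (paraphrasing the corresponding case in AbortCurrentTransaction
from memory; see xact.c for the exact code):

case TBLOCK_DEFAULT:
    if (s->state == TRANS_DEFAULT)
    {
        /* not in a transaction, so nothing to abort */
    }
    else
    {
        /*
         * We can get here after an error during transaction start
         * (state will be TRANS_START).  Clean up the incompletely
         * started transaction; first adjust the low-level state so
         * AbortTransaction doesn't complain.
         */
        if (s->state == TRANS_START)
            s->state = TRANS_INPROGRESS;
        AbortTransaction();
        CleanupTransaction();
    }
    break;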

regards,