Re: [HACKERS] Review: query result history in psql

2013-06-27 Thread ian link
I just applied the patch to a clean branch from the latest master. I
couldn't get a segfault from using the new feature. Could you provide a
little more info to reproduce the segfault? Thanks


On Thu, Jun 27, 2013 at 11:28 PM, Pavel Stehule wrote:

> Hello
>
> after patching I got a segfault
>
> Program terminated with signal 11, Segmentation fault.
> #0  0x0805aab4 in get_prompt (status=PROMPT_READY) at prompt.c:98
> 98  for (p = prompt_string;
> Missing separate debuginfos, use: debuginfo-install glibc-2.13-2.i686
> ncurses-libs-5.7-9.20100703.fc14.i686 readline-6.1-2.fc14.i386
> (gdb) bt
> #0  0x0805aab4 in get_prompt (status=PROMPT_READY) at prompt.c:98
> #1  0x0805786a in MainLoop (source=0xc45440) at mainloop.c:134
> #2  0x0805a68d in main (argc=2, argv=0xbfcf2894) at startup.c:336
>
> Regards
>
> Pavel Stehule
>
> 2013/6/28 ian link :
> >> It's better to post a review as a reply to the message which contains
> >> the patch.
> >
> > Sorry about that, I did not have the email in my inbox and couldn't
> > figure out how to use the old message ID to send a reply. Here is the
> > thread:
> > http://www.postgresql.org/message-id/flat/caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com#caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com
> >
> >> The 'EscapeForCopy' was meant to mean 'Escape string in a format
> >> required by the COPY TEXT format', so 'copy' in the name refers to the
> >> escaping format, not the action performed by the function.
> >
> > I see, that makes sense now. Keep it as you see fit, it's not a big
> > deal in my opinion.
> >
> >> Some mathematical toolkits, like Matlab or Mathematica, automatically
> >> set a variable called 'ans' (short for "answer") containing the result
> >> of the last operation. I was trying to emulate exactly this behaviour.
> >
> > I've actually been using Matlab lately, which must be why the name made
> > sense to me intuitively. I don't know if this is the best name, however.
> > It kind of assumes that our users use Matlab/Octave/Mathematica. Maybe
> > 'qhist' or 'hist' or something?
> >
> >> The history is not erased. The history is always stored in the
> >> client's memory.
> >
> > Ah, I did not pick up on that. Thank you for explaining it! That's
> > actually a very neat way of doing it. Sorry I did not realize that at
> > first.
> >
> >> I was considering such a behaviour. But since the feature is turned
> >> off by default, I decided that whoever is using it is aware of the
> >> cost. Instead of truncating the history automatically (which could
> >> lead to a nasty surprise), I decided to equip the user with \ansclean,
> >> a command erasing the history. I believe that it is better to let the
> >> user decide when history should be erased, instead of doing it
> >> automatically.
> >
> > I think you are correct. However, if we turn on the feature by default
> > (at some point in the future) the discussion should probably be
> > revisited.
> >
> >> This is my first submitted patch, so I can't really comment on the
> >> process. But if you could add the author's email to CC, the message
> >> would be much easier to spot. I replied after two days only because I
> >> missed the message in the flood of other pgsql-hackers messages. I
> >> think I need to scan the list more carefully...
> >
> > My fault, I definitely should have CC'd you.
> >
> > As for the patch, I made a new version of the latest one you provided
> > in the original thread. Let me know if anything breaks, but it compiles
> > fine on my box. Thanks for the feedback!
> >
> > Ian
> >
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
> >
>


Re: [HACKERS] Review: query result history in psql

2013-06-27 Thread Pavel Stehule
Hello

after patching I got a segfault

Program terminated with signal 11, Segmentation fault.
#0  0x0805aab4 in get_prompt (status=PROMPT_READY) at prompt.c:98
98  for (p = prompt_string;
Missing separate debuginfos, use: debuginfo-install glibc-2.13-2.i686
ncurses-libs-5.7-9.20100703.fc14.i686 readline-6.1-2.fc14.i386
(gdb) bt
#0  0x0805aab4 in get_prompt (status=PROMPT_READY) at prompt.c:98
#1  0x0805786a in MainLoop (source=0xc45440) at mainloop.c:134
#2  0x0805a68d in main (argc=2, argv=0xbfcf2894) at startup.c:336

Regards

Pavel Stehule

2013/6/28 ian link :
>> It's better to post a review as a reply to the message which contains
>> the patch.
>
> Sorry about that, I did not have the email in my inbox and couldn't figure
> out how to use the old message ID to send a reply. Here is the thread:
> http://www.postgresql.org/message-id/flat/caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com#caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com
>
>> The 'EscapeForCopy' was meant to mean 'Escape string in a format required
>> by the COPY TEXT format', so 'copy' in the name refers to the escaping
>> format, not the action performed by the function.
>
>
> I see, that makes sense now. Keep it as you see fit, it's not a big deal in
> my opinion.
>
>> Some mathematical toolkits, like Matlab or Mathematica, automatically set
>> a variable called 'ans' (short for "answer") containing the result of the
>> last operation. I was trying to emulate exactly this behaviour.
>
>
> I've actually been using Matlab lately, which must be why the name made
> sense to me intuitively. I don't know if this is the best name, however. It
> kind of assumes that our users use Matlab/Octave/Mathematica. Maybe 'qhist'
> or 'hist' or something?
>
>> The history is not erased. The history is always stored in the client's
>> memory.
>
> Ah, I did not pick up on that. Thank you for explaining it! That's actually
> a very neat way of doing it. Sorry I did not realize that at first.
>
>> I was considering such a behaviour. But since the feature is turned off by
>> default, I decided that whoever is using it is aware of the cost. Instead of
>> truncating the history automatically (which could lead to a nasty surprise),
>> I decided to equip the user with \ansclean, a command erasing the history.
>> I believe that it is better to let the user decide when history should be
>> erased, instead of doing it automatically.
>
>
> I think you are correct. However, if we turn on the feature by default (at
> some point in the future) the discussion should probably be revisited.
>
>> This is my first submitted patch, so I can't really comment on the
>> process. But if you could add the author's email to CC, the message would be
>> much easier to spot. I replied after two days only because I missed the
>> message in the flood of other pgsql-hackers messages. I think I need to scan
>> the list more carefully...
>
> My fault, I definitely should have CC'd you.
>
> As for the patch, I made a new version of the latest one you provided in the
> original thread. Let me know if anything breaks, but it compiles fine on my
> box. Thanks for the feedback!
>
> Ian
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: query result history in psql

2013-06-27 Thread ian link
>
> It's better to post a review as a reply to the message which contains
> the patch.

Sorry about that, I did not have the email in my inbox and couldn't figure
out how to use the old message ID to send a reply. Here is the thread:
http://www.postgresql.org/message-id/flat/caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com#caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com

> The 'EscapeForCopy' was meant to mean 'Escape string in a format required by
> the COPY TEXT format', so 'copy' in the name refers to the escaping format,
> not the action performed by the function.
>

I see, that makes sense now. Keep it as you see fit, it's not a big deal in
my opinion.

> Some mathematical toolkits, like Matlab or Mathematica, automatically set
> a variable called 'ans' (short for "answer") containing the result of the
> last operation. I was trying to emulate exactly this behaviour.


I've actually been using Matlab lately, which must be why the name made
sense to me intuitively. I don't know if this is the best name, however. It
kind of assumes that our users use Matlab/Octave/Mathematica. Maybe 'qhist'
or 'hist' or something?

> The history is not erased. The history is always stored in the client's
> memory.

Ah, I did not pick up on that. Thank you for explaining it! That's actually
a very neat way of doing it. Sorry I did not realize that at first.

> I was considering such a behaviour. But since the feature is turned off by
> default, I decided that whoever is using it is aware of the cost. Instead of
> truncating the history automatically (which could lead to a nasty
> surprise), I decided to equip the user with \ansclean, a command erasing
> the history. I believe that it is better to let the user decide when
> history should be erased, instead of doing it automatically.


I think you are correct. However, if we turn on the feature by default (at
some point in the future) the discussion should probably be revisited.
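
For context, a hypothetical session with the patch applied, assuming only
the behaviour described in this thread (results are kept in client memory
under 'ans', and \ansclean erases them):

    postgres=# SELECT count(*) FROM pg_class;  -- result also kept client-side as 'ans'
    postgres=# \ansclean                       -- erase the accumulated history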

> This is my first submitted patch, so I can't really comment on the
> process. But if you could add the author's email to CC, the message would
> be much easier to spot. I replied after two days only because I missed the
> message in the flood of other pgsql-hackers messages. I think I need to scan
> the list more carefully...

My fault, I definitely should have CC'd you.

As for the patch, I made a new version of the latest one you provided in
the original thread. Let me know if anything breaks, but it compiles fine
on my box. Thanks for the feedback!

Ian


query-history-review1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: enable new error fields in plpgsql (9.4)

2013-06-27 Thread Pavel Stehule
Hello

2013/6/28 Noah Misch :
> On Tue, Jun 25, 2013 at 03:56:27PM +0200, Pavel Stehule wrote:
>> cleaned patch is in attachment
>
> Of the five options you're adding to GET STACKED DIAGNOSTICS, four of them
> appear in the SQL standard.  DATATYPE_NAME does not; I think we should call it
> PG_DATATYPE_NAME in line with our other extensions in this area.
>

Yes, but it should be fixed in the 9.3 enhanced error fields too - the
naming should be consistent with the other PostgreSQL fields.


>
>> >> 2013/2/1 Pavel Stehule :
>> >> > 2013/2/1 Peter Eisentraut :
>> >> >> On 2/1/13 8:00 AM, Pavel Stehule wrote:
>> >> >>> 2013/2/1 Marko Tiikkaja :
>> >>  Is there a compelling reason why we wouldn't provide these already in
>> >>  9.3?
>> >> >>>
>> >> >>> a time for assign to last commitfest is out.
>> >> >>>
>> >> >>> this patch is relative simple and really close to enhanced error
>> >> >>> fields feature - but depends if some from commiters will have a time
>> >> >>> for commit to 9.3 - so I am expecting primary target 9.4, but I am not
>> >> >>> be angry if it will be commited early.
>> >> >>
>> >> >> If we don't have access to those fields on PL/pgSQL, what was the point
>> >> >> of the patch to begin with?  Surely, accessing them from C wasn't the
>> >> >> main use case?
>> >> >>
>> >> >
>> >> > These fields are available for application developers now. But is a
>> >> > true, so without this patch, GET STACKED DIAGNOSTICS statement will
>> >> > not be fully consistent, because some fields are accessible and others
>> >> > not
>
> I am inclined to back-patch this to 9.3.  The patch is fairly mechanical, and
> I think the risk of introducing bugs is less than the risk that folks will be
> confused by these new-in-9.3 error fields being accessible from libpq and the
> protocol, yet inaccessible from PL/pgSQL.
>

+1

>
> The existing protocol documentation says things like this:
>
> Table name: if the error was associated with a specific table, the
> name of the table.  (When this field is present, the schema name field
> provides the name of the table's schema.)
>
> The way you have defined RAISE does not enforce this; the user can set
> TABLE_NAME without setting SCHEMA_NAME at all.  I see a few options here:
>
> 1. Change RAISE to check the invariants thoroughly.  For example, TABLE_NAME
> would require SCHEMA name, and the pair would need to name an actual table.
>
> 2. Change RAISE to check the invariants simply.  For example, it could check
> that SCHEMA_NAME is present whenever TABLE_NAME is present but provide no
> validation that the pair names an actual table.  (I think the protocol
> language basically allows this, though a brief note wouldn't hurt.)
>
> 3. Tweak the protocol documentation to clearly permit what the patch has RAISE
> allow, namely the free-form use of these fields.  This part of the protocol is
> new in 9.3, so it won't be a big deal to change it.  The libpq documentation
> has similar verbiage to update.
>
> I suppose I prefer #3.  It seems fair for user code to employ these fields for
> applications slightly broader than their core use, like a constraint name that
> represents some userspace notion of a constraint rather than normal, cataloged
> constraint.  I can also imagine an application like replaying logs from
> another server, recreating the messages as that server emitted them; #2 or #3
> would suffice for that.
>

I like #3 too. These fields should be usable freely in custom code - I
would not create any limits. A developer can use them in application code
however he likes. They were designed for this purpose.

> Looking beyond the immediate topic of PL/pgSQL, it seems fair for a FDW to use
> these error fields to name remote objects not known locally.  Suppose a
> foreign INSERT fires a remote trigger, and that trigger violates a constraint
> of some other remote table.  Naming the remote objects would be a reasonable
> design choice.  postgres_fdw might have chosen to just copy fields from the
> remote error (it does not do this today for the fields in question, though).
> The FDW might not even have a notion of a schema, at which point it would
> legitimately choose to leave that field unpopulated.  Once we allow any part
> of the system to generate such errors, we should let PL/pgSQL do the same.

+1

Regards

Pavel

>
>
> Thoughts on that plan?
>
> --
> Noah Misch
> EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-27 Thread Sawada Masahiko
On Wed, Jun 26, 2013 at 1:40 PM, Amit Kapila  wrote:
> On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote:
>> Hi,
>>
>> >
>> >> So our proposal on this problem is that we must ensure that master
>> should
>> > not make any file system level changes without confirming that the
>> >> corresponding WAL record is replicated to the standby.
>> >
>> >   How will you take care of extra WAL on old master during recovery.
>> If it
>> > plays the WAL which has not reached new-master, it can be a problem.
>> >
>>
>> I am trying to understand how there would be extra WAL on old master
>> that it would replay and cause inconsistency. Consider how I am
>> picturing it and correct me if I am wrong.
>>
>> 1) Master crashes. So a failback standby becomes new master forking the
>> WAL.
>> 2) Old master is restarted as a standby (now with this patch, without
>> a new base backup).
>> 3) It would try to replay all the WAL it has available and later
>> connect to the new master also following the timeline switch (the
>> switch might happen using archived WAL and timeline history file OR
>> the new switch-over-streaming-replication-connection as of 9.3,
>> right?)
>>
>> * in (3), when the new standby/old master is replaying WAL, from where
>> is it picking the WAL?
> Yes, this is the point which can lead to inconsistency. The new
> standby/old master will replay WAL after the last successful checkpoint,
> for which it gets info from the control file. It is picking WAL from the
> location where it was logged when it was active (pg_xlog).
>
>> Does it first replay all the WAL in pg_xlog
>> before archive? Should we make it check for a timeline history file in
>> archive before it starts replaying any WAL?
>
> I have really not thought what is best solution for problem.
>
>> * And, would the new master, before forking the WAL, replay all the
>> WAL that is necessary to come to state (of data directory) that the
>> old master was just before it crashed?
>
> I don't think the new master has any correlation with the old master's
> data directory. Rather, it will replay the WAL it has received/flushed
> before it starts acting as master.
When the old master fails over, WAL that is ahead of the new master might
contain broken data, so a dump taken from the old master could fail.
It is just an idea, but we could extend the parameters used in
recovery.conf with something like 'follow_master_force'. This parameter
accepts 'on' and 'off', and is effective only when standby_mode is set to
on.

If both 'follow_master_force' and 'standby_mode' are set to 'on':
1. When the standby server starts recovery, it skips applying the WAL in
its pg_xlog and instead requests WAL from its latest checkpoint LSN from
the master server.
2. The master receives the standby's latest-checkpoint LSN and compares it
with the LSN of its own latest checkpoint. If those LSNs match, the master
sends WAL starting from that LSN; if not, the master informs the standby
that the request failed.
3. The standby forks its WAL and continuously applies the WAL sent from
the master.

In this approach, a user who wants a dump from the old master sets
follow_master_force and standby_mode to 'off' and takes the dump after the
old master has started. OTOH, a user who wants to force replication to
resume sets both parameters to 'on'.
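
Expressed as a configuration sketch (follow_master_force is the
hypothetical parameter proposed above, not an existing setting):

    # recovery.conf (hypothetical)
    standby_mode = 'on'
    follow_master_force = 'on'   # proposed parameter: skip the local pg_xlog
                                 # and request WAL from the master starting at
                                 # the latest-checkpoint LSN
    primary_conninfo = 'host=new-master port=5432 user=replicator'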

Please give me feedback.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move unused buffers to freelist

2013-06-27 Thread Amit Kapila
On Thursday, June 27, 2013 5:54 PM Robert Haas wrote:
> On Wed, Jun 26, 2013 at 8:09 AM, Amit Kapila  wrote:
> > Configuration Details
> > O/S - Suse-11
> > RAM - 128GB
> > Number of Cores - 16
> > Server Conf - checkpoint_segments = 300; checkpoint_timeout = 15 min,
> > synchronous_commit = off, shared_buffers = 14GB, autovacuum = off
> > Pgbench - Select-only, Scale factor - 1200, Time - 30 mins
> >
> >           8C-8T    16C-16T    32C-32T    64C-64T
> > Head      62403    101810     99516      94707
> > Patch     62827    101404     99109      94744
> >
> > On 128GB RAM, if we use scale factor = 1200 (database approx. 17GB) and
> > 14GB shared buffers, there is no major difference. One of the reasons
> > could be that there is not much swapping in shared buffers, as most data
> > already fits in shared buffers.
> 
> I'd like to just back up a minute here and talk about the broader
> picture here.  What are we trying to accomplish with this patch?  Last
> year, I did some benchmarking on a big IBM POWER7 machine (16 cores,
> 64 hardware threads).  Here are the results:
> 
> http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-
> ibm.html
> 
> Now, if you look at these results, you see something interesting.
> When there aren't too many concurrent connections, the higher scale
> factors are only modestly slower than the lower scale factors.  But as
> the number of connections increases, the performance continues to rise
> at the lower scale factors, and at the higher scale factors, this
> performance stops rising and in fact drops off.  So in other words,
> there's no huge *performance* problem for a working set larger than
> shared_buffers, but there is a huge *scalability* problem.  Now why is
> that?
> 
> As far as I can tell, the answer is that we've got a scalability
> problem around BufFreelistLock.  Contention on the buffer mapping
> locks may also be a problem, but all of my previous benchmarking (with
> LWLOCK_STATS) suggests that BufFreelistLock is, by far, the elephant
> in the room.  My interest in having the background writer add buffers
> to the free list is basically around solving that problem.  It's a
> pretty dramatic problem, as the graph above shows, and this patch
> doesn't solve it.  There may be corner cases where this patch improves
> things (or, equally, makes them worse) but as a general point, the
> difficulty I've had reproducing your test results and the specificity
> of your instructions for reproducing them suggests to me that what we
> have here is not a clear improvement on general workloads.  Yet such
> an improvement should exist, because there are other products in the
> world that have scalable buffer managers; we currently don't.  Instead
> of spending a lot of time trying to figure out whether there's a small
> win in narrow cases here (and there may well be), I think we should
> back up and ask why this isn't a great big win, and what we'd need to
> do to *get* a great big win.  I don't see much point in tinkering
> around the edges here if things are broken in the middle; things that
> seem like small wins or losses now may turn out otherwise in the face
> of a more comprehensive solution.
> 
> One thing that occurred to me while writing this note is that the
> background writer doesn't have any compelling reason to run on a
> read-only workload.  It will still run at a certain minimum rate, so
> that it cycles the buffer pool every 2 minutes, if I remember
> correctly.  But it won't run anywhere near fast enough to keep up with
> the buffer allocation demands of 8, or 32, or 64 sessions all reading
> data not all of which is in shared_buffers at top speed.  In fact,
> we've had reports that the background writer isn't too effective even
> on read-write workloads.  The point is - if the background writer
> isn't waking up and running frequently enough, what it does when it
> does wake up isn't going to matter very much.  I think we need to
> spend some energy poking at that.

Currently it wakes up based on the bgwriter_delay config parameter, which
is 200ms by default. So you mean we should think of waking up the bgwriter
based on allocations and the number of elements left in the freelist?

As per my understanding, a summarization of the points raised by you and
Andres that this patch should address to have a bigger win:

1. Bgwriter needs to be improved so that it can help in reducing the usage
count and finding the next victim buffer
   (run the clock sweep and add buffers to the free list).
2. SetLatch for bgwriter (wake up bgwriter) when the number of elements in
the freelist drops low.
3. Split the work done under the global lock (BufFreelistLock) in
StrategyGetBuffer
   (a spinlock for the freelist, and an lwlock for the clock sweep).
4. Separate processes for writing dirty buffers and moving buffers to the
freelist.
5. Bgwriter needs to be more aggressive; the logic by which it calculates
how many buffers it needs to process needs to be improved.
6. Th

Re: [HACKERS] changeset generation v5-01 - Patches & git tree

2013-06-27 Thread Kevin Grittner
Andres Freund  wrote:

> Hm. There were some issues with the test_logical_decoding
> Makefile not cleaning up the regression installation properly.
> Which might have caused the issue.
>
> Could you try after applying the patches and executing a clean
> and then rebuild?

Tried, and problem persists.

> Otherwise, could you try applying my git tree so we are sure we
> test the same thing?
>
> $ git remote add af 
> git://git.postgresql.org/git/users/andresfreund/postgres.git
> $ git fetch af
> $ git checkout -b xlog-decoding af/xlog-decoding-rebasing-cf4
> $ ./configure ...
> $ make

Tried that, too, and problem persists.  The log shows the last
commit on your branch as 022c2da1873de2fbc93ae524819932719ca41bdb.

Because you mention possible problems with the regression test
cleanup for test_logical_decoding I also tried:

rm -fr contrib/test_logical_decoding/
git reset --hard HEAD
make world
make check-world

I get the same failure, with primary key or unique index column
showing as 0 in results.

I am off on vacation tomorrow and next week.  Will dig into this
with gdb if not solved when I get back -- unless you have a better
suggestion for how to figure it out.

Once this is solved, I will be working with testing the final
result of all these layers, including creating a second output
plugin.  I want to confirm that multiple plugins play well
together.  I'm glad to see other eyes also on this patch set.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MVCC catalog access

2013-06-27 Thread Alvaro Herrera
Robert Haas escribió:

> All right, so here's a patch that does something along those lines.
> We have to always take a new snapshot when somebody scans a catalog
> that has no syscache, because there won't be any invalidation messages
> to work off of in that case.  The only catalog in that category that's
> accessed during backend startup (which is mostly what your awful test
> case is banging on) is pg_db_role_setting.  We could add a syscache
> for that catalog or somehow force invalidation messages to be sent
> despite the lack of a syscache, but what I chose to do instead is
> refactor things slightly so that we use the same snapshot for all four
> scans of pg_db_role_setting, instead of taking a new one each time.  I
> think that's unimpeachable on correctness grounds; it's no different
> than if we'd finished all four scans in the time it took us to finish
> the first one, and then gotten put to sleep by the OS scheduler for as
> long as it took us to scan the other three.  Point being that there's
> no interlock there.

That seems perfectly acceptable to me, yeah.

> The difference is 3-4%, which is quite a lot less than what you
> measured before, although on different hardware, so results may vary.

3-4% on that synthetic benchmark sounds pretty acceptable to me, as
well.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Documentation/help for materialized and recursive views

2013-06-27 Thread Kevin Grittner
Robert Haas  wrote:
> Magnus Hagander  wrote:

>>>  The functionality of materialized views will (over time) totally swamp
>>>  that of normal views, so mixing all the corresponding documentation
>>>  with the documentation for normal views probably doesn’t make things
>>>  easier for people that are only interested in normal views.
>> 
>>  That's a better point I think. That said, it would be very useful if
>>  it actually showed up in "\h CREATE VIEW" in psql - I wonder if we
>>  should just add the syntax to that page, and then link said future
>>  information on a separate page somehow?
> 
> IMHO, it's better to keep them separate; they are very different beasts.


+1

Although I wonder whether we shouldn't cross-link those pages.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updated emacs configuration

2013-06-27 Thread Bruce Momjian
On Thu, Jun 27, 2013 at 09:54:45PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > Should we be using entab -s 3?
> 
> IIUC, that wouldn't affect leading whitespace at all.  What it would
> affect I think (outside of comment blocks) is whitespace between code
> and a same-line /* ... */ comment.  Personally I'd prefer that a
> tab-stop-aligned /* ... */ comment always have a tab before it, even
> if the expansion of the tab is only *one* space.  That is, in
> 
> foo(bar,/* comment */
> bar1,   /* comment */
> bar12,  /* comment */
> bar123, /* comment */
> baz);   /* comment */
> 
> I think each of those comments ought to have a tab before it, not
> space(s).  pgindent currently does this inconsistently --- the bar123
> line will have a space instead.  Moving to -s 3 would presumably make
> this worse (even less consistent) not better, since now the bar12 line
> would also be emitted with spaces not a tab.

I think you have grasped the issue.  :-)

> Inside a comment, though, probably the behavior of -s 3 would be just
> fine.  So the real problem here IMO is that use of tabs ought to be
> context sensitive (behavior inside a comment different from outside),
> and I don't think entab can do that.  I see though that it understands
> about C quoted strings, so maybe we could teach it about comments too?

Yes, that could be done.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] changeset generation v5-01 - Patches & git tree

2013-06-27 Thread Robert Haas
On Thu, Jun 27, 2013 at 6:18 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> I'm looking at the combined patches 0003-0005, which are essentially all
>> about adding a function to obtain relation OID from (tablespace,
>> filenode).  It takes care to look through the relation mapper, and uses
>> a new syscache underneath for performance.
>
>> One question about this patch, originally, was about the usage of
>> that relfilenode syscache.  It is questionable because it would be the
>> only syscache to apply on top of a non-unique index.
>
> ... which, I assume, is on top of a pg_class index that doesn't exist
> today.  Exactly what is the argument that says performance of this
> function is sufficiently critical to justify adding both the maintenance
> overhead of a new pg_class index, *and* a broken-by-design syscache?
>
> Lose the cache and this probably gets a lot easier to justify.  As is,
> I think I'd vote to reject altogether.

I already voted that way, and nothing's happened since to change my mind.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updated emacs configuration

2013-06-27 Thread Tom Lane
Bruce Momjian  writes:
> Should we be using entab -s 3?

IIUC, that wouldn't affect leading whitespace at all.  What it would
affect I think (outside of comment blocks) is whitespace between code
and a same-line /* ... */ comment.  Personally I'd prefer that a
tab-stop-aligned /* ... */ comment always have a tab before it, even
if the expansion of the tab is only *one* space.  That is, in

foo(bar,/* comment */
bar1,   /* comment */
bar12,  /* comment */
bar123, /* comment */
baz);   /* comment */

I think each of those comments ought to have a tab before it, not
space(s).  pgindent currently does this inconsistently --- the bar123
line will have a space instead.  Moving to -s 3 would presumably make
this worse (even less consistent) not better, since now the bar12 line
would also be emitted with spaces not a tab.

Inside a comment, though, probably the behavior of -s 3 would be just
fine.  So the real problem here IMO is that use of tabs ought to be
context sensitive (behavior inside a comment different from outside),
and I don't think entab can do that.  I see though that it understands
about C quoted strings, so maybe we could teach it about comments too?

No idea whether astyle is any smarter about this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updated emacs configuration

2013-06-27 Thread Bruce Momjian
On Thu, Jun 27, 2013 at 05:31:54PM -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
> > Noah Misch wrote:
> >> Note that emacs and pgindent remain at odds over interior tabs in comments.
> >> When pgindent finds a double-space (typically after a sentence) ending at a
> >> tab stop, it replaces the double-space with a tab.  c-fill-paragraph will
> >> convert that tab to a *single* space, and that can be enough to change many
> >> line break positions.
> 
> > We should really stop pgindent from converting those double-spaces to
> > tabs.  Those tabs are later changed to three or four spaces when wording
> > of the comment is changed, and things start looking very odd.
> 
> +1.  That's probably the single most annoying bit of behavior in pgindent.
> Being a two-spaces-after-a-period kind of guy, it might bite me more
> often than other people, but now that somebody else has brought it up...

That might be caused by entab, actually.  I think there are two cases
here:

*  six-space indented line
*  two spaces in text that happen to land on a tab stop

entab doesn't distinguish them, but it certainly could.  In fact, it
already has an option for that:

-s  Minimum spaces needed to replace with a tab (default = 2).

Should we be using entab -s 3?  That would make a six-space indent be a
tab and two spaces, but it would certainly handle the period-space-space
case.  I actually don't remember people complaining about this.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

2013-06-27 Thread Nicholas White
> The result of the current patch using lead

Actually, I think I agree with you (and, FWIW, so does Oracle:
http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm#autoId18).
I've refactored the window function's implementation so that (e.g.) lead(5)
means the 5th non-null value in front of the current row (the previous
implementation returned the last non-null value if the 5th row in front was
null). These semantics are slower, as they require the function to scan
through the tuples, discarding the null ones. I've made the implementation
use a bitmap in the partition context to cache whether or not a given tuple
produces a null. This seems correct (it passes the regression tests), but
as it stores row offsets (which are int64s) I was careful not to use bitmap
methods that use ints to refer to set members. I've added more explanation
in the code's comments. Thanks -


lead-lag-ignore-nulls.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fixing pg_ctl with relative paths

2013-06-27 Thread Fujii Masao
On Fri, Jun 28, 2013 at 12:47 AM, Fujii Masao  wrote:
>
> Another corner case is, for example, pg_ctl -D test1 -o "-D test2", 
> that is, multiple -D specifications appear in the command-line.

The patch handles this case properly. Sorry for the noise.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash partitioning.

2013-06-27 Thread Claudio Freire
On Thu, Jun 27, 2013 at 6:20 PM, Jeff Janes  wrote:
> On Wed, Jun 26, 2013 at 11:14 AM, Claudio Freire  wrote:
>>
>>
>> Now I just have two indices. One that indexes only hot tuples, it's
>> very heavily queried and works blazingly fast, and one that indexes by
>> (hotness, key). I include the hotness value on the query, and still
>> works quite fast enough. Luckily, I know things become cold after an
>> update to mark them cold, so I can do that. I included hotness on the
>> index to cluster updates on the hot part of the index, but I could
>> have just used a regular index and paid a small price on the updates.
>>
>> Indeed, for a while it worked without the hotness, and there was no
>> significant trouble. I later found out that WAL bandwidth was
>> noticeably decreased when I added that hotness column, so I did, helps
>> a bit with replication. Has worked ever since.
>
>
>
> I'm surprised that clustering updates into the hot part of the index,
> without also clustering them it into a hot part of the table heap, works
> well enough to make a difference.  Does clustering in the table just come
> naturally under your usage patterns?

Yes, hotness is highly correlated to age, though not 100%. So most
updates hit the tail of the table, about a week or two's worth of it.
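
A minimal SQL sketch of the two-index scheme described above, with
invented table and column names (assumes a boolean 'hot' column):

    CREATE INDEX events_hot_idx ON events (key) WHERE hot;  -- hot tuples only
    CREATE INDEX events_all_idx ON events (hot, key);       -- (hotness, key)

    -- Queries supply the hotness value so the small partial index is used:
    SELECT * FROM events WHERE hot AND key = 42;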


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MemoryContextAllocHuge(): selectively bypassing MaxAllocSize

2013-06-27 Thread Jeff Janes
On Sat, Jun 22, 2013 at 12:46 AM, Stephen Frost  wrote:

> Noah,
>
> * Noah Misch (n...@leadboat.com) wrote:
> > This patch introduces MemoryContextAllocHuge() and repalloc_huge() that
> check
> > a higher MaxAllocHugeSize limit of SIZE_MAX/2.
>
> Nice!  I've complained about this limit a few different times and just
> never got around to addressing it.
>
> > This was made easier by tuplesort growth algorithm improvements in commit
> > 8ae35e91807508872cabd3b0e8db35fc78e194ac.  The problem has come up before
> > (TODO item "Allow sorts to use more available memory"), and Tom floated
> the
> > idea[1] behind the approach I've used.  The next limit faced by sorts is
> > INT_MAX concurrent tuples in memory, which limits helpful work_mem to
> about
> > 150 GiB when sorting int4.
>
> That's frustratingly small. :(
>

I've added a ToDo item to remove that limit from sorts as well.

I was going to add another item to make nodeHash.c use the new huge
allocator, but after looking at it just now it was not clear to me that it
even has such a limitation.  nbatch is limited by MaxAllocSize, but
nbuckets doesn't seem to be.
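
For reference, a minimal sketch of how backend code might use the new
entry points, with the signatures the patch introduces (error handling
omitted; the function name is invented for illustration):

    #include "postgres.h"
    #include "utils/memutils.h"

    /*
     * Allocate an array larger than the old 1 GB MaxAllocSize cap.
     * palloc()/MemoryContextAlloc() would ereport an error for such a
     * request; the _huge variants allow up to MaxAllocHugeSize.
     */
    static void *
    alloc_big_array(Size nbytes)
    {
        void *ptr = MemoryContextAllocHuge(CurrentMemoryContext, nbytes);

        /* growing the allocation later works the same way */
        return repalloc_huge(ptr, nbytes * 2);
    }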

Cheers,

Jeff


Re: [HACKERS] MD5 aggregate

2013-06-27 Thread Dean Rasheed
On 27 June 2013 17:47, Peter Eisentraut  wrote:
> On 6/27/13 4:19 AM, Dean Rasheed wrote:
>> I'd say there are clearly people who want it, and the nature of some
>> of those answers suggests to me that we ought to have a better answer
>> in core.
>
> It's not clear what these people wanted this functionality for.  They
> all wanted to analyze a table to compare with another table (or the same
> table later).  Either they wanted this to detect data changes, in which
> case the right tool is a checksum, not a cryptographic hash.  We already
> have several checksum implementations in core, so we could expose one of
> them.  Or they wanted this to protect their data from tampering, in
> which case the right tool is a cryptographic hash, but Noah argues that
> a sum of MD5 hashes is not cryptographically sound.  (And in any case,
> we don't put cryptographic functionality into the core.)
>
> The reason md5_agg is proposed here and in all those cited posts is
> presumably because the md5() function was already there anyway.  The
> md5() function is there because the md5 code was already there anyway
> because of the authentication.  Let's not add higher-order
> already-there-anyway code. ;-)
>

OK fair enough. It's looking like there are more people who don't want
md5_agg() in core, or want something different, than who do want it.
Also, if we're taking the view that the existing md5() function is
only for hashing passwords, then it's probably not worth trying to
optimise it.

At this stage it's probably best to mark this as returned with
feedback, and I'll consider the options for rewriting it, but not
during this commitfest.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add more regression tests for dbcommands

2013-06-27 Thread Kevin Grittner
Peter Eisentraut  wrote:

> On 6/27/13 10:20 AM, Robert Haas wrote:
>>  So I'd like to endorse Josh's idea: subject to appropriate review,
>>  let's add these test cases.  Then, if it really turns out to be too
>>  burdensome, we can take them out, or figure out a sensible way to
>>  split the suite.  Pushing all of Robins work into a secondary suite,
>>  or throwing it out altogether, both seem to me to be things which will
>>  not be to the long-term benefit of the project.
> 
> I agree.  If it gets too big, let's deal with it then.  But let's not
> make things more complicated because they *might* get too big later.

+1


--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: enable new error fields in plpgsql (9.4)

2013-06-27 Thread Noah Misch
On Tue, Jun 25, 2013 at 03:56:27PM +0200, Pavel Stehule wrote:
> cleaned patch is in attachment

Of the five options you're adding to GET STACKED DIAGNOSTICS, four of them
appear in the SQL standard.  DATATYPE_NAME does not; I think we should call it
PG_DATATYPE_NAME in line with our other extensions in this area.


> >> 2013/2/1 Pavel Stehule :
> >> > 2013/2/1 Peter Eisentraut :
> >> >> On 2/1/13 8:00 AM, Pavel Stehule wrote:
> >> >>> 2013/2/1 Marko Tiikkaja :
> >>  Is there a compelling reason why we wouldn't provide these already in
> >>  9.3?
> >> >>>
> >> >>> a time for assign to last commitfest is out.
> >> >>>
> >> >>> this patch is relative simple and really close to enhanced error
> >> >>> fields feature - but depends if some from commiters will have a time
> >> >>> for commit to 9.3 - so I am expecting primary target 9.4, but I am not
> >> >>> be angry if it will be commited early.
> >> >>
> >> >> If we don't have access to those fields on PL/pgSQL, what was the point
> >> >> of the patch to begin with?  Surely, accessing them from C wasn't the
> >> >> main use case?
> >> >>
> >> >
> >> > These fields are available for application developers now. But is a
> >> > true, so without this patch, GET STACKED DIAGNOSTICS statement will
> >> > not be fully consistent, because some fields are accessible and others
> >> > not

I am inclined to back-patch this to 9.3.  The patch is fairly mechanical, and
I think the risk of introducing bugs is less than the risk that folks will be
confused by these new-in-9.3 error fields being accessible from libpq and the
protocol, yet inaccessible from PL/pgSQL.


The existing protocol documentation says things like this:

Table name: if the error was associated with a specific table, the
name of the table.  (When this field is present, the schema name field
provides the name of the table's schema.)

The way you have defined RAISE does not enforce this; the user can set
TABLE_NAME without setting SCHEMA_NAME at all.  I see a few options here:

1. Change RAISE to check the invariants thoroughly.  For example, TABLE_NAME
would require SCHEMA name, and the pair would need to name an actual table.

2. Change RAISE to check the invariants simply.  For example, it could check
that SCHEMA_NAME is present whenever TABLE_NAME is present but provide no
validation that the pair names an actual table.  (I think the protocol
language basically allows this, though a brief note wouldn't hurt.)

3. Tweak the protocol documentation to clearly permit what the patch has RAISE
allow, namely the free-form use of these fields.  This part of the protocol is
new in 9.3, so it won't be a big deal to change it.  The libpq documentation
has similar verbiage to update.

I suppose I prefer #3.  It seems fair for user code to employ these fields for
applications slightly broader than their core use, like a constraint name that
represents some userspace notion of a constraint rather than normal, cataloged
constraint.  I can also imagine an application like replaying logs from
another server, recreating the messages as that server emitted them; #2 or #3
would suffice for that.
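
To make the free-form option concrete, a hedged PL/pgSQL sketch of what #3
would permit, using the RAISE and GET STACKED DIAGNOSTICS syntax from the
patch under discussion (the table name is invented):

    DO $$
    DECLARE
        v_table  text;
        v_schema text;
    BEGIN
        -- A userspace notion of a table; SCHEMA is deliberately left
        -- unset, which option #3 would permit.
        RAISE EXCEPTION 'userspace constraint violated'
            USING TABLE = 'orders';
    EXCEPTION WHEN OTHERS THEN
        GET STACKED DIAGNOSTICS
            v_table  = TABLE_NAME,
            v_schema = SCHEMA_NAME;  -- empty string here, and that's OK
        RAISE NOTICE 'table=%, schema=%', v_table, v_schema;
    END $$;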

Looking beyond the immediate topic of PL/pgSQL, it seems fair for a FDW to use
these error fields to name remote objects not known locally.  Suppose a
foreign INSERT fires a remote trigger, and that trigger violates a constraint
of some other remote table.  Naming the remote objects would be a reasonable
design choice.  postgres_fdw might have chosen to just copy fields from the
remote error (it does not do this today for the fields in question, though).
The FDW might not even have a notion of a schema, at which point it would
legitimately choose to leave that field unpopulated.  Once we allow any part
of the system to generate such errors, we should let PL/pgSQL do the same.


Thoughts on that plan?

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group Commits Vs WAL Writes

2013-06-27 Thread Jeff Janes
On Thu, Jun 27, 2013 at 9:51 AM, Atri Sharma  wrote:

> >
> > commit_delay exists to artificially increase the window in which the
> > leader backend waits for more group commit followers. At higher client
> > counts, that isn't terribly useful because you'll naturally have
> > enough clients anyway, but at lower client counts particularly where
> > fsyncs have high latency, it can help quite a bit. I mention this
> > because clearly commit_delay is intended to trade off latency for
> > throughput. Although having said that, when I worked on commit_delay,
> > the average and worse-case latencies actually *improved* for the
> > workload in question, which consisted of lots of small write
> > transactions. Though I wouldn't be surprised if you could produce a
> > reasonable case where latency was hurt a bit, but throughput improved.
>

Throughput and average latency are strictly reciprocal, aren't they? (With
a fixed number of clients, throughput = clients / average latency.) I
think when people talk about improving latency, they must mean something
like "improve 95th-percentile latency", not average latency. Otherwise, it
doesn't seem to make much sense to me; they are the same thing.


>
> Thanks for your reply.
>
> The logic says that latency will be hit when commit_delay is applied,
> but I am really interested in why we get an improvement instead.
>

There is a spot on the disk to which the current WAL is destined to go.
 That spot on the disk is not going to be under the write-head for (say)
another 6 milliseconds.

Without commit_delay, I try to commit my record, but find that someone else
is already on the lock (and on the fsync as well).  I have to wait for 6
milliseconds before that person gets their commit done and releases the
lock, then I can start mine, and have to wait another 8 milliseconds (7500
rpm disk) for the spot to come around again, for a total of 14 milliseconds
of latency.

With commit_delay, I get my record in under the nose of the person who is
already doing the delay, and they wake up and flush it for me in time to
make the 6 millisecond cutoff.  Total 6 milliseconds latency for me.
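
For reference, the relevant postgresql.conf settings; the values below are
illustrative only, since useful settings depend on the storage's fsync
latency and the workload:

    commit_delay = 4000      # microseconds the group-commit leader waits
                             # for followers before flushing
    commit_siblings = 5      # only delay if at least this many other
                             # transactions are active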

One thing I tried a while ago (before the recent group-commit changes were
made) was to record in shared memory when the last fsync finished, and then
the next time someone needed to fsync, they would sleep until just before
the write spot was predicted to be under the write head again
(previous_finish + rotation_time - safety_margin, where rotation_time -
safety_margin were represented by a single guc).  It worked pretty well on
the system in which I wrote it, but seemed too brittle to be a general
solution.

Another thing I tried was to drop the WALWriteLock after the WAL write
finished but before calling fsync.  The theory was that process 1 could
write its WAL and then block on the fsync, and then process 2 could also
write its WAL and also block directly on the fsync, and the kernel/disk
controller would be smart enough to realize that it could merge the two
pending fsync requests into one.  This did not work at all, possibly
because my disk controller was very cheap and not very smart.

Cheers,

Jeff


Re: [HACKERS] updated emacs configuration

2013-06-27 Thread Alvaro Herrera
Tom Lane wrote:
> Andres Freund  writes:
> 
> > Being at least one of the persons having mentioned astyle to Alvaro, I
> > had tested that once and I thought the results were resembling something
> > reasonable after an hour of fiddling or so. But there were certain
> > things that I could not make it do during that. The only thing I
> > remember now was reducing the indentation of parameters to the left if
> > the line length got too long. Now, I personally think that's an
> > anti-feature, but I am not sure if others think differently.
> 
> I never particularly cared for that behavior either.  It probably made
> sense back in the video-terminal days, when your view of a program was
> 80 columns period.

I've never liked that either; I am a fan of keeping things to 80
columns, but when things get longer I prefer my editor to wrap them to
the next line without the silly de-indent (or not wrap, if I tell it not
to.)

Another benefit of more modern tools is that there's no need for a
typedef file, which is great when you're trying to indent after a patch
which adds some more typedefs that are not listed in the file.

> These days I think most people can use a wider
> window at need --- not that I want to adopt wider lines as standard, but
> the readability tradeoff between not having lines wrap versus messing up
> the indentation seems like it's probably different now.

Agreed.  I would be sad if we adopted a policy of sloppiness on width.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] changeset generation v5-01 - Patches & git tree

2013-06-27 Thread Tom Lane
Alvaro Herrera  writes:
> I'm looking at the combined patches 0003-0005, which are essentially all
> about adding a function to obtain relation OID from (tablespace,
> filenode).  It takes care to look through the relation mapper, and uses
> a new syscache underneath for performance.

> One question about this patch, originally, was about the usage of
> that relfilenode syscache.  It is questionable because it would be the
> only syscache to apply on top of a non-unique index.

... which, I assume, is on top of a pg_class index that doesn't exist
today.  Exactly what is the argument that says performance of this
function is sufficiently critical to justify adding both the maintenance
overhead of a new pg_class index, *and* a broken-by-design syscache?

Lose the cache and this probably gets a lot easier to justify.  As is,
I think I'd vote to reject altogether.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] changeset generation v5-01 - Patches & git tree

2013-06-27 Thread Alvaro Herrera
Andres Freund wrote:

> On 2013-06-27 17:33:04 -0400, Alvaro Herrera wrote:
> > One question about this patch, originally, was about the usage of
> > that relfilenode syscache.  It is questionable because it would be the
> > only syscache to apply on top of a non-unique index.  It is said that
> > this doesn't matter because the only non-unique values that can exist
> > would reference entries that have relfilenode = 0; and in turn this
> > doesn't matter because those values would be queried through the
> > relation mapper anyway, not from the syscache.  (This is implemented in
> > the higher-level function.)
> 
> Well, you can even query the syscache without harm for mapped relations,
> you just won't get an answer. The only thing you may not do because it
> would yield multiple results is to query the syscache with
> (tablespace, InvalidOid/0). Which is still not nice although it doesn't
> make much sense to query with InvalidOid.

Yeah, I agree that it doesn't make sense to query for that.  The problem
is that something could reasonably be developed that uses the syscache
directly without checking whether the relfilenode is 0.

> > (I would much prefer for there to be a way to define partial
> > indexes in BKI.)
> 
> I don't think that's the hard part, it's that we don't use the full
> machinery for updating indexes but rather the relatively simplistic
> CatalogUpdateIndexes(). I am not sure we can guarantee that the required
> infrastructure is available in all the cases to support doing generic
> predicate evaluation.

You're right, CatalogIndexInsert() doesn't allow for predicates, so
fixing BKI would not help.

I still wonder about having a separate cache.  Right now pg_class has
two indexes; adding this new one would mean a rather large decrease in
insert performance (50% more indexes to update than previously), which
is not good considering that it's inserted into for each and every temp
table creation -- that would become slower.  This would be a net loss
for every user, even those that don't want logical replication.  On the
other hand, table creation also has to add tuples to pg_attribute,
pg_depend, pg_shdepend and maybe other catalogs, so perhaps the
difference is negligible.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updated emacs configuration

2013-06-27 Thread Tom Lane
Andres Freund  writes:
> I think before doing any serious testing we would need to lay out how
> many changes and what changes in formatting we would allow and what kind
> of enforced formatting rules we think are required.

Well, we certainly never applied any such analysis to pgindent.  I
personally don't mind leaving corner cases to the judgment of the tool
author ... of course, what's a corner case and what's important may be
in the eye of the beholder.  But it's surely a bug, for instance, that
pgindent is so clueless about function pointers.

> Being at least one of the persons having mentioned astyle to Alvaro, I
> had tested that once and I thought the results were resembling something
> reasonable after an hour of fiddling or so. But there were certain
> things that I could not be make it do during that. The only thing I
> remember now was reducing the indentation of parameters to the left if
> the line length got to long. Now, I personally think that's an
> anti-feature, but I am not sure if others think differently.

I never particularly cared for that behavior either.  It probably made
sense back in the video-terminal days, when your view of a program was
80 columns period.  These days I think most people can use a wider
window at need --- not that I want to adopt wider lines as standard, but
the readability tradeoff between not having lines wrap versus messing up
the indentation seems like it's probably different now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] changeset generation v5-01 - Patches & git tree

2013-06-27 Thread Andres Freund
Hi,

On 2013-06-27 17:33:04 -0400, Alvaro Herrera wrote:
> One question about this patch, originally, was about the usage of
> that relfilenode syscache.  It is questionable because it would be the
> only syscache to apply on top of a non-unique index.  It is said that
> this doesn't matter because the only non-unique values that can exist
> would reference entries that have relfilenode = 0; and in turn this
> doesn't matter because those values would be queried through the
> relation mapper anyway, not from the syscache.  (This is implemented in
> the higher-level function.)

Well, you can even query the syscache for mapped relations without harm;
you just won't get an answer. The only thing you may not do, because it
would yield multiple results, is query the syscache with
(tablespace, InvalidOid/0). That is still not nice, although it doesn't
make much sense to query with InvalidOid in the first place.
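To make the lookup order concrete, here is a minimal sketch of such a
higher-level function; the helper names (RelationMapFilenodeToOid, the
RELFILENODE cache identifier) are illustrative assumptions and may not
match the patch's actual identifiers:

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_tablespace.h"
#include "utils/relmapper.h"
#include "utils/syscache.h"

/* Sketch only: resolve (reltablespace, relfilenode) to a relation OID,
 * consulting the relation mapper before the (assumed) RELFILENODE
 * syscache, since mapped relations store relfilenode = 0 in pg_class. */
Oid
LookupRelidByRelfilenode(Oid reltablespace, Oid relfilenode)
{
	Oid			relid;
	HeapTuple	tuple;

	/* Mapped relations can never be found via the syscache; checking the
	 * mapper first also keeps the ambiguous (tablespace, 0) key away
	 * from the cache entirely. */
	relid = RelationMapFilenodeToOid(relfilenode,
									 reltablespace == GLOBALTABLESPACE_OID);
	if (OidIsValid(relid))
		return relid;

	/* Not mapped: the (reltablespace, relfilenode) pair is unique. */
	tuple = SearchSysCache2(RELFILENODE,
							ObjectIdGetDatum(reltablespace),
							ObjectIdGetDatum(relfilenode));
	if (!HeapTupleIsValid(tuple))
		return InvalidOid;

	relid = HeapTupleGetOid(tuple);
	ReleaseSysCache(tuple);
	return relid;
}

Keeping the mapper check first is what makes the non-unique
relfilenode = 0 entries harmless in practice.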

> I'm not sure about the placing of the new SQL-callable function in
> dbsize.c either.  It is certainly not a function that has anything to do
> with object sizes.

Not happy with that myself. I only placed the function there because
pg_relation_filenode() already was in it. Happy to change if somebody
has a good idea.

> (I would much prefer for there to be a way to define partial
> indexes in BKI.)

I don't think that's the hard part, it's that we don't use the full
machinery for updating indexes but rather the relatively simplistic
CatalogUpdateIndexes(). I am not sure we can guarantee that the required
infrastructure is available in all the cases to support doing generic
predicate evaluation.

If BKI really were the problem, we probably could create the index after
BKI-based bootstrapping has finished.

Thanks,

Andres Freund
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hash partitioning.

2013-06-27 Thread Markus Wanner
On 06/27/2013 11:13 PM, Jeff Janes wrote:
> Wouldn't any IO system being used on a high-end system be fairly good
> about making this work through interleaved read-ahead algorithms?

To some extent, certainly. It cannot possibly get better than a fully
sequential load, though.

> That sounds like it would be much more susceptible to lock contention,
> and harder to get bug-free, than dividing into bigger chunks, like whole
> 1 gig segments.  

Maybe, yes. Splitting a known amount of work into equal pieces sounds
like a pretty easy parallelization strategy. In case you don't know the
total amount of work or the size of each piece in advance, it gets a bit
harder. Choosing chunks that turn out to be too big certainly hurts.
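To illustrate the easy case, here is a toy standalone sketch of carving a
known number of blocks into near-equal contiguous chunks (illustrative
only, not backend code; the type and function names are invented):

#include <stdio.h>

typedef struct
{
	unsigned int start_block;	/* first block this worker scans */
	unsigned int nblocks;		/* how many blocks it scans */
} ScanChunk;

/* Divide a known total into nworkers contiguous, near-equal ranges. */
static void
assign_chunks(ScanChunk *chunks, unsigned int total_blocks, int nworkers)
{
	unsigned int base = total_blocks / nworkers;
	unsigned int extra = total_blocks % nworkers;
	unsigned int next = 0;

	for (int i = 0; i < nworkers; i++)
	{
		chunks[i].start_block = next;
		/* spread the remainder over the first "extra" workers */
		chunks[i].nblocks = base + ((unsigned int) i < extra ? 1 : 0);
		next += chunks[i].nblocks;
	}
}

int main(void)
{
	ScanChunk	chunks[4];

	assign_chunks(chunks, 1000003, 4);	/* an uneven total still balances */
	for (int i = 0; i < 4; i++)
		printf("worker %d: blocks %u..%u\n", i, chunks[i].start_block,
			   chunks[i].start_block + chunks[i].nblocks - 1);
	return 0;
}

The hard part Markus mentions is exactly what this sketch assumes away:
total_blocks being known, and every block costing about the same.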

Regards

Markus Wanner




Re: [HACKERS] updated emacs configuration

2013-06-27 Thread Andres Freund
On 2013-06-27 17:31:54 -0400, Tom Lane wrote:
> > Really, we should get out of patched BSD indent entirely already.  How
> > about astyle, for instance?  I'm told that with some sensible options,
> > the diff to what we have now is not very large, and several things
> > actually become sensible (such as function pointer decls, which are
> > messed up rather badly by pgindent)
> 
> AFAIR, no one has ever done a serious comparison to anything except GNU
> indent, and (at least at the time) it seemed to have bugs as bad as
> pgindent's, just different ones.  I'm certainly open to another choice
> as long as we don't lose on portability of the tool.  But who will do
> the legwork to test something else?

I think before doing any serious testing we would need to lay out how
many changes and what changes in formatting we would allow and what kind
of enforced formatting rules we think are required.

Being at least one of the persons who mentioned astyle to Alvaro, I
had tested it once, and I thought the results resembled something
reasonable after an hour of fiddling or so. But there were certain
things that I could not make it do during that. The only one I
remember now is that it reduces the indentation of parameters to the
left if the line length gets too long. Now, I personally think that's
an anti-feature, but I am not sure if others think differently.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hash partitioning.

2013-06-27 Thread Jeff Janes
On Thu, Jun 27, 2013 at 9:35 AM, Nicolas Barbier
wrote:

>
> My reasoning was: To determine which index block to update (typically
> one in both the partitioned and non-partitioned cases), one needs to
> walk the index first, which supposedly causes one additional (read)
> I/O in the non-partitioned case on average, because there is one extra
> level and the lower part of the index is not cached (because of the
> size of the index).


But the "extra level" is up at the top where it is well cached, not at the
bottom where it is not.

Cheers,

Jeff


Re: [HACKERS] changeset generation v5-01 - Patches & git tree

2013-06-27 Thread Alvaro Herrera
I'm looking at the combined patches 0003-0005, which are essentially all
about adding a function to obtain relation OID from (tablespace,
filenode).  It takes care to look through the relation mapper, and uses
a new syscache underneath for performance.

One question about this patch, originally, was about the usage of
that relfilenode syscache.  It is questionable because it would be the
only syscache to apply on top of a non-unique index.  It is said that
this doesn't matter because the only non-unique values that can exist
would reference entries that have relfilenode = 0; and in turn this
doesn't matter because those values would be queried through the
relation mapper anyway, not from the syscache.  (This is implemented in
the higher-level function.)

This means that there would be one syscache that is damn easy to misuse
... and we've set up things so that syscaches are very easy to use in the
first place.  From that perspective, this doesn't look good.  However,
it's an easy mistake to notice and fix, so perhaps this is not a serious
problem.  (I would much prefer for there to be a way to define partial
indexes in BKI.)

I'm not sure about the placing of the new SQL-callable function in
dbsize.c either.  It is certainly not a function that has anything to do
with object sizes.  The insides of it would belong more in lsyscache.c,
I think, except then that file does not otherwise concern itself with
the relation mapper so its scope would have to expand a bit.  But this
is no place for the SQL-callable portion, so that would have to find a
different home as well.

The other option, of course, is to provide a separate caching layer for
these objects altogether, but given how concise this implementation is,
it doesn't sound too palatable.

Thoughts?

-- 
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] updated emacs configuration

2013-06-27 Thread Tom Lane
Alvaro Herrera  writes:
> Noah Misch wrote:
>> Note that emacs and pgindent remain at odds over interior tabs in comments.
>> When pgindent finds a double-space (typically after a sentence) ending at a
>> tab stop, it replaces the double-space with a tab.  c-fill-paragraph will
>> convert that tab to a *single* space, and that can be enough to change many
>> line break positions.

> We should really stop pgindent from converting those double-spaces to
> tabs.  Those tabs are later changed to three or four spaces when wording
> of the comment is changed, and things start looking very odd.

+1.  That's probably the single most annoying bit of behavior in pgindent.
Being a two-spaces-after-a-period kind of guy, it might bite me more
often than other people, but now that somebody else has brought it up...

> Really, we should get out of patched BSD indent entirely already.  How
> about astyle, for instance?  I'm told that with some sensible options,
> the diff to what we have now is not very large, and several things
> actually become sensible (such as function pointer decls, which are
> messed up rather badly by pgindent)

AFAIR, no one has ever done a serious comparison to anything except GNU
indent, and (at least at the time) it seemed to have bugs as bad as
pgindent's, just different ones.  I'm certainly open to another choice
as long as we don't lose on portability of the tool.  But who will do
the legwork to test something else?

Probably the principal argument against switching to a different tool
has been that whitespace changes would complicate back-patching, but
I don't see why we couldn't solve that by re-indenting all the live back
branches at the time of the switch.

regards, tom lane




Re: [HACKERS] Hash partitioning.

2013-06-27 Thread Jeff Janes
On Thu, Jun 27, 2013 at 2:12 AM, Nicolas Barbier
wrote:

> 2013/6/26 Heikki Linnakangas :
>
> > On 26.06.2013 16:41, Yuri Levinsky wrote:
> >
> >> Heikki,
> >> As far as I understand the height of the btree will affect the number of
> >> I/Os necessary. The height of the tree does not increase linearly with
> >> the number of records.
> >
> > Now let's compare that with a hash partitioned table, with 1000
> partitions
> > and a b-tree index on every partition. [..] This is almost equivalent to
> > just having a b-tree that's one level taller [..] There certainly isn't
> > any difference in the number of actual I/O performed.
>
> Imagine that there are a lot of indexes, e.g., 50. Although a lookup
> (walking one index) is equally fast, an insertion must update all 50
> indexes. When each index requires one extra I/O (because each index is
> one level taller), that is 50 extra I/Os.


Except for pathological conditions like indexing the longest values that
can be indexed, a btree insertion rarely needs to split even the lowest
internal page, much less all pages up to the root.

...


> Additionally: Imagine that the data can be partitioned along some
> column that makes sense for performance reasons (e.g., some “date”
> where most accesses are concentrated on rows with more recent dates).
> The other indexes will probably not have such a performance
> distribution. Using those other indexes (both for look-ups and
> updates) in the non-partitioned case, will therefore pull a huge
> portion of each index into cache (because of the “random distribution”
> of the non-date data). In the partitioned case, more cache can be
> spent on the indexes that correspond to the “hot partitions.”
>

This could well be true for range partitioning on date.  It is hard to see
how this would work well for hash partitioning on the date, unless you
carefully arrange for the number of hash partitions to be about the same as
the number of distinct dates in the table.

Cheers,

Jeff


Re: [HACKERS] checking variadic "any" argument in parser - should be array

2013-06-27 Thread Pavel Stehule
Hello

2013/6/27 Jeevan Chalke :
> Hi Pavel,
>
> I had a look over your new patch and it looks good to me.
>
> My review comments on patch:
>
> 1. It cleanly applies with patch -p1 command.
> 2. make/make install/make check were smooth.
> 3. My own testing didn't find any issue.
> 4. I did a code walk-through and I am a little bit worried or confused
> about the following changes:
>
>   if (PG_ARGISNULL(argidx))
>   return NULL;
>
> ! /*
> !  * Non-null argument had better be an array.  The parser doesn't
> !  * enforce this for VARIADIC ANY functions (maybe it should?), so that
> !  * check uses ereport not just elog.
> !  */
> ! arr_typid = get_fn_expr_argtype(fcinfo->flinfo, argidx);
> ! if (!OidIsValid(arr_typid))
> ! elog(ERROR, "could not determine data type of concat()
> input");
> !
> ! if (!OidIsValid(get_element_type(arr_typid)))
> ! ereport(ERROR,
> ! (errcode(ERRCODE_DATATYPE_MISMATCH),
> !  errmsg("VARIADIC argument must be an array")));
>
> - /* OK, safe to fetch the array value */
>   arr = PG_GETARG_ARRAYTYPE_P(argidx);
>
>   /*
> --- 3820,3828 
>   if (PG_ARGISNULL(argidx))
>   return NULL;
>
> ! /* Non-null argument had better be an array */
> ! Assert(OidIsValid(get_element_type(get_fn_expr_argtype(fcinfo->flinfo, argidx))));
>
>   arr = PG_GETARG_ARRAYTYPE_P(argidx);
>
> We have moved the checking of the array type into the parser
> (ParseFuncOrColumn()), which basically verifies that the argument type is
> indeed an array. That is exactly the same as the second if statement in the
> earlier code, which you replaced by an Assert.
>
> However, what about the first if statement? Is it no longer required? If
> get_fn_expr_argtype() returns InvalidOid, don't you think we need to throw
> an error saying "could not determine data type of concat() input"?

Yes. If I understand the question correctly, the main difference is the
stage at which the check happens. When I do the check at parser stage, I
can expect the "actual_arg_types" array to hold valid values.

>
> Well, I tried hard to trigger that code, but wasn't able to find any test
> which fails with that error, either in the earlier version or in yours. I
> therefore believe it is dead code, which you removed? Is that so?

It is removed in this version :), and it is not a bug, so there is no
reason to patch previous versions. Probably there should be an Assert
instead of an error. This code is relatively mature, so I don't expect
an issue from the SQL level now. On the other hand, these functions can
be called via the DirectFunctionCall API from custom C server-side
routines, and there a developer can easily introduce a bug by not
filling the necessary structs correctly. So the Asserts can stay.

>
> Moreover, if in any case get_fn_expr_argtype() returns an InvalidOid, we
> will hit an Assert rather than an error; is this what you expect?
>

At this moment, yes.

A small change could help with tracking down the source of possible
issues: instead of the single line

Assert(OidIsValid(get_element_type(get_fn_expr_argtype(fcinfo->flinfo, argidx))));

use two lines

Assert(OidIsValid(get_fn_expr_argtype(fcinfo->flinfo, argidx)));
Assert(OidIsValid(get_element_type(get_fn_expr_argtype(fcinfo->flinfo, argidx))));

What do you think?

> 5. This patch has user visibility, i.e. now we are throwing an error when
> user only says "VARIADIC NULL" like:
>
> select concat(variadic NULL) is NULL;
>
> Previously it was working but now we are throwing an error. Well, we are
> now stricter than before about using VARIADIC + ANY, so I have no issue as
> such. But I guess we need to document this user visibility change. I don't
> know exactly where though. I searched for VARIADIC and all related
> documentation says it needs an array, so nothing harmful as such, so you can
> ignore this review comment but I thought it worth mentioning it.

Yes, it is a point for the possible-issues section of the RELEASE NOTES,
I am thinking.

Regards

Pavel

>
> Thanks
>
>
>
> On Thu, Jun 27, 2013 at 12:35 AM, Pavel Stehule 
> wrote:
>>
>> Hello
>>
>> remastered version
>>
>> Regards
>>
>> Pavel
>>
>> 2013/6/26 Jeevan Chalke :
>> > Hi Pavel
>> >
>> >
>> > On Sat, Jan 26, 2013 at 9:22 AM, Pavel Stehule 
>> > wrote:
>> >>
>> >> Hello Tom
>> >>
>> >> you did comment
>> >>
>> ! 	 * Non-null argument had better be an array.  The parser doesn't
>> ! 	 * enforce this for VARIADIC ANY functions (maybe it should?), so
>> ! 	 * that check uses ereport not just elog.
>> ! 	 */
>> >>
>> >> So I moved this check to parser.
>> >>
>> It is a little bit stricter - it requires a typed NULL instead of an
>> unknown NULL, but it can flag the error better and earlier
>> >
>> >
>> > Tom suggested that this check should be better done by parser.
>> > This patch tries to accomplish t

Re: [HACKERS] Hash partitioning.

2013-06-27 Thread Jeff Janes
On Wed, Jun 26, 2013 at 11:14 AM, Claudio Freire wrote:

>
> Now I just have two indices. One that indexes only hot tuples, it's
> very heavily queried and works blazingly fast, and one that indexes by
> (hotness, key). I include the hotness value on the query, and still
> works quite fast enough. Luckily, I know things become cold after an
> update to mark them cold, so I can do that. I included hotness on the
> index to cluster updates on the hot part of the index, but I could
> have just used a regular index and paid a small price on the updates.
>
> Indeed, for a while it worked without the hotness, and there was no
> significant trouble. I later found out that WAL bandwidth was
> noticeably decreased when I added that hotness column, so I did, helps
> a bit with replication. Has worked ever since.
>


I'm surprised that clustering updates into the hot part of the index,
without also clustering them it into a hot part of the table heap, works
well enough to make a difference.  Does clustering in the table just come
naturally under your usage patterns?

 Cheers,

Jeff


Re: [HACKERS] Hash partitioning.

2013-06-27 Thread Jeff Janes
On Wed, Jun 26, 2013 at 8:55 AM, Markus Wanner  wrote:

> On 06/26/2013 05:46 PM, Heikki Linnakangas wrote:
> > We could also allow a large query to search a single table in parallel.
> > A seqscan would be easy to divide into N equally-sized parts that can be
> > scanned in parallel. It's more difficult for index scans, but even then
> > it might be possible at least in some limited cases.
>
> So far reading sequentially is still faster than hopping between
> different locations. Purely from the I/O perspective, that is.
>


Wouldn't any IO system being used on a high-end system be fairly good about
making this work through interleaved read-ahead algorithms?  Also,
hopefully the planner would be able to predict when parallelization has
nothing to add and avoid using it, although surely that is easier said than
done.


>
> For queries where the single CPU core turns into a bottle-neck and which
> we want to parallelize, we should ideally still do a normal, fully
> sequential scan and only fan out after the scan and distribute the
> incoming pages (or even tuples) to the multiple cores to process.
>

That sounds like it would be much more susceptible to lock contention, and
harder to get bug-free, than dividing into bigger chunks, like whole 1 gig
segments.

Fanning out line by line (according to line_number % number_processes) was
my favorite parallelization method in Perl, but those files were read only
and so had no concurrency issues.

Cheers,

Jeff


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-27 Thread Josh Berkus
On 06/27/2013 11:11 AM, Tom Lane wrote:
> AFAICS, the threshold question here is whether the patch helps usefully
> for tables with typical numbers of columns (ie, not 800 ;-)), and

I wouldn't expect it to help at all for < 50 columns, and probably not
measurably below 200.

> doesn't hurt materially in any common scenario.  If it does, I think

Yeah, I think that's the bigger question.  OK, I'll start working on a
new test case.  Here's my thinking on performance tests:

1. run pgbench 10 times both with and without the patch.  See if there's
any measurable difference.  There should not be.

2. Run with/without comparisons for the following scenarios:

Each table would have a SERIAL pk, a timestamp, and:

Chains of booleans:
# attributes    NULL probability
16              0%   50%   90%
128             0%   50%   90%
512             0%   50%   90%

Chains of text:
# attributes    NULL probability
16              0%   50%   90%
256             0%   50%   90%

... for 100,000 rows each.

Comparisons would include:

1) table size before and after testing
2) time required to read 1000 rows, by key
3) time required to read rows with each of
100 random column positions = some value
4) time required to insert 1000 rows
5) time required to update 1000 rows

Geez, I feel tired just thinking about it.  Jamie, can your team do any
of this?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Greg Sabino Mullane



Josh Berkus wrote:

> I wasn't thinking about doing it every year -- just for 9.3, in order to
> encourage more reviewers, and encourage reviewers to do more reviews.

-1. It's not cool to set it up and then stop it the next go-round.

You want more reviewers? Start by streamlining the process as much as 
possible. I pretended I was new to the project and tried to figure 
out how to review something. The homepage has no mention of reviewers, 
not even if you drill down on some subpages. A Google search does lead 
one to:

http://wiki.postgresql.org/wiki/Reviewing_a_Patch

It has some good "you can do it" wordage. However, there is no clear path 
on how to actually start reviewing. There is this paragraph with 
two links in it:

  "The current commitfest is here[1] and has plenty of room for 
   you to help. You can sign up to become a Round Robin 
   Reviewer here[2]. Once you have, write a mail to the list 
   introducing yourself."

[1] Leads to the commitfest, with a nice summary, but no way for new people 
to know what to do.

[2] This link is even worse (http://www.postgresql.org/list/pgsql-rrreviewers/)
It's an archive list for pgsql-rrreviewers, with no way to subscribe 
and certainly no indication on it or the previous page that "sign up" 
means (one might guess) join the mailing list.

Anyway, just food for thought as far as attracting new people. It should 
be much easier and more intuitive. As far as "rewarding" current reviewers, 
put the names in the release notes, after each item. Full stop.

-- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201306271636
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8






Re: [HACKERS] updated emacs configuration

2013-06-27 Thread Bruce Momjian
On Thu, Jun 27, 2013 at 03:51:15PM -0400, Alvaro Herrera wrote:
> Noah Misch wrote:
> 
> > Note that emacs and pgindent remain at odds over interior tabs in comments.
> > When pgindent finds a double-space (typically after a sentence) ending at a
> > tab stop, it replaces the double-space with a tab.  c-fill-paragraph will
> > convert that tab to a *single* space, and that can be enough to change many
> > line break positions.
> 
> We should really stop pgindent from converting those double-spaces to
> tabs.  Those tabs are later changed to three or four spaces when wording
> of the comment is changed, and things start looking very odd.
> 
> Really, we should get out of patched BSD indent entirely already.  How
> about astyle, for instance?  I'm told that with some sensible options,
> the diff to what we have now is not very large, and several things
> actually become sensible (such as function pointer decls, which are
> messed up rather badly by pgindent)

Sounds good to me.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] ASYNC Privileges proposal

2013-06-27 Thread Josh Berkus
On 06/27/2013 02:49 AM, Chris Farmiloe wrote:
> So I would think that if this was to go further then "channels" would need
> to be more of a first class citizen and created explicitly, with CREATE
> CHANNEL, DROP CHANNEL etc:
> 
> CREATE CHANNEL channame;
> GRANT LISTEN ON CHANNEL channame TO rolename;
> GRANT NOTIFY ON CHANNEL channame TO rolename;
> LISTEN channame; -- OK
> NOTIFY channame, 'hi'; -- OK
> LISTEN unknown_chan; -- exception: no channel named "unknown_chan"
> NOTIFY unknown_chan, 'hi'; -- exception: no channel named "unknown_chan"
> 
> Personally I think explicitly creating channels would be beneficial; I have
> hit issues where a typo in a channel name has caused a bug to go unnoticed
> for a while.
> But of course this would break backwards compatibility with the current
> model (with implicit channel names) so unless we wanted to force everyone
> to add "CREATE CHANNEL" statements during their upgrade then, maybe there
> would need to be some kind of system to workaround this

I agree, but that would make this a MUCH bigger patch than it is now.
So the question is whether you're willing to go there.

If nobody wants to work on that, I don't find it unreasonable to have
some permissions on LISTEN/NOTIFY period.  However, I'd like to see
separate permissions on LISTEN and NOTIFY; I can easily imagine wanting
to grant a user one without the other.

Also, someone would need to do performance tests to see how much the
permissions check affects performance.
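To make the scope concrete, a purely hypothetical sketch follows; the
pg_channel catalog, the ACL_LISTEN/ACL_NOTIFY bits, and both helper
functions are invented for illustration and do not exist anywhere:

/* Hypothetical sketch: separate LISTEN and NOTIFY privileges on an
 * explicitly created channel. */
static void
check_channel_privilege(const char *channame, AclMode mode)
{
	Oid			chanoid = get_channel_oid(channame);	/* hypothetical */

	if (!OidIsValid(chanoid))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_OBJECT),
				 errmsg("no channel named \"%s\"", channame)));

	if (pg_channel_aclcheck(chanoid, GetUserId(), mode) != ACLCHECK_OK)
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("permission denied for channel \"%s\"", channame)));
}

/* Async_Listen() would call check_channel_privilege(name, ACL_LISTEN),
 * and Async_Notify() would call it with ACL_NOTIFY; the cost of these
 * two lookups is what the performance testing would have to measure. */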

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] updated emacs configuration

2013-06-27 Thread Alvaro Herrera
Noah Misch wrote:

> Note that emacs and pgindent remain at odds over interior tabs in comments.
> When pgindent finds a double-space (typically after a sentence) ending at a
> tab stop, it replaces the double-space with a tab.  c-fill-paragraph will
> convert that tab to a *single* space, and that can be enough to change many
> line break positions.

We should really stop pgindent from converting those double-spaces to
tabs.  Those tabs are later changed to three or four spaces when wording
of the comment is changed, and things start looking very odd.

Really, we should get out of patched BSD indent entirely already.  How
about astyle, for instance?  I'm told that with some sensible options,
the diff to what we have now is not very large, and several things
actually become sensible (such as function pointer decls, which are
messed up rather badly by pgindent)

-- 
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hash partitioning.

2013-06-27 Thread Markus Wanner
On 06/27/2013 06:35 PM, Nicolas Barbier wrote:
> I am assuming that this (comparatively very small and super-hot) index
> is cached all the time, while for the other indexes (that are
> supposedly super-huge) only the top part stays cached.
> 
> I am mostly just trying to find out where Yuri’s “partitioning is
> needed for super-huge tables” experience might come from, and noting
> that Heikki’s argument might not be 100% valid.

I think the OP made that clear by stating that his index has relatively
low selectivity. That seems to be a case that Postgres doesn't handle
very well.

> I think that the
> “PostgreSQL-answer” to this problem is to somehow cluster the data on
> the “hotness column” (so that all hot rows are close to one another,
> thereby improving the efficiency of the caching of relation blocks) +
> partial indexes for the hot rows (as first mentioned by Claudio; to
> improve the efficiency of the caching of index blocks).

Agreed, sounds like a sane strategy.

> My reasoning was: To determine which index block to update (typically
> one in both the partitioned and non-partitioned cases), one needs to
> walk the index first, which supposedly causes one additional (read)
> I/O in the non-partitioned case on average, because there is one extra
> level and the lower part of the index is not cached (because of the
> size of the index). I think that pokes a hole in Heikki’s argument of
> “it really doesn’t matter, partitioning == using one big table with
> big non-partial indexes.”

Heikki's argument holds for the general case, where you cannot assume a
well defined hot partition. In that case, the lowest levels of all the
b-trees of the partitions don't fit in the cache, either. A single index
performs better in that case, because it has lower overhead.

I take your point that in case you *can* define a hot partition and
apply partitioning, the hot(test) index(es) are more likely to be cached
and thus require less disk I/O.

Regards

Markus Wanner




Re: [HACKERS] MemoryContextAllocHuge(): selectively bypassing MaxAllocSize

2013-06-27 Thread Noah Misch
On Wed, Jun 26, 2013 at 03:48:23PM -0700, Jeff Janes wrote:
> On Mon, May 13, 2013 at 7:26 AM, Noah Misch  wrote:
> > This patch introduces MemoryContextAllocHuge() and repalloc_huge() that
> > check
> > a higher MaxAllocHugeSize limit of SIZE_MAX/2.  Chunks don't bother
> > recording
> > whether they were allocated as huge; one can start with palloc() and then
> > repalloc_huge() to grow the value.
> 
> 
> Since it doesn't record the size, I assume the non-use as a varlena is
> enforced only by coder discipline and not by the system?

We will rely on coder discipline, yes.  The allocator actually does record a
size.  I was referring to the fact that it can't distinguish the result of
repalloc(p, 7) from the result of repalloc_huge(p, 7).

> What is likely to happen if I accidentally let a pointer to huge memory
> escape to someone who then passes it to varlena constructor without me
> knowing it?  (I tried sabotaging the code to make this happen, but I could
> not figure out how to).   Is there a place we can put an Assert to catch
> this mistake under enable-cassert builds?

Passing a too-large value gives a modulo effect.  We could inject an
AssertMacro() into SET_VARSIZE().  But it's a hot path, and I don't think this
mistake is too likely.
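For illustration, the injected check might look like this; a sketch
only, not a proposed change, relying on the fact that a 4-byte varlena
header can represent at most 0x3FFFFFFF bytes:

/* Sketch: catch over-large lengths at SET_VARSIZE() time in
 * assert-enabled builds.  The unmodified macro is simply
 * SET_VARSIZE_4B(PTR, len). */
#define SET_VARSIZE(PTR, len) \
	(AssertMacro((Size) (len) <= (Size) 0x3FFFFFFF), \
	 SET_VARSIZE_4B(PTR, len))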

> The only danger I can think of is that it could sometimes make some sorts
> slower, as using more memory than is necessary can sometimes slow down an
> "external" sort (because the heap is then too big for the fastest CPU
> cache).  If you use more tapes, but not enough more to reduce the number of
> passes needed, then you can get a slowdown.

Interesting point, though I don't fully understand it.  The fastest CPU cache
will be a tiny L1 data cache; surely that's not the relevant parameter here?

> I can't imagine that it would make things worse on average, though, as the
> benefit of doing more sorts as quicksorts rather than merge sorts, or doing
> mergesort with fewer number of passes, would outweigh sometimes doing a
> slower mergesort.  If someone has a pathological use pattern for which the
> averages don't work out favorably for them, they could probably play with
> work_mem to correct the problem.  Whereas without the patch, people who
> want more memory have no options.

Agreed.

> People have mentioned additional things that could be done in this area,
> but I don't think that applying this patch will make those things harder,
> or back us into a corner.  Taking an incremental approach seems suitable.

Committed with some cosmetic tweaks discussed upthread.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] pg_resetxlog -m documentation not up to date

2013-06-27 Thread Alvaro Herrera
Peter Eisentraut wrote:

> It was introduced in 0ac5ad5134f2769ccbaefec73844f8504c4d6182 "Improve
> concurrency of foreign key locking", but there is also no explanation
> there.  It is apparently needed in pg_upgrade.
> 
> This should be documented, and I think the help line should be changed
> to something like
> 
>   -m XID,XID   set next and oldest multitransaction ID

I have come up with the attached patch.  Does this match your
expectations?  Also, I'm a bit annoyed that there's no easy "append N
zeroes to a number" recipe for multixact members such as there is for
pg_clog and multixact offsets.  Any suggestions on how to improve the
current recommendation?

(Also, I found out that if the last multixact/offsets file is not
exactly 32 pages long, the server will fail to create further mxacts
after following the recipe in docs; IIRC because it looks up the mxact
previous to the next one.)
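As an aside, the documented recipe is mechanical enough to script. A toy
sketch, assuming it is run from inside the data directory; it prints
only the "next" half of -m, and the "oldest" half would scan for the
smallest file name instead:

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	DIR		   *dir = opendir("pg_multixact/offsets");
	struct dirent *de;
	unsigned long max = 0;
	int			found = 0;

	if (dir == NULL)
	{
		perror("pg_multixact/offsets");
		return 1;
	}
	while ((de = readdir(dir)) != NULL)
	{
		char	   *end;
		unsigned long v = strtoul(de->d_name, &end, 16);

		if (de->d_name[0] != '\0' && *end == '\0')	/* hex names only */
		{
			found = 1;
			if (v > max)
				max = v;
		}
	}
	closedir(dir);
	if (!found)
		return 1;
	/* largest name, plus one, times 65536: i.e. append four hex zeroes */
	printf("next multixact for -m: 0x%lX0000\n", max + 1);
	return 0;
}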


*** a/doc/src/sgml/ref/pg_resetxlog.sgml
--- b/doc/src/sgml/ref/pg_resetxlog.sgml
***
*** 27,33  PostgreSQL documentation
 -o oid
 -x xid
 -e xid_epoch
!-m mxid
 -O mxoff
 -l xlogfile
 datadir
--- 27,33 
 -o oid
 -x xid
 -e xid_epoch
!-m mxid,mxid
 -O mxoff
 -l xlogfile
 datadir
***
*** 81,87  PostgreSQL documentation
 -m, -O,
 and -l
 options allow the next OID, next transaction ID, next transaction ID's
!epoch, next multitransaction ID, next multitransaction offset, and WAL
 starting address values to be set manually.  These are only needed when
 pg_resetxlog is unable to determine appropriate values
 by reading pg_control.  Safe values can be determined as
--- 81,87 
 -m, -O,
 and -l
 options allow the next OID, next transaction ID, next transaction ID's
!epoch, next and oldest multitransaction ID, next multitransaction offset, and WAL
 starting address values to be set manually.  These are only needed when
 pg_resetxlog is unable to determine appropriate values
 by reading pg_control.  Safe values can be determined as
***
*** 104,115  PostgreSQL documentation
  
  
   
!   A safe value for the next multitransaction ID (-m)
can be determined by looking for the numerically largest
file name in the directory pg_multixact/offsets under the
!   data directory, adding one, and then multiplying by 65536.  As above,
!   the file names are in hexadecimal, so the easiest way to do this is to
!   specify the option value in hexadecimal and add four zeroes.
   
  
  
--- 104,119 
  
  
   
!   A safe value for the next multitransaction ID (first part of -m)
can be determined by looking for the numerically largest
file name in the directory pg_multixact/offsets under the
!   data directory, adding one, and then multiplying by 65536.
!   Conversely, a safe value for the oldest multitransaction ID (second part of
!   -m)
!   can be determined by looking for the numerically smallest
!   file name in the same directory and multiplying by 65536.
!   As above, the file names are in hexadecimal, so the easiest way to do
!   this is to specify the option value in hexadecimal and append four zeroes.
   
  
  
***
*** 118,126  PostgreSQL documentation
A safe value for the next multitransaction offset (-O)
can be determined by looking for the numerically largest
file name in the directory pg_multixact/members under the
!   data directory, adding one, and then multiplying by 65536.  As above,
!   the file names are in hexadecimal, so the easiest way to do this is to
!   specify the option value in hexadecimal and add four zeroes.
   
  
  
--- 122,130 
A safe value for the next multitransaction offset (-O)
can be determined by looking for the numerically largest
file name in the directory pg_multixact/members under the
!   data directory, adding one, and then multiplying by 52352.  As above,
!   the file names are in hexadecimal.  There is no simple recipe such as
!   the ones above of appending zeroes.
   
  
  
*** a/src/bin/pg_resetxlog/pg_resetxlog.c
--- b/src/bin/pg_resetxlog/pg_resetxlog.c
***
*** 1036,1042  usage(void)
printf(_("  -e XIDEPOCH  set next transaction ID epoch\n"));
printf(_("  -f   force update to be done\n"));
printf(_("  -l XLOGFILE  force minimum WAL starting location for 
new transaction log\n"));
!   printf(_("  -m XID,OLDESTset next multitransaction ID and oldest 
value\n"));
printf(_("  -n   no update, just show extracted control 
values (for testing)\n"));
printf(_("  -o OID   set next OID\n"));
printf(_("  -O OFFSETset next multitransaction offset\n"));
--- 1036,1042 
printf(_("  -e XI

Re: [HACKERS] Computer VARSIZE_ANY(PTR) during debugging

2013-06-27 Thread Alvaro Herrera
Amit Langote wrote:

> Looking at the attlen == -1 value in tupDescriptor of the
> ResultTupleSlot, VARSIZE_ANY() is used to calculate the length and
> added to offset, but I find no way to calculate that while I am
> dubugging.

I guess you could just add a few "macro define" lines to your .gdbinit,
containing definitions equivalent to those in postgres.h.  Haven't tried
this for the varlena macros, though I do have a couple of others in
there and they ease work at times.
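For the varlena macros specifically, something along these lines might
work; a sketch only, assuming a little-endian build and covering only
the 4-byte-header layout mirrored from (not copied out of) postgres.h:

# candidate ~/.gdbinit lines (little-endian, 4-byte varlena headers only)
macro define VARHDRSZ ((int32) sizeof(int32))
macro define VARSIZE_4B(PTR) ((*((uint32 *) (PTR)) >> 2) & 0x3FFFFFFF)
macro define VARDATA_4B(PTR) (((char *) (PTR)) + 4)

With those loaded, "p VARSIZE_4B(ptr)" can be used inside gdb
expressions much like the C macro would; the short-header (1-byte) case
would need its own definition.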

-- 
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Alvaro Herrera
Andrew Dunstan wrote:
> 
> On 06/27/2013 02:38 PM, Josh Berkus wrote:
> >>If we're going to try harder to ensure that reviewers are credited,
> >>it'd probably be better to take both the commit log and the release
> >>notes out of that loop.
> >I'd pull the reviewers out of the CF app.   Even in its current
> >implementation, that's the easiest route.
> 
> I have not honestly found these to be terribly accurate.

Yeah, they're not.  I do take care to review past threads when crediting
reviewers in commit messages, and I don't even bother to look at the CF
app.

-- 
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Bruce Momjian
On Thu, Jun 27, 2013 at 02:17:25PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Thu, Jun 27, 2013 at 10:10:23AM -0700, Josh Berkus wrote:
> >> What I would be opposed to is continuing to list the original authors in
> >> the release notes and putting reviewers, testers, co-authors, etc. on a
> >> separate web page.  If we're gonna move people, let's move *all* of
> >> them.  Also, it needs to be on something more trustworthy than the wiki,
> >> like maybe putting it at postgresql.org/developers/9.3/
> 
> > I think you will have trouble keeping those two lists synchronized.  I
> > think you will need to create a release note document that includes all
> > names, then copy that to a website and remove the names just before the
> > release is packaged.
> 
> Unless Bruce is doing more work than I think he is, the attribution data
> going into the release notes is just coming from whatever the committer
> said in the commit log message.  I believe that we've generally been

Yes, that's all I do.  "Bruce is doing more work than I think he is"
gave me a chuckle.  ;-)

> careful to credit the patch author, but I'm less confident that everyone
> who merited a "review credit" always gets mentioned --- that would
> require going through the entire review thread at commit time, and I for
> one can't say that I do that.
> 
> If we're going to try harder to ensure that reviewers are credited,
> it'd probably be better to take both the commit log and the release
> notes out of that loop.

I assume you are suggesting we _not_ use the commit log and release
notes for reviewer credit.   Good point;  we might be able to pull that
from the commit-fest app, though you then need to match that with the
release note text.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Andrew Dunstan


On 06/27/2013 02:38 PM, Josh Berkus wrote:

If we're going to try harder to ensure that reviewers are credited,
it'd probably be better to take both the commit log and the release
notes out of that loop.

I'd pull the reviewers out of the CF app.   Even in its current
implementation, that's the easiest route.



I have not honestly found these to be terribly accurate.

cheers

andrew




Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Josh Berkus

> If we're going to try harder to ensure that reviewers are credited,
> it'd probably be better to take both the commit log and the release
> notes out of that loop.

I'd pull the reviewers out of the CF app.   Even in its current
implementation, that's the easiest route.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Group Commits Vs WAL Writes

2013-06-27 Thread Robert Haas
On Thu, Jun 27, 2013 at 12:54 PM, Atri Sharma  wrote:
>> Well, it does take longer to fsync a larger byte range to disk than a
>> smaller byte range, in some cases.  But it's generally more efficient
>> to write one larger range than many smaller ranges, so you come out
>> ahead on the whole.
>
> Right, that does make sense.
>
> So, the overhead of writing a lot of WAL buffers is mitigated because
> one large write is better than lots of smaller writes?

Yep.  To take a degenerate case, suppose that you had many small WAL
records, say 64 bytes each, so more than 100 per 8K block.  If you
flush those one by one, you're going to rewrite that block 100 times.
If you flush them all at once, you write that block once.

But even when the range is more than the minimum write size (8K for
WAL), there are still wins.  Writing 16K or 24K or 32K submitted as a
single request can likely be done in a single revolution of the disk
head.  But if you write 8K and wait until it's done, and then write
another 8K and wait until that's done, the second request may not
arrive until after the disk head has passed the position where the
second block needs to go.  Now you have to wait for the drive to spin
back around to the right position.

The details of course vary with the hardware in use, but there are
very few I/O operations where batching smaller requests into larger
chunks doesn't help to some degree.  Of course, the optimal transfer
size does vary considerably based on the type of I/O and the specific
hardware in use.
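The degenerate case is easy to reproduce with a toy program;
illustrative only, with none of the real WAL machinery. Timing the two
halves shows the batching win on most storage:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char		rec[64] = {0};
	char		block[8192] = {0};
	int			fd = open("waldemo", O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
	{
		perror("open");
		return 1;
	}

	/* unbatched: each 64-byte record forces its own flush, so the same
	 * 8 kB block is written out ~100 times */
	for (int i = 0; i < 100; i++)
	{
		(void) pwrite(fd, rec, sizeof(rec), (off_t) i * sizeof(rec));
		fsync(fd);
	}

	/* batched: accumulate the same 100 records, then flush once */
	for (int i = 0; i < 100; i++)
		memcpy(block + i * sizeof(rec), rec, sizeof(rec));
	(void) pwrite(fd, block, sizeof(block), 0);
	fsync(fd);

	close(fd);
	return 0;
}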

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-06-27 Thread Fabien COELHO


Please find attached a v12, which under timer_exceeded interrupts
clients that are being throttled instead of waiting for the end of the
transaction, since the transaction has not started yet.


Please find attached a v13 which fixes conflicts introduced by the long 
options patch committed by Robert Haas.


--
Fabien.diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 80203d6..da88bd7 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -137,6 +137,12 @@ int			unlogged_tables = 0;
 double		sample_rate = 0.0;
 
 /*
+ * When threads are throttled to a given rate limit, this is the target delay
+ * to reach that rate in usec.  0 is the default and means no throttling.
+ */
+int64		throttle_delay = 0;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -200,11 +206,13 @@ typedef struct
 	int			listen;			/* 0 indicates that an async query has been
  * sent */
 	int			sleeping;		/* 1 indicates that the client is napping */
+	bool		throttling;		/* whether nap is for throttling */
 	int64		until;			/* napping until (usec) */
 	Variable   *variables;		/* array of variable definitions */
 	int			nvariables;
 	instr_time	txn_begin;		/* used for measuring transaction latencies */
 	instr_time	stmt_begin;		/* used for measuring statement latencies */
+	bool		throttled;  /* whether current transaction was throttled */
 	int			use_file;		/* index in sql_files for this client */
 	bool		prepared[MAX_FILES];
 } CState;
@@ -222,6 +230,10 @@ typedef struct
 	instr_time *exec_elapsed;	/* time spent executing cmds (per Command) */
 	int		   *exec_count;		/* number of cmd executions (per Command) */
 	unsigned short random_state[3];		/* separate randomness for each thread */
+	int64		throttle_trigger;	/* previous/next throttling (us) */
+	int64   throttle_lag;  /* total transaction lag behind throttling */
+	int64   throttle_lag_max;  /* max transaction lag */
+
 } TState;
 
 #define INVALID_THREAD		((pthread_t) 0)
@@ -230,6 +242,8 @@ typedef struct
 {
 	instr_time	conn_time;
 	int			xacts;
+	int64   throttle_lag;
+	int64   throttle_lag_max;
 } TResult;
 
 /*
@@ -353,6 +367,7 @@ usage(void)
 		   "  -n, --no-vacuum  do not run VACUUM before tests\n"
 		   "  -N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
 		   "  -r, --report-latencies   report average latency per command\n"
+		   "  -R, --rate SPEC  target rate in transactions per second\n"
 		   "  -s, --scale=NUM  report this scale factor in output\n"
 		   "  -S, --select-onlyperform SELECT-only transactions\n"
 		   "  -t, --transactions   number of transactions each client runs "
@@ -895,17 +910,58 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
 {
 	PGresult   *res;
 	Command   **commands;
+	booldo_throttle = false;
 
 top:
 	commands = sql_files[st->use_file];
 
+	/* handle throttling once per transaction by inserting a sleep.
+	 * this is simpler than doing it at the end.
+	 */
+	if (throttle_delay && ! st->throttled)
+	{
+		/* compute delay to approximate a Poisson distribution
+		 * 1000000 => 13.8 .. 0 multiplier
+		 *  100000 => 11.5 .. 0
+		 *   10000 =>  9.2 .. 0
+		 *    1000 =>  6.9 .. 0
+		 * if transactions are too slow or a given wait shorter than
+		 * a transaction, the next transaction will start right away.
+		 */
+		int64 wait = (int64)
+			throttle_delay * -log(getrand(thread, 1, 1000)/1000.0);
+
+		thread->throttle_trigger += wait;
+
+		st->until = thread->throttle_trigger;
+		st->sleeping = 1;
+		st->throttling = true;
+		st->throttled = true;
+		if (debug)
+			fprintf(stderr, "client %d throttling "INT64_FORMAT" us\n",
+	st->id, wait);
+	}
+
 	if (st->sleeping)
 	{			/* are we sleeping? */
 		instr_time	now;
+		int64 now_us;
 
 		INSTR_TIME_SET_CURRENT(now);
-		if (st->until <= INSTR_TIME_GET_MICROSEC(now))
+		now_us = INSTR_TIME_GET_MICROSEC(now);
+		if (st->until <= now_us)
+		{
 			st->sleeping = 0;	/* Done sleeping, go ahead with next command */
+			if (st->throttling)
+			{
+/* measure lag of throttled transaction */
+int64 lag = now_us - st->until;
+thread->throttle_lag += lag;
+if (lag > thread->throttle_lag_max)
+	thread->throttle_lag_max = lag;
+st->throttling = false;
+			}
+		}
 		else
 			return true;		/* Still sleeping, nothing to do here */
 	}
@@ -1034,7 +1090,7 @@ top:
 	 * This is more than we really ought to know about
 	 * instr_time
 	 */
-	fprintf(logfile, "%d %d %.0f %d %ld %ld\n",
+	fprintf(logfile, "%d %d %.0f %d %ld.%06ld\n",
 			st->id, st->cnt, usec, st->use_file,
 			(long) now.tv_sec, (long) now.tv_usec);
 #else
@@ -1043,7 +1099,7 @@ top:
 	 * On Windows, instr_time doesn't provide a timestamp
 	 * anyway
 	 */
-	fprintf(logfile, "%d %d %.0f %d 0 0\n",
+	fprintf(logfile, "%d %d %.0f %d 0.0\n",
 			st->id, st->cnt, u

Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-27 Thread Robert Haas
On Mon, Jun 17, 2013 at 7:48 AM, Simon Riggs  wrote:
>> I am told, one of the very popular setups for DR is to have one
>> local sync standby and one async (may be cascaded by the local sync). Since
>> this new feature is more useful for DR because taking a fresh backup on a
>> slower link is even more challenging, IMHO we should support such setups.
>
> ...which still doesn't make sense to me. Lets look at that in detail.
>
> Take 3 servers, A, B, C with A and B being linked by sync rep, and C
> being safety standby at a distance.
>
> Either A or B is master, except in disaster. So if A is master, then B
> would be the failover target. If A fails, then you want to failover to
> B. Once B is the target, you want to failback to A as the master. C
> needs to follow the new master, whichever it is.
>
> If you set up sync rep between A and B and this new mode between A and
> C. When B becomes the master, you need to failback from B from A, but
> you can't because the new mode applied between A and C only, so you
> have to failback from C to A. So having the new mode not match with
> sync rep means you are forcing people to failback using the slow link
> in the common case.

It's true that in this scenario that doesn't really make sense, but I
still think they are separate properties.  You could certainly want
synchronous replication without this new property, if you like the
data-loss guarantees that sync rep provides but don't care about
failback.  You could also want this new property without synchronous
replication, if you don't need the data-loss guarantees that sync rep
provides but you do care about fast failback.  I admit it seems
unlikely that you would use both features but not target them at the
same machines, although maybe: perhaps you have a sync standby and an
async standby and want this new property with respect to both of them.

In my admittedly limited experience, the use case for a lot of this
technology is in the cloud.  The general strategy seems to be: at the
first sign of trouble, kill the offending instance and fail over.
This can result in failing over pretty frequently, and needing it to
be fast.  There may be no real hardware problem; indeed, the failover
may be precipitated by network conditions or overload of the physical
host backing the virtual machine or any number of other nonphysical
problems.  I can see this being useful in that environment, even for
async standbys.  People can apparently tolerate a brief interruption
while their primary gets killed off and connections are re-established
with the new master, but they need the failover to be fast.  The
problem with the status quo is that, even if the first failover is
fast, the second one isn't, because it has to wait behind rebuilding
the original master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, Jun 27, 2013 at 10:10:23AM -0700, Josh Berkus wrote:
>> What I would be opposed to is continuing to list the original authors in
>> the release notes and putting reviewers, testers, co-authors, etc. on a
>> separate web page.  If we're gonna move people, let's move *all* of
>> them.  Also, it needs to be on something more trustworthy than the wiki,
>> like maybe putting it at postgresql.org/developers/9.3/

> I think you will have trouble keeping those two lists synchronized.  I
> think you will need to create a release note document that includes all
> names, then copy that to a website and remove the names just before the
> release is packaged.

Unless Bruce is doing more work than I think he is, the attribution data
going into the release notes is just coming from whatever the committer
said in the commit log message.  I believe that we've generally been
careful to credit the patch author, but I'm less confident that everyone
who merited a "review credit" always gets mentioned --- that would
require going through the entire review thread at commit time, and I for
one can't say that I do that.

If we're going to try harder to ensure that reviewers are credited,
it'd probably be better to take both the commit log and the release
notes out of that loop.

regards, tom lane




Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)

2013-06-27 Thread Fabien COELHO



Otherwise, the simplest possible adaptation, if the progress feature is
required to work under fork emulation in order to pass, is that under "fork
emulation" each process reports its own current progress instead of having a
collective summing.


Perhaps that's worth doing.  I agree with Fabien that full support of
this feature in the process model is more trouble than it's worth,
though, and I wouldn't scream loudly if we just didn't support it.
--disable-thread-safety doesn't have to be entirely penalty-free.


Attached is patch version 5.

It includes this solution for fork emulation, one report per thread 
instead of a global report. Some code duplication for that.


It also solves conflicts introduced by the long options patch.

Finally, I've added a latency measure as defended by Mitsumasa. However 
the formula must be updated for the throttling patch.


Maybe I should have submitted a bunch of changes to pgbench in one patch. 
I thought that separating orthogonal things made reviewing simpler so the 
patches were more likely to pass, but I'm not so sure that the other 
strategy would have been that bad.


--
Fabien.diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 80203d6..84b969a 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -74,7 +74,7 @@ static int	pthread_join(pthread_t th, void **thread_return);
#include <pthread.h>
 #else
 /* Use emulation with fork. Rename pthread identifiers to avoid conflicts */
-
+#define PTHREAD_FORK_EMULATION
#include <sys/wait.h>
 
#define pthread_t	pg_pthread_t
@@ -164,6 +164,8 @@ bool		use_log;			/* log transaction latencies to a file */
 bool		use_quiet;			/* quiet logging onto stderr */
 int			agg_interval;		/* log aggregates instead of individual
  * transactions */
+int			progress = 0;   /* thread progress report every this seconds */
+int progress_nclients = 0; /* number of clients for progress report */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -352,6 +354,7 @@ usage(void)
 		   "(default: simple)\n"
 		   "  -n, --no-vacuum  do not run VACUUM before tests\n"
 		   "  -N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
+		   "  -P, --progress SEC   show thread progress report every SEC seconds\n"
 		   "  -r, --report-latencies   report average latency per command\n"
 		   "  -s, --scale=NUM  report this scale factor in output\n"
 		   "  -S, --select-onlyperform SELECT-only transactions\n"
@@ -2136,6 +2139,7 @@ main(int argc, char **argv)
 		{"unlogged-tables", no_argument, &unlogged_tables, 1},
 		{"sampling-rate", required_argument, NULL, 4},
 		{"aggregate-interval", required_argument, NULL, 5},
+		{"progress", required_argument, NULL, 'P'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -2202,7 +2206,7 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
-	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:P:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -2357,6 +2361,16 @@ main(int argc, char **argv)
 	exit(1);
 }
 break;
+			case 'P':
+progress = atoi(optarg);
+if (progress <= 0)
+{
+	fprintf(stderr,
+	   "thread progress delay (-P) must not be negative (%s)\n",
+			optarg);
+	exit(1);
+}
+break;
 			case 0:
 /* This covers long options which take no argument. */
 break;
@@ -2482,6 +2496,7 @@ main(int argc, char **argv)
 	 * changed after fork.
 	 */
 	main_pid = (int) getpid();
+	progress_nclients = nclients;
 
 	if (nclients > 1)
 	{
@@ -2733,6 +2748,11 @@ threadRun(void *arg)
 	int			nstate = thread->nstate;
 	int			remains = nstate;		/* number of remaining clients */
 	int			i;
+	/* for reporting progress: */
+	int64   thread_start = INSTR_TIME_GET_MICROSEC(thread->start_time);
+	int64		last_report = thread_start;
+	int64		next_report = last_report + progress * 1000000;
+	int64		last_count = 0;
 
 	AggVals		aggs;
 
@@ -2896,6 +2916,68 @@ threadRun(void *arg)
 st->con = NULL;
 			}
 		}
+
+#ifdef PTHREAD_FORK_EMULATION
+		/* each process reports its own progression */
+		if (progress)
+		{
+			instr_time now_time;
+			int64 now;
+			INSTR_TIME_SET_CURRENT(now_time);
+			now = INSTR_TIME_GET_MICROSEC(now_time);
+			if (now >= next_report)
+			{
+				/* generate and show report */
+				int64		count = 0;
+				int64		run = now - last_report;
+				float		tps, total_run, latency;
+
+				for (i = 0; i < nstate; i++)
+					count += state[i].cnt;
+
+				total_run = (now - thread_start) / 1000000.0;
+				tps = 1000000.0 * (count - last_count) / run;
+				latency = 1000.0 * nstate / tps;
+
+fprintf(stderr, "progress %d

Re: [HACKERS] [PATCH] add long options to pgbench (submission 1)

2013-06-27 Thread Alvaro Herrera
Fabien COELHO escribió:

> For the text formatting, I tried to keep the screen width under 80
> characters, because if the line is too wide it is harder to read as
> the eye may lose the alignment. But being homogeneous with other
> commands is fine with me as well.

The format chosen by Robert fits in 80 columns and looks very good to
me.  Thanks to both.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-27 Thread Tom Lane
Josh Berkus  writes:
> So, is this patch currently depending on performance testing, or not?
> Like I said, it'll be a chunk of time to set up what I believe is a
> realistic performance test, so I don't want to do it if the patch is
> likely to be bounced for other reasons.

AFAICS, the threshold question here is whether the patch helps usefully
for tables with typical numbers of columns (ie, not 800 ;-)), and
doesn't hurt materially in any common scenario.  If it does, I think
we'd take it.  I've not read the patch, so I don't know if there are
minor cosmetic or correctness issues, but at bottom this seems to be a
very straightforward performance tradeoff.
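
To give a sense of the sizes at stake, here is a back-of-envelope sketch
(not from the patch; the 23-byte header, 8-byte MAXALIGN, and column
counts are assumptions) of how much a tuple header shrinks when trailing
nulls are not counted in the null bitmap:

#include <stdio.h>

#define HEADER 23                        /* typical heap tuple header size */
#define MAXALIGN(x) (((x) + 7) & ~7)     /* assume 8-byte alignment */
#define BITMAPLEN(natts) (((natts) + 7) / 8)

static int header_size(int natts_stored)
{
    return MAXALIGN(HEADER + BITMAPLEN(natts_stored));
}

int main(void)
{
    /* columns in the table vs. position of the last non-null column */
    int cases[][2] = {{72, 8}, {40, 20}, {800, 10}};

    for (int i = 0; i < 3; i++)
        printf("%3d cols, last non-null at %3d: header %3d -> %3d bytes\n",
               cases[i][0], cases[i][1],
               header_size(cases[i][0]), header_size(cases[i][1]));
    return 0;
}

As the 40-column case shows, the saving can vanish entirely into
alignment padding, which is exactly why testing on typical column
counts matters.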

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: query result history in psql

2013-06-27 Thread Alvaro Herrera
Maciej Gajewski escribió:

> > Those issues aside - I think it's a great feature! I can add the
> > grammatical fixes I made whenever the final patch is ready. Or earlier,
> > whatever works for you. Also, this is my first time reviewing a patch, so
> > please let me know if I can improve on anything. Thanks!
> 
> This is  my first submitted patch, so I can't really comment on the
> process. But if you could add the author's email to CC, the message would
> be much easier to spot. I replied after two days only because I missed the
> message in the flood of other pgsql-hacker messages. I think I need to scan
> the list more carefully...

It's better to post a review as a reply to the message which contains
the patch.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XLogInsert scaling, revisited

2013-06-27 Thread Andres Freund
On 2013-06-26 18:52:30 +0300, Heikki Linnakangas wrote:
> There's this in the comment near the top of the file:

Btw, I find the 'you' used in the comment somewhat irritating. Not badly
so, but reading something like:

 * When you are about to write
 * out WAL, it is important to call WaitXLogInsertionsToFinish() *before*
 * acquiring WALWriteLock, to avoid deadlocks. Otherwise you might get stuck
 * waiting for an insertion to finish (or at least advance to next
 * uninitialized page), while you're holding WALWriteLock.

just seems strange to me. If this were directed at plugin authors, maybe, but
it really isn't...

But that's probably a question for a native speaker...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Bruce Momjian
On Thu, Jun 27, 2013 at 10:10:23AM -0700, Josh Berkus wrote:
> On 06/27/2013 08:56 AM, Bruce Momjian wrote:
> > Adding the names to each release note item is not a problem;  the
> > problem is the volume of names that overwhelms the release note text.  If
> > we went that direction, I predict we would just remove _all_ names from
> > the release notes.
> 
> That's not a realistic objection.  We'd be talking about one or two more
> names per patch, with a handful of exceptions.

It is realistic because I did this for 9.2 beta and people didn't like
it;  how much more realistic do you want?

> If you're going to make an objection, object to the amount of extra work
> it would be, which is significant with the current state of our
> technology.  Realistically, it'll take the help of additional people.

No extra work required, it is all copy/paste for me.

> On 06/27/2013 09:19 AM, Tom Lane wrote:
> > Yeah.  Keep in mind that the overwhelming majority of the audience for
> > the release notes doesn't actually give a darn who implemented what.
> 
> There is some argument to be made about moving the whole "list of names"
> to some other resource, like a web page.  In fact, in the mythical land
> where we have a new CF application, that other resource could even be
> generated by the CF app with some hand-tweaking (this would also require
> some tweaks to community profiles etc.).

Yes, I am fine with moving all names.  However, I predict you will do
more to demotivate patch submitters than you will to motivate patch
reviewers.  That is fine with me, as motivating developers/reviewers is
not our top priority.

> What I would be opposed to is continuing to list the original authors in
> the release notes and putting reviewers, testers, co-authors, etc. on a
> separate web page.  If we're gonna move people, let's move *all* of
> them.  Also, it needs to be on something more trustworthy than the wiki,
> like maybe putting it at postgresql.org/developers/9.3/

I think you will have trouble keeping those two lists synchronized.  I
think you will need to create a release note document that includes all
names, then copy that to a website and remove the names just before the
release is packaged.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-27 Thread Josh Berkus
All,

So, is this patch currently depending on performance testing, or not?
Like I said, it'll be a chunk of time to set up what I believe is a
realistic performance test, so I don't want to do it if the patch is
likely to be bounced for other reasons.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Josh Berkus
On 06/27/2013 08:56 AM, Bruce Momjian wrote:
> Adding the names to each release note item is not a problem;  the
> problem is the volume of names that overwhelms the release note text.  If
> we went that direction, I predict we would just remove _all_ names from
> the release notes.

That's not a realistic objection.  We'd be talking about one or two more
names per patch, with a handful of exceptions.

If you're going to make an objection, object to the amount of extra work
it would be, which is significant with the current state of our
technology.  Realistically, it'll take the help of additional people.

On 06/27/2013 09:19 AM, Tom Lane wrote:
> Yeah.  Keep in mind that the overwhelming majority of the audience for
> the release notes doesn't actually give a darn who implemented what.

There is some argument to be made about moving the whole "list of names"
to some other resource, like a web page.  In fact, in the mythical land
where we have a new CF application, that other resource could even be
generated by the CF app with some hand-tweaking (this would also require
some tweaks to community profiles etc.).

What I would be opposed to is continuing to list the original authors in
the release notes and putting reviewers, testers, co-authors, etc. on a
separate web page.  If we're gonna move people, let's move *all* of
them.  Also, it needs to be on something more trustworthy than the wiki,
like maybe putting it at postgresql.org/developers/9.3/

On Tue, Jun 25, 2013 at 2:27 PM, Andrew Dunstan 
wrote:
> Encouraging new people is good, but recognizing sustained, long-term
> contributions is good, too.  I think we should do more of that, too.

I wasn't thinking about doing it every year -- just for 9.3, in order to
encourage more reviewers, and encourage reviewers to do more reviews.
But that would only work if we're also giving reviewers credit; as
several people have said, they care more about credit than they do about
prizes.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)

2013-06-27 Thread Tom Lane
Fabien COELHO  writes:
>>> Here is a v4 that takes into account most of your points: The report is
>>> performed for all threads by thread 0, however --progress is not supported
>>> under thread fork emulation if there are more than one thread. The report
>>> time does not slip anymore.

>> I don't believe that to be an acceptable restriction.

> My first proposal is to remove the fork emulation altogether, which would 
> remove many artificial limitations to pgbench and simplify the code 
> significantly. That would be an improvement.

I would object strongly to that, as it would represent a significant
movement of the goalposts on what is required to build Postgres at all,
ie platforms on which --enable-thread-safety is unavailable or expensive
would be out in the cold.  Perhaps that set is approaching empty, but a
project that's still standardized on C89 has little business making such
a choice IMO.

> Otherwise, the simplest possible adaptation, if it is required to have the
> progress feature under fork emulation to pass it, is that under "fork
> emulation" each process reports its current progress instead of having a
> collective summing.

Perhaps that's worth doing.  I agree with Fabien that full support of
this feature in the process model is more trouble than it's worth,
though, and I wouldn't scream loudly if we just didn't support it.
--disable-thread-safety doesn't have to be entirely penalty-free.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support & snappy prototype

2013-06-27 Thread Andres Freund
On 2013-06-27 09:17:10 -0700, Hitoshi Harada wrote:
> On Thu, Jun 27, 2013 at 6:08 AM, Robert Haas  wrote:
> 
> > On Wed, Jun 19, 2013 at 3:27 AM, Andres Freund 
> > wrote:
> > > There will be a newer version of the patch coming today or tomorrow, so
> > > there's probably no point in looking at the one linked above before
> > > that...
> >
> > This patch is marked as "Ready for Committer" in the CommitFest app.
> > But it is not clear to me where the patch is that is being proposed
> > for commit.
> >
> No, this conversation is about patch #1153, pluggable toast compression,
> which is in Needs Review, and you may be confused with #1127, extensible
> external toast tuple support.

Well, actually this is the correct thread, and pluggable toast
compression developed out of it. You had marked #1127 as ready for
committer although you had noticed an omission in
heap_tuple_fetch_attr...
Answered with the new patch version toplevel in the thread, to make this
hopefully less confusing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add long options to pgbench (submission 1)

2013-06-27 Thread Fabien COELHO


> [...] So I changed that, and committed this, with some further cosmetic
> changes. [...]


Thanks for the commit & the style improvements.

For the text formatting, I tried to keep the screen width under 80 
characters, because if the line is too wide it is harder to read as the 
eye may lose the alignment. But being homogeneous with other commands is 
fine with me as well.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support

2013-06-27 Thread Andres Freund
Hi,

Please find attached the next version of the extensible toast
support. There basically are two changes:

* handle indirect toast tuples properly in heap_tuple_fetch_attr
  and related places
* minor comment adjustments

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 5c7079067f870a65d65ee09cccbf743c88b64c82 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH] Add support for multiple kinds of external toast datums

There are several use cases where our current representation of external toast
datums is limiting:
* adding new compression schemes
* avoidance of repeated detoasting
* externally decoded toast tuples

For that, support 'tags' on external (varattrib_1b_e) varlenas which recoin the
current va_len_1be field to store the tag (or type) of a varlena. To determine
the actual length, a macro VARTAG_SIZE(tag) is added which can be used to map
from a tag to the actual length.

This patch adds support for 'indirect' tuples which point to some externally
allocated memory containing a toast tuple. It also implements the stub for a
different compression algorithm.
---
 src/backend/access/heap/tuptoaster.c | 142 +++
 src/include/c.h  |   2 +
 src/include/postgres.h   |  84 +++--
 3 files changed, 191 insertions(+), 37 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..5ee5e33 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -87,11 +87,11 @@ static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
  * heap_tuple_fetch_attr -
  *
  *	Public entry point to get back a toasted value from
- *	external storage (possibly still in compressed format).
+ *	external source (possibly still in compressed format).
  *
  * This will return a datum that contains all the data internally, ie, not
- * relying on external storage, but it can still be compressed or have a short
- * header.
+ * relying on external storage or memory, but it can still be compressed or
+ * have a short header.
  --
  */
 struct varlena *
@@ -99,13 +99,35 @@ heap_tuple_fetch_attr(struct varlena * attr)
 {
 	struct varlena *result;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/*
 		 * This is an external stored plain value
 		 */
 		result = toast_fetch_datum(attr);
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/*
+		 * copy into the caller's memory context. That's not required in all
+		 * cases but sufficient for now since this is mainly used when we need
+		 * to persist a Datum for unusually long time, like in a HOLD cursor.
+		 */
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *)redirect.pointer;
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		/* doesn't make much sense, but better handle it */
+		if (VARATT_IS_EXTERNAL_ONDISK(attr))
+			return heap_tuple_fetch_attr(attr);
+
+		/* copy datum verbatim */
+		result = (struct varlena *) palloc(VARSIZE_ANY(attr));
+		memcpy(result, attr, VARSIZE_ANY(attr));
+	}
 	else
 	{
 		/*
@@ -128,7 +150,7 @@ heap_tuple_fetch_attr(struct varlena * attr)
 struct varlena *
 heap_tuple_untoast_attr(struct varlena * attr)
 {
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/*
 		 * This is an externally stored datum --- fetch it back from there
@@ -145,6 +167,17 @@ heap_tuple_untoast_attr(struct varlena * attr)
 			pfree(tmp);
 		}
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *)redirect.pointer;
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		attr = heap_tuple_untoast_attr(attr);
+	}
 	else if (VARATT_IS_COMPRESSED(attr))
 	{
 		/*
@@ -191,7 +224,7 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
 	char	   *attrdata;
 	int32		attrsize;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		struct varatt_external toast_pointer;
 
@@ -204,6 +237,17 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
 		/* fetch it back (compressed marker will get set automatically) */
 		preslice = toast_fetch_datum(attr);
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(redirect.pointer));
+
+		return heap_tuple_untoast_attr_slice(redirect.pointer,
+			 sliceoffset, slicelength);
+	}
 	else
 		preslice = attr;
 
@@ -267,7 +311,7 @@ toast_raw_datum_size(Datum value)
 	struct varlena *attr = (str

Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Greg Smith

On 6/25/13 2:44 PM, Josh Berkus wrote:
> On the other hand, I will point out that we currently have a shortage of
> reviewers, and we do NOT have a shortage of patch submitters.


That's because reviewing is harder than initial development.  The only 
people who think otherwise are developers who don't do enough review.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group Commits Vs WAL Writes

2013-06-27 Thread Atri Sharma
> Well, it does take longer to fsync a larger byte range to disk than a
> smaller byte range, in some cases.  But it's generally more efficient
> to write one larger range than many smaller ranges, so you come out
> ahead on the whole.

Right, that does make sense.

So, the overhead of writing a lot of WAL buffers is mitigated because
one large write is better than lots of smaller writes?

Regards,

Atri


--
Regards,

Atri
l'apprenant


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group Commits Vs WAL Writes

2013-06-27 Thread Atri Sharma
>
> commit_delay exists to artificially increase the window in which the
> leader backend waits for more group commit followers. At higher client
> counts, that isn't terribly useful because you'll naturally have
> enough clients anyway, but at lower client counts particularly where
> fsyncs have high latency, it can help quite a bit. I mention this
> because clearly commit_delay is intended to trade off latency for
> throughput. Although having said that, when I worked on commit_delay,
> the average and worse-case latencies actually *improved* for the
> workload in question, which consisted of lots of small write
> transactions. Though I wouldn't be surprised if you could produce a
> reasonable case where latency was hurt a bit, but throughput improved.

Thanks for your reply.

The logic says that latency will be hit when commit_delay is applied,
but I am really interested in why we get an improvement instead.

Can small writes be the reason?

Regards,

Atri


--
Regards,

Atri
l'apprenant


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-27 Thread Peter Eisentraut
On 6/27/13 4:19 AM, Dean Rasheed wrote:
> I'd say there are clearly people who want it, and the nature of some
> of those answers suggests to me that we ought to have a better answer
> in core.

It's not clear what these people wanted this functionality for.  They
all wanted to analyze a table to compare with another table (or the same
table later).  Either they wanted this to detect data changes, in which
case the right tool is a checksum, not a cryptographic hash.  We already
have several checksum implementations in core, so we could expose one of
them.  Or they wanted this to protect their data from tampering, in
which case the right tool is a cryptographic hash, but Noah argues that
a sum of MD5 hashes is not cryptographically sound.  (And in any case,
we don't put cryptographic functionality into the core.)

The reason md5_agg is proposed here and in all those cited posts is
presumably because the md5() function was already there anyway.  The
md5() function is there because the md5 code was already there anyway
because of the authentication.  Let's not add higher-order
already-there-anyway code. ;-)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn

2013-06-27 Thread Tom Lane
Andres Freund  writes:
> On 2013-06-26 20:07:40 -0400, Tom Lane wrote:
>> I still want to delete the test for SOCK_ERRNO == 0.  I traced that back
>> to commit da9501bddb4dc33c031b1db6ce2133bcee7b, but I can't find
>> anything in the mailing list archives to explain that.  I have an
>> inquiry in to Jan to see if he can remember the reason ...

> That looks strange, yes.

Committed that way.  We can always undo it if Jan can defend the logic.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)

2013-06-27 Thread Fabien COELHO


Dear Robert,


>> Here is a v4 that takes into account most of your points: The report is
>> performed for all threads by thread 0, however --progress is not supported
>> under thread fork emulation if there are more than one thread. The report
>> time does not slip anymore.

> I don't believe that to be an acceptable restriction.


The "pthread fork emulation" is just an ugly hack to run pgbench on a host 
that does not have pthreads (portable threads). I'm not sure that it 
applies on any significant system, but I can assure you that it imposes 
severe limitations about how to do things properly in pgbench: As there is 
no threads, there is no shared memory, no locking mecanism, nothing 
really. So it is hard to generated a shared report in such conditions.


My first proposal is to remove the fork emulation altogether, which would 
remove many artificial limitations to pgbench and simplify the code 
significantly. That would be an improvement.


Otherwise, the simplest possible adaptation, if it is required to have the 
progress feature under fork emulation to pass it, is that under "fork 
emulation" each process reports its current progress instead of having a 
collective summing.


Note that it is possible to implement the feature with interprocess 
communications, but really generating many pipes will add a lot of 
complexity to the code, and I do not think that the code nor this simple 
feature deserves that.


Another option is to have each thread report its progress independently 
in all implementations, which is what I did in the first instance. It is 
much less interesting, but it would be homogeneous, although poor, for 
every version.


> We generally require features to work on all platforms we support.  We
> have made occasional compromises there, but generally only when the
> restriction is fundamental to the platform rather than for developer
> convenience.


I agree with this kind of "generally", but please consider that "pthread 
fork emulation" really means "processes", so that simple things with 
threads become significantly more complex to implement.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash partitioning.

2013-06-27 Thread Nicolas Barbier
2013/6/27 Markus Wanner :

> On 06/27/2013 11:12 AM, Nicolas Barbier wrote:
>
>> Imagine that there are a lot of indexes, e.g., 50. Although a lookup
>> (walking one index) is equally fast, an insertion must update all 50
>> indexes. When each index requires one extra I/O (because each index is
>> one level taller), that is 50 extra I/Os. In the partitioned case,
>> each index would require the normal smaller amount of I/Os. Choosing
>> which partition to use must only be done once: The result “counts” for
>> all indexes that are to be updated.
>
> I think you're underestimating the cost of partitioning. After all, the
> lookup of what index to update for a given partition is a a lookup in
> pg_index via pg_index_indrelid_index - a btree index.

I am assuming that this (comparatively very small and super-hot) index
is cached all the time, while for the other indexes (that are
supposedly super-huge) only the top part stays cached.

I am mostly just trying to find out where Yuri’s “partitioning is
needed for super-huge tables” experience might come from, and noting
that Heikki’s argument might not be 100% valid. I think that the
“PostgreSQL-answer” to this problem is to somehow cluster the data on
the “hotness column” (so that all hot rows are close to one another,
thereby improving the efficiency of the caching of relation blocks) +
partial indexes for the hot rows (as first mentioned by Claudio; to
improve the efficiency of the caching of index blocks).

> Additionally, the depth of an index doesn't directly translate to the
> number of I/O writes per insert (or delete). I'd rather expect the avg.
> number of I/O writes per insert into a b-tree to be reasonably close to
> one - depending mostly on the number of keys per page, not depth.

My reasoning was: To determine which index block to update (typically
one in both the partitioned and non-partitioned cases), one needs to
walk the index first, which supposedly causes one additional (read)
I/O in the non-partitioned case on average, because there is one extra
level and the lower part of the index is not cached (because of the
size of the index). I think that pokes a hole in Heikki’s argument of
“it really doesn’t matter, partitioning == using one big table with
big non-partial indexes.”
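
To put rough numbers on that line of thought (a sketch with an assumed
fanout and row counts, not real btree geometry):

#include <math.h>
#include <stdio.h>

static int btree_levels(double rows, double fanout)
{
    /* index height grows with log_fanout(rows) */
    return (int) ceil(log(rows) / log(fanout));
}

int main(void)
{
    double rows = 1e10;   /* a "super-huge" table */
    double fanout = 256;  /* assumed keys per index page */
    int parts = 100;

    printf("one big table:  %d levels per index\n",
           btree_levels(rows, fanout));
    printf("%d partitions: %d levels per index\n",
           parts, btree_levels(rows / parts, fanout));
    return 0;
}

One level less per index, times 50 indexes whose lower levels are not
cached, is where the 50 extra I/Os per insertion mentioned above come
from.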

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group Commits Vs WAL Writes

2013-06-27 Thread Peter Geoghegan
On Thu, Jun 27, 2013 at 12:56 AM, Atri Sharma  wrote:
> Now, with group commits, do we see a spike in that disk write latency,
> especially in the cases where the user has set wal_buffers to a high
> value?

commit_delay exists to artificially increase the window in which the
leader backend waits for more group commit followers. At higher client
counts, that isn't terribly useful because you'll naturally have
enough clients anyway, but at lower client counts particularly where
fsyncs have high latency, it can help quite a bit. I mention this
because clearly commit_delay is intended to trade off latency for
throughput. Although having said that, when I worked on commit_delay,
the average and worse-case latencies actually *improved* for the
workload in question, which consisted of lots of small write
transactions. Though I wouldn't be surprised if you could produce a
reasonable case where latency was hurt a bit, but throughput improved.
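
For anyone who wants to play with the mechanism outside the server, here
is a toy model of the leader/follower dance that commit_delay stretches
(all names here are invented; this is not the backend code):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  flushed = PTHREAD_COND_INITIALIZER;
static long flush_gen = 0;              /* bumped once per group flush */
static long flush_count = 0;            /* how many "fsyncs" we issued */
static int  leader_active = 0;
static const int commit_delay_us = 3000;

static void commit(void)
{
    pthread_mutex_lock(&lock);
    long my_gen = flush_gen;
    if (!leader_active)
    {
        leader_active = 1;
        pthread_mutex_unlock(&lock);
        usleep(commit_delay_us);        /* the commit_delay window */
        /* a real leader would fsync the WAL here, once for everyone */
        pthread_mutex_lock(&lock);
        flush_count++;
        flush_gen++;
        leader_active = 0;
        pthread_cond_broadcast(&flushed);
    }
    else
    {
        /* follower: our commit rides along with the leader's flush */
        while (flush_gen == my_gen)
            pthread_cond_wait(&flushed, &lock);
    }
    pthread_mutex_unlock(&lock);
}

static void *worker(void *arg)
{
    for (int i = 0; i < 10; i++)
        commit();
    return NULL;
}

int main(void)
{
    pthread_t th[8];
    for (int i = 0; i < 8; i++)
        pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < 8; i++)
        pthread_join(th[i], NULL);
    printf("80 commits, %ld group flushes\n", flush_count);
    return 0;
}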

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, Jun 27, 2013 at 11:50:07AM -0400, Christopher Browne wrote:
>> It could be pretty satisfactory to have a simple listing, in the
>> release notes, of the set of reviewers.  That's a lot less
>> bookkeeping than tracking this for each and every change.

> Adding the names to each release note item is not a problem;  the
> problem is the volume of names that overwhelms the release note text.  If
> we went that direction, I predict we would just remove _all_ names from
> the release notes.

Yeah.  Keep in mind that the overwhelming majority of the audience for
the release notes doesn't actually give a darn who implemented what.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Andrew Dunstan


On 06/27/2013 12:12 PM, Tom Lane wrote:
> Bruce Momjian  writes:
>> On Thu, Jun 27, 2013 at 11:50:07AM -0400, Christopher Browne wrote:
>>> It could be pretty satisfactory to have a simple listing, in the
>>> release notes, of the set of reviewers.  That's a lot less
>>> bookkeeping than tracking this for each and every change.
>
>> Adding the names to each release note item is not a problem;  the
>> problem is the volume of names that overwhelms the release note text.  If
>> we went that direction, I predict we would just remove _all_ names from
>> the release notes.
>
> Yeah.  Keep in mind that the overwhelming majority of the audience for
> the release notes doesn't actually give a darn who implemented what.



Maybe we should have a Kudos / Bragging rights wiki page, with a table 
something like this:


   Release
   Feature Name
   Principal Author(s)
   Contributing Author(s)
   Code Reviewer(s)
   Tester(s)


Constructing it going backwards would be an interesting task :-)


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support & snappy prototype

2013-06-27 Thread Hitoshi Harada
On Thu, Jun 27, 2013 at 6:08 AM, Robert Haas  wrote:

> On Wed, Jun 19, 2013 at 3:27 AM, Andres Freund 
> wrote:
> > There will be a newer version of the patch coming today or tomorrow, so
> > there's probably no point in looking at the one linked above before
> > that...
>
> This patch is marked as "Ready for Committer" in the CommitFest app.
> But it is not clear to me where the patch is that is being proposed
> for commit.
>
>
>
No, this conversation is about patch #1153, pluggable toast compression,
which is in Needs Review, and you may be confused with #1127, extensible
external toast tuple support.


-- 
Hitoshi Harada


Re: [HACKERS] Min value for port

2013-06-27 Thread Christopher Browne
On Thu, Jun 27, 2013 at 9:22 AM, Andres Freund wrote:

> On 2013-06-27 15:11:26 +0200, Magnus Hagander wrote:
> > On Thu, Jun 27, 2013 at 2:16 PM, Peter Eisentraut  wrote:
> > > On 6/27/13 6:34 AM, Magnus Hagander wrote:
> > >> Is there a reason why we have set the min allowed value for port to 1,
> > >> not 1024? Given that you can't actually start postgres with a value of
> > >> <1024, shoulnd't the entry in pg_settings reference that as well?
> > >
> > > Are you thinking of the restriction that you need to be root to use
> > > ports <1024?  That restriction is not necessarily universal.  We can let
> > > the kernel tell us at run time if it doesn't like our port.
> >
> > Yes, that's the restriction I was talking about. It's just a bit
> > annoying that if you look at pg_settings.min_value it doesn't actually
> > tell you the truth. But yeah, I believe Windows actually lets you use
> > a lower port number, so it'd at least have to be #ifdef'ed for that if
> > we wanted to change it.
>
> You can easily change the setting on linux as well. And you can grant
> specific binaries the permission to bind to restricted ports without
> being root.
> I don't think the additional complexity to get a sensible value in there
> is warranted.
>

With that large a set of local policies that can change the "usual
< 1024" policy, yep, I agree that it's not worth trying too hard on this
one.

And supposing something like SE-Linux can grant bindings for a particular
user/binary to access a *specific* port, that represents a model that is
pretty incompatible with the notion of a "minimum value."

On the one hand, the idea of having to add a lot of platform-specific
code (which may further be specific to a framework like SE-Linux)
is not terribly appealing.

Further, if the result is something that doesn't really fit with a "minimum,"
is it much worth fighting with the platform localities?

Indeed, I begin to question whether indicating a "minimum" is actually
meaningful.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Bruce Momjian
On Thu, Jun 27, 2013 at 11:50:07AM -0400, Christopher Browne wrote:
> b) It would be a pretty good thing to mention reviewers within commit notes;
> that provides some direct trace-back as to who it was that either validated
> that the change was good, or that let a bad one slip through.
> 
> c) The release notes indicate authors of changes; to have a list of reviewers
> would be a fine thing.
> 
> If it requires inordinate effort to get the reviewers directly attached to 
> each
> and every change, perhaps it isn't worthwhile to go to extreme efforts to that
> end.
> 
> It could be pretty satisfactory to have a simple listing, in the release 
> notes,
> of the set of reviewers.  That's a lot less bookkeeping than tracking this for
> each and every change.

Adding the names to each release note item is not a problem;  the
problem is the volume of names that overwhelms the release note text.  If
we went that direction, I predict we would just remove _all_ names from
the release notes.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Christopher Browne
On Thu, Jun 27, 2013 at 10:37 AM, Robert Haas  wrote:

> On Tue, Jun 25, 2013 at 2:27 PM, Andrew Dunstan 
> wrote:
> > I'd like to see prizes each release for "best contribution" and "best
> > reviewer" - I've thought for years something like this would be worth
> > trying. Committers and core members should not be eligible - this is
> about
> > encouraging new people.
>
> Encouraging new people is good, but recognizing sustained, long-term
> contributions is good, too.  I think we should do more of that, too.
>

Conforming with David Fetter's pointer to the notion that sometimes attempts
to reward can backfire, I'm not sure that it will be super-helpful to
create "special" rewards.

On the other hand, to recognize reviewer contributions in places relevant
to where they take place seems pretty apropos, which could include:

a) Obviously we already capture this in the CommitFest web site (but it's
worth mentioning when trying to do a "census")

b) It would be a pretty good thing to mention reviewers within commit notes;
that provides some direct trace-back as to who it was that either validated
that the change was good, or that let a bad one slip through.

c) The release notes indicate authors of changes; to have a list of
reviewers would be a fine thing.

If it requires inordinate effort to get the reviewers directly attached to
each and every change, perhaps it isn't worthwhile to go to extreme efforts
to that end.

It could be pretty satisfactory to have a simple listing, in the release
notes, of the set of reviewers.  That's a lot less bookkeeping than tracking
this for each and every change.

The statement of such a list is a public acknowledgement of those that help
assure that the quality of PostgreSQL code remains excellent. (And that may
represent a good way to sell this "kudo".)

This allows organizations that are sponsoring PostgreSQL development to
have an extra metric by which *they* can recognize that their staff that
do such work are being recognized as contributors.  It seems to me that
this is way more useful than a free t-shirt or the like.


Re: [HACKERS] fixing pg_ctl with relative paths

2013-06-27 Thread Fujii Masao
On Thu, Jun 27, 2013 at 10:36 AM, Josh Kupershmidt  wrote:
> On Wed, Jun 26, 2013 at 12:22 PM, Fujii Masao  wrote:
>> On Wed, Jun 26, 2013 at 2:36 PM, Hari Babu  wrote:
>>> On June 26, 2013 5:02 AM Josh Kupershmidt wrote:
>>> Thanks for the feedback. Attached is a rebased version of the patch with
>>> the two small issues noted fixed.
>>
>> The following description in the document of pg_ctl needs to be modified?
>>
>> restart might fail if relative paths specified were specified on
>> the command-line during server start.
>
> Right, that caveat could go away.
>
>> +#define DATADIR_SPEC   "\"-D\" \""
>> +
>> +   datadir = strstr(post_opts, DATADIR_SPEC);
>>
>> Though this is a corner case, the patch doesn't seem to handle properly the 
>> case
>> where "-D" appears as other option value, e.g., -k option value, in
>> postmaster.opts
>> file.
>
> Could I see a command-line example of what you mean?

postmaster -k "-D", for example. Of course, it's really a corner case :)

Another corner case is, for example, pg_ctl -D test1 -o "-D test2", 
that is, multiple -D specifications appear in the command-line.

Can we overlook these cases?
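
To make the first corner case concrete, a toy sketch (not pg_ctl code;
the opts line below is made up):

#include <stdio.h>
#include <string.h>

#define DATADIR_SPEC "\"-D\" \""

int main(void)
{
    /* -k's *value* happens to be -D; the real datadir option comes later */
    const char *post_opts = "\"-k\" \"-D\" \"-D\" \"/srv/pgdata\"";

    const char *hit = strstr(post_opts, DATADIR_SPEC);
    printf("first match: %s\n", hit);   /* points at -k's value, not -D */
    return 0;
}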

>> Just idea to work around that problem is to just append the specified -D 
>> option
>> and value to post_opts. IOW, -D option and value appear twice in post_opts.
>> In this case, posteriorly-located ones are used in the end. Thought?
>
> Hrm, I think we'd have to be careful that postmaster.opts doesn't
> accumulate an additional -D specification with every restart.

Yes. Oh, I was thinking that postmaster writes only -D specification which
postmaster actually uses, in the opts file. So that accumulation would not
happen, I thought. But that's not true. Postmaster writes all the specified
arguments in the opts file.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)

2013-06-27 Thread Robert Haas
On Thu, Jun 27, 2013 at 5:49 AM, Dimitri Fontaine
 wrote:
> I think that's a limitation of the old model and we don't want to turn
> templates for extensions into being shared catalogs. At least that's my
> understanding of the design consensus.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-27 Thread Robert Haas
On Thu, Jun 27, 2013 at 7:29 AM, Marko Kreen  wrote:
> On Thu, Jun 27, 2013 at 11:28 AM, Dean Rasheed  
> wrote:
>> On 26 June 2013 21:46, Peter Eisentraut  wrote:
>>> On 6/26/13 4:04 PM, Dean Rasheed wrote:
 A quick google search reveals several people asking for something like
 this, and people recommending md5(string_agg(...)) or
 md5(string_agg(md5(...))) based solutions, which are doomed to failure
 on larger tables.
>>>
>>> The thread discussed several other options of checksumming tables that
>>> did not have the air of a crytographic offering, as Noah put it.
>>>
>>
>> True but md5 has the advantage of being directly comparable with the
>> output of Unix md5sum, which would be useful if you loaded data from
>> external files and wanted to confirm that your import process didn't
>> mangle it.
>
> The problem with md5_agg() is that it's only useful in toy scenarios.
>
> It's more useful to give people a script that does the same sum(hash(row))
> on a dump file than to try to run MD5 on ordered rows.
>
> Also, I don't think anybody actually cares about MD5(table-as-bytes), instead
> people want a way to check if 2 tables, or a table and a dump, are the same.

I think you're trying to tell Dean to write the patch that you want
instead of the patch that he wants.  There are certainly other things
that could be done that some people might sometimes prefer, but that
doesn't mean what he did isn't useful.

That having been said, I basically agree with Noah: I think this would
be a useful extension (perhaps even in contrib?) but I don't think we
need to install it by default.  It's useful, but it's also narrow.
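
For illustration, the order-independent sum-of-hashes approach Marko
alludes to can be sketched like this (a toy, with FNV-1a standing in for
any per-row hash; the rows are made up):

#include <stdint.h>
#include <stdio.h>

static uint64_t fnv1a(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (; *s; s++)
    {
        h ^= (unsigned char) *s;
        h *= 0x100000001b3ULL;
    }
    return h;
}

int main(void)
{
    const char *table_order[] = {"1|alice", "2|bob", "3|carol"};
    const char *dump_order[]  = {"3|carol", "1|alice", "2|bob"};
    uint64_t sum1 = 0, sum2 = 0;

    for (int i = 0; i < 3; i++)
    {
        sum1 += fnv1a(table_order[i]);  /* addition wraps mod 2^64 */
        sum2 += fnv1a(dump_order[i]);
    }
    printf("digests %s\n", sum1 == sum2 ? "match" : "differ");
    return 0;
}

Because the per-row hashes are summed, row order does not matter, which
is what makes table-vs-dump comparisons practical.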

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reduce maximum error in tuples estimation after vacuum.

2013-06-27 Thread Robert Haas
On Thu, Jun 27, 2013 at 7:27 AM, Amit Kapila  wrote:
> Now I can look into it further, I have still not gone through in detail
> about your new approach to calculate the reltuples, but I am wondering
> whether there can be any way with which estimates can be improved with
> different calculation in vac_estimate_reltuples().

I think this is getting at the threshold question for this patch,
which is whether it's really making things better or just moving the
problems around.  I mean, I have no problem accepting that the new
algorithm is (1) reasonably cheap and (2) better in some cases.  But
if it's worse in other cases, which AFAICS hasn't been discussed, then
it's no good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add more regression tests for dbcommands

2013-06-27 Thread Peter Eisentraut
On 6/27/13 10:57 AM, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 6/26/13 12:17 PM, Tom Lane wrote:
>>> (I like to
>>> point at mysql's regression tests, which take well over an hour even on
>>> fast machines.  If lots of tests are so helpful, why is their bug rate
>>> no better than ours?)
> 
>> Tests are not (primarily) there to prevent bugs.
> 
> Really?  Pray tell, what do you think they're for?  Particularly code
> coverage tests, which is what these are?

Tests document the existing functionality of the code, to facilitate
refactoring. (paraphrased, Uncle Bob)

Case in point, the existing ALTER DDL code could arguably use even more
refactoring.  But no one wants to do it because it's unclear what will
break.  With the proposed set of tests (which I have not read to
completion), this could become quite a bit easier, both for the coder
and the reviewer.  But these tests probably won't detect, say, locking
bugs in such new code.  That can only be prevented by careful code
review and a clean architecture.  Perhaps, MySQL has neither. ;-)

Code coverage is not an end itself, it's a tool.

In this sense, tests prevent existing functionality from being broken, which
might be classified as a bug.  But it's wrong to infer that because
system X has a lot of bugs and a lot of tests, having a lot of tests
must be useless.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Documentation/help for materialized and recursive views

2013-06-27 Thread Robert Haas
On Thu, Jun 27, 2013 at 5:29 AM, Magnus Hagander  wrote:
>> The functionality of materialized views will (over time) totally swamp
>> that of normal views, so mixing all the corresponding documentation
>> with the documentation for normal views probably doesn’t make things
>> easier for people that are only interested in normal views.
>
> That's a better point I think. That said, it would be very useful if
> it actually showed up in "\h CREATE VIEW" in psql - I wonder if we
> should just add the syntax to that page, and then link said future
> information on a separate page somehow?

IMHO, it's better to keep them separate; they are very different beasts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group Commits Vs WAL Writes

2013-06-27 Thread Robert Haas
On Thu, Jun 27, 2013 at 3:56 AM, Atri Sharma  wrote:
> When we do a commit, WAL buffers are written to the disk. This has a
> disk latency for the required I/O.

Check.

> Now, with group commits, do we see a spike in that disk write latency,
> especially in the cases where the user has set wal_buffers to a high
> value?

Well, it does take longer to fsync a larger byte range to disk than a
smaller byte range, in some cases.  But it's generally more efficient
to write one larger range than many smaller ranges, so you come out
ahead on the whole.
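
A toy benchmark (not from this thread; file name and sizes are arbitrary)
makes that tradeoff easy to see on real hardware:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define CHUNK 8192
#define NCHUNKS 128

static double elapsed(struct timeval a, struct timeval b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_usec - a.tv_usec) / 1e6;
}

int main(void)
{
    char buf[CHUNK];
    struct timeval t0, t1;
    int fd = open("waltest.tmp", O_CREAT | O_TRUNC | O_WRONLY, 0600);

    memset(buf, 'x', sizeof(buf));

    gettimeofday(&t0, NULL);            /* one fsync per small write */
    for (int i = 0; i < NCHUNKS; i++)
    {
        (void) write(fd, buf, CHUNK);
        fsync(fd);
    }
    gettimeofday(&t1, NULL);
    printf("%d small write+fsync: %.3fs\n", NCHUNKS, elapsed(t0, t1));

    gettimeofday(&t0, NULL);            /* same bytes, one fsync at the end */
    for (int i = 0; i < NCHUNKS; i++)
        (void) write(fd, buf, CHUNK);
    fsync(fd);
    gettimeofday(&t1, NULL);
    printf("1 large batch + fsync:  %.3fs\n", elapsed(t0, t1));

    close(fd);
    unlink("waltest.tmp");
    return 0;
}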

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)

2013-06-27 Thread Robert Haas
On Wed, Jun 26, 2013 at 7:16 AM, Fabien COELHO  wrote:
> Here is a v4 that takes into account most of your points: The report is
> performed for all threads by thread 0, however --progress is not supported
> under thread fork emulation if there are more than one thread. The report
> time does not slip anymore.

I don't believe that to be an acceptable restriction.  We generally
require features to work on all platforms we support.  We have made
occasional compromises there, but generally only when the restriction
is fundamental to the platform rather than for developer convenience.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Error code returned by lock_timeout

2013-06-27 Thread Boszormenyi Zoltan

On 2013-06-27 17:03, Fujii Masao wrote:
> On Thu, Jun 27, 2013 at 2:22 PM, Boszormenyi Zoltan  wrote:
>> Hi,
>>
>> I just realized that in the original incarnation of lock_timeout,
>> I used ERRCODE_LOCK_NOT_AVAILABLE (to be consistent with NOWAIT)
>> but the patch that was accepted into 9.3 contained ERRCODE_QUERY_CANCELED
>> which is the same as for statement_timeout.
>>
>> Which would be more correct?
>
> ISTM that ERRCODE_LOCK_NOT_AVAILABLE is more suitable...


I think so, too. Can someone commit this one-liner to master and 9.3?

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/

--- src/backend/tcop/postgres.c~	2013-06-27 07:05:08.0 +0200
+++ src/backend/tcop/postgres.c	2013-06-27 17:10:28.040972642 +0200
@@ -2900,7 +2900,7 @@
 			DisableNotifyInterrupt();
 			DisableCatchupInterrupt();
 			ereport(ERROR,
-	(errcode(ERRCODE_QUERY_CANCELED),
+	(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
 	 errmsg("canceling statement due to lock timeout")));
 		}
 		if (get_timeout_indicator(STATEMENT_TIMEOUT, true))

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-06-27 Thread Robert Haas
On Tue, Jun 25, 2013 at 4:28 PM, Heikki Linnakangas
 wrote:
>> The only feedback we have on how bad things are is how long it took
>> the last fsync to complete, so I actually think that's a much better
>> way to go than any fixed sleep - which will often be unnecessarily
>> long on a well-behaved system, and which will often be far too short
>> on one that's having trouble. I'm inclined to think Kondo-san
>> has got it right.
>
> Quite possible, I really don't know. I'm inclined to first try the simplest
> thing possible, and only make it more complicated if that's not good enough.
> Kondo-san's patch wasn't very complicated, but nevertheless a fixed sleep
> between every fsync, unless you're behind the schedule, is even simpler.

I'm pretty sure Greg Smith tried it the fixed-sleep thing before and
it didn't work that well.  I have also tried it and the resulting
behavior was unimpressive.  It makes checkpoints take a long time to
complete even when there's very little data to flush out to the OS,
which is annoying; and when things actually do get ugly, the sleeps
aren't long enough to matter.  See the timings Kondo-san posted
downthread: 100ms delays aren't going to let the system recover in any
useful way when the fsync can take 13 s for one file.  On a system
that's badly weighed down by I/O, the fsync times are often
*extremely* long - 13 s is far from the worst you can see.  You have
to give the system a meaningful time to recover from that, allowing
other processes to make meaningful progress before you hit it again,
or system performance just goes down the tubes.  Greg's test, IIRC,
used 3 s sleeps rather than your proposal of 100 ms, but it still
wasn't enough.

> In
> particular, it's easier to tie that into the checkpoint scheduler - I'm not
> sure how you'd measure progress or determine how long to sleep unless you
> assume that every fsync is the same.

I think the thing to do is assume that the fsync phase will take 10%
or so of the total checkpoint time, but then be prepared to let the
checkpoint run a bit longer if the fsyncs end up being slow.  As Greg
has pointed out during prior discussions of this, the normal scenario
when things get bad here is that there is no way in hell you're going
to fit the checkpoint into the originally planned time.  Once all of
the write caches between PostgreSQL and the spinning rust are full,
the system is in trouble and things are going to suck.  The hope is
that we can stop beating the horse while it is merely in intensive
care rather than continuing until the corpse is fully skeletized.
Fixed delays don't work because - to push an already-overdone metaphor
a bit further - we have no idea how much of a beating the horse can
take; we need something adaptive so that we respond to what actually
happens rather than making predictions that will almost certainly be
wrong a large fraction of the time.

To put this another way, when we start the fsync() phase, it often
consumes 100% of the available I/O on the machine, completely starving
every other process that might need any.  This is certainly a
deficiency in the Linux I/O scheduler, but as they seem in no hurry to
fix it we'll have to cope with it as best we can.  If you do the
fsyncs in fast succession (and 100ms separation might as well be no
separation at all), then the I/O starvation of the entire system
persists through the entire fsync phase.  If, on the other hand, you
sleep for the same amount of time the previous fsync took, then on the
average, 50% of the machine's I/O capacity will be available for all
other system activity throughout the fsync phase, rather than 0%.
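
A minimal sketch of that pacing idea (not Kondo-san's actual patch; the
file list here is just a command-line placeholder):

#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
    /* pass the files to fsync on the command line */
    for (int i = 1; i < argc; i++)
    {
        int fd = open(argv[i], O_WRONLY);
        if (fd < 0)
            continue;

        double start = now_sec();
        fsync(fd);
        double took = now_sec() - start;
        close(fd);

        /* the adaptive pause: sleep as long as the fsync itself took */
        if (i < argc - 1)
            usleep((useconds_t) (took * 1e6));
    }
    return 0;
}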

Now, unfortunately, this is still not that good, because it's often
the case that all of the fsyncs except one are reasonably fast, and
there's one monster one that is very slow.  ext3 has a known bad
behavior that dumps all dirty data for the entire *filesystem* when
you fsync, which tends to create these kinds of effects.  But even on
better-behaved filesystem, like ext4, it's fairly common to have one
fsync that is painfully longer than all the others.   So even with
this patch, there are still going to be cases where the whole system
becomes unresponsive.  I don't see any way to to do better without a
better kernel API, or a better I/O scheduler, but that doesn't mean we
shouldn't do at least this much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.3 doc fix] ECPG VAR command does not describe the actual specification

2013-06-27 Thread Michael Meskes
> Looking around the 9.3 doc, I found a small, but not-insignificant
> error in the documentation.

Thanks for finding and fixing. Patch committed.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Error code returned by lock_timeout

2013-06-27 Thread Fujii Masao
On Thu, Jun 27, 2013 at 2:22 PM, Boszormenyi Zoltan  wrote:
> Hi,
>
> I just realized that in the original incarnation of lock_timeout,
> I used ERRCODE_LOCK_NOT_AVAILABLE (to be consistent with NOWAIT)
> but the patch that was accepted into 9.3 contained ERRCODE_QUERY_CANCELED
> which is the same as for statement_timeout.
>
> Which would be more correct?

ISTM that ERRCODE_LOCK_NOT_AVAILABLE is more suitable...

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add more regression tests for dbcommands

2013-06-27 Thread Tom Lane
Peter Eisentraut  writes:
> On 6/26/13 12:17 PM, Tom Lane wrote:
>> (I like to
>> point at mysql's regression tests, which take well over an hour even on
>> fast machines.  If lots of tests are so helpful, why is their bug rate
>> no better than ours?)

> Tests are not (primarily) there to prevent bugs.

Really?  Pray tell, what do you think they're for?  Particularly code
coverage tests, which is what these are?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move unused buffers to freelist

2013-06-27 Thread Kevin Grittner
Andres Freund  wrote:

> I don't think I actually found any workload where the bgwriter
> wrote out a relevant percentage of the necessary pages.

I had one at Wisconsin Courts.  The target database received logical
replication from the 72 circuit court databases (plus a few others)
over a six-connection pool, with about 20 to (at peaks) hundreds of
modifying transactions per second (the average transaction involving
about 20 modifying statements with potentially hundreds of affected
rows), plus maybe 2000 to 3000 queries per second on a 30-connection
pool.  That database wrote about one-third each of its dirty buffers
via checkpoints, the background writer, and backends needing to read
a page.  I shared my numbers with Greg, who I believe used them as
one of his examples for how to tune memory, checkpoints, and the
background writer, so you might want to check with him if you want
more detail.

Of course, we set bgwriter_lru_maxpages = 1000 and
bgwriter_lru_multiplier = 4, and kept shared_buffers at 2GB to hit
that.  Without the reduced shared_buffers and more aggressive
bgwriter we hit the problem with writes overwhelming the RAID
controller's cache and causing everything in the database to
"freeze" until it cleared some cache space.

I'm not saying this invalidates your general argument; just that
such cases do exist.  Hopefully this data point is useful.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add more regression tests for dbcommands

2013-06-27 Thread Peter Eisentraut
On 6/26/13 12:17 PM, Tom Lane wrote:
> (I like to
> point at mysql's regression tests, which take well over an hour even on
> fast machines.  If lots of tests are so helpful, why is their bug rate
> no better than ours?)

Tests are not (primarily) there to prevent bugs.





Re: [HACKERS] Add more regression tests for dbcommands

2013-06-27 Thread Peter Eisentraut
On 6/27/13 10:20 AM, Robert Haas wrote:
> So I'd like to endorse Josh's idea: subject to appropriate review,
> let's add these test cases.  Then, if it really turns out to be too
> burdensome, we can take them out, or figure out a sensible way to
split the suite.  Pushing all of Robins' work into a secondary suite,
> or throwing it out altogether, both seem to me to be things which will
> not be to the long-term benefit of the project.

I agree.  If it gets too big, let's deal with it then.  But let's not
make things more complicated because they *might* get too big later.





Re: [HACKERS] Add more regression tests for CREATE OPERATOR

2013-06-27 Thread Tom Lane
Robins Tharakan  writes:
> 2. Should I assume that all database objects that get created need to be
> dropped explicitly? Or is this point specifically about ROLES?

It's about any global objects (that wouldn't get dropped by dropping the
regression database).  As far as local objects go, there are benefits to
leaving them around, particularly if they present interesting test cases
for pg_dump/pg_restore.

regards, tom lane




Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]

2013-06-27 Thread Peter Eisentraut
On 6/23/13 10:50 PM, Tom Lane wrote:
> It'd sure be interesting to know what the SQL committee's target parsing
> algorithm is.

It's whatever Oracle and IBM implement.

> Or maybe they really don't give a damn about breaking
> applications every time they invent a new reserved word?

Well, yes, I think that policy was built into the language at the very
beginning.





Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Robert Haas
On Tue, Jun 25, 2013 at 2:27 PM, Andrew Dunstan  wrote:
> I'd like to see prizes each release for "best contribution" and "best
> reviewer" - I've thought for years something like this would be worth
> trying. Committers and core members should not be eligible - this is about
> encouraging new people.

Encouraging new people is good, but recognizing sustained, long-term
contributions is good, too.  I think we should do more of that, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] add long options to pgbench (submission 1)

2013-06-27 Thread Robert Haas
On Thu, Jun 27, 2013 at 10:24 AM, Fujii Masao  wrote:
> On Thu, Jun 27, 2013 at 10:02 PM, Robert Haas  wrote:
>> On Tue, Jun 25, 2013 at 3:09 PM, Fabien COELHO  wrote:
>>>> I think --quiet-log should be spelled --quiet.
>>>
>>> ISTM that --quiet usually means "not verbose on stdout", so I added 'log'
>>> because this was specific to the log output, and there may be a need
>>> for a --quiet option for stdout at some time.
>>
>> The output that is quieted by -q is not the log output produced by
>> --log; it's the regular progress output on stdout/stderr.
>>
>> So I changed that, and committed this, with some further cosmetic
>> changes.  I made the formatting of the help message more like psql's
>> help message, including adjusting pgbench to start the description of
>> each option in the same column that psql does.  This got rid of a lot
>> of line breaks and IMHO makes the output of pgbench --help quite a bit
>> more readable.  I made stylistic adjustments to the documentation
>> portion of the patch as well, again to match the markup used for psql.
>
> In help messages:
>
> +  "  -s NUM, --scale NUM  scaling factor\n"
>
> This should be "-s, --scale=NUM" for the sake of consistency with other
> options.

Woops, missed that one.  Fixed, thanks.
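
For readers unfamiliar with the convention being enforced here: an
option taking a required argument is listed once as "-s, --scale=NUM".
A hypothetical getopt_long sketch, not pgbench's actual option table:

#include <getopt.h>
#include <stdio.h>

/* Hypothetical sketch of the documented convention. */
static const struct option long_options[] = {
    {"scale", required_argument, NULL, 's'},
    {"quiet", no_argument, NULL, 'q'},
    {NULL, 0, NULL, 0}
};

static void
print_help(void)
{
    printf("  -s, --scale=NUM          scaling factor\n");
    printf("  -q, --quiet              quieter progress output\n");
}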

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Spin Lock sleep resolution

2013-06-27 Thread Robert Haas
On Wed, Jun 26, 2013 at 5:49 PM, Jeff Janes  wrote:
> On Tue, Jun 18, 2013 at 12:09 AM, Heikki Linnakangas
>  wrote:
>> Jeff's patch seems to somewhat alleviate the huge fall in performance I'm
>> otherwise seeing without the nonlocked-test patch. With the nonlocked-test
>> patch, if you squint you can see a minuscule benefit.
>>
>> I wasn't expecting much of a gain from this, just wanted to verify that
>> it's not making things worse. So looks good to me.
>
> Hi Heikki,
>
> Thanks for trying out the patch.
>
> I see in the commitfest app it is set to "Waiting on Author" (but I don't
> know who "maiku41" is).
>
> Based on the comments so far, I don't know what I should be doing on it at
> the moment, and I thought perhaps your comment above meant it should be
> "ready for committer".
>
> If we think the patch has a risk of introducing subtle errors, then it
> probably can't be justified based on the small performance gains you saw.
>
> But if we think it has little risk, then I think it is justified simply
> based on cleaner code, and less confusion for people who are tracing a
> problematic process.  If we want it to do a random escalation, it should
> probably look like a random escalation to the interested observer.

I think it has little risk.  If it turns out to be worse for
performance, we can always revert it, but I expect it'll be better or
a wash, and the results so far seem to bear that out.  Another
interesting question is whether we should fool with the actual values
for minimum and maximum delays, but that's a separate and much harder
question, so I think we should just do this for now and call it good.
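
For readers following along, the "values" are the spinlock back-off
bounds.  A rough sketch of the kind of randomized escalation being
discussed, with assumed constants in the spirit of s_lock.c, not a
verbatim copy:

#include <stdlib.h>

#define MIN_DELAY_USEC  1000        /* assumed floor: 1 ms */
#define MAX_DELAY_USEC  1000000     /* assumed ceiling: 1 s */

/*
 * Escalate the sleep interval by a random factor between 1x and 2x so
 * that contending processes do not wake up in lockstep.
 */
static int
next_spin_delay(int cur_delay)
{
    if (cur_delay == 0)
        return MIN_DELAY_USEC;

    cur_delay += (int) (cur_delay * ((double) random() / 2147483647.0) + 0.5);
    if (cur_delay > MAX_DELAY_USEC)
        cur_delay = MAX_DELAY_USEC;
    return cur_delay;
}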

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Move unused buffers to freelist

2013-06-27 Thread Andres Freund
On 2013-06-27 09:50:32 -0400, Robert Haas wrote:
> On Thu, Jun 27, 2013 at 9:01 AM, Andres Freund  wrote:
> > Contention-wise I agree. What I have seen is that we have a huge
> > amount of cacheline bouncing around the buffer header spinlocks.
> 
> How did you measure that?

perf record -e cache-misses. If you want more detail, looking at
{L1,LLC}-{load,store}{s,misses} events can sometimes be helpful too.
Also, running perf stat -vvv postgres -D ... for a whole benchmark can
be useful to compare how much a change influences cache misses and such.

For very detailed analysis, running something under valgrind/cachegrind
can be helpful too, but I usually find perf to be sufficient.

> > I have previously added some adhoc instrumentation that printed the
> > amount of buffers that were required (by other backends) during a
> > bgwriter cycle and the amount of buffers that the buffer manager could
> > actually write out.
> 
> I think you can see how many are needed from buffers_alloc.  No?

Not easily correlated with bgwriter activity. If we cannot keep up
because it's 100% busy writing out buffers, I don't have many problems
with that. But I don't think we often are.

> > Problems with the current code:
> >
> > * doesn't manipulate the usage_count and never does anything to used
> >   pages. Which means it will just about never find a victim buffer in a
> >   busy database.
> 
> Right.  I was thinking that was part of this patch, but it isn't.  I
> think we should definitely add that.  In other words, the background
> writer's job should be to run the clock sweep and add buffers to the
> free list.

We might need to split it into two for that: one process to write out
dirty pages, one to populate the freelist.
Otherwise we will probably regularly hit the current scalability issues
whenever we're I/O contended, say during a busy or even immediate
checkpoint.

>  I think we should also split the lock: a spinlock for the
> freelist, and an lwlock for the clock sweep.

Yea, thought about that when writing the thing about the exclusive lock
during the clocksweep.

> > * by far not aggressive enough, touches only a few buffers ahead of the
> >   clock sweep.
> 
> Check.  Fixing this might be a separate patch, but then again maybe
> not.  The changes we're talking about here provide a natural feedback
> mechanism: if we observe that the freelist is empty (or less than some
> length, like 32 buffers?), set the background writer's latch, because
> we know it's not keeping up.

Yes, that makes sense. Also provides adaptability to bursty workloads
which means we don't have too complex logic in the bgwriter for that.
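
As a sketch of that feedback mechanism, with entirely hypothetical
names (nothing like this exists in the tree yet):

#include "postgres.h"
#include "storage/latch.h"

#define FREELIST_LOW_WATERMARK 32   /* assumed threshold from the discussion */

/*
 * Hypothetical: called by a backend after taking a buffer from the
 * freelist; wakes the background writer when the list runs low so it
 * refills the freelist before backends must run the clock sweep
 * themselves.
 */
static void
maybe_wake_bgwriter(int freelist_len, Latch *bgwriter_latch)
{
    if (freelist_len < FREELIST_LOW_WATERMARK)
        SetLatch(bgwriter_latch);
}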

> > There's another thing we could do to noticeably improve scalability of
> > buffer acquiration. Currently we do a huge amount of work under the
> > freelist lock.
> > ...
> > So, we perform the entire clock sweep, until we find a single buffer we
> > can use, inside a *global* lock.  At times we need to iterate over the
> > whole of shared buffers BM_MAX_USAGE_COUNT (5) times till we've pushed
> > down all the usage counts enough (if the database is busy it can take
> > even longer...).
> > In a busy database, where usually all the usage counts are high, the
> > next backend will touch a lot of those buffers again, which causes
> > massive cache eviction & bouncing.
> >
> > It seems far more sensible to only protect the clock sweep's
> > nextVictimBuffer with a spinlock. With some care all the rest can happen
> > without any global interlock.
> 
> That's a lot more spinlock acquire/release cycles, but it might work
> out to a win anyway.  Or it might lead to the system suffering a
> horrible spinlock-induced death spiral on eviction-heavy workloads.

I can't imagine it being worse than what we have today. Also, nobody
requires us to only advance the clock sweep by one page; we can easily
do it, say, 29 pages at a time if we detect the lock is contended.

Alternatively it shouldn't be too hard to make it into an atomic
increment, although that requires some trickery to handle the wraparound
sanely.
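
A sketch of the atomic variant; PostgreSQL had no atomics abstraction
at the time, so this uses a GCC builtin purely for illustration:

/*
 * Hypothetical: advance nextVictimBuffer without any lock.  A 64-bit
 * counter is incremented atomically and taken modulo NBuffers; using
 * 64 bits means the raw counter will not wrap around in practice,
 * which is the "trickery" a 32-bit version would need to handle.
 */
static volatile unsigned long long next_victim = 0;

static int
clock_sweep_next(int NBuffers)
{
    unsigned long long v = __sync_fetch_and_add(&next_victim, 1);

    return (int) (v % (unsigned long long) NBuffers);
}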

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



