Re: [HACKERS] Run pgindent now?

2015-05-26 Thread Aidan Van Dyk
On Tue, May 26, 2015 at 3:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
  Realistically, with merge.conflictstyle = diff3 (why is this not the
  default?), resolving whitespace conflicts that occur when you try to
  cherry-pick is typically not very difficult.

 Really?  The problems I have generally come from places where pgindent
 has changed the line breaks, not just horizontal spacing.  I haven't
 seen anything that copes with this, certainly not git.


If pgindent were easy to run, committers could start complaining when patch
submissions don't abide by the PostgreSQL coding style conventions.

Part of submitting a patch would be making sure that a pgindent run
after the patch has been applied is still a no-op...  A reviewer could
easily check that, and a committer could easily squash the pgindent result
in if they wanted to be nice to a first-time submitter...

If every patch were pgindent clean, then you would never end up with
these huge pgindent commits causing pain...
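
For example, the check could be as mechanical as this sketch (assuming
pgindent and pg_bsd_indent are set up per src/tools/pgindent/README, and
that the tree was pgindent-clean before the patch; the patch name is made up):

  git checkout -b review
  git am ../0001-some-feature.patch      # made-up patch file name
  src/tools/pgindent/pgindent            # re-indent the whole tree
  git diff --exit-code && echo "patch is pgindent-clean"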

a.


Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-27 Thread Aidan Van Dyk
On Wed, Aug 27, 2014 at 3:32 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:


 Hello Andres,

  [...]

 I think you're misunderstanding how spread checkpoints work.


 Yep, definitely:-) On the other hand I thought I was seeking something
 simple, namely correct latency under small load, that I would expect out
 of the box.

 What you describe is reasonable, and is more or less what I was hoping
 for, although I thought that bgwriter was involved from the start and
 checkpoint would only do what is needed in the end. My mistake.


If all you want is to avoid the write storms when fsyncs start happening on
slow storage, can you not just adjust the kernel vm.dirty* tunables to
start making the kernel write out dirty buffers much sooner instead of
letting them accumulate until fsyncs force them out all at once?
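
For what it's worth, the knobs I'm referring to are the Linux VM writeback
tunables; something along these lines (values are purely illustrative, not
recommendations):

  # start background writeback much earlier than the distro defaults
  sysctl -w vm.dirty_background_bytes=67108864    # ~64MB of dirty data
  sysctl -w vm.dirty_bytes=268435456              # throttle writers at ~256MB
  # (or, on setups still using the ratio-based forms)
  sysctl -w vm.dirty_background_ratio=1
  sysctl -w vm.dirty_ratio=5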


a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.


Re: [HACKERS] db_user_namespace a temporary measure

2014-03-11 Thread Aidan Van Dyk
So I'll admit to using it, only in toy setups...

I use it with trust and ident, on local connections though, not password.

I try to keep my laptops clean of mysqld, and I use PG.  And only on my
laptop/PC, I make a database for every user...  And every app gets a
userid and a schema.  Every user gets passwordless access to their
database.  And the userid associates the app, and the defaults that get
used on their connections.
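
Roughly, the toy setup I mean looks like this (a sketch only; names are
made up, and as far as I recall per-database users are created with the
@dbname suffix spelled out):

  # postgresql.conf
  db_user_namespace = on

  # pg_hba.conf -- local connections only, no passwords
  local   all   all   trust

  -- per-database user for one app
  CREATE USER "app1@app1db";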

So, I think it's neat, but wouldn't be put out if it was removed ...


On Tue, Mar 11, 2014 at 9:47 AM, Magnus Hagander mag...@hagander.net wrote:

 On Tue, Mar 11, 2014 at 2:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Magnus Hagander mag...@hagander.net writes:
  On Sun, Mar 9, 2014 at 9:00 PM, Thom Brown t...@linux.com wrote:
  It will be 12 years this year since this temporary measure was
  added.  I'm just wondering, is there any complete solution that
  anyone had in mind yet?  Or should this just be deprecated?

  I'd say +1 to remove it. That would also make it possible to get rid of
  password authentication...

 If we remove it without providing a substitute feature, people who are
 using it will rightly complain.

 Are you claiming there are no users, and if so, on what evidence?


 I am claiming that I don't think anybody is using that, yes.

 Based on the fact that I have *never* come across it on any system I've
 come across since, well, forever. Except once I think, many years ago, when
 someone had enabled it by mistake and needed my help to remove it...

 But we should absolutely deprecate it first in that place. Preferably
 visibly (e.g. with a log message when people use it). That could at least
 get those people who use it to let us know they do, to that way figure out
 if they do - and can de-deprecate it.

 Or if someone wants to fix it properly of course :)

 --
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/




-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.


Re: [HACKERS] drop duplicate buffers in OS

2014-01-16 Thread Aidan Van Dyk
Can we just get the backend that dirties the page to do the posix_fadvise
DONTNEED?

Or have another helper that sweeps the shared buffers and does this
post-first-dirty?
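
For reference, the call being suggested is just this (a minimal sketch, not
how the backend's smgr/fd layer is actually structured):

  /* after the block is clean in the OS cache, hint the kernel to drop its copy */
  #define _XOPEN_SOURCE 600
  #include <fcntl.h>
  #include <unistd.h>

  static int
  drop_os_cache_copy(int fd, off_t blkno, size_t blcksz)
  {
      return posix_fadvise(fd, blkno * (off_t) blcksz, blcksz, POSIX_FADV_DONTNEED);
  }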

a.


On Wed, Jan 15, 2014 at 1:34 PM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Jan 15, 2014 at 1:53 AM, KONDO Mitsumasa
 kondo.mitsum...@lab.ntt.co.jp wrote:
  I create patch that can drop duplicate buffers in OS using usage_count
  algorithm. I have developed this patch since last summer. This feature
 seems to
  be discussed in hot topic, so I submit it more faster than my schedule.
 
  When usage_count is high in shared_buffers, they are hard to drop from
  shared_buffers. However, these buffers wasn't required in file cache.
 Because
  they aren't accessed by postgres(postgres access to shared_buffers).
  So I create algorithm that dropping file cache which is high usage_count
 in
  shared_buffers and is clean state in OS. If file cache are clean state
 in OS, and
  executing posix_fadvise DONTNEED, it can only free in file cache without
 writing
  physical disk. This algorithm will solve double-buffered situation
 problem and
  can use memory more efficiently.
 
  I am testing DBT-2 benchmark now...

 The thing about this is that our usage counts for shared_buffers don't
 really work right now; it's common for everything, or nearly
 everything, to have a usage count of 5.  So I'm reluctant to rely on
 that for much of anything.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company






-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.


Re: [HACKERS] Cause of recent buildfarm failures on hamerkop

2012-09-14 Thread Aidan Van Dyk
On Fri, Sep 14, 2012 at 4:56 AM, Magnus Hagander mag...@hagander.net wrote:

 I assume this means that the git checkout was created with options that
 allowed conversion of text files to \r\n line endings.

If we have text files that we need to be binary equivalents for
the purpose of diffing, we should probably attribute them in git
attributes to make sure they are not considered text autocrlf'able.
It could be as simple as adding:
*.out    -text
*.data   -text
*.source -text
into src/test/regress/.gitattributes

 I'm not sure if we should just write this off as pilot error, or if we
 should try to make the regression tests proof against such things.  If
 the latter, how exactly?

 I don't think we need to make them proof against it. But it wouldn't
 hurt to have a check that threw a predictable error when it happens.
 E.g. a first step in the regression tests that just verifies what kind
 of line endings are in a file. Could maybe be as simple as checking
 the size of the file?

This leads to making sure you keep your verification list in source,
and up-to-date too...

a.



-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-10 Thread Aidan Van Dyk
On Mon, Sep 10, 2012 at 11:12 AM, Gurjeet Singh singh.gurj...@gmail.com wrote:
 On Sun, Sep 2, 2012 at 8:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Notably, while the lack of any background processes is just what you want
 for pg_upgrade and disaster recovery, an ordinary application is probably
 going to want to rely on autovacuum; and we need bgwriter and other
 background processes for best performance.  So I'm speculating about
 having a postmaster process that isn't listening on any ports, but is
 managing background processes in addition to a single child backend.
 That's for another day though.


 Since we are forking a child process anyway, and potentially other auxiliary
 processes too, would it make sense to allow multiple backends too (allow
 multiple local applications connect to this instance)? I believe (I may be
 wrong) that embedded databases (SQLite et al.) use a library interface, in
 that the application makes a library call and waits for that API call to
 finish (unless, of course, the library supports async operations or the
 application uses threads). The implementation you are proposing uses socket
 communication, which lends itself very easily to client-server model, and if
 possible, it should be leveraged to provide for multiple applications
 talking to one local DB.

 I have this use case in mind: An application is running using this
 interface, and an admin now wishes to do some maintenance, or inspect
 something, so they can launch local pgAdmin using the same connection string
 as used by the original application. This will allow an admin to perform
 tuning, etc. without having to first shutdown the application.

 Here's how this might impact the design (I may very well be missing many
 other things, and I have no idea if this is implementable or not):

 .) Database starts when the first such application is launched.
 .) Database shuts down when the last such application disconnects.
 .) Postgres behaves much like a regular Postgres installation, except that
 it does not accept connections over TCP/IP or Unix Domain Sockets.
 .) The above implies that we use regular Postmaster machinery, and not the
 --single machinery.
 .) Second and subsequent applications use the postmaster.pid (or something
 similar) to find an already running instance, and connect to it.
 .) There's a race condition where the second application is starting up,
 hoping to connect to an already running instance, but the first application
 disconnects (and hence shuts down the DB) before the second one can
 successfully connect.

 I haven't thought much about the security implications of this yet. Maybe
 the socket permissions would restrict an unauthorized user from
 connecting to this instance.

That's kind of the reason why I suggested upthread trying to decouple
the *starting* of the backend from the options to PQconnect...

A helper function in libpq could easily start the backend, and
possibly return a conninfo string to give PQconnectdb...

But if they are decoupled, I could easily envision an app that
pauses its use of the backend to allow some other libpq access to
it for a period.

You'd have to trust whatever else you let talk on the FD to the
backend, but it might be useful...
-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-05 Thread Aidan Van Dyk
So, in the spirit of not painting ourselves into a tiny corner here on
the whole single backend and embedded database problem with pg
options, can we generalize this a bit?

Any way we could make psql connect to a given fd, as an option?  In
theory, that could be something opened by some outside-of-PostgreSQL
tunnel with 3rd-party auth in the same app that uses libpq directly,
or it could be an fd prepared by something that specifically launched
a single-backend postgres (as in the case of pg_upgrade: pg_upgrade
itself), and passed to psql, etc., as an option.

In theory, that might even allow the possibility of starting the
single-backend only once and passing it to multiple clients in
succession, instead of having to stop/start the backend between each
client.  And it would allow the possibility of something (pg_upgrade,
or some other application) controlling the start/stop of the backend
outside the libpq connection.

Now, I'm familiar with the abilities related to passing fds around in
Linux, but have no idea if we'd have comparable methods to use on
Windows.
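
(On the Linux side, the mechanism I have in mind is the usual SCM_RIGHTS
dance over a Unix-domain socket; a bare-bones sketch of the sending half,
just to show the plumbing involved -- nothing psql or libpq does today:)

  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  /* pass an already-open fd to the process at the other end of 'sock' */
  static int
  send_fd(int sock, int fd_to_pass)
  {
      char dummy = 'x';
      struct iovec iov = { &dummy, 1 };
      char cbuf[CMSG_SPACE(sizeof(int))];
      struct msghdr msg;
      struct cmsghdr *cmsg;

      memset(&msg, 0, sizeof(msg));
      msg.msg_iov = &iov;
      msg.msg_iovlen = 1;
      msg.msg_control = cbuf;
      msg.msg_controllen = sizeof(cbuf);

      cmsg = CMSG_FIRSTHDR(&msg);
      cmsg->cmsg_level = SOL_SOCKET;
      cmsg->cmsg_type = SCM_RIGHTS;
      cmsg->cmsg_len = CMSG_LEN(sizeof(int));
      memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

      return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
  }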

a.

On Wed, Sep 5, 2012 at 8:11 PM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Sep 5, 2012 at 3:17 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 9/5/12 5:59 PM, Daniel Farina wrote:
 I agree with this, even though in theory (but not in practice)
 creative use of unix sockets (sorry windows, perhaps some
 port-allocating and URL mangling can be done instead) and conventions
 for those would allow even better almost-like-embedded results,
 methinks.  That may still be able to happen.

 Sure, everyone who cares can already do this, but some people probably
 don't care enough.  Also, making this portable and robust for everyone
 to use, not just your local environment, is pretty tricky.  See
 pg_upgrade test script, for a prominent example.

 To my knowledge, no one has even really seriously tried to package it
 yet and then told the tale of woe, and it was an especially
 un-gratifying exercise for quite a while on account of multiple
 postgreses not getting along on the same machine because of SysV
 shmem.

 The bar for testing is a lot different than pg_upgrade (where a
 negative consequence is confusing and stressful downtime), and many
 programs use fork/threads and multiple connections even in testing,
 making its requirements different.

 So consider me still skeptical given the current reasoning that unix
 sockets can't be a good-or-better substitute, and especially
 accounting for programs that need multiple backends.

 --
 fdr






-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] build farm machine using make -j 8 mixed results

2012-09-04 Thread Aidan Van Dyk
On Sep 4, 2012 6:06 PM, Andrew Dunstan and...@dunslane.net wrote:


 Frankly, I have had enough failures of parallel make that I think doing
this would generate a significant number of non-repeatable failures (I had
one just the other day that took three invocations of make to get right).
So I'm not sure doing this would advance us much, although I'm open to
persuasion.

Seeing as most PostgreSQL bugs appear with concurrency, I think we should
leave our default config with 1 for max connections.

;-)

Parallel make failures are bugs in the dependencies as described in our
make files.

For the build phase, I don't recall parallel problems and as a habit I
usually use parallel makes.  I would like that to be supported and I think
I've seen fixes applied when we had issues before.  Cutting build times to
1/2 to 1/4 is good.

Checks and tests are harder because often they can't run in parallel. But
then we shouldn't have them listed as independent prerequisites for
targets.  Ideally.  But fixing it might not be worth the cost since an
acceptable workaround (relying upon make to serialize the test sequences in
the particular order) is pretty painless (so far).

Of course, having the ability to run the tests 8 at a time (or more) and
reduce the time by 80% would be nice. ;-)
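
i.e., something like the following is what I do today, with only the
compile parallelized (the job count is just an example):

  make -j8          # parallel build
  make -j8 install
  make check        # regression tests, left serial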
On Sep 4, 2012 6:06 PM, Andrew Dunstan and...@dunslane.net wrote:


 On 09/04/2012 05:49 PM, Peter Eisentraut wrote:

 On 9/1/12 12:12 PM, Robert Creager wrote:

 I change the build-farm.conf file to have the following make line:

  make = 'make -j 8', # or gmake if required. can include path if
 necessary.

 2 pass, 4 fail.  Is this a build configuration you want to pursue?

 Sure that would be useful, but it's pretty clear that the check stages
 don't work in parallel.  It think it's because the ports conflict, but
 there could be any number of other problems.

 That said, it would be useful, in my mind, to support parallel checks.
 But unless someone is going to put in the work first, you should
 restrict your parallel runs to the build and install phases.




 The buildfarm code doesn't contain a facility to use a different make
 incantation for each step. It's pretty much an all or nothing deal. Of
 course, you can hack run_build.pl to make it do that, but I highly
 discourage that. For one thing, it makes upgrading that much more
  difficult. All the tweaking is supposed to be done via the config file. I
 guess I could add a setting that allowed for per step make flags.

 Frankly, I have had enough failures of parallel make that I think doing
 this would generate a significant number of non-repeatable failures (I had
 one just the other day that took three invocations of make to get right).
 So I'm not sure doing this would advance us much, although I'm open to
 persuasion.

 cheers

 andrew






Re: [HACKERS] Using pg_upgrade on log-shipping standby servers

2012-07-20 Thread Aidan Van Dyk
If you're wanting to automatically do some upgrades, wouldn't an easier route be:

1) Run pg_upgrade up to the point where it actually starts
copying/linking in old cluster data files, and stop the new
postmaster.
2) Take a base-backup-style copy (tar, rsync, $FAVOURITE) of the new
cluster (small, since it has no data files).
3) Have pg_upgrade leave a log of exactly which old cluster data files
go where in the new cluster.

That way, anybody, any script, etc. who wants to make a new standby
from an old one only needs the pg_upgrade base backup (which should
be small, no data, just catalog stuff), and the log of which old files
to move where.

The only pre-condition is that the standby's old pg *APPLIED* WAL up
to the exact same point as the master's old pg.  In that case the
standby's old cluster data files should be the same enough (maybe hint bits
off?) to be used.

a.

On Fri, Jul 20, 2012 at 12:25 PM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jul 17, 2012 at 06:02:40PM -0400, Bruce Momjian wrote:
 Second, the user files (large) are certainly identical, it is only the
 system tables (small) that _might_ be different, so rsync'ing just those
 would add the guarantee, but I know of no easy way to rsync just the
 system tables.

 OK, new idea.  I said above I didn't know how to copy just the non-user
 table files (which are not modified by pg_upgrade), but actually, if you
 use link mode, the user files are the only files with a hard link count
 of 2.  I could create a script that copied from the master to the slave
 only those files with a link count of one.

 --
   Bruce Momjian  br...@momjian.us        http://momjian.us
   EnterpriseDB http://enterprisedb.com

   + It's impossible for everything to be true. +
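
(To illustrate the link-count trick described above -- a sketch only,
assuming pg_upgrade was run in --link mode so user data files have two hard
links and the freshly written catalog files only one; paths and the rsync
destination are made up:)

  cd /path/to/new-cluster/data
  find . -type f -links 1 -print0 |
      rsync -0 --files-from=- -av . standby:/path/to/new-cluster/data/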





-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-12 Thread Aidan Van Dyk
On Thu, Jul 12, 2012 at 9:21 AM, Shaun Thomas stho...@optionshouse.com wrote:

 So far as transaction durability is concerned... we have a continuous
 background rsync over dark fiber for archived transaction logs, DRBD for
 block-level sync, filesystem snapshots for our backups, a redundant async DR
 cluster, an offsite backup location, and a tape archival service stretching
 back for seven years. And none of that will cause the master to stop
 processing transactions unless the master itself dies and triggers a
 failover.

Right, so if the dark fiber between New Orleans and Seattle (pick two
places for your datacenters) happens to be the first thing failing in
your NO data center, you disconnect the sync-ness and continue.  Not a
problem, unless it happens to be Aug 29, 2005.

You have lost data.  Maybe only a bit.  Maybe it wasn't even
important.  But that's not for PostgreSQL to decide.

But because your PG on DRBD continued when it couldn't replicate to
Seattle, it told its clients the data was durable, just minutes
before the whole DC was under water.

OK, so a wise admin team would have removed the NO DC from its
primary role days before that hit.

Change the NO to NYC and the date to Sept 11, 2001.

OK, so maybe we can concede that these types of major catastrophes are
more devastating to us than losing some data.

Now your primary server was in AWS US East last week.  Its sync slave
was in the affected AZ, but your PG primary continues on, until, since
it was an EC2 instance, it disappears.  Now where is your data?

Or the fire marshall orders the data center (or whole building) EPO,
and the connection to your backup goes down minutes before your
servers or other network peers.

 Using PG sync in its current incarnation would introduce an extra failure
 scenario that wasn't there before. I'm pretty sure we're not the only ones
 avoiding it for exactly that reason. Our queue discards messages it can't
 fulfil within ten seconds and then throws an error for each one. We need to
 decouple the secondary as quickly as possible if it becomes unresponsive,
 and there's really no way to do that without something in the database, one
 way or another.

It introduces an extra failure, because it has introduced an extra
data durability guarantee.

Sure, many people don't *really* want that data durability guarantee,
even though they would like the maybe-guaranteed version of it.

But that fine line is actually a difficult (impossible?) one to define
if you don't know, at the moment of decision, what the next few
moments will/could become.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-12 Thread Aidan Van Dyk
On Thu, Jul 12, 2012 at 8:27 PM, Jose Ildefonso Camargo Tolosa

 Yeah, you need that with PostgreSQL, but no with DRBD, for example
 (sorry, but DRBD is one of the flagships of HA things in the Linux
 world).  Also, I'm not convinced about the 2nd standby thing... I
 mean, just read this on the docs, which is a little alarming:

 If primary restarts while commits are waiting for acknowledgement,
 those waiting transactions will be marked fully committed once the
 primary database recovers. There is no way to be certain that all
 standbys have received all outstanding WAL data at time of the crash
 of the primary. Some transactions may not show as committed on the
 standby, even though they show as committed on the primary. The
 guarantee we offer is that the application will not receive explicit
 acknowledgement of the successful commit of a transaction until the
 WAL data is known to be safely received by the standby.

 So... there is no *real* warranty here either... I don't know how I
 skipped that paragraph before today I mean, this implies that it
 is possible that a transaction could be marked as commited on the
 master, but the app was not informed on that (and thus, could try to
 send it again), and the transaction was NOT applied on the standby
 how can this happen? I mean, when the master comes back, shouldn't the
 standby get the missing WAL pieces from the master and then apply the
 transaction? The standby part is the one that I don't really get, on
 the application side... well, there are several ways in which you can
 miss the commit confirmation: connection issues in the worst moment,
 and the such, so, I guess it is not *so* serious, and the app should
 have a way of checking its last transaction if it lost connectivity to
 server before getting the transaction commited.

But you already have that in a single-server situation as well.  There
is a window between when the commit is durable (and visible to
others, and will be committed after recovery from a crash) and when the
client knows it's committed (and it might never get the commit
message due to server crash, network disconnect, client middle-tier
crash, etc.).

So people are already susceptible to that, and defending against it, no? ;-)

And they are susceptible to that if they are on PostgreSQL, Oracle, MS
SQL, DB2, etc.

a.


-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] Synchronous Standalone Master Redoux

2012-07-10 Thread Aidan Van Dyk
On Tue, Jul 10, 2012 at 9:28 AM, Shaun Thomas stho...@optionshouse.com wrote:

 Async is simply too slow for our OLTP system except for the disaster
 recovery node, which isn't expected to carry on within seconds of the
 primary's failure. I briefly considered sync mode when it appeared as a
 feature, but I see it's still too early in its development cycle, because
 there are no degraded operation modes. That's fine, I'm willing to wait.

But this is where some of us are confused with what you're asking for.
Async is actually *FASTER* than sync.  It's got less overhead.
Synchronous replication is basically async replication, with an extra
overhead, and an artificial delay on the master for the commit to
*RETURN* to the client.  The data is still committed and viewable to
new queries on the master, and on the slave, at the same rate as with
async replication.  It is just that the commit status returned to the
client is delayed.

So the "async is too slow" part is what we don't understand.
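
(For concreteness, the only difference between the two on the master is
roughly this -- the standby name is made up; WAL is shipped the same way in
both cases, only the commit acknowledgement waits:)

  # postgresql.conf on the master
  synchronous_standby_names = 'standby1'   # empty string = plain async
  synchronous_commit = on                  # commit waits for the sync standby's ack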

 I just don't understand the push-back, I guess. RAID-1 is the poster child
 for synchronous writes for fault tolerance. It will whine constantly to
 anyone who will listen when operating only on one device, but at least it
 still works. I'm pretty sure nobody would use RAID-1 if its failure mode
 was: block writes until someone installs a replacement disk.

I think most of us in the "synchronous replication must be synchronous
replication" camp are there because the guarantees of a simple RAID 1
just aren't good enough for us ;-)

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] Bug tracker tool we need

2012-07-09 Thread Aidan Van Dyk
On Mon, Jul 9, 2012 at 3:26 PM, Joshua D. Drake j...@commandprompt.com wrote:

 On 07/09/2012 12:02 PM, Josh Berkus wrote:


 Hackers,

 So I want to repeat this because I think we are conflating several uses
 for a bug tracker which aren't the same, and which really need to be
 dealt with seperately.

 -- Better CF App: to track feature submissions, discussion, revisions
 and reviews.

 -- Bug Submitter: easy-access way for users to submit bugs and check on
 their status later.


 Not sure how to handle the first two. Bug submission is always a pita and
 although we could use the fix-bug-later app, it would clutter it as we were
 trying to determine real bugs vs user error.


And whatever you/we do, be *VERY* aware of the
pile-of-...-in-the-bugtracker problem.  I just happened to have Joel
Spolsky's post come through my RSS reader, where he talked about
bugtrackers, and suggested:

- Do not allow more than two weeks (in fix time) of bugs to get into
  the bug database.
- If you have more than that, stop and fix bugs until you feel like
  you're fixing stupid bugs. Then close as "won't fix" everything left
  in the bug database. Don't worry, the severe bugs will come back.

The biggest problem of whatever tool is used for anything is making
sure the tool is useful enough to the people that need to use it to make it
worth their while.

A tracker (of any type) that is even *insanely* useful for users, but
that doesn't give *developers* (note, that's developers, not
managers, or cat-herders, or cheer-leaders) any extra value is bound
to fill up and soon become un-useful even for users...

If you want the developers to use it, it has to be worth their time *to
them* to use it.

Witness the hundreds of graves that are the thousands of bugzilla bugs out
there filed against even active open-source projects.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] Schema version management

2012-07-07 Thread Aidan Van Dyk
On Fri, Jul 6, 2012 at 4:50 PM, Peter Eisentraut pete...@gmx.net wrote:

 I have code in the wild that defines new operators and casts and has no
 C code and is not in an extension and has no business being in an
 extension.

Nobody is claiming that pg_dump shouldn't dump it.

But, since you're using operators, what would you think is an
appropriate name for the file the operator is dumped into?

a.


-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] Schema version management

2012-07-05 Thread Aidan Van Dyk
On Thu, Jul 5, 2012 at 12:57 PM, David E. Wheeler da...@justatheory.com wrote:
 On Jul 5, 2012, at 3:21 PM, Andrew Dunstan wrote:

 No they are not necessarily one logical unit. You could have a bunch of
 functions called, say, equal which have pretty much nothing to do with
 each other, since they refer to different types.

 +1 from me for putting one function definition per file.

 +1 for an option (I prefer one file for my projects, but might need multiple 
 files for other projects).

-1

I'd rather have the few overloaded-functions in one file (hopefully
with deterministic ordering) and a sane, simple filename, than have
every function in every database in a separate file with some strange
mess in the filename that makes me cringe every time I see it.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] Covering Indexes

2012-06-28 Thread Aidan Van Dyk
On Thu, Jun 28, 2012 at 12:12 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 The other question is whether such an index would prevent an update from
 being HOT when the non-indexed values are touched.  That could be a
 significant difference.

I don't see index-only scans being something that will be used in
high-churn tables.

So as long as the value of these covering/included fields is tied to
index-only scans, maybe it isn't a problem?

Of course, we already have a hard time convincing people that the index-only
scans they want can't be index-only because heap pages aren't
all-visible...

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-20 Thread Aidan Van Dyk
On Wed, Jun 20, 2012 at 9:21 AM, Amit Kapila amit.kap...@huawei.com wrote:

 Example Scenario -

 Now assume we have Data files and WAL files intact and only control file is 
 lost.


Just so I understand correctly, the aim of this is to fix the
situation where, out of the thousands of files and 100s of GB of data
in my pg directory, the *only* corruption is that a single file,
pg_control, is missing?

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-20 Thread Aidan Van Dyk
On Wed, Jun 20, 2012 at 3:15 PM, Andres Freund and...@2ndquadrant.com wrote:

 To recap why we think origin_id is a sensible design choice:

 There are many sensible replication topologies where it does make sense that
 you want to receive changes (on node C) from one node (say B) that originated
 from some other node (say A).
 Reasons include:
 * the order of applying changes should be as similar as possible on all nodes.
 That means when applying a change on C that originated on B and if changes
 replicated faster from A-B than from A-C you want to be at least as far with
 the replication from A as B was. Otherwise the conflict ratio will increase.
 If you can recreate the stream from the wal of every node and still detect
 where an individual change originated, thats easy.

OK, so in this case, I still don't see how the origin_id is even enough.

C applies the change originally from A (routed through B, because it's
faster).  But when it gets the change directly from A, how does it
know to *not* apply it again?




 * the interconnects between some nodes may be more expensive than from others
 * an interconnect between two nodes may fail but others dont

 Because of that we think its sensible to be able generate the full LCR stream
 with all changes, local and remote ones, on each individual node. If you then
 can filter on individual origin_id's you can build complex replication
 topologies without much additional complexity.

 I'm not saying that we need to implement all possible conflict
 resolution algorithms right now - on the contrary I think conflict
 resolution belongs outside core - but if we're going to change the WAL
 record format to support such conflict resolution, we better make sure
 the foundation we provide for it is solid.
 I think this already provides a lot. At some point we probably want to have
 support for looking on which node a certain local xid originated and when that
 was originally executed. While querying that efficiently requires additional
 support we already have all the information for that.

 There are some more complexities with consistently determining conflicts on
 changes that happened in a very small timewindown on different nodes but thats
 something for another day.

 BTW, one way to work around the lack of origin id in the WAL record
 header is to just add an origin-id column to the table, indicating the
 last node that updated the row. That would be a kludge, but I thought
 I'd mention it..
 Yuck. The aim is to improve on whats done today ;)

 --
  Andres Freund                     http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services





-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-20 Thread Aidan Van Dyk
On Wed, Jun 20, 2012 at 3:27 PM, Andres Freund and...@2ndquadrant.com wrote:

 OK, so in this case, I still don't see how the origin_id is even enough.

 C applies the change originally from A (routed through B, because it's
 faster).  But when it gets the change directly from A, how does it
 know to *not* apply it again?
 The lsn of the change.

So why isn't the LSN good enough for when C propagates the change back to A?

Why does A need more information than C?

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-20 Thread Aidan Van Dyk
On Wed, Jun 20, 2012 at 3:49 PM, Andres Freund and...@2ndquadrant.com wrote:
 On Wednesday, June 20, 2012 09:41:03 PM Aidan Van Dyk wrote:
 On Wed, Jun 20, 2012 at 3:27 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  OK, so in this case, I still don't see how the origin_id is even
  enough.
 
  C applies the change originally from A (routed through B, because it's
  faster).  But when it get's the change directly from A, how does it
  know to *not* apply it again?
 
  The lsn of the change.

 So why isn't the LSN good enough for when C propagates the change back to
 A?
 Because every node has individual progress in the wal so the lsn doesn't mean
 anything unless you know from which node it originally is.

 Why does A need more information than C?
 Didn't I explain that two mails up?

Probably, but that didn't mean I understood it... I'm trying to keep up here ;-)

So the origin_id isn't strictly for the origin node to know to filter an
LCR it has applied already; it is also there to correlate the LSNs,
because the LSNs of the re-generated LCRs are meant to contain the
originating node's LSN, and every node applying LCRs needs to
be able to know where it is in every node's LSN progress.

I had assumed any LCRs generated on a node would be relative to the
LSN sequencing of that node.

 Now imagine a scenario where #1 and #2 are in Europe and #3 and #4 in north
 america.
 #1 wants changes from #3 and #4 when talking to #3 but not those from #2  and
 itself (because that would be cheaper to get locally).

Right, but if the link between #1 and #2 ever slows down, changes
from #3 and #4 may very well already have #2's changes, and even
require them.

#1 has to apply them, or is it going to stop applying LCRs from #3
when it sees LCRs from #3 coming in that originate on #2 and have
LSNs greater than what it has so far received from #2?


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] synchronous_commit and remote_write

2012-05-08 Thread Aidan Van Dyk
On Tue, May 8, 2012 at 9:13 PM, Bruce Momjian br...@momjian.us wrote:
 It seems pretty confusing that synchronous_commit = 'remote_write' means
 write confirmed to the remote socket, not write to the file system.  Is
 there no better term we could come up with?  remote_pipe?
 remote_transfer?

remote_accept?

And then, I could envision (if it continues down this road):
  off
  local
  remote_accept
  remote_write
  remote_sync
  remote_apply (implies visible to new connections on the standby)

Not saying all of these are necessarily worth it, but they are all
the various stages of WAL processing on the remote...



-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] synchronous_commit and remote_write

2012-05-08 Thread Aidan Van Dyk
On Tue, May 8, 2012 at 10:09 PM, Bruce Momjian br...@momjian.us wrote:

 And then, I could envision (if it continues down this road):
   off
   local
   remote_accept
   remote_write
   remote_sync
   remote_apply (implies visible to new connections on the standby)

 Not saying all off these are necessarily worth it, but they are all
 the various stages of WAL processing on the remote...

 The _big_ problem with write is that we might need that someday to
 indicate some other kind of write, e.g. write to kernel, fsync to disk.

Well, yes, but in the sequence of:
   remote_accept
   remote_write
   remote_sync

it is much more clear...

With a single remote_write, you can't tell just by itself whether it is
intended to be a write *to* the remote, or a write *by*
the remote.  But when combined with the other terms, only one makes sense
in all cases.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Http Frontend implemented using pgsql?

2012-03-31 Thread Aidan Van Dyk
On Sat, Mar 31, 2012 at 6:27 AM, Dobes Vandermeer dob...@gmail.com wrote:
 I had a thought that it might be interesting to have a simple C frontend that
 converts HTTP to and from some pgsql friendly structure and delegates all
 the core logic to a stored procedure in the database.

 This might make it easier to hack on the API without worrying about memory
 management and buffer overflow vulnerabilities.

 Is this a brain wave or a brain fart?

Something along the lines of a stripped down mod_libpq?
   http://asmith.id.au/mod_libpq.html

If we had something along the lines of JSON <-> row/setof conversion in core, I
could see this being a very nice RPC mechanism for PostgreSQL.

Plain HTTP still gives you the session/transaction control problem of
stateless clients, but maybe coupled with PgPool you could cobble
something together...
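
Just to sketch the shape of it (entirely made-up names; a real version
would want a JSON type rather than plain text): the C frontend would parse
the HTTP request and hand everything to a single entry point, with all of
the application logic living in the database:

  CREATE FUNCTION http_dispatch(method text, path text, body text)
  RETURNS text AS $$
  BEGIN
      IF method = 'GET' AND path = '/ping' THEN
          RETURN '{"status": "ok"}';
      END IF;
      RETURN '{"error": "not found"}';
  END;
  $$ LANGUAGE plpgsql;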

a.
-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] [WIP] Double-write with Fast Checksums

2012-01-11 Thread Aidan Van Dyk
On Wed, Jan 11, 2012 at 7:13 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 At the moment, double-writes are done in one batch, fsyncing the
 double-write area first and the data files immediately after that. That's
 probably beneficial if you have a BBU, and/or a fairly large shared_buffers
 setting, so that pages don't get swapped between OS and PostgreSQL cache too
 much. But when those assumptions don't hold, it would be interesting to
 treat the double-write buffers more like a 2nd WAL for full-page images.
 Whenever a dirty page is evicted from shared_buffers, write it to the
 double-write area, but don't fsync it or write it back to the data file yet.
 Instead, let it sit in the double-write area, and grow the double-write
 file(s) as necessary, until the next checkpoint comes along.

Ok, but for correctness, you need to *fsync* the double-write buffer
(WAL) before you can issue the write on the normal datafile at all.

All the double write can do is move the FPW from the WAL stream (done
at commit time) to some other double buffer space (which can be done
at write time).

It still has to fsync the write-ahead part of the double write
before it can write any of the normal part, or you leave the
torn-page possibility.

And you still need to keep all the write-ahead part of the
double-write around until all the normal writes have been fsynced
(checkpoint time) so you can redo them all on crash recovery.

So, I think that the work in double-writes has merit, but if it's done
correctly, it isn't this magic bullet that suddenly gives us atomic,
durable writes for free.

It has major advantages (including, but not limited to):
1) Moving the FPW out of normal WAL/commit processing
2) Allowing fine control of (possibly separate) FPW locations on a per
tablespace/relation basis

It does this by moving the FPW/IO penalty from the commit time of a
backend dirtying the buffer first, to the eviction time of a backend
evicting a dirty buffer.  And if you're lucky enough that the
background writer is the only one writing dirty buffers, you'll see
lots of improvements in your performance (equivalent to running with
current FPW off).  But I have a feeling that many of us see backends
having to write dirty buffers often enough too that the reduction in
commit/WAL latency will be offset (hopefully not as much) by increased
query processing time as backends double-write dirty buffers.


a.



-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] [WIP] Double-write with Fast Checksums

2012-01-11 Thread Aidan Van Dyk
On Wed, Jan 11, 2012 at 7:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 The question is how you prevent torn pages when a slave server crashes
 during replay.  Right now, the presence of FPIs in the WAL stream,
 together with the requirement that replay restart from a checkpoint,
 is sufficient to guarantee that any torn pages will be fixed up.  If
 you remove FPIs from WAL and don't transmit some substitute information,
 ISTM you've lost protection against slave server crashes.

This double-write strategy is all an attempt to make writes durable.
You remove the FPW from the WAL stream only because your writes
are made durable using some other strategy, like the double-write.
Any standby will need to be using some strategy to make sure its
writes are durable, namely, the same double-write.

So on a standby crash, it will replay whatever FPWs it has accumulated in
the double-write buffer to make sure its writes were
consistent.  Exactly as the master would do.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Aidan Van Dyk
On Fri, Jan 6, 2012 at 5:17 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, Jan 6, 2012 at 2:03 PM, Andres Freund and...@anarazel.de wrote:
 The standby can set hint bits locally that weren't set on the data it
 received from the master.  This will require rechecksumming and
 rewriting the page, but obviously we can't write the WAL records
 needed to protect those writes during recovery.  So a crash could
 create a torn page, invalidating the checksum.
 Err. Stupid me, thanks.

 Ignoring checksum errors during Hot Standby operation doesn't fix it,
 either, because eventually you might want to promote the standby, and
 the checksum will still be invalid.
 Its funny. I have the feeling we all are missing a very obvious brilliant
 solution to this...

 Like getting rid of hint bits?

Or even just not bothering to consider them as making buffers dirty,
so the only writes are already protected by the double-write (WAL, or
if they get some DW outside of WAL).

I think I've said it before, but I'm guessing OLTP-style databases
rarely have pages written that are dirty but aren't covered by real
changes (so they have the FPW anyway), and OLAP-type ones generally freeze after
loads to avoid the hint-bit-write penalty too...

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-06 Thread Aidan Van Dyk
On Fri, Jan 6, 2012 at 6:48 PM, Aidan Van Dyk ai...@highrise.ca wrote:

 I think I've said it before, but I'm guessing OLTP style database
 rarely have pages written that are dirty that aren't covered by real
 changes (so have the FPW anyways) and OLAP type generally freeze after
 loads to avoid the hint-bit-write penalty too...

But ya, again, I've never measured ;-)


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Standalone synchronous master

2012-01-04 Thread Aidan Van Dyk
On Tue, Jan 3, 2012 at 9:22 PM, Robert Haas robertmh...@gmail.com wrote:

 It seems to me that if you are happy with #2, you don't really need to
 enable sync rep in the first place.

 At any rate, even without multiple component failures, this
 configuration makes it pretty easy to lose durability (which is the
 only point of having sync rep in the first place).  Suppose the NIC
 card on the master is the failing component.  If it happens to drop
 the TCP connection to the clients just before it drops the connection
 to the standby, the standby will have all the transactions, and you
 can fail over just fine.  If it happens to drop the TCP connection to
 the standby just before it drops the connection to the clients, the standby
 will not have all the transactions, and failover will lose some
 transactions - and presumably you enabled this feature in the first
 place precisely to prevent that sort of occurrence.

 I do think that it might be useful to have this if there's a
 configurable timeout involved - that way, people could say, well, I'm
 OK with maybe losing transactions if the standby has been gone for X
 seconds.  But if the only possible behavior is equivalent to a
 zero-second timeout I don't think it's too useful.  It's basically
 just going to lead people to believe that their data is more secure
 than it really is, which IMHO is not helpful.

So, I'm a big fan of sync rep guaranteeing its guarantees.  To me,
that's the whole point.  Having it fall out of sync rep at any point
*automatically* seems to be exactly counter to the point of sync rep.

That said, I'm also a big fan of monitoring everything as well as I can...

I'd love a hook script that was run if the sync-rep state ever changed
(heck, I'd even like it if it just chose a new sync standby).

Even better, is there a way we could start injecting notify events
into the cluster on these types of changes?  Especially now that
notify events can take payloads, it means I don't have to keep
constantly polling the database to see if it thinks it's connected,
etc.
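
(That is, something along these lines -- channel and payload names made up;
the hook/monitor that would issue the notification is the hypothetical part:)

  -- in the monitoring client
  LISTEN syncrep_state;

  -- wherever the state change is observed
  SELECT pg_notify('syncrep_state', 'standby1 is now the sync standby');
  -- or equivalently:  NOTIFY syncrep_state, 'standby1 is now the sync standby';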

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] alternate psql file locations

2011-12-31 Thread Aidan Van Dyk
On Sat, Dec 31, 2011 at 3:17 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Andrew Dunstan's message of sáb dic 31 12:52:02 -0300 2011:
 It's not a big thing, but I just found myself in a shared environment
 wanting to be able to set alternative locations for the psql startup
 file and history. I know there's the HISTFILE variable, but I can't
 easily set that automatically unless I can at least have my own .psqlrc.
 ISTM it should be a fairly simple thing to provide these, via
 environment variables. Is there general interest in such a thing?

 I wanted such a thing mere two weeks ago ...

Generally when I've wanted these things, I just make a new $HOME in
my shared user home dir:

export HOME=$HOME/aidan

It's worked for things I've wanted, I haven't tried it for psql stuff

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-30 Thread Aidan Van Dyk
On Thu, Dec 29, 2011 at 11:44 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

 You wind up with a database free of torn pages before you apply WAL.
 full_page_writes to the WAL are not needed as long as double-write is
 used for any pages which would have been written to the WAL.  If
 checksums were written to the double-buffer metadata instead of
 adding them to the page itself, this could be implemented alone.  It
 would probably allow a modest speed improvement over using
 full_page_writes and would eliminate those full-page images from the
 WAL files, making them smaller.

Correct.  So now lots of people seem to be jumping on the double-write
bandwagon and looking at some of the things it promises: all writes are
durable.

This solves 2 big issues:
   - Remove torn-page problem
   - Remove FPW from WAL

That up front looks pretty attractive.  But we need to look at the
tradeoffs, and then decide (benchmark anyone).

Remember, postgresql is a double-write system right now.  The 1st,
checksummed write is the FPW in WAL.  It's fsynced.  And the 2nd synced
write is when the file is synced during checkpoint.

So, postgresql currently has an optimization that not every write
has *requirements* for atomic, instant durability.  And so postgresql
gets to do lots of writes to the OS cache and *not* request them to
be instantly synced.  And then at some point, when it's ready to clear
the 1st checksummed write, it makes sure every write is synced.  And lots of
work went into PG recently to get even better at the collection of
writes/syncs that happen at checkpoint time, to take even bigger
advantage of the fact that it's better to write everything in a file
first, then call a single sync.

So moving to this new double-write-area bandwagon, we move from a
"WAL FPW synced at the commit, collect as many other writes, then final
sync" type of system to a system where *EVERY* write requires syncs of 2
separate 8K writes at buffer write-out time.  So we avoid the FPW at
commit (yes, that's nice for latency), and we guarantee every buffer
written is consistent (that fixes our hint-bit-only dirty writes from
being torn).  And we do that at a cost of every buffer write requiring
2 fsyncs, in a serial fashion.  Come checkpoint, I'm wondering...

Again, all that to avoid a single optimization that postgresql currently has:
1) writes for hint-bit-only buffers don't need to be durable

And the problem that optimization introduces:
1) Since they aren't guaranteed durable, we can't believe a checksum



-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] bghinter process

2011-12-21 Thread Aidan Van Dyk
On Wed, Dec 21, 2011 at 1:59 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 But, well, tuples that are succesfully hinted need no more hint bits.

Not only do they need no more hinting, they also allow the next
client-serving process that hits them to avoid the clog lookup needed
to determine the hint.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Page Checksums

2011-12-20 Thread Aidan Van Dyk
On Tue, Dec 20, 2011 at 12:38 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

 I don't think the problem is having one page of corruption.  The
 problem is *not knowing* that random pages are corrupted, and
 living in the fear that they might be.

 What would you want the server to do when a page with a mismatching
 checksum is read?

But that's exactly the problem.  I don't know what I want the server
to do, because I don't know if the page with the checksum mismatch is
one of the 10GB of pages in the page cache that were dirty and poses 0
risk (i.e. hint-bit only changes made it dirty), a page that was
really messed up on the kernel panic that last happened causing this
whole mess, or an even older page that really is giving bitrot...

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Page Checksums

2011-12-18 Thread Aidan Van Dyk
On Sun, Dec 18, 2011 at 11:21 PM, Josh Berkus j...@agliodbs.com wrote:
 On 12/18/11 5:55 PM, Greg Stark wrote:
 There is another way to look at this problem. Perhaps it's worth
 having a checksum *even if* there are ways for the checksum to be
 spuriously wrong. Obviously having an invalid checksum can't be a
 fatal error then but it might still be useful information. Rright now
 people don't really know if their system can experience torn pages or
 not and having some way of detecting them could be useful. And if you
 have other unexplained symptoms then having checksum errors might be
 enough evidence that the investigation should start with the hardware
 and get the sysadmin looking at hardware logs and running memtest
 sooner.

 Frankly, if I had torn pages, even if it was just hint bits missing, I
 would want that to be logged.  That's expected if you crash, but if you
 start seeing bad CRC warnings when you haven't had a crash?  That means
 you have a HW problem.

 As long as the CRC checks are by default warnings, then I don't see a
 problem with this; it's certainly better than what we have now.

But the scary part is you don't know how long *ago* the crash was.
Because a hint-bit-only change w/ a torn-page is a non event in
PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try
and scrub every page in the database.

So you could have a crash, then a recovery, and a couple clean
shutdown-restart combinations before you happen to read the needed
page that was torn in the crash $X [ days | weeks | months ] ago.
It's specifically because PostgreSQL was *DESIGNED* to make torn pages
a non-event (because WAL/FPW fixes anything that's dangerous), that
the whole CRC issue is so complicated...

I'll throw out a few random thoughts (some repeated) that people who
really want the CRC can fight over:

1) Find a way to not bother writing out hint-bit-only-dirty pages
 I know people like Kevin keep recommending a vacuum freeze after a
big load to avoid later problems anyways, and I think that's probably
common in big OLAP shops, and OLTP people are likely to have real
changes on the page anyways.  Does anybody want to try and measure
what type of performance trade-offs we'd really have on a variety of
normal (ya, I know, what's normal) workloads?  If the page has a
real change, it's got a WAL FPW, so we avoid the problem.

2) If the writer/checksummer knows it's a hint-bit-only-dirty page,
can it stuff a cookie checksum in it and not bother verifying?
Loses a bit of the CRC guarantee, especially around crashes, which
is when we expect a torn page, but avoids the whole scary! scary!
Your database is corrupt! false-positives in the situation PostgreSQL
was specifically designed to make not scary.

3) Anybody investigated putting the CRC in a relation fork, but not
right in the data block?  If the CRC contains a timestamp, and is WAL
logged before the write, then at least on reading a block with a wrong
checksum, if a warning is emitted, the timestamp could be looked at by
whoever is reading the warning to know that the block was written
shortly before the crash $X $PERIODS ago.

The whole "a CRC mismatch is only a warning because we expect to get
them if we ever crashed" attitude means that at the time when we most
want them, we have to assume they are bogus...  And to make matters
worse, we don't even know when the period during which they may be
bogus ends, unless we have a way to methodically force PG through every
buffer in the database after the crash...   And that makes them very
hard to consider useful...


a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Aidan Van Dyk
On Wed, Oct 26, 2011 at 7:43 AM, Simon Riggs si...@2ndquadrant.com wrote:

 It's very likely that it's a PostgreSQL problem, though. It's probably
 not a pilot error since it happens even for backups taken with 
 pg_basebackup(),
 so the only explanation other than a PostgreSQL bug is broken hardware or
 a pretty serious kernel/filesystem bug.

 The way forwards here is for someone to show the clog file that causes
 the error and find out why the call to read() fails.

Sorry, I thought the problem was obvious.  Either that, or I've
completely missed something in these threads...  I'll admit to not
following this one very closely anymore...

When the backup started, the clog was small.  So on the recovering
instance, the clog is small.  PostgreSQL is supposed to be able to
deal with any file as it was when the backup starts.

When the backup is stopped, clog is big.  But that file was copied
after the backup was started, not after the backup finished.  So its
size is only guaranteed to be as big as it was when the backup
started.  Recovery is responsible for extending it as it was extended
during the backup period on the master.

The read fails because there is no data at the location it's trying to
read from, because clog hasn't been extended yet by recovery.

a.
-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Aidan Van Dyk
On Wed, Oct 26, 2011 at 9:57 AM, Florian Pflug f...@phlo.org wrote:
 On Oct26, 2011, at 15:12 , Simon Riggs wrote:
 On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk ai...@highrise.ca wrote:

 The read fails because their is no data at the location it's trying to
 read from, because clog hasn't been extended yet by recovery.

 You don't actually know that, though I agree it seems a reasonable
 guess and was my first thought also.

 The actual error message also supports that theory. Here's the relevant
 snippet from the OP's log (Found in 
 ca9fd2fe.1d8d2%linas.virba...@continuent.com)

 2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction 
 1188673
 2011-09-21 13:41:05 CEST DETAIL:  Could not read from file pg_clog/0001 at 
 offset 32768: Success.

 Note that it says Success at the end of the second log entry. That
 can only happen, I think, if we're trying to read the page adjacent to
 the last page in the file. The seek would be successfull, and the subsequent
 read() would indicate EOF by returning zero bytes. None of the calls would
 set errno. If there was a real IO error, read() would set errno, and if the
 page wasn't adjacent to the last page in the file, seek() would set errno.
 In both cases we'd see the corresponding error messag, not Success.

And even more pointedly, in the original go around on this:
   http://article.gmane.org/gmane.comp.db.postgresql.devel.general/174056

He reported that clog/ after pg_start_backup call:
-rw--- 1 postgres postgres 8192 Sep 23 14:31 

Changed during the rsync phase to this:
-rw--- 1 postgres postgres 16384 Sep 23 14:33 

But on the slave, of course, it was copied before it was extended, so it
was the original size (that's ok, that's the point of recovery after
the backup):
-rw--- 1 postgres postgres 8192 Sep 23 14:31 

With the error:
  2011-09-23 14:33:46 CEST FATAL:  could not access status of transaction 37206
  2011-09-23 14:33:46 CEST DETAIL:  Could not read from file
pg_clog/ at offset 8192: Success.

And that error happens *before* recovery can even get attempted.

And if he copied the recent clog file from the master, it did start up.

And I think they also reported that if they didn't run hot standby,
but just normal recovery into a new master, they didn't have the problem
either, i.e. without hot standby, recovery ran, properly extended the
clog, and then ran as a new master fine.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Aidan Van Dyk
On Wed, Oct 12, 2011 at 10:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 - Suppose the table has a million rows and we're going to read 100 of
 them, or 0.01%.  Now it might appear that a covering index has a
 negligible advantage over a non-covering index, but in fact I think we
 still want to err on the side of trying to use the covering index.

 Given that fact pattern we still will, I think.  We'll still prefer an
 indexscan over a seqscan, for sure.  In any case, if you believe the
 assumption that those 100 rows are more likely to be recently-dirtied
 than the average row, I'm not sure why you think we should be trying to
 force an assumption that index-only will succeed here.

The elephant in the room is that the index-only-scan really doesn't
save a *whole* lot if the heap pages are already in shared buffers.
But it matters a *lot* when the heap pages are not in shared buffers
(both ways, saving IO, or causing lots of random IO).

Can we hope that if pages are not in shared buffers, they are not
recently modified, so hopefully both all-visible and have the VM
bit set?  Or does the table-based nature of vacuum mean there is no
value there?

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] [BUGS] *.sql contrib files contain unresolvable MODULE_PATHNAME

2011-10-12 Thread Aidan Van Dyk
On Wed, Oct 12, 2011 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Dunstan and...@dunslane.net writes:
 *shrug* ok. Another thought I had was to have the file raise an error
 and have that filtered out by the extension mechanism. But I'm not sure
 if it's worth the trouble.

 Hmm ...

 \echo You should use CREATE EXTENSION foo to load this file!

 and teach CREATE EXTENSION to drop any line beginning with \echo?
 The latter part seems easy enough, but I'm not quite sure about the
 wording or placement of the \echo command.  Putting it at the top
 feels natural but the message might scroll offscreen due to errors...

Decorate them with a marker like:
   \extension name version

And make the CREATE EXTENSION skip (or verify) it?

It will make psql stop on the \extension command.

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Bug with pg_ctl -w/wait and config-only directories

2011-10-03 Thread Aidan Van Dyk
On Mon, Oct 3, 2011 at 7:10 PM, Andrew Dunstan and...@dunslane.net wrote:

 Agreed.  If you remove that, the logical problem goes away and it
 becomes a simple problem of dumping the contents of postgresql.conf and
 having pg_ctl (and pg_upgrade) use that.  Let me look at how much code
 that would take.


 Yeah, this pattern can be changed to have a config file that reads:

   data_directory = '/path/to/data'
   include '/path/to/common/config'

 and I presume (or hope) that would meet your need, and not upset the FHS
 purists.

I kinda like the way the Debian (and Ubuntu) packages do it...

They start pg_ctl/postgres like:
... -D /path/to/real-data/data-dir -c
config_file=/etc/postgresql/$INSTANCE/postgresql.conf

In /etc/postgresql/$INSTANCE/postgresql.conf, these are explicitly set:
  data_directory=/path/to/real-data/data-dir
  hba_file=/etc/postgresql/$INSTANCE/pg_hba.conf
  ident_file=/etc/postgresql/$INSTANCE/pg_ident.conf
  external_pid_file=/var/run/postgresql/$INSTANCE.pid

It actually looks in /etc/postgresql/$INSTANCE/postgresql.conf to find
data_directory to use when invoking pg_ctl/postgres.

But, in my opinion, there is enough flexibility with postgresql's
config (and ability to pass unrecorded options to postmaster at
startup too) that pg_upgrade can't guarantee it's going to figure out
everything automatically given a single $pgdata location to start
from.  That's simply not realistic.  Distros that do stranger things
than Debian (and probably even Debian) are going to have to give their
users guidance on how to call pg_upgrade with their specific setup of
paths/configs/invocations.  It's simply that simple.

I'd be happy enough if pg_upgrade could easily upgrade given a
datadir that had a postgresql.conf in it, or possibly a
postgresql.conf that had data_directory set in it.

Anything else, and I say it's the responsibility of whoever scripted the
startup to be able to provide all the necessary information to
pg_upgrade (be it by extra command line options, roughly as sketched
below, or by crafting a special pg_data with symlinks that looks more
normal).
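
As a rough sketch of the "extra command line options" route (the
Debian-ish paths here are made up, and the exact flags should be
checked against the pg_upgrade docs):

pg_upgrade \
  -b /usr/lib/postgresql/8.4/bin  -B /usr/lib/postgresql/9.1/bin \
  -d /var/lib/postgresql/8.4/main -D /var/lib/postgresql/9.1/main \
  -o "-c config_file=/etc/postgresql/8.4/main/postgresql.conf" \
  -O "-c config_file=/etc/postgresql/9.1/main/postgresql.conf"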

a.
-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Aidan Van Dyk
On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 Unfortunately, it's impossible, because the error message Could not read
 from file pg_clog/0001 at offset 32768: Success is shown (and startup
 aborted) before the turn for redo starts at message arrives.

 It looks to me that pg_clog/0001 exists, but it shorter than recovery
 expects. Which shouldn't happen, of course, because the start-backup
 checkpoint should flush all the clog that's needed by recovery to disk
 before the backup procedure begins to them.

I think the point here is that recovery *never starts*.  Something in
the standby startup is looking for a value in a clog block that
recovery hadn't had a chance to replay (produce) yet.

So the standby is looking into the data directory *before* recovery
has had a chance to run, and based on that, goes to look for something
in a clog page that wasn't guaranteed to exist at the start of the
backup period, and bombs out before recovery has a chance to start
replaying WAL and write the new clog page.


a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] unite recovery.conf and postgresql.conf

2011-09-21 Thread Aidan Van Dyk
On Wed, Sep 21, 2011 at 1:13 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Sep 21, 2011 at 1:08 PM, Josh Berkus j...@agliodbs.com wrote:
 On 9/21/11 10:07 AM, Robert Haas wrote:
 On Wed, Sep 21, 2011 at 1:03 PM, Josh Berkus j...@agliodbs.com wrote:
 Yeah, I get it.  But I think standby would confuse them, too, just in
 a different set of situations.

 Other than PITR, what situations?

 Hot backup?

 Hot backup == PITR.   You're just not bothering to accumulate WAL logs.

 Well, I don't think of it that way, but YMMV, of course.

I think that the major differentiating factor is the intended action
when caught up, and the definition of caught up, and trying to use a
single term for both of them is going to always cause confusion.

So I tend to think of the use cases by their continuation.  A
slave is intended to continually keep trying to get more once it's
retrieved and applied all the changes it can.  It can be hot, or cold,
streaming, or archive, etc...  And recovery is intended to stop
recovering and become normal once it's finished retrieving and
applying all changes it can.  Again, it has multiple ways to retrieve
its WAL too.

And I think Tom touched on this point in the
recovery.conf/recovery.done thread a bit too.  Maybe we need to
really start talking about the different "when done, do ..."
distinctions, and using that distinction to help our nomenclature.

Both recovery/slave (whether hot or cold) use the same retrieve/apply
machinery (and thus configuration options).  But because of the
different caught-up action, they are different features.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] unite recovery.conf and postgresql.conf

2011-09-21 Thread Aidan Van Dyk
On Wed, Sep 21, 2011 at 1:34 PM, Aidan Van Dyk ai...@highrise.ca wrote:

 And I think Tom touched on this point in the
 recovery.conf/recovery.done thread a bit too.

Doh!  That's this thread

/me slinks away, ashamed for not even taking a close look at the to/cc list...

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] What Would You Like To Do?

2011-09-14 Thread Aidan Van Dyk
On Wed, Sep 14, 2011 at 12:09 PM, Jaime Casanova ja...@2ndquadrant.com wrote:

 last time i tried it (last year), it seems broken because i couldn't
 log in with any user anymore... but it could be that i did something
 wrong so i didn't report until i could confirm but i hadn't the time
 and i forgot it since then

I haven't tried it on 9.0/9.1, but I used it on an 8.4 cluster, and it
worked, with all the caveats of needing all the user@database users
created correctly, and the right use of quoting, and @ in logins,
etc.  The biggest being the lack of md5...

Definitely not straightforward, and users are still global, just
suffixed with an @database to make them unique between database
namespaces.

But I found it useful when needing to hand out separate usernames
for different apps because they all needed to have their own
search_path and other settings set before login (yes, dumb apps,
mostly ODBC), and to be able to have the same userid for different
databases, using different settings...
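
A minimal sketch of that kind of setup (names are made up, and this
assumes db_user_namespace = on in postgresql.conf, with the md5 caveat
above still applying):

psql -d postgres -c 'CREATE USER "app1@salesdb" LOGIN'
psql -d postgres -c 'ALTER USER "app1@salesdb" SET search_path = app1, public'
# the app connects with the bare name; the server appends "@salesdb" itself
psql -d salesdb -U app1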

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Patch to improve reliability of postgresql on linux nfs

2011-09-13 Thread Aidan Van Dyk
On Tue, Sep 13, 2011 at 7:30 AM, Florian Pflug f...@phlo.org wrote:


 Sorry for the self-reply. I realized only after hitting send that I
 got the ENOSPC handling wrong again - we probably ought to check for
 ENOSPC as well as ret == 0. Also, it seems preferable to return the
 number of bytes actually written instead of -1 if we hit an error during
 retry.

 With this version, any return value other than amount signals an
 error, the number of actually written bytes is reported even in the
 case of an error (to the best of pg_write_nointr's knowledge), and
 errno always indicates the kind of error.

Personally, I'd think that's ripe for bugs.   If the contract is that
ret != amount is the error case, then don't return -1 for an error
*sometimes*.

If you sometimes return -1 for an error, even though ret != amount is
the *real* test, I'm going to guess there will be lots of chances for
code to do:
  if (pg_write_no_intr(...) < 0)
   ...

which will only catch some of the errors, and happily continue with the rest...

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Patch to improve reliability of postgresql on linux nfs

2011-09-13 Thread Aidan Van Dyk
On Tue, Sep 13, 2011 at 10:14 AM, Florian Pflug f...@phlo.org wrote:

 Personally, I'ld think that's ripe for bugs.   If the contract is that
 ret != amount is the error case, then don't return -1 for an error
 *sometimes*.

 Hm, but isn't that how write() works also? AFAIK (non-interruptible) write()
 will return the number of bytes written, which may be less than the requested
 number if there's not enough free space, or -1 in case of an error like
 an invalid fd being passed.

Looking through the code, it appears as if all the write calls I've
seen are checking ret != amount, so it's probably not as big a deal
for PG as I fear...

But the subtle change in semantics (with the system write(), ret != amount
is not necessarily a real error, hence no errno set; with pg_write,
ret != amount only happens after a real error, so errno should be set)
is one that could yet lead to confusion.

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] postgresql.conf archive_command example

2011-09-08 Thread Aidan Van Dyk
On Thu, Sep 8, 2011 at 2:05 AM, Fujii Masao masao.fu...@gmail.com wrote:

 That's an option. But I don't think that finding an existing file is so 
 serious
 problem. The most common cases which cause a partially-filled archived
 file are;

 1. The server crashes while WAL file is being archived, and then the server
    restarts. In this case, the restarted server would find partially-filled
    archived file.

 2. In replication environment, the master crashes while WAL file is being
    archived, and then a failover happens. In this case, new master would
    find partially-filled archived file.

 In these cases, I don't think it's so unsafe to overwrite an existing file.

Personally, I think both of these show examples of why PG should be
looking hard at either providing a simple, robust, local-directory-based
archive_command, or very seriously pointing users at properly written
tools like OmniPITR, PITRTools, walmgr, etc...

Neither of those cases should ever happen.  If you're copying a file
into the archive, and making it appear non-atomically in your archive,
you're doing something wrong.

Period.

No excuses.
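
For what it's worth, the usual shell-level trick for a local archive is
copy-to-temp-then-rename, so a segment never shows up half-written.  A
minimal sketch only (paths are made up; a real tool would also fsync the
file and the directory, and handle errors properly):

#!/bin/sh
# used as:  archive_command = '/usr/local/bin/archive-wal %p %f'
set -e
src="$1"; dst="/archive/$2"
[ ! -f "$dst" ] || exit 1     # refuse to overwrite an existing segment
cp "$src" "$dst.tmp"
sync                          # crude; fsync of file + directory is the real thing
mv "$dst.tmp" "$dst"          # rename makes it appear in the archive atomically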

a.
-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] mosbench revisited

2011-08-04 Thread Aidan Van Dyk
On Wed, Aug 3, 2011 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote:

  And hoping that the Linux guys decide to do something about it.
  This isn't really our bug - lseek is quite cheap in the uncontended
 case.

Has anyone tried this on a recent kernel (i.e. 2.6.39 or later), where
they've finally removed the BKL from VFS/inode?

I mean, complaining about scalability in linux 2.6.18 is like
complaining about scalability in postgresql 8.2 ;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] New partitioning WAS: Check constraints on partition parents only?

2011-07-28 Thread Aidan Van Dyk
On Thu, Jul 28, 2011 at 12:53 PM, Josh Berkus j...@agliodbs.com wrote:
 Robert,

 If the value is less than v1, put it in a partition called p1.
 If the value is less than v2, put it in a position called p2.
 repeat ad nauseum, and then, optionally:
 If the value is not less than any of the above, put it in a partition
 called poverflow.

 Sure.  I'm just restarting the discussion from the point of what's the
 very simplest implementation of partitioning we could create and still
 be useful?

 Second, the key-based partitioning I described would actually be
 preferred to what you describe by a lot of users I know, because it's
 even simpler than what you propose, which means less contract DBA work
 they have to pay for to set it up.

But part of the desire for simple partitioning is to make sure the
query planner and executor know about partitions, and can exclude
unnecessary partitions from queries.  If partition knowledge doesn't
help the query plans, it's not much use except to reduce table size,
which isn't a hard task with the current inheritance options.

But if the partition selection is an opaque simple key type
function, you haven't given the planner/executor anything better to be
able to pick partitions for queries, unless the query is an exact key
= type of operation.

So I'm failing to see the benefit of that key-based partitioning,
even if that key-based function was something like date_trunc on a
timestamp...
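
Just to make that concrete, here's roughly what explicit bounds already
buy the planner today with plain inheritance (names are made up, and
it's run through psql here purely for illustration):

psql <<'SQL'
CREATE TABLE events (id bigint, created date);
CREATE TABLE events_2011_07
  (CHECK (created >= '2011-07-01' AND created < '2011-08-01')) INHERITS (events);
CREATE TABLE events_2011_08
  (CHECK (created >= '2011-08-01' AND created < '2011-09-01')) INHERITS (events);
SET constraint_exclusion = partition;
-- the planner can prove events_2011_08 can't match and skip it;
-- an opaque "key" function gives it nothing to prove that with
EXPLAIN SELECT * FROM events WHERE created = '2011-07-15';
SQL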



a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] FOREIGN TABLE doc fix

2011-06-13 Thread Aidan Van Dyk
On Mon, Jun 13, 2011 at 12:30 PM, Robert Haas robertmh...@gmail.com wrote:

 Incidentally, are you planning to revive the PostgreSQL FDW for 9.2?
 That would be a killer feature.

Even more killer would be if it could be built/packaged as an
extension, and used for 9.1 too ;-)

a.



-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] FOREIGN TABLE doc fix

2011-06-13 Thread Aidan Van Dyk
On Mon, Jun 13, 2011 at 3:54 PM, Dave Page dp...@pgadmin.org wrote:

 Yeah - MySQL is one of the ones I've been hacking on. It's hard to be
 motivated if its going to need a complete rewrite within a year
 though. I'll still have to work on it, as I've committed to giving
 talks on it, but others might not bother to even start.

It's a double-edged sword.  If nobody writes anything, because
everyone is afraid of possibly having to change things, nothing will
ever need to be changed ;-)

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] SSI predicate locking on heap -- tuple or row?

2011-05-23 Thread Aidan Van Dyk
On Mon, May 23, 2011 at 2:26 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:


 I don't see that -- it could be correct because of the conceptual
 difference between an UPDATE and a DELETE/INSERT pair.

 In other words, if SSI needs to be more rigorous in the UPDATE
 case, it can only be because snapshot isolation is less rigorous in
 that case, and the additional rigor that SSI must apply there must
 be exactly equal to whatever snapshot isolation isn't picking up
 (as compared with the DELETE/INSERT case).

 Does that make any sense? It seems logical to me, but IJWH.

 I've always loved logic, but one of the most intriguing aspects is
 identifying the unproven assumptions in an argument.  You have a
 built-in premise that there is no significant difference between an
 UPDATE and a DELETE/INSERT pair, in which case the logic is flawless
 which is leading you to the conclusion that a lock on the visible
 tuple is enough.  I'm not confident in that premise, so the simple
 argument doesn't persuade me.

I *think* (but am by no means familiar with SSI, or an expert on the
problems it's trying to solve) that Robert was only arguing that SSI
is only relevant to solve problems that plain snapshot isolation
wouldn't catch.  And the sameness of UPDATE vs DELETE+INSERT is simply
because if you can only see the data as it was *completely before* or
*completely after* a transaction (not as it was after the delete, before
the insert), then to you, it doesn't matter if the transaction did an
UPDATE, or a DELETE+INSERT.  All you see is either $OLDROW, or
$NEWROW, depending on whether you see it before, or after, not the
transformation from $OLDROW to $NEWROW.

So, if SSI conflicts something on the UPDATE case, it would necessarily
have to conflict the DELETE+INSERT case as well, and vice-versa.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] branching for 9.2devel

2011-04-25 Thread Aidan Van Dyk
On Mon, Apr 25, 2011 at 11:32 AM, Christopher Browne cbbro...@gmail.com wrote:

 Methinks there'd need to be an experiment run where pgindent is run
 each time on some sort of parallel tree for a little while, to let
 people get some feel for what changes it introduces.

The point is that if the tools worked everywhere, the same, then
it should be run *before* the commit is finalized (git has a
hundred+1 ways to get this to happen, be creative; one is sketched below).

So if you ever ran it on a $COMMIT from the published tree, it would
never do anything.
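
One of those hundred+1 ways, purely as a sketch (the pgindent invocation
and the typedefs-list handling are hand-waved here):

#!/bin/sh
# .git/hooks/pre-commit -- refuse commits that are not pgindent clean
files=$(git diff --cached --name-only --diff-filter=ACM -- '*.c' '*.h')
[ -z "$files" ] && exit 0
src/tools/pgindent/pgindent $files
if ! git diff --quiet -- $files; then
    echo "pgindent reindented: $files -- review and git add before committing" >&2
    exit 1
fi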

From the sounds of it though, it's not quite ready for that.

 Unfortunately, I'd fully expect there to be some interference between patches.

 Your patch changes the indentation of the code a little, breaking the
 patch I wanted to submit just a little later.  And, by the way, I had
 already submitted my patch.  So you broke my patch, even though mine
 was contributed first.

But if the only thing changed was the indentation level (because
$PATCH2 wrapped a section of code your $PATCH1 changes completely in a
new block, or removed a block level), git tools are pretty good at
handling that.

So, if everything is *always* pgindent clean, that means your new
patch is too, and the only conflicting white-space-only change would
be a complete block-level indentation (easily handled).  And you still
have those block-level indentation changes even if not using pgindent.

Of course, that all depends on:
1) pgindent working everywhere, exactly the same
2) Discipline of all new published commits being pgindent clean.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Extension Packaging

2011-04-25 Thread Aidan Van Dyk
On Mon, Apr 25, 2011 at 12:00 PM, David E. Wheeler da...@kineticode.com wrote:

 These are really great points. I knew I wasn't thrilled about this suggest, 
 but wasn't sure why. Frankly, I think it will be really confusing to users 
 who think they have FooBar 1.2.2 installed but see only 1.2 in the database. 
 I don't think I would do that, personally. I'm much more inclined to have the 
 same extension version everywhere I can.

Really, that means you just add a SQL function to your extension,
something similar to uname -a, or rpm -qi, which includes something
that is *forced* to change the postgresql catalog view of your
extension every time you ship a new version (major, or patch), and
then you get the exact version (and whatever else you include) for
free every time you update ;-)
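
Something like this is all it takes (extension and function names are
made up; squirted in via psql here just to show it, though it would
really live in the extension's script):

psql -d mydb <<'SQL'
-- re-created by every release of the hypothetical FooBar extension,
-- so the catalogs always carry the exact patch level
CREATE OR REPLACE FUNCTION foobar_build_info() RETURNS text
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT 'FooBar 1.2.2 (built 2011-04-25)'::text $$;
SQL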

The thing to remember is that the postgresql extensions are managing
the *postgresql catalogs* view of things, even though the shared
object used by postgresql to provide the particular catalog's
requirements can be fixed.

If your extension is almost exclusively a shared object, and the only
catalog things are a couple of functions defined to point into the C
code, there really isn't anything catalog-wise that you need to
manage for upgrades.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] pgindent weirdness

2011-04-20 Thread Aidan Van Dyk
On Wed, Apr 20, 2011 at 12:38 PM, Andrew Dunstan and...@dunslane.net wrote:

 So in the case at hand, we actually *need* to remove the struct from
 RelationGetBufferForTuple's declaration, so that BulkInsertStateData
 gets used as a typedef name in that way.

Since the general form seems to be to declare things as:
   typedef struct foo { ... } foo;

Is there any reason why we see any struct foo in the sources other
than in the typedef line?

Legacy and invasive patches are good enough reasons, if that's what it is...

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] pgbench \for or similar loop

2011-04-19 Thread Aidan Van Dyk
On Tue, Apr 19, 2011 at 1:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I think what that really translates to is I don't want to bother doing
 the careful design work that Robert talked about. -1 for that approach.

As someone not doing any of that work, agreed ;-)

 I generally feel that such a feature would be better off done
 server-side --- after all, there's more clients in the world than psql
 and pgbench, and not all of them could use a C library even if we had
 one.  But in either case the coding work is going to be dwarfed by the
 design work, if it's done right and not just the-first-hack-that-
 comes-to-mind.

And for the first-hack-that-comes-to-mind, I find myself pulling
out the named fifo trick all the time, and just leaving my for/loop/if
logic in bash, writing SQL commands to the fifo, occasionally getting
psql to write an answer to a file that I then read back in bash...

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] pgbench \for or similar loop

2011-04-19 Thread Aidan Van Dyk
On Tue, Apr 19, 2011 at 1:57 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Aidan Van Dyk ai...@highrise.ca wrote:

 And for the first-hack-that-comes-to-mind, I find my self
 pulling out the named fifo trick all the time, and just leaving my
 for/loop/if logic  in bash writing SQL commands to the fifo,
 occasionally getting psql to write an answer to a file that I then
 read back in bash

 I'm not clear on exactly what you're proposing there, but the thing
 I've considered doing is having threads to try to keep a FIFO queue
 populated with a configurable transaction mix, while a configurable
 number of worker threads pull those transactions off the queue and
 submit them to the server.  The transactions would need to be
 scripted in some way such that they could query a value and then use
 it in another statement, or use flow control for conditional
 execution or looping.  And, of course, there would need to be a way
 do define conditions under which a transaction would roll back and
 retry from the beginning -- with the retry being a separate count
 and the failed attempt not counted in the TPS numbers.

 It would take that much infrastructure to have a performance test
 which would give numbers which would correspond well to an actual
 production load in our environment.  It still wouldn't be quite as
 good as actually logging production activity and playing it back,
 but it would come pretty close with a lot less work per test.

Well, I don't think I'm doing anything nearly as complicated as what
you're thinking...

I'm talking about simple stuff like:

mkfifo psql.fifo
exec 4<> psql.fifo      # hold the fifo open for writing so psql never sees EOF
psql < psql.fifo &
...
for i in $(seq 1 1000)
do
echo "SELECT 1;" >&4
done

Couple that with:
   echo '\o /path/to/some/file' >&4
and other \settings, and I can use bash for all my logic, and just
feed lines/statements to psql to have them executed as I wish, with
output directed/formatted as I wish...


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] pg_upgrade bug found!

2011-04-09 Thread Aidan Van Dyk
On Sat, Apr 9, 2011 at 7:03 AM, Bruce Momjian br...@momjian.us wrote:
 Bruce Momjian wrote:
 Alvaro Herrera wrote:
 
  Why is it important to have the original pg_clog files around?  Since
  the transactions in question are below the freeze horizon, surely the
  tuples that involve those transaction have all been visited by vacuum
  and thus removed if they were leftover from aborted transactions or
  deleted, no?  So you could just fill those files with the 0x55 pattern
  (signalling all transactions are committed) and the net result should
  be the same.  No?
 
  Forgive me if I'm missing something.  I haven't been following this
  thread and I'm more than a little tired (but wanted to shoot this today
  because I'm gonna be able to, until Monday).

 To answer your other question, it is true we _probably_ could assume all
 the rows were committed, except that again, vacuum might not have run
 and the pages might not be full so single-page cleanup wasn't done
 either.

OK, continuing the thought of just marking all the old clog files as
all-committed...

Since it only affects toast tables, the only time the system (with
normal queries) would check for a particular toast tuple, the tuple
referring to it would have been committed, right?  So forcing all
transactions committed for the older clog segments might mean a scan
on a *toast* heap might return tuples as committed when they might
have been aborted, but the real table heap would never refer to those,
right?
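
For what it's worth, Alvaro's 0x55 fill is a one-liner per segment
(assuming the usual 8kB blocks and 32 pages per clog segment, i.e.
256kB files; the segment name here is just an example):

dd if=/dev/zero bs=8192 count=32 2>/dev/null | tr '\000' '\125' > pg_clog/0000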

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Extensions Dependency Checking

2011-04-05 Thread Aidan Van Dyk
On Tue, Apr 5, 2011 at 4:20 PM, David E. Wheeler da...@kineticode.com wrote:
 On Apr 4, 2011, at 3:57 PM, Tom Lane wrote:

 I think the general movement is toward *feature* dependancies.  So for
 intstance, an extension can specify what *feature* it requires, and
 difference versions of an extension can provide different
 features.

 Right.

 Sounds like a book-keeping nightmare for extension developers. It will 
 discourage large or rapidly-evolving extensions like pgTAP because it will be 
 a PITA to specify features.

Sure, but if you want, the feature you can provide can be something like:
   pgtap-1.0 (or any of pgtap-0.2{0,1,2,3,4}).

And if your package is backwards compatible, it could even provide:
   pgtap-0.25
   pgtap-0.24
   pgtap-0.23

And that also means that you don't have to screw everybody over when
some future pgtap-123.45 is no longer compatible, and the extensions
have relied on $VERSION > 0.23 meaning they'll work with it.

I mean, PG itself is an example.  Does pg > 8.4 mean your code will
work with all future (or even past, but > 8.4) PG versions?

 We're not there yet, and we're not going to get there in time for 9.1.
 But in any case, mechanisms that involve version ordering comparisons
 seem to be on their way out for deciding whether package A is
 compatible with package B.

 This is news to me, frankly, and the bookkeeping requirements seem 
 potentially awful.

 If it's possible that it won't work out this way, that those arguing for 
 version dependency resolution end up getting the consensus, not having a 
 version string format is going to be a nightmare. On the other hand, if we 
 added one now, and feature dependency tracking won the day, well, a version 
 string format could always be loosened later.

As someone who has had to try and deal with package versions for
dependencies in RPM and DEB, and been through the hell that is open
source package variants, all with the ability to turn on/off features
at configure/compile time, just versions, even with <, >, =, <=, >=
all mapped correctly, aren't good enough.

Of course, I'd love for extensions in 9.1 to provide a basic
provides/features list for my extension to give, but if that train has
already left the station, I don't have much choice ;-(

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Extensions Dependency Checking

2011-04-05 Thread Aidan Van Dyk
On Tue, Apr 5, 2011 at 4:51 PM, David E. Wheeler da...@kineticode.com wrote:

 Of course, I'ld love for extension in 9.1 to provide a basic
 provides/features for my extension to give, but if that train has
 already left the station, I don't have much choice ;-(

 Yeah, but the way it is doesn't break the ability to do it later. I suspect 
 that Dim and Tom will be thinking about it for 9.2.

 Anyway, your post helps me to understand things better, and so I'm less 
 insistent about imposing a version numbering scheme now (though I still think 
 it would be more useful to have one than not).

Versions are useful for figuring out if I should upgrade packages or
not.  But I believe the extension framework has explicitly made the
upgrade problem a manual one at this point, either taking
destination versions from the control file, or from the ALTER command.

So for PGXN's problem, I see the point of versions being required.
But for the installation dependency graph, provides/features rather
than versions are much more useful.  And automatic feature/provides
(like library .so and symbol versions in the OS package world,
objects in the PG world) would definitely be nice, but my Makefile can
build those for me for now until 9.2 (or 9.3, 9.4, etc.), if only I had
a way to track them with my installed extension ;-) /stop begging

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] .ini support for .pgpass

2011-04-05 Thread Aidan Van Dyk
On Tue, Apr 5, 2011 at 6:34 PM, Joshua D. Drake j...@commandprompt.com wrote:

 Bare, useful, but not really friendly nor flexible. I would love to be
 able to do this:

 [ecom]
 hostname=
 port=
 database=
 username=
 password=

That looks a lot like a pg_service file.

 psql ecom

 boom, I am in.

 Thoughts?

So you're really looking to make psql use service connection
definitions more easily, not just retrieve the password associated
with the given (maybe defaulted) host:port:database:user, right?
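
For what it's worth, the service half of that already exists today;
something like this in ~/.pg_service.conf (names are made up):

[ecom]
host=db1.example.com
port=5432
dbname=ecom
user=ecom_app

and then "psql service=ecom" (or PGSERVICE=ecom in the environment)
gets you most of the way there, with the password still looked up in
~/.pgpass.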

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Extensions Dependency Checking

2011-04-04 Thread Aidan Van Dyk
On Mon, Apr 4, 2011 at 6:06 PM, Robert Haas robertmh...@gmail.com wrote:

 I don't.  We deliberately decided *not* to have any wired-in
 interpretation of extension numbers, and I don't think that decision
 needs to be reversed.  David can choose to enforce something for stuff
 distributed through PGXN if he wishes, but that's no concern of the core
 server's.  In particular I'm really skeptical of the theory that we need
 or should want version restrictions in Requires references.  The
 equivalent feature in RPM is deprecated for Fedora/RedHat packaging use,
 and I see no reason why we'd need it more than they do.

 Oh, really?  How can you possibly get by without it?  Dependencies of
 this type are all over the place.

I think the general movement is toward *feature* dependencies.  So for
instance, an extension can specify what *feature* it requires, and
different versions of an extension can provide different
features.

In that case, you don't really need extension foo > 2.1, you need the
feature that foo 2.1.x provides, maybe foo-api-2 (note that 2 would
be part of a name, not any comparison-aware version).

I'm already going to be naming my extensions with major versions
as part of the name (like all the distro postgresql packages) so my
versions will only ever be simple integers of exactly compatible
objects.

But checking 
http://developer.postgresql.org/pgdocs/postgres/extend-extensions.html,
I don't see any provides mechanism.  That might be something
actually needed if we are trying to avoid version comparisons and
want to be describing actual dependencies...

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-22 Thread Aidan Van Dyk
On Fri, Mar 18, 2011 at 3:41 PM, Markus Wanner mar...@bluegap.ch wrote:
 On 03/18/2011 08:29 PM, Simon Riggs wrote:
 We could do that easily enough, actually, if we wished.

 Do we wish?

 I personally don't see any problem letting a standby show a snapshot
 before the master.  I'd consider it unneeded network traffic.  But then
 again, I'm completely biased.

In fact, we *need* to have standbys show a snapshot before the master.

By the time the master acks the commit to the client, the snapshot
must be visible to all clients connected to both the master and the
synchronous slave.

Even with just a single-server postgresql cluster, other
clients (backends) can see the commit before the committing client
receives the ACK.  It's just that on a single server, the time period
for that is small.

Sync rep increases that time period by the length of time from when
the slave reaches the commit point in the WAL stream to when its ACK
of that point gets back to the WAL sender.  Ideally, that ACK time is
small.

Adding another round trip in there just for a "go almost to $COMMIT,
ok, now go to $COMMIT" type of WAL/ack is going to be pessimal for
performance, and still not improve the *guarantees* it can make.

It can only slightly reduce, but not eliminate, that window where the
master has WAL that the slave doesn't, and without a complete
elimination (where you just switch the problem to be the slave has the
data that the master doesn't), you haven't changed any of the
guarantees sync rep can make (or not).

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.




Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19

2011-03-16 Thread Aidan Van Dyk
On Wed, Mar 16, 2011 at 8:30 PM, Robert Haas robertmh...@gmail.com wrote:

 I think the most important part of all this is that it is logged.
 Anyone who is running synchronous replication should also be doing
 careful monitoring; if not, shame on them, because if your data is
 important enough that you need synchronous replication, it's surely
 important enough to watch the logs.  If you don't, all sorts of bad
 things can happen to your data (either related to sync rep, or
 otherwise) and you'll have no idea until it's far too late.

+

If your data is that important, your logs/monitoring are *equally*
important, because they are what give you confidence your data is as
safe as you think it is...


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-07 Thread Aidan Van Dyk
On Mon, Mar 7, 2011 at 2:21 PM, Andrew Dunstan and...@dunslane.net wrote:

 For me, that's enough to call it synchronous replication. It provides a
 useful guarantee to the client. But you could argue for an even stricter
 definition, requiring atomicity so that if a transaction is not successfully
 replicated for any reason, including crash, it is rolled back in the master
 too. That would require 2PC.


 My worry is that the stricter definition is what many people will expect,
 without reading the fine print.

Then they are either already hosed or already using 2PC.

a.
-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-07 Thread Aidan Van Dyk
On Mon, Mar 7, 2011 at 2:29 PM, Aidan Van Dyk ai...@highrise.ca wrote:

 They they are either already hosed or already using 2PC.

Sorry, to expand on my all-too-brief comment: even *without*
replication, they are hosed.

Once you issue commit, you have no knowledge of whether the commit is
durable (or even possibly seen by someone else) until you get the
acknowledgement of the commit.

That's already a possibility with a single-machine database.  Adding
replication to it just increases the period that window exists for
(and the possibility of things making something Bad hit that window).

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Quick Extensions Question

2011-03-03 Thread Aidan Van Dyk
On Thu, Mar 3, 2011 at 4:30 PM, Robert Haas robertmh...@gmail.com wrote:

 So what?  AFAIK the extension patch hasn't broken anything here that
 used to work.  People can still install languages the way they always
 have.  What we're talking about here is a way of installing languages
 that is arguably nicer than what they are doing now.  The window for
 feature enhancements is already closed until 9.2, unless you want to
 go back and start working through every patch we marked Returned with
 Feedback during this last CommitFest.

No, what is being talked about isn't intended as a way of installing
languages that is ... nicer.  What is being talked about is allowing
an extension that is being installed to know that it's going to blow up
because its required language (plpgsql, for instance) isn't
installed.

Maybe it's a problem with extensions that isn't easily solvable, but
that means extension authors are going to have a readme in their
extension with the following text:
   EXTENSION mystuff requires that pl/pgsql be installed in the
   database.  There is no way for the extension to check this before
   it is installed, so make sure it's installed, or be prepared to
   cope with errors during the installation.

   And make sure you don't try to drop the pl/pgsql language when
   the extension is installed either.


Maybe that's enough for 9.1.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Aidan Van Dyk
On Wed, Mar 2, 2011 at 2:30 PM, Fujii Masao masao.fu...@gmail.com wrote:

 1. The primary is running with allow_standalone_primary = on. There
    is only one (synchronous) standby connected.

OK.  Explicitly configured to allow the master to report as committed
stuff which isn't on a/any slave.

 7. New primary doesn't have some transactions committed to the
    client, i.e., transaction lost happens!!

And this is a surprise?

I'm not saying there isn't a better way to to sequence/control a
shutdown to make this risk less, but isn't that the whole point of the
allow_standalone_primary debate/option?

If there isn't a sync slave for whatever reason, just march on, I'll
deal with the transactions that are committed and not replicated some
other way.

I guess complaining that it shouldn't be possible to just march on
when no sync slave is available is one possible way of dealing
with them ;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] ALTER EXTENSION UPGRADE, v3

2011-02-11 Thread Aidan Van Dyk
On Fri, Feb 11, 2011 at 6:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 No --- in the current vision, a control file may describe a whole
 collection of versions of the same extension, and the parameter in
 question is selecting the default or preferred version to install.
 I'm not wedded to default_version, but I think just plain version
 is a misnomer.

As someone who wants to use extensions and packages (rpm/dpkg)
together to distribute PG database pieces, I think this multi-version
approach is going to be problematic.

Here's why.

I release extension afoo, initially as version 1.0.  From my
understanding, it's going to contain:
afoo control file (named something particular)
  - default_version = 1.0
  - encoding utf8
afoo-1.0.sql installation script
and any required shared libraries

And I now release an updated version 1.1 which fixes a problem.  No problem:
   afoo control file:
     - default_version = 1.1
     - encoding utf8
   afoo-1.1.sql installation
   afoo-upgrade-1.0-1.1.sql upgrade script
   any required shared libraries for afoo-1.


Now, I decide to add some major new changes to my afoo for version 2.
I'd like to package it up:
   afoo control file
     - default_version = 2.0
     - encoding utf8
   afoo-2.0.sql installation
   afoo-upgrade-1.1-2.0.sql upgrade script
   Any new shared libraries for afoo-2.

This gives my first problem.  I can't package afoo-2.x separately from
afoo-1.x, because they both want to write the afoo control file.
RPM/DPKG will cause me grief here.

But now, let's make it harder.  I've found a grave bug in 1.1, which
causes the PG backend to segfault.  Easy fix, good thing, so now I
release 1.2:
  afoo control file
- default_version = 1.2
- encoding utf8
  afoo-1.2.sql installation
  afoo-upgrade-1.0-1.1.sql upgrade
  afoo-upgrade-1.1-1.2.sql upgrade
  any shared libraries for afoo-1

So, this is not a problem for upgrading 1.0/1.1 to 1.2.  But if I have
1.1 on my system, and let's say I forced a 2.0 into the system
(telling dpkg/rpm to overwrite the common file), I'm going to do that
again here now with 1.2, and my afoo control file will have
default_version = 1.2 instead of the 2.0.

So, I'm not even worrying about the in-database side of the
multi-versions (although I definitely want the ability to have
multiple versions in the same database), but we're not even going to
be able to get the files onto the system to support multiple versions
nicely.

So this is going to drive me in the same direction the same problem drove
packages for rpm/dpkg.  I'm going to have to name my extensions
afoo-1 and afoo-2 to be able to have them both co-exist on the
filesystem independently, and at that point, *I* don't need multiple
versions of it anymore.  I'm going to keep the same extension
objects/libraries backwards compatible, and I just need a way to tell
PG to run something after I've replaced the shared libraries to
perform any upgrade tweaks.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] ALTER EXTENSION UPGRADE, v3

2011-02-11 Thread Aidan Van Dyk
On Fri, Feb 11, 2011 at 7:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 This gives my first problem.  I can't package afoo-2.x seperately from
 afoo-1.x, because they both want to write the afoo control file.

 No, you ship *one* package that supports both 1.1 and 2.0.

Hm...  As an example of a project that generally has pretty good
software release practices, I'm glad that the PostgreSQL project
doesn't operate this way.

Having to download/install/upgrade a package with all of pg
9.1.$latest and 9.0.$latest just to get a fix for 8.4.$latest would
be a bit of a bummer...

And as a hopeful extension author/packager/user, I *want* to be able to
release/distribute different versions separately, just like PostgreSQL
does.  And I'll do that by packaging my extension with a major
version in the name, much like the packages for PostgreSQL do.  But
once I've done that, I don't need the multiple extension versions; all
I need is the ability to run $something when I upgrade an extension,
once the files under it have been upgraded.

;-)

 But now, let's make it harder.  I've found a grave bug in 1.1, which
 causes the PG backend to segfault.  Easy fix, good thing, so now I
 release 1.2:

 Unless the bug is such that you have to change the installation script
 file, there is no reason to bump the version number at all.  These
 version numbers apply to the install SQL script, not the underlying .so.

Right.  If everything is exactly binary compatible and it's just a .so
fix, I don't need to.  But let's assume something like Slony (or
Bucardo, or Londiste, or PgQ, or PostGIS) starts trying to make use
of extensions.  I can very much see a bug-fix minor version upgrade
changing things that might need triggers/etc to be altered to take
advantage of the fixed way of doing things.  Or an SQL view/function
had a bug with null handling in joins that needs fixing, etc.  Lots
of reasons for an upgrade to need to change an SQL object.

And of course, if I have Slony 1.2.$x replicating one of my databases,
I'd love to be able to try Slony 2 and have it packaged on my system
too, to test something else.   And not have to upgrade my Slony 2
instance just to get the critical bugfix for my production Slony
1.2.$x+1.

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] ALTER EXTENSION UPGRADE, v3

2011-02-11 Thread Aidan Van Dyk
On Fri, Feb 11, 2011 at 7:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 If you were expecting this proposal to make things easier as far as
 dealing with multiple major releases, sorry, our ambitions don't extend
 that far yet.

Sorry, I might have been confusing here...  I'm not talking about *PG*
major releases.

I'm talking about major releases of my extensions.  So, assuming I
only care about PG 9.1, but I have afoo-1.x and afoo-2.x that I
develop and release (much like PostgreSQL has the 8.4.x and 9.0.x
releases), I want to be able to provide a bug fix for my afoo-1.x
extension, and not require that, in order to get that bug fix, users
also need to get the latest 2.x installed as well (which may or may
not be in use elsewhere in the cluster, or by a 2nd cluster on the
same machine).

Or, similarly, if I have a master type branch of an extension in use
in my qa DB, upgrading it requires forcing an upgrade of the old 8.4
branch extension in use in my prod database, simply because the
extension infrastructure has forced extension authors to only be able
to release a single extension that always packages the latest of
all back branches...

Of course, it won't, because just like the RPM/DPKG situation,
packagers are going to put the major version number into the
extension name to avoid that.

So, I like that the attempt is to support multiple versions.  But
unless you can manage the files (both shared libraries, and any
scripts to create/update SQL objects) for different versions
independently, I can't see the multiple-versions-at-once capabilities
that are being discussed actually being used by anything more
than the most basic extensions...

Just like if I need a bugfix of PostgreSQL 8.4, I'm not forced to
*install* 9.0, because PG has decided that the proper way to release
isn't to make a single release of all versions.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] ALTER EXTENSION UPGRADE, v3

2011-02-10 Thread Aidan Van Dyk
On Thu, Feb 10, 2011 at 9:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Well, the difference is that loose objects are just on my system,
 whereas extensions are supposed to work on anybody's system.  I'm not
 clear that it's possible to write an extension that depends on a
 relocatable extension in a sensible way.  If it is, objection
 withdrawn.

 I don't deny that there are risks here.  But I think the value of being
 able to move an extension when it is safe outweighs the difficulty that
 sometimes it isn't safe.  I think we can leave making it safer as a
 topic for future investigation.

Personally, I'd rather be able to install the *same*
extension/version in different schemas at the same time than move an
extension from one schema to another, although I have no problem with
extensions moving out from under a function's feet (just like loose
objects).

a.



-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] ALTER EXTENSION UPGRADE, v3

2011-02-02 Thread Aidan Van Dyk
On Wed, Feb 2, 2011 at 12:31 PM, David E. Wheeler da...@kineticode.com wrote:


 They are identical except for the extra line in the second one. If I had, say 
 15 different versions of an extension, then I'd have 15 upgrade scripts. 
 That's fine. But in your plan, the script to upgrade from version 1 to 
 version 15 would have all the same code as the v14 script, plus any 
 additional. The v14 script would have everything in v13. v13 would have 
 everything in v12. With no support for the equivalent of psql's \i, that's 
 extremely redundant and a huge PITA to maintain. Hence my hate.

 My proposal would also have 15 upgrade scripts, but each one would only 
 upgrade from the previous one. So to upgrade from v1 to v15, UPGRADE 
 EXTENSION would run all of them. So v15 would only need to have deltas from 
 v14. V14 would need only deltas from v13. Etc.

My concern with this approach (upgrade is forced through all
intermediary versions) is that the shared library for version 15 now
*has* to have all the intermediary compatibility for *all* versions
in it.  So it has to have functions with all symbols so the CREATE
... statements for all previous 15 versions can succeed.

With the $old-to-$new scripts, the new .so only needs to have
enough functions that the DROPs work and the new CREATE... statements work.
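
(To illustrate what I mean by the .so keeping old symbols -- a minimal,
hypothetical C sketch, not taken from any real extension: the 1.x SQL
scripts reference afoo_distance, so the 2.0 library keeps that symbol
as a thin wrapper while the new SQL objects point at afoo_distance_v2.
The function names and signatures here are made up.)

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    /* New implementation that the 2.0 SQL script points at. */
    PG_FUNCTION_INFO_V1(afoo_distance_v2);
    Datum
    afoo_distance_v2(PG_FUNCTION_ARGS)
    {
        int32   a = PG_GETARG_INT32(0);
        int32   b = PG_GETARG_INT32(1);

        PG_RETURN_INT32(a > b ? a - b : b - a);
    }

    /* Old symbol kept only so the CREATE FUNCTION statements replayed
     * by the 1.x install/upgrade scripts still resolve against this .so. */
    PG_FUNCTION_INFO_V1(afoo_distance);
    Datum
    afoo_distance(PG_FUNCTION_ARGS)
    {
        return afoo_distance_v2(fcinfo);
    }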

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Sync Rep for 2011CF1

2011-01-21 Thread Aidan Van Dyk
On Fri, Jan 21, 2011 at 11:59 AM, Simon Riggs si...@2ndquadrant.com wrote:

 We all think our own proposed options are the only reasonable thing, but
 that helps us not at all in moving forwards. I've put much time into
 delivering options many other people want, so there is a range of
 function. I think we should hear from Aidan first before we decide to
 remove that aspect.

Since invited, I'll describe what I *want* do to do.  I understand I
may not get it ;-)

When no sync slave is connected, yes, I want to stop things hard.  I
don't mind read-only queries working, but what I want to avoid (if
possible) is having the master do lots of inserts/updates/deletes for
clients, fsyncing them all to disk (so on some strange event causing
recovery they'll be considered committed) and just delaying the commit
return until it has a valid sync slave connected and caught up again.
And *I*'d prefer that client transactions get errors right away rather
than begin to hang if a sync slave is not connected.

Even with a single server, there's the window where stuff could be
committed but the client not notified yet.  And that leads to
transactions which need to be verified.  And with sync rep, that
window gets a little larger.  But I'd prefer not to make it a hangar
door, *especially* when it gets flung open at the point where the shit
has hit the fan and we're in the midst of switching over to manual
processing...

So, in my case, I'ld like it if PG couldn't do anything to generate
any user-initiated WAL unless there is a sync slave connected.  Yes, I
understand that leads to hard-fail, and yes, I understand I'm in the
minority, maybe almost singular in that desire.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Sync Rep for 2011CF1

2011-01-21 Thread Aidan Van Dyk
On Fri, Jan 21, 2011 at 1:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 21, 2011 at 12:23 PM, Aidan Van Dyk ai...@highrise.ca wrote:
 When no sync slave is connected, yes, I want to stop things hard.

 What you're proposing is to fail things earlier than absolutely
 necessary (when they try to XLOG, rather than at commit) but still
 later than what I think Simon is proposing (not even letting them log
 in).

 I can't see a reason to disallow login, because read-only transactions
 can still run in such a situation --- and, indeed, might be fairly
 essential if you need to inspect the database state on the way to fixing
 the replication problem.  (Of course, we've already had the discussion
 about it being a terrible idea to configure replication from inside the
 database, but that doesn't mean there might not be views or status you
 would wish to look at.)

And just disallowing new logins is probably not even enough, because
it allows currently logged-in clients forward progress, leading
towards an eventual hang (with now-committed data on the master).

Again, I'm trying to stop forward progress as soon as possible when
a sync slave isn't replicating.  And I'd like clients to fail with
errors sooner (hopefully before they get to the commit point) rather than
accumulate WAL synced to the master and just wait at the commit.

So I think that's a more complete picture of my quick "don't do anything
with no synchronous slave replicating" comment, which I think was what led
to the no-login approach.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Sync Rep for 2011CF1

2011-01-21 Thread Aidan Van Dyk
On Fri, Jan 21, 2011 at 1:32 PM, Robert Haas robertmh...@gmail.com wrote:

 Again, I'm trying to stop forward progress as soon as possible when
 a sync slave isn't replicating.  And I'ld like clients to fail with
 errors sooner (hopefully they get to the commit point) rather than
 accumulate the WAL synced to the master and just wait at the commit.

 Well, stopping all WAL activity with an error sounds *more* reasonable
 than refusing all logins, but I'm not personally sold on it.  For
 example, a brief network disruption on the connection between master
 and standby would cause the master to grind to a halt... and then
 almost immediately resume operations.

Yup.  And I'm OK with that.  In my case, it would be much better to
have a few quick failures, which can complete automatically a few
seconds later, than to have a big buildup of transactions to re-verify
by hand upon starting manual processing.

But again, I'll stress that I'm talking about when the master has no
sync slave connected.  A brief network disruption between the
master/slave isn't likely going to disconnect the slave.  TCP is
pretty good at handling those.  If the master thinks it has a sync
slave connected, I'm fine with it continuing to queue WAL for it even
if it's lagging noticeably.

    More generally, if you have
 short-running transactions, there's not much difference between
 wait-at-commit and wait-at-WAL, and if you have long-running
 transactions, then wait-at-WAL might be gumming up the works more than
 necessary.

Again, when there is no sync slave *connected*, I don't want to wait
*at all*.  I want to fail ASAP.  If there is a sync slave, and it's
just slow, I don't really care where it waits.

From my experience, if the slave is not connected (i.e. the TCP connection
has been disconnected), then we're in something like:

1) Proper slave shutdown: pilot error here, stopping it if the master requires it
2) Master start, slave not connected yet:  I'm fine with getting
errors here... We *hope* a slave will be here soon, but...
3) Network has separated master/slave:  TCP means it's been like this
for a long time already...
4) Slave hardware/OS low-level hang/crash: TCP means it's been like
this for a while already before the master's OS tears down the connection
5) Slave has crashed (or rebooted) and the slave OS has closed/rejected
our TCP connection

In all of these, I'd love for my master not to be generating WAL and
letting clients think they are making progress.  And I'm hoping that
for #3 and #4 above, PG will have keepalive-type traffic that will
prevent me from queuing WAL for normal TCP connection timeout values.

 One idea might be to wait both before and after commit.  If
 allow_standalone_primary is off, and a commit is attempted, we check
 whether there's a slave connected, and if not, wait for one to
 connect.  Then, we write and sync the commit WAL record.  Next, we
 wait for the WAL to be ack'd.  Of course, the standby might disappear
 between the first check and the second, but it would greatly reduce
 the possibility of the master being ahead of the standby after a
 crash, which might be useful for some people.

Ya, but that becomes much more expensive.  Instead of it just being a
write WAL, fsync WAL, send WAL, wait for slave, it becomes write
WAL, fsync WAL, send WAL, wait for slave fsync, write WAL, fsync WAL,
send WAL, wait for slave fsync.  And its expense is there all the time,
rather than just when the no slave, no go situations arise.

And it doesn't reduce the transactions I need to verify by hand
either, because that waiting/error still only happens at the COMMIT
statement from the client.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Add support for logging the current role

2011-01-14 Thread Aidan Van Dyk
On Fri, Jan 14, 2011 at 4:56 PM, Andrew Dunstan and...@dunslane.net wrote:

 I'm not sure I really want to make it that flexible :-)

 To deal with the issue Tom's referring to, I think it would be sufficient if
 we just allowed users to suppress production of certain columns (as long as
 we never do anything so evil as to add a new column in the middle).

 There are some other issues with the format. I know Josh has bitched about
 the presence of command tags in certain fields, for example.

If there is going to be any change, how about using fixed columns (and
possibly allowing them to be empty for stuff that's expensive to
create/write), but adding a 1st column that contains a version
identifier.  And to make it easy, maybe use the PG major version as the
version value.

If the 1st column is always the version, tools can easily know if
they understand all the columns (and what order they are in) and it's
easy to write a conversion that strips/re-arranges columns from a
newer CSV dump to match an older one if you have tools that don't know
about newer column layouts...
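
(Just to show how cheap that check is for a tool -- a hypothetical C
sketch; the "9.1" version string and the assumption that the version is
the bare first CSV field are mine, not anything that exists today:)

    #include <stdio.h>
    #include <string.h>

    #define KNOWN_LOG_VERSION "9.1"    /* layout this tool understands */

    /* Return 1 if the line's first CSV field is a version we know. */
    static int
    line_version_known(const char *line)
    {
        const char *comma = strchr(line, ',');
        size_t      len = comma ? (size_t) (comma - line) : strlen(line);

        return len == strlen(KNOWN_LOG_VERSION) &&
               strncmp(line, KNOWN_LOG_VERSION, len) == 0;
    }

    int
    main(void)
    {
        char    buf[8192];

        while (fgets(buf, sizeof(buf), stdin))
        {
            if (line_version_known(buf))
                fputs(buf, stdout);     /* hand off to the real parser */
            else
                fprintf(stderr, "unknown log line version, skipping\n");
        }
        return 0;
    }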

Personally, I'm not worried about the CSV logs being backwards
compatible as long as there's a very easy way to know what I might be
looking at, so conversion is easy...

But then again, I don't have multiple gigabytes of logs to process either.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Aidan Van Dyk
On Thu, Jan 13, 2011 at 2:53 PM, Robert Haas robertmh...@gmail.com wrote:
 I'm not convinced.  I was thinking that we could simply treat it like
 SIGQUIT, if it's available.  I doubt there's a real use case for
 continuing to run queries after the postmaster and all the background
 processes are dead.  Expedited death seems like much better behavior.
 Even checking PostmasterIsAlive() once per query would be reasonable,
 except that it'd add a system call to check for a condition that
 almost never holds, which I'm not eager to do.

If the postmaster has a few fds to spare, what about having it open a pipe
to every child it spawns?  It never has to read/write to it, but the
postmaster closing will signal the client's fd.  The client just has
to pop the fd into whatever normal poll/select event handling it uses
to notice when the parent's pipe is closed.

A FIFO would allow the postmaster to not need as many file handles, and
clients reading the FIFO would notice when the writer (postmaster)
closes it.
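
(A minimal sketch of the pipe idea using plain POSIX pipe/poll --
nothing postmaster-specific, just to show that the child sees POLLHUP
the moment the parent goes away, even on kill -9:)

    #include <poll.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(void)
    {
        int     pipefd[2];

        if (pipe(pipefd) != 0)
            exit(1);

        if (fork() == 0)
        {
            /* child: keep only the read end, poll it with everything else */
            close(pipefd[1]);

            struct pollfd pfd = { .fd = pipefd[0], .events = POLLIN };

            for (;;)
            {
                if (poll(&pfd, 1, -1) > 0 &&
                    (pfd.revents & (POLLHUP | POLLERR)))
                {
                    fprintf(stderr, "parent is gone, shutting down\n");
                    _exit(1);
                }
            }
        }

        /* parent: keep the write end open for its lifetime, never write */
        close(pipefd[0]);
        sleep(2);           /* pretend to do work, then die */
        return 0;           /* exiting closes the write end: child sees POLLHUP */
    }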

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-12 Thread Aidan Van Dyk
On Wed, Jan 12, 2011 at 10:15 AM, David Fetter da...@fetter.org wrote:

 Considering that parallell base backups would be io-bound (or
 network-bound), there is little need to actually run them in parallell

 That's not actually true.  Backups at the moment are CPU-bound, and
 running them in parallel is one way to make them closer to I/O-bound,
 which is what they *should* be.

Remember, we're talking about filesystem base backups here.  If your
CPU can't handle a stream from disk to network, byte for byte (maybe
encrypting it), then you've spent *WAY* too much on your storage
sub-system, and way too little on CPU.

I can see trying to parallelize the base backup such that each
tablespace could be run concurrently, but that's about it.

 There are other proposals out there, and some work being done, to make
 backups less dependent on CPU, among them:

 - Making the on-disk representation smaller
 - Making COPY more efficient

 As far as I know, none of this work is public yet.

pg_dump is another story.  But it's not related to base backups for
PIT Recovery/Replication.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Sync Rep Design

2011-01-01 Thread Aidan Van Dyk
On Sat, Jan 1, 2011 at 6:08 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sat, 2011-01-01 at 14:40 -0800, Josh Berkus wrote:

 Standby in general deals with the A,D,R triangle (Availability,
 Durability, Response time).  Any one configuration is the A,R
 configuration, and the only reason to go out with it for 9.1 is
 because it's simpler to implement than the D,R configuration (all
 standbys must ack).

 Nicely put. Not the only reason though...

 As I showed earlier, the AR gives you 99.999% availability and the DR
 gives you 94% availability, considering a 3 server config. If you add
 more servers, the availability of the DR option gets much worse, very
 quickly.

 The performance of AR is much better also, and stays same or better as
 cluster size increases. DR choice makes performance degrade as cluster
 size increases, since it works at the speed of the slowest node.

I'm all for getting first-past-post in for 9.1.  Otherwise I fear
we'll get nothing.

Stephen and I will only be able to use 1 sync slave, the DR-site
one.  That's fine.  I can live with it, and make my local slave be
async.  Or replicate the FS/block under WAL.  I can monitor the heck
out of it, and unless it goes down, it should easily be able to keep
up with the remote sync one behind a slower WAN link.

And I think both Stephen and I understand your availability math.
We're not disputing that 1st past post gives both better query
availability and cluster-scale performance.

But when the primary datacenter servers are dust in the crater (or
boats in the flood, or ash in the fire), I either keep my job, or I
don't.  And that depends on whether there is a chance I (my database
system) confirmed a transaction that I can't recover.

So sync rep with 1st past post already makes my job easier.  I'll take
it over nothing ;-)

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Aidan Van Dyk
On Fri, Dec 31, 2010 at 5:26 AM, Simon Riggs si...@2ndquadrant.com wrote:

 Your picture above is a common misconception. I will add something to
 the docs to explain this.

 2. sync does not guarantee that the updates to the standbys are in any
 way coordinated. You can run a query on one standby and get one answer
 and at the exact same time run the same query on another standby and get
 a different answer (slightly ahead/behind). That also means that if the
 master crashes one of the servers will be ahead or behind. You can use
 pg_last_xlog_receive_location() to check which one that is.

 When people say they want *all* servers to respond, its usually because
 they want (2), but that is literally impossible in a distributed system.

Just to try and be clear again, in the sync that Stefan and I are
talking about, we really don't care that the slave could be a hot
standby answering queries.  In fact, mine wouldn't be. Mine would
likely be pg_streamrecv or something.   I'm just looking for a
guarantee that I've got a copy of the data safely in the next rack,
and in a separate building, before I tell the client I've moved his money.

I want a synchronous replication of the *data*, and not a system where
I can distribute queries.  I'm looking for disaster mitigation, not
load mitigation.  A replacement for clustered/replicated
devices/filesystems under pg_xlog.

Having the next-rack slave be hot in terms of applying WAL and ready
to take over instantly would be a bonus, as long as I can guarantee
it's current (i.e. it has all data the primary's COMMIT has acknowledged).

So, that's what I want, and that's what your docs suggest is
impossible currently; 1st past post means that I can only ever
reliably configure 1 sync slave and be sure it will have all
acknowledged commits.  I can likely get *close* to that by putting
only my slowest slave as the only sync slave, and monitoring the
heck out of my asynchronous but I want it to be synchronous slave, but
I'd rather trust the PG community to build robust synchronization
than myself to build robust enough monitoring to catch that my slave
is farther behind than the slower synchronous one.


That said, I think the expectation is that if I were building a
query-able hot standby cluster in sync rep mode, once I get a commit
confirmation, I should be able to then initiate a new transaction on
any member of that sync rep cluster and see the data I just committed.

Yes, I know I could see *newer* data.  And I know that the primary
could already have newer data. Yes, we have the problem even on a
single pg cluster on a single machine.  But the point is that if
you've committed, any new transactions see *at least* that data or
newer.  But no chance of older.

But personally, I'm not interested in that ;-)
-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Aidan Van Dyk
On Thu, Dec 30, 2010 at 6:41 AM, Magnus Hagander mag...@hagander.net wrote:

 As the README says that is not self-contained (for no fault of its own) and
 one should typically set archive_command to guarantee zero WAL loss.

 Yes. Though you can combine it fine with wal_keep_segments if you
 think that's safe - but archive_command is push and this tool is pull,
 so if your backup server goes down for a while, pg_streamrecv will get
 a gap and fail. Whereas if you configure an archive_command, it will
 queue up the log on the master if it stops working, up to the point of
 shutting it down because of out-of-disk. Which you *want*, if you want
 to be really sure about the backups.

I was thinking I'd like to use pg_streamrecv to make my archive, and
the archive script on the master would just verify the archive has
that complete segment.

This gets you an archive synced as it's made (as long as streamrecv
is running), and my verify archive command would make sure that if,
for some reason, the backup archive went down, the WAL segments
would be blocked on the master until it's up again and current.
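
(Something like this hypothetical helper is all I mean by "verify" --
the 16MB default segment size and the single path argument are
assumptions, and a real check would probably also compare the archived
bytes against the local segment:)

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    #define WAL_SEGMENT_SIZE (16 * 1024 * 1024)

    int
    main(int argc, char **argv)
    {
        struct stat st;

        if (argc != 2)
        {
            fprintf(stderr, "usage: %s /path/to/archive/SEGMENTFILE\n", argv[0]);
            return 2;
        }

        /* Fail (non-zero) unless the segment is present and complete;
         * archive_command failure keeps the WAL queued on the master. */
        if (stat(argv[1], &st) != 0 || st.st_size != WAL_SEGMENT_SIZE)
        {
            fprintf(stderr, "segment %s missing or incomplete\n", argv[1]);
            return 1;
        }

        return 0;
    }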

a.



-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Aidan Van Dyk
On Thu, Dec 30, 2010 at 3:07 PM, Robert Treat r...@xzilla.net wrote:

 If primary crashes while commits are waiting for acknowledgement, those
 transactions will be marked fully committed if the primary database
 recovers, no matter how allow_standalone_primary is set.

 This seems backwards; if you are waiting for acknowledgement, wouldn't the
 normal assumption be that the transactions *didnt* make it to any standby,
 and should be rolled back ?

This is the standard 2-phase commit problem.  The primary server *has*
committed it, its fsync has returned, and the only thing keeping it
from returning the commit to the client is that it's waiting on a
synchronous ack from a slave.

You've got 2 options:
1) initiate fsync on the slave first
   - In this case, the slave is farther ahead than the primary, and if
the primary fails, you're *forced* to have a failover.  The standby is
ahead of the primary, so the primary recovering can cause divergence.
And you'll likely have to do a base-backup style sync to get a new
primary/standby setup.
2) initiate fsync on the primary first
   - In this case, the slave is always slightly behind.  If your
primary falls over, you don't give commit messages to the clients, but
if it recovers, it might have committed data, and slaves will still be
able to catch up.

The thing is that currently, even without replication, #2 can happen.
If your db falls over before it gets the commit packet stuffed out the
network, you're in the same boat.  The data might be committed, even
though you didn't get the commit packet, and when your DB recovers,
it's got the committed data that you never knew was committed.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] pg_dump --split patch

2010-12-29 Thread Aidan Van Dyk
On Wed, Dec 29, 2010 at 2:27 AM, Joel Jacobson j...@gluefinance.com wrote:

<description of split stuff>

So, how different (or not) is this from the directory format that was
coming out of the desire for a parallel pg_dump?

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] pg_dump --split patch

2010-12-29 Thread Aidan Van Dyk
On Wed, Dec 29, 2010 at 9:11 AM, Gurjeet Singh singh.gurj...@gmail.com wrote:
 On Wed, Dec 29, 2010 at 8:31 AM, Joel Jacobson j...@gluefinance.com wrote:


 2010/12/29 Aidan Van Dyk ai...@highrise.ca

 On Wed, Dec 29, 2010 at 2:27 AM, Joel Jacobson j...@gluefinance.com
 wrote:

 description of split stuff

 So, how different (or not) is this to the directory format that was
 coming out of the desire of a parallel pg_dump?

 Not sure what format you are referring to? Custom, tar or plain text?
 I noticed there are two undocumented formats as well, append and file.
 I tried both of these undocumented formats, but it did not procude any
 directory structure of the dumped objects.
 Could you please explain how to use the directory format is such a
 format already exists?
 I can't find it in the documentation nor the source code of HEAD.

 It is still being discussed as a patch to pg_dump. Google for directory
 archive format for pg_dump, specifically in archives.postgresql.org.

Specifically:
Message-ID: aanlktimueltxwrsqdqnwxik_k1y3ych1u-9nghzqp...@mail.gmail.com

 AFAIK, that applies to parallel dumps of data (may help in --schema-only
 dumps too), and what you are trying is for schema.

Right, but one of the things it does is break the dump into parts,
and put them in a directory/file organization.

Both are doing it for different reasons, but doing pretty much the
same thing.  But can the layout/organization of Joachim's patch be
made human-friendly in the vein of Joel's vision?

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] pg_dump --split patch

2010-12-28 Thread Aidan Van Dyk
On Tue, Dec 28, 2010 at 11:59 AM, Joel Jacobson j...@gluefinance.com wrote:

 I don't follow, what do you mean with failure modes? The oid in the
 filename? I suggested to use a sequence instead but you didn't comment on
 that. Are there any other failure modes which could cause a diff -r between
 two different databases to break?

Both OID and sequence mean that you're likely to get a diff which is
nothing more than complete files removed from 1 side and added to the
other side with different names (i.e. OIDs don't match, or an
added/removed object changes all following sequence assignments).

If you're going to try and split, I really think the only useful
filename has to be similar to something like:
schema/type/name/part

If you want to use diff, you pretty much have to make sure that the
*path* will be identical for similarly named objects, irrespective of
anything else in the database.  And the path has to be encoding-aware.

And you want names that glob well, so for instance, you could exclude
*.data (or a schema) from the diff.
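
(A tiny hypothetical sketch of what I mean by a stable, diffable path --
the directory names and .sql suffix are made up, and real code would
still need to escape/encode the object names:)

    #include <stdio.h>

    /* Path depends only on schema, object type, object name and part,
     * so the same object lands at the same path in any database and
     * "diff -r" lines up. */
    static void
    split_path(char *buf, size_t buflen, const char *schema,
               const char *objtype, const char *name, const char *part)
    {
        snprintf(buf, buflen, "%s/%s/%s/%s.sql", schema, objtype, name, part);
    }

    int
    main(void)
    {
        char    path[1024];

        split_path(path, sizeof(path), "public", "table", "orders", "definition");
        puts(path);         /* public/table/orders/definition.sql */

        split_path(path, sizeof(path), "public", "table", "orders", "constraints");
        puts(path);         /* public/table/orders/constraints.sql */
        return 0;
    }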

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-24 Thread Aidan Van Dyk
On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake j...@commandprompt.com wrote:

 I would have to agree here. The idea that we have to search email is bad
 enough (issue/bug/feature tracker anyone?) but to have someone say,
 search the archives? That is just plain rude and anti-community.

Saying "search the bugtracker" is no less rude than "search the archives"...

And most of the bugtrackers I've had to search have way *less*
ease-of-use for searching than a good mailing list archive (I tend to
keep going back to gmane's search).

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Aidan Van Dyk
On Wed, Dec 22, 2010 at 9:52 AM, Simon Riggs si...@2ndquadrant.com wrote:

 So what you suggest works only if we restrict CRC checking to blocks
 incoming to the buffer cache, but leaves us unable to do CRC checks on
 blocks once in the buffer cache. Since many blocks stay in cache almost
 constantly, we're left with the situation that the most heavily used
 parts of the database seldom get CRC checked.

With this statement, you just moved the goal posts on the checksumming
ideas.  In fact, you didn't just move the goal posts, you picked the
ball up and teleported it to another stadium.

I believe that most of the people talking about and wanting checksums
so far have been wanting them to verify I/O, not to verify that PG has
no bugs, that RAM is staying charged correctly, and that no stray bits
have been flipped, and that nobody else happens to be scribbling over
our shared buffers.

Being able to arbitrarily (i.e. at any point in time) prove that the
shared buffer contents are exactly what they should be may be a
worthy goal, but that's many orders of magnitude more difficult than
verifying that the bytes we read from disk are the ones we wrote to
disk.

a.



-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Aidan Van Dyk
On Wed, Dec 22, 2010 at 10:52 AM, Simon Riggs si...@2ndquadrant.com wrote:

 I'm sure it will take a little while for everybody to understand why a
 full CRC implementation is both necessary and now possible. Paradigm
 shifts of thought do seem like teleports, but they can be beneficial.

But please don't deny the rest of us airbags while you keep working on
teleportation ;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep

2010-12-20 Thread Aidan Van Dyk
On Mon, Dec 20, 2010 at 3:17 AM, Fujii Masao masao.fu...@gmail.com wrote:
 OK. How about keepalive-like parameters and behaviors?

    replication_keepalives_idle
    replication_keepalives_interval
    replication_keepalives_count

 The master sends the keepalive packet if replication_keepalives_idle
 elapsed after receiving the last ACK packet including the receive/
 fsync/replay LSNs from the standby. OTOH, the standby sends the
 ACK packet back to the master as soon as receiving the keepalive
 packet.

 If the master could not receive the ACK packet for
 replication_keepalives_interval, it repeats sending the keepalive
 packet and receiving the ACK replication_keepalives_count -1
 times. If no ACK packet has finally arrived, the master thinks the
 standby has been dead.

I thought we were using a single TCP session per standby/slave?  So
adding another KEEPALIVE into the local buffer side of the TCP
stream isn't going to help a stuck one arrive earlier.

You really only have a few situations:

1) Network problems.  Stuffing more stuff into the local buffers isn't
going to help get packets from the remote that it would like to send
(I say "like to send", because network problems could be in either/both
directions; the remote may or may not have seen our keepalive
request)

2) The remote is getting them, and is swamped.  It's not going to get
to processing our 2nd keepalive any sooner than processing our 1st.

If a walreceiver reads a keepalive request, just declare that it
must reply immediately.  Then the master config can trust that a
keepalive should be replied to pretty quickly if the network is ok.  TCP
will make it get there eventually if it's a bad network, and the
admins have set it to be very network-tolerant.

The ACK might report that the slave is hopelessly behind on
fsyncing/applying its WAL, but that's good too.  At least then the
ACK comes back, and the master knows the slave is still churning away
on the last batch of WAL, and can decide if it wants to think the
slave is too far behind and boot it out.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] ALTER EXTENSION ... UPGRADE;

2010-12-13 Thread Aidan Van Dyk
On Sat, Dec 11, 2010 at 4:35 PM, David E. Wheeler da...@kineticode.com wrote:

 What about having the following keys supported in the control file:

  upgrade_version = 'script.version.sql'
  upgrade_all = 'script.sql'

 Why not just use an upgrade script naming convention? Think: Convention over 
 configuration.

Mainly, because of the situation where I have many versions that can
all be upgraded with the same script.  I'd much rather distribute
just 3 scripts (install + 2 upgrades), and a control file with
something like this (pretend I'm on version 2.6):
upgrade-1.0 = $EXT-upgrade-1.sql
upgrade-1.1 = $EXT-upgrade-1.sql
upgrade-1.1.1 = $EXT-upgrade-1.sql
upgrade-1.1.2 = $EXT-upgrade-1.sql
upgrade-1.2 = $EXT-upgrade-1.sql
upgrade-1.3 = $EXT-upgrade-1.sql
upgrade-1.4 = $EXT-upgrade-1.sql
upgrade-1.4.1 = $EXT-upgrade-1.sql
upgrade-2.0 = $EXT-upgrade-2.sql
upgrade-2.1 = $EXT-upgrade-2.sql
upgrade-2.2 = $EXT-upgrade-2.sql
upgrade-2.2.1 = $EXT-upgrade-2.sql
upgrade-2.3 = $EXT-upgrade-2.sql
upgrade-2.4 = $EXT-upgrade-2.sql
upgrade-2.5 = $EXT-upgrade-2.sql


Forcing a convention on me to maintain/install an upgrade script for
every single version is way more than asking me to just specify an
upgrade script for a set of versions.

Again, I'd love for the version to support some sort of prefix or
wildcard matching, so I could do:
upgrade-1.* =  $EXT-upgrade-1.sql
upgrade-2.* =  $EXT-upgrade-2.sql
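
(A quick hypothetical sketch, in C only for illustration, of the
matching I'm after -- the rule table mirrors the two lines above;
longest matching prefix wins, and no match just means error out:)

    #include <stdio.h>
    #include <string.h>

    struct upgrade_rule { const char *prefix; const char *script; };

    static const struct upgrade_rule rules[] = {
        { "1.", "$EXT-upgrade-1.sql" },
        { "2.", "$EXT-upgrade-2.sql" },
    };

    static const char *
    pick_upgrade_script(const char *installed_version)
    {
        const char *best = NULL;
        size_t      best_len = 0;

        for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        {
            size_t  len = strlen(rules[i].prefix);

            if (strncmp(installed_version, rules[i].prefix, len) == 0 &&
                len > best_len)
            {
                best = rules[i].script;
                best_len = len;
            }
        }
        return best;        /* NULL: unknown version, refuse to upgrade */
    }

    int
    main(void)
    {
        printf("%s\n", pick_upgrade_script("1.4.1"));   /* $EXT-upgrade-1.sql */
        printf("%s\n", pick_upgrade_script("2.3"));     /* $EXT-upgrade-2.sql */
        return 0;
    }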

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] ALTER EXTENSION ... UPGRADE;

2010-12-13 Thread Aidan Van Dyk
On Mon, Dec 13, 2010 at 9:55 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:

 Again, I'ld love for the version to support some sort of prefix or
 wildcard matching, so I could do:
     upgrade-1.* =  $EXT-upgrade-1.sql
     upgrade-2.* =  $EXT-upgrade-2.sql

 Problem is: what to do if a single upgrade matches more than one line?
 The only safe answer is to error out and refuse to upgrade but that
 ain't nice to the user. How much is that a problem here?

To get a wildcard match (or a prefix match) for version upgrades, I'd
be willing to have that error if I give a bad set of version matches.
If I only have those 2 lines to manage, it's a lot more likely I won't
mess them up than if I have to manage 30 almost identical lines and
not miss/duplicate a version.

 ;-)

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] ALTER EXTENSION ... UPGRADE;

2010-12-10 Thread Aidan Van Dyk
On Fri, Dec 10, 2010 at 4:50 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:

 Now, what about having the control file host an 'upgrade' property where
 to put the script name? We would have to support a way for this filename
 to depend on the already installed version, I'm thinking that %v might
 be the easiest here (read: I want to avoid depending on any version
 scheme).

  version = '13'
  script  = 'foo.sql'
  upgrade = 'foo_upgrade.%v.13.sql'

If I was thinking of bundling my utilities up as an extension
(yes, I would want that from a packaging/DB management perspective), I
think I'd like a control file like that, but with a bit of wildcard
version matching, something like:
version = '3.12'
  upgrade-1. = 'utils-upgrade-1.0.sql'
  upgrade-2. = 'utils-upgrade-2.0.sql'
  upgrade-3. = 'nothing'

I'm thinking of a scheme where the upgrade-$VERSION uses a prefix
match, so 1.1, 1.2, 1.3 would all be matched by 1..   The 3.=nothing
is some way of specifying you don't need to do anything, because my n.X
releases are all compatible sql/so-wise.  They would only be bug
fixes if I did something wrong in my stuff.. Anything not compatible
would bump the first number.

If it's a prefix-type match, then the PG versioning would work too,
for instance:
   upgrade-9.0.=...
would match any pg 9.0.*

I guess you could use SQL LIKE if that's more consistent...

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Aidan Van Dyk
On Fri, Nov 19, 2010 at 9:16 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote:
 So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses
 spinlocks for that purpose - no idea where that is true these days.

 Me neither, which is exactly the problem.  Under Tom's proposal, any
 architecture we don't explicitly provide for, breaks.

Just a small point of clarification - you need to have both that
unknown architecture, and that architecture has to have postgres
processes running simultaneously on different CPUs with different
caches that are incoherent to have those problems.

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Aidan Van Dyk
On Fri, Nov 19, 2010 at 9:49 AM, Andres Freund and...@anarazel.de wrote:
 Well, its not generally true - you are right there. But there is a wide range
 for syscalls available where its inherently true (which is what I sloppily
 referred to). And you are allowed to call a, although quite restricted, set of
 system calls even in signal handlers. I don't have the list for older posix
 versions in mind, but for 2003 you can choose something from several like
 write, lseek,setpgid which inherently have to serialize. And I am quite sure
 there were sensible calls for earlier versions.

Well, it's not quite enough just to call into the kernel to serialize
on some point of memory, because your point is to make sure that
*this particular piece of memory* is coherent.  It doesn't matter if
the kernel has proper fencing in its stuff if the memory it's
guarding is in another cacheline, because that won't *necessarily*
force cache coherency in your local lock/variable memory.
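
(To make that concrete -- a minimal sketch of a TAS spinlock on top of
GCC's __sync builtins; this is an illustration, not PostgreSQL's actual
s_lock implementation.  The acquire/release pair is what fences the
lock word and the data it protects, which a serializing syscall
somewhere else doesn't give you:)

    #include <stdio.h>

    typedef volatile int slock_t;

    static void
    spin_lock(slock_t *lock)
    {
        /* __sync_lock_test_and_set acts as an acquire barrier */
        while (__sync_lock_test_and_set(lock, 1))
            ;               /* spin; real code would back off/time out */
    }

    static void
    spin_unlock(slock_t *lock)
    {
        /* release barrier: prior stores are visible before the lock clears */
        __sync_lock_release(lock);
    }

    static slock_t lock = 0;
    static int     shared_counter = 0;

    int
    main(void)
    {
        spin_lock(&lock);
        shared_counter++;
        spin_unlock(&lock);
        printf("%d\n", shared_counter);
        return 0;
    }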

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Aidan Van Dyk
On Fri, Nov 19, 2010 at 9:31 AM, Robert Haas robertmh...@gmail.com wrote:

 Just a small point of clarification - you need to have both that
 unknown archtecture, and that architecture has to have postgres
 process running simultaneously on difference CPUs with different
 caches that are incoherent to have those problems.

 Sure you do.  But so what?  Are you going to compile PostgreSQL and
 implement TAS as a simple store and read-fence as a simple load?  How
 likely is that to work out well?

If I was trying to port PostgreSQL to some strange architecture, and
my strange architecture didn't have all the normal TAS and memory
barriers stuff because it was only a UP system with no cache, then yes,
and it would work out well ;-)

If it was some strange SMP architecture, I wouldn't expect *anything*
to work out well if the architecture doesn't have some sort of
TAS/memory barrier/cache-coherency stuff in it ;-)

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Ident authentication overloading

2010-11-18 Thread Aidan Van Dyk
On Thu, Nov 18, 2010 at 1:01 PM, Josh Berkus j...@agliodbs.com wrote:

 We use it. Do you have an alternative that doesn't lower security
 besides Kerberos? Anti-ident arguments are straw man arguments - If
 you setup identd badly or don't trust remote root or your network,
 ident sucks as an authentication mechanism.

 Actually, you're trusting that nobody can add their own machine as a node on
 your network.  All someone has to do is plug their linux laptop into a
 network cable in your office and they have free access to the database.

I think you need to give him a little more credit than that... From
the description he gave, I wouldn't be surprised if, on the networks
he's using ident on, he's got switch ports locked, limited server
access, etc...

His whole point was that in his locked-down network, ident is *better*
than giving everybody yet another password he has to manage, keep
users from mis-managing, and make sure users don't mis-use...

So, yes, ident is only as secure as the *network and machines* it's
used on.  Passwords are only as secure as the users managing them, and
the machines/filesystems containing .pgpass ;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] unlogged tables

2010-11-15 Thread Aidan Van Dyk
On Mon, Nov 15, 2010 at 11:22 AM, Robert Haas robertmh...@gmail.com wrote:

 Yeah, this infrastructure doesn't really allow that.  The truncate
 happens way too early on in startup to execute any user-provided code.

But you could use the very feature of unlogged tables to know if
you've initialized them, by using another unlogged table to
note the initialization.

If the value you expect isn't in that note table, you know that
it's been truncated...

Sure, it's app side, but the whole point of unlogged tables is to
allow optimizations when the app side knows the data's dispensable and
rebuildable.
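
Just to sketch the app-side check I mean (all the names here -
cache_sentinel and friends - are made up, and I'm assuming libpq and
that the sentinel table already exists):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");  /* connection parameters from the environment */
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* cache_sentinel is an unlogged table holding one row while the cache is valid */
    res = PQexec(conn, "SELECT 1 FROM cache_sentinel");
    if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
    {
        /* the unlogged tables came back empty: rebuild the cache, then note it again */
        /* ... repopulate the real (unlogged) cache tables here ... */
        PQclear(PQexec(conn, "INSERT INTO cache_sentinel VALUES (1)"));
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}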

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] renaming contrib. (was multi-platform, multi-locale regression tests)

2010-11-11 Thread Aidan Van Dyk
On Thu, Nov 11, 2010 at 8:28 AM, Andrew Dunstan and...@dunslane.net wrote:

 It's intentional behavior.  It gives up when there are too many
 differences to avoid being slow.

And, it's configurable, at least for diff and merge.  If it's not
available in all the other porcelains, yes, those would be bugs that
should be fixed:
    -l<num>
        The -M and -C options require O(n^2) processing time where n is
        the number of potential rename/copy targets. This option
        prevents rename/copy detection from running if the number of
        rename/copy targets exceeds the specified number.

And it can even be specified as the config options diff.renameLimit and
merge.renameLimit.

 We should adopt that philosophy. I suggest we limit all tables in future to
 1m rows in the interests of speed.

As long as it's configurable, and if it would make operations on
smaller tables faster, then go for it.

And we should by default limit shared_buffers to 32MB.  Oh wait.

There are always tradeoffs when picking defaults, a-la-postgresql.conf.

We as a community are generally pretty quick to strike up the "defaults
are very conservative, make sure you tune ..." song when people
complain about pg being too slow

;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Aidan Van Dyk
On Tue, Nov 9, 2010 at 8:45 AM, Greg Stark gsst...@mit.edu wrote:

 But buffering the page only means you've got some consistent view of
 the page. It doesn't mean the checksum will actually match the data in
 the page that gets written out. So when you read it back in the
 checksum may be invalid.

I was assuming that if the code went to the trouble of buffering the
shared page to get a stable, non-changing copy to use for
checksumming/writing, it would write() the buffered copy it just
made, not the original in shared memory...  I'm not sure how that
write could be inconsistent.
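
Something along these lines is what I had in mind (only a sketch: the
toy checksum, its offset in the page, and the use of pwrite() are all
placeholders, not what the backend actually does):

#include <string.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SIZE   8192
#define CSUM_OFFSET 8           /* made-up location of the checksum in the page header */

static uint32_t
page_checksum(const unsigned char *page)
{
    uint32_t    sum = 0;        /* stand-in; the real thing would be a proper CRC */

    for (int i = 0; i < PAGE_SIZE; i++)
        sum = (sum << 1 | sum >> 31) ^ page[i];
    return sum;
}

ssize_t
write_page(int fd, off_t offset, const unsigned char *shared_page)
{
    unsigned char copy[PAGE_SIZE];
    uint32_t    crc;

    memcpy(copy, shared_page, PAGE_SIZE);         /* stable snapshot of the shared buffer */
    memset(copy + CSUM_OFFSET, 0, sizeof(crc));   /* checksum field excluded from itself */
    crc = page_checksum(copy);
    memcpy(copy + CSUM_OFFSET, &crc, sizeof(crc));

    /* write the private copy; concurrent hint-bit setters can't change it under us */
    return pwrite(fd, copy, PAGE_SIZE, offset);
}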

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Aidan Van Dyk
On Tue, Nov 9, 2010 at 11:26 AM, Jim Nasby j...@nasby.net wrote:

 Huh, this implies that if we did go through all the work of
 segregating the hint bits and could arrange that they all appear on
 the same 512-byte sector and if we buffered them so that we were
 writing the same bits we checksummed then we actually *could* include
 them in the CRC after all since even a torn page will almost certainly
 not tear an individual sector.

 If there's a torn page then we've crashed, which means we go through crash 
 recovery, which puts a valid page (with valid CRC) back in place from the 
 WAL. What am I missing?

The problem case is where hint bits have been set.  Hint bits have
always been treated as we don't really care, but we write them anyway.

A torn page on hint-bit-only writes is ok, because with a torn page
(assuming you don't get zeroed pages), you get a mix of old and new
chunks of the complete 8K buffer, but they are identical except for
hint bits, and either the old or the new state is sufficient.

But with a checksum, getting a torn page with only hint-bit updates
now becomes noticed.  Before, it might have happened, but we
wouldn't have noticed or cared.
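
To make that concrete, a toy simulation (the toy checksum, the byte I
flip, and where the tear lands are all made up; 8K pages and 512-byte
sectors are just the usual assumption):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define PAGE_SIZE   8192
#define SECTOR_SIZE 512

static uint32_t
checksum(const unsigned char *page)
{
    uint32_t    sum = 0;

    for (int i = 0; i < PAGE_SIZE; i++)
        sum = (sum << 1 | sum >> 31) ^ page[i];
    return sum;
}

int
main(void)
{
    unsigned char old_page[PAGE_SIZE] = {0};    /* what was on disk before */
    unsigned char new_page[PAGE_SIZE];          /* same page, one hint bit set */
    unsigned char on_disk[PAGE_SIZE];           /* what a torn write leaves behind */

    memcpy(new_page, old_page, PAGE_SIZE);
    new_page[4000] |= 0x01;                     /* the only change is a hint bit */

    /* torn write: the first sector makes it to disk with the new contents,
     * the remaining sectors keep the old contents */
    memcpy(on_disk, new_page, SECTOR_SIZE);
    memcpy(on_disk + SECTOR_SIZE, old_page + SECTOR_SIZE, PAGE_SIZE - SECTOR_SIZE);

    printf("checksum we computed at write time:    %08x\n", checksum(new_page));
    printf("checksum of the torn page we read back: %08x\n", checksum(on_disk));
    return 0;
}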

So, to get checksums, we have to give up a few things:
1) zero-copy writes: we need to buffer the write to get a consistent
checksum (or lock the buffer tight)
2) saving hint bits on an otherwise unchanged page: we either need to
just not write that page, and lose the work the hint bits did, or do
a full-page WAL image of it, so a torn page gets repaired and the
checksum stays valid

Both of these are theoretical performance tradeoffs.  How badly do we
want to verify on read that it is *exactly* what we thought we wrote?

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Aidan Van Dyk
On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark gsst...@mit.edu wrote:

 Then we might have to get rid of hint bits. But they're hint bits for
 a metadata file that already exists, creating another metadata file
 doesn't solve anything.

Is there any way to instrument the writes of dirty buffers from
shared memory, and see how many of the pages normally being written are
not backed by WAL (hint-only updates)?  Just dropping those buffers
without writing them would allow at least *checksums* to go through
without losing all the benefits of the hint bits.

I've got a hunch (with no proof) that the penalty of not writing them
will be borne largely by small database installs.  Large OLTP databases
probably won't have pages with hint bits set but no WAL'ed change,
and large data warehouse ones will probably vacuum freeze big tables
on load to avoid the huge write penalty the 1st time they scan the
tables...

/waving hands

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



  1   2   3   4   >