Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Harold Giménez
I think the main issue is when a table rewrite is triggered on a DDL
command on a large table, as this is what frequently leads to
unavailability. The idea of introducing a NOREWRITE keyword to DDL
commands then came up (credit: Peter Geoghegan). When the NOREWRITE
keyword is used and the DDL statement would rewrite the table, the
command errors and exits.

This would allow ORM and framework authors to include the NOREWRITE
option by default, only to be disabled on a per-statement basis by the
developer, once they have assessed that it may be safe or otherwise
they still want to proceed with this. The workflow for an app
developer then becomes:

* Write offending data migration (eg: add a column with a NOT NULL
constraint and default value)
* Test it locally, either by running automated test suite or running on staging
* See that it fails because of NOREWRITE option
* Assess situation. If it's a small table, or I still want to ignore,
override the option. Or rewrite migration to avoid rewrite.
* Repeat

I like this a lot just because it's simple, limited in scope, and can
be easily integrated into ORMs saving users hours of downtime and
frustration.

Thoughts?

On Thu, Oct 2, 2014 at 9:46 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:


 On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine dimi...@2ndquadrant.fr
 wrote:

 Hi fellow hackers,

 I would like to work on a new feature allowing our users to assess the
 amount of trouble they will run into when running a DDL script on their
 production setups, *before* actually getting their services down.

 The main practical example I can offer here is the ALTER TABLE command.
 Recent releases are including very nice optimisations to it, so much so
 that it's becoming increasingly hard to answer some very basic
 questions:

   - what kind of locks will be taken? (exclusive, shared)
   - on what objects? (foreign keys, indexes, sequences, etc)
   - will the table have to be rewritten? the indexes?

 Of course the docs are answering parts of those, but in particular the
 table rewriting rules are complex enough that “accidental DBAs” will
 fail to predict if the target data type is binary coercible to the
 current one.

 Questions:

  1. Do you agree that a systematic way to report what a DDL command (or
 script, or transaction) is going to do on your production database
 is a feature we should provide to our growing user base?

  2. What do you think such a feature should look like?

  3. Does it make sense to support the whole set of DDL commands from the
 get go (or ever) when most of them are only taking locks in their
 own pg_catalog entry anyway?

 Provided that we are able to converge towards a common enough answer to
 those questions, I propose to hack my way around and send patches to
 have it (the common answer) available in the next PostgreSQL release.


 What you are proposing is some kind of dry-run with verbose output?

 Regards,

 --
 Fabrízio de Royes Mello
 Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: hide application_name from other users

2014-01-24 Thread Harold Giménez
On Fri, Jan 24, 2014 at 6:46 AM, Magnus Hagander mag...@hagander.net wrote:

 On Thu, Jan 23, 2014 at 2:01 AM, Greg Stark st...@mit.edu wrote:

 On Wed, Jan 22, 2014 at 1:03 PM, Josh Berkus j...@agliodbs.com wrote:
  Probably Heroku has some more specific exploit case to be concerned
  about here; if so, might I suggest taking it up with the -security list?

 I don't think there's a specific vulnerability that needs to be kept
 secret here.

 Here's an example. I just created a new hobby database which is on a
 multi-tenant cluster and ran select * from pg_stat_activity. Here are
 two of the more interesting examples:

  463752 | de5nmf0gbii3u5 | 32250 |   463751 | qspfkgrwgqtbcu | unicorn
 worker[1] -p 30390 -c ./config/unicorn.rb ||
   | |  |
 |   |
  | || insufficient privilege
  463752 | de5nmf0gbii3u5 | 32244 |   463751 | qspfkgrwgqtbcu | unicorn
 worker[0] -p 30390 -c ./config/unicorn.rb ||
   | |  |
 |   |
  | || insufficient privilege


 Note that the contents of the ARGV array are being set by the
 unicorn task queuing library. It knows it's making this information
 visible to other users with shell access on this machine. But the
 decision to stuff the ARGV information into the application_name is
 being made by the Pg driver. Neither is under the control of the
 application author who may not even be aware this is happening.
 Neither component has the complete information to make a competent
 decision about whether this information is safe to be in
 application_name or not.

 Note that the query is showing as insufficient privilege even
 though it is listed in the ps output -- the same ps output that is
 listing the unicorn ARGV that is being shown in the
 application_name

 You might say that the Pg gem is at fault for making a blanket policy
 decision for applications that the ARGV is safe to show to other
 database users but realistically it's so useful to see this
 information for your own connections that it's probably the right
 decision. Without it it's awfully hard to tell which worker is on
 which connection. It would just be nice to be able to treat
 application_name the same as query.


 I would say that yes, this is clearly broken in the Pg gem. I can see it
 having such a default, but not allowing an override...

Uhm, it does allow an override as I said before.


 The application can of course issue a SET application_name, assuming there
 is a hook somewhere in the system that will run after the connection has
 been established. I've had customers use that many times in java based
 systems for example, but I don't know enough about the pg gem, or unicorn,
 to have a clue if anything like it exists there. This is also a good way to
 track how connections are used throughout a pooled system where the same
 connection might be used for different things at different times.

 What actually happens if you set the application_name in the connection
 string in that environment? Does it override it to it's own default? If so,
 the developers there clearly need to be taught about
 fallback_application_name.

It can be overridden using any of these methods. It does in fact use
fallback_application_name when it defaults to $0, see
https://bitbucket.org/ged/ruby-pg/src/6c2444dc63e17eb695363993e8887cc5d67750bc/lib/pg/connection.rb?at=default#cl-46


 And what happens if you set it in PGAPPNAME?

It works fine:

```
PGAPPNAME=this_is_a_custom_app_name ruby -w -rpg -e
PG.connect(dbname: 'hgmnz', host: 'localhost').exec('SELECT
application_name FROM pg_stat_activity') { |res| res.each { |row| puts
row.values_at('application_name') } }
this_is_a_custom_app_name
```

-Harold


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Harold Giménez
On Thu, Jan 23, 2014 at 12:53 PM, Josh Berkus j...@agliodbs.com wrote:
 On 01/23/2014 12:34 PM, Joshua D. Drake wrote:

 Hello,

 I have run into yet again another situation where there was an
 assumption that autovacuum was keeping up and it wasn't. It was caused
 by autovacuum quitting because another process requested a lock.

 In turn we received a ton of bloat on pg_attribute which caused all
 kinds of other issues (as can be expected).

 The more I run into it, the more it seems like autovacuum should behave
 like vacuum, in that it gets precedence when it is running. First come,
 first serve as they say.

 Thoughts?

 If we let autovacuum block user activity, a lot more people would turn
 it off.

 Now, if you were to argue that we should have some way to monitor the
 tables which autovac can never touch because of conflicts, I would agree
 with you.

Agree completely. Easy ways to monitor this would be great. Once you
know there's a problem, tweaking autovacuum settings is very hard and
misunderstood, and explaining how to be effective at it is a dark art
too.

-Harold


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: hide application_name from other users

2014-01-21 Thread Harold Giménez
On Tue, Jan 21, 2014 at 12:31 AM, Craig Ringer cr...@2ndquadrant.com wrote:

 On 01/21/2014 04:19 PM, Heikki Linnakangas wrote:
  On 01/21/2014 07:22 AM, Harold Giménez wrote:
  First of all, I apologize for submitting a patch and missing the
  commitfest
  deadline. Given the size of the patch, I thought I'd submit it for your
  consideration regardless.
 
  This patch prevents non-superusers from viewing other user's
  pg_stat_activity.application_name.  This topic was discussed some time
  ago
  [1] and consequently application_name was made world readable [2].
 
  I would like to propose that we hide it instead by reverting to the
  original behavior.  There is a very large number of databases on the same
  cluster shared across different users who can easily view each other's
  application_name values.  Along with that, there are some libraries that
  default application_name to the name of the running process [3], which
  can
  leak information about what web servers applications are running, queue
  systems, etc. Furthermore leaking application names in a multi-tenant
  environment is more information than an attacker should have access to on
  services like Heroku and other similar providers.
 
  I don't find these arguments compelling to change it now. It's
  well-documented that application_name is visible to everyone. Just don't
  put sensitive information there.
 
  For those users that don't mind advertising application_name, the patch
  would be highly inconvenient. For example, the database owner could no
  longer see the application_name of other users connected to her database.

 It also means that monitoring tools must run as superuser to see
 information they require, which to me is a total showstopper.


Well, the fact is that if you don't run monitoring tools as superuser,
there may not be enough connection slots available anyways, in cases
where actual usage is consuming all of max_connections, and only the
reserved slots are available. So in a way it's already unreliable to
run monitoring as non-superuser unfortunately.



 If you want control over visibility of application_name, it should be
 done with a column privilige granted to a system role, or something like
 that - so the ability to see it can be given to public on default
 (thus not breaking BC) and if it's revoked from public, given to roles
 that need to see it.


Something along these lines sounds like would solve the problem.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: hide application_name from other users

2014-01-21 Thread Harold Giménez
On Tue, Jan 21, 2014 at 7:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Stephen Frost sfr...@snowman.net writes:
 * Craig Ringer (cr...@2ndquadrant.com) wrote:
 If you want control over visibility of application_name, it should be
 done with a column privilige granted to a system role, or something like
 that - so the ability to see it can be given to public on default
 (thus not breaking BC) and if it's revoked from public, given to roles
 that need to see it.

 I agree with this- individuals should be able to control access to this
 information for their databases/clusters.

 I think that'd be much more complexity than the case justifies.  The
 argument that application_name might contain sensitive information seems
 ludicrously weak to me: whatever a client is exposing as application_name
 is its own darn choice.  If you don't like it, go fix the client.
 If there is some client library that sets application_name without
 allowing the choice to be overridden, then that's a problem with that
 library, not with the server's behavior.

I don't know of a client where it can't be overridden. The friction
occurs when by default it sets it to something useful to a developer
(useful eg: to find what process is holding a lock), but is not
possible to conceal from other users on the same cluster. If this were
an in-premise or private cluster the point is moot.

Furthermore consider when even using application_name for it's
original intended use. On a shared environment as I'm describing here,
that makes it possible for an attacker to identify what apps connect
to a given server, or on the other hand is a way to find out where a
given application stores its data, which can be used for a more
targeted attack.

Beyond that yes, it's definitely possible to fix the client, but the
cited is but one example of defaults people are using in the wild, and
if the trend continues we'll be facing fanout of this behavior. I feel
like being conservative and fixing the issue on the server side is
worthwhile.

Regards,

-Harold


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: hide application_name from other users

2014-01-21 Thread Harold Giménez
On Tue, Jan 21, 2014 at 4:01 PM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jan 21, 2014 at 03:57:37PM -0800, Harold Giménez wrote:
  It also means that monitoring tools must run as superuser to see
  information they require, which to me is a total showstopper.


 Well, the fact is that if you don't run monitoring tools as superuser,
 there may not be enough connection slots available anyways, in cases
 where actual usage is consuming all of max_connections, and only the
 reserved slots are available. So in a way it's already unreliable to
 run monitoring as non-superuser unfortunately.

 You might need to run as superuser in these cases, but it is hard to see
 why would need to do that in the normal case.

Definitely agree with you. This is just an example of how running
monitoring as superuser is not necessarily the worst thing, and there
are other reasons to do it already.

-Harold


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: hide application_name from other users

2014-01-21 Thread Harold Giménez
On Tue, Jan 21, 2014 at 4:19 PM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jan 21, 2014 at 04:06:46PM -0800, Harold Giménez wrote:
 I don't know of a client where it can't be overridden. The friction
 occurs when by default it sets it to something useful to a developer
 (useful eg: to find what process is holding a lock), but is not
 possible to conceal from other users on the same cluster. If this were
 an in-premise or private cluster the point is moot.

 Furthermore consider when even using application_name for it's
 original intended use. On a shared environment as I'm describing here,
 that makes it possible for an attacker to identify what apps connect
 to a given server, or on the other hand is a way to find out where a
 given application stores its data, which can be used for a more
 targeted attack.

 So security through obscurity?  Why wouldn't the attacker just try all
 the app methods at once and not even bother looking at the application
 name?

A malicious attacker may want to attack or harm `app1`. They write a
script that provisions databases and check in pg_stat_activity until
they find an application_name of `app1`. Having found the database
holding app1's data, they then use a targeted attack on postgres, say
a privilege escalation attack or any other vulnerability we don't know
exists yet. Without application_name, the attacker would be unable to
find the target database host to attack.

-Harold


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: hide application_name from other users

2014-01-21 Thread Harold Giménez
On Tue, Jan 21, 2014 at 4:38 PM, Stephen Frost sfr...@snowman.net wrote:
 * Harold Giménez (har...@heroku.com) wrote:
 Definitely agree with you. This is just an example of how running
 monitoring as superuser is not necessarily the worst thing, and there
 are other reasons to do it already.

 It's a horrible thing and that isn't a good reason- if my database isn't
 accepting connections, I probably don't care one bit how bloated a table
 is.  Indeed, I care *more* that I'm out of connections and would want to
 know that ASAP.

This is a separate topic, but in such a case I'd want to know that
I've reached max_connections, which may not be a problem if I just
don't need any more connections, but I still need something connecting
to make sure the service is available at all and can respond to simple
SELECT 1 queries and a myriad of other things you'd want to keep track
of.


 That said, I'm not against the general idea that the 'reserved'
 connections be opened up to roles beyond superuser (or have some kind of
 priority system, etc), but that's an independent concern and should not
 be a justification for making monitoring require superuser privs.

+1

-Harold


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: hide application_name from other users

2014-01-21 Thread Harold Giménez
On Tue, Jan 21, 2014 at 4:46 PM, Stephen Frost sfr...@snowman.net wrote:
 * Harold Giménez (har...@heroku.com) wrote:
 This is a separate topic, but in such a case I'd want to know that
 I've reached max_connections, which may not be a problem if I just
 don't need any more connections, but I still need something connecting
 to make sure the service is available at all and can respond to simple
 SELECT 1 queries and a myriad of other things you'd want to keep track
 of.

 I've never heard of an environment where you can be absolutely confident
 that you need exactly max_connections and zero more.  I seriously doubt
 one exists.

 The service is not available if only a superuser can connect, imv.

People push the limit all the time. They may run at 80% of their max
and occasionally (and temporarily) scale up to a known bounded level,
but no more.

-Harold


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: hide application_name from other users

2014-01-21 Thread Harold Giménez
On Tue, Jan 21, 2014 at 4:53 PM, Josh Berkus j...@agliodbs.com wrote:
 It would be really nice to be able to GRANT/REVOKE on some of these
 special system views ...

I think this would be ideal, too.

-Harold


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: hide application_name from other users

2014-01-21 Thread Harold Giménez
On Tue, Jan 21, 2014 at 5:22 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
 On 22/01/14 13:32, Harold Giménez wrote:

 On Tue, Jan 21, 2014 at 4:19 PM, Bruce Momjian br...@momjian.us wrote:

 On Tue, Jan 21, 2014 at 04:06:46PM -0800, Harold Giménez wrote:

 I don't know of a client where it can't be overridden. The friction
 occurs when by default it sets it to something useful to a developer
 (useful eg: to find what process is holding a lock), but is not
 possible to conceal from other users on the same cluster. If this were
 an in-premise or private cluster the point is moot.

 Furthermore consider when even using application_name for it's
 original intended use. On a shared environment as I'm describing here,
 that makes it possible for an attacker to identify what apps connect
 to a given server, or on the other hand is a way to find out where a
 given application stores its data, which can be used for a more
 targeted attack.


 So security through obscurity?  Why wouldn't the attacker just try all
 the app methods at once and not even bother looking at the application
 name?


 A malicious attacker may want to attack or harm `app1`. They write a
 script that provisions databases and check in pg_stat_activity until
 they find an application_name of `app1`. Having found the database
 holding app1's data, they then use a targeted attack on postgres, say
 a privilege escalation attack or any other vulnerability we don't know
 exists yet. Without application_name, the attacker would be unable to
 find the target database host to attack.


 If said malicious attacker can log into postgres and issue its own queries,
 and connect to other database then you are in serious trouble already.

They can connect to their database, but not other databases in the cluster.


 I also wonder that if such an attacker knows the application name, that
 would suggest that they have access to the application server and are able
 to read its config files...which would probably also contain the host and db
 name too (and possibly the password in some unfortunate cases)!

No, I am considering something along the lines of www.foobar.com would
have an application_name =~ '%foobar%'


 In addition people normally name app users and app dbs in a fairly
 predictable way (e.g 'finance' app uses 'finance' user to connect to
 'finance' db), so I'm not seeing much value in trying to hide application
 name.

Some database service providers like Heroku Postgres generate role
names and passwords, which are not predictable and users cannot
change.

Regards,
-Harold


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proposal: hide application_name from other users

2014-01-20 Thread Harold Giménez
First of all, I apologize for submitting a patch and missing the commitfest
deadline. Given the size of the patch, I thought I'd submit it for your
consideration regardless.

This patch prevents non-superusers from viewing other user's
pg_stat_activity.application_name.  This topic was discussed some time ago
[1] and consequently application_name was made world readable [2].

I would like to propose that we hide it instead by reverting to the
original behavior.  There is a very large number of databases on the same
cluster shared across different users who can easily view each other's
application_name values.  Along with that, there are some libraries that
default application_name to the name of the running process [3], which can
leak information about what web servers applications are running, queue
systems, etc. Furthermore leaking application names in a multi-tenant
environment is more information than an attacker should have access to on
services like Heroku and other similar providers.

Thanks and regards,


-Harold Giménez

[1] http://www.postgresql.org/message-id/14808.1259452...@sss.pgh.pa.us
[2]
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0c61cff57a1dc7685fcac9f09451b261f14cb711
[3]
https://bitbucket.org/ged/ruby-pg/src/6c2444dc63e17eb695363993e8887cc5d67750bc/lib/pg/connection.rb?at=default#cl-44


hide_application_name_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changing pg_dump default file format

2013-11-08 Thread Harold Giménez
I don't want to hijack this thread any further, but Craig, thanks for your
insight.

-Harold


On Thu, Nov 7, 2013 at 8:35 PM, Craig Ringer cr...@2ndquadrant.com wrote:

 On 11/08/2013 11:41 AM, Harold Giménez wrote:
 
 
 
  On Thu, Nov 7, 2013 at 7:01 PM, Craig Ringer cr...@2ndquadrant.com
  mailto:cr...@2ndquadrant.com wrote:
 
 
  (a) Lots of people only upgrade every two, three, or even more major
  versions. I'm dealing with clients on 8.3, and people still pop up on
  Stack Overflow with 8.1 sometimes! These people don't ever see the
  deprecated phase.
 
 
  Interesting that they'd never update their clients either. I guess if
  that's true
  there's a mentality of if it ain't broke don't fix it going on here.

 I've seen all sorts of combinations of client and server versions in the
 wild. Ancient JDBC, ODBC, etc drivers are also common.

  Would they read change logs before upgrading, or just cross their
 fingers?
 
  (b) People routinely ignore cron job output. Even important job
 output.
  They won't see the messages, and won't act on them if they do until
  something actually breaks.
 
 
  How common is this?

 I couldn't possibly say, I can only go by what I see in communication
 with clients, in private mail, in the mailing lists, and on Stack Overflow.

 I do see a couple of different groups:

 * People who upgrade casually without looking at release notes etc, then
 wonder why everything just broke. Typically people running PostgreSQL in
 development environments. I'm not too fussed about this group, they do
 it to themselves. On the other hand they're a *big* group (think Ruby on
 Rails developers, etc) and the more of them who whine about how
 PostgreSQL is painful to upgrade and always breaks, the worse it is for
 general uptake of Pg.

 * Those who don't upgrade because they don't know or care to. That box
 has been sitting there doing its thing for ages, and until they hit some
 key limitation, trigger an old bug, or finally need to do something
 different they don't realise that life would've been easier if they
 hadn't stayed on 7.1. These folks seem to be the most likely ones to
 unthinkingly upgrade from 7.something-low or 8.x straight to 9.3 without
 reading the release notes then wonder why things don't work, especially
 since they'll tend to do it in a hurry as a knee-jerk to try to fix a
 problem.

 * People who want to upgrade but are choked by bureaucracy and change
 management processes that move on a tectonic time scale. They're still
 on 8.something-low because it's just too painful to change. They're the
 ones who'll ask you to backport synchronous replication into 9.0 because
 they're not allowed to upgrade to 9.1/9.2, despite the obvious insanity
 of running custom-patched and rebuilt software instead of a well-tested
 public release. When they upgrade they do it in giant leaps despite the
 pain involved, just because it's so hard to upgrade at all. They're
 likely to plan carefully and do it right.

 * People who'd love to upgrade, but have an old database running a 24x7
 mission critical system with enough data that they just can't get a
 dump/reload window, and they're on something older than 8.4 so they
 can't pg_upgrade. When they do upgrade they tend to plan it in detail,
 test carefully first, etc, so again they're pretty OK.


 In general, people are busy and the database isn't all they care about.
 They probably read the release notes about as well as you read the
 release notes on a new distro upgrade or something else that you care
 moderately about.

 If you're doing a major upgrade from an old Pg to a very new one there's
 a lot to take in and a lot of rel notes to read. One of the things that
 might help here is if we had (on the wiki, maybe) a single document that
 kept a running list of compatibility affecting changes and upgrades that
 need special action, along with a recommended upgrade path. That way
 people wouldn't have to read 6 major versions of release notes, get half
 way, and give up.

 --
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] Changing pg_dump default file format

2013-11-07 Thread Harold Giménez
On Thu, Nov 7, 2013 at 8:26 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Magnus Hagander mag...@hagander.net writes:
  On Thu, Nov 7, 2013 at 5:02 PM, Joshua D. Drake j...@commandprompt.com
 wrote:
  I humbly request on behalf of those who manage production postgresql
  instances that we change the default backup file format from -Fp to -Fc.

  I'm a bit worried about it breaking peoples scripts as well, if they
  pipe the output.

 Whether they pipe the output or not, this will break people's scripts.
 I agree it's a legacy behavior that we wouldn't choose again, but you
 could say that about an awful lot of things including much of the SQL
 standard.  I think it's too late to consider this now.


Just a thought: If it's the right thing to do, why not do it in small
steps?
For example, couldn't we deprecate the behavior on 9.4, and switch over in
9.5? By deprecate I mean a warning message on STDERR making it clear
they're relying on behavior that is scheduled to change.

I don't think there's any precedence in postgres or it's tooling of this
kind of
deprecation process, but it seems we need to define something like it.

Thoughts?

-H


Re: [HACKERS] Changing pg_dump default file format

2013-11-07 Thread Harold Giménez
On Thu, Nov 7, 2013 at 7:01 PM, Craig Ringer cr...@2ndquadrant.com wrote:


 (a) Lots of people only upgrade every two, three, or even more major
 versions. I'm dealing with clients on 8.3, and people still pop up on
 Stack Overflow with 8.1 sometimes! These people don't ever see the
 deprecated phase.


Interesting that they'd never update their clients either. I guess if
that's true
there's a mentality of if it ain't broke don't fix it going on here.

Would they read change logs before upgrading, or just cross their fingers?



 (b) People routinely ignore cron job output. Even important job output.
 They won't see the messages, and won't act on them if they do until
 something actually breaks.


How common is this?


Re: [HACKERS] Redesigning checkpoint_segments

2013-06-06 Thread Harold Giménez
Hi,

On Wed, Jun 5, 2013 at 11:05 PM, Joshua D. Drake j...@commandprompt.comwrote:


 On 6/5/2013 10:54 PM, Peter Geoghegan wrote:

 On Wed, Jun 5, 2013 at 10:27 PM, Joshua D. Drake j...@commandprompt.com
 wrote:


 Instead of running out of disk space PANIC we should just write to an
 emergency location within PGDATA


This merely buys you some time, but with aggressive and sustained write
throughput you are left on the same spot. Practically speaking it's the
same situation as increasing the pg_xlog disk space.


 and log very loudly that the SA isn't paying attention. Perhaps if that
 area starts to get to an unhappy place we immediately bounce into read-only
 mode and log even more loudly that the SA should be fired. I would think
 read-only mode is safer and more polite than an PANIC crash.


I agree it is better than PANIC, but read-only mode is definitely also a
form of throttling; a much more abrupt and unfriendly one if I may add.

Regards,

-Harold


Re: [HACKERS] pg_upgrade improvements

2012-04-11 Thread Harold Giménez
On Wed, Apr 11, 2012 at 5:40 PM, Bruce Momjian br...@momjian.us wrote:

 On Wed, Apr 04, 2012 at 07:26:58PM -0700, Harold Giménez wrote:
  There could be incoming connections for a number of
  reasons: either the user or the user's applications are reestablishing
  connections, or something like collectd on the localhost attempts to
  connect during that small window.

 Well, we did address this in PG 9.2 by having pg_upgrade use a
 non-default port number when starting servers, 50432.  You can do that
 too in PG 9.1 by just specifying non-default port numbers when you run
 pg_upgrade.  We do start the server with a special --binary-upgrade
 mode, and we could do all sorts of connection limits in that mode, but
 having the port number be different seemed the clearest solution.


The non-default port number is a good solution, better than modifying
pg_hba.
Thanks for pointing that out!

-Harold


[HACKERS] pg_upgrade improvements

2012-04-04 Thread Harold Giménez
Hi all,

I've written a pg_upgrade wrapper for upgrading our users (heroku) to
postgres 9.1. In the process I encountered a specific issue that could
easily be improved. We've had this process work consistently for many users
both internal and external, with the exception of just a few for whom the
process fails and required manual handholding.

Before it performs the upgrade, the pg_upgrade program starts the old
cluster, does various checks, and then attempts to stop it. On occasion
stopping the cluster fails - I've posted command output on a gist [1].
Manually running the pg_upgrade shortly afterwards succeeds. We believe
stopping the cluster times out because there are other connections to the
cluster that are established in that small window. There could be incoming
connections for a number of reasons: either the user or the user's
applications are reestablishing connections, or something like collectd on
the localhost attempts to connect during that small window.

Possible workarounds on the current version:

* Add an iptables rule to temporarily reject connections from the outside.
This is not viable because in a multitenant environment a process may write
an iptables rule, and meanwhile another process may permanently save rules,
including the temporary one. We can defend against that, but it does add a
lot of complexity.
* Rewrite pg_hba.conf temporarily while the pg_upgrade script runs to
disallow any other connections.

A possible solution for pg_upgrade is for it to make pg_upgrade use the
--force flag when stopping the cluster to kick connections out. There is no
reason to be polite in this case. Another idea that was kicked around with
my colleagues was to start the cluster in single-user mode, or only allow
unix socket connections somewhere in /tmp. Anything that rejects other
connections would be helpful.

It would also be nice if the invocation of pg_ctl didn't pipe its output to
/dev/null. I'm sure it would contain information that would directly point
at the root cause and could've saved some debugging and hand waving time.

Finally, just a note that while we haven't performed a huge number of
upgrades yet, we have upgraded a few production systems and for the most
part it has worked great.

Regards,

-Harold

[1] https://gist.github.com/112c97378c490d8f70fc


[HACKERS] Reviewing patch URI connection string support for libpq

2012-02-23 Thread Harold Giménez
Hello hackers,

I've been a reader of this list for some time, but have never posted.

I have interest in the URI connection string support patch[1], so I'm in
the process of reviewing it. I have a couple of comments and questions:

1. I see no tests in the patch. I'd like to start getting together a set of
tests, likely based on the connection string permutations found on Greg
Smith's response[2]. However I don't find an obvious place to put them.
They could possibly live in the test/examples directory. Another thought is
to use dblink in a test, although it may be problematic to depend on a
contrib package for a test, to say the least. Any thoughts on how to test
this are most welcome.
2. The documentation/manual was not updated as part of this patch, so this
is pending.
3. I for one do prefer the `postgres` prefix, as opposed to `postgresql`
for the reasons stated on an earlier thread [3]. In my opinion the best way
to move forward is to support them both.

The good news is the patch still applies fine on the 9.2 HEAD, and seems to
work well.

Regards,

-Harold

[1] https://commitfest.postgresql.org/action/patch_view?id=720
[2]
http://archives.postgresql.org/message-id/4f45253c.4030...@2ndquadrant.com
[3] http://archives.postgresql.org/pgsql-hackers/2011-12/msg00499.php