Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-25 Thread Christopher Browne
On Sep 25, 2017 1:39 PM, "Joshua D. Drake"  wrote:

> On 09/25/2017 10:32 AM, Petr Jelinek wrote:
>
>> On 25/09/17 19:26, Tom Lane wrote:
>>
>>> Alvaro Hernandez  writes:
>>>
>>
>
>> There is already about 3 million output plugins out there so I think we
>> did reasonable job there. The fact that vast majority of that are
>> various json ones gives reasonable hint that we should have that one in
>> core though.
>>
>
> And I am sure that 2ndQuadrant would be happy to add it to their version
> of Postgres and maintain it themselves.
>
>
That's not the case to be thinking about...  Of course, 2Q can support some
extensions for the customers that use their binaries, but that is not a
contraindication for having more in core.

I'd rather think based on questions like...  "Will that be supportable in
an ordinary Amazon instance?" Or "... In a Heroku instance?"

Those (Amazon and Heroku) are places that Slony won't run because it needs
a bit too much in the way of database superuser capabilities.

It's a very useful exercise to know which bits of this are easy to resolve
versus difficult versus irreconcilable.

Of course, we can't force everything into core, but given what IS in core,
it can look mighty dumb if...

a) we have neat and crucial features in core that are (validly!) trumpeted
in release notes, but

b) those features aren't usable without substantial out-of-core extensions
that many users cannot use.

Perhaps it's valid for logical replication to be considered out-of-scope
for generic Amazon/Heroku instances, but I'd prefer that to be the result
of a reasoned intent.
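
For reference, the SQL-level interface through which any in-core output
plugin gets exercised already exists; here is a minimal sketch using
contrib's test_decoding as a stand-in (a built-in json plugin would be
consumed the same way), assuming wal_level = logical:

  -- Create a slot bound to an output plugin shipped with the server.
  SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');

  CREATE TABLE demo (id int PRIMARY KEY, payload text);
  INSERT INTO demo VALUES (1, 'hello');

  -- Each returned row is one decoded change, formatted by the plugin;
  -- a json plugin would emit json in the data column instead.
  SELECT lsn, xid, data
    FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);

  SELECT pg_drop_replication_slot('demo_slot');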


Re: [HACKERS] SAP Application deployment on PostgreSQL

2017-09-08 Thread Christopher Browne
On 8 September 2017 at 15:34, chiru r  wrote:
> We have multiple SAP applications running on Oracle as backend and looking
> for an opportunity to migrate from Oracle to PostgreSQL. Has anyone ever
> deployed SAP on PostgreSQL community edition?
>
> Is PostgreSQL community involved in any future road-map of SAP application
> deployment on PostgreSQL?
>
> Thanks
> chiru

This has been asked about on SAP's forum, and that's the most appropriate
place, in that their applications are very much database-specific.

https://archive.sap.com/discussions/thread/1941255

I imagine that it would be a "broadly interesting" idea to run R/3
against PostgreSQL,
but, as observed in the discussion thread, the "SAP kernel" is very much NOT
database-agnostic.  Making it so would require considerable work on the part
of SAP AG, and I'd be somewhat surprised to see them do it.

Recall that once upon a time, SAP AG acquired the sources for ADABAS-D,
renamed it SAP-DB, and, for a while, made it available as "open source."  At
the time, it appeared that this was some sort of corporate gamesmanship
relating to a vendor selling what one might call "Product O".

SAP AG presumably spent a fair bit of effort (and money) establishing that
port for some of their products.  (I imagine that a port to run R/3 on
PostgreSQL
might be easier/simpler than running it on SAP-DB, but that's just me
imagining...)  They subsequently drew the code back to be proprietary, and
have released several versions of MaxDB since.

I'd be curious as to reasons to expect that SAP AG would want to do a PostgreSQL
port.  (No doubt other ex-BASIS consultants would also be interested!)
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] [GENERAL] C++ port of Postgres

2016-08-16 Thread Christopher Browne
On 16 August 2016 at 17:08, Piotr Stefaniak  wrote:
> On 2016-08-16 18:33, Robert Haas wrote:
>> It wouldn't be that much work to maintain, either: we'd
>> just set up some buildfarm members that compiled using C++ and when
>> they turned red, we'd go fix it.
>
> I think that there exist subtle differences between C and C++ that
> without compile-time diagnostic could potentially lead to different
> run-time behavior.

It seems to me that if we were really keen on bringing in another
"totally compiled" language, C++ wouldn't seem like the best
choice.

As you say, it's subtly different, which seems a bit dangerous to me.

Further, it's not as if C++ is particularly newer than C.  C is about 45
years old; C++, at 33, hardly seems like a "spry young whippersnapper"
whose inclusion ought to lead to vast excitement.

The would-be "spry young things" that come to mind are Rust and
Go.  I'm not sure it's terribly plausible to have parts of Postgres
written in both C and (Rust|Go); they're different enough that
I'm not sure what functionality would mix sensibly.  But I think
that would be more interesting, all the same.  Perhaps it would
work out well to be able to create background workers in Rust,
or to implement a stored procedure language in Go.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] A Modest Upgrade Proposal

2016-07-13 Thread Christopher Browne
On 13 July 2016 at 15:06, Robert Haas  wrote:

> On Thu, Jul 7, 2016 at 9:25 PM, Simon Riggs  wrote:
> > I note also that replication slots aren't backed up by pg_dump; I see
> > analogy here and think that at least some parts of logical replication
> will
> > be similar and not require DDL at all, just as slots do not.
>
> I agree with that.  Of course, it's *impossible* to usefully back up a
> slot because the key ingredient in a slot is the LSN after which WAL
> should be preserved - and it's meaningless to preserve that across a
> dump and restore cycle.  But, for example, replication set definitions
> can be preserved across a dump and restore and I am quite sure users
> will find it very unfortunate if they aren't.
>

There should be some way of dumping and restoring these sorts of structures,
and if I were thinking of the name of a tool to dump them, it seems to me
that pg_dump is a pretty good name for it...  (Look for slonikconfdump.sh
for the latest iteration of the Slony variation, if interested...)

I have implemented "slony_dump" a couple of times; if that had become a
built-in, I sure hope a pg_dump flag could have been the thing to request
such.

The same seems likely true of FDW configuration; it sure would be nice to
be able to dump that in a consistent, reusable way.  Again, nice to have
that be an extension of pg_dump.

Replication configuration should be able to be dumped out in a form that
can be readily loaded somewhere else.  It might not be something to have
pg_dump do by default, but it should sure be somewhere; if it isn't, then
that's a reasonably serious shortcoming.  Slony didn't have such until
2009; a serious implementation of Logical Replication shouldn't wait
that long.

If what gets spit out is a series of
  select replicate_these_relations(ARRAY['public']);
requests, well, I can actually live with that.

In the long run, it's preferable to have
  ALTER SCHEMA PUBLIC ENABLE REPLICATION;

but if the desired syntax isn't clear, at the start, we can surely live with
having functions, initially, as long as:
a) We know that's not intended as being the ultimate solution;
b) There's *some* sort of upgrade path that is helpful to indicate the
syntax that falls out;
c) There's tooling to dump out cluster information, whatever the syntax
form.

I'm getting quoted on being OK with not having syntax, initially...
I'm still fine with that, but take the above caveats to see my intent.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Christopher Browne
On 11 May 2016 at 12:58, Josh berkus  wrote:

> Together with that, automated substitution of materialized views for
> query clauses.
>
> Also: optimizing for new hardware, like persistent memory.


I recently saw some material in ACM SIGOPS on tuning filesystems to play
better with some of the new sorts of storage.

An interesting such article is <http://dl.acm.org/citation.cfm?id=2819002>.
The idea of it was to research
better ways of doing hash table updates with PCM (Phase Change Memory)
which apparently may be up-and-coming but with fairly different write
characteristics than we're used to.  You essentially write a fairly large
page at a time, and can only do limited numbers of updates to any given
page.
That encourages things like log-structured filesystems, but with further
efforts to reduce there being "hot spots."

The paper was focused on hash tables; if the hardware turns out to be
important, it'll also be important to have better variations on B-trees.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Template for commit messages

2016-01-30 Thread Christopher Browne
On 30 January 2016 at 05:11, Robert Haas  wrote:
>
> Well, this gets at one of the problems here, which is that you can't
> fix a commit message once the commit has been pushed.  So even if we
> all agreed in principle to a standard format, it's not clear that you
> could enforce compliance with that format to a degree sufficient to
> make machine-parseability a reality.

Yep, there's the rub in it.

Commit messages are authoritative for the things that ARE found in
commit messages.

If making them authoritative for a whole bunch of things means it is
necessary to force everyone to run some piece of commit-message-
monitoring code against their own repo, and any failure to run the
message monitoring code causes the project to fail to have
authoritative information, then it should be clear that we've
constructed something a wee bit too fragile.

A more plausible approach is to collect the authoritative-at-commit-time
message information into an external system (hey, I wonder if anyone
has some sort of structured data store implemented that could be
good at this!), and then have a way to feed additional information into
that system that would become the authoritative source for any
requirements for the extra metadata.

That, of course, smells like a database that draws metadata from git,
and augments with further streams of inputs.  There is certainly
something problematic about assuming that we can *always* get
supplementary data.  That raises the question of how we shame people
into going back and filling in the blanks we wanted filled.

It seems foolish to me to imagine that we can ensure that the
data *always* arrives at commit time; any laziness there represents
a permanent "data fault"; making it asynchronous shifts the problem
to a different spot.  I suspect we can only approach success on it,
and get *partial* metadata, at best.  If it's enough better than nothing,
then maybe that's good enough.  And I'll bet that the Commitfest
database already contains a lot of the data desired to fill blanks...

Further, if the point is to encourage reviews by making sure credit
(and hence data to support GIVING credit) is given, then it is not
inapropos for those keen on receiving credit to be responsible for
marking off the patches they know they contributed to.  That's
less fragile than expecting all credit to be attached by the
committer at commit time.

--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] psql --dry-run option

2015-12-17 Thread Christopher Browne
On 17 December 2015 at 14:16, Pavel Stehule  wrote:
> or different idea - just enforce syntax check without execution.

That seems pretty cool...  I'd find "syntax check without execution" to be
pretty useful to test SQL (and especially DDL).

--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-30 Thread Christopher Browne
On 30 October 2015 at 09:26, Robert Haas  wrote:
>
> On Thu, Oct 29, 2015 at 8:29 PM, Peter Eisentraut  wrote:
> > On 10/28/15 4:18 AM, Victor Wagner wrote:
> >> On Mon, 26 Oct 2015 16:25:57 -0400
> >> Peter Eisentraut  wrote:
> >>
> >>> Also, this assumes that all the components other than host and port
> >>> are the same.  Earlier there was a discussion about why the ports
> >>> would ever need to be different.  Well, why can't the database names
> >>> be different? I could have use for that.
> >>
> >> Because of way postgresql replication is implemented.
> >
> > There are multiple types of PostgreSQL replication, and there will be
> > others in the future.
>
> That's true, but doesn't allowing every parameter to be multiply
> specified greatly increase the implementation complexity for a pretty
> marginal benefit?  I think host and IP would hit 98% of the use cases
> here.

I think it makes the feature WORSE.  I am getting more and more convinced
that the Correct Solution is for this feature to be handled by submitting
multiple URIs, and my argument isn't even based on any aspects of
implementation complexity.

Take as example the case where I have two database servers I want to
be considered.

a) Database with URI
   postgresql://cbbro...@server-a.example.info:5432/my-first-database

b) Database with URI
   postgresql://rob...@server-b.example.info:7032/my-second-database

With all the "per-variable multiplicities", this would turn into a
combinatorial explosion of combinations, some 2^4, or 16 possible servers,
some of which likely don't exist, and others of which aren't proper (e.g. -
trying to connect to database a) using robert's credentials).

Possibly some of those combinations are outright improper.

I'm not going to claim it's terribly wise to be doing the cross-credential
bit; I'd think it likely to be rather dumb for the application to use one
user when connecting to one database and another when connecting to
another.  But the notion of it being nondeterministic is just awful.

I head back to... "the way OpenLDAP does this"...
  They let you specify multiple LDAP servers...
ldap://server1.example.info:389 ldap://server2.example.info:389
where URIs are separated by whitespace.

Seems like Best Goodness for Postgres to do something analogous...
   postgresql://cbbro...@server-a.example.info:5432/my-first-database
postgresql://rob...@server-b.example.info:7032/my-second-database

Is it a bit long?  Sure.  But it is unambiguous, and requires *only*
whitespace parsing to separate the URIs.  I'd think it fine for Postgres to
support this via multiple "-d" options; that would be about as good.

That can cover 100% of cases, and I don't see it needing to interact
specially with odd bits such as "was that a replication slot?"  You can
always choose to shoot yourself in the foot, but this doesn't spin the
roulette for you...
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] quieting DEBUG3

2015-10-28 Thread Christopher Browne
On 27 October 2015 at 20:51, Tom Lane  wrote:
>
> Craig Ringer  writes:
> > I think it'd be helpful to define some level of policy about what the
> > debug levels are intended for, so there's some guidance on what level
> > to emit messages on rather than playing "pick a number".
>
> +1 ... I doubt anyone has ever looked at that in a holistic way.
>
> regards, tom lane

A few years ago, I went through Slony's logging with a view to this very
thing.

It was pretty fruitful in changing log levels for a lot of things, and
actually
led to more things being logged 'always', as I noticed useful places to do
CONFIG level logging in the process.

But we were able to get things to the point where INFO logging provided
enough output most of the time.

Looking through the Postgres code base to this end seems likely to be
pretty useful, and I agree, it would need to be done holistically.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-26 Thread Christopher Browne
On 26 October 2015 at 16:25, Peter Eisentraut  wrote:

> On 10/14/15 6:41 AM, Victor Wagner wrote:
> > 1. It is allowed to specify several hosts in the connect string, either
> > in URL-style (separated by comma) or in param=value form (several host
> > parameters).
>
> I'm not fond of having URLs that are not valid URLs according to the
> applicable standards.  Because then they can't be parsed or composed by
> standard libraries.
>
> Also, this assumes that all the components other than host and port are
> the same.  Earlier there was a discussion about why the ports would ever
> need to be different.  Well, why can't the database names be different?
>  I could have use for that.
>
> I think you should just accept multiple URLs.
>

I'd give a "+1" on this...

As an area of new behaviour, I don't see a big problem with declining to
support every wee bit of libpq configuration, and instead requiring the
use of URLs.

Trying to put "multiplicities" into each parameter (and then considering
it at the pg_service level, too) is WAY more complicated, and for a
feature where it seems to me that it is pretty reasonable to have a
series of fully qualified URLs.

Specifying several URLs should be easier to understand, easier to
test, easier to code, and easier to keep from blowing up badly.

I'll observe that this is the way that OpenLDAP supports specifying
multiple servers, so this technique is familiar in other contexts
where people are trying to accomplish the same kind of thing.
Sample docs, perhaps not authoritative, but useful enough...

-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Can extension build own SGML document?

2015-10-14 Thread Christopher Browne
On 14 October 2015 at 13:04, Jim Nasby  wrote:

> On 9/15/15 10:13 AM, Tom Lane wrote:
>
>> Jim Nasby  writes:
>>
>>> On 9/15/15 8:43 AM, Tom Lane wrote:
>>>
 AFAICT from a quick look at its documentation, asciidoc can produce
 either html or docbook output; so as soon as you want something other
 than html output (in particular, PDF), you're back to relying on the
 exact same creaky docbook toolchain we use now.  Only with one extra
 dependency in front of it.

>>>
>> a2x (http://www.methods.co.nz/asciidoc/a2x.1.html) states that it can
>>> generate "PDF, EPUB, DVI, PS, LaTeX, XHTML (single page or chunked), man
>>> page, HTML Help or plain text formats using asciidoc(1) and other
>>> applications (see REQUISITES section). SOURCE_FILE can also be a DocBook
>>> file with an .xml extension."
>>>
>>
>> AFAICS, for all cases other than HTML output, the "other applications"
>> are basically the docbook toolchain.
>>
>
> I just started looking at asciidoctor, which seems to be the newer way to handle
> asciidoc. Aside from being a lot faster than a2x/asciidoc, it can produce
> docbook natively. However...
>
> What I expect would be a lot more effort is actually converting all the
>>> SGML to asciidoc. A quick google search doesn't turn up anything
>>> promising.
>>>
>>
>> Yeah, the cost of conversion means we're not likely to want to experiment
>> to see what's better :-(.
>>
>
> If the only concern is handling docbook format (which is what our SGML
> docs produce? Then https://github.com/asciidoctor/asciidoctor-fopub might
> be an option. It's intended for use with asciidoctor, but the README does
> state:
>
> "Using the asciidoctor-fopub project, you can convert any DocBook file
> into a nicely formatted PDF with nothing more than a Java runtime (JVM) and
> development kit (JDK). All the open source software required to perform the
> conversion is automatically fetched from the internet the first time you
> run it."
>
> So maybe it would allow removing some of more problematic parts of the
> toolchain?
>
> Also, if our SGML does produce docbook as an intermediate it might be
> possible to convert that to asciidoc via
> https://github.com/oreillymedia/docbook2asciidoc.


There's a misconception there...

Our SGML *is* DocBook.  Natively, no translation needed.

DocBook is a document type, and our documentation is already written using
that document type (DOCTYPE).  Easily seen thus:

postgres@cbbrowne ~/p/d/s/sgml> grep DOCTYPE postgres.sgml
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [

so we would gain nothing from an asciidoc ---> DocBook
transition.

The trouble that we have is that what we have isn't a "DocBook file", but
rather a fairly large set of files representing a DocBook document.

I'm not sure what improvement we'd get out of using asciidoctor-fopub.

There would be some merit to some remapping to transform "creaky old
DocBook 4.2" (what we're using) to a newer version, perhaps biased towards
XML, and have our toolset merge the bits into a big XML (in DocBook 5,
presumably) file for processing using more modern DocBook tools.

I could probably build some DSSSL as helper (my HTML-to-DocBook DSSSL was
highly incomplete, and nonetheless surprisingly widely referenced for
years...), but we'd best be clear on what we think we're getting as
improvement.  Switching to a less expressive format is unlikely to be a
win, however creaky the current DocBook/DSSSL tools are.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Database schema diff

2015-10-13 Thread Christopher Browne
On 13 October 2015 at 11:48, Michal Novotny wrote:

> Hi guys,
>
> I would like to ask you whether is there any tool to be able to compare
> database schemas ideally no matter what the column order is or to dump
> database table with ascending order of all database columns.
>
> For example, if I have table (called table) in schema A and in schema B
> (the time difference between is 1 week) and I would like to verify the
> column names/types matches but the order is different, i.e.:
>
> Schema A (2015-10-01) |  Schema B (2015-10-07)
>   |
> id int|  id int
> name varchar(64)  |  name varchar(64)
> text text |  description text
> description text  |  text text
>
> Is there any tool to compare and (even in case above) return that both
> tables match? Something like pgdiff or something?
>
> This should work for all schemas, tables, functions, triggers and all
> the schema components?
>
> Also, is there any tool to accept 2 PgSQL dump files (source for
> pg_restore) and compare the schemas of both in the way above?
>
> Thanks a lot!
> Michal


I built a tool I call "pgcmp", which is out on GitHub:
<https://github.com/cbbrowne/pgcmp>

The one thing that you mention that it *doesn't* consider is the ordering
of columns.

It would not be difficult at all to add that comparison; as simple as adding
an extra capture of table columns and column #'s.  I'd be happy to consider
adding that in.

Note that pgcmp expects the schemas being compared to be captured as
databases; it pulls data from information_schema and such.  In order to run
it against a pair of dumps, you'd need to load those dumps into databases,
first.
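
For the column-ordering case specifically, here is a minimal sketch of the
kind of order-insensitive comparison involved, assuming the two captures
have been restored into one database under the (hypothetical) schema names
schema_a and schema_b; it reports columns missing on either side or
differing in type, ignoring ordinal position:

  SELECT table_name, column_name,
         a.data_type AS type_in_a,
         b.data_type AS type_in_b
    FROM (SELECT table_name, column_name, data_type
            FROM information_schema.columns
           WHERE table_schema = 'schema_a') a
    FULL JOIN
         (SELECT table_name, column_name, data_type
            FROM information_schema.columns
           WHERE table_schema = 'schema_b') b
         USING (table_name, column_name)
   WHERE a.data_type IS DISTINCT FROM b.data_type
   ORDER BY table_name, column_name;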
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-30 Thread Christopher Browne
On 30 September 2015 at 14:31, Joshua D. Drake <j...@commandprompt.com> wrote:
>
> On 09/30/2015 11:23 AM, Christopher Browne wrote:
>
>> It's well and nice to think that an issue tracker resolves all of this,
>> and, if we
>> had tiny numbers of issues, we could doubtless construct a repository
>> indicating so.  (Seems to me that the bit of "fan service" for GitHub's
>> bug tracker fits into that perspective on things...)
>
>
> CMD has over a 1000 customers. All of those that are active have a
> Redmine tracker. Our current ticket count is over 70k. Without it, we would
> never be able to service them correctly.
>
> What you describe is not a tool problem, it is a people problem. That
> exists regardless of the tool. The tool is designed to (in theory) make the
> people problem, less.
>
> In CMDs considerable experience while not only developing, consulting,
> writing docs, maintain repos, and confidential information... it would be
> impossible to achieve without Redmine at our core.
>
> JD
>
> (I am sure other tools provide the same level of service, it is just what
> we use)

There's nothing there for me to disagree with, which presumably means that
we're somehow talking past each other.

And it's not just "presumably"; there's a clear place for there to be a
disconnect.

It's perfectly reasonable that CMD would (and presumably does) make the
proper curation of Redmine data an informal condition of employment.
People want to stay working at CMD?  They have to keep their issue tracking
data in good form.  At best, they'll experience the wrath of The Drake, and
I'll leave the worst to you!  :-)

That curation effort is entirely more challenging to impose for a project
like PostgreSQL.  If I decline to fill in the RT information (assuming RT
were chosen), there's no basis for someone "firing" me from the PostgreSQL
project.

I'd be entirely surprised to find a "curation-free" system; I've worked
with RT and Bugzilla, and both require some periodic efforts to go back and
make sure issues are kept in good order (for which "curation" is a very
good word).  It's pretty thankless work, which is why open source projects
that use such systems wind up with pretty messy sets of data.

It's not totally a tool problem, but once you've chosen a tool, there's
some concommittant people problems that fall out of the entropy of the
resulting system.  And I think it was reasonable for Merlin to question the
notion of the tool solving the problem.  For a tool to help requires
commitment to curation efforts.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-30 Thread Christopher Browne
On 30 September 2015 at 12:26, Joshua D. Drake  wrote:
>
> On 09/30/2015 07:44 AM, Merlin Moncure wrote:
>
>> I'm not trolling in any way.  I'm just challenging you to back up your
>> blanket assertions with evidence.  For example, you're assertion that
>> mailing lists are insufficient is simply stated and expected to be
>> taken on faith: *How* is it insufficient and *what* do things like in
>> the new world?
>
>
> I am short on time today but I will take this specific one:
>
> Mailing lists are great for discourse however they do not:
>
> 1. Provide easy access to archived information
> Searching google isn't an answer it is a band-aid
> 2. Provide proper access to valid information
> Ever get an answer, check the link, find out the solution
> references a 5 year old version of PostgreSQL and then find out the problem
> is fixed in the 9.4 but not 9.3. You are running 9.3.
> (an issue tracker could track this, easily)
> 3. Provide properly linked information across threads
> My favourite is this:
> SUBJECT: Help (was no longer wanting help)
> Now nothing makes sense on the thread. It should be a new issue.
> 4. Using a recent submission as an example:
> j...@idealist.org just submitted 6 patches. They are all based
> around making basebackups more useful (specifically pg_basebackup). This is
> awesome, but he has created 6 different threads with different discussions
> which will likely cause intercommunication between threads.
>
> Using an issue tracker the first patch would be a parent issue
> and the subsequent patches would be child issues (that whole dependency
> thing). A single click would provide all the information required to
> correctly determine what is going on with the series of interrelated
> patches. A mailing list does not provide that.
>
> I could go on for a long time with specific examples that our current
> model does not serve.

It's well and nice to think that an issue tracker resolves all of this,
and, if we
had tiny numbers of issues, we could doubtless construct a repository
indicating so.  (Seems to me that the bit of "fan service" for GitHub's
bug tracker fits into that perspective on things...)

However, after having seen an RT system with tens of thousands of
tickets, it seems wishful thinking to me to imagine that simply adopting
an issue tracking system does much of anything to resolve these things.

It takes rather a lot more than "mere assertion" to establish that an
issue tracker directly improves those cases.

To the contrary, from what I have seen, if there's not rather a lot of
curation
work continually done on an issue tracking system, you *don't* get any of
those things.

I found with RT that if people were at all sloppy in how problems were
reported/reported on, that you get none of #1, #2, or #3.

It may very well be *worse* than that; it seems quite likely to me that if
an issue tracker is not being continually curated by substantially ALL of
its users, then you don't get any of those things.  That *is* a lot more
pessimistic, and considerably likely, as it's pretty certain that members
of our email-loving community will decline to get involved in curating
data in some web app.

It seems likely to me that there's some value in trying out debbugs,
as it may provide some useful improvements, however imperfect.

Going to something "way better", particularly if it requires widely
distributed curation efforts, won't be better; it'll probably be a waste
of efforts.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] [COMMITTERS] pgsql: Use gender-neutral language in documentation

2015-09-22 Thread Christopher Browne
On 22 September 2015 at 15:11, Robert Haas  wrote:
>
> On Tue, Sep 22, 2015 at 2:33 PM, Andrew Dunstan wrote:
> > Good point. In the 18th and 19th centuries it was deemed by some grammarians
> > to be incorrect for some reason, (and yet Thackeray still used it in Vanity
> > Fair, for instance) and now some reactionaries and misogynists are fighting
> > to maintain that somewhat latter day rule. But I'm pretty certain their
> > numbers will dwindle, as they preach to an ever shrinking choir.
>
> I agree that this construction is grammatically acceptable in many if
> not all cases, but I still think that phrasing the sentences to avoid
> this construction is a good idea where we can do it easily.  For
> example, this is clearly a good idea:
>
>  So the database administrator can
>  decide which languages are available in which databases and can make
> -some languages available by default if he chooses.
> +some languages available by default if desired.
>
> And so is this, which just gets rid of a sentence that really isn't
needed:
>
>  Possibly, your site administrator has already created a database
> -for your use.  He should have told you what the name of your
> -database is.  In that case you can omit this step and skip ahead
> +for your use.  In that case you can omit this step and skip ahead
>  to the next section.
>
> But consider this one:
>
> -return any user name he chooses. This authentication method is
> +return any user name they choose. This authentication method is
>
> You could say "any arbitrary user name" or "any username whatsoever".

Those all seem like they might improve the combination of
specificity and non-specificity.

Almost certainly no one intended to indicate that the administrator was
specifically male or female, and removing an indicator diminishes some
potential for confusion.

I'll throw in, for good measure, that "users" are not necessarily even
*human*; it is common for users to get attached to applications, and
the applications (presumably!) haven't got any gender.

I could visit French for a moment, where all nouns are expected to
have gender.  (Where "la" indicates a "female she-ness", and "le"
indicates "masculinity.")

"La chaise est féminin, comme la table, alors que le sol est masculin."

The chair is feminine, as the table, however the floor is masculine.

And the explanation of the gendering of third person pronouns
(ils versus elles) always seemed very strange to me.

I think that using "he or she" (as has been suggested) heads down a
questionable path, as that's demanding (in somewhat the French
fashion!) a defined set of gender indicates.  That would properly head,
in a baroque "PC" context, to sillyness like...

"The user should do [something]; he or she or it, or the cis/trans/asexual
personage or connection used by a web application, whether written in
C, Java, Perl, PHP, running on Windows or Unix or ..." as the increasing
inclusions heads to some pathological limit.

> Or here:
>
>or within a session via the SET command.  Any user is
> -  allowed to change his session-local value.  Changes in
> +  allowed to change their session-local value.  Changes in
>
> You could say "This requires no special privileges".  This isn't
> really an exact rewrite of the sentence, but in context it means the
> same thing.

Notice that the changes you are suggesting tend to actually *shorten* the
text!  I like that.

> Or here:
>
> --- Who works for us when she must pay for it?
> +-- Who works for us when they must pay for it?
>
> You could say "-- We pay employees; they don't pay us."
>
> I don't think any of these changes are outright wrong except for
> "might not be the same as the database user that is to be connect as",
> which seems like a muddle.  But I think some of them could be changed
> to use other wording that would read more smoothly.
>
> Of course, that is just my opinion, and I clearly feel a lot less
> strongly about this than some other people.

I'd be pleased to see (perhaps even help) patches to the documentation
that make it read better and perhaps "more kindly."

Doing a "let's run through and substitute some legalistic wording in
order to be politically correct" will irritate people; instead, make the
documentation *better*.  Replacing "he" with "he/she/cis/trans/Unix/Windows"
(or some such) wouldn't make it *better*.  (And probably that's a phrase
that's missing some legalistic wherefores whereases!)

-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Proposal: Implement failover on libpq connect level.

2015-09-03 Thread Christopher Browne
On 3 September 2015 at 12:57, Shulgin, Oleksandr <oleksandr.shul...@zalando.de> wrote:
>
> On Thu, Sep 3, 2015 at 6:02 PM, Robert Haas  wrote:
>>
>>
>> Maybe someday we should have all that, but I think for right now
>> that's complicating things unnecessarily.  I think the best proposal
>> so far is to allow the host=X option to be repeated multiple times.
>> If you repeat the host=X option N times, you can also repeat the
>> port=X option exactly N times, or else you can specify it just once.
>> Done.
>
>
> But this already breaks backwards-compatibility with any clients who
> belief that whatever value specified the latest takes precedence.  I'm not
> arguing that there are such use cases in the wild or that it's entirely
> sane thing to do, but still.
>
> More importantly, this will break any code that tries to parse the
> conninfo string and produce a hashmap from it for modification.

The notion of an "ordered hashmap" makes me break out in hives...

>> Alternatively, leave the host=X option alone and add a new option
>> hostlist=X, allowing a comma-separated list of names or IPs, with each
>> hostname or IP allowed an optional :port suffix.  If host=X parameter
>> is omitted or the connection to that machine fails, try everybody in
>> the hostlist concurrently, or with some configurable (and presumably
>> short) delay between one and then next.  Again, done.
>
>
> The exact behavior in case of both host/port and hostlist are specified
> becomes really tricky then.  It's already tricky enough, if you recall the
> service files -- how are they going to come into play here?
>
> I believe the less there are implicit workings in the way libpq connects,
> the better.

In that case, let's have a New Option, and expressly break with the
implicit bits.

The new option ONLY accepts URIs, but allows it to be submitted multiple
times.

psql --uri postgresql://postgres@favehost:5432/some_db_name --uri
postgresql://postgres@favehost:5432/another_db_name --uri
postgresql://postgres@favehost:5432/third_db_name --uri
postgresql://postgres@favehost:5432/fourth_backup_db

Parsing conninfo strings is no fun.  I'm finding I prefer using URIs.  They
may even be easier to parse, not that I have thus far cared; using URIs
tends to mean I don't *need* to parse anything.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] WIP: About CMake v2

2015-08-28 Thread Christopher Browne
It's broadly interesting, but since it bakes in a build dependency on
CMake, there is some risk that the dependencies become an insurmountable
problem.

(Does CMake run on a VAX 11/780??  :-))

It is probably worth a try, to see what improvements arise, albeit with the
need to accept some risk of refusal of the change.

The experiment is most likely necessary: we won't know the benefits without
trying.

If the results represent little improvement, there will be little or no
appetite to jump through the dependency hoops needed to get the change
accepted.

On the other hand, if there are big gains, that encourages pushing thru the
dependency issues.
On Aug 28, 2015 10:45, YUriy Zhuravlev u.zhurav...@postgrespro.ru wrote:

 Hello Hackers

 How would you react if I provided a patch which introduces a CMake build
 system?

 Old thread:
 http://www.postgresql.org/message-id/200812291325.13354.pete...@gmx.net

 The main argument against it is "it's too hard". I'm right?

 Thanks!
 --
 YUriy Zhuravlev
 Postgres Professional: http://www.postgrespro.com
 The Russian Postgres Company


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Fix token exceeding NAMELEN

2015-05-13 Thread Christopher Browne
On 13 May 2015 at 17:55, Tom Lane t...@sss.pgh.pa.us wrote:

 Aaron W. Swenson titanof...@gentoo.org writes:
  Trying to build HEAD and ran into this issue building the docs:
  openjade:logicaldecoding.sgml:575:62:Q: length of name token must
  not exceed NAMELEN (44)
  openjade:replication-origins.sgml:87:67:Q: length of name token must
  not exceed NAMELEN (44)

 Hmm ... that's odd.  I don't see any such failure here, and the buildfarm
 members that build the docs aren't complaining either.  What version of
 openjade are you using exactly?

  So, I've attached a patch that'll fix it.

 I have no particular objection to the patch as stated, but I'm just
 wondering if this is the tip of a tool compatibility iceberg we were
 not previously aware of.


I recall us hitting this with Slony documentation.  The NAMELEN limit
lay in the SGML/DocBook configuration that was configured at the
distribution level, so that it differed (crucially) betwen Debian and
Red Hat.

Red Hat used to have a lower name length limit, and while overriding
it was technically possible, it required modifying configuration that
the distribution thought was owned by one of the SGML packages,
and hence the modification seemed pretty inadvisable.

I thought that this restriction was alleviated years ago, so I'm a bit
surprised to see this come up in 2015.  (Or perhaps Gentoo hasn't
yet opened up some limits???  :-) )
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Add CREATE support to event triggers

2014-11-10 Thread Christopher Browne
On 8 November 2014 17:49, Robert Haas robertmh...@gmail.com wrote:
  We could just integrate those parts, and be done with it. But would that
  actually be a good thing for the community? Then slony needs to do it
  and potentially others as well? Then auditing can't use it? Then
  potential schema tracking solutions can't use it?

 Do you think Slony is really going to use this?  I guess we can let
 the Slony guys speak for themselves, but I've been skeptical since day
 one that this is the best way to do DDL replication, and I still am.
 There are lots of ways that a replicated DDL statement can fail on the
 replicas, and what are you going to do then?  It's too late to show
 the user the error message, so you can throw it in a log someplace and
 hope that somebody notices, but that's it.  It makes a lot more sense
 to me to use some kind of a tool that applies the DDL in a coordinated
 fashion on all nodes - or even just do it manually, since it might
 very well be desirable to take the lock on different nodes at widely
 different times, separated by a switchover.  I certainly think there's
 a use-case for what you're trying to do here, but I don't think it'll
 be right for everyone.

 Certainly, if the Slony guys - or some other team building an
 out-of-core replication solutions says, hey, we really want this in
 core, that would considerably strengthen the argument for putting it
 there.  But I haven't heard anyone say that yet - unlike logical
 decoding, where we did have other people expressing clear interest in
 using it.

  There've been people for a long while asking about triggers on catalogs
  for that purpose. IIRC Jan was one of them.

 My impression, based on something Christopher Brown said a few years
 ago, is that Slony's DDL trigger needs are largely satisfied by the
 existing event trigger stuff.  It would be helpful to get confirmation
 as to whether that's the case.

I'm not sure that a replication system that intends to do partial
replication
(e.g. - being selective of what objects are to be replicated) will
necessarily
want to use the CREATE event triggers to capture creates.

Several cases pop up with different answers:
a) I certainly don't want to replicate temporary tables
b) I almost certainly don't want to replicate unlogged tables
c) For more ordinary tables, I'm not sure I want to extend Slony
to detect them and add them automatically, because there
are annoying sub-cases

   c.1) If I'm working on data conversion, I may create "not totally
 temporary" tables that are nonetheless not worthy to replicate.
 (I'm working on such right now)

Long and short: it seems likely that I'd frequently NOT want all new tables
added to replication, at least not all of them, all the time.

What would seem valuable, to me, would be to have a CREATE event
trigger that lets me know the OID and/or fully qualified name of the new
object so that perhaps the replication system:

a) Has some kind of rule system to detect if it wants to replicate it,

b) Logs the change so a human might know later that there's new stuff
that probably ought to be replicated (see the sketch just below this list)

c) Perhaps a human might put replication into a new "suggestive"
mode, a bit akin to Slony's EXECUTE SCRIPT, but where the human
essentially says, "Here, I'm running DDL against this connection for a
while, and I'd be grateful if Postgres told Slony to capture all the new
tables and sequences and replicate them."
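
A minimal sketch of (b), assuming a pg_event_trigger_ddl_commands()-style
interface; the table and function names here are hypothetical:

  CREATE TABLE replication_candidates (
      object_identity text        NOT NULL,
      command_tag     text        NOT NULL,
      noticed_at      timestamptz NOT NULL DEFAULT now()
  );

  CREATE OR REPLACE FUNCTION note_new_relations() RETURNS event_trigger
  LANGUAGE plpgsql AS $$
  DECLARE
      r record;
  BEGIN
      FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
          -- Skip temp tables (they live in pg_temp_NN schemas); a real
          -- rule system would filter further, per the cases above.
          IF r.schema_name IS NULL OR r.schema_name NOT LIKE 'pg_temp%' THEN
              INSERT INTO replication_candidates (object_identity, command_tag)
              VALUES (r.object_identity, r.command_tag);
          END IF;
      END LOOP;
  END;
  $$;

  CREATE EVENT TRIGGER note_creates
      ON ddl_command_end
      WHEN TAG IN ('CREATE TABLE', 'CREATE SEQUENCE')
      EXECUTE PROCEDURE note_new_relations();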

There are kind of two approaches:

a) Just capture the OIDs, and have replication go back later and grab
the table definition once the dust clears on the master

b) We need to capture ALL the DDL, whether CREATE or ALTER, and
forward it, altered to have fully qualified names on everything so that
we don't need to duplicate all the set search_path requests and
such.

I suppose there's also a third...

c) Have a capability to put an event trigger function in place that makes
DDL requests fail.

That's more useful than you'd think; if, by default, we make them fail,
and with an error message such as
  "DDL request failed as it was not submitted using slonik DDL TOOL"

then we have protection against uncontrolled application of DDL.

DDL TOOL would switch off the fail trigger, possibly trying to
capture the DDL, or perhaps just capturing the statements passed
to it so they get passed everywhere.   (That heads back to a) and b);
what should get captured...)

I'm not sure that all of that is totally internally coherent, but I hope
there
are some ideas worth thinking about.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Christopher Browne
On Tue, Sep 2, 2014 at 7:08 PM, Jan Wieck j...@wi3ck.info wrote:

 On 09/02/2014 06:56 PM, Andrew Dunstan wrote:

 People are free to do what they want, but to my mind that would be a
 massive waste of resources, and probably imposing a substantial extra
 maintenance burden on the core committers.


 I hear you and agree to some degree.

 But at the same time I remember that one of the strengths of Postgres used
 to be to be able to incorporate new ideas.

 This seems to be one of those cases.

 Instead of fork plpgsql2, what about designing a completely new
 PL/postgres from scratch? It will only take 3-10 years, but I bet it will
 be worth it after all. And I mean that. No sarcasm.


Indeed.  That seems like a better approach to me.

The notion of hacking features onto plpgsql2 that mostly seem like SQL
enhancements is a waste of time.

I see a few places that look like they might be worth thinking about as
part of a new thing; it's important that they not merely be minor tweaks
of the old thing otherwise we might as well just improve pl/pgsql.
 (That's no bad idea, to be sure!)

1.  A classic place where I have leaped to other languages has been to do
sophisticated string smashing.  Often pl/perl, but when that proved
troublesome to run on AIX, I was perfectly happy to port to pl/tcl.  My
idle waggish thought over the years has been to consider pl/snobol or
pl/icon.  It's conceivable that some improvement to the Postgres built-in
string functions might make this unnecessary.  That's worthy of
exploration.  (And implementing pl/snobol is probably not!  :-))

2.  People have been pointing out that EXECUTE isn't all that wonderful.  I
think it is no coincidence that in all the languages that have an EVAL, it
is considered poor form to make unnecessary use of it.  (True whether we're
talking about Common Lisp or Perl!)  The discussion of using FORMAT() to
generate execute fodder fits with this (a sketch follows at the end of this
note).  It is again quite plausible that
having some improved functions might make it unnecessary to go all the way
to the extent of implementing a new language.

3.  Is there anything to be learned from Tutorial D?  That is, Date &
Darwen's would-be alternative to SQL of their Third Manifesto?

4.  All the existing stored function languages run inside a transaction
context.  I'd sure like it if I could have a platform, running on the
database, which could invoke and control transactions.  A layer that can
open connections, manage transactions, and such ought to be mighty useful.
 That CANNOT be implemented as one of the existing pl/whatevers.

To justify a new language requires having some requirements that cannot be
fulfilled by cleaning up an existing thing.
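
To make the point in 2. concrete, a minimal sketch (not from anything above)
of the format()-plus-EXECUTE pattern as it stands today; the function and
its arguments are hypothetical:

  CREATE OR REPLACE FUNCTION count_rows(tbl regclass, col text, val text)
  RETURNS bigint LANGUAGE plpgsql AS $$
  DECLARE
      n bigint;
  BEGIN
      -- %I quotes identifiers, %L quotes literals; %s renders the regclass.
      EXECUTE format('SELECT count(*) FROM %s WHERE %I = %L', tbl, col, val)
         INTO n;
      RETURN n;
  END;
  $$;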
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-24 Thread Christopher Browne
Last year, I built a pl/pgsql generator of version 1-ish UUIDs, which
would combine timestamps with local information to construct data that kind
of emulated the timestamp+MAC address that is version #1 of UUID.

Note that there are several versions of UUIDs:

1.  Combines MAC address, timestamp, random #
2.  DCE Security (replaces some bits with user's UID/GID and others with
POSIX Domain); I don't think this one is much used...
3.  MD5 Hash
4.  Purely Random
5.  SHA-1 Hash

There are merits to each.  The tough one is #1, as that requires pulling
data that can't generally be accessed portably.

I figured out (and could probably donate some code) how to construct the
bits of #1 using the inputs of *my* choice (e.g. - I set up to make up my
own MAC address surrogate, and transformed PostgreSQL timestamp values into
the timestamp, and threw in my own bit of randomness), which provided
well-formed UUIDs with nice enough characteristics.

It wouldn't be out there to do a somewhat PostgreSQL-flavoured version of
this that wouldn't actually use MAC addresses, but rather, would use data
we have:

a) Having a sequence feeding some local uniqueness would fit with the
clock seq bits (e.g. - the octets in RFC 4122 entitled
clock-seq-and-reserved and clock-seq-low)
b) NOW() provides data for time-low, time-mid, time-high-and-version
c) We'd need 6 hex octets for node; I seem to recall there being
something established by initdb that might be usable.

The only piece that's directly troublesome, for UUID Type 1, is the node
value.  I'll observe that it isn't unusual for UUID implementations to
generate random values for that.

Note that for the other UUID versions, there's NO non-portable data needed.

It seems to me that a UUIDserial type, which combined:
  a) A sequence, to be the 'clock';
  b) Possibly another sequence to store local node ID, which might get
seeded from DB internals
would provide a PostgreSQL-flavoured version of UUID Type 1.
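
A minimal sketch of such a generator in pl/pgsql, following the RFC 4122
version-1 layout; the sequence, the function, and the caller-supplied
12-hex-digit node value are all hypothetical:

  CREATE SEQUENCE uuid_v1ish_clock MAXVALUE 16383 CYCLE;  -- 14-bit clock seq

  CREATE OR REPLACE FUNCTION uuid_v1ish(node text) RETURNS uuid
  LANGUAGE plpgsql AS $$
  DECLARE
      -- 100-ns intervals since 1582-10-15, the RFC 4122 epoch
      ts  bigint := (extract(epoch from clock_timestamp()) * 10000000)::bigint
                    + 122192928000000000;
      clk int    := nextval('uuid_v1ish_clock');
  BEGIN
      RETURN (lpad(to_hex(ts & 4294967295), 8, '0')                   -- time-low
          || '-' || lpad(to_hex((ts >> 32) & 65535), 4, '0')          -- time-mid
          || '-' || lpad(to_hex(((ts >> 48) & 4095) | 4096), 4, '0')  -- version 1 + time-high
          || '-' || lpad(to_hex((clk & 16383) | 32768), 4, '0')       -- variant + clock-seq
          || '-' || node)::uuid;                                      -- node: 12 hex digits
  END;
  $$;

  -- e.g. SELECT uuid_v1ish('0123456789ab');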


Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-24 Thread Christopher Browne
On Thu, Apr 24, 2014 at 8:43 PM, Marti Raudsepp ma...@juffo.org wrote:

 On Fri, Apr 25, 2014 at 3:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Of course, the weak spot in this analysis is the assumption that there
  are actually 122 independent bits in the value.  It's not difficult to
  imagine that systems with crummy random() implementations might only have
  something like 32 bits worth of real randomness.

 Obviously you can't use random(). That's why I talked about
 cryptographic PRNGs, crypto libraries do proper seeding and generate
 reliably random numbers all the time.


... And we can't be certain that there won't be some internal
characteristic weakness.

Cryptography is *hard*; treating it as a certainty that things have been
gotten right tends to be a foolish assumption.

Which is why UUID type 1 resolves this by combining multiple sorts of
anti-correlations, the combination of:
a) Node-specific information (e.g. - in the standard form, parts of the MAC
address), so no collisions between node A and node B.
b) Timestamp, so that things that happen at different times will be kept
unique.
c) An extra sequence, so that if there are multiple events on the same node
at the same time, they *still* don't collide.

I trust the combination to work pretty well, and that's why it was designed
that way.

An RNG, however good, can't provide the same guarantees of lack of conflicts.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Patch: iff - if

2014-04-15 Thread Christopher Browne
On Tue, Apr 15, 2014 at 6:19 PM, Andreas 'ads' Scherbaum adsm...@wars-nicht.de wrote:


 Hi,

 stumbled over a number of "iff" in the source where "if" is meant - not
 sure what the real story behind this is, but attached is a patch to fix the
 about 80 occurrences.


"IFF" is a common idiom in mathematics (and logic, which means it will be
of interest in computer science, which shares boundaries with both) which
stands for "If And Only If."

Unless it has very recently fallen out of fashion in mathematics, it's
almost certainly intentional to use "iff" in the comments, and not merely a
typo.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Storing the password in .pgpass file in an encrypted format

2014-02-21 Thread Christopher Browne
On Fri, Feb 21, 2014 at 7:49 AM, firoz e v firoz...@huawei.com wrote:

  Hi,



 Is there a way to store the password in .pgpass file in an encrypted
 format (for example, to be used by pg_dump).



 Even though, there are ways to set the permissions on .pgpass, to disallow
 any access to world or group, the security rules of many organizations
 disallow to hold any kind of passwords, as plain text.



 If there is no existing way to do this, shall we take up this, as a patch?


As observed by others, storing the password in encrypted form in .pgpass
merely means that you need to store the password to decrypt .pgpass in
still another file that would, again, run afoul of such security policies.
There is no appetite in the community to do implementation work that is
provably useless, as it cannot accomplish what people imagine it would accomplish.

The thing you could do instead that would *look* like it is encrypted is to
use a certificate (e.g. - SSL).  The certificate that you'd need to put on
the client still needs to be in something that is effectively plain text
(however much it looks like nonsensical encrypted text).
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Extra functionality to createuser

2013-12-12 Thread Christopher Browne
On Wed, Dec 11, 2013 at 7:53 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Dec 10, 2013 at 9:55 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Dec 10, 2013 at 12:20 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Dec 7, 2013 at 11:39 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Fri, Dec 6, 2013 at 10:31 AM, Peter Eisentraut pete...@gmx.net wrote:

 How about only one role name per -g option, but allowing the -g option
 to be repeated?

I think that might simplify the problem and patch, but do you think
 it is okay to have inconsistency
for usage of options between Create User statement and this utility?

 Yes.  In general, command-line utilities use a very different syntax
 for options-passing that SQL commands.  Trying to make them consistent
 feels unnecessary or perhaps even counterproductive.  And the proposed
 syntax is certainly a convention common to many other command-line
 utilities, so I think it's fine.

 Okay, the new way for syntax suggested by Peter has simplified the problem.
 Please find the updated patch and docs for multiple -g options.

 Committed.

Looks good, thanks!

-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Extra functionality to createuser

2013-11-20 Thread Christopher Browne
On Tue, Nov 19, 2013 at 11:54 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On further tests, I found inconsistency in behavior when some special
 characters are used in role names.

 1. Test for role name containing quotes
 a. In psql, create a role containing quotes in role name.
create role amitk in role test_ro'le_3;

 b. Now if we try to make a new role member of this role using
 createuser utility, it gives error
 try-1
 createuser.exe -g test_ro'le_3 -p 5446 amitk_2
 createuser: creation of new role failed: ERROR:  unterminated quoted
 string at or near 'le_3;
 LINE 1: ... NOCREATEDB NOCREATEROLE INHERIT LOGIN IN ROLE test_ro'le_3;
 try-2
 createuser.exe -g test_ro'le_3 -p 5446 amitk
 createuser: creation of new role failed: ERROR:  unterminated quoted
 string at or near 'le_3;
 LINE 1: ... NOCREATEDB NOCREATEROLE INHERIT LOGIN IN ROLE test_ro'le_3;

 c. If I try quoted string in new role to be created, it works fine.
 createuser.exe -p 5446 am'itk_2

 As quoted strings work well for role names, I think it should work
 with -g option as well.

 2. Test for role name containing special character ';' (semicolon)
 a. create role test;_1;

 b. Now if we try to make a new role member of this role using
 createuser utility, it gives error
 try-1
 createuser.exe -g test;_1 -p 5446 amitk_4
 createuser: creation of new role failed: ERROR:  syntax error at or near _1
 LINE 1: ...RUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN IN ROLE test;_1;
 try-2^
 createuser.exe -g test;_1 -p 5446 amitk_4
 createuser: creation of new role failed: ERROR:  syntax error at or near _1
 LINE 1: ...RUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN IN ROLE test;_1;
 ^
 try-3
 createuser.exe -g 'test;_1' -p 5446 amitk_4
 createuser: creation of new role failed: ERROR:  syntax error at or
 near 'test;_1'
 LINE 1: ...SER NOCREATEDB NOCREATEROLE INHERIT LOGIN IN ROLE 'test;_1';

 c. If I try semicolon in new role to be created, it works fine.
 createuser.exe -p 5446 amit;k_3

 As semicolon work well for role names, I think it should work with -g
 option as well.

I was not unaware of the potential for an issue here; there is an easy
answer of double-quoting the string, thus:

diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index 88b8f2a..04ec324 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -308,7 +308,7 @@ main(int argc, char *argv[])
 	if (conn_limit != NULL)
 		appendPQExpBuffer(&sql, " CONNECTION LIMIT %s", conn_limit);
 	if (roles != NULL)
-		appendPQExpBuffer(&sql, " IN ROLE %s", roles);
+		appendPQExpBuffer(&sql, " IN ROLE \"%s\"", roles);
 	appendPQExpBufferStr(&sql, ";\n");
 
 	if (echo)

I was conscious of not quoting it.  Note that other parameters are not quoted
either, so I imagined I was being consistent with that.

I have added the above change, as well as rebasing, per Peter's recommendation.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index 2f1ea2f..5a38d2e 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -131,6 +131,16 @@ PostgreSQL documentation
 </varlistentry>
 
 <varlistentry>
+  <term><option>-g <replaceable class="parameter">roles</replaceable></option></term>
+  <term><option>--roles=<replaceable class="parameter">roles</replaceable></option></term>
+  <listitem>
+   <para>
+    Indicates roles to which this role will be added immediately as a new member.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><option>-i</option></term>
   <term><option>--inherit</option></term>
   <listitem>
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index 83623ea..04ec324 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -24,6 +24,7 @@ main(int argc, char *argv[])
 		{"host", required_argument, NULL, 'h'},
 		{"port", required_argument, NULL, 'p'},
 		{"username", required_argument, NULL, 'U'},
+		{"roles", required_argument, NULL, 'g'},
 		{"no-password", no_argument, NULL, 'w'},
 		{"password", no_argument, NULL, 'W'},
 		{"echo", no_argument, NULL, 'e'},
@@ -57,6 +58,7 @@ main(int argc, char *argv[])
 	char	   *host = NULL;
 	char	   *port = NULL;
 	char	   *username = NULL;
+	char	   *roles = NULL;
 	enum trivalue prompt_password = TRI_DEFAULT;
 	bool		echo = false;
 	bool		interactive = false;
@@ -83,7 +85,7 @@ main(int argc, char *argv[])
 
 	handle_help_version_opts(argc, argv, "createuser", help);
 
-   while ((c = 

Re: [HACKERS] Extra functionality to createuser

2013-11-20 Thread Christopher Browne
Wait, that doesn't work if more than one role is added, as they get
merged together by the quoting.

A somewhat ugly amount of quoting can be done at the shell level to
induce double quotes.

$ createuser -g \"test_rol'e_3\" usequoted3

I note that similar (though not quite identical) issues apply to the
user name.  Perhaps the resolution is to leave quoting issues to the
administrator; that simplifies the problem away.
I suspect that the apparatus needed for a thorough solution (e.g.,
parsing the string and doing something smarter) may be larger than is
worth getting into.
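
For illustration, the quoting the backend ultimately wants is what
quote_ident() produces; a thorough createuser fix would have to do the
equivalent in C, splitting the list on commas and quoting each role name.
A server-side sketch, using just the problem cases reported above:

    SELECT quote_ident('test_ro''le_3');   -- "test_ro'le_3"
    SELECT quote_ident('test;_1');         -- "test;_1"

    SELECT string_agg(quote_ident(r), ', ')
    FROM regexp_split_to_table('role_a,test_ro''le_3,test;_1', ',') AS r;
    -- role_a, "test_ro'le_3", "test;_1"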




Re: [HACKERS] Add CREATE support to event triggers

2013-11-20 Thread Christopher Browne
On Fri, Nov 8, 2013 at 10:33 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Hello,

 Attached you can find a very-much-WIP patch to add CREATE info support
 for event triggers (normalized commands).  This patch builds mainly on
 two things:

 1. Dimitri's DDL rewrite patch he submitted way back, in
http://www.postgresql.org/message-id/m2zk1j9c44@2ndquadrant.fr

 I borrowed the whole ddl_rewrite.c code, and tweaked it a bit.  There
 are several things still wrong with it and which will need to be fixed
 before a final patch can even be contemplated; but there are some
 questions that require a consensus answer before I go and fix it all,
 because what it will look like will depend on said answers.

I have tried this out; the patch applies fine.

Note that it induces (modulo my environment) a failure in make check.

The opr_sanity test fails.

postgres@cbbrowne ~/p/s/t/regress diff expected/opr_sanity.out
results/opr_sanity.out
348,350c348,351
<  oid | proname
< -----+---------
< (0 rows)
---
>  oid  |                  proname
> ------+-------------------------------------------
>  3567 | pg_event_trigger_get_normalized_commands
> (1 row)

That's a minor problem; the trouble there is that the new function is not
yet documented.  Not a concern at this stage.

 2. The ideas we used to build DROP support.  Mainly, the interesting
thing here is the fact that we use a SRF to report, at
ddl_command_end, all the objects that were created during execution
of that command.  We do this by collecting them in a list in some raw
form somewhere during ProcessUtility, and then spitting them out if
the SRF is called.  I think the general idea is sound, although of
course I admit there might be bugs in the implementation.

 Note this patch doesn't try to add any kind of ALTER support.  I think
 this is fine in principle, because we agreed that we would attack each
 kind of command separately (divide to conquer and all that); but there
 is a slight problem for some kind of objects that are represented partly
 as ALTER state during creation; for example creating a table with a
 sequence uses ALTER SEQ/OWNED BY internally at some point.  There might
 be other cases I'm missing, also.  (The REFRESH command is nominally
 also supported.)

I imagine that the things we create in earlier stages may help with later
stages, so it's worth *some* planning so we can hope not to build bits
now that push later enhancements into corners that they can't get out of.

But I'm not disagreeing at all.

 Now about the questions I mentioned above:

 a) It doesn't work to reverse-parse the statement nodes in all cases;
 there are several unfixable bugs if we only do that.  In order to create
 always-correct statements, we need access to the catalogs for the
 created objects.  But if we are doing catalog access, then it seems to
 me that we can do away with the statement parse nodes completely and
 just reconstruct the objects from catalog information.  Shall we go that
 route?

Here's a case where it doesn't work.

testevent@localhost-  create schema foo;
CREATE SCHEMA
testevent@localhost-  create domain foo.bar integer;
CREATE DOMAIN
testevent@localhost-  CREATE OR REPLACE FUNCTION snitch() RETURNS
event_trigger LANGUAGE plpgsql AS $$
testevent$# DECLARE
testevent$# r RECORD;
testevent$# BEGIN
testevent$# FOR r IN SELECT * FROM
pg_event_trigger_get_normalized_commands()
testevent$# LOOP
testevent$# RAISE NOTICE 'object created: id %, statement %',
testevent$# r.identity, r.command;
testevent$# END LOOP;
testevent$# END;
testevent$# $$;
CREATE FUNCTION
testevent@localhost-  CREATE EVENT TRIGGER snitch ON ddl_command_end
EXECUTE PROCEDURE snitch();
CREATE EVENT TRIGGER
testevent@localhost-  set search_path to public, foo;
SET
testevent@localhost-  create table foo.foo2 (acolumn bar);
NOTICE:  object created: id foo.foo2, statement CREATE TABLE foo.foo2
(acolumn bar)
CREATE TABLE

The trouble is that you have only normalized the table name.  The
domain, bar, needs its name normalized as well.
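
To make that concrete, a sketch building on the transcript above (the extra
schema and domain are purely illustrative): replaying the captured command
under a different search_path silently binds the unqualified type name to
something else.

    CREATE SCHEMA other;
    CREATE DOMAIN other.bar AS integer;
    SET search_path TO public, other;
    CREATE TABLE foo.foo3 (acolumn bar);   -- resolves to other.bar, not foo.bar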

 b) What's the best design of the SRF output?  This patch proposes two
 columns, object identity and create statement.  Is there use for
 anything else?  Class/object OIDs perhaps, schema OIDs for objects types
 that have it?  I don't see any immediate need to that info, but perhaps
 someone does.

Probably an object type is needed as well, to know if it's a table or
a domain or a sequence or whatever.

I suspect that what will be needed to make it all usable is some sort of
structured form.  That is in keeping with Robert Haas' discomfort with
the normalized form.

My minor gripe is that you haven't normalized enough (e.g. - it should be
CREATE TABLE foo.foo2 (acolumn foo.bar), capturing the normalization of
data types that are referenced).

But Robert's quite right that users may want more than just to capture that
literally; they may want to modify it, for instance, 

Re: [HACKERS] Extra functionality to createuser

2013-11-19 Thread Christopher Browne
On Mon, Nov 18, 2013 at 1:01 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Sat, Nov 16, 2013 at 4:57 AM, Christopher Browne cbbro...@gmail.com 
 wrote:
 Few comments:

 1.
 +  termoption-g//term
 +  termoption--roles//term

 All other options which require argument are of form:
  termoption-c replaceable 
 class=parameternumber/replaceable//term
   termoption--connection-limit=replaceable
 class=parameternumber/replaceable//term

 So I think it is better to have this new option which require argument
 in similar form.

Sounds good, done.

 2.
 +Indicates roles to associate with this role.

 I think word associate is not very clear, wouldn't it be better to
 mention that this new role will be member of roles specified.
 For example:
 Indicates roles to which the new role will be immediately added as a new 
 member.

With a switch of "immediately" and "added", done.  That does better
describe the behaviour.

 3.
 + case 'g':
 + roles = pg_strdup(optarg);
 + break;

 If we see most of other options in case handling are ordered as per
 their order in long_options array. For example

 static struct option long_options[] = {
 {host, required_argument, NULL, 'h'},
 {port, required_argument, NULL, 'p'},
 ..

 Now the order of handling for both is same in switch case or while get
 opt_long() function call. I think this makes code easy to understand
 and modify.
 However there is no functionality issue here, so you can keep the code
 as per your existing patch as well, this is just a suggestion.

That is easy enough to change, and yes, indeed, having the new code
look just like what it is near seems an improvement.

I picked the location of the 'g:' in the opt_long() call basically arbitrarily;
if there is any reason for it to go in a different spot, I'd be happy to
shift it.


-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index 2f1ea2f..5a38d2e 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -131,6 +131,16 @@ PostgreSQL documentation
 </varlistentry>
 
 <varlistentry>
+  <term><option>-g <replaceable class="parameter">roles</replaceable></option></term>
+  <term><option>--roles=<replaceable class="parameter">roles</replaceable></option></term>
+  <listitem>
+   <para>
+    Indicates roles to which this role will be added immediately as a new member.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><option>-i</option></term>
   <term><option>--inherit</option></term>
   <listitem>
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index d1542d9..ecbb21c 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -24,6 +24,7 @@ main(int argc, char *argv[])
 		{"host", required_argument, NULL, 'h'},
 		{"port", required_argument, NULL, 'p'},
 		{"username", required_argument, NULL, 'U'},
+		{"roles", required_argument, NULL, 'g'},
 		{"no-password", no_argument, NULL, 'w'},
 		{"password", no_argument, NULL, 'W'},
 		{"echo", no_argument, NULL, 'e'},
@@ -57,6 +58,7 @@ main(int argc, char *argv[])
 	char	   *host = NULL;
 	char	   *port = NULL;
 	char	   *username = NULL;
+	char	   *roles = NULL;
 	enum trivalue prompt_password = TRI_DEFAULT;
 	bool		echo = false;
 	bool		interactive = false;
@@ -83,7 +85,7 @@ main(int argc, char *argv[])
 
 	handle_help_version_opts(argc, argv, "createuser", help);
 
-	while ((c = getopt_long(argc, argv, "h:p:U:wWedDsSaArRiIlLc:PEN",
+	while ((c = getopt_long(argc, argv, "h:p:U:g:wWedDsSaArRiIlLc:PEN",
 							long_options, &optindex)) != -1)
 	{
 		switch (c)
@@ -97,6 +99,9 @@ main(int argc, char *argv[])
 			case 'U':
 				username = pg_strdup(optarg);
 				break;
+			case 'g':
+				roles = pg_strdup(optarg);
+				break;
 			case 'w':
 				prompt_password = TRI_NO;
 				break;
@@ -302,6 +307,8 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " NOREPLICATION");
 	if (conn_limit != NULL)
 		appendPQExpBuffer(&sql, " CONNECTION LIMIT %s", conn_limit);
+	if (roles != NULL)
+		appendPQExpBuffer(&sql, " IN ROLE %s", roles);
 	appendPQExpBuffer(&sql, ";\n");
 
 	if (echo)
@@ -334,6 +341,7 @@ help(const char *progname)
 	printf(_("  -D, --no-createdb         role cannot create databases (default)\n"));
 	printf(_("  -e, --echo                show the commands being sent to the server\n"));
printf(_(  -E

Re: [HACKERS] Extra functionality to createuser

2013-11-15 Thread Christopher Browne
On Fri, Nov 15, 2013 at 3:14 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 11/14/13, 4:35 PM, Christopher Browne wrote: On Thu, Nov 14, 2013 at
 5:41 AM, Sameer Thakur samthaku...@gmail.com wrote:
 So i think -g option is failing

 Right you are.

 I was missing a g: in the getopt_long() call.

 Attached is a revised patch that handles that.


 src/bin/scripts/createuser.c:117: indent with spaces.
 +   case 'g':
 src/bin/scripts/createuser.c:118: indent with spaces.
 +   roles = pg_strdup(optarg);

OK, I ran pgindent on createuser.c, which leads to the Next Patch...

-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index 2f1ea2f..5fedc80 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -131,6 +131,16 @@ PostgreSQL documentation
 </varlistentry>
 
 <varlistentry>
+  <term><option>-g</option></term>
+  <term><option>--roles</option></term>
+  <listitem>
+   <para>
+    Indicates roles to associate with this role.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><option>-i</option></term>
   <term><option>--inherit</option></term>
   <listitem>
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index d1542d9..e2e1134 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -47,6 +47,7 @@ main(int argc, char *argv[])
 		{"pwprompt", no_argument, NULL, 'P'},
 		{"encrypted", no_argument, NULL, 'E'},
 		{"unencrypted", no_argument, NULL, 'N'},
+		{"roles", required_argument, NULL, 'g'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -57,6 +58,7 @@ main(int argc, char *argv[])
 	char	   *host = NULL;
 	char	   *port = NULL;
 	char	   *username = NULL;
+	char	   *roles = NULL;
 	enum trivalue prompt_password = TRI_DEFAULT;
 	bool		echo = false;
 	bool		interactive = false;
@@ -83,7 +85,7 @@ main(int argc, char *argv[])
 
 	handle_help_version_opts(argc, argv, "createuser", help);
 
-	while ((c = getopt_long(argc, argv, "h:p:U:wWedDsSaArRiIlLc:PEN",
+	while ((c = getopt_long(argc, argv, "h:p:U:g:wWedDsSaArRiIlLc:PEN",
 							long_options, &optindex)) != -1)
 	{
 		switch (c)
@@ -112,6 +114,9 @@ main(int argc, char *argv[])
 			case 'D':
 				createdb = TRI_NO;
 				break;
+			case 'g':
+				roles = pg_strdup(optarg);
+				break;
 			case 's':
 			case 'a':
 				superuser = TRI_YES;
@@ -302,6 +307,8 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " NOREPLICATION");
 	if (conn_limit != NULL)
 		appendPQExpBuffer(&sql, " CONNECTION LIMIT %s", conn_limit);
+	if (roles != NULL)
+		appendPQExpBuffer(&sql, " IN ROLE %s", roles);
 	appendPQExpBuffer(&sql, ";\n");
 
 	if (echo)
@@ -334,6 +341,7 @@ help(const char *progname)
 	printf(_("  -D, --no-createdb         role cannot create databases (default)\n"));
 	printf(_("  -e, --echo                show the commands being sent to the server\n"));
 	printf(_("  -E, --encrypted           encrypt stored password\n"));
+	printf(_("  -g, --roles               roles to associate with this new role\n"));
 	printf(_("  -i, --inherit             role inherits privileges of roles it is a\n"
 			 "                            member of (default)\n"));
 	printf(_("  -I, --no-inherit          role does not inherit privileges\n"));



Re: [HACKERS] Extra functionality to createuser

2013-11-14 Thread Christopher Browne
On Thu, Nov 14, 2013 at 5:41 AM, Sameer Thakur samthaku...@gmail.com wrote:
 So i think -g option is failing

Right you are.

I was missing a g: in the getopt_long() call.

Attached is a revised patch that handles that.

And it behaves better:
postgres@cbbrowne ~/p/s/b/scripts ./createuser -g purge_role -U
postgres newuser4
postgres@cbbrowne ~/p/s/b/scripts pg_dumpall -g | grep newuser4
CREATE ROLE newuser4;
ALTER ROLE newuser4 WITH NOSUPERUSER INHERIT NOCREATEROLE NOCREATEDB
LOGIN NOREPLICATION;
GRANT purge_role TO newuser4 GRANTED BY postgres;

-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index 2f1ea2f..5fedc80 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -131,6 +131,16 @@ PostgreSQL documentation
 </varlistentry>
 
 <varlistentry>
+  <term><option>-g</option></term>
+  <term><option>--roles</option></term>
+  <listitem>
+   <para>
+    Indicates roles to associate with this role.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><option>-i</option></term>
   <term><option>--inherit</option></term>
   <listitem>
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index d1542d9..c469b52 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -47,6 +47,7 @@ main(int argc, char *argv[])
 		{"pwprompt", no_argument, NULL, 'P'},
 		{"encrypted", no_argument, NULL, 'E'},
 		{"unencrypted", no_argument, NULL, 'N'},
+		{"roles", required_argument, NULL, 'g'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -57,6 +58,7 @@ main(int argc, char *argv[])
 	char	   *host = NULL;
 	char	   *port = NULL;
 	char	   *username = NULL;
+	char	   *roles = NULL;
 	enum trivalue prompt_password = TRI_DEFAULT;
 	bool		echo = false;
 	bool		interactive = false;
@@ -83,7 +85,7 @@ main(int argc, char *argv[])
 
 	handle_help_version_opts(argc, argv, "createuser", help);
 
-	while ((c = getopt_long(argc, argv, "h:p:U:wWedDsSaArRiIlLc:PEN",
+	while ((c = getopt_long(argc, argv, "h:p:U:g:wWedDsSaArRiIlLc:PEN",
 							long_options, &optindex)) != -1)
 	{
 		switch (c)
@@ -112,6 +114,9 @@ main(int argc, char *argv[])
 			case 'D':
 				createdb = TRI_NO;
 				break;
+			case 'g':
+				roles = pg_strdup(optarg);
+				break;
 			case 's':
 			case 'a':
 				superuser = TRI_YES;
@@ -302,6 +307,8 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " NOREPLICATION");
 	if (conn_limit != NULL)
 		appendPQExpBuffer(&sql, " CONNECTION LIMIT %s", conn_limit);
+	if (roles != NULL)
+		appendPQExpBuffer(&sql, " IN ROLE %s", roles);
 	appendPQExpBuffer(&sql, ";\n");
 
 	if (echo)
@@ -334,6 +341,7 @@ help(const char *progname)
 	printf(_("  -D, --no-createdb         role cannot create databases (default)\n"));
 	printf(_("  -e, --echo                show the commands being sent to the server\n"));
 	printf(_("  -E, --encrypted           encrypt stored password\n"));
+	printf(_("  -g, --roles               roles to associate with this new role\n"));
 	printf(_("  -i, --inherit             role inherits privileges of roles it is a\n"
 			 "                            member of (default)\n"));
 	printf(_("  -I, --no-inherit          role does not inherit privileges\n"));



Re: [HACKERS] How can I build OSSP UUID support on Windows to avoid duplicate UUIDs?

2013-11-04 Thread Christopher Browne
On Thu, Oct 31, 2013 at 3:42 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Oct 31, 2013 at 2:44 PM, Garick Hamlin gham...@isc.upenn.edu
wrote:
 I think using /dev/urandom directly would be surprising. At least it
would
 have probably have taken me a while to figure out what was depleting the
 entropy pool here.

 Perhaps so; a bigger problem IMHO is that it's not portable. I think
 the only way to solve this problem is to import (or have an option to
 link with) a strong, sophisticated PRNG with much larger internal
 state than pg_lrand48, which uses precisely 48 bits of internal state.
 For this kind of thing, I'm fairly sure that we need something with
 at least 128 bits of internal state (as wide as the random value we
 want to generate) and I suspect it might be advantageous to have
 something a whole lot wider, maybe a few kB.

I mentioned the notion of building an entropy pool, into which one might
add various sorts of random inputs, under separate cover...

The last time I had need of a rather non-repeating RNG, I went with
a Fibonacci-based one, namely Mersenne Twister...

http://en.wikipedia.org/wiki/Mersenne_twister

The sample has 624 integers (presumably that means 624x32 bits) as
its internal state. Apparently not terribly suitable for cryptographic
purposes,
but definitely highly non-repetitive, which is what we're notably
worried about for UUIDs.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] How can I build OSSP UUID support on Windows to avoid duplicate UUIDs?

2013-10-31 Thread Christopher Browne
On Thu, Oct 31, 2013 at 2:44 PM, Garick Hamlin gham...@isc.upenn.edu wrote:
 On Thu, Oct 31, 2013 at 01:59:04PM -0400, Robert Haas wrote:
 On Thu, Oct 31, 2013 at 1:02 PM, Garick Hamlin gham...@isc.upenn.edu wrote:
  On Thu, Oct 31, 2013 at 09:54:14PM +0900, MauMau wrote:
  From: Robert Haas robertmh...@gmail.com
  ISTM that the biggest problem is that we don't have a random number
  generator which generates enough bits of randomness to implement
  uuid_generate_v3.  I think relatively few people would cry if we
  didn't support uuid_generate_v1(), and the others all look simple
  enough, provided there's somewhere to get lots of random bits.
 
  On Linux, it seems like we could get those bits from /dev/urandom,
  though I'm not sure how efficient that would be for the case where
  many UUIDs are being generated at once.  But that wouldn't be very
  portable.  It's tempting to think that we'd need a PRNG that generates
  wider values, for which we might find other application also.  But I'm
  not volunteering to be the one to create such a thing.
 
  OpenSSL provides rand_bytes() which generates random bytes of any length.
  It uses /dev/urandom or /dev/random on UNIX/Linux and Crypto API of
  Microsoft on Windows.
 
  What about using a cipher here as the PRNG? It seems like using openssl
  rand_bytes() to seed aes in ctr would work ok without starving the system 
  of
  entropy when making a lot of uuids.

 There are two good reasons for us NOT to rely on OpenSSL:

 Right, that makes sense.  openssl is a non-starter here.  In which case that
 approach is no easier than any other prng.

 I think using /dev/urandom directly would be surprising.  At least it would
 have probably have taken me a while to figure out what was depleting the
 entropy pool here.

There's something arguably duplicative about this, but I'd be kind of
inclined to
implement an entropy pool, and enable introducing entropy using whatever
means are wished.

That way, we can scale reasonably well for all the cases.

By default, we seed using a timestamp (or similar) and use some stream cipher
that is of suitable portability to stir the pot.  That's obviously
not perfect, but it
will *work*, and offer reasonable results on systems that don't offer better
facilities.

Having a pool lends itself to stirring extra ingredients in with
reasonable ease.

- If you have a better PRNG, you can stir that in.  Perhaps it replaces the
  default; I'm not sure it's important.

 - If you have /dev/random or /dev/urandom, you can stir some of that into
   the mix.

 - Someone that is feeling paranoid about their PRNG could stir in
   data from an entropy key (like http://www.entropykey.co.uk/)

The latter options aren't much use to someone without suitable hardware or
OS support, but they're not imperative.

Doing funky things to the pool seems like an example of something useful to
do using the background worker facility of 9.3+.  And if you have an application
where you're using really a lot of UUIDs, the burden of configuring that may be
reasonable to pay.
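
Outside core, there is already a rough precedent for leaning on OpenSSL's
pool: pgcrypto's gen_random_bytes().  A sketch (assuming pgcrypto is
installed) of turning 16 such bytes into a version-4 UUID:

    WITH r AS (SELECT gen_random_bytes(16) AS b)
    SELECT encode(
             set_byte(
               set_byte(b, 6, (get_byte(b, 6) & 15) | 64),   -- version nibble = 4
               8, (get_byte(b, 8) & 63) | 128),              -- RFC 4122 variant bits
             'hex')::uuid AS random_uuid
    FROM r;
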
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Location for external scripts for Extensions?

2013-10-23 Thread Christopher Browne
On Wed, Oct 23, 2013 at 12:24 PM, Josh Berkus j...@agliodbs.com wrote:
 Dimitri,

 Josh Berkus josh(at)agliodbs(dot)com writes:
 pg_partman has several external (python) scripts which help the
 extension, located in /extras/ in its source.  The problem currently is
 that if you install pg_partman via pgxn or package, you don't get those
 scripts, because there's no install location for them.

 See also my proposal to solve that, I'd welcome some design level
 discussions about it:

   http://www.postgresql.org/message-id/m28uyzgof3@2ndquadrant.fr

 Well, that's a bit more complicated than what I had in mind.  I was
 thinking just a location like $PGSHARE/extensions/lib/

 Note that, in the case of pg_partman, we're talking about external
 scripts designed to be run by hand, not libaries to be called by the PL
 functions.

I suppose that Slony would be a possible other user of this facility...

It has something of a mixture of things that get added:

a) There are the functions and tables that it adds.

Ideally, these could be added in via CREATE EXTENSION, so that
it would be somewhat logical for them to live in $PGSHARE/extensions.
(I think that's somewhat tough to do using the CREATE EXTENSION
facility as currently constituted as the slony schema installation process,
at present, does a number of remappings, putting FQ names inside
functions and such.)

b) Then there are the crucial programs that it uses, slon
(replication daemon) and slonik (configuration tool).

Those seem to fit well with your intent; they are additional programs
(not scripts per se, but I don't think that contradicts) that need to
live somewhere reasonably near to the database installation.

By default, we stow slon/slonik in the same directory as is used
by all the other PostgreSQL binaries (e.g. - psql, pg_ctl, ...),
e.g. - $BINDIR

Perhaps that is an OK place for pg_partman.  It would seem
pretty acceptable to me to consider declaring something like
$PGSHARE/extensions/bin for such things.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] FDW API / flow charts for the docs?

2013-10-18 Thread Christopher Browne
I would be more inclined to let GraphViz into the process than Dia; the
former fits *much* better into a Make-based process.

It is worth observing that there are schema diagramming systems (SchemaSpy
is mighty likable) that build diagrams using GraphViz.  We have integrated
this into internal deployments at Afilias.

Rod Taylor's autodoc tool has some minimal Dia support, but it lacks the
ability to do layout; if you use Dia output, you'll have to manually drag
things around to have the tables *not* all atop one another.  I'm a little
surprised Dia hasn't integrated with GraphViz to help with that; note that
the MacOS diagram tool OmniGraffle (proprietary, considered quite good,
competitive with Visio) uses GraphViz to help do automatic object layout.
That this is integrated inside an interactive GUI app is pretty wild/cool.

I'd think it a fine idea to add rules for .dot files (the usual GraphViz
language/format to describe graphs) to the PG Make rules.


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Christopher Browne
On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian br...@momjian.us wrote:
 How do we handle the Python dependency, or is this all to be done in
 some other language?  I certainly am not ready to take on that job.

I should think it possible to reimplement it in C.  It was considerably
easier to start by implementing it in Python, as that avoids various sorts
of effort needed in C (e.g., memory allocation, picking a hash table
implementation), and allows someone to hack on it without needing to
run through a recompile every time something is touched.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Extra functionality to createuser

2013-09-27 Thread Christopher Browne
Attached is a patch implementing the -g / --roles option for createuser.

I'll be attaching it to the open CommitFest shortly.


createuser.diff
Description: Binary data



[HACKERS] Extra functionality to createuser

2013-09-26 Thread Christopher Browne
Sitting on my todo list for a while has been to consider the idea of
adding a bit of additional functionality to createuser.

One of the functions of CREATE ROLE is to associate the role with
other roles, thus...

   create role my_new_user nosuperuser nocreatedb login
IN ROLE app_readonly_role, app2_writer_role;

That isn't something that I can do using createuser; to do that, I
would need to submit two requests separately:

   PGUSER=postgres createuser -D -S -l my_new_user
   PGUSER=postgres psql -c "grant app_readonly_role, app2_writer_role to my_new_user;"

I could certainly change over to using psql to do all the work, but it
would be rather nice if createuser had (say) a -g option which
allowed specifying the set of roles that should be assigned.

Thus, the above commands might be replaced by:
   PGUSER=postgres createuser -D -S -l -g
app_readonly_role,app2_writer_role my_new_user

Would this be worth adding to the ToDo list?
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] pg_system_identifier()

2013-08-23 Thread Christopher Browne
On Fri, Aug 23, 2013 at 3:10 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:

 After someone in IRC asked if there was an equivalent to MySQL's
 server_id, it was noted that we do have a system identifier but it's not
 very accessible.

 The attached patch implements a pg_system_identifier() function that
 exposes it.


 Would it make sense for such identifiers be standard UUID
 (http://en.wikipedia.org/wiki/UUID)?

 Should there be a UUID per cluster? and/or per database, possibly deduce
 from the cluster one? Should it be configurable, say from postgresql.conf?

   get_pg_uuid()
   get_pg_uuid('template0')

 Note that there is a set of uuid functions provided as a module that may
 help.

There is sense to this, sure.

I'd think that constructing a Type 5 (SHA-1) UUID based on some local
information would make a lot of sense.

In effect, it would be based on computing the SHA-1 of a string looking like:

Database system identifier:   5651554613500795646
Maximum data alignment:   8
Database block size:  8192
WAL block size:   8192
Maximum length of identifiers:64
Date/time type storage:   64-bit integers
Version:  PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by
gcc (Debian 4.6.1-4) 4.6.1, 64-bit

== SHA-1 of b1b012cc85149d2fe4bf0fc18c38dcf1218e95a5

(Note that I didn't put anything into that which is mutable such as
port numbers, MAC addresses, or IP
addresses - seems to introduce risk of false-negatives to me...)
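
A sketch of that idea, assuming pgcrypto for digest(): truncate the SHA-1 of
the pg_controldata-style text to 16 bytes and read it as a UUID.  (A strict
RFC 4122 type-5 UUID would additionally force the version and variant bits.)

    WITH info(s) AS (
        VALUES ('Database system identifier: 5651554613500795646 '
                'Database block size: 8192 '
                'Version: PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu')
    )
    SELECT encode(substring(digest(s, 'sha1') from 1 for 16), 'hex')::uuid
           AS cluster_uuid
    FROM info;
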
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Personal note: taking some vacation time in Sep/Oct

2013-08-20 Thread Christopher Browne
On Tue, Aug 20, 2013 at 2:55 AM, Gavin Flower
gavinflo...@archidevsys.co.nz wrote:
 On 20/08/13 15:26, Tom Lane wrote:

 I will be taking a long (and long-overdue) vacation from Sep 10 to Oct 20.
 I expect to have email access, but won't be doing much more than minimally
 keeping up with my inbox.

 This means I'll be pretty much AWOL for the September commitfest :-(.
 That's unfortunate, but the dates for this trip were frozen long before
 the 9.4 development calendar was.

 regards, tom lane


 but, But, BUT, you're not human - you can't possibly take leave, the sky
 will fall  all manners of divers calamities will come to pass!!!

I think a scene from Ghostbusters comes in handy here...

Dr. Peter Venkman: This city is headed for a disaster of biblical proportions.
Mayor: What do you mean, biblical?
Dr Ray Stantz: What he means is Old Testament, Mr. Mayor, real wrath
of God type stuff.
Dr. Peter Venkman: Exactly.
Dr Ray Stantz: Fire and brimstone coming down from the skies! Rivers
and seas boiling!
Dr. Egon Spengler: Forty years of darkness! Earthquakes, volcanoes...
Winston Zeddemore: The dead rising from the grave!
Dr. Peter Venkman: Human sacrifice, dogs and cats living together...
mass hysteria!
Mayor: All right, all right! I get the point!

Man, dogs and cats living together!!!

[I wonder what skewing this will have on the analytical statistics on
the pgsql mailing lists???
http://www.citusdata.com/blog/57-postgresql-full-text-search]

 MORE SERIOUSLY:
 Enjoy your more than well earnt leave!

Indeed.  We'll try to keep the dogs and cats suitably apart!  :-)
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] killing pg_dump leaves backend process

2013-08-10 Thread Christopher Browne
On Sat, Aug 10, 2013 at 12:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Tatsuo Ishii is...@postgresql.org writes:
 I noticed pg_dump does not exit gracefully when killed.
 start pg_dump
 kill pg_dump by ctrl-c
 ps x

 27246 ?Ds96:02 postgres: t-ishii dbt3 [local] COPY
 29920 ?S  0:00 sshd: ishii@pts/5
 29921 pts/5Ss 0:00 -bash
 30172 ?Ss 0:00 postgres: t-ishii dbt3 [local] LOCK TABLE waiting

 As you can see, after killing pg_dump, a backend process is (LOCK
 TABLE waiting) left behind. I think this could be easily fixed by
 adding signal handler to pg_dump so that it catches the signal and
 issues a query cancel request.

 If we think that's a problem (which I'm not convinced of) then pg_dump
 is the wrong place to fix it.  Any other client would behave the same
 if it were killed while waiting for some backend query.  So the right
 fix would involve figuring out a way for the backend to kill itself
 if the client connection goes away while it's waiting.

This seems to me to be quite a bit like the TCP keepalive issue.

We noticed with Slony that if something ungraceful happens in the
networking layer (the specific thing noticed was someone shutting off
networking, e.g. /etc/init.d/networking stop before shutting down
Postgres+Slony), the usual timeouts are really rather excessive, on
the order of a couple hours.

Probably it would be desirable to reduce the timeout period, so that
the server could figure out that clients are incommunicado reasonably
quickly.  It's conceivable that it would be apropos to diminish the
timeout values in postgresql.conf, or at least to recommend that users
consider doing so.
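
Concretely, the knobs already exist (a sketch; the values are illustrative,
these settings affect only TCP connections, and they can equally go in
postgresql.conf):

    SET tcp_keepalives_idle = 60;      -- seconds idle before the first keepalive probe
    SET tcp_keepalives_interval = 10;  -- seconds between unanswered probes
    SET tcp_keepalives_count = 5;      -- unanswered probes before the connection is dropped
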
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Listen/notify across clusters

2013-07-10 Thread Christopher Browne
Shouldn't be possible.

The act of requesting to LISTEN requires doing a sort of update to the
database.  In elder versions, it put tuple(s) into pg_catalog.pg_listener,
and that's Right Well Disallowed on a WAL-based replica.

I would think that if you're keen on building an event detection
substrate, particularly one that's supposed to cross clusters, then you
should consider using something actually attuned to that, such as a message
queueing system, whether an AMQP implementation such as RabbitMQ, or a
message bus like Spread.  If you do that, then you can do this in much
broader cross-cluster ways for unrelated Postgres clusters.


Re: [HACKERS] [9.4 CF 1] The Commitfest Slacker List

2013-07-03 Thread Christopher Browne
On Wed, Jul 3, 2013 at 2:24 PM, Cédric Villemain ced...@2ndquadrant.com wrote:

  Clearly I ticked off a bunch of people by publishing the list.  On the
  other hand, in the 5 days succeeding the post, more than a dozen
  additional people signed up to review patches, and we got some of the
  ready for committer patches cleared out -- something which nothing
  else I did, including dozens of private emails, general pleas to this
  mailing list, mails to the RRReviewers list, served to accomplish, in
  this or previous CFs.

 Others rules appeared, like the 5 days limit.
 To me it outlines that some are abusing the CF app and pushing there
 useless
 patches (not still ready or complete, WIP, ...


Seems to me that "useless" overstates things, but it does seem fair to
say that some patches are not sufficiently well prepared to be efficiently
added into Postgres.

 So, as an experiment, call it a mixed result.  I would like to have some
  other way to motivate reviewers than public shame.  I'd like to have
  some positive motivations for reviewers, such as public recognition by
  our project and respect from hackers, but I'm doubting that those are
  actually going to happen, given the feedback I've gotten on this list to
  the idea.

 You're looking at a short term, big effect.
 And long term ? Will people listed still be interested to participate in a
 project which stamps people ?

 With or without review, it's a shame if people stop proposing patches
 because
 they are not sure to get time to review other things *in time*.


Well, if the project is hampered by not being able to get *all* the
changes that people imagine that they want to put in, then we have a
real problem of needing a sort of triage to determine which changes
will be accepted, and which will not.

Perhaps we need an extra status in the CommitFest application, namely
one that characterizes:
   "Insufficiently Important To Warrant Review"

That's too long a term.  Perhaps "Not Review-worthy" expresses it better?
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-27 Thread Christopher Browne
On Thu, Jun 27, 2013 at 10:37 AM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jun 25, 2013 at 2:27 PM, Andrew Dunstan and...@dunslane.net
 wrote:
  I'd like to see prizes each release for best contribution and best
  reviewer - I've thought for years something like this would be worth
  trying. Committers and core members should not be eligible - this is
 about
  encouraging new people.

 Encouraging new people is good, but recognizing sustained, long-term
 contributions is good, too.  I think we should do more of that, too.


In keeping with David Fetter's pointer to the notion that attempts to
reward can sometimes backfire, I'm not sure that it will be super-helpful
to create special rewards.

On the other hand, to recognize reviewer contributions in places relevant
to where
they take place seems pretty apropos, which could include:

a) Obviously we already capture this in the CommitFest web site (but it's
worth mentioning when trying to do a census)

b) It would be a pretty good thing to mention reviewers within commit notes;
that provides some direct trace-back as to who it was that either validated
that the change was good, or that let a bad one slip through.

c) The release notes indicate authors of changes; to have a list of
reviewers
would be a fine thing.

If it requires inordinate effort to get the reviewers directly attached to
each
and every change, perhaps it isn't worthwhile to go to extreme efforts to
that
end.

It could be pretty satisfactory to have a simple listing, in the release
notes,
of the set of reviewers.  That's a lot less bookkeeping than tracking this
for
each and every change.

The statement of such a list is a public acknowledgement of those that help
assure that the quality of PostgreSQL code remains excellent. (And that may
represent a good way to sell this kudo.)

This allows organizations that are sponsoring PostgreSQL development to
have an extra metric by which *they* can recognize that their staff that
do such work are being recognized as contributors.  It seems to me that
this is way more useful than a free t-shirt or the like.


Re: [HACKERS] Min value for port

2013-06-27 Thread Christopher Browne
On Thu, Jun 27, 2013 at 9:22 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2013-06-27 15:11:26 +0200, Magnus Hagander wrote:
  On Thu, Jun 27, 2013 at 2:16 PM, Peter Eisentraut pete...@gmx.net
 wrote:
   On 6/27/13 6:34 AM, Magnus Hagander wrote:
   Is there a reason why we have set the min allowed value for port to 1,
   not 1024? Given that you can't actually start postgres with a value of
   < 1024, shouldn't the entry in pg_settings reference that as well?
  
   Are you thinking of the restriction that you need to be root to use
   ports < 1024?  That restriction is not necessarily universal.  We can
 let
   the kernel tell us at run time if it doesn't like our port.
 
  Yes, that's the restriction I was talking about. It's just a bit
  annoying that if you look at pg_settings.min_value it doesn't actually
  tell you the truth. But yeah, I believe Windows actually lets you use
  a lower port number, so it'd at least have to be #ifdef'ed for that if
  we wanted to change it.

 You can easily change the setting on linux as well. And you can grant
 specific binaries the permission to bind to restricted ports without
 being root.
 I don't think the additional complexity to get a sensible value in there
 is warranted.


With that large a set of local policies that can change the usual
"< 1024" policy, yep, I agree that it's not worth trying too hard on this
one.

And supposing something like SE-Linux can grant bindings for a particular
user/binary to access a *specific* port, that represents a model that is
pretty incompatible with the notion of a minimum value.

On the one hand, the idea of having to add a lot of platform-specific
code (which may further be specific to a framework like SE-Linux)
is not terribly appealing.

Further, if the result is something that doesn't really fit with a
minimum,
is it much worth fighting with the platform localities?

Indeed, I begin to question whether indicating a minimum is actually
meaningful.
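
(For reference, a quick sketch of what the catalog currently advertises for
this GUC; output trimmed for readability:)

    SELECT name, min_val, max_val, context FROM pg_settings WHERE name = 'port';
    --  name | min_val | max_val |  context
    -- ------+---------+---------+------------
    --  port | 1       | 65535   | postmaster
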
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Hash partitioning.

2013-06-25 Thread Christopher Browne
On Tue, Jun 25, 2013 at 12:08 PM, Yuri Levinsky yu...@celltick.com wrote:

 Guys,
 I am sorry for taking your time. The reason for my question is:
 As former Oracle DBA and now simple beginner PostgreSQL DBA I would like
 to say: the current partitioning mechanism might be improved. Sorry, it
 seems to me far behind yesterday requirements. As model for improvement
 the Oracle might be taken as example. Unfortunately I am not writing an
 C code and see my benefit to PostgreSQL community in only rising this
 issue. I'll be very happy to be helpful in something else, but...


Please don't flee over this...

As I think you can see, now, the partitioning problem is tougher than it
may at first seem to be.  It's quite useful to quickly get to the point of
understanding that.

There would indeed be merit in improving the partitioning apparatus,
and actually, I think it's been a couple of years since there has been
serious discussion of this.

The discussion tends to head into the rabbit hole of disputing whether
one mechanism or another is ideal.  That's the wrong starting
point - we shouldn't start with what's easiest to make ideal; we
should start by determining what is required/desirable, without too
much reference, at least initially, to how to achieve it.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Christopher Browne
The case where I wanted routine shutdown immediate (and I'm not sure I
ever actually got it) was when we were using IBM HA/CMP, where I wanted a
terminate with a fair bit of prejudice.

If we know we want to switch right away now, immediate seemed pretty much
right.  I was fine with interrupting user sessions, and there wasn't as
much going on in the way of system background stuff back then.

I wasn't keen on waiting on much of anything.  The background writer ought
to be keeping things from being too desperately out of date.

If there's stuff worth waiting a few seconds for, I'm all ears.

But if I have to wait arbitrarily long, colour me unhappy.

If I have to distinguish, myself, between a checkpoint nearly done flushing
and a backend that's stuck waiting forlornly for filesystem access, I'm
inclined to kill -9 and hope recovery doesn't take *too* long on the next
node...

If shutting a server down in an emergency situation requires a DBA to look
in, as opposed to init.d doing its thing, I think that's pretty much the
same problem too.


Re: [HACKERS] dynamic background workers

2013-06-17 Thread Christopher Browne
BTW, one of the ideas that popped up in the unConference session on
replication was why couldn't we use a background worker as a replication
agent?

The main reason pointed out was 'because that means you have to restart the
postmaster to add a replication agent.'  (e.g. - like a Slony slon
process)

There may well be other better reasons not to do so, but it would be nice
to eliminate this reason.  It seems seriously limiting to the bg-worker
concept for them to be thus restricted.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Processing long AND/OR lists

2013-05-27 Thread Christopher Browne
On Mon, May 27, 2013 at 1:42 AM, Gurjeet Singh gurj...@singh.im wrote:



 Joking about 640K aside, it doesn't seem reasonable to expect a truly
 enormous query as is generated by the broken forms of this logic to turn
 out happily.  I'd rather fix Slony (as done in the above patch).


 Yes, by all means, fix the application, but that doesn't preclude the
 argument that the database should be a bit more smarter and efficient,
 especially if it is easy to do.


Agreed, it seems like a fine idea to have the database support such
queries, as this eases coping with applications that might be more
difficult to get fixed.  (I can't see too many users generating such
enormous queries by hand!  :-))
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Processing long AND/OR lists

2013-05-26 Thread Christopher Browne
This situation falls from a problem that we noticed a mighty long time ago
in Slony, where the set of XIDs outstanding gets very large, and, attendant
to that, the set of action id values by which tuples are being filtered,
gets correspondingly large.

It happens when there is a long pause in application of replication data,
and is commonly the consequence of setting up replication on a very large
data table that takes a long time for the initial data copy.

At the time, Neil Conway observed this query breakage with a query that was
roughly 640K in size, from whence fell jokes to the effect of "who would ever
need a query larger than 640K?"

The resolution that I introduced at the time was to write a little parser
that would recognize sequences of adjacent values and merge them into
BETWEEN A and B clauses, which would bring the query size back to a
reasonable size.
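
The same idea, expressed in SQL for illustration (the column name
log_actionseq is just an example): collapse runs of adjacent values into
ranges.

    WITH ids(id) AS (
        VALUES (1), (2), (3), (4), (7), (8), (10)
    ), grp AS (
        SELECT id, id - row_number() OVER (ORDER BY id) AS island
        FROM ids
    )
    SELECT CASE WHEN min(id) = max(id)
                THEN format('log_actionseq = %s', min(id))
                ELSE format('log_actionseq BETWEEN %s AND %s', min(id), max(id))
           END AS clause
    FROM grp
    GROUP BY island
    ORDER BY min(id);
    -- log_actionseq BETWEEN 1 AND 4
    -- log_actionseq BETWEEN 7 AND 8
    -- log_actionseq = 10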

In Slony 2.1, the issue re-emerged because the ordering of the action id
values was lost; the query had previously been implicitly forcing them into
order; we had to add an ORDER BY clause, to make the compressor work
again.
http://git.postgresql.org/gitweb/?p=slony1-engine.git;a=blobdiff;f=src/slon/remote_worker.c;h=b1f48043f8e25b4a74a392b0dbceeae8f3e18c27;hp=7fbf67c16f97cb7c3f209cf3be903ea52c4490a9;hb=c4ac435308a78a2db63bf267d401d842c169e87d;hpb=d4612aab78bac5a9836e3e2425c403878f7091c8

Joking about 640K aside, it doesn't seem reasonable to expect a truly
enormous query as is generated by the broken forms of this logic to turn
out happily.  I'd rather fix Slony (as done in the above patch).


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-26 Thread Christopher Browne
The assumption that we ought to plan expressly for an incompatibility that
essentially discards pg_upgrade seems premature, particularly in advance of
would-be solutions that, in some cases, mightn't actually work.

If pg_upgrade doesn't work, then, at present, the plausible solutions are
to either dump and restore, which might take way too long, or use one of
the logical replication systems (e.g. - Slony, Londiste, or similar, in the
absence of the would-be built-in logical replication).

Unfortunately, there are significant scenarios where none of these work,
particularly for data warehouse-like systems where the database size is so
large that the users cannot afford the disk space to construct a replica.
It sure seems premature to intentionally leave that set of users out in the
cold.


Re: [HACKERS] Patch to .gitignore

2013-05-23 Thread Christopher Browne
There hasn't been general agreement on the merits of particular .gitignore
rules of this sort.

You could hide your own favorite patterns by putting them into a personal
~/.gitignore that isn't part of the repo, configured globally thus:
git config --global core.excludesfile '~/.gitignore'

That has the consequence that you can hide whatever things your own tools
like to create, and not worry about others' preferences.

We Emacs users can put things like *~, #*#, and such into our own ignore
configuration; that doesn't need to bother you, and vice versa for your
vim-oriented patterns.


Re: [HACKERS] [PATCH] Make psql -1 file.sql work as with -f

2013-05-10 Thread Christopher Browne
On Fri, May 10, 2013 at 9:50 AM, Peter Eisentraut pete...@gmx.net wrote:

 On 8/9/12 9:08 AM, Robert Haas wrote:
  On Wed, Aug 8, 2012 at 6:50 PM, David Fetter da...@fetter.org wrote:
  I'm wondering if perhaps -- in addition to what you've done here -- we
  should make psql -1 error out if reading from a terminal.
 
  +1 for this.
 
  OK, done.
 
  I had to revise the original patch pretty heavily before committing;

 My first use of 9.3beta1 in development failed because of changes
 introduced by this patch, specifically because of the newly introduced
 error

 psql: -1 is incompatible with -c and -l

 I'm not convinced this is correct.  -c and -l are single-transaction
 actions almost by definition.

 This particular aspect of the change wasn't really brought up in the
 original thread.  What was your thinking?


FYI, I noticed this issue when building one of our applications against
HEAD;
I'm not sure I agree with you vis-a-vis the -c option, as it is certainly
plausible/meaningful
to do:
  psql -c "begin; update [something]; insert [something]; delete [something]; commit;"
and for that to be different from
  psql -c "update [something]; insert [something]; delete [something];"

Consider it stipulated that it's pretty plausible to expect things to break
down if, in that
series of requests, the UPDATE fails, and it isn't nearly safe to assume
that the INSERT
and/or DELETE statements would succeed after all that.

I'd be pretty happy for
  psql -1 -c "update [something]; insert [something]; delete [something];"
to implicitly augment the query to:
  psql -c "begin; update [something]; insert [something]; delete [something]; commit;"

It's a bit annoying (it bit me, giving me a complication, without any
evident benefit) for
psql -1 -c to refuse to run.

I'd rather that it behave similarly to psql -1 -f, and wrap the queries
in a transaction.
For it to behave badly if I try to induce transaction control (e.g. -
embedding BEGIN/END
inside the queries) would not come as a surprise; that would be much the
same as how
psql -1 -f works, where the extra BEGIN is warned as redundant and the
extra COMMIT
is considered an error.

As for psql -1 -l, it seems like a regression for that to fail.  Listing
the databases is
pretty much already a single transaction; adding -1 is perhaps
overspecifying things,
but it doesn't seem wrong.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Christopher Browne
On Mon, Apr 29, 2013 at 10:50 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Mon, Apr 29, 2013 at 9:25 AM, Atri Sharma atri.j...@gmail.com wrote:
 
  This is an interesting idea.  Historically I've always decomposed
  graphs into relational structures because that's the only practical
  way to query them.   Graphs are not currently able to be transported
  out of the database currently via JSON so one of the areas to focus
  your research will be how the client will consume the data.
  libpqtypes is one way to do it, but that will really restrict you
  audience so you'll probably need a rich set of functions present the
  internal data (just like hstore).
 
  I completely agree. Initially, I was thinking of exposing the data to
  user via HStore. But now, after Robert's suggestions, I think it will
  be better to have an alternate representation. JSON seems to be an
  excellent idea for that.

 I don't agree with this; JSON is not really designed to store graphs.
 You will probably need a customized internal representation, just like
 hstore, that expresses a graph like structure.

 This is not a trivial project.


Not trivial, indeed.

I see there being two directions where a data type goes.

1.  We created JSON and XML types as ways of storing data that has a robust
validation system.

They're still, in a sense, just plain old text, but it's plain old text
that the user can be certain satisfies the respective rules for
representations.

2.  Some types support special operations to allow the data to be queried
in novel ways.

That's NOT the case, at this point, for JSON or XML.

But it certainly IS the case for Jeff Davis' range types, which expose
access to some new sorts of data validation and indexing.

It is true for the inet type, which behaves rather differently from our
other types.

It is true for the tsearch indexes, that enable interesting random access
within some large blobs of stored data.

I'm not sure quite what we *would* want as the merits of graph-related
types.

I suspect that the best answer is NOT one where a graph is represented as a
value in a table; that has the implication that modifying The Graph
requires altering a single tuple, and that seems likely to become a
horrible bottleneck.  I'm suspicious that using HSTORE leads down that
path, where you'll have a database that has a table with just one tuple in
it, and where it's nearly impossible to alter that tuple.

I'm having a hard time thinking about what it looks like to have a graph as
table except to effectively compose the graph as a set of nodes, one tuple
per node, and I'm not sure that a new data type has anything to contribute
to that.

The one place where I *could* see a special type having a contribution is
for there to be a data type that can contain an arbitrary number of links.
That means you have one tuple per node, and, instead of needing a tuple for
each link between nodes, you have one attribute indicating *all* the
links.  (And it would be interesting for that one attribute to be usable for
foreign key purposes.)  That has a hard time scaling in cases where nodes
are over-connected, which is, broadly speaking, an acceptable sort of
scenario.
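
To make that concrete, here's a minimal sketch of the "one attribute holding
all the links" shape, using only what exists today (names are hypothetical,
and nothing actually enforces the links as foreign keys):

create table graph_node (
    node_id  bigint primary key,
    label    text,
    links    bigint[]   -- ids of adjacent nodes; not enforceable as FKs today
);

-- neighbours of node 42
select n.*
  from graph_node n
 where n.node_id in (select unnest(links)
                       from graph_node
                      where node_id = 42);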
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] question about postgres storage management

2013-04-19 Thread Christopher Browne
I would expect the strategy you have in mind to be more useful to apply at
the filesystem level, so that it's not in Postgres at all.  (A la
Stacker; remember DR-DOS?)

But, to speak arguable heresy, the demerits of this sort of thing are
described nicely in Another Database's Documentation: 
http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals-algorithms.html


The relevant bit that seems to describe fairly aptly why what you are
suggesting is unlikely to turn out well:

"Some operating systems implement compression at the file system level.
Files are typically divided into fixed-size blocks that are compressed into
variable-size blocks, which easily leads into fragmentation. Every time
something inside a block is modified, the whole block is recompressed
before it is written to disk. These properties make this compression
technique unsuitable for use in an update-intensive database system."

The principle described is as applicable to Postgres as it is to InnoDB,
and is as applicable to attempting to compress disk blocks from within the
database as it is to apply it at the filesystem layer.

Postgres *does* make use of data compression, where applicable; see the
documentation for TOAST: 
http://www.postgresql.org/docs/9.2/static/storage-toast.html

You should avail yourself of the code on TOAST:
./src/backend/catalog/toasting.c
./src/backend/access/heap/tuptoaster.c
./src/include/catalog/toasting.h
./src/include/access/tuptoaster.h
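
At the SQL level, the pieces involved are already visible without reading
that code; a quick sketch (the table name is made up):

-- which TOAST relation backs a given table
select reltoastrelid::regclass
  from pg_class
 where relname = 'documents';

-- ask for compression plus out-of-line storage on a wide column
alter table documents alter column body set storage extended;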


Re: [HACKERS] sql_drop Event Trigger

2013-02-28 Thread Christopher Browne
On Thu, Feb 28, 2013 at 3:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think it's fairly obvious that
 (1) dealing with a DROP only after it's happened is pretty limiting;
 (2) allowing user-defined code to run mid-command is dangerous.
 What's at issue is the tradeoff we make between these inescapable
 facts, and I'm not sure if we can get consensus on that.

I'll note that Slony could do something useful with an ON EVENT trigger
even if there's NO data provided as to what tables got dropped.

We could get a "prevent dropping by accident" test that amounts to:

if exists (select 1 from _slony_schema.sl_table
           where not exists (select 1 from pg_catalog.pg_class
                             where oid = tab_reloid)) then
    raise exception 'You attempted to drop a replicated table.  Shame on you!';
end if;

That could be extended to precede the exception by raising a warning
for each such table that was found.  That's a nice "save the admin from
accidentally breaking replication" check.
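
Wrapped up as an event trigger, that guard might look roughly like this (a
sketch only; the Slony-flavoured catalog table and the names are hypothetical):

create or replace function guard_replicated_drop() returns event_trigger as $$
begin
    if exists (select 1 from _slony_schema.sl_table
               where not exists (select 1 from pg_catalog.pg_class
                                 where oid = tab_reloid)) then
        raise exception 'You attempted to drop a replicated table.';
    end if;
end;
$$ language plpgsql;

create event trigger guard_replicated_drop
    on ddl_command_end
    when tag in ('DROP TABLE')
    execute procedure guard_replicated_drop();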

If we're agonizing over "what more do we need to ensure it's useful?", and
it's looking pretty open-ended, well, for the above test, I don't need *any*
attributes to be passed to me by the event trigger in order to do something
that's useful enough.  I wouldn't feel horribly badly if things stopped short
of having ON DROP do anything extra.

If I *really* wanted to do some sophisticated tracking of things, the apropos
route might be to set up both a BEFORE and an AFTER trigger, have the
BEFORE part capture relevant data in memory (with some question of
how much relevant data) and then, in the AFTER trigger, refer back to
the captured data in order to do something.  That again doesn't require
adding much of anything to the trigger attributes.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] sql_drop Event Trigger

2013-02-06 Thread Christopher Browne
On Wed, Feb 6, 2013 at 12:05 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Feb 6, 2013 at 9:44 AM, Dimitri Fontaine dimi...@2ndquadrant.fr 
 wrote:
 Robert Haas robertmh...@gmail.com writes:
 I disagree with that. I don't see why the enclosing event trigger
 shouldn't be aware of all the objects dropped by the command that just
 ran to completion, *including* the effects of any event trigger fired
 recursively or not.

 Well, that could result in some DROP events being reported more than
 once, which I assume would be undesirable for someone hoping to use
 this for replication.

 Any command might have an event trigger attached doing a DROP, so that
 you don't know where to expect it, and it's well possible that in your
 example both the event triggers have been installed by different tools.

 It certainly is; in fact, it's likely.  So let's say that B is a
 replication trigger.  Don't you want it to hear about each drop
 exactly once?  If not, how will you avoid errors when you go to replay
 the events you've captured on another machine?

In this case, the hygienic change that we're thinking of making to Slony,
at least initially, is for the trigger to check to see if the table is
replicated, and raise an exception if it is.

That forces the Gentle User to submit the Slony SET DROP TABLE
command http://slony.info/documentation/2.1/stmtsetdroptable.html.

Now, if we stipulate that imposition, then, for this kind of event, it
becomes unnecessary for event triggers to get *overly* concerned about
capturing more about dropping tables.  After all, SET DROP TABLE
already knows how to replicate its action, so what happens, in that
case is:

- User submits SET DROP TABLE
   - SET DROP TABLE drops the triggers for the table, cleans out
 Slony configuration surrounding the table, forwards request
 to other nodes

- User submits DROP TABLE
   - Slony is no longer involved with that table; there's nothing special
 anymore about replicating this; perhaps we capture and forward
 it via event trigger.

I'm not sure if that makes thinking about this easier; I hope so :-).


-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-02 Thread Christopher Browne
On Sat, Feb 2, 2013 at 2:54 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Feb 2, 2013 at 1:49 PM, Andres Freund and...@2ndquadrant.com wrote:
 You're right, this doesn't work superbly well, especially for
 insert-only tables... But imo the place to fix it is not the
 priorization logic but relation_needs_vacanalyze, since fixing it in
 priorization won't prevent the BAM just the timing of it.

 Agreed.

 I think scheduling a table for a partial vacuum every min_freeze * 2
 xids, even if its insert only, would go a long way of reducing the
 impact of full-table vacuums. Obviously that would require to retain the
 last xid a vacuum was executed in...

 I'm not sure that min_freeze * 2 is the right value, but otherwise agreed.

 I keep coming back to the idea that vacuum should have a high-priority
 queue and a low-priority queue.  When stuff meets the current
 thresholds, it goes into the high-priority queue.  But then there
 should be a low-priority queue where we do partial vacuums of things
 that meet some lower threshold - like the unfrozen portions of
 insert-only tables.

When I was thinking about your desire for unitless values, I found
myself uncomfortable about that, and I think I've mentioned that.

On further reflection, there's good reason.  The need to vacuum
tables with lots of dead tuples has very different characteristics
from the need to vacuum tables to avoid XID rollover.  Trying to
force them onto the same units seems unlikely to turn out
happily.

On the other hand, I always thought that there was use for having multiple
autovacuum queues, and giving the queues differently shaped policies, one
for each purpose, seems like a mighty fine idea.  That way we don't need to
worry about mixing the policies.  There can be two "best" policies.

I'd go further, and have 3 queues:

a) A queue devoted to vacuuming small tables.  Anything
with more than [some number of relpages] need not apply.

b) A queue devoted to vacuuming tables with a lot of dead
tuples.

c) A queue devoted to vacuuming tables before their XID
rollover.

The appropriate strength functions for b) and c) can be
pretty simple, possibly the relevant bits of the functions that
Nasby and I have suggested.  And any time b) and c) find
small tables, throw them to queue a), essentially doing
the quick & easy vacuums.
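
For a sense of what queues b) and c) would be choosing from, the candidates
are already visible in the existing statistics; a sketch, nothing more:

-- queue b): tables carrying lots of dead tuples
select relname, n_dead_tup
  from pg_stat_user_tables
 order by n_dead_tup desc
 limit 10;

-- queue c): tables whose relfrozenxid is getting old
select relname, age(relfrozenxid) as xid_age
  from pg_class
 where relkind = 'r'
 order by 2 desc
 limit 10;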
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-01 Thread Christopher Browne
On Fri, Feb 1, 2013 at 4:59 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 31, 2013 at 3:18 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
 My intention was to apply a Nasby correction to Browne Strength and call
 the resulting function Browne' (Browne prime).  Does that sound better?

 /me rests head in hands.  I'm not halfway clever enough to hang with
 this crowd; I'm not even going to touch the puns in Chris' reply.

It's Friday... Fun needs to be had :-).

 Now seriously, I did experiment a bit with this and it seems to behave
 reasonably.  Of course, there might be problems with it, and I don't
 oppose to changing the name.  Vacuum strength didn't sound so great,
 so I picked the first term that came to mind.  It's not like picking
 people's last names to name stuff is a completely new idea; that said,
 it was sort of a joke.

 I don't think I really understand the origin of the formula, so
 perhaps if someone would try to characterize why it seems to behave
 reasonably that would be helpful (at least to me).

 f(deadtuples, relpages, age) =
deadtuples/relpages + e ^ (age*ln(relpages)/2^32)

 To maybe make that discussion go more quickly let me kvetch about a
 few things to kick things off:

 - Using deadtuples/relpages as part of the formula means that tables
 with smaller tuples (thus more tuples per page) will tend to get
 vacuumed before tables with larger tuples (thus less tuples per page).
  I can't immediately see why that's a good thing.

That wasn't intentional, and may be somewhat unfortunate.

I picked values that I knew could be easily grabbed, and we don't
have an immediate tuples-per-page estimate on pg_class.  An
estimate should be available in pg_statistic; I'm not sure that the
bias from this hurts things badly.

 - It's probably important to have a formula where we can be sure that
 the wrap-around term will eventually dominate the dead-tuple term,
 with enough time to spare to make sure nothing really bad happens; on
 the other hand, it's also desirable to avoid the case where a table
 that has just crossed the threshold for wraparound vacuuming doesn't
 immediately shoot to the top of the list even if it isn't truly
 urgent.  It's unclear to me just from looking at this formula how well
 the second term meets those goals.

I think the second term *does* provide a way for wraparound to dominate;
splitting it apart a bit...

Consider...

       age * ln(relpages)
  e^  --------------------
              2^32

The wraparound portion of this involves age/2^32...  In the beginning, the
numerator will be near zero while the denominator stays at 2^32, so the
fraction is near 0 and the exponential term is roughly 1.  As age trends
towards 2^32, the fraction (ignoring ln(relpages)) trends towards 1, so that
the longer we go without vacuuming, the closer the fraction gets to 1.  That
*tends* to give you something looking like e^1, or 2.71828+, ignoring the
relpages part.

I threw in multiplying by ln(relpages) as a way to step Well Back from
rollover; that means that this term will start growing considerably before
rollover, and the larger the table, the sooner that growth takes place.

There is a problem with the ln(relpages) term; if the table has just 1 page,
the ln(relpages) = 0 so the value of the exponential term is *always* 1.
Probably should have ln(relpages+CONSTANT) so that we guarantee
the numerator is never 0.

I'm a bit worried that the exponential term might dominate *too* quickly.

For a table I have handy with 163K tuples, spread across 3357 pages,
ln(relpages) = 8.1188, and the range of the exponential bit travels
as follows:

dotpro0620@localhost-  select generate_series(1,20)*100/20 as
percent_wraparound, power(2.71828,
(65536.0*32768.0*generate_series(1,20)/20.0 *
ln(3357))/(65536.0*32768)) as wraparound_term;
 percent_wraparound |  wraparound_term
--------------------+------------------
                  5 | 1.50071232210687
                 10 |  2.2521374737234
                 15 | 3.37981045789535
                 20 | 5.07212320054923
                 25 | 7.61179778630838
                 30 | 11.4231187312988
                 35 | 17.1428150369499
                 40 | 25.7264337615498
                 45 |  38.607976149824
                 50 | 57.9394655396491
                 55 |  86.950469871638
                 60 |  130.48764154935
                 65 | 195.824411555774
                 70 | 293.876107391077
                 75 | 441.023495534592
                 80 | 661.849394087408
                 85 |  993.24554108594
                 90 | 1490.57582238538
                 95 | 2236.92550368832
                100 | 3356.98166702019
(20 rows)

At the beginning, the wraparound portion is just 1.5, so it is easily dominated
by a table with a lot of dead tuples.  As the time to wraparound declines,
the term becomes steadily more urgent.  There may be constant factors to
fiddle with at the edges, but this term definitely heads towards dominance.

That's 

Re: [HACKERS] Should pg_dump dump larger tables first?

2013-01-31 Thread Christopher Browne
On Thu, Jan 31, 2013 at 5:06 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Tom Lane t...@sss.pgh.pa.us writes:
 Also, it's far from obvious to me that largest first is the best rule
 anyhow; it's likely to be more complicated than that.

 But anyway, the right place to add this sort of consideration is in
 pg_restore --parallel, not pg_dump.  I don't know how hard it would be
 for the scheduler algorithm in there to take table size into account,
 but at least in principle it should be possible to find out the size of
 the (compressed) table data from examination of the archive file.

 From some experiences with pgloader and loading data in migration
 processes, often enough the most gains are to be had when you load the
 biggest table in parallel with loading all the little ones. It often
 makes it so that the big table loading time is not affected, and by the
 time it's done the rest of the database is done too.

 Loading several big'o'tables in parallel tend not to give benefits in
 the tests I've done so far, but that might be an artefact of python
 multi threading, I will do some testing with proper tooling later.

We had the idea of doing this with Slony, to try to process subscriptions
faster by, yes, indeed, loading large tables first, and throwing
reindexing off onto secondary threads.  The big win seemed to come
specifically from the reindexing aspect; that tends to take a fair bit
more time than the indexless COPY.

It would make the subscription process a bit more fragile, and would
add quite a bit of development work, for something that didn't seem to
be *that* much of a priority, so we never went past the
Gedankenexperiment of establishing that it seemed feasible.

A side-effect that we didn't have to worry about with Slony, but that
would be important for more general use, is what happens to the
processing of re-establishing foreign keys.  In Slony, we suppress
them on subscriber nodes; you'd need to worry about that for general
purpose use.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-01-31 Thread Christopher Browne
On Thu, Jan 31, 2013 at 3:18 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Robert Haas escribió:
 On Thu, Jan 31, 2013 at 2:40 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  Alvaro Herrera escribió:
 
  Okay, here's a patch along these lines.  I haven't considered Jim's
  suggestion downthread about discounting dead tuples from relpages; maybe
  we can do that by subtracting the pages attributed to dead ones,
  estimating via tuple density (reltuples/relpages).
 
  Patch attached.

 This strikes me as too clever by half.  You've introduced the concept
 of a Browne strength (apparently named for Christopher Browne) and
 yet you haven't even bothered to add a comment explaining the meaning
 of the term, let along justifying the choice of that formula rather
 than any other.  I don't want to dog this proposal to death, because
 surely we can do better than the status quo here, but adopting the
 first formula someone proposed without any analysis of whether it does
 the right thing cannot possibly be the correct decision process.

 My intention was to apply a Nasby correction to Browne Strength and call
 the resulting function Browne' (Browne prime).  Does that sound better?

 Now seriously, I did experiment a bit with this and it seems to behave
 reasonably.  Of course, there might be problems with it, and I don't
 oppose to changing the name.  Vacuum strength didn't sound so great,
 so I picked the first term that came to mind.  It's not like picking
 people's last names to name stuff is a completely new idea; that said,
 it was sort of a joke.

Color me amused :-).

And, when thinking about how strong these things are, just remember,
smell isn't everything.

I spent 20 minutes at a whiteboard arriving at the Browne strength,
and I think it's not unreasonable as a usage of the data already
immediately at hand.  But it is absolutely just intended as a
strawman proposal, and I'd be pleased to see it get prodded into
something more prime.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-01-30 Thread Christopher Browne
On Wed, Jan 30, 2013 at 4:05 PM, Jim Nasby j...@nasby.net wrote:
 On 1/25/13 11:56 AM, Christopher Browne wrote:

 With a little bit of noodling around, here's a thought for a joint function
 that I *think* has reasonably common scales:


 f(deadtuples, relpages, age) =
 deadtuples/relpages + e ^ (age*ln(relpages)/2^32)


 Be careful with dead/relpages, because dead tuples increase relpages as
 well. The effect is extremely noticeable on frequently hit tables that need
 to be kept small. If you want to have a deadtuples/size metric, I think it
 would be far better to do deadtuples/non_bloated_table_size.

You may well be right about that; I didn't try to get to second order effects,
just to come up with something that seemed reasonably readily computable
based on existing data that seemed relevant.

One approach to things is to ask if we can tweak that model to be materially
better without requiring extra data collection.  It seems plausible to do so; we
might get something better by deducting dead tuples.  I'll not try to present
this yet; I think it's a better idea to come up with a broadly fuller model.

 Someone else in the thread mentioned that what we really need to be watching
 aren't raw values, but trends. Or you can think of it as watching first (or
 even second) derivatives if you like. I couldn't agree more. I believe there
 are several parts of Postgres that end up with a bunch of hard to tune GUCs
 specifically because we're measuring the wrong things.

My first order reaction is that this sounds great.  And I reckon it would be
a great idea to come up with a reasonably sophisticated model that covers
all of the behaviours that we see emerging.

Mind you, I'm not sure that we can actually *use* that model; it may require
collecting enough more data that it becomes burdensome to collect the
additional data.

 Take freezing for example. Since the only reason to freeze is XID wrap then
 the *ideal* time to start a freeze vacuum on a table is so that the vacuum
 would end *exactly* as we were about to hit XID wrap.

 Obviously that's a completely impractical goal to hit, but notice the
 simplicity of the goal: we only care about the vacuum ending right before
 we'd hit XID wrap. The only way to do that is to monitor how fast vacuums
 are running, how fast XIDs are being consumed, and how quickly the oldest
 XID in each table is advancing. Notice that all of those measurements are
 time derivatives.

Yep, and I think some second derivatives will prove useful too.

The function that I presented took a slightly different tack to things; the
exponential term would try to make sure that tables get frozen WELL in
advance of necessity.  That strikes me as being safer in cases where
a database is blowing through XIDs mighty quickly.

 From a more practical standpoint, I think it would be extremely useful to
 have a metric that showed how quickly a table churned. Something like dead
 tuples per time period. Comparing that to the non-bloated table size should
 give a very strong indication of how critical frequent vacuums on that table
 are.

That's a nice idea, though I'm pretty sure that this requires capturing and
storing a lot more statistical data, which presumably worsens things
(presuming the data gets stored in tables), making the database bigger,
adding an extra set of data that needs to get collected, queried, and
trimmed, and chewing through still more XIDs.

If you could demonstrate pretty forcibly improved behaviours falling
out of that, that may make it salable.  There's quite a bit of work
there, and we would need to accept a burden of still more stats
collection.

 I don't have a good metric in mind for freeze right now, but I do want to
 mention a use case that I don't think has come up before. When building a
 londiste slave (and presumably all the other trigger replication systems
 suffer from this), each table is copied over in a single transaction, and
 then updates start flowing in for that table. That can easily result in a
 scenario where you have an enormous volume of tuples that will all need
 freezing at almost exactly the same time. It would be nice if we could
 detect such a condition and freeze those tuples over time, instead of trying
 to freeze all of them in one shot.

I wonder if these are perhaps good candidates for being frozen immediately;
COPY FREEZE was added not so long ago; it doesn't perfectly cover this, but
if I squint at it a bit...

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=8de72b66a2edcf12c812de0a73bd50b6b7d81d62

I don't see a lot of value to having the data COPYed over to a subscriber
NOT getting frozen immediately.
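
The shape of that, as a sketch (table and file names are made up): create the
subscriber's copy and load it in the same transaction, so the FREEZE option
applies:

begin;
create table subscriber_copy (like origin_table including all);
copy subscriber_copy from '/tmp/origin_table.dat' with (freeze);
commit;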
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Event Triggers: adding information

2013-01-29 Thread Christopher Browne
On Mon, Jan 28, 2013 at 6:19 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Christopher Browne cbbro...@gmail.com writes:
 I'm poking at event triggers a bit; would like to set up some examples
 (and see if they
 work, or break down badly; both are interesting results) to do some
 validation of schema
 for Slony.

 Cool, thanks!

 What I'm basically thinking about is to set up some event triggers that run 
 on
 DROP TABLE / DROP SEQUENCE, and see about cleaning up the replication
 side of things (e.g. - inject a request to drop the table/sequence
 from replication).

 Sure. In what got commited from the current patch series, you will only
 know that a DROP TABLE (or DROP SEQUENCE) occured, and we're trying to
 get to an agreement with Robert if we should prefer to add visibility to
 such events that occurs in a CASCADE statement or rather add the OID
 (and maybe the name) of the Object that's going to be dropped.

 Your opinion is worth a lot on that matter, if you have one to share! :)

Hmm.  I think some information about the object is pretty needful.

For the immediate case I'm poking at, namely looking for dropped tables, I
could determine which object is gone by inference; if I run the trigger
as part of the ddl_command_end event, then I could run a query that
searches the slony table sl_table, and if I find any tables for which there
is no longer a corresponding table in pg_catalog.pg_class, then I infer
which table got dropped.

But I think I'd really rather know more explicitly which table is being dropped.

Having the oid available in some trigger variable should suffice.

It appears to me as though it's relevant to return an OID for all of the command
tags.

Something useful to clarify in the documentation is what differences are
meaningful between ddl_command_start and ddl_command_end to make
it easier to determine which event one would most want to use.

Musing a bit...  It seems to me that it might be a slick idea to run a trigger
at both _start and _end, capturing metadata about the object into temp tables
at both times, which would then allow the _end function to compare the data
in the temp table to figure out what to do next.  I wouldn't think that's
apropos as default behaviour; that's something for the crafty developer that's
building a trigger function to do.

Having a parse tree for the query that initiates the event would be mighty
useful, as would be a canonicalized form of the query.

I think we could add some useful protection (e.g. - such as my example of
an event trigger that generates "DROP TABLE FROM REPLICATION") using the
present functionality, even perhaps without OIDs, but I don't think I'd want
to get into trying to forward arbitrary DDL without having the canonicalized
query available.

 I have a bit of a complaint as to what documentation is included; I don't see
 any references in the documentation to ddl_command_start / ddl_command_end,
 which seem to be necessary values for event triggers.

 What we have now here:

   http://www.postgresql.org/docs/devel/static/event-trigger-matrix.html
   http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html
   
 http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER

 Is it not visible enough, or really missing the point?

Ah, I missed the second one; I was looking under CREATE TRIGGER and didn't
notice that CREATE EVENT TRIGGER was separately available; that resolves most
of what I thought was missing.

I think a bit more needs to be said about the meanings of the events and the
command tags, but what I imagined was missing turns out not to be.

 I'd tend to think that there should be a new subsection in the man page for
 CREATE TRIGGER that includes at least two fully formed examples of event
 triggers, involving the two events in question.  Is change of that
 sort in progress?

 The event triggers are addressed in a whole new chapter of the docs,
 maybe that's why you didn't find the docs?

I found that chapter, just not the command :-).
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Event Triggers: adding information

2013-01-28 Thread Christopher Browne
I'm poking at event triggers a bit; would like to set up some examples
(and see if they
work, or break down badly; both are interesting results) to do some
validation of schema
for Slony.

What I'm basically thinking about is to set up some event triggers that run on
DROP TABLE / DROP SEQUENCE, and see about cleaning up the replication
side of things (e.g. - inject a request to drop the table/sequence
from replication).

I have a bit of a complaint as to what documentation is included; I don't see
any references in the documentation to ddl_command_start / ddl_command_end,
which seem to be necessary values for event triggers.

I'd tend to think that there should be a new subsection in the man page for
CREATE TRIGGER that includes at least two fully formed examples of event
triggers, involving the two events in question.  Is change of that
sort in progress?
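
For instance, a pair of minimal, fully formed examples of the sort I have in
mind might look like this (the names and tags are purely illustrative):

create or replace function notice_ddl() returns event_trigger as $$
begin
    raise notice '% fired for command tag %', tg_event, tg_tag;
end;
$$ language plpgsql;

create event trigger notice_ddl_start on ddl_command_start
    execute procedure notice_ddl();

create event trigger notice_ddl_end on ddl_command_end
    when tag in ('DROP TABLE', 'DROP SEQUENCE')
    execute procedure notice_ddl();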
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Event Triggers: adding information

2013-01-25 Thread Christopher Browne
On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 OK, but can we lay the issue of a *normalized* command string to the
 side for just one minute, and talk about exposing the *raw* command
 string?  It seems to me that this would be (1) very easy to do, (2)
 reasonable to slip into 9.3, and (3) useful to some people.  Arguably,
 the normalized command string would be useful to MORE people, but I
 think the raw command string is not without its uses, and it's at
 least one order of magnitude less work.

 My understanding is that if the command string we give to event triggers
 is ambiguous (sub-object names, schema qualifications, etc), it comes
 useless for logical replication use. I'll leave it to the consumers of
 that to speak up now.

"Useless" is a strong word, but it certainly injures usefulness pretty
substantially.

If it isn't normalized, then either we accept that:

a) If you fail to properly qualify your inputs, when generating DDL,
we can offer that it's pretty likely it'll all smash on the floor when
we try to replicate it, or

b) We need to capture the active search_path from the
environment at the instant of the DDL capture event, and carry it
over along with the DDL (a sketch of that follows below).  If we could
assume that the GUC, search_path, was the correct value, that's possibly
not super difficult, but I'm not certain that's the correct thing to capture.
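
A sketch of what b) might look like at the capture end (ddl_log is a
hypothetical table; only the command tag and search_path get recorded, since
the command text itself isn't exposed yet):

create or replace function capture_ddl_env() returns event_trigger as $$
begin
    insert into ddl_log (captured_at, command_tag, search_path)
    values (now(), tg_tag, current_setting('search_path'));
end;
$$ language plpgsql;

create event trigger capture_ddl_env on ddl_command_start
    execute procedure capture_ddl_env();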
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-01-24 Thread Christopher Browne
On Thu, Jan 24, 2013 at 5:22 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Backpatching sounds a bit scary. It's not a clear-cut bug, it's just that
 autovacuum could be smarter about its priorities. There are other ways you
 can still bump into the xid-wraparound issue, even with this patch.

I don't think this is a single-priority issue.  It's *also* crucial that small
tables with high tuple attrition rates get vacuumed extremely frequently; your
system will bog down, albeit in a different way, if the small tables don't get
vacuumed enough.

This seems to me to involve multiple competing priorities where the main
solution *I* can think of is to have multiple backends doing autovacuum, and
assigning some to XID activity and others to the small, "needs vacuuming
frequently" tables.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-16 Thread Christopher Browne
On Wed, Jan 16, 2013 at 5:15 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 15, 2013 at 3:37 PM, Christopher Browne cbbro...@gmail.com 
 wrote:
 That points towards a fix that involves having a set of non-arbitrary 
 commands
 that we allow plain users to use.

 Hmm.  There's an interesting thought...

 How about having a pg_filters table in pg_catalog which allows capturing
 labels and names of known-to-be-safe binary filters:

 insert into pg_filters (label, location)
 values
 ('zcat', '/usr/bin/zcat'),
 ('bzip2', '/usr/bin/bzip2'),
 ('bunzip2', '/usr/bin/bunzip2');

 And then having some capability to grant permissions to roles to use
 these filters.

 I suspect that's going to be less efficient than using a compression
 library that's linked into the backend, because you have to copy all
 the data through the kernel to another process and back.  And it's
 certainly a lot more complex.

More complex, certainly.

By spawning a separate process, we'd get the benefit of multicore CPUs,
so I'm not sure I agree that it's necessarily slower.

 At any rate, I think it would be good to avoid letting our desire for
 infinite flexibility get in the way of doing something useful.

Oh, agreed.  I was actively thinking of the cooler bits of this pointing more
towards 9.4 than 9.3.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Christopher Browne
On Tue, Jan 15, 2013 at 2:53 PM, Stephen Frost sfr...@snowman.net wrote:
 The popen patch doesn't support the '|compression-binary' option through
 the FE protocol.  Even if it did, it would only be available for
 superusers as we can't allow regular users to run arbitrary commands on
 the server-side.

That points towards a fix that involves having a set of non-arbitrary commands
that we allow plain users to use.

Hmm.  There's an interesting thought...

How about having a pg_filters table in pg_catalog which allows capturing
labels and names of known-to-be-safe binary filters:

insert into pg_filters (label, location)
values
('zcat', '/usr/bin/zcat'),
('bzip2', '/usr/bin/bzip2'),
('bunzip2', '/usr/bin/bunzip2');

And then having some capability to grant permissions to roles to use
these filters.

That's not a version 1 capability...  Suppose we have, in 9.3, direct
references to |/usr/bin/zcat (and such), and then hope, in 9.4, to tease this
out into a non-superuser-capable facility via the above pg_filters?

These filters should be useful for FDWs as well as for COPY.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database

2013-01-03 Thread Christopher Browne
On Thu, Jan 3, 2013 at 12:27 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 3, 2013 at 11:15 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 This is what I did with my sample pl/python function ;)

 Yeah, except that the c in ctime does not stand for create, and
 therefore the function isn't necessarily reliable.  The problem is
 even worse for tables, where a rewrite may remove the old file and
 create a new one.  I mean, I'm not stupid about this: when I need to
 figure this kind of stuff out, I do in fact look at the file times -
 mtime, ctime, atime, whatever there is.  Sometimes that turns out to
 be helpful, and sometimes it doesn't.  An obvious example of the
 latter is when you're looking at a bunch of files that have just been
 untarred from a backup device.

Yep, and I think that the behaviour of tar pretty nicely characterizes
what's troublesome here.  It is quite likely that a tar run will *capture*
the creation time of a file, but if you pull data from a tar archive, it is
by no means obvious that the filesystem can or will accept that date
and apply it to the extracted copy.

I'd contrast pg_dump with tar in that the former is intended as more of
a logical dump than the latter, so that, in keeping with Greg Stark's
comments, these timestamps Should Not be captured or carried forward
by pg_dump.

The interaction with streaming replication is pretty analogous to the
interaction one might expect to get out of filesystem snapshot
technologies like DRBD, zfs, btrfs, LVM.  If we put a creation time
into pg_database or pg_class, then streaming replication will, as a
physical replication mechanism, carry the timestamp forward into
replicas, in pretty much exactly the same fashion that timestamps
would be carried onto clones/snapshots by the filesystem
snapshotting systems.

And in contrast, I'd expect Andres Freund's logical replication
infrastructure *NOT* to carry these dates over, but rather to establish
fresh new creation dates on a replica.  (And from a forensic perspective,
that's a perfectly fine thing.)

I imagine that we should be careful to put these forensic timestamps
onto things with some care.

- Putting them on pg_database seems like a fine idea.
- Putting them on pg_attribute seems mighty dodgy; I don't expect I'd
often care, and this change increases the size of an extremely heavily
accessed system table
- I am equivocal about putting them on pg_class.  That increases the
size of a pretty big, heavily accessed system table.
- Perhaps there are other relevant tables (pg_event_trigger,
pg_extension, FDW tables, pg_language, pg_proc, pg_tablespace); I
don't feel so strongly about them, but if you're puzzling over what
went wrong with an extension, event trigger, or FDW, time of creation
seems like it might be useful.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] PL/PgSQL STRICT

2012-12-21 Thread Christopher Browne
On Fri, Dec 21, 2012 at 10:39 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Marko Tiikkaja pgm...@joh.to writes:
  Courtesy of me, Christmas comes a bit early this year.  I wrote a patch
  which allows you to add STRICT into PERFORM and INSERT/UPDATE/DELETE
  without specifying an INTO clause.

 What is the use-case for this?  Won't this result in the word STRICT
 becoming effectively reserved in contexts where it currently is not?
 (IOW, even if the feature is useful, I've got considerable doubts about
 this syntax for it.  The INTO clause is an ugly, badly designed kluge
 already --- let's not make another one just like it.)

Yep, the use case for this seems mighty narrow to me.

I could use GET DIAGNOSTICS to determine if nothing got altered, and
it seems likely to me that expressly doing this via IF/ELSE/END IF would
be easier to read in function code than a somewhat magic STRICT
side-effect.
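
Roughly what I mean (the table is hypothetical):

do $$
declare
    n bigint;
begin
    update accounts set balance = balance - 10 where id = 42;
    get diagnostics n = row_count;
    if n <> 1 then
        raise exception 'expected exactly one row to be updated, got %', n;
    end if;
end;
$$;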

I certainly appreciate that brevity can make things more readable; it's just
that I'm not sure that is much of a help here.

This is adding specific syntax for what seems like an unusual case to me,
which seems like an unworthwhile complication.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Writing Trigger Functions in C

2012-12-21 Thread Christopher Browne
On Fri, Dec 21, 2012 at 11:25 AM, Charles Gomes charle...@outlook.com
wrote:

 Hello guys,

 I've been finding performance issues when using a trigger to modify
inserts on a partitioned table.
 If using the trigger the total time goes from 1 Hour to 4 hours.

 The trigger is pretty simple:

 CREATE OR REPLACE FUNCTION quotes_insert_trigger()
 RETURNS trigger AS $$
 BEGIN
 EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD')
 ||' VALUES (($1).*)' USING NEW ;
 RETURN NULL;
 END;
 $$
 LANGUAGE plpgsql;

 I've seen that some of you guys have worked on writing triggers in C.

 Does anyone have had an experience writing a trigger for partitioning in
C ?

I'd want to be very careful about assuming that implementing the trigger
function in C would necessarily improve performance.  It's pretty likely that
it wouldn't help much, as a fair bit of the cost of firing a trigger has to do
with figuring out which function to call, marshalling arguments, and calling
the function, none of which would magically disappear by virtue of
implementing in C.

A *major* cost that your existing implementation has is that it's re-planning
the queries for every single invocation.  This is an old, old problem from the
Lisp days: "EVAL considered evil"
http://stackoverflow.com/questions/2571401/why-exactly-is-eval-evil

The EXECUTE winds up replanning queries every time the trigger fires.

If you can instead enumerate the partitions explicitly, putting them into
(say) a CASE clause, the planner could generate the plan once, rather than a
million times, which would be a HUGE savings, vastly greater than you could
expect from recoding into C.

The function might look more like:

create or replace function quotes_insert_trigger () returns trigger as $$
declare
    c_rt text;
begin
    c_rt := to_char(new.received_time, 'YYYY_MM_DD');
    case c_rt
      when '2012_03_01' then
        insert into quotes_2012_03_01 values (new.*);
      when '2012_03_02' then
        insert into quotes_2012_03_02 values (new.*);
      else
        raise exception 'Need a new partition function for %', c_rt;
    end case;
    return null;
end $$ language plpgsql;
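
For completeness, the function gets attached with an ordinary BEFORE INSERT
trigger on the parent table (the table name is assumed from your message):

create trigger quotes_insert_before
    before insert on quotes
    for each row
    execute procedure quotes_insert_trigger();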

You'd periodically need to change the function to reflect the existing set of
partitions, but that's cheaper than creating a new partition.

The case statement gets more expensive (in effect O(n) on the number of
partitions, n) as the number of partitions increases.  You could split
the date into pieces (e.g. - years, months, days) to diminish that cost.

But at any rate, this should be *way* faster than what you're running now,
and not at any heinous change in development costs (as would likely
be the case reimplementing using SPI).
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] feature proposal - triggers by semantics

2012-11-15 Thread Christopher Browne
On Thu, Nov 15, 2012 at 2:53 PM, Darren Duncan dar...@darrenduncan.net
wrote:
 I still think the syntax of TRUNCATE FOR EACH ROW would be useful, but if
no one agrees...

I'm compelled to disagree.

What was useful about TRUNCATE in the first place was that it quickly
operated against the entire table.

If you want to change that to row-by-row processing, then it is actually a
little bit worse than DELETE FROM some_table, in that it is introducing an
irregularity in language that no longer provides any corresponding benefit
(e.g. - such as that TRUNCATE is fast).

If you want to be certain of doing row-by-row processing, then the better
answer is to:
a) Use DELETE instead in your application, and
b) Put a guard on to prevent using TRUNCATE (e.g. - attach triggers that
react to TRUNCATE with "go away, don't bother me!"; a sketch follows below).
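
Such a guard is only a few lines of plpgsql; a sketch, with an illustrative
table name:

create or replace function deny_truncate() returns trigger as $$
begin
    raise exception 'TRUNCATE is not allowed on %', tg_table_name;
end;
$$ language plpgsql;

create trigger no_truncate
    before truncate on some_table
    for each statement
    execute procedure deny_truncate();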

I observe that the Slony replication system initially implemented that 'guard'
approach when TRUNCATE triggers were first provided, until we came up with a
suitable strategy to capture and replicate the TRUNCATE request.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-11-13 Thread Christopher Browne
Preface: I think there's some great commentary here, and find myself
agreeing
pretty whole-heartedly.

On Tue, Nov 13, 2012 at 2:45 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 13 November 2012 17:38, Tom Lane t...@sss.pgh.pa.us wrote:
  Simon Riggs si...@2ndquadrant.com writes:
  The most popular relational database in the world is Microsoft Access,
  not MySQL. Access appears desirable because it allows a single user to
  create and use a database (which is very good). But all business
  databases have a requirement for at least one of: high availability,
  multi-user access or downstream processing in other parts of the
  business.
 
  That's a mighty sweeping claim, which you haven't offered adequate
  evidence for.  The fact of the matter is that there is *lots* of demand
  for simple single-user databases, and what I'm proposing is at least a
  first step towards getting there.

 I agree there is lots of demand for simple single-user databases and I
 wish that too. What I don't agree with is something that casts that
 requirement in stone by architecturally/permanently disallowing
 secondary connections.

 Evidence for claims:
 * The whole Business Intelligence industry relies on being able to
 re-purpose existing data, forming integrated webs of interconnecting
 databases. All of that happens after the initial developers write the
 first version of the database application.
 * Everybody wants a remote backup, whether its for your mobile phone
 contact list or your enterprise datastore.

 People are migrating away from embedded databases in droves for these
 very reasons.


There seems to be a continuum of different sorts of scenarios of more-to-less
concurrency that are desirable for some different reasons.  From most-to-least,
I can see:

1 - Obviously, there's the case that Postgres is eminently good at, of
supporting
  many users concurrently using a database.  We love that, let's not break
it :-).

2 - We have found it useful to have some extra work processes that do some
  useful internal things, such as vacuuming, forcing background writes,
  collecting statistics.  And an online backup requires having a second
process.

3 - People doing embedded systems find it attractive to attach all the data
  to the singular user running the system.  Witness the *heavy* deployment
  of SQLite on Android and iOS.  People make an assumption that this is
  a single-process thing, but I am inclined to be a bit skeptical.  What
they
  *do* know is that it's convenient to not spawn extra processes and do
  IPC.  That's not quite the same thing as it being a certainty that they
  definitely want not to have more than one process.

4 - There are times when there *is* certainty about not wanting there to be
more
  than one process.  When running pg_upgrade, or, at certain times, when
  doing streaming replication node status switches, one might have that
  certainty.  Or when reindexing system tables, which needs single user
mode.

For us to conflate the 3rd and 4th items seems like a mistake to me.


  The main disadvantage of approaching this via the existing single-user
  mode is that you won't have any autovacuum, bgwriter, etc, support.
  But the flip side is that that lack of infrastructure is a positive
  advantage for certain admittedly narrow use-cases, such as disaster
  recovery and pg_upgrade.  So while I agree that this isn't the only
  form of single-user mode that we'd like to support, I think it is *a*
  form we'd like to support, and I don't see why you appear to be against
  having it at all.

 I have no problem with people turning things off, I reject the idea
 that we should encourage people to never be able to turn them back on.


Yep.  That seems like conflating #2 with #4.

It's mighty attractive to have a forcible single-process mode to add safety to
certain activities.

I think we need a sharper knife, though, so we don't ablate off stuff like #2,
just because someone imagined that "Must Have Single Process!!!" was the right
doctrine.


  A more reasonable objection would be that we need to make sure that this
  isn't foreclosing the option of having a multi-process environment with
  a single user connection.  I don't see that it is, but it might be wise
  to sketch exactly how that case would work before accepting this.

 Whatever we provide will become the norm. I don't have a problem with
 you providing BOTH the proposed single user mode AND the multi-process
 single user connection mode in this release. But if you provide just
 one of them and its the wrong one, we will be severely hampered in the
 future.

 Yes, I am very much against this project producing a new DBMS
 architecture that works on top of PostgreSQL data files, yet prevents
 maintenance, backup, replication and multi-user modes.

 I see this decision as a critical point for this project, so please
 consider this objection and where it comes from.


I don't think we're necessarily *hugely* hampered by doing one of 

Re: [HACKERS] WIP checksums patch

2012-11-08 Thread Christopher Browne
On Mon, Nov 5, 2012 at 12:19 PM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Oct 29, 2012 at 4:31 PM, Jim Nasby j...@nasby.net wrote:
  For whatever it's worth... we (and presumably others) still use londiste
 (or
  Slony) as our upgrade path, so we could tolerate a cluster-wide setting.
  We'd just set it when building new clusters via londiste and forget about
  it.
 
  So I'd rather see this get in at a cluster level than not make it at all
  while we wait for something better.

 Yeah.  I definitely think that we could shed an enormous amount of
 complexity by deciding that this is, for now, an option that can only
 be selected at initdb time.  That would remove approximately 85% of
 everything I've ever disliked about this patch - without, I think,
 precluding the possibility of improving things later.


I see one thing to be concerned about, there...

I imagine it would not be a totally happy thing if the only way to switch
it on/off was to use Slony or Londiste to replicate into a database with
the opposite setting.  (e.g. - This implies that built-in replication may
only replicate into a database with the identical checksum configuration.)

It's not outrageous for it to be a pretty heavyweight operation to switch
polarities, but there's such a thing as too heavy.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] libpq

2012-11-06 Thread Christopher Browne
It seems not unusual for Linux distributions to supply libpq as part of a
separate package (whether via dpkg, which I think uses ar as the
archiver, or RPM, which uses cpio).

Possibly this is already provided on your system via some means akin to
those.

If, instead, you are keen on getting the source code for libpq in a
separate tarball, I'd seriously question why that would be expected to be
valuable.  On most systems, these days, it doesn't take terribly much time
or space (on our systems with lots of GBs) to build all of Postgres, so
separating out the source code for the library seems like an effort with not
much value.


On Tue, Nov 6, 2012 at 2:11 PM, Stefan humdumde...@gmail.com wrote:

 Would it be possible to provide libpq in separate tarbal or is that too
 much to ask for ?

 Thank you!

 Kind regards
 Stefan





 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-10-30 Thread Christopher Browne
On Tue, Oct 30, 2012 at 5:25 PM, Josh Berkus j...@agliodbs.com wrote:
 On 10/29/12 6:40 AM, Chris Corbyn wrote:
 What's the use case of this? It sounds like it will just create a 
 maintenance nightmare where some stuff you expect to lookup in in 
 postgresql.conf is actually hiding in the .auto file. Assuming only super 
 users/sysadmins would have the ability to change things in the config file, 
 wouldn't they be more likely to just do it on the server and edit the .conf 
 (which among other things, keeps it tidy and orderly).

 The use is the ability to manage dozens, or hundreds, of PostgreSQL
 servers via Port 5432.  It would also make writing an auto-configurator
 easier.

 I agree that there's not much benefit if you're only managing a single
 PostgreSQL server.  There's a lot of benefit for those of us who have to
 manage a lot of them though.

I rather think that the fact that postgresql.conf has supported an
include directive since at least as far back as 8.2 (likely further;
I'll not bother spelunking further into the docs) makes this extremely
troublesome.

We have long supported the notion that this configuration does not
have a Unique Place to be (e.g. - if you use INCLUDE, then there are
at least two possible places).

I should think that doing this requires heading back towards there
being a single unique configuration stream, and over the course of
several versions, deprecating the INCLUDE directive.

I imagine it means we'd want to come up with a representation that has
suitable semantics for being written to, make sure it is reasonably
expressive *without* INCLUDE, and establish a migration path between
the old and new forms.  At some point, the old form can be treated as
vestigal, and be dropped.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] [WIP] pg_ping utility

2012-10-23 Thread Christopher Browne
On Tue, Oct 23, 2012 at 6:12 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 10/22/12 11:47 AM, Phil Sorber wrote:
 Also, it seems that about 75% of the patch is connection options processing.  
 How about
 we get rid of all that and just have them specify a connection string?  It 
 would be a break
 with tradition, but maybe it's time for something new.

I'd be pretty pleased if it had just two ways to get configured:
a) A connection string (which might, in the new order of things, be a
JDBC-like URI), or
b) Environment values drawn in from PGHOST/PGPORT/...

That's pretty much enough configurability, I'd think.

 Functionality:

 I'm missing the typical ping functionality to ping continuously.  If we're 
 going to call
 it pg_ping, it ought to do something similar to ping, I think.

Yep, should have equivalents to:
 -i, an interval between pings,
 -c, a count
 -w/-W, a timeout interval

Might be nice to have analogues to:
-D printing timestamp before each line
-q quiets output
-v verbose output (got it, check!)
-V version (got it, check!)
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows

2012-10-21 Thread Christopher Browne
I agree that it seems inappropriate to preserve order.  That seems an
inappropriate imposition, inconsistent with what SQL does elsewhere.

If there is a natural sequence (e.g. - a value assigned by nextval()), that
offers a natural place to apply the usual order-imposing ORDER BY that we
are expected to use elsewhere.
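
That is, if a particular order is wanted, impose it explicitly; for example
(assuming a table t whose id column is filled from a sequence):

with ins as (
    insert into t (payload)
    values ('a'), ('b'), ('c')
    returning id, payload
)
select id, payload
  from ins
 order by id;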

I suppose it is troublesome if there is no such natural sequence, but I
wouldn't think it too meaningful to expect order without some visible
source of order.


Re: [HACKERS] [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)

2012-10-18 Thread Christopher Browne
On Thu, Oct 18, 2012 at 9:49 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 16 October 2012 15:26, Jan Wieck janwi...@yahoo.com wrote:
 This means that the transition time from the existing, trigger based
 approach to the new WAL based mechanism will see both technologies in
 parallel, which is no small thing to support.

 So, you're talking about a shim between the two in order to usefully
 support inter-version replication, or are you just thinking about
 making a clean break in compatibility for Postgres versions prior to
 9.3 in a new release branch?

It's early to assume either.

In Slony 2.0, we accepted that we were breaking compatibility with
versions of Postgres before 8.3; we accepted that because there were
considerable 'manageability' benefits (e.g. - system catalogues no
longer hacked up, so pg_dump works against all nodes, and some
dramatically reduced locking).

But that had the attendant cost that we have had to continue fixing
bugs on 1.2, to a degree, even until now, because people on Postgres
versions earlier than 8.3 have no way to use version 2.0.

Those merits and demerits apply pretty clearly to this.

It would be somewhat attractive for a version 2.3 (or, more likely,
to indicate the break from earlier versions, 3.0) to make the clean
break to the new-in-PG-9.3 facilities.  It is attractive in that we
could:
a) Safely remove the trigger-based log capture apparatus (or, at
least, I'm assuming so), and
b) Consciously upgrade to take advantage of all the latest cool stuff
found in Postgres 9.3.  (I haven't got any particular features in
mind; perhaps we add RANGE comparators for xid to 9.3, and make
extensive use of xid_range types?  That would be something that
couldn't reasonably get hacked to work in anything before 9.2...)
c) Drop out any special cases having to do with support of versions
8.3, 8.4, 9.0, 9.1, and 9.2.

But, of course, we'd be leaving everyone running 8.3 thru 9.2 behind,
if we did so, and would correspondingly shackle ourselves to needing to
support the 2.x branches for still longer.  And this would mean that
this Slony 3.0 would expressly NOT support one of our intended use
cases, namely to support upgrading from elder versions of Postgres.

A shim adds complexity, but retains the upgrade across versions
use case, and reduces the need to keep supporting elder versions of
Slony.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility

2012-10-18 Thread Christopher Browne
On Thu, Oct 18, 2012 at 2:56 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 * works as table on INSERTS up to inserting logical WAL record describing
 the
 insert but no data is inserted locally.

 with all things that follow from the local table having no data
   - unique constraints don't make sense
   - indexes make no sense
   -  updates and deletes hit no data
   - etc. . .

Yep, I think I was understanding those aspects.

I think I disagree that indexes make no sense.

I think that it would be meaningful to have an index type for this,
one that is a pointer at WAL records, to enable efficiently jumping to
the right WAL log to start accessing a data stream, given an XID.
That's a fundamentally different sort of index than we have today
(much the way that hash indexes, GiST indexes, and BTrees differ from
one another).

I'm having a hard time thinking about what happens if you have
cascaded replication, and want to carry records downstream.  In that
case, the XIDs from the original system aren't miscible with the XIDs
in a message queue on a downstream database, and I'm not sure what
we'd want to do.  Keep the original XIDs in a side attribute, maybe?
It seems weird, at any rate.  Or perhaps data from foreign sources has
got to go into a separate queue/'sorta-table', and thereby have two
XIDs, the source system XID and the "when we loaded it in locally"
XID.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility

2012-10-17 Thread Christopher Browne
Well, replication is arguably a relevant case.

For Slony, the origin/master node never cares about logged changes - that
data is only processed on replicas.  Now, that's certainly a little
weaselly - the log data (sl_log_*) has got to get read to get to the
replica.

This suggests, nonetheless, a curiously different table structure than is
usual, and I could see this offering interesting possibilities.

The log tables are only useful to read in transaction order, which is
pretty well the order data gets written to WAL, so perhaps we could have
savings by only writing data to WAL...

It occurs to me that this notion might exist as a special sort of table,
interesting for pgq as well as Slony, which consists of:

- table data is stored only in WAL
- an index supports quick access to this data, residing in WAL
- TOASTing perhaps unneeded?
- index might want to be on additional attributes
- the triggers-on-log-tables thing Slony 2.2 does means we want these
tables to support triggers
- if data is only held in WAL, we need to hold the WAL until (mumble,
later, when known to be replicated)
- might want to mix local updates with updates imported from remote nodes

I think it's a misnomer to think this is about having the data not locally
accessible.  Rather, it has a pretty curious access and storage pattern.

And a slick pgq queue would likely make a good Slony log, too.


Re: [HACKERS] [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility

2012-10-17 Thread Christopher Browne
On Wed, Oct 17, 2012 at 4:25 PM, Josh Berkus j...@agliodbs.com wrote:

 It is not meant to be a full implementation of application level queuing
 system though but just the capture, persisting and distribution parts

 Using this as an application level queue needs a set of interface
 functions to extract the events and also to keep track of the processed
 events. As there is no general consensus what these should be (like if
 processing same event twice is allowed) this part is left for specific
 queue consumer implementations.

 Well, but AFAICT, you've already prohibited features through your design
 which are essential to application-level queues, and are implemented by,
 for example, pgQ.

 1. your design only allows the queue to be read on replicas, not on the
 node where the item was inserted.

I commented separately on this; I'm pretty sure there needs to be a
way to read the queue on a replica, yes, indeed.

 2. if you can't UPDATE or DELETE queue items -- or LOCK them -- how on
 earth would a client know which items they have executed and which they
 haven't?

If the items are actually stored in WAL, then it seems well and truly
impossible to do any of those three things directly.

What could be done, instead, would be to add successor items to
indicate that they have been dealt with, in effect, back-references.

You don't get to UPDATE or DELETE; instead, you do something like:

   INSERT into queue (reference_to_xid, reference_to_id_in_xid, action)
values (old_xid_1, old_id_within_xid_1, 'COMPLETED'), (old_xid_2,
old_id_within_xid_2, 'CANCELLED');

In a distributed context, it's possible that multiple nodes could be
reading from the same queue, so that while "process at least once" is
no trouble, "process at most once" is just plain troublesome.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Deprecating RULES

2012-10-17 Thread Christopher Browne
On Wed, Oct 17, 2012 at 5:45 PM, Daniel Farina dan...@heroku.com wrote:
 retort -- which is true, Heroku's user base is not provably
 representative of all users.  But what else is there to go on, besides
 experiences of others, such as yours and Andrew's, or others?

Well, Heroku doesn't support Slony + Londiste (and I am not overly
indignant at your reasoning at not supporting them), so I think we can
guarantee that anything involving those trigger-based replication
systems will be absent from Heroku's user base.  Which is not to
express indignance, but merely that there are a few knowable biases
about your users.  And I'd expect quite a few unknow(n|able) ones.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Truncate if exists

2012-10-15 Thread Christopher Browne
On Mon, Oct 15, 2012 at 11:57 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Oct 15, 2012 at 10:34 AM, Greg Stark st...@mit.edu wrote:
 I'm a bit lost. I would think pl/pgsql is precisely the same as
 Oracle's pl/sql and MS's T-SQL. I see the complaint you have as a
 purely implementation detail. I don't think pl/pgsql is the best
 implemented part of Postgres but I don't see how integrating it into
 the core is going to automatically make it all wonderful either.

 It isn't.  But (1) there would be a significant usability benefit in
 not having to surround procedural logic with DO $$ BEGIN ... END $$
 and (2) PL/pgsql's performance issues seem to revolve around the fact
 that you don't get one big ol' plan thingy that can be passed to the
 executor and run; instead, you interpret each statement separately and
 pass them off to the executor one piece at a time.

The places where *I* care about this are places where performance is
almost entirely irrelevant to the question.

When I'm writing 'scripts' that are doing this kind of thing, I'm
doing schema 'surgery', and, within reason, it's not particularly
performance sensitive.  I'm much more worried about DDL scripts being
repeatable and manageable than I am about them being fast.

So I'm going to elide the performance bits.

Robert, when you first tossed out the notion of:

do $$
begin
if (select 1 from pg_class where relname = 'foo' and
pg_table_is_visible(oid)) then
truncate table foo;
end if;
end
$$;

my first reaction was "Ick!  Why am I switching languages (e.g. -
from plain SQL to pl/pgsql), and running functions to do this?!?"

In retrospect, your later comments make it pretty clear that you're
not proposing that as the end state, just that that's the
functionality that needs to be run.

That would be equivalent to my would-be-strawman syntax of:

TRUNCATE TABLE public.foo IF EXISTS TABLE public.foo;

I'm comfortable that Dimitri didn't particularly love the idea of
stowing the conditional at the end; it was just a strawman proposal,
and what was particularly important to me was to make sure that it was
recognizable that other systems (e.g. - Perl, Ruby, probably SNOBOL)
have done the very same thing.  I'd be perfectly happy if someone came
up with something better.  The number of +1's thus far is pretty
gratifying, mind you.

 Fwiw my experience has consistently been that life got better whenever
 I moved anything I had implemented as PL/SQL or PL/pgsql into client
 code in Perl or Python.

 Hmm... I've had the opposite experience, which I guess is why I've got
 strong feelings about this.

When I'm managing schema, I have exactly *zero* interest in
switching over to Perl or Python.  Those aren't languages for managing
database schemas, and, if I wind up using them, my code is going to be
rife with context switches as I'm switching between
   oh, am I writing Perl code?
and
   Am I attached to the right Perl database connection object, with
the proper transaction context?
and
   Oh, here is the SQL DDL for managing the schema.

Two of these three varieties of contexts are distracting sidelines to
me.  Guess which are the two?  :-)
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-15 Thread Christopher Browne
On Sat, Oct 13, 2012 at 3:49 PM, Joshua Berkus j...@agliodbs.com wrote:
 So, problem #1 is coming up with a mathematical formula.  My initial target 
 values are in terms of # of rows in the table vs. # of writes before analyze 
 is triggered:

 1 : 3
 10 : 5
 100 : 10
 1000 : 100
 100000 : 2000
 1000000 : 5000
 10000000 : 25000
 100000000 : 100000

Do we necessarily care about smoothness?

If we don't at all, then this would be fine:

func powerlaw(tuples int) int {
    if tuples < 10 {
        return 3
    }
    if tuples < 100 {
        return 5
    }
    if tuples < 1000 {
        return 10
    }
    if tuples < 100000 {
        return 100
    }
    if tuples < 1000000 {
        return 2000
    }
    if tuples < 10000000 {
        return 5000
    }
    if tuples < 100000000 {
        return 25000
    }
    return 100000
}

If we want smoothness within the ranges, this is a piecewise linear
representation of your table:

func powerlaw2(tuples int) int {
    // linear interpolation between the table's anchor points within each range
    if tuples < 10 {
        return 3
    }
    if tuples < 100 {
        return 5 + 5*(tuples-10)/90
    }
    if tuples < 1000 {
        return 10 + 90*(tuples-100)/900
    }
    if tuples < 100000 {
        return 100 + 1900*(tuples-1000)/99000
    }
    if tuples < 1000000 {
        return 2000 + 3000*(tuples-100000)/900000
    }
    if tuples < 10000000 {
        return 5000 + 20000*(tuples-1000000)/9000000
    }
    if tuples < 100000000 {
        return 25000 + 75000*(tuples-10000000)/90000000
    }
    return 100000
}

That's in Go, but there shouldn't be anything too unfamiliar looking
about it :-).

It would be nice to have a simpler functional representation, but the
above is by no means heinous, and it's not verbose beyond reason.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Truncate if exists

2012-10-15 Thread Christopher Browne
On Mon, Oct 15, 2012 at 3:14 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 if (select 1 from pg_class where relname = 'foo' and
 pg_table_is_visible(oid)) then
 truncate table foo;
 end if;

 Yeah, I think the functionality that we need is pretty much there
 already today.  What we need to do is to get the syntax to a point
 where people can write the code they want to write without getting
 tangled up by it.

 What about continuing to extend on that incredibly useful WITH syntax we
 already have:

WITH target AS (
   SELECT oid::regclass AS t
 FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE pg_table_is_visible(oid)
  AND nspname = 'public' AND NOT relname ~ 'exclude-pattern'
)
TRUNCATE TABLE t FROM target;

This still seems to be trying rather too hard.

The original suggestion was that, given the original query:

  truncate table public.foo;

that we add syntax to make the request optional:

  truncate table if exists public.foo;

Throwing in $$, oid, pg_class, joins, and such all seem like way more
syntax than we started with.

There are only so many 'clean' ways to modify the truncate request:

a) We could augment TRUNCATE with an IF EXISTS modifier, as
described in the initial patch.

b) Perhaps the IF EXISTS might fit well afterwards, or be reversed somehow.

   truncate table unless not exists public.foo;
   truncate table public.foo if exists;
   truncate table where exists public.foo;

c) My proposal was to add in a more generic modifier that wouldn't be
specific to TRUNCATE.

Thus:

   truncate table public.foo if exists table public.foo;

That's a *little* longer than what's in b), but this would allow
extending the conditional to any kind of statement, which seems like a
more powerful idea to me.  It would also support doing other actions
on the same conditional basis:

   insert into bar (select id, name from public.foo)
  if exists table public.foo;

If you want a more prefix-y version, well, here's how it might look
using a leading WITH clause:

   with exists table public.foo
  truncate public.foo;

  with exists table public.foo
 insert into bar (select id, name from public.foo);

I don't terribly much like that.  I think I'd rather use WHEN than WITH.

   when exists table public.foo
  truncate public.foo;

  when exists table public.foo
 insert into bar (select id, name from public.foo);

That does seem a bit nicer than the { STATEMENT } if (conditional)
idea.  And nary a $$, oid, or pg_class to be seen.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?




Re: [HACKERS] [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)

2012-10-15 Thread Christopher Browne
On Mon, Oct 15, 2012 at 3:18 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 15 October 2012 19:19, Bruce Momjian br...@momjian.us wrote:
 I think Robert is right that if Slony can't use the API, it is unlikely
 any other replication system could use it.

 I don't accept that. Clearly there is a circular dependency, and
 someone has to go first - why should the Slony guys invest in adopting
 this technology if it is going to necessitate using a forked Postgres
 with an uncertain future? That would be (with respect to the Slony
 guys) a commercial risk that is fairly heavily concentrated with
 Afilias.

Yep, there's something a bit too circular there.

I'd also not be keen on reimplementing the Slony integration over
and over if it turns out that the API churns for a while before
stabilizing.  That shouldn't be misread as I expect horrible amounts
of churn, just that *any* churn comes at a cost.  And if anything
unfortunate happens, that can easily multiply into a multiplicity of
painfulness(es?).
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)

2012-10-15 Thread Christopher Browne
On Mon, Oct 15, 2012 at 4:51 PM, Andres Freund and...@2ndquadrant.com wrote:
 On Monday, October 15, 2012 10:08:28 PM Christopher Browne wrote:
 On Mon, Oct 15, 2012 at 3:18 PM, Peter Geoghegan pe...@2ndquadrant.com
 wrote:
  On 15 October 2012 19:19, Bruce Momjian br...@momjian.us wrote:
  I think Robert is right that if Slony can't use the API, it is unlikely
  any other replication system could use it.
 
  I don't accept that. Clearly there is a circular dependency, and
  someone has to go first - why should the Slony guys invest in adopting
  this technology if it is going to necessitate using a forked Postgres
  with an uncertain future? That would be (with respect to the Slony
  guys) a commercial risk that is fairly heavily concentrated with
  Afilias.

 Yep, there's something a bit too circular there.

 I'd also not be keen on reimplementing the Slony integration over
 and over if it turns out that the API churns for a while before
 stabilizing.  That shouldn't be misread as I expect horrible amounts
 of churn, just that *any* churn comes at a cost.  And if anything
 unfortunate happens, that can easily multiply into a multiplicity of
 painfulness(es?).

 Well, as a crosscheck, could you list your requirements?

 Do you need anything more than outputting data in a format compatible to whats
 stored in sl_log_*? You wouldn't have sl_actionseq, everything else should be
 there (Well, you would need to do lookups to get the tableid, but thats not
 really much of a problem). The results would be ordered in complete
 transactions, in commit order.

Hmm.  We need to have log data that's in a compatible ordering.

We use sl_actionseq, and can mix data from multiple transactions
together; if what you're providing is, instead, in order based on
transaction commit order followed by some sequencing within each
transaction, then that should be acceptable.

The stylized query on sl_log_* looks like...

select log_origin, log_txid, log_tableid,
log_actionseq, log_tablenspname,
log_tablerelname, log_cmdtype,
log_cmdupdncols, log_cmdargs
from %s.sl_log_%d
where log_origin = %d

How about I quibble about each of these:

a) log_origin - this indicates the node from which the data
originates.  Presumably, this is implicit in a chunk of data that is
coming in.

b) log_txid - indicating the transaction ID.  I presume you've got
this available.  It's less important with the WAL-based scheme in that
we'd probably not be using it as a basis for querying as is the case
today with Slony.

c) log_tableid - indicating the ID of the table.  Are you capturing an
OID equivalent to this?  Or what?

d) log_actionseq - indicating relative sequences of updates.  You
don't have this, but if you're capturing commit ordering, we don't
need it.

e) log_tablenspname, log_tablerelname - some small amount of magic
needful to get this.  Or perhaps you are already capturing it?

f) log_cmdtype - I/U/D/T - indicating the action
(insert/update/delete/truncate).  Hopefully you have something like
this?

g) log_cmdupdncols - for UPDATE action, the number of updated columns.
 Probably not mandatory; this was a new 2.1 thing...

h) log_cmdargs - the actual data needed to do the I/U/D.  The form of
this matters a fair bit.  Before Slony 2.1, this was a portion of a
SQL statement, omitting the operation (provided in log_cmdtype) and
the table name (in log_tablerelname et al).  In Slony 2.1, this
changes to be a text[] array that essentially consists of pairs of
[column name, column value] values.
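
For illustration (the table, columns, and values here are invented), a
2.1-style capture of INSERT INTO public.foo (id, name) VALUES (42, 'blue')
would carry log_cmdargs = '{id,42,name,blue}', and a consumer can unpack
the pairs with something like:

SELECT ('{id,42,name,blue}'::text[])[i]     AS column_name,
       ('{id,42,name,blue}'::text[])[i + 1] AS column_value
  FROM generate_series(1, 4, 2) AS s(i);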

I see one place, very notable in Slony 2.2, that would also be more
complicated, which is the handling of DDL.

In 2.1 and earlier, we handled DDL as events, essentially out of
band.  This isn't actually correct; it could mix very badly if you had
replication activity mixing with DDL requests.  (More detail than you
want is in a bug on this...
http://www.slony.info/bugzilla/show_bug.cgi?id=137).

In Slony 2.2, we added a third log table where DDL gets captured.
sl_log_script has much the same schema as sl_log_{1,2}; it needs to
get mixed in in compatible order.  What I imagine would pointedly
complicate life is if a single transaction contained both DDL and
regular replicable activity.  Slony 2.2 mixes this in using XID +
log_actionseq; how this would play out with your log capture mechanism
isn't completely clear to me.  That's the place where I'd expect the
very messiest interaction.

 I guess the other tables would stay as they are as they contain the added
 value of slony?

A fair bit of Slony is about the event infrastructure, and some of
that ceases to be as needful.  The configuration bits probably
continue to remain interesting.

The parts that seem notably mysterious to me at the moment are:

a) How do we pull result sets (e.g. - sl_log_* data)?

b) How is the command data represented?

c) If we have a need to mix together your 'raw logs' and other
material (e.g. - our sl_log_script that captures DDL-like changes to
be mixed back in), how easy is that going to be?

Re: [HACKERS] Global Sequences

2012-10-15 Thread Christopher Browne
On Mon, Oct 15, 2012 at 5:33 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Sequences, as defined by SQL Standard, provide a series of unique
 values. The current implementation on PostgreSQL isolates the
 generation mechanism to only a single node, as is common on many
 RDBMS.

I remember constructing at least the thought process surrounding your
option #3 as the one thing I arrived at that seemed as though
it might be valuable for the many-years-ago Slony-II summit.

The notion of having each node give out sections of a sequence seems
pretty viable; as a hugely loose analogy, DHCP servers  take a
somewhat similar approach in assigning IP addresses in ranges shared
across those servers.

At the time, back in, I think, 2005, there was some agreement that it
was a viable idea to give out chunks of sequence range; it wasn't one
of the tough problems warranting Heavy Thought, so there wasn't any
concentration on it, and that pretty all went by the wayside.

Today, I am somewhat skeptical that there's much value to the
exercise.  It isn't difficult to come up with pretty unique values,
between constructing something with a node ID prefix or such, or using
a DCE UUID that is very likely to be globally unique.
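
As a minimal sketch of the node ID prefix idea (the node number 42 and
the 48-bit split are arbitrary assumptions, not a proposal):

CREATE SEQUENCE my_local_seq;

CREATE TABLE widgets (
    id bigint PRIMARY KEY
       DEFAULT ((42::bigint << 48) | nextval('my_local_seq')),
    payload text
);

The values are unique across nodes without any coordination, but their
global ordering is only as good as the prefix scheme, which is the
trade-off that follows.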

The reason to want a global sequence is that it's supposed to give
out values in pretty much a sequential order.  But a shared global
sequence will have aspects of that that are decidedly *not* in
sequential order.  If it's partially *un*ordered, I suspect this
undermines the value of it.

There's a necessary trade-off; you can either have it globally
*strongly* ordered, and, if so, you'll have to pay a hefty
coordination price, or you can have the cheaper answer of a weakly
ordered sequence.  The latter leaves me feeling rather meh.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Truncate if exists

2012-10-12 Thread Christopher Browne
On Fri, Oct 12, 2012 at 3:04 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Oct 11, 2012 at 3:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
 So we just need a function called pg_if_table_exists(table, SQL) which
 wraps a test in a subtransaction.

 And you write

 SELECT pg_if_table_exists('foo', 'TRUNCATE TABLE foo');

 and we can even get rid of all that other DDL crud that's been added

 and we can have pg_if_table_not_exists() also.

 You could make this more composable by having pg_table_exists() and
 pg_execute_sql_from_string().  Then you can write: SELECT CASE WHEN
 pg_table_exists(...) THEN pg_execute_sql_from_string(...) END.  And if
 you want the if-not-exists case then just stick a NOT in there.  And
 if you want a more complicated condition, you can easily write that as
 well.

While that certainly has the merit of being compact, it mixes kinds of
evaluation (e.g. - parts of it are parsed at different times) and
requires quoting that isn't true for the other sorts of IF EXISTS
queries.
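
For concreteness, the helpers being discussed might look something like
this (a sketch only; neither function exists in core today), which shows
the extra layer of quoting involved:

CREATE FUNCTION pg_table_exists(tbl text) RETURNS boolean
LANGUAGE sql STABLE AS $f$
    SELECT EXISTS (SELECT 1 FROM pg_class
                    WHERE relname = $1 AND pg_table_is_visible(oid));
$f$;

CREATE FUNCTION pg_execute_sql_from_string(cmd text) RETURNS void
LANGUAGE plpgsql AS $f$
BEGIN
    EXECUTE cmd;
END;
$f$;

SELECT CASE WHEN pg_table_exists('foo')
            THEN pg_execute_sql_from_string('TRUNCATE TABLE foo')
       END;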

To be sure, you can do anything you like inside a DO $$ $$ language
plpgsql; block, but it's not nice to have to do a lot of work
involving embedding code between languages.  Makes it harder to
manipulate, analyze, and verify.

Let me observe that Perl has, as one of its conditional concepts, the
notion of a statement modifier
http://perldoc.perl.org/perlsyn.html#Statement-Modifiers, which
corresponds pretty much to the IF EXISTS/IF NOT EXISTS modifiers that
have gotten added to Postgres over the last few versions.  (I *think*
statement modifiers are attributable to SNOBOL, not 100% sure.  I'm
pretty sure it predates Perl.)

I suggest the thought of embracing statement modifiers in DDL, with
some options possible:
  a) { DDL STATEMENT } IF CONDITION;
  b) { DDL STATEMENT } UNLESS CONDITION;

where CONDITION has several possible forms:
  i) {IF|UNLESS} ( SQL expression returning T/F )
  ii) {IF|UNLESS} {EXISTS|NOT EXISTS}
{TABLE|SCHEMA|COLUMN|FUNCTION|...} object_name

That feels like a cleaner extension than what we have had, with the IF
EXISTS/IF NOT EXISTS clauses that have been added to various
CREATE/DROP/ALTER commands.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Truncate if exists

2012-10-12 Thread Christopher Browne
On Fri, Oct 12, 2012 at 5:52 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Christopher Browne cbbro...@gmail.com writes:
 I suggest the though of embracing statement modifiers in DDL, with
 some options possible:
   a) { DDL STATEMENT } IF CONDITION;
   b) { DDL STATEMENT } UNLESS CONDITION;

 Just saying. I hate that. Makes it harder to read, that last bit at the
 end of the command changes it all. It's cool for a linguist, I guess,
 but we're not typing sentences at the psql prompt…

I could see it being
 WHEN CONDITION { STATEMENT } OTHERWISE { STATEMENT };

It's all a strawman proposal, where I'm perfectly happy if there's
something people like better.  I like to think this is cleaner than
the present proliferation of {IF EXISTS|IF NOT EXISTS}, but if others
don't concur, there's little point to taking it further.

 where CONDITION has several possible forms:
   i) {IF|UNLESS} ( SQL expression returning T/F )
   ii) {IF|UNLESS} {EXISTS|NOT EXISTS}
 {TABLE|SCHEMA|COLUMN|FUNCTION|...} object_name

 Now we're talking about T-SQL? Sorry, I suppose you didn't mean to lead
 us that way, but I couldn't resist comparing. Soon enough you want a
 full programming language there.

Heh.  Next, I'll be proposing LETREC*, or of adopting the EVERY
operator from Icon, and coroutines from BCPL :-).  Keen on LOOP? :-)

The fact that we now have WITH RECURSIVE does extend what's reasonable
to hope for :-).
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Deprecating RULES

2012-10-11 Thread Christopher Browne
On Thu, Oct 11, 2012 at 6:25 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Anyway, lets start with a discussion of what rules give us that SQL
 standard features do not?

The somewhat broader question that this elicits is "How would we go
about deprecating a feature that seems to be troublesome?"  I think
Josh Berkus describes this downthread in a reasonable way.

There are pretty cool things you can do with rules, but they don't
seem to scale very successfully to important/massive usage.  I tried
them out several times, and it was pretty cool that they worked as
well as they did, but it sure didn't go to production.  I'd be
saddened if rules went away because they seemed pretty cool, but it
wouldn't be practical disappointment.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] [PATCH 8/8] Introduce wal decoding via catalog timetravel

2012-10-11 Thread Christopher Browne
On Wed, Oct 10, 2012 at 10:26 PM, Bruce Momjian br...@momjian.us wrote:
 How does Slony write its changes without causing serialization replay
 conflicts?

It uses a sequence to break any ordering conflicts at the time that
data is inserted into its log tables.

If there are two transactions, A and B, that were fighting over a
tuple on the origin, then either:

a) A went first, B went second, and the ordering in the log makes that
order clear;
b) A succeeds, then B fails, so there's no conflict;
c) A is doing its thing, and B is blocked behind it for a while, then
A fails, and B gets to go through, and there's no conflict.

Switch A and B as needed.

The sequence that is used establishes what is termed a compatible
ordering.  There are multiple possible compatible orderings; ours
happen to interleave transactions together, with the sequence
guaranteeing absence of conflict.
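
As a rough illustration (the cluster schema and origin id are made up),
a subscriber pulls and applies rows in that compatible ordering with a
query shaped like:

SELECT log_actionseq, log_tableid, log_cmdtype, log_cmdargs
  FROM _mycluster.sl_log_1
 WHERE log_origin = 1
 ORDER BY log_actionseq;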

If we could get commit orderings, then a different but still
compatible ordering would be to have each transaction establish its
own internal sequence, and apply things in order based on
(commit_tx_order, sequence_within).
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Truncate if exists

2012-10-10 Thread Christopher Browne
On Wed, Oct 10, 2012 at 3:32 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 October 2012 02:10, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Oct 9, 2012 at 4:18 PM, Josh Berkus j...@agliodbs.com wrote:
 The second is for making deployment scripts idempotent.  For example,
 say you have script A which creates table josh, and script B which
 needs table josh to be empty, if present.  Since the two scripts are
 tied to different database features, and you don't know which one will
 be deployed first, it's useful to have TRUNCATE IF EXISTS.  Yes, you can
 solve that problem with DO, but why make users go to the extra effort?

 Hmm.  That's an interesting point.  I think we're currently in
 somewhat of a limbo zone about where we ought to have IF EXISTS and IF
 NOT EXISTS options, and where we should not.  Really, I'd like to
 figure out what policy we want to have, and then go make everything
 work that way.  I don't exactly know what the policy should be, but if
 we don't have one then we're going to have to argue about every patch
 individually, which is already getting to be more than tedious.  At
 the one extreme, you have Tom, who probably would not have added any
 of these given his druthers; at the other extreme, there are probably
 some people who would say we ought to have this for every command in
 the book, right down to INSERT IF EXISTS (and, hey, why not INSERT OR
 CREATE for good measure?).  I'm not sure what the right thing to do
 is... but we should probably come up with some consensus position we
 can all live with, and then go make this uniform[1].

 Damn it, now I have an opinion.

 I would say two things:

 1) Consistency for DDL syntax is important. Sometimes humans still
 write SQL and often, ORMs generate SQL. Asking people to guess what
 our syntax is from release to release is a good way to have people not
 bother to support us properly. As Peter says, Truncate is not DDL (an
 argument I have used), but it is often used alongside DDL and does
 have many of the same characteristics. INSERT IF EXISTS is simply an
 argument ad absurdum, not a requirement that needs to be addressed.

I think I agree.  We should not go down the "well, we haven't got
UPSERT yet, and that's why we shouldn't do this one" road.

 Clearly, rollout scripts benefit from not throwing errors.
 Personally I would prefer setting SET ddl_abort_on_missing_object =
 false; at the top of a script than having to go through every SQL
 statement and add extra syntax. That might even help people more than
 littering SQL with extra clauses.

Here, I'm rather less comfortable.

I could easily take the opposite tack, that rollout scripts benefit
from yes, indeed, throwing errors, so that inconsistencies get
rectified.  I don't want to take that argument *too* far, mind you.

Doing things that avoid throwing errors isn't purely a good thing.
If a DDL script is doing genuinely different things when running in
different environments, it's difficult to be confident that the result
is correct in all cases.  Hiding errors might lead to ignoring
important differences.

Given two further bits of processing model, I might be made more
comfortable...

1.  A direction we're trying to go is to have good comparison tools to
see where schemas differ between environments.  (I need to poke at
getting a tool I call pgcmp released publicly.)  If you have the
capability to compare the starting schema against what you imagined it
was supposed to be, as well as to compare the post-rollout schema
against what it was supposed to become, then that keeps things
relatively under control.  If you can quickly determine divergence
from expected schema, then you can more easily keep on track.

2.  [More on the SQL syntax/clauses front]  In order to NOT litter the
SQL with extra clauses, I expect that there needs to be something of a
model of How You SHOULD Update Your Schema, in effect, some idiomatic
'best practice' that tends to cut risk and diminish the need for IF
EXISTS/IF NOT EXISTS.  I'd be interested to see an outline of that
model.
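
To make that concrete, here is the kind of per-statement guard (borrowing
the josh table from the quoted example) that today's idiom requires, and
that either the extra clauses or a GUC like Simon's would be trying to
subsume; just a sketch:

CREATE TABLE IF NOT EXISTS josh (id integer PRIMARY KEY);

DO $$
BEGIN
    IF EXISTS (SELECT 1 FROM pg_class
                WHERE relname = 'josh' AND pg_table_is_visible(oid)) THEN
        TRUNCATE TABLE josh;
    END IF;
END
$$;
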
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Add FET to Default and Europe.txt

2012-10-08 Thread Christopher Browne
On Mon, Oct 8, 2012 at 11:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 08.10.2012 18:26, Tom Lane wrote:
 The other thing that the abbreviation list files are doing for us is
 providing a user-configurable way to resolve conflicting abbreviations,
 for instance IST (the Indians and the Israelis both use this, but not to
 mean the same thing).  This requirement isn't ever going to go away.

 Locale-specific abbreviation lists would be nice.

 Yeah, that's a good thought, although the lack of standardization of
 locale names seems to make it a bit hard to implement.  My first idea
 was look for a tznames file matching the value of LC_TIME, and if
 found, concatenate its contents with 'Default'.  But there are enough
 ways to spell en_IN to make that a bit problematic, and besides which
 a user in India might well be running with C locale anyway.  Still,
 there might be a useful incremental usability gain there.

 We aren't ever going to get out of the need for user configurability
 though.  For instance, if a user in India writes EST, is he thinking
 of the Australian or American meaning?  It's plausible that either might
 be preferred, depending on who that user interacts with regularly.

That sounds pretty cool, but coolness isn't the right way to
evaluate whether this is good or not.

If we introduce cases where peoples' expectations are liable to be
disappointed (e.g. - they get the wrong EST, and report a bug on
that), then we lose.

We have, in effect, been treating the handling of time zones in a
manner where we're imagining there's general agreement on their
interpretation.  We presently get bug reports (and are
losing/getting it not as right as would be nice) in cases where we
leave TZ symbols out, where they could have been included.

The scenario where we could unambiguously include time zones is where
the symbols are unique.  If we were to include *all* uniquely-named
symbols, that would minimize the number of complaints about missing
zones, whilst evading the cases where the symbols are non-unique.
That might be worth considering, though it'll certainly attract
complaints in that some odd-ball zones would be included whilst
well-known ones wouldn't.

I would tend to think that local variations (e.g. - having a list for
LC_TIME=en_IN) heads us into a morass of complexity.  As you suggest,
two different people using en_IN might have different preferences for
what EST should mean.

That being said, if we had a way to configure preferred localizations,
people could set up their own set of associations.  You want your own
morass?  You can build it yourself...
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] system_information.triggers truncate triggers

2012-09-26 Thread Christopher Browne
On Wed, Sep 26, 2012 at 10:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Daniel Farina dan...@heroku.com writes:
 On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova ja...@2ndquadrant.com 
 wrote:
 The definition of information_schema.triggers contains this:
 -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE
 so it seems that we are not showing TRUNCATE triggers intentionally,
 but that comment fails to explain why

 Wouldn't it be because TRUNCATE is a PostgreSQL language extension?

 Yeah.  The SQL standard specifies the allowed values in that column,
 and TRUNCATE is not among them.

 For similar reasons, you won't find exclusion constraints represented
 in the information_schema views, and there are some other cases that
 I don't recall this early in the morning.

 The point of the information_schema (at least IMHO) is to present
 standard-conforming information about standard-conforming database
 objects in a standard-conforming way, so that cross-DBMS applications
 can rely on what they'll see there.  If you are doing anything that's
 not described by the SQL standard, you will get at best an incomplete
 view of it from the information_schema.  In that case you're a lot
 better off looking directly at the underlying catalogs.

 (Yes, I'm aware that some other DBMS vendors have a more liberal
 interpretation of what standards compliance means in this area.)

Let me grouse about this a bit...  <grouse>

I appreciate that standards compliance means that information_schema
needs to be circumspect as to what it includes.

But it is irritating that information_schema provides a representation
of (for instance) triggers that, at first, looks nice and clean and
somewhat version-independent, only to fall over because there's a
class of triggers that it consciously ignores.

If I'm wanting to do schema analytics on this (and I do), I'm left
debating between painful choices:

a) Use information_schema for what it *does* have, and then add in a
surprising-looking hack that's pretty version-dependent to draw in the
other triggers that it left out

b) Ignore the seeming-nice information_schema representation, and
construct a version-dependent extraction covering everything that more
or less duplicates the work being done by information_schema.triggers.

I'd really like to have something like

c) Something like information_schema that takes the
standards-conformance gloves off and gives a nice representation of
all the triggers.

Make no mistake, I'm not casting aspersions at how pg_trigger was
implemented; I have no complaint there, as it's quite fair that the
internal representation won't be totally human-readability-friendly.
 That is a structure that is continuously accessed by backends, and it
is entirely proper to bias implementation to internal considerations.
But I'd sure like ways to get at more analytically-friendly
representations.
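
For instance, the version-dependent piece that option (a) bolts on is a
direct pg_trigger query to pick up the TRUNCATE triggers the view omits;
a sketch, assuming the TRUNCATE bit in tgtype is still 1<<5 as in current
sources:

SELECT n.nspname AS trigger_schema,
       t.tgname  AS trigger_name,
       c.relname AS event_object_table
  FROM pg_trigger t
  JOIN pg_class c     ON c.oid = t.tgrelid
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE NOT t.tgisinternal
   AND (t.tgtype & 32) <> 0;   -- the TRIGGER_TYPE_TRUNCATE bit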

A different place where I wound up having to jump through considerable
hoops when doing schema analytics was vis-a-vis identifying functions.
 I need to be able to compare schemas across databases, so oid-based
identification of functions is a total non-starter.  It appears that
the best identification of a function would be based on the
combination of schema name, function name, and the concatenation of
argument data types.  It wasn't terribly difficult to construct that
third bit, but it surely would be nice if there was a view capturing
it, and possibly even serializing it into a table to enable indexing
on it.  Performance-wise, function comparisons turned out to be one of
the most expensive things I did, specifically because of that mapping
surrounding arguments.
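
One way to build that identity without hand-rolling the argument-type
concatenation (a sketch; pg_get_function_identity_arguments() has been
available since 8.4):

SELECT n.nspname || '.' || p.proname
       || '(' || pg_get_function_identity_arguments(p.oid) || ')'
         AS function_identity
  FROM pg_proc p
  JOIN pg_namespace n ON n.oid = p.pronamespace
 WHERE n.nspname NOT IN ('pg_catalog', 'information_schema');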

</grouse>
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] pg_reorg in core?

2012-09-24 Thread Christopher Browne
On Mon, Sep 24, 2012 at 10:17 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Excerpts from Daniele Varrazzo's message of dom sep 23 22:02:51 -0300 2012:
 On Mon, Sep 24, 2012 at 12:23 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:

  As proposed by Masahiko, a single organization grouping all the tools (one
  repository per tool) would be enough. Please note that github can also host
  documentation. Bug tracker would be tool-dedicated in this case.

 From this PoV, pgFoundry allows your tool to be under
 http://yourtool.projects.postgresql.org instead of under a more
 generic namespace: I find it a nice and cozy place in the url space
 where to put your project. If pgFoundry will be dismissed I hope at
 least a hosting service for static pages will remain.

 I don't think that has been offered.

But I don't think it's necessarily the case that pgFoundry is getting
dismissed, either.

I got a note from Marc Fournier not too long ago (sent to some
probably-not-small set of people with pgFoundry accounts) indicating
that they were planning to upgrade gForge as far as they could, and
then switch to FusionForge http://fusionforge.org/, which is
evidently the successor.  It shouldn't be assumed that the upgrade
process will be easy or quick.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] [PoC] load balancing in libpq

2012-09-23 Thread Christopher Browne
We historically have connection pooling as an external thing; with the high
degree to which people keep implementing and reimplementing this, I think
*something* more than we have ought to be built in.

This, with perhaps better implementation, might be an apropos start.

Parallel with LDAP: it takes very much this approach, where configuration
typically offers a list of LDAP servers.  I am not certain if OpenLDAP does
round robin on the list, or if it tries targets in order, stopping when it
succeeds.  A decent debate fits in, there.

I could see this being implemented instead via something alongside
PGSERVICE; that already offers a well-defined way to capture a registry
of connection configuration.  Specifying a list of service names would
allow the command line configuration to remain short and yet very flexible.
On 2012-09-23 10:01 AM, Euler Taveira eu...@timbira.com wrote:

 On 23-09-2012 07:50, Satoshi Nagayasu wrote:
  I have just written the first PoC code to enable load balancing
  in the libpq library.
 
 Your POC is totally broken. Just to point out two problems: (i) semicolon
 (;)
 is a valid character for any option in the connection string and (ii) you
 didn't think about PQsetdb[Login](), PQconnectdbParams() and
 PQconnectStartParams(). If you want to pursue this idea, you should think a
 way to support same option multiple times (one idea is host1, host2, etc).

 Isn't it easier to add support on your application or polling software?


 --
Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento





Re: [HACKERS] pg_reorg in core?

2012-09-22 Thread Christopher Browne
On Sat, Sep 22, 2012 at 7:45 PM, Peter Eisentraut pete...@gmx.net wrote:
 On Sat, 2012-09-22 at 16:25 +0900, Satoshi Nagayasu wrote:
 I think it's time to consider some *umbrella project* for maintaining
 several small projects outside the core.

 Well, that was pgfoundry, and it didn't work out.

There seem to be some efforts to update it, but yeah, the software
behind it didn't age gracefully, and it seems doubtful to me that
people will be flocking back to pgfoundry.

The other ongoing attempt at an umbrella is PGXN, and it's different
enough in approach that, while it's not obvious that it'll succeed, if
it fails, the failure wouldn't involve the same set of issues that
made pgfoundry problematic.

PGXN notably captures metadata about the project; resources (e.g. -
SCM) don't have to be kept there.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] pg_reorg in core?

2012-09-21 Thread Christopher Browne
If the present project is having a tough time doing enhancements, I should
think it mighty questionable to try to draw it into core, that presses it
towards a group of already very busy developers.

On the other hand, if the present development efforts can be made more
public, by having them take place in a more public repository, that at
least has potential to let others in the community see and participate.
There are no guarantees, but privacy is liable to hurt.

I wouldn't expect any sudden huge influx of developers, but a steady
visible stream of development effort would be mighty useful to a merge
into core argument.

A *lot* of projects are a lot like this.  On the Slony project, we have
tried hard to maintain this sort of visibility.  Steve Singer, Jan Wieck
and I do our individual efforts on git repos visible at GitHub to ensure
ongoing efforts aren't invisible inside a corporate repo.  It hasn't led to
any massive influx of extra developers, but I am always grateful to see Peter
Eisentraut's bug reports.

